DroneAttention: Sparse weighted temporal attention for drone-camera based activity recognition

Authors: Yadav, S.K., Luthra, A., Pahwa, E., Tiwari, K., Rathore, H., Pandey, H.M. and Corcoran, P.

Journal: Neural Networks

Publisher: Elsevier

Volume: 159

Pages: 57-69

eISSN: 1879-2782

ISSN: 0893-6080

DOI: 10.1016/j.neunet.2022.12.005

Abstract:

Human activity recognition (HAR) using drone-mounted cameras has attracted considerable interest from the computer vision research community in recent years. A robust and efficient HAR system plays a pivotal role in fields such as video surveillance, crowd behavior analysis, sports analysis, and human–computer interaction. The task is challenging because of complex poses, varying viewpoints, and the environmental conditions in which the action takes place. To address these complexities, this paper proposes a novel Sparse Weighted Temporal Attention (SWTA) module that uses sparsely sampled video frames to obtain global weighted temporal attention. SWTA comprises two parts: first, a temporal segment network that sparsely samples a given set of frames; second, a weighted temporal attention mechanism that fuses attention maps derived from optical flow with the raw RGB images. This is followed by a basenet, a convolutional neural network (CNN) module with fully connected layers that performs the final activity recognition. SWTA can be used as a plug-in module with existing deep CNN architectures, enabling them to learn temporal information without a separate temporal stream. The method has been evaluated on three publicly available benchmark datasets, namely Okutama, MOD20, and Drone-Action, achieving accuracies of 72.76%, 92.56%, and 78.86%, respectively, and surpassing the previous state of the art by margins of 25.26%, 18.56%, and 2.94%.
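
The abstract's two-part description is concrete enough to sketch. Below is a minimal, hypothetical PyTorch rendering of the idea, not the authors' implementation: all names (sparse_sample, WeightedTemporalAttention, SWTAClassifier), the single-convolution attention map, the learned scalar fusion weight, and the tiny stand-in backbone are assumptions for illustration, and the optical flow is assumed to be precomputed by a standard estimator.

import torch
import torch.nn as nn


def sparse_sample(video: torch.Tensor, num_segments: int) -> torch.Tensor:
    """Part 1 (TSN-style sparse sampling): take one frame per equal temporal
    segment so the sampled frames span the whole clip.

    video: (T, C, H, W) -> (num_segments, C, H, W)
    """
    t = video.shape[0]
    idx = torch.linspace(0, t - 1, steps=num_segments).round().long()
    return video[idx]


class WeightedTemporalAttention(nn.Module):
    """Part 2: turn a precomputed optical-flow field into a spatial attention
    map and fuse it with the corresponding raw RGB frame."""

    def __init__(self) -> None:
        super().__init__()
        # 2-channel flow (dx, dy) -> single-channel attention map.
        self.flow_to_attn = nn.Conv2d(2, 1, kernel_size=3, padding=1)
        # Learned scalar weighting between attended and raw RGB (an assumption;
        # the paper's exact fusion rule is not reproduced here).
        self.alpha = nn.Parameter(torch.tensor(0.5))

    def forward(self, rgb: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
        # rgb: (N, 3, H, W), flow: (N, 2, H, W)
        attn = torch.sigmoid(self.flow_to_attn(flow))        # (N, 1, H, W)
        return self.alpha * attn * rgb + (1.0 - self.alpha) * rgb


class SWTAClassifier(nn.Module):
    """Attention-weighted frames feed a CNN 'basenet' with a fully connected
    head; per-frame logits are averaged as a simple segmental consensus."""

    def __init__(self, num_classes: int) -> None:
        super().__init__()
        self.wta = WeightedTemporalAttention()
        self.backbone = nn.Sequential(           # stand-in for any deep CNN
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.fc = nn.Linear(64, num_classes)

    def forward(self, rgb: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
        x = self.wta(rgb, flow)                  # inject motion cues into RGB
        logits = self.fc(self.backbone(x))       # (N, num_classes)
        return logits.mean(dim=0)                # consensus over sampled frames


# Toy usage: 40-frame clip, 8 sparsely sampled frames, 5 activity classes.
video = torch.randn(40, 3, 112, 112)
flow = torch.randn(40, 2, 112, 112)              # assume flow is precomputed
frames = sparse_sample(video, num_segments=8)
flows = sparse_sample(flow, num_segments=8)
scores = SWTAClassifier(num_classes=5)(frames, flows)
print(scores.shape)                              # torch.Size([5])

The sketch mirrors the pipeline the abstract describes: sparse sampling covers the whole clip at low cost, and the flow-derived attention weights motion-salient regions of a single RGB stream, which is what allows SWTA to dispense with a separate temporal stream.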

https://eprints.bournemouth.ac.uk/37900/

https://www.sciencedirect.com/journal/neural-networks

Sources: Scopus, PubMed, Web of Science (Lite), Manual, Europe PubMed Central, BURO EPrints
