WTM: Weighted Temporal Attention Module for Group Activity Recognition

Authors: Yadav, S.K., Agrawal, P., Tiwari, K., Adeli, E., Pandey, H.M. and Akbar, S.A.

Journal: Proceedings of the International Joint Conference on Neural Networks

Volume: 2022-July

ISBN: 9781728186719

DOI: 10.1109/IJCNN55064.2022.9892215

Abstract:

Group Activity Recognition requires spatiotemporal modeling of an exponential number of semantic and geometric relations among the individuals in a scene. Previous attempts model these relations by aggregating independently derived spatial and temporal features, which increases modeling complexity and yields sparse information due to a lack of feature correlation. In this paper, we propose the Weighted Temporal Attention Mechanism (WTM), a representational mechanism that combines the spatial and temporal features of a local subset of a visual sequence into a single 2D image representation, highlighting the areas of a frame where actor motion is significant. Pairwise dense optical flow maps, representing the temporal characteristics of individuals over a sequence, are used as attention masks over raw RGB images through a multi-layer weighted aggregation. We demonstrate a strong correlation between spatial and temporal features, which helps localize actions effectively in multi-person scenarios. The simplicity of the input representation allows the model to be trained with 2D image classification architectures in a plug-and-play fashion, outperforming its multi-stream and multi-dimensional counterparts. The proposed method achieves the lowest computational complexity in comparison with other works. We demonstrate the performance of WTM on two widely used public benchmark datasets, the Collective Activity Dataset (CAD) and the Volleyball Dataset, achieving state-of-the-art accuracies of 95.1% and 94.6%, respectively. We also discuss the application of this method to other datasets and general scenarios. The code is being made publicly available.
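The abstract describes the mechanism only at a high level. The Python sketch below illustrates one plausible reading of it, assuming OpenCV's Farneback dense optical flow as the pairwise flow estimator and a simple blending weight for the aggregation; the function name wtm_representation and the parameter alpha are hypothetical, and this is a sketch of the idea, not the authors' released implementation.

# A minimal sketch of the idea in the abstract, NOT the authors' code:
# pairwise dense optical-flow magnitudes are accumulated into a normalized
# attention mask and blended with the raw RGB frame via a weighted sum.
# The Farneback flow estimator and the weight `alpha` are assumptions.
import cv2
import numpy as np

def wtm_representation(frames, alpha=0.5):
    """Collapse a short RGB sequence into one motion-weighted 2D image.

    frames: list of HxWx3 uint8 RGB frames (a local subset of the video).
    alpha:  hypothetical weight balancing appearance against motion saliency.
    """
    h, w, _ = frames[0].shape
    saliency = np.zeros((h, w), dtype=np.float32)

    # Accumulate pairwise dense optical-flow magnitudes over the sequence.
    prev = cv2.cvtColor(frames[0], cv2.COLOR_RGB2GRAY)
    for frame in frames[1:]:
        curr = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)
        flow = cv2.calcOpticalFlowFarneback(
            prev, curr, None, 0.5, 3, 15, 3, 5, 1.2, 0)
        saliency += np.linalg.norm(flow, axis=2)
        prev = curr

    # Normalize the accumulated motion into a [0, 1] attention mask.
    mask = saliency / (saliency.max() + 1e-8)

    # Weighted aggregation: emphasize pixels where actor motion is significant.
    rgb = frames[-1].astype(np.float32) / 255.0
    weighted = (1.0 - alpha) * rgb + alpha * rgb * mask[..., None]
    return (weighted * 255.0).astype(np.uint8)

Because the output is a single RGB image, it can be fed directly to any off-the-shelf 2D classifier (e.g., a ResNet), which is the plug-and-play property the abstract highlights.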

https://eprints.bournemouth.ac.uk/36997/

Source: Scopus

WTM: Weighted Temporal Attention Module for Group Activity Recognition

Authors: Yadav, S.K., Agrawal, P., Tiwari, K., Adeli, E., Pandey, H.M. and Akbar, S.A.

Journal: 2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN)

ISSN: 2161-4393

DOI: 10.1109/IJCNN55064.2022.9892215

https://eprints.bournemouth.ac.uk/36997/

Source: Web of Science (Lite)

WTM: Weighted Temporal Attention Module for Group Activity Recognition

Authors: Yadav, S., Agrawal, P., Tiwari, K., Adeli, E., Pandey, H. and Akbar, S.A.

Conference: IEEE WCCI 2022 International Joint Conference on Neural Networks (IJCNN 2022)

Dates: 18-23 July 2022

Publisher: IEEE

https://eprints.bournemouth.ac.uk/36997/

Source: Manual

WTM: Weighted Temporal Attention Module for Group Activity Recognition

Authors: Yadav, S., Agrawal, P., Tiwari, K., Adeli, E., Pandey, H. and Akbar, S.A.

Conference: IEEE WCCI 2022 International Joint Conference on Neural Networks (IJCNN 2022)

https://eprints.bournemouth.ac.uk/36997/

Source: BURO EPrints