A novel dynamic gesture understanding algorithm fusing convolutional neural networks with hand-crafted features

Authors: Liu, Y., Song, S., Yang, L., Bian, G. and Yu, H.

Journal: Journal of Visual Communication and Image Representation

Volume: 83

eISSN: 1095-9076

ISSN: 1047-3203

DOI: 10.1016/j.jvcir.2022.103454


Dynamic gestures have attracted much attention in recent years due to their user-friendly interactive characteristics. However, accurate and efficient dynamic gesture understanding remains a challenge due to complex scenarios and motion information. Conventional handcrafted features are computationally cheap but can only extract low-level image features. This leads to performance degradation when dealing with complex scenes. In contrast, deep learning-based methods have a stronger feature expression ability and hence can capture more abstract and high-level image features. However, they critically rely on a large amount of training data. To address the above issues, a novel dynamic gesture understanding algorithm based on feature fusion is proposed for accurate dynamic gesture prediction. It leverages the advantages of handcrafted features and transfer learning. Aimed at small-scale dynamic gesture data, transfer learning is introduced for capturing effective feature expression. To precisely model the critical temporal information associated with dynamic gestures, a novel feature descriptor, namely, AlexNet2, is proposed for effective feature expression of dynamic gestures from the spatial and temporal domain. On this basis, a decision-level feature fusion framework based on support vector machine (SVM) and Dempster–Shafer (DS) evidence theory is constructed to utilize handcrafted features and AlexNet2 to realize high-precision dynamic gesture understanding. To verify the effectiveness and robustness of the proposed recognition algorithm, analysis and comparison experiments are performed on the public Cambridge gesture dataset and Northwestern University hand gesture dataset. The proposed gesture recognition algorithm achieves prediction accuracies of 99.50% and 96.97% on these two datasets. Experimental results show that the proposed recognition framework exhibits a better recognition performance in comparison with related prediction algorithms.

Source: Scopus