Speech emotion recognition based on formant characteristics feature extraction and phoneme type convergence

Authors: Liu, Z.T., Rehman, A., Wu, M., Cao, W.H., Hao, M.

Journal: Information Sciences

Publication Date: 01/07/2021

Volume: 563

Pages: 309-325

ISSN: 0020-0255

DOI: 10.1016/j.ins.2021.02.016

Abstract:

Speech Emotion Recognition (SER) has numerous applications, including human-robot interaction, online gaming, and health care assistance. While deep learning-based approaches achieve considerable precision, they often come with high computational and time costs, because feature learning strategies must search for important features in a large amount of speech data. To reduce these costs, we propose a pre-processing step in which speech segments with similar formant characteristics are clustered together and labeled as the same phoneme. The phoneme occurrence rates in emotional utterances are then used as the input features for classifiers. In an evaluation on six databases (EmoDB, RAVDESS, IEMOCAP, ShEMO, DEMoS and MSP-Improv), the accuracy is comparable to that of current state-of-the-art methods, and the required training time is reduced from hours to minutes.
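
The pre-processing idea in the abstract (cluster segments by formant similarity, treat each cluster as a phoneme label, then use per-utterance label frequencies as features) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the clustering algorithm, so plain k-means is swapped in here as an assumption, formant extraction is skipped, and each segment is assumed to be already reduced to a hypothetical (F1, F2) pair in Hz. The function names `kmeans`, `nearest`, and `occurrence_rates` and the sample values are illustrative only.

```python
import random
from collections import Counter


def nearest(p, centroids):
    """Index of the centroid closest to p (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
    )


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means on tuples of floats; returns the list of centroids.

    Assumption: k-means stands in for whatever clustering the paper uses.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Recompute each centroid as the mean of its group;
        # keep the old centroid if a group ends up empty.
        centroids = [
            tuple(sum(dim) / len(g) for dim in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids


def occurrence_rates(segments, centroids):
    """Fraction of an utterance's segments assigned to each cluster label."""
    counts = Counter(nearest(s, centroids) for s in segments)
    return [counts[i] / len(segments) for i in range(len(centroids))]


# Hypothetical (F1, F2) values in Hz for the segments of one utterance.
utterance = [(300.0, 2300.0), (310.0, 2250.0), (700.0, 1200.0), (720.0, 1100.0)]

# In practice the centroids would be fitted on segments from many utterances.
centroids = kmeans(utterance, k=2)

# Fixed-length feature vector that a classifier could take as input.
rates = occurrence_rates(utterance, centroids)
print(rates)
```

The resulting vector has one entry per cluster regardless of utterance length, which is what makes it usable as a compact classifier input in a sketch like this one.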

Source: Scopus