Phone duration modeling: overview of techniques and performance optimization via feature selection in the context of emotional speech.


Authors: Lazaridis, A., Ganchev, T., Kostoulas, T., Mporas, I. and Fakotakis, N.

Journal: International Journal of Speech Technology

Volume: 13

Issue: 3

Pages: 175-188

eISSN: 1572-8110

ISSN: 1381-2416

DOI: 10.1007/s10772-010-9077-x

Accurate modeling of prosody is a prerequisite for producing high-quality synthetic speech. Phone duration, as one of the key prosodic parameters, plays an important role in generating natural-sounding emotional synthetic speech. In the present work we offer an overview of various phone duration modeling techniques and then evaluate ten models, based on decision trees, linear regression, lazy-learning algorithms and meta-learning algorithms, which over the past decades have been used successfully in various modeling tasks. Furthermore, we study the potential for performance optimization by applying two feature selection techniques, RReliefF and Correlation-based Feature Selection (CFS), to a large set of numerical and nominal linguistic features extracted from text, such as phonetic, phonological and morphosyntactic features, which have been reported as successful in phone and syllable duration modeling. We investigate the practical usefulness of these phone duration modeling techniques on a Modern Greek emotional speech database, which consists of five categories of emotional speech: anger, fear, joy, neutral and sadness. The experimental results demonstrated that feature selection significantly improves the accuracy of phone duration prediction regardless of the type of machine learning algorithm used for phone duration modeling. Specifically, in four out of the five categories of emotional speech, feature selection improved phone duration modeling compared to the case without feature selection. The M5p-tree-based phone duration model was observed to achieve the best phone duration prediction accuracy in terms of root mean square error (RMSE) and mean absolute error (MAE). © 2010 Springer Science+Business Media, LLC.
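The abstract names two ingredients that can be illustrated compactly: the error measures used to rank the duration models (RMSE and MAE) and correlation-based feature selection. The following is a minimal sketch, not the authors' implementation. It uses the standard CFS merit heuristic, merit = k * mean|r_cf| / sqrt(k + k(k-1) * mean|r_ff|), with a greedy forward search. As assumptions, it handles numerical features only and uses absolute Pearson correlation for both feature-target and feature-feature correlations; nominal features such as phonetic identity would need a different correlation measure. All function names and the toy data are hypothetical. RReliefF and the M5p model tree are not shown.

```python
import math
from typing import Dict, List, Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error between true and predicted phone durations."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error between true and predicted phone durations."""
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def cfs_merit(subset: List[str], features: Dict[str, Sequence[float]],
              target: Sequence[float]) -> float:
    """CFS merit: k * mean|r_cf| / sqrt(k + k(k-1) * mean|r_ff|)."""
    k = len(subset)
    if k == 0:
        return 0.0
    # Mean absolute feature-target correlation (relevance).
    r_cf = sum(abs(pearson(features[f], target)) for f in subset) / k
    # Mean absolute feature-feature correlation (redundancy).
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = (sum(abs(pearson(features[a], features[b])) for a, b in pairs)
            / len(pairs)) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def greedy_cfs(features: Dict[str, Sequence[float]],
               target: Sequence[float]) -> List[str]:
    """Forward search: add the feature that most increases merit; stop when none does."""
    selected: List[str] = []
    best = 0.0
    remaining = list(features)
    while remaining:
        scored = [(cfs_merit(selected + [f], features, target), f) for f in remaining]
        merit, f = max(scored)
        if merit <= best:
            break
        selected.append(f)
        remaining.remove(f)
        best = merit
    return selected
```

On a toy set where one feature is an exact copy of another, the greedy search keeps only one of the two: adding the copy raises the mean inter-feature correlation without raising the mean feature-target correlation, so the merit does not increase.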
