AdSMOTE: A Technique for “High Proportion” Audio Augmentation

Authors: Jayathunge, K., Yang, X. and Southern, R.

Journal: Proceedings of the Inaugural 2023 Summer Symposium Series 2023

Pages: 14-18

DOI: 10.1609/aaaiss.v1i1.27468

Abstract:

Data augmentation is widely used in machine learning and deep learning. It is valued primarily for reducing the generalisation gap between training and validation, and for artificially increasing the number of available training data points. This is particularly relevant to audio datasets, which are usually smaller and, in some applications, suffer from imbalanced classes. This work presents adSMOTE (audio SMOTE), a novel sampling and augmentation strategy, and compares it with SpecAugment, one of the most effective augmentation strategies for audio data. We show that our method outperforms the latter by a considerable margin when the proportion of synthetic training samples is high. We also provide source code for the complete algorithm, which can easily be integrated into an existing model, enabling the rapid development of augmentation frameworks.
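
For readers unfamiliar with the SMOTE family that the abstract refers to, the following is a minimal sketch of the classic SMOTE interpolation step applied to generic feature vectors from a single class. It is not the authors' adSMOTE algorithm, whose details are not given in this record. The function name smote_like_samples and its parameters (n_new, k, seed) are hypothetical, chosen only for illustration, and the audio feature extraction step is assumed to have happened already.

```python
import numpy as np


def smote_like_samples(features, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between neighbours.

    Illustrative sketch of classic SMOTE interpolation, NOT adSMOTE.
    features: array of shape (n_samples, n_features), all from ONE class.
    """
    rng = np.random.default_rng(seed)
    features = np.asarray(features, dtype=float)
    n = len(features)
    if n < 2:
        raise ValueError("need at least two samples to interpolate")
    k = min(k, n - 1)  # cannot have more neighbours than other samples

    # Pairwise Euclidean distances; a sample is never its own neighbour.
    diffs = features[:, None, :] - features[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    np.fill_diagonal(dists, np.inf)
    neighbours = np.argsort(dists, axis=1)[:, :k]  # shape (n, k)

    # Pick a random base sample and one of its k nearest neighbours.
    base = rng.integers(0, n, size=n_new)
    pick = rng.integers(0, k, size=n_new)
    partner = neighbours[base, pick]

    # Place each synthetic point at a random position on the segment
    # joining the base sample to the chosen neighbour.
    gap = rng.random((n_new, 1))
    return features[base] + gap * (features[partner] - features[base])


if __name__ == "__main__":
    real = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    # A "high proportion" setting: three synthetic rows per real row.
    synthetic = smote_like_samples(real, n_new=3 * len(real), k=2)
    print(synthetic.shape)
```

Because every synthetic row lies on a segment between two real rows, the generated points stay within the range spanned by the original data, which is the property that distinguishes interpolation-based oversampling from masking-based methods such as SpecAugment.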

https://eprints.bournemouth.ac.uk/39202/

Source: Scopus

AdSMOTE: A Technique for “High Proportion” Audio Augmentation

Authors: Jayathunge, K., Yang, X. and Southern, R.

Editors: Soh, H., Geib, C. and Petrick, R.

Volume: 1

Pages: 14-18

Publisher: AAAI Publications

Place of Publication: Washington, DC

https://ojs.aaai.org/index.php/AAAI-SS/article/view/27468

Source: BURO EPrints