Multi-Task Learning for Improved Recognition of Multiple Types of Acoustic Information

Bibliographic Details
Published in: IEICE Transactions on Information and Systems, Vol. E104.D, No. 10, pp. 1762-1765
Main Authors: KIM, Jae-Won; PARK, Hochong
Format: Journal Article
Language: English
Published: Tokyo, The Institute of Electronics, Information and Communication Engineers, 01.10.2021
Japan Science and Technology Agency
Summary: We propose a new method for improving the recognition performance of phonemes, speech emotions, and music genres using multi-task learning. When tasks are closely related, multi-task learning can improve the performance of each task by learning a common feature representation for all the tasks. However, the recognition tasks considered in this study demand different input signals, speech and music, at different time scales, resulting in input features with different characteristics. In addition, no training dataset with multiple labels for all information sources is available. Considering these issues, we conduct multi-task learning in a sequential training process using input features that each carry a single label for one information source. A comparative evaluation confirms that the proposed multi-task learning method achieves higher performance on all recognition tasks than the individual per-task learning used in conventional methods.
ISSN: 0916-8532
      1745-1361
DOI: 10.1587/transinf.2021EDL8029
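
The summary above describes a shared representation trained sequentially across tasks whose examples each carry a single label. The following is a minimal sketch of that general idea, not the authors' implementation: the network shape, layer sizes, class counts, and the round-robin task schedule are all illustrative assumptions.

```python
# Sketch: sequential multi-task training with a shared encoder and task-specific heads,
# where each batch is labeled for exactly one information source (phoneme, emotion, or genre).
import torch
import torch.nn as nn

class MultiTaskModel(nn.Module):
    def __init__(self, in_dim=40, hidden=128):
        super().__init__()
        # Shared feature encoder learned by all tasks (assumed architecture).
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                     nn.Linear(hidden, hidden), nn.ReLU())
        # One classification head per task; class counts are placeholders.
        self.heads = nn.ModuleDict({
            "phoneme": nn.Linear(hidden, 39),
            "emotion": nn.Linear(hidden, 4),
            "genre": nn.Linear(hidden, 10),
        })

    def forward(self, x, task):
        return self.heads[task](self.encoder(x))

model = MultiTaskModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def dummy_batch(task, n=16, in_dim=40):
    # Stand-in for real features; each batch has labels for only its own task.
    n_cls = model.heads[task].out_features
    return torch.randn(n, in_dim), torch.randint(0, n_cls, (n,))

# Sequential training: tasks are visited one after another, so every update uses
# only the single label available for that batch while the encoder stays shared.
for epoch in range(2):
    for task in ["phoneme", "emotion", "genre"]:
        x, y = dummy_batch(task)
        loss = loss_fn(model(x, task), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
        print(f"epoch {epoch} task {task} loss {loss.item():.3f}")
```

In this sketch, alternating over tasks lets single-labeled data from each source update the shared encoder, which is one plausible reading of the sequential training process the abstract refers to.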