Multi-Task Learning for Audio-Based Infant Cry Detection and Reasoning

Bibliographic Details
Published in: IEEE Journal of Biomedical and Health Informatics, Vol. 28, No. 12, pp. 7434-7446
Main Authors: Xia, Ming; Huang, Dongmin; Wang, Wenjin
Format: Journal Article
Language: English
Published: United States: IEEE, 01.12.2024

Summary: An infant's cry is a crucial indicator that offers valuable insight into the infant's physical and mental condition, such as hunger and pain. However, the scarcity of infant cry datasets hinders a model's generalization to real-life scenarios, and the varying voiceprint characteristics among infants further exacerbate this challenge by degrading performance on unseen infants. To this end, we propose a multi-task model for Infant Cry Detection and Reasoning (ICDR). It leverages datasets from both tasks to enrich data diversity and introduces an efficient attention module to achieve inter-task feature supplementarity. To mitigate the impact of subject differences, ICDR introduces an intra-task contrastive mixture-of-experts (CMoE) module that adaptively allocates experts to reduce subject variance and applies contrastive learning to enhance the representation consistency of samples from different infants in the same state. Extensive cross-subject experiments show that ICDR outperforms state-of-the-art models in infant cry detection and reasoning, improving the F1-score by 2-9%. This demonstrates that multi-task learning effectively enhances a model's generalization ability through inter-task attention and intra-task CMoE.
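
The summary describes the architecture only at a high level: a multi-task network with an inter-task attention module and an intra-task contrastive mixture of experts. This record does not include the authors' code, so the following is a minimal illustrative sketch of that general design in PyTorch; every module name, layer size, and the two-branch layout is an assumption made for illustration, not the paper's implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative sketch only: names and sizes are assumptions, not the ICDR code.

class InterTaskAttention(nn.Module):
    """Cross-attention so one task's features can borrow from the other's."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, query_feats, context_feats):
        # query_feats, context_feats: (batch, frames, dim)
        out, _ = self.attn(query_feats, context_feats, context_feats)
        return out + query_feats  # residual keeps the task's own features

class CMoE(nn.Module):
    """Mixture of experts: a router softly assigns each sample to experts,
    letting different experts absorb per-subject (voiceprint) variance."""
    def __init__(self, dim, num_experts=4):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
            for _ in range(num_experts))
        self.router = nn.Linear(dim, num_experts)

    def forward(self, x):                            # x: (batch, dim), pooled
        weights = F.softmax(self.router(x), dim=-1)              # (B, E)
        outs = torch.stack([e(x) for e in self.experts], dim=1)  # (B, E, D)
        return (weights.unsqueeze(-1) * outs).sum(dim=1)         # (B, D)

def supervised_contrastive_loss(z, labels, temperature=0.1):
    """Pulls embeddings of samples in the same state (e.g. hunger) together
    across different infants, pushing samples of other states away."""
    z = F.normalize(z, dim=-1)
    sim = z @ z.t() / temperature
    pos = labels.unsqueeze(0) == labels.unsqueeze(1)
    pos.fill_diagonal_(False)                        # exclude self-pairs
    logits = sim - 1e9 * torch.eye(len(z), device=z.device)
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    return -((log_prob * pos).sum(1) / pos.sum(1).clamp(min=1)).mean()

class ICDRSketch(nn.Module):
    def __init__(self, feat_dim=64, num_reasons=5):
        super().__init__()
        self.det_enc = nn.GRU(feat_dim, feat_dim, batch_first=True)  # detection branch
        self.rea_enc = nn.GRU(feat_dim, feat_dim, batch_first=True)  # reasoning branch
        self.det_attn = InterTaskAttention(feat_dim)  # detection attends to reasoning
        self.rea_attn = InterTaskAttention(feat_dim)  # and vice versa
        self.det_cmoe, self.rea_cmoe = CMoE(feat_dim), CMoE(feat_dim)
        self.det_head = nn.Linear(feat_dim, 2)            # cry / no cry
        self.rea_head = nn.Linear(feat_dim, num_reasons)  # hunger, pain, ...

    def forward(self, x):               # x: (batch, frames, feat_dim), e.g. log-mels
        hd, _ = self.det_enc(x)
        hr, _ = self.rea_enc(x)
        hd = self.det_attn(hd, hr)      # inter-task feature supplementarity
        hr = self.rea_attn(hr, hd)
        zd = self.det_cmoe(hd.mean(dim=1))
        zr = self.rea_cmoe(hr.mean(dim=1))
        return self.det_head(zd), self.rea_head(zr), zr

# Toy usage: in practice each task would train on its own labeled dataset.
model = ICDRSketch()
x = torch.randn(8, 100, 64)            # 8 clips, 100 frames, 64 mel bins
states = torch.randint(0, 5, (8,))     # hypothetical reasoning labels
det_logits, rea_logits, z = model(x)
loss = F.cross_entropy(rea_logits, states) + supervised_contrastive_loss(z, states)

The contrastive term here is a standard supervised contrastive loss over the pooled reasoning embeddings, which matches the summary's stated goal of making samples from different infants in the same state look alike; how the paper actually combines the task losses and routes experts is not specified in this record.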
ISSN: 2168-2194
EISSN: 2168-2208
DOI: 10.1109/JBHI.2024.3454097