Multi-Task Learning for Audio-Based Infant Cry Detection and Reasoning

Journal: IEEE Journal of Biomedical and Health Informatics
Published Date:

Abstract

An infant's cry is a crucial indicator that offers valuable insight into its physical and mental condition, such as hunger or pain. However, the scarcity of infant cry datasets hinders model generalization in real-life scenarios. Varying voiceprint characteristics among infants further exacerbate this challenge, degrading performance on unseen infants. To this end, we propose a multi-task model for Infant Cry Detection and Reasoning (ICDR). It leverages datasets from two tasks to enrich data diversity and introduces an efficient attention module to achieve inter-task feature complementarity. To mitigate the impact of subject differences, ICDR introduces an intra-task contrastive mixture-of-experts (CMoE) module that adaptively allocates experts to reduce subject variance and applies contrastive learning to enhance the representation consistency of samples from different infants in the same state. Extensive cross-subject experiments show that ICDR outperforms state-of-the-art models in infant cry detection and reasoning, improving the F1-score by 2-9%. This demonstrates that multi-task learning with inter-task attention and intra-task CMoE effectively enhances the model's generalization ability.
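The two mechanisms named in the abstract can be illustrated with a minimal sketch. The code below is a hypothetical NumPy illustration, not the authors' ICDR implementation: a gated mixture-of-experts layer that adaptively weights expert outputs per sample, and a supervised contrastive loss that pulls together embeddings of samples sharing the same label (e.g., different infants in the same cry state) while pushing apart the rest. All names, dimensions, and hyperparameters here are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class MoELayer:
    """Mixture-of-experts layer: a gating network assigns per-sample
    weights over experts, and the output is the weighted expert mix.
    (Illustrative sketch; not the paper's CMoE architecture.)"""
    def __init__(self, d_in, d_out, n_experts):
        self.experts = [rng.normal(0, 0.1, (d_in, d_out)) for _ in range(n_experts)]
        self.gate = rng.normal(0, 0.1, (d_in, n_experts))

    def forward(self, x):
        # x: (batch, d_in) -> gate weights: (batch, n_experts), sum to 1
        weights = softmax(x @ self.gate)
        # stack each expert's projection: (batch, n_experts, d_out)
        outs = np.stack([x @ W for W in self.experts], axis=1)
        # weighted sum over experts: (batch, d_out)
        return (weights[:, :, None] * outs).sum(axis=1)

def supervised_contrastive_loss(z, labels, tau=0.5):
    """SupCon-style loss: embeddings with the same label are positives.
    Encourages representation consistency across subjects in one state."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity space
    sim = z @ z.T / tau
    n = len(labels)
    loss = 0.0
    for i in range(n):
        pos = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not pos:
            continue
        others = [j for j in range(n) if j != i]
        denom = np.exp(sim[i, others]).sum()
        loss += -np.mean([np.log(np.exp(sim[i, j]) / denom) for j in pos])
    return loss / n

# Toy batch: 4 samples, 8 features; labels mark the shared cry state.
x = rng.normal(size=(4, 8))
moe = MoELayer(d_in=8, d_out=16, n_experts=3)
z = moe.forward(x)
loss = supervised_contrastive_loss(z, labels=[0, 0, 1, 1])
```

Minimizing such a loss drives embeddings of same-state samples together regardless of which infant produced them, which is the intuition behind using contrastive learning to reduce subject variance.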

Authors

  • Ming Xia
    Department of Neurosurgery, First Affiliated Hospital of Xinjiang Medical University, Urumqi 830054, China.
  • Dongmin Huang
  • Wenjin Wang