With the increasing global prevalence of disabling hearing loss, speech enhancement technologies have become crucial for overcoming communication barriers and improving the quality of life for those affected. Multimodal learning has emerged as a powe...
Journal of speech, language, and hearing research : JSLHR
Mar 31, 2025
PURPOSE: Phonetic forced alignment has a multitude of applications in automated analysis of speech, particularly in studying nonstandard speech such as children's speech. Manual alignment is tedious but serves as the gold standard for clinical-grade ...
Neural networks : the official journal of the International Neural Network Society
Mar 27, 2025
Target speaker extraction aims to obtain the speech of the specific speaker from a mixture of multiple voices. The conventional approach exploits the target speaker embeddings from a pre-recorded speech segment as auxiliary information, providing pri...
Neural networks : the official journal of the International Neural Network Society
Mar 26, 2025
The current dominant approach for neural speech enhancement is based on supervised learning by using simulated training data. The trained models, however, often exhibit limited generalizability to real-recorded data. To address this, this paper inves...
The focus on Speech Emotion Recognition has dramatically increased in recent years, driven by the need for automatic speech-recognition-based systems and intelligent assistants to enhance user experience by incorporating emotional content. While deep...
Computer methods and programs in biomedicine
Mar 18, 2025
BACKGROUND AND OBJECTIVE: Emotion recognition in conversations using artificial intelligence (AI) has gained significant attention due to its potential to provide insights into human social behavior. This study extends AI-based emotion recognition to...
Speech emotion recognition (SER) is an important application in Affective Computing and Artificial Intelligence. Recently, there has been a significant interest in Deep Neural Networks using speech spectrograms. As the two-dimensional representation ...
The human voice stands out for its rich information transmission capabilities. However, voice communication is susceptible to interference from noisy environments and obstacles. Here, we propose a wearable wireless flexible skin-attached acoustic sen...
This study introduces a unified computational framework connecting acoustic, speech and word-level linguistic structures to study the neural basis of everyday conversations in the human brain. We used electrocorticography to record neural signals acr...
Join thousands of healthcare professionals staying informed about the latest AI breakthroughs in medicine. Get curated insights delivered to your inbox.