Speech enhancement (SE) and automatic speech recognition (ASR) in real-time processing involve improving the quality and intelligibility of speech signals on the fly, ensuring accurate transcription as the speech unfolds. SE eliminates unwanted backg...
Extracting richer emotional representations from raw speech is one of the key approaches to improving the accuracy of Speech Emotion Recognition (SER). In recent years, there has been a trend in utilizing self-supervised learning (SSL) for extracting...
Dysarthria frequently occurs in individuals with disorders such as stroke, Parkinson's disease, cerebral palsy, and other neurological disorders. Well-timed detection and management of dysarthria in these patients is imperative for efficiently handli...
In recent years, empowered by artificial intelligence technologies, computer-assisted language learning systems have gradually become a hot topic of research. Currently, the mainstream pronunciation assessment models rely on advanced speech recogniti...
This work is dedicated to the analysis and evaluation of the DRSSU dataset: A Dataset of Real and Synthetic Speech in Ukrainian, created to support research in the field of natural language processing and speech recognition. The dataset contains a un...
Understanding speech in noisy environments is a primary challenge for individuals with hearing loss, affecting daily communication and quality of life. Traditional speech-in-noise tests are essential for screening and diagnosing hearing loss but are ...
As a vital tool for human-computer interaction, artificial intelligence (AI) voice assistants have become an integral part of individuals' everyday routines. However, there are still a series of problems caused by privacy violations in current use. T...
The human voice stands out for its rich information transmission capabilities. However, voice communication is susceptible to interference from noisy environments and obstacles. Here, we propose a wearable wireless flexible skin-attached acoustic sen...
In the medical field, text annotation involves categorizing clinical and biomedical texts with specific medical categories, enhancing the organization and interpretation of large volumes of unstructured data. This process is crucial for developing to...
IEEE transactions on pattern analysis and machine intelligence
Jan 9, 2025
Visual Speech Recognition (VSR) aims to infer speech into text depending on lip movements alone. As it focuses on visual information to model the speech, its performance is inherently sensitive to personal lip appearances and movements, and this make...
Join thousands of healthcare professionals staying informed about the latest AI breakthroughs in medicine. Get curated insights delivered to your inbox.