AIMC Topic: Speech

Showing 21 to 30 of 368 articles

A Multimodal Approach for Early Identification of Mild Cognitive Impairment and Alzheimer's Disease With Fusion Network Using Eye Movements and Speech.

IEEE transactions on neural systems and rehabilitation engineering : a publication of the IEEE Engineering in Medicine and Biology Society
Detecting Alzheimer's disease (AD) in its earliest stages, particularly during an onset of Mild Cognitive Impairment (MCI), remains challenging due to the overlap of initial symptoms with normal aging processes. Given that no cure exists and current ...

Does Musical Experience Facilitate Phonetic Accommodation During Human-Robot Interaction?

Journal of speech, language, and hearing research : JSLHR
PURPOSE: This study investigated the effect of musical training on phonetic accommodation in a second language (L2) after interacting with a social robot, exploring the motivations and reasons behind their accommodation strategies.

A comprehensive framework for multi-modal hate speech detection in social media using deep learning.

Scientific reports
As social media platforms evolve, hate speech increasingly manifests across multiple modalities, including text, images, audio, and video, challenging traditional detection systems focused on single modalities. Hence, this research proposes a novel M...

CMDF-TTS: Text-to-speech method with limited target speaker corpus.

Neural networks : the official journal of the International Neural Network Society
While end-to-end Text-to-Speech (TTS) methods with limited target speaker corpus can generate high-quality speech, they often require a non-target speaker corpus (auxiliary corpus) which contains a substantial amount of pairs to train ...

Natural language processing models reveal neural dynamics of human conversation.

Nature communications
Through conversation, humans engage in a complex process of alternating speech production and comprehension to communicate. The neural mechanisms that underlie these complementary processes through which information is precisely conveyed by language,...

Speech emotion recognition with light weight deep neural ensemble model using hand crafted features.

Scientific reports
Automatic emotion detection has become crucial in various domains, such as healthcare, neuroscience, smart home technologies, and human-computer interaction (HCI). Speech Emotion Recognition (SER) has attracted considerable attention because of its p...

DEMENTIA: A Hybrid Attention-Based Multimodal and Multi-Task Learning Framework With Expert Knowledge for Alzheimer's Disease Assessment From Speech.

IEEE journal of biomedical and health informatics
The prevalence of Alzheimer's disease (AD) is rising annually, imposing a severe burden on patients and society. Therefore, assisted AD assessment is crucial. The decline in language function and the cognitive impairment it reflects are key external ...

Multimodal learning-based speech enhancement and separation, recent innovations, new horizons, challenges and real-world applications.

Computers in biology and medicine
With the increasing global prevalence of disabling hearing loss, speech enhancement technologies have become crucial for overcoming communication barriers and improving the quality of life for those affected. Multimodal learning has emerged as a powe...

A Tunable Forced Alignment System Based on Deep Learning: Applications to Child Speech.

Journal of speech, language, and hearing research : JSLHR
PURPOSE: Phonetic forced alignment has a multitude of applications in automated analysis of speech, particularly in studying nonstandard speech such as children's speech. Manual alignment is tedious but serves as the gold standard for clinical-grade ...

Enhancing target speaker extraction with Hierarchical Speaker Representation Learning.

Neural networks : the official journal of the International Neural Network Society
Target speaker extraction aims to obtain the speech of the specific speaker from a mixture of multiple voices. The conventional approach exploits the target speaker embeddings from a pre-recorded speech segment as auxiliary information, providing pri...