With the rapid advancement of synthetic speech technologies, detecting deepfake audio has become essential for preventing impersonation and misinformation. This study aims to enhance detection performance by addressing limitations in existing models,...
Speech enhancement (SE) and automatic speech recognition (ASR) in real-time processing involve improving the quality and intelligibility of speech signals on the fly, ensuring accurate transcription as the speech unfolds. SE eliminates unwanted backg...
Extracting richer emotional representations from raw speech is one of the key approaches to improving the accuracy of Speech Emotion Recognition (SER). In recent years, there has been a trend toward using self-supervised learning (SSL) for extracting...
Speech Imagery (SI) refers to the mental experience of hearing speech and may be the core of verbal thinking for people who undergo internal monologues. It belongs to the set of possible mental imagery states that produce kinesthetic experiences whos...
Dysarthria frequently occurs in individuals with conditions such as stroke, Parkinson's disease, cerebral palsy, and other neurological disorders. Timely detection and management of dysarthria in these patients is imperative for efficiently handli...
SIGNIFICANCE: Functional magnetic resonance imaging provides high spatial resolution but is limited by cost, infrastructure, and the constraints of an enclosed scanner. Portable methods such as functional near-infrared spectroscopy and electroencepha...
Cross-corpus speech emotion recognition (CCSER) aims to develop robust models capable of accurately identifying a speaker's emotional state across diverse datasets. This task is challenged by variations in dataset characteristics, such as differences...
Current diagnostic procedures for attention deficit hyperactivity disorder (ADHD) are mainly subjective and prone to bias. While research on potential biomarkers, including EEG, brain imaging, and genetics, is promising, it has yet to demonstrate clin...
In this study, we introduce an end-to-end single-microphone deep learning system for source separation and auditory attention decoding (AAD) in a competing speech and music setup. Deep source separation is applied directly on the envelope of the obse...
This work is dedicated to the analysis and evaluation of the DRSSU dataset: A Dataset of Real and Synthetic Speech in Ukrainian, created to support research in the field of natural language processing and speech recognition. The dataset contains a un...