AIMC Topic: Speech

LSTM autoencoder based parallel architecture for deepfake audio detection with dynamic residual encoding and feature fusion.

Scientific reports
With the rapid advancement of synthetic speech technologies, detecting deepfake audio has become essential for preventing impersonation and misinformation. This study aims to enhance detection performance by addressing limitations in existing models,...

End-to-end feature fusion for jointly optimized speech enhancement and automatic speech recognition.

Scientific reports
Speech enhancement (SE) and automatic speech recognition (ASR) in real-time processing involve improving the quality and intelligibility of speech signals on the fly, ensuring accurate transcription as the speech unfolds. SE eliminates unwanted backg...

MS-EmoBoost: a novel strategy for enhancing self-supervised speech emotion representations.

Scientific reports
Extracting richer emotional representations from raw speech is one of the key approaches to improving the accuracy of Speech Emotion Recognition (SER). In recent years, there has been a trend toward using self-supervised learning (SSL) for extracting...

Speech imagery brain-computer interfaces: a systematic literature review.

Journal of neural engineering
Speech Imagery (SI) refers to the mental experience of hearing speech and may be the core of verbal thinking for people who experience internal monologues. It belongs to the set of possible mental imagery states that produce kinesthetic experiences whos...

A novel Swin transformer based framework for speech recognition for dysarthria.

Scientific reports
Dysarthria frequently occurs in individuals with disorders such as stroke, Parkinson's disease, cerebral palsy, and other neurological disorders. Timely detection and management of dysarthria in these patients is imperative for efficiently handli...

AI-powered remote monitoring of brain responses to clear and incomprehensible speech via speckle pattern analysis.

Journal of biomedical optics
SIGNIFICANCE: Functional magnetic resonance imaging provides high spatial resolution but is limited by cost, infrastructure, and the constraints of an enclosed scanner. Portable methods such as functional near-infrared spectroscopy and electroencepha...

Feature and classifier-level domain adaptation in DistilHuBERT for cross-corpus speech emotion recognition.

Computers in biology and medicine
Cross-corpus speech emotion recognition (CCSER) aims to develop robust models capable of accurately identifying a speaker's emotional state across diverse datasets. This task is challenged by variations in dataset characteristics, such as differences...

Exploring voice as a digital phenotype in adults with ADHD.

Scientific reports
Current diagnostic procedures for attention deficit hyperactivity disorder (ADHD) are mainly subjective and prone to bias. While research on potential biomarkers, including EEG, brain imaging, and genetics, is promising, it has yet to demonstrate clin...

Single-microphone deep envelope separation based auditory attention decoding for competing speech and music.

Journal of neural engineering
In this study, we introduce an end-to-end single-microphone deep learning system for source separation and auditory attention decoding (AAD) in a competing speech and music setup. Deep source separation is applied directly on the envelope of the obse...

A Dataset of Real and Synthetic Speech in Ukrainian.

Scientific data
This work is dedicated to the analysis and evaluation of the DRSSU dataset: A Dataset of Real and Synthetic Speech in Ukrainian, created to support research in the field of natural language processing and speech recognition. The dataset contains a un...