AIMC Topic: Speech

Showing 1 to 10 of 368 articles

A deep learning framework for gender sensitive speech emotion recognition based on MFCC feature selection and SHAP analysis.

Scientific reports
Speech is one of the most efficient methods of communication among humans, inspiring advancements in machine speech processing under Natural Language Processing (NLP). This field aims to enable computers to analyze, comprehend, and generate human lan...

A dataset for recognition of Arabic accents from spoken L2 English speech (ArL2Eng).

Scientific data
This paper introduces the ArL2Eng dataset, a speech corpus of L2 English produced by native speakers of Arabic, and highlights its potential in supporting research into automated language assessment. ArL2Eng comprises audio sequences from speakers of...

EEG-based speech imagery decoding by dynamic hypergraph learning within projected and selected feature subspaces.

Journal of neural engineering
Speech imagery is a nascent paradigm that is receiving widespread attention in current brain-computer interface (BCI) research. By collecting the electroencephalogram (EEG) data generated when imagining the pronunciation of a sentence or word in huma...

Multilingual identification of nuanced dimensions of hope speech in social media texts.

Scientific reports
Hope plays a crucial role in human psychology and well-being, yet its expression and detection across languages remain underexplored in natural language processing (NLP). This study presents MIND-HOPE, the first-ever multiclass hope speech detection ...

Speech emotion recognition based on a stacked autoencoders optimized by PSO based grass fibrous root optimization.

Scientific reports
Effective speech emotion recognition (SER) poses a significant challenge due to the intricate and subjective nature of human emotions. Recognizing emotional states accurately from speech signals has a broad spectrum of practical applications, such as...

Voice fatigue subtyping through individual modeling of vocal demand responses.

Scientific reports
Recognizing individual variability is essential for developing targeted, personalized medical interventions. Vocal fatigue is a prevalent symptom and complaint among occupational voice users, but its identification has yielded mixed results. Vocal fa...

Detecting schizophrenia, bipolar disorder, psychosis vulnerability and major depressive disorder from 5 minutes of online-collected speech.

Translational psychiatry
Psychosis poses substantial social and healthcare burdens. The analysis of speech is a promising approach for the diagnosis and monitoring of psychosis, capturing symptoms like thought disorder and flattened affect. Recent advancements in Natural Lan...

An enhanced deep learning approach for speaker diarization using TitaNet, MarbelNet and time delay network.

Scientific reports
Speaker diarization, identifying "who spoke when," plays a vital role in speech transcription, supervised fine-tuning of large language models, conversational AI, and audio content analysis by providing labeled speaker segments. Traditional speaker d...

Evaluating Mandarin tone pronunciation accuracy for second language learners using a ResNet-based Siamese network.

Scientific reports
Evaluating tone pronunciation is essential for helping second-language (L2) learners master the intricate nuances of Mandarin tones. This article introduces an innovative automatic evaluation method for Mandarin tone pronunciation that employs a Siam...

Prediction of suicide using web based voice recordings analyzed by artificial intelligence.

Scientific reports
The integration of machine learning (ML) and deep learning models in suicide risk assessment has advanced significantly in recent years. In this study, we used ML in a case-control design to predict completed suicides using publicly available,...