AIMC Topic: Speech Perception


On training targets for deep learning approaches to clean speech magnitude spectrum estimation.

The Journal of the Acoustical Society of America
Estimation of the clean speech short-time magnitude spectrum (MS) is key for speech enhancement and separation. Moreover, an automatic speech recognition (ASR) system that employs a front-end relies on clean speech MS estimation to remain robust. Tra...

Early phonetic learning without phonetic categories: Insights from large-scale simulations on realistic input.

Proceedings of the National Academy of Sciences of the United States of America
Before they even speak, infants become attuned to the sounds of the language(s) they hear, processing native phonetic contrasts more easily than nonnative ones. For example, between 6–8 mo and 10–12 mo, infants learning American English get bet...

A two-stage deep learning algorithm for talker-independent speaker separation in reverberant conditions.

The Journal of the Acoustical Society of America
Speaker separation is a special case of speech separation, in which the mixture signal comprises two or more speakers. Many talker-independent speaker separation methods have been introduced in recent years to address this problem in anechoic conditi...

Computational framework for fusing eye movements and spoken narratives for image annotation.

Journal of Vision
Despite many recent advances in the field of computer vision, there remains a disconnect between how computers process images and how humans understand them. To begin to bridge this gap, we propose a framework that integrates human-elicited gaze and ...

A talker-independent deep learning algorithm to increase intelligibility for hearing-impaired listeners in reverberant competing talker conditions.

The Journal of the Acoustical Society of America
Deep learning based speech separation or noise reduction needs to generalize to voices not encountered during training and to operate under multiple corruptions. The current study provides such a demonstration for hearing-impaired (HI) listeners. Sen...

EARSHOT: A Minimal Neural Network Model of Incremental Human Speech Recognition.

Cognitive Science
Despite the lack of invariance problem (the many-to-many mapping between acoustics and percepts), human listeners experience phonetic constancy and typically perceive what a speaker intends. Most models of human speech recognition (HSR) have side-ste...

Machine Learning Approaches to Analyze Speech-Evoked Neurophysiological Responses.

Journal of Speech, Language, and Hearing Research: JSLHR
Purpose: Speech-evoked neurophysiological responses are often collected to answer clinically and theoretically driven questions concerning speech and language processing. Here, we highlight the practical application of machine learning (ML)-based appr...

A deep learning algorithm to increase intelligibility for hearing-impaired listeners in the presence of a competing talker and reverberation.

The Journal of the Acoustical Society of America
For deep learning based speech segregation to have translational significance as a noise-reduction tool, it must perform in a wide variety of acoustic environments. In the current study, performance was examined when target speech was subjected to in...

Talker change detection: A comparison of human and machine performance.

The Journal of the Acoustical Society of America
The automatic analysis of conversational audio remains difficult, in part, due to the presence of multiple talkers speaking in turns, often with significant intonation variations and overlapping speech. The majority of prior work on psychoacoustic sp...

Vision-referential speech enhancement of an audio signal using mask information captured as visual data.

The Journal of the Acoustical Society of America
This paper describes a vision-referential speech enhancement of an audio signal using mask information captured as visual data. Smartphones and tablet devices have become popular in recent years. Most of them have not only a microphone but also a cam...