AI Medical Compendium Topic

Explore the latest research on artificial intelligence and machine learning in medicine.

Speech

Showing 181 to 190 of 336 articles

Cascaded Convolutional Neural Network Architecture for Speech Emotion Recognition in Noisy Conditions.

Sensors (Basel, Switzerland)
Convolutional neural networks (CNNs) are a state-of-the-art technique for speech emotion recognition. However, CNNs have mostly been applied to noise-free emotional speech data, and limited evidence is available for their applicability in emotional s...
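As a rough illustration of the CNN-on-spectrogram idea behind this line of work (not the paper's cascaded architecture — all shapes, filter counts, and the four-class setup here are invented for the sketch):

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(x, kernels):
    # naive "valid" 2D convolution: x is (H, W), kernels is (K, kh, kw)
    K, kh, kw = kernels.shape
    H, W = x.shape
    out = np.empty((K, H - kh + 1, W - kw + 1))
    for k in range(K):
        for i in range(H - kh + 1):
            for j in range(W - kw + 1):
                out[k, i, j] = np.sum(x[i:i + kh, j:j + kw] * kernels[k])
    return out

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# toy "mel-spectrogram": 40 mel bands x 100 frames (random stand-in)
spec = rng.standard_normal((40, 100))

kernels = rng.standard_normal((8, 5, 5)) * 0.1   # 8 learnable filters
W_out = rng.standard_normal((4, 8)) * 0.1        # 4 hypothetical emotion classes

feat = np.maximum(conv2d(spec, kernels), 0)      # ReLU feature maps
pooled = feat.mean(axis=(1, 2))                  # global average pooling
probs = softmax(W_out @ pooled)                  # class posterior over emotions
```

A trained system would learn the filters and output weights from labelled (and, per the paper's focus, noise-augmented) emotional speech.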

Utterance Level Feature Aggregation with Deep Metric Learning for Speech Emotion Recognition.

Sensors (Basel, Switzerland)
Emotion is a form of high-level paralinguistic information that is intrinsically conveyed by human speech. Automatic speech emotion recognition is an essential challenge for various applications, including mental disease diagnosis, audio surveillance...
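Deep metric learning of the kind this title refers to is typically driven by a contrastive objective on utterance embeddings; a minimal sketch using a standard triplet loss (the exact objective in the paper may differ):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    # pull same-emotion utterance embeddings together,
    # push different-emotion embeddings at least `margin` further apart
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, d_pos - d_neg + margin)
```

During training, each anchor utterance is paired with a positive of the same emotion and a negative of a different one, and the embedding network is updated to drive this loss toward zero.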

Speech signal enhancement in cocktail party scenarios by deep learning based virtual sensing of head-mounted microphones.

Hearing research
The cocktail party effect refers to the ability of human hearing to attend to a single conversation while filtering out all other background noise. To mimic this human hearing ability for people with hearing loss, scientists integrate ...

Machine learning accurately classifies neural responses to rhythmic speech vs. non-speech from 8-week-old infant EEG.

Brain and language
Currently, there are no reliable means of identifying infants at risk for later language disorders. Infant neural responses to rhythmic stimuli may offer a solution, as neural tracking of rhythm is atypical in children with developmental language diso...

Convolutional fusion network for monaural speech enhancement.

Neural networks: the official journal of the International Neural Network Society
Convolutional neural network (CNN) based methods, such as the convolutional encoder-decoder network, offer state-of-the-art results in monaural speech enhancement. In the conventional encoder-decoder network, large kernel size is often used to enhanc...

Streaming cascade-based speech translation leveraged by a direct segmentation model.

Neural networks: the official journal of the International Neural Network Society
The cascade approach to Speech Translation (ST) is based on a pipeline that concatenates an Automatic Speech Recognition (ASR) system followed by a Machine Translation (MT) system. Nowadays, state-of-the-art ST systems are populated with deep neural ...
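The cascade pipeline described here (ASR, then a segmentation model, then MT) can be sketched with stub components standing in for the real neural models; all three stubs below are placeholders invented for illustration:

```python
def cascade_st(audio_chunks, asr, segmenter, mt):
    # cascade ST: transcribe each chunk, segment the running transcript
    # into sentence-like units, then translate each unit
    transcript = " ".join(asr(chunk) for chunk in audio_chunks)
    return [mt(segment) for segment in segmenter(transcript)]

# stub components (a real system would use trained neural models)
asr = lambda chunk: chunk.upper()            # pretend recognition
segmenter = lambda text: text.split(" | ")   # pretend segmentation model
mt = lambda seg: f"<translated:{seg}>"       # pretend translation

result = cascade_st(["hi | there"], asr, segmenter, mt)
```

The paper's contribution sits in the middle stage: a direct segmentation model that decides unit boundaries well enough for streaming translation.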

Learning to recognize while learning to speak: Self-supervision and developing a speaking motor.

Neural networks: the official journal of the International Neural Network Society
Traditionally, learning speech synthesis and speech recognition were investigated as two separate tasks. This separation hinders incremental development for concurrent synthesis and recognition, where partially-learned synthesis and partially-learned...

Anti-transfer learning for task invariance in convolutional neural networks for speech processing.

Neural networks: the official journal of the International Neural Network Society
We introduce the novel concept of anti-transfer learning for speech processing with convolutional neural networks. While transfer learning assumes that the learning process for a target task will benefit from re-using representations learned for anot...
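One way to picture an anti-transfer objective is a task loss plus a penalty on similarity between the new network's activations and those of a frozen network trained on the task to be avoided. The loss form below is an assumption for illustration, not necessarily the paper's exact formulation:

```python
import numpy as np

def cosine_sim(a, b):
    # cosine similarity between two activation vectors
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def anti_transfer_loss(task_loss, feat_new, feat_frozen, lam=0.1):
    # penalize similarity to the frozen network's representation,
    # encouraging invariance to the unwanted task's features
    return task_loss + lam * abs(cosine_sim(feat_new, feat_frozen))
```

Ordinary transfer learning rewards reusing the frozen representation; here the sign of the incentive is flipped.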

Combination of deep speaker embeddings for diarisation.

Neural networks: the official journal of the International Neural Network Society
Significant progress has recently been made in speaker diarisation after the introduction of d-vectors as speaker embeddings extracted from neural network (NN) speaker classifiers for clustering speech segments. To extract better-performing and more ...
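The clustering step this abstract mentions can be sketched with synthetic embeddings: extract a d-vector per speech segment, then cluster so segments from the same speaker share a label. The data, dimensions, and simple k-means below are all invented for the sketch (real systems often use spectral or agglomerative clustering):

```python
import numpy as np

rng = np.random.default_rng(1)

def kmeans(X, k, init, iters=20):
    # minimal k-means over segment embeddings (d-vectors)
    centers = X[init].astype(float)
    for _ in range(iters):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# toy d-vectors: two synthetic "speakers", 10 segments each, 16-dim
spk_a = rng.standard_normal((10, 16)) + 5.0
spk_b = rng.standard_normal((10, 16)) - 5.0
X = np.vstack([spk_a, spk_b])

# seed one centre in each region (indices 0 and 10) for a stable toy run
labels = kmeans(X, k=2, init=[0, 10])
```

Each cluster of segment labels then corresponds to one speaker turn set in the diarisation output.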

A dual-stream deep attractor network with multi-domain learning for speech dereverberation and separation.

Neural networks: the official journal of the International Neural Network Society
Deep attractor networks (DANs) perform speech separation with discriminative embeddings and speaker attractors. Compared with methods based on the permutation invariant training (PIT), DANs define a deep embedding space and deliver a more elaborate r...
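The attractor mechanism can be illustrated in a toy embedding space: each time-frequency bin gets an embedding, an attractor is formed per source, and soft masks come from embedding-attractor similarity. The oracle assignments and all shapes below are invented for the sketch; a real DAN learns the embeddings end to end:

```python
import numpy as np

rng = np.random.default_rng(2)

# toy embedding space: 200 time-frequency bins, 8-dim embeddings
emb = rng.standard_normal((200, 8))
assign = rng.integers(0, 2, 200)   # oracle source assignment per bin

# attractor = mean embedding of each source's bins
attractors = np.stack([emb[assign == s].mean(axis=0) for s in range(2)])

# soft separation masks from embedding-attractor similarity
logits = emb @ attractors.T        # (200 bins, 2 sources)
masks = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
```

Applying each mask to the mixture spectrogram would recover one source's time-frequency content.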