Endpoint-aware audio-visual speech enhancement utilizing dynamic weight modulation based on SNR estimation.

Journal: Neural networks : the official journal of the International Neural Network Society
PMID:

Abstract

Integrating visual features has proven effective for deep learning-based speech quality enhancement, particularly in highly noisy environments. However, such models may suffer from redundant information, which degrades performance when the signal-to-noise ratio (SNR) is relatively high, and real-world noisy scenarios typically exhibit widely varying noise levels. To address these issues, this study proposes a novel Audio-Visual Speech Enhancement (AVSE) system that incorporates audio and visual voice activity information. An attention mechanism driven by an SNR estimation module dynamically adjusts the weights of the audio and visual endpoint information during evaluation according to the environmental noise level. This dynamic modulation makes the model an Endpoint-Aware Network (EANet). The model prioritizes the desired voice period, thereby enhancing speech intelligibility by jointly leveraging noisy acoustic cues and noise-robust visual cues. Experiments are conducted on benchmark datasets. The results indicate that EANet effectively integrates audio and visual information and outperforms the audio-only model, especially in scenarios with wide SNR ranges. This work thus demonstrates more effective fusion of multimodal information for AVSE, enhancing both the quality and the intelligibility of the speech.
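
The idea of SNR-dependent weighting can be illustrated with a minimal sketch. This is not the EANet architecture, which the abstract does not specify in detail: the logistic gate, the function names (`snr_db`, `modality_weights`, `fuse_endpoints`), and the parameters `midpoint_db` and `slope_db` are all assumptions made for illustration. It only shows how an SNR estimate could shift weight toward audio voice activity cues in clean conditions and toward visual cues in noisy ones.

```python
import math


def snr_db(signal_power: float, noise_power: float) -> float:
    """Signal-to-noise ratio in decibels from average signal and noise power."""
    return 10.0 * math.log10(signal_power / noise_power)


def modality_weights(snr: float, midpoint_db: float = 0.0, slope_db: float = 5.0):
    """Map an SNR estimate to (audio_weight, visual_weight), summing to 1.

    Hypothetical logistic gate: the audio weight grows with SNR, so the
    visual stream dominates only when the audio is heavily corrupted.
    midpoint_db and slope_db are illustrative values, not from the paper.
    """
    audio_w = 1.0 / (1.0 + math.exp(-(snr - midpoint_db) / slope_db))
    return audio_w, 1.0 - audio_w


def fuse_endpoints(audio_vad, visual_vad, snr: float):
    """Blend per-frame audio and visual voice activity probabilities."""
    audio_w, visual_w = modality_weights(snr)
    return [audio_w * a + visual_w * v for a, v in zip(audio_vad, visual_vad)]


if __name__ == "__main__":
    audio_vad = [0.9, 0.2, 0.6]   # made-up per-frame audio voice activity
    visual_vad = [0.8, 0.7, 0.1]  # made-up per-frame visual voice activity
    for level in (-10.0, 0.0, 20.0):
        print(level, modality_weights(level), fuse_endpoints(audio_vad, visual_vad, level))
```

In a learned system the gate would more plausibly be produced by an attention module trained jointly with the enhancement network; the fixed logistic curve here merely stands in for that mapping.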

Authors

  • Zhehui Zhu
    School of Automotive Studies, Tongji University, Shanghai 201804, China. Electronic address: 2131577@tongji.edu.cn.
  • Lijun Zhang
    Department of Paediatric Orthopaedics, Shengjing Hospital of China Medical University, Shenyang, Liaoning Province, China.
  • Kaikun Pei
    School of Automotive Studies, Tongji University, Shanghai 201804, China.
  • Siqi Chen
    College of Animal Science and Technology, Jilin Agricultural University, Changchun, China.