Enhancing target speaker extraction with Hierarchical Speaker Representation Learning.

Journal: Neural networks : the official journal of the International Neural Network Society
Published Date:

Abstract

Target speaker extraction aims to obtain the speech of a specific speaker from a mixture of multiple voices. The conventional approach exploits target speaker embeddings from a pre-recorded speech segment as auxiliary information, providing a prior for extraction. However, a naive single-vector embedding may neglect subtle acoustic features in the auxiliary speech, such as pitch and harmonic distribution, leading to unsatisfactory performance. Furthermore, traditional speaker embeddings are trained by a speaker verification system and do not leverage the semantics of the auxiliary speech, which may facilitate extraction. To address these challenges, we propose a simple yet effective method, Hierarchical Speaker Representation Learning (HSRL). The proposed method comprises three modules: a Local Speaker Feature Extractor (LSFE), a Global Speaker Feature Extractor (GSFE), and a Hierarchical Cascading Input Strategy (HCIS). Specifically, the LSFE utilizes the fine-grained acoustic information in the anchor speech. In the GSFE, we utilize ECAPA-TDNN to obtain the speaker embeddings of the target speaker, enhancing extraction performance with this global speaker information. In addition, a novel HCIS is proposed to feed the output of the LSFE module into the input of the GSFE, which enables the global speaker features to focus on the semantic content of the pre-recorded speech. Experimental results on the Libri-2talker dataset demonstrate that HSRL achieves significant performance improvements and establishes new best results on this benchmark.
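The cascaded data flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the feature sizes, and the random projection are hypothetical, the LSFE is replaced by a toy frame-wise projection, and the GSFE is replaced by simple mean and standard-deviation pooling rather than ECAPA-TDNN. It also assumes the GSFE consumes only the LSFE output, which the abstract does not specify in detail. The sketch only shows the ordering: frame-level (local) features are computed from the anchor speech first, and the utterance-level (global) embedding is then derived from those features.

```python
import math
import random


def local_speaker_features(anchor_frames, weights):
    """Toy LSFE stand-in (assumption): frame-wise projection followed by tanh.

    Keeps the time axis, so fine-grained per-frame information is preserved.
    """
    return [
        [math.tanh(sum(w * x for w, x in zip(row, frame))) for row in weights]
        for frame in anchor_frames
    ]


def global_speaker_embedding(frame_features):
    """Toy GSFE stand-in (assumption): mean and std pooling over time.

    The abstract states that ECAPA-TDNN is used here; pooling is only a
    placeholder that maps a variable-length sequence to one fixed vector.
    """
    n = len(frame_features)
    dim = len(frame_features[0])
    means = [sum(f[d] for f in frame_features) / n for d in range(dim)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frame_features) / n)
        for d in range(dim)
    ]
    return means + stds


def hierarchical_representation(anchor_frames, weights):
    """Cascade sketch: the global extractor takes the local extractor's output."""
    local = local_speaker_features(anchor_frames, weights)
    global_emb = global_speaker_embedding(local)
    return local, global_emb


random.seed(0)
T, F, D = 50, 8, 4  # hypothetical: frames, input feature size, local feature size
anchor = [[random.gauss(0.0, 1.0) for _ in range(F)] for _ in range(T)]
W = [[random.gauss(0.0, 0.5) for _ in range(F)] for _ in range(D)]

local, global_emb = hierarchical_representation(anchor, W)
print(len(local), len(local[0]), len(global_emb))
```

In an extraction system, both outputs would serve as conditioning signals: the local features keep one vector per anchor frame, while the global embedding summarizes the whole anchor utterance in a single vector.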

Authors

  • Shulin He
    College of Computer Science, Inner Mongolia University, Hohhot, China. Electronic address: heshulin@mail.imu.edu.cn.
  • Wei Xue
    School of Information Science and Technology, Nanjing Agricultural University, Nanjing 210095, Jiangsu, China.
  • Yang Yang
    Department of Gastrointestinal Surgery, The Third Hospital of Hebei Medical University, Shijiazhuang, China.
  • Huaiwen Zhang
    Department of Radiotherapy, Jiangxi Cancer Hospital, The Second Affiliated Hospital of Nanchang Medical College, Jiangxi Clinical Research Center for Cancer, Nanchang, China.
  • Jiahao Pan
    Division of Emerging Interdisciplinary Areas, Hong Kong University of Science and Technology, Hong Kong Special Administrative Region of China. Electronic address: jiahaopan@ust.hk.
  • Xueliang Zhang
    College of Medical Engineering and Technology, Xinjiang Medical University, Urumqi, 830054, People's Republic of China. shuxue2456@126.com.