Dense dynamic convolutional network for Bel canto vocal technique assessment.

Journal: Scientific Reports
PMID:

Abstract

Bel Canto performance is a complex, multidimensional art form encompassing pitch, timbre, technique, and affective expression. Accurately reflecting a performer's singing proficiency requires precise quantification and evaluation of their vocal technique. Convolutional Neural Networks (CNNs), known for their ability to capture spatial hierarchical information, have been widely adopted in many tasks, including audio pattern recognition. However, existing CNNs are limited in extracting intricate spectral features, particularly those of Bel Canto performance. To address the challenges posed by complex spectral features and to meet the demand for objective vocal technique assessment, we incorporate Omni-Dimensional Dynamic Convolution (ODConv) into the network. We also employ densely connected layers to optimize the framework, enabling efficient use of multi-scale features across multiple dynamic convolution layers. To validate the method, we conducted experiments on four tasks: vocal technique assessment, music classification, acoustic scene classification, and sound event detection. The results show that our Dense Dynamic Convolutional Network (DDNet) outperforms traditional CNN and Transformer models, achieving Top-1 accuracies of 90.11%, 73.95%, and 89.31% on the first three tasks and a mean average precision (mAP) of 41.89% on sound event detection. Our research improves the accuracy and efficiency of Bel Canto vocal technique assessment and facilitates applications in vocal teaching and remote education.
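The abstract reports two kinds of metric: Top-1 accuracy for the single-label classification tasks and mAP for sound event detection. As a reading aid, here is a minimal pure-Python sketch of the standard definitions of both. It is not the authors' evaluation code; the function names and the toy scores and labels are hypothetical and chosen only for illustration.

```python
def top1_accuracy(scores, labels):
    """Fraction of clips whose highest-scoring class equals the true label."""
    correct = 0
    for row, label in zip(scores, labels):
        predicted = max(range(len(row)), key=row.__getitem__)
        correct += predicted == label
    return correct / len(labels)


def average_precision(scores, targets):
    """AP for one class: mean of the precision at the rank of each positive."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, precisions = 0, []
    for rank, i in enumerate(order, start=1):
        if targets[i]:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(score_matrix, target_matrix):
    """mAP: per-class AP averaged over all classes (columns)."""
    n_classes = len(score_matrix[0])
    aps = [
        average_precision(
            [row[c] for row in score_matrix],
            [row[c] for row in target_matrix],
        )
        for c in range(n_classes)
    ]
    return sum(aps) / n_classes


# Hypothetical toy data: three clips, three classes, one label per clip.
toy_scores = [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3], [0.5, 0.4, 0.1]]
toy_labels = [0, 1, 1]
toy_acc = top1_accuracy(toy_scores, toy_labels)

# Hypothetical toy data: three clips, two event classes, multi-label targets.
event_scores = [[0.9, 0.2], [0.8, 0.7], [0.1, 0.6]]
event_targets = [[1, 0], [0, 1], [1, 1]]
toy_map = mean_average_precision(event_scores, event_targets)

print(f"Top-1 accuracy: {toy_acc:.4f}")
print(f"mAP: {toy_map:.4f}")
```

Top-1 accuracy looks only at the single best guess per clip. mAP ranks every clip for each class, so it suits tasks such as sound event detection, where a clip can contain several events at once.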

Authors

  • Zhenyi Hou
    University of Shanghai for Science and Technology, Shanghai, 200093, China. hzy@usst.edu.cn.
  • Xu Zhao
    Intensive Care Unit, Hubei University of Medicine, Renmin Hospital, Shiyan, Hubei, China.
  • Shanggerile Jiang
    University of Shanghai for Science and Technology, Shanghai, 200093, China.
  • Daijun Luo
    University of Shanghai for Science and Technology, Shanghai, 200093, China.
  • Xinyu Sheng
    School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai, 200093, China.
  • Kaili Geng
    University of Shanghai for Science and Technology, Shanghai, 200093, China.
  • Kejie Ye
    University of Shanghai for Science and Technology, Shanghai, 200093, China.
  • Jiajing Xia
    University of Shanghai for Science and Technology, Shanghai, 200093, China.
  • Yitao Zhang
    School of Information Science and Engineering, NingboTech University, Ningbo, 315100, China.
  • Chenxi Ban
    University of Shanghai for Science and Technology, Shanghai, 200093, China.
  • Jiaxing Chen
    School of Life Science and Technology, China Pharmaceutical University, Nanjing 210009, China.
  • Yan Zou
    National Clinical Research Center of Oral Diseases, Shanghai 200011, China.
  • Yuchao Feng
    Westlake University, Hangzhou, 310024, China. fengyuchao@wioe.westlake.edu.cn.
  • Xin Yuan
  • Guangyu Fan
    Department of Medical Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing Key Laboratory of Clinical Study On Anticancer Molecular Targeted Drugs, No.17 Panjiayuan Nanli, Chaoyang District, Beijing, 100021, China.