Learning Student Networks via Feature Embedding.

Journal: IEEE Transactions on Neural Networks and Learning Systems

Abstract

Deep convolutional neural networks have been widely used in numerous applications, but their demanding storage and computational resource requirements prevent their deployment on mobile devices. Knowledge distillation aims to optimize a portable student network by transferring knowledge from a well-trained, heavy teacher network. Traditional teacher-student methods usually rely on additional fully connected layers to bridge the intermediate layers of the teacher and student networks, which introduces a large number of auxiliary parameters. In contrast, this article aims to propagate information from teacher to student without introducing new variables that need to be optimized. We regard the teacher-student paradigm from a new perspective of feature embedding. By introducing a locality preserving loss, the student network is encouraged to generate low-dimensional features that inherit the intrinsic properties of their corresponding high-dimensional features from the teacher network. The resulting portable network can thus naturally maintain performance comparable to that of the teacher network. Theoretical analysis is provided to justify the lower computational complexity of the proposed method. Experiments on benchmark data sets and well-trained networks suggest that the proposed algorithm is superior to state-of-the-art teacher-student learning methods in terms of computational and storage complexity.
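To make the locality preserving idea concrete, below is a minimal PyTorch sketch, not the authors' released code: each sample's neighborhood is defined by k-nearest neighbors in the teacher's feature space, and the student is penalized for separating those neighbors in its own low-dimensional space. The function name, the binary neighbor weights, and the value of k are illustrative assumptions.

```python
import torch

def locality_preserving_loss(teacher_feats, student_feats, k=4):
    """Sketch of a locality preserving distillation loss.

    teacher_feats: (N, D_t) high-dimensional teacher features (fixed).
    student_feats: (N, D_s) low-dimensional student features (trainable).
    Encourages student features to preserve the teacher's k-NN structure.
    """
    with torch.no_grad():
        # Pairwise distances in the teacher's feature space.
        t_dist = torch.cdist(teacher_feats, teacher_feats)   # (N, N)
        # Exclude self-matches before selecting neighbors.
        t_dist.fill_diagonal_(float('inf'))
        # Indices of each sample's k nearest neighbors under the teacher.
        nn_idx = t_dist.topk(k, largest=False).indices        # (N, k)

    # Pull each student feature toward the student features of its
    # teacher-defined neighbors (binary weights: 1 for neighbors, 0 else).
    neighbours = student_feats[nn_idx]                        # (N, k, D_s)
    diffs = student_feats.unsqueeze(1) - neighbours           # (N, k, D_s)
    return diffs.pow(2).sum(-1).mean()
```

In training, this term would typically be added to the usual task loss, e.g. `loss = ce + lam * locality_preserving_loss(t_feat, s_feat)`, where the weighting `lam` is an assumed hyperparameter; no new trainable variables are introduced, consistent with the abstract's claim.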

Authors

  • Hanting Chen
  • Yunhe Wang
  • Chang Xu
  • Chao Xu
  • Dacheng Tao