Text-Guided Neural Network Training for Image Recognition in Natural Scenes and Medicine.

Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Published Date:

Abstract

Convolutional neural networks (CNNs) are widely recognized as the foundation of machine vision systems. Conventionally, CNNs are taught to understand images using training images with human-annotated labels, without any additional instructions. In this article, we take a new direction and explore guidance from text for neural network training. We present two versions of attention mechanisms that facilitate interactions between visual and semantic information and encourage CNNs to effectively distill visual features by leveraging semantic features. In contrast to dedicated text-image joint embedding methods, our method supports asynchronous training and inference behavior: a trained model can classify images irrespective of text availability. This characteristic substantially improves the model's scalability to multiple (multimodal) vision tasks. We also apply the proposed method to medical imaging, where it learns from richer clinical knowledge and achieves attention-based interpretable decision-making. With comprehensive validation on two natural and two medical datasets, we demonstrate that our method can effectively make use of semantic knowledge to improve CNN performance. Our method yields substantial improvement on medical image datasets. Meanwhile, it achieves promising performance for multi-label image classification and caption-image retrieval, as well as excellent performance for phrase-based and multi-object localization on public benchmarks.
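
The idea of attention that uses text when it is available and still works without it can be illustrated with a minimal sketch. This is not the paper's architecture: the function `attend`, the dot-product scoring, the mean-feature fallback query, and the toy vectors are all assumptions chosen only to show how one pooling routine might serve both the text-guided and the image-only case.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend(regions, text_vec=None):
    """Pool per-region image features into a single vector.

    regions  : list of feature vectors, one per image region (hypothetical).
    text_vec : optional semantic (text) embedding of the same dimension.

    With text, attention weights come from region-text similarity.
    Without text, the mean region feature is used as the query instead,
    an assumed fallback so the same model can run on images alone.
    """
    dim = len(regions[0])
    if text_vec is None:
        query = [sum(r[i] for r in regions) / len(regions) for i in range(dim)]
    else:
        query = text_vec
    weights = softmax([dot(r, query) for r in regions])
    pooled = [sum(w * r[i] for w, r in zip(weights, regions)) for i in range(dim)]
    return pooled, weights


# Toy example: three regions with 2-D features, and a text embedding
# that points toward the second region.
regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
text_vec = [0.0, 2.0]

pooled_text, w_text = attend(regions, text_vec)  # text-guided attention
pooled_img, w_img = attend(regions)              # image-only fallback
```

In this toy setting the text-guided call puts most of its weight on the region most similar to the text vector, while the image-only call runs through the same code path with no text input, which is the property the abstract describes as classifying images irrespective of text availability.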

Authors

  • Zizhao Zhang
    Department of Computer and Information Science and Engineering, University of Florida, FL 32611, USA.
  • Pingjun Chen
    J. Crayton Pruitt Family Department of Biomedical Engineering, University of Florida, Gainesville, FL, United States.
  • Xiaoshuang Shi
    Shandong Industrial Engineering Laboratory of Biogas Production & Utilization, Qingdao Institute of Bioenergy and Bioprocess Technology, Chinese Academy of Sciences, Qingdao 266101, China. E-mail: lujun@qibebt.ac.cn (Jun Lu), yangzm@qibebt.ac.cn (Zhiman Yang).
  • Lin Yang
    National Clinical Research Center for Metabolic Diseases, Key Laboratory of Diabetes Immunology (Central South University), Ministry of Education, and Department of Metabolism and Endocrinology, The Second Xiangya Hospital of Central South University, Changsha, China.