Biomedical literature classification with a CNNs-based hybrid learning network.

Journal: PloS one
PMID:

Abstract

Deep learning techniques, e.g., Convolutional Neural Networks (CNNs), have been explosively applied to the research in the fields of information retrieval and natural language processing. However, few research efforts have addressed semantic indexing with deep learning. The use of semantic indexing in the biomedical literature has been limited for several reasons. For instance, MEDLINE citations contain a large number of semantic labels from automatically annotated MeSH terms, and for a great deal of the literature, only the information of the title and the abstract is readily available. In this paper, we propose a Boltzmann Convolutional neural network framework (B-CNN) for biomedicine semantic indexing. In our hybrid learning framework, the CNN can adaptively deal with features of documents that have sequence relationships, and can capture context information accordingly; the Deep Boltzmann Machine (DBM) merges global (the entity in each document) and local information through its training with undirected connections. Additionally, we have designed a hierarchical coarse to fine style indexing structure for learning and classifying documents, and a novel feature extension approach with word sequence embedding and Wikipedia categorization. Comparative experiments were conducted for semantic indexing of biomedical abstract documents; these experiments verified the encouraged performance of our B-CNN model.

Authors

  • Yan Yan
    Department of Biomedical Engineering, Wayne State University, Detroit, Michigan, USA.
  • Xu-Cheng Yin
    Department of Computer Science and Technology, School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China.
  • Chun Yang
    State Key Laboratory of Biogeology and Environmental Geology, School of Earth Sciences, China University of Geosciences, Wuhan, 430074, China.
  • Sujian Li
    Key Laboratory of Computational Linguistics, Peking University, Ministry of Education, Beijing 100871, China.
  • Bo-Wen Zhang
    Department of Computer Science and Technology, School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China.