Document triage for identifying protein-protein interactions affected by mutations: a neural network ensemble approach.

Journal: Database : the journal of biological databases and curation
Published Date:

Abstract

The precision medicine (PM) initiative promises to identify individualized treatments based on a patient's genetic profile and related responses. To help health professionals and researchers in the PM endeavor, BioCreative VI organized a PM Track to mine protein-protein interactions (PPI) affected by genetic mutations from the biomedical literature. In this paper, we present a neural network ensemble approach to identify relevant articles describing PPI affected by mutations. In this approach, several neural network models are used for document triage, and the ensemble performs better than any individual model. In the official runs, our best submission achieved an F-score of 69.04% in the BioCreative VI PM document triage task. In post-challenge analysis, to address the limited size of the training set, a PPI pre-trained module was incorporated into our approach to further improve performance. With this addition, our best ensemble method achieves an F-score of 71.04% on the test set.
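
As a rough illustration of how an ensemble can be applied to document triage and scored with an F-score, the sketch below averages the relevance probabilities of several models and thresholds the result. The abstract does not say how the individual models are combined, so probability averaging, the 0.5 threshold, the function names `ensemble_predict` and `f_score`, and the toy numbers are all assumptions made for illustration, not the authors' method or data.

```python
from statistics import mean


def ensemble_predict(model_probs, threshold=0.5):
    """Average per-model relevance probabilities, then threshold.

    model_probs: one list of probabilities per model, aligned by document.
    Averaging is an assumed combination rule, not taken from the paper.
    """
    averaged = [mean(doc_probs) for doc_probs in zip(*model_probs)]
    return [p >= threshold for p in averaged]


def f_score(gold, predicted):
    """Balanced F-score (harmonic mean of precision and recall)."""
    tp = sum(g and p for g, p in zip(gold, predicted))
    fp = sum((not g) and p for g, p in zip(gold, predicted))
    fn = sum(g and (not p) for g, p in zip(gold, predicted))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy probabilities from three hypothetical models for four documents.
model_probs = [
    [0.9, 0.4, 0.2, 0.7],
    [0.8, 0.6, 0.1, 0.3],
    [0.7, 0.7, 0.3, 0.4],
]
gold = [True, True, False, False]  # hypothetical relevance labels

predicted = ensemble_predict(model_probs)
single = [p >= 0.5 for p in model_probs[0]]  # first model on its own
print(f_score(gold, predicted), f_score(gold, single))
```

In this toy case the first model alone misclassifies two documents, while the averaged prediction does not, which is the kind of effect an ensemble is meant to exploit.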

Authors

  • Ling Luo
    Department of Epidemiology and Medical Statistics, School of Public Health, Guangdong Medical University, Dongguan, Guangdong, China.
  • Zhihao Yang
    College of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China.
  • Hongfei Lin
  • Jian Wang
    Veterinary Diagnostic Center, Shanghai Animal Disease Control Center, Shanghai, China.