Deep semi-supervised learning ensemble framework for classifying co-mentions of human proteins and phenotypes.

Journal: BMC bioinformatics
Published Date:

Abstract

BACKGROUND: Identifying human protein-phenotype relationships has attracted researchers in bioinformatics and biomedical natural language processing because of its importance in uncovering rare and complex diseases. Since experimental validation of protein-phenotype associations is prohibitively expensive, automated tools capable of accurately extracting these associations from biomedical text are in high demand. However, while the manual annotation of protein-phenotype co-mentions required for training such models is highly resource-consuming, extracting millions of unlabeled co-mentions is straightforward.

Authors

  • Morteza Pourreza Shahri
    Gianforte School of Computing, Montana State University, Bozeman, USA.
  • Indika Kahanda
    School of Computing, University of North Florida, Jacksonville, USA. indika.kahanda@unf.edu.