Triage of documents containing protein interactions affected by mutations using an NLP based machine learning approach.

Journal: BMC Genomics
Published Date:

Abstract

BACKGROUND: Information on protein-protein interactions (PPIs) affected by mutations is useful for understanding the biological effects of mutations and for developing treatments that target those interactions. In this study, we developed a natural language processing (NLP)-based machine learning approach for extracting such information from the literature. Our aim is to identify journal abstracts or paragraphs in full-text articles that contain at least one occurrence of a PPI affected by a mutation.

Authors

  • Jinchan Qu
    Department of Statistics, Florida State University, Tallahassee, FL, 32306, USA.
  • Albert Steppi
    Laboratory of Systems Pharmacology at Harvard Medical School, Boston, MA, 02115, USA.
  • Dongrui Zhong
    Department of Statistics, Florida State University, Tallahassee, FL, 32306, USA.
  • Jie Hao
  • Jian Wang
    Veterinary Diagnostic Center, Shanghai Animal Disease Control Center, Shanghai, China.
  • Pei-Yau Lung
    Department of Statistics, Florida State University, Tallahassee, FL, 32306, USA.
  • Tingting Zhao
    School of Software Engineering, Beihang University, Beijing, China.
  • Zhe He
    School of Information, Florida State University, Tallahassee, FL, USA.
  • Jinfeng Zhang
    Department of Statistics, Florida State University, Tallahassee, FL, 32306, USA. jinfeng@stat.fsu.edu.