GPAD: a natural language processing-based application to extract the gene-disease association discovery information from OMIM.

Journal: BMC bioinformatics
Published Date:

Abstract

BACKGROUND: Thousands of genes have been associated with different Mendelian conditions. One valuable source for tracking these gene-disease associations (GDAs) is the Online Mendelian Inheritance in Man (OMIM) database. However, most of the information in OMIM is textual and heterogeneous (e.g., summarized by different experts), which complicates automated reading and understanding of the data. Here, we used Natural Language Processing (NLP) to build a tool, Gene-Phenotype Association Discovery (GPAD), that could syntactically process OMIM text and extract the data of interest.

Authors

  • K M Tahsin Hassan Rahit
    Departments of Biochemistry, Molecular Biology and Medical Genetics, Cumming School of Medicine, University of Calgary, Calgary, AB, T2N 4N1, Canada.
  • Vladimir Avramovic
    Departments of Biochemistry, Molecular Biology and Medical Genetics, Cumming School of Medicine, University of Calgary, Calgary, AB, T2N 4N1, Canada.
  • Jessica X Chong
    Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, 98195, USA.
  • Maja Tarailo-Graovac
    Departments of Biochemistry, Molecular Biology and Medical Genetics, Cumming School of Medicine, University of Calgary, Calgary, AB, T2N 4N1, Canada. maja.tarailograovac@ucalgary.ca.