GPAD: a natural language processing-based application to extract the gene-disease association discovery information from OMIM.
Journal:
BMC Bioinformatics
Published Date:
Feb 27, 2024
Abstract
BACKGROUND: Thousands of genes have been associated with different Mendelian conditions. One valuable source for tracking these gene-disease associations (GDAs) is the Online Mendelian Inheritance in Man (OMIM) database. However, most of the information in OMIM is textual and heterogeneous (e.g., summarized by different experts), which complicates automated reading and understanding of the data. Here, we used Natural Language Processing (NLP) to build a tool, Gene-Phenotype Association Discovery (GPAD), to syntactically process OMIM text and extract the data of interest.