Text phrase-mining in identifying and classifying maternal proteins and genes across preeclampsia and similar pathologies.
Journal:
Physiological reports
PMID:
40102640
Abstract
This study aims to demonstrate that text phrase-mining and natural language processing (NLP) can annotate large volumes of obstetrics textual data for the discovery and evaluation of maternal protein/gene (MPG)-disease interactions involved in the preeclampsia pathway. We employ a phrase-mining/NLP pipeline to evaluate unique MPGs involved in six cardiovascular derangements with overlapping presentations during pregnancy. The diseases were matched with Medical Subject Headings. A textual corpus was developed from PubMed abstracts matched to these terms. Forty-four MPGs were identified with respect to the diseases. The pipeline assigned a unique score to each MPG-disease pair. Components of the score were calculated and weighted for distinctness, integrity, and popularity. Statistical analyses were conducted to examine protein-disease relationships. Forty-four MPGs with known associations to cardiovascular disease and preeclampsia pathways were identified among the six diseases. MPGs shared across the greatest number of disease states were implicated in: (1) angiogenesis and vasoconstriction, (2) hemodynamic regulation, (3) hormonal regulation of metabolism, and (4) inflammation. NLP and text phrase-mining were successfully applied to obstetrics abstracts with accuracy and speed. This approach holds promise for synthesizing large volumes of data to present trends in the obstetric literature and for identifying promising biomarkers.
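The abstract states that each MPG-disease pair receives a score whose components are weighted for distinctness, integrity, and popularity, but it does not give the formula. The sketch below illustrates one possible reading, a weighted sum of three component values. The class name, function names, weights, component values, and placeholder MPG and disease labels are all hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairScore:
    """Component values for one MPG-disease pair (all values hypothetical)."""
    mpg: str
    disease: str
    distinctness: float
    integrity: float
    popularity: float


def combined_score(pair, weights=(0.5, 0.3, 0.2)):
    """Weighted sum of the three components.

    The weights are illustrative assumptions, not the study's values.
    """
    w_dist, w_int, w_pop = weights
    return (w_dist * pair.distinctness
            + w_int * pair.integrity
            + w_pop * pair.popularity)


def rank_pairs(pairs, weights=(0.5, 0.3, 0.2)):
    """Return the pairs sorted from highest to lowest combined score."""
    return sorted(pairs, key=lambda p: combined_score(p, weights), reverse=True)


# Placeholder inputs, not real data from the corpus.
example_pairs = [
    PairScore("MPG_A", "disease_1", distinctness=0.2, integrity=0.5, popularity=0.9),
    PairScore("MPG_B", "disease_1", distinctness=0.8, integrity=0.6, popularity=0.4),
]
ranked = rank_pairs(example_pairs)
```

With these placeholder inputs, ranking by the combined score places the pair with higher distinctness first, because distinctness carries the largest assumed weight. Changing the weights changes the ordering, which is why the actual weighting scheme matters for interpreting any reported ranking.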