Natural Language Processing to Build a Multicenter Computable Phenotype Library for Adults with Congenital Heart Disease

Journal: medRxiv

Published Date: Jan 1, 2025

Abstract

Our objective was to build classifiers for multiple phenotypes that categorize a cohort of adults with congenital heart disease (ACHD) and that can be used to populate variables in a biobank. A dataset of 1492 ACHD patients, with expert-created labels for eight phenotypes, was created and used to train classifiers with three different architectures. A larger unlabeled dataset containing 15869 patients was used to pre-train the classifiers, and a 20% subset of the unlabeled dataset was used to validate the classifier predictions. On held-out labeled data, F1 scores for the eight target phenotypes ranged from 0.66 to 1. The six phenotypes with the best classification performance were then validated on unlabeled data, where positive predictive value ranged from 81.5% to 100%. We were able to classify six out of eight phenotypes with satisfactory performance. Challenging phenotypes included cyanosis and New York Heart Association functional class. Both vary over time, and for the latter there is limited agreement between human observers. Different phenotypes benefited from different model architectures to some degree, but the differences are small enough that uniformity of deployment may be a more important factor in choosing which models to deploy. We saw no benefit to joint training, but some phenotypes benefited from a multiclass model. Human-curated data can be used to train NLP-based ACHD phenotype classifiers with excellent test characteristics, acceptable for application in quality improvement efforts and for populating ACHD registry data.
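
The two metrics reported above, F1 score and positive predictive value (PPV), can be computed from binary labels as in the minimal sketch below. This is an illustration only, not the authors' evaluation code: the function names and the example labels are hypothetical, and the numbers are not taken from the study.

```python
def confusion_counts(y_true, y_pred):
    """Count true positives, false positives, and false negatives
    for binary labels (1 = phenotype present, 0 = absent)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, fn


def ppv_recall_f1(tp, fp, fn):
    """Return (PPV, recall, F1); each defaults to 0.0 when undefined."""
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of PPV (precision) and recall.
    f1 = 2 * ppv * recall / (ppv + recall) if (ppv + recall) else 0.0
    return ppv, recall, f1


# Made-up labels for one hypothetical phenotype (not study data).
y_true = [1, 1, 1, 1, 0, 0, 0, 1, 0, 1]
y_pred = [1, 1, 1, 0, 0, 1, 0, 1, 0, 1]

tp, fp, fn = confusion_counts(y_true, y_pred)
ppv, recall, f1 = ppv_recall_f1(tp, fp, fn)
print(f"PPV={ppv:.3f} recall={recall:.3f} F1={f1:.3f}")
```

PPV needs only the cases the classifier flagged as positive to be reviewed, which is one plausible reason it suits validation on data without existing labels, whereas F1 also requires knowing the false negatives.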

Authors

Spencer Thomas; Angus Dawson; Hifsa Chaudhry; Sidra Ahmad; Xiyu Ding; Sarah A. Hummel; David M. Leone; Angela J. Weingarten; Eric Farber-Eger; Lauren Lee Shaffer; Benjamin P. Frischhertz; Sydney St. Clemmons; Sunil J. Ghelani; Fernando Baraona Reyes; Tzu-Chun Wu; Danny T. Y. Wu; Alexander R. Opotowsky; Timothy A. Miller
