Machine Learning and Natural Language Processing to Improve Classification of Atrial Septal Defects in Electronic Health Records.

Journal: Birth Defects Research

Abstract

BACKGROUND: International Classification of Diseases (ICD) codes can accurately identify patients with certain congenital heart defects (CHDs). In ICD-defined CHD data sets, the code for secundum atrial septal defect (ASD) is the most common, but it has a low positive predictive value for CHD. As a result, conclusions drawn from such data sets may be erroneous. Public health surveillance therefore needs methods that reduce the false-positive rate for CHD among individuals captured by the ASD ICD code.
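Positive predictive value (PPV) is the share of records flagged by a code that truly have the condition: PPV = TP / (TP + FP). The following minimal Python sketch illustrates why reducing false positives among ASD-coded records raises PPV. The function name and all counts are hypothetical and used for illustration only. They are not data or methods from this study.

```python
def positive_predictive_value(true_positives: int, false_positives: int) -> float:
    """Return PPV = TP / (TP + FP), the fraction of flagged records that are true cases."""
    flagged = true_positives + false_positives
    if flagged == 0:
        # PPV is undefined when the code flags no records at all.
        raise ValueError("PPV is undefined when no records are flagged")
    return true_positives / flagged


# Hypothetical counts (not from this study): 100 records carry the ASD code,
# but only 40 are confirmed CHD cases.
tp, fp = 40, 60
print(positive_predictive_value(tp, fp))  # 0.4

# Idealized assumption: a secondary filter removes half of the false positives
# while keeping every true positive, which raises PPV to 40 / 70.
print(positive_predictive_value(tp, fp // 2))
```

PPV depends only on the flagged records. A filter that discards false positives without losing true cases therefore improves PPV directly, and this is the kind of improvement the background calls for.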

Authors

  • Yuting Guo
    State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing, China.
  • Haoming Shi
    Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, Georgia, USA.
  • Wendy M Book
    Department of Cardiology, School of Medicine, Emory University, Atlanta, Georgia, USA.
  • Lindsey Carrie Ivey
    Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, Georgia, USA.
  • Fred H Rodriguez
    Department of Cardiology, School of Medicine, Emory University, Atlanta, Georgia, USA.
  • Reza Sameni
    Department of Biomedical Informatics, Emory University, Atlanta, GA 30322, USA.
  • Cheryl Raskind-Hood
    Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, Georgia, USA.
  • Chad Robichaux
    Department of Biomedical Informatics, School of Medicine, Emory University, Atlanta, Georgia, USA.
  • Karrie F Downing
    National Center on Birth Defects and Developmental Disabilities, Centers for Disease Control and Prevention, Atlanta, Georgia, USA.
  • Abeed Sarker
    Department of Biomedical Informatics, School of Medicine, Emory University, Atlanta, GA, United States.