Improving classification of myocardial infarction with machine learning in a diverse population.

Journal: American Journal of Epidemiology
Published Date:

Abstract

Phenotype classification with electronic health record (EHR) data is increasingly performed with machine learning (ML); however, the performance of ML approaches in diverse populations remains understudied. We compared an International Classification of Diseases (ICD)-based algorithm with an ML phenotyping pipeline for classifying myocardial infarction (MI) in a general population and in a self-reported Black population. We determined the impact of differential performance by replicating a published MI risk factor study with MI defined by either the ICD or the ML algorithm. Individuals followed in the Veterans Health Administration (VHA) EHR with data from 2002 to 2019 were examined: 11 523 175 Veterans; mean age, 67.5 years; 93.8% male; 14.3% Black; 79.1% White. MI was classified using a published rule-based ICD algorithm and an ML pipeline, PheCAP, which incorporates natural language processing. Algorithms were trained and validated against n = 403 randomly selected Veterans whose charts were reviewed for MI (gold standard), with oversampling of self-reported Black Veterans. Among chart-reviewed Veterans, the ICD algorithm had high positive predictive value (PPV) and low sensitivity (all races, PPV: 0.97, sensitivity: 0.17; Black Veterans, PPV: 0.94, sensitivity: 0.24). PheCAP MI had good PPV and higher sensitivity (all races, PPV: 0.90, sensitivity: 0.66; Black Veterans, PPV: 0.81, sensitivity: 0.79). Applying PheCAP MI to the entire VHA population to classify MI provided increased power to replicate findings from the published MI risk factor study compared to the ICD algorithm.
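
The two metrics reported above can be illustrated with a minimal sketch that compares an algorithm's labels with chart-review (gold standard) labels. The function name `ppv_and_sensitivity` and the toy labels are hypothetical and are not taken from the study; the sketch only shows the standard definitions, PPV = TP / (TP + FP) and sensitivity = TP / (TP + FN).

```python
from typing import Sequence, Tuple


def ppv_and_sensitivity(
    gold: Sequence[bool], predicted: Sequence[bool]
) -> Tuple[float, float]:
    """Return (PPV, sensitivity) of predicted labels against gold-standard labels."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    # Count true positives, false positives, and false negatives.
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)

    # PPV: share of algorithm-positive cases confirmed by chart review.
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    # Sensitivity: share of chart-confirmed cases the algorithm found.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    return ppv, sensitivity


# Made-up toy labels (NOT study data): four chart-confirmed MIs, two non-MIs.
toy_gold = [True, True, True, True, False, False]
# A conservative rule flags only one case, and that case is a true MI.
toy_predicted = [True, False, False, False, False, False]

toy_ppv, toy_sensitivity = ppv_and_sensitivity(toy_gold, toy_predicted)
print(f"PPV={toy_ppv:.2f}, sensitivity={toy_sensitivity:.2f}")
```

In the toy labels, a rule that rarely flags MI is correct whenever it does flag a case but misses most true cases, which is the same qualitative pattern of high PPV and low sensitivity that the abstract reports for the ICD algorithm.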

Authors

  • Alicia Chen
    Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA.
  • Chuan Hong
    Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
  • Yuk Lam Ho
    Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA.
  • Nicholas Link
    Division of Data Sciences, VA Boston Healthcare System, Boston, MA, USA.
  • Jacqueline P Honerlaw
    Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA.
  • Vidisha Tanukonda
    Centralized Interactive Phenomics Resource (CIPHER), Office of Research and Development, Veterans Health Administration, Washington, DC.
  • Ariela R Orkaby
    New England Geriatric Research Education and Clinical Center (GRECC), VA Boston Healthcare System, Boston, MA.
  • Saadia Qazi
    New England Geriatric Research Education and Clinical Center (GRECC), VA Boston Healthcare System, Boston, MA.
  • Connor Melley
    Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA.
  • Ashley Galloway
    Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA.
  • Lauren Costa
    Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA.
  • Monika Maripuri
    Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA.
  • Xuan Wang
    Baylor Scott & White Health, Dallas, TX, USA.
  • Yichi Zhang
    Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
  • Petra Schubert
    Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA.
  • Tianrun Cai
    Harvard-MIT Center for Regulatory Science, Harvard Medical School, Boston, MA, United States.
  • Zeling He
    Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA.
  • Vidul A Panickan
    Department of Biomedical Informatics, Harvard Medical School, Boston, MA.
  • Morgan Rosser
    Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA.
  • Laura Tarko
    Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), VA Boston Healthcare System, Boston, MA.
  • Sharon Dowell
    Howard University, Washington DC.
  • Candace Feldman
    Division of Rheumatology, Inflammation and Immunity, Department of Medicine, Brigham and Women's Hospital, Boston, MA.
  • Gail Kerr
    Howard University, Washington DC.
  • J Michael Gaziano
    Division of Rheumatology, Immunology, and Allergy, Brigham and Women's Hospital, Boston, MA, USA.
  • Peter W F Wilson
    Emory Clinical Cardiovascular Research Institute, Emory University, Atlanta, Georgia, United States of America.
  • Kelly Cho
    Division of Rheumatology, Immunology, and Allergy, Brigham and Women's Hospital, Boston, MA, USA.
  • Tianxi Cai
    Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, United States.
  • Katherine P Liao
    Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States.
