Inference of chronic obstructive pulmonary disease with deep learning on raw spirograms identifies new genetic loci and improves risk models.

Journal: Nature Genetics
PMID:

Abstract

Chronic obstructive pulmonary disease (COPD), the third leading cause of death worldwide, is highly heritable. Although COPD is clinically defined by applying thresholds to summary measures of lung function, a quantitative liability score has more power to identify genetic signals. Here we train a deep convolutional neural network on noisy self-reported and International Classification of Diseases labels to predict COPD case-control status from high-dimensional raw spirograms, and we use the model's predictions as a liability score. The machine-learning-based (ML-based) liability score accurately discriminates COPD cases from controls and predicts COPD-related hospitalization without any domain-specific knowledge. Moreover, the ML-based liability score is associated with overall survival and exacerbation events. A genome-wide association study on the ML-based liability score replicates existing COPD and lung function loci and identifies 67 new loci. Lastly, our method provides a general framework that combines ML methods with medical-record-based labels, requires no domain knowledge or expert curation, and can improve disease prediction and genomic discovery for drug design.
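
The idea of turning a raw spirogram into a continuous liability score can be illustrated with a minimal sketch. This is not the authors' architecture: it is a single 1D convolutional layer with global average pooling and a sigmoid output, written in NumPy, with untrained random weights and a synthetic volume-time curve standing in for real data. The names `conv1d_relu`, `liability_score`, and `params` are hypothetical and chosen only for illustration.

```python
import numpy as np


def conv1d_relu(x, kernels, bias):
    """Valid 1D cross-correlation followed by ReLU.

    x: (length,) signal; kernels: (n_filters, width); bias: (n_filters,)
    Returns an array of shape (n_filters, length - width + 1).
    """
    windows = np.lib.stride_tricks.sliding_window_view(x, kernels.shape[1])
    return np.maximum(kernels @ windows.T + bias[:, None], 0.0)


def liability_score(spirogram, params):
    """Map one raw spirogram curve to a score in (0, 1)."""
    features = conv1d_relu(spirogram, params["kernels"], params["bias"])
    pooled = features.mean(axis=1)  # global average pooling over time
    logit = pooled @ params["w_out"] + params["b_out"]
    return 1.0 / (1.0 + np.exp(-logit))  # sigmoid gives a continuous score


# Assumption: random, untrained weights; a real model would be fit to labels.
rng = np.random.default_rng(0)
params = {
    "kernels": rng.normal(size=(4, 9)) * 0.1,
    "bias": np.zeros(4),
    "w_out": rng.normal(size=4),
    "b_out": 0.0,
}

# Assumption: a synthetic exponential volume-time curve, not real spirometry.
t = np.linspace(0.0, 6.0, 600)
spirogram = 4.0 * (1.0 - np.exp(-t / 0.5))

score = liability_score(spirogram, params)
print(f"liability score: {score:.3f}")
```

In a trained model of this kind, the sigmoid output is used directly as the quantitative phenotype rather than being thresholded into case or control, which is what lets it serve as the trait in a genome-wide association study.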

Authors

  • Justin Cosentino
    Google Health, Palo Alto, CA 94304, USA.
  • Babak Behsaz
    Google Health, Cambridge, MA 02142, USA.
  • Babak Alipanahi
    Department of Electrical and Computer Engineering, University of Toronto, Toronto, Ontario M5S 3G4, Canada. Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario M5S 3E1, Canada. Program on Genetic Networks and Program on Neural Computation & Adaptive Perception, Canadian Institute for Advanced Research, Toronto, Ontario M5G 1Z8, Canada.
  • Zachary R McCaw
    Google Health, Palo Alto, CA 94304, USA.
  • Davin Hill
    Department of Electrical and Computer Engineering, Northeastern University, Boston, MA, USA.
  • Tae-Hwi Schwantes-An
    Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN, USA.
  • Dongbing Lai
    Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN, USA.
  • Andrew Carroll
    Google Health, Palo Alto, CA 94304, USA.
  • Brian D Hobbs
    Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, MA; Division of Pulmonary and Critical Care Medicine, Brigham and Women's Hospital, Boston, MA.
  • Michael H Cho
    Channing Division of Network Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA; Pulmonary and Critical Care Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA.
  • Cory Y McLean
    Google Brain, Cambridge, Massachusetts 02142, USA.
  • Farhad Hormozdiari
    Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, MA, USA.