Cross-protein transfer learning substantially improves disease variant prediction.

Journal: Genome Biology
Published Date:

Abstract

BACKGROUND: Genetic variation in the human genome is a major determinant of individual disease risk, but the vast majority of missense variants have unknown etiological effects. Here, we present a robust learning framework for leveraging saturation mutagenesis experiments to construct accurate computational predictors of proteome-wide missense variant pathogenicity.

Authors

  • Milind Jagota
    Department of Computer Science, Stanford University, Stanford, California, USA.
  • Chengzhong Ye
    Department of Statistics, University of California, Berkeley, 94720, CA, USA.
  • Carlos Albors
    Broad Institute of MIT and Harvard, Cambridge, MA, USA.
  • Ruchir Rastogi
    Computer Science Division, University of California, Berkeley, 94720, CA, USA.
  • Antoine Koehl
    Department of Statistics, University of California, Berkeley, 94720, CA, USA.
  • Nilah Ioannidis
    Computer Science Division, University of California, Berkeley, 94720, CA, USA.
  • Yun S Song
    Computer Science Division, UC Berkeley, Berkeley, California, United States of America.