Comprehensive interaction modeling with machine learning improves prediction of disease risk in the UK Biobank.

Journal: Nature Communications
Published Date:

Abstract

Understanding how risk factors interact to jointly influence disease risk can provide insights into disease development and improve risk prediction. Here we introduce survivalFM, a machine learning extension to the widely used Cox proportional hazards model that enables scalable estimation of all potential pairwise interaction effects on time-to-event outcomes. The method approximates interaction effects using a low-rank factorization, allowing it to overcome the computational and statistical limitations typically associated with high-dimensional interaction modeling. Applied to the UK Biobank dataset across nine disease examples and diverse clinical and omics risk factors, survivalFM improves prediction performance in terms of discrimination, explained variation, and reclassification in 30.6%, 41.7%, and 94.4% of the scenarios tested, respectively. In a clinical cardiovascular risk prediction scenario using the established QRISK3 model, the method adds predictive value by identifying interactions beyond the age interaction effects currently included. These results demonstrate that comprehensive modeling of interactions can facilitate advanced insights into disease development and improve risk predictions.
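To make the low-rank idea concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the standard factorization-machine parameterization, in which the interaction weight for features i and j is approximated by the inner product of two k-dimensional factor vectors, so d features need d × k interaction parameters instead of d(d-1)/2. The function names, array shapes, and random inputs are all hypothetical, and the abstract does not confirm this exact form.

```python
import numpy as np


def fm_linear_predictor(X, beta, P):
    """Linear predictor with factorized pairwise interactions (illustrative).

    X:    (n_samples, d) feature matrix
    beta: (d,) main-effect coefficients
    P:    (d, k) low-rank factors; the interaction weight for features
          i and j is approximated by the dot product P[i] @ P[j]
    """
    linear = X @ beta
    # Sum over i < j of (P[i] @ P[j]) * x_i * x_j, computed in O(d * k)
    # per sample using the usual factorization-machine identity.
    squared_sum = (X @ P) ** 2          # shape (n_samples, k)
    sum_squared = (X ** 2) @ (P ** 2)   # shape (n_samples, k)
    interactions = 0.5 * (squared_sum - sum_squared).sum(axis=1)
    return linear + interactions


def brute_force_predictor(X, beta, P):
    """Same quantity with an explicit loop over feature pairs, for checking."""
    W = P @ P.T  # full (d, d) matrix of approximated interaction weights
    d = X.shape[1]
    out = X @ beta
    for i in range(d):
        for j in range(i + 1, d):
            out = out + W[i, j] * X[:, i] * X[:, j]
    return out


# Hypothetical toy data: 5 individuals, 8 risk factors, rank-3 factors.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
beta = rng.normal(size=8)
P = rng.normal(size=(8, 3))

eta = fm_linear_predictor(X, beta, P)
# In a Cox-type model the hazard is the baseline hazard times exp(eta).
relative_hazard = np.exp(eta)
```

In this toy setting, 8 features give 28 possible pairwise interactions, which the sketch represents with 24 factor parameters. The saving grows quickly with dimension, since the pair count scales quadratically while the factor count scales linearly.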

Authors

  • Heli Julkunen
    Department of Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Espoo, Finland.
  • Juho Rousu
    Department of Computer Science, Aalto University, 00076, Aalto, Finland. juho.rousu@aalto.fi.