Machine Learning to Advance Human Genome-Wide Association Studies.

Journal: Genes
PMID:

Abstract

Machine learning, including deep learning, reinforcement learning, and generative artificial intelligence, is revolutionising every area of our lives wherever data are made available. These methods allow us to decipher information from larger datasets while addressing the complex nature of biological systems more efficiently. Although machine learning methods were introduced to human genetic epidemiological research as early as 2004, they have never been used to their full capacity. In this review, we outline some of the main applications of machine learning to assigning human genetic loci to health outcomes. We summarise widely used methods and discuss their advantages and challenges. We also identify several tools, such as Combi, GenNet, and GMSTool, specifically designed to integrate these methods for hypothesis-free analysis of genetic variation data. We elaborate on the additional value and limitations of these tools from a geneticist's perspective. Finally, we discuss the fast-moving field of foundation models and large multi-modal omics biobank initiatives.

Authors

  • Rafaella E Sigala
    Section of Statistical Multi-Omics, Department of Clinical and Experimental Medicine, Guildford GU2 7XH, Surrey, UK.
  • Vasiliki Lagou
    Section of Statistical Multi-Omics, Department of Clinical and Experimental Medicine, Guildford GU2 7XH, Surrey, UK.
  • Aleksey Shmeliov
    Section of Statistical Multi-Omics, Department of Clinical and Experimental Medicine, Guildford GU2 7XH, Surrey, UK.
  • Sara Atito
    Surrey Institute for People-Centred Artificial Intelligence, University of Surrey, Guildford GU2 7XH, Surrey, UK.
  • Samaneh Kouchaki
    Surrey Institute for People-Centred Artificial Intelligence, University of Surrey, Guildford GU2 7XH, Surrey, UK.
  • Muhammad Awais
    College of Mechanical and Electrical Engineering, Henan Agricultural University, Zhengzhou, 450002, China.
  • Inga Prokopenko
    Section of Statistical Multi-Omics, Department of Clinical and Experimental Medicine, Guildford GU2 7XH, Surrey, UK.
  • Adam Mahdi
    Oxford Internet Institute, University of Oxford, Oxford OX1 3JS, Oxfordshire, UK.
  • Ayse Demirkan
    Section of Statistical Multi-Omics, Department of Clinical and Experimental Medicine, Guildford GU2 7XH, Surrey, UK.