Machine learning based disease prediction from genotype data.

Journal: Biological Chemistry
Published Date:

Abstract

Using results from genome-wide association studies to understand complex traits is a current challenge. Here we review how different machine learning (ML) methods can be used to predict phenotype occurrence and severity from genotype data. We discuss common feature encoding schemes and how studies handle the often small number of samples compared to the huge number of variants. We compare which ML methods are being applied, including recent results using deep neural networks. Further, we review the application of methods for feature explanation and interpretation.

Authors

  • Nikoletta Katsaouni
    Institute for Cardiovascular Regeneration, Goethe University, 60590 Frankfurt am Main, Germany.
  • Araek Tashkandi
    Institute of Computer Sciences and Engineering, University of Jeddah, 21959 Jeddah, Saudi Arabia.
  • Lena Wiese
    Research Group Bioinformatics, Fraunhofer Institute for Toxicology and Experimental Medicine, Nikolai-Fuchs-Straße 1, 30625 Hannover, Germany. Electronic address: lena.wiese@item.fraunhofer.de.
  • Marcel H Schulz
    Cluster of Excellence for Multimodal Computing and Interaction, Saarland Informatics Campus, Saarland University, 66123 Saarbrücken, Germany. Electronic address: mschulz@mmci.uni-saarland.de.