Effects of dataset size and interactions on the prediction performance of logistic regression and deep learning models.

Journal: Computer Methods and Programs in Biomedicine

Abstract

BACKGROUND AND OBJECTIVE: Machine learning and deep learning models are powerful tools for predicting the presence of a disease. To make good predictions, these models need a sufficient amount of training data. However, (i) such data are generally limited and difficult to obtain, and (ii) the amount required grows with the complexity of the interactions between the outcome (disease presence) and the model variables. This study compares how training dataset size and interactions affect the performance of logistic regression and deep learning prediction models.

Authors

  • Alexandre Bailly
    Everteam Software, Research and Development Lab, 17 quai Joseph Gillet, Lyon, France; Université de Lyon, Lyon, France; Université Lyon 1, Villeurbanne, France; Service de Biostatistique-Bioinformatique, Hospices Civils de Lyon, Lyon, France; Équipe Biostatistique-Santé, Laboratoire de Biométrie et Biologie Évolutive, CNRS UMR 5558 Villeurbanne, France. Electronic address: a.bailly@everteam.com.
  • Corentin Blanc
    Everteam Software, Research and Development Lab, 17 quai Joseph Gillet, Lyon, France; Université de Lyon, Lyon, France; Université Lyon 1, Villeurbanne, France; Service de Biostatistique-Bioinformatique, Hospices Civils de Lyon, Lyon, France; Équipe Biostatistique-Santé, Laboratoire de Biométrie et Biologie Évolutive, CNRS UMR 5558 Villeurbanne, France.
  • Élie Francis
    Everteam Software, Research and Development Lab, 17 quai Joseph Gillet, Lyon, France.
  • Thierry Guillotin
    Everteam Software, Research and Development Lab, 17 quai Joseph Gillet, Lyon, France.
  • Fadi Jamal
    izyCardio - CardioParc, Lyon, France.
  • Béchara Wakim
    Mediapps Innovation SA, Lyon, France.
  • Pascal Roy
    Université de Lyon, Lyon, France; Université Lyon 1, Villeurbanne, France; Service de Biostatistique-Bioinformatique, Hospices Civils de Lyon, Lyon, France; Équipe Biostatistique-Santé, Laboratoire de Biométrie et Biologie Évolutive, CNRS UMR 5558 Villeurbanne, France.