A computational pipeline for data augmentation towards the improvement of disease classification and risk stratification models: A case study in two clinical domains.

Journal: Computers in Biology and Medicine
Published Date:

Abstract

Virtual population generation is an emerging field in data science with numerous healthcare applications, notably the augmentation of clinical research databases that suffer from small population sizes. However, the impact of data augmentation on the development of artificial intelligence (AI) models that address unmet clinical needs has not yet been investigated. In this work, we assess, for the first time in the literature, whether aggregating real with virtual patient data can improve the performance of existing risk stratification and disease classification models in two rare clinical domains, namely primary Sjögren's Syndrome (pSS) and hypertrophic cardiomyopathy (HCM). To do so, multivariate approaches, such as the multivariate normal distribution (MVND), and straightforward ones, such as Bayesian networks, artificial neural networks (ANNs), and tree ensembles, are compared in terms of their ability to generate high-quality virtual data. Both boosting and bagging algorithms, such as gradient boosting trees (XGBoost), AdaBoost, and Random Forests (RFs), were trained on the augmented data to evaluate the performance improvement for lymphoma classification and HCM risk stratification. Our results revealed the favorable performance of the tree ensemble generators in both domains, yielding virtual data with a goodness-of-fit of 0.021 and a KL-divergence of 0.029 in pSS, and 0.029 and 0.027, respectively, in HCM. Applying XGBoost to the augmented data increased accuracy by 10.9%, sensitivity by 10.7%, and specificity by 11.5% for lymphoma classification, and accuracy by 16.1%, sensitivity by 16.9%, and specificity by 13.7% for HCM risk stratification.
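
To make the augmentation idea concrete, the following is a minimal sketch of one of the generators named in the abstract, the MVND, together with a KL-divergence check between real and virtual data. It is not the authors' implementation. The toy two-feature "real" cohort, the function names (`fit_mvnd`, `sample_virtual`, `kl_divergence`), the histogram-based KL estimate, and the bin count are all assumptions made for illustration. The abstract does not specify how goodness-of-fit or KL-divergence were computed, and the tree ensemble generators it favors are not shown here.

```python
import math
import random

def fit_mvnd(rows):
    """Estimate the mean vector and sample covariance matrix of the real data."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    cov = [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / (n - 1)
            for j in range(d)] for i in range(d)]
    return mean, cov

def cholesky(a):
    """Return lower-triangular L with L * L^T = a (a must be positive definite)."""
    d = len(a)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

def sample_virtual(mean, cov, n_virtual, rng):
    """Draw virtual patients x = mean + L z, with z standard normal."""
    L = cholesky(cov)
    d = len(mean)
    out = []
    for _ in range(n_virtual):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        out.append([mean[i] + sum(L[i][k] * z[k] for k in range(i + 1))
                    for i in range(d)])
    return out

def kl_divergence(real_col, virtual_col, bins=10, eps=1e-9):
    """Histogram-based KL(real || virtual) for one feature (an assumed estimator)."""
    lo = min(min(real_col), min(virtual_col))
    hi = max(max(real_col), max(virtual_col))
    width = (hi - lo) / bins or 1.0

    def hist(col):
        counts = [0] * bins
        for v in col:
            counts[min(int((v - lo) / width), bins - 1)] += 1
        # eps keeps empty bins from producing log(0) or division by zero
        return [c / len(col) + eps for c in counts]

    p, q = hist(real_col), hist(virtual_col)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Toy "real" cohort: 60 patients, two correlated features (synthetic, not study data).
rng = random.Random(0)
real = []
for _ in range(60):
    x = rng.gauss(50.0, 10.0)
    real.append([x, 0.5 * x + rng.gauss(0.0, 5.0)])

mean, cov = fit_mvnd(real)
virtual = sample_virtual(mean, cov, 200, rng)
augmented = real + virtual  # real and virtual patients pooled for model training

for j in range(2):
    kl = kl_divergence([r[j] for r in real], [v[j] for v in virtual])
    print(f"feature {j}: KL(real || virtual) = {kl:.3f}")
```

In a pipeline of the kind the abstract describes, a classifier would then be trained on `augmented` and compared against one trained on `real` alone, using accuracy, sensitivity, and specificity on held-out real patients.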

Authors

  • Vasileios C Pezoulas
    Unit of Medical Technology and Intelligent Information Systems, Department of Materials Science and Engineering, University of Ioannina, GR45110, Ioannina, Greece.
  • Grigoris I Grigoriadis
    Unit of Medical Technology and Intelligent Information Systems, Department of Materials Science and Engineering, University of Ioannina, Ioannina, GR45110, Greece.
  • George Gkois
    Unit of Medical Technology and Intelligent Information Systems, Department of Materials Science and Engineering, University of Ioannina, Ioannina, GR45110, Greece.
  • Nikolaos S Tachos
    Unit of Medical Technology and Intelligent Information Systems, Department of Materials Science and Engineering, University of Ioannina, Ioannina, GR45110, Greece.
  • Tim Smole
    Faculty of Computer and Information Science, University of Ljubljana, Večna Pot 113, 1000, Ljubljana, Slovenia.
  • Zoran Bosnić
    Faculty of Computer and Information Science, University of Ljubljana, Večna pot 113, Ljubljana, Slovenia.
  • Matej Pičulin
    Faculty of Computer and Information Science, University of Ljubljana, Večna Pot 113, 1000, Ljubljana, Slovenia.
  • Iacopo Olivotto
    Division of Cardiology, University of Florence, Florence.
  • Fausto Barlocco
    Department of Experimental and Clinical Medicine, University of Florence and Cardiomyopathies Unit, Azienda Ospedaliera Careggi, Florence, Italy.
  • Marko Robnik-Šikonja
    Faculty of Computer and Information Science, University of Ljubljana, Večna pot 113, 1000 Ljubljana, Slovenia. Electronic address: marko.robnik@fri.uni-lj.si.
  • Djordje G Jakovljevic
    Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, UK; Faculty of Health and Life Sciences, Coventry University, Coventry, UK.
  • Andreas Goules
    Department of Pathophysiology, Faculty of Medicine, National and Kapodistrian University of Athens (NKUA), GR 15772, Athens, Greece.
  • Athanasios G Tzioufas
    Department of Pathophysiology and Joint Rheumatology, Medical School, National and Kapodistrian University of Athens, Greece; Biomedical Research Foundation of the Academy of Athens, Greece; Research Institute for Systemic Autoimmune Diseases, Greece.
  • Dimitrios I Fotiadis
    Biomedical Research Institute, Foundation for Research and Technology Hellas, Greece; Unit of Medical Technology and Intelligent Information Systems, Department of Materials Science and Engineering, University of Ioannina, Greece.