AutoXAI4Omics: an automated explainable AI tool for omics and tabular data.

Journal: Briefings in Bioinformatics
Published Date:

Abstract

Machine learning (ML) methods offer opportunities for gaining insights into the intricate workings of complex biological systems, and their applications are increasingly prominent in the analysis of omics data to facilitate tasks such as the identification of novel biomarkers and predictive modeling of phenotypes. For scientists and domain experts, user-friendly ML pipelines can be highly valuable, enabling them to run sophisticated, robust, and interpretable models without requiring in-depth expertise in coding or algorithmic optimization. By streamlining model development and training, such pipelines let researchers devote their time and energy to the critical tasks of biological interpretation and validation, thereby maximizing the scientific impact of ML-driven insights. Here, we present a fully automated, open-source explainable AI tool, AutoXAI4Omics, that performs classification and regression tasks on omics and tabular numerical data. AutoXAI4Omics accelerates scientific discovery by automating processes and decisions otherwise made by AI experts, e.g. selection of the best feature set, hyperparameter tuning of different ML algorithms, and selection of the best ML model for a specific task and dataset. Prior to ML analysis, AutoXAI4Omics offers feature filtering options tailored to specific omic data types. Moreover, the explainability analysis provided by the tool gives insight into its predictions, highlighting associations between omic feature values and the targets under investigation, e.g. predicted phenotypes, and thereby facilitating the identification of novel actionable insights. AutoXAI4Omics is available at: https://github.com/IBM/AutoXAI4Omics.

Authors

  • James Strudwick
    IBM Research Europe, The Hartree Centre - Sci-Tech Daresbury, Keckwick Lane, Daresbury, Warrington WA4 4AD, United Kingdom.
  • Laura-Jayne Gardiner
    IBM Research UK, Sci-Tech Daresbury, Warrington, UK. Laura-Jayne.Gardiner@ibm.com.
  • Kate Denning-James
    Earlham Institute, Norwich Research Park, Colney Lane, Norwich NR4 7UZ.
  • Niina Haiminen
    T.J. Watson Research Center, IBM Research, Yorktown Heights, NY, 10598, USA.
  • Ashley Evans
    IBM Research Europe, The Hartree Centre - Sci-Tech Daresbury, Keckwick Lane, Daresbury, Warrington WA4 4AD, United Kingdom.
  • Jennifer Kelly
    IBM Research Europe, The Hartree Centre - Sci-Tech Daresbury, Keckwick Lane, Daresbury, Warrington WA4 4AD, United Kingdom.
  • Matthew Madgwick
    Organisms and Ecosystems, Earlham Institute, Norwich, UK.
  • Filippo Utro
    Computational Genomics, IBM Research, Yorktown Heights, NY, USA.
  • Ed Seabolt
    IBM Research, Almaden, 650 Harry Rd, San Jose, CA 95120, United States.
  • Christopher Gibson
    IBM Research Europe, The Hartree Centre - Sci-Tech Daresbury, Keckwick Lane, Daresbury, Warrington WA4 4AD, United Kingdom.
  • Bharat Bedi
    IBM Research Europe, The Hartree Centre - Sci-Tech Daresbury, Keckwick Lane, Daresbury, Warrington WA4 4AD, United Kingdom.
  • Daniel Clayton
    STFC, The Hartree Centre, Sci-Tech Daresbury, Keckwick Lane, Daresbury, Warrington WA4 4AD, United Kingdom.
  • Ciaron Howell
    STFC, The Hartree Centre, Sci-Tech Daresbury, Keckwick Lane, Daresbury, Warrington WA4 4AD, United Kingdom.
  • Laxmi Parida
    Computational Genomics, IBM Research, Yorktown Heights, NY, USA.
  • Anna Paola Carrieri
    IBM Research UK, Sci-Tech Daresbury, Warrington, UK.