Chronological age estimation from human microbiomes with transformer-based Robust Principal Component Analysis.

Journal: Communications Biology
Published Date:

Abstract

Deep learning for microbiome analysis has shown potential for understanding microbial communities and human phenotypes. Here, we propose an approach, Transformer-based Robust Principal Component Analysis (TRPCA), which leverages the strengths of transformer architectures and the interpretability of Robust Principal Component Analysis. To investigate the benefits of TRPCA over conventional machine learning models, we benchmarked performance on age prediction from three body sites (skin, oral, gut), using 16S rRNA gene amplicon (16S) and whole-genome sequencing (WGS) data. We demonstrated prediction of age from longitudinal samples and combined classification and regression tasks via multi-task learning (MTL). TRPCA improves age prediction accuracy from human microbiome samples, achieving the largest reductions in Mean Absolute Error (MAE) for WGS skin (MAE: 8.03, 28% reduction) and 16S skin (MAE: 5.09, 14% reduction) samples, compared to conventional approaches. Additionally, TRPCA's MTL approach achieves an accuracy of 89% for birth country prediction across 5 countries, while improving age prediction from WGS stool samples. Notably, residual analysis of paired samples across sequencing methods (16S/WGS) and body sites (oral/gut) uncovers a link between subject and prediction error. These findings highlight TRPCA's utility in improving age prediction while maintaining feature-level interpretability, and in elucidating connections between individuals and their microbiomes.
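
The abstract reports results as Mean Absolute Error (MAE) and as a percentage reduction relative to conventional approaches. As a minimal sketch of how such figures are typically computed, the Python below uses hypothetical helper names (`mean_absolute_error`, `percent_reduction`) and made-up toy ages. It is not the authors' code, and the numbers are not from the study.

```python
# Illustrative only: toy values, not data or code from the paper.

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted ages."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def percent_reduction(baseline_mae, model_mae):
    """Relative MAE reduction of a model versus a baseline, in percent."""
    return 100.0 * (baseline_mae - model_mae) / baseline_mae


# Hypothetical chronological ages (years) and two sets of predictions.
ages = [30.0, 45.0, 60.0]
baseline_pred = [40.0, 35.0, 70.0]  # each prediction is off by 10 years
model_pred = [35.0, 40.0, 65.0]     # each prediction is off by 5 years

baseline_mae = mean_absolute_error(ages, baseline_pred)
model_mae = mean_absolute_error(ages, model_pred)
reduction = percent_reduction(baseline_mae, model_mae)

print(f"baseline MAE = {baseline_mae:.2f} years")
print(f"model MAE    = {model_mae:.2f} years")
print(f"reduction    = {reduction:.0f}%")
```

Under this definition, a reported reduction of 28% would mean the model's MAE is 72% of the baseline's MAE on the same samples. The abstract does not state which baseline each reduction was measured against.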

Authors

  • Tyler Myers
    Center for Microbiome Innovation, Jacobs School of Engineering, University of California San Diego, La Jolla, CA, USA.
  • Se Jin Song
    Center for Microbiome Innovation, Jacobs School of Engineering, University of California San Diego, La Jolla, CA, USA.
  • Yang Chen
    Orthopedics Department of the First Affiliated Hospital of Tsinghua University, Beijing, China.
  • Britta De Pessemier
    Center for Microbial Ecology and Technology, Ghent University, Ghent, Belgium.
  • Lora Khatib
    Department of Pediatrics, University of California San Diego, La Jolla, CA, USA.
  • Daniel McDonald
    Department of Pediatrics, University of California San Diego, La Jolla, CA, USA.
  • Shi Huang
    Faculty of Dentistry, The University of Hong Kong, Hong Kong SAR, China. Electronic address: shihuang@hku.hk.
  • Richard Gallo
    Department of Dermatology, University of California San Diego, La Jolla, CA, USA.
  • Chris Callewaert
    Center for Microbial Ecology and Technology, Ghent University, Ghent, Belgium.
  • Aki S Havulinna
    National Institute for Health and Welfare, Helsinki, Finland.
  • Leo Lahti
    Department of Computing, University of Turku, Turku, Finland.
  • Guus Roeselers
    Danone Research and Innovation, Utrecht, the Netherlands.
  • Manolo Laiola
    Danone Research and Innovation, Utrecht, the Netherlands.
  • Sudarshan A Shetty
    Danone Research and Innovation, Utrecht, the Netherlands.
  • Scott T Kelley
    Bioinformatics and Medical Informatics Program, San Diego State University, San Diego, CA, USA.
  • Rob Knight
    Department of Pediatrics, University of California, San Diego School of Medicine, La Jolla, CA 92093, USA; Center for Microbiome Innovation, Jacobs School of Engineering, University of California, San Diego, La Jolla, CA 92093, USA; Department of Computer Science and Engineering, Jacobs School of Engineering, University of California San Diego, La Jolla, CA 92093, USA.
  • Andrew Bartko
    Center for Microbiome Innovation, Jacobs School of Engineering, University of California San Diego, La Jolla, CA, USA. abartko@ucsd.edu.