Nanopore full length 16S rRNA gene sequencing increases species resolution in bacterial biomarker discovery.

Journal: Scientific reports
Published Date:

Abstract

Discovery of disease-related bacterial biomarkers could be a useful approach for early prevention or diagnosis of various afflictions, such as colorectal cancer. This typically involves analyzing small regions of the 16S rRNA gene (e.g., V3V4) with short-read technologies like Illumina, which yields genus-level results. However, recent developments in third-generation sequencing, such as the new R10.4.1 chemistry from Oxford Nanopore Technologies (ONT) and its improved basecalling models, are beginning to allow a more complete and accessible species-level analysis through full-length 16S rRNA gene sequencing (spanning regions V1-V9). Thus, the goal of this study was to compare and evaluate both approaches, using colorectal cancer biomarker discovery as a representative case. This was achieved by analyzing feces from 123 subjects and comparing both methods (Illumina-V3V4 with DADA2 and QIIME2 vs. ONT-V1V9 with Emu), multiple Dorado basecalling models (fast, hac and sup) and multiple databases (SILVA vs. Emu's Default database). The basecalling models produced broadly similar taxonomic output, but lower basecalling quality resulted in significantly more observed species and in differing taxonomic identification (p-value<0.05). Database choice with Emu greatly influenced the identified species, with Emu's Default database yielding significantly higher diversity and more identified species than SILVA (p-value<0.05). However, owing to its database structure, the Default database at times overconfidently classified what should be an unknown species as its closest match. Bacterial abundance at the genus level correlated well between Illumina-V3V4 and ONT-V1V9 (R≥0.8). Nanopore sequencing identified more specific bacterial biomarkers for colorectal cancer than those obtained with Illumina, such as Parvimonas micra, Fusobacterium nucleatum, Peptostreptococcus stomatis, Peptostreptococcus anaerobius, Gemella morbillorum, Clostridium perfringens, Bacteroides fragilis and Sutterella wadsworthensis. Prediction of colorectal cancer through manual feature selection and machine learning resulted in an AUC of 0.87 with 14 species or 0.82 with just 4 species (P. micra, F. nucleatum, B. fragilis and Agathobaculum butyriciproducens). Full-length 16S rRNA V1V9 sequencing with Oxford Nanopore and its new R10.4.1 chemistry achieved accurate species-level bacterial identification, facilitating the discovery of more precise disease-related biomarkers and increasing the taxonomic fidelity of future microbiome analyses.
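
The genus-level comparison between platforms reported above (R≥0.8) can be illustrated with a minimal sketch. It assumes that species-level counts are collapsed to genus by taking the first word of each binomial name, that counts are converted to relative abundances, and that a Pearson coefficient is used; the abstract does not state which correlation coefficient or which normalization the authors applied. The function names and the toy read counts are hypothetical and are not taken from the study.

```python
from collections import defaultdict
from math import sqrt


def collapse_to_genus(species_counts):
    """Sum species-level read counts into genus-level counts.

    Assumption: each key is a binomial name such as "Parvimonas micra",
    so the genus is the first whitespace-separated token.
    """
    genus_counts = defaultdict(float)
    for species, count in species_counts.items():
        genus_counts[species.split()[0]] += count
    return dict(genus_counts)


def relative_abundance(counts):
    """Convert raw counts to fractions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {taxon: n / total for taxon, n in counts.items()}


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def genus_correlation(short_read_genus_counts, long_read_species_counts):
    """Correlate genus-level relative abundances from two platforms.

    Genera missing from one platform are treated as zero abundance.
    """
    short_read = relative_abundance(short_read_genus_counts)
    long_read = relative_abundance(collapse_to_genus(long_read_species_counts))
    genera = sorted(set(short_read) | set(long_read))
    xs = [short_read.get(g, 0.0) for g in genera]
    ys = [long_read.get(g, 0.0) for g in genera]
    return pearson_r(xs, ys)


# Toy counts for a single hypothetical sample (not data from the study).
illumina_genus = {"Bacteroides": 60, "Fusobacterium": 30, "Parvimonas": 10}
ont_species = {
    "Bacteroides fragilis": 40,
    "Bacteroides uniformis": 25,
    "Fusobacterium nucleatum": 25,
    "Parvimonas micra": 10,
}
r = genus_correlation(illumina_genus, ont_species)
print(f"genus-level Pearson r = {r:.3f}")
```

Collapsing the long-read species calls to genus before comparing keeps both platforms at the resolution that the short-read V3V4 data can support; a rank-based coefficient such as Spearman could be substituted in `pearson_r` if abundances are heavily skewed.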

Authors

  • Pablo Aja-Macaya
    Grupo de Investigación en Microbiología, Instituto de Investigación Biomédica de A Coruña (INIBIC), Complejo Hospitalario Universitario de A Coruña (CHUAC), Sergas, Universidade de A Coruña (UDC), As Xubias, 15006, A Coruña, Galicia, Spain.
  • Kelly Conde-Pérez
    Grupo de Investigación en Microbiología, Instituto de Investigación Biomédica de A Coruña (INIBIC), Complejo Hospitalario Universitario de A Coruña (CHUAC), Sergas, Universidade de A Coruña (UDC), As Xubias, 15006, A Coruña, Galicia, Spain.
  • Noelia Trigo-Tasende
    Grupo de Investigación en Microbiología, Instituto de Investigación Biomédica de A Coruña (INIBIC), Complejo Hospitalario Universitario de A Coruña (CHUAC), Sergas, Universidade de A Coruña (UDC), As Xubias, 15006, A Coruña, Galicia, Spain.
  • Elena Buetas
    Genomic and Health Department, FISABIO Foundation, Center for Advanced Research in Public Health, Benimaclet, 46020, Valencia, Comunidad Valenciana, Spain.
  • Mohammed Nasser-Ali
    Grupo de Investigación en Microbiología, Instituto de Investigación Biomédica de A Coruña (INIBIC), Complejo Hospitalario Universitario de A Coruña (CHUAC), Sergas, Universidade de A Coruña (UDC), As Xubias, 15006, A Coruña, Galicia, Spain.
  • Paula Nión
    Grupo de Investigación en Microbiología, Instituto de Investigación Biomédica de A Coruña (INIBIC), Complejo Hospitalario Universitario de A Coruña (CHUAC), Sergas, Universidade de A Coruña (UDC), As Xubias, 15006, A Coruña, Galicia, Spain.
  • Soraya Rumbo-Feal
    Grupo de Investigación en Microbiología, Instituto de Investigación Biomédica de A Coruña (INIBIC), Complejo Hospitalario Universitario de A Coruña (CHUAC), Sergas, Universidade de A Coruña (UDC), As Xubias, 15006, A Coruña, Galicia, Spain.
  • Susana Ladra
    Database Laboratory (LBD), CITIC, Universidade da Coruña (UDC), Campus de Elviña, 15071, A Coruña, Galicia, Spain.
  • Germán Bou
    Clinical Microbiology Department, Complexo Hospitalario Universitario A Coruña, Institute of Biomedical Research A Coruña (INIBIC), A Coruña, Spain.
  • Álex Mira
    Genomic and Health Department, FISABIO Foundation, Center for Advanced Research in Public Health, Benimaclet, 46020, Valencia, Comunidad Valenciana, Spain.
  • Juan A Vallejo
    Grupo de Investigación en Microbiología, Instituto de Investigación Biomédica de A Coruña (INIBIC), Complejo Hospitalario Universitario de A Coruña (CHUAC), Sergas, Universidade de A Coruña (UDC), As Xubias, 15006, A Coruña, Galicia, Spain. juan.andres.vallejo.vidal@sergas.es.
  • Margarita Poza
    Grupo de Investigación en Microbiología, Instituto de Investigación Biomédica de A Coruña (INIBIC), Complejo Hospitalario Universitario de A Coruña (CHUAC), Sergas, Universidade de A Coruña (UDC), As Xubias, 15006, A Coruña, Galicia, Spain.