scPlantAnnotate: an accurate and robust transformer-based model for plant cell type annotation.

Journal: Journal of Advanced Research
Published Date:

Abstract

INTRODUCTION: Accurate cell type annotation remains a major bottleneck in plant single-cell RNA sequencing (scRNA-seq), where existing tools are often adapted from animal studies and perform sub-optimally on plant data. The lack of plant-specific computational frameworks limits the construction of plant cell atlases and downstream biological discovery.

OBJECTIVES: We develop and evaluate scPlantAnnotate, a Transformer-based reference annotation framework tailored for plant scRNA-seq data, and benchmark it against state-of-the-art deep learning and conventional methods across multiple plant species.

METHODS: Species-specific scPlantAnnotate models were trained using curated datasets from Arabidopsis thaliana, Zea mays, Oryza sativa, and Glycine max. We compared scPlantAnnotate with leading baselines under both standard random-split evaluation and a more stringent leave-one-dataset-out setting, which tests robustness to completely unseen datasets and tissue types.

RESULTS: scPlantAnnotate consistently outperforms existing approaches across all four species under random-split evaluation. In the leave-one-dataset-out setting for A. thaliana, where performance drops markedly for all methods due to strong batch effects and dataset heterogeneity, scPlantAnnotate nonetheless achieves the highest Accuracy, Macro-F1, Balanced Accuracy, and Macro-AUROC on average and ranks first on most held-out datasets. These results demonstrate improved robustness to dataset shifts, a critical yet underexplored challenge in plant scRNA-seq analysis. A freely accessible web server enables users to annotate their own datasets using pretrained models.

CONCLUSION: scPlantAnnotate provides a plant-specific, Transformer-based framework for single-cell annotation that delivers state-of-the-art performance and enhanced robustness to unseen datasets. By addressing limitations of existing tools and enabling scalable reference-based annotation, scPlantAnnotate supports the development of comprehensive plant cell atlases and facilitates broader use of single-cell genomics in plant biology.
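
The leave-one-dataset-out setting and three of the reported metrics can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the dataset-ID representation, and the toy labels used to exercise it are assumptions made for illustration. The metric definitions follow the common conventions (Balanced Accuracy as mean per-class recall, Macro-F1 as the unweighted mean of per-class F1); Macro-AUROC is omitted because it requires per-class prediction scores rather than hard labels.

```python
from collections import defaultdict


def leave_one_dataset_out(dataset_ids):
    """Yield (held_out, train_idx, test_idx), holding out one dataset per fold.

    dataset_ids is a hypothetical per-cell list naming the source dataset
    of each cell; every distinct ID becomes the test set exactly once.
    """
    for held_out in sorted(set(dataset_ids)):
        train_idx = [i for i, d in enumerate(dataset_ids) if d != held_out]
        test_idx = [i for i, d in enumerate(dataset_ids) if d == held_out]
        yield held_out, train_idx, test_idx


def accuracy(y_true, y_pred):
    """Fraction of cells whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall, computed over the classes present in y_true."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over classes seen in either list."""
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Because Balanced Accuracy and Macro-F1 weight every cell type equally, rare cell types count as much as abundant ones. This is one reason such metrics are commonly reported alongside plain Accuracy when class sizes are uneven.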

Authors

  • Chunyang Lu
    University of Missouri - Columbia, Department of Electrical Engineering and Computer Science, United States.
  • Manish Sridhar Immadi
    Department of Electrical Engineering and Computer Science, University of Missouri-Columbia, Columbia, MO, USA.
  • Yen On Chan
    University of Missouri - Columbia, Christopher S Bond Life Sciences Center, United States; University of Missouri - Columbia, MU Institute for Data Science and Informatics, United States.
  • Sameep Dhakal
    University of Missouri - Columbia, MU Institute for Data Science and Informatics, United States.
  • Dong Xu
    Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, USA.
  • Marc Libault
    University of Missouri - Columbia, Christopher S Bond Life Sciences Center, United States; University of Missouri - Columbia, Department of Biomedical Informatics, Biostatistics and Medical Epidemiology, United States.
  • Trupti Joshi
    Department of Electrical Engineering and Computer Science, Informatics Institute, and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA.
