Machine learning-augmented m6A-Seq analysis without a reference genome.
Journal:
Briefings in Bioinformatics
Published Date:
May 1, 2025
Abstract
Methylated RNA m6A immunoprecipitation sequencing (m6A-Seq) is a powerful technique for investigating transcriptome-wide m6A modification. However, most existing m6A-Seq protocols rely on reference genomes, limiting their use in species lacking sequenced genomes. Here, we introduce mlPEA, a user-friendly, multi-functional platform specifically tailored to the streamlined processing of m6A-Seq data in a reference genome-free manner. mlPEA provides a comprehensive collection of functions required for performing transcriptome-wide m6A identification and analysis, where the reference, a de novo assembled transcriptome, is built solely from m6A-Seq data. By taking advantage of machine learning (ML) algorithms, mlPEA enhances m6A-Seq data analysis by constructing robust computational models for identifying high-quality transcripts and high-confidence m6A-modified regions. These functions and ML models have been integrated into a web-based Galaxy framework. This ensures that mlPEA has powerful data interaction and visualization capabilities, with flexibility, traceability, and reproducibility throughout the analytical process. mlPEA also has high compatibility and portability as it employs advanced packaging technology, dramatically simplifying its large-scale application in various species. Validated through case studies of Arabidopsis, maize, and wheat, mlPEA has demonstrated its utility and robustness in reference genome-free m6A-Seq data analysis for plants of various genomic complexities. mlPEA is freely available via GitHub: https://github.com/cma2015/mlPEA.