Machine learning-augmented m6A-Seq analysis without a reference genome.

Journal: Briefings in bioinformatics

Published Date: May 1, 2025

Abstract

Methylated RNA m6A immunoprecipitation sequencing (m6A-Seq) is a powerful technique for investigating transcriptome-wide m6A modification. However, most existing m6A-Seq analysis protocols rely on reference genomes, limiting their use in species lacking sequenced genomes. Here, we introduce mlPEA, a user-friendly, multi-functional platform specifically tailored to the streamlined processing of m6A-Seq data in a reference genome-free manner. mlPEA provides a comprehensive collection of functions required for transcriptome-wide m6A identification and analysis, in which the reference (a de novo assembled transcriptome) is built solely from m6A-Seq data. By taking advantage of machine learning (ML) algorithms, mlPEA enhances m6A-Seq data analysis by constructing robust computational models for identifying high-quality transcripts and high-confidence m6A-modified regions. These functions and ML models have been integrated into a web-based Galaxy framework, which gives mlPEA powerful data interaction and visualization capabilities, with flexibility, traceability, and reproducibility throughout the analytical process. mlPEA also has high compatibility and portability because it employs advanced packaging technology, dramatically simplifying its large-scale application in various species. Validated through case studies of Arabidopsis, maize, and wheat, mlPEA has demonstrated its utility and robustness in reference genome-free m6A-Seq data analysis for plants of various genomic complexities. mlPEA is freely available via GitHub: https://github.com/cma2015/mlPEA.
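The abstract does not say which statistical test mlPEA uses to call m6A-modified regions on the assembled transcripts. As a generic illustration of the IP-versus-input comparison that m6A-Seq region calling typically rests on, the Python sketch below scores fixed windows on one transcript with a one-sided Fisher's exact test, applies Benjamini-Hochberg correction, and keeps windows that also pass a fold-enrichment cutoff. It assumes read counts per window have already been obtained by mapping IP and input reads to the de novo assembly. The function names (`window_pvalue`, `benjamini_hochberg`, `call_enriched_windows`), the thresholds (`fdr=0.05`, `min_fold=2.0`) and the pseudocount of 1 are hypothetical choices for illustration, not mlPEA's API or defaults, and the ML step that mlPEA uses to refine high-confidence regions is not shown.

```python
from math import comb


def window_pvalue(ip_win, input_win, ip_total, input_total):
    """One-sided Fisher's exact test: are IP reads over-represented in this
    window relative to the input (control) library?

    2x2 table:        window       elsewhere
              IP      ip_win       ip_total - ip_win
              input   input_win    input_total - input_win
    """
    n = ip_total + input_total   # all reads in both libraries
    win = ip_win + input_win     # reads that fall in this window
    # P(X >= ip_win) with X ~ Hypergeometric(n, ip_total, win)
    upper = min(ip_total, win)
    tail = sum(comb(ip_total, k) * comb(input_total, win - k)
               for k in range(ip_win, upper + 1))
    return tail / comb(n, win)


def benjamini_hochberg(pvals):
    """Standard BH step-up adjustment; returns q-values in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_enriched_windows(windows, ip_total, input_total, fdr=0.05, min_fold=2.0):
    """Return indices of windows with BH q-value < fdr and library-size-normalised
    IP/input fold enrichment >= min_fold.

    `windows` is a list of (ip_count, input_count) tuples (hypothetical layout).
    """
    pvals = [window_pvalue(ip, inp, ip_total, input_total) for ip, inp in windows]
    qvals = benjamini_hochberg(pvals)
    called = []
    for i, (ip, inp) in enumerate(windows):
        # Pseudocount of 1 avoids division by zero when a window has no input reads.
        fold = ((ip + 1) / ip_total) / ((inp + 1) / input_total)
        if qvals[i] < fdr and fold >= min_fold:
            called.append(i)
    return called


if __name__ == "__main__":
    # Toy counts for three windows on one hypothetical assembled transcript.
    windows = [(30, 2), (5, 5), (0, 10)]
    print(call_enriched_windows(windows, ip_total=100, input_total=100))  # prints [0]
```

Only the first window, where IP reads clearly outnumber input reads after normalising by library size, is called. Combining a significance test with a fold cutoff is a common way to discard windows that are statistically significant only because of very deep coverage; a downstream classifier, as mlPEA describes, could then filter these candidates further.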

Authors

Jing Yang

Beijing Novartis Pharma Co. Ltd., Beijing, China.
Minggui Song

State Key Laboratory of Crop Stress Resistance and High-Efficiency Production, College of Life Sciences, Northwest A&F University, Shaanxi, Yangling 712100, China.
Yifan Bu

State Key Laboratory of Crop Stress Resistance and High-Efficiency Production, College of Life Sciences, Northwest A&F University, Shaanxi, Yangling 712100, China.
Haonan Zhao

National Key Laboratory of Crop Genetic Improvement, College of Plant Science and Technology, Huazhong Agricultural University, No. 1 Shizishan Road, Hongshan District, Wuhan, Hubei 430070, China.
Chenghui Liu

Key Laboratory of Analytical Chemistry for Life Science of Shaanxi Province, School of Chemistry and Chemical Engineering, Shaanxi Normal University, Xi'an 710062, Shaanxi Province, PR China. Electronic address: liuch@snnu.edu.cn.
Ting Zhang

Beijing Municipal Key Laboratory of Child Development and Nutriomics, Capital Institute of Pediatrics, Beijing 100020, China.
Chujun Zhang

State Key Laboratory of Crop Stress Resistance and High-Efficiency Production, College of Life Sciences, Northwest A&F University, Shaanxi, Yangling 712100, China.
Shutu Xu

Key Laboratory of Biology and Genetics Improvement of Maize in Arid Area of Northwest Region, Ministry of Agriculture, Northwest A&F University, Shaanxi, Yangling 712100, China.
Chuang Ma

State Key Laboratory of Crop Stress Biology for Arid Areas, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, 712100, Shaanxi, China. cma@nwafu.edu.cn.

Keywords

Algorithms; Arabidopsis; Machine Learning; Sequence Analysis, RNA; Software; Transcriptome

External Resources

PubMed ID: 40415679
