Machine learning-augmented m6A-Seq analysis without a reference genome.

Journal: Briefings in bioinformatics
Published Date:

Abstract

Methylated RNA m6A immunoprecipitation sequencing (m6A-Seq) is a powerful technique for investigating transcriptome-wide m6A modification. However, most existing m6A-Seq protocols rely on reference genomes, limiting their use in species lacking sequenced genomes. Here, we introduce mlPEA, a user-friendly, multi-functional platform specifically tailored to the streamlined processing of m6A-Seq data in a reference genome-free manner. mlPEA provides a comprehensive collection of functions required for performing transcriptome-wide m6A identification and analysis, where the reference (a de novo assembled transcriptome) is built solely using m6A-Seq data. By taking advantage of machine learning (ML) algorithms, mlPEA enhances m6A-Seq data analysis by constructing robust computational models for identifying high-quality transcripts and high-confidence m6A-modified regions. These functions and ML models have been integrated into a web-based Galaxy framework. This ensures that mlPEA has powerful data interaction and visualization capabilities, with flexibility, traceability, and reproducibility throughout the analytical process. mlPEA also has high compatibility and portability as it employs advanced packaging technology, dramatically simplifying its large-scale application in various species. Validated through case studies of Arabidopsis, maize, and wheat, mlPEA has demonstrated its utility and robustness regarding reference genome-free m6A-Seq data analysis for plants of various genomic complexities. mlPEA is freely available via GitHub: https://github.com/cma2015/mlPEA.

Authors

  • Jing Yang
    Beijing Novartis Pharma Co. Ltd., Beijing, China.
  • Minggui Song
    State Key Laboratory of Crop Stress Resistance and High-Efficiency Production, College of Life Sciences, Northwest A&F University, Shaanxi, Yangling 712100, China.
  • Yifan Bu
    State Key Laboratory of Crop Stress Resistance and High-Efficiency Production, College of Life Sciences, Northwest A&F University, Shaanxi, Yangling 712100, China.
  • Haonan Zhao
    National Key Laboratory of Crop Genetic Improvement, College of Plant Science and Technology, Huazhong Agricultural University, No. 1 Shizishan Road, Hongshan District, Wuhan, Hubei 430070, China.
  • Chenghui Liu
    Key Laboratory of Analytical Chemistry for Life Science of Shaanxi Province, School of Chemistry and Chemical Engineering, Shaanxi Normal University, Xi'an 710062, Shaanxi Province, PR China. Electronic address: liuch@snnu.edu.cn.
  • Ting Zhang
    Beijing Municipal Key Laboratory of Child Development and Nutriomics, Capital Institute of Pediatrics, Beijing 100020, China.
  • Chujun Zhang
    State Key Laboratory of Crop Stress Resistance and High-Efficiency Production, College of Life Sciences, Northwest A&F University, Shaanxi, Yangling 712100, China.
  • Shutu Xu
    Key Laboratory of Biology and Genetics Improvement of Maize in Arid Area of Northwest Region, Ministry of Agriculture, Northwest A&F University, Shaanxi, Yangling 712100, China.
  • Chuang Ma
    State Key Laboratory of Crop Stress Biology for Arid Areas, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, 712100, Shaanxi, China. cma@nwafu.edu.cn.