GANSamples-ac4C: Enhancing ac4C site prediction via generative adversarial networks and transfer learning.

Journal: Analytical biochemistry
PMID:

Abstract

N4-acetylcytidine (ac4C) is an RNA modification catalyzed by N-acetyltransferase 10 (NAT10) that plays an essential role in tRNA, rRNA, and mRNA. It influences various cellular functions, including mRNA stability and rRNA biosynthesis. Wet-lab detection of ac4C modification sites is resource-intensive and costly, so various machine learning and deep learning techniques have been employed to detect these sites computationally. However, the known ac4C modification sites are too few to train an accurate and stable prediction model. This study introduces GANSamples-ac4C, a novel framework that combines transfer learning with a generative adversarial network (GAN) to generate synthetic RNA sequences, which are then used to train a better ac4C modification site prediction model. Comparative analysis shows that GANSamples-ac4C outperforms existing state-of-the-art methods in identifying ac4C sites. Moreover, our results underscore the potential of synthetic data to mitigate data scarcity in biological sequence prediction tasks. Another major advantage of GANSamples-ac4C is its interpretable decision logic. Multi-faceted interpretability analyses identify key regions in the ac4C sequences that influence the discrimination between positive and negative samples, a pronounced enrichment of G in these regions, and ac4C-associated motifs. These findings may offer novel insights for ac4C research. The GANSamples-ac4C framework and its source code are publicly accessible at http://www.healthinformaticslab.org/supp/.
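
As a rough illustration of the kind of positional analysis behind the reported G enrichment, the sketch below compares the per-position frequency of one nucleotide between positive and negative sequence sets. It is not the authors' implementation: the function names `positional_frequency` and `enrichment` and the toy sequences are hypothetical, and the abstract does not specify the interpretability methods actually used.

```python
def positional_frequency(seqs, base="G"):
    """Fraction of sequences carrying `base` at each position."""
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    return [sum(s[i] == base for s in seqs) / len(seqs) for i in range(length)]


def enrichment(pos_seqs, neg_seqs, base="G"):
    """Per-position frequency of `base` in positives minus negatives."""
    pos = positional_frequency(pos_seqs, base)
    neg = positional_frequency(neg_seqs, base)
    if len(pos) != len(neg):
        raise ValueError("positive and negative sequences differ in length")
    return [p - n for p, n in zip(pos, neg)]


# Toy data (made up for illustration, not taken from the study).
positives = ["AGGCU", "CGGCA", "UGGCG"]
negatives = ["AUACU", "CGACA", "UAUCG"]

diff = enrichment(positives, negatives)
for i, d in enumerate(diff):
    print(f"position {i}: G enrichment = {d:+.2f}")
```

Positions with a large positive difference are those where G is more common in the positive set, which is the kind of signal a "pronounced enrichment of G" refers to. A real analysis would also need a statistical test and multiple-testing correction.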

Authors

  • Fei Li
    Institute for Precision Medicine, Tsinghua University, Beijing, China.
  • Jiale Zhang
    Institute of Basic Theory for Chinese Medicine, China Academy of Chinese Medical Sciences, Beijing, China.
  • Kewei Li
    Institute of Microbiology, Jilin Provincial Center for Disease Control and Prevention, Changchun, China.
  • Yu Peng
  • Haotian Zhang
  • Yiping Xu
    Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, and College of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China.
  • Yue Yu
    Department of Mathematics, Lehigh University, Bethlehem, PA, USA.
  • Yuteng Zhang
    College of Software, Jilin University, Changchun, Jilin, 130012, China.
  • Zewen Liu
    College of Software, Jilin University, Changchun, Jilin, 130012, China.
  • Ying Wang
    Key Laboratory of Macromolecular Science of Shaanxi Province, School of Chemistry & Chemical Engineering, Shaanxi Normal University, Xi'an, Shaanxi 710062, China.
  • Lan Huang
  • Fengfeng Zhou