FunlncModel: integrating multi-omic features from upstream and downstream regulatory networks into a machine learning framework to identify functional lncRNAs.

Journal: Briefings in bioinformatics
PMID:

Abstract

Accumulating evidence indicates that long noncoding RNAs (lncRNAs) play important roles in molecular and cellular biology. Although many algorithms have been developed to reveal their associations with complex diseases by using downstream targets, upstream (epi)genetic regulatory information has not been sufficiently leveraged to predict the function of lncRNAs in various biological processes. We therefore present FunlncModel, an interpretable machine learning-based computational framework that screens for functional lncRNAs by integrating a large number of (epi)genetic and functional genomic features from their upstream/downstream multi-omic regulatory networks. We used the random forest method to mine nearly 60 features in three categories from >2000 datasets across 11 data types, including transcription factors (TFs), histone modifications, typical enhancers, super-enhancers, methylation sites, and mRNAs. FunlncModel outperformed alternative methods in classification performance in human embryonic stem cells (hESCs), with an area under the receiver operating characteristic curve (AUROC) of 0.95 and an area under the precision-recall curve (AUPRC) of 0.97. It not only inferred most of the known lncRNAs that influence stem cell states, but also discovered novel high-confidence functional lncRNAs. We extensively validated FunlncModel's efficacy on up to 27 cancer-related functional prediction tasks, which involved multiple cancer cell growth processes and cancer hallmarks. We also found that (epi)genetic regulatory features, such as TFs and histone modifications, serve as strong predictors of lncRNA function. Overall, FunlncModel is a strong and stable prediction model for identifying functional lncRNAs in specific cellular contexts. FunlncModel is available as a web server at https://bio.liclab.net/FunlncModel/.
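
For readers less familiar with the two metrics reported above, the following is a minimal sketch of how AUROC and AUPRC can be computed from a classifier's scores. It is not FunlncModel's code: the function names `auroc` and `average_precision` are hypothetical, the labels and scores are toy values invented for illustration, and AUPRC is approximated here by average precision, which is one common convention and not necessarily the one used by the authors.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive outscores a random negative.

    A tie between a positive and a negative counts as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision: mean of precision at the rank of each positive.

    Assumes scores are distinct; tied scores are ordered arbitrarily by the sort.
    """
    n_pos = sum(1 for y in labels if y == 1)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision among the top `rank` candidates
    return total / n_pos


# Toy data (invented): 1 = functional lncRNA, 0 = non-functional;
# scores stand in for a classifier's predicted probabilities.
example_labels = [1, 1, 0, 1, 0, 0]
example_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

example_auroc = auroc(example_labels, example_scores)
example_auprc = average_precision(example_labels, example_scores)
```

AUROC summarizes ranking quality over all positive/negative pairs, whereas average precision weights the top of the ranking more heavily, which is why both are often reported together when the positive class (here, functional lncRNAs) is small.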

Authors

  • Yan-Yu Li
    Department of Radiology, State Key Laboratory of Complex Severe and Rare Diseases, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, No.1, Shuaifuyuan, Dongcheng District, Beijing, 100730, China.
  • Feng-Cui Qian
    The First Affiliated Hospital & National Health Commission Key Laboratory of Birth Defect Research and Prevention, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China.
  • Guo-Rui Zhang
    Institute of Biochemistry and Molecular Biology, Hengyang Medical College, University of South China, Hengyang, Hunan, 421001, China.
  • Xue-Cang Li
    School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing, 163000, China.
  • Li-Wei Zhou
    State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, 100101, China.
  • Zheng-Min Yu
    School of Computer, University of South China, Hengyang, Hunan, 421001, China.
  • Wei Liu
    Department of Radiation Oncology, Mayo Clinic, Scottsdale, AZ, United States.
  • Qiu-Yu Wang
    The First Affiliated Hospital & National Health Commission Key Laboratory of Birth Defect Research and Prevention, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China.
  • Chun-Quan Li
    The First Affiliated Hospital & National Health Commission Key Laboratory of Birth Defect Research and Prevention, Hengyang Medical School, University of South China, Hengyang, Hunan, 421001, China.