MLCDForest: multi-label classification with deep forest in disease prediction for long non-coding RNAs.

Journal: Briefings in Bioinformatics
Published Date:

Abstract

Long non-coding RNAs (lncRNAs) have been the subject of intensive recent study because of their association with various human diseases. It is desirable to build artificial intelligence-based models that predict diseases or tissues from lncRNA data, which would be useful in disease diagnosis and therapy. The accuracy and robustness of existing machine learning models leave room for improvement. In this study, we propose a deep learning model, Multi-Label Classification with Deep Forest (MLCDForest), to address multi-label classification in tissue prediction for a given lncRNA; it can be regarded as an implementation of the deep forest model for multi-label classification. MLCDForest uses a sequential multi-label-grained scanning method, which distinguishes it from the standard deep forest model: it is trained sequentially over the labels, with label correlation taken into account. A systematic comparison on lncRNA-disease association datasets demonstrates that our method consistently outperforms state-of-the-art methods in disease prediction. By considering label correlation in the sequential multi-label-grained scanning, our model provides a powerful tool for multi-label classification and tissue prediction based on given lncRNAs.
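The idea of training over labels in sequence so that label correlation can be used may be easier to see in a small sketch. The code below is not the MLCDForest implementation: it is a generic chain-style scheme in which the predictor for each label also receives the earlier labels as extra features, and a 1-nearest-neighbour rule stands in for the forest learners. The class name `SequentialMultiLabel`, the helper `nearest_label`, and the toy data are all hypothetical, chosen only for illustration.

```python
# Sketch only: chain-style sequential multi-label training.
# The 1-nearest-neighbour base learner is a stand-in for a forest; it is
# NOT the MLCDForest implementation, and all names here are hypothetical.

def nearest_label(train_rows, train_targets, row):
    """Return the target of the training row closest to `row` (squared Euclidean)."""
    best = min(
        range(len(train_rows)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_rows[i], row)),
    )
    return train_targets[best]


class SequentialMultiLabel:
    """One predictor per label; label j sees labels 0..j-1 as extra features."""

    def fit(self, X, Y):
        self.n_labels = len(Y[0])
        self.stages = []
        for j in range(self.n_labels):
            # Augment the features with the true values of the earlier labels.
            rows = [list(x) + list(y[:j]) for x, y in zip(X, Y)]
            targets = [y[j] for y in Y]
            self.stages.append((rows, targets))
        return self

    def predict(self, X):
        out = []
        for x in X:
            labels = []
            for rows, targets in self.stages:
                # At prediction time the earlier labels are the model's own outputs.
                labels.append(nearest_label(rows, targets, list(x) + labels))
            out.append(labels)
        return out


# Toy data: label 1 is set exactly when label 0 is set (a correlated pair).
X = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.9, 1.1]]
Y = [[0, 0], [0, 0], [1, 1], [1, 1]]

model = SequentialMultiLabel().fit(X, Y)
predictions = model.predict([[0.1, 0.1], [1.0, 1.0]])
```

In this sketch the label order is fixed by column position; how MLCDForest orders labels and how it combines this with multi-grained scanning is not specified in the abstract and is not modelled here.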

Authors

  • Wei Wang
    State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences, University of Macau, Macau 999078, China.
  • QiuYing Dai
  • Fang Li
    Department of General Surgery, Chongqing General Hospital, Chongqing, China.
  • Yi Xiong
    Department of Medical Oncology, Lung Cancer and Gastrointestinal Unit, Hunan Cancer Hospital/Affiliated Cancer Hospital of Xiangya School of Medicine, Changsha 410013, China.
  • Dong-Qing Wei