An ensemble learning framework for potential miRNA-disease association prediction with positive-unlabeled data.

Journal: Computational biology and chemistry
Published Date:

Abstract

To explore the pathogenic mechanisms of microRNAs (miRNAs) in diverse diseases, many researchers have concentrated on discovering potential miRNA-disease associations using machine learning methods. However, the prediction accuracy of supervised machine learning methods is limited by the lack of experimentally validated non-associated miRNA-disease pairs. Without these negative samples, training a highly accurate model is much more difficult. Unlike traditional miRNA-disease prediction models, which use randomly selected unknown samples as negative training samples, we propose an ensemble learning framework to solve this positive-unlabeled (PU) learning problem. The framework consists of two steps: a novel semi-supervised K-means method (SS-Kmeans) that extracts reliable negative samples from unknown miRNA-disease pairs, and a subagging method that generates diverse training sets so that those reliable negative samples are fully exploited for ensemble learning. With a random vector functional link (RVFL) network as the prediction model, the proposed framework achieved higher prediction accuracy than other popular approaches. A case study on lung and gastric neoplasms further confirms the framework's efficacy at identifying miRNA-disease associations.
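The two-step pipeline described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not specify how SS-Kmeans works, so the hypothetical `seeded_kmeans_negatives` runs a two-cluster K-means in which the known positives are pinned to one cluster, then keeps the unlabeled pairs that sit most clearly in the other cluster. It also assumes each miRNA-disease pair is already encoded as a numeric feature vector. The hypothetical `subagging_scores` draws balanced training sets by subsampling the reliable negatives without replacement, fits one basic RVFL network (random tanh hidden layer, direct input-to-output links, ridge-regression output weights) per set, and averages the scores. All names and hyperparameters (`n_hidden`, `lam`, `n_models`) are illustrative.

```python
import numpy as np


def seeded_kmeans_negatives(X_pos, X_unl, n_neg, n_iter=20):
    """Hypothetical stand-in for SS-Kmeans: return n_neg unlabeled rows least like the positives."""
    # Cluster 0 is seeded by the positive mean; cluster 1 starts at the
    # unlabeled row farthest from it.
    c_pos = X_pos.mean(axis=0)
    c_neg = X_unl[np.argmax(np.linalg.norm(X_unl - c_pos, axis=1))].copy()
    for _ in range(n_iter):
        to_neg = np.linalg.norm(X_unl - c_neg, axis=1) < np.linalg.norm(X_unl - c_pos, axis=1)
        # Known positives always stay in cluster 0; only unlabeled rows move.
        c_pos = np.vstack([X_pos, X_unl[~to_neg]]).mean(axis=0)
        if to_neg.any():
            c_neg = X_unl[to_neg].mean(axis=0)
    # Rank unlabeled rows by how much closer they are to the negative centroid.
    margin = np.linalg.norm(X_unl - c_pos, axis=1) - np.linalg.norm(X_unl - c_neg, axis=1)
    return X_unl[np.argsort(-margin)[:n_neg]]


class RVFL:
    """Basic RVFL network: random hidden layer + direct links, ridge-fitted output weights."""

    def __init__(self, n_hidden=50, lam=1e-2, rng=None):
        self.n_hidden, self.lam = n_hidden, lam
        self.rng = rng if rng is not None else np.random.default_rng()

    def _features(self, X):
        H = np.tanh(X @ self.W + self.b)  # random, never-trained hidden layer
        return np.hstack([X, H, np.ones((X.shape[0], 1))])  # direct links + bias column

    def fit(self, X, y):
        self.W = self.rng.standard_normal((X.shape[1], self.n_hidden))
        self.b = self.rng.standard_normal(self.n_hidden)
        D = self._features(X)
        # Only the output weights are learned, in closed form (ridge regression).
        A = D.T @ D + self.lam * np.eye(D.shape[1])
        self.beta = np.linalg.solve(A, D.T @ y)
        return self

    def predict(self, X):
        return self._features(X) @ self.beta


def subagging_scores(X_pos, X_neg, X_test, n_models=10, seed=0):
    """Average RVFL scores over balanced training sets built by subsampling negatives."""
    rng = np.random.default_rng(seed)
    k = min(len(X_pos), len(X_neg))
    scores = np.zeros(len(X_test))
    for _ in range(n_models):
        idx = rng.choice(len(X_neg), size=k, replace=False)  # without replacement: subagging
        X = np.vstack([X_pos, X_neg[idx]])
        y = np.concatenate([np.ones(len(X_pos)), np.zeros(k)])
        scores += RVFL(rng=rng).fit(X, y).predict(X_test)
    return scores / n_models


if __name__ == "__main__":
    # Synthetic PU data: the unlabeled set hides 20 positives among 200 negatives.
    rng = np.random.default_rng(42)
    X_pos = rng.normal(2.0, 0.5, size=(40, 4))
    X_unl = np.vstack([rng.normal(2.0, 0.5, size=(20, 4)), rng.normal(-2.0, 0.5, size=(200, 4))])
    X_neg = seeded_kmeans_negatives(X_pos, X_unl, n_neg=120)
    print(subagging_scores(X_pos, X_neg, np.array([[2.0] * 4, [-2.0] * 4])))
```

Subsampling without replacement, rather than bootstrapping, is what distinguishes subagging from bagging. Keeping every positive in each round while rotating through the reliable negatives is one plausible way to balance the classes; it is an assumption of this sketch, not a detail given in the abstract.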

Authors

  • Yao Wu
  • Donghua Zhu
    School of Management and Economics, Beijing Institute of Technology, Beijing 100081, China.
  • Xuefeng Wang
    Department of Advanced Manufacturing and Robotics, College of Engineering, Peking University, Beijing 100871, China.
  • Shuo Zhang
    Ph.D. Program in Computer Science, The City University of New York, New York, NY, United States.