HeadTailTransfer: An efficient sampling method to improve the performance of graph neural network method in predicting sparse ncRNA-protein interactions.

Journal: Computers in Biology and Medicine

Abstract

Noncoding RNA (ncRNA) is functional RNA derived from DNA transcription, and most transcribed genes are transcribed into ncRNA. ncRNA is not directly involved in protein translation, but it participates in gene expression in cells and affects protein synthesis, and so plays an important role in biological processes such as growth, proliferation, metabolism, and information transmission. Understanding the interactions between ncRNAs and proteins is therefore the basis for studying how ncRNA regulates protein-related biological activities. However, verifying ncRNA-protein interactions through biological experiments is very expensive and time-consuming, so prediction methods based on machine learning have developed rapidly. Recently, graph neural network (GNN) models have stood out for their excellent performance, but a general GNN framework for predicting ncRNA-protein interactions has been lacking. We propose a GNN-based framework for predicting ncRNA-protein interactions that uses topological structure information to complete prediction tasks faster and more accurately. In some smaller datasets, however, many ncRNA nodes lack neighbor information, which lowers prediction accuracy. In some larger datasets, the long-tail distribution degrades predictions for the tail nodes (sparse nodes linked to few neighbors). We therefore propose a new sampling method, HeadTailTransfer, to mitigate these effects. Experimental results illustrate the effectiveness of this method. In particular, for task-specific prediction on the RPI369 dataset with the GraphSAGE-based neural network framework, the AUC and ACC values increased from 56.8% and 52.2% to 80.2% and 71.8%, respectively. Our data and code are available at https://github.com/kkkayle/HeadTailTransfer.
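
To make the notion of tail nodes concrete, the following is a minimal sketch of how ncRNA nodes in an interaction list might be separated into head and tail nodes by their number of protein neighbors. It is not the HeadTailTransfer sampling procedure itself, which the abstract does not detail; the function name `split_head_tail`, the threshold `max_tail_degree`, and the toy edge list are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def split_head_tail(edges, max_tail_degree=2):
    """Split ncRNA nodes into head and tail sets by interaction count.

    edges: iterable of (ncRNA_id, protein_id) pairs.
    max_tail_degree: hypothetical cutoff; nodes with at most this many
    protein neighbors are treated as tail (sparse) nodes.
    """
    # Count how many protein neighbors each ncRNA node has.
    degree = Counter(rna for rna, _ in edges)
    tail = {node for node, d in degree.items() if d <= max_tail_degree}
    head = set(degree) - tail
    return head, tail, degree


# Toy interaction list (illustrative only, not from any RPI dataset).
edges = [
    ("rna_a", "p1"), ("rna_a", "p2"), ("rna_a", "p3"), ("rna_a", "p4"),
    ("rna_b", "p1"),
    ("rna_c", "p2"), ("rna_c", "p5"),
]

head, tail, degree = split_head_tail(edges, max_tail_degree=2)
print(sorted(head))  # ['rna_a']
print(sorted(tail))  # ['rna_b', 'rna_c']
```

In a long-tailed dataset, most nodes would fall into the tail set under any small cutoff, which is the situation the abstract describes as harming prediction for sparse nodes.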

Authors

  • Jinhang Wei
    Wenzhou University of Technology, Wenzhou, 325000, China.
  • Linlin Zhuo
    School of Data Science and Artificial Intelligence, Wenzhou University of Technology, Wenzhou, Zhejiang 325035, China; College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China.
  • Shiyao Pan
    Wenzhou University of Technology, Wenzhou, 325000, China.
  • Xinze Lian
    Wenzhou University of Technology, Wenzhou, 325000, China.
  • Xiaojun Yao
    Centre for Artificial Intelligence Driven Drug Discovery, Faculty of Applied Sciences, Macao Polytechnic University, Macao, 999078, PR China.
  • Xiangzheng Fu