Extraction of microRNA-target interaction sentences from biomedical literature by deep learning approach.

Journal: Briefings in bioinformatics
Published Date:

Abstract

MicroRNA (miRNA)-target interactions (MTIs) play a substantial role in various cellular activities, molecular regulation and physiological processes. The published biomedical literature is a carrier of high-confidence MTI knowledge. However, extracting this knowledge efficiently from the large volume of published articles remains challenging. To address this issue, we constructed a deep learning-based model. We applied pre-trained language models to biomedical text to obtain sentence representations, and subsequently fed these representations into a deep neural network with gate mechanism layers and a fully connected layer to extract MTI information sentences. The performance of the proposed models was evaluated using two datasets constructed from text data obtained from miRTarBase. The validation and test results showed that combining PubMedBERT and SciBERT for sentence-level encoding with a long short-term memory (LSTM)-based deep neural network yields strong performance, with both F1 and accuracy higher than 80% on the validation and test data. Additionally, the proposed deep learning method outperformed the following machine learning methods: random forest, support vector machine, logistic regression and bidirectional LSTM. This work should facilitate studies on MTI analysis and regulation. It is anticipated that this work can assist in large-scale screening of miRNAs, thereby revealing their functional roles in various diseases, which is important for the development of highly specific drugs with fewer side effects. Source code and corpus are publicly available at https://github.com/qi29.
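
The abstract reports F1 and accuracy for a sentence-level task: each sentence is labelled as carrying MTI information or not. As a minimal sketch of how these two metrics are computed for such a binary classifier, the following uses only the Python standard library. The function name `binary_metrics` and the toy labels are hypothetical illustrations; they are not taken from the authors' code or corpus.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary labels (1 = MTI sentence)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0

    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy example (made-up labels): 1 = sentence describes an MTI, 0 = it does not.
y_true = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Accuracy counts all correct decisions, while F1 is the harmonic mean of precision and recall on the positive (MTI) class, so reporting both guards against a model that scores well simply by favouring the majority class.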

Authors

  • Mengqi Luo
    Department of Psychiatry and Psychiatric Institute, University of Illinois College of Medicine, Chicago, IL 60612, USA.
  • Shangfu Li
    Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, China.
  • Yuxuan Pang
    Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, PR China; School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen, PR China.
  • Lantian Yao
    Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, PR China; School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen, PR China.
  • Renfei Ma
    Warshel Institute for Computational Biology, Chinese University of Hong Kong, Shenzhen; School of Life Sciences, University of Science and Technology of China, Hefei, China.
  • Hsi-Yuan Huang
    School of Medicine and the Warshel Institute of Computational Biology, The Chinese University of Hong Kong, Shenzhen.
  • Hsien-Da Huang
    School of Life and Health Sciences, Chinese University of Hong Kong, Shenzhen, China.
  • Tzong-Yi Lee