Application of BERT to Enable Gene Classification Based on Clinical Evidence.

Journal: BioMed Research International
Published Date:

Abstract

The identification of profiled cancer-related genes plays an essential role in cancer diagnosis and treatment. According to the literature, genetic mutations are still classified manually today. Manual classification of genetic mutations is pathologist-dependent, subjective, and time-consuming. With the advent of next-generation sequencing technologies, scientists have proposed computational approaches for the automatic analysis of mutations to improve the accuracy of clinical interpretation. Nevertheless, challenges such as multiclass classification, the complexity of the texts, redundant descriptions, and inconsistent interpretation have limited the development of such algorithms. To overcome these difficulties, we adapted a deep learning method, Bidirectional Encoder Representations from Transformers (BERT), to classify genetic mutations based on text evidence from an annotated database. During training, we addressed three challenging characteristics of the data: the extreme length of the texts, biased data representation, and high repeatability. The BERT+abstract model achieved satisfactory results, with a logarithmic loss of 0.80, a recall of 0.6837, and an F-measure of 0.705. These results show that it is feasible for BERT to classify genomic mutation texts within literature-based datasets. Consequently, BERT is a practical tool that can facilitate and significantly speed up cancer research on tumor progression and diagnosis, as well as the design of more precise and effective treatments.
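The abstract reports a logarithmic loss, a recall, and an F-measure but does not define them or state how they are averaged across mutation classes. The sketch below shows one common way these metrics are computed for a multiclass text classifier. It is not the authors' evaluation code, and it rests on stated assumptions: probabilities are clipped and renormalized before taking logs, recall and F-measure are macro-averaged over classes, and the F-measure is the balanced F1 score. The function names `multiclass_log_loss` and `macro_recall_f1` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def multiclass_log_loss(y_true: Sequence[int],
                        probs: Sequence[Sequence[float]],
                        eps: float = 1e-15) -> float:
    """Mean negative log-probability assigned to each sample's true class.

    Assumption: probabilities are clipped to [eps, 1 - eps] and each row is
    renormalized to sum to 1, a common multiclass log-loss convention.
    """
    total = 0.0
    for label, row in zip(y_true, probs):
        clipped = [min(max(p, eps), 1.0 - eps) for p in row]
        row_sum = sum(clipped)
        total += -math.log(clipped[label] / row_sum)
    return total / len(y_true)


def macro_recall_f1(y_true: Sequence[int],
                    y_pred: Sequence[int],
                    num_classes: int) -> Tuple[float, float]:
    """Macro-averaged recall and F1 (assumed averaging scheme)."""
    recalls: List[float] = []
    f1s: List[float] = []
    for c in range(num_classes):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        recall = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        recalls.append(recall)
        f1s.append(f1)
    return sum(recalls) / num_classes, sum(f1s) / num_classes


# Toy example with 3 hypothetical mutation classes (not data from the paper).
y_true = [0, 1, 2, 1]
probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.2, 0.6],
    [0.5, 0.4, 0.1],
]
# Predicted class = index of the highest probability in each row.
y_pred = [max(range(len(row)), key=row.__getitem__) for row in probs]

loss = multiclass_log_loss(y_true, probs)                  # about 0.5017
recall, f1 = macro_recall_f1(y_true, y_pred, num_classes=3)  # about 0.8333, 0.7778
print(f"log loss={loss:.4f} recall={recall:.4f} F1={f1:.4f}")
```

Log loss scores the predicted probabilities themselves, so a confident wrong prediction is penalized heavily, whereas recall and F-measure only use the arg-max labels. This is why a model can report all three figures and they need not move together.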

Authors

  • Yuhan Su
    National Pilot School of Software, Yunnan University, Kunming, 650091, China.
  • Hongxin Xiang
    National Pilot School of Software, Yunnan University, Kunming, 650091, China.
  • Haotian Xie
    Department of Mathematics, The Ohio State University, Columbus, OH 43210, USA.
  • Yong Yu
    Department of Automation, Xi'an Institute of High-Technology, Xi'an 710025, China, and Institute No. 25, Second Academy of China, Aerospace Science and Industry Corporation, Beijing 100854, China. Email: yuyongep@163.com.
  • Shiyan Dong
    Department of Radiation Oncology, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA.
  • Zhaogang Yang
    Department of Radiation Oncology, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA.
  • Na Zhao
    Department of Gynecology, Peking University First Hospital Ningxia Women and Children's Hospital, Yinchuan, Ningxia, China.