Extracting comprehensive clinical information for breast cancer using deep learning methods.

Journal: International journal of medical informatics
Published Date:

Abstract

OBJECTIVE: Breast cancer is the most common malignant tumor among women. Information on the diagnosis and treatment of breast cancer patients is abundant and spread across multiple types of clinical fields, including clinicopathological data, genotype and phenotype information, treatment information, and prognosis information. However, current studies focus mainly on extracting information from one specific type of clinical field. This study defines a comprehensive information model to represent the whole-course clinical information of patients. Furthermore, deep learning approaches are used to extract concepts and their attributes from clinical breast cancer documents by fine-tuning pretrained Bidirectional Encoder Representations from Transformers (BERT) language models.
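
Concept extraction with a fine-tuned BERT model is commonly framed as token-level tagging, where each token receives a B-/I-/O label and consecutive labels are then grouped into concept spans. The sketch below illustrates only that grouping step. It is not the authors' implementation: the label names (`Diagnosis`, `Biomarker`, `BiomarkerResult`), the example sentence, and the function `decode_bio` are hypothetical, and the tags are hard-coded here where a fine-tuned token classifier would normally supply them.

```python
from typing import List, Tuple


def decode_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (label, text) concept spans."""
    spans: List[Tuple[str, str]] = []
    current_label = None
    current_tokens: List[str] = []

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new concept begins; close any span that is still open.
            if current_label is not None:
                spans.append((current_label, " ".join(current_tokens)))
            current_label = tag[2:]
            current_tokens = [token]
        elif tag.startswith("I-") and current_label == tag[2:]:
            # Continuation of the currently open concept.
            current_tokens.append(token)
        else:
            # "O" tag or an inconsistent "I-" tag: close the open span
            # and skip this token.
            if current_label is not None:
                spans.append((current_label, " ".join(current_tokens)))
            current_label = None
            current_tokens = []

    # Close a span that runs to the end of the sentence.
    if current_label is not None:
        spans.append((current_label, " ".join(current_tokens)))
    return spans


# Hypothetical example; in practice the tags would be model predictions.
example_tokens = ["Invasive", "ductal", "carcinoma", ",", "ER", "positive"]
example_tags = [
    "B-Diagnosis", "I-Diagnosis", "I-Diagnosis", "O",
    "B-Biomarker", "B-BiomarkerResult",
]
example_spans = decode_bio(example_tokens, example_tags)
```

In this framing, attributes such as a biomarker result are tagged as their own spans and would be linked to their parent concept in a later step, which the sketch does not cover.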

Authors

  • Xiaohui Zhang
    Department of Orthopaedic Surgery, the Second Hospital & Clinical Medical School, Lanzhou University, Lanzhou, Gansu Province, China.
  • Yaoyun Zhang
    Alibaba Damo Academy, 969 West Wen Yi Road, Yu Hang District, Hangzhou, Zhejiang, China.
  • Qin Zhang
    Department of Burn, Ruijin Hospital, Shanghai Jiaotong University School of Medicine, Shanghai, China.
  • Yuankai Ren
    Digital China Health Technologies Co. Ltd., Beijing, China.
  • Tinglin Qiu
    National Cancer Center/Cancer Hospital, Peking Union Medical College & Chinese Academy of Medical Sciences, Beijing, China.
  • Jianhui Ma
    National Cancer Center/Cancer Hospital, Peking Union Medical College & Chinese Academy of Medical Sciences, Beijing, China. Electronic address: majianhui@csco.org.cn.
  • Qiang Sun
    Research Center for Agricultural and Sideline Products Processing, Henan Academy of Agricultural Sciences, 116 Park Road, Zhengzhou 450002, PR China.