A disease-specific language representation model for cerebrovascular disease research.

Journal: Computer methods and programs in biomedicine
Published Date:

Abstract

BACKGROUND: Effectively utilizing disease-relevant text information from unstructured clinical notes for medical research presents many challenges. Models based on BERT (Bidirectional Encoder Representations from Transformers), such as BioBERT and ClinicalBERT, which are pre-trained on biomedical corpora and general clinical information, have shown promising performance in various biomedical language processing tasks.

Authors

  • Ching-Heng Lin
    Institute of Biomedical Informatics, National Yang-Ming University, Taipei, Taiwan.
  • Kai-Cheng Hsu
    Bioinformatics Section, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, United States; Department of Neurology, National Taiwan University Hospital, Taipei, Taiwan.
  • Chih-Kuang Liang
  • Tsong-Hai Lee
    Department of Neurology and Stroke Center, Chang Gung Memorial Hospital, Taoyuan, Taiwan.
  • Chia-Wei Liou
    Department of Neurology, Kaohsiung Chang Gung Memorial Hospital, Kaohsiung, Taiwan and College of Medicine, Chang Gung University, Taoyuan, Taiwan.
  • Jiann-Der Lee
    Department of Neurology, Chiayi Chang Gung Memorial Hospital, Chiayi, Taiwan and College of Medicine, Chang Gung University, Taoyuan, Taiwan.
  • Tsung-I Peng
    Department of Neurology, Keelung Chang Gung Memorial Hospital, Keelung, Taiwan and College of Medicine, Chang Gung University, Taoyuan, Taiwan.
  • Ching-Sen Shih
    Center for Geriatrics and Gerontology, Kaohsiung Veterans General Hospital, Kaohsiung, Taiwan.
  • Yang C Fann
    Bioinformatics Section, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, United States. Electronic address: fann@ninds.nih.gov.