Recognizing Disjoint Clinical Concepts in Clinical Text Using Machine Learning-based Methods.

Journal: AMIA ... Annual Symposium proceedings. AMIA Symposium
Published Date:

Abstract

Clinical concept recognition (CCR) is a fundamental task in clinical natural language processing (NLP). Almost all current machine learning-based CCR systems recognize only clinical concepts made up of consecutive words (consecutive clinical concepts, CCCs) and cannot handle clinical concepts made up of disjoint words (disjoint clinical concepts, DCCs), which are common in clinical text. In this paper, we proposed two novel types of representations for disjoint clinical concepts and applied two state-of-the-art machine learning methods to recognize both consecutive and disjoint concepts. Experiments on the 2013 ShARe/CLEF challenge corpus showed that our best system achieved a "strict" F-measure of 0.803 for CCCs, a "strict" F-measure of 0.477 for DCCs, and a "strict" F-measure of 0.783 for all clinical concepts, significantly higher than the baseline systems by 4.2% and 4.1%, respectively.
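To make the "strict" F-measure concrete, the following is a minimal sketch of exact-match scoring over consecutive and disjoint concepts. It assumes that "strict" means a predicted concept counts only when every one of its spans matches a gold concept exactly, and it ignores document identifiers and concept types. The names `Span`, `Concept`, `concept`, and `strict_prf` are hypothetical and are not taken from the paper or from the challenge's evaluation tool; the two representations proposed in the paper are not reproduced here.

```python
from typing import FrozenSet, Set, Tuple

# Assumption: a span is a (start, end) character offset pair, end exclusive.
Span = Tuple[int, int]
# A concept is a set of spans: one span for a consecutive concept (CCC),
# two or more spans for a disjoint concept (DCC).
Concept = FrozenSet[Span]


def concept(*spans: Span) -> Concept:
    """Build a concept from one or more spans (hypothetical helper)."""
    return frozenset(spans)


def strict_prf(gold: Set[Concept], predicted: Set[Concept]) -> Tuple[float, float, float]:
    """Return (precision, recall, F-measure) under exact matching.

    A predicted concept is a true positive only if the identical set of
    spans appears in the gold annotations.
    """
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Illustrative offsets only, not taken from the corpus.
gold = {
    concept((0, 4)),              # consecutive concept
    concept((10, 14), (20, 28)),  # disjoint concept with two spans
    concept((30, 35)),            # consecutive concept
}
predicted = {
    concept((0, 4)),
    concept((10, 14)),            # only one part of the disjoint concept
    concept((30, 35)),
}

precision, recall, f_measure = strict_prf(gold, predicted)
```

In this toy case the system that recovers only one part of the disjoint concept receives no credit for it, which illustrates why systems limited to consecutive spans are penalized on DCCs under exact matching.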

Authors

  • Buzhou Tang
  • Qingcai Chen
    Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China.
  • Xiaolong Wang
    Cardiovascular Department, Shuguang Hospital Affiliated to Shanghai University of TCM, Shanghai, China.
  • Yonghui Wu
    Department of Health Outcomes and Biomedical Informatics.
  • Yaoyun Zhang
    Alibaba Damo Academy, 969 West Wen Yi Road, Yu Hang District, Hangzhou, Zhejiang, China.
  • Min Jiang
    Eli Lilly and Company, Indianapolis, IN, United States.
  • Jingqi Wang
    School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA.
  • Hua Xu
    Department of Urology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China.