An automatic system to identify heart disease risk factors in clinical texts over time.

Journal: Journal of biomedical informatics
Published Date:

Abstract

Despite recent progress in prediction and prevention, heart disease remains a leading cause of death. One preliminary step in heart disease prediction and prevention is risk factor identification. Many studies have sought to identify risk factors associated with heart disease; however, none have attempted to identify all risk factors. In 2014, the National Center of Informatics for Integrating Biology and the Bedside (i2b2) issued a clinical natural language processing (NLP) challenge that included a track (track 2) for identifying heart disease risk factors in clinical texts over time. This track aimed to identify medically relevant information related to heart disease risk and to track its progression over sets of longitudinal patient medical records. Identification of tags and attributes associated with disease presence and progression, risk factors, and medications in the patient's medical history was required. Our participation led to the development of a hybrid pipeline system that combines machine learning-based and rule-based approaches. Evaluation on the challenge corpus showed that our system achieved an F1-score of 92.68%, making it the top-ranked system (without additional annotations) of the 2014 i2b2 clinical NLP challenge.

Authors

  • Qingcai Chen
    Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China.
  • Haodi Li
    Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen 518055, China. Electronic address: haodili.hit@gmail.com.
  • Buzhou Tang
  • Xiaolong Wang
    Cardiovascular Department, Shuguang Hospital Affiliated to Shanghai University of TCM Shanghai, China.
  • Xin Liu
    Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agricultural Sciences, Weifang, Shandong, China.
  • Zengjian Liu
    Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen 518055, China. Electronic address: liuzengjian.hit@gmail.com.
  • Shu Liu
    Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen 518055, China. Electronic address: liushuhit@outlook.com.
  • Weida Wang
    Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen 518055, China. Electronic address: weida.wong@gmail.com.
  • Qiwen Deng
    The Sixth People's Hospital of Shenzhen, Shenzhen 518052, China. Electronic address: qiwendeng@hotmail.com.
  • Suisong Zhu
    The Sixth People's Hospital of Shenzhen, Shenzhen 518052, China. Electronic address: 13809883596@163.com.
  • Yangxin Chen
    Department of Cardiology, Sun Yat-sen Memorial Hospital of Sun Yat-sen University, Guangzhou 510120, China. Electronic address: tjcyx1995@163.com.
  • Jingfeng Wang
    Department of Cardiology, Sun Yat-sen Memorial Hospital of Sun Yat-sen University, Guangzhou 510120, China. Electronic address: dr_wjf@hotmail.com.