Automatically Identifying Topics of Consumer Health Questions in Chinese.

Journal: Studies in health technology and informatics
Published Date:

Abstract

In health question answering (QA) system development, question topic identification is crucial for understanding users' information needs and for facilitating answer extraction. This paper presents a machine-learning method to automatically identify the topics of health-related questions asked in Chinese by the general public. We collected 2,000 questions from a Chinese consumer health website and characterized them using 17 types of features, including lexical, grammatical, statistical, and semantic features. The method was applied to identify six health question topics: Condition Management, Healthy Lifestyle, Diagnosis, Health Provider Choosing, Treatment, and Epidemiology. The average F1-scores for identifying these six topics were 99.63%, 99.13%, 98.55%, 96.35%, 76.02%, and 71.77%, respectively.
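As a reading aid for the per-topic F1-scores reported above, the following is a minimal sketch of how such scores are commonly computed: F1 is the harmonic mean of precision and recall, calculated separately for each topic. It assumes each question carries exactly one topic label, which the abstract does not state. The function name `per_topic_f1` and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def per_topic_f1(gold, predicted):
    """Return {topic: F1} for single-label topic predictions (an assumption)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1          # correct prediction for topic g
        else:
            fp[p] += 1          # topic p was predicted but is wrong
            fn[g] += 1          # topic g was missed
    scores = {}
    for topic in set(gold) | set(predicted):
        predicted_count = tp[topic] + fp[topic]
        actual_count = tp[topic] + fn[topic]
        precision = tp[topic] / predicted_count if predicted_count else 0.0
        recall = tp[topic] / actual_count if actual_count else 0.0
        denom = precision + recall
        scores[topic] = 2 * precision * recall / denom if denom else 0.0
    return scores


# Toy labels invented for illustration only (not data from the paper).
gold = ["Treatment", "Diagnosis", "Treatment", "Epidemiology"]
predicted = ["Treatment", "Diagnosis", "Diagnosis", "Epidemiology"]
scores = per_topic_f1(gold, predicted)
```

In this toy case, one Treatment question is mislabeled as Diagnosis, which lowers recall for Treatment and precision for Diagnosis while leaving Epidemiology unaffected.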

Authors

  • Haihong Guo
    Institute of Medical Information & Library, Chinese Academy of Medical Sciences, Beijing, China.
  • Xu Na
    Institute of Medical Information & Library, Chinese Academy of Medical Sciences, Beijing, China.
  • Jiao Li
    CAS Key Laboratory of Tropical Marine Bio-resources and Ecology, South China Sea Institute of Oceanology, Chinese Academy of Sciences, Guangzhou 510301, China. yinhao@scsio.ac.cn