Development of a robust corpus for automated evaluation of online health information in Chinese using the DISCERN scale.

Journal: Journal of the American Medical Informatics Association : JAMIA
Published Date:

Abstract

OBJECTIVE: To develop the first comprehensive, standardized annotated corpus of Chinese online health information (OHI) using the full 16-item DISCERN instrument, and to establish a reliable annotation process that supports automated quality assessment.
MATERIALS AND METHODS: We assembled 510 web-sourced articles on breast cancer, arthritis, and depression. All articles were independently annotated by three trained raters using the DISCERN scale. Annotation followed a four-step workflow: data collection and preprocessing, rater training, iterative annotation, and quality control. Raters were calibrated through consensus sessions and calibration articles. The Dawid-Skene model aggregated individual annotations into final consensus scores. The original five-point ratings were retained and also binarized (scores 1-3 as low quality, 4-5 as high quality) to support both fine-grained and coarse evaluation in machine learning.
RESULTS: Initial annotation of a 60-article pilot produced low agreement (mean Krippendorff's α ≈ 0.022) owing to subjective variability among raters. Successive calibration exercises improved agreement markedly, culminating in a corpus-wide Krippendorff's α of 0.834. Consensus scores correlated strongly with individual rater scores, supporting the robustness of the annotations. The dual-scale design yielded a relatively balanced distribution of labels across topics, with roughly equal representation of low- and high-quality articles, and preserved the granularity needed for detailed DISCERN analysis.
DISCUSSION: Our iterative calibration approach and consensus modeling effectively addressed the subjective ambiguity inherent in quality assessment. The binary and five-class labeling strategies support flexible downstream applications, allowing automated systems to perform both broad filtering and nuanced quality differentiation. The high inter-rater reliability demonstrates that rigorous training and consensus methods can overcome domain-specific annotation challenges.
CONCLUSION: The resulting Chinese OHI corpus, annotated via a standardized DISCERN framework and refined through iterative calibration, provides a robust benchmark for training and evaluating machine learning models. This resource lays the foundation for scalable, reliable automated quality assessment of OHI in Chinese public health settings.
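
The aggregation and agreement steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy ratings, the smoothing constant, and the iteration count are hypothetical, the Dawid-Skene routine assumes every rater scored every article, and the interval distance used for Krippendorff's α is an assumption, since the abstract does not state which distance metric was applied.

```python
from itertools import permutations

CLASSES = (1, 2, 3, 4, 5)  # DISCERN five-point scale


def binarize(score):
    """Coarse label from the abstract: 1-3 is low quality, 4-5 is high quality."""
    return "high" if score >= 4 else "low"


def krippendorff_alpha_interval(units):
    """Krippendorff's alpha with squared-difference (interval) distance.

    units: one list of ratings per article; units with < 2 ratings are skipped.
    """
    units = [u for u in units if len(u) >= 2]
    values = [v for u in units for v in u]
    n = len(values)
    # Observed disagreement: pairs within each article, weighted by 1/(m_u - 1).
    d_o = sum(
        sum((a - b) ** 2 for a, b in permutations(u, 2)) / (len(u) - 1)
        for u in units
    ) / n
    # Expected disagreement: pairs drawn from all ratings pooled together.
    d_e = sum((a - b) ** 2 for a, b in permutations(values, 2)) / (n * (n - 1))
    if d_e == 0:
        raise ValueError("alpha is undefined when all ratings are identical")
    return 1 - d_o / d_e


def dawid_skene(ratings, classes=CLASSES, n_iter=50, smooth=0.01):
    """Basic Dawid-Skene EM for one item; returns one consensus score per article.

    ratings: one list per article, with raters in the same order in every list.
    smooth and n_iter are illustrative defaults, not values from the paper.
    """
    k_range = range(len(classes))
    idx = {c: i for i, c in enumerate(classes)}
    n_raters = len(ratings[0])
    # Initialise each article's class posterior from its vote proportions.
    post = []
    for row in ratings:
        p = [0.0] * len(classes)
        for r in row:
            p[idx[r]] += 1 / len(row)
        post.append(p)
    for _ in range(n_iter):
        # M-step: class priors and one smoothed confusion matrix per rater.
        prior = [sum(p[k] for p in post) / len(post) for k in k_range]
        conf = [[[smooth] * len(classes) for _ in k_range] for _ in range(n_raters)]
        for p, row in zip(post, ratings):
            for j, r in enumerate(row):
                for k in k_range:
                    conf[j][k][idx[r]] += p[k]
        for j in range(n_raters):
            for k in k_range:
                total = sum(conf[j][k])
                conf[j][k] = [c / total for c in conf[j][k]]
        # E-step: posterior over the true class of each article.
        new_post = []
        for row in ratings:
            p = list(prior)
            for j, r in enumerate(row):
                for k in k_range:
                    p[k] *= conf[j][k][idx[r]]
            z = sum(p)
            new_post.append([x / z for x in p])
        post = new_post
    return [classes[max(k_range, key=p.__getitem__)] for p in post]


# Hypothetical ratings: six articles, three raters each, for one DISCERN item.
toy = [[5, 5, 4], [2, 2, 2], [4, 4, 4], [1, 1, 2], [5, 5, 5], [3, 3, 3]]
consensus = dawid_skene(toy)
print(consensus, [binarize(s) for s in consensus])
print(round(krippendorff_alpha_interval(toy), 3))
```

In a setup like this, the consensus scores would serve as the five-class labels and `binarize` would derive the coarse labels from them, so both label sets stay consistent with a single aggregation step.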

Authors

  • Ting E
    Bloomberg School of Public Health, Johns Hopkins University, MD, 21205, United States.
  • Xingxi Li
    Department of Industrial Engineering, Tsinghua University, Beijing, 100084, China.
  • Jun Liang
    Department of AI and IT, Second Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, Zhejiang, People's Republic of China.
  • Junhao Ma
    College of Electrical Engineering and Automation, Shandong University of Science and Technology, Qingdao 266590, China.
  • Qichuan Fang
    School of Medical Technology and Information Engineering, Zhejiang Chinese Medical University, Hangzhou, Zhejiang Province, 310053, China.
  • Shanli Chen
    School of Public Health, Southwest Medical University, Luzhou, Sichuan Province, 646000, China.
  • Jianbo Lei
    Clinical Research Center, The Affiliated Hospital of Southwest Medical University, Luzhou, Sichuan, People's Republic of China.
  • Christopher G Chute
