Protocol for the automatic extraction of epidemiological information via a pre-trained language model.

Journal: STAR Protocols
Published Date:

Abstract

The lack of systems to automatically extract epidemiological fields from open-access COVID-19 cases restricts the timeliness of formulating prevention measures. Here we present a protocol for using CCIE, a COVID-19 Cases Information Extraction system based on a pre-trained language model. We describe steps for preparing supervised training data and executing Python scripts for named entity recognition and text category classification. We then detail the use of machine evaluation and manual validation to illustrate the effectiveness of CCIE. For complete details on the use and execution of this protocol, please refer to Wang et al.

Authors

  • Zhizheng Wang
    National Library of Medicine, National Institutes of Health, Bethesda, MD, USA.
  • Xiao Fan Liu
    Web Mining Laboratory, Department of Media and Communication, City University of Hong Kong, Kowloon Tong, Hong Kong Special Administrative Region, 999077, China.
  • Zhanwei Du
    WHO Collaborating Centre for Infectious Disease Epidemiology and Control, School of Public Health, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Central and Western District, Hong Kong Special Administrative Region, 999077, China.
  • Lin Wang
    Department of Engineering Mechanics, Tsinghua University, Beijing 100084, China.
  • Ye Wu
    Department of Radiology and BRIC, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA.
  • Petter Holme
    Tokyo Tech World Research Hub Initiative (WRHI), Institute of Innovative Research, Tokyo Institute of Technology, Tokyo 152-8550, Japan.
  • Michael Lachmann
    Santa Fe Institute, Santa Fe, NM 87507, USA.
  • Hongfei Lin
  • Zhuoyue Wang
    College of Computer Science and Technology, Dalian University of Technology, 116023, Dalian, Liaoning, China.
  • Yu Cao
    Department of Neurosurgery, FuShun County Zigong City People's Hospital, Fushun, China.
  • Zoie S Y Wong
    Graduate School of Public Health, St. Luke's International University, Tokyo, 104-0045, Japan. Electronic address: zoiesywong@gmail.com.
  • Xiao-Ke Xu
    Computational Communication Research Center, Beijing Normal University, Zhuhai, Guangdong, 519087, China; School of Journalism and Communication, Beijing Normal University, Beijing, 100875, China. Electronic address: xuxiaoke@foxmail.com.
  • Yuanyuan Sun
    School of Computer Science and Technology, Dalian University of Technology, Dalian, China.