Tweet Classification Toward Twitter-Based Disease Surveillance: New Data, Methods, and Evaluations.

Journal: Journal of Medical Internet Research
Published Date:

Abstract

BACKGROUND: The amount of medical and clinical information on the Web is increasing. Among the different types of information available, social media data obtained directly from people are particularly valuable and are attracting significant attention. To encourage medical natural language processing (NLP) research that exploits social media data, the 13th NII Testbeds and Community for Information access Research (NTCIR-13) Medical Natural Language Processing for Web Document (MedWeb) task provides pseudo-Twitter messages in a cross-language, multi-label corpus covering 3 languages (Japanese, English, and Chinese) and annotated with 8 symptom labels (such as cold, fever, and flu). Participants then classify each tweet, for each symptom label, into 1 of 2 categories: those containing a patient's symptom and those that do not.
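Because the corpus is multi-label, a tweet's annotation can be viewed as one binary decision per symptom label. The sketch below illustrates that data shape under stated assumptions: only the 3 labels named above (cold, fever, flu) are included, the `Tweet` record, the `lang` codes, and the keyword rule in `classify` are hypothetical illustrations, and the rule is not the task's baseline or any participant's method.

```python
from dataclasses import dataclass, field

# Only the 3 symptom labels named in the abstract; the other 5 are omitted
# rather than guessed.
LABELS = ("cold", "fever", "flu")


@dataclass
class Tweet:
    """Hypothetical record for one pseudo-tweet in the corpus."""
    text: str
    lang: str  # one of the corpus's 3 languages; codes like "en" are illustrative
    # Gold annotation: True if the tweet reports that symptom, else False.
    labels: dict[str, bool] = field(default_factory=dict)


def to_vector(labels: dict[str, bool], label_order=LABELS) -> list[int]:
    """Encode a multi-label annotation as a 0/1 vector in a fixed label order.

    Labels missing from the dict are treated as negative.
    """
    return [1 if labels.get(name, False) else 0 for name in label_order]


# Hypothetical keyword lists, used only to show the per-label binary interface.
# Naive substring matching cannot tell "it is cold outside" from a symptom,
# which is exactly the distinction the task asks systems to make.
KEYWORDS = {
    "cold": ("cold",),
    "fever": ("fever", "temperature"),
    "flu": ("flu", "influenza"),
}


def classify(text: str, label_order=LABELS) -> dict[str, bool]:
    """Make one independent positive/negative decision per symptom label."""
    lowered = text.lower()
    return {name: any(k in lowered for k in KEYWORDS[name]) for name in label_order}


if __name__ == "__main__":
    tweet = Tweet("I think I caught the flu, running a fever", "en",
                  {"cold": False, "fever": True, "flu": True})
    print(to_vector(classify(tweet.text)), to_vector(tweet.labels))
```

Keeping each label as an independent yes/no decision mirrors the multi-label setup described above: a single tweet can be positive for fever and flu at the same time.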

Authors

  • Shoko Wakamiya
    Nara Institute of Science and Technology (NAIST), Japan.
  • Mizuki Morita
    Okayama University, Okayama, Japan.
  • Yoshinobu Kano
    Faculty of Informatics, Shizuoka University, Hamamatsu, Shizuoka, Japan.
  • Tomoko Ohkuma
    Fuji Xerox Co., Ltd., Yokohama, Japan.
  • Eiji Aramaki
    Nara Institute of Science and Technology (NAIST), Japan.