Tweet Classification Toward Twitter-Based Disease Surveillance: New Data, Methods, and Evaluations.
Journal:
Journal of Medical Internet Research
Published Date:
Feb 20, 2019
Abstract
BACKGROUND: The amount of medical and clinical-related information on the Web is increasing. Among the different types of information available, social media-based data obtained directly from people are particularly valuable and are attracting significant attention. To encourage medical natural language processing (NLP) research exploiting social media data, the 13th NII Testbeds and Community for Information access Research (NTCIR-13) Medical natural language processing for Web document (MedWeb) task provides pseudo-Twitter messages in a cross-language and multi-label corpus, covering 3 languages (Japanese, English, and Chinese) and annotated with 8 symptom labels (such as cold, fever, and flu). Participants then classify each tweet into 1 of 2 categories: those that contain a patient's symptom and those that do not.
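The annotation scheme described above can be pictured with a small sketch. It assumes the 2-way decision (symptom present or absent) is made separately for each symptom label, which is one reading of "multi-label" here. The class name `LabeledTweet`, the helper `to_vector`, the language codes, and the example text are hypothetical illustrations, not part of the corpus or the task's tooling. Only the 3 labels named in the abstract are listed; the remaining 5 are left out rather than guessed.

```python
from dataclasses import dataclass, field

# Only the labels named in the abstract; the corpus defines 8 in total.
SYMPTOMS = ("cold", "fever", "flu")


@dataclass
class LabeledTweet:
    """Hypothetical container for one pseudo-tweet and its annotations."""
    text: str
    lang: str  # assumed codes: "ja", "en", or "zh"
    # Maps symptom name -> True (symptom present) or False (absent).
    labels: dict = field(default_factory=dict)


def to_vector(tweet):
    """Return a 0/1 vector in SYMPTOMS order; missing labels count as absent."""
    return [int(tweet.labels.get(symptom, False)) for symptom in SYMPTOMS]


# Invented example text, not taken from the corpus.
example = LabeledTweet(
    text="I have had a fever since last night.",
    lang="en",
    labels={"cold": False, "fever": True, "flu": False},
)

print(to_vector(example))  # [0, 1, 0]
```

Encoding each tweet as a fixed-order binary vector keeps the per-label decisions independent, so a classifier can be trained and scored one symptom at a time or jointly.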