Automatic Extraction of Medication Mentions from Tweets: Overview of the BioCreative VII Shared Task 3 Competition.

Journal: Database : the journal of biological databases and curation
Published Date:

Abstract

This study presents the outcomes of the BioCreative VII shared task competition (Task 3), which focused on extracting medication names from a Twitter user's publicly available tweets (the user's 'timeline'). Detecting health-related tweets is notoriously challenging for natural language processing tools. The main challenge, aside from the informality of the language used, is that people tweet about any and all topics, and most of their tweets are not related to health. Thus, finding the tweets in a user's timeline that mention specific health-related concepts such as medications requires addressing extreme class imbalance. Task 3 called for detecting the tweets in a user's timeline that mention a medication name and, for each detected mention, extracting its span. The organizers made available a corpus consisting of 182 049 tweets publicly posted by 212 Twitter users, with all medication mentions manually annotated. The corpus exhibits the natural distribution of positive tweets, with only 442 tweets (0.2%) mentioning a medication. This task was an opportunity for participants to evaluate methods that are robust to class imbalance and that go beyond simple lexical matching. A total of 65 teams registered, and 16 teams submitted a system run. This study summarizes the corpus created by the organizers and the approaches taken by the participating teams for this challenge. The corpus is freely available at https://biocreative.bioinformatics.udel.edu/tasks/biocreative-vii/track-3/. The methods and the results of the competing systems are analyzed with a focus on the approaches taken for learning from class-imbalanced data.
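
To make the task format and the degree of imbalance concrete, the following is a minimal sketch of the kind of simple lexical-match baseline the abstract contrasts with the participants' methods. It is not the organizers' or any team's system. The three-entry lexicon, the function names, and the example tweets are all hypothetical and chosen only for illustration; the two counts passed to positive_rate are the corpus figures quoted in the abstract.

```python
import re

# Hypothetical mini-lexicon for illustration only; not a resource from the task.
LEXICON = ["ibuprofen", "tylenol", "metformin"]

# One case-insensitive pattern with word boundaries. Longer names are tried
# first so that a longer entry wins over a shorter entry that is its prefix.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(name) for name in sorted(LEXICON, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def extract_spans(tweet):
    """Return (start, end, surface) for each lexicon match; end is exclusive."""
    return [(m.start(), m.end(), m.group(0)) for m in _PATTERN.finditer(tweet)]


def positive_rate(n_positive, n_total):
    """Fraction of tweets that contain at least one medication mention."""
    return n_positive / n_total


# Invented example timeline: most tweets are unrelated to health.
timeline = [
    "Traffic was awful this morning",
    "Took some Tylenol and went back to bed",
    "New episode tonight!",
]

for tweet in timeline:
    print(tweet, "->", extract_spans(tweet))

# Corpus figures from the abstract: 442 positive tweets out of 182 049.
print(f"positive rate: {positive_rate(442, 182049):.2%}")
```

A matcher of this kind only finds names that are spelled exactly as they appear in the lexicon, so it misses misspellings and colloquial variants, and it also fires on lexicon entries used in a non-medication sense. With roughly one positive tweet in every 412, even a small false-positive rate on the negative tweets would swamp the true mentions.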

Authors

  • Davy Weissenbacher
    Health Language Processing Center, Institute for Biomedical Informatics at the Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA.
  • Karen O'Connor
    Department of Biomedical Informatics, Arizona State University, Scottsdale, AZ, USA.
  • Siddharth Rawal
    DBEI, The Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
  • Yu Zhang
    College of Marine Electrical Engineering, Dalian Maritime University, Dalian, China.
  • Richard Tzong-Han Tsai
    Department of Computer Science and Information Engineering, National Central University, Taiwan.
  • Timothy Miller
    School of Computing and Information Systems, University of Melbourne, Victoria 3010, Australia.
  • Dongfang Xu
  • Carol Anderson
    NVIDIA, Santa Clara, CA, USA.
  • Bo Liu
    Wuhan United Imaging Healthcare Surgical Technology Co., Ltd., Wuhan, China.
  • Qing Han
    Engineering College, Honghe University, Honghe Yunnan, China.
  • Jinfeng Zhang
    Department of Statistics, Florida State University, Tallahassee, FL, 32306, USA. jinfeng@stat.fsu.edu.
  • Igor Kulev
    Data and Analytics Chapter, F. Hoffmann-La Roche Ltd, Switzerland.
  • Berkay Köprü
    Data and Analytics Chapter, F. Hoffmann-La Roche Ltd, Switzerland.
  • Raul Rodriguez-Esteban
    Roche Pharmaceutical Research and Early Development, pRED Informatics, Roche Innovation Center, Basel, Switzerland.
  • Elif Ozkirimli
    Department of Chemical Engineering, Bogazici University, Istanbul, Turkey; Department of Biochemistry, University of Zurich, Winterthurerstrasse 190, CH-8057 Zurich, Switzerland. Electronic address: elif.ozkirimli@boun.edu.tr.
  • Ammer Ayach
    Speech and Language Technology Lab, DFKI, Berlin, Germany.
  • Roland Roller
    German Research Center for AI (DFKI).
  • Stephen Piccolo
    Department of Biology, Brigham Young University, Provo, UT, USA.
  • Peijin Han
    Department of Radiation Oncology and Molecular Radiation Sciences, Johns Hopkins University, Baltimore, Maryland.
  • V G Vinod Vydiswaran
    Department of Learning Health Sciences, University of Michigan Medical School, Ann Arbor, MI, USA.
  • Ramya Tekumalla
    Department of Computer Science, Georgia State University, Atlanta, GA, USA.
  • Juan M Banda
    Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA, USA.
  • Parsa Bagherzadeh
    CLaC Labs, Concordia University, Montreal, Canada.
  • Sabine Bergler
    CLaC Labs, Concordia University, Montreal, Canada.
  • João F Silva
    DETI, Institute of Electronics and Informatics Engineering of Aveiro, University of Aveiro, Portugal.
  • Tiago Almeida
    Department of Electronics, Telecommunications and Informatics (DETI), Institute of Electronics and Informatics Engineering of Aveiro (IEETA), University of Aveiro, Aveiro, Portugal.
  • Paloma Martínez
  • Renzo Rivera-Zavala
    Computer Science and Engineering Department, Universidad Carlos III de Madrid, Madrid, Spain.
  • Chen-Kai Wang
    Big Data Laboratory, Chunghwa Telecom Laboratories, Taoyuan, Taiwan.
  • Hong-Jie Dai
    Department of Computer Science and Information Engineering, National Taitung University, Taiwan. Electronic address: hjdai@nttu.edu.tw.
  • Luis Alberto Robles Hernandez
    Department of Computer Science, Georgia State University, Atlanta, GA, USA.
  • Graciela Gonzalez-Hernandez
    Health Language Processing Center, Institute for Biomedical Informatics at the Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA.