An Ensemble Method for Spelling Correction in Consumer Health Questions.

Journal: AMIA ... Annual Symposium proceedings. AMIA Symposium

Abstract

Orthographic and grammatical errors are a common feature of informal texts written by lay people. Health-related questions asked by consumers are a case in point. Automatic interpretation of consumer health questions is hampered by such errors. In this paper, we propose a method that combines techniques based on edit distance and frequency counts with a contextual similarity-based method for detecting and correcting orthographic errors, including misspellings, word breaks, and punctuation errors. We evaluate our method on a set of spell-corrected questions extracted from the NLM collection of consumer health questions. Our method achieves an F1 score of 0.61, compared to an informed baseline of 0.29 achieved by ESpell, a spelling correction system developed for biomedical queries. Our results show that orthographic similarity is the most informative signal for spelling error correction in consumer health questions, and that frequency and contextual information are complementary to orthographic features.
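The abstract describes combining edit distance, frequency counts, and contextual similarity to correct misspellings. The Python sketch below illustrates one way such signals could be combined to rank correction candidates for a single misspelled token. It is not the authors' implementation: the toy lexicon, the co-occurrence table standing in for contextual similarity, the weights, and all function names are hypothetical, and word breaks and punctuation errors are not handled.

```python
import math

def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insertion, deletion, substitution)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (cost 0 if equal)
            ))
        prev = curr
    return prev[-1]

def orthographic_score(word: str, cand: str) -> float:
    # Edit distance normalized by the longer string, flipped into a similarity in [0, 1].
    return 1.0 - levenshtein(word, cand) / max(len(word), len(cand))

def frequency_score(cand: str, freqs: dict) -> float:
    # Log-scaled corpus frequency, normalized by the most frequent lexicon entry.
    return math.log1p(freqs.get(cand, 0)) / math.log1p(max(freqs.values()))

def context_score(cand: str, context: list, cooccur: dict) -> float:
    # Hypothetical stand-in for contextual similarity: the fraction of context
    # words that co-occur with the candidate in a precomputed table.
    if not context:
        return 0.0
    seen = cooccur.get(cand, set())
    return sum(w in seen for w in context) / len(context)

def correct(word, context, freqs, cooccur, max_dist=2, weights=(0.6, 0.2, 0.2)):
    """Return the best-scoring lexicon entry within max_dist edits of word.

    Words already in the lexicon are returned unchanged, so real-word
    errors are out of scope for this sketch.
    """
    if word in freqs:
        return word
    candidates = [c for c in freqs if levenshtein(word, c) <= max_dist]
    if not candidates:
        return word
    w_orth, w_freq, w_ctx = weights  # hypothetical weights, orthography highest

    def score(c):
        return (w_orth * orthographic_score(word, c)
                + w_freq * frequency_score(c, freqs)
                + w_ctx * context_score(c, context, cooccur))

    return max(candidates, key=score)

# Toy lexicon with invented counts and co-occurrences, for illustration only.
freqs = {"diabetes": 50, "diabetic": 20, "insulin": 30, "medicine": 40}
cooccur = {"diabetes": {"type", "insulin"}, "diabetic": {"diet"}}

print(correct("diabetis", ["type", "2"], freqs, cooccur))  # diabetes
print(correct("diabetis", ["diet"], freqs, cooccur))       # diabetic
```

Both "diabetes" and "diabetic" are one edit away from "diabetis", so orthographic similarity alone cannot separate them; in this toy setup the frequency and context terms break the tie, which is one way to read the abstract's point that these signals complement orthographic features.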

Authors

  • Halil Kilicoglu
    School of Information Sciences, University of Illinois Urbana-Champaign, Champaign, IL 61820, United States.
  • Marcelo Fiszman
    Lister Hill National Center for Biomedical Communications, U.S. National Library of Medicine, Bethesda, MD.
  • Kirk Roberts
    The University of Texas Health Science Center at Houston, USA.
  • Dina Demner-Fushman
    Lister Hill National Center for Biomedical Communications, National Library of Medicine, Bethesda, MD.