MLM-based typographical error correction of unstructured medical texts for named entity recognition.

Journal: BMC Bioinformatics
Published Date:

Abstract

BACKGROUND: Unstructured text in medical records, such as Electronic Health Records, contains an enormous amount of valuable information for research; however, frequent typographical errors make it difficult to extract and structure that information. Improving the quality of error-laden data for text analysis is therefore an essential task. To date, few studies have addressed this problem. Here, we propose a new methodology for extracting important information from unstructured medical texts by overcoming the typographical problem in surgical pathology records related to lung cancer.

Authors

  • Eun Byul Lee
    Department of Digital Analytics, Yonsei University, 50 Yonsei-ro Seodaemun-gu, 03722, Seoul, Republic of Korea.
  • Go Eun Heo
    Department of Library and Information Science, Yonsei University, 50 Yonsei-ro Seodaemun-gu, Seoul, 03722, Republic of Korea.
  • Chang Min Choi
    Department of Oncology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, South Korea.
  • Min Song
    Library and Information Science, Yonsei University, Seoul, South Korea.