Developing Methodologies to Find Abbreviated Laboratory Test Names in Narrative Clinical Documents by Generating High Quality Q-Grams.

Journal: Studies in health technology and informatics
Published Date:

Abstract

Laboratory test names are basic information used to diagnose diseases. However, this kind of medical information is usually written in natural language. Lexicon-based methods have been good solutions for finding it, but they cannot find abbreviated expressions that are missing from the lexicon, such as "neuts" for "neutrophils". Similar-word matching can address this issue, but it tends to produce a significant number of false positives, and its processing time grows with the size of the term set. Therefore, we propose a novel q-gram based algorithm, named modified triangular area filtering, to find abbreviated laboratory test terms in clinical documents while minimizing the risk of impairing the lexicon's precision. In addition, the method finds these terms within a reasonable processing time. The results show that the method achieves 92.54% precision, 87.72% recall, and a 90.06% F1-score on the test sets at an edit distance threshold of τ = 3.
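
For readers unfamiliar with q-gram filtering, the following is a minimal sketch of the general idea of pruning lexicon candidates by shared q-grams before verifying them with edit distance. It is not the modified triangular area filtering described in the abstract, whose details are not given here; it uses the standard q-gram count filter instead. The function names, the padding characters, the choice q = 2, and the sample lexicon are all assumptions made for illustration.

```python
from collections import Counter


def qgrams(term, q=2):
    """Return the multiset of q-grams of a lower-cased, padded term."""
    # Padding with q - 1 marker characters on each side (an assumption of
    # this sketch) gives a string of length n exactly n + q - 1 q-grams.
    padded = "#" * (q - 1) + term.lower() + "$" * (q - 1)
    return Counter(padded[i:i + q] for i in range(len(padded) - q + 1))


def edit_distance(a, b):
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def find_candidates(token, lexicon, tau=3, q=2):
    """Return (term, distance) pairs from a hypothetical lexicon within tau edits."""
    token = token.lower()
    token_grams = qgrams(token, q)
    matches = []
    for term in lexicon:
        t = term.lower()
        # Length filter: the edit distance is at least the length difference.
        if abs(len(t) - len(token)) > tau:
            continue
        # Count filter: each edit destroys at most q padded q-grams, so two
        # strings within tau edits share at least this many q-grams.
        needed = max(len(token), len(t)) + q - 1 - q * tau
        shared = sum((token_grams & qgrams(t, q)).values())
        if shared < needed:
            continue
        # Only the surviving candidates pay for the full distance computation.
        d = edit_distance(token, t)
        if d <= tau:
            matches.append((term, d))
    return sorted(matches, key=lambda m: m[1])


lexicon = ["platelets", "neutrophils", "hemoglobin"]  # illustrative only
print(find_candidates("platelts", lexicon))
print(find_candidates("neuts", lexicon))
```

In this sketch the misspelled token "platelts" is matched to "platelets", while "neuts" is not matched to "neutrophils": the two strings are six edits apart, so plain edit-distance matching at τ = 3 rejects the pair. Heavily truncated abbreviations of this kind therefore need more than this generic filter.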

Authors

  • Kyungmo Kim
    Interdisciplinary program for Bioengineering, Seoul National University, Seoul 03080, South Korea.
  • Jinwook Choi
    Dept. of Biomedical Engineering, College of Medicine, Seoul National University, 103, Daehak-ro, Jongno-gu, Seoul, 03080, Republic of Korea. Electronic address: jinchoi@snu.ac.kr.