Comparison of different feature extraction methods for applicable automated ICD coding.
Journal:
BMC Medical Informatics and Decision Making
Published Date:
Jan 12, 2022
Abstract
BACKGROUND: Automated ICD coding of medical texts via machine learning has been an active research topic. Related studies in the medical field rely heavily on conventional bag-of-words (BoW) as the feature extraction method and rarely use more complex methods, such as word2vec (W2V) and large pretrained models like BERT. This study aimed to identify the most effective feature extraction methods for coding models by comparing BoW, W2V, and BERT variants.
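The following sketch illustrates what BoW feature extraction means in this setting. Each text becomes a vector of token counts over a fixed vocabulary. It assumes plain lower-cased whitespace tokenization and a made-up two-note corpus. The abstract does not specify the preprocessing, the vocabulary, or any weighting (such as TF-IDF) used in the study, and the function names here are hypothetical.

```python
from collections import Counter


def build_vocabulary(documents):
    """Map each distinct lower-cased token to a column index (sorted for determinism)."""
    tokens = {tok for doc in documents for tok in doc.lower().split()}
    return {tok: i for i, tok in enumerate(sorted(tokens))}


def bag_of_words(document, vocabulary):
    """Return a count vector; tokens not in the vocabulary are ignored."""
    counts = Counter(document.lower().split())
    vector = [0] * len(vocabulary)
    for tok, n in counts.items():
        if tok in vocabulary:
            vector[vocabulary[tok]] = n
    return vector


# Hypothetical clinical snippets for illustration only, not data from the study.
notes = [
    "acute chest pain with chest tightness",
    "type 2 diabetes with poor glucose control",
]
vocab = build_vocabulary(notes)
features = [bag_of_words(note, vocab) for note in notes]
print(features)
```

In a coding model, vectors like these would be fed to a classifier that predicts ICD codes. W2V- and BERT-based approaches instead replace these sparse counts with dense learned representations, and the study compares these alternatives.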