Revisiting the MIMIC-IV Benchmark: Experiments Using Language Models for Electronic Health Records
Journal:
arXiv
Published Date:
Apr 29, 2025
Abstract
The lack of standardized evaluation benchmarks in the medical domain for text
inputs can be a barrier to widely adopting and leveraging the potential of
natural language models for health-related downstream tasks. This paper
revisited an openly available MIMIC-IV benchmark for electronic health records
(EHRs) to address this issue. First, we integrate the MIMIC-IV data within the
Hugging Face datasets library to allow an easy share and use of this
collection. Second, we investigate the application of templates to convert EHR
tabular data to text. Experiments using fine-tuned and zero-shot LLMs on the
mortality of patients task show that fine-tuned text-based models are
competitive against robust tabular classifiers. In contrast, zero-shot LLMs
struggle to leverage EHR representations. This study underlines the potential
of text-based approaches in the medical field and highlights areas for further
improvement.