Improving Representation Learning of Complex Critical Care Data with ICU-BERT
Journal: arXiv
Published Date: Feb 26, 2025
Abstract
The multivariate, asynchronous nature of real-world clinical data, such as
that generated in Intensive Care Units (ICUs), challenges traditional AI-based
decision-support systems. These often assume data regularity and feature
independence and frequently rely on limited data scopes and manual feature
engineering. The potential of generative AI for analyzing clinical data has not
yet been fully exploited. We introduce ICU-BERT, a transformer-based
model pre-trained on the MIMIC-IV database using a multi-task scheme to learn
robust representations of complex ICU data with minimal preprocessing. ICU-BERT
employs a multi-token input strategy, incorporating dense embeddings from a
biomedical Large Language Model to learn a generalizable representation of
complex and multivariate ICU data. In an initial evaluation on five tasks and
four additional ICU datasets, fine-tuned ICU-BERT either matches or surpasses
current performance benchmarks. By integrating structured and unstructured data, ICU-BERT advances
the use of foundational models in medical informatics, offering an adaptable
solution for clinical decision support across diverse applications.
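The abstract does not describe how ICU-BERT's multi-token input is built. The following Python sketch is a hypothetical illustration, not ICU-BERT's actual implementation, of one way irregularly timed ICU events could become transformer input vectors without resampling them to a fixed grid. Assumptions: each event is a (time, feature name, value) triple; `text_embedding` is a hash-based stand-in for a frozen biomedical language-model embedding of the feature name; the value and time encodings, and summing the three components (as BERT sums token, segment, and position embeddings), are illustrative choices. All names (`ICUEvent`, `encode_events`, `DIM`) are hypothetical.

```python
import hashlib
import math
from dataclasses import dataclass

DIM = 8  # hypothetical embedding width; real models use far larger widths


@dataclass
class ICUEvent:
    minutes: float  # hypothetical: time since ICU admission, in minutes
    feature: str    # free-text feature name, e.g. "Heart Rate"
    value: float    # numeric measurement (assumed already normalized in practice)


def text_embedding(name: str) -> list[float]:
    """Hypothetical stand-in for a frozen biomedical LLM embedding of the
    feature name: a deterministic pseudo-embedding derived from SHA-256."""
    digest = hashlib.sha256(name.lower().encode("utf-8")).digest()
    return [(b / 255.0) * 2.0 - 1.0 for b in digest[:DIM]]


def value_embedding(value: float) -> list[float]:
    """Fixed linear projection of a scalar value (stands in for a learned layer)."""
    return [value * (i + 1) / DIM for i in range(DIM)]


def time_embedding(minutes: float) -> list[float]:
    """Sinusoidal encoding of continuous time, as in the original Transformer."""
    out: list[float] = []
    for i in range(DIM // 2):
        freq = 1.0 / (10000 ** (2 * i / DIM))
        out.append(math.sin(minutes * freq))
        out.append(math.cos(minutes * freq))
    return out


def encode_events(events: list[ICUEvent]) -> list[list[float]]:
    """Map asynchronous events to one input vector each, ordered by time,
    by summing the feature-name, value, and time embeddings."""
    tokens: list[list[float]] = []
    for ev in sorted(events, key=lambda e: e.minutes):  # stable sort keeps ties in input order
        parts = (text_embedding(ev.feature), value_embedding(ev.value), time_embedding(ev.minutes))
        tokens.append([sum(xs) for xs in zip(*parts)])
    return tokens


if __name__ == "__main__":
    events = [
        ICUEvent(30.0, "Heart Rate", 92.0),
        ICUEvent(5.0, "Glucose", 140.0),
        ICUEvent(30.0, "Respiratory Rate", 18.0),
    ]
    seq = encode_events(events)
    print(len(seq), len(seq[0]))
```

Because every event carries its own timestamp and feature-name embedding, variables measured at different rates can share one sequence with no imputation step. Concatenating the components, or giving each component its own sequence position, would be equally plausible alternatives to summing.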