From Spreadsheets and Bespoke Models to Enterprise Data Warehouses: GPT-enabled Clinical Data Ingestion into i2b2.

Journal: medRxiv : the preprint server for health sciences

Abstract

OBJECTIVE: Clinical and phenotypic data available to researchers are often found in spreadsheets or bespoke data models. Bridging these to enterprise data warehouses would enable sophisticated analytics and cohort discovery for users of platforms like NHGRI's Genomic Data Science Analysis, Visualization, and Informatics Lab-space (AnVIL). We combine data mapping methodologies, biomedical ontologies, and large language models (LLMs) to load these data into Informatics for Integrating Biology and the Bedside (i2b2), making them available to AnVIL users.

Authors

  • Taowei David Wang
  • Shawn N Murphy
  • Victor M Castro
  • Jeffrey G Klann
