Application of Generative Artificial Intelligence to Utilise Unstructured Clinical Data for Acceleration of Inflammatory Bowel Disease Research

Journal: medRxiv
Published Date:

Abstract

Inflammatory bowel disease (IBD) research is a dynamic field, but the growing volume of electronic health records (EHRs) and research data presents significant challenges. Traditional methods for structuring unstructured medical records are labour-intensive and lack scalability. Large language models (LLMs) may present a solution, yet their usefulness for data standardisation in the context of IBD remains unknown. We aimed to evaluate the use of LLMs in structuring free-text histology and radiology reports from IBD patients, compare their performance to manual clinician curation, and assess the usefulness of fine-tuning and retrieval-augmented generation (RAG). We developed an IBD-specialised LLM-based framework utilising structured prompt engineering and fine-tuning. Reports were manually curated and processed using various LLMs. Model performance was assessed, and RAG was used to enhance model responses with clinical guidelines from the European Crohn’s and Colitis Organisation (ECCO) and the European Society for Paediatric Gastroenterology, Hepatology and Nutrition (ESPGHAN). Overall, Llama 3.3 achieved the highest F1 scores for histology and imaging reports (1 ± 0 and 0.85 ± 0.29, respectively) in extracting findings and anatomical regions, surpassing other models in structured data generation. Fine-tuning improved the performance of the smaller Llama 3.1 8B model for imaging reports (0.7 ± 0.46 vs 0.82 ± 0.35), enabling better extraction with reduced computational requirements. Our findings demonstrate the feasibility of LLM-based automated structuring of IBD-related medical records. Unstructured data from free-text reports can be reliably converted to standardised ontologies with location, severity, and qualifiers. These advancements enable scalable, privacy-compliant AI-driven solutions for data standardisation.
IBD patients generate vast quantities of longitudinal medical data because the disease is chronic. Large language models (LLMs) are well-positioned for data extraction and standardisation purposes. This study demonstrates that Llama 3.3 70B and a fine-tuned smaller model (Llama 3.1 8B) can accurately structure IBD-related histology and radiology reports. Additionally, retrieval-augmented generation (RAG) enhances clinical interpretability by incorporating guideline-based context. The use of LLMs in structuring EHR data can significantly accelerate IBD research, improve data standardisation, and facilitate privacy-compliant AI-driven solutions for clinical decision support and policy development.
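
The abstract reports F1 scores as mean ± standard deviation and describes structured output with location, severity, and qualifiers. The sketch below illustrates one way such a score could be computed by comparing model-extracted records with clinician-curated records for each report. It is not the authors' implementation: the `Finding` record, the exact-match criterion, the use of a population standard deviation, and the example values are all assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass(frozen=True)
class Finding:
    """Hypothetical structured record; field names follow the abstract's
    'location, severity, and qualifiers' but are not taken from the paper."""
    finding: str
    location: str
    severity: str = ""
    qualifier: str = ""


def report_f1(predicted, reference):
    """F1 for one report, counting a prediction as correct only if it
    exactly matches a curated record (an assumed matching rule)."""
    pred, ref = set(predicted), set(reference)
    if not pred and not ref:
        return 1.0  # nothing to extract and nothing extracted
    true_positives = len(pred & ref)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(ref)
    return 2 * precision * recall / (precision + recall)


def summarise(scores):
    """Mean and population standard deviation across reports (assumed)."""
    return mean(scores), pstdev(scores)


# Illustrative, made-up records for two reports.
curated_1 = [Finding("cryptitis", "sigmoid colon", "mild")]
model_1 = [Finding("cryptitis", "sigmoid colon", "mild")]

curated_2 = [Finding("wall thickening", "terminal ileum"),
             Finding("stricture", "terminal ileum")]
model_2 = [Finding("wall thickening", "terminal ileum"),
           Finding("abscess", "terminal ileum")]

scores = [report_f1(model_1, curated_1), report_f1(model_2, curated_2)]
mean_f1, sd_f1 = summarise(scores)
print(f"F1 = {mean_f1:.2f} ± {sd_f1:.2f}")
```

Exact matching on every field is the strictest choice. A study could instead score findings and anatomical regions separately or allow partial matches, which would change the reported values.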

Authors

  • Alex Z Kadhim; Zachary Green; Iman Nazari; Jonathan Baker; Michael George; Ashley Heinson; Matt Stammers; Christopher M Kipps; R Mark Beattie; James J Ashton; Sarah Ennis