Less Context, Same Performance: A RAG Framework for Resource-Efficient LLM-Based Clinical NLP
Journal:
arXiv
Published Date:
May 23, 2025
Abstract
Long-text classification is challenging for Large Language Models (LLMs) due
to token limits and high computational costs. This study explores whether a
Retrieval-Augmented Generation (RAG) approach using only the most relevant text
segments can match the performance of processing entire clinical notes with
large-context LLMs. We begin by splitting clinical documents into smaller
chunks, converting them into vector embeddings, and storing these in a FAISS
index. We then retrieve the top 4,000 words most pertinent to the
classification query and feed these consolidated segments into an LLM. We
evaluated three LLMs (GPT-4o, LLaMA, and Mistral) on a surgical complication
identification task. Metrics such as AUC-ROC, precision, recall, and F1 showed
no statistically significant differences between the RAG-based approach and
whole-text processing (p > 0.05). These findings indicate that RAG can
substantially reduce token usage without sacrificing classification accuracy,
providing a scalable and cost-effective solution for analyzing lengthy clinical
documents.
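
The retrieval step described above (chunk the note, embed the chunks, keep the
most relevant ones up to a word budget) can be sketched roughly as follows.
This is a minimal illustration under stated assumptions, not the authors'
implementation: the bag-of-words `embed` function stands in for a real
embedding model, a brute-force cosine ranking stands in for the FAISS index,
and the function names, the 50-word chunk size, the stop-at-first-overflow
rule, and the reordering of selected chunks into document order are all
hypothetical choices made for the example. Only the 4,000-word budget comes
from the abstract.

```python
import math
import re
from collections import Counter


def chunk_words(text, chunk_size=50):
    """Split text into consecutive chunks of at most chunk_size words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, len(words), chunk_size)]


def embed(text):
    """Toy embedding (assumption): lowercase bag-of-words counts.

    A real pipeline would call an embedding model here.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(note, query, word_budget=4000, chunk_size=50):
    """Return the most query-relevant chunks, capped at word_budget words."""
    chunks = chunk_words(note, chunk_size)
    query_vec = embed(query)
    # Brute-force ranking (assumption): stands in for a FAISS index search.
    ranked = sorted(range(len(chunks)),
                    key=lambda i: cosine(embed(chunks[i]), query_vec),
                    reverse=True)
    selected, used = [], 0
    for i in ranked:
        n_words = len(chunks[i].split())
        if used + n_words > word_budget:
            break  # stop at the first chunk that would exceed the budget
        selected.append(i)
        used += n_words
    # Assumption: present the selected chunks in their original order.
    return "\n".join(chunks[i] for i in sorted(selected))
```

The returned string would then be placed in the LLM prompt in place of the
full note. With a production embedding model and a vector index, only `embed`
and the ranking line would need to change; the budget logic stays the same.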