LLM-TA: An LLM-Enhanced Thematic Analysis Pipeline for Transcripts from Parents of Children with Congenital Heart Disease
Journal: arXiv
Published Date: Feb 3, 2025
Abstract
Thematic Analysis (TA) is a fundamental method in healthcare research for
analyzing transcript data, but it is resource-intensive and difficult to scale
for large, complex datasets. This study investigates the potential of large
language models (LLMs) to augment the inductive TA process in high-stakes
healthcare settings. Focusing on interview transcripts from parents of children
with Anomalous Aortic Origin of a Coronary Artery (AAOCA), a rare congenital
heart disease, we propose an LLM-Enhanced Thematic Analysis (LLM-TA) pipeline.
Our pipeline integrates an affordable state-of-the-art LLM (GPT-4o mini),
LangChain, and prompt engineering with chunking techniques to analyze nine
detailed transcripts following the inductive TA framework. We evaluate the
LLM-generated themes against human-generated results using thematic similarity
metrics, LLM-assisted assessments, and expert reviews. Results demonstrate that
our pipeline significantly outperforms existing LLM-assisted TA methods. While
the pipeline alone has not yet reached human-level quality in inductive TA, it
shows strong potential to improve scalability, efficiency, and accuracy while
reducing analyst workload when used in collaboration with domain experts. We
provide practical recommendations for incorporating LLMs into high-stakes TA
workflows and emphasize the importance of close collaboration with domain
experts to address challenges related to real-world applicability and dataset
complexity. Repository: https://github.com/jiaweixu98/LLM-TA
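
The abstract mentions chunking transcripts and comparing LLM-generated themes with human-generated ones, but does not spell out either step. The sketch below is only an illustration under stated assumptions, not the authors' implementation: `chunk_transcript` and `jaccard_similarity` are hypothetical helper names, the word-window size and overlap are arbitrary, and token-level Jaccard overlap is a simple stand-in for whatever thematic similarity metrics the paper actually uses. It makes no LLM or LangChain calls.

```python
from typing import List


def chunk_transcript(text: str, chunk_words: int = 200, overlap: int = 20) -> List[str]:
    """Split a transcript into overlapping word windows (assumed sizes)."""
    if chunk_words <= 0 or not 0 <= overlap < chunk_words:
        raise ValueError("need chunk_words > 0 and 0 <= overlap < chunk_words")
    words = text.split()
    step = chunk_words - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_words]))
        # Stop once a window has reached the end of the transcript.
        if start + chunk_words >= len(words):
            break
    return chunks


def jaccard_similarity(theme_a: str, theme_b: str) -> float:
    """Token-set Jaccard overlap between two theme labels (a stand-in metric)."""
    a = set(theme_a.lower().split())
    b = set(theme_b.lower().split())
    if not a and not b:
        return 1.0  # two empty labels are treated as identical
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    demo = " ".join(str(i) for i in range(10))
    for chunk in chunk_transcript(demo, chunk_words=4, overlap=1):
        print(chunk)
    print(jaccard_similarity("fear of surgery", "fear of uncertainty"))
```

In a pipeline of the kind described, each chunk would be passed to the LLM with a coding prompt, and the resulting themes would then be compared with the human-generated themes. Overlapping windows are one common way to avoid cutting a participant's statement in half at a chunk boundary.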