Expediting data extraction using a large language model (LLM) and scoping review protocol: a methodological study within a complex scoping review
Journal:
arXiv
Published Date:
Jul 9, 2025
Abstract
The data extraction stages of reviews are resource-intensive, and researchers
may seek to expediate data extraction using online (large language models) LLMs
and review protocols. Claude 3.5 Sonnet was used to trial two approaches that
used a review protocol to prompt data extraction from 10 evidence sources
included in a case study scoping review. A protocol-based approach was also
used to review extracted data. Limited performance evaluation was undertaken
which found high accuracy for the two extraction approaches (83.3% and 100%)
when extracting simple, well-defined citation details; accuracy was lower (9.6%
and 15.8%) when extracting more complex, subjective data items. Considering all
data items, both approaches had precision >90% but low recall (<25%) and F1
scores (<40%). The context of a complex scoping review, open response types and
methodological approach likely impacted performance due to missed and
misattributed data. LLM feedback considered the baseline extraction accurate
and suggested minor amendments: four of 15 (26.7%) to citation details and 8 of
38 (21.1%) to key findings data items were considered to potentially add value.
However, when repeating the process with a dataset featuring deliberate errors,
only 2 of 39 (5%) errors were detected. Review-protocol-based methods used for
expediency require more robust performance evaluation across a range of LLMs
and review contexts with comparison to conventional prompt engineering
approaches. We recommend researchers evaluate and report LLM performance if
using them similarly to conduct data extraction or review extracted data. LLM
feedback contributed to protocol adaptation and may assist future review
protocol drafting.