Leveraging large language models in patient-reported outcome measure development: practical opportunities, cautions, and a human-in-the-loop roadmap.
Journal: Journal of Patient-Reported Outcomes
Published Date: Jun 2, 2026
Abstract
BACKGROUND: Patient-reported outcome (PRO) measure development involves multiple language-intensive stages, including conceptual domain definition, candidate item generation, qualitative refinement, and cognitive interviewing prior to psychometric evaluation. Large language models (LLMs) may offer new opportunities to support these qualitative development activities, although their role within established PRO development frameworks remains incompletely defined.
AIMS: This commentary proposes a practical human-in-the-loop roadmap for integrating LLMs into the qualitative phases of PRO development while preserving established standards for content validity and psychometric rigor.
APPROACH: Drawing on examples from the development of the Eating Behavior Measurement (EBM) project and on emerging literature from PRO science, survey methodology, and psychological measurement, we outline several bounded use cases for LLMs in qualitative PRO measure development workflows. These include accelerating synthesis of legacy item pools ("domain cartography"), generating candidate items within human-defined constructs, supporting iterative item revision in response to cognitive interview findings, conducting semantic coherence checks for construct alignment, and assisting with developmental or contextual adaptation of candidate items. Across these applications, LLMs function as structured drafting and analytic tools rather than arbiters of validity. We additionally discuss practical risks involving confirmation bias, semantic circularity, transparency, reproducibility, and construct drift, along with strategies for mitigation through human oversight and model triangulation.
CONCLUSIONS: LLMs do not replace qualitative inquiry, expert judgment, or empirical psychometric validation. Rather, they may help support more systematic and scalable qualitative development workflows when used within bounded, human-centered measurement frameworks. The central challenge for PRO science is not whether to adopt these tools, but how to integrate them responsibly without compromising the evidentiary standards on which the field depends.