AutoReporter: Development of an artificial intelligence tool for automated assessment of research reporting guideline adherence

Journal: medRxiv

Published Date: Jan 1, 2025

Abstract

This study aimed to develop AutoReporter, a large language model (LLM) system that automates evaluation of adherence to research reporting guidelines. Eight prompt-engineering and retrieval strategies, coupled with reasoning and general-purpose LLMs, were benchmarked on the SPIRIT-CONSORT-TM corpus. The top-performing approach, AutoReporter, was validated on BenchReport, a novel benchmark dataset of expert-rated reporting guideline assessments from 10 systematic reviews. AutoReporter, a zero-shot, no-retrieval prompt coupled with the o3-mini reasoning LLM, demonstrated optimal accuracy (CONSORT: 90.09%; SPIRIT: 92.07%), run time (CONSORT: 617.26 seconds; SPIRIT: 544.51 seconds), and cost (CONSORT: 0.68 USD; SPIRIT: 0.65 USD). On the BenchReport benchmark, AutoReporter achieved a mean accuracy of 91.8% and substantial agreement with expert ratings (Cohen’s κ > 0.6). Structured prompting alone can match or exceed fine-tuned domain models while forgoing manually annotated corpora and computationally intensive training. LLMs can feasibly automate reporting guideline adherence assessments for scalable quality control in scientific research reporting. AutoReporter is publicly accessible at https://autoreporter.streamlit.app.
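
The two agreement metrics reported above, accuracy and Cohen's κ, can be illustrated with a minimal sketch. It is not taken from the AutoReporter code base: the function names, the "yes"/"no" labels, and the toy ratings are hypothetical and chosen only to show how per-item model ratings might be compared against expert ratings.

```python
from collections import Counter


def accuracy(model_ratings, expert_ratings):
    """Fraction of checklist items where the model rating equals the expert rating."""
    if len(model_ratings) != len(expert_ratings) or not model_ratings:
        raise ValueError("rating lists must be non-empty and of equal length")
    matches = sum(m == e for m, e in zip(model_ratings, expert_ratings))
    return matches / len(model_ratings)


def cohen_kappa(model_ratings, expert_ratings):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(model_ratings)
    p_observed = accuracy(model_ratings, expert_ratings)
    model_counts = Counter(model_ratings)
    expert_counts = Counter(expert_ratings)
    labels = set(model_counts) | set(expert_counts)
    # Chance agreement: probability both raters pick the same label independently.
    p_expected = sum(model_counts[label] * expert_counts[label] for label in labels) / (n * n)
    if p_expected == 1.0:
        # Degenerate case: both raters used a single identical label throughout.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical toy ratings (not BenchReport data): "yes" = item reported, "no" = not reported.
model = ["yes", "yes", "no", "yes", "no", "no", "yes", "yes", "no", "yes"]
expert = ["yes", "yes", "no", "yes", "no", "yes", "yes", "yes", "no", "no"]

print(f"accuracy = {accuracy(model, expert):.3f}")
print(f"kappa    = {cohen_kappa(model, expert):.3f}")
```

Because κ discounts agreement expected by chance, it is typically lower than raw accuracy on the same ratings, which is why the two figures are reported side by side.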

Authors

David Chen; Patrick Li; Ealia Khoshkish; Seungmin Lee; Tony Ning; Umair Tahir; Henry CY Wong; Michael SF Lee; Srinivas Raman
