Evaluating the performance of a custom GPT in full text screening of a systematic review.

Journal: Scandinavian Journal of Public Health

Published Date: Mar 13, 2026

Abstract

AIM: Systematic reviewing is a time-consuming process that can be aided by artificial intelligence (AI). Several AI options exist to assist with title/abstract screening; however, options for full text screening are limited. The objective of this study was to evaluate the performance of a custom generative pretrained transformer (cGPT) for full text screening. METHODS: A cGPT powered by OpenAI's ChatGPT4o was tested with subsets of articles assessed in duplicate by human reviewers. Outputs from the testing subset were coded to simulate the cGPT as an autonomous reviewer and as an assistant reviewer. Cohen's kappa was used to assess interrater agreement. RESULTS: For the inclusion/exclusion decision, the human-human kappa scores ranged from 0.87 to 0.96, exceeding the ranges of kappa scores for autonomous cGPT-human pairings (0.59 to 0.67) and assistant cGPT-human pairings (0.62 to 0.72). For exclusion reason classification, the human-human kappa scores ranged from 0.71 to 0.78, exceeding the ranges of kappa scores for autonomous cGPT-human pairings (0.47 to 0.53) and assistant cGPT-human pairings (0.52 to 0.63). CONCLUSIONS: The assistant cGPT outperformed the autonomous cGPT. An assistant cGPT could speed up systematic reviewing in a sufficiently reliable manner; however, further research is needed to establish standardized thresholds for practical use. Faster systematic reviewing has implications for directing timely public health policy decisions.
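
For readers unfamiliar with the agreement statistic used above, the following is a minimal sketch of how Cohen's kappa can be computed for one pair of reviewers. It is not the authors' analysis code: the function name `cohens_kappa` and the eight screening decisions are hypothetical, invented only to illustrate the calculation, and they do not reproduce any figure reported in the abstract.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion
    of agreement and p_e is the agreement expected by chance given each
    rater's own label frequencies.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must label the same non-empty set of items.")

    n = len(rater_a)

    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: for each label, multiply the two raters' marginal
    # proportions, then sum over labels.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label for every item.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical include/exclude decisions (illustrative only, not study data).
human = ["include", "exclude", "exclude", "include",
         "exclude", "exclude", "include", "exclude"]
model = ["include", "exclude", "include", "include",
         "exclude", "exclude", "exclude", "exclude"]

kappa = cohens_kappa(human, model)
print(f"Cohen's kappa: {kappa:.2f}")
```

The same function applies to the exclusion reason classification, since it accepts any set of categorical labels rather than only a binary include/exclude decision.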
