Benchmarking domain-specific pretrained language models to identify the best model for methodological rigor in clinical studies.
Journal:
Journal of Biomedical Informatics
Published Date:
Apr 15, 2025
Abstract
OBJECTIVE: Encoder-only transformer-based language models have shown promise in automating critical appraisal of clinical literature. However, a comprehensive evaluation of these models for classifying the methodological rigor of randomized controlled trials is needed to identify the most robust ones. This study benchmarks several state-of-the-art transformer-based language models using a diverse set of performance metrics.