Benchmarking domain-specific pretrained language models to identify the best model for methodological rigor in clinical studies.

Journal: Journal of Biomedical Informatics
Published Date:

Abstract

OBJECTIVE: Encoder-only transformer-based language models have shown promise in automating the critical appraisal of clinical literature. However, a comprehensive evaluation of these models for classifying the methodological rigor of randomized controlled trials is necessary to identify the most robust ones. This study benchmarks several state-of-the-art transformer-based language models using a diverse set of performance metrics.

Authors

  • Fangwen Zhou
    Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, Faculty of Health Sciences, McMaster University, Hamilton, Ontario, Canada.
  • Rick Parrish
    Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada.
  • Muhammad Afzal
    Department of Computer Engineering, Kyung Hee University, Seocheon-dong, Giheung-gu Yongin-si, Gyeonggi-do 446-701, Korea. muhammad.afzal@oslab.khu.ac.kr.
  • Ashirbani Saha
    Department of Radiology, Duke University School of Medicine, 2424 Erwin Road, Suite 302, Durham, NC, 27705, USA. ashirbani.saha@duke.edu.
  • R Brian Haynes
    Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada.
  • Alfonso Iorio
    Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada.
  • Cynthia Lokker
    Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada. Electronic address: lokkerc@mcmaster.ca.