DR.BENCH: Diagnostic Reasoning Benchmark for Clinical Natural Language Processing.

Journal: Journal of Biomedical Informatics
PMID:

Abstract

The meaningful use of electronic health records (EHR) continues to progress in the digital era with clinical decision support systems augmented by artificial intelligence. A priority in improving provider experience is to overcome information overload and reduce the cognitive burden so that fewer medical errors and cognitive biases are introduced during patient care. One major type of medical error is diagnostic error due to systematic or predictable errors in judgement that rely on heuristics. The potential for clinical natural language processing (cNLP) to model diagnostic reasoning in humans, with forward reasoning from data to diagnosis, and to potentially reduce cognitive burden and medical error has not been investigated. Existing tasks to advance the science in cNLP have largely focused on information extraction and named entity recognition through classification tasks. We introduce a novel suite of tasks, Diagnostic Reasoning Benchmarks (DR.BENCH), as a new benchmark for developing and evaluating cNLP models with clinical diagnostic reasoning ability. The suite includes six tasks from ten publicly available datasets addressing clinical text understanding, medical knowledge reasoning, and diagnosis generation. DR.BENCH is the first clinical suite of tasks designed to be a natural language generation framework to evaluate pre-trained language models for diagnostic reasoning. The goal of DR.BENCH is to advance the science in cNLP to support downstream applications in computerized diagnostic decision support and improve the efficiency and accuracy of healthcare providers during patient care. We fine-tune and evaluate state-of-the-art generative models on DR.BENCH. Experiments show that with domain adaptation pre-training on medical knowledge, the model demonstrated opportunities for improvement when evaluated on DR.BENCH. We share DR.BENCH as a publicly available GitLab repository with a systematic approach to load and evaluate models for the cNLP community. We also discuss the carbon footprint produced during the experiments and encourage future work on DR.BENCH to report the carbon footprint.

Authors

  • Yanjun Gao
    Department of Biomedical Informatics, University of Colorado-Anschutz Medical, Aurora, CO 80045, United States.
  • Dmitriy Dligach
    Department of Public Health Sciences, Stritch School of Medicine, Loyola University Chicago, Maywood, IL.
  • Timothy Miller
    School of Computing and Information Systems, University of Melbourne, Victoria 3010, Australia.
  • John Caskey
    Department of Medicine, University of Wisconsin, Madison, USA.
  • Brihat Sharma
    Department of Computer Science, Loyola University Chicago, Chicago, IL, USA.
  • Matthew M Churpek
    Department of Medicine, University of Wisconsin-Madison, Madison, WI, United States.
  • Majid Afshar
    Loyola University Chicago, Chicago, IL.