LCD benchmark: long clinical document benchmark on mortality prediction for language models.
Journal:
Journal of the American Medical Informatics Association : JAMIA
Published Date:
Feb 1, 2025
Abstract
OBJECTIVES: Natural language processing (NLP) is important in the clinical domain because clinical documents contain rich unstructured information that is often unavailable in structured data. When NLP methods are applied to a given domain, benchmark datasets are crucial: they not only guide the selection of the best-performing models but also enable assessment of the reliability of the generated outputs. Although language models capable of handling longer contexts have recently become available, benchmark datasets targeting long clinical document classification tasks are absent.