Classifying Tumor Reportability Status From Unstructured Electronic Pathology Reports Using Language Models in a Population-Based Cancer Registry Setting.
Journal:
JCO Clinical Cancer Informatics
PMID:
39561305
Abstract
PURPOSE: Population-based cancer registries (PBCRs) collect data on all new cancer diagnoses in a defined population. These data are sourced from pathology reports, which PBCRs currently process using manual and rule-based solutions. This study presents a state-of-the-art natural language processing (NLP) pipeline built by fine-tuning pretrained language models (LMs). The pipeline is deployed at the British Columbia Cancer Registry (BCCR) to detect reportable tumors from a population-based feed of electronic pathology reports.