Extraction and classification of structured data from unstructured hepatobiliary pathology reports using large language models: a feasibility study compared with rules-based natural language processing.

Journal: Journal of clinical pathology
Published Date:

Abstract

AIMS: Structured reporting in pathology is not universally adopted, and extracting elements essential to research often requires expensive and time-intensive manual curation. This study examines the accuracy and feasibility of using large language models (LLMs) to extract essential pathology elements for cancer research.

Authors

  • Ruben Geevarghese
    Division of Interventional Radiology, Department of Radiology, Memorial Sloan Kettering Cancer Center, New York, New York.
  • Carlie Sigel
    Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, New York, USA.
  • John Cadley
    Department of Artificial Intelligence & Machine Learning, DigITs, Memorial Sloan Kettering Cancer Center, New York, New York.
  • Subrata Chatterjee
    Department of Artificial Intelligence & Machine Learning, DigITs, Memorial Sloan Kettering Cancer Center, New York, New York.
  • Pulkit Jain
    Department of Artificial Intelligence & Machine Learning, DigITs, Memorial Sloan Kettering Cancer Center, New York, New York.
  • Alex Hollingsworth
    Department of Artificial Intelligence & Machine Learning, DigITs, Memorial Sloan Kettering Cancer Center, New York, New York.
  • Avijit Chatterjee
    Department of Artificial Intelligence & Machine Learning, DigITs, Memorial Sloan Kettering Cancer Center, New York, New York.
  • Nathaniel Swinburne
    Department of Radiology, Icahn School of Medicine, New York, NY, USA.
  • Khawaja Hasan Bilal
    Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, New York, USA.
  • Brett Marinelli
    Department of Radiology, Mount Sinai Health System, New York, New York.