Machine learning-based natural language processing to extract PD-L1 expression levels from clinical notes.

Journal: Health Informatics Journal

PMID: 37635280

Abstract

PD-L1 expression is used to determine oncology patients' response to and eligibility for immunologic treatments; however, PD-L1 expression status often exists only in unstructured clinical notes, limiting the ability to use it in population-level studies. We developed and evaluated a machine learning-based natural language processing (NLP) tool to extract PD-L1 expression values from the nationwide Veterans Affairs electronic health record system. The model demonstrated strong evaluation performance across multiple levels of label granularity. For the overall PD-L1 positive label, mean precision was 0.859 (sd, 0.039), recall 0.994 (sd, 0.013), and F1 0.921 (sd, 0.024). When a numeric PD-L1 value was identified, the mean absolute error of the value was 0.537 on a scale of 0 to 100. We present an accurate NLP method for deriving PD-L1 status from clinical notes. By reducing the time and manual effort needed to review medical records, our work will enable future population-level studies in cancer immunotherapy.
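To make the reported metrics concrete, the sketch below shows the standard definitions of precision, recall, and F1 for a single binary label (such as "PD-L1 positive") and of mean absolute error for extracted numeric values on the 0 to 100 scale. This is an illustration of the textbook formulas, not the authors' evaluation code; the function names and toy data are hypothetical. Note that the abstract reports means with standard deviations (presumably over several evaluation runs or folds, which the abstract does not specify), and the mean of per-run F1 scores need not equal the F1 of the mean precision and recall. Plugging the reported means into the harmonic-mean formula still gives about 0.922, close to the reported 0.921.

```python
from typing import Sequence, Tuple


def harmonic_f1(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(
    y_true: Sequence[bool], y_pred: Sequence[bool]
) -> Tuple[float, float, float]:
    """Precision, recall, and F1 for one binary label (hypothetical helper)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)        # true positives
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)    # false positives
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)    # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, harmonic_f1(precision, recall)


def mean_absolute_error(
    true_values: Sequence[float], predicted_values: Sequence[float]
) -> float:
    """MAE between gold and extracted PD-L1 values on a 0-100 scale."""
    if len(true_values) != len(predicted_values) or not true_values:
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(abs(t - p) for t, p in zip(true_values, predicted_values))
    return total / len(true_values)


if __name__ == "__main__":
    # Hypothetical toy labels: one false positive, no false negatives.
    gold = [True, True, False, True, False]
    pred = [True, True, True, True, False]
    print(precision_recall_f1(gold, pred))

    # Hypothetical toy PD-L1 percentages: one value off by 1 point.
    print(mean_absolute_error([50, 1, 80], [50, 0, 80]))

    # Harmonic mean of the abstract's reported mean precision and recall.
    print(round(harmonic_f1(0.859, 0.994), 3))
```

The toy data illustrate why a high-recall, lower-precision model (as reported here) still yields a high F1: a single false positive among five notes lowers precision to 0.75 while recall stays at 1.0.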

Authors

Eric Lin

VA Boston Healthcare System, Boston, MA, USA.
Robert Zwolinski

VA Boston Healthcare System, Boston, MA, USA.
Julie Tsu-Yu Wu

VA Palo Alto Healthcare System, Palo Alto, CA, USA.
Jennifer La

VA Boston Healthcare System, Boston, MA, USA.
Sergey Goryachev
Linden Huhmann

VA Boston Healthcare System, Boston, MA, USA.
Cenk Yildrim

VA Boston Healthcare System, Boston, MA, USA.
David P Tuck

VA Boston Healthcare System, Boston, MA, USA.
Danne C Elbers

VA Boston Healthcare System, Boston, MA, USA.
Mary T Brophy

VA Boston Healthcare System, Boston, MA, USA.
Nhan V Do

VA Boston Healthcare System, Boston, MA, USA.
Nathanael R Fillmore

Harvard Medical School, Boston, MA, USA.

Keywords

B7-H1 Antigen; Electronic Health Records; Humans; Machine Learning; Medical Records; Natural Language Processing; Software
