Comparative Analysis of Generative Pre-Trained Transformer Models in Oncogene-Driven Non-Small Cell Lung Cancer: Introducing the Generative Artificial Intelligence Performance Score.
Journal:
JCO clinical cancer informatics
PMID:
39661913
Abstract
PURPOSE: Precision oncology in non-small cell lung cancer (NSCLC) relies on biomarker testing for clinical decision making. Despite its importance, challenges like the lack of genomic oncology training, nonstandardized biomarker reporting, and a rapidly evolving treatment landscape hinder its practice. Generative artificial intelligence (AI), such as ChatGPT, offers promise for enhancing clinical decision support. Effective performance metrics are crucial to evaluate these models' accuracy and their propensity for producing incorrect or hallucinated information. We assessed various ChatGPT versions' ability to generate accurate next-generation sequencing reports and treatment recommendations for NSCLC, using a novel Generative AI Performance Score (G-PS), which considers accuracy, relevancy, and hallucinations.