Validation of OncoOrigin: An Integrative AI Tool for Primary Cancer Site Prediction with Graphical User Interface to Facilitate Clinical Application.
Journal:
International journal of molecular sciences
PMID:
40141210
Abstract
Cancers of unknown primary (CUPs) represent a significant diagnostic and therapeutic challenge in oncology. Because current diagnostic tools are limited in these cases, novel approaches are needed to improve treatment outcomes for these patients. The objective of this study was to develop machine-learning-based software for primary cancer site prediction (OncoOrigin), based on genetic data acquired from tumor DNA sequencing. By design, this was a diagnostic study, conducted using data from the cBioPortal database (accessed on 21 September 2024) and several data processing and machine learning Python libraries. The study involved over 20,000 tumor samples with information on patient age, sex, and the presence of genetic variants in over 600 genes. The main outcome of interest was machine-learning-based discrimination between cancer site classes. Model quality was assessed by cross-validation on the training set and by evaluation on a segregated test set. Finally, the optimal model was incorporated, together with a graphical user interface, into the OncoOrigin software. Feature importance for class discrimination was also determined on the optimal model. Of the four machine learning estimators tested, the XGBoostClassifier-based model proved superior in test set evaluation, with a top-2 accuracy of 0.91 and ROC-AUC of 0.97. Unlike other machine learning models published in the literature, OncoOrigin stands out as the only one integrated with a graphical user interface, which is crucial for facilitating its use by oncology specialists in everyday clinical practice, where its application is expected to have the greatest value.
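The top-2 accuracy reported above counts a prediction as correct when the true cancer site is among the two classes the model scores highest. The following is a minimal sketch of that metric in plain Python, not the authors' implementation; the function name `top_k_accuracy` and the toy labels and probabilities are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def top_k_accuracy(
    y_true: Sequence[int],
    proba: Sequence[Sequence[float]],
    k: int = 2,
) -> float:
    """Fraction of samples whose true class is among the k highest-scoring classes."""
    if len(y_true) != len(proba):
        raise ValueError("y_true and proba must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one sample is required")

    hits = 0
    for label, scores in zip(y_true, proba):
        # Rank class indices by descending score; sorted() is stable,
        # so ties are broken in favour of the lower class index.
        ranked = sorted(range(len(scores)), key=lambda c: -scores[c])
        if label in ranked[:k]:
            hits += 1
    return hits / len(y_true)


# Hypothetical toy data: 4 samples, 3 classes (not taken from the study).
toy_labels = [0, 1, 2, 2]
toy_proba = [
    [0.7, 0.2, 0.1],  # true class 0 is ranked first
    [0.5, 0.3, 0.2],  # true class 1 is ranked second
    [0.6, 0.3, 0.1],  # true class 2 is ranked third (miss even at k=2)
    [0.1, 0.2, 0.7],  # true class 2 is ranked first
]

top1 = top_k_accuracy(toy_labels, toy_proba, k=1)
top2 = top_k_accuracy(toy_labels, toy_proba, k=2)
print(f"top-1 accuracy: {top1:.2f}, top-2 accuracy: {top2:.2f}")
```

In this toy case the second sample is a miss at k=1 but a hit at k=2, which is why top-2 accuracy is often reported for multi-class site prediction: a short ranked list of candidate sites can still be clinically useful even when the single highest-scoring class is wrong.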