Leveraging immune and clinicopathological profiles with machine learning to predict axillary lymph node metastasis in breast cancer patients.

Journal: Breast cancer research : BCR
Published Date:

Abstract

BACKGROUND: Breast cancer is the leading cause of cancer-related death in women, with mortality increasing when tumor cells spread to nearby lymph nodes, particularly the axillary lymph nodes (ALNs). Although several studies predict which patients have ALN metastasis at diagnosis (pdALN+), few examine the prognostic value of immune elements within ALNs. Given the impact of the immune response on breast cancer, this study develops a machine learning model to identify the clinicopathological and immune features of the primary tumor and non-metastatic ALNs (ALN-) most frequently associated with pdALN+.

METHODS: Two datasets of luminal breast cancer patients diagnosed between 1995 and 2008 were used: Dataset 1 involved 83 women (42 pdALN- and 41 pdALN+), and Dataset 2 comprised 344 women (204 pdALN- and 140 pdALN+). Three machine learning models were developed using the Random Forest algorithm: Model 1 included clinicopathological data from Dataset 1; Model 2 used clinicopathological and immune response data from Dataset 1; and Model 3 used clinicopathological data from Dataset 2. All models followed the same machine learning pipeline: data pre-processing, feature selection using recursive feature elimination with cross-validation, algorithm optimization using random search cross-validation, and interpretation of results using SHapley Additive exPlanations (SHAP) values. After selecting the best-performing model, Model 4 was developed using its dataset and features. The optimal feature set was determined at the point where adding more features led to a decline in model performance metrics.

RESULTS: Model 2 outperformed Models 1 and 3, despite the larger cohort on which Model 3 was developed. This highlights the crucial role of the immune response in breast cancer progression. Model 4 achieved a median ROC AUC of 0.84, a median accuracy of 0.76, and a median recall of 0.75. Remarkably, nine of the ten predictive features were immune populations. The intratumoral follicular dendritic cell marker CD21+ was the most predictive feature, surpassing even tumor diameter, a well-established prognostic factor in breast cancer. Thus, it might stand as a novel biomarker candidate.

CONCLUSIONS: This study not only identifies promising biomarker candidates but also highlights the importance of including mechanistic features, such as mediators of inflammation, in breast cancer patient stratification.
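
The pipeline described in METHODS can be illustrated with a minimal sketch. This is not the authors' code: it assumes scikit-learn, uses synthetic data in place of the patient datasets, and all feature names, sample sizes, and hyperparameter ranges are hypothetical. Two simplifications are made for brevity: permutation importance is swapped in for SHAP values as the feature-ranking step, and a single held-out split replaces whatever repeated evaluation produced the reported medians.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.inspection import permutation_importance
from sklearn.metrics import accuracy_score, recall_score, roc_auc_score
from sklearn.model_selection import (RandomizedSearchCV, StratifiedKFold,
                                     train_test_split)

# Synthetic stand-in for a table of clinicopathological and immune features.
# y = 1 plays the role of pdALN+, y = 0 of pdALN-. Sizes are arbitrary.
X, y = make_classification(n_samples=200, n_features=12, n_informative=4,
                           n_redundant=2, random_state=0)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # hypothetical

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Step 1: recursive feature elimination with cross-validation.
selector = RFECV(RandomForestClassifier(n_estimators=50, random_state=0),
                 step=1, cv=cv, scoring="roc_auc")
selector.fit(X_train, y_train)
selected = [n for n, keep in zip(feature_names, selector.support_) if keep]
X_train_sel = selector.transform(X_train)
X_test_sel = selector.transform(X_test)

# Step 2: random search cross-validation over Random Forest hyperparameters
# (the ranges below are illustrative assumptions).
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions={
        "n_estimators": [50, 100, 200],
        "max_depth": [None, 3, 5, 8],
        "min_samples_leaf": [1, 2, 4],
    },
    n_iter=5, cv=cv, scoring="roc_auc", random_state=0)
search.fit(X_train_sel, y_train)
model = search.best_estimator_

# Step 3: score the tuned model on the held-out split.
proba = model.predict_proba(X_test_sel)[:, 1]
pred = model.predict(X_test_sel)
metrics = {
    "roc_auc": roc_auc_score(y_test, proba),
    "accuracy": accuracy_score(y_test, pred),
    "recall": recall_score(y_test, pred),
}

# Step 4: rank the selected features. Permutation importance is used here
# as a stand-in for the SHAP values named in the abstract.
result = permutation_importance(model, X_test_sel, y_test, n_repeats=10,
                                scoring="roc_auc", random_state=0)
ranking = sorted(zip(selected, result.importances_mean),
                 key=lambda pair: pair[1], reverse=True)

print(metrics)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Feature selection and tuning are fitted on the training split only, so the held-out metrics are not inflated by information from the test rows. With cohorts as small as those in the abstract, repeating the split many times and reporting medians would give a more stable estimate than a single split.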

Authors
