Leveraging proteomics and transfer learning for head and neck cancer detection in saliva

Journal: medRxiv
Published Date:

Abstract

Early detection of head and neck cancer (HNC) has the potential to substantially improve patient survival, yet no biomarker tests for early detection are currently used in clinical practice. Case-control studies that could be used to derive diagnostic biomarkers tend to be underpowered. Recent evidence suggests that this challenge may be addressed by applying deep learning to pan-cancer data from large population studies. We evaluated a range of machine learning methods and training scenarios that use proteome data to distinguish between HNC cases and controls. Models were trained on blood plasma proteomes from the UK Biobank (UKB) with n = 13,208 pan-cancer cases. To assess the models' generalisability across tissue types, we tested them in a cross-tissue comparison using an independent saliva-based proteome dataset from the SensOrPass HNC case-control study (n = 156). We obtained the best performance (AUC = 0.88 versus AUC < 0.77 for the other methods) using a transfer learning approach called CNN-Synth. This convolutional neural network was trained on UKB data to distinguish between profiles from a set of controls and cases that included synthetic profiles generated by a pretrained variational autoencoder. Post-hoc model explainability analysis using SHapley Additive exPlanations (SHAP) identified IL6, CXCL17, CXCL13, IGF1R and FASLG as the five proteins contributing most to predictor performance. Our findings underscore the potential for deep learning and explainable AI to leverage data from large-scale population datasets for advancing early cancer detection and improving clinical outcomes. This work was supported by Cancer Research UK (grant numbers EDDISA-Jan22\100003 and C18281/A29019).
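
The AUC values quoted in the abstract can be read as the probability that a randomly chosen case receives a higher model score than a randomly chosen control. The following is a minimal sketch of that rank-based definition, not the authors' evaluation code; the function name `auc_from_scores` and the toy labels and scores are hypothetical and are not data from the study.

```python
def auc_from_scores(labels, scores):
    """Rank-based AUC: fraction of (case, control) pairs in which the
    case has the higher score, with ties counted as half a win."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Made-up scores for illustration only (1 = case, 0 = control).
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
toy_auc = auc_from_scores(toy_labels, toy_scores)
print(round(toy_auc, 3))
```

In this toy example, 8 of the 9 case-control pairs are ranked correctly, so the AUC is 8/9. An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation.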

Authors

  • Anza Shakeel; Samuel W. D. Merriel; Joel Smith; A. Stephen McGough; Matthew Suderman; Zahraa S. Abdallah; Paul D. Yousefi

Categories