A multimodal dataset for precision oncology in head and neck cancer.
Journal:
Nature Communications
Published Date:
Aug 4, 2025
Abstract
Head and neck cancer is a common disease associated with a poor prognosis. A promising approach to improving patient outcomes is personalized treatment, which draws on information from multiple modalities. However, little progress has been made because large public datasets are lacking. We present HANCOCK, a multimodal dataset comprising monocentric, real-world data from 763 head and neck cancer patients. The dataset contains demographic, pathological, and blood data as well as surgery reports and histologic images, which can be explored in a low-dimensional representation. We show that combining these modalities using machine learning is superior to using a single modality, and that integrating imaging data using foundation models helps in endpoint prediction. We believe that HANCOCK will not only provide new insights into head and neck cancer pathology but also serve as a major resource for research on multimodal machine-learning methodologies in precision oncology.