ADPv2: A Hierarchical Histological Tissue Type-Annotated Dataset for Potential Biomarker Discovery of Colorectal Disease
Journal: arXiv
Published Date: Jul 8, 2025
Abstract
Computational pathology (CoPath) leverages histopathology images to enhance
diagnostic precision and reproducibility in clinical pathology. However,
publicly available datasets for CoPath that are annotated with extensive
histological tissue type (HTT) taxonomies at a granular level remain scarce due
to the significant expertise and high annotation costs required. Existing
datasets, such as the Atlas of Digital Pathology (ADP), address this by
offering diverse HTT annotations generalized across multiple organs, but this
breadth limits in-depth studies of organ-specific diseases. Building upon this
foundation, we introduce ADPv2, a novel dataset focused on gastrointestinal
histopathology. Our dataset comprises 20,004 image patches derived from healthy
colon biopsy slides, annotated according to a three-level hierarchical taxonomy
of 32 distinct HTTs. Furthermore, we train a multilabel representation
learning model following a two-stage training procedure on our ADPv2 dataset.
We leverage the VMamba architecture and achieve a mean average precision
(mAP) of 0.88 in multilabel classification of colon HTTs. Finally, we show that
our dataset supports organ-specific, in-depth study for potential
biomarker discovery by analyzing the model's prediction behavior on tissues
affected by different colon diseases, which reveals statistical patterns that
confirm the two pathological pathways of colon cancer development. Our dataset
is publicly available at https://zenodo.org/records/15307021
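
For readers unfamiliar with the reported metric, the following is a minimal sketch of how mean average precision (mAP) is commonly computed for multilabel classification: average precision is computed per class from the ranked prediction scores, then averaged over classes. This is the standard non-interpolated definition, not necessarily the exact evaluation protocol used in the paper. The function names and the toy labels and scores are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def average_precision(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Non-interpolated average precision for a single class.

    Samples are ranked by descending score; AP is the mean of precision@k
    taken at each rank k where a positive sample occurs.
    Returns NaN when the class has no positive samples.
    """
    # Rank sample indices from highest to lowest score (ties keep input order).
    order = sorted(range(len(y_score)), key=lambda i: y_score[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        if y_true[idx] == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else float("nan")


def mean_average_precision(
    Y_true: List[List[int]], Y_score: List[List[float]]
) -> float:
    """Macro-average of per-class AP over all classes that have positives."""
    n_classes = len(Y_true[0])
    aps = []
    for c in range(n_classes):
        ap = average_precision(
            [row[c] for row in Y_true], [row[c] for row in Y_score]
        )
        if ap == ap:  # NaN != NaN, so this skips classes without positives
            aps.append(ap)
    return sum(aps) / len(aps)


# Toy example (hypothetical data): 4 patches, 3 tissue-type labels.
Y_true = [[1, 0, 1], [0, 1, 1], [1, 1, 0], [0, 0, 1]]
Y_score = [
    [0.9, 0.5, 0.8],
    [0.1, 0.7, 0.6],
    [0.8, 0.4, 0.3],
    [0.3, 0.1, 0.9],
]
map_score = mean_average_precision(Y_true, Y_score)
print(f"mAP = {map_score:.4f}")
```

In this toy example the first and third labels are ranked perfectly (AP of 1.0 each), while the second label has one negative patch scored above a positive one, which lowers its AP and hence the mean.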