ViTaL: A Multimodality Dataset and Benchmark for Multi-pathological Ovarian Tumor Recognition
Journal: arXiv
Published Date: Jul 6, 2025
Abstract
Ovarian tumors are a common gynecological disease that can rapidly progress
into a serious health crisis when not detected early, posing a significant
threat to women's health. Deep neural networks have the potential to
identify ovarian tumors and thereby reduce mortality rates, but the scarcity
of public datasets hinders progress. To address this gap, we introduce a vital ovarian
tumor pathological recognition dataset called \textbf{ViTaL} that contains
\textbf{V}isual, \textbf{T}abular and \textbf{L}inguistic modality data of 496
patients across six pathological categories. The ViTaL dataset comprises three
subsets corresponding to different patient data modalities: visual data from
2216 two-dimensional ultrasound images, tabular data from medical examinations
of 496 patients, and linguistic data from ultrasound reports of 496 patients.
In clinical practice, merely distinguishing between benign and malignant
ovarian tumors is insufficient. To enable multi-pathology classification of
ovarian tumors, we propose ViTaL-Net, which is based on the Triplet Hierarchical
Offset Attention Mechanism (THOAM), to minimize the loss incurred during feature
fusion of multi-modal data. This mechanism can effectively enhance the relevance and
complementarity between information from different modalities. ViTaL-Net serves
as a benchmark for the task of multi-pathology, multi-modality classification
of ovarian tumors. In our experiments, the proposed method achieved
accuracies exceeding 90\% on the two most common pathological types of
ovarian tumor and an overall performance of 85\%. Our dataset and code are
available at
https://github.com/GGbond-study/vitalnet.
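
The three subsets described above (ultrasound images, tabular examination data, and ultrasound reports) all refer to the same set of patients, so a natural first step is to group them per patient. The following is a minimal sketch of that grouping, not the layout used in the released repository: the `PatientRecord` class, the `group_by_patient` function, the field names, and the toy identifiers are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class PatientRecord:
    """One patient with all three modalities (hypothetical structure)."""
    patient_id: str
    label: int  # index of one of the six pathological categories
    image_paths: List[str] = field(default_factory=list)    # visual modality
    tabular: Dict[str, float] = field(default_factory=dict)  # tabular modality
    report: str = ""                                         # linguistic modality


def group_by_patient(
    labels: Dict[str, int],
    image_rows: List[Tuple[str, str]],
    tabular_rows: Dict[str, Dict[str, float]],
    report_rows: Dict[str, str],
) -> Dict[str, PatientRecord]:
    """Join the three modality subsets on a shared patient identifier."""
    records = {pid: PatientRecord(pid, lab) for pid, lab in labels.items()}
    # A patient may have several ultrasound images, so paths are appended.
    for pid, path in image_rows:
        records[pid].image_paths.append(path)
    # Tabular data and reports are assumed to be one entry per patient.
    for pid, feats in tabular_rows.items():
        records[pid].tabular = dict(feats)
    for pid, text in report_rows.items():
        records[pid].report = text
    return records


# Toy placeholder data; none of these values come from the ViTaL dataset.
labels = {"p001": 0, "p002": 3}
images = [("p001", "p001_a.png"), ("p001", "p001_b.png"), ("p002", "p002_a.png")]
tabular = {"p001": {"feat_1": 0.5}, "p002": {"feat_1": 1.5}}
reports = {"p001": "placeholder report", "p002": "placeholder report"}

records = group_by_patient(labels, images, tabular, reports)
print(len(records["p001"].image_paths))
```

Keying every modality by patient identifier keeps the one-to-many relation between patients and images explicit, which matters here because the dataset contains more images (2216) than patients (496).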