Class Imbalance Correction for Improved Universal Lesion Detection and Tagging in CT
Journal: arXiv
Published Date: Apr 8, 2025
Abstract
Radiologists routinely detect and size lesions in CT to stage cancer and
assess tumor burden. To aid these efforts, multiple lesion detection
algorithms have been developed using a large public dataset, DeepLesion
(32,735 lesions, 32,120 CT slices, 10,594 studies, 4,427 patients, 8 body part
labels). However, this dataset is missing some measurements and
lesion tags, and exhibits a severe imbalance in the number of lesions per label
category. In this work, we use a limited subset of DeepLesion (6%, 1,331
lesions, 1,309 slices) containing lesion annotations and body part label tags to
train a VFNet model to detect and tag lesions. We address the class
imbalance through three experiments: 1) balancing data by body part
label, 2) balancing data by the number of lesions per patient, and 3)
balancing data by lesion size. Compared with a randomly sampled
(unbalanced) data subset, our results showed that balancing by body part
label always increased sensitivity for lesions ≥ 1 cm in under-represented
classes (Bone: 80% vs. 46%, Kidney: 77% vs. 61%, Soft Tissue: 70% vs. 60%,
Pelvis: 83% vs. 76%). Similar trends were seen for three other
models tested (Faster R-CNN, RetinaNet, FoveaBox). Balancing data by lesion size
also improved the VFNet model's recall for all classes relative to the
unbalanced dataset. We also provide a structured reporting guideline for a
"Lesions" subsection to be entered into the "Findings" section of a
radiology report. To our knowledge, we are the first to report the class
imbalance in DeepLesion, and we have taken data-driven steps to address it in
the context of joint lesion detection and tagging.
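The abstract does not say how the balanced subsets were drawn. As a minimal sketch, assuming "balancing" means capping the number of lesions sampled (without replacement) from each group, the Python below runs all three experiments through one grouping function and computes per-class sensitivity for lesions ≥ 1 cm. The `Lesion` record, every function name, the 1/2/3+ lesions-per-patient bins, and the per-lesion `detected` flags are hypothetical. In practice, the flags would come from matching predicted boxes to ground truth (for example, by IoU), which is not shown here.

```python
import random
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Lesion:
    # Hypothetical per-lesion record; field names are illustrative only.
    patient_id: str
    body_part: str       # body part label tag, e.g. "bone" or "kidney"
    diameter_mm: float   # long-axis diameter in millimetres


def balanced_subset(lesions, key, per_group, seed=0):
    """Sample up to `per_group` lesions (without replacement) from each group key(lesion)."""
    rng = random.Random(seed)  # fixed seed for a reproducible subset
    groups = defaultdict(list)
    for lesion in lesions:
        groups[key(lesion)].append(lesion)
    subset = []
    for name in sorted(groups):  # sorted for deterministic iteration order
        members = groups[name]
        # Small groups are kept whole; large groups are downsampled to the cap.
        subset.extend(rng.sample(members, min(per_group, len(members))))
    return subset


def by_body_part(lesion):
    # Experiment 1: group by body part label.
    return lesion.body_part


def lesions_per_patient_key(lesions):
    # Experiment 2: group by how many lesions the patient has (hypothetical bins 1, 2, 3+).
    counts = Counter(lesion.patient_id for lesion in lesions)
    return lambda lesion: min(counts[lesion.patient_id], 3)


def by_size(lesion):
    # Experiment 3: group by lesion size, split at 1 cm (hypothetical binning).
    return "large" if lesion.diameter_mm >= 10.0 else "small"


def per_class_sensitivity(lesions, detected, min_diameter_mm=10.0):
    """Fraction of lesions >= min_diameter_mm that were detected, per body part label."""
    hits, totals = defaultdict(int), defaultdict(int)
    for lesion, found in zip(lesions, detected):
        if lesion.diameter_mm >= min_diameter_mm:
            totals[lesion.body_part] += 1
            hits[lesion.body_part] += int(found)
    return {part: hits[part] / totals[part] for part in totals}
```

For example, `balanced_subset(lesions, by_body_part, per_group=200)` (with an arbitrary cap of 200) would downsample over-represented labels and leave rare labels intact. Oversampling the rare groups instead is an equally plausible reading of "balancing"; the sketch uses downsampling only because it keeps the subset free of duplicate lesions.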