Long-tailed Medical Diagnosis with Relation-aware Representation Learning and Iterative Classifier Calibration
Journal:
arXiv
Published Date:
Feb 5, 2025
Abstract
Recently computer-aided diagnosis has demonstrated promising performance,
effectively alleviating the workload of clinicians. However, the inherent
sample imbalance among different diseases leads algorithms biased to the
majority categories, leading to poor performance for rare categories. Existing
works formulated this challenge as a long-tailed problem and attempted to
tackle it by decoupling the feature representation and classification. Yet, due
to the imbalanced distribution and limited samples from tail classes, these
works are prone to biased representation learning and insufficient classifier
calibration. To tackle these problems, we propose a new Long-tailed Medical
Diagnosis (LMD) framework for balanced medical image classification on
long-tailed datasets. In the initial stage, we develop a Relation-aware
Representation Learning (RRL) scheme to boost the representation ability by
encouraging the encoder to capture intrinsic semantic features through
different data augmentations. In the subsequent stage, we propose an Iterative
Classifier Calibration (ICC) scheme to calibrate the classifier iteratively.
This is achieved by generating a large number of balanced virtual features and
fine-tuning the encoder using an Expectation-Maximization manner. The proposed
ICC compensates for minority categories to facilitate unbiased classifier
optimization while maintaining the diagnostic knowledge in majority classes.
Comprehensive experiments on three public long-tailed medical datasets
demonstrate that our LMD framework significantly surpasses state-of-the-art
approaches. The source code can be accessed at
https://github.com/peterlipan/LMD.