OWT: A Foundational Organ-Wise Tokenization Framework for Medical Imaging
Journal: arXiv
Published Date: May 8, 2025
Abstract
Recent advances in representation learning often rely on holistic, black-box
embeddings that entangle multiple semantic components, limiting
interpretability and generalization. These issues are especially critical in
medical imaging. To address these limitations, we propose an Organ-Wise
Tokenization (OWT) framework with a Token Group-based Reconstruction (TGR)
training paradigm. Unlike conventional approaches that produce holistic
features, OWT explicitly disentangles an image into separable token groups,
each corresponding to a distinct organ or semantic entity. Our design ensures
each token group encapsulates organ-specific information, boosting
interpretability, generalization, and efficiency while allowing fine-grained
control in downstream tasks. Experiments on CT and MRI datasets show that OWT
not only achieves strong image reconstruction and segmentation performance but
also enables novel semantic-level generation and retrieval applications that
are out of reach for standard holistic embedding methods. These findings
underscore the potential of OWT as a foundational
framework for semantically disentangled representation learning, offering broad
scalability and applicability to real-world medical imaging scenarios and
beyond.
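
To make the idea of organ-wise token groups concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a hypothetical layout in which an encoded image is a dictionary mapping an organ name to that organ's list of token vectors. The helper names (`keep_groups`, `pool`, `retrieve_by_organ`), the mean pooling, and the cosine ranking are all assumptions chosen for illustration. The abstract does not specify the tokenizer, the TGR training procedure, or the retrieval metric.

```python
import math
from typing import Dict, List, Sequence

Token = List[float]
# Hypothetical layout (an assumption): organ name -> that organ's token group.
TokenGroups = Dict[str, List[Token]]


def keep_groups(groups: TokenGroups, organs: Sequence[str]) -> TokenGroups:
    """Zero out every token group except the requested organs.

    Stands in for the kind of group-level control that a group-wise
    representation makes possible; the zero-masking itself is an assumption.
    """
    wanted = set(organs)
    return {
        name: [list(tok) if name in wanted else [0.0] * len(tok) for tok in toks]
        for name, toks in groups.items()
    }


def pool(tokens: List[Token]) -> Token:
    """Mean-pool one token group into a single vector."""
    dim = len(tokens[0])
    return [sum(tok[i] for tok in tokens) / len(tokens) for i in range(dim)]


def cosine(a: Token, b: Token) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_by_organ(
    query: TokenGroups, database: List[TokenGroups], organ: str
) -> List[int]:
    """Rank database items by similarity of a single organ's token group."""
    q = pool(query[organ])
    scores = [cosine(q, pool(item[organ])) for item in database]
    return sorted(range(len(database)), key=lambda i: scores[i], reverse=True)


# Toy data: two-dimensional tokens for two organs.
query = {"liver": [[1.0, 0.0], [1.0, 0.0]], "kidney": [[0.0, 1.0]]}
database = [
    {"liver": [[0.0, 1.0]], "kidney": [[0.0, 1.0]]},
    {"liver": [[1.0, 0.1]], "kidney": [[1.0, 0.0]]},
]

liver_ranking = retrieve_by_organ(query, database, "liver")
kidney_ranking = retrieve_by_organ(query, database, "kidney")
liver_only = keep_groups(query, ["liver"])
```

In this toy example the same query ranks the database differently depending on which organ's group is compared, which is the kind of semantic-level retrieval that a single holistic embedding cannot express directly.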