MetaFE-DE: Learning Meta Feature Embedding for Depth Estimation from Monocular Endoscopic Images
Journal: arXiv
Published Date: Feb 5, 2025
Abstract
Depth estimation from monocular endoscopic images presents significant
challenges due to the complexity of endoscopic surgery, such as irregular
shapes of human soft tissues, as well as variations in lighting conditions.
Existing methods primarily estimate the depth information from RGB images
directly, and often suffer from limited interpretability and accuracy. Given
that RGB and depth images are two views of the same endoscopic surgery scene,
in this paper, we introduce a novel concept referred to as "meta feature
embedding (MetaFE)", in which the physical entities (e.g., tissues and surgical
instruments) of endoscopic surgery are represented using shared features
that can be alternatively decoded into an RGB or a depth image. With this concept,
we propose a two-stage self-supervised learning paradigm for monocular
endoscopic depth estimation. In the first stage, we introduce a temporal
representation learner using diffusion models, which are aligned with the
spatial information through cross normalization to construct the MetaFE. In
the second stage, self-supervised monocular depth estimation with
brightness calibration is applied to decode the meta features into the depth
image. Extensive evaluation on diverse endoscopic datasets demonstrates that
our approach outperforms the state-of-the-art method in depth estimation,
achieving superior accuracy and generalization. The source code will be
publicly available.