H-Net: A Multitask Architecture for Simultaneous 3D Force Estimation and Stereo Semantic Segmentation in Intracardiac Catheters
Journal: arXiv
Published Date: Dec 31, 2024
Abstract
The success rate of catheterization procedures is closely linked to the
sensory data provided to the surgeon. Vision-based deep learning models can
deliver both tactile and visual information in a sensor-free manner, while also
being cost-effective to produce. Because such models are computationally
demanding for devices with limited resources, prior research has addressed
force estimation and catheter segmentation separately. However, no
comprehensive architecture yet segments the catheter from two different angles
while simultaneously estimating the applied forces in 3D. To bridge
this gap, this work proposes a novel, lightweight, multi-input, multi-output
encoder-decoder-based architecture. It is designed to segment the catheter from
two points of view and concurrently measure the applied forces in the x, y, and
z directions. The network processes two simultaneous X-ray images, intended to
be supplied by a biplane fluoroscopy system, showing a catheter's deflection from
different angles. It uses two parallel sub-networks with shared parameters to
output two segmentation maps corresponding to the inputs. Additionally, it
leverages stereo vision to estimate the applied forces at the catheter's tip in
3D. The architecture combines two input channels, two classification heads for
segmentation, and a regression head for force estimation in a single
end-to-end network. The outputs of all heads were assessed and compared with
the literature, demonstrating state-of-the-art performance in both segmentation
and force estimation. To the best of the authors' knowledge, this is the first
time such a model has been proposed.
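
The data flow described in the abstract (two views, a shared-parameter sub-network per view, two segmentation outputs, and one 3D force output) can be illustrated with a toy forward pass. This is only a dependency-free sketch under stated assumptions, not the authors' implementation: the per-pixel linear "encoder", the global-average-pooling fusion, the random weights, and every name (`make_params`, `encode`, `segment`, `pool`, `h_net_forward`) are hypothetical stand-ins for the real encoder-decoder layers, which the abstract does not specify.

```python
import math
import random


def make_params(seed=0, feat=4):
    """Random toy weights (hypothetical; a real model would learn these)."""
    rng = random.Random(seed)
    return {
        # Shared encoder: maps one pixel intensity to `feat` features.
        "enc_w": [rng.uniform(-1, 1) for _ in range(feat)],
        # Segmentation head weights, assumed here to be shared by both views.
        "seg_w": [rng.uniform(-1, 1) for _ in range(feat)],
        # Regression head: maps fused features (2 * feat) to (Fx, Fy, Fz).
        "force_w": [[rng.uniform(-1, 1) for _ in range(2 * feat)]
                    for _ in range(3)],
    }


def encode(image, enc_w):
    """Per-pixel ReLU features; stands in for the real encoder-decoder."""
    return [[[max(0.0, w * px) for w in enc_w] for px in row] for row in image]


def segment(features, seg_w):
    """Per-pixel sigmoid giving a catheter probability map."""
    return [[1.0 / (1.0 + math.exp(-sum(w * f for w, f in zip(seg_w, px))))
             for px in row] for row in features]


def pool(features):
    """Global average pooling over all pixels, one value per feature."""
    n_pixels = sum(len(row) for row in features)
    n_feat = len(features[0][0])
    return [sum(px[k] for row in features for px in row) / n_pixels
            for k in range(n_feat)]


def h_net_forward(view_a, view_b, params):
    """Two inputs -> two segmentation maps and one 3D force vector."""
    # The same encoder weights process both views (shared parameters).
    feats_a = encode(view_a, params["enc_w"])
    feats_b = encode(view_b, params["enc_w"])
    mask_a = segment(feats_a, params["seg_w"])
    mask_b = segment(feats_b, params["seg_w"])
    # Assumed fusion: concatenate pooled features from both views so the
    # regression head can combine information from the two angles.
    fused = pool(feats_a) + pool(feats_b)
    force = [sum(w * f for w, f in zip(row, fused))
             for row in params["force_w"]]
    return mask_a, mask_b, force


# Two tiny 2x2 "X-ray" views with intensities in [0, 1].
view_a = [[0.0, 0.5], [1.0, 0.25]]
view_b = [[0.1, 0.9], [0.3, 0.7]]
mask_a, mask_b, force = h_net_forward(view_a, view_b, make_params())
print(len(mask_a), len(mask_b), len(force))
```

Because both branches reuse the same weights, feeding the same image to both inputs yields identical segmentation maps, which is the practical meaning of "shared parameters" in this sketch.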