FMT: A Multimodal Pneumonia Detection Model Based on Stacking MOE Framework
Journal: arXiv
Published Date: Mar 7, 2025
Abstract
Artificial intelligence has shown potential to improve the accuracy of
pneumonia diagnosis through medical image analysis. However, traditional
multimodal approaches often fail to address real-world challenges such as
incomplete data and modality loss. This study proposes a Flexible Multimodal
Transformer (FMT) that uses ResNet-50 and BERT for joint representation
learning, applies a dynamic masked attention strategy that simulates clinical
modality loss to improve robustness, and finally uses a sequential mixture of
experts (MOE) architecture for multi-level decision refinement. Evaluated on a
small multimodal pneumonia dataset, FMT achieved state-of-the-art performance
with 94% accuracy, 95% recall, and a 93% F1 score, outperforming single-modal
baselines (ResNet: 89%; BERT: 79%) and the medical benchmark CheXMed (90%).
FMT thus provides a scalable solution for multimodal diagnosis of pneumonia in
resource-constrained medical settings.
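
The abstract does not specify how the masked attention is implemented, so the following is only a minimal sketch of the general idea of simulating modality loss. It assumes that each modality (image, text) contributes one attention score, that a modality is dropped at random with a fixed probability, and that at least one modality is always kept. The function names, the `drop_prob` parameter, and the example scores are hypothetical and are not taken from the paper.

```python
import math
import random


def sample_modality_mask(modalities, drop_prob, rng):
    """Randomly mark modalities as kept or dropped (assumed scheme, not the paper's).

    At least one modality is always kept so attention stays well defined.
    """
    modalities = list(modalities)
    keep = {m: rng.random() >= drop_prob for m in modalities}
    if not any(keep.values()):
        keep[rng.choice(modalities)] = True
    return keep


def masked_attention_weights(scores, keep):
    """Softmax over the kept modalities; dropped modalities get weight 0."""
    kept = {m: s for m, s in scores.items() if keep[m]}
    peak = max(kept.values())  # subtract the max for numerical stability
    exps = {m: math.exp(s - peak) for m, s in kept.items()}
    total = sum(exps.values())
    return {m: (exps[m] / total if keep[m] else 0.0) for m in scores}


rng = random.Random(0)
scores = {"image": 1.2, "text": 0.4}  # hypothetical attention logits
keep = sample_modality_mask(scores, drop_prob=0.3, rng=rng)
weights = masked_attention_weights(scores, keep)
print(keep, weights)
```

Under these assumptions, a dropped modality receives zero attention weight and the remaining weights still sum to one. A model trained with such masks sees examples where the image or the text is absent, which is the robustness effect the abstract describes.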