FMT: A Multimodal Pneumonia Detection Model Based on Stacking MOE Framework

Journal: arXiv

Published Date: Mar 7, 2025

Abstract

Artificial intelligence applied to medical image analysis has shown potential to improve the accuracy of pneumonia diagnosis. However, traditional multimodal approaches often fail to address real-world challenges such as incomplete data and missing modalities. This study proposes a Flexible Multimodal Transformer (FMT) that uses ResNet-50 and BERT for joint representation learning, applies a dynamic masked attention strategy that simulates clinical modality loss to improve robustness, and finally uses a sequential mixture of experts (MOE) architecture for multi-level decision refinement. Evaluated on a small multimodal pneumonia dataset, FMT achieved state-of-the-art performance with 94% accuracy, 95% recall, and a 93% F1 score, outperforming the single-modal baselines (ResNet: 89%; BERT: 79%) and the medical benchmark CheXMed (90%), and providing a scalable solution for multimodal pneumonia diagnosis in resource-constrained medical settings.
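The abstract names three mechanisms: joint image and text representations, dynamic masked attention that simulates a missing modality, and a sequential (stacked) mixture of experts. The sketch below shows one way these pieces could fit together in plain Python. It is not the authors' implementation: the token vectors are placeholders for ResNet-50 and BERT features, and the masking rule (hide one whole modality with probability `p_drop`), the softmax gate, and all function names (`sample_modality_mask`, `masked_attention`, `moe`, `stacked_moe`) are hypothetical choices made for illustration.

```python
# Illustrative sketch only; not the FMT authors' code. All names are hypothetical.
import math
import random
from typing import Callable, List

Vector = List[float]
NEG_INF = float("-inf")


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax; scores of -inf get zero weight.
    Assumes at least one score is finite."""
    m = max(scores)
    exps = [0.0 if s == NEG_INF else math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sample_modality_mask(n_image: int, n_text: int, p_drop: float,
                         rng: random.Random) -> List[bool]:
    """Assumed training-time rule: with probability p_drop, hide one whole
    modality (image or text, chosen uniformly) to simulate modality loss.
    Never hides both, so attention always keeps at least one key."""
    r = rng.random()
    if r < p_drop / 2:
        return [False] * n_image + [True] * n_text  # image missing
    if r < p_drop:
        return [True] * n_image + [False] * n_text  # text report missing
    return [True] * (n_image + n_text)              # both present


def masked_attention(query: Vector, keys: List[Vector], values: List[Vector],
                     mask: List[bool]) -> Vector:
    """Scaled dot-product attention where masked-out keys get zero weight."""
    scale = math.sqrt(len(query))
    scores = [dot(query, k) / scale if keep else NEG_INF
              for k, keep in zip(keys, mask)]
    weights = softmax(scores)
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]


def moe(x: Vector, experts: List[Callable[[Vector], Vector]],
        gate_weights: List[Vector]) -> Vector:
    """Soft mixture of experts: a softmax gate over x weights each expert."""
    gates = softmax([dot(g, x) for g in gate_weights])
    outputs = [expert(x) for expert in experts]
    return [sum(g * out[i] for g, out in zip(gates, outputs))
            for i in range(len(outputs[0]))]


def stacked_moe(x: Vector, stages) -> Vector:
    """Assumed sequential refinement: each MoE stage consumes the previous
    stage's output. `stages` is a list of (experts, gate_weights) pairs."""
    for experts, gate_weights in stages:
        x = moe(x, experts, gate_weights)
    return x


if __name__ == "__main__":
    rng = random.Random(0)
    image_tokens = [[1.0, 0.0], [0.8, 0.2]]  # placeholder for ResNet-50 features
    text_tokens = [[0.0, 1.0]]               # placeholder for BERT features
    tokens = image_tokens + text_tokens
    mask = sample_modality_mask(len(image_tokens), len(text_tokens), 0.5, rng)
    fused = masked_attention([0.5, 0.5], tokens, tokens, mask)
    stage = ([lambda v: v, lambda v: [2 * a for a in v]],
             [[0.1, 0.0], [0.0, 0.1]])
    print(mask, stacked_moe(fused, [stage, stage]))
```

Setting masked keys to -inf before the softmax gives a hidden modality exactly zero attention weight, so the same fusion path runs whether one or both modalities are present. This is one plausible reading of how masked attention can simulate clinical modality loss; the paper's actual masking schedule and expert design may differ.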

Authors

Jingyu Xu
Yang Wang

External Resources

arXiv: http://arxiv.org/abs/2503.05626v1