Toward Automated Regulatory Decision-Making: Trustworthy Medical Device Risk Classification with Multimodal Transformers and Self-Training
Journal: arXiv
Published Date: May 1, 2025
Abstract
Accurate classification of medical device risk levels is essential for
regulatory oversight and clinical safety. We present a Transformer-based
multimodal framework that integrates textual descriptions and visual
information to predict device regulatory classification. The model incorporates
a cross-attention mechanism to capture intermodal dependencies and employs a
self-training strategy for improved generalization under limited supervision.
Experiments on a real-world regulatory dataset show that our approach
achieves up to 90.4% accuracy and 97.9% AUROC, substantially outperforming
text-only (77.2%) and image-only (54.8%) baselines. Compared to standard
multimodal fusion, self-training improves SVM performance by 3.3
percentage points in accuracy (from 87.1% to 90.4%) and by 1.4 points in
macro-F1, suggesting that pseudo-labeling can enhance generalization when
labeled data are scarce. Ablation studies further confirm the complementary
benefits of cross-modal attention and self-training.
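
As an illustration of the pseudo-labeling idea behind self-training, the sketch below fits a classifier on a small labeled set, assigns labels to unlabeled examples it is confident about, and refits on the enlarged set. It is a minimal sketch under stated assumptions, not the paper's implementation: a nearest-centroid classifier stands in for the SVM mentioned above to keep the example dependency-light, and the function names, the 0.9 confidence threshold, the round limit, the three classes, and the synthetic 2-D features are all hypothetical.

```python
import numpy as np

def fit_centroids(X, y, n_classes):
    """Return one mean vector per class (stand-in for the real classifier).
    Assumes every class has at least one example in y."""
    return np.stack([X[y == c].mean(axis=0) for c in range(n_classes)])

def predict_proba(centroids, X):
    """Softmax over negative squared distances to each class centroid."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    logits = -d2
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def self_train(X_lab, y_lab, X_unlab, n_classes, threshold=0.9, max_rounds=5):
    """Iteratively add confidently pseudo-labeled examples to the training set.
    threshold and max_rounds are illustrative values, not taken from the paper."""
    X_train, y_train = X_lab.copy(), y_lab.copy()
    remaining = X_unlab.copy()
    for _ in range(max_rounds):
        if len(remaining) == 0:
            break
        centroids = fit_centroids(X_train, y_train, n_classes)
        proba = predict_proba(centroids, remaining)
        confident = proba.max(axis=1) >= threshold
        if not confident.any():
            break  # nothing left that the model is sure about
        # Treat confident predictions as labels and move them to the training set.
        X_train = np.vstack([X_train, remaining[confident]])
        y_train = np.concatenate([y_train, proba[confident].argmax(axis=1)])
        remaining = remaining[~confident]
    centroids = fit_centroids(X_train, y_train, n_classes)  # final refit
    return centroids, len(X_train) - len(X_lab)

# Synthetic demo: three hypothetical classes, 3 labeled and 50 unlabeled points each.
rng = np.random.default_rng(0)
means = np.array([[0.0, 0.0], [4.0, 4.0], [0.0, 4.0]])
X_lab = np.vstack([m + 0.5 * rng.standard_normal((3, 2)) for m in means])
y_lab = np.repeat(np.arange(3), 3)
X_unlab = np.vstack([m + 0.5 * rng.standard_normal((50, 2)) for m in means])

centroids, n_pseudo = self_train(X_lab, y_lab, X_unlab, n_classes=3)
proba = predict_proba(centroids, X_unlab)
print(n_pseudo, "pseudo-labeled examples added")
```

The confidence threshold controls the trade-off in this kind of scheme: a higher value admits fewer pseudo-labels but reduces the chance of reinforcing early mistakes, whereas a lower value grows the training set faster at the cost of noisier labels.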