Detecting Cardiac Amyloidosis in Italian Cardiology Reports: Structured Variable Extraction versus Direct Free-Text Analysis
Journal: medRxiv
Published Date: Jan 23, 2026
Abstract
Background: Early and accurate identification of cardiac amyloidosis improves patient outcomes, yet relevant evidence is frequently hidden in free-text records. This study assesses whether structured variable extraction or direct free-text analysis more reliably identifies patients with cardiac amyloidosis, with the goal of informing clinical decision support strategies.
Methods: We extracted 21 clinical variables from 432 Italian patient records using supervised and prompt-based methods with both proprietary and locally deployable computational models. Extraction performance was evaluated by comparing extracted data with gold-standard manual annotations. For classification, two feature sets were tested: general clinical variables and amyloidosis-specific risk factors. Additionally, we evaluated direct zero-shot prediction on unstructured clinical notes.
Results: For entity extraction, GPT-4.1-mini achieved F1=0.96, comparable to supervised spaCy (F1=0.95) and GPT-4o (F1=0.94). Local open-source models such as Qwen2.5 reached F1=0.94. For cardiac amyloidosis classification, machine learning models using the full set of extracted features nearly matched results obtained with gold-standard annotations (SauerkrautLM-Gemma: F1=0.80 vs. gold: F1=0.82). General clinical features alone yielded lower performance (F1=0.68), highlighting that amyloidosis-specific risk factors in unstructured text provide discriminative diagnostic value. Zero-shot direct predictions outperformed supervised feature-based approaches (MedGemma: F1=0.92).
Conclusions: Automated extraction and zero-shot prediction effectively structure Italian EHRs and identify amyloidosis patients without manual annotation. Domain-specific risk factors in free-text notes provide substantial predictive value. Italian hospitals could potentially use locally deployable models to screen for cardiac amyloidosis without manual annotation or proprietary APIs, enabling privacy-preserving clinical decision support in real-world settings.
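As a reading aid for the F1 values reported above, the following is a minimal sketch of how extracted variables might be scored against gold-standard annotations. It assumes micro-averaged F1 with exact value matching per (record, variable) slot; the abstract does not state the study's actual scoring scheme, and the function name `micro_f1`, the record IDs, and the variable names are hypothetical illustrations.

```python
from typing import Dict, Tuple

# (record_id, variable_name) -> extracted value; all names here are hypothetical
Annotations = Dict[Tuple[str, str], str]


def micro_f1(gold: Annotations, predicted: Annotations) -> float:
    """Micro-averaged F1 over (record, variable) slots, exact value match.

    Assumption: a prediction counts as correct only if the same slot exists
    in the gold annotations with an identical value.
    """
    true_pos = sum(1 for key, value in predicted.items() if gold.get(key) == value)
    false_pos = len(predicted) - true_pos
    # Gold slots that were missed entirely or extracted with a wrong value
    false_neg = sum(1 for key, value in gold.items() if predicted.get(key) != value)
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Toy data, invented for illustration only (not taken from the study)
gold = {
    ("r1", "lvef"): "55",
    ("r1", "atrial_fibrillation"): "yes",
    ("r2", "lvef"): "40",
    ("r2", "carpal_tunnel"): "yes",
}
predicted = {
    ("r1", "lvef"): "55",
    ("r1", "atrial_fibrillation"): "yes",
    ("r2", "lvef"): "45",  # wrong value: counts as one false positive and one false negative
}

score = micro_f1(gold, predicted)
```

In this toy case two of three predictions are correct (precision 2/3) and two of four gold slots are recovered (recall 1/2), so the score is 4/7. A per-variable (macro) average would be an equally plausible choice and could give different numbers.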