FFA Sora, video generation as fundus fluorescein angiography simulator
Journal: arXiv
Published Date: Dec 23, 2024
Abstract
Fundus fluorescein angiography (FFA) is critical for diagnosing retinal
vascular diseases, but beginners often struggle with image interpretation. This
study develops FFA Sora, a text-to-video model that converts FFA reports into
dynamic videos via a Wavelet-Flow Variational Autoencoder (WF-VAE) and a
diffusion transformer (DiT). Trained on an anonymized dataset, FFA Sora
accurately simulates disease features from the input text, as confirmed by
objective metrics: Fréchet Video Distance (FVD) = 329.78, Learned Perceptual
Image Patch Similarity (LPIPS) = 0.48, and Visual-question-answering Score
(VQAScore) = 0.61. Specific evaluations showed acceptable alignment between the
generated videos and textual prompts, with a BERTScore of 0.35. Additionally, the
model demonstrated strong privacy-preserving performance in retrieval
evaluations, achieving an average Recall@K of 0.073. Human assessments
indicated satisfactory visual quality, with an average score of 1.570 (scale: 1
= best, 5 = worst). This model addresses privacy concerns associated with
sharing large-scale FFA data and enhances medical education.