Poetry in Pixels: Prompt Tuning for Poem Image Generation via Diffusion Models
Journal:
arXiv
Published Date:
Jan 10, 2025
Abstract
The task of text-to-image generation has encountered significant challenges
when applied to literary works, especially poetry. Poems are a distinct form of
literature, with meanings that frequently transcend the literal words.
To address this challenge, we propose PoemToPixel, a framework designed to
generate images that visually represent the inherent meanings of poems. Our
approach incorporates prompt tuning into the image generation framework so
that the resulting images closely align with the poetic
content. In addition, we propose the PoeKey algorithm, which extracts three key
elements from each poem (emotions, visual elements, and themes) and forms
instructions that are then passed to a diffusion model to generate the
corresponding images. Furthermore, to expand the diversity of the
poetry dataset across different genres and ages, we introduce MiniPo, a novel
multimodal dataset comprising 1001 children's poems and images. Leveraging this
dataset alongside PoemSum, we conducted both quantitative and qualitative
evaluations of image generation using our PoemToPixel framework. This paper
demonstrates the effectiveness of our approach and offers a fresh perspective
on generating images from literary sources.
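
As a rough illustration of the instruction-forming step described above, the sketch below joins already-extracted emotions, visual elements, and themes into a single text instruction. It is only a sketch under stated assumptions: the names PoemElements and build_instruction, the field names, and the wording of the template are hypothetical and are not taken from the paper, and the extraction itself and the call to a diffusion model are left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PoemElements:
    """The three kinds of key elements named in the abstract.

    The class and field names are hypothetical, chosen for illustration.
    """
    emotions: List[str]
    visual_elements: List[str]
    themes: List[str]


def build_instruction(elements: PoemElements) -> str:
    """Join the extracted elements into one text instruction.

    The template wording is an assumption; the abstract does not specify
    how the instructions are phrased.
    """
    parts = []
    if elements.visual_elements:
        parts.append("An illustration showing " + ", ".join(elements.visual_elements))
    if elements.themes:
        parts.append("conveying the themes of " + ", ".join(elements.themes))
    if elements.emotions:
        parts.append("with a mood of " + ", ".join(elements.emotions))
    # Empty categories are skipped, so an empty input yields an empty string.
    return "; ".join(parts)


if __name__ == "__main__":
    example = PoemElements(
        emotions=["joy"],
        visual_elements=["a kite", "a green hill"],
        themes=["childhood"],
    )
    # The resulting string would be handed to a text-to-image model as its prompt.
    print(build_instruction(example))
```

Keeping the three element types as separate fields means each can be inspected or adjusted on its own before the instruction is assembled.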