SAM-GAN: Self-Attention supporting Multi-stage Generative Adversarial Networks for text-to-image synthesis.

Journal: Neural Networks: the official journal of the International Neural Network Society

Published Date: Feb 10, 2021

Abstract

Synthesizing photo-realistic images from text descriptions is a challenging task in computer vision. Although generative adversarial networks have made significant breakthroughs in this task, they still struggle to generate high-quality, visually realistic images that are consistent with the semantics of the text. Existing text-to-image methods generally proceed in two steps: they first generate an initial image with a rough outline and color, and then gradually refine it into a high-resolution image. One drawback of these methods is that, if the initial image is of low quality, it is hard to generate a satisfactory high-resolution image from it. In this paper, we propose SAM-GAN, Self-Attention supporting Multi-stage Generative Adversarial Networks, for text-to-image synthesis. With the self-attention mechanism, the model can establish multi-level dependencies within the image and fuse the sentence- and word-level visual-semantic vectors to improve the quality of the generated image. Furthermore, a multi-stage perceptual loss is introduced to enhance the semantic similarity between the synthesized image and the real image, thus enhancing the visual-semantic consistency between text and images. To promote diversity in the generated images, a mode seeking regularization term is integrated into the model. Extensive experiments and ablation studies, conducted on the Caltech-UCSD Birds and Microsoft Common Objects in Context datasets, show that our model is superior to competitive models in text-to-image synthesis.
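
The abstract names a mode seeking regularization term but does not give its form. As a rough illustration only, the sketch below follows the formulation commonly used in the mode-seeking GAN literature: for two latent codes paired with the same text, the ratio of the distance between the generated images to the distance between the latent codes is maximized, which is implemented here by minimizing its inverse. The function names, the L1 distance, and the `eps` constant are assumptions made for this example and are not taken from the paper.

```python
from typing import Sequence


def mean_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute (L1) distance between two equal-length flat vectors."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("vectors must be non-empty and of equal length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def mode_seeking_loss(
    img1: Sequence[float],
    img2: Sequence[float],
    z1: Sequence[float],
    z2: Sequence[float],
    eps: float = 1e-5,  # assumed small constant for numerical stability
) -> float:
    """Loss to minimize: inverse of (image distance / latent distance).

    img1 and img2 stand for flattened images generated from latent codes
    z1 and z2 under the same text condition (hypothetical inputs).
    """
    latent_dist = mean_abs_diff(z1, z2)
    if latent_dist == 0.0:
        raise ValueError("latent codes must differ")
    ratio = mean_abs_diff(img1, img2) / latent_dist
    return 1.0 / (ratio + eps)


# Two latent codes that are one unit apart on average.
z_a, z_b = [0.0, 0.0], [1.0, 1.0]

# Nearly identical outputs (mode collapse) are penalized more heavily
# than clearly different outputs.
collapsed = mode_seeking_loss([0.0] * 4, [0.1] * 4, z_a, z_b)
diverse = mode_seeking_loss([0.0] * 4, [1.0] * 4, z_a, z_b)
print(collapsed > diverse)
```

In a training loop, a term of this kind would typically be added to the generator loss with a weighting coefficient; the weight and the distance metric are design choices that this sketch does not try to infer from the paper.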

Authors

Dunlu Peng

Shanghai Key Lab of Modern Optical System, School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, 20093, Shanghai, China. Electronic address: pengdl@usst.edu.cn.

Wuchen Yang

Shanghai Key Lab of Modern Optical System, School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, 20093, Shanghai, China. Electronic address: yangmiemieywc@163.com.

Cong Liu

Department of Bioengineering, University of Illinois at Chicago, 851 S Morgan St, Chicago, IL, 60607, USA.

Shuairui Lü

Shanghai Key Lab of Modern Optical System, School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, 20093, Shanghai, China. Electronic address: lvshuairui@163.com.

Keywords

Color; Image Processing, Computer-Assisted; Neural Networks, Computer; Semantics

External Resources

PubMed (33631607)
