Mining core information by evaluating semantic importance for unpaired image captioning.
Journal:
Neural Networks: the official journal of the International Neural Network Society
Published Date:
Jul 9, 2024
Abstract
Recently, exciting progress has been made in research on supervised image captioning. However, manually annotated image–caption pairs are difficult and expensive to obtain, so unpaired image captioning has emerged as a new challenge. This paper proposes Mining Core Information by Evaluating Semantic Importance (MCIESI), a method for image captioning that learns from unpaired images and sentences. The main difference from existing methods is that MCIESI focuses on mining the information in an image that should be described and expressing it in generated natural language that conforms to human thinking. To achieve this goal, we use scene graphs to represent the semantics of images and evaluate the importance of objects and their interaction relationships in order to mine the core information in each image, which the generated sentences are then encouraged to express through a semantic constraint. Combined with a grammatical constraint, imposed via adversarial training against a real sentence corpus, and a relative constraint based on a triplet loss, the generator is trained to produce semantically plausible and grammatically correct sentences. Extensive experiments verify the effectiveness of MCIESI.
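The relative constraint mentioned in the abstract uses a triplet loss. As an illustration only (the paper's exact formulation, embeddings, and margin are not given here), a standard triplet margin loss over embedding vectors can be sketched as:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss on embedding vectors.

    Pulls the anchor toward the positive sample and pushes it away from
    the negative sample by at least `margin`. Illustrative sketch only;
    the embeddings and margin here are hypothetical, not the paper's.
    """
    d_pos = np.linalg.norm(anchor - positive)  # anchor-positive distance
    d_neg = np.linalg.norm(anchor - negative)  # anchor-negative distance
    return max(d_pos - d_neg + margin, 0.0)

# Toy vectors: the anchor is already much closer to the positive
# than to the negative, so the default-margin loss is zero.
a = np.array([1.0, 0.0])
p = np.array([0.9, 0.1])
n = np.array([-1.0, 0.0])
```

With the default margin the toy triplet is already well separated and the loss is zero; enlarging the margin makes the same triplet incur a positive penalty, which is the pressure that shapes the embedding space during training.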