Evaluating the capacity of large language models to interpret emotions in images.
Journal:
PLOS ONE
Published Date:
Jan 1, 2025
Abstract
The integration of artificial intelligence, specifically large language models (LLMs), into emotional stimulus selection and validation offers a promising avenue for enhancing emotion comprehension frameworks. Traditional methods in this domain are often labor-intensive and susceptible to bias, highlighting the need for more efficient and scalable alternatives. This study evaluates the capability of GPT-4 to recognize and rate emotions from visual stimuli, focusing on two primary emotional dimensions: valence (positive, neutral, or negative) and arousal (calm, neutral, or stimulated). By comparing GPT-4's performance against human evaluations on the well-established Geneva Affective PicturE Database (GAPED), we assess the model's efficacy as a tool for automating the selection and validation of emotion elicitation stimuli. Our findings indicate that GPT-4 closely approximates human ratings under zero-shot conditions, although it has some difficulty classifying subtler emotional cues. These results underscore the potential of LLMs to streamline emotional stimulus selection and validation, reducing the time and labor associated with traditional methods.
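For concreteness, the sketch below illustrates the kind of zero-shot evaluation the abstract describes: prompting a GPT-4-class vision model to assign three-level valence and arousal labels to an image, then measuring agreement with human labels. The model name ("gpt-4o"), prompt wording, and output parsing are illustrative assumptions, not the authors' published protocol.

import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

VALENCE_LABELS = {"positive", "neutral", "negative"}
AROUSAL_LABELS = {"calm", "neutral", "stimulated"}

def rate_image(path: str) -> tuple[str, str]:
    """Return (valence, arousal) labels for one image, zero-shot.

    Assumption: the prompt and "gpt-4o" model are stand-ins; the paper
    does not publish its exact prompt or GPT-4 variant.
    """
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": (
                    "Rate the emotional content of this image. "
                    "Valence: positive, neutral, or negative. "
                    "Arousal: calm, neutral, or stimulated. "
                    "Answer as 'valence, arousal' only."
                )},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    # Parse the "valence, arousal" reply; raises if the format deviates,
    # which is acceptable for a sketch.
    valence, arousal = [
        w.strip().lower()
        for w in response.choices[0].message.content.split(",")
    ]
    assert valence in VALENCE_LABELS and arousal in AROUSAL_LABELS
    return valence, arousal

def agreement(model_labels: list[str], human_labels: list[str]) -> float:
    """Fraction of images on which the model matches the human label."""
    matches = sum(m == h for m, h in zip(model_labels, human_labels))
    return matches / len(human_labels)

A per-dimension agreement score like this is only one possible comparison; the study's actual analysis of model-human correspondence may use other statistics.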