Evaluating the capacity of large language models to interpret emotions in images.

Journal: PLOS ONE

Abstract

The integration of artificial intelligence, specifically large language models (LLMs), into emotional stimulus selection and validation offers a promising avenue for enhancing emotion comprehension frameworks. Traditional methods in this domain are often labor-intensive and susceptible to biases, highlighting the need for more efficient and scalable alternatives. This study evaluates the capability of GPT-4 to recognize and rate emotions from visual stimuli, focusing on two primary emotional dimensions: valence (positive, neutral, or negative) and arousal (calm, neutral, or stimulated). By comparing GPT-4's performance against human evaluations on the well-established Geneva Affective PicturE Database (GAPED), we assess the model's efficacy as a tool for automating the selection and validation of emotion elicitation stimuli. Our findings indicate that GPT-4 closely approximates human ratings under zero-shot conditions, although it has difficulty classifying subtler emotional cues. These results underscore the potential of LLMs to streamline the emotional stimulus selection and validation process, reducing the time and labor associated with traditional methods.
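As a concrete illustration of the zero-shot setup the abstract describes, the sketch below sends a single stimulus image to a vision-capable GPT-4 model and asks for categorical valence and arousal labels. This is a minimal sketch assuming the OpenAI Python SDK; the model name, prompt wording, rating format, and file name are illustrative assumptions, not the authors' exact protocol.

```python
# Minimal sketch: zero-shot valence/arousal rating of one image with a
# vision-capable GPT-4 model via the OpenAI Python SDK. Model choice,
# prompt, and output format are assumptions for illustration only.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = (
    "Rate the emotion elicited by this image on two dimensions. "
    "Valence: positive, neutral, or negative. "
    "Arousal: calm, neutral, or stimulated. "
    "Answer exactly as 'valence: <label>, arousal: <label>'."
)

def rate_image(path: str) -> str:
    """Send one stimulus image and return the model's raw rating string."""
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed vision-capable model
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": PROMPT},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
        temperature=0,  # favor stable, repeatable ratings
    )
    return response.choices[0].message.content

print(rate_image("stimulus.jpg"))  # hypothetical image file
```

In practice one would run such a loop over every image in the stimulus set and compare the parsed labels against the human GAPED ratings; the fixed answer format above simply makes the model's output easy to parse.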

Authors

  • Hend Alrasheed
    Department of Information Technology, King Saud University, Riyadh, Saudi Arabia.
  • Adwa Alghihab
    Media Lab, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America.
  • Alex Pentland
    Media Lab, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America.
  • Sharifa Alghowinem
    Media Lab, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America; Prince Sultan University, Riyadh, Saudi Arabia; Australian National University, Canberra, Australia.