RECALL-MM: A Multimodal Dataset of Consumer Product Recalls for Risk Analysis using Computational Methods and Large Language Models
Journal: arXiv
Published Date: Mar 29, 2025
Abstract
Product recalls provide valuable insights into potential risks and hazards
within the engineering design process, yet they remain an underutilized
resource. In this study, we curate data from the United States Consumer
Product Safety Commission (CPSC) recalls database to develop a multimodal
dataset, RECALL-MM, that informs data-driven risk assessment using historical
information, and augment it using generative methods. Patterns in the dataset
highlight specific areas where improved safety measures could have significant
impact. We extend our analysis by demonstrating interactive clustering maps
that embed all recalls into a shared latent space based on recall descriptions
and product names. Leveraging these data-driven tools, we explore three case
studies to demonstrate the dataset's utility in identifying product risks and
guiding safer design decisions. The first two case studies illustrate how
designers can visualize patterns across recalled products and situate new
product ideas within the broader recall landscape to proactively anticipate
hazards. In the third case study, we extend our approach by employing a large
language model (LLM) to predict potential hazards based solely on product
images. This demonstrates the model's ability to leverage visual context to
identify risk factors, revealing strong alignment with historical recall data
across many hazard categories. However, the analysis also highlights areas
where hazard prediction remains challenging, underscoring the importance of
risk awareness throughout the design process. Collectively, this work aims to
bridge the gap between historical recall data and future product safety,
presenting a scalable, data-driven approach to safer engineering design.
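
As a rough illustration of what situating a new product idea within the recall landscape might look like, the sketch below represents recall text as vectors and ranks historical recalls by similarity to a new idea. It is a minimal stand-in, not the method used in this work: the abstract does not specify the embedding model, so plain TF-IDF vectors with cosine similarity are substituted here. The records in `RECALLS`, the function names, and the query string are all hypothetical and are not drawn from RECALL-MM or the CPSC database.

```python
import math
from collections import Counter

# Hypothetical toy records (product name, recall description).
# These are made up for illustration and are NOT from RECALL-MM or CPSC.
RECALLS = [
    ("space heater", "heater can overheat posing fire hazard"),
    ("crib", "crib slats can detach posing entrapment hazard"),
    ("phone charger", "charger can overheat posing fire and burn hazard"),
    ("toy car", "small wheels can detach posing choking hazard"),
]


def tokenize(text):
    """Lowercase whitespace tokenizer; deliberately simplistic."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict) per document, plus the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF so that terms present in every document keep a nonzero weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [
        {t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized
    ]
    return vectors, idf


def embed(text, idf):
    """Embed new text with an existing IDF table; unseen terms are ignored."""
    counts = Counter(tokenize(text))
    return {t: tf * idf[t] for t, tf in counts.items() if t in idf}


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_recalls(query, recalls, k=2):
    """Rank historical recalls by similarity to a new product description."""
    docs = [f"{name} {desc}" for name, desc in recalls]
    vectors, idf = tfidf_vectors(docs)
    q = embed(query, idf)
    scored = sorted(
        ((cosine(q, v), name) for v, (name, _) in zip(vectors, recalls)),
        reverse=True,
    )
    return [(name, round(score, 3)) for score, name in scored[:k]]


if __name__ == "__main__":
    for name, score in nearest_recalls("heated blanket that can overheat", RECALLS):
        print(name, score)
```

In this toy setup, a designer's free-text idea is matched against recall text, and the nearest historical recalls suggest hazards worth reviewing. A learned text embedding could replace `tfidf_vectors` and `embed` without changing the ranking logic.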