SFOOD: A Multimodal Benchmark for Comprehensive Food Attribute Analysis Beyond RGB with Spectral Insights
Journal: arXiv
Published Date: Jul 6, 2025
Abstract
With the rapid development of computer vision and large language models (LLMs),
intelligent analysis is now widespread, especially for people and cars. However,
despite the many food attributes of interest (such as origin, quantity, weight,
quality, and sweetness), existing research still focuses mainly on category
recognition. The reason is the lack of a large and comprehensive benchmark for
food. Moreover, many food
attributes (such as sweetness, weight, and fine-grained categories) are
challenging to perceive accurately through RGB cameras alone. To fill this gap
and promote the development of intelligent food analysis, we built the first
large-scale spectral food (SFOOD) benchmark suite. We invested substantial
manpower and equipment to organize existing food datasets and collect
hyperspectral images of hundreds of foods, and we used instruments to
experimentally measure food attributes such as sweetness and weight. The
resulting benchmark consists of 3,266 food categories and 2,351k data points
for 17 main food categories. Extensive evaluations find that: (i) large-scale
models are still poor at digitizing food; compared with people and cars, food
has gradually become one of the most difficult objects to study; and (ii)
spectral data are crucial for analyzing food properties (such as sweetness).
Our benchmark will be open-sourced and continuously updated for different food
analysis tasks.