Improving fine-grained food classification using deep residual learning and selective state space models.
Journal:
PloS one
PMID:
40323945
Abstract
BACKGROUND: Food classification is the foundation for developing food vision tasks and plays a key role in the burgeoning field of computational nutrition. Because food images are complex and require fine-grained classification, Convolutional Neural Network (CNN) backbones need additional structural design, whereas Vision Transformers (ViTs), which contain self-attention modules, incur increased computational complexity.