An Explainable CNN and Vision Transformer-Based Approach for Real-Time Food Recognition.

Journal: Nutrients
PMID:

Abstract

BACKGROUND: Food image recognition, a crucial step in computational gastronomy, has diverse applications across nutrition platforms. Convolutional neural networks (CNNs) are widely used for this task because they capture hierarchical features. However, CNNs struggle to model long-range dependencies and to extract global features. These capabilities are vital for distinguishing visually similar foods and for images in which the context of the whole dish is crucial, and this limitation motivates the use of transformer architectures.
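The contrast between local convolution and global attention can be illustrated with a toy sketch. This is not the authors' model. It assumes a 1-D sequence of scalar "patch tokens", a hypothetical smoothing kernel `[0.25, 0.5, 0.25]`, and hypothetical attention weights `wq`, `wk`, `wv`. Perturbing the last token leaves the first output of three stacked convolutions unchanged, because their receptive field covers only 7 positions. A single self-attention layer, by contrast, lets every output depend on every input.

```python
import math


def conv1d_same(x, kernel):
    """1-D cross-correlation with zero padding; output length equals input length."""
    pad = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + j - pad
            if 0 <= idx < len(x):  # positions outside the sequence act as zeros
                acc += w * x[idx]
        out.append(acc)
    return out


def stacked_conv(x, kernel, layers):
    """Apply the same convolution `layers` times (no nonlinearity, for clarity)."""
    for _ in range(layers):
        x = conv1d_same(x, kernel)
    return x


def receptive_field(layers, kernel_size):
    """Receptive field of stacked stride-1 convolutions: 1 + layers * (k - 1)."""
    return 1 + layers * (kernel_size - 1)


def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over scalar tokens (d = 1)."""
    q = [wq * v for v in x]
    k = [wk * v for v in x]
    vals = [wv * v for v in x]
    out = []
    for qi in q:
        scores = [qi * kj / math.sqrt(1) for kj in k]  # every token scores every token
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        out.append(sum(e / z * vj for e, vj in zip(exps, vals)))
    return out


if __name__ == "__main__":
    x = [math.sin(i + 1) for i in range(16)]  # hypothetical token features
    y = list(x)
    y[-1] += 1.0  # perturb only the last (most distant) token

    kernel = [0.25, 0.5, 0.25]
    conv_changed = stacked_conv(x, kernel, 3)[0] != stacked_conv(y, kernel, 3)[0]
    attn_changed = self_attention(x, 1.0, 1.0, 1.0)[0] != self_attention(y, 1.0, 1.0, 1.0)[0]

    print("receptive field of 3 conv layers:", receptive_field(3, 3))
    print("conv output[0] changed:", conv_changed)
    print("attention output[0] changed:", attn_changed)
```

A CNN must stack many layers (or downsample) before distant regions of a dish can interact. A transformer mixes information across the whole image within one attention layer, at a cost that grows quadratically with the number of tokens.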

Authors

  • Kintoh Allen Nfor
    Department of Computer Engineering, Inje University, Gimhae 50834, Republic of Korea.
  • Tagne Poupi Theodore Armand
    Institute of Digital Anti-Aging Healthcare, Inje University, Gimhae 50834, Republic of Korea.
  • Kenesbaeva Periyzat Ismaylovna
    Department of Computer Engineering, Inje University, Gimhae 50834, Republic of Korea.
  • Moon-Il Joo
    Institute of Digital Anti-Aging Healthcare, Inje University, Gimhae, Republic of Korea.
  • Hee-Cheol Kim
    Department of Computer Engineering/Institute of Digital Anti-Aging Healthcare, Inje University, Gimhae 50834, Korea. heeki@inje.ac.kr.