Improving fine-grained food classification using deep residual learning and selective state space models.

Journal: PLOS ONE
PMID:

Abstract

BACKGROUND: Food classification is the foundation for developing food vision tasks and plays a key role in the burgeoning field of computational nutrition. Because food images are complex and require fine-grained classification, Convolutional Neural Network (CNN) backbones need additional structural design, whereas Vision Transformers (ViTs), which contain the self-attention module, have increased computational complexity.

Authors

  • Chi-Sheng Chen
    Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, No.1, Sec.4, Roosevelt Road, Taipei, 10617, Taiwan.
  • Guan-Ying Chen
    Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei, Taiwan.
  • Dong Zhou
    EVision Technology (Beijing) Co. LTD, 100000, China.
  • Di Jiang
    College of Engineering, China Agricultural University, Beijing 100083, China.
  • Daishi Chen
    Department of Otolaryngology, Shenzhen People's Hospital, Shenzhen, Guangdong, China.
  • Shao-Hsuan Chang
    School of Engineering, University of Liverpool, Liverpool, United Kingdom.