Vision-based food nutrition estimation via RGB-D fusion network.

Journal: Food Chemistry
PMID:

Abstract

With the development of deep learning, vision-based food nutrition estimation is gradually entering public view owing to its accuracy and efficiency. In this paper, we designed an RGB-D fusion network that integrates multimodal feature fusion (MMFF) and multi-scale fusion for vision-based nutrition assessment. MMFF performed effective feature fusion via a balanced feature pyramid and a convolutional block attention module; multi-scale fusion combined features of different resolutions through a feature pyramid network. Both enhanced the feature representation and improved the performance of the model. Compared with state-of-the-art methods, the mean percentage mean absolute error (PMAE) of our method reached 18.5%. The PMAE for calories and mass reached 15.0% and 10.8% via the RGB-D fusion network, improvements of 3.8% and 8.1%, respectively. Furthermore, this study visualized the estimation results of four nutrients and verified the validity of the method. This research contributes to the development of automated food nutrient analysis (code and models can be found at http://123.57.42.89/codes/RGB-DNet/nutrition.html).
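The multimodal fusion described above can be illustrated schematically. The sketch below is NOT the authors' implementation: it is a minimal NumPy toy that assumes a CBAM-style attention order (channel gating followed by spatial gating) applied to summed RGB and depth feature maps, and it omits the learned MLP and convolution layers of a real CBAM as well as the balanced feature pyramid.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat):
    # feat: (C, H, W). Squeeze the spatial dimensions with average and max
    # pooling, then gate each channel. (A real CBAM passes the pooled
    # vectors through a shared MLP before the sigmoid; omitted here.)
    avg = feat.mean(axis=(1, 2))
    mx = feat.max(axis=(1, 2))
    gate = sigmoid(avg + mx)                 # shape (C,)
    return feat * gate[:, None, None]

def spatial_attention(feat):
    # Pool across channels, then gate each spatial location. (A real CBAM
    # applies a 7x7 convolution to the pooled maps; omitted here.)
    avg = feat.mean(axis=0)
    mx = feat.max(axis=0)
    gate = sigmoid(avg + mx)                 # shape (H, W)
    return feat * gate[None, :, :]

def rgbd_fuse(rgb_feat, depth_feat):
    # Toy multimodal fusion: merge the two modalities by summation, then
    # refine the fused map with channel and spatial attention in sequence.
    fused = rgb_feat + depth_feat
    return spatial_attention(channel_attention(fused))

rgb = np.random.rand(8, 16, 16)    # hypothetical RGB feature map (C, H, W)
depth = np.random.rand(8, 16, 16)  # hypothetical depth feature map
out = rgbd_fuse(rgb, depth)
print(out.shape)                   # fused map keeps the input shape
```

The attention stages only reweight the fused features, so the output retains the input resolution; in the paper's network such fused maps would then feed the multi-scale (feature pyramid) stage.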

Authors

  • Wenjing Shao
    School of Information Science and Engineering, Shandong Normal University, Shandong 250358, China.
  • Weiqing Min
  • Sujuan Hou
    School of Information Science and Engineering, Shandong Normal University, Shandong 250358, China. Electronic address: sujuanhou@sdnu.edu.cn.
  • Mengjiang Luo
    The Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China.
  • Tianhao Li
    Tianjin Union Medical Center, Tianjin Medical University, Tianjin, China.
  • Yuanjie Zheng
  • Shuqiang Jiang