Fusion of convolutional neural network with XGBoost feature extraction for predicting multi-constituents in corn using near infrared spectroscopy.

Journal: Food chemistry
PMID:

Abstract

Near-infrared (NIR) spectroscopy has been widely used to predict multiple constituents of corn in agriculture. However, directly extracting constituent information from NIR spectra is challenging because the absorption bands are broad, overlapping, and non-specific. To address these problems and extract implicit features from raw NIR spectra to improve the performance of quantitative models, this paper proposes a one-dimensional shallow convolutional neural network (CNN) model based on an eXtreme Gradient Boosting (XGBoost) feature extraction method. The leaf node feature information in XGBoost was encoded and reconstructed to obtain the implicit features of the raw NIR spectral data. A two-parametric Swish (TSwish or TS) activation function was proposed to improve the performance of the CNN, and elastic net (EN) regularization was applied to avoid overfitting of the CNN model. The performance of the developed XGBoost-CNN-TS-EN model was evaluated using two public NIR spectroscopy datasets, one of corn and one of soil. The determination coefficients (R²) obtained on the test set for moisture, oil, protein, and starch of corn were 0.993, 0.991, 0.998, and 0.992, respectively, and that for soil organic matter was 0.992. The XGBoost-CNN-TS-EN model exhibits superior stability, good prediction accuracy, and generalization ability, demonstrating its great potential for quantitative analysis of multiple constituents in spectroscopic applications.
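
The abstract names three quantities without giving their formulas: a two-parametric Swish activation, an elastic net penalty, and the coefficient of determination. The sketch below illustrates them in plain Python. The abstract does not define the TSwish form, so alpha * x * sigmoid(beta * x) is only an assumed generalization of the standard Swish, x * sigmoid(beta * x). The function names, parameter names (alpha, beta, lam, l1_ratio), and default values are likewise hypothetical and are not taken from the paper.

```python
import math


def _sigmoid(z):
    # Numerically stable logistic function (avoids overflow for large |z|).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def two_param_swish(x, alpha=1.0, beta=1.0):
    # ASSUMED form: alpha * x * sigmoid(beta * x). The abstract does not
    # give the actual TSwish definition; alpha=1 recovers standard Swish.
    return alpha * x * _sigmoid(beta * x)


def elastic_net_penalty(weights, lam=1e-3, l1_ratio=0.5):
    # Standard elastic net term: a weighted mix of the L1 norm and half
    # the squared L2 norm. lam and l1_ratio are illustrative values only.
    l1 = sum(abs(w) for w in weights)
    l2 = sum(w * w for w in weights)
    return lam * (l1_ratio * l1 + 0.5 * (1.0 - l1_ratio) * l2)


def r_squared(y_true, y_pred):
    # Coefficient of determination: 1 - SS_res / SS_tot.
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

In a training loop, a penalty of this kind would typically be added to the prediction loss, and r_squared would be computed on held-out test predictions for each constituent.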

Authors

  • Xin Zou
    Key Laboratory of Systems Biomedicine, Shanghai Center for Systems Biomedicine, Shanghai Jiaotong University, Shanghai, 200240, China.
  • Qiaoyun Wang
    College of Information Science and Engineering, Northeastern University, Shenyang, Liaoning Province 110819, China; Hebei Key Laboratory of Micro-Nano Precision Optical Sensing and Measurement Technology, Qinhuangdao 066004, China. Electronic address: wangqiaoyun@neuq.edu.cn.
  • Yinji Chen
    College of Information Science and Engineering, Northeastern University, Shenyang, Liaoning Province 110819, China.
  • Jilong Wang
    Peng Cheng Laboratory, Shenzhen, 518066, China.
  • Shunyuan Xu
    College of Information Science and Engineering, Northeastern University, Shenyang, Liaoning Province 110819, China.
  • Ziheng Zhu
    College of Information Science and Engineering, Northeastern University, Shenyang, Liaoning Province 110819, China.
  • Chongyue Yan
    College of Information Science and Engineering, Northeastern University, Shenyang, Liaoning Province 110819, China.
  • Peng Shan
    Department of Control Engineering, Northeastern University, Qinhuangdao, Hebei, 066001, PR China.
  • Shuyu Wang
    Department of Control Engineering, Northeastern University, Qinhuangdao, Hebei, 066001, PR China. Electronic address: wangshuyu@neuq.edu.cn.
  • Yongqing Fu
    The State Key Laboratory of Fluid Power and Mechatronic Systems, Zhejiang University, Hangzhou 310027, China.