Multiple remotely sensed datasets and machine learning models to predict chlorophyll-a concentration in the Nakdong River, South Korea.

Journal: Environmental science and pollution research international
PMID:

Abstract

The Nakdong River is a crucial water resource in South Korea, supplying water for purposes such as drinking water, irrigation, and recreation. However, the river is vulnerable to algal blooms because pollutants flow in from multiple point and non-point sources. Monitoring chlorophyll-a (Chl-a) concentrations, a proxy for algal biomass, is essential for assessing the trophic status of the river and managing its ecological health. This study aimed to improve the accuracy and reliability of Chl-a estimation in the Nakdong River by using machine learning models (MLMs) together with multiple remotely sensed datasets. This study compared four MLMs, namely multi-layer perceptron (MLP), support vector machine (SVM), random forest (RF), and eXtreme Gradient Boosting (XGB), across three input datasets: (1) two remotely sensed datasets combined (Sentinel-2 and Landsat-8), (2) Sentinel-2 alone, and (3) Landsat-8 alone. The results showed that the MLP model with multiple remotely sensed datasets outperformed the other MLMs, with R² values 0.43–0.86 higher and RMSE values 0.36–5.88 lower. The MLP model performed best across the range of Chl-a concentrations and predicted peaks above 20 mg/m³ relatively well compared to the other models. This was likely due to the capacity of MLP to handle imbalanced datasets. The predictive map of the spatial distribution of Chl-a generated by MLP captured the areas of high and low Chl-a concentration well. This study highlighted the impact of imbalanced Chl-a observations (dominated by low concentrations) on the performance of MLMs. The data imbalance likely left the MLMs poorly trained for high Chl-a values, resulting in low prediction accuracy for those values. In conclusion, this study demonstrated the value of multiple remotely sensed datasets in enhancing the accuracy and reliability of Chl-a estimation, particularly when using the MLP model. These findings provide valuable insights into the effective use of MLMs for Chl-a monitoring.
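The abstract compares models using R² and RMSE and attributes weak predictions of high Chl-a values to imbalanced observations. The Python sketch below is not the authors' code. It shows one way to compute both metrics and to split RMSE at the 20 mg/m³ peak level cited in the abstract, so that errors on rare high values are not hidden by the many low observations. It assumes R² is the coefficient of determination (1 - SS_res/SS_tot); some studies instead report the squared Pearson correlation. The function names, the threshold parameter, and the toy values are hypothetical.

```python
import math
from typing import Dict, Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root-mean-square error between observed and predicted Chl-a (mg/m³)."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    sq_err = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(sq_err / len(observed))


def r2(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need at least two paired values")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot


def rmse_by_range(
    observed: Sequence[float],
    predicted: Sequence[float],
    threshold: float = 20.0,  # hypothetical split at the 20 mg/m³ peak level
) -> Dict[str, float]:
    """RMSE computed separately for observations below and at/above the threshold."""
    groups = {
        "low": [(o, p) for o, p in zip(observed, predicted) if o < threshold],
        "high": [(o, p) for o, p in zip(observed, predicted) if o >= threshold],
    }
    result = {}
    for name, pairs in groups.items():
        if pairs:  # skip a range that has no observations
            obs, pred = zip(*pairs)
            result[name] = rmse(obs, pred)
    return result


if __name__ == "__main__":
    # Made-up toy values for illustration only; not data from the study.
    obs = [2.0, 3.0, 4.0, 25.0]
    pred = [2.5, 3.0, 3.5, 15.0]
    print(rmse(obs, pred), r2(obs, pred), rmse_by_range(obs, pred))
```

With these toy values the overall R² is about 0.72, yet the split shows an RMSE of about 0.41 mg/m³ below the threshold and 10 mg/m³ at the single high value. This is the kind of masking a dataset dominated by low concentrations can produce.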

Authors

  • Byeongwon Lee
    Department of Environmental Science & Ecological Engineering, College of Life Sciences & Biotechnology, Korea University, 145, Anam-Ro, Seongbuk-Gu, Seoul, 02841, South Korea.
  • Jong Kwon Im
    National Institute of Environmental Research, 42, Hwangyeong-Ro, Seo-Gu, Incheon, 22689, South Korea.
  • Ji Woo Han
    Han River Environment Research Center, National Institute of Environmental Research, 42, Dumulmeori-Gil 68Beon-Gil, Yangseo-Myeon, Yangpyeong-Gun, 12585, South Korea.
  • Taegu Kang
    Han River Environment Research Center, National Institute of Environmental Research, 42, Dumulmeori-Gil 68Beon-Gil, Yangseo-Myeon, Yangpyeong-Gun, 12585, South Korea.
  • Wonkook Kim
    Department of Civil and Environmental Engineering, Pusan National University, 2, Busandaehak-Ro 63Beon-Gil, Geumjeong-Gu, Busan, 46241, South Korea.
  • Moonil Kim
    Division of ICT-Integrated Environment, Pyeongtaek University, 3825, Seodong-Daero, Pyeongtaek-Si, 17869, Gyeonggi-Do, South Korea.
  • Sangchul Lee
    Department of Urology, Seoul National University Bundang Hospital, Seongnam, Korea.