Unsupervised learning for lake underwater vegetation classification: Constructing high-precision, large-scale aquatic ecological datasets.

Journal: The Science of the total environment
PMID:

Abstract

Monitoring underwater vegetation is vital for evaluating lake ecosystem health. Automated data collection and analysis play key roles in achieving large-scale, high-precision, and high-frequency monitoring. While technologies such as unmanned vessels have made data collection more efficient, challenges persist in the analysis process, particularly in addressing the varied needs of different lake environments. Supervised AI methods can automatically identify underwater vegetation but rely heavily on labeled datasets. In practice, models trained on public datasets often generalize poorly because vegetation types, collection environments, and equipment differ, producing discrepancies between training and testing datasets. Moreover, traditional dataset construction methods that rely on manual annotation are time-consuming and costly, which limits their scalability and application. This study addresses these challenges by proposing an unsupervised method for automatically classifying underwater vegetation data, with the goal of reducing manual annotation effort and constructing unbiased datasets at lower cost and with greater efficiency. Compared with existing unsupervised, self-supervised, and unsupervised domain adaptation methods, this method introduces two key innovations: 1) a two-step dimensionality reduction method that combines a pre-trained model with manifold learning to extract key features and 2) a multi-algorithm voting mechanism to increase classification confidence. These features enable high-accuracy classification without prior data annotation. Experiments show 97.32% accuracy on a public dataset and 92.43% and 96.15% accuracy on private datasets from Erhai Lake and Wuhan East Lake, respectively, surpassing supervised methods and matching manual classification. Additionally, the method drastically reduces the annotation effort, requiring only approximately 20 labeled images to classify thousands of points. By integrating unmanned vessel technology, this approach provides an efficient, cost-effective solution for large-scale, high-frequency underwater vegetation monitoring across diverse lakes.
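
The abstract does not say which clustering algorithms are combined or what voting rule is used, so the following is only a minimal sketch of how a multi-algorithm vote could work, not the authors' implementation. It assumes each algorithm has already clustered the reduced features, aligns the arbitrary cluster ids by exhaustive permutation search, and keeps a label only where enough algorithms agree (unanimity by default). The function names `align_labels` and `vote`, the `min_agreement` parameter, and the toy label lists are all hypothetical; the feature extraction and manifold learning steps are omitted.

```python
from collections import Counter
from itertools import permutations


def align_labels(reference, labels, n_clusters):
    """Relabel `labels` so its cluster ids best match `reference`.

    Assumption: n_clusters is small, so trying every permutation of
    cluster ids is affordable. The permutation with the most agreeing
    samples wins.
    """
    best_mapping, best_score = None, -1
    for perm in permutations(range(n_clusters)):
        score = sum(1 for r, l in zip(reference, labels) if perm[l] == r)
        if score > best_score:
            best_mapping, best_score = perm, score
    return [best_mapping[l] for l in labels]


def vote(label_sets, n_clusters, min_agreement=None):
    """Combine several clusterings into one label per sample.

    A sample gets None when fewer than `min_agreement` algorithms agree
    on its label (default: all algorithms must agree).
    """
    if min_agreement is None:
        min_agreement = len(label_sets)
    reference = label_sets[0]
    aligned = [list(reference)] + [
        align_labels(reference, labels, n_clusters) for labels in label_sets[1:]
    ]
    result = []
    for votes in zip(*aligned):
        label, count = Counter(votes).most_common(1)[0]
        result.append(label if count >= min_agreement else None)
    return result


# Hypothetical outputs of three clustering algorithms on six samples.
# Cluster ids are arbitrary, so the same grouping may carry different ids.
labels_a = [0, 0, 1, 1, 2, 2]
labels_b = [1, 1, 0, 0, 2, 2]  # same grouping as labels_a, ids swapped
labels_c = [2, 2, 0, 0, 1, 0]  # disagrees with the others on the last sample

consensus = vote([labels_a, labels_b, labels_c], n_clusters=3)
print(consensus)  # [0, 0, 1, 1, 2, None]
```

In this sketch, samples marked `None` are the low-confidence ones that a unanimous vote rejects; under this assumed design, only the agreed-upon clusters would then need a handful of manual labels to be named.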

Authors

  • Lei Liu
    Department of Science and Technology, Beijing Shijitan Hospital, Capital Medical University, Beijing, China.
  • Zhengsen Bao
    School of Engineering, Dali University, Yunnan 671003, China; Air-Space-Ground Integrated Intelligence and Big Data Application Engineering Research Center of Yunnan Provincial Department of Education, Yunnan 671003, China. Electronic address: baozhengsen@stu.dali.edu.cn.
  • Ying Liang
    Department of Therapeutic Radiology, Yale University, New Haven, CT, U.S.A.
  • Huanxi Deng
    School of Engineering, Dali University, Yunnan 671003, China; Air-Space-Ground Integrated Intelligence and Big Data Application Engineering Research Center of Yunnan Provincial Department of Education, Yunnan 671003, China. Electronic address: denghuanxi@stu.dali.edu.cn.
  • Xiaolin Zhang
    Department of Chemical and Biological Engineering, University of Wisconsin-Madison, 1415 Engineering Dr., Madison, WI, 53706, USA.
  • Te Cao
    State Key Laboratory of Freshwater Ecology and Biotechnology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China. Electronic address: caote@ihb.ac.cn.
  • Chichun Zhou
    School of Engineering, Dali University, Yunnan 671003, China; Air-Space-Ground Integrated Intelligence and Big Data Application Engineering Research Center of Yunnan Provincial Department of Education, Yunnan 671003, China. Electronic address: zhouchichun@dali.edu.cn.
  • Zhenyu Zhang
    Laboratory of Industrial Biotechnology of Department of Education, Jiangnan University, Wuxi 214122, Jiangsu, China.