Comparing UNet configurations for anthropogenic geomorphic feature extraction from land surface parameters.

Journal: PLOS ONE
Published Date:

Abstract

The application of deep learning for semantic segmentation has revolutionized image analysis, particularly in the geospatial and medical fields. UNet, an encoder-decoder architecture, has been suggested to be particularly effective. However, limitations such as small sample sizes and class imbalance in anthropogenic geomorphic feature extraction tasks have necessitated the exploration of advanced modifications to improve model performance. This study investigates a variety of architectural modifications to the base UNet, including replacing the rectified linear unit (ReLU) activation function with leaky ReLU or swish; incorporating residual connections within the encoder blocks, decoder blocks, and bottleneck; inserting squeeze-and-excitation modules into the encoder or attention gate modules along the skip connections; replacing the default bottleneck layer with one that incorporates dilated convolution; and using a MobileNetV2 architecture as an encoder backbone. Unique geomorphic datasets derived from high spatial resolution lidar data were used to evaluate the performance of these modified UNet architectures on the tasks of mapping agricultural terraces, mine benches, and valley fill faces. The results were further analyzed across varying training sample sizes (50, 100, 250, 500, and the full training set). Our results suggest that the incorporation of advanced modules can enhance segmentation performance, particularly in scenarios involving limited training data or complex geomorphic landscapes. However, differences were minimal when larger training sets were used (e.g., above 500 image chips), and the base UNet architecture was generally adequate. This research contributes insights into the optimization of UNet-based models for anthropogenic geomorphic feature extraction and provides a foundation for future work aimed at improving the accuracy and efficiency of deep learning approaches in geospatial applications. We argue that one of the positive attributes of UNet is that it can be treated as a general framework that can easily be modified.
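
As a rough illustration of the first modification listed in the abstract, the three activation functions can be written as scalar functions. This is a minimal sketch and not the authors' implementation: the function names, the leaky ReLU slope of 0.01, and the swish parameter beta = 1 are assumptions chosen for illustration, since the abstract does not state them.

```python
import math


def relu(x: float) -> float:
    """Rectified linear unit: passes positive inputs, zeroes out negatives."""
    return max(0.0, x)


def leaky_relu(x: float, negative_slope: float = 0.01) -> float:
    """Leaky ReLU: like ReLU, but negatives are scaled instead of zeroed.

    The slope of 0.01 is an assumed, commonly used default.
    """
    return x if x >= 0.0 else negative_slope * x


def sigmoid(z: float) -> float:
    """Logistic sigmoid, written to avoid overflow for large |z|."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def swish(x: float, beta: float = 1.0) -> float:
    """Swish: x * sigmoid(beta * x). beta = 1.0 is an assumed default."""
    return x * sigmoid(beta * x)


if __name__ == "__main__":
    # Compare the three activations on a few sample inputs.
    for value in (-2.0, -0.5, 0.0, 0.5, 2.0):
        print(value, relu(value), leaky_relu(value), swish(value))
```

The practical difference is in the negative range: ReLU returns exactly zero there, whereas leaky ReLU and swish return small nonzero values, so a unit with a negative pre-activation still passes some signal.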

Authors

  • Sarah Farhadpour
    Department of Geology and Geography, West Virginia University, Morgantown, WV, United States of America.
  • Aaron E Maxwell
    Department of Geology and Geography, West Virginia University, Morgantown, WV, United States of America.