Counting Fish with Temporal Representations of Sonar Video
Journal: arXiv
Published Date: Feb 7, 2025
Abstract
Accurate estimates of salmon escapement - the number of fish migrating
upstream to spawn - are key data for conservation and fishery management.
Existing methods for salmon counting using high-resolution imaging sonar
hardware are non-invasive and compatible with computer vision processing. Prior
work in this area has used methods based on object detection and tracking for
automated salmon counting. However, these techniques remain inaccessible to
many sonar deployment sites due to limited compute and connectivity in the
field. We propose an alternative lightweight computer vision method for fish
counting based on analyzing echograms - temporal representations that compress
several hundred frames of imaging sonar video into a single image. We predict
upstream and downstream counts within 200-frame time windows directly from
echograms using a ResNet-18 model, and propose a set of domain-specific image
augmentations and a weakly-supervised training protocol to further improve
results. We achieve a count error of 23% on representative data from the Kenai
River in Alaska, demonstrating the feasibility of our approach.
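
To make the echogram idea concrete, here is a minimal sketch of how a sonar clip might be compressed into a single image and cut into 200-frame windows. The abstract does not say how the echograms are constructed, so the max-over-beam-axis reduction, the (frames, range, beam) array layout, and the function names `make_echogram` and `split_windows` are all assumptions for illustration, not the authors' pipeline.

```python
import numpy as np


def make_echogram(frames: np.ndarray) -> np.ndarray:
    """Collapse a sonar clip into one image with time on the horizontal axis.

    Assumed layout: frames has shape (T, R, B) = (time, range bins, beams).
    Assumed reduction: take the maximum intensity across beams, so each
    frame becomes one column of R values. Other reductions are possible.
    """
    if frames.ndim != 3:
        raise ValueError("expected an array of shape (T, R, B)")
    # (T, R, B) -> (T, R) via max over beams, then transpose to (R, T).
    return frames.max(axis=2).T


def split_windows(echogram: np.ndarray, window: int = 200) -> list[np.ndarray]:
    """Cut an (R, T) echogram into non-overlapping windows of `window` frames.

    Trailing frames that do not fill a complete window are dropped.
    """
    n_windows = echogram.shape[1] // window
    return [echogram[:, i * window:(i + 1) * window] for i in range(n_windows)]


# Synthetic example: 400 frames, 8 range bins, 4 beams, one bright return.
frames = np.zeros((400, 8, 4), dtype=np.float32)
frames[250, 3, 2] = 1.0

echogram = make_echogram(frames)
windows = split_windows(echogram, window=200)
```

Each resulting window is a small 2D image that could be passed to an image model such as a ResNet-18 with two regression outputs (upstream and downstream counts), which is the kind of input-output setup the abstract describes.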