GuidedDE: Targeted black-box adversarial attacks via confidence-guided mutation on automatic speech recognition systems.
Journal:
Neural networks : the official journal of the International Neural Network Society
Published Date:
Nov 19, 2025
Abstract
The vulnerability of Deep Neural Networks (DNNs) to black-box adversarial attacks remains a pressing issue in modern Automatic Speech Recognition (ASR) systems. Although existing query-based methods, which combine genetic algorithms (GAs) with gradient estimation, have made progress, they are hindered by two key limitations: high query consumption (ranging from 137k to 478k queries per attack) and insufficient mechanistic analysis of targeted attacks. To address these challenges, we propose Guided Differential Evolution (Guided DE), a Differential Evolution (DE) framework enhanced with confidence-driven gradient signals. This approach integrates population-based search with CTC loss feedback, significantly improving both attack efficiency and success rates. Evaluations across three ASR architectures show that Guided DE achieves 90-95% success rates for two-word attacks using only 40.2k-48.9k queries, a 67-73% improvement in efficiency over GA baselines, alongside a 2.1-3× increase in success probability. Additionally, the framework uncovers critical vulnerability patterns in ASR systems: phrases containing energy-rich phonemes (e.g., plosives and fricatives) are 31% more susceptible to attacks than vowel-heavy sequences. Compared to standard DE, Guided DE improves attack success rates by 62.4% while reducing query costs by 19.3%. This work advances adversarial attack methodology by introducing a gradient-evolution fusion framework that enables efficient black-box attacks on ASR systems. At the same time, it systematically reveals phonetic vulnerability patterns rooted in spectral energy dynamics, providing dual insights for both offensive and defensive strategy development. These findings not only deepen our understanding of ASR vulnerabilities but also contribute to the creation of more robust and secure speech recognition systems.
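The abstract does not give the update rule, so the following is only a minimal sketch of the general idea of combining population-based DE search with a loss-derived guidance signal. Everything in it is an assumption made for illustration: the function name `guided_de`, the parameters `F`, `CR`, and `eta`, the toy quadratic `toy_loss` standing in for the CTC loss returned by a black-box ASR model, and the guidance itself (a difference vector ordered by loss plus a pull toward the current best candidate). It should not be read as the authors' Guided DE.

```python
import random


def guided_de(loss, dim, pop_size=20, iters=200, F=0.5, CR=0.9, eta=0.3, seed=0):
    """Toy differential evolution with a loss-guided mutation (illustrative only).

    `loss` is treated as a black box: it is only ever called, never differentiated.
    Returns (best_candidate, best_loss, number_of_loss_queries).
    """
    rng = random.Random(seed)
    pop = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(pop_size)]
    fit = [loss(x) for x in pop]
    queries = pop_size

    for _ in range(iters):
        for i in range(pop_size):
            others = [j for j in range(pop_size) if j != i]
            a, b, c = rng.sample(others, 3)
            # Assumed guidance: order the difference pair by cached loss so the
            # difference vector points from the worse member to the better one.
            lo, hi = (b, c) if fit[b] <= fit[c] else (c, b)
            best = min(range(pop_size), key=fit.__getitem__)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated coordinate

            trial = []
            for k in range(dim):
                if rng.random() < CR or k == j_rand:
                    v = (pop[a][k]
                         + F * (pop[lo][k] - pop[hi][k])
                         + eta * (pop[best][k] - pop[i][k]))
                else:
                    v = pop[i][k]
                trial.append(v)

            f_trial = loss(trial)  # one black-box query per trial vector
            queries += 1
            if f_trial <= fit[i]:  # greedy selection: keep the trial only if not worse
                pop[i], fit[i] = trial, f_trial

    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best], queries


def toy_loss(x, target=(0.3, -0.2, 0.5)):
    """Hypothetical stand-in for a model's loss toward a target transcription."""
    return sum((xi - ti) ** 2 for xi, ti in zip(x, target))


if __name__ == "__main__":
    x_best, f_best, n_queries = guided_de(toy_loss, dim=3)
    print(f_best, n_queries)
```

In this sketch the guidance reuses loss values already cached for the population, so it costs no extra queries; the query count is simply the initial population plus one evaluation per trial vector. A real attack would replace `toy_loss` with queries to the target ASR system and would need to bound the perturbation, neither of which is shown here.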