DeepLASD countermeasure for logical access audio spoofing.
Journal:
Scientific Reports
Published Date:
Jul 1, 2025
Abstract
Voice-based authentication systems have become increasingly vulnerable to logical access (LA) spoofing through sophisticated voice conversion (VC) and text-to-speech (TTS) attacks. This paper proposes DeepLASD, an end-to-end deep learning approach that processes raw waveforms to detect spoofed speech without relying on handcrafted features. The model incorporates a SincConv layer for interpretable spectral processing, along with residual convolutional blocks that integrate attention for improved feature extraction. We introduce the GELU activation in the residual blocks so that the model better captures the traits that distinguish bona fide from spoofed samples. A gated recurrent unit (GRU) is further employed to model temporal dynamics. Extensive experiments were conducted on the large-scale and diverse ASVspoof 2019 and 2021 datasets. The method achieves an equal error rate (EER) as low as [Formula: see text] and a minimum tandem detection cost function (min t-DCF) of 0.1208, and it generalizes strongly to both VC and TTS spoof types, demonstrating its competence for LA spoofing detection. Although the results on the ASVspoof 2021 dataset underscore the challenges posed by next-generation synthetic speech, the proposed solution exhibits notable adaptability. These findings affirm that the proposed end-to-end anti-spoofing framework enhances security and detection capability in voice authentication systems.
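The equal error rate cited above is the operating point at which the false rejection rate for bona fide trials equals the false acceptance rate for spoofed trials. The following is a minimal sketch of computing it from countermeasure scores. It assumes that higher scores mean "more likely bona fide" and uses a simple sweep over the observed scores as thresholds. The function name `compute_eer` and the toy scores are hypothetical illustrations, not taken from the paper. Official ASVspoof evaluation tooling may locate the crossing point differently, and the min t-DCF metric is not covered here.

```python
from typing import Sequence, Tuple


def compute_eer(bonafide: Sequence[float], spoof: Sequence[float]) -> Tuple[float, float]:
    """Return (eer, threshold) for countermeasure scores.

    Assumption: higher scores indicate bona fide speech, and a trial is
    accepted as bona fide when its score >= threshold.
    """
    if not bonafide or not spoof:
        raise ValueError("need at least one bona fide and one spoof score")

    # Candidate thresholds: every observed score, plus one value above the
    # maximum so that the "reject everything" operating point is included.
    candidates = sorted(set(bonafide) | set(spoof))
    candidates.append(candidates[-1] + 1.0)

    best = None  # (|FRR - FAR|, EER estimate, threshold)
    for t in candidates:
        # False rejection: bona fide trials scored below the threshold.
        frr = sum(s < t for s in bonafide) / len(bonafide)
        # False acceptance: spoofed trials scored at or above the threshold.
        far = sum(s >= t for s in spoof) / len(spoof)
        gap = abs(frr - far)
        # Keep the threshold where the two error rates are closest.
        if best is None or gap < best[0]:
            best = (gap, (frr + far) / 2.0, t)

    _, eer, threshold = best
    return eer, threshold
```

For example, `compute_eer([0.6, 0.8, 0.9, 0.4], [0.1, 0.5, 0.7, 0.2])` returns `(0.25, 0.6)`: at threshold 0.6, one of four bona fide trials is rejected and one of four spoofed trials is accepted. Averaging FRR and FAR at the closest crossing is a common simplification when the two rates never coincide exactly on a finite score set.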