LSTM autoencoder based parallel architecture for deepfake audio detection with dynamic residual encoding and feature fusion.

Journal: Scientific Reports

Published Date: Jul 2, 2025

Abstract

With the rapid advancement of synthetic speech technologies, detecting deepfake audio has become essential for preventing impersonation and misinformation. This study aims to enhance detection performance by addressing limitations in existing models, such as temporal inconsistencies, weak contextual representation, and reconstruction loss. A novel framework, termed Long Short-Term Memory Auto-Encoder with Dynamic Residual Difference Encoding (LSTM-AE-DRDE), is proposed to overcome these challenges. The framework consists of two parallel modules: one leverages an attention-enhanced LSTM with contrastive learning to highlight critical temporal cues, while the other amplifies real-vs-fake separability by computing residual differences across transformed audio variants. By integrating diverse speech features, including MFCC, temporal, prosodic, wavelet packet, and glottal parameters, the model captures both low- and high-level audio characteristics. Experimental evaluation was carried out on five benchmark datasets (CVoice Fake, FoR, Deepfake Voice Recognition, ODSS, and CMFD), where the proposed model achieved classification accuracies of 97%, 90%, 96%, 97%, and 95%, respectively. Furthermore, when compared with eleven state-of-the-art methods, the proposed model demonstrates superior performance, with an overall ROC-AUC of approximately 98%. In addition, a comprehensive feature-wise ablation study was conducted to assess the contribution of each feature set, confirming the robustness and reliability of the proposed framework.
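
The residual-difference idea described in the abstract can be illustrated with a small sketch. Everything concrete here is an assumption made for illustration: the abstract does not say which transformations or frame settings the authors use, so the two toy features (log energy and zero-crossing rate, standing in for the MFCC, prosodic, wavelet packet, and glottal features), the gain and smoothing transforms, and the helper names `frame_features` and `residual_difference` are all hypothetical. The sketch only shows the general pattern of extracting features from an original signal and from transformed variants, taking the differences, and fusing them by concatenation.

```python
import numpy as np

def frame_features(signal, frame_len=400, hop=160):
    """Toy per-frame features: log energy and zero-crossing rate.

    These are illustrative stand-ins, not the feature sets used in the paper.
    """
    n_frames = 1 + (len(signal) - frame_len) // hop
    feats = np.empty((n_frames, 2))
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len]
        feats[i, 0] = np.log(np.sum(frame ** 2) + 1e-10)
        feats[i, 1] = np.mean(np.abs(np.diff(np.sign(frame))) > 0)
    return feats

def residual_difference(signal, transforms):
    """Concatenate base features with (variant - base) residuals per frame."""
    base = frame_features(signal)
    residuals = [frame_features(t(signal)) - base for t in transforms]
    return np.concatenate([base] + residuals, axis=1)

# Synthetic one-second "audio" at an assumed 16 kHz sampling rate.
rng = np.random.default_rng(0)
signal = rng.standard_normal(16000)

# Hypothetical length-preserving transforms (assumptions, not the paper's).
transforms = [
    lambda x: 0.5 * x,                                       # gain change
    lambda x: np.convolve(x, np.ones(5) / 5, mode="same"),   # moving-average smoothing
]

fused = residual_difference(signal, transforms)
print(fused.shape)  # (frames, base features + residual features per transform)
```

In a pipeline like the one the abstract outlines, a fused matrix of this kind would be one plausible input to a downstream classifier, while the parallel attention-enhanced LSTM branch would process the sequence separately; how the authors actually combine the two branches is not specified in the abstract.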

Authors

Priyanka Muruganandham
Department of CSE, Srinivasa Ramanujan Centre, SASTRA Deemed to be University, Kumbakonam, 612 001, India.

Govardhana Rajan Thangasamy
Department of CSE, Srinivasa Ramanujan Centre, SASTRA Deemed to be University, Kumbakonam, 612 001, India.

Sangeetha Jayaraman
Department of Computer Science and Engineering, Srinivasa Ramanujan Center, SASTRA Deemed to be University, Kumbakonam 612001, Tamilnadu, India.

Rekha Dharmarajan
Department of CSE, Srinivasa Ramanujan Centre, SASTRA Deemed to be University, Kumbakonam, 612 001, India.

Keywords

Algorithms; Autoencoder; Deep Learning; Humans; Neural Networks, Computer; Speech; Voice

External Resources

PubMed ID: 40604055
