A fake news detection model using the integration of multimodal attention mechanism and residual convolutional network.
Journal:
Scientific Reports
Published Date:
Jul 1, 2025
Abstract
To improve the accuracy and efficiency of fake news detection, this study proposes a deep learning model that integrates residual networks with attention mechanisms. Building on traditional convolutional neural networks, the model incorporates multi-head attention mechanisms to enhance the extraction of key features from multimodal data such as text, images, and videos. Additionally, residual connections are introduced to deepen the network architecture, mitigate the vanishing gradient problem, and improve the model's learning depth and stability. Compared with existing approaches, this study introduces several key innovations. First, it constructs a multimodal feature fusion module that integrates text, image, and video data. Second, it designs a cross-modal alignment mechanism to better connect information across different data types. Third, it optimizes the feature fusion structure for more effective integration. Finally, the study employs attention mechanisms to highlight and enhance the representation of salient features. Experiments were conducted using three representative datasets: the LIAR dataset for political short texts, the FakeNewsNet dataset for English multimodal news, and the Weibo dataset from a Chinese social media platform. These were selected to comprehensively evaluate the model's performance across different scenarios. Baseline models used for comparison include Bidirectional Encoder Representations from Transformers (BERT), Robustly Optimized Bidirectional Encoder Representations from Transformers Approach (RoBERTa), Generalized Autoregressive Pretraining for Language Understanding (XLNet), Enhanced Representation through Knowledge Integration (ERNIE), and Generative Pre-trained Transformer 3.5 (GPT-3.5). On four key performance metrics (accuracy, precision, recall, and F1 score), the proposed model achieved best-case values of 0.977, 0.986, 0.969, and 0.924, respectively, outperforming the aforementioned baseline models overall.
Furthermore, simulated experiments were conducted to evaluate the model's real-world applicability along four dimensions: robustness, generalization ability, response time, and resource consumption. The results demonstrate that the model maintains strong stability and adaptability under data perturbations and diverse input conditions, with a response time that can be kept within 0.02 s. The model also shows significant computational advantages when handling large-scale datasets. This study therefore presents a high-performance and deployment-friendly solution for fake news detection in multimodal contexts. It also offers theoretical insights and practical guidance for applying deep learning to public opinion governance and text classification.
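The abstract does not give implementation details, so the following is only a minimal sketch of the general idea of combining attention-based fusion with a residual connection. It assumes each modality (text, image, video) has already been encoded into a vector of the same length, and it uses a single attention head with no learned projections, which is a deliberate simplification of the multi-head mechanism the study describes. The function name `attention_fuse` and the toy feature vectors are hypothetical and do not come from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attention_fuse(modalities):
    """Single-head scaled dot-product attention across modality vectors,
    followed by a residual add (output = input + attended context).

    Assumption: `modalities` maps a modality name to a feature vector,
    and all vectors share the same length. No learned weights are used.
    """
    names = list(modalities)
    vectors = [modalities[name] for name in names]
    dim = len(vectors[0])
    scale = math.sqrt(dim)

    fused = {}
    for name, query in zip(names, vectors):
        # Attention weights of this modality over all modalities.
        weights = softmax([dot(query, key) / scale for key in vectors])
        # Context vector: attention-weighted sum of all modality vectors.
        context = [
            sum(w * vec[i] for w, vec in zip(weights, vectors))
            for i in range(dim)
        ]
        # Residual connection keeps the original signal alongside the context.
        fused[name] = [q + c for q, c in zip(query, context)]
    return fused


# Toy, made-up feature vectors for illustration only.
features = {
    "text": [1.0, 0.0, 0.0, 0.0],
    "image": [0.0, 1.0, 0.0, 0.0],
    "video": [0.0, 0.0, 1.0, 0.0],
}
fused = attention_fuse(features)
```

In this toy setup each fused vector retains its original component and gains a weighted mixture of the other modalities. A model of the kind described in the abstract would additionally use learned query, key, and value projections, multiple heads, and convolutional layers inside the residual branch.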