A temporal adaptive dictionary-constrained LDA and Bi-calibrated dual granularity DTM framework for dynamic topic evolution analysis in academic papers.
Journal:
Scientific reports
Published Date:
May 29, 2026
Abstract
With the explosive growth of academic output in artificial intelligence (AI), accurately capturing the dynamic evolution of research topics is crucial for research topic selection, policy-making, and industrial innovation. To address the limitation that traditional topic models struggle to balance static topic recognition accuracy with dynamic evolution tracking capability, this study proposes a unified framework: Temporal Adaptive Dictionary-Constrained Latent Dirichlet Allocation (LDA) combined with a Bi-calibrated Dual Granularity Dynamic Topic Model (DTM). We conduct a systematic empirical analysis of dynamic topic evolution in the AI field using English-language AI papers retrieved from the Web of Science (WOS) Core Collection (2014-2023). Specifically, we first optimize the topic initialization process of LDA using a domain-specific dictionary to enhance the accuracy of static topic clustering. Second, we take the core topics identified by the optimized LDA as prior knowledge and input them into DTM, while extending the traditional single temporal granularity to an "annual + quarterly" dual granularity to accurately track changes in topic strength, keyword composition, and cross-topic associations over time. Finally, we verify the performance of the proposed model using perplexity and topic coherence indicators, and reveal the coupling characteristics between different topics through correlation analysis. Results indicate that six core topic clusters have emerged in the AI field, with "generative AI", "large language models", and "multimodal learning" evolving into explosive topics after 2021. Traditional machine learning shows a declining trend, with its core focus shifting toward "few-shot learning" and "edge computing adaptation". The top three topic couplings are "generative AI-large language models", "computer vision-multimodal learning", and "reinforcement learning-robotics". The proposed combined model outperforms single models, achieving a 17.3% reduction in perplexity and a 23.5% decrease in topic drift rate, thereby providing valuable decision support for researchers seeking to grasp field frontiers and for management departments optimizing resource allocation.
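The two preprocessing ideas named in the abstract, dictionary-seeded topic initialization and "annual + quarterly" time slicing, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `base` and `boost` values, the toy vocabulary, and the sample dates are all hypothetical, and the abstract does not specify how the dictionary constraint or the bi-calibration is actually computed. The sketch only shows one common way to bias a topic-word prior toward dictionary terms and to count documents per year and per quarter.

```python
from collections import Counter
from datetime import date


def seeded_topic_word_prior(vocab, seed_words, base=0.01, boost=1.0):
    """Build a topics x vocabulary prior matrix (hypothetical helper).

    Every entry starts at `base`; each dictionary term adds `boost`
    to the row of the topic it is assigned to.
    """
    index = {word: i for i, word in enumerate(vocab)}
    prior = [[base] * len(vocab) for _ in seed_words]
    for topic_id, words in enumerate(seed_words):
        for word in words:
            if word in index:  # skip dictionary terms absent from the corpus
                prior[topic_id][index[word]] += boost
    return prior


def time_slices(pub_dates):
    """Count documents per year and per (year, quarter), chronologically."""
    yearly = Counter(d.year for d in pub_dates)
    quarterly = Counter((d.year, (d.month - 1) // 3 + 1) for d in pub_dates)
    return sorted(yearly.items()), sorted(quarterly.items())


# Toy data, assumed purely for illustration.
vocab = ["diffusion", "transformer", "robot", "policy", "image"]
seed_words = [["diffusion", "image"], ["robot", "policy"]]
prior = seeded_topic_word_prior(vocab, seed_words)

pub_dates = [date(2021, 1, 15), date(2021, 5, 2),
             date(2021, 6, 30), date(2022, 11, 9)]
yearly, quarterly = time_slices(pub_dates)
```

A matrix like `prior` could be passed as the topic-word prior of an LDA implementation that accepts one, and the per-slice document counts could serve as the time-slice input of a dynamic topic model, once at each granularity. How the annual and quarterly results are then calibrated against each other is left open here, since the abstract gives no detail.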