Zipf's law in China's local government work reports: A 21-year study using natural language processing and regression analysis.

Journal: PLOS ONE
Published Date:

Abstract

Testing and applying Zipf's law is a significant topic in quantitative linguistics. This study presents an in-depth empirical investigation of the law in 651 Chinese provincial government work reports (2003-2023). Using natural language processing techniques (including Jieba word segmentation with a custom dictionary) and a double-logarithmic regression model, we analyzed word frequency distributions. Our findings indicate that the Zipf coefficient in these reports is close to 1, confirming general adherence to Zipf's law. Over the 21-year period, the Zipf coefficient fluctuates, with a notable inflection point in 2011, after which it follows a consistent upward trend. This shift is likely influenced by the 18th National Congress of the Communist Party of China, which marked a transition toward more standardized and centralized policy communication. While differences among eastern, central, western, and northeastern provinces are minimal, centrally governed municipalities exhibit higher Zipf coefficients than other provincial-level regions. Although our findings largely confirm the applicability of Zipf's law to this specific corpus, the study is limited by the exclusion of prefecture- and county-level reports. Future research can address this limitation by incorporating a broader range of administrative levels and by conducting cross-country and cross-cultural comparisons of political documents. Further investigation of other quantitative linguistic laws (e.g., Heaps' law, Menzerath's law) within this corpus is also warranted.
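
The double-logarithmic regression mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common formulation log f(r) = log C - alpha * log r, where f(r) is the frequency of the word at rank r and alpha is the Zipf coefficient, fitted by ordinary least squares. The abstract does not give the authors' exact model specification, so the function name `zipf_coefficient`, the toy token list, and the choice of plain OLS are illustrative assumptions rather than the study's implementation. The input is an already segmented token list; in a setting like the study's, such tokens would come from a word segmenter such as Jieba.

```python
import math
from collections import Counter


def zipf_coefficient(tokens):
    """Estimate the Zipf coefficient alpha from a list of word tokens.

    Hypothetical helper: fits log(frequency) = intercept - alpha * log(rank)
    by ordinary least squares and returns (alpha, intercept).
    """
    # Word frequencies sorted from most to least frequent; rank 1 is the top word.
    freqs = sorted(Counter(tokens).values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct words to fit a line")

    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(freq) for freq in freqs]

    # Closed-form OLS estimates for slope and intercept.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Zipf's law predicts a negative slope; alpha is reported as its magnitude.
    return -slope, intercept


# Toy input constructed so that frequency is exactly 60 / rank (not real data).
toy_tokens = ["a"] * 60 + ["b"] * 30 + ["c"] * 20 + ["d"] * 15 + ["e"] * 12
alpha, intercept = zipf_coefficient(toy_tokens)
print(f"estimated Zipf coefficient: {alpha:.3f}")
```

Because the toy frequencies follow 60 / rank exactly, the fitted coefficient is 1 up to floating-point error, which corresponds to the "close to 1" case the abstract describes. Applying the same function to each report separately would give one coefficient per province and year, the kind of series in which a trend or inflection point could then be examined.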

Authors

  • Yanfang Li
    Changchun University of Science and Technology, School of Computer Science and Technology, WeiXing Road, Changchun 130022, China.