Topic Modeling as an evaluation basis in literature research - A proposal for a new literature review method for machine-assisted source evaluation using the example of anthropology.
Journal:
Anthropologischer Anzeiger; Bericht über die biologisch-anthropologische Literatur
Published Date:
Mar 16, 2023
Abstract
Topic modeling is a machine learning method that has been used in fields such as the social sciences and the industrial production sector. With topic modeling, a scientist can reduce many articles to a few topics to get an overview of a specific field (e.g., for a scoping review). The objectives of this paper were (1) to demonstrate the applicability of topic modeling to the field of anthropology through a new framework and (2) to present a new method for determining the optimal number of topics. The documents used in this paper were collected from the IEEE database, using the search term "anthropology" to obtain a broad range of topics. Topic modeling was performed with the Latent Dirichlet Allocation (LDA) method in R. To determine the optimal number of topics, a mathematical formula based on the slope of the perplexity curve was established. Applied to a corpus of 518 documents, the framework sorted all documents into 15 research areas with little time investment by the researcher, using a standard laptop computer. Semantic validation was completed successfully for all 15 topics. The presented framework, together with the optimal number of topics, enables scientists in the field of anthropology to perform a scoping review and thus spend less time manually categorizing documents. Researchers in multidisciplinary projects can use topic modeling to understand content more quickly.
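The abstract does not state the slope-based formula itself, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that perplexity has already been computed for several candidate numbers of topics and picks the first candidate after which the curve flattens. The function name `pick_num_topics`, the threshold `rel_tol`, and all numeric values are hypothetical and chosen for illustration; the sketch is written in Python rather than R.

```python
def pick_num_topics(ks, perplexities, rel_tol=0.05):
    """Return the first candidate k after which the perplexity curve flattens.

    Assumed criterion (not taken from the paper): the curve counts as flat once
    the slope between two neighbouring candidates is smaller in magnitude than
    rel_tol times the steepest slope observed anywhere on the curve.
    """
    if len(ks) != len(perplexities) or len(ks) < 2:
        raise ValueError("need at least two (k, perplexity) pairs of equal length")

    # Slope between neighbouring candidates (negative while perplexity improves).
    slopes = [
        (perplexities[i + 1] - perplexities[i]) / (ks[i + 1] - ks[i])
        for i in range(len(ks) - 1)
    ]
    steepest = max(abs(s) for s in slopes)
    if steepest == 0:
        # Completely flat curve: more topics never help, keep the smallest k.
        return ks[0]

    for i, slope in enumerate(slopes):
        if abs(slope) < rel_tol * steepest:
            return ks[i]
    # The curve never flattens within the tested range.
    return ks[-1]


# Hypothetical perplexity values, invented purely for illustration.
candidate_ks = [2, 4, 6, 8, 10]
example_perplexities = [900.0, 700.0, 600.0, 598.0, 597.0]
chosen_k = pick_num_topics(candidate_ks, example_perplexities)
print(chosen_k)
```

In practice the perplexity values would come from fitting one LDA model per candidate number of topics on held-out documents. The relative threshold is a design choice that makes the rule independent of the absolute scale of the perplexity values.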