Estimating the predictability of questionable open-access journals.

Journal: Science Advances
Published Date:

Abstract

Questionable journals threaten global research integrity, yet manual vetting can be slow and inflexible. Here, we explore the potential of artificial intelligence (AI) to systematically identify such venues by analyzing website design, content, and publication metadata. Evaluated against extensive human-annotated datasets, our method achieves practical accuracy and uncovers previously overlooked indicators of journal legitimacy. By adjusting the decision threshold, our method can prioritize either comprehensive screening or precise, low-noise identification. At a balanced threshold, we flag over 1000 suspect journals, which collectively publish hundreds of thousands of articles, receive millions of citations, acknowledge funding from major agencies, and attract authors from developing countries. Error analysis reveals challenges involving discontinued titles, book series misclassified as journals, and small society outlets with limited online presence, all issues addressable with improved data quality. Our findings demonstrate AI's potential for scalable integrity checks, while also highlighting the need to pair automated triage with expert review.
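
The threshold trade-off described in the abstract follows standard binary-classification practice. The sketch below is illustrative only, not the authors' pipeline: the logistic-regression model, the synthetic features, and the threshold values are assumptions standing in for the paper's actual classifier and journal metadata.

    # Minimal sketch (assumed, not the authors' implementation): how a
    # decision threshold trades recall (comprehensive screening) against
    # precision (low-noise identification) when flagging journals.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import precision_score, recall_score
    from sklearn.model_selection import train_test_split

    # Synthetic stand-in for journal features (e.g., website and metadata
    # signals); class 1 = "questionable", kept rare to mimic the real task.
    X, y = make_classification(n_samples=2000, n_features=20,
                               weights=[0.9, 0.1], random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y,
                                              random_state=0)

    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    scores = clf.predict_proba(X_te)[:, 1]  # estimated P(questionable)

    # Sweep the decision threshold instead of using the default 0.5.
    for threshold in (0.2, 0.5, 0.8):
        flagged = scores >= threshold
        print(f"threshold={threshold:.1f}  "
              f"precision={precision_score(y_te, flagged, zero_division=0):.2f}  "
              f"recall={recall_score(y_te, flagged, zero_division=0):.2f}  "
              f"flagged={flagged.sum()}")

Lowering the threshold flags more journals (higher recall, suited to broad screening), while raising it yields fewer but higher-confidence flags (higher precision); the "balanced threshold" mentioned in the abstract would sit between these extremes.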

Authors

  • Han Zhuang
  • Lizhen Liang
    School of Information Studies, Syracuse University, NY 13244, USA.
  • Daniel E. Acuña
    Department of Computer Science, University of Colorado at Boulder, CO 80309, USA.