Can Generative AI and ChatGPT Break Human Supremacy in Mathematics and Reshape Competence in Cognitive-Demanding Problem-Solving Tasks?

Journal: Journal of Intelligence
Published Date:

Abstract

This study investigates the potential of generative artificial intelligence tools to address the cognitive challenges that humans encounter during problem-solving. It evaluates the performance of the ChatGPT-4o and GPT-4 models on National Assessment of Educational Progress (NAEP) mathematics assessments, particularly in relation to the cognitive demands these assessments place on students. Sixty NAEP mathematics assessment tasks, coded by field experts, were analyzed within a framework of cognitive complexity. ChatGPT-4o and GPT-4 responded to each question, and their responses were scored using NAEP's scoring criteria. The dataset was analyzed using the average performance scores of students who answered correctly and the item-wise response percentages. The results indicated that ChatGPT-4o and GPT-4 outperformed most students on individual items in the NAEP mathematics assessment. Furthermore, as cognitive demand increased, students needed higher performance scores to answer questions correctly. This trend was observed across the 4th, 8th, and 12th grades, though ChatGPT-4o and GPT-4 did not demonstrate statistically significant sensitivity to increased cognitive demands at the 12th-grade level.

Authors

  • Deniz Kaya
    Department of Mathematics Education, Faculty of Education, Nevsehir Hacı Bektas Veli University, 50300 Nevsehir, Türkiye.
  • Selim Yavuz
    Department of Curriculum and Instruction, School of Education, Indiana University Bloomington, Bloomington, IN 47405, USA.
