The influence of Gen-AI tools application for text data augmentation: case of Lithuanian educational context data classification.

Journal: Scientific Reports
Published Date:

Abstract

Today, Gen-AI tools are used for purposes ranging from everyday tasks, such as summarizing texts, to high-level solutions tailored to a company's needs. Trustworthy, high-quality datasets are the most important component in building models for any artificial intelligence-based solution. In some specific areas, creating a large dataset manually is challenging, so various techniques are used to expand existing datasets. In this research, Gen-AI tools were used to augment an educational-context text dataset that can be used to detect students who used text generators to answer open-ended questions. An experimental investigation was performed to evaluate the effectiveness of three Gen-AI tools in augmenting the existing dataset: OpenAI ChatGPT, Google Gemini, and Microsoft Copilot. During the augmentation process, the number of texts increased from 1079 to 7982. To assess the effectiveness of each Gen-AI tool and of their combinations, the dataset was divided into various subsets, and every subset was used to train several machine-learning algorithms. Additionally, the texts were converted into numerical data using two methods: bag-of-words and sBERT. In total, 15,296 models were trained, tested, and evaluated. The results showed that text augmentation using Gen-AI tools increased the models' accuracy.
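
The bag-of-words representation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the tokenizer, the helper names (tokenize, build_vocabulary, bag_of_words), and the two toy sentences are assumptions made for illustration, and the abstract does not specify preprocessing details. The sBERT representation is not shown, since it requires a pretrained sentence-embedding model.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens.

    In Python 3, \\w matches Unicode letters, so Lithuanian
    characters such as 'ė' or 'ž' stay inside their tokens.
    """
    return re.findall(r"\w+", text.lower())


def build_vocabulary(texts):
    """Map every distinct token to a column index (sorted for determinism)."""
    tokens = sorted({tok for text in texts for tok in tokenize(text)})
    return {tok: idx for idx, tok in enumerate(tokens)}


def bag_of_words(text, vocabulary):
    """Return a token-count vector; out-of-vocabulary tokens are ignored."""
    vector = [0] * len(vocabulary)
    for tok, count in Counter(tokenize(text)).items():
        idx = vocabulary.get(tok)
        if idx is not None:
            vector[idx] = count
    return vector


# Toy, made-up answers (hypothetical; not taken from the paper's dataset).
texts = [
    "The database stores the data in tables",
    "A table stores rows of data",
]
vocabulary = build_vocabulary(texts)
vectors = [bag_of_words(text, vocabulary) for text in texts]
```

Each text becomes a fixed-length vector of token counts, which is the kind of numerical input a conventional machine-learning classifier can be trained on. Word order is discarded, which is one reason a sentence-embedding method such as sBERT may be compared against it.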

Authors

  • Pavel Stefanovič
    Faculty of Fundamental Science, Vilnius Gediminas Technical University, Saulėtekio al. 11, LT-10223 Vilnius, Lithuania.
  • Urtė Radvilaitė
    Department of Information Systems, Vilnius Gediminas Technical University, Saulėtekio al. 11, Vilnius, 10223, Lithuania.
  • Birutė Pliuskuvienė
    Department of Information Systems, Vilnius Gediminas Technical University, Saulėtekio al. 11, Vilnius, 10223, Lithuania.
  • Simona Ramanauskaitė
    Department of Information Technology, Vilnius Gediminas Technical University, Saulėtekio al. 11, Vilnius, 10223, Lithuania.
