Application of Large Language Models (LLM) for Automatic Classification of Work Accident Text Data: Verification of Accuracy and Practicality

Journal: medRxiv
Published Date:

Abstract

Falls are the most frequent type of occupational accident, making the development of effective countermeasures an urgent issue. Traditional accident analysis relies on manual classification of text data by experts, a process that is both time-consuming and labor-intensive. Large Language Models (LLMs) offer the potential to significantly streamline this analysis process without the need for task-specific pre-training. This study aims to automatically classify text data from occupational accidents using LLMs and to verify the accuracy and practicality of this approach. The analysis targeted 2,619 fall-related injury cases in the health/hygiene and retail sectors, extracted from the 2021 Survey on Industrial Accidents database. The results of manual classification performed by experts in a previous study were used as the “ground truth” and compared against the automatic classification results from four LLMs (GPT-4.1, GPT-4.1 mini, GPT-4o mini, and o4-mini). Evaluation metrics were accuracy, precision, recall, F1-score, and Cohen’s kappa coefficient. Processing was conducted using OpenAI’s Batch API, and processing time and cost were also measured. Newer-generation models showed a high rate of agreement with expert classifications, generally achieving a Cohen’s kappa coefficient above 0.7 in every category except “causal substance.” For the “accident location (indoor/outdoor)” category, accuracy exceeded 91%. Even for “causal substance,” the category with the lowest accuracy, the reasoning model o4-mini achieved a kappa coefficient of 0.662. In terms of practicality, even the highest-performing model (o4-mini) processed the entire dataset in approximately 90 minutes at a cost of about $11, demonstrating high cost-effectiveness.
This study demonstrates that LLMs can classify occupational accident text data with an accuracy comparable to manual expert analysis, but at a lower cost and higher speed. This method is expected to facilitate large-scale accident analysis, which has been challenging in the past, and contribute to the rapid development of evidence-based preventive measures for occupational accidents.
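
The agreement metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the function names and the eight sample labels are hypothetical and exist only to show how accuracy, per-class precision/recall/F1, and Cohen's kappa are computed from an expert ("ground truth") label list and an LLM label list.

```python
from collections import Counter


def accuracy(truth, pred):
    """Fraction of cases where the LLM label equals the expert label."""
    if len(truth) != len(pred) or not truth:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(truth, pred)) / len(truth)


def cohen_kappa(truth, pred):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(truth)
    p_o = accuracy(truth, pred)
    truth_counts = Counter(truth)
    pred_counts = Counter(pred)
    # Chance agreement: sum over labels of the product of marginal proportions.
    p_e = sum(truth_counts[c] * pred_counts[c] for c in truth_counts) / (n * n)
    if p_e == 1.0:
        return 1.0  # degenerate case: both raters used a single identical label
    return (p_o - p_e) / (1.0 - p_e)


def precision_recall_f1(truth, pred, label):
    """Precision, recall, and F1 for one label treated as the positive class."""
    tp = sum(t == label and p == label for t, p in zip(truth, pred))
    fp = sum(t != label and p == label for t, p in zip(truth, pred))
    fn = sum(t == label and p != label for t, p in zip(truth, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical toy labels (NOT data from the study).
expert = ["indoor", "indoor", "outdoor", "outdoor", "indoor", "outdoor", "indoor", "indoor"]
llm = ["indoor", "indoor", "outdoor", "indoor", "indoor", "outdoor", "outdoor", "indoor"]

print(f"accuracy = {accuracy(expert, llm):.3f}")    # accuracy = 0.750
print(f"kappa    = {cohen_kappa(expert, llm):.3f}")  # kappa    = 0.467
print(precision_recall_f1(expert, llm, "indoor"))
```

In this toy example, 75% raw agreement corresponds to a kappa of only about 0.47, because two raters who each label five of eight cases "indoor" would agree on about 53% of cases by chance. This is why kappa is reported alongside accuracy.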

Authors

  • Hajime Ando; Ryutaro Matsugaki; Sakumi Yamakawa; Akira Ogami