Enhancing the classification of isolated theropod teeth using machine learning: a comparative study.

Journal: PeerJ
PMID:

Abstract

Classifying objects, such as identifying fossils taxonomically from morphometric variables, is a time-consuming process. The task is further complicated by intra-class variability, which makes it a good candidate for automation via machine learning (ML) techniques. In this study, we compared six ML techniques on datasets of morphometric features used to classify isolated theropod teeth at both genus and higher taxonomic levels. The models also aim to differentiate teeth from different positions on the tooth row (e.g., lateral, mesial). These datasets present challenges such as over-representation of certain classes and missing measurements. Given the class imbalance, we evaluate the effect of different standardization and oversampling techniques on the performance of each classification model. The results show that some classification models are more sensitive to class imbalance than others. This study presents a novel comparative analysis of multi-class classification methods for theropod teeth, evaluating their performance across taxonomic levels and dataset balancing techniques. Its aim is to determine which ML methods are most suitable for classifying isolated theropod teeth and to provide recommendations on handling imbalanced datasets with different standardization, oversampling, and classification tools. The trained models and applied standardizations are made publicly available as a resource for future studies classifying isolated theropod teeth. This open-access methodology will enable more reliable cross-study comparisons of fossil records.
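
The two preprocessing steps named in the abstract, standardization and oversampling, can be illustrated with a minimal sketch. The abstract does not say which specific techniques were compared, so the z-score standardization and random oversampling below are stand-ins chosen for simplicity. The function names, feature values, and class labels (`taxon_A`, `taxon_B`) are hypothetical and are not taken from the study's datasets.

```python
import random
from collections import Counter, defaultdict
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each feature column: subtract its mean, divide by its std."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # avoid dividing by zero variance
    return [[(v - m) / s for v, m, s in zip(r, mus, sds)] for r in rows]


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many rows as the largest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for r, y in zip(rows, labels):
        by_class[y].append(r)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for y, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for r in members + extra:
            out_rows.append(r)
            out_labels.append(y)
    return out_rows, out_labels


# Hypothetical toy data: two measurements per tooth, imbalanced 3 vs 1.
X = [[10.0, 4.0], [12.0, 5.0], [11.0, 4.5], [30.0, 9.0]]
y = ["taxon_A", "taxon_A", "taxon_A", "taxon_B"]

X_std = standardize(X)
X_bal, y_bal = random_oversample(X_std, y)
counts = Counter(y_bal)
print(counts)
```

In practice, both steps are usually fitted on the training split only, so that duplicated or rescaled samples do not leak into the evaluation data.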

Authors

  • Carolina S Marques
    Centro de Estatística e Aplicações, Faculdade de Ciências, Universidade de Lisboa, Lisboa, Portugal.
  • Emmanuel Dufourq
    African Institute for Mathematical Sciences, Muizenberg, South Africa.
  • Soraia Pereira
    Centro de Estatística e Aplicações, Faculdade de Ciências, Universidade de Lisboa, Lisboa, Portugal.
  • Vanda F Santos
    Departamento de Geologia, Faculdade de Ciências, Universidade de Lisboa, Lisboa, Portugal.
  • Elisabete Malafaia
    Instituto Dom Luiz, Faculdade de Ciências, Universidade de Lisboa, Lisboa, Portugal.