On the use of machine learning algorithms in forensic anthropology.

Journal: Legal medicine (Tokyo, Japan)
Published Date:

Abstract

This study examines the classification performance of several statistical methods and machine learning algorithms in skeletal sex/ancestry estimation. The statistical methods are binary logistic regression (BLR), multinomial and penalized multinomial logistic regression (MLR, pMLR), and linear discriminant analysis (LDA). The machine learning algorithms are naïve Bayes classification (NBC), decision trees (DT), random forest (RF), artificial neural networks (ANN), support vector machines (SVM) with linear, polynomial, or radial kernels, multivariate adaptive regression splines (MARS), and extreme gradient boosting (XGB). The datasets used to test these methods were obtained from the Athens Collection, a documented human skeletal collection, and from the Howells Craniometric data set. An R package was written to implement the methods; it searches for the optimum tuning parameters under cross-validation and performs sex/ancestry classification. Classification performance was found to vary significantly depending on the problem. Of the methods tested, LDA and linear SVM exhibit the best performance, with high prediction accuracy and relatively low bias in most of the tests. ANN and pMLR can generally be considered to give satisfactory predictions, whereas DT and, when metric traits are used, NBC are the worst of the classification methods examined. The possibility of making the models developed via the machine learning algorithms applicable to other assemblages without the use of a training sample is also discussed.
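
The idea of searching for tuning parameters under cross-validation can be illustrated with a minimal sketch. This is not the authors' R package. It is a Python illustration under stated assumptions: the classifier is a simple Gaussian naïve Bayes, the tuning parameter is a hypothetical variance-smoothing constant, the function names (`fit_gaussian_nb`, `cv_accuracy`, `tune`) are invented for this example, and the two-class data are synthetic rather than drawn from the Athens Collection or the Howells data set.

```python
import math
import random


def fit_gaussian_nb(X, y, smoothing):
    """Fit per-class priors, means and variances; `smoothing` is added to each variance."""
    model = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        variances = [
            sum((v - m) ** 2 for v in col) / n + smoothing
            for col, m in zip(columns, means)
        ]
        model[label] = (math.log(n / len(y)), means, variances)
    return model


def predict(model, x):
    """Return the class with the largest log posterior for one observation."""
    def log_posterior(label):
        log_prior, means, variances = model[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(x, means, variances)
        )
    return max(model, key=log_posterior)


def cv_accuracy(X, y, smoothing, k=5, seed=0):
    """Estimate accuracy by k-fold cross-validation for one tuning value."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = fit_gaussian_nb(
            [X[i] for i in train], [y[i] for i in train], smoothing
        )
        correct += sum(predict(model, X[i]) == y[i] for i in fold)
    return correct / len(X)


def tune(X, y, grid, k=5):
    """Score every candidate in `grid` and return the best one with all scores."""
    scores = {s: cv_accuracy(X, y, s, k) for s in grid}
    best = max(scores, key=scores.get)
    return best, scores


def make_toy_data(n_per_class=60, seed=1):
    """Synthetic two-class, two-measurement data (an assumption, not real skeletal data)."""
    rng = random.Random(seed)
    X, y = [], []
    for label, centre in (("F", (0.0, 0.0)), ("M", (2.0, 1.5))):
        for _ in range(n_per_class):
            X.append([rng.gauss(c, 1.0) for c in centre])
            y.append(label)
    return X, y


X, y = make_toy_data()
best_smoothing, scores = tune(X, y, grid=[1e-3, 1e-1, 1.0, 10.0])
print("best smoothing:", best_smoothing)
print("cross-validated accuracy:", scores[best_smoothing])
```

Each candidate value is scored only on observations held out from fitting, so the selected parameter is not chosen on the data used to train the model. The same loop could in principle wrap any of the classifiers listed in the abstract, although how the authors' package organizes its search is not described here.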

Authors

  • Efthymia Nikita
    Science and Technology in Archaeology and Culture Research Center, The Cyprus Institute, 20 Konstantinou Kavafi Street, 2121 Aglantzia, Nicosia, Cyprus. efi.nikita@gmail.com.
  • Panos Nikitas
    Department of Chemistry, Aristotle University of Thessaloniki, University Campus, 54124, Thessaloniki, Greece.