Quantifying Facial Gestures Using Deep Learning in a New World Monkey.
Journal: American Journal of Primatology
PMID: 40019116
Abstract
Facial gestures are a crucial component of primate multimodal communication. However, current methodologies for extracting facial data from video recordings are labor-intensive and prone to human subjectivity. Although automatic tools for this task are still in their infancy, deep learning techniques are revolutionizing animal behavior research. This study explores the distinctiveness of facial gestures in cotton-top tamarins, quantified using markerless pose estimation algorithms. From footage of captive individuals, we extracted and manually labeled frames to develop a model that can recognize a custom set of landmarks positioned on the face of the target species. The trained model predicted landmark positions, which were subsequently transformed into distance matrices representing the landmarks' spatial distributions within each frame. We employed three competing machine learning classifiers to assess the ability to automatically discriminate facial configurations that co-occur with vocal emissions and are associated with different behavioral contexts. Initial analysis showed correct classification rates exceeding 80%, suggesting that voiced facial configurations are highly distinct from unvoiced ones. Our findings also demonstrated varying context specificity of facial gestures, with the highest classification accuracy observed during yawning, social activity, and resting. This study highlights the potential of markerless pose estimation for advancing the study of primate multimodal communication, even in challenging species such as cotton-top tamarins. The ability to automatically distinguish facial gestures in different behavioral contexts represents a critical step in developing automated tools for extracting behavioral cues from raw video data.
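The abstract describes transforming predicted landmark positions into per-frame distance matrices. A minimal sketch of that step is shown below; the function name and the (N, 2) input format are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def landmark_distance_matrix(landmarks):
    """Pairwise Euclidean distance matrix for one video frame.

    landmarks: (N, 2) array of (x, y) coordinates predicted by a
    pose-estimation model (hypothetical input format).
    Returns an (N, N) symmetric matrix with a zero diagonal.
    """
    # Broadcast to (N, N, 2) coordinate differences, then reduce to distances.
    diffs = landmarks[:, None, :] - landmarks[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=-1))

# Toy example with three landmarks forming a 3-4-5 triangle:
pts = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 0.0]])
D = landmark_distance_matrix(pts)
print(D[0, 1])  # 5.0
```

Flattening each frame's matrix (e.g. taking its upper triangle) would yield a fixed-length feature vector suitable as input to standard classifiers.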