Driver facial emotion tracking using an enhanced residual network with weighted fusion of channel and spatial attention.
Journal:
Scientific Reports
PMID:
40221566
Abstract
Facial expression recognition (FER) plays a crucial role in interpreting human emotions and intentions in real-life applications, such as advanced driver assistance systems. However, it faces challenges due to subtle facial variations, environmental factors, and occlusions. In this paper, we propose a novel CNN-based model for driver facial emotion tracking, named FARNet, which incorporates residual connections and is inspired by vision transformer architectures. The model integrates a fusion of channel and spatial attention mechanisms with learnable weights to enhance FER performance while maintaining moderate complexity. It comprises four stages with residual blocks in a 2:2:4:2 ratio and approximately 3.05 million parameters, making it parameter-efficient compared to existing models. We evaluate FARNet on five popular FER datasets: CK+, Oulu-CASIA, RAF-DB, FER+, and AffectNet. The model achieves the highest accuracy on three of these datasets and the second-highest on the remaining two, with results ranging from 57.03% on AffectNet to 100% on CK+ and Oulu-CASIA, remaining competitive with other methods.
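The abstract describes the weighted fusion of channel and spatial attention only at a high level, so the following is a minimal PyTorch sketch of one plausible reading: SE-style channel attention, CBAM-style spatial attention, and softmax-normalized learnable fusion weights. The module names, reduction ratio, kernel size, and normalization scheme are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention (assumed variant)."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        # Per-channel weights from globally pooled features.
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w

class SpatialAttention(nn.Module):
    """CBAM-style spatial attention over pooled channel statistics (assumed variant)."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)
        mx, _ = x.max(dim=1, keepdim=True)
        # Per-location weights from channel-wise average and max maps.
        w = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * w

class WeightedAttentionFusion(nn.Module):
    """Fuse the two attention branches with learnable scalar weights.

    The softmax normalization keeping the weights positive and summing
    to 1 is an assumption; the paper only states that the fusion
    weights are learnable.
    """
    def __init__(self, channels):
        super().__init__()
        self.channel_att = ChannelAttention(channels)
        self.spatial_att = SpatialAttention()
        self.alpha = nn.Parameter(torch.zeros(2))  # learnable fusion weights

    def forward(self, x):
        w = torch.softmax(self.alpha, dim=0)
        return w[0] * self.channel_att(x) + w[1] * self.spatial_att(x)

# Quick shape check on a dummy feature map.
x = torch.randn(2, 64, 28, 28)
print(WeightedAttentionFusion(64)(x).shape)  # torch.Size([2, 64, 28, 28])
```

In a FARNet-like backbone, such a module would most naturally sit inside each residual block, after the convolutions and before the skip connection, though the abstract does not specify the exact placement.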