Semantically-Enhanced Feature Extraction with CLIP and Transformer Networks for Driver Fatigue Detection.

Journal: Sensors (Basel, Switzerland)

PMID: 39771685

Abstract

Drowsy driving is a leading cause of commercial vehicle traffic crashes. The current trend is to train fatigue detection models with deep neural networks on driver video data, but two challenges remain: high-level feature extraction is coarse and incomplete, and network architectures still need optimization. This paper pioneers the use of the CLIP (Contrastive Language-Image Pre-training) model for fatigue detection and uses a Transformer architecture to extract sophisticated, long-term temporal features from video sequences, enabling more nuanced and accurate fatigue analysis. The proposed CT-Net (CLIP-Transformer Network) achieves an AUC (Area Under the Curve) of 0.892, a 36% accuracy improvement over the prevalent CNN-LSTM (Convolutional Neural Network-Long Short-Term Memory) end-to-end model, reaching state-of-the-art performance. Experiments show that the CLIP pre-trained model extracts facial and behavioral features from driver video frames more accurately, improving the model's AUC by 7% over the ImageNet-based pre-trained model. Moreover, compared with LSTM, the Transformer captures long-term dependencies among temporal features more flexibly, further raising the model's AUC by 4%.
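For readers less familiar with the headline metric, the following is a minimal sketch of how an AUC value can be computed from binary labels and model scores, using the pairwise-ranking definition (the fraction of positive/negative pairs in which the positive example receives the higher score, with ties counted as one half). The function name `roc_auc` and the toy labels and scores are illustrative assumptions; they are not the authors' evaluation code or data.

```python
def roc_auc(labels, scores):
    """Pairwise-ranking AUC: P(score of a positive > score of a negative).

    labels: iterable of 0/1 (1 = fatigued, by assumption for this sketch)
    scores: iterable of model outputs, higher meaning "more likely fatigued"
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, for illustration only.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
print(roc_auc(labels, scores))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so a reported value of 0.892 means a randomly chosen fatigued sample outscores a randomly chosen non-fatigued one in roughly 89% of pairs. This quadratic-time form is chosen for readability; sorting-based implementations are preferable for large evaluation sets.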

Authors

Zhen Gao
Sylvester Comprehensive Cancer Center, University of Miami Miller School of Medicine, Miami, FL, USA.

Xiaowen Chen
The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, USA.

Jingning Xu
School of Computer Science and Technology, Tongji University, Shanghai 201804, China.

Rongjie Yu
The Key Laboratory of Road and Traffic Engineering, Ministry of Education, 4800 Cao'an Road, 201804 Shanghai, China. Electronic address: yurongjie@tongji.edu.cn.

Heng Zhang
Department of Gastroenterology, The Central Hospital of Wuhan, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China.

Jinqiu Yang
Department of Computer Science and Software Engineering, Concordia University, Montreal, QC H3G 1M8, Canada.

Keywords

Accidents, Traffic; Algorithms; Automobile Driving; Fatigue; Humans; Neural Networks, Computer; Semantics; Video Recording

