Improving internet of vehicles research: A systematic preprocessing framework for the VeReMi dataset.

Journal: Data in brief

Published Date: Apr 28, 2025

Abstract

The Vehicular Reference Misbehavior Dataset (VeReMi) is a vital resource for advancing Intelligent Transportation Systems (ITS) and the Internet of Vehicles (IoV). However, its large size (∼7 GB) and inherent class imbalance pose significant challenges for machine learning model development. This paper presents a preprocessing framework to enhance VeReMi's usability and relevance. Through 10% down-sampling, the dataset was reduced to ∼724 MB, making it computationally manageable. Biases were addressed by balancing benign and malicious samples through synthesis and by identifying benign instances using predefined criteria. A refined feature set of key attributes, one of them renamed, was selected to improve machine learning compatibility. This preprocessing pipeline maintains data integrity and preserves the representativeness of malicious patterns. The optimized dataset is well suited for ITS and IoV applications such as anomaly detection and network security, underscoring the role of preprocessing in overcoming real-world constraints and enhancing model performance.
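The two data-reduction steps named in the abstract, 10% down-sampling followed by class balancing, can be illustrated with a minimal sketch. This is not the authors' pipeline: the record layout, the `label` field (0 = benign, 1 = malicious), and both function names are hypothetical, and plain random oversampling with replacement stands in for the paper's synthesis step, whose details the abstract does not give.

```python
import random
from collections import defaultdict


def stratified_downsample(records, fraction, label_key="label", seed=0):
    """Keep roughly `fraction` of the records from each class.

    Sampling per class (rather than over the whole set) keeps the
    malicious share of the reduced data close to the original share.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for rec in records:
        by_class[rec[label_key]].append(rec)
    sample = []
    for group in by_class.values():
        k = max(1, round(len(group) * fraction))  # never drop a class entirely
        sample.extend(rng.sample(group, k))
    return sample


def oversample_to_balance(records, label_key="label", seed=0):
    """Duplicate minority-class records at random until all classes match.

    Assumption: simple oversampling with replacement, used here only as a
    stand-in for the synthesis step mentioned in the abstract.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for rec in records:
        by_class[rec[label_key]].append(rec)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        balanced.extend(rng.choices(group, k=target - len(group)))
    rng.shuffle(balanced)
    return balanced


# Toy stand-in for VeReMi messages: 900 benign and 100 malicious records.
records = [{"id": i, "label": 0} for i in range(900)]
records += [{"id": i, "label": 1} for i in range(900, 1000)]

small = stratified_downsample(records, fraction=0.10)
balanced = oversample_to_balance(small)
```

Down-sampling before balancing keeps the oversampling step cheap, since it only has to fill the gap between class sizes in the reduced set.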

Authors

Aparup Roy

Bachelor of Science (B.S.) in Data Science and Applications (Pursuing), Indian Institute of Technology Madras, BS Degree Office, 3rd Floor, ICSR Building, IIT Madras, Chennai 600036, India.

Debotosh Bhattacharjee

Department of Computer Science and Engineering, Jadavpur University, Kolkata-700032, West Bengal, India.

Ondrej Krejcar

Center for Basic and Applied Research, Faculty of Informatics and Management, University of Hradec Kralove, Rokitanskeho 62, Hradec Kralove 500 03, Czech Republic. ondrej.krejcar@uhk.cz.

External Resources

PubMed ID (PMID): 40475077
