mTSBench: Benchmarking Multivariate Time Series Anomaly Detection and Model Selection at Scale
Journal:
arXiv
Published Date:
Jun 26, 2025
Abstract
Multivariate time series anomaly detection (MTS-AD) is critical in domains
like healthcare, cybersecurity, and industrial monitoring, yet remains
challenging due to complex inter-variable dependencies, temporal dynamics, and
sparse anomaly labels. We introduce mTSBench, the largest benchmark to date for
MTS-AD and unsupervised model selection, spanning 344 labeled time series
across 19 datasets and 12 diverse application domains. mTSBench evaluates 24
anomaly detection methods, including large language model (LLM)-based detectors
for multivariate time series, and systematically benchmarks unsupervised model
selection techniques under standardized conditions. Consistent with prior
findings, our results confirm that no single detector excels across datasets,
underscoring the importance of model selection. However, even state-of-the-art
selection methods remain far from optimal, revealing critical gaps. mTSBench
provides a unified evaluation suite to enable rigorous, reproducible
comparisons and catalyze future advances in adaptive anomaly detection and
robust model selection.