LSM-2: Learning from Incomplete Wearable Sensor Data
Journal: arXiv
Published Date: Jun 5, 2025
Abstract
Foundation models, a cornerstone of recent advancements in machine learning,
have predominantly thrived on complete and well-structured data. Wearable
sensor data, however, frequently suffers from significant missingness, posing a
substantial challenge for self-supervised learning (SSL) models that typically
assume complete data inputs. This paper introduces the second generation of
Large Sensor Model (LSM-2) with Adaptive and Inherited Masking (AIM), a novel
SSL approach that learns robust representations directly from incomplete data
without requiring explicit imputation. AIM's core novelty lies in its use of
learnable mask tokens to model both existing ("inherited") and artificially
introduced missingness, enabling it to robustly handle fragmented real-world
data during inference. Pre-trained on an extensive dataset of 40M hours of
day-long multimodal sensor data, our LSM-2 with AIM achieves the best
performance across a diverse range of tasks, including classification,
regression and generative modeling. Furthermore, LSM-2 with AIM exhibits
superior scaling performance and, critically, maintains high performance even
under targeted missingness scenarios. Its behavior in these scenarios reflects
clinically coherent patterns, such as the diagnostic value of nighttime
biosignals for hypertension prediction. This makes AIM a more reliable choice
for real-world wearable data applications.
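
The masking idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`build_aim_mask`, `apply_mask_token`, `reconstruction_loss`), the uniform random choice of artificially masked tokens, the zero-valued mask token, and the choice to score reconstruction only on artificially masked positions are all assumptions made for illustration. The sketch only shows how inherited and artificial missingness can share one mask token while remaining distinguishable for the training objective.

```python
import random


def build_aim_mask(inherited_missing, mask_ratio, rng):
    """Combine inherited missingness with extra, artificially masked tokens.

    inherited_missing: list of bools, True where the sensor token is absent.
    mask_ratio: fraction of the *observed* tokens to mask artificially.
    Returns (total_mask, artificial_mask) as lists of bools.
    """
    observed = [i for i, missing in enumerate(inherited_missing) if not missing]
    n_artificial = round(mask_ratio * len(observed))
    chosen = set(rng.sample(observed, n_artificial))
    artificial = [i in chosen for i in range(len(inherited_missing))]
    total = [m or a for m, a in zip(inherited_missing, artificial)]
    return total, artificial


def apply_mask_token(tokens, total_mask, mask_token):
    """Replace every masked token (inherited or artificial) with the mask token."""
    return [list(mask_token) if masked else list(tok)
            for tok, masked in zip(tokens, total_mask)]


def reconstruction_loss(pred, target, artificial_mask):
    """Mean squared error over artificially masked tokens only.

    Assumption: inherited gaps have no ground truth, so they are excluded.
    """
    errs = [(p - t) ** 2
            for p_tok, t_tok, m in zip(pred, target, artificial_mask) if m
            for p, t in zip(p_tok, t_tok)]
    return sum(errs) / len(errs) if errs else 0.0


# Toy example: 8 tokens of width 4, two of them genuinely missing (NaN).
rng = random.Random(0)
inherited = [False, True, False, False, True, False, False, False]
tokens = [[float("nan")] * 4 if m else [rng.gauss(0, 1) for _ in range(4)]
          for m in inherited]

total, artificial = build_aim_mask(inherited, 0.5, rng)
mask_token = [0.0] * 4  # hypothetical; a learned parameter in a real model
model_input = apply_mask_token(tokens, total, mask_token)

# Trivial stand-in "prediction": the masked input itself, scored against the
# original tokens on artificially masked positions.
loss = reconstruction_loss(model_input, tokens, artificial)
```

In this sketch the model input never contains NaN values, because genuinely missing tokens are replaced by the same mask token used for artificial masking, so no imputation step is needed before the encoder.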