FOLIAGE: Towards Physical Intelligence World Models Via Unbounded Surface Evolution
Journal:
arXiv
Published Date:
May 29, 2025
Abstract
Physical intelligence -- anticipating and shaping the world from partial,
multisensory observations -- is critical for next-generation world models. We
propose FOLIAGE, a physics-informed multimodal world model for unbounded
accretive surface growth. In its Action-Perception loop, a unified context
encoder maps images, mesh connectivity, and point clouds to a shared latent
state. A physics-aware predictor, conditioned on physical control actions,
advances this latent state in time to align with the target latent of the
surface, yielding a Modality-Agnostic Growth Embedding (MAGE) that interfaces
with critic heads for downstream objectives. FOLIAGE's Accretive Graph Network
(AGN) captures dynamic connectivity through Age Positional Encoding and
Energy-Gated Message-Passing. Geometry-Correspondence Fusion and Cross-Patch
Masking enhance MAGE's expressiveness, while Hierarchical Pooling balances
global context with local dynamics. We create SURF-GARDEN, a world model
learning platform comprising a Counterfactual Physics Simulator, a Multimodal
Correspondence Extractor, and Evolution Tracing, which generates 7,200 diverse
surface-growth sequences. SURF-BENCH, our physical-intelligence evaluation
suite, covers six core tasks -- topology recognition, inverse material
estimation, growth-stage classification, latent roll-out, cross-modal
retrieval, and dense correspondence -- and four stress tests that probe
resilience: sensor dropout, zero-shot modality transfer, long-horizon
prediction, and physics ablation. FOLIAGE outperforms specialized baselines
while remaining
robust across dynamic environments, establishing a new world-model-based,
multimodal pathway to physical intelligence.
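
The Action-Perception loop described above (encode the available modalities into a shared latent, advance that latent under a control action, and align it with the target latent) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and not the FOLIAGE implementation: the linear encoders, the averaging fusion, the tanh predictor, the mean-squared alignment loss, the dimensions, and all function names are hypothetical stand-ins, and the random vectors stand in for real image, mesh, and point-cloud features.

```python
import math
import random


def linear(weights, x):
    """Dense layer without bias; `weights` is a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def random_matrix(rows, cols, rng):
    """Small random weights (hypothetical stand-in for trained parameters)."""
    scale = 1.0 / math.sqrt(cols)
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def encode_context(observations, encoders):
    """Project each available modality into the shared latent space and average.

    Modalities absent from `observations` are skipped, which mimics sensor
    dropout in this toy setting.
    """
    latents = [linear(encoders[name], feats) for name, feats in observations.items()]
    dim = len(latents[0])
    return [sum(z[i] for z in latents) / len(latents) for i in range(dim)]


def predict_next(latent, action, predictor):
    """Advance the latent one step, conditioned on the control action."""
    return [math.tanh(v) for v in linear(predictor, latent + action)]


def alignment_loss(predicted, target):
    """Mean squared error between predicted and target latents (assumed loss)."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)


rng = random.Random(0)
LATENT_DIM, ACTION_DIM = 8, 2
FEATURE_DIMS = {"image": 16, "mesh": 12, "point_cloud": 10}  # assumed sizes

encoders = {name: random_matrix(LATENT_DIM, d, rng) for name, d in FEATURE_DIMS.items()}
predictor = random_matrix(LATENT_DIM, LATENT_DIM + ACTION_DIM, rng)


def random_observation(rng, modalities):
    """Random feature vectors standing in for real sensor features."""
    return {m: [rng.gauss(0, 1) for _ in range(FEATURE_DIMS[m])] for m in modalities}


obs_t = random_observation(rng, ["image", "mesh", "point_cloud"])
obs_next = random_observation(rng, ["image", "point_cloud"])  # mesh dropped
action = [0.5, -0.1]  # hypothetical physical control parameters

z_t = encode_context(obs_t, encoders)          # shared latent state at time t
z_pred = predict_next(z_t, action, predictor)  # predicted latent at time t+1
z_target = encode_context(obs_next, encoders)  # target latent from the next observation
loss = alignment_loss(z_pred, z_target)
print(f"alignment loss: {loss:.4f}")
```

In this sketch the predicted latent plays the role the abstract assigns to MAGE: a modality-agnostic vector that downstream heads could consume. A training loop would minimize the alignment loss over many (observation, action, next observation) triples.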