Hold My Beer: Learning Gentle Humanoid Locomotion and End-Effector Stabilization Control
Journal: arXiv
Published Date: May 30, 2025
Abstract
Can your humanoid walk up and hand you a full cup of beer, without spilling a
drop? While humanoids are increasingly featured in flashy demos such as dancing,
delivering packages, and traversing rough terrain, fine-grained control during
locomotion remains a significant challenge. In particular, stabilizing a filled
end-effector (EE) while walking is far from solved, due to a fundamental
mismatch in task dynamics: locomotion demands slow-timescale, robust control,
whereas EE stabilization requires rapid, high-precision corrections. To address
this, we propose SoFTA, a Slow-Fast Two-Agent framework that decouples
upper-body and lower-body control into separate agents operating at different
frequencies and with distinct rewards. This temporal and objective separation
mitigates policy interference and enables coordinated whole-body behavior.
SoFTA executes upper-body actions at 100 Hz for precise EE control and
lower-body actions at 50 Hz for robust gait. It reduces EE acceleration by 2-5x
relative to baselines and performs much closer to human-level stability,
enabling delicate tasks such as carrying nearly full cups, capturing steady
video during locomotion, and rejecting disturbances while keeping the EE stable.
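
The two-rate execution described in the abstract can be illustrated with a minimal scheduling sketch. Only the 100 Hz and 50 Hz rates come from the abstract; everything else is an assumption made for illustration: the names `run_two_rate`, `upper_policy`, `lower_policy`, and `observe` are hypothetical, both agents are assumed to read the same observation, and the lower-body action is assumed to be held constant between its updates. This is not the authors' implementation.

```python
from typing import Callable, List, Sequence, Tuple

Action = List[float]
Policy = Callable[[Sequence[float]], Action]

UPPER_HZ = 100  # upper-body action rate stated in the abstract
LOWER_HZ = 50   # lower-body action rate stated in the abstract


def run_two_rate(
    upper_policy: Policy,
    lower_policy: Policy,
    observe: Callable[[int], Sequence[float]],
    n_ticks: int,
) -> List[Tuple[Action, Action]]:
    """Step a control loop at UPPER_HZ and return the (upper, lower) action per tick.

    The lower-body action is refreshed every UPPER_HZ // LOWER_HZ ticks and
    held in between (an assumed zero-order hold, not confirmed by the source).
    """
    decimation = UPPER_HZ // LOWER_HZ  # 2 ticks per lower-body update
    lower_action: Action = []
    log: List[Tuple[Action, Action]] = []
    for tick in range(n_ticks):
        obs = observe(tick)
        if tick % decimation == 0:
            # Slow agent: new gait command on every second tick.
            lower_action = lower_policy(obs)
        # Fast agent: new end-effector correction on every tick.
        upper_action = upper_policy(obs)
        log.append((upper_action, lower_action))
    return log
```

Over one simulated second (100 ticks), the fast agent in this sketch is queried 100 times and the slow agent 50 times. In the framework described above, the two agents are also trained with distinct rewards, which this scheduling sketch does not model.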