Student-Informed Teacher Training
Journal: arXiv
Published Date: Dec 12, 2024
Abstract
Imitation learning with a privileged teacher has proven effective for
learning complex control behaviors from high-dimensional inputs, such as
images. In this framework, a teacher is trained with privileged task
information, while a student tries to predict the teacher's actions from
more limited observations. For example, in a robot navigation task, the teacher
might have access to distances to nearby obstacles, while the student receives
only visual observations of the scene. However, privileged imitation learning faces
a key challenge: the student might be unable to imitate the teacher's behavior
due to partial observability. This problem arises because the teacher is
trained without considering whether the student is capable of imitating the learned
behavior. To address this teacher-student asymmetry, we propose a framework for
joint training of the teacher and student policies, encouraging the teacher to
learn behaviors that the student can imitate despite the latter's
limited access to information and its partial observability. Based on the
performance bound in imitation learning, we add (i) the approximated action
difference between teacher and student as a penalty term to the reward function
of the teacher, and (ii) a supervised teacher-student alignment step. We
motivate our method with a maze navigation task and demonstrate its
effectiveness on complex vision-based quadrotor flight and manipulation tasks.
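
The two additions described above, a penalty on the teacher's reward and a supervised alignment step, can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not say how the action difference is approximated, so the squared Euclidean distance used here is an assumption, and the names `action_difference`, `penalized_teacher_reward`, `alignment_loss`, and `penalty_weight` are hypothetical.

```python
from typing import Sequence, Tuple

Action = Sequence[float]


def action_difference(teacher_action: Action, student_action: Action) -> float:
    """Squared Euclidean distance between two action vectors.

    Assumption: the abstract only mentions an "approximated action
    difference"; this particular distance is chosen for illustration.
    """
    if len(teacher_action) != len(student_action):
        raise ValueError("action vectors must have the same length")
    return sum((t - s) ** 2 for t, s in zip(teacher_action, student_action))


def penalized_teacher_reward(
    task_reward: float,
    teacher_action: Action,
    student_action: Action,
    penalty_weight: float = 0.1,  # hypothetical trade-off coefficient
) -> float:
    """(i) Task reward minus a weighted teacher-student action difference."""
    return task_reward - penalty_weight * action_difference(
        teacher_action, student_action
    )


def alignment_loss(pairs: Sequence[Tuple[Action, Action]]) -> float:
    """(ii) Mean action difference over a batch of (teacher, student) pairs.

    A supervised alignment step would minimize a loss of this kind; the
    optimizer and network updates are omitted here.
    """
    if not pairs:
        raise ValueError("batch must not be empty")
    return sum(action_difference(t, s) for t, s in pairs) / len(pairs)


if __name__ == "__main__":
    # The teacher acts where the student cannot follow, so its reward shrinks.
    print(penalized_teacher_reward(1.0, [1.0, 0.0], [0.0, 0.0], penalty_weight=0.5))
    print(alignment_loss([([1.0, 0.0], [0.0, 0.0]), ([0.0, 0.0], [0.0, 0.0])]))
```

In this sketch, a larger `penalty_weight` pushes the teacher more strongly toward behaviors the student can reproduce, at the cost of task reward.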