Neurocontrol for fixed-length trajectories in environments with soft barriers.

Journal: Neural Networks: the official journal of the International Neural Network Society
PMID:

Abstract

In this paper we present three neurocontrol problems in which the analytic policy gradient, computed via back-propagation through time, is used to train an agent to maximise a polynomial reward function in a simulated environment. If the environment includes terminal barriers (e.g. solid walls) that terminate the episode whenever the agent touches them, we show that learning can get stuck in oscillating limit cycles or local minima. Hence we propose using fixed-length trajectories and changing these barriers into soft barriers, which the agent may pass through while incurring a significant penalty cost. We demonstrate that soft barriers have the drawback of causing exploding learning gradients. Furthermore, the strongest learning gradients often appear at inappropriate parts of the trajectory, where control of the system has already been lost. Combined with modern adaptive optimisers, these exploding gradients and inappropriately placed learning signals often cause learning to grind to a halt. We propose ways to avoid these difficulties, either by careful gradient clipping or by smoothly truncating the gradients of the soft barriers' polynomial cost functions. We argue that this enables the learning algorithm to avoid exploding gradients and to concentrate on the most important parts of the trajectory, rather than on parts where control has already been irreversibly lost.
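Below is a minimal, illustrative sketch (not the authors' exact formulation; the barrier location, penalty weight, and truncation depth are assumed constants) of how the gradient of a quadratic soft-barrier penalty can be smoothly truncated: the cost is quadratic near the barrier and switches to a linear tail beyond a chosen penetration depth, so its gradient saturates rather than exploding deep inside the barrier.

    # Sketch in JAX: a quadratic soft-barrier penalty vs. a smoothly
    # gradient-truncated variant. All constants are illustrative.
    import jax
    import jax.numpy as jnp

    BARRIER = 1.0     # assumed barrier position
    PENALTY = 100.0   # assumed weight of the polynomial penalty
    GRAD_CAP = 0.5    # penetration depth beyond which the gradient stops growing

    def soft_barrier_cost(x):
        # Quadratic (polynomial) penalty for penetrating the barrier at x > BARRIER.
        # Its gradient, 2*PENALTY*depth, grows without bound with penetration depth.
        depth = jnp.maximum(x - BARRIER, 0.0)
        return PENALTY * depth ** 2

    def truncated_barrier_cost(x):
        # Same penalty, but with a linear tail beyond GRAD_CAP, chosen so that the
        # cost and its gradient stay continuous; the gradient saturates at
        # 2*PENALTY*GRAD_CAP instead of growing with penetration depth.
        depth = jnp.maximum(x - BARRIER, 0.0)
        quadratic = PENALTY * depth ** 2
        linear = PENALTY * (2.0 * GRAD_CAP * depth - GRAD_CAP ** 2)
        return jnp.where(depth < GRAD_CAP, quadratic, linear)

    for x in (1.2, 2.0, 10.0):
        g_raw = float(jax.grad(soft_barrier_cost)(x))
        g_trunc = float(jax.grad(truncated_barrier_cost)(x))
        print(f"x={x:5.1f}  raw grad={g_raw:8.1f}  truncated grad={g_trunc:6.1f}")

The alternative fix mentioned in the abstract, careful gradient clipping, would instead leave the cost function unchanged and bound the back-propagated gradients directly during training.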

Authors

  • Michael Fairbank
  • Danil Prokhorov
    Toyota Research Institute NA, Ann Arbor, MI, US.
  • David Barragan-Alcantar
    School of Computer Science and Electronic Engineering, University of Essex, Colchester, UK.
  • Spyridon Samothrakis
    Institute for Analytics and Data Science, University of Essex, Colchester, Essex, United Kingdom.
  • Shuhui Li