Deep Reinforcement Learning With Modulated Hebbian Plus Q-Network Architecture.

Journal: IEEE transactions on neural networks and learning systems
PMID:

Abstract

In this article, we consider a subclass of partially observable Markov decision process (POMDP) problems, which we term confounding POMDPs. In these POMDPs, temporal difference (TD)-based reinforcement learning (RL) algorithms struggle because the TD error cannot be easily derived from observations. We solve these problems using a new bio-inspired neural architecture that combines a modulated Hebbian network (MOHN) with a deep Q-network (DQN), which we call the modulated Hebbian plus Q-network architecture (MOHQA). The key idea is to use a Hebbian network with rarely correlated bio-inspired neural traces to bridge temporal delays between actions and rewards when confounding observations and sparse rewards result in inaccurate TD errors. In MOHQA, the DQN learns low-level features and control, while the MOHN contributes to high-level decisions by associating rewards with past states and actions. Thus, the proposed architecture combines two modules with significantly different learning algorithms, a Hebbian associative network and a classical DQN pipeline, exploiting the advantages of both. Simulations on a set of POMDPs and on the Malmo environment show that the proposed algorithm improved on DQN's results and even outperformed control tests with advantage actor-critic (A2C), quantile regression DQN with long short-term memory (QRDQN + LSTM), Monte Carlo policy gradient (REINFORCE), and aggregated memory for reinforcement learning (AMRL) algorithms on the most difficult POMDPs with confounding stimuli and sparse rewards.
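The idea of bridging a delay between an action and a sparse reward with neural traces can be illustrated with a minimal sketch of a reward-modulated Hebbian update. This is not the authors' implementation: the class name `ModulatedHebbianSketch`, the parameters `trace_decay`, `lr`, and `threshold`, the one-hot post-synaptic activity, and the thresholded trace rule are all assumptions made for illustration, and the abstract does not specify how MOHQA combines the MOHN output with the DQN.

```python
# Illustrative sketch only: a reward-modulated Hebbian layer with decaying
# eligibility traces. All names and parameter values here are hypothetical.

def make_matrix(rows, cols):
    """Return a rows x cols matrix of zeros as nested lists."""
    return [[0.0] * cols for _ in range(rows)]


class ModulatedHebbianSketch:
    def __init__(self, n_features, n_actions,
                 trace_decay=0.9, lr=0.1, threshold=0.5):
        self.n_features = n_features
        self.n_actions = n_actions
        self.trace_decay = trace_decay  # assumed per-step decay of traces
        self.lr = lr                    # assumed learning rate
        self.threshold = threshold      # assumed cutoff so only strong
                                        # correlations create a trace
        self.weights = make_matrix(n_actions, n_features)
        self.traces = make_matrix(n_actions, n_features)

    def observe(self, features, action):
        """Decay all traces, then mark feature/action pairs that co-occur."""
        for a in range(self.n_actions):
            for i in range(self.n_features):
                self.traces[a][i] *= self.trace_decay
        post = 1.0  # assumed post-synaptic activity of the chosen action
        for i, x in enumerate(features):
            if x * post > self.threshold:
                self.traces[action][i] += x * post

    def modulate(self, reward):
        """Turn surviving traces into weight changes, scaled by the reward."""
        for a in range(self.n_actions):
            for i in range(self.n_features):
                self.weights[a][i] += self.lr * reward * self.traces[a][i]

    def scores(self, features):
        """Per-action associative score for a feature vector."""
        return [sum(w * x for w, x in zip(row, features))
                for row in self.weights]


# Usage: a cue is seen once, followed by two identical (confounding)
# observations, and the reward arrives only at the end.
net = ModulatedHebbianSketch(n_features=3, n_actions=2)
net.observe([1.0, 0.0, 0.0], action=1)  # decision made at the cue
net.observe([0.0, 1.0, 0.0], action=0)  # confounding observation
net.observe([0.0, 1.0, 0.0], action=0)  # same observation again
net.modulate(reward=1.0)                # delayed, sparse reward
scores = net.scores([1.0, 0.0, 0.0])    # preference when the cue reappears
```

In this toy run the trace for the cue and action 1 decays across the two intervening steps but is still nonzero when the reward arrives, so the cue becomes associated with action 1 without any TD error being computed along the way.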

Authors

  • Pawel Ladosz
  • Eseoghene Ben-Iwhiwhu
  • Jeffery Dick
  • Nicholas Ketz
  • Soheil Kolouri
    Department of Computer Science, Vanderbilt University, Nashville, TN 37212, USA.
  • Jeffrey L Krichmar
    Department of Computer Science, University of California Irvine, Irvine, CA, USA; Department of Cognitive Sciences, University of California Irvine, Irvine, CA, USA.
  • Praveen K Pilly
    Information and Systems Sciences Laboratory, HRL Laboratories LLC, Malibu, CA 90265, USA. Electronic address: pkpilly@hrl.com.
  • Andrea Soltoggio
    Department of Computer Science, Loughborough University, LE11 3TU, Loughborough, UK. Electronic address: a.soltoggio@lboro.ac.uk.