Deep Reinforcement Learning With Modulated Hebbian Plus Q-Network Architecture.

Journal: IEEE Transactions on Neural Networks and Learning Systems

PMID: 34559664

Abstract

In this article, we consider a subclass of partially observable Markov decision process (POMDP) problems which we term confounding POMDPs. In these POMDPs, temporal difference (TD)-based reinforcement learning (RL) algorithms struggle, as the TD error cannot be easily derived from observations. We solve these problems using a new bio-inspired neural architecture that combines a modulated Hebbian network (MOHN) with a deep Q-network (DQN), which we call the modulated Hebbian plus Q-network architecture (MOHQA). The key idea is to use a Hebbian network with rarely correlated bio-inspired neural traces to bridge temporal delays between actions and rewards when confounding observations and sparse rewards result in inaccurate TD errors. In MOHQA, the DQN learns low-level features and control, while the MOHN contributes to high-level decisions by associating rewards with past states and actions. Thus, the proposed architecture combines two modules with significantly different learning algorithms, a Hebbian associative network and a classical DQN pipeline, exploiting the advantages of both. Simulations on a set of POMDPs and on the Malmo environment show that the proposed algorithm improved on DQN's results and even outperformed control tests with advantage actor-critic (A2C), quantile regression DQN with long short-term memory (QRDQN + LSTM), Monte Carlo policy gradient (REINFORCE), and aggregated memory for reinforcement learning (AMRL) algorithms on the most difficult POMDPs with confounding stimuli and sparse rewards.
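The abstract does not give the MOHN update rule, so the following is only a minimal sketch of a generic reward-modulated Hebbian update with decaying traces, written to illustrate how a trace can bridge the delay between an action and a later reward. All names (`update_traces`, `modulated_update`), the threshold rule standing in for "rarely correlated" traces, and the parameter values are assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch only: a generic reward-modulated Hebbian rule with
# eligibility traces. Not the MOHN rule from the paper; every name and
# constant here is an assumption.

def update_traces(traces, pre, post, decay=0.9, threshold=0.5):
    """Decay all traces, then raise those whose pre*post product exceeds
    the threshold (so only strong, rare correlations leave a trace)."""
    new_traces = []
    for i, row in enumerate(traces):
        new_row = []
        for j, trace in enumerate(row):
            trace = decay * trace
            correlation = pre[j] * post[i]
            if correlation > threshold:
                trace += correlation
            new_row.append(trace)
        new_traces.append(new_row)
    return new_traces


def modulated_update(weights, traces, reward, lr=0.1):
    """Change each weight in proportion to the reward times its trace."""
    return [
        [w + lr * reward * e for w, e in zip(w_row, e_row)]
        for w_row, e_row in zip(weights, traces)
    ]


# Two input features, two actions. The correlated event happens at step 1,
# but the reward only arrives at step 4.
weights = [[0.0, 0.0], [0.0, 0.0]]
traces = [[0.0, 0.0], [0.0, 0.0]]
episode = [
    ([1.0, 0.0], [1.0, 0.0], 0.0),  # feature 0 active, action 0 taken
    ([0.0, 0.0], [0.0, 0.0], 0.0),  # uninformative steps, no reward
    ([0.0, 0.0], [0.0, 0.0], 0.0),
    ([0.0, 0.0], [0.0, 0.0], 1.0),  # delayed reward
]
for pre, post, reward in episode:
    traces = update_traces(traces, pre, post)
    weights = modulated_update(weights, traces, reward)

print(weights)
```

In this toy run only the connection from feature 0 to action 0 changes, because its trace is the only one still non-zero when the reward arrives; a TD-style update driven by the uninformative intermediate observations would have no such direct link to the earlier event.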

Authors

Pawel Ladosz
Eseoghene Ben-Iwhiwhu
Jeffery Dick
Nicholas Ketz

Soheil Kolouri
Department of Computer Science, Vanderbilt University, Nashville, TN 37212, USA.

Jeffrey L Krichmar
Department of Computer Science, University of California Irvine, Irvine, CA, USA; Department of Cognitive Sciences, University of California Irvine, Irvine, CA, USA.

Praveen K Pilly
Information and Systems Sciences Laboratory, HRL Laboratories LLC, Malibu, CA 90265, USA. Electronic address: pkpilly@hrl.com.

Andrea Soltoggio
Department of Computer Science, Loughborough University, LE11 3TU, Loughborough, UK. Electronic address: a.soltoggio@lboro.ac.uk.

Keywords

Algorithms; Markov Chains; Neural Networks, Computer; Reinforcement, Psychology; Reward

