Memristor-based spiking neural network with online reinforcement learning.

Journal: Neural networks : the official journal of the International Neural Network Society
PMID:

Abstract

Neural networks implemented in memristor-based hardware can provide fast and efficient in-memory computation, but traditional learning methods such as error back-propagation are hardly feasible on such hardware. Spiking neural networks (SNNs) are highly promising in this regard, as their weights can be changed locally in a self-organized manner, without requiring high-precision weight updates computed from information drawn from almost the entire network. This problem is particularly relevant to control tasks solved with neural-network reinforcement learning methods, which are highly sensitive to any source of stochasticity in model initialization, training, or decision-making. This paper presents an online reinforcement learning algorithm in which connection weights are updated after each environment state is processed, while the agent is still interacting with the environment and generating data. Another novel feature of the algorithm is that it is applied to SNNs with memristor-based STDP-like learning rules. The plasticity functions are obtained from real memristors based on poly-p-xylylene and CoFeB-LiNbO3 nanocomposite, which were experimentally assembled and analyzed. The SNN consists of leaky integrate-and-fire neurons. Environmental states are encoded by the timings of input spikes, and the control action is decoded from the first spike. The proposed learning algorithm solves the Cart-Pole benchmark task successfully. This result could be the first step towards implementing a real-time agent learning procedure in a continuous-time environment that can be run on neuromorphic systems with memristive synapses.
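
The spike-timing encoding and first-spike decoding mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the linear latency code, the function names `encode_state` and `first_spike_action`, and all parameter values (time constant, threshold, time step) are assumptions chosen for illustration, and the memristor-based plasticity rule is omitted.

```python
import math

def encode_state(state, lows, highs, t_max=20.0):
    """Map each state component to one input spike time in [0, t_max].

    Assumed linear latency code: larger values spike earlier.
    """
    times = []
    for x, lo, hi in zip(state, lows, highs):
        frac = min(max((x - lo) / (hi - lo), 0.0), 1.0)  # clip to [0, 1]
        times.append(t_max * (1.0 - frac))
    return times

def first_spike_action(spike_times, weights, tau=10.0, threshold=1.0,
                       dt=0.1, t_end=50.0):
    """Simulate one leaky integrate-and-fire neuron per action.

    weights[j][i] is the (hypothetical) synaptic weight from input i to
    output neuron j. Returns (action_index, spike_time) for the first
    neuron to reach threshold, or (None, None) if none fires by t_end.
    Ties within a time step go to the neuron with the higher potential,
    then to the lower index.
    """
    n_out = len(weights)
    v = [0.0] * n_out                      # membrane potentials
    decay = math.exp(-dt / tau)            # leak applied per time step
    spike_steps = [int(round(t / dt)) for t in spike_times]
    for step in range(int(round(t_end / dt)) + 1):
        for j in range(n_out):
            v[j] *= decay
            for i, s in enumerate(spike_steps):
                if s == step:              # input i spikes at this step
                    v[j] += weights[j][i]
        best = max(range(n_out), key=lambda j: v[j])
        if v[best] >= threshold:
            return best, step * dt
    return None, None

# Two-component toy state and two actions (Cart-Pole has two actions).
times = encode_state([1.0, -1.0], lows=[-1.0, -1.0], highs=[1.0, 1.0])
action, t_spike = first_spike_action(times, weights=[[1.2, 0.0], [0.0, 1.2]])
```

In this toy setup the first input spikes immediately and drives output neuron 0 over threshold, so action 0 is selected; swapping the two state components selects action 1 instead.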

Authors

  • Danila Vlasov
    NRC "Kurchatov Institute", Akademika Kurchatova sq., 1 Moscow, Russian Federation.
  • Anton Minnekhanov
    NRC "Kurchatov Institute", Akademika Kurchatova sq., 1 Moscow, Russian Federation.
  • Roman Rybka
    NRC "Kurchatov Institute", Akademika Kurchatova sq., 1 Moscow, Russian Federation; Russian Technological University "MIREA", Vernadsky av., 78 Moscow, Russian Federation. Electronic address: Rybka_RB@nrcki.ru.
  • Yury Davydov
    NRC "Kurchatov Institute", Akademika Kurchatova sq., 1 Moscow, Russian Federation.
  • Alexander Sboev
    NRC "Kurchatov Institute", Akademika Kurchatova sq., 1 Moscow, Russian Federation; Russian Technological University "MIREA", Vernadsky av., 78 Moscow, Russian Federation; NRNU "MEPhI", Kashira Hwy, 31 Moscow, Russian Federation.
  • Alexey Serenko
    NRC "Kurchatov Institute", Akademika Kurchatova sq., 1 Moscow, Russian Federation.
  • Alexander Ilyasov
    NRC "Kurchatov Institute", Akademika Kurchatova sq., 1 Moscow, Russian Federation; Faculty of Physics, Lomonosov Moscow State University, Leninskie gory, 1 Moscow, Russian Federation.
  • Vyacheslav Demin
    NRC "Kurchatov Institute", Akademika Kurchatova sq., 1 Moscow, Russian Federation. Electronic address: Demin_VA@nrcki.ru.