Mastering the game of Stratego with model-free multiagent reinforcement learning.

Journal: Science (New York, N.Y.)

Abstract

We introduce DeepNash, an autonomous agent that plays the imperfect information game Stratego at a human expert level. Stratego is one of the few iconic board games that artificial intelligence (AI) has not yet mastered. It is a game characterized by a twin challenge: It requires long-term strategic thinking as in chess, but it also requires dealing with imperfect information as in poker. The technique underpinning DeepNash uses a game-theoretic, model-free deep reinforcement learning method, without search, that learns to master Stratego through self-play from scratch. DeepNash beat existing state-of-the-art AI methods in Stratego and achieved a year-to-date (2022) and all-time top-three ranking on the Gravon games platform, competing with human expert players.

Authors

  • Julien Perolat
    DeepMind Technologies Ltd., London, UK.
  • Bart De Vylder
    DeepMind Technologies Ltd., London, UK.
  • Daniel Hennes
    DeepMind Technologies Ltd., London, UK.
  • Eugene Tarassov
    DeepMind Technologies Ltd., London, UK.
  • Florian Strub
    DeepMind Technologies Ltd., London, UK.
  • Vincent de Boer
    DeepMind Technologies Ltd., London, UK.
  • Paul Müller
    Biotechnology Center, Center for Molecular and Cellular Bioengineering, Technische Universität Dresden, Tatzberg 47/49, Dresden, 01307, Germany. paul_mueller@tu-dresden.de.
  • Jerome T Connor
    DeepMind Technologies Ltd., London, UK.
  • Neil Burch
    DeepMind Technologies Ltd., London, UK.
  • Thomas Anthony
    Analytical AI, 1500, 1st Ave. N, Birmingham, AL 35022, USA. Electronic address: thomas@analyticalai.com.
  • Stephen McAleer
    Department of Computer Science, University of California, Irvine, Irvine, CA, USA.
  • Romuald Elie
    DeepMind Technologies Ltd., London, UK.
  • Sarah H Cen
    DeepMind Technologies Ltd., London, UK.
  • Zhe Wang
    Department of Pathology, The Eighth Affiliated Hospital, Sun Yat-sen University, Shenzhen 518033, China.
  • Audrunas Gruslys
    DeepMind Technologies Ltd., London, UK.
  • Aleksandra Malysheva
    DeepMind Technologies Ltd., London, UK.
  • Mina Khan
    DeepMind Technologies Ltd., London, UK.
  • Sherjil Ozair
    DeepMind Technologies Ltd., London, UK.
  • Finbarr Timbers
    DeepMind Technologies Ltd., London, UK.
  • Toby Pohlen
    DeepMind Technologies Ltd., London, UK.
  • Tom Eccles
    DeepMind, London, UK.
  • Mark Rowland
    DeepMind Technologies Ltd., London, UK.
  • Marc Lanctot
    Google DeepMind, 5 New Street Square, London EC4A 3TW, UK.
  • Jean-Baptiste Lespiau
    DeepMind Technologies Ltd., London, UK.
  • Bilal Piot
    DeepMind Technologies Ltd., London, UK.
  • Shayegan Omidshafiei
    DeepMind Technologies Ltd., London, UK.
  • Edward Lockhart
    DeepMind Technologies Ltd., London, UK.
  • Laurent Sifre
    Google DeepMind, 5 New Street Square, London EC4A 3TW, UK.
  • Nathalie Beauguerlange
    DeepMind Technologies Ltd., London, UK.
  • Remi Munos
    DeepMind Technologies Ltd., London, UK.
  • David Silver
    Google DeepMind, 5 New Street Square, London EC4A 3TW, UK.
  • Satinder Singh
    Computer Science and Engineering Department, University of Michigan, Ann Arbor, Michigan 48109, USA.
  • Demis Hassabis
    Google DeepMind, 5 New Street Square, London EC4A 3TW, UK.
  • Karl Tuyls
    DeepMind Technologies Ltd., London, UK.