DeepMind AI researchers present “DeepNash”, an autonomous agent trained with model-free multi-agent reinforcement learning that learns to play the game of Stratego at an expert level

For several years, the board game Stratego has been considered one of the most promising areas of research in Artificial Intelligence. Stratego is a two-player board game in which each player tries to capture the other player’s flag. The game poses two main challenges. 1) The Stratego game tree contains roughly 10^535 potential states. 2) Each player must consider more than 10^66 possible deployments at the start of the game. Due to these complex components of the game structure, the AI research community had made minimal progress in this area.

This research introduces DeepNash, an autonomous agent that can develop human-level expertise in the imperfect-information game Stratego from scratch. Regularized Nash Dynamics (R-NaD), a principled, model-free reinforcement learning technique, is the backbone of DeepNash. DeepNash achieves an ε-Nash equilibrium by integrating R-NaD with a deep neural network architecture. A Nash equilibrium ensures that the agent performs well even against the worst-case adversary. The game and a description of the DeepNash technique are shown in Figure 1.
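The idea of an ε-Nash equilibrium can be made concrete on a tiny zero-sum matrix game. The sketch below (an illustration only, not DeepNash’s code) measures the exploitability of a mixed strategy in rock-paper-scissors: a strategy is an ε-Nash equilibrium exactly when a best-responding opponent can gain at most ε against it.

```python
# Illustrative sketch: exploitability of a mixed strategy in a
# zero-sum matrix game (rock-paper-scissors). Not DeepNash code.

# Payoff matrix for the row player: A[i][j] = payoff of row action i
# against column action j (rock, paper, scissors).
A = [
    [0, -1,  1],   # rock
    [1,  0, -1],   # paper
    [-1, 1,  0],   # scissors
]

def exploitability(policy):
    """How much a best-responding opponent gains against `policy`.

    In this symmetric zero-sum game, `policy` is an eps-Nash
    equilibrium iff its exploitability is at most eps.
    """
    # Opponent's expected payoff for each column action; the opponent's
    # payoff is the negative of the row player's.
    opponent_values = [
        -sum(policy[i] * A[i][j] for i in range(3)) for j in range(3)
    ]
    return max(opponent_values)  # value of the opponent's best response

uniform = [1 / 3, 1 / 3, 1 / 3]   # the Nash equilibrium of this game
biased = [0.5, 0.25, 0.25]        # an exploitable strategy

print(exploitability(uniform))  # 0.0: unexploitable
print(exploitability(biased))   # 0.25: a best response profits
```

Playing an exact Nash strategy caps the damage any adversary can inflict, which is why the guarantee matters in a head-to-head game like Stratego.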


DeepNash consists of three parts: a fundamental R-NaD training component, fine-tuning of the learned policy, and post-processing at test time. R-NaD depends on three important steps: reward transformation, dynamics, and update. Moreover, DeepNash’s R-NaD learning method relies on the concept of regularization for convergence. The DeepNash network consists of a shared torso followed by four heads, each a smaller residual tower with skip connections. The first head outputs the value function as a scalar, while the other three heads encode the agent’s policy by producing probability distributions over actions during deployment and play.
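The three R-NaD steps can be sketched in miniature on rock-paper-scissors. In the hedged toy version below, the reward is transformed by penalizing divergence from a regularization policy, discretized replicator dynamics run on the transformed game, and the regularization policy is then updated to the resulting fixed point. The tabular setting, hyperparameter values, and loop counts are all illustrative assumptions; DeepNash applies the same principle at scale with deep networks.

```python
import math

# Hedged sketch of the three R-NaD steps (reward transformation,
# dynamics, update) on rock-paper-scissors. Tabular and simplified;
# not DeepNash's actual implementation.

A = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]  # zero-sum payoff matrix
ETA = 0.2   # regularization strength (illustrative value)
LR = 0.1    # dynamics step size (illustrative value)

def rnad(outer_iters=50, inner_steps=500):
    pi = [0.6, 0.3, 0.1]      # arbitrary starting policy
    pi_reg = list(pi)         # regularization (anchor) policy
    for _ in range(outer_iters):
        for _ in range(inner_steps):
            # q-values of the current policy in self-play
            q = [sum(A[a][b] * pi[b] for b in range(3)) for a in range(3)]
            # Step 1: reward transformation -- penalize straying from pi_reg
            q = [q[a] - ETA * math.log(pi[a] / pi_reg[a]) for a in range(3)]
            # Step 2: discretized replicator dynamics on the transformed game
            w = [pi[a] * math.exp(LR * q[a]) for a in range(3)]
            z = sum(w)
            pi = [x / z for x in w]
        # Step 3: update the anchor to the (approximate) fixed point
        pi_reg = list(pi)
    return pi

print(rnad())  # approaches the uniform Nash equilibrium [1/3, 1/3, 1/3]
```

Without the regularization term, plain replicator dynamics cycle around the equilibrium of rock-paper-scissors; the anchoring is what makes the iterates converge, which is the role regularization plays in R-NaD’s convergence guarantee.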

DeepNash’s dynamics step is divided into two parts. The first part estimates the value function by adapting the v-trace estimator to the two-player imperfect-information setting. The second part learns the policy via a Neural Replicator Dynamics (NeuRD) update, using state-action value estimates derived from v-trace. Fine-tuning is done during training by applying additional thresholding and discretization to the action probabilities.
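A NeuRD update can be illustrated on a toy problem. In the hedged sketch below, which assumes known action values on a three-armed bandit (the values and step size are invented for illustration, and DeepNash instead derives its q-estimates from v-trace), each logit moves by the advantage of its action without the usual policy-probability scaling:

```python
import math

# Hedged sketch of a Neural Replicator Dynamics (NeuRD) update on a
# toy 3-armed bandit with known action values. Illustrative only.

def softmax(logits):
    m = max(logits)
    e = [math.exp(x - m) for x in logits]
    z = sum(e)
    return [x / z for x in e]

def neurd_step(logits, q, lr=0.1):
    pi = softmax(logits)
    v = sum(p * qa for p, qa in zip(pi, q))  # value under current policy
    # NeuRD: shift each logit by the advantage q(a) - v. Unlike the
    # standard softmax policy gradient, the update is NOT scaled by
    # pi(a), so actions with tiny probability can still recover.
    return [y + lr * (qa - v) for y, qa in zip(logits, q)]

q = [1.0, 0.5, 0.0]              # illustrative action values
logits = [0.0, 0.0, 0.0]
for _ in range(200):
    logits = neurd_step(logits, q)
print(softmax(logits))           # mass concentrates on the best action
```

Because the pairwise logit gap grows by lr times the q-value gap at every step regardless of the current policy, the update keeps pushing toward better actions even when they have been driven to near-zero probability, a property that matters under imperfect information.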

DeepNash’s performance is evaluated against the Gravon platform and eight well-known AI bots. DeepNash was tested against top human players for two weeks in early April 2022, playing 50 ladder matches of which it won 42 (an 84% win rate). This equates to a rating of 1799 on the 2022 Classic Stratego challenge leaderboard, placing DeepNash third among all Gravon Stratego players. It also corresponds to an all-time Classic Stratego rating of 1778, again placing DeepNash third among all ranked Gravon Stratego players. Despite never training against any of the bots and learning purely through self-play, DeepNash wins the vast majority of games, as Table 1 shows.


In this game, the key to being unexploitable is an unpredictable deployment, and DeepNash can produce billions of such deployments. DeepNash can also make trade-offs; for example, a player must weigh the value of capturing an opponent’s piece, which reveals information about their own piece, against leaving it uncaptured to keep its identity hidden. Additionally, DeepNash can handle situations involving casual bluffing, negative bluffing, and complex bluffs.

On the Gravon platform, DeepNash achieves a win rate of at least 97% against the other AI bots and an overall win rate of 84% against expert human players. DeepNash may open new opportunities for reinforcement learning methods in imperfect-information, real-world multi-agent problems with astronomical state spaces that are currently beyond the reach of state-of-the-art AI techniques.

This article is written as a summary article by Marktechpost Staff based on the research paper 'Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning'. All credit for this research goes to the researchers on this project. Check out the paper.


James G. Williams