AI researchers create a video game model that can remember past events

A team of researchers from Uber’s AI lab recently developed a system of AI algorithms that outperformed both human players and other AI systems in classic Atari video games. The AI system developed by the researchers is able to remember previously successful strategies and create new strategies based on what worked in the past. The research team believes the algorithms they have developed have potential applications in other technical areas, such as language processing and robotics.

The typical method used to create AI systems capable of playing video games is a reinforcement learning algorithm. Reinforcement learning algorithms learn to perform a task by exploring a range of possible actions; after each action, they receive some type of reinforcement (a reward or punishment). Over time, the AI model learns which actions lead to greater rewards, and it becomes more likely to perform those actions. Unfortunately, reinforcement learning models run into problems when they encounter data points that are inconsistent with the rest of the data set.
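
The reward-driven loop described above can be sketched with a toy multi-armed bandit problem. This is an illustrative example, not the researchers’ actual system: the agent tries actions, receives noisy rewards, and gradually prefers the action with the highest estimated value.

```python
import random

def run_bandit(true_rewards, steps=5000, epsilon=0.1, seed=0):
    """Toy reinforcement-learning loop: estimate each action's value
    from observed rewards and increasingly prefer the best one."""
    rng = random.Random(seed)
    estimates = [0.0] * len(true_rewards)   # learned value per action
    counts = [0] * len(true_rewards)        # how often each action was taken
    for _ in range(steps):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if rng.random() < epsilon:
            action = rng.randrange(len(true_rewards))
        else:
            action = max(range(len(true_rewards)), key=lambda a: estimates[a])
        # Noisy reward: the "reinforcement" for the chosen action.
        reward = true_rewards[action] + rng.gauss(0, 0.1)
        counts[action] += 1
        # Incremental running average of rewards seen for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts

estimates, counts = run_bandit([0.2, 0.8, 0.5])
# Over time the agent takes the highest-reward action most often.
```

The epsilon parameter controls the exploration/exploitation trade-off; a pure exploiter can lock onto an early, mediocre action, which is one face of the exploration problem the article goes on to discuss.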

According to the research team, their approach has been overlooked by other AI researchers because it differs from the “intrinsic motivation” approach typically used in reinforcement learning. The problem with an intrinsic motivation approach is that the model is prone to “forgetting” potentially rewarding areas that are still worth exploring, a phenomenon called “detachment”. As a result, when the model encounters unexpected data, it may miss areas that still need to be explored.

According to TechXplore, the research team set out to create a learning model that was more flexible and able to respond to unexpected data. The researchers overcame this problem by introducing an algorithm that could remember all the actions taken by previous versions of the model while trying to solve a problem. When the AI model encounters a data point that is inconsistent with what it has learned so far, it checks its memory map, identifies which past strategies succeeded and which failed, and chooses a strategy accordingly.
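
A minimal sketch of such a memory map, under the assumption (names are illustrative, not the researchers’ code) that it maps each distinct situation the agent has seen to the best-scoring action sequence that reached it:

```python
# Remember each distinct situation ("cell"), along with the best score
# achieved there and the action sequence (trajectory) that produced it.

def update_archive(archive, cell, score, trajectory):
    """Keep only the highest-scoring trajectory seen for each remembered cell."""
    best = archive.get(cell)
    if best is None or score > best["score"]:
        archive[cell] = {"score": score, "trajectory": list(trajectory)}

archive = {}
update_archive(archive, "room_1", score=100, trajectory=["right", "jump"])
update_archive(archive, "room_1", score=50, trajectory=["left"])  # worse: ignored
update_archive(archive, "room_2", score=300, trajectory=["right", "jump", "up"])

# The agent can later pick any remembered cell, replay its trajectory,
# and resume exploring from there instead of starting over.
```

Because every visited cell stays in the archive, a promising area can never be “forgotten” the way it can under a purely intrinsic-motivation strategy.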

When playing a video game, the model collects screenshots of the game as it plays, creating a log of its actions. The images are grouped together based on similarity, forming precise points in time that the model can refer to. The algorithm can use the recorded images to return to an interesting point in time and continue exploring from there. When the model sees that it is losing, it refers to the screenshots it has taken and tries a different strategy.
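
One plausible way to group similar screenshots, in the spirit of the description above, is to downscale and coarsely quantize each frame so that visually similar frames collapse to the same key. This is a hedged sketch with hypothetical names; a frame is represented here as a 2D list of grayscale pixel values.

```python
def to_cell(frame, factor=2, levels=8):
    """Downscale a frame and quantize its pixel values so that visually
    similar frames map to the same hashable 'cell' key."""
    h, w = len(frame), len(frame[0])
    cell = []
    for y in range(0, h, factor):
        row = []
        for x in range(0, w, factor):
            # Average a factor x factor block of pixels.
            block = [frame[y + dy][x + dx]
                     for dy in range(factor) for dx in range(factor)]
            avg = sum(block) / len(block)
            row.append(int(avg * levels / 256))  # quantize to a few levels
        cell.append(tuple(row))
    return tuple(cell)

frame_a = [[10, 12], [11, 13]]
frame_b = [[12, 14], [13, 15]]  # slightly different, but visually similar
assert to_cell(frame_a) == to_cell(frame_b)  # grouped into the same cell
```

Since many raw frames map to one cell key, the archive stays compact while still giving the model precise points in time to return to.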

As the BBC explains, there is also the problem of handling dangerous scenarios for the AI agent playing the game. If the agent encounters a danger that can kill it, that danger can prevent it from returning to areas that deserve more exploration, a problem called “derailment”. The AI model addresses derailment through a separate process from the one used to encourage exploration of old areas.
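
The separation of the two processes can be illustrated as follows: the agent first replays a stored action sequence to get back to a remembered state without taking risks, and only then begins taking new exploratory actions. This is an illustrative sketch (the toy environment and all names are assumptions, not the researchers’ implementation):

```python
import random

class ToyEnv:
    """Tiny deterministic stand-in for a game environment."""
    actions = ["left", "right", "jump"]
    def __init__(self):
        self.history = []
    def reset(self):
        self.history = []
    def step(self, action):
        self.history.append(action)

def return_then_explore(env, archive, cell, n_explore=5, seed=0):
    """Replay the remembered trajectory to `cell`, then explore from there."""
    rng = random.Random(seed)
    env.reset()
    for action in archive[cell]["trajectory"]:
        env.step(action)  # return phase: no exploration, so no derailment
    new_actions = []
    for _ in range(n_explore):
        action = rng.choice(env.actions)  # exploration begins only now
        env.step(action)
        new_actions.append(action)
    return archive[cell]["trajectory"] + new_actions

env = ToyEnv()
archive = {"ledge": {"trajectory": ["right", "right", "jump"]}}
path = return_then_explore(env, archive, "ledge")
# The first actions replay the stored route exactly; the rest are new.
```

Keeping the return phase free of exploratory randomness is what prevents a hazard along the route from derailing the agent before it reaches the area it wanted to explore.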

The research team ran the model through 55 Atari games. These games are commonly used to compare the performance of AI models, but the researchers added a twist. They introduced additional rules, instructing the model not only to achieve the highest score possible but to try to achieve an even higher score each time. When the model’s performance results were analyzed, the researchers found that their AI system outperformed other gaming AIs about 85% of the time. The AI did particularly well in Montezuma’s Revenge, a platform game where the player avoids dangers and collects treasures; the model beat the human record for the game and also scored higher than any other AI system.

According to the Uber AI researchers, the strategies used by the research team have applications in industries like robotics. Robots benefit from the ability to remember which actions succeeded, which did not work, and which have not yet been tried.

James G. Williams