Microsoft AI researchers present PPE: a mathematically guaranteed reinforcement learning (RL) algorithm for exogenous noise

This Article Is Based On The Research Paper 'Provable RL with Exogenous Distractors via Multistep Inverse Dynamics' And The Accompanying Microsoft Article. All Credit For This Research Goes To The Researchers Of This Paper 👏👏👏


Reinforcement learning (RL) is a machine learning training approach that rewards desirable behavior and penalizes undesirable behavior. An RL agent can perceive and understand its environment, take actions, and learn through trial and error. Although RL agents can solve some problems heuristically, such as helping a robot navigate to a specific location in a given environment, there is no guarantee that they can handle situations they have not yet encountered. Essential to their success is the ability to recognize the robot and the obstacles in its path while disregarding changes in the environment that occur independently of the agent, which we call exogenous noise.

Existing RL algorithms are not powerful enough to deal effectively with exogenous noise. They are either unable to solve problems involving complicated observations or require an impractical amount of training data to succeed. They also often lack the mathematical guarantees needed for reliable exploration of new situations; such a guarantee is desirable because the cost of a failure in the real world can be substantial. To address these problems, a team of Microsoft researchers introduced the Path Predictive Elimination (PPE) algorithm (in their paper, “Provable RL with Exogenous Distractors via Multistep Inverse Dynamics”), which provides mathematical guarantees even in the presence of severe distractors.

In a general RL model, the agent, or decision maker, has an action space with A actions and receives information about the world in the form of observations. After performing an action, the agent learns more about its environment and receives a reward. The agent’s objective is to maximize the total reward. A real-world RL model must cope with large observation spaces and complex observations. Substantial prior research assumes that an observation in an RL environment is generated from a considerably more compact but hidden endogenous state. In their study, the researchers assume that the dynamics of the endogenous state are near-deterministic: taking a fixed action in a given endogenous state almost always leads to the same next endogenous state.
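To make this setup concrete, here is a minimal sketch in Python of an environment whose observation mixes a deterministic endogenous state with exogenous noise. The environment, its dynamics, and all names here are hypothetical illustrations for this article, not the benchmarks used in the paper.

```python
import random

class ExoNoiseEnv:
    """Toy environment: the observation mixes a small hidden endogenous
    state (the agent's position on a chain) with exogenous noise that
    evolves independently of the agent. Hypothetical illustration only."""

    def __init__(self, grid_size=5, seed=0):
        self.rng = random.Random(seed)
        self.grid_size = grid_size
        self.state = 0                    # endogenous state: agent position
        self.noise = self.rng.random()    # exogenous distractor

    def step(self, action):
        # Endogenous dynamics are deterministic: the same action in the
        # same state always yields the same next state.
        if action == 1:
            self.state = min(self.state + 1, self.grid_size - 1)
        else:
            self.state = max(self.state - 1, 0)
        # Exogenous noise evolves on its own, unaffected by the action.
        self.noise = self.rng.random()
        reward = 1.0 if self.state == self.grid_size - 1 else 0.0
        # The agent only ever sees the two mixed together.
        observation = (self.state, self.noise)
        return observation, reward

env = ExoNoiseEnv()
total = 0.0
for _ in range(5):
    obs, r = env.step(1)   # always move right
    total += r
print(obs[0], total)       # → 4 2.0 (reaches and stays at the goal state)
```

The point of the toy is that a naive agent modeling the full observation would waste capacity predicting the noise component, while only the first component of the observation matters for reward.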

PPE is a simple and fast algorithm that uses a self-supervised “hidden state decoding” approach: the agent trains a machine learning model called a decoder to extract the hidden endogenous state from an observation. PPE works by learning a minimal set of paths, one leading to each reachable endogenous state. Since considering every possible path would quickly overwhelm the agent, PPE eliminates redundant paths that visit the same endogenous state by solving a novel self-supervised classification problem. PPE operates similarly to breadth-first search: the agent learns to explore every endogenous state that can be reached by performing a fixed number of actions. In their study, the team also examined several ways to optimize the reward, both model-free and model-based.
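The path-elimination idea can be sketched as follows. In the actual algorithm, redundancy is detected by training a self-supervised classifier that tries to tell apart observations produced by different paths; in this toy sketch (the chain environment and all names are hypothetical), the true deterministic endogenous dynamics stand in for that learned test.

```python
def transition(state, action):
    """Deterministic endogenous dynamics on a 4-state chain (toy example)."""
    step = 1 if action == 1 else -1
    return max(0, min(3, state + step))

def endogenous_state_after(path, start=0):
    """Roll an action sequence forward through the endogenous dynamics,
    ignoring exogenous noise entirely."""
    s = start
    for a in path:
        s = transition(s, a)
    return s

def eliminate_redundant_paths(paths):
    """PPE-style elimination: keep only one path per reachable endogenous
    state. The paper learns this grouping with a self-supervised classifier
    over noisy observations; here the true dynamics stand in for it."""
    kept = {}
    for path in paths:
        s = endogenous_state_after(path)
        kept.setdefault(s, path)  # later paths to the same state are redundant
    return list(kept.values())

# Breadth-first growth: extend each surviving path by one action, then prune.
paths = [()]
for _ in range(3):
    paths = eliminate_redundant_paths([p + (a,) for p in paths for a in (0, 1)])
print(len(paths))  # → 4: one surviving path per reachable endogenous state
```

Without elimination the number of candidate paths would double at every depth; pruning keeps it bounded by the number of endogenous states, which is what makes the breadth-first exploration tractable.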


Although PPE is a significant advance in reinforcement learning because it provides mathematical guarantees in the presence of exogenous noise, there is still room for improvement before every RL problem involving exogenous noise can be solved. Open questions remain about PPE’s performance on real-world problems, the amount of training data needed to achieve high accuracy, and the assumptions the algorithm makes. Reinforcement learning is paving the way for a better future in fields ranging from automation, healthcare, and robotics to finance. Exogenous noise, however, poses a significant obstacle to realizing the full potential of RL agents. The researchers anticipate that the introduction of PPE will serve as a springboard for further research on RL in the presence of exogenous noise.



James G. Williams