Princeton and Google AI researchers propose ReAct: an effective artificial intelligence method to synergize reasoning and action in large language models

Although large language models (LLMs) have shown impressive performance on tasks involving language understanding and interactive decision-making, their abilities for reasoning (such as chain-of-thought prompting) and for acting (such as generating action plans) have mostly been studied as separate topics. When pre-trained language models are used to act in interactive environments (such as text games or web navigation), recent work has focused on mapping textual contexts to textual actions using the model's internal knowledge. Conversely, with chain-of-thought prompting, a model generates reasoning from its own internal representations and is not grounded in the external world. This limits its ability to explore, reason, or update its knowledge in response to events.

In their most recent project, researchers from Princeton University and Google Research investigated using LLMs to generate reasoning traces and task-specific actions in an interleaved manner. In their paper, "ReAct: Synergizing Reasoning and Acting in Language Models", they propose a general paradigm that lets language models handle a variety of language reasoning and decision-making tasks. ReAct prompts language models to generate both verbal reasoning traces and textual actions. The researchers show that this Reason + Act (ReAct) paradigm consistently outperforms reason-only and act-only paradigms, both when prompting larger language models and when fine-tuning smaller ones, while also improving human interpretability and trustworthiness.
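One way to picture the difference between the three paradigms is by which lines each one puts into the prompt: ReAct keeps thoughts, actions, and observations; act-only drops the thoughts; reason-only keeps only the thoughts and a final answer. The sketch below renders the same trajectory in all three styles. The `Step` class, the `render` function, and the `search[...]`/`finish[...]` action strings are illustrative assumptions, not the paper's code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    """One step of a trajectory; which fields are shown depends on the paradigm."""
    thought: Optional[str] = None      # free-form reasoning
    action: Optional[str] = None       # e.g. "search[Apple Remote]" (hypothetical format)
    observation: Optional[str] = None  # environment feedback returned after an action


def render(question: str, steps: List[Step], mode: str) -> str:
    """Render a trajectory as prompt text for mode 'react', 'act', or 'reason'."""
    if mode not in {"react", "act", "reason"}:
        raise ValueError(f"unknown mode: {mode}")
    lines = [f"Question: {question}"]
    for i, s in enumerate(steps, start=1):
        # Thoughts appear in ReAct and reason-only prompts.
        if mode in {"react", "reason"} and s.thought:
            lines.append(f"Thought {i}: {s.thought}")
        # Actions and observations appear in ReAct and act-only prompts.
        if mode in {"react", "act"} and s.action:
            lines.append(f"Action {i}: {s.action}")
            if s.observation:
                lines.append(f"Observation {i}: {s.observation}")
        # Reason-only prompting makes no tool calls; it just states a final answer.
        is_finish = bool(s.action) and s.action.startswith("finish[") and s.action.endswith("]")
        if mode == "reason" and is_finish:
            lines.append(f"Answer: {s.action[len('finish['):-1]}")
    return "\n".join(lines)


steps = [
    Step("I need to find what the Apple Remote controls.", "search[Apple Remote]",
         "The Apple Remote was designed to control Front Row."),
    Step("It controls Front Row, so that is the answer.", "finish[Front Row]"),
]
for mode in ("react", "act", "reason"):
    print(f"--- {mode} ---")
    print(render("What does the Apple Remote control?", steps, mode))
```

Keeping one trajectory and varying only the rendering makes it easy to see that ReAct is a superset of the other two formats rather than a different kind of data.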

In the ReAct prompting setup, a frozen PaLM-540B language model is prompted with a few in-context examples to generate domain-specific actions for solving the task (such as "search" in question answering and "go to" in room navigation). For tasks where reasoning is central, reasoning traces and actions are generated alternately, so the task-solving trajectory consists of multiple reasoning-action-observation steps. By contrast, in decision-making tasks that may involve a large number of actions, reasoning traces only need to appear sparsely, at the most relevant points of a trajectory. In that case, the prompts are written with sparse reasoning, and the language model itself decides, asynchronously, when reasoning traces and actions occur. The team also investigated fine-tuning smaller language models on ReAct-formatted trajectories. ReAct-prompted PaLM-540B was used to generate trajectories, and the successful ones were then used to fine-tune smaller models (PaLM-8B and PaLM-62B), reducing the need for extensive human annotation.
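The alternating reasoning-action-observation loop, plus the filtering of successful trajectories for fine-tuning, can be sketched as below. This is a minimal illustration under stated assumptions: `scripted_llm` is a hypothetical stand-in for the frozen PaLM-540B model that just returns canned completions, the `Action <n>: name[argument]` syntax and the toy `search` tool are assumptions, and `react_loop` and `successful_trajectories` are hypothetical names rather than the authors' code.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical action syntax: "Action <n>: name[argument]".
ACTION_RE = re.compile(r"Action \d+: (\w+)\[(.*)\]")


def react_loop(question: str,
               llm: Callable[[str], str],
               tools: Dict[str, Callable[[str], str]],
               max_steps: int = 5) -> Tuple[Optional[str], str]:
    """Alternate Thought -> Action -> Observation until finish[...] or the step budget runs out."""
    prompt = f"Question: {question}\n"
    for i in range(1, max_steps + 1):
        # The model continues the prompt with a thought, a newline, and one action line.
        completion = llm(prompt + f"Thought {i}:")
        thought, _, action_line = completion.partition("\n")
        action_line = action_line.strip()
        prompt += f"Thought {i}:{thought}\n{action_line}\n"
        match = ACTION_RE.fullmatch(action_line)
        if match is None:
            prompt += f"Observation {i}: Invalid action.\n"
            continue
        name, arg = match.groups()
        if name == "finish":
            return arg, prompt
        tool = tools.get(name)
        observation = tool(arg) if tool else f"Unknown action {name}."
        # The observation comes from the environment, not from the model.
        prompt += f"Observation {i}: {observation}\n"
    return None, prompt


def successful_trajectories(episodes: List[Tuple[Optional[str], str, str]]) -> List[str]:
    """Keep trajectories whose predicted answer matches the gold answer (fine-tuning data)."""
    return [traj for predicted, gold, traj in episodes if predicted == gold]


def scripted_llm(completions: List[str]) -> Callable[[str], str]:
    """Hypothetical stand-in for the frozen LLM: returns canned completions in order."""
    it = iter(completions)
    return lambda prompt: next(it)


llm = scripted_llm([
    " I need the capital of France.\nAction 1: search[France]",
    " The observation says Paris.\nAction 2: finish[Paris]",
])
tools = {"search": lambda q: {"France": "Paris is the capital of France."}.get(q, "No results.")}
answer, trajectory = react_loop("What is the capital of France?", llm, tools)
print(trajectory)
print(successful_trajectories([(answer, "Paris", trajectory)]) == [trajectory])
```

Only the thoughts and actions are model-generated; each observation is appended by the surrounding program, which is what grounds the next thought in external information.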

For evaluation, ReAct was compared with state-of-the-art baselines on four benchmarks: question answering (HotpotQA), fact verification (FEVER), text-based games (ALFWorld), and web navigation (WebShop). On HotpotQA and FEVER, by interacting with a simple Wikipedia API, the model overcomes the common problems of hallucination and error propagation in chain-of-thought reasoning and produces human-like task-solving trajectories that are more interpretable than those of baselines without reasoning traces. On the two interactive decision-making benchmarks, ALFWorld and WebShop, ReAct outperforms imitation and reinforcement learning methods by absolute success-rate margins of 34% and 10%, respectively, while being prompted with only one or two in-context examples.
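The ReAct paper describes its Wikipedia API as three actions: `search[entity]`, which returns the first sentences of the matching page or suggests similar titles; `lookup[string]`, which returns the next sentence on the current page containing the string, much like Ctrl+F; and `finish[answer]`, which ends the episode. The class below is a toy in-memory stand-in written under that assumption; `ToyWikiEnv`, its page data, and its substring-based similarity rule are hypothetical and not the real Wikipedia API.

```python
import re
from typing import Dict, List, Optional, Tuple


class ToyWikiEnv:
    """In-memory stand-in for a Wikipedia API with search/lookup/finish actions (hypothetical)."""

    def __init__(self, pages: Dict[str, str]):
        self.pages = pages
        self.sentences: List[str] = []        # sentences of the page found by the last search
        self.lookup_key: Optional[str] = None
        self.lookup_hits: List[str] = []
        self.lookup_idx = 0

    def search(self, entity: str) -> str:
        text = self.pages.get(entity)
        if text is None:
            # Crude similarity: titles containing the query (case-insensitive), at most 5.
            similar = [t for t in self.pages if entity.lower() in t.lower()][:5]
            return f"Could not find [{entity}]. Similar: {similar}"
        self.sentences = [s for s in re.split(r"(?<=\.)\s+", text) if s]
        self.lookup_key = None  # a new page resets any ongoing lookup
        return " ".join(self.sentences[:5])

    def lookup(self, keyword: str) -> str:
        # Like Ctrl+F: repeating a lookup of the same keyword returns the next match.
        if keyword != self.lookup_key:
            self.lookup_key = keyword
            self.lookup_hits = [s for s in self.sentences if keyword.lower() in s.lower()]
            self.lookup_idx = 0
        if self.lookup_idx >= len(self.lookup_hits):
            return "No more results."
        hit = self.lookup_hits[self.lookup_idx]
        self.lookup_idx += 1
        return f"(Result {self.lookup_idx} / {len(self.lookup_hits)}) {hit}"

    def step(self, action: str) -> Tuple[str, bool]:
        """Execute one action string; returns (observation or final answer, done)."""
        m = re.fullmatch(r"(search|lookup|finish)\[(.*)\]", action.strip())
        if m is None:
            return f"Invalid action: {action}", False
        name, arg = m.groups()
        if name == "finish":
            return arg, True
        return getattr(self, name)(arg), False


env = ToyWikiEnv({"Apple Remote": "The Apple Remote is a remote control. "
                                  "It was designed to control Front Row. "
                                  "Front Row is a media center program."})
print(env.step("search[Apple Remote]"))
print(env.step("lookup[Front Row]"))
print(env.step("lookup[Front Row]"))
print(env.step("finish[Front Row]"))
```

Keeping the action space this small is the point: the model has to reason about what to search and look up next, and every observation it receives is retrieved text rather than something it generated itself.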

The research team also explored human-in-the-loop interaction with the system by letting a human inspector edit ReAct's reasoning traces. They show that ReAct can change its behavior in line with the inspector's revisions and complete a task successfully when a hallucinated thought is simply replaced with the inspector's hint. Because such a correction only requires manually editing a few thoughts, ReAct makes human intervention easy and opens up new possibilities for human-machine collaboration.
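The inspector's intervention amounts to editing one thought and letting the model continue from there. The sketch below assumes a trajectory stored as `(kind, text)` pairs and a truncate-then-regenerate policy; both the representation and the helper names `edit_thought` and `to_prompt` are hypothetical illustrations, not the paper's implementation.

```python
from typing import List, Tuple

# A trajectory as (kind, text) pairs; kind is "Thought", "Action", or "Observation".
Line = Tuple[str, str]


def edit_thought(trajectory: List[Line], index: int, new_thought: str) -> List[Line]:
    """Replace the thought at `index` with a human's revision and drop everything after it.

    Later steps were conditioned on the old thought, so they are discarded; the model
    would then regenerate the rest of the trajectory from the edited prefix.
    """
    kind, _ = trajectory[index]
    if kind != "Thought":
        raise ValueError(f"line {index} is an {kind}, not a Thought")
    return trajectory[:index] + [("Thought", new_thought)]


def to_prompt(question: str, trajectory: List[Line]) -> str:
    """Serialize the (edited) prefix so it can be fed back to the model."""
    return "\n".join([f"Question: {question}"] + [f"{kind}: {text}" for kind, text in trajectory])


trajectory = [
    ("Thought", "I should search for the Apple Remote."),
    ("Action", "search[Apple Remote]"),
    ("Observation", "The Apple Remote was designed to control Front Row."),
    ("Thought", "The Apple Remote also controls the keyboard."),  # hallucinated
    ("Action", "finish[keyboard]"),
]
fixed = edit_thought(trajectory, 3, "The page says it controls Front Row; I should answer that.")
print(to_prompt("What does the Apple Remote control?", fixed))
```

The original list is left untouched, so the pre-edit and post-edit trajectories can be compared side by side.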

In short, ReAct is a simple yet effective method for combining reasoning and acting in language models. It shows that thoughts, actions, and feedback from the environment can all be expressed within a language model, yielding a versatile agent that can solve problems requiring interaction with an environment. Across experiments on multi-hop question answering, fact checking, and interactive decision-making, ReAct achieves better performance along with interpretable decision traces. Google plans to continue developing ReAct, building on the large potential of language models to tackle more complex embodied tasks, for example through large-scale multitask training and by pairing ReAct with strong reward models.

All credit for this research goes to the researchers on this project.

Khushboo Gupta is an intern consultant at MarktechPost. She is currently pursuing her B.Tech from Indian Institute of Technology (IIT), Goa. She is passionate about the fields of machine learning, natural language processing and web development. She likes to learn more about the technical field by participating in several challenges.

James G. Williams