AI researchers from AI21 Labs present three new approaches for specializing frozen language models to multiple tasks

This Article Is Based On The Research 'STANDING ON THE SHOULDERS OF GIANT FROZEN LANGUAGE MODELS'. All Credit For This Research Goes To The Researchers of This Project 👏👏👏


Language models are used in various NLP tasks to predict how likely a given sequence of words is to appear in a sentence. These models use statistical and probabilistic techniques to make these predictions.
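As a toy illustration, the snippet below scores a sentence with a pretrained causal language model. It is a minimal sketch that assumes the Hugging Face transformers library and the small GPT-2 checkpoint, neither of which is tied to the paper.

```python
# Minimal sketch: scoring a sentence with a pretrained causal LM (GPT-2 via Hugging Face).
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

sentence = "Language models assign probabilities to word sequences."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    # When labels == input_ids, the model returns the average
    # negative log-likelihood of the sequence as `loss`.
    outputs = model(**inputs, labels=inputs["input_ids"])

avg_nll = outputs.loss.item()
print(f"Average token negative log-likelihood: {avg_nll:.3f}")
```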

Language models trained on huge datasets have been proven to deliver excellent results on many NLP tasks. Many current methods take a so-called "frozen" approach, which leaves the model's weights unchanged. However, these methods still tend to perform worse than fine-tuning approaches, which modify the weights in a task-dependent manner.
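In practice, the "frozen" approach amounts to disabling gradient updates for the pretrained weights, so that only task-specific parameters added around the model would be trained. A minimal PyTorch sketch, again assuming the transformers library and GPT-2 as a stand-in backbone:

```python
# Minimal sketch: "freezing" a pretrained backbone in PyTorch by disabling gradients.
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2")

for param in model.parameters():
    param.requires_grad = False  # the backbone's weights stay unchanged

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable parameters: {trainable} / {total}")  # 0 of ~124M
```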

The AI21 Labs team developed three novel strategies for learning small neural modules that specialize a frozen language model for distinct tasks. Their paper, "Standing on the Shoulders of Giant Frozen Language Models", discusses input-dependent prompt tuning, frozen readers, and recursive LMs. These compute-efficient techniques outperform traditional frozen-model methods and challenge fine-tuned performance without sacrificing the model's versatility.
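To give a rough sense of the first of these ideas, the sketch below wires a small trainable module that generates input-dependent soft prompts and prepends them to the input embeddings of a frozen GPT-2. The module's architecture and sizes are illustrative assumptions, not the paper's actual design.

```python
# Hedged sketch of the idea behind input-dependent prompt tuning: a small trainable
# network maps each input to a set of "soft prompt" vectors that are prepended to the
# frozen LM's input embeddings. Module shape and sizes are illustrative only.
import torch
import torch.nn as nn
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
backbone = GPT2LMHeadModel.from_pretrained("gpt2")
for p in backbone.parameters():
    p.requires_grad = False  # frozen LM

d_model = backbone.config.n_embd
prompt_len = 8

class PromptGenerator(nn.Module):
    """Tiny trainable module: pools the input embeddings and emits prompt vectors."""
    def __init__(self, d_model, prompt_len):
        super().__init__()
        self.prompt_len = prompt_len
        self.net = nn.Sequential(
            nn.Linear(d_model, d_model),
            nn.Tanh(),
            nn.Linear(d_model, prompt_len * d_model),
        )

    def forward(self, token_embeds):
        pooled = token_embeds.mean(dim=1)          # (batch, d_model)
        prompts = self.net(pooled)                 # (batch, prompt_len * d_model)
        return prompts.view(-1, self.prompt_len, token_embeds.size(-1))

prompt_gen = PromptGenerator(d_model, prompt_len)  # only these weights would be trained

inputs = tokenizer("What is the capital of France?", return_tensors="pt")
token_embeds = backbone.transformer.wte(inputs["input_ids"])  # (1, seq, d_model)
soft_prompts = prompt_gen(token_embeds)                       # (1, prompt_len, d_model)
full_embeds = torch.cat([soft_prompts, token_embeds], dim=1)

outputs = backbone(inputs_embeds=full_embeds)
print(outputs.logits.shape)  # (1, prompt_len + seq_len, vocab_size)
```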

Many previously published strategies improve performance on specific tasks by training a small number of parameters around a fixed model. Although these strategies achieve fine-tuning performance for some applications, peak performance in many practical scenarios still relies on fine-tuned models.
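A minimal sketch of this parameter-efficient pattern follows: only a small task head placed on top of a frozen backbone is optimized, while the pretrained weights stay untouched. The head, task, and hyperparameters are illustrative assumptions.

```python
# Minimal sketch of parameter-efficient training around a frozen backbone: only a
# small task head is optimized; the pretrained weights are never updated.
import torch
import torch.nn as nn
from transformers import GPT2Model, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
backbone = GPT2Model.from_pretrained("gpt2")
for p in backbone.parameters():
    p.requires_grad = False

head = nn.Linear(backbone.config.n_embd, 2)                 # e.g. a toy 2-way classifier
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)   # optimizes only the head

inputs = tokenizer("A frozen backbone with a small trainable head.", return_tensors="pt")
with torch.no_grad():
    hidden = backbone(**inputs).last_hidden_state            # (1, seq, d_model)

logits = head(hidden[:, -1, :])                               # classify from the last token
loss = nn.functional.cross_entropy(logits, torch.tensor([1]))
loss.backward()
optimizer.step()
print(f"toy loss: {loss.item():.3f}")
```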

The researchers believe that versatile natural language interfaces can be built on top of frozen LMs. To show this, they design more ambitious external scaffolds that extract more from a frozen LM. The key insight is that existing frozen-LM techniques are so small that they can be expanded considerably at low cost relative to a single pass through the massive LM.

The team focuses on two settings in which fine-tuned models are still the gold standard.

  1. Massive multitasking: asking a single model to handle many NLP tasks simultaneously. Existing multitask models are all fine-tuned; no frozen-model method has previously been considered in this setting.
  2. Challenging individual tasks, for which the state-of-the-art methods are all fine-tuned. This includes open-domain question answering, in which a model is asked to answer general-knowledge questions (a toy sketch of this setting follows the list).
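As a toy illustration of the open-domain QA setting with a frozen LM, the snippet below places a hard-coded "retrieved" passage in the prompt and lets a frozen GPT-2 generate the answer. A real system would use a retriever, and this is not the paper's frozen-reader recipe.

```python
# Hedged illustration of open-domain QA with a frozen LM: retrieved evidence is
# placed in the prompt and the frozen model generates the answer. The passage is
# hard-coded for brevity; a real system would retrieve it from a corpus.
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()  # weights are never updated

passage = "Paris is the capital and most populous city of France."
question = "What is the capital of France?"
prompt = f"Context: {passage}\nQuestion: {question}\nAnswer:"

inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=5, do_sample=False,
                            pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:]))
```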

The researchers demonstrate that a single frozen LM can compete with current fine-tuning methods in demanding scenarios such as massive multitasking or open-domain question answering. To do this, they use the three frozen-model strategies described above. Their results highlight further advantages of frozen LMs: they avoid the high cost of training and serving many specialized models for different use cases while preserving the LM's versatility, non-forgetfulness, and extensibility.

The proposed approach offers two key advantages over a fine-tuned multitask model:

1. Non-forgetfulness: a fine-tuned LM can suffer from catastrophic forgetting of abilities unrelated to the tasks it was tuned on, even when it is fine-tuned on a broad multitask suite.

A frozen LM, by contrast, never forgets, because its weights are immutable.

2. Extensibility: when a model is fine-tuned on a new task, there is no guarantee that its performance on the original task suite is preserved, so all tasks must be kept in the training mix at the same time, an expensive and impractical procedure. In contrast, when new capabilities are added as external components around a frozen backbone, there is no cross-interference with existing capabilities (sketched below).
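A hedged sketch of this extensibility point: several small task heads (external components) share one frozen backbone, so registering a new head never touches, and therefore cannot degrade, existing capabilities. The task names and heads below are illustrative.

```python
# Hedged sketch: several small task-specific heads share one frozen backbone.
# Adding a new task is just registering another external module; the backbone
# and the existing heads are untouched.
import torch
import torch.nn as nn
from transformers import GPT2Model, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
backbone = GPT2Model.from_pretrained("gpt2")
for p in backbone.parameters():
    p.requires_grad = False   # shared, immutable backbone

d_model = backbone.config.n_embd
task_heads = nn.ModuleDict({
    "sentiment": nn.Linear(d_model, 2),
    "topic": nn.Linear(d_model, 4),
})

# A new capability added later as an external component:
task_heads["spam"] = nn.Linear(d_model, 2)

inputs = tokenizer("Frozen backbone, many small heads.", return_tensors="pt")
with torch.no_grad():
    hidden = backbone(**inputs).last_hidden_state[:, -1, :]

for task, head in task_heads.items():
    print(task, head(hidden).shape)
```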

Article: https://arxiv.org/pdf/2204.10019.pdf

James G. Williams