Meta releases language model code to AI researchers • The Register
Meta will release a giant language model for academics, in hopes that a better understanding of how these systems work can make them less toxic and biased.
The Open Pretrained Transformer (OPT-175B) has 175 billion parameters, matching commercial language models like OpenAI’s GPT-3. Systems of this kind have given developers capabilities such as automated writing, content moderation, and even code generation. But they can also generate biased, toxic, and inaccurate text, which makes them risky to use.
Meta knows this all too well from some of the human-generated text it struggles to deal with.
Proprietary tools are often beyond the reach of academic researchers who wish to study the technology’s problems, both in terms of accessing a model’s underlying code and having the resources to build and train their own language models. Meta’s latest code release, however, can help them investigate these systems in more detail.
“We share Open Pretrained Transformer, a language model with 175 billion parameters trained on publicly available datasets, to enable greater community engagement in understanding this fundamental new technology,” researchers at the social media company said Tuesday. “For the first time for a language technology system of this size, the release includes both the pre-trained models and the code needed to train and use them.”
Meta has also released smaller versions of the full model, with up to 66 billion parameters, that anyone can use. However, the complete and larger OPT-175B system is only available to researchers upon request for non-commercial applications. It was trained using 992 Nvidia 80GB A100 GPUs, achieving a performance of 147 TFLOPS per chip. Researchers won’t need to build the model and train it from scratch, as Meta provides them with the code to deploy it on 16 Nvidia V100 GPUs.
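To put those hardware figures in perspective, here is a rough back-of-the-envelope sketch in Python. The GPU counts, per-chip throughput, and parameter count come from the figures above. Everything else is an assumption made for illustration: that weights are stored in 16-bit precision (2 bytes per parameter), that they are split evenly across the 16 GPUs, and that memory for activations and other runtime overhead is ignored. The function names are hypothetical.

```python
def aggregate_pflops(num_gpus: int, tflops_per_gpu: float) -> float:
    """Total training throughput in PFLOPS, assuming every GPU sustains the same rate."""
    return num_gpus * tflops_per_gpu / 1000.0


def weight_gb_per_gpu(num_params: float, bytes_per_param: int, num_gpus: int) -> float:
    """Weight memory per GPU in GB (1 GB = 1e9 bytes), assuming an even split.

    Counts model weights only; activations and runtime overhead are ignored.
    """
    return num_params * bytes_per_param / 1e9 / num_gpus


if __name__ == "__main__":
    # Figures reported in the article: 992 A100s at 147 TFLOPS each.
    print(f"Aggregate throughput: {aggregate_pflops(992, 147):.1f} PFLOPS")
    # Assumption: 16-bit weights (2 bytes each), sharded evenly over 16 V100s.
    print(f"Weights per GPU: {weight_gb_per_gpu(175e9, 2, 16):.1f} GB")
```

Under those assumptions the training cluster works out to roughly 146 PFLOPS in aggregate, and the weights alone come to about 350 GB, or about 22 GB per GPU across the 16-card deployment. The article does not say which memory variant of the V100 is used, so the per-GPU figure should be read as a lower bound on what each card needs, not a statement about Meta’s actual setup.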
Training such large models is tricky. Meta’s research team said they encountered many failures and had to restart the entire process 35 times over a two-month period, according to a paper [PDF] on arXiv.
A Meta spokesperson told The Register that the release of OPT-175B will help scholars replicate the results of large language model (LLM) papers.
“It is important to improve transparency and openness around research at scale so that the future we build with this technology is more equitable and just. The future of LLM work cannot solely live in the hands of those with financial interests in keeping this research behind closed doors,” the spokesperson said.