Standing on the Shoulders of Giant Frozen Language Models

Huge pretrained language models (LMs) have demonstrated surprisingly good zero-shot capabilities on a wide variety of tasks. This gives rise to the appealing vision of a single, versatile model with a wide range of functionalities across disparate applications. However, current leading techniques fo...

Full description

Saved in:

Bibliographic Details
Main Authors	Levine, Yoav, Dalmedigos, Itay, Ram, Ori, Zeldes, Yoel, Jannai, Daniel, Muhlgay, Dor, Osin, Yoni, Lieber, Opher, Lenz, Barak, Shalev-Shwartz, Shai, Shashua, Amnon, Leyton-Brown, Kevin, Shoham, Yoav
Format	Journal Article
Language	English
Published	21.04.2022
Subjects	Computer Science - Artificial Intelligence Computer Science - Computation and Language
Online Access	Get full text

Cover

Loading…

More Information
Summary:	Huge pretrained language models (LMs) have demonstrated surprisingly good zero-shot capabilities on a wide variety of tasks. This gives rise to the appealing vision of a single, versatile model with a wide range of functionalities across disparate applications. However, current leading techniques for leveraging a "frozen" LM -- i.e., leaving its weights untouched -- still often underperform fine-tuning approaches which modify these weights in a task-dependent way. Those, in turn, suffer forgetfulness and compromise versatility, suggesting a tradeoff between performance and versatility. The main message of this paper is that current frozen-model techniques such as prompt tuning are only the tip of the iceberg, and more powerful methods for leveraging frozen LMs can do just as well as fine tuning in challenging domains without sacrificing the underlying model's versatility. To demonstrate this, we introduce three novel methods for leveraging frozen models: input-dependent prompt tuning, frozen readers, and recursive LMs, each of which vastly improves on current frozen-model approaches. Indeed, some of our methods even outperform fine-tuning approaches in domains currently dominated by the latter. The computational cost of each method is higher than that of existing frozen model methods, but still negligible relative to a single pass through a huge frozen LM. Each of these methods constitutes a meaningful contribution in its own right, but by presenting these contributions together we aim to convince the reader of a broader message that goes beyond the details of any given method: that frozen models have untapped potential and that fine-tuning is often unnecessary.
DOI:	10.48550/arxiv.2204.10019