Promptagator: Few-shot Dense Retrieval From 8 Examples
Main Authors | |
---|---|
Format | Journal Article |
Language | English |
Published | 23.09.2022 |
Subjects | |
Online Access | Get full text |
Summary: Much recent research on information retrieval has focused on how to transfer
from one task (typically with abundant supervised data) to various other tasks
where supervision is limited, with the implicit assumption that it is possible
to generalize from one task to all the rest. However, this overlooks the fact
that there are many diverse and unique retrieval tasks, each targeting
different search intents, queries, and search domains. In this paper, we
suggest working on Few-shot Dense Retrieval, a setting where each task comes
with a short description and a few examples. To amplify the power of a few
examples, we propose Prompt-based Query Generation for Retriever (Promptagator),
which leverages large language models (LLMs) as few-shot query generators and
creates task-specific retrievers based on the generated data. Powered by the
LLMs' generalization ability, Promptagator makes it possible to create
task-specific end-to-end retrievers from only a few examples, without using
Natural Questions or MS MARCO to train question generators or dual encoders.
Surprisingly, LLM prompting with no more than 8 examples allows dual encoders
to outperform heavily engineered models trained on MS MARCO, such as ColBERT v2,
by more than 1.2 nDCG points on average across 11 retrieval sets. Further
training standard-size re-rankers on the same generated data yields another
5.0-point nDCG improvement. Our studies show that query generation can be far
more effective than previously observed, especially when a small amount of
task-specific knowledge is given.
DOI: 10.48550/arxiv.2209.11755
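
The summary above describes the core recipe: prompt an LLM with a task description and no more than 8 examples so that it generates queries for unlabeled documents, then train a task-specific retriever on the resulting synthetic pairs. The sketch below illustrates only that query-generation step, under stated assumptions: the prompt template, the `complete` callable, and all function names are hypothetical placeholders, not the paper's actual templates, models, or API.

```python
# Minimal sketch of Promptagator-style, prompt-based query generation.
# The prompt wording, the `complete` callable, and all names below are
# illustrative assumptions, not the paper's actual templates or models.

from typing import Callable, Iterable, List, Tuple


def build_fewshot_prompt(task_description: str,
                         examples: List[Tuple[str, str]],
                         new_document: str) -> str:
    """Assemble a prompt from at most 8 (document, query) pairs plus a new document."""
    assert len(examples) <= 8, "the setting uses no more than 8 examples per task"
    parts = [task_description.strip(), ""]
    for doc, query in examples:
        parts += [f"Passage: {doc}", f"Query: {query}", ""]
    # The LLM is expected to continue after "Query:" with a task-specific query.
    parts += [f"Passage: {new_document}", "Query:"]
    return "\n".join(parts)


def generate_training_pairs(task_description: str,
                            examples: List[Tuple[str, str]],
                            corpus: Iterable[str],
                            complete: Callable[[str], str]) -> List[Tuple[str, str]]:
    """Produce synthetic (query, document) pairs for training a task-specific retriever.

    `complete` stands in for any LLM text-completion interface; it is a
    placeholder here, not a specific API.
    """
    pairs = []
    for doc in corpus:
        completion = complete(build_fewshot_prompt(task_description, examples, doc))
        lines = completion.strip().splitlines()
        query = lines[0].strip() if lines else ""  # keep the first generated line
        if query:
            pairs.append((query, doc))
    return pairs


if __name__ == "__main__":
    # Toy round trip with a stub "LLM" so the sketch runs end to end.
    fewshot = [("Aspirin is a common over-the-counter pain reliever.",
                "what is aspirin used for")]
    stub_llm = lambda prompt: "does ibuprofen reduce inflammation"
    print(generate_training_pairs("Write a search query for each passage.",
                                  fewshot,
                                  ["Ibuprofen reduces inflammation and fever."],
                                  stub_llm))
```

In the pipeline the summary describes, synthetic pairs like these would then be used to train a task-specific dual-encoder retriever and, from the same generated data, a standard-size re-ranker.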