Towards Modular LLMs by Building and Reusing a Library of LoRAs
Main Authors | , , , , , , , |
---|---|
Format | Journal Article |
Language | English |
Published | 17.05.2024 |
Summary: The growing number of parameter-efficient adaptations of a base large language model (LLM) calls for studying whether such trained adapters can be reused to improve performance on new tasks. We study how best to build a library of adapters given multi-task data and devise techniques for both zero-shot and supervised task generalization through routing in such a library. We benchmark existing approaches for building this library and introduce model-based clustering (MBC), a method that groups tasks based on the similarity of their adapter parameters, indirectly optimizing for transfer across the multi-task dataset. To reuse the library, we present a novel zero-shot routing mechanism, Arrow, which enables dynamic selection of the most relevant adapters for new inputs without retraining. We experiment with several LLMs, such as Phi-2 and Mistral, on a wide array of held-out tasks, verifying that MBC-based adapters and Arrow routing lead to superior generalization to new tasks. We take steps towards creating modular, adaptable LLMs that can match or outperform traditional joint training.
DOI: 10.48550/arxiv.2405.11157
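
To make the MBC idea from the summary concrete, the sketch below clusters tasks by the cosine similarity of their flattened LoRA parameters using k-means. This is a minimal illustration under assumed inputs (a dict of per-task LoRA weight arrays sharing one adapter architecture, and a chosen cluster count), not the authors' released implementation; the names `flatten_lora` and `mbc_cluster` are hypothetical.

```python
# Minimal MBC-style sketch: group tasks by the similarity of their
# trained LoRA adapter parameters. Illustrative, not the paper's code.
import numpy as np
from sklearn.cluster import KMeans

def flatten_lora(lora_weights):
    """Concatenate every LoRA matrix (A and B for each adapted layer) into one vector."""
    return np.concatenate([w.ravel() for w in lora_weights])

def mbc_cluster(task_loras, n_clusters):
    """Cluster tasks by cosine similarity of their flattened adapter parameters.

    task_loras: dict mapping task name -> list of LoRA weight arrays
    (all tasks must share the same adapter shapes).
    Returns: dict mapping cluster id -> list of task names.
    """
    names = list(task_loras)
    vecs = np.stack([flatten_lora(task_loras[t]) for t in names])
    # L2-normalize so Euclidean k-means approximates clustering
    # by cosine similarity of the adapter parameters.
    vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(vecs)
    clusters = {}
    for name, label in zip(names, labels):
        clusters.setdefault(int(label), []).append(name)
    return clusters
```

One adapter would then be trained per cluster on the pooled data of its tasks, yielding a compact library whose entries transfer across related tasks.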
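The zero-shot Arrow routing mentioned in the summary can be sketched in a similar spirit: each adapter's low-rank update B @ A is summarized by its top right singular vector, and a token's hidden state is routed to the adapters whose direction it aligns with most. The shapes, the top-k mixing, and the function names below are illustrative assumptions, not the paper's exact procedure.

```python
# Arrow-style routing sketch: pick adapters per input without retraining.
import numpy as np

def arrow_prototype(A, B):
    """Top right singular vector of the low-rank update delta_W = B @ A.

    A: (r, d_in) and B: (d_out, r) LoRA factors. The returned vector is the
    input-space direction this adapter amplifies the most.
    """
    _, _, vt = np.linalg.svd(B @ A, full_matrices=False)
    return vt[0]

def arrow_route(h, prototypes, top_k=2):
    """Score adapters by |h . v| and softmax over the top_k winners.

    h: (d_in,) hidden state of one token.
    prototypes: (n_adapters, d_in) stacked outputs of arrow_prototype.
    The absolute value is used because singular vectors have arbitrary sign.
    Returns (indices, weights) for mixing the selected adapters' outputs.
    """
    scores = np.abs(prototypes @ h)
    idx = np.argsort(scores)[-top_k:]
    logits = scores[idx]
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    return idx, weights
```

At inference, each token's output would be the base layer's output plus the weighted sum of the selected adapters' updates, which is what makes the routing dynamic per input rather than fixed per task.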