Synthesizing Post-Training Data for LLMs through Multi-Agent Simulation
Main Authors | , , , , , , , |
---|---|
Format | Journal Article |
Language | English |
Published | 18.10.2024 |
Summary: | Post-training is essential for enabling large language models (LLMs) to follow human instructions. Inspired by the recent success of using LLMs to simulate human society, we leverage multi-agent simulation to automatically generate diverse text-based scenarios, capturing a wide range of real-world human needs. We propose MATRIX, a multi-agent simulator that creates realistic and scalable scenarios. Leveraging these outputs, we introduce a novel scenario-driven instruction generator MATRIX-Gen for controllable and highly realistic data synthesis. Extensive experiments demonstrate that our framework effectively generates both general and domain-specific data. Notably, on AlpacaEval 2 and Arena-Hard benchmarks, Llama-3-8B-Base, post-trained on datasets synthesized by MATRIX-Gen with just 20K instruction-response pairs, outperforms Meta's Llama-3-8B-Instruct model, which was trained on over 10M pairs; see our project at https://github.com/ShuoTang123/MATRIX-Gen. |
---|---|
DOI: | 10.48550/arxiv.2410.14251 |