Synthesizing Post-Training Data for LLMs through Multi-Agent Simulation


Bibliographic Details
Main Authors: Tang, Shuo; Pang, Xianghe; Liu, Zexi; Tang, Bohan; Ye, Rui; Dong, Xiaowen; Wang, Yanfeng; Chen, Siheng
Format: Journal Article
Language: English
Published: 18.10.2024

More Information
Summary: Post-training is essential for enabling large language models (LLMs) to follow human instructions. Inspired by the recent success of using LLMs to simulate human society, we leverage multi-agent simulation to automatically generate diverse text-based scenarios that capture a wide range of real-world human needs. We propose MATRIX, a multi-agent simulator that creates realistic and scalable scenarios. Building on these outputs, we introduce MATRIX-Gen, a novel scenario-driven instruction generator for controllable and highly realistic data synthesis. Extensive experiments demonstrate that our framework effectively generates both general and domain-specific data. Notably, on the AlpacaEval 2 and Arena-Hard benchmarks, Llama-3-8B-Base post-trained on just 20K instruction-response pairs synthesized by MATRIX-Gen outperforms Meta's Llama-3-8B-Instruct model, which was trained on over 10M pairs; see our project at https://github.com/ShuoTang123/MATRIX-Gen.
DOI: 10.48550/arxiv.2410.14251