ProPD: Dynamic Token Tree Pruning and Generation for LLM Parallel Decoding

Recent advances in generative large language models (LLMs) have significantly improved performance on natural language processing tasks. However, their efficiency is hampered by the inherent limitations of autoregressive token generation. While parallel decoding with token tree verification,...

Bibliographic Details
Published in	arXiv.org
Main Authors	Zhong, Shuzhang; Yang, Zebin; Li, Meng; Gong, Ruihao; Wang, Runsheng; Huang, Ru
Format	Paper
Language	English
Published	Ithaca: Cornell University Library, arXiv.org, 21.02.2024
Subjects	Batch processing; Efficiency; Large language models; Natural language processing; Parallel processing; Tree generating algorithms; Verification
