ProPD: Dynamic Token Tree Pruning and Generation for LLM Parallel Decoding

Recent advancements in generative large language models (LLMs) have significantly boosted the performance in natural language processing tasks. However, their efficiency is hampered by the inherent limitations in autoregressive token generation. While parallel decoding with token tree verification,...

Bibliographic Details
Published in arXiv.org
Main Authors Zhong, Shuzhang; Yang, Zebin; Li, Meng; Gong, Ruihao; Wang, Runsheng; Huang, Ru
Format Paper
Language English
Published Ithaca: Cornell University Library, arXiv.org, 21.02.2024