Dynamic Depth Decoding: Faster Speculative Decoding for LLMs
Main Authors | , , , , |
---|---|
Format | Journal Article |
Language | English |
Published | 29.08.2024 |
Summary: | The acceleration of Large Language Models (LLMs) with speculative decoding provides a significant runtime improvement without any loss of accuracy. Currently, EAGLE-2 is the state-of-the-art speculative decoding method, improving on EAGLE with a dynamic draft tree. We introduce Dynamic Depth Decoding (DDD), which optimises EAGLE-2's tree drafting method using a dynamic depth. This extends the average speedup that EAGLE-2 achieves over EAGLE by $44\%$, giving DDD an average speedup of $3.16$x. |
---|---|
DOI: | 10.48550/arxiv.2409.00142 |