Model Details

Domain: Language

Task: Translation

Model Access: Open weights (non-commercial)

AI Tools Usage

This model is commonly used behind the scenes in AI tools.

Introduction

We propose Speculative Decoding (SpecDec) to formally study, for the first time, how the idea of speculative execution can be exploited to accelerate autoregressive (AR) decoding. SpecDec has two innovations: Spec-Drafter, an independent model specially optimized for efficient and accurate drafting, and Spec-Verification, a reliable method for efficiently verifying the drafted tokens in the decoding paradigm. Experimental results on various seq2seq tasks, including machine translation and abstractive summarization, show that our approach achieves around a 5× speedup for popular Transformer architectures with generation quality comparable to beam search decoding, revising the impression that the draft-then-verify paradigm yields only a 1.4× to 2× speedup. In addition to the speedup, we also demonstrate 3 additional advantages of SpecDec, revealing its practical value for accelerating generative models in real-world applications. Our models and code are available at https://github.com/hemingkx/SpecDec.
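The draft-then-verify loop described above can be illustrated with a minimal sketch. This is not the authors' implementation: `draft_then_verify`, the `drafter` and `verifier` callables, and all parameter names are hypothetical, and the sketch uses strict greedy matching as the acceptance rule, which may differ from the Spec-Verification criterion. A real system would also verify a whole draft block in one parallel forward pass rather than token by token.

```python
from typing import Callable, List

Token = int
# Hypothetical interface: maps a non-empty token prefix to the next greedy token.
NextToken = Callable[[List[Token]], Token]


def draft_then_verify(
    prefix: List[Token],
    drafter: NextToken,
    verifier: NextToken,
    block_size: int,
    max_len: int,
    eos: Token,
) -> List[Token]:
    """Toy draft-then-verify decoding with strict greedy acceptance.

    Assumes `prefix` is non-empty. The result equals the verifier's own
    greedy output; the drafter only affects how many tokens are accepted
    per iteration.
    """
    out = list(prefix)
    while len(out) < max_len and out[-1] != eos:
        # 1) Draft: the cheap model proposes a block of tokens.
        draft: List[Token] = []
        for _ in range(block_size):
            draft.append(drafter(out + draft))

        # 2) Verify: keep drafted tokens while they match the verifier.
        #    At the first mismatch, keep the verifier's token and stop.
        #    (Sequential here for clarity; done in parallel in practice.)
        accepted: List[Token] = []
        for tok in draft:
            target = verifier(out + accepted)
            accepted.append(target)
            if target != tok or target == eos:
                break
        out.extend(accepted)
    return out[:max_len]
```

Each iteration adds at least one verifier-approved token, so the loop always makes progress; the speedup in a real system comes from accepting several drafted tokens per verifier pass.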

Benchmarking

FLOPs118000000000000000000

Notes: 6 FLOP/parameter/token * 500000000 parameters * 39321600000 tokens = 117964800000000000000 FLOP
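The estimate in the note can be reproduced with a few lines of Python. This is only a sketch of the arithmetic stated above; the function name `training_flops` is an illustrative assumption, and the 6 FLOP per parameter per token factor is taken from the note itself.

```python
def training_flops(parameters: int, tokens: int, flop_per_param_token: int = 6) -> int:
    """Approximate training compute as factor * parameters * tokens."""
    return flop_per_param_token * parameters * tokens


# Values taken from the note above.
flops = training_flops(parameters=500_000_000, tokens=39_321_600_000)
print(flops)           # exact integer value
print(f"{flops:.2e}")  # rounded, scientific notation
```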

Training

Training Code Accessibility: https://github.com/hemingkx/SpecDec (no clear license)

Hardware: NVIDIA V100

Hardware Quantity: 8

Size Notes: max tokens 4096, update frequency 4, 8 GPUs, max updates 300K; 4096 * 4 * 8 * 300000 = 39321600000 tokens
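The token count in the size note is a simple product, sketched below. The function and parameter names are illustrative assumptions. The sketch also assumes that "max tokens" is a per-GPU cap on tokens per batch, which would make the result an upper bound on tokens actually seen rather than an exact count.

```python
def total_training_tokens(
    max_tokens_per_batch: int,
    update_frequency: int,
    num_gpus: int,
    max_updates: int,
) -> int:
    """Upper-bound token count: tokens per update times number of updates."""
    tokens_per_update = max_tokens_per_batch * update_frequency * num_gpus
    return tokens_per_update * max_updates


# Values taken from the size note above.
tokens = total_training_tokens(
    max_tokens_per_batch=4096,
    update_frequency=4,
    num_gpus=8,
    max_updates=300_000,
)
print(tokens)
```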

Parameters

Parameters: 500000000

Notes: 0.5B parameters; 12-layer encoder + 2-layer decoder, d=512/2048

Authors

Heming Xia, Tao Ge, Peiyi Wang, Si-Qing Chen, Furu Wei, Zhifang Sui

Peking University, Microsoft Research Asia
