
Navigating with Graph Representations for Fast and Scalable Decoding of Neural Language Models (1806.04189v1)

Published 11 Jun 2018 in cs.CL and cs.AI

Abstract: Neural language models (NLMs) have recently gained renewed interest by achieving state-of-the-art performance across many NLP tasks. However, NLMs are very computationally demanding, largely due to the cost of the softmax layer over a large vocabulary. We observe that, in decoding for many NLP tasks, only the probabilities of the top-K hypotheses need to be calculated precisely, and K is often much smaller than the vocabulary size. This paper proposes a novel softmax layer approximation algorithm, called Fast Graph Decoder (FGD), which quickly identifies, for a given context, a set of K words that are most likely to occur according to an NLM. We demonstrate that FGD reduces decoding time by an order of magnitude while attaining accuracy close to the full softmax baseline on neural machine translation and language modeling tasks. We also prove a theoretical guarantee on the softmax approximation quality.
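The core idea in the abstract, replacing the full softmax with a precise computation over only the top-K candidate words, can be sketched as below. This is a hedged illustration, not the paper's actual FGD algorithm: FGD finds the top-K logits by navigating a small-world graph over the output embeddings, whereas this sketch uses a brute-force maximum-inner-product pass as a stand-in for that search step. The function name `topk_softmax_approx` and the normalize-over-candidates choice are assumptions for illustration.

```python
import numpy as np

def topk_softmax_approx(context_vec, word_embeddings, k):
    # A full softmax costs O(|V| * d). If only the top-k hypotheses are
    # needed, we can first retrieve the k words with the largest logits
    # (inner products with the context), then normalize over just those.
    # NOTE: this brute-force pass stands in for FGD's graph-based
    # nearest-neighbor navigation, which avoids scoring the whole vocabulary.
    logits = word_embeddings @ context_vec

    # Indices of the k largest logits, sorted in descending order.
    topk_idx = np.argpartition(-logits, k)[:k]
    topk_idx = topk_idx[np.argsort(-logits[topk_idx])]

    # Approximate probabilities: normalize over the retrieved candidates
    # only (one simple variant; the exact normalization used matters less
    # when the top-k mass dominates the distribution).
    z = np.exp(logits[topk_idx] - logits[topk_idx].max())
    probs = z / z.sum()
    return topk_idx, probs
```

In decoding (e.g. beam search with beam width K), the returned candidate set and scores are enough to expand hypotheses without ever computing the remaining |V| - K probabilities.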

Citations (28)
