arxiv:2410.10165

HSR-Enhanced Sparse Attention Acceleration

Published on Oct 14, 2024

AI-generated summary

A novel approach accelerates attention computation in large language models by exploiting sparsity in attention mechanisms and employing Half-Space Reporting data structures to reduce computational complexity for long-context tasks.

Abstract

Large Language Models (LLMs) have demonstrated remarkable capabilities across various applications, but their performance on long-context tasks is often limited by the computational complexity of attention mechanisms. We introduce a novel approach to accelerate attention computation in LLMs, particularly for long-context scenarios. We leverage the inherent sparsity within attention mechanisms, both in conventional Softmax attention and ReLU attention (with ReLU^α activation, α ∈ ℕ_+), to significantly reduce the running time complexity. Our method employs a Half-Space Reporting (HSR) data structure to identify non-zero or "massively activated" entries in the attention matrix. We present theoretical analyses for two key scenarios: generation decoding and prompt prefilling. For generation decoding, our approach achieves a running time of O(mn^{4/5}), significantly faster than the naive O(mn) approach, where n is the context length, m is the query length, and d is the hidden dimension. We also reduce the running time for prompt prefilling from O(mn) to O(mn^{1 - 1/⌊d/2⌋} + mn^{4/5}). Our method introduces only provably negligible error for Softmax attention. This work represents a significant step towards enabling efficient long-context processing in LLMs.

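The paper is theoretical, but the sparsity it exploits is easy to picture. The sketch below computes a ReLU^α attention output by touching, for each query, only the keys whose scores are positive, i.e. the keys lying in the half-space {k : ⟨k, q⟩ > 0}. The brute-force scan that finds those keys, the 1/n normalization, and the function name are illustrative assumptions; the paper's contribution is to replace that scan with a Half-Space Reporting (HSR) data structure so the active set is reported without examining all n entries.

import numpy as np

def relu_alpha_sparse_attention(Q, K, V, alpha=2):
    """Illustrative sketch of ReLU^alpha attention where only the
    'active' (positive-score) keys contribute to each query's output.

    NOTE: the per-query brute-force scan below stands in for the
    paper's HSR data structure, which identifies the active keys
    sub-linearly; normalization by n is a simplifying assumption.
    """
    m, d = Q.shape            # m queries, hidden dimension d
    n, _ = K.shape            # n keys/values (context length)
    out = np.zeros((m, d))

    for i in range(m):
        scores = K @ Q[i]                    # raw scores for query i
        active = np.nonzero(scores > 0)[0]   # keys in the half-space {k : <k, q_i> > 0}
        if active.size == 0:
            continue
        w = scores[active] ** alpha          # ReLU^alpha applied only to active entries
        out[i] = w @ V[active] / n           # aggregate only the active rows of V
    return out

# Toy usage
rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))
K = rng.standard_normal((16, 8))
V = rng.standard_normal((16, 8))
print(relu_alpha_sparse_attention(Q, K, V).shape)  # (4, 8)

When the number of active keys per query is small relative to n, the work per query drops accordingly; the paper's HSR structure makes finding that active set itself sublinear, which is where the O(mn^{4/5}) decoding bound comes from.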