arxiv:2410.10165

HSR-Enhanced Sparse Attention Acceleration

Published on Oct 14, 2024

AI-generated summary

A novel approach accelerates attention computation in large language models by exploiting sparsity in attention mechanisms and employing Half-Space Reporting data structures to reduce computational complexity for long-context tasks.

Abstract

Large Language Models (LLMs) have demonstrated remarkable capabilities across various applications, but their performance on long-context tasks is often limited by the computational complexity of attention mechanisms. We introduce a novel approach to accelerate attention computation in LLMs, particularly for long-context scenarios. We leverage the inherent sparsity within attention mechanisms, both in conventional Softmax attention and ReLU attention (with ReLU^α activation, α ∈ ℕ_+), to significantly reduce the running time complexity. Our method employs a Half-Space Reporting (HSR) data structure to identify non-zero or "massively activated" entries in the attention matrix. We present theoretical analyses for two key scenarios: generation decoding and prompt prefilling. For generation decoding, our approach achieves a running time of O(mn^{4/5}), significantly faster than the naive O(mn) approach, where n is the context length, m is the query length, and d is the hidden dimension. We also reduce the running time for prompt prefilling from O(mn) to O(mn^{1 - 1/⌊d/2⌋} + mn^{4/5}). Our method introduces only provably negligible error for Softmax attention. This work represents a significant step towards enabling efficient long-context processing in LLMs.

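The paper is theoretical, but the sparsity it exploits is easy to picture. The sketch below computes a ReLU^α attention output by touching, for each query, only the keys whose scores are positive, i.e. the keys lying in the half-space {k : ⟨k, q⟩ > 0}. The brute-force scan that finds those keys, the 1/n normalization, and the function name are illustrative assumptions; the paper's contribution is to replace that scan with a Half-Space Reporting (HSR) data structure so the active set is reported without examining all n entries.

import numpy as np

def relu_alpha_sparse_attention(Q, K, V, alpha=2):
    """Illustrative sketch of ReLU^alpha attention where only the
    'active' (positive-score) keys contribute to each query's output.

    NOTE: the per-query brute-force scan below stands in for the
    paper's HSR data structure, which identifies the active keys
    sub-linearly; normalization by n is a simplifying assumption.
    """
    m, d = Q.shape            # m queries, hidden dimension d
    n, _ = K.shape            # n keys/values (context length)
    out = np.zeros((m, d))

    for i in range(m):
        scores = K @ Q[i]                    # raw scores for query i
        active = np.nonzero(scores > 0)[0]   # keys in the half-space {k : <k, q_i> > 0}
        if active.size == 0:
            continue
        w = scores[active] ** alpha          # ReLU^alpha applied only to active entries
        out[i] = w @ V[active] / n           # aggregate only the active rows of V
    return out

# Toy usage
rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))
K = rng.standard_normal((16, 8))
V = rng.standard_normal((16, 8))
print(relu_alpha_sparse_attention(Q, K, V).shape)  # (4, 8)

When the number of active keys per query is small relative to n, the work per query drops accordingly; the paper's HSR structure makes finding that active set itself sublinear, which is where the O(mn^{4/5}) decoding bound comes from.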