TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention • arXiv:2410.05076 • Published Oct 7, 2024
SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs • arXiv:2410.13276 • Published Oct 17, 2024
Star Attention: Efficient LLM Inference over Long Sequences • arXiv:2411.17116 • Published Nov 26, 2024
MiniMax-01: Scaling Foundation Models with Lightning Attention • arXiv:2501.08313 • Published Jan 14, 2025
Sigma: Differential Rescaling of Query, Key and Value for Efficient Language Models • arXiv:2501.13629 • Published Jan 23, 2025
TransMLA: Multi-head Latent Attention Is All You Need • arXiv:2502.07864 • Published Feb 11, 2025
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention • arXiv:2502.11089 • Published Feb 16, 2025
Does Time Have Its Place? Temporal Heads: Where Language Models Recall Time-specific Information • arXiv:2502.14258 • Published Feb 20, 2025
How Do Large Vision-Language Models See Text in Image? Unveiling the Distinctive Role of OCR Heads • arXiv:2505.15865 • Published May 21, 2025
Native Hybrid Attention for Efficient Sequence Modeling • arXiv:2510.07019 • Published Oct 8, 2025
Every Attention Matters: An Efficient Hybrid Architecture for Long-Context Reasoning • arXiv:2510.19338 • Published Oct 22, 2025
Kimi Linear: An Expressive, Efficient Attention Architecture • arXiv:2510.26692 • Published Oct 30, 2025
Why Attention Patterns Exist: A Unifying Temporal Perspective Analysis • arXiv:2601.21709 • Published Jan 29, 2026
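Several entries above (TidalDecode, SeerAttention, Native Sparse Attention, Star Attention) share one core operation: at each decode step, attend to only a small, query-dependent subset of the KV cache instead of every cached token. The sketch below illustrates that generic top-k selection step in PyTorch. The function name, tensor shapes, and the `k_top` parameter are illustrative assumptions for this collection's theme, not the implementation of any single paper.

```python
import torch

def topk_sparse_attention(q, k, v, k_top=64):
    """One decode step of query-dependent top-k sparse attention.

    Generic sketch only: shapes, names, and the plain top-k selection
    are illustrative assumptions, not any one paper's method.
      q:    (heads, 1, d)   current-step query
      k, v: (heads, seq, d) full KV cache
    """
    d = k.shape[-1]
    scores = (q @ k.transpose(-1, -2)) / d ** 0.5   # (heads, 1, seq)
    k_top = min(k_top, k.shape[-2])                 # cannot keep more than seq
    vals, idx = scores.topk(k_top, dim=-1)          # best-scoring positions
    probs = vals.softmax(dim=-1)                    # softmax over kept tokens only
    # Gather the values at the selected positions: (heads, k_top, d)
    sel_v = v.gather(-2, idx.transpose(-1, -2).expand(-1, -1, d))
    return probs @ sel_v                            # (heads, 1, d)

# Example: attend to 128 of 1024 cached tokens per head.
q = torch.randn(8, 1, 64)
k = torch.randn(8, 1024, 64)
v = torch.randn(8, 1024, 64)
out = topk_sparse_attention(q, k, v, k_top=128)     # (8, 1, 64)
```

The papers differ mainly in how the kept subset is chosen (learned gates in SeerAttention, block-level selection in Native Sparse Attention, position-persistent reuse in TidalDecode, per-host local blocks in Star Attention); the dense-scoring top-k above is the simplest baseline form of the idea.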