MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head
Paper • 2601.07832 • Published about 1 month ago • 52

Elastic Attention: Test-time Adaptive Sparsity Ratios for Efficient Transformers
Paper • 2601.17367 • Published 19 days ago • 33

Why Attention Patterns Exist: A Unifying Temporal Perspective Analysis
Paper • 2601.21709 • Published 14 days ago • 2

LLaDA2.1: Speeding Up Text Diffusion via Token Editing
Paper • 2602.08676 • Published 3 days ago • 57

MOVA: Towards Scalable and Synchronized Video-Audio Generation
Paper • 2602.08794 • Published 3 days ago • 144

OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration
Paper • 2602.05400 • Published 7 days ago • 292