Article MiniMax Goes Sparse: Decoding M3's Attention from a Single Diagram AtlasCloud-AI • 27 days ago • 10
InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory Paper • 2402.04617 • Published Feb 7, 2024 • 6
Training-Free Long-Context Scaling of Large Language Models Paper • 2402.17463 • Published Feb 27, 2024 • 24