Collections including paper arxiv:2501.08313 (MiniMax-01: Scaling Foundation Models with Lightning Attention)

Entry metadata follows the Hub's card format: paper entries list arXiv ID, publication status, and upvotes; model entries list task, parameter count, an update marker, downloads, and likes; dataset entries (marked "Viewer") list row count, downloads, and likes.

- How to inject knowledge efficiently? Knowledge Infusion Scaling Law for Pre-training Large Language Models
  Paper • 2509.19371 • Published
- Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
  Paper • 2505.06708 • Published • 10
- Selective Attention: Enhancing Transformer through Principled Context Control
  Paper • 2411.12892 • Published
- A Survey of Reinforcement Learning for Large Reasoning Models
  Paper • 2509.08827 • Published • 190
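Paper entries like those above carry only an arXiv ID, status, and upvote count; the same metadata can be fetched programmatically. A minimal sketch, assuming the Hub's public `/api/papers/<arxiv_id>` endpoint and its `title`/`upvotes` response fields (both assumptions worth verifying against current Hub docs), shown here for 2501.08313, the paper all of these collections share:

```python
# Minimal sketch: pull metadata for a paper entry via the Hub's papers endpoint.
# The URL pattern and response field names are assumptions, not guaranteed API.
import requests

resp = requests.get("https://huggingface.co/api/papers/2501.08313", timeout=10)
resp.raise_for_status()
paper = resp.json()
print(paper.get("title"))    # expected: the paper title
print(paper.get("upvotes"))  # expected: the upvote count shown on the card
```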

- MiniMax-01: Scaling Foundation Models with Lightning Attention
  Paper • 2501.08313 • Published • 300
- Agent-Ark/Toucan-1.5M
  Viewer • Updated • 1.65M • 4.74k • 192
- facebook/natural_reasoning
  Viewer • Updated • 1.15M • 1.55k • 550
- Salesforce/Webscale-RL
  Viewer • Updated • 1.11M • 382 • 81
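The dataset entries in this collection can be pulled straight from the Hub. A minimal sketch with the `datasets` library, assuming a default `train` split (split and field names vary per repo, so check each dataset card):

```python
# Minimal sketch: stream a few records from one of the datasets listed above.
# `split="train"` is an assumption; streaming avoids downloading 1M+ rows.
from datasets import load_dataset

ds = load_dataset("facebook/natural_reasoning", split="train", streaming=True)
for i, example in enumerate(ds):
    print(example)  # field names depend on the dataset's schema
    if i >= 2:
        break
```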

- Rewnozom/agent-zero-v1-a-01
  Text Generation • 4B • Updated • 8 • 2
- TheBloke/MythoMax-L2-13B-GGUF
  13B • Updated • 67.6k • 217
- DavidAU/Llama-3.2-8X3B-MOE-Dark-Champion-Instruct-uncensored-abliterated-18.4B-GGUF
  Text Generation • 18B • Updated • 44.2k • 479
- QuantFactory/DarkIdol-Llama-3.1-8B-Instruct-1.2-Uncensored-GGUF
  Text Generation • 8B • Updated • 8.8k • 132
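The GGUF repos in this collection ship multiple quantizations as separate files, so a download needs a concrete filename. A minimal sketch with `huggingface_hub` that lists the available `.gguf` files first rather than guessing one:

```python
# Minimal sketch: fetch a single GGUF quantization for local inference
# (e.g., with llama.cpp). We list the repo's files instead of hardcoding
# a filename, since quantization names differ per repo.
from huggingface_hub import hf_hub_download, list_repo_files

repo_id = "TheBloke/MythoMax-L2-13B-GGUF"
gguf_files = [f for f in list_repo_files(repo_id) if f.endswith(".gguf")]
print(gguf_files)  # inspect available quantizations (Q4_K_M, Q5_K_S, ...)

path = hf_hub_download(repo_id=repo_id, filename=gguf_files[0])
print(path)  # local cache path, ready to hand to a GGUF runtime
```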

- Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model
  Paper • 2503.24290 • Published • 62
- I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders
  Paper • 2503.18878 • Published • 119
- START: Self-taught Reasoner with Tools
  Paper • 2503.04625 • Published • 113
- DAPO: An Open-Source LLM Reinforcement Learning System at Scale
  Paper • 2503.14476 • Published • 144

- MiniMax-01: Scaling Foundation Models with Lightning Attention
  Paper • 2501.08313 • Published • 300
- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
  Paper • 2501.12948 • Published • 438
- Qwen2.5 Technical Report
  Paper • 2412.15115 • Published • 376
- Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
  Paper • 2404.14219 • Published • 259

- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 627
- MiniMax-01: Scaling Foundation Models with Lightning Attention
  Paper • 2501.08313 • Published • 300
- Group Sequence Policy Optimization
  Paper • 2507.18071 • Published • 316
- Drivel-ology: Challenging LLMs with Interpreting Nonsense with Depth
  Paper • 2509.03867 • Published • 211

- MiniMaxAI/MiniMax-Text-01-hf
  Text Generation • 456B • Updated • 9.28k • 10
- MiniMaxAI/MiniMax-M1-80k-hf
  Text Generation • 456B • Updated • 41 • 8
- MiniMaxAI/MiniMax-M1-40k-hf
  Text Generation • 456B • Updated • 45 • 12
- MiniMaxAI/MiniMax-Text-01
  Text Generation • 456B • Updated • 743 • 653
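The four checkpoints above are all 456B-parameter MiniMax variants; the `-hf` suffix appears to mark transformers-native conversions (an assumption worth checking on the model cards). A minimal loading sketch with `transformers`, illustrative only given the multi-GPU hardware a model this size demands:

```python
# Minimal sketch: load one of the MiniMaxAI checkpoints above with transformers.
# Assumes the "-hf" repo is transformers-compatible and that `accelerate` is
# installed for device_map="auto". At 456B parameters this needs a large
# multi-GPU node; treat it as a shape of the API, not a laptop recipe.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MiniMaxAI/MiniMax-Text-01-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # shard layers across available GPUs
    torch_dtype="auto",  # keep the checkpoint's native precision
)

inputs = tokenizer("Lightning attention scales linearly because", return_tensors="pt")
outputs = model.generate(**inputs.to(model.device), max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```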

- MiniMaxAI/MiniMax-Text-01
  Text Generation • 456B • Updated • 743 • 653
- MiniMaxAI/MiniMax-VL-01
  Image-Text-to-Text • 456B • Updated • 73.9k • 282
- MiniMax-01: Scaling Foundation Models with Lightning Attention
  Paper • 2501.08313 • Published • 300
- MiniMaxText01
  Space • 120 • Generate responses to text and images in a chat interface

- deepseek-ai/DeepSeek-R1
  Text Generation • 685B • Updated • 452k • 13k
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 627
- MiniMax-01: Scaling Foundation Models with Lightning Attention
  Paper • 2501.08313 • Published • 300
- open-r1/OpenR1-Math-220k
  Viewer • Updated • 450k • 12.6k • 706