Collections including paper arxiv:2501.08313 (MiniMax-01: Scaling Foundation Models with Lightning Attention)

Entry metadata follows the Hub's card format: paper entries list arXiv ID, publication status, and upvotes; model entries list task, parameter count, an update marker, downloads, and likes; dataset entries (marked "Viewer") list row count, downloads, and likes.

- How to inject knowledge efficiently? Knowledge Infusion Scaling Law for Pre-training Large Language Models
  Paper • 2509.19371 • Published
- Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
  Paper • 2505.06708 • Published • 10
- Selective Attention: Enhancing Transformer through Principled Context Control
  Paper • 2411.12892 • Published
- A Survey of Reinforcement Learning for Large Reasoning Models
  Paper • 2509.08827 • Published • 190
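Paper entries like those above carry only an arXiv ID, status, and upvote count; the same metadata can be fetched programmatically. A minimal sketch, assuming the Hub's public `/api/papers/<arxiv_id>` endpoint and its `title`/`upvotes` response fields (both assumptions worth verifying against current Hub docs), shown here for 2501.08313, the paper all of these collections share:

```python
# Minimal sketch: pull metadata for a paper entry via the Hub's papers endpoint.
# The URL pattern and response field names are assumptions, not guaranteed API.
import requests

resp = requests.get("https://huggingface.co/api/papers/2501.08313", timeout=10)
resp.raise_for_status()
paper = resp.json()
print(paper.get("title"))    # expected: the paper title
print(paper.get("upvotes"))  # expected: the upvote count shown on the card
```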

- MiniMax-01: Scaling Foundation Models with Lightning Attention
  Paper • 2501.08313 • Published • 300
- Agent-Ark/Toucan-1.5M
  Viewer • Updated • 1.65M • 4.74k • 192
- facebook/natural_reasoning
  Viewer • Updated • 1.15M • 1.55k • 550
- Salesforce/Webscale-RL
  Viewer • Updated • 1.11M • 382 • 81
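The dataset entries in this collection can be pulled straight from the Hub. A minimal sketch with the `datasets` library, assuming a default `train` split (split and field names vary per repo, so check each dataset card):

```python
# Minimal sketch: stream a few records from one of the datasets listed above.
# `split="train"` is an assumption; streaming avoids downloading 1M+ rows.
from datasets import load_dataset

ds = load_dataset("facebook/natural_reasoning", split="train", streaming=True)
for i, example in enumerate(ds):
    print(example)  # field names depend on the dataset's schema
    if i >= 2:
        break
```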

- Rewnozom/agent-zero-v1-a-01
  Text Generation • 4B • Updated • 8 • 2
- TheBloke/MythoMax-L2-13B-GGUF
  13B • Updated • 67.6k • 217
- DavidAU/Llama-3.2-8X3B-MOE-Dark-Champion-Instruct-uncensored-abliterated-18.4B-GGUF
  Text Generation • 18B • Updated • 44.2k • 479
- QuantFactory/DarkIdol-Llama-3.1-8B-Instruct-1.2-Uncensored-GGUF
  Text Generation • 8B • Updated • 8.8k • 132
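The GGUF repos in this collection ship multiple quantizations as separate files, so a download needs a concrete filename. A minimal sketch with `huggingface_hub` that lists the available `.gguf` files first rather than guessing one:

```python
# Minimal sketch: fetch a single GGUF quantization for local inference
# (e.g., with llama.cpp). We list the repo's files instead of hardcoding
# a filename, since quantization names differ per repo.
from huggingface_hub import hf_hub_download, list_repo_files

repo_id = "TheBloke/MythoMax-L2-13B-GGUF"
gguf_files = [f for f in list_repo_files(repo_id) if f.endswith(".gguf")]
print(gguf_files)  # inspect available quantizations (Q4_K_M, Q5_K_S, ...)

path = hf_hub_download(repo_id=repo_id, filename=gguf_files[0])
print(path)  # local cache path, ready to hand to a GGUF runtime
```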

- Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model
  Paper • 2503.24290 • Published • 62
- I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders
  Paper • 2503.18878 • Published • 119
- START: Self-taught Reasoner with Tools
  Paper • 2503.04625 • Published • 113
- DAPO: An Open-Source LLM Reinforcement Learning System at Scale
  Paper • 2503.14476 • Published • 144

- MiniMax-01: Scaling Foundation Models with Lightning Attention
  Paper • 2501.08313 • Published • 300
- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
  Paper • 2501.12948 • Published • 438
- Qwen2.5 Technical Report
  Paper • 2412.15115 • Published • 376
- Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
  Paper • 2404.14219 • Published • 259

- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 627
- MiniMax-01: Scaling Foundation Models with Lightning Attention
  Paper • 2501.08313 • Published • 300
- Group Sequence Policy Optimization
  Paper • 2507.18071 • Published • 316
- Drivel-ology: Challenging LLMs with Interpreting Nonsense with Depth
  Paper • 2509.03867 • Published • 211

- MiniMaxAI/MiniMax-Text-01-hf
  Text Generation • 456B • Updated • 9.28k • 10
- MiniMaxAI/MiniMax-M1-80k-hf
  Text Generation • 456B • Updated • 41 • 8
- MiniMaxAI/MiniMax-M1-40k-hf
  Text Generation • 456B • Updated • 45 • 12
- MiniMaxAI/MiniMax-Text-01
  Text Generation • 456B • Updated • 743 • 653
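The four checkpoints above are all 456B-parameter MiniMax variants; the `-hf` suffix appears to mark transformers-native conversions (an assumption worth checking on the model cards). A minimal loading sketch with `transformers`, illustrative only given the multi-GPU hardware a model this size demands:

```python
# Minimal sketch: load one of the MiniMaxAI checkpoints above with transformers.
# Assumes the "-hf" repo is transformers-compatible and that `accelerate` is
# installed for device_map="auto". At 456B parameters this needs a large
# multi-GPU node; treat it as a shape of the API, not a laptop recipe.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MiniMaxAI/MiniMax-Text-01-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # shard layers across available GPUs
    torch_dtype="auto",  # keep the checkpoint's native precision
)

inputs = tokenizer("Lightning attention scales linearly because", return_tensors="pt")
outputs = model.generate(**inputs.to(model.device), max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```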

- MiniMaxAI/MiniMax-Text-01
  Text Generation • 456B • Updated • 743 • 653
- MiniMaxAI/MiniMax-VL-01
  Image-Text-to-Text • 456B • Updated • 73.9k • 282
- MiniMax-01: Scaling Foundation Models with Lightning Attention
  Paper • 2501.08313 • Published • 300
- MiniMaxText01
  Space • 120 • Generate responses to text and images in a chat interface

- deepseek-ai/DeepSeek-R1
  Text Generation • 685B • Updated • 452k • 13k
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 627
- MiniMax-01: Scaling Foundation Models with Lightning Attention
  Paper • 2501.08313 • Published • 300
- open-r1/OpenR1-Math-220k
  Viewer • Updated • 450k • 12.6k • 706