AgentCPT/Qwen3-4B_thinking_agent_sft_nemotron_tool_calling_v2_lr1e-5_epoch_1_ctx_16384_bs_256 4B • Updated 3 days ago • 8
AgentCPT/Qwen3-4B_thinking_agent_sft_nemotron_tool_calling_v2_lr1e-5_epoch_1_ctx_16384_bs_256 4B • Updated 3 days ago • 8
DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models Paper • 2512.02556 • Published Dec 2, 2025 • 247
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices Paper • 2512.01374 • Published Dec 1, 2025 • 96
LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement Learning Paper • 2506.18841 • Published Jun 23, 2025 • 56