LUFFY-RL - an Elliott Collection

LUFFY-RL

Updated May 30, 2025

Elliott/LUFFY-Qwen-Math-7B-Zero

Text Generation • 8B • Updated Apr 23, 2025 • 21 • 1
Elliott/Qwen2.5-Math-7B-16k-think

Text Generation • 8B • Updated May 28, 2025 • 10.2k • 6
Elliott/Openr1-Math-46k-8192

Viewer • Updated Apr 23, 2025 • 45.8k • 466 • 9
Learning to Reason under Off-Policy Guidance

Paper • 2504.14945 • Published Apr 21, 2025 • 88
Elliott/LUFFY-Qwen-Math-1.5B-Zero

Text Generation • 2B • Updated Apr 23, 2025 • 5
Elliott/LUFFY-Qwen-Instruct-7B

Text Generation • 8B • Updated Apr 23, 2025 • 25 • 1
Elliott/Qwen2.5-Math-7B-SFT

Text Generation • 8B • Updated May 2, 2025 • 2
Elliott/Qwen2.5-Math-7B-SFT-RL

Text Generation • 8B • Updated May 30, 2025 • 3
Elliott/Openr1-Math-48k-Complement

Viewer • Updated May 30, 2025 • 47.9k • 15