grpo-training - a Kmantric Collection

grpo-training

updated Sep 11, 2025

meta-llama/Llama-3.2-1B-Instruct

Text Generation • 1B • Updated Oct 24, 2024 • 2.94M • 1.23k
meta-llama/Llama-3.1-8B

Text Generation • 8B • Updated Oct 16, 2024 • 1.28M • 2.01k
epfl-llm/meditron-7b

Text Generation • 7B • Updated Dec 7, 2023 • 5.43k • 307
medalpaca/medalpaca-7b

Text Generation • 7B • Updated Apr 2, 2024 • 2.01k • 90
ArGen: Auto-Regulation of Generative AI via GRPO and Policy-as-Code

Paper • 2509.07006 • Published Sep 6, 2025 • 1