---
license: apache-2.0
base_model: Qwen/Qwen3-Coder-30B-A3B-Instruct
tags:
- gerbil
- scheme
- lora
- code
- dpo
language:
- en
---

# gerbil-qwen3-coder-30b-bf16

Qwen3-Coder-30B-A3B-Instruct fine-tuned for [Gerbil Scheme](https://cons.io/) generation.

Training pipeline and tooling:

## Training pipeline

Three-stage LoRA fine-tune (r=32, α=64, fused-MoE expert targets):

1. **CPT**: Continued pre-training on Gerbil source corpus (lr 2e-5, 2 epochs)
2. **SFT**: Supervised fine-tune on instruction/response pairs (lr 1e-4, 2 epochs)
3. **DPO**: Direct preference optimization on wrong→right pairs (lr 5e-6, 3 epochs)

## DPO eval (vs base Qwen3-Coder-30B-A3B-Instruct)

| Metric | Base | Trained | Δ |
|---|---|---|---|
| Holdout task score | 31 | 39 | +8 |
| Anti-idioms hit | 1 | 0 | -1 |
| Code blocks wrapped | 9 | 14 | +5 |
| tok_lean_sum (P(chosen) > P(rejected)) | -4.17 | +4.03 | +8.19 |
| wins chosen / rejected (n=66) | 47 / 19 | 52 / 13 | +5 / -6 |

## Weights

BF16 merged weights, ~57 GB across 13 safetensors shards.
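
For reference, the three-stage schedule listed under "Training pipeline" can be written down as a small, dependency-free config sketch. This is only an illustration of the hyperparameters stated above, not the actual training code: the names `LoraSettings`, `Stage`, `PIPELINE`, and `describe` are hypothetical, and the `scale` property assumes the standard LoRA convention of scaling the low-rank update by α / r.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraSettings:
    """LoRA hyperparameters as stated in this model card."""
    r: int = 32       # adapter rank
    alpha: int = 64   # LoRA alpha

    @property
    def scale(self) -> float:
        # Assumption: standard LoRA scaling (alpha / r), not a variant such as rsLoRA.
        return self.alpha / self.r


@dataclass(frozen=True)
class Stage:
    """One stage of the fine-tune (hypothetical container type)."""
    name: str
    objective: str
    learning_rate: float
    epochs: int


LORA = LoraSettings()

# Stages in the order they are applied, with the values from the list above.
PIPELINE = (
    Stage("CPT", "continued pre-training on Gerbil source corpus", 2e-5, 2),
    Stage("SFT", "supervised fine-tune on instruction/response pairs", 1e-4, 2),
    Stage("DPO", "direct preference optimization on wrong->right pairs", 5e-6, 3),
)


def describe(pipeline=PIPELINE, lora=LORA) -> list[str]:
    """Return a human-readable summary, one line per stage."""
    lines = [f"LoRA r={lora.r} alpha={lora.alpha} scale={lora.scale:g}"]
    for index, stage in enumerate(pipeline, start=1):
        lines.append(
            f"{index}. {stage.name}: lr={stage.learning_rate:g}, epochs={stage.epochs}"
        )
    return lines


if __name__ == "__main__":
    print("\n".join(describe()))
```

With r=32 and α=64 the effective adapter scale under that assumption is 2. The learning rate is highest for SFT (1e-4) and lowest for DPO (5e-6), so the ordering of stages and the ordering of learning rates differ.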