---
license: apache-2.0
base_model: Qwen/Qwen3-Coder-30B-A3B-Instruct
tags:
  - gerbil
  - scheme
  - lora
  - code
  - dpo
language:
  - en
---

# gerbil-qwen3-coder-30b-bf16

Qwen3-Coder-30B-A3B-Instruct fine-tuned for [Gerbil Scheme](https://cons.io/) code generation.

Training pipeline and tooling: <https://github.com/ober/gerbil-lora>

## Training pipeline

Three-stage LoRA fine-tune (r=32, α=64, fused-MoE expert targets):

1. **CPT** — Continued pre-training on Gerbil source corpus (lr 2e-5, 2 epochs)
2. **SFT** — Supervised fine-tune on instruction/response pairs (lr 1e-4, 2 epochs)
3. **DPO** — Direct preference optimization on wrong→right pairs (lr 5e-6, 3 epochs)
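The sketch below shows the standard per-pair DPO loss that the third stage optimizes. It assumes that summed sequence log-probabilities are already available for each response under two models: the model being trained and a frozen reference model. Here `chosen` is the corrected ("right") response and `rejected` is the incorrect ("wrong") one. The function names and `beta=0.1` are illustrative only. The card does not state the β value or the training framework used.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair, from summed sequence log-probabilities.

    Hypothetical helper; beta=0.1 is an illustrative default, not the card's setting.
    """
    # Implicit reward: log-ratio of policy to frozen reference for each response.
    chosen_reward = policy_chosen_logp - ref_chosen_logp
    rejected_reward = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_reward - rejected_reward)
    # -log(sigmoid(margin)) == softplus(-margin)
    return softplus(-margin)


# Policy has shifted probability toward the chosen (correct) answer.
print(round(dpo_loss(-9.0, -13.0, -10.0, -12.0), 3))  # 0.598
```

When the policy still matches the reference, the margin is 0 and the loss is log 2. Minimizing the loss pushes the chosen response's log-ratio above the rejected one's. `beta` controls how tightly the policy stays anchored to the reference.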

## DPO eval (vs base Qwen3-Coder-30B-A3B-Instruct)

| Metric | Base | Trained | Δ |
|---|---|---|---|
| Holdout task score | 31 | 39 | +8 |
| Anti-idiom hits | 1 | 0 | -1 |
| Code blocks wrapped | 9 | 14 | +5 |
| tok_lean_sum (P(chosen) > P(rejected)) | -4.17 | +4.03 | +8.19 |
| Wins chosen / rejected (n=66) | 47 / 19 | 52 / 13 | +5 / -6 |
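The card does not spell out how the win counts are computed. One plausible reading, assumed here and not confirmed by the source, works as follows. For each of the n preference pairs, the model's log-probability of the chosen response is compared with its log-probability of the rejected one. The pair counts as a chosen win if the chosen response scores higher, and as a rejected win if it scores lower. This sketch counts exact ties separately, which is one way the two win counts could sum to less than n. `tally_wins` is a hypothetical helper.

```python
from typing import Iterable, Tuple


def tally_wins(pairs: Iterable[Tuple[float, float]]) -> Tuple[int, int, int]:
    """Count (chosen_wins, rejected_wins, ties) from (logp_chosen, logp_rejected) pairs.

    Hypothetical scoring rule: a pair is a chosen win when the model assigns the
    chosen response a higher log-probability than the rejected one.
    """
    chosen = rejected = ties = 0
    for logp_chosen, logp_rejected in pairs:
        if logp_chosen > logp_rejected:
            chosen += 1
        elif logp_chosen < logp_rejected:
            rejected += 1
        else:
            ties += 1
    return chosen, rejected, ties


print(tally_wins([(-1.0, -2.0), (-3.0, -2.0), (-1.5, -1.5)]))  # (1, 1, 1)
```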

## Weights

Full BF16 weights with the LoRA adapters merged into the base model, ~57 GB across 13 safetensors shards.
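As a rough sanity check on the total size, note that BF16 stores each parameter in 2 bytes. The calculation below assumes the ~30.5B total parameter count published for the base Qwen3-Coder-30B-A3B model and ignores the small per-shard safetensors headers. Under those assumptions, the result lands close to the quoted figure when "GB" is read as binary gigabytes (GiB). This suggests, but does not confirm, that the size above is in GiB.

```python
# Assumption: ~30.5B total parameters (figure published for the base model).
TOTAL_PARAMS = 30.5e9
BYTES_PER_BF16 = 2  # bfloat16 is 16 bits


def bf16_size(params: float) -> tuple[float, float]:
    """Return (decimal GB, binary GiB) needed to store `params` BF16 weights."""
    n_bytes = params * BYTES_PER_BF16
    return n_bytes / 1e9, n_bytes / 2**30


gb, gib = bf16_size(TOTAL_PARAMS)
print(f"{gb:.1f} GB = {gib:.1f} GiB")  # 61.0 GB = 56.8 GiB
```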