license: apache-2.0
base_model: Qwen/Qwen3-Coder-30B-A3B-Instruct
tags:
  - gerbil
  - scheme
  - lora
  - code
  - dpo
language:
  - en

gerbil-qwen3-coder-30b-bf16

Qwen3-Coder-30B-A3B-Instruct fine-tuned to generate Gerbil Scheme code.

Training pipeline

Three-stage LoRA fine-tune (r=32, α=64, fused-MoE expert targets):

CPT: continued pre-training on a Gerbil source corpus (lr 2e-5, 2 epochs)
SFT: supervised fine-tuning on instruction/response pairs (lr 1e-4, 2 epochs)
DPO: direct preference optimization on wrong→right (rejected→chosen) pairs (lr 5e-6, 3 epochs)
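
The stage schedule above can be written down as plain data, which also makes the LoRA scaling factor explicit. The sketch below is illustrative only: the `Stage` class, the `PIPELINE` tuple, and the `dpo_loss` helper are hypothetical names, not the training code behind this model. It assumes the standard LoRA scaling of α / r, and it uses `beta=0.1` purely as a placeholder because the card does not state the DPO temperature.

```python
import math
from dataclasses import dataclass

# LoRA hyperparameters stated in the card.
LORA_RANK = 32
LORA_ALPHA = 64
# Assumption: standard LoRA scales the low-rank update by alpha / r.
# Rank-stabilized variants use alpha / sqrt(r) instead.
LORA_SCALE = LORA_ALPHA / LORA_RANK


@dataclass(frozen=True)
class Stage:
    """One fine-tuning stage (hypothetical container, for illustration)."""
    name: str
    learning_rate: float
    epochs: int


# Stage order, learning rates, and epoch counts as listed in the card.
PIPELINE = (
    Stage("CPT", 2e-5, 2),
    Stage("SFT", 1e-4, 2),
    Stage("DPO", 5e-6, 3),
)


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Per-pair DPO loss from summed sequence log-probabilities.

    beta=0.1 is a placeholder; the card does not state the value used.
    """
    # How much more the policy prefers chosen over rejected than the reference does.
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    # -log(sigmoid(beta * margin)) is softplus(-beta * margin), computed stably.
    x = -beta * margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


if __name__ == "__main__":
    print(f"LoRA scale: {LORA_SCALE}")
    for stage in PIPELINE:
        print(f"{stage.name}: lr={stage.learning_rate}, epochs={stage.epochs}")
```

The loss shrinks as the policy raises the chosen completion's log-probability relative to the rejected one, measured against the frozen reference model. That is the direction a wrong→right pair is meant to push.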

Metric	Base	Trained	Δ
Holdout task score	31	39	+8
Anti-idioms hit	1	0	-1
Code blocks wrapped	9	14	+5
tok_lean_sum (P(chosen) > P(rejected))	-4.17	+4.03	+8.19
wins chosen / rejected (n=66)	47 / 19	52 / 13	+5 / -6
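
The win counts in the last row convert directly to win rates. The snippet below is a small sketch of that arithmetic; `win_rate` is a hypothetical helper, and dividing by chosen + rejected (decided pairs only) is an assumption, since the trained counts sum to 65 rather than the stated n=66.

```python
def win_rate(chosen_wins: int, rejected_wins: int) -> float:
    """Fraction of decided pairs in which the chosen completion won."""
    return chosen_wins / (chosen_wins + rejected_wins)


# Counts copied from the table row "wins chosen / rejected".
base_rate = win_rate(47, 19)
trained_rate = win_rate(52, 13)

if __name__ == "__main__":
    print(f"base:    {base_rate:.1%}")
    print(f"trained: {trained_rate:.1%}")
    print(f"change:  {trained_rate - base_rate:+.1%}")
```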

BF16 merged weights, ~57 GB across 13 safetensors shards.
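
As a rough sanity check on the download size, the footprint of BF16 weights is about two bytes per parameter. The sketch below assumes a total of 30.5 billion parameters, a figure taken from the base model's published description rather than from this card, and ignores file headers and any tensors stored in other dtypes.

```python
# Assumption: total parameter count as published for the base model.
TOTAL_PARAMS = 30.5e9
BYTES_PER_PARAM = 2  # bfloat16 stores each weight in 16 bits
NUM_SHARDS = 13      # shard count stated in the card

total_bytes = TOTAL_PARAMS * BYTES_PER_PARAM
total_gb = total_bytes / 1e9        # decimal gigabytes
total_gib = total_bytes / 2**30     # binary gibibytes
per_shard_gib = total_gib / NUM_SHARDS

if __name__ == "__main__":
    print(f"total: {total_gb:.1f} GB ({total_gib:.1f} GiB)")
    print(f"per shard: about {per_shard_gib:.1f} GiB")
```

Under these assumptions the estimate comes to roughly 57 GiB, which suggests the "~57 GB" figure is reported in binary units.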