Part of the dsl-debug collection: models trained to find and fix bugs in custom dataflow DSL programs using multi-turn tool use.

Qwen2.5-7B-Instruct fine-tuned on 1,593 debugging trajectories for the DSL Debug benchmark.

Results on the DSL Debug benchmark splits:

| Split | Base Model | This Model |
|---|---|---|
| Standard (481) | 50.5% | 56.3% |
| Nonlocal (200) | 12.0% | 40.0% |
| Intent-Mismatch (177) | 0.6% | 7.9% |

Results on general benchmarks:

| Benchmark | Base Model | This Model |
|---|---|---|
| MMLU | 74.6% | 74.6% |
| GSM8K | 84.9% | 83.9% |
| HumanEval | 65.9% | 62.2% |
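
To make the trade-off in the two tables easier to read, the short sketch below computes the change in percentage points (this model minus the base model) for every row. The scores are copied from the tables; the dictionary layout and the `deltas` helper are illustrative names, not part of any released tooling.

```python
# Scores copied from the two tables above, as (base, this model) percentages.
dsl_debug = {
    "Standard": (50.5, 56.3),
    "Nonlocal": (12.0, 40.0),
    "Intent-Mismatch": (0.6, 7.9),
}
general = {
    "MMLU": (74.6, 74.6),
    "GSM8K": (84.9, 83.9),
    "HumanEval": (65.9, 62.2),
}


def deltas(scores):
    """Return the change in percentage points (this model minus base) per row."""
    return {name: round(tuned - base, 1) for name, (base, tuned) in scores.items()}


for title, table in (("DSL Debug", dsl_debug), ("General", general)):
    print(title)
    for name, delta in deltas(table).items():
        print(f"  {name:<16} {delta:+.1f} pp")
```

Read this way, the largest gain is on the Nonlocal split, while HumanEval shows the largest drop among the general benchmarks.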

This SFT checkpoint is intended primarily as the starting point for the RL stage (GRPO) of the SFT→RL pipeline, which achieves the best results. See the collection for all models.

```python
from huggingface_hub import snapshot_download

# Download the checkpoint into a local directory.
snapshot_download("andrewlngdn/dsl-debug-7b-sft-step100",
                  local_dir="/workspace/models/sft_7b_step100")
```
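
Once the files are downloaded, the checkpoint can probably be loaded like any other Qwen2.5 fine-tune. The following is a minimal sketch, assuming the snapshot is stored in the standard `transformers` format and that `transformers` and `torch` are installed; `load_checkpoint` is a hypothetical helper name, not something shipped with this model.

```python
from pathlib import Path


def load_checkpoint(local_dir):
    """Load the tokenizer and model from a downloaded snapshot directory.

    Assumption: the directory holds a standard transformers checkpoint
    (config, tokenizer files, and weights), as is typical for a fine-tune
    of Qwen2.5-7B-Instruct.
    """
    path = Path(local_dir)
    if not path.is_dir():
        # Fail early with a clear message instead of a long loader traceback.
        raise FileNotFoundError(f"Checkpoint directory not found: {path}")

    # Imported lazily so the path check works without the heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(path)
    # torch_dtype="auto" keeps the dtype stored in the checkpoint config.
    model = AutoModelForCausalLM.from_pretrained(path, torch_dtype="auto")
    return tokenizer, model
```

Calling `load_checkpoint("/workspace/models/sft_7b_step100")` would then return the tokenizer and model for the directory used in the download step above.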