# 🌟 Qwen2.5-Coder-3B — Claude Opus 4.6 Reasoning Distilled

A compact, fast, locally runnable coding model fine-tuned on top of Qwen2.5-Coder-3B-Instruct using high-quality reasoning trajectories distilled from Claude Opus 4.6. Designed to run efficiently on consumer hardware with as little as 4GB VRAM at ~88 tokens/sec.
## 💡 Model Introduction

Qwen2.5-Coder-3B-Claude-Opus-4.6-Distilled combines the strong code-generation foundation of Qwen2.5-Coder with the structured, step-by-step reasoning style of Claude Opus 4.6. Through Supervised Fine-Tuning (SFT) with LoRA, the model learns to think through problems carefully inside `<think>` tags before delivering precise, well-structured answers.

Unlike larger distilled models, this 3B model is built for real local inference: fast, private, and small enough to fit comfortably in 4GB VRAM.
## 🧠 Reasoning Style
The model adopts Claude Opus's structured reasoning pattern:
```
<think>
Let me analyze this carefully.
1. Identify the core objective.
2. Break down into subcomponents.
3. Consider edge cases and constraints.
4. Formulate and verify the solution.
</think>
[Final clean answer here]
```
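When consuming the model's raw output programmatically, the reasoning trace usually needs to be separated from the final answer. A minimal sketch of that split (the helper name and regex are illustrative, not part of the model's tooling):

```python
import re

def split_reasoning(raw: str) -> tuple[str, str]:
    """Separate the <think>...</think> trace from the final answer.

    Returns (reasoning, answer); reasoning is "" if no block is found.
    """
    match = re.search(r"<think>(.*?)</think>", raw, flags=re.DOTALL)
    if not match:
        return "", raw.strip()
    reasoning = match.group(1).strip()
    answer = raw[match.end():].strip()
    return reasoning, answer

raw_output = "<think>\n1. Identify the core objective.\n</think>\n[Final clean answer here]"
reasoning, answer = split_reasoning(raw_output)
```

This keeps the visible reply clean while the trace stays available for logging or debugging.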
## 🗺️ Training Pipeline

```
Base Model (Qwen/Qwen2.5-Coder-3B-Instruct)
        │
        ▼
Supervised Fine-Tuning (SFT) + LoRA (r=16)
        │ • 3,209 high-quality Claude reasoning samples
        │ • Unsloth 2x faster training
        │ • 1 epoch on a T4 GPU (~46 min)
        │ • Final loss: 0.88
        ▼
Qwen2.5-Coder-3B-Claude-Opus-4.6-Distilled
```
## 📋 Training Details
| Parameter | Value |
|---|---|
| Base Model | Qwen/Qwen2.5-Coder-3B-Instruct |
| Framework | Unsloth 2026.3 |
| LoRA rank | 16 |
| LoRA alpha | 16 |
| Trainable params | 29,933,568 (0.96%) |
| Batch size | 16 (4 × 4 grad accum) |
| Learning rate | 2e-4 |
| Epochs | 1 |
| Max seq length | 4096 |
| Final train loss | 0.88 |
| GPU | Tesla T4 (16GB) |
| Training time | ~46 mins |
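The derived figures in the table can be cross-checked with a little arithmetic. Note the implied base-model size below is an inference from the rounded 0.96% figure, not a reported number:

```python
# Effective batch size: per-device batch × gradient accumulation steps
per_device_batch = 4
grad_accum_steps = 4
effective_batch = per_device_batch * grad_accum_steps  # 16, matching the table

# LoRA trainable parameters and the fraction reported by the trainer
trainable_params = 29_933_568
trainable_fraction = 0.0096  # 0.96%

# Implied total parameter count of the base model (approximate,
# since the percentage is rounded)
implied_total = trainable_params / trainable_fraction
print(f"effective batch = {effective_batch}")
print(f"implied base size ≈ {implied_total / 1e9:.2f}B parameters")
```

The implied total lands at roughly 3.1B parameters, consistent with a 3B-class base model.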
## 📚 Datasets Used
| Dataset | Samples | Purpose |
|---|---|---|
| nohurry/Opus-4.6-Reasoning-3000x-filtered | 2,326 | Claude 4.6 Opus reasoning trajectories |
| TeichAI/claude-4.5-opus-high-reasoning-250x | 250 | High-intensity structured reasoning |
| Jackrong/Qwen3.5-reasoning-700x | 633 | Step-by-step reasoning diversity |
| **Total** | **3,209** | |
## 🚀 Running Locally

### Via Ollama (easiest)

```bash
ollama run hf.co/ryzdfm/qwen2.5-coder-3b-claude_opus_4.6-distilled
```

### Via llama.cpp (for GPU acceleration)

```bash
./llama-cli.exe \
  -m qwen2.5-coder-3b-claude_opus_4.6-distilled.Q4_K_M.gguf \
  -ngl 99 \
  --flash-attn on \
  --jinja \
  -cnv \
  --repeat-penalty 1.1 \
  -p "You are a helpful assistant that thinks step by step."
```
## 🌟 Core Capabilities

- **Structured Reasoning** — thinks through problems step by step in `<think>` blocks before answering
- **Code Generation** — built on Qwen2.5-Coder; strong at Python, JavaScript, and algorithms
- **Math & Logic** — solves multi-step problems with explicit verification
- **Fast Local Inference** — ~88 t/s on an RTX 3050 4GB, fully GPU-accelerated
## ⚡ Hardware Requirements
| Quantization | VRAM | Speed (RTX 3050) |
|---|---|---|
| Q4_K_M (this file) | ~2.1 GB | ~88 t/s |
| Q3_K_M | ~1.7 GB | ~95 t/s |
| Q8_0 | ~3.3 GB | ~70 t/s |
Runs comfortably on 4GB VRAM laptops and desktops.
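As a rough back-of-the-envelope for the VRAM column: weight memory is approximately parameter count × bits-per-weight ÷ 8, plus KV cache and runtime overhead. The bits-per-weight values below are approximate averages for llama.cpp quant types (assumptions, not exact figures):

```python
PARAMS = 3.09e9  # approximate parameter count of Qwen2.5-Coder-3B

# Approximate average bits per weight for common llama.cpp quantizations
BPW = {"Q3_K_M": 3.9, "Q4_K_M": 4.85, "Q8_0": 8.5}

for quant, bpw in BPW.items():
    weights_gb = PARAMS * bpw / 8 / 1e9
    print(f"{quant}: ~{weights_gb:.1f} GB weights (+ KV cache and overhead)")
```

The weight footprint alone comes out slightly below the table's figures, which also include the KV cache and runtime buffers.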
## ⚠️ Limitations

- **3B scale** — will struggle with very long multi-file code generation and complex system design
- **1 epoch of training** — the reasoning style is distilled, but not as deep as that of larger models
- **Hallucination risk** — like all LLMs, it may produce incorrect facts; always verify outputs
## 🙏 Acknowledgements
- Unsloth AI for making fine-tuning accessible on consumer hardware
- nohurry, TeichAI, and Jackrong for the high-quality distillation datasets
- Qwen team for the excellent Qwen2.5-Coder base model
## 📖 Citation

```bibtex
@misc{ryzdfm_qwen25coder_claude_distilled,
  title        = {Qwen2.5-Coder-3B Claude Opus 4.6 Reasoning Distilled},
  author       = {ryzdfm},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/ryzdfm/qwen2.5-coder-3b-claude_opus_4.6-distilled}}
}
```