# 🌟 Qwen2.5-Coder-3B — Claude Opus 4.6 Reasoning Distilled

A compact, fast, locally runnable coding model fine-tuned on top of Qwen2.5-Coder-3B-Instruct using high-quality reasoning trajectories distilled from Claude Opus 4.6. Designed to run efficiently on consumer hardware with as little as 4GB VRAM at ~88 tokens/sec.
## 💡 Model Introduction

Qwen2.5-Coder-3B-Claude-Opus-4.6-Distilled combines the strong code-generation foundation of Qwen2.5-Coder with the structured, step-by-step reasoning style of Claude Opus 4.6. Through Supervised Fine-Tuning (SFT) with LoRA, the model learns to think through problems carefully inside `<think>` tags before delivering precise, well-structured answers.

Unlike larger distilled models, this 3B model is built for real local inference: fast, private, and small enough to fit comfortably in 4GB VRAM.
## 🧠 Reasoning Style
The model adopts Claude Opus's structured reasoning pattern:
```
<think>
Let me analyze this carefully.
1. Identify the core objective.
2. Break down into subcomponents.
3. Consider edge cases and constraints.
4. Formulate and verify the solution.
</think>
[Final clean answer here]
```
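When consuming the model's raw output programmatically, the reasoning trace usually needs to be separated from the final answer. A minimal sketch of that split (the helper name and regex are illustrative, not part of the model's tooling):

```python
import re

def split_reasoning(raw: str) -> tuple[str, str]:
    """Separate the <think>...</think> trace from the final answer.

    Returns (reasoning, answer); reasoning is "" if no block is found.
    """
    match = re.search(r"<think>(.*?)</think>", raw, flags=re.DOTALL)
    if not match:
        return "", raw.strip()
    reasoning = match.group(1).strip()
    answer = raw[match.end():].strip()
    return reasoning, answer

raw_output = "<think>\n1. Identify the core objective.\n</think>\n[Final clean answer here]"
reasoning, answer = split_reasoning(raw_output)
```

This keeps the visible reply clean while the trace stays available for logging or debugging.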
## 🗺️ Training Pipeline

```
Base Model (Qwen/Qwen2.5-Coder-3B-Instruct)
        │
        ▼
Supervised Fine-Tuning (SFT) + LoRA (r=16)
        │ • 3,209 high-quality Claude reasoning samples
        │ • Unsloth 2x faster training
        │ • 1 epoch on a T4 GPU (~46 min)
        │ • Final loss: 0.88
        ▼
Qwen2.5-Coder-3B-Claude-Opus-4.6-Distilled
```
## 📋 Training Details
| Parameter | Value |
|---|---|
| Base Model | Qwen/Qwen2.5-Coder-3B-Instruct |
| Framework | Unsloth 2026.3 |
| LoRA rank | 16 |
| LoRA alpha | 16 |
| Trainable params | 29,933,568 (0.96%) |
| Batch size | 16 (4 × 4 grad accum) |
| Learning rate | 2e-4 |
| Epochs | 1 |
| Max seq length | 4096 |
| Final train loss | 0.88 |
| GPU | Tesla T4 (16GB) |
| Training time | ~46 mins |
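The derived figures in the table can be cross-checked with a little arithmetic. Note the implied base-model size below is an inference from the rounded 0.96% figure, not a reported number:

```python
# Effective batch size: per-device batch × gradient accumulation steps
per_device_batch = 4
grad_accum_steps = 4
effective_batch = per_device_batch * grad_accum_steps  # 16, matching the table

# LoRA trainable parameters and the fraction reported by the trainer
trainable_params = 29_933_568
trainable_fraction = 0.0096  # 0.96%

# Implied total parameter count of the base model (approximate,
# since the percentage is rounded)
implied_total = trainable_params / trainable_fraction
print(f"effective batch = {effective_batch}")
print(f"implied base size ≈ {implied_total / 1e9:.2f}B parameters")
```

The implied total lands at roughly 3.1B parameters, consistent with a 3B-class base model.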
## 📚 Datasets Used
| Dataset | Samples | Purpose |
|---|---|---|
| nohurry/Opus-4.6-Reasoning-3000x-filtered | 2,326 | Claude 4.6 Opus reasoning trajectories |
| TeichAI/claude-4.5-opus-high-reasoning-250x | 250 | High-intensity structured reasoning |
| Jackrong/Qwen3.5-reasoning-700x | 633 | Step-by-step reasoning diversity |
| **Total** | **3,209** | |
## 🚀 Running Locally

### Via Ollama (easiest)

```bash
ollama run hf.co/ryzdfm/qwen2.5-coder-3b-claude_opus_4.6-distilled
```

### Via llama.cpp (for GPU acceleration)

```bash
./llama-cli.exe \
  -m qwen2.5-coder-3b-claude_opus_4.6-distilled.Q4_K_M.gguf \
  -ngl 99 \
  --flash-attn on \
  --jinja \
  -cnv \
  --repeat-penalty 1.1 \
  -p "You are a helpful assistant that thinks step by step."
```
## 🌟 Core Capabilities

- **Structured Reasoning** — thinks through problems step by step in `<think>` blocks before answering
- **Code Generation** — built on Qwen2.5-Coder; strong at Python, JavaScript, and algorithms
- **Math & Logic** — solves multi-step problems with explicit verification
- **Fast Local Inference** — ~88 t/s on an RTX 3050 4GB, fully GPU-accelerated
## ⚡ Hardware Requirements
| Quantization | VRAM | Speed (RTX 3050) |
|---|---|---|
| Q4_K_M (this file) | ~2.1 GB | ~88 t/s |
| Q3_K_M | ~1.7 GB | ~95 t/s |
| Q8_0 | ~3.3 GB | ~70 t/s |
Runs comfortably on 4GB VRAM laptops and desktops.
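As a rough back-of-the-envelope for the VRAM column: weight memory is approximately parameter count × bits-per-weight ÷ 8, plus KV cache and runtime overhead. The bits-per-weight values below are approximate averages for llama.cpp quant types (assumptions, not exact figures):

```python
PARAMS = 3.09e9  # approximate parameter count of Qwen2.5-Coder-3B

# Approximate average bits per weight for common llama.cpp quantizations
BPW = {"Q3_K_M": 3.9, "Q4_K_M": 4.85, "Q8_0": 8.5}

for quant, bpw in BPW.items():
    weights_gb = PARAMS * bpw / 8 / 1e9
    print(f"{quant}: ~{weights_gb:.1f} GB weights (+ KV cache and overhead)")
```

The weight footprint alone comes out slightly below the table's figures, which also include the KV cache and runtime buffers.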
## ⚠️ Limitations

- **3B scale** — will struggle with very long multi-file code generation and complex system design
- **1 epoch of training** — the reasoning style is distilled, but not as deep as that of larger models
- **Hallucination risk** — like all LLMs, it may produce incorrect facts; always verify outputs
## 🙏 Acknowledgements
- Unsloth AI for making fine-tuning accessible on consumer hardware
- nohurry, TeichAI, and Jackrong for the high-quality distillation datasets
- Qwen team for the excellent Qwen2.5-Coder base model
## 📖 Citation

```bibtex
@misc{ryzdfm_qwen25coder_claude_distilled,
  title        = {Qwen2.5-Coder-3B Claude Opus 4.6 Reasoning Distilled},
  author       = {ryzdfm},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/ryzdfm/qwen2.5-coder-3b-claude_opus_4.6-distilled}}
}
```