🌟 Qwen2.5-Coder-3B — Claude Opus 4.6 Reasoning Distilled

A compact, fast, locally-runnable coding model fine-tuned on top of Qwen2.5-Coder-3B-Instruct using high-quality reasoning trajectories distilled from Claude 4.6 Opus. Designed to run efficiently on consumer hardware with as little as 4GB VRAM at ~88 tokens/sec.


💡 Model Introduction

Qwen2.5-Coder-3B-Claude-Opus-4.6-Distilled combines the strong code generation foundation of Qwen2.5-Coder with the structured, step-by-step reasoning style of Claude 4.6 Opus. Through Supervised Fine-Tuning (SFT) with LoRA, the model learns to think through problems carefully inside <think> tags before delivering precise, well-structured answers.

Unlike larger distilled models, this 3B model is built for real local inference: it is fast, private, and fits comfortably in 4 GB of VRAM.


🧠 Reasoning Style

The model adopts Claude Opus's structured reasoning pattern:

```
<think>
Let me analyze this carefully.

1. Identify the core objective.
2. Break down into subcomponents.
3. Consider edge cases and constraints.
4. Formulate and verify the solution.
</think>

[Final clean answer here]
```
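If you consume the model's output programmatically, you will usually want to separate the reasoning from the final answer. A minimal sketch, assuming the model emits at most one `<think>…</think>` block before the answer (`split_reasoning` is a hypothetical helper, not part of any released tooling):

```python
import re

def split_reasoning(output: str) -> tuple[str, str]:
    """Split a response into its <think> reasoning and the final answer."""
    match = re.search(r"<think>(.*?)</think>", output, flags=re.DOTALL)
    if match is None:
        # No reasoning block: treat the whole response as the answer.
        return "", output.strip()
    reasoning = match.group(1).strip()
    answer = output[match.end():].strip()
    return reasoning, answer

sample = "<think>\n1. Identify the core objective.\n</think>\n\n42"
reasoning, answer = split_reasoning(sample)
print(answer)  # → 42
```

This also makes it easy to hide the reasoning in a UI while keeping it available for debugging.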

🗺️ Training Pipeline

Base Model (Qwen/Qwen2.5-Coder-3B-Instruct)
 │
 ▼
Supervised Fine-Tuning (SFT) + LoRA (r=16)
 │  • 3,209 high-quality Claude reasoning samples
 │  • Unsloth (≈2× faster training)
 │  • 1 epoch on T4 GPU (~46 mins)
 │  • Final loss: 0.88
 ▼
Qwen2.5-Coder-3B-Claude-Opus-4.6-Distilled

📋 Training Details

| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen2.5-Coder-3B-Instruct |
| Framework | Unsloth 2026.3 |
| LoRA rank | 16 |
| LoRA alpha | 16 |
| Trainable params | 29,933,568 (0.96%) |
| Effective batch size | 16 (per-device 4 × 4 gradient accumulation) |
| Learning rate | 2e-4 |
| Epochs | 1 |
| Max sequence length | 4096 |
| Final train loss | 0.88 |
| GPU | Tesla T4 (16 GB) |
| Training time | ~46 min |
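The small trainable fraction (0.96% above) follows directly from how LoRA works: each adapted weight matrix W (out × in) gains two low-rank factors A (r × in) and B (out × r), adding r·(in + out) parameters. A back-of-the-envelope sketch; the projection shapes below are illustrative, not read from the model's actual config:

```python
def lora_trainable_params(rank: int, target_shapes: list[tuple[int, int]]) -> int:
    """Trainable parameters added by LoRA adapters over the given (out, in) shapes."""
    return sum(rank * (d_in + d_out) for d_out, d_in in target_shapes)

# Hypothetical square attention projections with hidden size 2048
# (illustrative only -- not the exact Qwen2.5-Coder-3B dimensions).
hidden = 2048
layer_shapes = [(hidden, hidden), (hidden, hidden)]  # e.g. q_proj, o_proj
print(lora_trainable_params(16, layer_shapes))  # 16 * 4096 * 2 = 131072
```

Summed over all targeted projections in all layers, counts on this order of magnitude land in the tens of millions, consistent with the ~29.9M figure in the table.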

📚 Datasets Used

| Dataset | Samples | Purpose |
|---|---|---|
| nohurry/Opus-4.6-Reasoning-3000x-filtered | 2,326 | Claude 4.6 Opus reasoning trajectories |
| TeichAI/claude-4.5-opus-high-reasoning-250x | 250 | High-intensity structured reasoning |
| Jackrong/Qwen3.5-reasoning-700x | 633 | Step-by-step reasoning diversity |
| **Total** | **3,209** | |

🚀 Running Locally

Via Ollama (easiest)

ollama run hf.co/ryzdfm/qwen2.5-coder-3b-claude_opus_4.6-distilled
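If you want to pin sampling parameters and the system prompt instead of passing them on every run, a minimal Ollama Modelfile sketch (the parameter values here are illustrative defaults, not tuned recommendations):

```
FROM hf.co/ryzdfm/qwen2.5-coder-3b-claude_opus_4.6-distilled
PARAMETER temperature 0.7
PARAMETER repeat_penalty 1.1
SYSTEM "You are a helpful assistant that thinks step by step."
```

Build and run it with `ollama create qwen-distill -f Modelfile` followed by `ollama run qwen-distill`.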

Via llama.cpp (for GPU acceleration)

./llama-cli.exe \
  -m qwen2.5-coder-3b-claude_opus_4.6-distilled.Q4_K_M.gguf \
  -ngl 99 \
  --flash-attn on \
  --jinja \
  -cnv \
  --repeat-penalty 1.1 \
  -p "You are a helpful assistant that thinks step by step."

🌟 Core Capabilities

  • Structured Reasoning — thinks through problems step by step in <think> blocks before answering
  • Code Generation — built on Qwen2.5-Coder, strong at Python, JavaScript, algorithms
  • Math & Logic — correctly solves multi-step problems with verification
  • Fast Local Inference — 88 t/s on RTX 3050 4GB, fully GPU-accelerated

⚡ Hardware Requirements

| Quantization | VRAM | Speed (RTX 3050) |
|---|---|---|
| Q4_K_M (this file) | ~2.1 GB | ~88 t/s |
| Q3_K_M | ~1.7 GB | ~95 t/s |
| Q8_0 | ~3.3 GB | ~70 t/s |

Runs comfortably on 4GB VRAM laptops and desktops.
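The VRAM figures above roughly track the weights' effective bits-per-weight plus runtime overhead. A back-of-the-envelope estimator, assuming typical effective bits-per-weight for each quant (the bpw values are rough assumptions, and KV cache and activations add on top of the weight footprint):

```python
def gguf_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough weight-only size of a quantized model in GB.

    Ignores KV cache and activation memory, so real VRAM use is higher.
    """
    return n_params * bits_per_weight / 8 / 1e9

# ~3.09B weights for a 3B-class model; bpw values are assumptions.
for name, bpw in [("Q3_K_M", 3.9), ("Q4_K_M", 4.85), ("Q8_0", 8.5)]:
    print(f"{name}: ~{gguf_size_gb(3.09e9, bpw):.1f} GB weights")
```

The weight-only estimates sit a few hundred MB below the measured VRAM numbers in the table, which is consistent with cache and runtime overhead.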


⚠️ Limitations

  • 3B scale — will struggle with very long multi-file code generation or complex system design
  • 1 epoch training — reasoning style is distilled but not as deep as larger models
  • Hallucination risk — like all LLMs, may produce incorrect facts; always verify outputs

🙏 Acknowledgements

  • Unsloth AI for making fine-tuning accessible on consumer hardware
  • nohurry, TeichAI, and Jackrong for the high-quality distillation datasets
  • Qwen team for the excellent Qwen2.5-Coder base model

📖 Citation

@misc{ryzdfm_qwen25coder_claude_distilled,
  title        = {Qwen2.5-Coder-3B Claude Opus 4.6 Reasoning Distilled},
  author       = {ryzdfm},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/ryzdfm/qwen2.5-coder-3b-claude_opus_4.6-distilled}}
}