Qwen3-8B – SWAN Mixed-Precision (6-bit avg)

This is Qwen3-8B quantized with SWAN (Statistical Weight Analysis for N-bit allocation), a data-free, per-tensor mixed-precision quantization method for MLX on Apple Silicon.

Key Features

  • Data-free quantization: no calibration dataset required; bit widths are derived from weight statistics alone
  • Per-tensor bit allocation: each tensor is assigned 2, 4, 8, or 16 bits based on sensitivity analysis
  • MLX native: ready for inference on Apple Silicon via mlx_lm

Results

| Metric                  | BF16    | SWAN (this model) | Uniform 4-bit | SWAN Δ vs BF16   |
|-------------------------|---------|-------------------|---------------|------------------|
| PPL (WikiText-2)        | 9.727   | 10.097            | 10.249        | +3.8%            |
| ARC-Challenge (25-shot) | 44.62%  | 43.43%            | 42.83%        | -1.2 pp          |
| HellaSwag (10-shot)     | 60.04%  | 58.16%            | 58.14%        | -1.9 pp          |
| Model size              | 15.3 GB | 6.1 GB            | 4.1 GB        | 2.5x compression |
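The compression figures in the last row follow directly from the sizes in the table:

```python
# Sizes from the table above, in GB.
bf16_gb, swan_gb, uniform4_gb = 15.3, 6.1, 4.1

print(f"SWAN vs BF16:          {bf16_gb / swan_gb:.1f}x")      # 2.5x
print(f"Uniform 4-bit vs BF16: {bf16_gb / uniform4_gb:.1f}x")  # 3.7x
```

SWAN trades about 2 GB of extra size against uniform 4-bit for the lower perplexity shown above.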

Usage

pip install mlx-lm

# Generate text
python -m mlx_lm.generate \
    --model baa-ai/Qwen3-8B-SWAN-6bit \
    --prompt "Hello, how are you?"

# Interactive chat
python -m mlx_lm.chat --model baa-ai/Qwen3-8B-SWAN-6bit

Quantization Details

  • Method: SWAN v3 (hybrid normalization + optimized thresholds)
  • Average bits: ~5.82 bits per parameter
  • Base precision: 4-bit with selective 8-bit for sensitive layers
  • Sensitive layers: Early MLP layers, select attention projections
  • Hardware: Quantized on Apple M2 Ultra 192GB
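The reported ~5.82-bit average is a parameter-weighted mean over per-tensor bit widths. The sketch below shows the calculation; the tensor names, sizes, and bit assignments are illustrative placeholders, not this model's actual allocation:

```python
# Each entry: (tensor name, parameter count, assigned bit width).
# Values below are illustrative, not the model's real allocation.
tensors = [
    ("model.layers.0.mlp.down_proj",     4096 * 12288, 8),  # sensitive early MLP
    ("model.layers.0.self_attn.q_proj",  4096 * 4096,  8),  # sensitive attn proj
    ("model.layers.20.mlp.up_proj",      4096 * 12288, 4),
    ("model.layers.30.self_attn.o_proj", 4096 * 4096,  4),
]

total_params = sum(n for _, n, _ in tensors)
avg_bits = sum(n * b for _, n, b in tensors) / total_params
print(round(avg_bits, 2))  # 6.0 for this toy split
```

With half the parameters at 8-bit and half at 4-bit this lands at exactly 6.0; the real allocation is skewed further toward 4-bit, giving the ~5.82 average.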

About SWAN

SWAN computes four sensitivity metrics per tensor: SVD spectral concentration, excess kurtosis, output noise amplification, and reconstruction error proxy. These are combined into a composite score that drives automatic bit-width allocation, with no calibration data required.
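The scoring pipeline can be sketched as follows. This is an illustrative reimplementation of just two of the four metrics (excess kurtosis and SVD spectral concentration); the composite weighting and the thresholds are placeholders, not SWAN's tuned values:

```python
import numpy as np

def swan_style_metrics(W):
    """Two of SWAN's four per-tensor statistics (illustrative only)."""
    x = W.astype(np.float64).ravel()
    x -= x.mean()
    # Excess kurtosis: heavy-tailed weight distributions lose more outlier
    # mass at low bit widths, so higher kurtosis implies higher sensitivity.
    kurt = np.mean(x**4) / np.mean(x**2) ** 2 - 3.0
    # SVD spectral concentration: share of energy in the top 10% of singular
    # values; tensors dominated by a few directions are easier to damage.
    s = np.linalg.svd(W.astype(np.float64), compute_uv=False)
    energy = s**2
    k = max(1, len(s) // 10)
    conc = energy[:k].sum() / energy.sum()
    return kurt, conc

def allocate_bits(score, low=0.3, high=0.7):
    """Map a composite sensitivity score in [0, 1] to a bit width.
    Thresholds here are placeholders, not SWAN's optimized values."""
    if score < low:
        return 2
    if score < high:
        return 4
    return 8

rng = np.random.default_rng(0)
W = rng.normal(size=(256, 256)).astype(np.float32)
kurt, conc = swan_style_metrics(W)
# Toy composite: equal-weight average of the (roughly 0-1 scaled) metrics.
score = 0.5 * min(max(kurt / 10.0, 0.0), 1.0) + 0.5 * conc
print(allocate_bits(score))
```

A near-Gaussian random tensor like this one scores low on both metrics and gets a low bit width; real sensitive layers (heavy-tailed or spectrally concentrated) score higher and are promoted to 8-bit.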

  • Paper: SWAN: Data-Free Mixed-Precision Quantization for LLMs via Multi-Metric Sensitivity Analysis (Black Sheep AI Research, 2026)
