# Qwen3-8B – SWAN Mixed-Precision (6-bit avg)
This is Qwen3-8B quantized using SWAN (Statistical Weight Analysis for N-bit allocation), a data-free, per-tensor mixed-precision quantization method for MLX on Apple Silicon.
## Key Features

- Data-free quantization: no calibration dataset required; uses weight statistics only
- Per-tensor bit allocation: each tensor gets 2, 4, 8, or 16 bits based on sensitivity analysis
- MLX native: ready for inference on Apple Silicon via `mlx_lm`
## Results

| Metric | BF16 | SWAN (this model) | Uniform 4-bit | SWAN Δ vs BF16 |
|---|---|---|---|---|
| PPL (WikiText-2) | 9.727 | 10.097 | 10.249 | +3.8% |
| ARC-Challenge (25-shot) | 44.62% | 43.43% | 42.83% | -1.2 pp |
| HellaSwag (10-shot) | 60.04% | 58.16% | 58.14% | -1.9 pp |
| Model size | 15.3 GB | 6.1 GB | 4.1 GB | 2.5x compression |
## Usage

```shell
pip install mlx-lm

# Generate text
python -m mlx_lm.generate \
    --model baa-ai/Qwen3-8B-SWAN-6bit \
    --prompt "Hello, how are you?"

# Interactive chat
python -m mlx_lm.chat --model baa-ai/Qwen3-8B-SWAN-6bit
```
## Quantization Details
- Method: SWAN v3 (hybrid normalization + optimized thresholds)
- Average bits: ~5.82 bits per parameter
- Base precision: 4-bit with selective 8-bit for sensitive layers
- Sensitive layers: Early MLP layers, select attention projections
- Hardware: Quantized on Apple M2 Ultra 192GB
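To see how a mixed 4/8/16-bit allocation lands near the stated ~5.82-bit average, here is a minimal arithmetic sketch. The tensor groupings and parameter counts below are hypothetical illustrations, not the actual Qwen3-8B allocation:

```python
# Hypothetical per-tensor bit allocation for an ~8B-parameter model.
# Counts are illustrative assumptions chosen to show how a ~5.8-bit
# average arises from a 4-bit base with selective 8-/16-bit tensors.
allocation = [
    # (parameter count, assigned bits)
    (4_500_000_000, 4),   # most tensors stay at the 4-bit base precision
    (3_400_000_000, 8),   # sensitive tensors (early MLP, select attention) at 8-bit
    (100_000_000, 16),    # e.g. embeddings/norms kept at high precision
]

total_params = sum(n for n, _ in allocation)
total_bits = sum(n * b for n, b in allocation)
avg_bits = total_bits / total_params
print(f"average bits per parameter: {avg_bits:.2f}")
```

The storage footprint follows directly: average bits per parameter times parameter count, divided by 8, gives bytes on disk (before metadata overhead).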
## About SWAN
SWAN computes four sensitivity metrics per tensor: SVD spectral concentration, excess kurtosis, output noise amplification, and reconstruction error proxy. These are combined into a composite score that drives automatic bit-width allocation, without any calibration data.
- Paper: SWAN: Data-Free Mixed-Precision Quantization for LLMs via Multi-Metric Sensitivity Analysis (Black Sheep AI Research, 2026)