# GLM-5-SWAN-5bit-MLX
Mixed-precision quantized version of THUDM/GLM-5 (355B parameters) using SWAN. Experimental.
## Metrics
| Metric | Value |
|---|---|
| Size | See model files |
| Average bits | 5 |
| Framework | MLX |
| WikiText-2 PPL | — |
## Usage
```python
# pip install mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("baa-ai/GLM-5-SWAN-5bit-MLX")
response = generate(model, tokenizer, prompt="Hello!", max_tokens=256)
print(response)
```
## About SWAN
SWAN uses data-free per-tensor sensitivity analysis with composite scoring to allocate bit-widths across model layers.
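The allocation idea can be sketched as follows. This is a minimal illustration only, not the actual SWAN implementation: the composite scoring terms (outlier heaviness plus dynamic range) and the greedy budget loop are assumptions chosen to show how a per-tensor score can drive bit-width assignment under an average-bits constraint.

```python
import numpy as np

def sensitivity_score(weight: np.ndarray) -> float:
    """Data-free composite sensitivity proxy (illustrative, not SWAN's
    actual formula): combines outlier heaviness (kurtosis) with the
    tensor's dynamic range, both computed from the weights alone."""
    w = weight.ravel()
    std = w.std() + 1e-12
    kurtosis = np.mean(((w - w.mean()) / std) ** 4)          # heavy tails
    dyn_range = np.abs(w).max() / (np.abs(w).mean() + 1e-12)  # range spread
    return float(kurtosis * np.log1p(dyn_range))

def allocate_bits(tensors: dict, avg_bits: float = 5.0,
                  choices: tuple = (4, 5, 6, 8)) -> dict:
    """Greedily give the most sensitive tensors higher bit-widths while
    keeping the parameter-weighted average at or below `avg_bits`."""
    scores = {name: sensitivity_score(w) for name, w in tensors.items()}
    sizes = {name: w.size for name, w in tensors.items()}
    total = sum(sizes.values())
    # Start every tensor at the lowest bit-width, then spend the
    # remaining bit budget on tensors in descending sensitivity order.
    bits = {name: choices[0] for name in tensors}
    budget = avg_bits * total - sum(bits[n] * sizes[n] for n in tensors)
    for name in sorted(scores, key=scores.get, reverse=True):
        for b in reversed(choices):  # try the highest width that fits
            cost = (b - bits[name]) * sizes[name]
            if cost <= budget:
                budget -= cost
                bits[name] = b
                break
    return bits
```

In this sketch, a tensor with heavy outliers scores higher and is upgraded first, which mirrors the intuition that outlier-heavy layers lose the most accuracy at low bit-widths.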
Quantized by baa.ai