Serve the model with vLLM:

```shell
vllm serve mgoin/Qwen3-0.6B-MXFP8
```
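Once the server is up, vLLM exposes an OpenAI-compatible HTTP API (by default at `http://localhost:8000/v1`). A minimal sketch of a chat-completions request payload for this model — the prompt, sampling settings, and host/port here are illustrative assumptions, not part of the card:

```python
import json

# Assumed default vLLM endpoint; adjust host/port to match your `vllm serve` flags.
URL = "http://localhost:8000/v1/chat/completions"

# Illustrative request body for the OpenAI-compatible chat endpoint.
payload = {
    "model": "mgoin/Qwen3-0.6B-MXFP8",
    "messages": [{"role": "user", "content": "What is 7 * 8?"}],
    "max_tokens": 64,
    "temperature": 0.0,
}

body = json.dumps(payload)
print(body)
# POST `body` to URL with the header Content-Type: application/json,
# e.g. requests.post(URL, data=body, headers={"Content-Type": "application/json"})
```

The same request can be made with the `openai` Python client by pointing `base_url` at the server.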

Then evaluate GSM8K accuracy against the running server:

```shell
python tests/evals/gsm8k/gsm8k_eval.py
```

```
Running GSM8K evaluation: 1319 questions, 5-shot
Evaluating: 100%|████████████████████████████████████████| 1319/1319 [00:04<00:00, 305.10it/s]
```

```
Results:
Accuracy: 0.389
Invalid responses: 0.001
Total latency: 4.336 s
Questions per second: 304.229
Total output tokens: 130948
Output tokens per second: 30203.298
```
Safetensors model size: 0.6B params. Tensor types: BF16, F8_E4M3, U8.

Model tree for mgoin/Qwen3-0.6B-MXFP8: quantized from Qwen/Qwen3-0.6B.