Gemma-4-12B-IT NVFP4 (ModelOpt)
NVFP4 post-training quantization of google/gemma-4-12B-it, produced with NVIDIA TensorRT Model Optimizer (nvidia-modelopt 0.44.0).
NVIDIA published official NVFP4 checkpoints for Gemma-4-31B and the 26B-A4B MoE, but not for the dense 12B. This checkpoint fills that gap, following the exact recipe NVIDIA used for nvidia/Gemma-4-31B-IT-NVFP4.
Recipe
- Format: NVFP4 (E2M1 weights and activations, FP8-E4M3 block scale over 16-element groups), quant_algo: NVFP4.
- Scope: the language-model MLP projections only (gate_proj / up_proj / down_proj across all 48 decoder layers = 144 linears). Attention projections, the vision and audio towers, token embeddings and lm_head are kept in BF16, identical to NVIDIA's ignore list for the 31B.
- Calibration: 512 samples of cnn_dailymail (3.0.0), algorithm: max.
- Hardware: single NVIDIA RTX 5090 (Blackwell, sm_120). The 12B fits PTQ on one 32 GB GPU; no tensor parallelism is needed.
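To make the format line above concrete, here is a minimal NumPy sketch of what block-scaled E2M1 quantization does to a tensor. It is an illustration only, not the ModelOpt implementation: the function names are hypothetical, the block scale is kept in float32 rather than stored as FP8-E4M3, and the per-tensor scale that real NVFP4 kernels also use is left out.

```python
import numpy as np

# Non-negative magnitudes representable in E2M1 (1 sign, 2 exponent, 1 mantissa bit).
E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0], dtype=np.float32)
BLOCK_SIZE = 16  # elements that share one block scale


def fake_quantize_nvfp4(x):
    """Quantize then dequantize a 1-D array in blocks of 16 (simplified simulation)."""
    x = np.asarray(x, dtype=np.float32)
    if x.size % BLOCK_SIZE != 0:
        raise ValueError("length must be a multiple of 16")
    blocks = x.reshape(-1, BLOCK_SIZE)
    # Pick each block scale so the block's largest magnitude maps to 6, the top grid value.
    amax = np.abs(blocks).max(axis=1, keepdims=True)
    scale = np.where(amax > 0, amax / E2M1_GRID[-1], 1.0).astype(np.float32)
    # Simplification: the scale stays in float32 here instead of being rounded to FP8-E4M3.
    scaled = blocks / scale
    # Snap each magnitude to the nearest grid point, then restore the sign.
    idx = np.abs(np.abs(scaled)[..., None] - E2M1_GRID).argmin(axis=-1)
    quantized = np.sign(scaled) * E2M1_GRID[idx]
    return (quantized * scale).reshape(x.shape)


def bits_per_element():
    """One 4-bit value plus one 8-bit block scale amortized over 16 elements."""
    return 4 + 8 / BLOCK_SIZE
```

Under these assumptions each quantized element costs 4.5 bits of storage, and the round-trip error within a block is at most one sixth of that block's largest magnitude, because the widest gap on the grid is the one between 4 and 6.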
The output is a standard ModelOpt compressed-tensors-style checkpoint (quant_method: modelopt) consumable by vLLM / SGLang / TensorRT-LLM, and by llama.cpp's convert_hf_to_gguf.py.
GGUF builds
Ready-to-run llama.cpp GGUFs (NVFP4 FFN + K-quant elsewhere) are at LibertAIDAI/Gemma-4-12B-IT-NVFP4-GGUF.
About LibertAI
LibertAI is a decentralized AI platform — private inference, an OpenAI-compatible API, and a chat UI, all running on community GPUs over Aleph Cloud. If you want to run agents without managing infrastructure, see LiberClaw.
License
Inherits the Gemma license from the upstream model.