DuckyBlender/DiegoGPT-v3-MLX-8bit
This model is an 8-bit QLoRA fine-tune of Qwen/Qwen3-4B-MLX-8bit on roughly 500 input/output example pairs, trained and converted with mlx-lm version 0.26.0. Training logs: https://wandb.ai/duckyblender/diegogpt-expanded/runs/345rsdlh
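For reference, a run of this kind can be reproduced with the mlx_lm.lora CLI (training LoRA adapters on an already-quantized base is what makes it QLoRA in mlx-lm). This is a minimal sketch; the data path and hyperparameters below are placeholders, not the actual training configuration:

mlx_lm.lora \
  --model Qwen/Qwen3-4B-MLX-8bit \
  --train \
  --data ./data \
  --iters 600 \
  --batch-size 1 \
  --adapter-path adapters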
Run with system prompt /no_think and the following generation parameters:
--temp 0.7 --top-p 0.8 --top-k 20 --min-p 0
Example usage:
pip install mlx-lm
from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler

model, tokenizer = load("DuckyBlender/DiegoGPT-v3-MLX-8bit")

user_input = "are you red hat hacker?"

# Use the chat template when one is available; enable_thinking=False is the
# tokenizer-level equivalent of the /no_think system prompt for Qwen3.
if tokenizer.chat_template is not None:
    messages = [
        {"role": "user", "content": user_input}
    ]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, tokenize=False, enable_thinking=False
    )
else:
    prompt = user_input

sampler = make_sampler(temp=0.7, top_p=0.8, top_k=20, min_p=0)
response = generate(
    model,
    tokenizer,
    prompt=prompt,
    sampler=sampler,
    verbose=True,
)
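These sampling values follow Qwen3's recommended settings for non-thinking mode (temperature 0.7, top-p 0.8, top-k 20, min-p 0).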
Or directly via CLI:
mlx_lm.generate \
--model "DuckyBlender/diegogpt-v2-mlx-bf16" \
--temp 0.7 \
--top-p 0.8 \
--top-k 20 \
--min-p 0 \
--system "/no_think" \
--prompt "are you red hat hacker?"
The model uses ~4.5 GB of RAM during inference.
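You can verify this figure on your own machine by reading MLX's peak-memory counter after a generation call. A minimal sketch, assuming a recent MLX where the memory helpers live at the top level (older releases expose them under mx.metal), with a hypothetical prompt and token budget:

import mlx.core as mx
from mlx_lm import load, generate

model, tokenizer = load("DuckyBlender/DiegoGPT-v3-MLX-8bit")
generate(model, tokenizer, prompt="hello", max_tokens=64, verbose=True)
# Peak includes the 8-bit weights plus KV cache and activations.
print(f"peak memory: {mx.get_peak_memory() / 1e9:.2f} GB")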