Fine-tuned Qwen3.5 MLX
Collection
26 items • Updated • 5
How to use TheCluster/Crow-9B-HERETIC-4.6-MLX-8bit with MLX:
# Make sure mlx-vlm is installed
# pip install --upgrade mlx-vlm
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config
# Load the model
model, processor = load("TheCluster/Crow-9B-HERETIC-4.6-MLX-8bit")
config = load_config("TheCluster/Crow-9B-HERETIC-4.6-MLX-8bit")
# Prepare input
image = ["http://images.cocodataset.org/val2017/000000039769.jpg"]
prompt = "Describe this image."
# Apply chat template
formatted_prompt = apply_chat_template(
processor, config, prompt, num_images=1
)
# Generate output
output = generate(model, processor, formatted_prompt, image)
print(output)How to use TheCluster/Crow-9B-HERETIC-4.6-MLX-8bit with Pi:
# Install MLX LM: uv tool install mlx-lm # Start a local OpenAI-compatible server: mlx_lm.server --model "TheCluster/Crow-9B-HERETIC-4.6-MLX-8bit"
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
"providers": {
"mlx-lm": {
"baseUrl": "http://localhost:8080/v1",
"api": "openai-completions",
"apiKey": "none",
"models": [
{
"id": "TheCluster/Crow-9B-HERETIC-4.6-MLX-8bit"
}
]
}
}
}# Start Pi in your project directory: pi
How to use TheCluster/Crow-9B-HERETIC-4.6-MLX-8bit with Hermes Agent:
# Install MLX LM: uv tool install mlx-lm # Start a local OpenAI-compatible server: mlx_lm.server --model "TheCluster/Crow-9B-HERETIC-4.6-MLX-8bit"
# Install Hermes: curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash hermes setup # Point Hermes at the local server: hermes config set model.provider custom hermes config set model.base_url http://127.0.0.1:8080/v1 hermes config set model.default TheCluster/Crow-9B-HERETIC-4.6-MLX-8bit
hermes
Quality: quantized (8bit affine, group size: 32, 9.340 bpw)
Architecture: Qwen 3.5 | Parameters: 9 Billion | Teacher Model: Claude Opus 4.6 | Type: Distilled LLM
Default system prompt:
You are Crow, a precise and capable assistant for reasoning, writing, coding, and long-form dialogue.
Behavior rules:
- Answer the user's actual request directly.
- Be accurate, complete, and structured.
- Think before answering, but do not get stuck in repetitive loops or meta-commentary.
- If the request is ambiguous or incomplete, state what is missing and make the smallest reasonable assumption needed to continue.
- If the user wants creative writing, preserve tone, continuity, and character consistency.
- If the user wants analysis or technical help, prefer concrete steps, examples, and decisions over fluff.
- Finish with a usable answer, not just planning.
Shorter fallback system prompt:
You are Crow. Give direct, useful answers. Keep reasoning concise. Do not loop, do not repeat yourself, and do not pad. If context is missing, say what is missing in one sentence and continue with the best reasonable assumption.
More information here
This model was converted to MLX format from Crownelius/Crow-9B-HERETIC-4.6 using mlx-vlm version 0.4.
8-bit
Base model
Qwen/Qwen3.5-9B-Base