Instructions for using ansh0x/ace-0.5b-gguf with libraries, notebooks, and local apps.
- Libraries
- llama-cpp-python
How to use ansh0x/ace-0.5b-gguf with llama-cpp-python:
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="ansh0x/ace-0.5b-gguf",
    filename="ace-bf16.gguf",
)

llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use ansh0x/ace-0.5b-gguf with llama.cpp:
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf ansh0x/ace-0.5b-gguf:BF16
# Run inference directly in the terminal:
llama-cli -hf ansh0x/ace-0.5b-gguf:BF16
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf ansh0x/ace-0.5b-gguf:BF16
# Run inference directly in the terminal:
llama-cli -hf ansh0x/ace-0.5b-gguf:BF16
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf ansh0x/ace-0.5b-gguf:BF16
# Run inference directly in the terminal:
./llama-cli -hf ansh0x/ace-0.5b-gguf:BF16
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf ansh0x/ace-0.5b-gguf:BF16
# Run inference directly in the terminal:
./build/bin/llama-cli -hf ansh0x/ace-0.5b-gguf:BF16
Use Docker
docker model run hf.co/ansh0x/ace-0.5b-gguf:BF16
- LM Studio
- Jan
- vLLM
How to use ansh0x/ace-0.5b-gguf with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "ansh0x/ace-0.5b-gguf"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "ansh0x/ace-0.5b-gguf",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
Use Docker
docker model run hf.co/ansh0x/ace-0.5b-gguf:BF16
- Ollama
How to use ansh0x/ace-0.5b-gguf with Ollama:
ollama run hf.co/ansh0x/ace-0.5b-gguf:BF16
- Unsloth Studio
How to use ansh0x/ace-0.5b-gguf with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for ansh0x/ace-0.5b-gguf to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for ansh0x/ace-0.5b-gguf to start chatting
Using Hugging Face Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for ansh0x/ace-0.5b-gguf to start chatting
- Pi
How to use ansh0x/ace-0.5b-gguf with Pi:
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf ansh0x/ace-0.5b-gguf:BF16
Configure the model in Pi
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "ansh0x/ace-0.5b-gguf:BF16" }
      ]
    }
  }
}
Run Pi
# Start Pi in your project directory: pi
- Hermes Agent
How to use ansh0x/ace-0.5b-gguf with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf ansh0x/ace-0.5b-gguf:BF16
Configure Hermes
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default ansh0x/ace-0.5b-gguf:BF16
Run Hermes
hermes
- Docker Model Runner
How to use ansh0x/ace-0.5b-gguf with Docker Model Runner:
docker model run hf.co/ansh0x/ace-0.5b-gguf:BF16
- Lemonade
How to use ansh0x/ace-0.5b-gguf with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/
lemonade pull ansh0x/ace-0.5b-gguf:BF16
Run and chat with the model
lemonade run user.ace-0.5b-gguf-BF16
List all available models
lemonade list
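Several of the options above, including llama-server and vLLM, expose an OpenAI-compatible chat endpoint. The following is a minimal Python sketch of a client for that endpoint that uses only the standard library. It assumes llama-server's address `http://localhost:8080/v1`, which is what the Pi and Hermes configs above use. vLLM listens on port 8000 instead. The helper names `build_chat_request` and `chat` are illustrative and are not part of any of these tools.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        # Tolerate a trailing slash on the base URL.
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url: str, model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send one chat request and return the assistant's reply text."""
    req = build_chat_request(base_url, model, prompt)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    # OpenAI-style responses carry the reply in choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With a server running, `chat("http://localhost:8080/v1", "ansh0x/ace-0.5b-gguf", "What is the capital of France?")` should return the reply as a string. For vLLM, swap in port 8000.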
ACE 0.5B - Task Automation Model
Qwen2-0.5B fine-tuned for local task automation. It detects task types and generates execution plans.
Code: GitHub
Model Description
ACE is a 0.5B-parameter language model fine-tuned for task automation. It can:
- Classify tasks (atomic, repetitive, clarification needed)
- Generate CLI commands for file operations
- Create execution plans with hotkeys
- Handle repetitive bulk operations
All inference runs on the CPU; no GPU is required.
Model Files
| File | Size | Quant | Use Case |
|---|---|---|---|
| ace-bf16.gguf | 940 MB | BF16 | Recommended: slightly slower inference, but better quality |
| ace-q4-k-m.gguf | 385 MB | Q4_K_M | Faster inference |
Training Details
Base Model: Qwen/Qwen2-0.5B
Method: LoRA fine-tuning (r=16, alpha=32)
Dataset: ~1000 custom task examples
Training: 500-700 steps, learning_rate=3e-5
Quantization: GGUF Q4_K_M with imatrix
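In standard LoRA, the frozen weight W receives a low-rank update B·A scaled by alpha/r. PEFT's default scaling follows the same convention. With r=16 and alpha=32, the scale is 2.0. The plain-Python sketch below illustrates that arithmetic on toy matrices only. It is not the training code used for ACE, and the card does not say which layers were adapted.

```python
def lora_scale(alpha, r):
    """Scaling factor applied to the low-rank update in standard LoRA."""
    return alpha / r


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_merge(w, a, b, alpha):
    """Return W + (alpha / r) * (B @ A), i.e. the merged LoRA weight.

    w is d x k (frozen base weight), b is d x r, a is r x k; r is read from a.
    """
    r = len(a)
    scale = lora_scale(alpha, r)
    delta = matmul(b, a)
    return [[wij + scale * dij for wij, dij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


def lora_extra_params(d, k, r):
    """Trainable parameters LoRA adds to one d x k matrix: r * (d + k)."""
    return r * (d + k)
```

For the card's settings, `lora_scale(32, 16)` is 2.0. One 896 x 896 projection (896 is Qwen2-0.5B's hidden size) gains `lora_extra_params(896, 896, 16)` = 28,672 trainable parameters, compared with 802,816 in the full matrix.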
Task Types:
- Atomic tasks (single operations)
- Repetitive tasks (bulk processing)
- Clarification requests (ambiguous inputs)
Data Format:
Input: {"task": "...", "directory": [...], "available_hotkeys": [...]}
Output: {"task_type": "atomic", "output": {"execution_plan": {...}}}
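A small Python sketch of the data format above: it serializes a request with the three documented input fields and parses a reply into its `task_type` and `output`. The card shows only `"atomic"` as a `task_type` value and elides the contents of `execution_plan`, so the parser does not assume any plan structure. The model may also emit stray text around the JSON. For that reason the parser decodes the span from the first `{` to the last `}`. That is an assumption, not documented behaviour. The helper names are hypothetical.

```python
import json


def build_input(task, directory, available_hotkeys):
    """Serialize a request in the card's documented input format."""
    return json.dumps({
        "task": task,
        "directory": list(directory),
        "available_hotkeys": list(available_hotkeys),
    })


def parse_output(text):
    """Parse a model reply into (task_type, output).

    Assumption: the reply may have extra text around the JSON object, so only
    the span from the first '{' to the last '}' is decoded.
    """
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    data = json.loads(text[start:end + 1])
    task_type = data.get("task_type")
    output = data.get("output")
    if not isinstance(task_type, str) or not isinstance(output, dict):
        raise ValueError("reply lacks a 'task_type' string or an 'output' object")
    return task_type, output
```

Rejecting malformed replies with `ValueError` gives the caller one place to retry or fall back, given that the model is described as unstable.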
Usage
- The model is currently somewhat unstable and intended for experimental use only.
- Refer to the GitHub repo for installation and usage.
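The card defers usage details to the GitHub repo, and it does not document the exact prompt template ACE expects. Assume the JSON request from the data-format section is sent as a single user message. Under that assumption, a call through llama-cpp-python's `create_chat_completion` might look like the sketch below; this is the same method used in the snippet at the top. `plan_task` is a hypothetical helper. `temperature=0.0` is a choice made here to reduce run-to-run variation, not a documented requirement.

```python
def plan_task(llm, request_json, max_tokens=512):
    """Send one ACE request and return the raw reply text.

    `llm` is any object with a llama-cpp-python style
    `create_chat_completion` method, such as a `llama_cpp.Llama` instance.
    """
    result = llm.create_chat_completion(
        # Assumption: the JSON request is passed as a single user message.
        messages=[{"role": "user", "content": request_json}],
        temperature=0.0,  # temperature 0 picks the most likely token each step
        max_tokens=max_tokens,
    )
    # llama-cpp-python returns an OpenAI-style response dict.
    return result["choices"][0]["message"]["content"]
```

For example, create the model with `llm = Llama.from_pretrained(repo_id="ansh0x/ace-0.5b-gguf", filename="ace-q4-k-m.gguf")` and then call `plan_task(llm, request_json)`.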
Limitations
- Requires explicit file paths (no smart file search)
- Optimized for Linux commands (may also work on Windows)
- CPU inference only (3-10 seconds per task on i3/i5)
- No visual understanding (text-only)
- English language only
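The model needs explicit file paths and does no file search of its own. One option is to resolve names on the host before building a request. This sketch assumes the `directory` input field accepts absolute paths, which the card does not specify. `explicit_paths` and `directory_listing` are hypothetical helpers.

```python
from pathlib import Path


def explicit_paths(root, names):
    """Resolve user-supplied file names under `root` to absolute paths.

    Raises FileNotFoundError for names that do not exist, so misspelled or
    ambiguous inputs are caught before the model sees them.
    """
    base = Path(root).resolve()
    resolved = []
    for name in names:
        path = (base / name).resolve()
        if not path.exists():
            raise FileNotFoundError(str(path))
        resolved.append(str(path))
    return resolved


def directory_listing(root):
    """Return sorted absolute paths of the regular files directly inside `root`."""
    base = Path(root).resolve()
    return sorted(str(p) for p in base.iterdir() if p.is_file())
```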
Performance
Hardware benchmarks:
- Intel i5 (2018+): 3-5 seconds per task
- Intel i3 (2015+): 5-10 seconds per task
- Older hardware: 30-90 seconds per task
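The card does not say how these timings were measured. To compare on your own machine, a small harness like the following times any callable per task. `run_task` is a hypothetical stand-in for whatever invokes the model, such as a wrapper around `create_chat_completion`. The first call is run untimed so that one-off start-up costs do not inflate the figures.

```python
import statistics
import time


def time_per_task(run_task, tasks, warmup=1):
    """Return (median, max) wall-clock seconds per task for `run_task`.

    The first `warmup` tasks are run but not timed.
    """
    for task in tasks[:warmup]:
        run_task(task)
    timings = []
    for task in tasks[warmup:]:
        start = time.perf_counter()
        run_task(task)
        timings.append(time.perf_counter() - start)
    if not timings:
        raise ValueError("need more tasks than warm-up runs")
    return statistics.median(timings), max(timings)
```

Reporting the median alongside the maximum shows both typical latency and worst-case stalls, which matters on older hardware.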
Bias and Ethics
Known biases:
- Training data focused on common developer workflows
- Linux command bias (more Linux than Windows examples)
- English-only (no multilingual support)
Ethical considerations:
- Model can generate destructive commands (file deletion)
- Users should review plans before execution
- No built-in safety checks for harmful operations
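Since the model has no built-in safety checks, a host-side gate can hold back commands that look destructive until the user confirms them. The sketch below is a heuristic, and its list of command names is an illustrative assumption rather than an exhaustive one. The card does not specify how commands are laid out inside `execution_plan`, so the function takes a plain list of command strings. It supplements reading the full plan rather than replacing it.

```python
import re

# Illustrative, not exhaustive: commands that delete, move, or overwrite data.
DESTRUCTIVE_NAMES = {"rm", "rmdir", "unlink", "mv", "dd", "shred", "truncate", "mkfs", "del"}


def command_names(cmd):
    """Yield the program name of each segment split on ;, |, && and ||."""
    for segment in re.split(r"&&|\|\||[;|]", cmd):
        words = segment.split()
        while words and words[0] == "sudo":
            words = words[1:]
        if words:
            # "/bin/rm" -> "rm", "mkfs.ext4" -> "mkfs"
            yield words[0].rsplit("/", 1)[-1].split(".")[0]


def is_risky(cmd):
    """Flag destructive commands and single '>' redirections (which truncate)."""
    if any(name in DESTRUCTIVE_NAMES for name in command_names(cmd)):
        return True
    return re.search(r"(?<!>)>(?!>)", cmd) is not None


def review_plan(commands, confirm=input):
    """Return the commands to run; risky ones are kept only if confirmed with 'y'."""
    approved = []
    for cmd in commands:
        if is_risky(cmd):
            answer = confirm(f"Potentially destructive: {cmd!r}. Run it? [y/N] ")
            if answer.strip().lower() != "y":
                continue
        approved.append(cmd)
    return approved
```

The gate errs on the side of flagging. For example, `2>&1` also counts as a redirection, which costs an extra prompt but never skips a check.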
License
CC BY-NC-SA 4.0 (non-commercial)
- Free for personal and research use
- Commercial use requires a separate license
- Attribution is required
- Derivatives must use the same license
Additional Restriction: Training of AI/ML models using these weights is prohibited without explicit written permission.
Contact
- Issues: GitHub Issues
- Discussions: GitHub Discussions
More info: GitHub Repository