📖 Model Overview
qylis/llama3.2-3b-tuned is a fine-tuned version of Meta's Llama 3.2 3B, developed and maintained by Qylis. It was adapted with Qylis's proprietary fine-tuning pipeline to improve instruction following and performance on domain-specific tasks.
| Property | Details |
|---|---|
| Base Model | meta-llama/Llama-3.2-3B |
| Model Type | Causal Language Model (CLM) |
| Architecture | LlamaForCausalLM |
| Parameters | ~3 Billion |
| Fine-tuned by | Qylis |
| Language | English |
| License | Llama 3.2 Community License |
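As a rough guide to hardware requirements, the parameter count in the table can be turned into an estimate of weight memory. The sketch below assumes exactly 3e9 parameters (the table only says "~3 Billion") and counts the weights alone; activations and the KV cache need additional memory. The helper name `weight_memory_gib` is illustrative and not part of any library.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Approximate memory for the weights alone, in GiB (2**30 bytes)."""
    return num_params * bytes_per_param / 2**30

# Assumption: "~3 Billion" from the table, rounded to 3e9; the exact count is not stated.
NUM_PARAMS = 3e9

# float32 stores 4 bytes per parameter, bfloat16 stores 2.
for dtype, nbytes in [("float32", 4), ("bfloat16", 2)]:
    print(f"{dtype}: {weight_memory_gib(NUM_PARAMS, nbytes):.2f} GiB")
```

Under these assumptions the weights take about 11.18 GiB in float32 and about 5.59 GiB in bfloat16.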
🚀 Quick Start
Installation
pip install transformers torch accelerate
Inference
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "qylis/llama3.2-3b-tuned"

# Load the tokenizer and the weights in bfloat16, placed automatically on the available devices
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

prompt = "Your prompt here"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Sample up to 512 new tokens; gradients are not needed for inference
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
    )

# The decoded text contains the prompt followed by the completion
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
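The `temperature=0.7` and `top_p=0.9` arguments control how the next token is sampled. The following is a minimal pure-Python sketch of the idea, not the Transformers implementation: the logits are made-up numbers for a four-token vocabulary, and the function names are illustrative.

```python
import math

def softmax_with_temperature(logits, temperature=0.7):
    """Turn raw logits into probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, top_p=0.9):
    """Keep the smallest set of most likely tokens whose total mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    # Renormalize so the surviving probabilities sum to 1
    return {i: probs[i] / mass for i in kept}

logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical scores, not real model output
probs = softmax_with_temperature(logits, temperature=0.7)
nucleus = top_p_filter(probs, top_p=0.9)
print(sorted(nucleus))  # indices of the tokens that remain eligible for sampling
```

A temperature below 1 sharpens the distribution toward the most likely tokens, and top-p then discards the low-probability tail before a token is drawn from what remains.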
Pipeline API
from transformers import pipeline

# Build a text-generation pipeline; dtype and device placement are chosen automatically
pipe = pipeline(
    "text-generation",
    model="qylis/llama3.2-3b-tuned",
    torch_dtype="auto",
    device_map="auto"
)

result = pipe("Your prompt here", max_new_tokens=256)
print(result[0]["generated_text"])
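For a plain string prompt, the text-generation pipeline returns the prompt followed by the completion in `generated_text` by default. If only the new text is wanted, one option is to pass `return_full_text=False` to the pipeline call; another is to strip the prompt afterwards, as in this small sketch. The helper name `completion_only` and the sample strings are made up for illustration.

```python
def completion_only(prompt: str, generated_text: str) -> str:
    """Return the text after the prompt, or the full text if the prompt is not a prefix."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].lstrip()
    return generated_text

# Stand-in for pipe(prompt)[0]["generated_text"]; this string is invented, not model output.
prompt = "List two primary colors."
generated = prompt + " Red and blue."
print(completion_only(prompt, generated))  # prints "Red and blue."
```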
🎯 Intended Use
This model is intended for:
- Instruction following — Responding to natural language instructions
- Text generation — Generating coherent and contextually relevant text
- Domain-specific tasks — Applications fine-tuned by Qylis for specific use cases
- Research and development — Experimentation with fine-tuned LLMs
Out-of-Scope Use
- Generating harmful, abusive, or misleading content
- High-stakes decision making without human oversight
- Use in applications requiring absolute factual accuracy without verification
🏋️ Training Details
| Property | Details |
|---|---|
| Base Model | meta-llama/Llama-3.2-3B |
| Fine-tuning Method | Supervised Fine-Tuning (SFT) |
| Fine-tuned by | Qylis |
| Framework | HuggingFace Transformers / PEFT |
📝 Additional training details, dataset information, and hyperparameters will be added once the documentation is finalized.
📊 Evaluation
Benchmark results and evaluation metrics will be published here. Stay tuned for updates from the Qylis team.
⚠️ Limitations & Bias
Like all large language models, this model may:
- Hallucinate — Generate plausible-sounding but factually incorrect information
- Reflect training biases — Exhibit biases present in the training data
- Struggle with long contexts — Performance may degrade with very long inputs
- Lack real-time knowledge — No access to information beyond the training cutoff
Always validate outputs in production settings, especially for critical applications.
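One simple guard against the long-context issue is to cap the number of prompt tokens before generation. The sketch below works on a plain list of token ids; the function name `clamp_prompt` and the budget of 4 tokens are hypothetical, and a suitable budget for this model would need to be determined empirically.

```python
def clamp_prompt(token_ids, max_prompt_tokens, keep="tail"):
    """Trim a token-id list to at most max_prompt_tokens, keeping the tail or the head."""
    if max_prompt_tokens <= 0:
        raise ValueError("max_prompt_tokens must be positive")
    if len(token_ids) <= max_prompt_tokens:
        return list(token_ids)
    if keep == "tail":
        # Keep the most recent tokens, which usually hold the actual question
        return list(token_ids[-max_prompt_tokens:])
    if keep == "head":
        return list(token_ids[:max_prompt_tokens])
    raise ValueError("keep must be 'tail' or 'head'")

ids = list(range(10))  # stand-in for tokenizer(prompt)["input_ids"]
print(clamp_prompt(ids, 4))          # [6, 7, 8, 9]
print(clamp_prompt(ids, 4, "head"))  # [0, 1, 2, 3]
```

When tokenizing with Transformers, a similar effect is available by passing `truncation=True` and `max_length` to the tokenizer call.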
📜 License
This model is based on Meta's Llama 3.2 and is subject to the Llama 3.2 Community License Agreement. By using this model, you agree to the terms of that license.
⚠️ Naming Requirement: Per the Llama 3.2 Community License, any fine-tuned model that is distributed publicly must include "Llama" at the beginning of its name (e.g., Llama-Qylis-3.2-3B-Tuned). Make sure the model name on Hugging Face complies with this requirement.
🤝 About Qylis
Qylis is building next-generation AI solutions, from fine-tuned language models to production-ready AI applications.
🌐 qylis.com | 🤗 HuggingFace | 📧 Contact Us
📬 Citation
If you use this model in your research or application, please cite:
@misc{qylis2024llama32tuned,
  title        = {Qylis Llama-3.2-3B-Tuned},
  author       = {Qylis},
  year         = {2024},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/qylis/llama3.2-3b-tuned}}
}