Instructions to use qylis/llama3.2-3b-tuned with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use qylis/llama3.2-3b-tuned with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="qylis/llama3.2-3b-tuned")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("qylis/llama3.2-3b-tuned")
model = AutoModelForCausalLM.from_pretrained("qylis/llama3.2-3b-tuned")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

Local Apps

vLLM

How to use qylis/llama3.2-3b-tuned with vLLM:

Install from pip and serve model

```shell
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "qylis/llama3.2-3b-tuned"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "qylis/llama3.2-3b-tuned",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
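The curl call above can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `chat` are hypothetical, and it assumes the server is already running on `localhost:8000` and returns the usual OpenAI-style response shape (`choices[0].message.content`).

```python
import json
import urllib.request


def build_chat_request(prompt, model="qylis/llama3.2-3b-tuned",
                       base_url="http://localhost:8000"):
    """Build a POST request for an OpenAI-compatible chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the request and return the assistant's reply text.

    Assumption: the response follows the OpenAI chat completions format.
    """
    request = build_chat_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Example (requires a running server):
# print(chat("What is the capital of France?"))
```

Because the base URL is a parameter, the same helper should work against any other OpenAI-compatible server by passing a different `base_url`.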

Use Docker

```shell
docker model run hf.co/qylis/llama3.2-3b-tuned
```

SGLang

How to use qylis/llama3.2-3b-tuned with SGLang:

Install from pip and serve model

```shell
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "qylis/llama3.2-3b-tuned" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "qylis/llama3.2-3b-tuned",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker images

```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "qylis/llama3.2-3b-tuned" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "qylis/llama3.2-3b-tuned",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Docker Model Runner

How to use qylis/llama3.2-3b-tuned with Docker Model Runner:

```
docker model run hf.co/qylis/llama3.2-3b-tuned
```

🦙 Qylis / Llama-3.2-3B-Tuned

A fine-tuned Llama 3.2 3B model by Qylis

📖 Model Overview

qylis/llama3.2-3b-tuned is a fine-tuned version of Meta's Llama 3.2 3B, developed and maintained by Qylis. This model has been adapted for enhanced instruction-following and domain-specific performance, leveraging Qylis's proprietary fine-tuning pipeline.

| Property | Details |
| --- | --- |
| Base Model | meta-llama/Llama-3.2-3B |
| Model Type | Causal Language Model (CLM) |
| Architecture | LlamaForCausalLM |
| Parameters | ~3 Billion |
| Fine-tuned by | Qylis |
| Language | English |
| License | Llama 3.2 Community License |

🚀 Quick Start

Installation

```shell
pip install transformers torch accelerate
```

Inference

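```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "qylis/llama3.2-3b-tuned"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Load the weights in bfloat16; device_map="auto" (via accelerate)
# places them on the available device(s).
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

prompt = "Your prompt here"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Sample up to 512 new tokens with temperature and nucleus (top-p) sampling.
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
    )

# outputs[0] holds the prompt tokens followed by the generated tokens,
# so the decoded response starts with the prompt text.
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```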
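As a rough sizing aid for the bfloat16 load above, the memory needed for the weights alone can be estimated as parameter count times bytes per parameter. The sketch below assumes exactly 3e9 parameters (the card only says "~3 Billion") and ignores activations, the KV cache, and framework overhead, so treat the numbers as a lower bound rather than a measurement.

```python
def weight_memory_gib(num_params, bytes_per_param):
    """Approximate memory for model weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30


# Assumption: "~3 Billion" parameters is taken as exactly 3e9.
APPROX_PARAMS = 3e9

# Standard storage sizes for common tensor types.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}

for dtype, nbytes in BYTES_PER_PARAM.items():
    print(f"{dtype}: ~{weight_memory_gib(APPROX_PARAMS, nbytes):.1f} GiB")
```

Under these assumptions the bfloat16 weights come to roughly 5.6 GiB, about half of what a float32 load would need.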

Pipeline API

```python
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="qylis/llama3.2-3b-tuned",
    torch_dtype="auto",
    device_map="auto"
)

result = pipe("Your prompt here", max_new_tokens=256)
print(result[0]["generated_text"])
```
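For a plain string prompt, the text-generation pipeline returns the prompt followed by the completion in `generated_text` by default. Passing `return_full_text=False` in the pipeline call asks it to return only the completion. If the full text has already been produced, a small helper like the hypothetical `completion_only` below can strip the prompt afterwards; this is a sketch, not part of the Transformers API.

```python
def completion_only(prompt, generated_text):
    """Return the generated text without the leading prompt, if present."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].lstrip()
    # The prompt was not echoed back, so there is nothing to strip.
    return generated_text


# Example with a made-up pipeline output:
print(completion_only("Your prompt here", "Your prompt here and a reply"))
```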

🎯 Intended Use

This model is intended for:

- Instruction following: Responding to natural language instructions
- Text generation: Generating coherent and contextually relevant text
- Domain-specific tasks: Use cases targeted by Qylis's fine-tuning
- Research and development: Experimentation with fine-tuned LLMs

Out-of-Scope Use

- Generating harmful, abusive, or misleading content
- High-stakes decision making without human oversight
- Use in applications requiring absolute factual accuracy without verification

🏋️ Training Details

| Property | Details |
| --- | --- |
| Base Model | meta-llama/Llama-3.2-3B |
| Fine-tuning Method | Supervised Fine-Tuning (SFT) |
| Fine-tuned by | Qylis |
| Framework | Hugging Face Transformers / PEFT |

📝 Additional training details, dataset information, and hyperparameters will be updated as documentation is finalized.

📊 Evaluation

Benchmark results and evaluation metrics will be published here. Stay tuned for updates from the Qylis team.

⚠️ Limitations & Bias

Like all large language models, this model may:

- Hallucinate: Generate plausible-sounding but factually incorrect information
- Reflect training biases: Exhibit biases present in the training data
- Struggle with long contexts: Performance may degrade with very long inputs
- Lack real-time knowledge: No access to information beyond the training cutoff

Always validate outputs in production settings, especially for critical applications.

📜 License

This model is based on Meta's Llama 3.2 and is subject to the Llama 3.2 Community License Agreement. By using this model, you agree to the terms of that license.

⚠️ Naming Requirement: Under the Llama 3.2 Community License, an AI model that is created, trained, or fine-tuned using Llama materials and then distributed must include "Llama" at the beginning of its name (e.g., Llama-Qylis-3.2-3B-Tuned). If you publish a derivative of this model on Hugging Face, make sure its name complies with this requirement.

🤝 About Qylis

Qylis is building next-generation AI solutions, from fine-tuned language models to production-ready AI applications.

🌐 qylis.com | 🤗 Hugging Face | 📧 Contact Us

📬 Citation

If you use this model in your research or application, please cite:

```bibtex
@misc{qylis2024llama32tuned,
  title        = {Qylis Llama-3.2-3B-Tuned},
  author       = {Qylis},
  year         = {2024},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/qylis/llama3.2-3b-tuned}}
}
```

Safetensors

Model size: 3B params
Tensor type: BF16
