Instructions for using Surpem/Supertron1-8B with libraries and local apps.

Libraries

How to use Surpem/Supertron1-8B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Surpem/Supertron1-8B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Surpem/Supertron1-8B")
model = AutoModelForCausalLM.from_pretrained("Surpem/Supertron1-8B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use Surpem/Supertron1-8B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Surpem/Supertron1-8B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Surpem/Supertron1-8B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/Surpem/Supertron1-8B

SGLang

How to use Surpem/Supertron1-8B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Surpem/Supertron1-8B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Surpem/Supertron1-8B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Surpem/Supertron1-8B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Surpem/Supertron1-8B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
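
The curl calls above can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `chat` are hypothetical, and it assumes the server returns the usual OpenAI-style response shape (`choices[0].message.content`).

```python
import json
import urllib.request

MODEL_ID = "Surpem/Supertron1-8B"


def build_chat_request(base_url, user_message, model=MODEL_ID):
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url, user_message, timeout=60):
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(base_url, user_message)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # Assumption: OpenAI-style responses carry the reply in choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With a server running, `chat("http://localhost:8000", "What is the capital of France?")` targets the vLLM example; swap in port 30000 for the SGLang server.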

Docker Model Runner
How to use Surpem/Supertron1-8B with Docker Model Runner:
```
docker model run hf.co/Surpem/Supertron1-8B
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

Supertron1-8B: A Capable, Efficient Instruction-Tuned Language Model

Model Description

Supertron1-8B is an instruction-tuned language model built on Qwen3-8B-Base. It is intended as a general-purpose daily driver for math, coding, reasoning, and general conversation, and it is small enough to run on a single consumer GPU (see Hardware Requirements).

Developed by: Surpem
Model type: Causal Language Model
Architecture: Dense Transformer, 8B parameters
Fine-tuned from: Qwen/Qwen3-8B-Base
Fine-tuning method: LoRA (r=16, alpha=32, all-linear targets)
License: Apache 2.0
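
To make the LoRA settings above concrete, here is a small arithmetic sketch of what r=16 and alpha=32 imply for a single linear layer. It assumes standard LoRA scaling (alpha / r); the 4096 x 4096 layer shape is a hypothetical example, not a figure taken from this model card.

```python
def lora_scaling(r, alpha):
    """Scale factor applied to the low-rank update under standard LoRA: alpha / r."""
    return alpha / r


def lora_trainable_params(d_in, d_out, r):
    """Parameters in the two low-rank factors A (r x d_in) and B (d_out x r)."""
    return r * (d_in + d_out)


r, alpha = 16, 32      # values stated in the model card
d_in = d_out = 4096    # hypothetical square projection, for illustration only

full = d_in * d_out
lora = lora_trainable_params(d_in, d_out, r)

print(f"scaling alpha/r     : {lora_scaling(r, alpha)}")
print(f"full layer params   : {full:,}")
print(f"LoRA adapter params : {lora:,} ({lora / full:.2%} of the layer)")
```

Under these assumptions the adapter adds 131,072 trainable parameters to a layer of 16,777,216, which is under 1% of that layer.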

Capabilities

Reasoning

Supertron1-8B was trained on long-form chain-of-thought reasoning traces, so it can break complex multi-step problems into clear, ordered steps. It works through a problem before answering, which makes its outputs easier to check and explain.

Math

With dedicated training on competition-style math problems and step-by-step solutions, the model handles algebra, calculus, and word problems with structured, verifiable working. It shows its reasoning rather than producing only a final answer.

Coding

Supertron1-8B can write, debug, and explain code in popular languages such as Python, JavaScript, and C++. Trained on filtered, high-quality coding instruction data, it covers not just syntax but also software design patterns, algorithmic thinking, and best practices.

Science & General Knowledge

Broad instruction tuning across science, STEM, and general knowledge domains means the model can hold detailed technical conversations, explain difficult concepts clearly, and assist with research, writing, and analysis tasks.

Instruction Following

The model is highly responsive to natural language instructions. Whether you need concise answers, detailed explanations, structured output, or creative writing, Supertron1-8B adapts to the format and tone you ask for without needing complex prompting tricks.

Get Started

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "Surpem/Supertron1-8B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

messages = [
    {"role": "user", "content": "Explain the difference between LoRA and full fine-tuning."}
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))

Hardware Requirements

Precision	Min VRAM	Recommended VRAM (example GPUs)
bfloat16	18 GB	24 GB (RTX 3090/4090)
4-bit quantized	8 GB	12 GB (RTX 3060/4070)
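
As a rough sanity check on the table above, the memory needed for the weights alone can be estimated as parameter count times bytes per parameter. This is only a sketch: it treats "8B" as exactly 8e9 parameters (an assumption) and ignores the KV cache, activations, and framework overhead, which is why the minimums in the table sit above these figures.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 8e9  # nominal "8B"; the exact parameter count is an assumption

for label, bits in [("bfloat16", 16), ("4-bit quantized", 4)]:
    print(f"{label:>16}: ~{weight_memory_gb(NUM_PARAMS, bits):.0f} GB for weights")
```

This gives about 16 GB for bfloat16 and about 4 GB for 4-bit weights, leaving the rest of the listed VRAM for everything else needed at inference time.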

For 4-bit quantized inference:

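import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_id = "Surpem/Supertron1-8B"

# Store weights in 4-bit and compute in bfloat16 (requires the bitsandbytes package)
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map="auto")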

Citation

@misc{surpem2026supertron1-8b,
      title={Supertron1-8B — Efficient Instruction-Tuned Language Model},
      author={Surpem},
      year={2026},
      url={https://huggingface.co/surpem/supertron1-8b},
}