Instructions to use prithivMLmods/Magpie-Qwen-CortexDual-0.6B with libraries, inference providers, notebooks, and local apps.

Libraries

How to use prithivMLmods/Magpie-Qwen-CortexDual-0.6B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="prithivMLmods/Magpie-Qwen-CortexDual-0.6B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Print the result so the generated text is visible when run as a script
print(pipe(messages))

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("prithivMLmods/Magpie-Qwen-CortexDual-0.6B")
model = AutoModelForCausalLM.from_pretrained("prithivMLmods/Magpie-Qwen-CortexDual-0.6B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use prithivMLmods/Magpie-Qwen-CortexDual-0.6B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "prithivMLmods/Magpie-Qwen-CortexDual-0.6B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "prithivMLmods/Magpie-Qwen-CortexDual-0.6B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
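
The same request can be sent from Python. The following is a minimal sketch using only the standard library, assuming the server started above is listening on `localhost:8000` and returns the usual OpenAI-style response shape. The helper names `build_chat_payload` and `chat` and the `max_tokens` default are illustrative choices, not part of the model card.

```python
import json
import urllib.request


def build_chat_payload(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build the JSON body for an OpenAI-compatible /v1/chat/completions call."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,  # illustrative default, adjust as needed
    }


def chat(base_url: str, payload: dict, timeout: float = 60.0) -> str:
    """POST the payload to the server and return the assistant's reply text."""
    request = urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # OpenAI-style responses put the reply under choices[0].message.content
    return body["choices"][0]["message"]["content"]
```

With the server running, `chat("http://localhost:8000", build_chat_payload("prithivMLmods/Magpie-Qwen-CortexDual-0.6B", "What is the capital of France?"))` should return the reply as a string. For the SGLang server, only the base URL changes (port 30000).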

SGLang

How to use prithivMLmods/Magpie-Qwen-CortexDual-0.6B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "prithivMLmods/Magpie-Qwen-CortexDual-0.6B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "prithivMLmods/Magpie-Qwen-CortexDual-0.6B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "prithivMLmods/Magpie-Qwen-CortexDual-0.6B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "prithivMLmods/Magpie-Qwen-CortexDual-0.6B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use prithivMLmods/Magpie-Qwen-CortexDual-0.6B with Docker Model Runner:
```
docker model run hf.co/prithivMLmods/Magpie-Qwen-CortexDual-0.6B
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

Magpie-Qwen-CortexDual-0.6B

Magpie-Qwen-CortexDual-0.6B is a compact general-purpose model tuned for math, code, and structured reasoning. Its CortexDual thinking mode adapts to the complexity of a problem, switching automatically into stepwise reasoning for intricate logic or math tasks. The 0.6B-parameter model is trained on 80% of the Magpie Pro 330k dataset plus a modular blend of other datasets for general-purpose proficiency and domain versatility.

GGUF: https://huggingface.co/prithivMLmods/Magpie-Qwen-CortexDual-0.6B-GGUF

Key Features

Adaptive Reasoning via CortexDual: Automatically switches into a deeper thinking mode for complex problems, simulating trace-style deduction for higher-order tasks in math and code.
Efficient and Compact: At 0.6B parameters, it is optimized for deployment in constrained environments while retaining high fidelity in logic, computation, and structural formatting.
Magpie-Driven Data Synthesis: Trained using 80% of Magpie Pro 330k, a high-quality alignment and reasoning dataset, complemented with curated modular datasets for enhanced general-purpose capabilities.
Mathematical Precision: Fine-tuned for arithmetic, algebra, calculus, and symbolic logic; ideal for STEM learning platforms, math solvers, and step-by-step tutoring.
Lightweight Code Assistance: Understands and generates code in Python, JavaScript, and other common languages with contextual accuracy and explanation support.
Structured Output Generation: Specializes in Markdown, JSON, and table outputs, suitable for technical documentation, instruction generation, and structured reasoning.
Multilingual Competence: Supports over 20 languages with reasoning and translation support, expanding its reach for global educational and development use.
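
When relying on the structured output feature listed above, it can help to validate what the model returns, since a small model may wrap JSON in extra prose or Markdown fences. The following is a minimal sketch of one way to recover the first JSON value from a reply. The helper name `extract_first_json` is hypothetical and not part of the model card or any library.

```python
import json
from typing import Any, Optional


def extract_first_json(text: str) -> Optional[Any]:
    """Return the first JSON object or array embedded in text, or None."""
    decoder = json.JSONDecoder()
    for index, char in enumerate(text):
        # Only positions that could start an object or array are worth trying
        if char not in "{[":
            continue
        try:
            value, _end = decoder.raw_decode(text, index)
        except json.JSONDecodeError:
            continue  # not valid JSON from here, keep scanning
        return value
    return None
```

A `None` result signals that the reply contained no parseable JSON, which a caller could treat as a cue to retry the request.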

Quickstart with Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "prithivMLmods/Magpie-Qwen-CortexDual-0.6B"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Write a Python function to check if a number is prime. Explain each step."

messages = [
    {"role": "system", "content": "You are an AI tutor skilled in both math and code."},
    {"role": "user", "content": prompt}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)

Demo Inference

non-thinking (direct, reactive, retrieval-based responses)

thinking (reasoning, planning, deeper analysis)
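
The model card does not say how the two modes appear in the raw output. The sketch below assumes that thinking-mode replies wrap their reasoning in `<think>...</think>` tags, as Qwen3-family models do, and that these tags survive decoding. Both points are assumptions to check against actual output, and the helper name `split_thinking` is hypothetical.

```python
import re
from typing import Tuple

# Assumed tag format; verify against real model output before relying on it
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(response: str) -> Tuple[str, str]:
    """Split a response into (reasoning, answer).

    The reasoning part is empty when no <think> block is present.
    """
    match = THINK_BLOCK.search(response)
    if match is None:
        return "", response.strip()
    reasoning = match.group(1).strip()
    # The answer is everything outside the first <think> block
    answer = (response[: match.start()] + response[match.end():]).strip()
    return reasoning, answer
```

Applied to the `response` string from the quickstart, this would let an application show only the final answer while keeping the reasoning trace for logging.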

Intended Use

General-purpose problem solving in math, logic, and code
Interactive STEM tutoring and reasoning explanation
Compact assistant for technical documentation and structured data tasks
Multilingual applications with a focus on accurate technical reasoning
Efficient offline deployment on low-resource devices

Limitations

Lower creativity and open-domain generation due to reasoning-focused tuning
Limited context window size due to compact model size
May produce simplified logic paths in highly abstract domains
Trade-offs in diversity and expressiveness compared to larger instruction-tuned models