Instructions to use TachyHealth/Gazal-R1-32B-sft-merged-preview with libraries, inference providers, notebooks, and local apps.

Libraries

How to use TachyHealth/Gazal-R1-32B-sft-merged-preview with Transformers:

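# Use a pipeline as a high-level helper
from transformers import pipeline

# torch_dtype="auto" and device_map="auto" (needs accelerate) avoid loading
# the 32B weights in float32 on a single device
pipe = pipeline(
    "text-generation",
    model="TachyHealth/Gazal-R1-32B-sft-merged-preview",
    torch_dtype="auto",
    device_map="auto",
)
messages = [
    {"role": "user", "content": "Who are you?"},
]
print(pipe(messages))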

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "TachyHealth/Gazal-R1-32B-sft-merged-preview"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Keep the checkpoint's dtype and spread layers over available devices (needs accelerate)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Render the chat template and tokenize in one step
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, not the prompt
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))

Local Apps

vLLM

How to use TachyHealth/Gazal-R1-32B-sft-merged-preview with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "TachyHealth/Gazal-R1-32B-sft-merged-preview"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "TachyHealth/Gazal-R1-32B-sft-merged-preview",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
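
The same request can be sent from Python. The following is a minimal sketch using only the standard library, assuming the vLLM server started above is listening on `localhost:8000`; the helper names `build_chat_request` and `ask`, and the `max_tokens` default, are illustrative choices rather than part of the original instructions.

```python
import json
import urllib.request

MODEL_ID = "TachyHealth/Gazal-R1-32B-sft-merged-preview"


def build_chat_request(prompt, base_url="http://localhost:8000", max_tokens=256):
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,  # illustrative default, not from the model card
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, **kwargs):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    # Requires a running server; see the `vllm serve` command above.
    print(ask("What is the capital of France?"))
```

Passing `base_url="http://localhost:30000"` should target a server on a different port in the same way, since the request only depends on the OpenAI-compatible route.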

SGLang

How to use TachyHealth/Gazal-R1-32B-sft-merged-preview with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "TachyHealth/Gazal-R1-32B-sft-merged-preview" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "TachyHealth/Gazal-R1-32B-sft-merged-preview",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "TachyHealth/Gazal-R1-32B-sft-merged-preview" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "TachyHealth/Gazal-R1-32B-sft-merged-preview",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use TachyHealth/Gazal-R1-32B-sft-merged-preview with Docker Model Runner:
```
docker model run hf.co/TachyHealth/Gazal-R1-32B-sft-merged-preview
```

Gazal-R1-32B-sft-merged-preview

This is a DoRA adapter fine-tuned on top of Qwen/Qwen3-32B for specialized medical reasoning tasks.

Model description

The adapter was trained with PEFT, using LoRA together with its DoRA and rsLoRA variants (see Training procedure), to improve the base model's step-by-step clinical reasoning and medical problem-solving.

Training data

The model was fine-tuned on a synthetic, structured reasoning dataset, which contains medical questions with step-by-step reasoning and final answers.

Training procedure

The model was trained using:

LoRA with rank 256
DoRA (Weight-Decomposed Low-Rank Adaptation)
rsLoRA (Rank-stabilized LoRA)
BF16 precision training
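
To make the listed techniques concrete, here is a small pure-Python sketch of two of them: the rsLoRA scaling factor (alpha divided by the square root of the rank, instead of alpha divided by the rank) and the DoRA recomposition of a weight from a magnitude vector and a normalized direction. The rank of 256 comes from the list above; the `lora_alpha` value of 32 and the tiny 2x2 matrix are hypothetical, chosen only for illustration, and the column-wise norm follows the DoRA paper's formulation rather than any particular library's layout.

```python
import math


def lora_scale(alpha, rank, rank_stabilized):
    """Scaling applied to the low-rank update B @ A."""
    if rank_stabilized:
        return alpha / math.sqrt(rank)  # rsLoRA
    return alpha / rank  # standard LoRA


def column_norms(w):
    """L2 norm of each column of a matrix stored as a list of rows."""
    return [math.sqrt(sum(row[j] ** 2 for row in w)) for j in range(len(w[0]))]


def dora_weight(w0, delta, magnitude):
    """Recompose W' = magnitude * (W0 + delta) / ||W0 + delta||, column by column."""
    v = [[a + b for a, b in zip(r0, rd)] for r0, rd in zip(w0, delta)]
    norms = column_norms(v)
    return [[magnitude[j] * row[j] / norms[j] for j in range(len(row))] for row in v]


RANK = 256        # from the training procedure above
LORA_ALPHA = 32   # hypothetical; the model card does not state alpha

standard = lora_scale(LORA_ALPHA, RANK, rank_stabilized=False)
stabilized = lora_scale(LORA_ALPHA, RANK, rank_stabilized=True)
print(standard, stabilized)

# With a zero update and the magnitude initialized to the column norms of W0,
# DoRA reproduces the pretrained weight.
w0 = [[3.0, 0.0], [4.0, 2.0]]
zero_delta = [[0.0, 0.0], [0.0, 0.0]]
print(dora_weight(w0, zero_delta, column_norms(w0)))
```

At rank 256 the standard factor shrinks the update by 1/256 of alpha, while the rank-stabilized factor only shrinks it by 1/16, which is the usual motivation for pairing rsLoRA with high ranks.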

Use cases and limitations

This model is intended for medical education and clinical reasoning training. It should NOT be used for actual medical diagnosis or treatment decisions. Always consult qualified healthcare professionals for medical advice.

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Base model and adapter repository IDs
base_model_id = "Qwen/Qwen3-32B"
adapter_id = "TachyHealth/Gazal-R1-32B-sft-merged"

# Load the tokenizer and base model
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype="auto",
    device_map="auto",
)

# Attach the adapter to the base model
model = PeftModel.from_pretrained(model, adapter_id)

# Prepare a prompt in the format used during training
query = """[MEDICAL QUESTION]"""

messages = [
    {"role": "system", "content": "When solving complex medical problems, follow this specific format..."},
    {"role": "user", "content": query}
]

input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

# Generate a response; **inputs passes the attention mask along with the input IDs
outputs = model.generate(
    **inputs,
    max_new_tokens=2048,
    temperature=0.6,
    do_sample=True,
)

# Decode only the tokens generated after the prompt
response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)

Performance Results

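Across four standard medical benchmarks, Gazal-R1 (Final) posts the highest score in this comparison on MMLU Pro (Medical), MedQA, and PubMedQA. On MedMCQA, Llama 3.1 405B Instruct leads, and both Gazal-R1 variants score slightly below the 70B models: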

| Model | Size | MMLU Pro (Medical) | MedMCQA | MedQA | PubMedQA |
| --- | --- | --- | --- | --- | --- |
| Gazal-R1 (Final) | 32B | 81.6 | 71.9 | 87.1 | 79.6 |
| Gazal-R1 (SFT-only) | 32B | 79.3 | 72.3 | 86.9 | 77.6 |
| Llama 3.1 405B Instruct | 405B | 70.2 | 75.8 | 81.9 | 74.6 |
| Qwen 2.5 72B Instruct | 72B | 72.1 | 66.2 | 72.7 | 71.7 |
| Med42-Llama3.1-70B | 70B | 66.1 | 72.4 | 80.4 | 77.6 |
| Llama 3.1 70B Instruct | 70B | 74.5 | 72.5 | 78.4 | 78.5 |
| QwQ 32B | 32B | 70.1 | 65.6 | 72.3 | 73.7 |
| Qwen 3 32B | 32B | 78.4 | 71.6 | 84.4 | 76.7 |
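
For readers who want to check the comparison themselves, the short sketch below re-enters the scores from the table above and derives two things: the top-scoring model per benchmark, and the per-benchmark difference between the Final and SFT-only variants. The function names are illustrative, and the differences are plain arithmetic on the reported numbers, not additional results.

```python
BENCHMARKS = ("MMLU Pro (Medical)", "MedMCQA", "MedQA", "PubMedQA")

# Scores copied from the table above, in BENCHMARKS order.
SCORES = {
    "Gazal-R1 (Final)": (81.6, 71.9, 87.1, 79.6),
    "Gazal-R1 (SFT-only)": (79.3, 72.3, 86.9, 77.6),
    "Llama 3.1 405B Instruct": (70.2, 75.8, 81.9, 74.6),
    "Qwen 2.5 72B Instruct": (72.1, 66.2, 72.7, 71.7),
    "Med42-Llama3.1-70B": (66.1, 72.4, 80.4, 77.6),
    "Llama 3.1 70B Instruct": (74.5, 72.5, 78.4, 78.5),
    "QwQ 32B": (70.1, 65.6, 72.3, 73.7),
    "Qwen 3 32B": (78.4, 71.6, 84.4, 76.7),
}


def best_per_benchmark(scores):
    """Map each benchmark name to the model with the highest score on it."""
    return {
        name: max(scores, key=lambda model: scores[model][i])
        for i, name in enumerate(BENCHMARKS)
    }


def score_delta(scores, model_a, model_b):
    """Per-benchmark difference model_a - model_b, rounded to one decimal."""
    return {
        name: round(scores[model_a][i] - scores[model_b][i], 1)
        for i, name in enumerate(BENCHMARKS)
    }


if __name__ == "__main__":
    for name, model in best_per_benchmark(SCORES).items():
        print(f"{name}: {model}")
    print(score_delta(SCORES, "Gazal-R1 (Final)", "Gazal-R1 (SFT-only)"))
```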