Gazal-R1-32B-sft-merged-preview
This is a DoRA adapter fine-tuned on top of Qwen/Qwen3-32B for specialized medical reasoning tasks.
Model description
This adapter was trained with PEFT, using LoRA with the DoRA and rsLoRA variants enabled, to strengthen the base model's step-by-step clinical reasoning and medical problem-solving.
Training data
The model was fine-tuned on a synthetic, structured reasoning dataset, which contains medical questions with step-by-step reasoning and final answers.
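The card does not publish the dataset schema, so the following is only a sketch of how one record with a question, step-by-step reasoning, and a final answer could be represented and turned into chat messages. The class name, field names, the "Final answer:" label, and the bracketed placeholders are all hypothetical and are not taken from the actual dataset.

```python
from dataclasses import dataclass


@dataclass
class ReasoningExample:
    """Hypothetical record: a question, ordered reasoning steps, and a final answer."""

    question: str
    reasoning_steps: list[str]
    final_answer: str

    def to_messages(self, system_prompt: str) -> list[dict]:
        # Number the reasoning steps and append the final answer on its own line.
        numbered = "\n".join(
            f"{i}. {step}" for i, step in enumerate(self.reasoning_steps, start=1)
        )
        assistant = f"{numbered}\n\nFinal answer: {self.final_answer}"
        return [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": self.question},
            {"role": "assistant", "content": assistant},
        ]


# Placeholders only; no real dataset content is reproduced here.
example = ReasoningExample(
    question="[MEDICAL QUESTION]",
    reasoning_steps=["[REASONING STEP 1]", "[REASONING STEP 2]"],
    final_answer="[FINAL ANSWER]",
)
messages = example.to_messages("[SYSTEM PROMPT]")
```

A list of messages in this shape is what `tokenizer.apply_chat_template` accepts, which is why the sketch targets it.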
Training procedure
The model was trained using:
- LoRA with rank 256
- DoRA (Weight-Decomposed Low-Rank Adaptation)
- rsLoRA (Rank-stabilized LoRA)
- BF16 precision training
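To illustrate why rsLoRA matters at a rank as high as 256, the sketch below compares the scaling factor applied to the low-rank update under standard LoRA (`lora_alpha / r`) and under rsLoRA (`lora_alpha / sqrt(r)`). The dictionary keys mirror PEFT's `LoraConfig` argument names. The card does not state `lora_alpha`, so the value used here is a placeholder assumption, not the training setting.

```python
import math

# Settings stated in the list above, keyed by PEFT LoraConfig argument names.
adapter_settings = {"r": 256, "use_dora": True, "use_rslora": True}

# ASSUMPTION: lora_alpha is not given in this card; 32 is only a placeholder.
ASSUMED_LORA_ALPHA = 32


def lora_scaling(lora_alpha: float, r: int, use_rslora: bool) -> float:
    """Return the factor that multiplies the low-rank update B @ A."""
    if use_rslora:
        # Rank-stabilized LoRA divides by sqrt(r) instead of r.
        return lora_alpha / math.sqrt(r)
    return lora_alpha / r


standard = lora_scaling(ASSUMED_LORA_ALPHA, adapter_settings["r"], use_rslora=False)
stabilized = lora_scaling(ASSUMED_LORA_ALPHA, adapter_settings["r"], use_rslora=True)

print(f"standard LoRA scaling: {standard}")  # 0.125
print(f"rsLoRA scaling: {stabilized}")       # 2.0
```

With the same placeholder alpha, the rsLoRA factor at rank 256 is 16 times larger than the standard one, so the update is not shrunk as aggressively as the rank grows.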
Use cases and limitations
This model is intended for medical education and clinical reasoning training. It should NOT be used for actual medical diagnosis or treatment decisions. Always consult qualified healthcare professionals for medical advice.
Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Base model and adapter repository IDs
base_model_id = "Qwen/Qwen3-32B"
adapter_id = "TachyHealth/Gazal-R1-32B-sft-merged"

# Load the tokenizer and base model
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype="auto",
    device_map="auto",
)

# Load the adapter weights on top of the base model
model = PeftModel.from_pretrained(model, adapter_id)

# Prepare a prompt following the format used during training
query = """[MEDICAL QUESTION]"""
messages = [
    {"role": "system", "content": "When solving complex medical problems, follow this specific format..."},
    {"role": "user", "content": query},
]
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

# Generate a response; passing **inputs also forwards the attention mask
outputs = model.generate(
    **inputs,
    max_new_tokens=2048,
    temperature=0.6,
    do_sample=True,
)

# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)
```
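The decoded response may contain the model's reasoning followed by its final answer. The helper below is a minimal sketch for separating the two, assuming the reasoning is wrapped in `<think>...</think>` tags as the Qwen3 base model does in thinking mode. The card does not document the exact output format, so treat the tag names and the function name `split_reasoning` as assumptions.

```python
def split_reasoning(response: str) -> tuple[str, str]:
    """Split a response into (reasoning, answer).

    ASSUMPTION: reasoning ends with a closing </think> tag. If the tag is
    absent, the whole response is returned as the answer.
    """
    head, sep, tail = response.partition("</think>")
    if not sep:
        return "", response.strip()
    # The opening tag may be missing if it was part of the prompt.
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()


# Hypothetical response text, used only to show the helper's behavior.
reasoning, answer = split_reasoning("<think>step one; step two</think>\nFinal answer: B")
print(reasoning)  # step one; step two
print(answer)     # Final answer: B
```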
Performance Results
Gazal-R1 (Final) has the highest score in the table on MMLU Pro (Medical), MedQA, and PubMedQA; on MedMCQA, Llama 3.1 405B Instruct leads with 75.8.
| Model | Size | MMLU Pro (Medical) | MedMCQA | MedQA | PubMedQA |
|---|---|---|---|---|---|
| Gazal-R1 (Final) | 32B | 81.6 | 71.9 | 87.1 | 79.6 |
| Gazal-R1 (SFT-only) | 32B | 79.3 | 72.3 | 86.9 | 77.6 |
| Llama 3.1 405B Instruct | 405B | 70.2 | 75.8 | 81.9 | 74.6 |
| Qwen 2.5 72B Instruct | 72B | 72.1 | 66.2 | 72.7 | 71.7 |
| Med42-Llama3.1-70B | 70B | 66.1 | 72.4 | 80.4 | 77.6 |
| Llama 3.1 70B Instruct | 70B | 74.5 | 72.5 | 78.4 | 78.5 |
| QwQ 32B | 32B | 70.1 | 65.6 | 72.3 | 73.7 |
| Qwen 3 32B | 32B | 78.4 | 71.6 | 84.4 | 76.7 |
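To make the table easier to compare row by row, the sketch below computes per-benchmark score differences between selected rows. The numbers are copied from the table above; the helper name `deltas` is illustrative, and treating the "Qwen 3 32B" row as the baseline for the fine-tuned models is an assumption based on the base model named in this card.

```python
BENCHMARKS = ["MMLU Pro (Medical)", "MedMCQA", "MedQA", "PubMedQA"]

# Scores copied from the table above, in the same column order as BENCHMARKS.
SCORES = {
    "Gazal-R1 (Final)": [81.6, 71.9, 87.1, 79.6],
    "Gazal-R1 (SFT-only)": [79.3, 72.3, 86.9, 77.6],
    "Qwen 3 32B": [78.4, 71.6, 84.4, 76.7],
}


def deltas(model: str, baseline: str) -> dict[str, float]:
    """Per-benchmark difference (model minus baseline), rounded to one decimal."""
    return {
        name: round(m - b, 1)
        for name, m, b in zip(BENCHMARKS, SCORES[model], SCORES[baseline])
    }


final_vs_sft = deltas("Gazal-R1 (Final)", "Gazal-R1 (SFT-only)")
sft_vs_base = deltas("Gazal-R1 (SFT-only)", "Qwen 3 32B")

for name in BENCHMARKS:
    print(f"{name}: final vs SFT {final_vs_sft[name]:+.1f}, SFT vs Qwen 3 32B {sft_vs_base[name]:+.1f}")
```

On these numbers, the SFT-only row is ahead of Qwen 3 32B on all four benchmarks, while the Final row improves on SFT-only everywhere except MedMCQA, where it is 0.4 lower.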