# Sara-1.5-4B-it
Sara is a fine-tuned variant of Google's MedGemma-1.5-4B-it that excels at medical tool calling and agentic tasks in EHR/FHIR clinical workflows.
## Model Description
Sara is specifically trained to interact with FHIR R4-compliant Electronic Health Record (EHR) systems through structured API calls. The model can:
- Query patient data via FHIR GET requests (patient lookup, lab results, vitals)
- Create clinical records via FHIR POST requests (medication orders, referrals, observations)
- Extract and return structured answers in a consistent format
This makes Sara ideal for building clinical AI agents that need to interface with healthcare IT systems.
## Intended Use
Sara is designed for:
- Building AI agents that interact with FHIR R4-compliant EHR systems
- Clinical decision support workflows requiring structured API interactions
- Research on LLM agents in healthcare settings
- Prototyping medical AI applications with tool-calling capabilities
## Out-of-Scope Use
- Direct clinical decision-making without human oversight
- Deployment in production healthcare environments without proper validation
- Use cases requiring real-time patient safety decisions
## Training Data
Sara was fine-tuned on the MedToolCalling dataset, which contains 284 verified multi-turn conversations demonstrating correct FHIR API usage.
### Dataset Overview
| Attribute | Value |
|---|---|
| Total Samples | 284 |
| Format | Multi-turn conversations |
| Avg. Turns per Sample | 2 |
| Action Types | GET, POST, FINISH |
| Total GET Calls | 225 |
| Total POST Calls | 78 |
### Task Types Covered
| Task | Description |
|---|---|
| Patient Lookup | Search patients by name, DOB, MRN |
| Age Calculation | Calculate patient age from DOB |
| Vitals Recording | Record blood pressure observations (POST) |
| Lab Queries | Query magnesium, potassium, CBG, HbA1C levels |
| Medication Orders | Conditionally order IV replacements with correct dosing |
| Referrals | Order orthopedic surgery referrals |
| Follow-up Labs | Schedule follow-up lab orders based on conditions |
### FHIR Resources Used

- `Patient` - Search and retrieve patient demographics
- `Observation` - Query labs and vitals, record new observations
- `MedicationRequest` - Order medications
- `ServiceRequest` - Order referrals and lab tests
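As a concrete illustration of the search-style GET actions over these resources, here is a minimal sketch of building an `Observation` query string in the action format Sara emits. The `build_observation_query` helper, the sample patient ID, and the choice of LOINC code `2823-3` (serum/plasma potassium) are illustrative assumptions, not part of the model's tooling.

```python
from urllib.parse import urlencode

API_BASE = "http://localhost:8080/fhir"  # assumed local FHIR server, as in the examples below

def build_observation_query(patient_id: str, loinc_code: str) -> str:
    """Build a FHIR Observation search in Sara's GET action format.

    Hypothetical helper: the patient ID and LOINC code are illustrative.
    """
    params = urlencode({
        "patient": patient_id,   # which patient's observations
        "code": loinc_code,      # e.g. "2823-3" is LOINC for serum potassium
        "_sort": "-date",        # newest first
        "_count": "1",           # only the latest result
    })
    return f"GET {API_BASE}/Observation?{params}"

print(build_observation_query("S1234567", "2823-3"))
# → GET http://localhost:8080/fhir/Observation?patient=S1234567&code=2823-3&_sort=-date&_count=1
```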
## How to Use
### Installation

```bash
pip install transformers accelerate torch
```
### Basic Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "Alfaxad/Sara-1.5-4B-it"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Example: Patient lookup task
system_prompt = """You are an expert in using FHIR functions to assist medical professionals. You are given a question and a set of possible functions. Based on the question, you will need to make one or more function/tool calls to achieve the purpose.
1. If you decide to invoke a GET function, you MUST put it in the format of
GET url?param_name1=param_value1&param_name2=param_value2...
2. If you decide to invoke a POST function, you MUST put it in the format of
POST url
[your payload data in JSON format]
3. If you have got answers for all the questions and finished all the requested tasks, you MUST call to finish the conversation in the format of
FINISH([answer1, answer2, ...])
Your response must be in the format of one of the three cases, and you can call only one function each time.
Available FHIR endpoints:
- GET {api_base}/Patient - Search patients by name, DOB, identifier
- GET {api_base}/Observation - Query lab results and vitals
- POST {api_base}/Observation - Record new observations
- POST {api_base}/MedicationRequest - Order medications
- POST {api_base}/ServiceRequest - Order referrals and labs
Use http://localhost:8080/fhir/ as the api_base.
Question: What's the MRN of the patient with name John Smith and DOB of 1985-03-15?"""

messages = [{"role": "user", "content": system_prompt}]
input_text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

with torch.inference_mode():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=False,
        pad_token_id=tokenizer.pad_token_id,
    )

response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print(response)
# Expected output: GET http://localhost:8080/fhir/Patient?given=John&family=Smith&birthdate=1985-03-15
```
### Multi-Turn Conversation Example

```python
def run_agent_turn(model, tokenizer, conversation):
    """Run a single agent turn given the conversation history."""
    input_text = tokenizer.apply_chat_template(
        conversation,
        tokenize=False,
        add_generation_prompt=True,
    )
    inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
    with torch.inference_mode():
        outputs = model.generate(
            **inputs,
            max_new_tokens=512,
            do_sample=False,
            pad_token_id=tokenizer.pad_token_id,
        )
    response = tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[-1]:],
        skip_special_tokens=True,
    )
    return response.strip()

# Initialize conversation with system prompt
conversation = [{"role": "user", "content": system_prompt}]

# Turn 1: Agent makes API call
agent_response = run_agent_turn(model, tokenizer, conversation)
print(f"Agent: {agent_response}")
# Output: GET http://localhost:8080/fhir/Patient?given=John&family=Smith&birthdate=1985-03-15

# Simulate FHIR server response
conversation.append({"role": "model", "content": agent_response})
fhir_response = """Here is the response from the GET request:
{
    "resourceType": "Bundle",
    "total": 1,
    "entry": [{
        "resource": {
            "resourceType": "Patient",
            "id": "S1234567",
            "identifier": [{"value": "S1234567"}],
            "name": [{"family": "Smith", "given": ["John"]}],
            "birthDate": "1985-03-15"
        }
    }]
}"""
conversation.append({"role": "user", "content": fhir_response})

# Turn 2: Agent extracts answer
agent_response = run_agent_turn(model, tokenizer, conversation)
print(f"Agent: {agent_response}")
# Output: FINISH(["S1234567"])
```
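The manual two-turn walkthrough above can be generalized into a loop that keeps calling the model until it emits `FINISH`. Below is a minimal sketch of such a driver; `run_agent_loop`, the `step` callable (which would wrap `run_agent_turn` above), and the `execute` callable (which would issue the real HTTP request to the FHIR server) are all assumptions introduced here, so the loop can be exercised without a model or a server.

```python
def run_agent_loop(step, execute, system_prompt, max_turns=8):
    """Drive the GET/POST/FINISH protocol until the agent emits FINISH.

    Hypothetical driver, not part of the released tooling:
      step(conversation) -> model text (e.g. wrapping run_agent_turn above)
      execute(action)    -> FHIR server response text for a GET/POST action
    """
    conversation = [{"role": "user", "content": system_prompt}]
    for _ in range(max_turns):
        reply = step(conversation).strip()
        conversation.append({"role": "model", "content": reply})
        if reply.startswith("FINISH("):
            # Strip "FINISH(" and the trailing ")" to get the answer list text
            return reply[len("FINISH("):-1], conversation
        # GET/POST: run the call and feed the server's reply back to the agent
        conversation.append({"role": "user", "content": execute(reply)})
    raise RuntimeError("Agent did not FINISH within max_turns")

# Usage with scripted stand-ins for the model and the server:
replies = iter([
    "GET http://localhost:8080/fhir/Patient?given=John&family=Smith",
    'FINISH(["S1234567"])',
])
answer, history = run_agent_loop(
    step=lambda conversation: next(replies),
    execute=lambda action: '{"resourceType": "Bundle", "total": 1}',
    system_prompt="...",
)
print(answer)  # → ["S1234567"]
```

The `max_turns` cap is a defensive choice: a model that never emits `FINISH` would otherwise loop forever.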
## Agent Action Format

Sara responds in exactly one of three formats per turn:

### GET Request

```
GET http://localhost:8080/fhir/{Resource}?param1=value1&param2=value2
```

### POST Request

```
POST http://localhost:8080/fhir/{Resource}
{
  "resourceType": "...",
  "field": "value",
  ...
}
```

### Final Answer

```
FINISH([answer1, answer2, ...])
```
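When wiring these formats into an agent loop, each model turn has to be classified before it can be executed. A minimal parser sketch follows; the `parse_action` helper is an assumption for illustration, not shipped with the model.

```python
import json

def parse_action(text: str):
    """Classify one model turn into (kind, payload).

    Hypothetical helper matching the three formats above:
      GET    -> ("GET", url_string)
      POST   -> ("POST", (url_string, payload_dict))
      FINISH -> ("FINISH", answers_text)
    """
    text = text.strip()
    if text.startswith("FINISH(") and text.endswith(")"):
        return "FINISH", text[len("FINISH("):-1]
    if text.startswith("GET "):
        return "GET", text[len("GET "):].strip()
    if text.startswith("POST "):
        # POST actions carry the URL on the first line, JSON payload after it
        url, _, body = text[len("POST "):].partition("\n")
        return "POST", (url.strip(), json.loads(body))
    raise ValueError(f"Unrecognized action: {text[:40]!r}")

print(parse_action('FINISH(["S1234567"])'))
# → ('FINISH', '["S1234567"]')
```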
## Limitations
- Domain Specificity: Sara is optimized for FHIR R4 API interactions and may not generalize well to other healthcare standards or non-medical tool-calling tasks.
- Validation Required: Outputs should be validated before execution in any clinical system.
- Not for Direct Patient Care: This model is intended for research and development purposes, not direct clinical decision-making.
- Context Window: While the model supports up to 128K tokens, it was fine-tuned on sequences up to 16K tokens.
## License
The use of Sara is governed by the Health AI Developer Foundations terms of use, inherited from the base MedGemma model.
## Citation

If you use this model, please cite:

```bibtex
@misc{Sara,
  title={Sara-1.5-4B-it: A Fine-tuned MedGemma Model for Clinical Tool Calling},
  author={Eyembe, Alfaxad and {Nadhari AI Lab}},
  year={2026},
  publisher={Hugging Face},
  url={https://huggingface.co/Nadhari/Sara-1.5-4B-it}
}
```
### Base Model Citation

```bibtex
@article{sellergren2025medgemma,
  title={MedGemma Technical Report},
  author={Sellergren, Andrew and Kazemzadeh, Sahar and Jaroensri, Tiam and Kiraly, Atilla and others},
  journal={arXiv preprint arXiv:2507.05201},
  year={2025}
}
```
### Dataset Citation

```bibtex
@misc{MedToolCalling,
  author={Eyembe, Alfaxad and {Nadhari AI Lab}},
  title={MedToolCalling: Medical Tool Calling Dataset},
  year={2026},
  publisher={Hugging Face},
  url={https://huggingface.co/Nadhari/MedToolCalling}
}
```
### Evaluation Framework Citation

```bibtex
@article{tang2025medagentbench,
  title={MedAgentBench: Dataset for Benchmarking LLMs as Agents in Medical Applications},
  author={Tang, Yixing and Zou, Kaizhao and Sun, Hao and Chen, Zheng and Chen, Jonathan H},
  journal={arXiv preprint arXiv:2501.14654},
  year={2025}
}
```
## Base Model

google/medgemma-1.5-4b-it