Sara-1.5-4B-it

Sara is a fine-tuned variant of Google's MedGemma-1.5-4B-it, specialized for medical tool calling and agentic tasks in EHR/FHIR clinical workflows.

Model Description

Sara is specifically trained to interact with FHIR R4-compliant Electronic Health Record (EHR) systems through structured API calls. The model can:

  • Query patient data via FHIR GET requests (patient lookup, lab results, vitals)
  • Create clinical records via FHIR POST requests (medication orders, referrals, observations)
  • Extract and return structured answers in a consistent format

This makes Sara ideal for building clinical AI agents that need to interface with healthcare IT systems.
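Concretely, the GET actions Sara emits are plain FHIR search URLs. A minimal sketch of how such a URL is assembled (the localhost base URL and search parameters are illustrative, matching the examples later in this card):

```python
from urllib.parse import urlencode

# Illustrative only: api_base and the parameters are assumptions,
# mirroring the localhost FHIR server used in the usage examples below.
api_base = "http://localhost:8080/fhir"
params = {"given": "John", "family": "Smith", "birthdate": "1985-03-15"}
patient_search_url = f"{api_base}/Patient?{urlencode(params)}"
print(patient_search_url)
# http://localhost:8080/fhir/Patient?given=John&family=Smith&birthdate=1985-03-15
```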

Intended Use

Sara is designed for:

  • Building AI agents that interact with FHIR R4-compliant EHR systems
  • Clinical decision support workflows requiring structured API interactions
  • Research on LLM agents in healthcare settings
  • Prototyping medical AI applications with tool-calling capabilities

Out-of-Scope Use

  • Direct clinical decision-making without human oversight
  • Deployment in production healthcare environments without proper validation
  • Use cases requiring real-time patient safety decisions

Training Data

Sara was fine-tuned on the MedToolCalling dataset, which contains 284 verified multi-turn conversations demonstrating correct FHIR API usage.

Dataset Overview

Attribute              Value
---------------------  ------------------------
Total Samples          284
Format                 Multi-turn conversations
Avg. Turns per Sample  2
Action Types           GET, POST, FINISH
Total GET Calls        225
Total POST Calls       78

Task Types Covered

Task               Description
-----------------  -------------------------------------------------------
Patient Lookup     Search patients by name, DOB, MRN
Age Calculation    Calculate patient age from DOB
Vitals Recording   Record blood pressure observations (POST)
Lab Queries        Query magnesium, potassium, CBG, HbA1C levels
Medication Orders  Conditionally order IV replacements with correct dosing
Referrals          Order orthopedic surgery referrals
Follow-up Labs     Schedule follow-up lab orders based on conditions

FHIR Resources Used

  • Patient - Search and retrieve patient demographics
  • Observation - Query labs and vitals, record new observations
  • MedicationRequest - Order medications
  • ServiceRequest - Order referrals and lab tests
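For context, a blood-pressure Observation of the kind Sara POSTs might be shaped as follows. This is a sketch of a FHIR R4 payload, not dataset output: the patient id is hypothetical, and the LOINC codes (85354-9 panel, 8480-6 systolic, 8462-4 diastolic) are the commonly used ones, shown here for illustration:

```python
import json

# Sketch of a FHIR R4 blood-pressure Observation payload.
# The patient reference "Patient/S1234567" is an assumption for this example.
payload = {
    "resourceType": "Observation",
    "status": "final",
    "code": {"coding": [{"system": "http://loinc.org",
                         "code": "85354-9",
                         "display": "Blood pressure panel"}]},
    "subject": {"reference": "Patient/S1234567"},
    "component": [
        {"code": {"coding": [{"system": "http://loinc.org", "code": "8480-6"}]},
         "valueQuantity": {"value": 120, "unit": "mmHg"}},  # systolic
        {"code": {"coding": [{"system": "http://loinc.org", "code": "8462-4"}]},
         "valueQuantity": {"value": 80, "unit": "mmHg"}},   # diastolic
    ],
}
print(json.dumps(payload, indent=2))
```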

How to Use

Installation

pip install transformers accelerate torch

Basic Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "Alfaxad/Sara-1.5-4B-it"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Example: Patient lookup task
system_prompt = """You are an expert in using FHIR functions to assist medical professionals. You are given a question and a set of possible functions. Based on the question, you will need to make one or more function/tool calls to achieve the purpose.

1. If you decide to invoke a GET function, you MUST put it in the format of
GET url?param_name1=param_value1&param_name2=param_value2...

2. If you decide to invoke a POST function, you MUST put it in the format of
POST url
[your payload data in JSON format]

3. If you have got answers for all the questions and finished all the requested tasks, you MUST call to finish the conversation in the format of
FINISH([answer1, answer2, ...])

Your response must be in the format of one of the three cases, and you can call only one function each time.

Available FHIR endpoints:
- GET {api_base}/Patient - Search patients by name, DOB, identifier
- GET {api_base}/Observation - Query lab results and vitals
- POST {api_base}/Observation - Record new observations
- POST {api_base}/MedicationRequest - Order medications
- POST {api_base}/ServiceRequest - Order referrals and labs

Use http://localhost:8080/fhir/ as the api_base.

Question: What's the MRN of the patient with name John Smith and DOB of 1985-03-15?"""

messages = [{"role": "user", "content": system_prompt}]

input_text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

with torch.inference_mode():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=False,
        pad_token_id=tokenizer.pad_token_id,
    )

response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print(response)
# Expected output: GET http://localhost:8080/fhir/Patient?given=John&family=Smith&birthdate=1985-03-15

Multi-Turn Conversation Example

def run_agent_turn(model, tokenizer, conversation):
    """Run a single agent turn given the conversation history."""
    input_text = tokenizer.apply_chat_template(
        conversation,
        tokenize=False,
        add_generation_prompt=True
    )
    
    inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
    
    with torch.inference_mode():
        outputs = model.generate(
            **inputs,
            max_new_tokens=512,
            do_sample=False,
            pad_token_id=tokenizer.pad_token_id,
        )
    
    response = tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[-1]:], 
        skip_special_tokens=True
    )
    return response.strip()

# Initialize conversation with system prompt
conversation = [{"role": "user", "content": system_prompt}]

# Turn 1: Agent makes API call
agent_response = run_agent_turn(model, tokenizer, conversation)
print(f"Agent: {agent_response}")
# Output: GET http://localhost:8080/fhir/Patient?given=John&family=Smith&birthdate=1985-03-15

# Append the agent turn (Gemma-family chat templates use "model" for
# assistant turns), then simulate the FHIR server's reply as the next user turn
conversation.append({"role": "model", "content": agent_response})
fhir_response = """Here is the response from the GET request:
{
  "resourceType": "Bundle",
  "total": 1,
  "entry": [{
    "resource": {
      "resourceType": "Patient",
      "id": "S1234567",
      "identifier": [{"value": "S1234567"}],
      "name": [{"family": "Smith", "given": ["John"]}],
      "birthDate": "1985-03-15"
    }
  }]
}"""
conversation.append({"role": "user", "content": fhir_response})

# Turn 2: Agent extracts answer
agent_response = run_agent_turn(model, tokenizer, conversation)
print(f"Agent: {agent_response}")
# Output: FINISH(["S1234567"])

Agent Action Format

Sara responds in exactly one of three formats per turn:

GET Request

GET http://localhost:8080/fhir/{Resource}?param1=value1&param2=value2

POST Request

POST http://localhost:8080/fhir/{Resource}
{
  "resourceType": "...",
  "field": "value",
  ...
}

Final Answer

FINISH([answer1, answer2, ...])
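A caller can classify these three formats with a small parser. The following is a sketch: the returned dict shape is an assumption of this example, and it assumes the FINISH answer list is JSON-formatted, as in the examples above:

```python
import json
import re

def parse_action(text):
    """Classify a Sara response into one of the three action formats."""
    text = text.strip()
    if text.startswith("GET "):
        return {"type": "GET", "url": text[4:].strip()}
    if text.startswith("POST "):
        # URL on the first line, JSON payload on the following lines
        url, _, body = text[5:].partition("\n")
        return {"type": "POST", "url": url.strip(), "payload": json.loads(body)}
    m = re.match(r"FINISH\((\[.*\])\)$", text, re.DOTALL)
    if m:
        # Assumes the answers are valid JSON, as in the card's examples
        return {"type": "FINISH", "answers": json.loads(m.group(1))}
    raise ValueError(f"Response is not a valid action: {text!r}")

print(parse_action('FINISH(["S1234567"])'))
# {'type': 'FINISH', 'answers': ['S1234567']}
```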

Limitations

  • Domain Specificity: Sara is optimized for FHIR R4 API interactions and may not generalize well to other healthcare standards or non-medical tool-calling tasks.
  • Validation Required: Outputs should be validated before execution in any clinical system.
  • Not for Direct Patient Care: This model is intended for research and development purposes, not direct clinical decision-making.
  • Context Window: While the model supports up to 128K tokens, it was fine-tuned on sequences up to 16K tokens.

License

The use of Sara is governed by the Health AI Developer Foundations terms of use, inherited from the base MedGemma model.

Citation

If you use this model, please cite:

@misc{Sara,
  title={Sara-1.5-4B-it: A Fine-tuned MedGemma Model for Clinical Tool Calling},
  author={Alfaxad Eyembe and Nadhari AI Lab},
  year={2026},
  publisher={Hugging Face},
  url={https://huggingface.co/Nadhari/Sara-1.5-4B-it}
}

Base Model Citation

@article{sellergren2025medgemma,
  title={MedGemma Technical Report},
  author={Sellergren, Andrew and Kazemzadeh, Sahar and Jaroensri, Tiam and Kiraly, Atilla and others},
  journal={arXiv preprint arXiv:2507.05201},
  year={2025}
}

Dataset Citation

@misc{MedToolCalling,
  author = {Alfaxad Eyembe and Nadhari AI Lab},
  title = {MedToolCalling: Medical Tool Calling Dataset},
  year = {2026},
  publisher = {Hugging Face},
  url={https://huggingface.co/Nadhari/MedToolCalling}
}

Evaluation Framework Citation

@article{tang2025medagentbench,
  title={MedAgentBench: Dataset for Benchmarking LLMs as Agents in Medical Applications},
  author={Tang, Yixing and Zou, Kaizhao and Sun, Hao and Chen, Zheng and Chen, Jonathan H},
  journal={arXiv preprint arXiv:2501.14654},
  year={2025}
}