Sara-1.5-4B-it

Sara is a fine-tuned variant of Google's MedGemma-1.5-4B-it, specialized for medical tool calling and agentic tasks in EHR/FHIR clinical workflows.

Model Description

Sara is specifically trained to interact with FHIR R4-compliant Electronic Health Record (EHR) systems through structured API calls. The model can:

  • Query patient data via FHIR GET requests (patient lookup, lab results, vitals)
  • Create clinical records via FHIR POST requests (medication orders, referrals, observations)
  • Extract and return structured answers in a consistent format

This makes Sara ideal for building clinical AI agents that need to interface with healthcare IT systems.
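Concretely, the GET actions Sara emits are plain FHIR search URLs. A minimal sketch of how such a URL is assembled (the localhost base URL and search parameters are illustrative, matching the examples later in this card):

```python
from urllib.parse import urlencode

# Illustrative only: api_base and the parameters are assumptions,
# mirroring the localhost FHIR server used in the usage examples below.
api_base = "http://localhost:8080/fhir"
params = {"given": "John", "family": "Smith", "birthdate": "1985-03-15"}
patient_search_url = f"{api_base}/Patient?{urlencode(params)}"
print(patient_search_url)
# http://localhost:8080/fhir/Patient?given=John&family=Smith&birthdate=1985-03-15
```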

Intended Use

Sara is designed for:

  • Building AI agents that interact with FHIR R4-compliant EHR systems
  • Clinical decision support workflows requiring structured API interactions
  • Research on LLM agents in healthcare settings
  • Prototyping medical AI applications with tool-calling capabilities

Out-of-Scope Use

  • Direct clinical decision-making without human oversight
  • Deployment in production healthcare environments without proper validation
  • Use cases requiring real-time patient safety decisions

Training Data

Sara was fine-tuned on the MedToolCalling dataset, which contains 284 verified multi-turn conversations demonstrating correct FHIR API usage.

Dataset Overview

Attribute              Value
---------------------  ------------------------
Total Samples          284
Format                 Multi-turn conversations
Avg. Turns per Sample  2
Action Types           GET, POST, FINISH
Total GET Calls        225
Total POST Calls       78

Task Types Covered

Task               Description
-----------------  -------------------------------------------------------
Patient Lookup     Search patients by name, DOB, MRN
Age Calculation    Calculate patient age from DOB
Vitals Recording   Record blood pressure observations (POST)
Lab Queries        Query magnesium, potassium, CBG, HbA1C levels
Medication Orders  Conditionally order IV replacements with correct dosing
Referrals          Order orthopedic surgery referrals
Follow-up Labs     Schedule follow-up lab orders based on conditions

FHIR Resources Used

  • Patient - Search and retrieve patient demographics
  • Observation - Query labs and vitals, record new observations
  • MedicationRequest - Order medications
  • ServiceRequest - Order referrals and lab tests
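For context, a blood-pressure Observation of the kind Sara POSTs might be shaped as follows. This is a sketch of a FHIR R4 payload, not dataset output: the patient id is hypothetical, and the LOINC codes (85354-9 panel, 8480-6 systolic, 8462-4 diastolic) are the commonly used ones, shown here for illustration:

```python
import json

# Sketch of a FHIR R4 blood-pressure Observation payload.
# The patient reference "Patient/S1234567" is an assumption for this example.
payload = {
    "resourceType": "Observation",
    "status": "final",
    "code": {"coding": [{"system": "http://loinc.org",
                         "code": "85354-9",
                         "display": "Blood pressure panel"}]},
    "subject": {"reference": "Patient/S1234567"},
    "component": [
        {"code": {"coding": [{"system": "http://loinc.org", "code": "8480-6"}]},
         "valueQuantity": {"value": 120, "unit": "mmHg"}},  # systolic
        {"code": {"coding": [{"system": "http://loinc.org", "code": "8462-4"}]},
         "valueQuantity": {"value": 80, "unit": "mmHg"}},   # diastolic
    ],
}
print(json.dumps(payload, indent=2))
```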

How to Use

Installation

pip install transformers accelerate torch

Basic Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "Alfaxad/Sara-1.5-4B-it"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Example: Patient lookup task
system_prompt = """You are an expert in using FHIR functions to assist medical professionals. You are given a question and a set of possible functions. Based on the question, you will need to make one or more function/tool calls to achieve the purpose.

1. If you decide to invoke a GET function, you MUST put it in the format of
GET url?param_name1=param_value1&param_name2=param_value2...

2. If you decide to invoke a POST function, you MUST put it in the format of
POST url
[your payload data in JSON format]

3. If you have got answers for all the questions and finished all the requested tasks, you MUST call to finish the conversation in the format of
FINISH([answer1, answer2, ...])

Your response must be in the format of one of the three cases, and you can call only one function each time.

Available FHIR endpoints:
- GET {api_base}/Patient - Search patients by name, DOB, identifier
- GET {api_base}/Observation - Query lab results and vitals
- POST {api_base}/Observation - Record new observations
- POST {api_base}/MedicationRequest - Order medications
- POST {api_base}/ServiceRequest - Order referrals and labs

Use http://localhost:8080/fhir/ as the api_base.

Question: What's the MRN of the patient with name John Smith and DOB of 1985-03-15?"""

messages = [{"role": "user", "content": system_prompt}]

input_text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

with torch.inference_mode():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=False,
        pad_token_id=tokenizer.pad_token_id,
    )

response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print(response)
# Expected output: GET http://localhost:8080/fhir/Patient?given=John&family=Smith&birthdate=1985-03-15

Multi-Turn Conversation Example

def run_agent_turn(model, tokenizer, conversation):
    """Run a single agent turn given the conversation history."""
    input_text = tokenizer.apply_chat_template(
        conversation,
        tokenize=False,
        add_generation_prompt=True
    )
    
    inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
    
    with torch.inference_mode():
        outputs = model.generate(
            **inputs,
            max_new_tokens=512,
            do_sample=False,
            pad_token_id=tokenizer.pad_token_id,
        )
    
    response = tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[-1]:], 
        skip_special_tokens=True
    )
    return response.strip()

# Initialize conversation with system prompt
conversation = [{"role": "user", "content": system_prompt}]

# Turn 1: Agent makes API call
agent_response = run_agent_turn(model, tokenizer, conversation)
print(f"Agent: {agent_response}")
# Output: GET http://localhost:8080/fhir/Patient?given=John&family=Smith&birthdate=1985-03-15

# Append the agent turn (Gemma-family chat templates use "model" for
# assistant turns), then simulate the FHIR server's reply as the next user turn
conversation.append({"role": "model", "content": agent_response})
fhir_response = """Here is the response from the GET request:
{
  "resourceType": "Bundle",
  "total": 1,
  "entry": [{
    "resource": {
      "resourceType": "Patient",
      "id": "S1234567",
      "identifier": [{"value": "S1234567"}],
      "name": [{"family": "Smith", "given": ["John"]}],
      "birthDate": "1985-03-15"
    }
  }]
}"""
conversation.append({"role": "user", "content": fhir_response})

# Turn 2: Agent extracts answer
agent_response = run_agent_turn(model, tokenizer, conversation)
print(f"Agent: {agent_response}")
# Output: FINISH(["S1234567"])

Agent Action Format

Sara responds in exactly one of three formats per turn:

GET Request

GET http://localhost:8080/fhir/{Resource}?param1=value1&param2=value2

POST Request

POST http://localhost:8080/fhir/{Resource}
{
  "resourceType": "...",
  "field": "value",
  ...
}

Final Answer

FINISH([answer1, answer2, ...])
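A caller can classify these three formats with a small parser. The following is a sketch: the returned dict shape is an assumption of this example, and it assumes the FINISH answer list is JSON-formatted, as in the examples above:

```python
import json
import re

def parse_action(text):
    """Classify a Sara response into one of the three action formats."""
    text = text.strip()
    if text.startswith("GET "):
        return {"type": "GET", "url": text[4:].strip()}
    if text.startswith("POST "):
        # URL on the first line, JSON payload on the following lines
        url, _, body = text[5:].partition("\n")
        return {"type": "POST", "url": url.strip(), "payload": json.loads(body)}
    m = re.match(r"FINISH\((\[.*\])\)$", text, re.DOTALL)
    if m:
        # Assumes the answers are valid JSON, as in the card's examples
        return {"type": "FINISH", "answers": json.loads(m.group(1))}
    raise ValueError(f"Response is not a valid action: {text!r}")

print(parse_action('FINISH(["S1234567"])'))
# {'type': 'FINISH', 'answers': ['S1234567']}
```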

Limitations

  • Domain Specificity: Sara is optimized for FHIR R4 API interactions and may not generalize well to other healthcare standards or non-medical tool-calling tasks.
  • Validation Required: Outputs should be validated before execution in any clinical system.
  • Not for Direct Patient Care: This model is intended for research and development purposes, not direct clinical decision-making.
  • Context Window: While the model supports up to 128K tokens, it was fine-tuned on sequences up to 16K tokens.

License

The use of Sara is governed by the Health AI Developer Foundations terms of use, inherited from the base MedGemma model.

Citation

If you use this model, please cite:

@misc{Sara,
  title={Sara-1.5-4B-it: A Fine-tuned MedGemma Model for Clinical Tool Calling},
  author={Alfaxad Eyembe and Nadhari AI Lab},
  year={2026},
  publisher={Hugging Face},
  url={https://huggingface.co/Nadhari/Sara-1.5-4B-it}
}

Base Model Citation

@article{sellergren2025medgemma,
  title={MedGemma Technical Report},
  author={Sellergren, Andrew and Kazemzadeh, Sahar and Jaroensri, Tiam and Kiraly, Atilla and others},
  journal={arXiv preprint arXiv:2507.05201},
  year={2025}
}

Dataset Citation

@misc{MedToolCalling,
  author = {Alfaxad Eyembe and Nadhari AI Lab},
  title = {MedToolCalling: Medical Tool Calling Dataset},
  year = {2026},
  publisher = {Hugging Face},
  url={https://huggingface.co/Nadhari/MedToolCalling}
}

Evaluation Framework Citation

@article{tang2025medagentbench,
  title={MedAgentBench: Dataset for Benchmarking LLMs as Agents in Medical Applications},
  author={Tang, Yixing and Zou, Kaizhao and Sun, Hao and Chen, Zheng and Chen, Jonathan H},
  journal={arXiv preprint arXiv:2501.14654},
  year={2025}
}