🩺 pediatric-support-g4 — v0.1 Pipeline Validation

A QLoRA adapter for Gemma 4 E4B, representing the first iteration of a research initiative to build offline-capable, locally running LLMs for pediatric clinical decision support in resource-constrained clinical environments.

⚠️ Status and Intended Use

This is the first iteration of the project. Its sole purpose is to validate that the training pipeline, architecture, and hyperparameter configuration are stable and ready for scaled training.

This is NOT a medical device. It has not been validated for clinical use. It has not been benchmarked for diagnostic accuracy. Do not use in any patient-facing context. All outputs must be reviewed by a qualified healthcare professional. The authors accept no liability for decisions made based on model outputs.

This adapter is released for research and development purposes only. Its intended downstream use is as a foundation for a future, independently validated clinical decision support tool for tropical and endemic pediatric diseases in remote, offline clinical settings in the Americas.

Out-of-scope use:

Clinical diagnosis or treatment decisions of any kind
Any patient-facing application
General medical question answering in production settings
Use without a qualified healthcare professional reviewing all outputs

🚀 Quick Start

This is a LoRA adapter — it must be loaded alongside its base model.
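
To make the dependency on the base model concrete, the sketch below counts the weights a LoRA adapter stores for a single linear layer. It uses the rank of 16 listed under Model Details; the 2048 x 2048 layer shape and the helper name `lora_param_counts` are illustrative assumptions, not values taken from Gemma 4 E4B.

```python
def lora_param_counts(d_in: int, d_out: int, rank: int) -> tuple[int, int]:
    """Return (full_layer_params, adapter_params) for one linear layer.

    LoRA keeps the original d_out x d_in weight frozen and learns two
    small matrices: A (rank x d_in) and B (d_out x rank).
    """
    full = d_in * d_out
    adapter = rank * d_in + d_out * rank
    return full, adapter


# Hypothetical layer shape, chosen only for illustration.
full, adapter = lora_param_counts(d_in=2048, d_out=2048, rank=16)
print(f"frozen base weights: {full:,}")
print(f"adapter weights:     {adapter:,}")
print(f"adapter share:       {adapter / full:.2%}")
```

The adapter holds only the small A and B matrices, so without the frozen base weights there is nothing to apply them to, which is why the two must be loaded together.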

Installation

pip install transformers peft bitsandbytes accelerate

Loading the Model

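from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# The base checkpoint is already quantized to 4-bit with bitsandbytes, so its
# quantization settings are read from the checkpoint's own config.
base_model_id = "unsloth/gemma-4-e4b-it-unsloth-bnb-4bit"
# The adapter lives in the repository this model card describes.
adapter_id    = "albertoanalytics/pediatric-support-g4"

tokenizer = AutoTokenizer.from_pretrained(base_model_id)

model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    device_map="auto",  # place weights on the available GPU(s) or CPU
)
# Attach the LoRA adapter weights on top of the frozen base model.
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()  # inference mode: disables dropout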

Inference Example

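import torch

# Plain-text prompt: a short role instruction followed by the clinical vignette.
prompt = """You are a knowledgeable pediatric medicine assistant.
A 3-year-old presents with a barking cough, stridor at rest, and low-grade fever.
What is the most likely diagnosis and recommended first-line management?"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():  # gradients are not needed for inference
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,  # only takes effect because do_sample=True
        do_sample=True,
    )

# Decode only the newly generated tokens so the prompt is not echoed back.
prompt_length = inputs["input_ids"].shape[-1]
print(tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True))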
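
The `temperature=0.7` setting in the example above rescales the model's next-token scores before sampling. The following is a minimal pure-Python sketch of that rescaling; the three logit values and the function name `softmax_with_temperature` are made up for illustration and are not drawn from the model.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert raw scores to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate tokens
neutral = softmax_with_temperature(logits, temperature=1.0)
sharper = softmax_with_temperature(logits, temperature=0.7)
print([round(p, 3) for p in neutral])
print([round(p, 3) for p in sharper])
```

Values below 1.0 concentrate probability on the highest-scoring tokens, so 0.7 makes sampling somewhat more conservative than the unscaled distribution.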

🏗️ Model Details

Property	Value
Base Model	`unsloth/gemma-4-e4b-it-unsloth-bnb-4bit`
Architecture	Gemma 4 E4B
Adapter Method	QLoRA (Quantized Low-Rank Adaptation)
LoRA Rank	16
LoRA Alpha	32
LoRA Dropout	0.05
Training Dataset	MedMCQA — pediatric subjects subset
Epochs	3
Total Steps	375
Final Loss	0.6220
Final Grad Norm	0.472
Total Tokens Seen	~2,587,816
Training Duration	1h 30m 24s
Training Framework	Unsloth Studio
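
A few derived quantities follow directly from the table and can help when sizing a scaled run. The sketch below only does arithmetic on the reported values; the variable names are illustrative, the per-step figures are averages that assume evenly sized steps, and the scaling factor assumes the standard LoRA formulation (alpha divided by rank).

```python
# Values copied from the Model Details table.
lora_rank = 16
lora_alpha = 32
epochs = 3
total_steps = 375
total_tokens = 2_587_816               # reported as approximate
duration_s = 1 * 3600 + 30 * 60 + 24   # 1h 30m 24s

lora_scale = lora_alpha / lora_rank    # multiplier on the low-rank update
steps_per_epoch = total_steps // epochs
tokens_per_step = total_tokens / total_steps
seconds_per_step = duration_s / total_steps
tokens_per_second = total_tokens / duration_s

print(f"LoRA scaling (alpha / rank): {lora_scale}")
print(f"steps per epoch:             {steps_per_epoch}")
print(f"average tokens per step:     {tokens_per_step:.0f}")
print(f"average seconds per step:    {seconds_per_step:.2f}")
print(f"average tokens per second:   {tokens_per_second:.0f}")
```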

Why Gemma 4 E4B

Gemma 4 E4B was chosen over MedGemma 1.5 4B (arXiv:2604.05081v2) for three reasons specific to this project's deployment requirements:

Thinking mode: extended chain-of-thought reasoning lets the clinician follow and evaluate the model's reasoning process rather than receive an opaque conclusion. MedGemma 1.5 4B activates thinking through a system instruction added to the prompt at inference time; it is not natively integrated into the architecture. Gemma 4 E4B instead controls thinking through a dedicated <|think|> token built into the model from the ground up, making it a first-class architectural capability rather than a prompted behaviour.
Mobile-first deployment: the E4B model is purpose-built for efficient local execution on everyday devices such as smartphones. MedGemma 1.5 4B makes no equivalent claim about mobile optimisation, and its expanded capabilities (processing 3D CT/MRI volumes of up to 85 axial slices, or 21,760 vision tokens, and whole slide pathology images of up to 126 patches, or 32,256 vision tokens, per query; arXiv:2604.05081v2) might not be fully leveraged on a smartphone, especially in remote and isolated field settings. Google's own recommended production deployment path for MedGemma 1.5 4B points explicitly to cloud infrastructure: Model Garden and Google Cloud Storage, with specialised server-side processing for large medical images.
No meaningful head start for this clinical scope: MedGemma 1.5 4B's medical pre-training reflects hospital-grade diagnostics (chest X-ray, 3D radiology, whole slide pathology, dermoscopy, ophthalmology). Conditions such as cutaneous leishmaniasis, severe dengue, Chagas disease, and Oropouche fever in children are not present in that training distribution. Both models therefore require targeted fine-tuning for this scope, and given that, Gemma 4's newer architecture with native reasoning and mobile optimisation is the stronger foundation.
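
As a quick consistency check on the figures quoted in the second reason, the arithmetic below divides each vision-token total by its image count. That both cases work out to the same per-image budget is an inference from these numbers, not a statement taken from the MedGemma report.

```python
# Figures quoted above from arXiv:2604.05081v2.
cases = {
    "3D CT/MRI volume": {"images": 85, "vision_tokens": 21_760},
    "whole slide pathology": {"images": 126, "vision_tokens": 32_256},
}

# Integer division is exact here; the remainders are checked below.
tokens_per_image = {
    name: c["vision_tokens"] // c["images"] for name, c in cases.items()
}
remainders = [c["vision_tokens"] % c["images"] for c in cases.values()]

for name, per_image in tokens_per_image.items():
    print(f"{name}: {per_image} vision tokens per slice or patch")
```

Either input fills tens of thousands of context tokens with image data before any text is added, which is consistent with treating these capabilities as server-side workloads.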

🗺️ Roadmap

Version	Scope
✅ v0.1	Pipeline validation using MedMCQA pediatric subset
v0.2	Fine-tuning on Spanish-language Latin American clinical datasets (e.g. PeruMedQA as an Andean starting point); expansion toward pan-regional coverage of tropical and endemic pediatric diseases; first accuracy benchmarks for smartphone inference
v0.3	Expanded clinical coverage + clinical expert review of outputs
v0.4	Quantized GGUF export for llama.cpp / mobile deployment
v1.0	Red-teaming, safety evaluation, and independent clinical validation

📄 Full Documentation

Full technical documentation, project background, international context, and references are available in the GitHub repository:

TECHNICAL.md — dataset rationale, architecture decision, training details, usage
BACKGROUND.md — project vision, SDG alignment, PAHO/WHO/ICRC institutional context, full references

⚖️ License

Released under the Apache 2.0 License, subject to the terms of the Gemma 4 base model license.

🙏 Acknowledgements

Unsloth — for the fine-tuning framework and Unsloth Studio
MedMCQA — for the open medical QA dataset
Google DeepMind — for the Gemma 4 model family

Paper for albertoanalytics/pediatric-support-g4

MedGemma 1.5 Technical Report (arXiv:2604.05081, published Apr 6)