🩺 pediatric-support-g4 — v0.1 Pipeline Validation
A QLoRA adapter for Gemma 4 E4B and the first iteration of a research initiative to build offline-capable, locally running LLMs for pediatric clinical decision support in resource-constrained settings.
⚠️ Status and Intended Use
This is the first iteration of the project. Its sole purpose is to validate that the training pipeline, architecture, and hyperparameter configuration are stable and ready for scaled training.
This is NOT a medical device. It has not been validated for clinical use. It has not been benchmarked for diagnostic accuracy. Do not use in any patient-facing context. All outputs must be reviewed by a qualified healthcare professional. The authors accept no liability for decisions made based on model outputs.
This adapter is released for research and development purposes only. Its intended downstream use is as a foundation for a future, independently validated clinical decision support tool for tropical and endemic pediatric diseases in remote, offline clinical settings in the Americas.
Out-of-scope use:
- Clinical diagnosis or treatment decisions of any kind
- Any patient-facing application
- General medical question answering in production settings
- Use without a qualified healthcare professional reviewing all outputs
🚀 Quick Start
This is a LoRA adapter, not a standalone model: it must be loaded on top of its base model.
Installation
```bash
pip install transformers peft bitsandbytes accelerate
```
Loading the Model
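```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_id = "unsloth/gemma-4-e4b-it-unsloth-bnb-4bit"
adapter_id = "albertoanalytics/pediatric-support-g4"

tokenizer = AutoTokenizer.from_pretrained(base_model_id)

# The base checkpoint is already quantized to 4-bit with bitsandbytes, so its
# quantization settings are read from the checkpoint config (no load_in_4bit flag needed).
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    device_map="auto",
)

# Attach the LoRA adapter weights on top of the frozen 4-bit base model.
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()  # disable dropout for inference
```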
Inference Example
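```python
import torch

prompt = """You are a knowledgeable pediatric medicine assistant.
A 3-year-old presents with a barking cough, stridor at rest, and low-grade fever.
What is the most likely diagnosis and recommended first-line management?"""

# The base model is instruction-tuned ("-it"), so wrap the prompt in its chat template.
messages = [{"role": "user", "content": prompt}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
).to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        do_sample=True,
    )

# Decode only the newly generated tokens, not the echoed prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```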
🏗️ Model Details
| Property | Value |
|---|---|
| Base Model | unsloth/gemma-4-e4b-it-unsloth-bnb-4bit |
| Architecture | Gemma 4 E4B |
| Adapter Method | QLoRA (Quantized Low-Rank Adaptation) |
| LoRA Rank | 16 |
| LoRA Alpha | 32 |
| LoRA Dropout | 0.05 |
| Training Dataset | MedMCQA — pediatric subjects subset |
| Epochs | 3 |
| Total Steps | 375 |
| Final Loss | 0.6220 |
| Final Grad Norm | 0.472 |
| Total Tokens Seen | ~2,587,816 |
| Training Duration | 1h 30m 24s |
| Training Framework | Unsloth Studio |
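The table's hyperparameters imply a few derived figures. The plain-Python sketch below computes them, and it rests on several assumptions. It uses the standard LoRA scaling factor alpha / r rather than rsLoRA; the table does not say which variant Unsloth Studio applied. It also assumes that "Total Tokens Seen" counts all three epochs. The 2048 x 2048 layer is a hypothetical size chosen for illustration and is not taken from the Gemma 4 E4B configuration.

```python
# Values copied from the Model Details table above.
LORA_RANK = 16
LORA_ALPHA = 32
EPOCHS = 3
TOTAL_STEPS = 375
TOTAL_TOKENS = 2_587_816  # reported as approximate in the table


def lora_scaling(alpha: int, rank: int) -> float:
    """Standard LoRA scale applied to the low-rank update B @ A (assumes no rsLoRA)."""
    return alpha / rank


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in linear layer.

    A has shape (rank, d_in) and B has shape (d_out, rank).
    """
    return rank * (d_in + d_out)


scale = lora_scaling(LORA_ALPHA, LORA_RANK)
steps_per_epoch = TOTAL_STEPS / EPOCHS
tokens_per_step = TOTAL_TOKENS / TOTAL_STEPS  # average, assuming tokens span all epochs

# Hypothetical 2048 x 2048 projection, NOT taken from the Gemma 4 E4B config.
d = 2048
added = lora_params(d, d, LORA_RANK)
frozen = d * d

print(f"LoRA scaling alpha/r: {scale}")
print(f"Steps per epoch: {steps_per_epoch:.0f}")
print(f"Average tokens per step: {tokens_per_step:,.0f}")
print(f"Hypothetical layer: {added:,} LoRA params vs {frozen:,} frozen ({added / frozen:.2%})")
```

With rank 16 and alpha 32, the low-rank update is scaled by 2 before it is added to the frozen (here 4-bit) base weight. Because rank is small relative to layer width, each adapted layer trains only a small fraction of its own parameter count.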
Why Gemma 4 E4B
Gemma 4 E4B was chosen over MedGemma 1.5 4B (arXiv:2604.05081v2) for three reasons specific to this project's deployment requirements:
- Thinking mode: extended chain-of-thought reasoning lets the clinician follow and evaluate the model's reasoning process rather than receive an opaque conclusion. MedGemma 1.5 4B activates thinking through a system instruction added to the prompt at inference time; it is not natively integrated into the architecture. Gemma 4 E4B, by contrast, controls thinking through a dedicated <|think|> token built into the model, making it a first-class architectural capability rather than a prompted behaviour.
- Mobile-first deployment: the E4B model is purpose-built for efficient local execution on everyday devices such as smartphones. MedGemma 1.5 4B makes no equivalent claim about mobile optimisation. Its expanded capabilities include processing 3D CT/MRI volumes of up to 85 axial slices (21,760 vision tokens) and whole slide pathology images of up to 126 patches (32,256 vision tokens) per query (arXiv:2604.05081v2). These capabilities are unlikely to be fully leveraged on a smartphone, especially in remote and isolated field settings. Google's own recommended production deployment path for MedGemma 1.5 4B points explicitly to cloud infrastructure: Model Garden and Google Cloud Storage, with specialised server-side processing for large medical images.
- No meaningful head start for this clinical scope: MedGemma 1.5 4B's medical pre-training reflects hospital-grade diagnostics (chest X-ray, 3D radiology, whole slide pathology, dermoscopy, ophthalmology). Conditions such as cutaneous leishmaniasis, severe dengue, Chagas disease, and Oropouche fever in children are not present in that training distribution. Both models therefore require targeted fine-tuning for this scope. Given that, Gemma 4's newer architecture, with native reasoning and mobile optimisation, is the stronger foundation.
Training Dynamics
🗺️ Roadmap
| Version | Scope |
|---|---|
| ✅ v0.1 | Pipeline validation using MedMCQA pediatric subset |
| v0.2 | Fine-tuning on Spanish-language Latin American clinical datasets (e.g. PeruMedQA as an Andean starting point) + expansion toward pan-regional tropical and endemic pediatric disease coverage + first accuracy benchmarks on smartphone inference |
| v0.3 | Expanded clinical coverage + clinical expert review of outputs |
| v0.4 | Quantized GGUF export for llama.cpp / mobile deployment |
| v1.0 | Red-teaming, safety evaluation, and independent clinical validation |
📄 Full Documentation
Full technical documentation, project background, international context, and references are available in the GitHub repository:
- TECHNICAL.md: dataset rationale, architecture decision, training details, usage
- BACKGROUND.md: project vision, SDG alignment, PAHO/WHO/ICRC institutional context, full references
⚖️ License
Released under the Apache 2.0 License, subject to the terms of the Gemma 4 base model license.
🙏 Acknowledgements
- Unsloth — for the fine-tuning framework and Unsloth Studio
- MedMCQA — for the open medical QA dataset
- Google DeepMind — for the Gemma 4 model family