Recommended vLLM setting?
Hello! Congrats on the release!! Really excited to try the model.
Is there a recommended setup for vLLM? For example:
from vllm import LLM, SamplingParams
llm = LLM(model="marin-community/marin-32b-base")
prompts = [
    "We may have knowledge of the past but cannot control it; we may control the future but"
]
sampling_params = SamplingParams(
    temperature=0.7,
    top_p=0.9,
    max_tokens=128,
)
outputs = llm.generate(prompts, sampling_params)
for i, output in enumerate(outputs):
    print(prompts[i])
    print(output.outputs[0].text.strip())
I've tried the script, installing with:
# current vLLM
pip install vllm==0.11.0
# nightly vLLM
pip install -U vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
In both cases, the script fails with this error (note: I'm using the v1 engine):
...
(EngineCore_DP0 pid=11777) (Worker_TP0 pid=11783) ERROR 10-29 17:44:30 [multiproc_executor.py:631] File "/oe-eval-default/davidh/marindebug/.venv/lib/python3.12/site-packages/vllm/model_executor/models/llama.py", line 503, in load_weights
(EngineCore_DP0 pid=11777) (Worker_TP0 pid=11783) ERROR 10-29 17:44:30 [multiproc_executor.py:631] param = params_dict[name]
(EngineCore_DP0 pid=11777) (Worker_TP0 pid=11783) ERROR 10-29 17:44:30 [multiproc_executor.py:631] ~~~~~~~~~~~^^^^^^
(EngineCore_DP0 pid=11777) (Worker_TP0 pid=11783) ERROR 10-29 17:44:30 [multiproc_executor.py:631] KeyError: 'layers.0.self_attn.k_norm.weight'
Finally, when I install an earlier version, vllm==0.9.0.1, I see a slightly different loading error:
2025-10-29T23:29:33.291Z (VllmWorker rank=0 pid=487) ERROR 10-29 23:29:33 [multiproc_executor.py:487] File "/stage/src/vllm/vllm/model_executor/models/llama.py", line 601, in load_weights
2025-10-29T23:29:33.291Z (VllmWorker rank=0 pid=487) ERROR 10-29 23:29:33 [multiproc_executor.py:487] return loader.load_weights(
2025-10-29T23:29:33.291Z (VllmWorker rank=0 pid=487) ERROR 10-29 23:29:33 [multiproc_executor.py:487] ^^^^^^^^^^^^^^^^^^^^
2025-10-29T23:29:33.291Z (VllmWorker rank=0 pid=487) ERROR 10-29 23:29:33 [multiproc_executor.py:487] File "/stage/src/vllm/vllm/model_executor/models/utils.py", line 291, in load_weights
2025-10-29T23:29:33.291Z (VllmWorker rank=0 pid=487) ERROR 10-29 23:29:33 [multiproc_executor.py:487] autoloaded_weights = set(self._load_module("", self.module, weights))
2025-10-29T23:29:33.291Z (VllmWorker rank=0 pid=487) ERROR 10-29 23:29:33 [multiproc_executor.py:487] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-10-29T23:29:33.291Z (VllmWorker rank=0 pid=487) ERROR 10-29 23:29:33 [multiproc_executor.py:487] File "/stage/src/vllm/vllm/model_executor/models/utils.py", line 249, in _load_module
2025-10-29T23:29:33.291Z (VllmWorker rank=0 pid=487) ERROR 10-29 23:29:33 [multiproc_executor.py:487] yield from self._load_module(prefix,
2025-10-29T23:29:33.291Z (VllmWorker rank=0 pid=487) ERROR 10-29 23:29:33 [multiproc_executor.py:487] File "/stage/src/vllm/vllm/model_executor/models/utils.py", line 222, in _load_module
2025-10-29T23:29:33.291Z (VllmWorker rank=0 pid=487) ERROR 10-29 23:29:33 [multiproc_executor.py:487] loaded_params = module_load_weights(weights)
2025-10-29T23:29:33.291Z (VllmWorker rank=0 pid=487) ERROR 10-29 23:29:33 [multiproc_executor.py:487] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-10-29T23:29:33.291Z (VllmWorker rank=0 pid=487) ERROR 10-29 23:29:33 [multiproc_executor.py:487] File "/stage/src/vllm/vllm/model_executor/models/llama.py", line 465, in load_weights
2025-10-29T23:29:33.291Z (VllmWorker rank=0 pid=487) ERROR 10-29 23:29:33 [multiproc_executor.py:487] param = params_dict[name]
2025-10-29T23:29:33.291Z (VllmWorker rank=0 pid=487) ERROR 10-29 23:29:33 [multiproc_executor.py:487] ~~~~~~~~~~~^^^^^^
2025-10-29T23:29:33.291Z (VllmWorker rank=0 pid=487) ERROR 10-29 23:29:33 [multiproc_executor.py:487] KeyError: 'layers.60.self_attn.k_norm.weight'
Edit: I also see the same problem with nightly transformers 5.0.0 (installed from source with the commands below), transformers==4.57.1 (current main) and transformers==4.55.4 (the version listed in the config.json):
git clone https://github.com/huggingface/transformers.git
cd transformers
# install nightly transformers
pip install '.[torch]'
# install other versions
pip install transformers==4.55.4
pip install transformers==4.57.1
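Both tracebacks above end at the same statement, `param = params_dict[name]`. A lookup like that usually fails when the checkpoint contains a tensor name that the instantiated model class does not define. The toy loader below is only a sketch of that failure mode, not vLLM's actual code: `load_weights`, `llama_style_params`, and `checkpoint` are hypothetical names made up for illustration, and plain dicts and lists stand in for real parameters and tensors.

```python
def load_weights(params_dict, checkpoint_weights):
    """Copy checkpoint tensors into model parameters by exact name (toy version)."""
    loaded = set()
    for name, tensor in checkpoint_weights:
        # A strict lookup: raises KeyError for a name the model does not define.
        param = params_dict[name]
        param["data"] = tensor
        loaded.add(name)
    return loaded


# Hypothetical parameter table for an attention layer without k_norm.
llama_style_params = {
    "layers.0.self_attn.q_proj.weight": {"data": None},
    "layers.0.self_attn.k_proj.weight": {"data": None},
}

# Hypothetical checkpoint contents: one known tensor, one unknown tensor.
checkpoint = [
    ("layers.0.self_attn.k_proj.weight", [0.0]),
    ("layers.0.self_attn.k_norm.weight", [1.0]),
]

missing_name = None
try:
    load_weights(llama_style_params, checkpoint)
except KeyError as err:
    missing_name = err.args[0]
    print(f"checkpoint tensor with no matching parameter: {missing_name}")
```

The known tensor loads, and the loop stops at the first name the parameter table lacks, which matches the shape of the errors reported above.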
Hi @davidheineman !
Sorry about that: our export logic still wrote "LlamaForCausalLM" into the config.json. I had missed this because we run our evals with https://github.com/marin-community/levanter so that we can use TPUs. The newest revision correctly tells HF to load a Qwen3 architecture, so retrying on it should fix this! I'll test on one of the GPU machines I have access to now to double check, but let me know if you hit further issues.
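For anyone hitting a similar error, a quick sanity check is to compare the architecture declared in config.json against the tensor names in the checkpoint. The sketch below is an assumption-laden illustration rather than an official tool: `check_architecture` is a hypothetical helper, the expected string "Qwen3ForCausalLM" is my guess at how the fixed revision names the Qwen3 architecture, and the config snippets are minimal stand-ins for the real files.

```python
import json


def check_architecture(config_text, tensor_names, expected="Qwen3ForCausalLM"):
    """Return a list of human-readable problems; an empty list means no mismatch found."""
    config = json.loads(config_text)
    declared = config.get("architectures", [])
    problems = []
    if expected not in declared:
        problems.append(f"config declares {declared}, expected {expected}")
    # q_norm/k_norm tensors are the ones the Llama loader rejected in the tracebacks.
    has_qk_norm = any(
        name.endswith(("q_norm.weight", "k_norm.weight")) for name in tensor_names
    )
    if has_qk_norm and "LlamaForCausalLM" in declared:
        problems.append(
            "checkpoint has q_norm/k_norm tensors but config declares LlamaForCausalLM"
        )
    return problems


# Minimal stand-ins for the config.json before and after the fix (assumed contents).
old_config = '{"architectures": ["LlamaForCausalLM"]}'
new_config = '{"architectures": ["Qwen3ForCausalLM"]}'
names = ["layers.0.self_attn.k_norm.weight", "layers.0.self_attn.k_proj.weight"]

print(check_architecture(old_config, names))
print(check_architecture(new_config, names))
```

In practice, `config_text` would come from the config.json in your local copy of the model, and the tensor names from the checkpoint's weight index. If your local cache still holds the old revision, re-downloading the model should pick up the corrected config.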
Everything is working on my end now. Thanks Will!!
Marin gave a great quote from Pulp Fiction in response to that prompt:
We may have knowledge of the past but cannot control it; we may control the future but
have no knowledge of it. Now we control the present but neither know nor control the future. (Tertullian, 2nd Century)
The path of the righteous man is beset on all sides by the inequities of the selfish and the tyranny of evil men. Blessed is he who, in the name of charity and goodwill, shepherds the weak through the valley of darkness, for he is truly his brother's keeper and the finder of lost children. And I will strike down upon thee with great vengeance and furious anger those who attempt to poison and destroy my brothers. And you will know I am the Lord when