Recommended vLLM setting?

by davidheineman - opened Oct 30, 2025

Discussion

davidheineman

Oct 30, 2025 (edited Oct 30, 2025)

Hello! Congrats on the release!! Really excited to try the model.

Is there a recommended setup for vLLM? For example:

from vllm import LLM, SamplingParams

llm = LLM(model="marin-community/marin-32b-base")

prompts = [
    "We may have knowledge of the past but cannot control it; we may control the future but"
]

sampling_params = SamplingParams(
    temperature=0.7,
    top_p=0.9,
    max_tokens=128,
)

outputs = llm.generate(prompts, sampling_params)

for i, output in enumerate(outputs):
    print(prompts[i])
    print(output.outputs[0].text.strip())

I tried the script after installing vLLM in two different ways:

# current vLLM
pip install vllm==0.11.0

# nightly vLLM
pip install -U vllm --pre --extra-index-url https://wheels.vllm.ai/nightly

In both cases, the script fails with the following error (note: I'm using the V1 engine):

...
(EngineCore_DP0 pid=11777) (Worker_TP0 pid=11783) ERROR 10-29 17:44:30 [multiproc_executor.py:631]   File "/oe-eval-default/davidh/marindebug/.venv/lib/python3.12/site-packages/vllm/model_executor/models/llama.py", line 503, in load_weights
(EngineCore_DP0 pid=11777) (Worker_TP0 pid=11783) ERROR 10-29 17:44:30 [multiproc_executor.py:631]     param = params_dict[name]
(EngineCore_DP0 pid=11777) (Worker_TP0 pid=11783) ERROR 10-29 17:44:30 [multiproc_executor.py:631]             ~~~~~~~~~~~^^^^^^
(EngineCore_DP0 pid=11777) (Worker_TP0 pid=11783) ERROR 10-29 17:44:30 [multiproc_executor.py:631] KeyError: 'layers.0.self_attn.k_norm.weight'

Finally, with an earlier version, vllm==0.9.0.1, I see a slightly different loading error:

2025-10-29T23:29:33.291Z (VllmWorker rank=0 pid=487) ERROR 10-29 23:29:33 [multiproc_executor.py:487]   File "/stage/src/vllm/vllm/model_executor/models/llama.py", line 601, in load_weights
2025-10-29T23:29:33.291Z (VllmWorker rank=0 pid=487) ERROR 10-29 23:29:33 [multiproc_executor.py:487]     return loader.load_weights(
2025-10-29T23:29:33.291Z (VllmWorker rank=0 pid=487) ERROR 10-29 23:29:33 [multiproc_executor.py:487]            ^^^^^^^^^^^^^^^^^^^^
2025-10-29T23:29:33.291Z (VllmWorker rank=0 pid=487) ERROR 10-29 23:29:33 [multiproc_executor.py:487]   File "/stage/src/vllm/vllm/model_executor/models/utils.py", line 291, in load_weights
2025-10-29T23:29:33.291Z (VllmWorker rank=0 pid=487) ERROR 10-29 23:29:33 [multiproc_executor.py:487]     autoloaded_weights = set(self._load_module("", self.module, weights))
2025-10-29T23:29:33.291Z (VllmWorker rank=0 pid=487) ERROR 10-29 23:29:33 [multiproc_executor.py:487]                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-10-29T23:29:33.291Z (VllmWorker rank=0 pid=487) ERROR 10-29 23:29:33 [multiproc_executor.py:487]   File "/stage/src/vllm/vllm/model_executor/models/utils.py", line 249, in _load_module
2025-10-29T23:29:33.291Z (VllmWorker rank=0 pid=487) ERROR 10-29 23:29:33 [multiproc_executor.py:487]     yield from self._load_module(prefix,
2025-10-29T23:29:33.291Z (VllmWorker rank=0 pid=487) ERROR 10-29 23:29:33 [multiproc_executor.py:487]   File "/stage/src/vllm/vllm/model_executor/models/utils.py", line 222, in _load_module
2025-10-29T23:29:33.291Z (VllmWorker rank=0 pid=487) ERROR 10-29 23:29:33 [multiproc_executor.py:487]     loaded_params = module_load_weights(weights)
2025-10-29T23:29:33.291Z (VllmWorker rank=0 pid=487) ERROR 10-29 23:29:33 [multiproc_executor.py:487]                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-10-29T23:29:33.291Z (VllmWorker rank=0 pid=487) ERROR 10-29 23:29:33 [multiproc_executor.py:487]   File "/stage/src/vllm/vllm/model_executor/models/llama.py", line 465, in load_weights
2025-10-29T23:29:33.291Z (VllmWorker rank=0 pid=487) ERROR 10-29 23:29:33 [multiproc_executor.py:487]     param = params_dict[name]
2025-10-29T23:29:33.291Z (VllmWorker rank=0 pid=487) ERROR 10-29 23:29:33 [multiproc_executor.py:487]             ~~~~~~~~~~~^^^^^^
2025-10-29T23:29:33.291Z (VllmWorker rank=0 pid=487) ERROR 10-29 23:29:33 [multiproc_executor.py:487] KeyError: 'layers.60.self_attn.k_norm.weight'
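Both tracebacks end in the same place: the Llama weight loader looks up a checkpoint tensor name in its own parameter dictionary and finds no entry for a `self_attn.k_norm.weight` tensor. The sketch below reproduces that failure mode with plain Python lists rather than vLLM itself. The helper name `find_unexpected_weights` and the toy parameter names are hypothetical, and the `q_norm` entry is an assumption (the errors above only name `k_norm`).

```python
def find_unexpected_weights(checkpoint_names, model_param_names):
    """Return checkpoint tensor names that the model class has no parameter for."""
    known = set(model_param_names)
    return [name for name in checkpoint_names if name not in known]


# Toy stand-in for a Llama-style attention block (no QK-norm weights).
llama_like_params = [
    "layers.0.self_attn.q_proj.weight",
    "layers.0.self_attn.k_proj.weight",
    "layers.0.self_attn.v_proj.weight",
    "layers.0.self_attn.o_proj.weight",
]

# Toy stand-in for the checkpoint, which also carries QK-norm weights.
# The q_norm name is assumed; only k_norm appears in the reported errors.
checkpoint = llama_like_params + [
    "layers.0.self_attn.q_norm.weight",
    "layers.0.self_attn.k_norm.weight",
]

unexpected = find_unexpected_weights(checkpoint, llama_like_params)
print(unexpected)
```

Any name in `unexpected` would trigger a `KeyError` in a loader that indexes its parameter dictionary directly, which matches the last frame of both tracebacks.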

Edit: I also see the same problem with nightly transformers 5.0.0 (installed from source with the commands below), transformers==4.57.1 (current main), and transformers==4.55.4 (the version listed in config.json):

git clone https://github.com/huggingface/transformers.git
cd transformers

# install nightly transformers
pip install '.[torch]'

# install other versions
pip install transformers==4.55.4
pip install transformers==4.57.1
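When switching between this many installs, it can help to confirm which versions are actually active in the environment before rerunning the script. A small sketch using only the standard library; the helper name `installed_version` is hypothetical, and the package list is just the set mentioned in this thread plus `torch`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string for a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report the versions relevant to this thread.
for package in ("vllm", "transformers", "torch"):
    print(f"{package}: {installed_version(package) or 'not installed'}")
```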

WillHeld

The Marin Project org, Oct 30, 2025 (edited Oct 30, 2025)

Hi @davidheineman!

Sorry about that: our export logic still wrote "LlamaForCausalLM" in the config.json. I missed this because we run our evals with https://github.com/marin-community/levanter so that we can use TPUs. The newest revision correctly tells HF to load a Qwen3 architecture, so retrying with it should fix this! I'm testing on one of the GPU machines I have access to now to double-check, but let me know if you hit further issues.
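For anyone hitting the same error, one quick check is to look at the `architectures` field of the `config.json` that was actually downloaded. A minimal sketch, assuming the corrected config declares `Qwen3ForCausalLM` (the reply only says "a Qwen3 architecture", so the exact string is an assumption); the helper name `declared_architectures` and the two trimmed-down config strings are hypothetical.

```python
import json


def declared_architectures(config_text):
    """Parse config.json text and return its list of architecture class names."""
    config = json.loads(config_text)
    return config.get("architectures", [])


# Hypothetical, trimmed-down config.json contents before and after the fix.
old_config = '{"architectures": ["LlamaForCausalLM"]}'
new_config = '{"architectures": ["Qwen3ForCausalLM"]}'  # assumed class name

for label, text in (("old", old_config), ("new", new_config)):
    print(label, declared_architectures(text))
```

To check a real checkout, pass the contents of the cached `config.json` instead of the toy strings. If it still reports the Llama class, the local cache probably holds the older revision.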

davidheineman

Oct 30, 2025

Everything is working on my end now. Thanks Will!!

Marin gave a great quote from Pulp Fiction in response to that prompt:

We may have knowledge of the past but cannot control it; we may control the future but
have no knowledge of it. Now we control the present but neither know nor control the future. (Tertullian, 2nd Century)

The path of the righteous man is beset on all sides by the inequities of the selfish and the tyranny of evil men. Blessed is he who, in the name of charity and goodwill, shepherds the weak through the valley of darkness, for he is truly his brother's keeper and the finder of lost children. And I will strike down upon thee with great vengeance and furious anger those who attempt to poison and destroy my brothers. And you will know I am the Lord when

davidheineman changed discussion status to closed Oct 30, 2025
