Instructions to use Qwen/Qwen3.6-27B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use Qwen/Qwen3.6-27B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="Qwen/Qwen3.6-27B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("Qwen/Qwen3.6-27B")
model = AutoModelForImageTextToText.from_pretrained("Qwen/Qwen3.6-27B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use Qwen/Qwen3.6-27B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Qwen/Qwen3.6-27B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Qwen/Qwen3.6-27B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
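
The same request can also be sent from Python. The following is a minimal sketch using only the standard library, assuming the vLLM server above is listening on localhost:8000 and returns the usual OpenAI-style chat completion response. The helper names `build_chat_request` and `send_chat_request` are hypothetical and not part of vLLM.

```python
import json
import urllib.request


def build_chat_request(base_url, model, prompt, image_url):
    """Build a POST request for an OpenAI-compatible chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=120):
    """Send the request and return the text of the first choice."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Example usage (requires a running server, so it is left commented out):
# request = build_chat_request(
#     "http://localhost:8000",  # assumed host and port from the command above
#     "Qwen/Qwen3.6-27B",
#     "Describe this image in one sentence.",
#     "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
# )
# print(send_chat_request(request))
```

Changing the base URL to http://localhost:30000 should work the same way against the SGLang server, since both expose the same endpoint path.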

Use Docker

docker model run hf.co/Qwen/Qwen3.6-27B

SGLang

How to use Qwen/Qwen3.6-27B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Qwen/Qwen3.6-27B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Qwen/Qwen3.6-27B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Qwen/Qwen3.6-27B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Qwen/Qwen3.6-27B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Docker Model Runner
How to use Qwen/Qwen3.6-27B with Docker Model Runner:
```
docker model run hf.co/Qwen/Qwen3.6-27B
```

Where can I try this?

#21

by mindplay - opened Apr 26

Discussion

mindplay

Apr 26

The model looks really good, but I don't have the hardware to run it. I can't find any inference providers, neither here on HF nor on OpenRouter. Qwen themselves don't even seem to have this hosted and available?

What type of GPU would I need to rent to self-host this? The full quality model, not quantized.

kth8

Apr 26

2 RTX 5090s will be able to fit the full bf16 version.

EloyOn

Apr 27

> The model looks really good, but I don't have the hardware to run it. I can't find any inference providers, neither here on HF nor on OpenRouter. Qwen themselves don't even seem to have this hosted and available?
>
> What type of GPU would I need to rent to self-host this? The full quality model, not quantized.

FP8 is good enough, and it needs only about half the GPU memory you'd need for the full FP16.

kth8

Apr 27

The FP8 version is 55% the size of BF16, so you would still need 2 RTX 4090s to run it, which is not much cheaper to rent than 5090s.
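
The GPU counts in this thread can be sanity-checked with a rough weights-only estimate. This is a back-of-the-envelope sketch, assuming 27 billion parameters (taken from the model name), 2 bytes per parameter for BF16, 1 byte for FP8, 32 GB of VRAM on an RTX 5090, and 24 GB on an RTX 4090. It ignores the KV cache, activations, and framework overhead, so real deployments need headroom beyond these numbers. A real FP8 checkpoint can also be somewhat larger than half of BF16 if some layers stay in higher precision, which would be consistent with the 55% figure quoted above.

```python
import math

NUM_PARAMS = 27e9  # assumption: parameter count inferred from the "27B" in the model name


def weight_memory_gb(num_params, bytes_per_param):
    """Approximate memory for the weights alone, in GB (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9


def gpus_needed(required_gb, vram_gb_per_gpu):
    """Smallest number of identical GPUs whose combined VRAM covers required_gb."""
    return math.ceil(required_gb / vram_gb_per_gpu)


bf16_gb = weight_memory_gb(NUM_PARAMS, 2)  # 2 bytes per parameter
fp8_gb = weight_memory_gb(NUM_PARAMS, 1)   # 1 byte per parameter

print(f"BF16 weights: {bf16_gb:.0f} GB, RTX 5090s (32 GB): {gpus_needed(bf16_gb, 32)}")
print(f"FP8 weights:  {fp8_gb:.0f} GB, RTX 4090s (24 GB): {gpus_needed(fp8_gb, 24)}")
```

Under these assumptions the BF16 weights come to 54 GB, which fits in two 32 GB cards but leaves only about 10 GB for everything else, and the FP8 weights come to 27 GB, which exceeds a single 24 GB card.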

kth8

Apr 27

3.6 27B just got added to OpenRouter: https://openrouter.ai/qwen/qwen3.6-27b

mindplay

Apr 29

> 3.6 27B just got added to OpenRouter: https://openrouter.ai/qwen/qwen3.6-27b

Sadly, no caching from anyone hosting the full model, including Alibaba.

$3.60/M output seems expensive for such a small model?

Smorty100

May 3

You can try the model on https://chat.qwen.ai

mindplay

27 days ago

It has a few providers on OpenRouter now.

The main attraction of this model is probably local inference though?

When it comes to cloud inference, it's priced like, e.g., Kimi K2.6, a 1T model, so it doesn't really seem to make much sense as a cloud model, does it?
