Instructions for using QuantTrio/Step3-VL-10B-AWQ with libraries and local apps.

Libraries

How to use QuantTrio/Step3-VL-10B-AWQ with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="QuantTrio/Step3-VL-10B-AWQ", trust_remote_code=True)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
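# This is a decoder-only vision-language model, so load it through the causal-LM auto class
# (the same class used in the traceback further down), not a seq2seq class.
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("QuantTrio/Step3-VL-10B-AWQ", trust_remote_code=True, dtype="auto")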

Local Apps

vLLM

How to use QuantTrio/Step3-VL-10B-AWQ with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "QuantTrio/Step3-VL-10B-AWQ"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "QuantTrio/Step3-VL-10B-AWQ",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
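
The same request can also be sent from Python. The following is a minimal sketch using only the standard library; it assumes the server started by `vllm serve` is reachable at `http://localhost:8000`, as in the curl call above. The helper names (`build_payload`, `build_request`, `describe_image`) are illustrative and not part of vLLM.

```python
import json
import urllib.request

# Assumption: same host, port, and model name as the curl example above.
BASE_URL = "http://localhost:8000/v1"
MODEL = "QuantTrio/Step3-VL-10B-AWQ"


def build_payload(prompt: str, image_url: str, model: str = MODEL) -> dict:
    """Build the same chat-completions body that the curl example sends."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def build_request(payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Wrap the payload in a POST request; nothing is sent until urlopen is called."""
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def describe_image(image_url: str, prompt: str = "Describe this image in one sentence.") -> str:
    """Send the request and return the first choice's text (needs a running server)."""
    request = build_request(build_payload(prompt, image_url))
    with urllib.request.urlopen(request, timeout=120) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server running, `describe_image("https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg")` should return the model's one-sentence description. For the SGLang server below, passing `base_url="http://localhost:30000/v1"` to `build_request` targets its port instead.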

Use Docker

docker model run hf.co/QuantTrio/Step3-VL-10B-AWQ

SGLang

How to use QuantTrio/Step3-VL-10B-AWQ with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "QuantTrio/Step3-VL-10B-AWQ" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "QuantTrio/Step3-VL-10B-AWQ",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "QuantTrio/Step3-VL-10B-AWQ" \
        --host 0.0.0.0 \
        --port 30000


After deploying locally, I keep encountering errors when running the examples. Is there any solution?

by AndyLeaf666 - opened Jan 26

Discussion

AndyLeaf666

Jan 26

(.venv_step3) D:\qr-code>d:/qr-code/.venv_step3/Scripts/python.exe d:/qr-code/test6.py
The tokenizer you are loading from 'D:\huggingface\Step3-VL-10B-AWQ' with an incorrect regex pattern: https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503/discussions/84#69121093e8b480e709447d5e. This will lead to incorrect tokenization. You should set the fix_mistral_regex=True flag when loading this tokenizer to fix this issue.
Encountered exception while importing configuration_step_vl: No module named 'configuration_step_vl'
Encountered exception while importing vision_encoder: No module named 'vision_encoder'
Traceback (most recent call last):
  File "d:\qr-code\test6.py", line 24, in <module>
    model = AutoModelForCausalLM.from_pretrained(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "d:\qr-code\.venv_step3\Lib\site-packages\transformers\models\auto\auto_factory.py", line 586, in from_pretrained
    model_class = get_class_from_dynamic_module(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "d:\qr-code\.venv_step3\Lib\site-packages\transformers\dynamic_module_utils.py", line 604, in get_class_from_dynamic_module
    final_module = get_cached_module_file(
                   ^^^^^^^^^^^^^^^^^^^^^^^
  File "d:\qr-code\.venv_step3\Lib\site-packages\transformers\dynamic_module_utils.py", line 427, in get_cached_module_file
    modules_needed = check_imports(resolved_module_file)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "d:\qr-code\.venv_step3\Lib\site-packages\transformers\dynamic_module_utils.py", line 260, in check_imports
    raise ImportError(
ImportError: This modeling file requires the following packages that were not found in your environment: configuration_step_vl, vision_encoder. Run pip install configuration_step_vl vision_encoder

JunHowie

QuantTrio org Jan 27

Based on the logs, I believe that the current version of Transformers does not yet support configuration_step_vl. In addition, all of our quantized models have only been tested on Linux systems using the vLLM inference engine. The Windows environment and the Transformers backend have not undergone comprehensive testing.
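
Before switching backends, one thing that can be ruled out quickly is an incomplete local download. The sketch below is only a diagnostic under an assumption: that the two names in the ImportError (`configuration_step_vl`, `vision_encoder`) are custom-code `.py` files that ship alongside the model weights rather than pip packages. The helper `missing_remote_code_files` is hypothetical and not part of Transformers.

```python
from pathlib import Path

# Assumption: the module names reported in the log map to .py files that are
# expected to sit next to the weights in the local model directory.
EXPECTED_MODULES = ("configuration_step_vl", "vision_encoder")


def missing_remote_code_files(model_dir, modules=EXPECTED_MODULES):
    """Return the module names whose .py file is absent from model_dir."""
    root = Path(model_dir)
    return [name for name in modules if not (root / f"{name}.py").is_file()]
```

Calling `missing_remote_code_files(r"D:\huggingface\Step3-VL-10B-AWQ")` on the directory from the log returns the names that are not present as files. An empty list means the files are there, so the failure lies elsewhere (for example in the Transformers version or the untested Windows setup mentioned above), and the Linux plus vLLM route is the one known to have been tested.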
