Ring-1T responses on zenmux.ai often end abruptly after 14-16k tokens without a complete answer
During my experiments with the model on zenmux.ai, I can't get it to use the full 32k output-token range. It usually stops abruptly in the middle of its reasoning, after 15-16k tokens, without producing a final answer ("content" is empty), as in the following stream:
...
data: {"id":"02e50ca72fa24fb68609c507bdfd14e9","model":"inclusionai/ring-1t","choices":[{"delta":{"content":"","role":"assistant","reasoning":" includes"},"index":0}],"created":1764176060,"object":"chat.completion.chunk"}
data: {"id":"02e50ca72fa24fb68609c507bdfd14e9","model":"inclusionai/ring-1t","choices":[{"delta":{"content":"","role":"assistant","reasoning":" all"},"index":0}],"created":1764176060,"object":"chat.completion.chunk"}
data: {"id":"02e50ca72fa24fb68609c507bdfd14e9","model":"inclusionai/ring-1t","choices":[{"delta":{"content":"","role":"assistant","reasoning":" these"},"index":0}],"created":1764176060,"object":"chat.completion.chunk"}
data: {"id":"02e50ca72fa24fb68609c507bdfd14e9","model":"inclusionai/ring-1t","choices":[{"delta":{"content":"","role":"assistant","reasoning":" people"},"index":0}],"created":1764176060,"object":"chat.completion.chunk"}
data: {"id":"02e50ca72fa24fb68609c507bdfd14e9","model":"inclusionai/ring-1t","choices":[{"delta":{"content":"","role":"assistant","reasoning":" from"},"index":0}],"created":1764176060,"object":"chat.completion.chunk"}
data: [DONE]
I don't understand why it responds with [DONE] when it clearly hasn't finished its reasoning yet. In the zenmux.ai logs I can't find any request that exceeded 16k output tokens. Sometimes it does start generating an answer, but that also ends abruptly, like this:
data: {"id":"31572c3b473a427ebc0495a8f857a637","model":"inclusionai/ring-1t","choices":[{"delta":{"content":" Isabella","role":"assistant"},"index":0}],"created":1764184475,"object":"chat.completion.chunk"}
data: {"id":"31572c3b473a427ebc0495a8f857a637","model":"inclusionai/ring-1t","choices":[{"delta":{"content":"'s","role":"assistant"},"index":0}],"created":1764184475,"object":"chat.completion.chunk"}
data: {"id":"31572c3b473a427ebc0495a8f857a637","model":"inclusionai/ring-1t","choices":[{"delta":{"content":" ancestor","role":"assistant"},"index":0}],"created":1764184475,"object":"chat.completion.chunk"}
data: {"id":"31572c3b473a427ebc0495a8f857a637","model":"inclusionai/ring-1t","choices":[{"delta":{"content":" (","role":"assistant"},"index":0}],"created":1764184475,"object":"chat.completion.chunk"}
data: {"id":"31572c3b473a427ebc0495a8f857a637","model":"inclusionai/ring-1t","choices":[{"delta":{"content":"Angela","role":"assistant"},"index":0}],"created":1764184475,"object":"chat.completion.chunk"}
data: [DONE]
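None of the chunks shown above includes a finish_reason field, so one way to flag this kind of cut-off on the client side is to check whether the stream reached [DONE] without ever reporting one. The following is only a minimal sketch: it assumes the endpoint follows the OpenAI streaming convention of sending finish_reason in a final chunk, and the helper name summarize_stream is hypothetical.

```python
import json


def summarize_stream(lines):
    """Summarize an OpenAI-style SSE chat stream given as an iterable of text lines."""
    summary = {
        "reasoning_chunks": 0,
        "content_chunks": 0,
        "content": "",
        "finish_reason": None,
        "saw_done": False,
    }
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank lines and anything that is not a data event
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            summary["saw_done"] = True
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            if delta.get("reasoning"):
                summary["reasoning_chunks"] += 1
            if delta.get("content"):
                summary["content_chunks"] += 1
                summary["content"] += delta["content"]
            # Assumption: a normal completion reports finish_reason in its last chunk.
            if choice.get("finish_reason"):
                summary["finish_reason"] = choice["finish_reason"]
    # No finish_reason at all suggests the stream was cut off rather than completed.
    summary["looks_truncated"] = summary["finish_reason"] is None
    return summary


# Two chunks in the same shape as the first stream above, followed by [DONE].
sample = [
    'data: {"id":"x","model":"inclusionai/ring-1t","choices":[{"delta":{"content":"","role":"assistant","reasoning":" includes"},"index":0}]}',
    'data: {"id":"x","model":"inclusionai/ring-1t","choices":[{"delta":{"content":"","role":"assistant","reasoning":" all"},"index":0}]}',
    "data: [DONE]",
]
result = summarize_stream(sample)
print(result)
```

If looks_truncated is true while content is still empty, the request ended during reasoning, which matches the first stream; a non-empty content with no finish_reason matches the second.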
What are the recommended request settings for using the model's whole 32k-token output range? I tried settings like "reasoning": {"effort":"high"}, but that did not help. Setting max_tokens in reasoning also doesn't seem to help. Note that I have max_completion_tokens set to 32000.
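For reference, the settings described here correspond roughly to a request body like the one built below. This is a sketch only: the helper name build_request is hypothetical, the "stream" flag is assumed from the streamed output shown in this post, and the body is only printed, not sent to any endpoint.

```python
import json


def build_request(prompt, max_completion_tokens=32000, effort="high"):
    """Build a chat-completions request body with the settings tried in this post."""
    return {
        "model": "inclusionai/ring-1t",  # model id as it appears in the stream chunks
        "stream": True,  # assumption: the responses in this post were streamed
        "max_completion_tokens": max_completion_tokens,
        "reasoning": {"effort": effort},
        "messages": [{"role": "user", "content": prompt}],
    }


payload = build_request("Who are you?")
print(json.dumps(payload, indent=2))
```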
Update: there seems to be some kind of timeout in zenmux.ai or at the model provider that terminates generation after exactly 10 minutes (600 seconds). Since the model generates about 25 tokens/s, it can only produce about 15k tokens (25 × 600 = 15,000) in that time, not the full 32k.
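The arithmetic behind this can be checked with a short sketch. The 25 tokens/s rate and the 600-second limit are the approximate figures observed above, not documented values, and the function names are hypothetical.

```python
def tokens_before_timeout(rate_tps, timeout_s):
    """Tokens that can be generated before the connection is cut."""
    return rate_tps * timeout_s


def seconds_needed(tokens, rate_tps):
    """Wall-clock time needed to generate the given number of tokens."""
    return tokens / rate_tps


RATE_TPS = 25      # observed generation rate, tokens per second (approximate)
TIMEOUT_S = 600    # observed cut-off, seconds
TARGET = 32000     # max_completion_tokens used in the requests

budget = tokens_before_timeout(RATE_TPS, TIMEOUT_S)
needed = seconds_needed(TARGET, RATE_TPS)
min_rate = TARGET / TIMEOUT_S

print(f"tokens possible before the timeout: {budget}")
print(f"seconds needed for {TARGET} tokens: {needed:.0f}")
print(f"rate needed to fit {TARGET} tokens in {TIMEOUT_S} s: {min_rate:.1f} tokens/s")
```

At the observed rate, a full 32k-token response would take about 1280 seconds (a little over 21 minutes), so it could only fit within a 600-second limit if generation ran at roughly 53 tokens/s or faster.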