[Bug] Chat template redacting reasoning traces for non-tool related use case

#24
by vibhas09 - opened

I’m seeing that the current chat template does not render reasoning (reasoning / reasoning_content) for standard assistant messages unless tool_calls are present.

This makes it unclear whether this is intended behavior or a limitation/bug in the template logic.

Minimal Repro:

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Write a short joke about saving RAM."},
    {
        "role": "assistant",
        "reasoning": "Need a short memory pun.",
        "content": "I wrote a RAM joke, but I forgot to cache it."
    },
]

# `tokenizer` is this model's tokenizer, already loaded (e.g. via transformers.AutoTokenizer).
print(tokenizer.apply_chat_template(
    conversation=messages,
    tokenize=False,
    add_generation_prompt=False,
    enable_thinking=True,
))

Observed Behavior:
Output does not include the reasoning:

<bos><|turn>system
<|think|> 
You are a helpful assistant.
<turn|> 
<|turn>user Write a short joke about saving RAM.
<turn|> 
<|turn>model I wrote a RAM joke, but I forgot to cache it.<turn|>

Expected Behavior:

If reasoning (or reasoning_content) is present, I would expect it to be rendered, e.g.:

<bos><|turn>system
<|think|>
You are a helpful assistant.<turn|>
<|turn>user
Write a short joke about saving RAM.<turn|>
<|turn>model
<|channel>thought
Need a short memory pun.
<channel|>
I wrote a RAM joke, but I forgot to cache it.<turn|>

This behaviour is, however, common to both HF and Unsloth.

Relevant Template Logic:

{%- if thinking_text and loop.index0 > ns_turn.last_user_idx and message.get('tool_calls') -%}

This effectively prevents reasoning from being emitted for:

non-tool-call assistant messages
standard chat use cases
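
To make the effect of that condition concrete, here is a rough pure-Python mirror of it, evaluated against the repro messages. This is only a sketch: it assumes `thinking_text` is derived from `reasoning` / `reasoning_content` and that `ns_turn.last_user_idx` is the index of the last user message. Neither is confirmed from the template source, and the helper names `last_user_index` and `reasoning_rendered` are made up for illustration.

```python
from typing import Any


def last_user_index(messages: list[dict[str, Any]]) -> int:
    """Index of the last user message, or -1 if there is none.

    Assumed to match what the template stores in ns_turn.last_user_idx.
    """
    idx = -1
    for i, m in enumerate(messages):
        if m.get("role") == "user":
            idx = i
    return idx


def reasoning_rendered(messages: list[dict[str, Any]], index: int) -> bool:
    """Mirror of the quoted template condition for the message at `index`."""
    message = messages[index]
    # Assumption: thinking_text comes from either reasoning field.
    thinking_text = message.get("reasoning") or message.get("reasoning_content")
    return bool(
        thinking_text
        and index > last_user_index(messages)
        and message.get("tool_calls")
    )


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Write a short joke about saving RAM."},
    {
        "role": "assistant",
        "reasoning": "Need a short memory pun.",
        "content": "I wrote a RAM joke, but I forgot to cache it.",
    },
]

# Same conversation, but the assistant turn carries a (hypothetical) tool call.
with_tool_call = [
    *messages[:2],
    {**messages[2], "tool_calls": [{"function": {"name": "noop", "arguments": {}}}]},
]

print(reasoning_rendered(messages, 2))        # False: no tool_calls, reasoning dropped
print(reasoning_rendered(with_tool_call, 2))  # True: tool_calls present
```

Under these assumptions, the only thing that changes the outcome for the repro is the presence of `tool_calls`, which matches the observed output.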

Questions:

Is reasoning intentionally restricted to tool-call flows?
Should enable_thinking=True also enable reasoning rendering for normal assistant messages?
Would it be acceptable to relax the condition to something like:

{%- if role == 'model' and thinking_text -%}

or gate it via enable_thinking instead?
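
For anyone who wants to try the relaxed condition locally, one possible workaround is to patch the template string before rendering. The sketch below is untested against the real template: it assumes the condition appears verbatim as quoted above and that `role` is defined in the template scope at that point. `relax_reasoning_gate` is a hypothetical helper, and `stand_in_template` is a made-up fragment, not the actual template.

```python
# Condition as quoted from the current template.
CURRENT_GATE = (
    "{%- if thinking_text and loop.index0 > ns_turn.last_user_idx "
    "and message.get('tool_calls') -%}"
)
# Relaxed condition proposed in this post (assumes `role` is in scope).
RELAXED_GATE = "{%- if role == 'model' and thinking_text -%}"


def relax_reasoning_gate(chat_template: str) -> str:
    """Swap the tool-call-only gate for the relaxed one, failing loudly if absent."""
    if CURRENT_GATE not in chat_template:
        raise ValueError("expected condition not found; the template may differ")
    return chat_template.replace(CURRENT_GATE, RELAXED_GATE, 1)


# Made-up fragment used only to demonstrate the string replacement.
stand_in_template = "before " + CURRENT_GATE + " after"
patched = relax_reasoning_gate(stand_in_template)
print(patched)
```

With a loaded tokenizer this would be applied as `tokenizer.chat_template = relax_reasoning_gate(tokenizer.chat_template)`. Whether the patched template then renders the expected output above is not something this sketch verifies.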

Why this matters:

Limits usability of reasoning for standard chat scenarios
Makes enable_thinking behavior unclear/incomplete
Prevents consistent debugging/inspection of model reasoning
