This is a Nemotron-Cascade-14B-Thinking fine-tune, produced with P-E-W's Heretic (v1.1.0) abliteration engine, with the Magnitude-Preserving Orthogonal Ablation PR merged in.

Note: I doubt that KL divergence figure, and the model also behaves similarly to gpt-oss, so it may need to be coerced into producing desirable output. I wouldn't call it uncensored, and I wasn't planning to release it at all, but I needed to make room in my private storage.
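
For context, directional abliteration of this kind removes the component of selected weight matrices (e.g. attn.o_proj and mlp.down_proj, as listed below) that writes into the residual stream along a learned refusal direction; the magnitude-preserving variant then rescales the result so the overall weight magnitude is unchanged. The following is only a minimal sketch of that idea under my reading of it, not Heretic's actual code or API:

import numpy as np

def magnitude_preserving_ablation(W: np.ndarray, r: np.ndarray) -> np.ndarray:
    """Project the refusal direction r out of the output space of W, then
    rescale so the overall weight magnitude (Frobenius norm) is preserved.

    W: (out_features, in_features) weight, e.g. attn.o_proj or mlp.down_proj
    r: (out_features,) refusal direction in the residual stream
    """
    r = r / np.linalg.norm(r)
    # Orthogonal ablation: W' = (I - r r^T) W, so outputs have no component along r
    W_ablated = W - np.outer(r, r @ W)
    # Magnitude preservation (one simple interpretation): restore the Frobenius norm
    scale = np.linalg.norm(W) / max(np.linalg.norm(W_ablated), 1e-12)
    return W_ablated * scale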

Heretication Results

| Score Metric | Value | Parameter | Value |
|---|---|---|---|
| Refusals | 33/100 | direction_index | 22.62 |
| KL Divergence | 0.0000 | attn.o_proj.max_weight | 1.93 |
| Initial Refusals | 99/100 | attn.o_proj.max_weight_position | 23.60 |
| | | attn.o_proj.min_weight | 1.38 |
| | | attn.o_proj.min_weight_distance | 18.32 |
| | | mlp.down_proj.max_weight | 1.81 |
| | | mlp.down_proj.max_weight_position | 24.52 |
| | | mlp.down_proj.min_weight | 1.08 |
| | | mlp.down_proj.min_weight_distance | 20.24 |

Degree of Heretication

The Heresy Index weighs the resulting model's corruption by the process (KL divergence) and its abolition of doctrine (refusals) to arrive at a final classification verdict.

| Index Entry | Classification | Analysis |
|---|---|---|
| Absolute | Absolute Heresy | Fewer than 10/100 refusals and at most 0.10 KL divergence |
| Tainted | Tainted Heresy | Around 11-25/100 refusals and/or 0.11-0.20 KL divergence |
| Impotent | Impotent Heresy | Anything above 25/100 refusals and/or 0.21+ KL divergence |

Note: This is an arbitrary classification inspired by Warhammer 40K; it carries no tangible indication of the model's performance.
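
A minimal sketch of how these thresholds could be applied programmatically (the function and the exact boundary handling are my own illustration, not part of Heretic); with 33/100 refusals, this release lands in the Impotent tier regardless of its KL divergence:

def heresy_index(refusals: int, kl_divergence: float) -> str:
    """Classify an abliterated model by the (arbitrary) Heresy Index.

    refusals: refusals out of 100 evaluation prompts
    kl_divergence: KL divergence from the original model
    """
    if refusals < 10 and kl_divergence <= 0.10:
        return "Absolute Heresy"
    if refusals <= 25 and kl_divergence <= 0.20:
        return "Tainted Heresy"
    return "Impotent Heresy"

print(heresy_index(33, 0.0000))  # -> "Impotent Heresy"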


Nemotron-Cascade-14B-Thinking

Technical Report · SFT Dataset · RL Dataset · Models

Introduction

We're excited to introduce Nemotron-Cascade-14B-Thinking, a powerful general-purpose model trained through sequential and domain-wise reinforcement learning. Nemotron-Cascade-14B-Thinking is post-trained from the Qwen3-14B Base model and achieves best-in-class performance across a wide range of benchmarks. Unlike Nemotron-Cascade-8B, Nemotron-Cascade-14B-Thinking is designed exclusively for the thinking mode.

Training Pipeline

[Figure: Nemotron-Cascade training pipeline]

The training pipeline for Nemotron-Cascade begins with a multi-stage SFT phase to equip the model with foundational skills. Subsequently, Cascade RL is applied across multiple domains to further enhance the model’s performance in these areas.

Notably, RLHF for alignment, when used as a pre-step, boosts the model’s complex reasoning ability far beyond mere preference optimization, and subsequent domain-wise RLVR stages rarely degrade the benchmark performance attained in earlier domains and may even improve it (see an illustration in the following Figure).

[Figure] The LiveCodeBench v6 (08/24–05/25) performance of the Nemotron-Cascade-14B-Thinking model throughout the Cascade RL process.

Results

  • We evaluate our model against competitive reasoning models on a diverse set of benchmarks, covering general-knowledge reasoning, alignment and instruction following, mathematical reasoning, competitive programming, software engineering, and tool-use proficiency.
  • For Nemotron-Cascade models, we use a maximum generation length of 64K tokens and set the temperature to 0.6 and top-p to 0.95 for reasoning tasks.
  • Our Nemotron-Cascade-14B-Thinking achieves best-in-class performance across almost all benchmarks. Remarkably, Nemotron-Cascade-14B-Thinking surpasses DeepSeek-R1-0528 (671B) by a clear margin across all LCB v5, v6, and Pro benchmarks.
| Benchmark (Pass@1) | Qwen3-14B | DeepSeek-R1-0528 (671B) | Gemini-2.5-Flash-Thinking | Nemotron-Cascade-14B-Thinking |
|---|---|---|---|---|
| **Knowledge Reasoning** | | | | |
| MMLU | 84.9 | 89.9 | - | 85.1 |
| MMLU Pro | 77.6 | 85.0 | 81.9 | 77.0 |
| GPQA-Diamond | 64.0 | 81.0 | 82.8 | 69.6 |
| **Alignment** | | | | |
| ArenaHard | 91.7 | 95.1 | 95.7 | 89.5 |
| IFEval (Strict Prompt) | 85.4 | 84.1 | 89.8 | 81.9 |
| IFBench | 33.7 | 38.0 | 36.1 | 41.7 |
| **Math** | | | | |
| AIME 2024 | 79.3 | 91.4 | 82.3 | 89.7 |
| AIME 2025 | 70.4 | 87.5 | 72.0 | 83.3 |
| **Code** | | | | |
| LCB v5 (08/24-02/25) | 65.2 | 74.8 | 63.4 | 77.5 |
| LCB v6 (08/24-05/25) | 63.5 | 73.3 | 61.9 | 74.6 |
| LCB Pro 25Q2 (Easy) | 53.6 | 63.9 | 47.4 | 68.9 |
| LCB Pro 25Q2 (Med) | 2.6 | 7.0 | 1.8 | 10.5 |
| SWE Verified (Agentless) | 27.4 | 57.6 | 48.9 | 43.1 |
| **Tool Calling** | | | | |
| BFCL V3 | 70.4 | 67.9 | 68.6 | 67.5 |

Usage Recommendations

For local deployment, we recommend setting the sampling parameters to temperature = 0.6 and top_p = 0.95, and using RoPE scaling with the YaRN method for better long-context support. YaRN can be enabled by updating the model’s config.json as shown below:

  {
    ...,
    "rope_scaling": {
        "rope_type": "yarn",
        "factor": 2.0,
        "original_max_position_embeddings": 32768
    }
  }
  • Nemotron-Cascade-14B-Thinking: use factor: 3.0 to extend the context length to 90K tokens for SWE Verified (Agentless), and factor: 2.0 to extend the context length to 64K tokens for other benchmarks.
  • Nemotron-Cascade-8B and Nemotron-Cascade-8B-Thinking: use factor: 2.0 across all benchmarks.
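
Putting these recommendations together, a rough sketch with Hugging Face transformers follows; the config-override approach and generation arguments are standard transformers usage rather than anything specified above, and the YaRN factor should be chosen per the bullet points:

import torch
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_name = "nvidia/Nemotron-Cascade-14B-Thinking"

# Enable YaRN RoPE scaling (factor 2.0 -> ~64K context; use 3.0 for SWE Verified)
config = AutoConfig.from_pretrained(model_name)
config.rope_scaling = {
    "rope_type": "yarn",
    "factor": 2.0,
    "original_max_position_embeddings": 32768,
}

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, config=config, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "calculate 1+1?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, enable_thinking=True, return_tensors="pt"
).to(model.device)

# Recommended sampling parameters: temperature 0.6, top-p 0.95
outputs = model.generate(inputs, do_sample=True, temperature=0.6, top_p=0.95, max_new_tokens=4096)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))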

Evaluation Toolkit

To reproduce our results, please check the evaluation code, scripts, and cached prediction files at https://huggingface.co/nvidia/Nemotron-Cascade-14B-Thinking/blob/main/evaluation/README.md

Chat Template

Nemotron-Cascade-14B-Thinking follows the Qwen3-style ChatML template and is designed exclusively for the thinking mode. To align with the template used in Nemotron-Cascade-8B, the " /think" tag should be appended to the end of the user input. Note that a leading space is included in this tag to ensure correct tokenization.

To reduce the context length in a multi-turn conversation, we include only the final summary of the model’s output in the conversation history and change the user turn’s " /think" tag to " /no_think".

A brief example is shown below:

from transformers import AutoTokenizer

model_name = 'nvidia/Nemotron-Cascade-14B-Thinking'
tokenizer = AutoTokenizer.from_pretrained(model_name)

'''
single-turn example
'''
messages = [
    {"role": "user", "content": "calculate 1+1?"}
]

# only thinking mode is supported (enable_thinking=True)
prompt_thinking = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True, enable_thinking=True)
# prompt_thinking = '<|im_start|>system\nYou are a helpful and harmless assistant.<|im_end|>\n<|im_start|>user\ncalculate 1+1? /think<|im_end|>\n<|im_start|>assistant\n'


'''
multi-turn example
'''
messages = [
    {"role": "user", "content": "calculate 1+1?"},
    {"role": "assistant", "content": "<think>THINKING_CONTENT</think>\nTo calculate 1+11 + 1:\n\n1. **Identify the operation**: This is a basic addition problem involving two integers.\n2. **Perform the addition**:  \n   1+1=21 + 1 = 2.\n\n**Result**: boxed2\\boxed{2}",},
    {"role": "user", "content": "what about 2+2"}
]

# only thinking mode is supported (enable_thinking=True)
prompt_thinking = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True, enable_thinking=True)
# prompt_thinking = '<|im_start|>system\nYou are a helpful and harmless assistant.<|im_end|>\n<|im_start|>user\ncalculate 1+1? /no_think<|im_end|>\n<|im_start|>assistant\nTo calculate 1 + 1:\n\n1. **Identify the operation**: This is a basic addition problem involving two integers.\n2. **Perform the addition**:  \n   1 + 1 = 2.\n\n**Result**: \\boxed{2}<|im_end|>\n<|im_start|>user\nwhat about 2+2 /think<|im_end|>\n<|im_start|>assistant\n'
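
The multi-turn history above keeps only the final summary of each assistant turn. A small helper along the following lines can do that stripping; splitting on the closing </think> tag is an assumption about the raw output format, not something the card specifies:

def strip_thinking(assistant_output: str) -> str:
    """Drop the <think>...</think> block and keep only the final summary
    before appending an assistant turn to the conversation history."""
    if "</think>" in assistant_output:
        return assistant_output.split("</think>", 1)[1].lstrip()
    return assistant_output

raw_output = "<think>THINKING_CONTENT</think>\nThe answer is 4."
messages.append({"role": "assistant", "content": strip_thinking(raw_output)})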

Release Date

Dec 08, 2025

License

Your use of this model is governed by the NVIDIA Open Model License.

Citation

@article{Nemotron_Cascade_Scaling_Cascaded_Reinforcement_Learning,
  title={Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models},
  author={Wang, Boxin and Lee, Chankyu and Lee, Nayeon and Lin, Sheng-Chieh and Dai, Wenliang and Chen, Yang and Chen, Yangyi and Yang, Zhuolin and Liu, Zihan and Shoeybi, Mohammad and Catanzaro, Bryan and Ping, Wei},
  year={2025}
}