Nemotron-Cascade-14B-Thinking

2.6K
45
by
nvidia
Language Model
OTHER
14B params
New
3K downloads
Early-stage
Edge AI:
Mobile
Laptop
Server
32GB+ RAM
Mobile
Laptop
Server
Quick Summary

AI model with specialized capabilities.

Device Compatibility

Mobile
4-6GB RAM
Laptop
16GB RAM
Server
GPU
Minimum Recommended
14GB+ RAM

Code Examples

only thinking mode is supported (enable_thinking=True)pythontransformers
from transformers import AutoTokenizer

model_name = 'nvidia/Nemotron-Cascade-14B-Thinking'
tokenizer = AutoTokenizer.from_pretrained(model_name)

'''
single-turn example
'''
messages = [
    {"role": "user", "content": "calculate 1+1?"}
]

# only thinking mode is supported (enable_thinking=True)
prompt_thinking = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True, enable_thinking=True)
# prompt_thinking = '<|im_start|>system\nYou are a helpful and harmless assistant.<|im_end|>\n<|im_start|>user\ncalculate 1+1? /think<|im_end|>\n<|im_start|>assistant\n'


'''
multi-turn example
'''
messages = [
    {"role": "user", "content": "calculate 1+1?"},
    {"role": "assistant", "content": "<think>THINKING_CONTENT</think>\nTo calculate \\(1 + 1\\):\n\n1. **Identify the operation**: This is a basic addition problem involving two integers.\n2. **Perform the addition**:  \n   \\(1 + 1 = 2\\).\n\n**Result**: \\(\\boxed{2}\\)",},
    {"role": "user", "content": "what about 2+2"}
]

# only thinking mode is supported (enable_thinking=True)
prompt_thinking = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True, enable_thinking=True)
# prompt_thinking = '<|im_start|>system\nYou are a helpful and harmless assistant.<|im_end|>\n<|im_start|>user\ncalculate 1+1? /no_think<|im_end|>\n<|im_start|>assistant\nTo calculate \\(1 + 1\\):\n\n1. **Identify the operation**: This is a basic addition problem involving two integers.\n2. **Perform the addition**:  \n   \\(1 + 1 = 2\\).\n\n**Result**: \\(\\boxed\{2\}\\)<|im_end|>\n<|im_start|>user\nwhat about 2+2 /think<|im_end|>\n<|im_start|>assistant\n'

Deploy This Model

Production-ready deployment in minutes

Together.ai

Instant API access to this model

Fastest API

Production-ready inference API. Start free, scale to millions.

Try Free API

Replicate

One-click model deployment

Easiest Setup

Run models in the cloud with simple API. No DevOps required.

Deploy Now

Disclosure: We may earn a commission from these partners. This helps keep LLMYourWay free.