Zamba2-2.7B-instruct

24
83
2.7B
license:apache-2.0
by
Zyphra
Language Model
OTHER
2.7B params
New
24 downloads
Early-stage
Edge AI:
Mobile
Laptop
Server
7GB+ RAM
Mobile
Laptop
Server
Quick Summary

Zamba2-2.7B-Instruct is obtained from Zamba2-2.7B by fine-tuning on instruction-following and chat datasets. Specifically: 1. SFT of the base Zamba2-2.7B model...

Device Compatibility

Mobile
4-6GB RAM
Laptop
16GB RAM
Server
GPU
Minimum Recommended
3GB+ RAM

Code Examples

Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))
Inferencepythontransformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Instantiate model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B-instruct")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B-instruct", device_map="cuda", torch_dtype=torch.bfloat16)

# Format the input as a chat template
user_turn_1 = "In one season a flower blooms three times. In one year, there is one blooming season. How many times do two flowers bloom in two years? Please include your logic."
assistant_turn_1 = "In one season, a flower blooms three times. In one year, there is one blooming season. Therefore, in two years, there are two blooming seasons. Since each flower blooms three times in one season, in two blooming seasons, each flower will bloom six times. Since there are two flowers, the total number of times they will bloom in two years is 12."
user_turn_2 = "How many times do the two flowers blossom in three years?"
sample = [{'role': 'user', 'content': user_turn_1}, {'role': 'assistant', 'content': assistant_turn_1}, {'role': 'user', 'content': user_turn_2}]
chat_sample = tokenizer.apply_chat_template(sample, tokenize=False)

# Tokenize input and generate output
input_ids = tokenizer(chat_sample, return_tensors='pt', add_special_tokens=False).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print((tokenizer.decode(outputs[0])))

Deploy This Model

Production-ready deployment in minutes

Together.ai

Instant API access to this model

Fastest API

Production-ready inference API. Start free, scale to millions.

Try Free API

Replicate

One-click model deployment

Easiest Setup

Run models in the cloud with simple API. No DevOps required.

Deploy Now

Disclosure: We may earn a commission from these partners. This helps keep LLMYourWay free.