cogito-671b-v2.1
by deepcogito · Language Model · 671B params · license: MIT
37 downloads · New · Early-stage
Edge AI: Mobile · Laptop · Server (1500GB+ RAM)
Quick Summary
cogito-671b-v2.1 is a 671B-parameter language model from Deep Cogito with an optional reasoning ("thinking") mode that can be toggled per request via the chat template's `enable_thinking` flag.
Device Compatibility
- Mobile: 4-6GB RAM
- Laptop: 16GB RAM
- Server: GPU
- Minimum recommended: 625GB+ RAM
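The 625GB+ figure lines up with back-of-envelope math for holding 671B parameters at one byte each. A rough sketch (the bytes-per-parameter values are standard precision sizes, not vendor guidance, and exclude activations and KV cache):

```python
# Rough memory needed just to hold 671B parameters in RAM,
# before activations or KV cache are counted.
PARAMS = 671e9

def weight_gb(bytes_per_param: float) -> float:
    """Parameter count times precision, converted to GiB."""
    return PARAMS * bytes_per_param / 1024**3

print(f"fp16/bf16: {weight_gb(2.0):.0f} GB")  # 1250 GB
print(f"int8:      {weight_gb(1.0):.0f} GB")  # 625 GB
print(f"int4:      {weight_gb(0.5):.0f} GB")  # 312 GB
```

At one byte per parameter this gives roughly 625 GB, matching the minimum above; at fp16 it is roughly 1250 GB, consistent with the 1500GB+ server figure once runtime overhead is presumably included.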
Code Examples
With HuggingFace pipeline (Python, transformers)
import torch
from transformers import pipeline
model_id = "deepcogito/cogito-671b-v2.1"
pipe = pipeline("text-generation", model=model_id, model_kwargs={"dtype": "auto"}, device_map="auto")
messages = [
    {"role": "system", "content": "Always respond in 1-2 words."},
    {"role": "user", "content": "Who created you?"},
]
# Without reasoning
outputs = pipe(messages, max_new_tokens=512, tokenizer_encode_kwargs={"enable_thinking": False})
print(outputs[0]["generated_text"][-1])
# {'role': 'assistant', 'content': 'Deep Cogito'}
# With reasoning
outputs = pipe(messages, max_new_tokens=512, tokenizer_encode_kwargs={"enable_thinking": True})
print(outputs[0]["generated_text"][-1])
# {'role': 'assistant', 'content': 'The question is asking about my creator. I know that I\'m Cogito, an AI assistant created by Deep Cogito, which is an AI research lab. The question is very direct and can be answered very briefly. Since the user has specified to always respond in 1-2 words, I should keep my answer extremely concise.\n\nThe most accurate 2-word answer would be "Deep Cogito" - this names the organization that created me without any unnecessary details. "Deep Cogito" is two words, so it fits the requirement perfectly.\n</think>\nDeep Cogito'}

With HuggingFace AutoModel (Python, transformers)
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "deepcogito/cogito-671b-v2.1"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)
messages = [
    {"role": "system", "content": "Always respond in 1-2 words."},
    {"role": "user", "content": "Who created you?"},
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)
# To enable reasoning, set `enable_thinking=True` above.
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
generated_ids = model.generate(**model_inputs, max_new_tokens=512)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)

Tool Calling with HuggingFace (Python, transformers)
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = "deepcogito/cogito-671b-v2.1"
model = AutoModelForCausalLM.from_pretrained(model_id, dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)
def get_current_temperature(location: str) -> float:
    """
    Get the current temperature at a location.

    Args:
        location: The location to get the temperature for, in the format "City, Country"

    Returns:
        The current temperature at the specified location, as a float.
    """
    return 22.0
def generate(messages):
    global tokenizer, model
    prompt = tokenizer.apply_chat_template(
        messages,
        tools=[get_current_temperature],
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=False,
    )
    # To enable reasoning, set `enable_thinking=True` above.
    model_inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)
    generated_ids = model.generate(**model_inputs, max_new_tokens=512)
    generated_ids = [output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)]
    response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
    return response
messages = [{"role": "user", "content": "What's the temperature in Paris?"}]
response = generate(messages)
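The generate call above returns the model's raw tool-call text; to complete the loop, that call must be parsed, executed, and its result appended for a follow-up generation. A hedged sketch of that round trip, assuming the model emits a JSON object naming the function and its arguments (the exact format is defined by the model's chat template, so treat `raw_tool_call` as illustrative):

```python
import json

# Illustrative payload only: the real tool-call format comes from the chat template.
raw_tool_call = '{"name": "get_current_temperature", "arguments": {"location": "Paris, France"}}'

def get_current_temperature(location: str) -> float:
    """Stub matching the tool defined earlier."""
    return 22.0

TOOLS = {"get_current_temperature": get_current_temperature}

# Parse the call and dispatch to the matching Python function...
call = json.loads(raw_tool_call)
result = TOOLS[call["name"]](**call["arguments"])

# ...then append the tool result so a follow-up generate() can answer in prose.
messages = [
    {"role": "user", "content": "What's the temperature in Paris?"},
    {"role": "assistant", "tool_calls": [{"type": "function", "function": call}]},
    {"role": "tool", "name": call["name"], "content": str(result)},
]
print(result)  # 22.0
```

A real run would finish with `response = generate(messages)` to get the final natural-language answer.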