cogito-671b-v2.1

671B params · Language Model
License: MIT · by deepcogito
New · 37 downloads
Early-stage
Edge AI: Mobile · Laptop · Server (1500GB+ RAM)
Quick Summary

A 671-billion-parameter language model from deepcogito with switchable reasoning: generation can run with or without an explicit thinking phase via the `enable_thinking` flag shown in the examples below.

Device Compatibility

Mobile: 4-6GB RAM
Laptop: 16GB RAM
Server: GPU
Minimum Recommended: 625GB+ RAM
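The RAM figures above follow roughly from parameter count times bytes per weight. A back-of-the-envelope sketch (weights only; KV cache, activations, and framework overhead add more):

```python
def est_weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Approximate weight memory in GB: params x bytes per parameter.
    Ignores KV cache, activations, and framework overhead."""
    return n_params * bytes_per_param / 1e9

PARAMS = 671e9  # 671B parameters
for name, bpp in [("fp16/bf16", 2.0), ("int8", 1.0), ("int4", 0.5)]:
    print(f"{name}: ~{est_weight_memory_gb(PARAMS, bpp):.0f} GB")
# fp16/bf16: ~1342 GB, int8: ~671 GB, int4: ~336 GB
```

The ~1342GB bf16 estimate lines up with the 1500GB+ figure once overhead is included, and the 625GB+ minimum roughly matches int8-quantized weights.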

Code Examples

With HuggingFace pipeline (Python, transformers)
import torch
from transformers import pipeline

model_id = "deepcogito/cogito-671b-v2.1"
pipe = pipeline("text-generation", model=model_id, model_kwargs={"dtype": "auto"}, device_map="auto")

messages = [
    {"role": "system", "content": "Always respond in 1-2 words."},
    {"role": "user", "content": "Who created you?"},
]

## without reasoning
outputs = pipe(messages, max_new_tokens=512, tokenizer_encode_kwargs={"enable_thinking": False})
print(outputs[0]["generated_text"][-1])
# {'role': 'assistant', 'content': 'Deep Cogito'}

## with reasoning
outputs = pipe(messages, max_new_tokens=512, tokenizer_encode_kwargs={"enable_thinking": True})
print(outputs[0]["generated_text"][-1])
# {'role': 'assistant', 'content': 'The question is asking about my creator. I know that I\'m Cogito, an AI assistant created by Deep Cogito, which is an AI research lab. The question is very direct and can be answered very briefly. Since the user has specified to always respond in 1-2 words, I should keep my answer extremely concise.\n\nThe most accurate 2-word answer would be "Deep Cogito" - this names the organization that created me without any unnecessary details. "Deep Cogito" is two words, so it fits the requirement perfectly.\n</think>\nDeep Cogito'}
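As the reasoning output above shows, the assistant's content holds the chain of thought, a `</think>` marker, and then the final answer. A minimal helper to separate the two (the marker is taken from the sample output above; treat the exact format as an assumption):

```python
def split_reasoning(content: str) -> tuple[str, str]:
    """Split assistant content into (reasoning, answer) at the closing
    </think> marker; reasoning is empty if the marker is absent."""
    marker = "</think>"
    if marker in content:
        reasoning, _, answer = content.partition(marker)
        return reasoning.strip(), answer.strip()
    return "", content.strip()

reasoning, answer = split_reasoning("The user wants a short answer...\n</think>\nDeep Cogito")
print(answer)  # Deep Cogito
```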
With HuggingFace AutoModel (Python, transformers)
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "deepcogito/cogito-671b-v2.1"

model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

messages = [
    {"role": "system", "content": "Always respond in 1-2 words."},
    {"role": "user", "content": "Who created you?"}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)
# To enable reasoning, set `enable_thinking=True` above.

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(**model_inputs, max_new_tokens=512)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
Tool Calling with HuggingFace (Python, transformers)
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepcogito/cogito-671b-v2.1"

model = AutoModelForCausalLM.from_pretrained(model_id, dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

def get_current_temperature(location: str) -> float:
    """
    Get the current temperature at a location.
    
    Args:
        location: The location to get the temperature for, in the format "City, Country"
    Returns:
        The current temperature at the specified location, as a float.
    """
    return 22.0

def generate(messages):
    global tokenizer, model
    prompt = tokenizer.apply_chat_template(
        messages,
        tools=[get_current_temperature],
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=False,
    )
    # To enable reasoning, set `enable_thinking=True` above.

    model_inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)

    generated_ids = model.generate(**model_inputs, max_new_tokens=512)
    generated_ids = [output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)]
    response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
    return response

messages = [{"role": "user", "content": "whats the temperature in Paris?"}]
response = generate(messages)
print(response)
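The snippet above stops at the model's first reply. In a complete tool-calling loop you would parse the emitted tool call, execute the matching function, append the result as a tool message, and call `generate` again. A hedged sketch, assuming the model emits its call as a JSON object with `name` and `arguments` keys (the exact emission format depends on the chat template):

```python
import json

def run_tool_call(messages, raw_tool_call, tools):
    """Hypothetical follow-up step: parse a JSON tool call, run the
    matching Python function, and append both the call and its result
    so the conversation can be fed back through generate()."""
    call = json.loads(raw_tool_call)  # e.g. {"name": ..., "arguments": {...}}
    result = tools[call["name"]](**call["arguments"])
    messages.append({"role": "assistant",
                     "tool_calls": [{"type": "function", "function": call}]})
    messages.append({"role": "tool", "name": call["name"], "content": str(result)})
    return messages

# Stand-in for the get_current_temperature function defined above:
tools = {"get_current_temperature": lambda location: 22.0}
history = [{"role": "user", "content": "whats the temperature in Paris?"}]
raw = '{"name": "get_current_temperature", "arguments": {"location": "Paris, France"}}'
history = run_tool_call(history, raw, tools)
print(history[-1]["content"])  # 22.0
```

In a real loop you would pass `history` back into `generate(history)` so the model can phrase a final answer from the tool result.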

Deploy This Model

Production-ready deployment in minutes

Together.ai

Instant API access to this model

Fastest API

Production-ready inference API. Start free, scale to millions.

Try Free API

Replicate

One-click model deployment

Easiest Setup

Run models in the cloud with simple API. No DevOps required.

Deploy Now

Disclosure: We may earn a commission from these partners. This helps keep LLMYourWay free.