T-pro-it-2.0-eagle

2 languages · llama · by t-tech · License: Other · 14 downloads · Early-stage
Edge AI: Mobile · Laptop · Server
Quick Summary

EAGLE speculative-decoding draft model for t-tech/T-pro-it-2.0, used to accelerate generation in sglang.

Code Examples

👨‍💻 Example of usage (Python, sglang + transformers)
import sglang
import transformers


if __name__ == "__main__":
    tokenizer = transformers.AutoTokenizer.from_pretrained("t-tech/T-pro-it-2.0")

    # EAGLE speculative decoding: the draft model proposes tokens that the
    # target model (T-pro-it-2.0) verifies in parallel.
    llm = sglang.Engine(
        model_path="t-tech/T-pro-it-2.0",
        max_running_requests=1,
        tp_size=2,                       # tensor parallelism across 2 GPUs
        mem_fraction_static=0.8,
        speculative_algorithm="EAGLE",
        speculative_draft_model_path="t-tech/T-pro-it-2.0-eagle",
        speculative_num_steps=5,         # draft steps per verification round
        speculative_eagle_topk=8,        # branches kept per draft step
        speculative_num_draft_tokens=64,
    )

    sampling_params = {"temperature": 0.0, "max_new_tokens": 2048}

    def to_prompt(text):
        return tokenizer.apply_chat_template(
            [{"role": "user", "content": text}], tokenize=False
        )

    # Warmup requests (prompts are in Russian, the model's primary language):
    llm.generate([to_prompt("Что такое программирование?")], sampling_params)  # "What is programming?"
    llm.generate([to_prompt("Расскажи шутку")], sampling_params)               # "Tell a joke"
    llm.generate([to_prompt("Чем питаются орлы?")], sampling_params)           # "What do eagles eat?"
    llm.generate([to_prompt("Ты орел?")], sampling_params)                     # "Are you an eagle?"

    # Timed run: "What is a large language model?"
    outputs = llm.generate([to_prompt("Что такое большая языковая модель?")], sampling_params)
    output = outputs[0]
    llm.shutdown()

    total_latency = output["meta_info"]["e2e_latency"]
    total_output_tokens = output["meta_info"]["completion_tokens"]
    total_verify_ct = output["meta_info"]["spec_verify_ct"]        # verification rounds
    total_output_throughput = total_output_tokens / total_latency  # tokens per second
    accept_length = total_output_tokens / total_verify_ct          # avg accepted tokens per round
    print(output)
    print({"accept_length": accept_length, "total_output_throughput": total_output_throughput})
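The two metrics printed at the end can be sanity-checked in isolation. A minimal sketch of the same arithmetic, using illustrative (not measured) `meta_info` values:

```python
# Same metric arithmetic as the example above, on made-up meta_info values.
meta_info = {
    "e2e_latency": 2.0,        # seconds (illustrative)
    "completion_tokens": 512,  # tokens generated (illustrative)
    "spec_verify_ct": 128,     # EAGLE verification rounds (illustrative)
}

# Tokens per second, end to end.
throughput = meta_info["completion_tokens"] / meta_info["e2e_latency"]
# Average tokens accepted per verification round.
accept_length = meta_info["completion_tokens"] / meta_info["spec_verify_ct"]

print(f"{throughput:.1f} tok/s, accept_length={accept_length:.2f}")
# → 256.0 tok/s, accept_length=4.00
```

An `accept_length` well above 1 means the draft model's proposals are frequently accepted by the target model, which is where the speculative-decoding speedup comes from.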

Deploy This Model

Production-ready deployment in minutes

Together.ai

Instant API access to this model

Fastest API

Production-ready inference API. Start free, scale to millions.

Try Free API
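If the model is hosted behind an OpenAI-compatible chat endpoint, as Together.ai's API is, a request can be built with only the standard library. The endpoint URL and model slug below are assumptions for illustration, not verified listings:

```python
import json
import urllib.request

def build_request(api_key: str, prompt: str) -> urllib.request.Request:
    # Payload shape follows the OpenAI chat-completions convention;
    # the model slug is hypothetical and may differ on the provider.
    payload = {
        "model": "t-tech/T-pro-it-2.0",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }
    return urllib.request.Request(
        "https://api.together.xyz/v1/chat/completions",  # assumed endpoint
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# "sk-demo" is a placeholder; substitute a real API key before sending.
req = build_request("sk-demo", "What is a large language model?")
```

The request object can then be sent with `urllib.request.urlopen(req)` or any HTTP client of your choice.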

Replicate

One-click model deployment

Easiest Setup

Run models in the cloud with simple API. No DevOps required.

Deploy Now

Disclosure: We may earn a commission from these partners. This helps keep LLMYourWay free.