T-pro-it-2.0-eagle
by t-tech
llama · 2 languages · license: other · 14 downloads
Quick Summary
EAGLE draft model for speculative decoding of t-tech/T-pro-it-2.0, served with SGLang.
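Beyond the offline engine shown below, the same model pair can be exposed over HTTP with SGLang's server. A minimal sketch, assuming SGLang's standard launch-server flags for speculative decoding (values mirror the Python example; tune `--tp-size` and `--mem-fraction-static` for your hardware):

```shell
python -m sglang.launch_server \
  --model-path t-tech/T-pro-it-2.0 \
  --speculative-algorithm EAGLE \
  --speculative-draft-model-path t-tech/T-pro-it-2.0-eagle \
  --speculative-num-steps 5 \
  --speculative-eagle-topk 8 \
  --speculative-num-draft-tokens 64 \
  --tp-size 2 \
  --mem-fraction-static 0.8
```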
Code Examples
👨‍💻 Examples of usage (Python, transformers + SGLang)

```python
import sglang
import transformers

if __name__ == "__main__":
    tokenizer = transformers.AutoTokenizer.from_pretrained("t-tech/T-pro-it-2.0")

    # Target model with the EAGLE draft model attached for speculative decoding
    llm = sglang.Engine(
        model_path="t-tech/T-pro-it-2.0",
        max_running_requests=1,
        tp_size=2,
        mem_fraction_static=0.8,
        speculative_algorithm="EAGLE",
        speculative_draft_model_path="t-tech/T-pro-it-2.0-eagle",
        speculative_num_steps=5,
        speculative_eagle_topk=8,
        speculative_num_draft_tokens=64,
    )
    sampling_params = {"temperature": 0.0, "max_new_tokens": 2048}

    # Warmup runs (prompts in Russian: "What is programming?", "Tell a joke",
    # "What do eagles eat?", "Are you an eagle?")
    llm.generate([tokenizer.apply_chat_template([{"role": "user", "content": "Что такое программирование?"}], tokenize=False)], sampling_params)
    llm.generate([tokenizer.apply_chat_template([{"role": "user", "content": "Расскажи шутку"}], tokenize=False)], sampling_params)
    llm.generate([tokenizer.apply_chat_template([{"role": "user", "content": "Чем питаются орлы?"}], tokenize=False)], sampling_params)
    llm.generate([tokenizer.apply_chat_template([{"role": "user", "content": "Ты орел?"}], tokenize=False)], sampling_params)

    # Actual run ("What is a large language model?")
    outputs = llm.generate([tokenizer.apply_chat_template([{"role": "user", "content": "Что такое большая языковая модель?"}], tokenize=False)], sampling_params)
    output = outputs[0]
    llm.shutdown()

    # Throughput and average accepted tokens per verification step
    total_latency = output["meta_info"]["e2e_latency"]
    total_output_tokens = output["meta_info"]["completion_tokens"]
    total_verify_ct = output["meta_info"]["spec_verify_ct"]
    total_output_throughput = total_output_tokens / total_latency
    accept_length = total_output_tokens / total_verify_ct
    print(output)
    print({"accept_length": accept_length, "total_output_throughput": total_output_throughput})
```