Aurora-Spec-Qwen3-Coder-Next-FP8
by togethercomputer
Language Model · Early-stage
592 downloads
Edge AI: Mobile · Laptop · Server
Quick Summary
Aurora-Spec-Qwen3-Coder-Next-FP8 is a speculative-decoding draft model published by togethercomputer. It is paired with the Qwen/Qwen3-Coder-Next-FP8 target model under the EAGLE3 algorithm to accelerate code-generation inference in SGLang.
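Speculative decoding pairs a cheap draft model with the expensive target model: the draft proposes several tokens per step, the target verifies them in a single pass and accepts the longest agreeing prefix, so one target pass can emit multiple tokens. The toy sketch below is not EAGLE3 itself (which drafts from the target's hidden states); it only illustrates, with greedy decoding over dictionary-backed "models", why the final output still matches what the target would have produced on its own:

```python
def greedy_next(model, ctx):
    """Toy 'model': a dict mapping a context tuple to its next token."""
    return model.get(tuple(ctx), "<eos>")


def speculative_decode(target, draft, prompt, k=3, max_new=10):
    """Draft-and-verify loop: draft proposes k tokens, target verifies."""
    out = list(prompt)
    produced = 0
    while produced < max_new and out[-1] != "<eos>":
        # 1. The draft cheaply proposes k tokens.
        ctx = list(out)
        proposed = []
        for _ in range(k):
            t = greedy_next(draft, ctx)
            proposed.append(t)
            ctx.append(t)
        # 2. The target verifies each proposal in order; on the first
        #    disagreement it substitutes its own token and stops, so
        #    the output always equals the target's own greedy decode.
        for t in proposed:
            expected = greedy_next(target, out)
            out.append(expected)
            produced += 1
            if expected != t or expected == "<eos>" or produced >= max_new:
                break
    return out
```

Whether the draft is accurate only affects how many tokens are accepted per target pass (i.e. speed), never the final text, which is the property that makes speculative decoding a lossless acceleration.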
Code Examples
Usage

```python
import sglang as sgl


def main():
    # Sample prompts
    prompts = [
        "Write a Python function to compute fibonacci numbers:",
        "Implement a binary search algorithm in Python:",
        "Create a class for a binary tree in Python:",
    ]

    # Create sampling params
    sampling_params = {"temperature": 0.7, "max_new_tokens": 256}

    # Initialize engine with EAGLE3 speculative decoding
    llm = sgl.Engine(
        model_path="Qwen/Qwen3-Coder-Next-FP8",
        speculative_draft_model_path="togethercomputer/Aurora-Spec-Qwen3-Coder-Next-FP8",
        speculative_algorithm="EAGLE3",
        speculative_num_steps=5,
        speculative_eagle_topk=1,
        speculative_num_draft_tokens=6,
        trust_remote_code=True,
    )

    # Generate with speculative decoding
    outputs = llm.generate(prompts, sampling_params)

    # Print the outputs
    for prompt, output in zip(prompts, outputs):
        print("=" * 50)
        print(f"Prompt: {prompt}")
        print(f"Generated: {output['text']}")


# The __main__ guard is necessary because sglang spawns subprocesses
if __name__ == "__main__":
    main()
```
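The same configuration can also be served over HTTP instead of the in-process engine. A launch sketch assuming SGLang's `launch_server` entry point and its current CLI flag names (verify against `python -m sglang.launch_server --help` for your installed version):

```shell
# Launch an SGLang server with the same EAGLE3 speculative settings
# as the Engine example above (flag names assume current SGLang CLI).
python -m sglang.launch_server \
  --model-path Qwen/Qwen3-Coder-Next-FP8 \
  --speculative-draft-model-path togethercomputer/Aurora-Spec-Qwen3-Coder-Next-FP8 \
  --speculative-algorithm EAGLE3 \
  --speculative-num-steps 5 \
  --speculative-eagle-topk 1 \
  --speculative-num-draft-tokens 6 \
  --trust-remote-code \
  --port 30000
```

Once running, the server exposes an OpenAI-compatible API on the chosen port, so existing client code can benefit from the draft model without changes.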