Aurora-Spec-Qwen3-Coder-Next-FP8

by togethercomputer
Language Model · llama · 592 downloads
Early-stage Edge AI: Mobile, Laptop, Server
Quick Summary

Aurora-Spec-Qwen3-Coder-Next-FP8 is a speculative-decoding draft model by togethercomputer, intended to be paired with the Qwen/Qwen3-Coder-Next-FP8 target model using the EAGLE3 algorithm to accelerate inference (see the usage example below).
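To build intuition for what the draft model does, here is a toy, framework-free sketch of the draft-and-verify loop behind speculative decoding. The function and the deterministic "models" are illustrative inventions, not part of SGLang or this model's API: a cheap draft model proposes a block of tokens, the target model verifies them, and the longest agreeing prefix is accepted plus one guaranteed target token per round.

```python
def speculative_decode(target_next, draft_next, prompt, rounds=3, draft_len=4):
    """Toy speculative decoding with deterministic next-token functions.

    target_next / draft_next: callables mapping a token list to the next token.
    """
    tokens = list(prompt)
    for _ in range(rounds):
        # 1. Draft model cheaply proposes draft_len candidate tokens.
        proposal, ctx = [], list(tokens)
        for _ in range(draft_len):
            t = draft_next(ctx)
            proposal.append(t)
            ctx.append(t)

        # 2. Target model verifies the proposals: accept the longest prefix
        #    where target and draft agree.
        accepted, ctx = [], list(tokens)
        for t in proposal:
            if target_next(ctx) != t:
                break
            accepted.append(t)
            ctx.append(t)
        tokens += accepted

        # 3. Emit one target-model token so every round makes progress,
        #    even when the draft is rejected outright.
        tokens.append(target_next(tokens))
    return tokens
```

When the draft agrees with the target, each round yields up to `draft_len + 1` tokens for a single (batched) target verification; when it disagrees, decoding degrades gracefully to ordinary one-token-per-round generation. Real implementations such as EAGLE3 verify probabilistically over logits rather than by exact token match, but the accept/reject structure is the same.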

Code Examples

Usage (Python)
import sglang as sgl

def main():
    # Sample prompts
    prompts = [
        "Write a Python function to compute fibonacci numbers:",
        "Implement a binary search algorithm in Python:",
        "Create a class for a binary tree in Python:",
    ]

    # Create sampling params
    sampling_params = {"temperature": 0.7, "max_new_tokens": 256}

    # Initialize engine with speculative decoding
    llm = sgl.Engine(
        model_path="Qwen/Qwen3-Coder-Next-FP8",
        speculative_draft_model_path="togethercomputer/Aurora-Spec-Qwen3-Coder-Next-FP8",
        speculative_algorithm="EAGLE3",
        speculative_num_steps=5,
        speculative_eagle_topk=1,
        speculative_num_draft_tokens=6,
        trust_remote_code=True,
    )

    # Generate with speculative decoding
    outputs = llm.generate(prompts, sampling_params)

    # Print the outputs
    for prompt, output in zip(prompts, outputs):
        print("=" * 50)
        print(f"Prompt: {prompt}")
        print(f"Generated: {output['text']}")

# The __main__ condition is necessary when using spawn to create subprocesses
if __name__ == "__main__":
    main()
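The same speculative settings can also be used when serving the model over HTTP. The command below is a sketch: the flag names are assumed to mirror the Engine keyword arguments above per SGLang's usual CLI naming, so verify them against `python -m sglang.launch_server --help` for your installed version.

```shell
# Sketch: launch an SGLang server with EAGLE3 speculative decoding
# (flag names assumed to mirror the Engine kwargs shown above).
python -m sglang.launch_server \
  --model-path Qwen/Qwen3-Coder-Next-FP8 \
  --speculative-draft-model-path togethercomputer/Aurora-Spec-Qwen3-Coder-Next-FP8 \
  --speculative-algorithm EAGLE3 \
  --speculative-num-steps 5 \
  --speculative-eagle-topk 1 \
  --speculative-num-draft-tokens 6 \
  --trust-remote-code
```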

Deploy This Model

Production-ready deployment in minutes

Together.ai

Instant API access to this model

Fastest API

Production-ready inference API. Start free, scale to millions.

Try Free API

Replicate

One-click model deployment

Easiest Setup

Run models in the cloud with simple API. No DevOps required.

Deploy Now

Disclosure: We may earn a commission from these partners. This helps keep LLMYourWay free.