LLMYourWay

FasterDecoding

7 models

medusa-v1.0-vicuna-7b-v1.5

llama
2,487
0

medusa-vicuna-7b-v1.3

Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads

| Size | Chat Command | Hugging Face Repo |
| ---- | ------------ | ----------------- |
| 7B | `python -m medusa.inference.cli --model FasterDecoding/medusa-vicuna-7b-v1.3` | FasterDecoding/medusa-vicuna-7b-v1.3 |
| 13B | `python -m medusa.inference.cli --model FasterDecoding/medusa-vicuna-13b-v1.3` | FasterDecoding/medusa-vicuna-13b-v1.3 |
| 33B | `python -m medusa.inference.cli --model FasterDecoding/medusa-vicuna-33b-v1.3` | FasterDecoding/medusa-vicuna-33b-v1.3 |

Inference

We currently support inference in the single-GPU, batch-size-1 setting, which is the most common setup for local model hosting. We are actively working to extend Medusa's capabilities by integrating it into other inference frameworks; please don't hesitate to reach out if you are interested in contributing to this effort.

Each chat command in the table above launches a CLI interface. You can also pass `--load-in-8bit` or `--load-in-4bit` to load the base model in quantized format.
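As a rough illustration of how the chat commands and quantization flags quoted above fit together, the following sketch assembles the command line for a given model size. The helper name `build_cli_command` and the `MEDUSA_REPOS` mapping are hypothetical and exist only for this example; the module path, the `--model` option, and the two `--load-in-*` flags are taken from the excerpt. The sketch only builds and prints the command and does not run Medusa.

```python
import shlex

# Repo names as they appear in the chat commands quoted in the excerpt.
# The mapping itself is a hypothetical convenience for this example.
MEDUSA_REPOS = {
    "7B": "FasterDecoding/medusa-vicuna-7b-v1.3",
    "13B": "FasterDecoding/medusa-vicuna-13b-v1.3",
    "33B": "FasterDecoding/medusa-vicuna-33b-v1.3",
}


def build_cli_command(size, quantize=None):
    """Return the argument list for the Medusa chat CLI (hypothetical helper).

    size:     one of "7B", "13B", "33B"
    quantize: None, "8bit", or "4bit"; maps to --load-in-8bit / --load-in-4bit
    """
    if size not in MEDUSA_REPOS:
        raise ValueError(f"unknown model size: {size!r}")
    if quantize not in (None, "8bit", "4bit"):
        raise ValueError(f"unknown quantization mode: {quantize!r}")

    cmd = ["python", "-m", "medusa.inference.cli", "--model", MEDUSA_REPOS[size]]
    if quantize is not None:
        # Flag spelling follows the excerpt: --load-in-8bit or --load-in-4bit.
        cmd.append(f"--load-in-{quantize}")
    return cmd


if __name__ == "__main__":
    # Print a shell-ready command for the 7B model with 4-bit loading.
    print(shlex.join(build_cli_command("7B", quantize="4bit")))
```

Returning an argument list rather than a single string keeps the command safe to hand to `subprocess.run` without shell quoting concerns; `shlex.join` is used only for display.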

—
914
17

medusa-1.0-zephyr-7b-beta

—
622
1

medusa-vicuna-33b-v1.3

—
36
4

medusa-vicuna-13b-v1.3

—
18
5

medusa-1.0-vicuna-13b-v1.5

llama
17
1

medusa-1.0-vicuna-33b-v1.3

llama
3
0
© 2026 LLMYourWay. All rights reserved.