LLMYourWay

cturan

2 models

MiniMax-M2-GGUF

Building and Running the Experimental `minimax` Branch of `llama.cpp`

Note: This setup is experimental. The `minimax` branch will not work with the standard `llama.cpp`. Use it only for testing GGUF models with experimental features.

System Requirements (any supported platform can be used; the build commands below target Ubuntu):
- Ubuntu 22.04
- NVIDIA GPU with CUDA support
- CUDA Toolkit 12.8 or later
- CMake

After the build is complete, the binaries will be located in the build output directory (see the build sketch below).

Running the model: this configuration offloads the experts to the CPU, so approximately 16 GB of VRAM is sufficient. A launch example follows the build sketch.
- `--cpu-moe` enables CPU offloading for mixture-of-experts layers.
- `--jinja` activates the Jinja templating engine.
- Adjust `-c` (context length) and `-ngl` (GPU layers) according to your hardware.
- Ensure the model file (`minimax-m2-Q4K.gguf`) is available in the working directory.

All steps complete. The experimental CUDA-enabled build of `llama.cpp` is ready to use.
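The card's actual build commands did not survive extraction, so the following is a minimal sketch of a CUDA-enabled CMake build. The fork URL is a placeholder (the card does not say which repository hosts the `minimax` branch), and `build/bin` is the usual output path for a standard `llama.cpp` CMake build rather than something the card confirms.

```bash
# Clone the fork carrying the experimental `minimax` branch.
# NOTE: the repository URL is a placeholder; the card does not name the fork.
git clone --branch minimax https://github.com/<fork>/llama.cpp.git
cd llama.cpp

# Configure a CUDA-enabled release build (CUDA Toolkit 12.8+ and CMake required).
cmake -B build -DGGML_CUDA=ON -DCMAKE_BUILD_TYPE=Release

# Compile; in a standard llama.cpp CMake build the binaries
# (llama-server, llama-cli, ...) end up under build/bin.
cmake --build build --config Release -j "$(nproc)"
```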
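Likewise, here is a hedged launch example assembled from the flags the card lists. Treat `llama-server` as an assumed entry point (`llama-cli` takes the same model flags), and the `-c` and `-ngl` values as illustrative numbers to tune for your hardware.

```bash
# Serve the model; experts stay on the CPU, so ~16 GB of VRAM is sufficient.
#   --cpu-moe : offload mixture-of-experts layers to the CPU
#   --jinja   : enable the Jinja templating engine
#   -c, -ngl  : context length and GPU layers; illustrative values, adjust as needed
./build/bin/llama-server \
    -m minimax-m2-Q4K.gguf \
    --cpu-moe \
    --jinja \
    -c 16384 \
    -ngl 99
```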

license:mit
8,608 downloads • 15 likes

cturan/Kimi-Linear-48B-A3B-Instruct-GGUF

license:mit
300 downloads • 2 likes