# cturan/MiniMax-M2-GGUF
## Building and Running the Experimental `minimax` Branch of `llama.cpp`

> **Note:** This setup is experimental. The `minimax` branch will not work with the standard `llama.cpp`. Use it only for testing GGUF models with experimental features.

### System Requirements

Any supported platform can be used; the build commands here target Ubuntu.

- Ubuntu 22.04
- NVIDIA GPU with CUDA support
- CUDA Toolkit 12.8 or later
- CMake

### Build Output

After the build is complete, the binaries will be located in the build output directory.

### Running the Model

This configuration offloads the experts to the CPU, so approximately 16 GB of VRAM is sufficient.

- `--cpu-moe` enables CPU offloading for the mixture-of-experts layers.
- `--jinja` activates the Jinja templating engine.
- Adjust `-c` (context length) and `-ngl` (GPU layers) according to your hardware.
- Ensure the model file (`minimax-m2-Q4K.gguf`) is available in the working directory.

All steps complete. The experimental CUDA-enabled build of `llama.cpp` is ready to use.
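The build commands themselves are not spelled out above; a minimal sketch of a CUDA-enabled CMake build follows. The remote URL is a placeholder (the source does not say which fork carries the `minimax` branch), so substitute the correct one before running.

```shell
# Placeholder remote: substitute the fork that actually carries the `minimax` branch.
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
git checkout minimax

# Configure and build with CUDA enabled (requires CUDA Toolkit 12.8+ and CMake).
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j
```

With this layout, a standard `llama.cpp` CMake build places the binaries (e.g. `llama-server`, `llama-cli`) under `build/bin/`.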
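To tie the flags together, here is a hedged example invocation of `llama-server` from the build directory; the `-c` and `-ngl` values are placeholders to tune for your hardware, not values taken from the source.

```shell
# --cpu-moe : keep the MoE expert weights on the CPU (~16 GB VRAM suffices)
# --jinja   : use the Jinja chat-template engine
# -c / -ngl : placeholder values; adjust context length and GPU layers to your hardware
./build/bin/llama-server \
  -m minimax-m2-Q4K.gguf \
  --cpu-moe \
  --jinja \
  -c 8192 \
  -ngl 99
```

The model file `minimax-m2-Q4K.gguf` must be in the working directory (or pass its full path to `-m`).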