YOYO-AI
Qwen3-30B-A3B-CoderThinking-YOYO-linear-Q4_K_M-GGUF
YOYO-AI/Qwen3-30B-A3B-CoderThinking-YOYO-linear-Q4_K_M-GGUF This model was converted to GGUF format from `YOYO-AI/Qwen3-30B-A3B-CoderThinking-YOYO-linear` using llama.cpp via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model. Use with llama.cpp: install llama.cpp through brew (works on Mac and Linux). Note: you can also use this checkpoint directly through the usage steps listed in the llama.cpp repo. Step 1: Clone llama.cpp from GitHub. Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag along with other hardware-specific flags (for example, `LLAMA_CUDA=1` for Nvidia GPUs on Linux).
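The usage notes above condense into a short shell session. The `--hf-file` name below is an assumption based on GGUF-my-repo's lowercase naming convention; check the repo's file list before running:

```shell
# Install llama.cpp via Homebrew (macOS/Linux); building from source with
# LLAMA_CURL=1 (plus e.g. LLAMA_CUDA=1) is the alternative path described above.
brew install llama.cpp

# Run the quantized checkpoint straight from the Hugging Face Hub.
# The --hf-file value is a guess at GGUF-my-repo's file-naming scheme.
llama-cli \
  --hf-repo YOYO-AI/Qwen3-30B-A3B-CoderThinking-YOYO-linear-Q4_K_M-GGUF \
  --hf-file qwen3-30b-a3b-coderthinking-yoyo-linear-q4_k_m.gguf \
  -p "Write a binary search in Python."
```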
Qwen3-30B-A3B-Mixture-2507-Q4_K_M-GGUF
YOYO-AI/Qwen3-30B-A3B-Mixture-2507-Q4_K_M-GGUF This model was converted to GGUF format from `YOYO-AI/Qwen3-30B-A3B-Mixture-2507` using llama.cpp via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.
Qwen3-Coder-30B-A3B-Instruct-480B-Distill-V2-Fp32-mlx-fp16
YOYO-AI/Qwen3-Coder-30B-A3B-Instruct-480B-Distill-V2-Fp32-mlx-fp16 The model YOYO-AI/Qwen3-Coder-30B-A3B-Instruct-480B-Distill-V2-Fp32-mlx-fp16 was converted to MLX format from BasedBase/Qwen3-Coder-30B-A3B-Instruct-480B-Distill-V2-Fp32 using mlx-lm version 0.26.4.
Qwen3 30B A3B YOYO V4
> Leveraging our novel merging approach, we can seamlessly integrate instruction, reasoning, and code models into a single, high-performing unified model in just one step.

Model Highlights:

Parameter Settings:
> [!TIP]
> `Temperature=0.7`, `TopP=0.8`, `TopK=20`, `MinP=0`.

Problem Setting

Objective: merge $K$ fine-tuned models with identical tensor names and shapes into a single model whose parameters $\theta^{\star}$ lie at the robust center of the $K$ parameter sets.

Per-Tensor Formulation

For a given tensor name, each model provides a point $x_i \in \mathbb{R}^n$ (flattened). We seek a robust center $\theta^{\star} \in \mathbb{R}^n$.

Arithmetic Mean:

$$a = \frac{1}{K} \sum_{i=1}^{K} x_i$$

Coordinate-wise Median (computed elementwise across coordinates; robust, but it ignores the coupling between coordinates that determines vector magnitude):

$$m = \operatorname{median}(x_1, \ldots, x_K)$$

Centered Linear Average:

$$\theta^{(0)} = \frac{a + m}{2}$$

This blends efficiency and robustness without tuning, offering a strong seed for iterative robust estimators.

Objective Function:

$$\theta^{\star} = \arg\min_{\theta \in \mathbb{R}^n} \sum_{i=1}^{K} \|\theta - x_i\|_2$$

This is the multivariate analogue of the median (the geometric median), robust to outliers in the Euclidean geometry of parameters.

Weights:

$$w_i^{(t)} = \frac{1}{\max(\|\theta^{(t)} - x_i\|_2, \varepsilon)}$$

Iteration Step:

$$\theta^{(t+1)} = \frac{\sum_{i=1}^{K} w_i^{(t)} x_i}{\sum_{i=1}^{K} w_i^{(t)}}$$

Convergence Criterion: stop when the relative change is below $\varepsilon$:

$$\frac{\|\theta^{(t+1)} - \theta^{(t)}\|_2}{\max(\|\theta^{(t)}\|_2, 1)} \leq \varepsilon$$
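The per-tensor procedure above (centered-average seed, then Weiszfeld-style iterations toward the geometric median) can be sketched in NumPy. This is an illustrative implementation of the formulas, not YOYO-AI's released code:

```python
import numpy as np

def robust_center(xs, eps=1e-8, max_iters=100):
    """Geometric median of K flattened tensors via Weiszfeld iterations,
    seeded with the centered linear average (mean + median) / 2.
    Illustrative sketch of the formulation above, not YOYO-AI's code."""
    xs = np.stack(xs)                   # shape (K, n)
    a = xs.mean(axis=0)                 # arithmetic mean
    m = np.median(xs, axis=0)           # coordinate-wise median
    theta = (a + m) / 2.0               # centered linear average seed
    for _ in range(max_iters):
        d = np.linalg.norm(xs - theta, axis=1)   # ||theta - x_i||_2 per model
        w = 1.0 / np.maximum(d, eps)             # inverse-distance weights
        new_theta = (w[:, None] * xs).sum(axis=0) / w.sum()
        rel = np.linalg.norm(new_theta - theta) / max(np.linalg.norm(theta), 1.0)
        theta = new_theta
        if rel <= eps:                  # relative-change stopping rule
            break
    return theta
```

With an outlier among the inputs, the result stays near the majority cluster, unlike the arithmetic mean.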
Qwen3-30B-A3B-YOYO-V4-Q4_K_M-GGUF
YOYO-AI/Qwen3-30B-A3B-YOYO-V4-Q4_K_M-GGUF This model was converted to GGUF format from `YOYO-AI/Qwen3-30B-A3B-YOYO-V4` using llama.cpp via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.
Qwen3-30B-A3B-Thinking-2507-Deepseek-v3.1-Distill-V2-FP32-mlx-fp16
YOYO-AI/Qwen3-30B-A3B-Thinking-2507-Deepseek-v3.1-Distill-V2-FP32-mlx-fp16 The model YOYO-AI/Qwen3-30B-A3B-Thinking-2507-Deepseek-v3.1-Distill-V2-FP32-mlx-fp16 was converted to MLX format from BasedBase/Qwen3-30B-A3B-Thinking-2507-Deepseek-v3.1-Distill-V2-FP32 using mlx-lm version 0.26.4.
Qwen3-30B-A3B-YOYO-V4-Q8_0-GGUF
YOYO-AI/Qwen3-30B-A3B-YOYO-V4-Q8_0-GGUF This model was converted to GGUF format from `YOYO-AI/Qwen3-30B-A3B-YOYO-V4` using llama.cpp via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.
ZYH-LLM-Qwen2.5-14B-V5
Qwen3 30B A3B Deepseek Distill Instruct 2507
> Using the Arcee Fusion merging method, we transferred the knowledge of Deepseek-V3.1 from the distilled reasoning model to the instruction model. Model Highlights: - Highest precision: `dtype: float32` + `out_dtype: bfloat16` Parameter Settings: > [!TIP] > `Temperature=0.7`, `TopP=0.8`, `TopK=20`, `MinP=0`. Configuration: The following YAML configuration was used to produce this model:
Qwen3-8B-YOYO-V2-Hybrid
> Enhance the performance of Qwen3-8B by merging powerful reasoning models without compromising the effectiveness of the `/no_think` tag! Model Highlights: Parameter Settings: Thinking Mode: > [!NOTE] > `Temperature=0.6`, `TopP=0.95`, `TopK=20`, `MinP=0`. Non-Thinking Mode: > [!TIP] > `Temperature=0.7`, `TopP=0.8`, `TopK=20`, `MinP=0`. Step 1: Merge Two Hybrid Models - Leverage the advantages of the two hybrid models. Step 2: Merge High-Performance Reasoning Models with Hybrid Models - Maximize the proportion of reasoning models on the premise that the `/no_think` tag remains effective. Step 3: Unify the Enhanced Hybrid Modes - Merge the two models into the base model using the della merging method to make the model more versatile and stable. - We use the chat template of Qwen3-8B.
Qwen3 30B A3B CoderThinking YOYO Linear
- Highest precision: `dtype: float32` + `out_dtype: bfloat16` Parameter Settings: > [!NOTE] > `Temperature=0.6`, `TopP=0.95`, `TopK=20`, `MinP=0`. Configuration: The following YAML configuration was used to produce this model:
Qwen3-EZO-8B-YOYO-karcher-128K
Qwen3-30B-A3B-YOYO
Qwen3-30B-A3B-Mixture-2507
- Highest precision: `dtype: float32` + `out_dtype: bfloat16` Parameter Settings: > [!NOTE] > `Temperature=0.7`, `TopP=0.8`, `TopK=20`, `MinP=0`. Configuration: The following YAML configuration was used to produce this model:
Qwen2.5-14B-YOYO-Average
> We have used the Karcher merging method to average five of the most representative Qwen2.5 14B derivative models, in commemoration of the efforts made by the open-source community for the Qwen2.5 14B model. Merge Method: This model was merged using the Karcher Mean merge method. Models Merged: The following models were included in the merge: Qwen/Qwen2.5-14B-Instruct Qwen/Qwen2.5-14B-Instruct-1M arcee-ai/Virtuoso-Small-v2 deepcogito/cogito-v1-preview-qwen-14B deepseek-ai/DeepSeek-R1-Distill-Qwen-14B Configuration: The following YAML configuration was used to produce this model:
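For intuition, the Karcher (Riemannian) mean of unit-normalized vectors can be sketched as a log-map/exp-map fixed-point iteration on the sphere. This is a schematic of the underlying mathematics under the assumption of spherical geometry, not mergekit's actual `karcher` implementation:

```python
import numpy as np

def karcher_mean_sphere(xs, iters=50, tol=1e-10):
    """Karcher mean of unit vectors on the sphere: repeatedly average the
    points in the tangent space at the current estimate (log map), then
    step back onto the sphere (exp map). Schematic sketch only."""
    xs = np.stack([x / np.linalg.norm(x) for x in xs])
    mu = xs.mean(axis=0)                 # warm start (assumes a nonzero mean)
    mu /= np.linalg.norm(mu)
    for _ in range(iters):
        tangents = []
        for x in xs:
            c = np.clip(np.dot(mu, x), -1.0, 1.0)
            ang = np.arccos(c)           # geodesic distance from mu to x
            v = x - c * mu               # component of x orthogonal to mu
            nv = np.linalg.norm(v)
            tangents.append((ang / nv) * v if nv > 1e-12 else np.zeros_like(x))
        g = np.mean(tangents, axis=0)    # mean tangent vector (update step)
        ng = np.linalg.norm(g)
        if ng < tol:
            break
        mu = np.cos(ng) * mu + np.sin(ng) * (g / ng)  # exp map along geodesic
        mu /= np.linalg.norm(mu)
    return mu
```

For two orthogonal unit vectors the result is the midpoint of the connecting geodesic, i.e. their normalized average.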
Qwen2.5-14B-YOYO-V4-p2
License: Apache 2.0. Language: English.
Qwen3-EZO-8B-YOYO-slerp
Qwen2.5-14B-YOYO-V4-p1
License: Apache 2.0. Language: English.
Qwen3-EZO-8B-YOYO-slerp-128K
Qwen3-8B-YOYO-slerp
ZYH-LLM-Qwen2.5-14B-V4
This model is licensed under Apache 2.0 and supports the English language.
Qwen3-8B-YOYO-nuslerp-plus
Qwen3-8B-YOYO-nuslerp-plus-128K
Qwen3-EZO-8B-YOYO-nuslerp
Qwen3-EZO-8B-YOYO-nuslerp-plus
Qwen3-8B-YOYO-slerp-128K
Qwen3-EZO-8B-YOYO-nuslerp-128K
Qwen3-8B-YOYO-karcher
Qwen3-EZO-8B-YOYO-karcher
Qwen2.5-14B-YOYO-V4-p3
Qwen3-8B-YOYO
This is a merge of pre-trained language models created using mergekit. This model was merged using the DELLA merge method using Qwen/Qwen3-8B-Base as a base. The following models were included in the merge: Qwen/Qwen3-8B The following YAML configuration was used to produce this model:
Qwen3-14B-YOYO
Qwen3-8B-YOYO-nuslerp
QwQ-Coder-instruct
Qwen3-EZO-8B-YOYO-nuslerp-128K-Q8_0-GGUF
YOYO-O1-14B-V2
Combines the top open-source 14B reasoning and code models. This model was merged using the SCE merge method using arcee-ai/Virtuoso-Small-v2 as a base. The following models were included in the merge: deepcogito/cogito-v1-preview-qwen-14B Zhihu-ai/Zhi-Create-DSR1-14B agentica-org/DeepCoder-14B-Preview FractalAIResearch/Fathom-R1-14B The following YAML configuration was used to produce this model:
Qwen3-4B-YOYO
This is a merge of pre-trained language models created using mergekit. This model was merged using the DELLA merge method using Qwen/Qwen3-4B-Base as a base. The following models were included in the merge: Qwen/Qwen3-4B The following YAML configuration was used to produce this model:
Qwen3-8B-YOYO-nuslerp-128K
Qwen2.5-7B-it-restore
License: Apache 2.0. Language: English.
Qwen3-EZO-8B-YOYO-nuslerp-plus-128K
Qwen3-8B-YOYO-karcher-128K
ZYH-LLM-Qwen2.5-14B-V3-GGUF
Upgraded version: the fourth-generation model of ZYH-LLM-Qwen2.5 has been released! This is the third-generation model of the ZYH-LLM series. It employs many model-merging techniques, aiming to provide a powerful and unified 14-billion-parameter model that lays a solid foundation for further model merging and fine-tuning. imatrix quants: https://huggingface.co/mradermacher/ZYH-LLM-Qwen2.5-14B-V3-i1-GGUF
Qwen3-EZO-8B-YOYO-karcher-128K-Q8_0-GGUF
YOYO-AI/Qwen3-EZO-8B-YOYO-karcher-128K-Q8_0-GGUF This model was converted to GGUF format from `YOYO-AI/Qwen3-EZO-8B-YOYO-karcher-128K` using llama.cpp via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.
Qwen3-30B-A3B-YOYO-V3
Qwen3-EZO-8B-YOYO-nuslerp-plus-Q8_0-GGUF
YOYO-AI/Qwen3-EZO-8B-YOYO-nuslerp-plus-Q8_0-GGUF This model was converted to GGUF format from `YOYO-AI/Qwen3-EZO-8B-YOYO-nuslerp-plus` using llama.cpp via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.
Qwen2.5-14B-YOYO-Average-Q8_0-GGUF
Qwen3-30B-A3B-YOYO-V3-Q4_K_M-GGUF
YOYO-AI/Qwen3-30B-A3B-YOYO-V3-Q4_K_M-GGUF This model was converted to GGUF format from `YOYO-AI/Qwen3-30B-A3B-YOYO-V3` using llama.cpp via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.
Qwen3-8B-YOYO-nuslerp-plus-128K-Q8_0-GGUF
Qwen3-8B-YOYO-karcher-Q8_0-GGUF
YOYO-AI/Qwen3-8B-YOYO-karcher-Q8_0-GGUF This model was converted to GGUF format from `YOYO-AI/Qwen3-8B-YOYO-karcher` using llama.cpp via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.
EZO-QwQ-32B-Q4_K_M-GGUF
Qwen3-EZO-8B-YOYO-nuslerp-plus-128K-Q8_0-GGUF
YOYO-AI/Qwen3-EZO-8B-YOYO-nuslerp-plus-128K-Q8_0-GGUF This model was converted to GGUF format from `YOYO-AI/Qwen3-EZO-8B-YOYO-nuslerp-plus-128K` using llama.cpp via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.
ZYH-LLM-Qwen2.5-14B-V2
This model is licensed under the Apache 2.0 license and supports the English language.
Qwen2.5-32B-YOYO-V2-Q4_K_M-GGUF
YOYO-AI/Qwen2.5-32B-YOYO-V2-Q4_K_M-GGUF This model was converted to GGUF format from `YOYO-AI/Qwen2.5-32B-YOYO-V2` using llama.cpp via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.
Qwen3-8B-YOYO-nuslerp-plus-Q8_0-GGUF
ZYH-LLM-Qwen2.5-14B-V3
This model is licensed under the Apache 2.0 license and supports the English language.
Qwen2.5-32B-YOYO-MIX-Q4_K_M-GGUF
Qwen3-8B-YOYO-nuslerp-Q8_0-GGUF
YOYO-AI/Qwen3-8B-YOYO-nuslerp-Q8_0-GGUF This model was converted to GGUF format from `YOYO-AI/Qwen3-8B-YOYO-nuslerp` using llama.cpp via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.
Qwen3-8B-YOYO-slerp-Q8_0-GGUF
YOYO-AI/Qwen3-8B-YOYO-slerp-Q8_0-GGUF This model was converted to GGUF format from `YOYO-AI/Qwen3-8B-YOYO-slerp` using llama.cpp via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.
Qwen2.5-14B-YOYO-V3
QwQ-instruct-32B-Q4_K_M-GGUF
Qwen3-8B-YOYO-nuslerp-128K-Q8_0-GGUF
YOYO-AI/Qwen3-8B-YOYO-nuslerp-128K-Q8_0-GGUF This model was converted to GGUF format from `YOYO-AI/Qwen3-8B-YOYO-nuslerp-128K` using llama.cpp via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.
Qwen3-EZO-8B-YOYO-nuslerp-Q8_0-GGUF
Qwen3-EZO-8B-YOYO-slerp-Q8_0-GGUF
Qwen3-EZO-8B-YOYO-slerp-128K-Q8_0-GGUF
Qwen3-8B-YOYO-slerp-128K-Q8_0-GGUF
YOYO-AI/Qwen3-8B-YOYO-slerp-128K-Q8_0-GGUF This model was converted to GGUF format from `YOYO-AI/Qwen3-8B-YOYO-slerp-128K` using llama.cpp via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.
Qwen2.5-14B-YOYO-V4
License: Apache 2.0. Language: English.
Qwen2.5-14B-YOYO
This is a merge of pre-trained language models created using mergekit. This model was merged using the DELLA merge method using Qwen/Qwen2.5-14B as a base. The following models were included in the merge: Qwen/Qwen2.5-14B-Instruct The following YAML configuration was used to produce this model:
Qwen2.5-14B-YOYO-V5
Qwen3-8B-YOYO-karcher-128K-Q8_0-GGUF
YOYO-AI/Qwen3-8B-YOYO-karcher-128K-Q8_0-GGUF This model was converted to GGUF format from `YOYO-AI/Qwen3-8B-YOYO-karcher-128K` using llama.cpp via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.
Qwen2.5-Coder-3B-YOYO
Qwen2.5-14B-YOYO-super
Qwen3-30B-A3B-YOYO-V2
> This is the initial unified version of the Qwen3-30B-A3B series models. As more fine-tuned models emerge and merging methods are applied, we will further improve it. Stay tuned! Model Highlights: Parameter Settings: > [!TIP] > `Temperature=0.7`, `TopP=0.8`, `TopK=20`, `MinP=0`. Step 1: Merge Code Model with Instruction & Thinking Models Separately - Adopt the nuslerp method to improve model absorption rate. - Set a merging ratio of 9:1 to prevent capability degradation caused by an excessively high proportion of the code model. Step 2: Merge Code Instruction & Code Thinking Models into Base Model Together - Merge the two models into the base model using the della merging method to make the model more versatile and stable. - Since the merged model is more similar to the instruction model, we use the chat template of Qwen3-30B-A3B-Instruct-2507. Step 3: Further Extend Context Length - By referring to the `config_1m.json` of Qwen3-30B-A3B-Instruct-2507, we modified the `config.json` of the merged model and extended the maximum context length to 1M.
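A mergekit configuration for a Step2-style DELLA merge could look like the sketch below; every model path, weight, and density here is a hypothetical placeholder, not YOYO-AI's actual configuration:

```yaml
# Hypothetical sketch only: merge two Step1 outputs into the base model.
merge_method: della
base_model: Qwen/Qwen3-30B-A3B-Base
models:
  - model: ./code-instruct-nuslerp   # placeholder: Step1 instruction+code merge
    parameters:
      weight: 0.5
      density: 0.7
  - model: ./code-thinking-nuslerp   # placeholder: Step1 thinking+code merge
    parameters:
      weight: 0.5
      density: 0.7
dtype: float32
out_dtype: bfloat16
```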
Qwen2.5-7B-YOYO-super
Qwen2.5-14B-it-restore
License: Apache 2.0. Language: English.
YOYO-O1-32B
Qwen2.5-7B-YOYO
Qwen2.5-Coder-7B-YOYO
Qwen2.5-32B-YOYO-V2
QwQ-openhands-coder-32B
Qwen2.5-14B-YOYO-V6-test2-Q8_0-GGUF
YOYO-AI/Qwen2.5-14B-YOYO-V6-test2-Q8_0-GGUF This model was converted to GGUF format from `YOYO-AI/Qwen2.5-14B-YOYO-V6-test2` using llama.cpp via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.
Qwen3-EZO-8B-YOYO-karcher-Q8_0-GGUF
YOYO-AI/Qwen3-EZO-8B-YOYO-karcher-Q8_0-GGUF This model was converted to GGUF format from `YOYO-AI/Qwen3-EZO-8B-YOYO-karcher` using llama.cpp via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.
Qwen3-8B-YOYO-V2-Hybrid-Q8_0-GGUF
YOYO-AI/Qwen3-8B-YOYO-V2-Hybrid-Q8_0-GGUF This model was converted to GGUF format from `YOYO-AI/Qwen3-8B-YOYO-V2-Hybrid` using llama.cpp via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.
Qwen3-30B-A3B-YOYO-V3-Q8_0-GGUF
YOYO-AI/Qwen3-30B-A3B-YOYO-V3-Q8_0-GGUF This model was converted to GGUF format from `YOYO-AI/Qwen3-30B-A3B-YOYO-V3` using llama.cpp via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.
YOYO-O1-32B-V2
Qwen2.5-Coder-1.5B-YOYO
Qwen2.5-0.5B-YOYO
YOYO-O1-32B-V3
Qwen2.5-14B-Fusion-Q8_0-GGUF
Qwen2.5-14B-YOYO-V6-test2
This is a merge of pre-trained language models created using mergekit. This model was merged using the Karcher Mean merge method using mergekit-community/Qwen2.5-14B-della-1M-dpo as a base. The following models were included in the merge: mergekit-community/Qwen2.5-14B-della-V6-dpo mergekit-community/Qwen2.5-14B-della-Nova-dpo agentica-org/DeepCoder-14B-Preview mergekit-community/Qwen2.5-14B-della-base-dpo Zhihu-ai/Zhi-writing-dsr1-14b mergekit-community/Qwen2.5-14B-della-v2-dpo mergekit-community/Qwen2.5-14B-della-code The following YAML configuration was used to produce this model:
Qwen2.5-32B-YOYO-reasoning-v3-Q4_K_M-GGUF
Qwen3-30B-A3B-YOYO-V2-Q8_0-GGUF
Qwen2.5-Coder-32B-YOYO
Qwen2.5-14B-YOYO-V2
Qwen2.5-3B-YOYO
Qwen2.5-1.5B-YOYO
Qwen3-30B-A3B-YOYO-V6
Qwen2.5-32B-YOYO
EZO-QwQ-32B
Qwen2.5-14B-YOYO-GGUF
ZYH-LLM-Qwen2.5-14B-V2-GGUF
Qwen2.5-32B-YOYO-reasoning
This is a merge of pre-trained language models created using mergekit. This model was merged using the Model Stock merge method using Qwen/Qwen2.5-32B-Instruct as a base. The following models were included in the merge: maldv/Awqward2.5-32B-Instruct qihoo360/Light-R1-32B Rombo-Org/Rombo-LLM-V3.0-Qwen-32b maldv/Qwentile2.5-32B-Instruct Qwen/QwQ-32B OpenPipe/Deductive-Reasoning-Qwen-32B The following YAML configuration was used to produce this model:
QwQ-Sky-T1-Med-32B
QwQ-coder-32B-plus
Qwen2.5-32B-YOYO-stock
QwQ-coder-32B
This is a merge of pre-trained language models created using mergekit. This model was merged using the SCE merge method using Qwen/Qwen2.5-Coder-32B as a base. The following models were included in the merge: Qwen/QwQ-32B Qwen/Qwen2.5-Coder-32B-Instruct The following YAML configuration was used to produce this model:
ZYH-LLM-Qwen2.5-14B
This model is licensed under the Apache 2.0 license and supports the English language.
Qwen2.5-Coder-14B-YOYO
This is a merge of pre-trained language models created using mergekit. This model was merged using the DELLA merge method using Qwen/Qwen2.5-Coder-14B as a base. The following models were included in the merge: Qwen/Qwen2.5-Coder-14B-Instruct The following YAML configuration was used to produce this model:
Qwen2.5-32B-YOYO-MIX
YOYO-O1-14B
Qwen2.5-14B-YOYO-stock
Qwen3-30B-A3B-YOYO-Thinking-Chimera
QwQ-instruct-32B
Qwen2.5-72B-YOYO
Qwen2.5-Coder-0.5B-YOYO
QwQ-32B-YOYO
Qwen2.5-14B-Fusion
Qwen2.5-14B-Fusion-1M
Qwen2.5-14B-YOYO-V1.5
QwQ-Olympic-coder-32B
Qwen2.5-14B-YOYO-karcher-test
From the preliminary test results, the effect is excellent! This is a very promising method for model merging. The current optimal formula ratio is instruction : reasoning : code = 6 : 2 : 1. This is a merge of pre-trained language models created using mergekit. This model was merged using the Karcher Mean merge method using Qwen/Qwen2.5-14B-Instruct as a base. Instruction (6): Qwen/Qwen2.5-14B-Instruct Qwen/Qwen2.5-14B-Instruct-1M huihui-ai/Qwen2.5-14B-Instruct-1M-abliterated huihui-ai/Qwen2.5-14B-Instruct-abliterated-v2 SicariusSicariiStuff/Impish_QWEN_14B-1M tanliboy/lambda-qwen2.5-14b-dpo-test Reasoning (2): Zhihu-ai/Zhi-writing-dsr1-14b deepseek-ai/DeepSeek-R1-Distill-Qwen-14B The following YAML configuration was used to produce this model: