ehristoforu
dalle-3-xl-v2
You should use ` ` to trigger the image generation. Weights for this model are available in Safetensors format.
stable-diffusion-v1-5-tiny
dreamdrop
dalle-3-xl
Weights for this model are available in Safetensors format.
coolqwen-3b-it
Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base and instruction-tuned language models ranging from 0.5 to 72 billion parameters. Qwen2.5 brings the following improvements over Qwen2:
- Significantly more knowledge and greatly improved capabilities in coding and mathematics, thanks to our specialized expert models in these domains.
- Significant improvements in instruction following, generating long texts (over 8K tokens), understanding structured data (e.g., tables), and generating structured outputs, especially JSON. It is also more resilient to diverse system prompts, improving role-play implementation and condition-setting for chatbots.
- Long-context support up to 128K tokens, with generation of up to 8K tokens.
- Multilingual support for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more.

This repo contains the instruction-tuned 3B Qwen2.5 model, which has the following features:
- Type: Causal Language Model
- Training Stage: Pretraining & Post-training
- Architecture: transformers with RoPE, SwiGLU, RMSNorm, attention QKV bias, and tied word embeddings
- Number of Parameters: 3.09B
- Number of Parameters (Non-Embedding): 2.77B
- Number of Layers: 36
- Number of Attention Heads (GQA): 16 for Q and 2 for KV
- Context Length: 32,768 tokens (full), with generation of up to 8,192 tokens

For more details, please refer to our blog, GitHub, and Documentation. The code for Qwen2.5 is included in the latest Hugging Face `transformers`, and we advise you to use the latest version; with `transformers<4.37.0` you will encounter an error. Below is a code snippet using `apply_chat_template` that shows how to load the tokenizer and model and how to generate content. Detailed evaluation results are reported in this 📑 blog.
For requirements on GPU memory and the corresponding throughput, see the results here. If you find our work helpful, feel free to cite us.
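The `apply_chat_template` usage mentioned above can be sketched as follows. This is a minimal example, not the exact snippet from the original card; it assumes the upstream `Qwen/Qwen2.5-3B-Instruct` checkpoint and `transformers>=4.37` (this repo's own weights could be substituted for the model name):

```python
# Minimal sketch of Qwen2.5 chat usage with apply_chat_template.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-3B-Instruct"  # assumption: upstream checkpoint

model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Give me a short introduction to large language models."},
]

# Render the chat into the model's prompt format, then tokenize.
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate and decode only the newly produced tokens.
generated = model.generate(**inputs, max_new_tokens=512)
response = tokenizer.decode(
    generated[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True
)
print(response)
```

Note that loading the model requires downloading the checkpoint; on CPU-only machines, `device_map="auto"` falls back to CPU.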
FluentlyQwen3-1.7B-Q4_K_M-GGUF
Falcon3-MoE-2x7B-Insruct
- 13.4B parameters
- BF16
- Falcon3 (Llama) architecture
- Instruct, built from Falcon3-7B-Instruct

The Falcon3 family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B parameters. This repository contains Falcon3-MoE-2x7B-Instruct, a mixture of two Falcon3-7B-Instruct experts. It achieves state-of-the-art results (at the time of release) on reasoning, language understanding, instruction following, code, and mathematics tasks, supports four languages (English, French, Spanish, Portuguese), and has a context length of up to 32K.
phi-4-25b
This is a merge of pre-trained language models created using mergekit. This model was merged using the passthrough merge method. The following models were included in the merge: microsoft/phi-4 The following YAML configuration was used to produce this model:
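The YAML configuration itself is not preserved in this card. A hypothetical passthrough self-merge config of the kind mergekit accepts might look like the following; the layer ranges here are purely illustrative and are not the actual values used for this model:

```yaml
# Hypothetical example only — not the actual config used for phi-4-25b.
slices:
  - sources:
      - model: microsoft/phi-4
        layer_range: [0, 24]
  - sources:
      - model: microsoft/phi-4
        layer_range: [16, 40]
merge_method: passthrough
dtype: bfloat16
```

Passthrough merges of this kind stack (possibly overlapping) layer ranges from the source model to produce a larger "upscaled" network.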
Visionix-alpha
0109-test-32b-it
gpt2-Q4_K_M-GGUF
Qwen2-1.5b-it-chat-mistral-Q4_K_M-GGUF
Visionix-alpha-inpainting
Gistral-16B-Q4_K_M-GGUF
dreamdrop-inpainting
LLMs
c4ai-command-r-plus-Q2_K-GGUF
FluentlyLM-Prinum-Q2_K-GGUF
FluentlyQwen3-Coder-4B-0909-Q4_K_M-GGUF
ehristoforu/FluentlyQwen3-Coder-4B-0909-Q4_K_M-GGUF
This model was converted to GGUF format from `fluently/FluentlyQwen3-Coder-4B-0909` using llama.cpp via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.

Use with llama.cpp:
Install llama.cpp through brew (works on Mac and Linux). Note: you can also use this checkpoint directly through the usage steps listed in the llama.cpp repo.
Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag along with other hardware-specific flags (for example, `LLAMA_CUDA=1` for NVIDIA GPUs on Linux).
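The build steps described above can be sketched as follows (a sketch assuming the standard `make`-based build of llama.cpp; flag names follow the upstream README):

```shell
# Step 1 (implied by the card): clone llama.cpp
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp

# Step 2: build with curl support so models can be fetched from the Hub;
# add hardware-specific flags as needed.
LLAMA_CURL=1 make

# e.g. on Linux with an NVIDIA GPU:
# LLAMA_CURL=1 LLAMA_CUDA=1 make
```
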
deliberate-v6-diffusers-unofficial
reliberate-v3-diffusers-unofficial
FluentlyQwen3-4B-Q4_K_M-GGUF
BoW-v1-768px
ruphi-4b
- Developed by: ehristoforu
- License: apache-2.0
- Finetuned from model: unsloth/Phi-3.5-mini-instruct-bnb-4bit

This llama model was trained 2x faster with Unsloth and Hugging Face's TRL library.
Gemma2-9B-it-psy10k-mental_health
- Developed by: ehristoforu
- License: apache-2.0
- Finetuned from model: ehristoforu/Gemma2-9B-it-psy10k

This gemma2 model was trained 2x faster with Unsloth and Hugging Face's TRL library.
moremerge
This is a merge of pre-trained language models created using mergekit. This model was merged using the TIES merge method using Qwen/Qwen2.5-7B-Instruct as a base. The following models were included in the merge:
- EVA-UNIT-01/EVA-Qwen2.5-7B-v0.1
- HumanLLMs/Human-Like-Qwen2.5-7B-Instruct
- Qwen/Qwen2.5-7B-Instruct-1M
- Qwen/Qwen2.5-Math-7B
- deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
- Qwen/Qwen2.5-Coder-7B
- fblgit/cybertron-v4-qw7B-UNAMGS
- prithivMLmods/QwQ-LCoT2-7B-Instruct
- huihui-ai/Qwen2.5-7B-Instruct-abliterated
- Rombo-Org/Rombo-LLM-V2.5-Qwen-7b

The following YAML configuration was used to produce this model:
rmoe-v1
This model card aims to be a base template for new models. It has been generated using this raw template.
- Developed by: [More Information Needed]
- Funded by [optional]: [More Information Needed]
- Shared by [optional]: [More Information Needed]
- Model type: [More Information Needed]
- Language(s) (NLP): [More Information Needed]
- License: [More Information Needed]
- Finetuned from model [optional]: [More Information Needed]
- Repository: [More Information Needed]
- Paper [optional]: [More Information Needed]
- Demo [optional]: [More Information Needed]

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations. Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
- Hardware Type: [More Information Needed]
- Hours used: [More Information Needed]
- Cloud Provider: [More Information Needed]
- Compute Region: [More Information Needed]
- Carbon Emitted: [More Information Needed]
tts-1111
tts-1111 is a merge of the following models using LazyMergekit:
Gemma2-9b-it-train1
Gemma2-9b-it-train2
Gemma2-9b-it-train3
Gemma2-9b-it-train5
Qwen2-1.5b-it-chat-sp-ru-bel-arm-ger-fin-tur
qwen2.5-with-lora-think-3b-it
Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base and instruction-tuned language models ranging from 0.5 to 72 billion parameters. Qwen2.5 brings the following improvements over Qwen2:
- Significantly more knowledge and greatly improved capabilities in coding and mathematics, thanks to our specialized expert models in these domains.
- Significant improvements in instruction following, generating long texts (over 8K tokens), understanding structured data (e.g., tables), and generating structured outputs, especially JSON. It is also more resilient to diverse system prompts, improving role-play implementation and condition-setting for chatbots.
- Long-context support up to 128K tokens, with generation of up to 8K tokens.
- Multilingual support for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more.

This repo contains the instruction-tuned 3B Qwen2.5 model, which has the following features:
- Type: Causal Language Model
- Training Stage: Pretraining & Post-training
- Architecture: transformers with RoPE, SwiGLU, RMSNorm, attention QKV bias, and tied word embeddings
- Number of Parameters: 3.09B
- Number of Parameters (Non-Embedding): 2.77B
- Number of Layers: 36
- Number of Attention Heads (GQA): 16 for Q and 2 for KV
- Context Length: 32,768 tokens (full), with generation of up to 8,192 tokens

For more details, please refer to our blog, GitHub, and Documentation. The code for Qwen2.5 is included in the latest Hugging Face `transformers`, and we advise you to use the latest version; with `transformers<4.37.0` you will encounter an error. Below is a code snippet using `apply_chat_template` that shows how to load the tokenizer and model and how to generate content. Detailed evaluation results are reported in this 📑 blog.
For requirements on GPU memory and the corresponding throughput, see the results here. If you find our work helpful, feel free to cite us.
fp4-14b-v1-fix
This is a merge of pre-trained language models created using mergekit. This model was merged using the Model Stock merge method using unsloth/phi-4 as a base. The following models were included in the merge:
- prithivMLmods/Phi-4-QwQ
- bunnycore/Phi-4-RP-V0.2
- prithivMLmods/Phi-4-Empathetic
- mudler/LocalAI-functioncall-phi-4-v0.3
- Pinkstack/SuperThoughts-CoT-14B-16k-o1-QwQ
- prithivMLmods/Phi-4-o1
- prithivMLmods/Phi-4-Math-IO

The following YAML configuration was used to produce this model:
llama-3-12b-instruct
Gemma2-9b-it-train6
- Developed by: ehristoforu
- License: apache-2.0
- Finetuned from model: ehristoforu/Gemma2-9b-it-train5

This gemma2 model was trained 2x faster with Unsloth and Hugging Face's TRL library.
RQwen-v0.1
Short info:
- Developed by: ehristoforu
- Base model: Qwen/Qwen2.5-14B-Instruct
- Model type: Qwen2 Instruct (ChatML)
- Languages: English, Russian
- Features: reflection tuning, logic, and deep work with context
- Trained with: Unsloth (Transformers SFT)
- License: Apache-2.0

GGUF format: coming soon...

Open LLM Leaderboard Evaluation Results
Detailed results can be found here.

| Metric              | Value |
|---------------------|------:|
| Avg.                | 32.48 |
| IFEval (0-Shot)     | 76.25 |
| BBH (3-Shot)        | 48.49 |
| MATH Lvl 5 (4-Shot) |  2.95 |
| GPQA (0-shot)       | 10.07 |
| MuSR (0-shot)       | 10.44 |
| MMLU-PRO (5-shot)   | 46.69 |
fq2.5-7b-it-normalize_false
This is a merge of pre-trained language models created using mergekit. This model was merged using the Model Stock merge method using Qwen/Qwen2.5-7B-Instruct as a base. The following models were included in the merge:
- Bui1dMySea/LongRAG-Qwen2.5-7B-Instruct
- prithivMLmods/QwQ-MathOct-7B
- Krystalan/DRT-o1-7B
- prithivMLmods/QwQ-LCoT-7B-Instruct
- Orion-zhen/Qwen2.5-7B-Instruct-Uncensored
- Spestly/Athena-1-7B
- prithivMLmods/Deepthink-Reasoning-7B
- fblgit/cybertron-v4-qw7B-MGS
- Rombo-Org/Rombo-LLM-V2.5-Qwen-7b

The following YAML configuration was used to produce this model:
expansion-train2
HappyLlama1-Q2_K-GGUF
0000mxs
testllama
Qwen2-1.5b-it-chat-sp
Qwen2-1.5b-it-chat-sp-ru
Qwen2-1.5b-it-chat-sp-ru-bel-arm-ger
SoRu-0006
kwk-32b-Q5_K_M-GGUF
ultraset-1.5b-instruct-Q5_K_M-GGUF
falcon3-ultraset
- Developed by: ehristoforu
- License: apache-2.0
- Finetuned from model: tiiuae/Falcon3-7B-Instruct

This llama model was trained 2x faster with Unsloth and Hugging Face's TRL library.
tmoe
della-70b-test-v1
This is a merge of pre-trained language models created using mergekit. This model was merged using the Linear DELLA merge method using deepseek-ai/DeepSeek-R1-Distill-Llama-70B as a base. The following models were included in the merge: nvidia/Llama-3.1-Nemotron-70B-Instruct-HF The following YAML configuration was used to produce this model:
qwen3-4b-2
This is a merge of pre-trained language models created using mergekit. This model was merged using the Model Stock merge method using Qwen/Qwen3-4B as a base. The following models were included in the merge: Menlo/Jan-nano POLARIS-Project/Polaris-4B-Preview The following YAML configuration was used to produce this model:
Gixtral-100B
QwenQwen2.5-7B-IT-Dare
This is a merge of pre-trained language models created using mergekit. This model was merged using the DARE TIES merge method using Qwen/Qwen2.5-7B-Instruct as a base. The following YAML configuration was used to produce this model:
Gemma2-2b-it-chat
Qwen2-1.5b-it-bioinstruct
Qwen2-1.5b-it-math
Qwen2-1.5b-it-chat-sp-ru-bel-arm-ger-fin-tur-per-ko
RQwen-v0.1-Q2_K-GGUF
ehristoforu/RQwen-v0.1-Q2_K-GGUF
This model was converted to GGUF format from `ehristoforu/RQwen-v0.1` using llama.cpp via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.

Use with llama.cpp:
Install llama.cpp through brew (works on Mac and Linux). Note: you can also use this checkpoint directly through the usage steps listed in the llama.cpp repo.
Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag along with other hardware-specific flags (for example, `LLAMA_CUDA=1` for NVIDIA GPUs on Linux).
SoRu-0001
SoRu-0003
SoRu-0009
- Developed by: ehristoforu
- License: apache-2.0
- Finetuned from model: ehristoforu/SoRu-0008

This qwen2 model was trained 2x faster with Unsloth and Hugging Face's TRL library.

Open LLM Leaderboard Evaluation Results
Detailed results can be found here.

| Metric              | Value |
|---------------------|------:|
| Avg.                |  5.95 |
| IFEval (0-Shot)     | 25.82 |
| BBH (3-Shot)        |  5.14 |
| MATH Lvl 5 (4-Shot) |  0.00 |
| GPQA (0-shot)       |  1.45 |
| MuSR (0-shot)       |  0.62 |
| MMLU-PRO (5-shot)   |  2.66 |
BigFalcon3-18B
This is a merge of pre-trained language models created using mergekit. This model was merged using the passthrough merge method. The following models were included in the merge: tiiuae/Falcon3-10B-Instruct The following YAML configuration was used to produce this model:
frqwen2.5-from7b-duable4layers-it
This is a merge of pre-trained language models created using mergekit. This model was merged using the passthrough merge method. The following models were included in the merge: Qwen/Qwen2.5-7B-Instruct The following YAML configuration was used to produce this model:
testq-32b
This is a merge of pre-trained language models created using mergekit. This model was merged using the passthrough merge method. The following models were included in the merge: ehristoforu/fq2.5-32b-v1 The following YAML configuration was used to produce this model:
moremerge-upscaled
This is a merge of pre-trained language models created using mergekit. This model was merged using the Passthrough merge method. The following models were included in the merge: ehristoforu/moremerge The following YAML configuration was used to produce this model:
fd-lora-merged-64x128
This model card aims to be a base template for new models. It has been generated using this raw template.
- Developed by: [More Information Needed]
- Funded by [optional]: [More Information Needed]
- Shared by [optional]: [More Information Needed]
- Model type: [More Information Needed]
- Language(s) (NLP): [More Information Needed]
- License: [More Information Needed]
- Finetuned from model [optional]: [More Information Needed]
- Repository: [More Information Needed]
- Paper [optional]: [More Information Needed]
- Demo [optional]: [More Information Needed]

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations. Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
- Hardware Type: [More Information Needed]
- Hours used: [More Information Needed]
- Cloud Provider: [More Information Needed]
- Compute Region: [More Information Needed]
- Carbon Emitted: [More Information Needed]

Open LLM Leaderboard Evaluation Results
Detailed results can be found here.

| Metric              | Value |
|---------------------|------:|
| Avg.                |  8.11 |
| IFEval (0-Shot)     | 32.81 |
| BBH (3-Shot)        |  7.82 |
| MATH Lvl 5 (4-Shot) |  0.15 |
| GPQA (0-shot)       |  0.67 |
| MuSR (0-shot)       |  1.27 |
| MMLU-PRO (5-shot)   |  5.96 |
Gemma2-9B-it-psy10k
Llama-TI-8B-Instruct-Q4_K_M-GGUF
ehristoforu/Llama-TI-8B-Instruct-Q4_K_M-GGUF
This model was converted to GGUF format from `fluently-lm/Llama-TI-8B-Instruct` using llama.cpp via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.

Use with llama.cpp:
Install llama.cpp through brew (works on Mac and Linux). Note: you can also use this checkpoint directly through the usage steps listed in the llama.cpp repo.
Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag along with other hardware-specific flags (for example, `LLAMA_CUDA=1` for NVIDIA GPUs on Linux).
Qwen2-1.5b-it-chat-sp-ru-bel-arm-ger-fin-tur-per-ko-jap
RQwen-v0.2
- Developed by: ehristoforu
- License: apache-2.0
- Finetuned from model: ehristoforu/RQwen-v0.1

This qwen2 model was trained 2x faster with Unsloth and Hugging Face's TRL library.
mllama-3.1-8b-instruct
This is a merge of pre-trained language models created using mergekit. This model was merged using the TIES merge method using unsloth/Meta-Llama-3.1-8B-Instruct as a base. The following models were included in the merge:
- NousResearch/Hermes-3-Llama-3.1-8B
- Skywork/Skywork-o1-Open-Llama-3.1-8B
- cognitivecomputations/dolphin-2.9.4-llama3.1-8b
- SimpleBerry/LLaMA-O1-Base-1127
- arcee-ai/Llama-3.1-SuperNova-Lite

The following YAML configuration was used to produce this model:
fq2.5-7b-it-normalize_true
This is a merge of pre-trained language models created using mergekit. This model was merged using the Model Stock merge method using Qwen/Qwen2.5-7B-Instruct as a base. The following models were included in the merge:
- prithivMLmods/QwQ-MathOct-7B
- Orion-zhen/Qwen2.5-7B-Instruct-Uncensored
- Rombo-Org/Rombo-LLM-V2.5-Qwen-7b
- prithivMLmods/Deepthink-Reasoning-7B
- fblgit/cybertron-v4-qw7B-MGS
- Krystalan/DRT-o1-7B
- Bui1dMySea/LongRAG-Qwen2.5-7B-Instruct
- Spestly/Athena-1-7B
- prithivMLmods/QwQ-LCoT-7B-Instruct

The following YAML configuration was used to produce this model:
0001
Mistral-7B-Instruct-v0.3-pruned
Gemma2-9B-psy10k
Gemma2-9b-it-train4
Mistral-nemo-test-2layno-v3
mistral-distil-test-2
Exp-Test-BigXL
Gemma2-2b-it-bioinstruct
Gemma2-2b-it-codealpaca
Gemma2-2b-it-math
Qwen2-1.5b-it-chat
Qwen2-1.5b-it-codealpaca
Llama3.1-it-chat
Qwen2-1.5b-it-chat-sp-ru-bel
Qwen2-1.5b-it-chat-sp-ru-bel-arm
Qwen2-1.5b-it-chat-sp-ru-bel-arm-ger-fin
Qwen2-1.5b-it-chat-sp-ru-bel-arm-ger-fin-tur-per
Qwen2-1.5b-it-math-v2
theqwenmoe
SoRu-0004
QwenMoe-A1.5B-IT
HermesX2
rufalcon3-3b-it
- Developed by: ehristoforu
- License: apache-2.0
- Finetuned from model: tiiuae/Falcon3-3B-Instruct

This llama model was trained 2x faster with Unsloth and Hugging Face's TRL library.
rufalcon3-3b-it-Q3_K_S-GGUF
Falcon3-8B-Franken-Basestruct
This is a merge of pre-trained language models created using mergekit. This model was merged using the SLERP merge method. The following models were included in the merge:
- tiiuae/Falcon3-10B-Instruct
- tiiuae/Falcon3-10B-Base

The following YAML configuration was used to produce this model:
frqwen2.5-from72b-duable10layers
tmoe-v2
tmoe-exp-v1
fd-lora-merged-16x32
This model card aims to be a base template for new models. It has been generated using this raw template.
- Developed by: [More Information Needed]
- Funded by [optional]: [More Information Needed]
- Shared by [optional]: [More Information Needed]
- Model type: [More Information Needed]
- Language(s) (NLP): [More Information Needed]
- License: [More Information Needed]
- Finetuned from model [optional]: [More Information Needed]
- Repository: [More Information Needed]
- Paper [optional]: [More Information Needed]
- Demo [optional]: [More Information Needed]

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations. Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
- Hardware Type: [More Information Needed]
- Hours used: [More Information Needed]
- Cloud Provider: [More Information Needed]
- Compute Region: [More Information Needed]
- Carbon Emitted: [More Information Needed]

Open LLM Leaderboard Evaluation Results
Detailed results can be found here.

| Metric              | Value |
|---------------------|------:|
| Avg.                |  7.61 |
| IFEval (0-Shot)     | 34.81 |
| BBH (3-Shot)        |  6.53 |
| MATH Lvl 5 (4-Shot) |  0.00 |
| GPQA (0-shot)       |  0.45 |
| MuSR (0-shot)       |  1.60 |
| MMLU-PRO (5-shot)   |  2.28 |
fd-lora-merged-64x128-Q5_0-GGUF
fd-lora-merged-16x32-Q5_0-GGUF
flc-r-union-4-ties
This is a merge of pre-trained language models created using mergekit. This model was merged using the TIES merge method using Qwen/Qwen2.5-3B as a base. The following models were included in the merge:
- Qwen/Qwen2.5-3B-Instruct + ehristoforu/flc-r-0004-lora
- Qwen/Qwen2.5-3B-Instruct
- Qwen/Qwen2.5-3B-Instruct + ehristoforu/flc-r-0001-lora
- Qwen/Qwen2.5-3B-Instruct + ehristoforu/flc-r-0002-lora
- Qwen/Qwen2.5-3B-Instruct + ehristoforu/flc-r-0003-lora

The following YAML configuration was used to produce this model:
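The `base + LoRA` entries above use mergekit's `+` syntax, which applies a LoRA adapter to a base model before merging. The actual YAML was not preserved in this card; a hypothetical TIES config in that style might look like this (weights and densities are illustrative, not the real values):

```yaml
# Hypothetical example only — not the actual config used for flc-r-union-4-ties.
models:
  - model: Qwen/Qwen2.5-3B-Instruct+ehristoforu/flc-r-0001-lora
    parameters:
      weight: 1.0
      density: 0.5
  - model: Qwen/Qwen2.5-3B-Instruct+ehristoforu/flc-r-0002-lora
    parameters:
      weight: 1.0
      density: 0.5
merge_method: ties
base_model: Qwen/Qwen2.5-3B
dtype: bfloat16
```

TIES uses `density` to control how many of each model's delta parameters survive sparsification before sign-consensus merging.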
Gistral-16B
StableLive-sd-portable
mjlora
dreamly-diffusion
extensions
phi-4-45b
stable-cascade-zip
custom-chatgpt-prompts
qwenUnion-32b-Q5_K_M-GGUF
think-lora-qwen-r64
qwen2.5-7b-upscaled
QwenQwen2.5-7B-IT
This is a merge of pre-trained language models created using mergekit. This model was merged using the TIES merge method using Qwen/Qwen2.5-7B-Instruct as a base. The following YAML configuration was used to produce this model: