ehristoforu
dalle-3-xl-v2
You should use ` ` to trigger the image generation. Weights for this model are available in Safetensors format.
stable-diffusion-v1-5-tiny
dreamdrop
dalle-3-xl
Weights for this model are available in Safetensors format.
coolqwen-3b-it
Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base and instruction-tuned language models ranging from 0.5 to 72 billion parameters. Qwen2.5 brings the following improvements over Qwen2:
- Significantly more knowledge and greatly improved capabilities in coding and mathematics, thanks to our specialized expert models in these domains.
- Significant improvements in instruction following, generating long texts (over 8K tokens), understanding structured data (e.g., tables), and generating structured outputs, especially JSON. It is also more resilient to diverse system prompts, improving role-play implementation and condition-setting for chatbots.
- Long-context support up to 128K tokens, with generation of up to 8K tokens.
- Multilingual support for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more.

This repo contains the instruction-tuned 3B Qwen2.5 model, which has the following features:
- Type: Causal Language Model
- Training Stage: Pretraining & Post-training
- Architecture: transformers with RoPE, SwiGLU, RMSNorm, attention QKV bias, and tied word embeddings
- Number of Parameters: 3.09B
- Number of Parameters (Non-Embedding): 2.77B
- Number of Layers: 36
- Number of Attention Heads (GQA): 16 for Q and 2 for KV
- Context Length: 32,768 tokens (full), with generation of up to 8,192 tokens

For more details, please refer to our blog, GitHub, and Documentation. The code for Qwen2.5 is included in the latest Hugging Face `transformers`, and we advise you to use the latest version; with `transformers<4.37.0` you will encounter an error. Below is a code snippet using `apply_chat_template` that shows how to load the tokenizer and model and how to generate content. Detailed evaluation results are reported in this 📑 blog.
For requirements on GPU memory and the corresponding throughput, see the results here. If you find our work helpful, feel free to cite us.
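The `apply_chat_template` usage mentioned above can be sketched as follows. This is a minimal example, not the exact snippet from the original card; it assumes the upstream `Qwen/Qwen2.5-3B-Instruct` checkpoint and `transformers>=4.37` (this repo's own weights could be substituted for the model name):

```python
# Minimal sketch of Qwen2.5 chat usage with apply_chat_template.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-3B-Instruct"  # assumption: upstream checkpoint

model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Give me a short introduction to large language models."},
]

# Render the chat into the model's prompt format, then tokenize.
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate and decode only the newly produced tokens.
generated = model.generate(**inputs, max_new_tokens=512)
response = tokenizer.decode(
    generated[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True
)
print(response)
```

Note that loading the model requires downloading the checkpoint; on CPU-only machines, `device_map="auto"` falls back to CPU.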
FluentlyQwen3-1.7B-Q4_K_M-GGUF
Falcon3-MoE-2x7B-Insruct
- 13.4B parameters
- BF16
- Falcon3 (Llama) architecture
- Instruct, built from Falcon3-7B-Instruct

The Falcon3 family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B parameters. This repository contains Falcon3-MoE-2x7B-Instruct, a mixture of two Falcon3-7B-Instruct experts. It achieves state-of-the-art results (at the time of release) on reasoning, language understanding, instruction following, code, and mathematics tasks, supports four languages (English, French, Spanish, Portuguese), and has a context length of up to 32K.
phi-4-25b
This is a merge of pre-trained language models created using mergekit. This model was merged using the passthrough merge method. The following models were included in the merge: microsoft/phi-4 The following YAML configuration was used to produce this model:
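The YAML configuration itself is not preserved in this card. A hypothetical passthrough self-merge config of the kind mergekit accepts might look like the following; the layer ranges here are purely illustrative and are not the actual values used for this model:

```yaml
# Hypothetical example only — not the actual config used for phi-4-25b.
slices:
  - sources:
      - model: microsoft/phi-4
        layer_range: [0, 24]
  - sources:
      - model: microsoft/phi-4
        layer_range: [16, 40]
merge_method: passthrough
dtype: bfloat16
```

Passthrough merges of this kind stack (possibly overlapping) layer ranges from the source model to produce a larger "upscaled" network.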
Visionix-alpha
0109-test-32b-it
gpt2-Q4_K_M-GGUF
Qwen2-1.5b-it-chat-mistral-Q4_K_M-GGUF
Visionix-alpha-inpainting
Gistral-16B-Q4_K_M-GGUF
dreamdrop-inpainting
LLMs
c4ai-command-r-plus-Q2_K-GGUF
FluentlyLM-Prinum-Q2_K-GGUF
FluentlyQwen3-Coder-4B-0909-Q4_K_M-GGUF
ehristoforu/FluentlyQwen3-Coder-4B-0909-Q4_K_M-GGUF
This model was converted to GGUF format from `fluently/FluentlyQwen3-Coder-4B-0909` using llama.cpp via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.

Use with llama.cpp:
Install llama.cpp through brew (works on Mac and Linux). Note: you can also use this checkpoint directly through the usage steps listed in the llama.cpp repo.
Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag along with other hardware-specific flags (for example, `LLAMA_CUDA=1` for NVIDIA GPUs on Linux).
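The build steps described above can be sketched as follows (a sketch assuming the standard `make`-based build of llama.cpp; flag names follow the upstream README):

```shell
# Step 1 (implied by the card): clone llama.cpp
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp

# Step 2: build with curl support so models can be fetched from the Hub;
# add hardware-specific flags as needed.
LLAMA_CURL=1 make

# e.g. on Linux with an NVIDIA GPU:
# LLAMA_CURL=1 LLAMA_CUDA=1 make
```
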
deliberate-v6-diffusers-unofficial
reliberate-v3-diffusers-unofficial
FluentlyQwen3-4B-Q4_K_M-GGUF
BoW-v1-768px
ruphi-4b
- Developed by: ehristoforu
- License: apache-2.0
- Finetuned from model: unsloth/Phi-3.5-mini-instruct-bnb-4bit

This llama model was trained 2x faster with Unsloth and Hugging Face's TRL library.
Gemma2-9B-it-psy10k-mental_health
- Developed by: ehristoforu
- License: apache-2.0
- Finetuned from model: ehristoforu/Gemma2-9B-it-psy10k

This gemma2 model was trained 2x faster with Unsloth and Hugging Face's TRL library.
moremerge
This is a merge of pre-trained language models created using mergekit. This model was merged using the TIES merge method using Qwen/Qwen2.5-7B-Instruct as a base. The following models were included in the merge:
- EVA-UNIT-01/EVA-Qwen2.5-7B-v0.1
- HumanLLMs/Human-Like-Qwen2.5-7B-Instruct
- Qwen/Qwen2.5-7B-Instruct-1M
- Qwen/Qwen2.5-Math-7B
- deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
- Qwen/Qwen2.5-Coder-7B
- fblgit/cybertron-v4-qw7B-UNAMGS
- prithivMLmods/QwQ-LCoT2-7B-Instruct
- huihui-ai/Qwen2.5-7B-Instruct-abliterated
- Rombo-Org/Rombo-LLM-V2.5-Qwen-7b

The following YAML configuration was used to produce this model:
rmoe-v1
This model card aims to be a base template for new models. It has been generated using this raw template.
- Developed by: [More Information Needed]
- Funded by [optional]: [More Information Needed]
- Shared by [optional]: [More Information Needed]
- Model type: [More Information Needed]
- Language(s) (NLP): [More Information Needed]
- License: [More Information Needed]
- Finetuned from model [optional]: [More Information Needed]
- Repository: [More Information Needed]
- Paper [optional]: [More Information Needed]
- Demo [optional]: [More Information Needed]

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations. Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
- Hardware Type: [More Information Needed]
- Hours used: [More Information Needed]
- Cloud Provider: [More Information Needed]
- Compute Region: [More Information Needed]
- Carbon Emitted: [More Information Needed]
tts-1111
tts-1111 is a merge of the following models using LazyMergekit:
Gemma2-9b-it-train1
Gemma2-9b-it-train2
Gemma2-9b-it-train3
Gemma2-9b-it-train5
Qwen2-1.5b-it-chat-sp-ru-bel-arm-ger-fin-tur
qwen2.5-with-lora-think-3b-it
Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base and instruction-tuned language models ranging from 0.5 to 72 billion parameters. Qwen2.5 brings the following improvements over Qwen2:
- Significantly more knowledge and greatly improved capabilities in coding and mathematics, thanks to our specialized expert models in these domains.
- Significant improvements in instruction following, generating long texts (over 8K tokens), understanding structured data (e.g., tables), and generating structured outputs, especially JSON. It is also more resilient to diverse system prompts, improving role-play implementation and condition-setting for chatbots.
- Long-context support up to 128K tokens, with generation of up to 8K tokens.
- Multilingual support for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more.

This repo contains the instruction-tuned 3B Qwen2.5 model, which has the following features:
- Type: Causal Language Model
- Training Stage: Pretraining & Post-training
- Architecture: transformers with RoPE, SwiGLU, RMSNorm, attention QKV bias, and tied word embeddings
- Number of Parameters: 3.09B
- Number of Parameters (Non-Embedding): 2.77B
- Number of Layers: 36
- Number of Attention Heads (GQA): 16 for Q and 2 for KV
- Context Length: 32,768 tokens (full), with generation of up to 8,192 tokens

For more details, please refer to our blog, GitHub, and Documentation. The code for Qwen2.5 is included in the latest Hugging Face `transformers`, and we advise you to use the latest version; with `transformers<4.37.0` you will encounter an error. Below is a code snippet using `apply_chat_template` that shows how to load the tokenizer and model and how to generate content. Detailed evaluation results are reported in this 📑 blog.
For requirements on GPU memory and the corresponding throughput, see the results here. If you find our work helpful, feel free to cite us.
fp4-14b-v1-fix
This is a merge of pre-trained language models created using mergekit. This model was merged using the Model Stock merge method using unsloth/phi-4 as a base. The following models were included in the merge:
- prithivMLmods/Phi-4-QwQ
- bunnycore/Phi-4-RP-V0.2
- prithivMLmods/Phi-4-Empathetic
- mudler/LocalAI-functioncall-phi-4-v0.3
- Pinkstack/SuperThoughts-CoT-14B-16k-o1-QwQ
- prithivMLmods/Phi-4-o1
- prithivMLmods/Phi-4-Math-IO

The following YAML configuration was used to produce this model:
llama-3-12b-instruct
Gemma2-9b-it-train6
- Developed by: ehristoforu
- License: apache-2.0
- Finetuned from model: ehristoforu/Gemma2-9b-it-train5

This gemma2 model was trained 2x faster with Unsloth and Hugging Face's TRL library.
RQwen-v0.1
Short info:
- Developed by: ehristoforu
- Base model: Qwen/Qwen2.5-14B-Instruct
- Model type: Qwen2 Instruct (ChatML)
- Languages: English, Russian
- Features: reflection tuning, logic, and deep work with context
- Trained with: Unsloth (Transformers SFT)
- License: Apache-2.0

GGUF format: coming soon...

Open LLM Leaderboard Evaluation Results
Detailed results can be found here.

| Metric              | Value |
|---------------------|------:|
| Avg.                | 32.48 |
| IFEval (0-Shot)     | 76.25 |
| BBH (3-Shot)        | 48.49 |
| MATH Lvl 5 (4-Shot) |  2.95 |
| GPQA (0-shot)       | 10.07 |
| MuSR (0-shot)       | 10.44 |
| MMLU-PRO (5-shot)   | 46.69 |
fq2.5-7b-it-normalize_false
This is a merge of pre-trained language models created using mergekit. This model was merged using the Model Stock merge method using Qwen/Qwen2.5-7B-Instruct as a base. The following models were included in the merge:
- Bui1dMySea/LongRAG-Qwen2.5-7B-Instruct
- prithivMLmods/QwQ-MathOct-7B
- Krystalan/DRT-o1-7B
- prithivMLmods/QwQ-LCoT-7B-Instruct
- Orion-zhen/Qwen2.5-7B-Instruct-Uncensored
- Spestly/Athena-1-7B
- prithivMLmods/Deepthink-Reasoning-7B
- fblgit/cybertron-v4-qw7B-MGS
- Rombo-Org/Rombo-LLM-V2.5-Qwen-7b

The following YAML configuration was used to produce this model:
expansion-train2
HappyLlama1-Q2_K-GGUF
0000mxs
testllama
Qwen2-1.5b-it-chat-sp
Qwen2-1.5b-it-chat-sp-ru
Qwen2-1.5b-it-chat-sp-ru-bel-arm-ger
SoRu-0006
kwk-32b-Q5_K_M-GGUF
ultraset-1.5b-instruct-Q5_K_M-GGUF
falcon3-ultraset
- Developed by: ehristoforu
- License: apache-2.0
- Finetuned from model: tiiuae/Falcon3-7B-Instruct

This llama model was trained 2x faster with Unsloth and Hugging Face's TRL library.
tmoe
della-70b-test-v1
This is a merge of pre-trained language models created using mergekit. This model was merged using the Linear DELLA merge method using deepseek-ai/DeepSeek-R1-Distill-Llama-70B as a base. The following models were included in the merge: nvidia/Llama-3.1-Nemotron-70B-Instruct-HF The following YAML configuration was used to produce this model:
qwen3-4b-2
This is a merge of pre-trained language models created using mergekit. This model was merged using the Model Stock merge method using Qwen/Qwen3-4B as a base. The following models were included in the merge: Menlo/Jan-nano POLARIS-Project/Polaris-4B-Preview The following YAML configuration was used to produce this model:
Gixtral-100B
QwenQwen2.5-7B-IT-Dare
This is a merge of pre-trained language models created using mergekit. This model was merged using the DARE TIES merge method using Qwen/Qwen2.5-7B-Instruct as a base. The following YAML configuration was used to produce this model:
Gemma2-2b-it-chat
Qwen2-1.5b-it-bioinstruct
Qwen2-1.5b-it-math
Qwen2-1.5b-it-chat-sp-ru-bel-arm-ger-fin-tur-per-ko
RQwen-v0.1-Q2_K-GGUF
ehristoforu/RQwen-v0.1-Q2_K-GGUF
This model was converted to GGUF format from `ehristoforu/RQwen-v0.1` using llama.cpp via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.

Use with llama.cpp:
Install llama.cpp through brew (works on Mac and Linux). Note: you can also use this checkpoint directly through the usage steps listed in the llama.cpp repo.
Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag along with other hardware-specific flags (for example, `LLAMA_CUDA=1` for NVIDIA GPUs on Linux).
SoRu-0001
SoRu-0003
SoRu-0009
- Developed by: ehristoforu
- License: apache-2.0
- Finetuned from model: ehristoforu/SoRu-0008

This qwen2 model was trained 2x faster with Unsloth and Hugging Face's TRL library.

Open LLM Leaderboard Evaluation Results
Detailed results can be found here.

| Metric              | Value |
|---------------------|------:|
| Avg.                |  5.95 |
| IFEval (0-Shot)     | 25.82 |
| BBH (3-Shot)        |  5.14 |
| MATH Lvl 5 (4-Shot) |  0.00 |
| GPQA (0-shot)       |  1.45 |
| MuSR (0-shot)       |  0.62 |
| MMLU-PRO (5-shot)   |  2.66 |
BigFalcon3-18B
This is a merge of pre-trained language models created using mergekit. This model was merged using the passthrough merge method. The following models were included in the merge: tiiuae/Falcon3-10B-Instruct The following YAML configuration was used to produce this model:
frqwen2.5-from7b-duable4layers-it
This is a merge of pre-trained language models created using mergekit. This model was merged using the passthrough merge method. The following models were included in the merge: Qwen/Qwen2.5-7B-Instruct The following YAML configuration was used to produce this model:
testq-32b
This is a merge of pre-trained language models created using mergekit. This model was merged using the passthrough merge method. The following models were included in the merge: ehristoforu/fq2.5-32b-v1 The following YAML configuration was used to produce this model:
moremerge-upscaled
This is a merge of pre-trained language models created using mergekit. This model was merged using the Passthrough merge method. The following models were included in the merge: ehristoforu/moremerge The following YAML configuration was used to produce this model:
fd-lora-merged-64x128
This model card aims to be a base template for new models. It has been generated using this raw template.
- Developed by: [More Information Needed]
- Funded by [optional]: [More Information Needed]
- Shared by [optional]: [More Information Needed]
- Model type: [More Information Needed]
- Language(s) (NLP): [More Information Needed]
- License: [More Information Needed]
- Finetuned from model [optional]: [More Information Needed]
- Repository: [More Information Needed]
- Paper [optional]: [More Information Needed]
- Demo [optional]: [More Information Needed]

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations. Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
- Hardware Type: [More Information Needed]
- Hours used: [More Information Needed]
- Cloud Provider: [More Information Needed]
- Compute Region: [More Information Needed]
- Carbon Emitted: [More Information Needed]

Open LLM Leaderboard Evaluation Results
Detailed results can be found here.

| Metric              | Value |
|---------------------|------:|
| Avg.                |  8.11 |
| IFEval (0-Shot)     | 32.81 |
| BBH (3-Shot)        |  7.82 |
| MATH Lvl 5 (4-Shot) |  0.15 |
| GPQA (0-shot)       |  0.67 |
| MuSR (0-shot)       |  1.27 |
| MMLU-PRO (5-shot)   |  5.96 |
Gemma2-9B-it-psy10k
Llama-TI-8B-Instruct-Q4_K_M-GGUF
ehristoforu/Llama-TI-8B-Instruct-Q4_K_M-GGUF
This model was converted to GGUF format from `fluently-lm/Llama-TI-8B-Instruct` using llama.cpp via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.

Use with llama.cpp:
Install llama.cpp through brew (works on Mac and Linux). Note: you can also use this checkpoint directly through the usage steps listed in the llama.cpp repo.
Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag along with other hardware-specific flags (for example, `LLAMA_CUDA=1` for NVIDIA GPUs on Linux).
Qwen2-1.5b-it-chat-sp-ru-bel-arm-ger-fin-tur-per-ko-jap
RQwen-v0.2
- Developed by: ehristoforu
- License: apache-2.0
- Finetuned from model: ehristoforu/RQwen-v0.1

This qwen2 model was trained 2x faster with Unsloth and Hugging Face's TRL library.
mllama-3.1-8b-instruct
This is a merge of pre-trained language models created using mergekit. This model was merged using the TIES merge method using unsloth/Meta-Llama-3.1-8B-Instruct as a base. The following models were included in the merge:
- NousResearch/Hermes-3-Llama-3.1-8B
- Skywork/Skywork-o1-Open-Llama-3.1-8B
- cognitivecomputations/dolphin-2.9.4-llama3.1-8b
- SimpleBerry/LLaMA-O1-Base-1127
- arcee-ai/Llama-3.1-SuperNova-Lite

The following YAML configuration was used to produce this model:
fq2.5-7b-it-normalize_true
This is a merge of pre-trained language models created using mergekit. This model was merged using the Model Stock merge method using Qwen/Qwen2.5-7B-Instruct as a base. The following models were included in the merge:
- prithivMLmods/QwQ-MathOct-7B
- Orion-zhen/Qwen2.5-7B-Instruct-Uncensored
- Rombo-Org/Rombo-LLM-V2.5-Qwen-7b
- prithivMLmods/Deepthink-Reasoning-7B
- fblgit/cybertron-v4-qw7B-MGS
- Krystalan/DRT-o1-7B
- Bui1dMySea/LongRAG-Qwen2.5-7B-Instruct
- Spestly/Athena-1-7B
- prithivMLmods/QwQ-LCoT-7B-Instruct

The following YAML configuration was used to produce this model:
0001
Mistral-7B-Instruct-v0.3-pruned
Gemma2-9B-psy10k
Gemma2-9b-it-train4
Mistral-nemo-test-2layno-v3
mistral-distil-test-2
Exp-Test-BigXL
Gemma2-2b-it-bioinstruct
Gemma2-2b-it-codealpaca
Gemma2-2b-it-math
Qwen2-1.5b-it-chat
Qwen2-1.5b-it-codealpaca
Llama3.1-it-chat
Qwen2-1.5b-it-chat-sp-ru-bel
Qwen2-1.5b-it-chat-sp-ru-bel-arm
Qwen2-1.5b-it-chat-sp-ru-bel-arm-ger-fin
Qwen2-1.5b-it-chat-sp-ru-bel-arm-ger-fin-tur-per
Qwen2-1.5b-it-math-v2
theqwenmoe
SoRu-0004
QwenMoe-A1.5B-IT
HermesX2
rufalcon3-3b-it
- Developed by: ehristoforu
- License: apache-2.0
- Finetuned from model: tiiuae/Falcon3-3B-Instruct

This llama model was trained 2x faster with Unsloth and Hugging Face's TRL library.
rufalcon3-3b-it-Q3_K_S-GGUF
Falcon3-8B-Franken-Basestruct
This is a merge of pre-trained language models created using mergekit. This model was merged using the SLERP merge method. The following models were included in the merge:
- tiiuae/Falcon3-10B-Instruct
- tiiuae/Falcon3-10B-Base

The following YAML configuration was used to produce this model:
frqwen2.5-from72b-duable10layers
tmoe-v2
tmoe-exp-v1
fd-lora-merged-16x32
This model card aims to be a base template for new models. It has been generated using this raw template.
- Developed by: [More Information Needed]
- Funded by [optional]: [More Information Needed]
- Shared by [optional]: [More Information Needed]
- Model type: [More Information Needed]
- Language(s) (NLP): [More Information Needed]
- License: [More Information Needed]
- Finetuned from model [optional]: [More Information Needed]
- Repository: [More Information Needed]
- Paper [optional]: [More Information Needed]
- Demo [optional]: [More Information Needed]

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations. Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
- Hardware Type: [More Information Needed]
- Hours used: [More Information Needed]
- Cloud Provider: [More Information Needed]
- Compute Region: [More Information Needed]
- Carbon Emitted: [More Information Needed]

Open LLM Leaderboard Evaluation Results
Detailed results can be found here.

| Metric              | Value |
|---------------------|------:|
| Avg.                |  7.61 |
| IFEval (0-Shot)     | 34.81 |
| BBH (3-Shot)        |  6.53 |
| MATH Lvl 5 (4-Shot) |  0.00 |
| GPQA (0-shot)       |  0.45 |
| MuSR (0-shot)       |  1.60 |
| MMLU-PRO (5-shot)   |  2.28 |
fd-lora-merged-64x128-Q5_0-GGUF
fd-lora-merged-16x32-Q5_0-GGUF
flc-r-union-4-ties
This is a merge of pre-trained language models created using mergekit. This model was merged using the TIES merge method using Qwen/Qwen2.5-3B as a base. The following models were included in the merge:
- Qwen/Qwen2.5-3B-Instruct + ehristoforu/flc-r-0004-lora
- Qwen/Qwen2.5-3B-Instruct
- Qwen/Qwen2.5-3B-Instruct + ehristoforu/flc-r-0001-lora
- Qwen/Qwen2.5-3B-Instruct + ehristoforu/flc-r-0002-lora
- Qwen/Qwen2.5-3B-Instruct + ehristoforu/flc-r-0003-lora

The following YAML configuration was used to produce this model:
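The `base + LoRA` entries above use mergekit's `+` syntax, which applies a LoRA adapter to a base model before merging. The actual YAML was not preserved in this card; a hypothetical TIES config in that style might look like this (weights and densities are illustrative, not the real values):

```yaml
# Hypothetical example only — not the actual config used for flc-r-union-4-ties.
models:
  - model: Qwen/Qwen2.5-3B-Instruct+ehristoforu/flc-r-0001-lora
    parameters:
      weight: 1.0
      density: 0.5
  - model: Qwen/Qwen2.5-3B-Instruct+ehristoforu/flc-r-0002-lora
    parameters:
      weight: 1.0
      density: 0.5
merge_method: ties
base_model: Qwen/Qwen2.5-3B
dtype: bfloat16
```

TIES uses `density` to control how many of each model's delta parameters survive sparsification before sign-consensus merging.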
Gistral-16B
StableLive-sd-portable
mjlora
dreamly-diffusion
extensions
phi-4-45b
stable-cascade-zip
custom-chatgpt-prompts
qwenUnion-32b-Q5_K_M-GGUF
think-lora-qwen-r64
qwen2.5-7b-upscaled
QwenQwen2.5-7B-IT
This is a merge of pre-trained language models created using mergekit. This model was merged using the TIES merge method using Qwen/Qwen2.5-7B-Instruct as a base. The following YAML configuration was used to produce this model: