marcuscedricridia
badllama3.2-1B
8B-Nemotaur-IT
distilroberta-fast
distilroberta-fast-surgical
badllama3.2-1B-GGUF
Qwill-0.6b-IT-Q5_K_M-GGUF
marcuscedricridia/Qwill-0.6b-IT-Q5_K_M-GGUF
This model was converted to GGUF format from `marcuscedricridia/Qwill-0.6b-IT` using llama.cpp via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.

Use with llama.cpp
Install llama.cpp through brew (works on Mac and Linux). Note: You can also use this checkpoint directly through the usage steps listed in the llama.cpp repo.

Step 1: Clone llama.cpp from GitHub.
Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag along with other hardware-specific flags (for example, `LLAMA_CUDA=1` for Nvidia GPUs on Linux).
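The card's shell snippets were not captured in this listing. Under the standard GGUF-my-repo template, install and a quick test look roughly like the following; the `--hf-file` name follows GGUF-my-repo's lowercased naming convention and should be checked against the repo's file list:

```bash
brew install llama.cpp

# Run inference directly from the Hugging Face repo (CLI):
llama-cli --hf-repo marcuscedricridia/Qwill-0.6b-IT-Q5_K_M-GGUF \
  --hf-file qwill-0.6b-it-q5_k_m.gguf \
  -p "The meaning to life and the universe is"

# Or serve an OpenAI-compatible endpoint:
llama-server --hf-repo marcuscedricridia/Qwill-0.6b-IT-Q5_K_M-GGUF \
  --hf-file qwill-0.6b-it-q5_k_m.gguf \
  -c 2048
```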
Qwill-0.6b-IT-Q8_0-GGUF
marcuscedricridia/Qwill-0.6b-IT-Q8_0-GGUF
This model was converted to GGUF format from `marcuscedricridia/Qwill-0.6b-IT` using llama.cpp via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.

Use with llama.cpp
Install llama.cpp through brew (works on Mac and Linux). Note: You can also use this checkpoint directly through the usage steps listed in the llama.cpp repo.

Step 1: Clone llama.cpp from GitHub.
Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag along with other hardware-specific flags (for example, `LLAMA_CUDA=1` for Nvidia GPUs on Linux).
Qwill-0.6b-IT-Q4_K_M-GGUF
marcuscedricridia/Qwill-0.6b-IT-Q4_K_M-GGUF
This model was converted to GGUF format from `marcuscedricridia/Qwill-0.6b-IT` using llama.cpp via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.

Use with llama.cpp
Install llama.cpp through brew (works on Mac and Linux). Note: You can also use this checkpoint directly through the usage steps listed in the llama.cpp repo.

Step 1: Clone llama.cpp from GitHub.
Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag along with other hardware-specific flags (for example, `LLAMA_CUDA=1` for Nvidia GPUs on Linux).
Springer-32B-4
smollm-instruction-freak
- Developed by: marcuscedricridia
- License: apache-2.0
- Finetuned from model: HuggingFaceTB/SmolLM2-135M-Instruct

This llama-architecture model was trained 2x faster with Unsloth and Hugging Face's TRL library.
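For quick testing, a minimal sketch with 🤗 transformers; the repo id `marcuscedricridia/smollm-instruction-freak` is inferred from this listing and should be verified:

```python
from transformers import pipeline

# Repo id inferred from this listing; verify before use.
pipe = pipeline("text-generation", model="marcuscedricridia/smollm-instruction-freak")

messages = [{"role": "user", "content": "List three uses for a paperclip."}]
out = pipe(messages, max_new_tokens=128)
print(out[0]["generated_text"][-1]["content"])  # last message is the assistant reply
```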
Yell-Qwen2.5-7B-Coder
Qwill-0.6b-IT-Q6_K-GGUF
marcuscedricridia/Qwill-0.6b-IT-Q6_K-GGUF
This model was converted to GGUF format from `marcuscedricridia/Qwill-0.6b-IT` using llama.cpp via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.

Use with llama.cpp
Install llama.cpp through brew (works on Mac and Linux). Note: You can also use this checkpoint directly through the usage steps listed in the llama.cpp repo.

Step 1: Clone llama.cpp from GitHub.
Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag along with other hardware-specific flags (for example, `LLAMA_CUDA=1` for Nvidia GPUs on Linux).
olmner-sbr-7b
Hush-Qwen2.5-7B-v1.4
Hush-Qwen2.5-7B-MST-v1.3
Cheng-2
Cheng-2-v1.1
stray-r1o-et
Hush-Qwen2.5-7B-della3
Yell-Qwen2.5-7B-Preview
This is a merge of pre-trained language models created using mergekit. This model was merged using the Model Stock merge method, with marcuscedricridia/Yell-Qwen2.5-7B-1M-della1 as the base. The following models were included in the merge:
- marcuscedricridia/Yell-Qwen2.5-7B-1M-della4
- marcuscedricridia/Yell-Qwen2.5-7B-1M-della3
- marcuscedricridia/Yell-Qwen2.5-7B-1M-della2
- marcuscedricridia/Yell-Qwen2.5-7B-Stock

The following YAML configuration was used to produce this model:
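The YAML itself was not captured in this listing. A reconstruction consistent with the description above (the `dtype` value is an assumption) would look roughly like:

```yaml
merge_method: model_stock
base_model: marcuscedricridia/Yell-Qwen2.5-7B-1M-della1
models:
  - model: marcuscedricridia/Yell-Qwen2.5-7B-1M-della4
  - model: marcuscedricridia/Yell-Qwen2.5-7B-1M-della3
  - model: marcuscedricridia/Yell-Qwen2.5-7B-1M-della2
  - model: marcuscedricridia/Yell-Qwen2.5-7B-Stock
dtype: bfloat16   # assumption; the original card's value was not captured
```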
Springer-32B-2
Springer-32B-5
olmner-della-7b
cursa-o1-7b
pre-cursa-o1-v1.6
Hush-Qwen2.5-7B-RP
Hush-Qwen2.5-7B-RP-v1.3-1M
Hush-Qwen2.5-7B-RP-v1.2-1M
Hush-Qwen2.5-7B-MST
Springer-32B
pre-cursa-o1
pre-cursa-o1-v1.3
pre-cursa-o1-v1.4
cursa-o1-7b-v1.2-normalize-false
r1o-et
etr1o-v1.2
Yell-Qwen2.5-7B-1M-della2
Yell-Qwen2.5-7B-Stock-v1.1
Qwen2.5-7B-1M
Qwen2.5-7B-Preview
Cheng-1
Model Overview
Cheng-1 is a high-performance language model created through strategic merging of top-tier, pre-existing fine-tuned models. It excels in coding, math, translation, and roleplay without requiring additional fine-tuning. The final model was built using the Model Stock method with a restore model to maintain strong instruction-following and mathematical abilities.

1. Foundation Model - "Yell-Qwen2.5-7B-1M"
- Base Merge: Combined `Qwen2.5-7B-Instruct-1M` with `Qwen2.5-7B` using SCE merging.
- Purpose: Established a strong general-purpose foundation for later merges.

2. Domain-Specific Merges
- Coding: Merged `AceCoder-Qwen2.5-7B-Ins-Rule` with Yell-Qwen2.5-7B-1M.
- Translation: Merged `DRT-7B` with Yell-Qwen2.5-7B-1M.
- Math: Merged `AceMath-7B-Instruct` with Yell-Qwen2.5-7B-1M.
- Method: All three were merged using DELLA merging, producing three intermediate models.

3. Final Model Stock Merge (see the sketch after this card)
- Models Combined:
  - `mergekit-della-wpunuct`
  - `mergekit-della-phphmhr`
  - `mergekit-della-qejrhsk`
  - `Hush-Qwen2.5-7B-RP-v1.2-1M` (roleplay model)
- Base Model: `YOYO-AI/Qwen2.5-7B-it-restore`
- Final Method: Used Model Stock merging to integrate all models into Cheng-1.

Conclusion
Cheng-1 is a versatile model optimized for multiple domains. By merging top-performing models in coding, math, translation, and roleplay, it achieves balanced and strong benchmark results without direct fine-tuning.
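The merge recipe itself is not shown in this listing; a mergekit sketch consistent with step 3 (the owner prefixes for the intermediate `mergekit-della-*` models and the `dtype` are assumptions) might look like:

```yaml
merge_method: model_stock
base_model: YOYO-AI/Qwen2.5-7B-it-restore
models:
  - model: mergekit-della-wpunuct    # intermediate coding merge; owner prefix not captured
  - model: mergekit-della-phphmhr    # intermediate translation merge
  - model: mergekit-della-qejrhsk    # intermediate math merge
  - model: marcuscedricridia/Hush-Qwen2.5-7B-RP-v1.2-1M
dtype: bfloat16                      # assumption
```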
Springer-32B-Restore-2
Springer-32B-RP-4
Springer-32B-9
Springer-32B-20
Springer1.1-32B-Qwen2.5-Extras
Mixmix-LlaMAX3.2-1B-GGUF
Springer1.2-32B-3
32B-Peakaboo-Talk
kgr-600m-2511-it
This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
- Developed by: [More Information Needed]
- Funded by [optional]: [More Information Needed]
- Shared by [optional]: [More Information Needed]
- Model type: [More Information Needed]
- Language(s) (NLP): [More Information Needed]
- License: [More Information Needed]
- Finetuned from model [optional]: [More Information Needed]
- Repository: [More Information Needed]
- Paper [optional]: [More Information Needed]
- Demo [optional]: [More Information Needed]

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
- Hardware Type: [More Information Needed]
- Hours used: [More Information Needed]
- Cloud Provider: [More Information Needed]
- Compute Region: [More Information Needed]
- Carbon Emitted: [More Information Needed]
bananafish-522
Release: May 2025
Base Model: Qwen3
Type: Instruction-tuned GLT (General Language Transformer)

bananafish-522 is an update over bananafish-0517, trained for 5,000 steps on the 49,000-row ITP dataset. This is roughly one-fifth of a full epoch, so only a partial pass over the dataset. To improve coverage, the dataset was shuffled, ensuring samples from later parts were still seen during training.

Improvements over bananafish-0517:
- Stronger coherence
- Fewer output artifacts
- More consistent instruction-following
- Better formatting across responses
- More usable as a base for creative or chat fine-tuning

Known issues:
- Post-response artifacts. Sometimes the model appends one word in a foreign language, or repeats a single Chinese character until max tokens. This is much rarer than in bananafish-0517, and is likely caused by incomplete training (only 5,000 steps) or possibly improper EOS handling (generating past the end-of-turn token). To address this, that token was explicitly set as the EOS token in the generation config.
- High hallucination rate, especially for anything dated 2024 or later. Always verify facts before use.

bananafish-522 was built as a clean instruction-following base model without Qwen3's "thinking toggle." It aims to support creative writing and roleplay fine-tuning where chain-of-thought generation is unwanted or intrusive.

Training details:
- Dataset: ITP (Instruction Tuning Public)
- Size: 49,000 examples
- Steps: 5,000
- Coverage: ~1/5 of a full epoch
- Shuffled: yes, to include later samples early in training
- Contents: instructions, chat, Q&A, STEM, writing, etc.

Compared to the older 10k-row NoRobots dataset, ITP is larger, more diverse, and showed better alignment, even with fewer total steps. The difference came from dataset quality, not scale.

Use in any transformer-compatible chat interface, and make sure generation stops at the model's EOS token (see the sketch after this card).

- This is still a proof of concept
- Only partially trained; fine-tuning for specific tasks is recommended
- A full version trained on all 49k rows with style tuning is planned
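A minimal usage sketch with 🤗 transformers, assuming the checkpoint lives at `marcuscedricridia/bananafish-522` (repo id inferred from this listing):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "marcuscedricridia/bananafish-522"  # inferred repo id; verify before use
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [{"role": "user", "content": "Write two sentences about rivers."}]
inputs = tok.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")

# Stopping at the tokenizer's EOS token avoids the post-response artifacts noted above.
out = model.generate(inputs, max_new_tokens=200, eos_token_id=tok.eos_token_id)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```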
Liyama-3B-Q4_K_M-GGUF
marcuscedricridia/Liyama-3B-Q4_K_M-GGUF
This model was converted to GGUF format from `Linggowiktiks/Liyama-3B` using llama.cpp via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.

Use with llama.cpp
Install llama.cpp through brew (works on Mac and Linux). Note: You can also use this checkpoint directly through the usage steps listed in the llama.cpp repo.

Step 1: Clone llama.cpp from GitHub.
Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag along with other hardware-specific flags (for example, `LLAMA_CUDA=1` for Nvidia GPUs on Linux).
smollm-wizard-vicuna
smollm-wizard-vicuna-Q4_K_M-GGUF
Hush-Qwen2.5-7B-RP-v1.4-1M
Hush-Qwen2.5-7B-v1.3
Abus-7B-Instruct
Hush-Qwen2.5-7B-MST-v1.1
Cheng-2-Base
sbr-o1-7b
post-cursa-o1
etr1o-v1.1
Hush-Qwen2.5-7B-della4
Hush-Qwen2.5-7B-Preview
Yell-Qwen2.5-7B-1M
Yell-Qwen2.5-7B-Stock
Cheng-2-Ingredient3
Cheng-2-Ingredient4
Springer-32B-8
Springer-32B-15
Springer1.0-32B-Qwen2.5-Reasoning
llamalicious3.2-3B
llamalicious3.2-1B-GGUF
Mixmix-LlaMAX3.2-1B
160B-NotQuiteAMoE
This is a passthrough experiment with ~158B parameters (rounded to 160B in the name). We merged all 64 layers from each model: no layer picking, full overlap. It's rough, unfiltered, and definitely experimental. This version is meant to test the concept. We don't recommend using this model. It's huge and needs serious hardware, more than we can run ourselves. If you must try it, use the cloud. A sketch of what such a config looks like follows.
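The actual recipe is not shown here; for illustration only, a full-overlap mergekit passthrough over two 64-layer donors (the model names below are placeholders, and the real donor list was not captured in this listing) looks like:

```yaml
merge_method: passthrough
slices:
  - sources:
      - model: donor-model-a   # placeholder; real donors not captured in this listing
        layer_range: [0, 64]   # all 64 layers, no picking
  - sources:
      - model: donor-model-b   # placeholder
        layer_range: [0, 64]
dtype: bfloat16                # assumption
```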
32B-Peakaboo-Prover
Qwill-0.6b-IT-Q3_K_S-GGUF
marcuscedricridia/Qwill-0.6b-IT-Q3_K_S-GGUF
This model was converted to GGUF format from `marcuscedricridia/Qwill-0.6b-IT` using llama.cpp via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.

Use with llama.cpp
Install llama.cpp through brew (works on Mac and Linux). Note: You can also use this checkpoint directly through the usage steps listed in the llama.cpp repo.

Step 1: Clone llama.cpp from GitHub.
Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag along with other hardware-specific flags (for example, `LLAMA_CUDA=1` for Nvidia GPUs on Linux).
Qwill-0.6b-IT-Q2_K-GGUF
marcuscedricridia/Qwill-0.6b-IT-Q2_K-GGUF
This model was converted to GGUF format from `marcuscedricridia/Qwill-0.6b-IT` using llama.cpp via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.

Use with llama.cpp
Install llama.cpp through brew (works on Mac and Linux). Note: You can also use this checkpoint directly through the usage steps listed in the llama.cpp repo.

Step 1: Clone llama.cpp from GitHub.
Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag along with other hardware-specific flags (for example, `LLAMA_CUDA=1` for Nvidia GPUs on Linux).
bananafish-0517
Model Description
bananafish-0517 is a proof-of-concept fine-tuned checkpoint built upon the Qwen 0.6B base model. This checkpoint represents an early stage in the fine-tuning process, trained for only 0.25 epochs. The main motivation behind this model is to explore an alternative instruction-tuning approach using the ChatML format, departing from the Alpaca-style prompts commonly used with Qwen. Unlike the official Qwen3 instruction-tuned models, which are heavily aligned toward STEM tasks, bananafish-0517 aims to preserve a more natural, less technical writing style with fewer "GPT-like" artifacts. This makes it a promising base for future creative or general-purpose instruction tuning.

Intended Use
- Experimental use to evaluate early-stage fine-tuning on Qwen 0.6B.
- Testing alternative prompt formats (ChatML) for conversational generation.
- Proof of concept for instruction tuning less focused on STEM-heavy alignment.
- Starting point for further fine-tuning iterations to improve versatility and creativity.

Training Configuration
- Base model: Qwen 0.6B
- Fine-tuning epochs: 0.25 (only a partial epoch)
- Training method: LoRA fine-tuning (rank 16, alpha 32)
- LoRA dropout: 0.05
- RSLoRA: enabled (using the Unsloth implementation)
- Optimizer: AdamW with weight decay 0.0001
- Learning rate: 3e-6
- LR scheduler: cosine
- Warmup ratio: 0.03

Prompt Format
This checkpoint uses the ChatML-style prompt format (see the sketch after this card). This differs from the Alpaca-style format, aiming to better suit the Qwen architecture and encourage more natural dialogue flow.

Reproducibility
A Colab notebook will be provided for reproducibility and testing. Feel free to open a discussion for collaboration or questions.

Why This Model Exists
Many users reported difficulties when fine-tuning Qwen base models, especially with Alpaca-style prompts. This checkpoint tests:
- A different, cleaner prompt style (ChatML), different from the stock Alpaca format in the base Unsloth notebook.
- Minimal training to observe the impact of prompt format and LoRA fine-tuning.
- Moving away from the heavy STEM alignment of official Qwen instruction models toward a freer, more natural writing style.

Limitations
- Trained for only a fraction of an epoch, so performance and stability are preliminary.
- The model is expected to improve significantly with further training.
- Currently optimized for inference with LoRA adapters and may require additional tuning for production use.

Acknowledgments
- Thanks to the Unsloth team!
- Inspired by the Qwen team's open-source base model and instruction-tuning efforts.

Stay tuned for further updates and improvements! 😉 (Will do full models tomorrow; it's currently 6:19 as I am writing this and I haven't gotten any sleep.)
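The prompt template itself did not survive extraction. The standard ChatML layout referenced above looks like this (the system message is an example, not a required string):

```
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
{your prompt here}<|im_end|>
<|im_start|>assistant
```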
kgr-600m-2511-it-709-Q4_K_M-GGUF
marcuscedricridia/kgr-600m-2511-it-709-Q4_K_M-GGUF
This model was converted to GGUF format from `marcuscedricridia/kgr-600m-2511-it-709` using llama.cpp via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.

Use with llama.cpp
Install llama.cpp through brew (works on Mac and Linux). Note: You can also use this checkpoint directly through the usage steps listed in the llama.cpp repo.

Step 1: Clone llama.cpp from GitHub.
Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag along with other hardware-specific flags (for example, `LLAMA_CUDA=1` for Nvidia GPUs on Linux).
orbita-tiny-Q4_K_M-GGUF
arc-Q4_K_M-GGUF
cursa-o1-7b-v1.1
absolute-o1-7b
etr1o-v1.4
Hush-Qwen2.5-7B-ING1
Hush-Qwen2.5-7B-RP-1M
Hush-Qwen2.5-7B-RP-v1.1
Hush-Qwen2.5-7B-RP-v1.1-1M
Hush-Qwen2.5-7B-v1.1
Hush-Qwen2.5-7B-RP-v1.2
Hush-Qwen2.5-7B-RP-v1.3
Hush-Qwen2.5-7B-RP-v1.4
Hush-Qwen2.5-7B-v1.2
Abus-14B-Ingredient1
This is a merge of pre-trained language models created using mergekit. This model was merged using the DELLA merge method, with marcuscedricridia/Cheng-2-Base as the base. The following models were included in the merge:
- JungZoona/T3Q-qwen2.5-14b-v1.0-e3

The following YAML configuration was used to produce this model:
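The YAML was not captured in this listing. A sketch consistent with the description above (the `density`, `weight`, and `dtype` values are placeholders, not the card's actual settings):

```yaml
merge_method: della
base_model: marcuscedricridia/Cheng-2-Base
models:
  - model: JungZoona/T3Q-qwen2.5-14b-v1.0-e3
    parameters:
      density: 0.5   # placeholder value
      weight: 0.5    # placeholder value
dtype: bfloat16      # placeholder
```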