ToastyPigeon
Qwen3.5-Test-GGUFs
muse-marvin-gguf
Qwen3-30B-A3B-AntiRep-2507-Q4_K_M-GGUF
ToastyPigeon/Qwen3-30B-A3B-AntiRep-2507-Q4KM-GGUF
This model was converted to GGUF format from `ConicCat/Qwen3-30B-A3B-AntiRep-2507` using llama.cpp via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.

Use with llama.cpp: install llama.cpp through brew (works on Mac and Linux). Note: you can also use this checkpoint directly through the usage steps listed in the llama.cpp repo.

Step 1: Clone llama.cpp from GitHub.
Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag, along with other hardware-specific flags (e.g. `LLAMA_CUDA=1` for Nvidia GPUs on Linux).
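A minimal sketch of that workflow using the brew build; the `--hf-file` name below is inferred from this repo's naming convention and may differ:

```bash
# Install llama.cpp (works on Mac and Linux)
brew install llama.cpp

# Run the CLI directly against this quantized repo.
# The GGUF filename is an assumption, not confirmed by the card.
llama-cli --hf-repo ToastyPigeon/Qwen3-30B-A3B-AntiRep-2507-Q4KM-GGUF \
  --hf-file qwen3-30b-a3b-antirep-2507-q4_k_m.gguf \
  -p "Write a short scene set in a lighthouse."
```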
granite-4.0-h-small-half-slerp-Q4_K_S-GGUF
muse-margvin-gguf
muse-marvin-od-lora
medgemma-27b-text-it-abliterated
Qwen3.5-27B-Antirep-V1
cursed-test-ggufs
muse-marvin-32k-lora
This model is a fine-tuned version of LatitudeGames/Muse-12B on the ToastyPigeon/steve-and-marvin dataset. It achieves the following results on the evaluation set:
- Loss: 2.5071
- Memory/max active (GiB): 4.98
- Memory/max allocated (GiB): 4.89
- Memory/device reserved (GiB): 6.9

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 69
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 4
- total_train_batch_size: 8
- total_eval_batch_size: 2
- optimizer: OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- training_steps: 420

| Training Loss | Epoch  | Step | Validation Loss | Active (GiB) | Allocated (GiB) | Reserved (GiB) |
|:-------------:|:------:|:----:|:---------------:|:------------:|:---------------:|:--------------:|
| No log        | 0      | 0    | 2.6286          | 8.04         | 6.73            | 8.36           |
| 2.4233        | 0.0993 | 21   | 2.6047          | 4.98         | 4.89            | 6.9            |
| 2.5581        | 0.1986 | 42   | 2.5627          | 4.98         | 4.89            | 6.9            |
| 2.3368        | 0.2979 | 63   | 2.5447          | 4.98         | 4.89            | 6.9            |
| 2.5579        | 0.3972 | 84   | 2.5328          | 4.98         | 4.89            | 6.9            |
| 2.4241        | 0.4965 | 105  | 2.5253          | 4.98         | 4.89            | 6.9            |
| 2.4608        | 0.5957 | 126  | 2.5199          | 4.98         | 4.89            | 6.9            |
| 2.8143        | 0.6950 | 147  | 2.5156          | 4.98         | 4.89            | 6.9            |
| 2.6305        | 0.7943 | 168  | 2.5129          | 4.98         | 4.89            | 6.9            |
| 2.3989        | 0.8936 | 189  | 2.5105          | 4.98         | 4.89            | 6.9            |
| 2.6816        | 0.9929 | 210  | 2.5096          | 4.98         | 4.89            | 6.9            |
| 2.629         | 1.0898 | 231  | 2.5092          | 4.98         | 4.89            | 6.9            |
| 2.4645        | 1.1891 | 252  | 2.5088          | 4.98         | 4.89            | 6.9            |
| 2.3738        | 1.2884 | 273  | 2.5081          | 4.98         | 4.89            | 6.9            |
| 2.3651        | 1.3877 | 294  | 2.5076          | 4.98         | 4.89            | 6.9            |
| 2.4476        | 1.4870 | 315  | 2.5073          | 4.98         | 4.89            | 6.9            |
| 2.4091        | 1.5863 | 336  | 2.5072          | 4.98         | 4.89            | 6.9            |
| 2.6352        | 1.6856 | 357  | 2.5071          | 4.98         | 4.89            | 6.9            |
| 2.5311        | 1.7849 | 378  | 2.5071          | 4.98         | 4.89            | 6.9            |
| 2.5747        | 1.8842 | 399  | 2.5071          | 4.98         | 4.89            | 6.9            |
| 2.3871        | 1.9835 | 420  | 2.5071          | 4.98         | 4.89            | 6.9            |

Framework versions:
- PEFT 0.17.1
- Transformers 4.56.1
- PyTorch 2.7.1+cu126
- Datasets 4.0.0
- Tokenizers 0.22.1
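As a quick sanity check on the batch-size bookkeeping above, the reported total train batch size is just the per-device batch size scaled by gradient accumulation and device count:

```latex
\text{total\_train\_batch\_size}
  = \underbrace{1}_{\text{train\_batch\_size}}
    \times \underbrace{4}_{\text{grad.\ accum.\ steps}}
    \times \underbrace{2}_{\text{num\_devices}}
  = 8
```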
another-gemma-12b-lora-part1
Gemma 3 Starshine 12B
A creative writing model based on a merge of fine-tunes on Gemma 3 12B IT and Gemma 3 12B PT.

This is the Story Focused merge. This version works better for storytelling and scenarios, as the prose is more novel-like, but it has a tendency to impersonate the user character.

This is a merge of two G3 models, one trained on instruct and one trained on base:
- allura-org/Gemma-3-Glitter-12B - itself a merge of a storywriting train and an RP train (both also by ToastyPigeon), on instruct
- ToastyPigeon/Gemma-3-Confetti-12B - an experimental application of the Glitter data using base instead of instruct; additionally includes some adventure data in the form of SpringDragon

The result is a lovely blend of Glitter's ability to follow instructions and Confetti's free-spirited prose, effectively 'loosening up' much of the hesitancy that was left in Glitter. Thank you to jebcarter for the idea to make this. I love how it turned out!

Uses Gemma 2/3 instruct formatting, but has been trained to recognize an optional system role. Note: while it won't immediately balk at the system role, results may be better without it. Yeah, I actually tried several things, and surprisingly this one worked best.
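A hedged sketch of the Gemma-style turn layout referred to above, with the optional system turn this model was trained to recognize (the system text is purely illustrative; standard Gemma formatting has no system role):

```bash
# Illustrative prompt layout only; how turns are assembled depends on your frontend.
cat <<'EOF'
<start_of_turn>system
You are a collaborative storyteller.<end_of_turn>
<start_of_turn>user
Continue the story from where we left off.<end_of_turn>
<start_of_turn>model
EOF
```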
ms-test-models
probably-terrible-gemma-12b-Q6_K-GGUF
muse-marvin-lora-2
probably-terrible-gemma-12b
muse-marvin-Q8_0-GGUF
probably-broken-glm-Q4_K_S-GGUF
another-qwen-test-model-Q6_K-GGUF
ToastyPigeon/another-qwen-test-model-Q6K-GGUF
This model was converted to GGUF format from `ToastyPigeon/another-qwen-test-model` using llama.cpp via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.

Use with llama.cpp: install llama.cpp through brew (works on Mac and Linux). Note: you can also use this checkpoint directly through the usage steps listed in the llama.cpp repo.

Step 1: Clone llama.cpp from GitHub.
Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag, along with other hardware-specific flags (e.g. `LLAMA_CUDA=1` for Nvidia GPUs on Linux).
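These repos also work with llama.cpp's HTTP server rather than the one-shot CLI; a sketch (the GGUF filename is assumed):

```bash
# Serve the quant over an OpenAI-compatible API (default: localhost:8080).
llama-server --hf-repo ToastyPigeon/another-qwen-test-model-Q6K-GGUF \
  --hf-file another-qwen-test-model-q6_k.gguf \
  -c 2048
```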
half-granite-marvin-Q4_K_S-GGUF
muse-marvin-od2-lora
apertus-ffn-1
tess-books-4-Q6_K-GGUF
nemo-instruct-books-Q6_K-GGUF
ToastyPigeon/nemo-instruct-books-Q6K-GGUF
This model was converted to GGUF format from `ToastyPigeon/nemo-instruct-books` using llama.cpp via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.

Use with llama.cpp: install llama.cpp through brew (works on Mac and Linux). Note: you can also use this checkpoint directly through the usage steps listed in the llama.cpp repo.

Step 1: Clone llama.cpp from GitHub.
Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag, along with other hardware-specific flags (e.g. `LLAMA_CUDA=1` for Nvidia GPUs on Linux).
muse-marvin-ffn-lora
qwen3-16b-a3b-v3-iter3-Q6_K-GGUF
i-added-glitter-Q4_K_S-GGUF
ToastyPigeon/i-added-glitter-Q4KS-GGUF
This model was converted to GGUF format from `ToastyPigeon/i-added-glitter` using llama.cpp via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.

Use with llama.cpp: install llama.cpp through brew (works on Mac and Linux). Note: you can also use this checkpoint directly through the usage steps listed in the llama.cpp repo.

Step 1: Clone llama.cpp from GitHub.
Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag, along with other hardware-specific flags (e.g. `LLAMA_CUDA=1` for Nvidia GPUs on Linux).
a-funny-nemo-merge-Q6_K-GGUF
glitterdex-1epoch-test-merged-Q5_K_M-GGUF
mn-12b-mm-tess-books-Q6_K-GGUF
a-strange-nemo-model-Q6_K-GGUF
funny-nemo-embed-testing-3
A creative writing model based on Mistral Nemo 12B to support co-writing and other related longform writing tasks. This is pretty good, actually. Smarter than some other Nemos I've tried, and with decent samplers it's not very sloppy.

Working samplers: temp 1.25-1.5, min-p 0.02-0.05, rep pen 1.01, temperature first. Some prompts seem to need higher or lower temp than others: lower temps result in sloppy Mistral-isms, while higher temps tap into the LoRA training a bit more.

The chat template is theoretically ChatML because of the base models used in the merge. However, the ChatML-Names preset in SillyTavern often gives better results; YMMV. With ChatML-Names in particular, this is good at copying the style of what's already in the chat history. So if your chat history is sloppy, this likely will be too (use XTC for a bit to break it up); if your chat history isn't sloppy, this is less likely to introduce any extra. Start a conversation off with text from a good model (or better yet, human-written text), and this should follow along easily.

Has the same pacing issues any Nemo model does when asked to compose a longform story from scratch via instruct, though better than some others. Seems good at dialogue (though it has a bias toward country and/or British-style English accents if unspecified), and is good at 'reading between the lines' for its size as well.

I did not include any erotica or other NSFW data in the LoRA training parts of this; however, Mag-Mell contains Magnum (and Chronos, which is trained on top of a rejected Magnum), so the capability is there if you need it (it just might be a bit Claude-slop-y, as I haven't optimized this part for style).

The two LoRAs on this were trained at 8k (nemo-kimi-lora) and 32k (nemo-books-lora) context. As you might guess, nemo-kimi-lora is trained on outputs from kimi-k2 (the dataset is public on my profile), and nemo-books-lora is trained on a bunch of books.

This is a merge of pre-trained language models created using mergekit, using the Linear merge method. The following models were included in the merge:
- inflatebot/MN-12B-Mag-Mell-R1 + ToastyPigeon/nemo-kimi-lora
- migtissera/Tess-3-Mistral-Nemo-12B + ToastyPigeon/nemo-books-lora

The following YAML configuration was used to produce this model:
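(The merge YAML itself is not reproduced here.) As a quick way to try the recommended samplers against a local GGUF of this model, a sketch with llama.cpp's CLI; the model path is hypothetical:

```bash
# Card-recommended samplers: temp 1.25-1.5, min-p 0.02-0.05, rep pen 1.01.
# 'Temperature first' ordering is left to the frontend; llama.cpp's --samplers
# flag can reorder samplers if needed.
llama-cli -m ./funny-nemo-q6_k.gguf \
  --temp 1.4 \
  --min-p 0.03 \
  --repeat-penalty 1.01 \
  -p "Continue the story:"
```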
muse-marvin-attn-lora
This model is a fine-tuned version of LatitudeGames/Muse-12B on the grimulkan/LimaRP-augmented, the ToastyPigeon/steve-and-marvin and the ToastyPigeon/kimi-stories-completion datasets. It achieves the following results on the evaluation set:
- Loss: 2.4268
- Memory/max active (GiB): 5.02
- Memory/max allocated (GiB): 4.89
- Memory/device reserved (GiB): 6.64

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 69
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 4
- total_train_batch_size: 8
- total_eval_batch_size: 2
- optimizer: OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 5
- training_steps: 232

| Training Loss | Epoch  | Step | Validation Loss | Active (GiB) | Allocated (GiB) | Reserved (GiB) |
|:-------------:|:------:|:----:|:---------------:|:------------:|:---------------:|:--------------:|
| No log        | 0      | 0    | 2.5323          | 8.04         | 6.73            | 8.36           |
| 2.5888        | 0.1032 | 24   | 2.4883          | 5.02         | 4.89            | 6.64           |
| 2.4142        | 0.2065 | 48   | 2.4537          | 5.02         | 4.89            | 6.64           |
| 2.3697        | 0.3097 | 72   | 2.4418          | 5.02         | 4.89            | 6.64           |
| 2.2986        | 0.4129 | 96   | 2.4354          | 5.02         | 4.89            | 6.64           |
| 2.5054        | 0.5161 | 120  | 2.4314          | 5.02         | 4.89            | 6.64           |
| 2.6863        | 0.6194 | 144  | 2.4290          | 5.02         | 4.89            | 6.64           |
| 2.3196        | 0.7226 | 168  | 2.4277          | 5.02         | 4.89            | 6.64           |
| 2.3422        | 0.8258 | 192  | 2.4271          | 5.02         | 4.89            | 6.64           |
| 2.5976        | 0.9290 | 216  | 2.4268          | 5.02         | 4.89            | 6.64           |

Framework versions:
- PEFT 0.17.1
- Transformers 4.56.1
- PyTorch 2.7.1+cu126
- Datasets 4.0.0
- Tokenizers 0.22.1
Gemma-3-Starshine-12B-Alt
g3-12b-storyteller-v0.2-textonly-Q6_K-GGUF
other-test-models
new-ms-rp-test-ws
medgemma-ero-healmerged-Q4_K_S-GGUF
ToastyPigeon/medgemma-ero-healmerged-Q4KS-GGUF
This model was converted to GGUF format from `allura-forge/medgemma-ero-healmerged` using llama.cpp via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.

Use with llama.cpp: install llama.cpp through brew (works on Mac and Linux). Note: you can also use this checkpoint directly through the usage steps listed in the llama.cpp repo.

Step 1: Clone llama.cpp from GitHub.
Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag, along with other hardware-specific flags (e.g. `LLAMA_CUDA=1` for Nvidia GPUs on Linux).
GLM-Tulu-ChatML-Q4_K_S-GGUF
possibly-working-glm-Q4_K_S-GGUF
muse-marvin-lora
This model is a fine-tuned version of LatitudeGames/Muse-12B on the grimulkan/LimaRP-augmented, the ToastyPigeon/steve-and-marvin and the ToastyPigeon/kimi-stories-completion datasets. It achieves the following results on the evaluation set:
- Loss: 2.3857
- Memory/max active (GiB): 31.3
- Memory/max allocated (GiB): 31.3
- Memory/device reserved (GiB): 32.18

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 6
- eval_batch_size: 6
- seed: 69
- optimizer: OptimizerNames.PAGED_ADAMW_8BIT with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 7
- training_steps: 309

| Training Loss | Epoch | Step | Validation Loss | Active (GiB) | Allocated (GiB) | Reserved (GiB) |
|:-------------:|:-----:|:----:|:---------------:|:------------:|:---------------:|:--------------:|
| No log        | 0     | 0    | 2.4839          | 31.29        | 31.29           | 32.17          |
| 2.5492        | 0.1   | 31   | 2.4235          | 31.31        | 31.31           | 32.18          |
| 2.3906        | 0.2   | 62   | 2.4048          | 31.32        | 31.32           | 32.18          |
| 2.2984        | 0.3   | 93   | 2.3961          | 31.31        | 31.31           | 32.18          |
| 2.4423        | 0.4   | 124  | 2.3916          | 31.31        | 31.31           | 32.18          |
| 2.4106        | 0.5   | 155  | 2.3889          | 31.3         | 31.3            | 32.18          |
| 2.526         | 0.6   | 186  | 2.3875          | 31.3         | 31.3            | 32.18          |
| 2.3574        | 0.7   | 217  | 2.3863          | 31.3         | 31.3            | 32.18          |
| 2.4005        | 0.8   | 248  | 2.3858          | 31.3         | 31.3            | 32.18          |
| 2.4227        | 0.9   | 279  | 2.3857          | 31.3         | 31.3            | 32.18          |

Framework versions:
- PEFT 0.17.1
- Transformers 4.56.1
- PyTorch 2.8.0+cu128
- Datasets 4.0.0
- Tokenizers 0.22.1
muse-marvin-stage3-lora
another-glm-train-2-epochs-Q4_K_S-GGUF
ToastyPigeon/another-glm-train-2-epochs-Q4KS-GGUF
This model was converted to GGUF format from `ToastyPigeon/another-glm-train-2-epochs` using llama.cpp via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.

Use with llama.cpp: install llama.cpp through brew (works on Mac and Linux). Note: you can also use this checkpoint directly through the usage steps listed in the llama.cpp repo.

Step 1: Clone llama.cpp from GitHub.
Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag, along with other hardware-specific flags (e.g. `LLAMA_CUDA=1` for Nvidia GPUs on Linux).
medgemma-27b-abliterated-multimodal
Llama-3-8B-Instruct-SpringDragon-V2-QLoRA
mistral-small-springdragon-qlora
funny-nemo-embedding-testing
This is a merge of pre-trained language models created using mergekit, using the Linear merge method. The following models were included in the merge:
- magmell + ToastyPigeon/nemo-kimi-lora
- tess + ToastyPigeon/nemo-books-lora

The following YAML configuration was used to produce this model:
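(The actual YAML is not reproduced above.) As a rough illustration of what a mergekit Linear merge of two model+LoRA stacks looks like, assuming equal weights; the weights and output path are hypothetical, not this card's real config:

```bash
# Hypothetical mergekit config and invocation; weights are illustrative only.
cat > linear-merge.yml <<'EOF'
merge_method: linear
models:
  - model: magmell+ToastyPigeon/nemo-kimi-lora
    parameters:
      weight: 0.5
  - model: tess+ToastyPigeon/nemo-books-lora
    parameters:
      weight: 0.5
dtype: bfloat16
EOF

mergekit-yaml linear-merge.yml ./merged-model
```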
mistral-small-dampf-qlora
Qwen3-Gutenberg-Encore-14B-Q6_K-GGUF
ToastyPigeon/Qwen3-Gutenberg-Encore-14B-Q6K-GGUF
This model was converted to GGUF format from `nbeerbower/Qwen3-Gutenberg-Encore-14B` using llama.cpp via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.

Use with llama.cpp: install llama.cpp through brew (works on Mac and Linux). Note: you can also use this checkpoint directly through the usage steps listed in the llama.cpp repo.

Step 1: Clone llama.cpp from GitHub.
Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag, along with other hardware-specific flags (e.g. `LLAMA_CUDA=1` for Nvidia GPUs on Linux).
another-strange-nemo-model-Q6_K-GGUF
nemo-books-lora-4
nemo-kimi-lora-2e-larger
middle-stage-qwen
possibly-cursed-glm-test
new-ms-rp-test-v2-ws
qwen21b-creative-Q4_K_S-GGUF
gemma3-27b-starlike-v2-Q4_K_S-GGUF
ToastyPigeon/gemma3-27b-starlike-v2-Q4KS-GGUF
This model was converted to GGUF format from `ToastyPigeon/gemma3-27b-starlike-v2` using llama.cpp via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.

Use with llama.cpp: install llama.cpp through brew (works on Mac and Linux). Note: you can also use this checkpoint directly through the usage steps listed in the llama.cpp repo.

Step 1: Clone llama.cpp from GitHub.
Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag, along with other hardware-specific flags (e.g. `LLAMA_CUDA=1` for Nvidia GPUs on Linux).
nemo-kimi-lora
Mistral-Nemo-12B-Adventure-QLoRA
QwQ-32B-Snowdrop-v0-EmbedFix
tess-books-4
nemo-instruct-books
muse-marvin
intern-rp-lora
q3-14b-completion-lora
This model is a fine-tuned version of Qwen/Qwen3-14B-Base on the ToastyPigeon/new-story-dataset, the ToastyPigeon/new-story-dataset, the ToastyPigeon/some-erotica, the ToastyPigeon/skein-text-adventures, the ToastyPigeon/SpringDragon and the ToastyPigeon/disco-chat datasets.

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 69
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 8
- total_train_batch_size: 16
- total_eval_batch_size: 2
- optimizer: OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 20
- num_epochs: 2.0

Framework versions:
- PEFT 0.15.2
- Transformers 4.51.3
- PyTorch 2.7.1+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
another-possibly-cursed-glm-checkpoint
nemo-books-lora
half-granite-marvin
- Developed by: ToastyPigeon
- License: apache-2.0
- Finetuned from model: Columbidae/granite-4.0-h-small-half-slerp

This granitemoehybrid model was trained 2x faster with Unsloth and Hugging Face's TRL library.
half-granite-marvin-Q4_K_M-GGUF
ToastyPigeon/half-granite-marvin-Q4KM-GGUF
This model was converted to GGUF format from `ToastyPigeon/half-granite-marvin` using llama.cpp via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.

Use with llama.cpp: install llama.cpp through brew (works on Mac and Linux). Note: you can also use this checkpoint directly through the usage steps listed in the llama.cpp repo.

Step 1: Clone llama.cpp from GitHub.
Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag, along with other hardware-specific flags (e.g. `LLAMA_CUDA=1` for Nvidia GPUs on Linux).
muse-marvin-stage2-lora
SpringDragon-NeMo-Instruct-QLoRA-ep1
Meta-Llama-3.1-8B-Adventure-QLoRA
qwen-story-test-qlora
Qwen2.5-14B-Instruct-1M-Unalign
A simple unalignment fine-tune on ~900k tokens aiming to make the model more compliant and willing to handle user requests. This is the same unalignment training seen in concedo/Beepo-22B, so big thanks to concedo for the dataset.
mn-12b-impersonation-city
This is a merge of pre-trained language models created using mergekit, using the Linear merge method. The following models were included in the merge:
- LatitudeGames/Muse-12B
- muse-writer
- nbeerbower/mistral-nemo-gutenberg-12B

The following YAML configuration was used to produce this model:
mn-12b-mm-tess-books
an-cleaner
ms-type1-adventure
tq14b-1m-gutenberg-sft
qwen14-creative-epoch1-Q4_K_S-GGUF
ms3-roselily-rp-v3-Q4_K_S-GGUF
g3-4b-it-creative-qlora
gemma-3-starshine-12b-continued
gemma3-27b-starlike-v3-Q4_K_S-GGUF
anti-star-maybe-stabilized-Q4_K_S-GGUF
glitterdex-1epoch-test-merged
nemo-kink-lora
nemo-books-lora-2
a-strange-nemo-model
nemo-books-lora-3
glm-books-lora-wonky
probably-broken-glm
possibly-working-glm
nemo-kimi-lora-2e
another-qwen-test-model
psyonic-cetacean-20b-v2
i-added-glitter
This is a merge of pre-trained language models created using mergekit, using the Linear merge method. The following models were included in the merge:
- ToastyPigeon/anti-starlike
- allura-org/Gemma-3-Glitter-27B

The following YAML configuration was used to produce this model:
MS-Meadowlark-Alt-22B
MS3-24B-MarbleRye
g3-12b-it-story-qlora
command-r-32b-Adventure-LoRA
supernova-medius-adventure-s-qlora
tq14-unalign-test-ws
qwen32-rp-ws
ms3-roselily-rp-Q4_K_S-GGUF
gemma-2-24b-retrained-base-adapter
g3-12b-it-unalign-epoch2-Q6_K-GGUF
g3-12b-storyteller-v0.1-epoch1-Q6_K-GGUF
g3-27b-part1-glitter-Q4_K_S-GGUF
g3-27b-merge-B-Q4_K_S-GGUF
g3-27b-beepo-mmtest-Q4_K_S-GGUF
starshine-simpo-test-1-Q6_K-GGUF
starshine-simpo-test-2-Q6_K-GGUF
starshine-simpo-test-3-Q6_K-GGUF
glm4-glimmer-v0-merged-idkifthiswillwork-Q4_K_S-GGUF
ToastyPigeon/glm4-glimmer-v0-merged-idkifthiswillwork-Q4KS-GGUF
This model was converted to GGUF format from `ToastyPigeon/glm4-glimmer-v0-merged-idkifthiswillwork` using llama.cpp via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.

Use with llama.cpp: install llama.cpp through brew (works on Mac and Linux). Note: you can also use this checkpoint directly through the usage steps listed in the llama.cpp repo.

Step 1: Clone llama.cpp from GitHub.
Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag, along with other hardware-specific flags (e.g. `LLAMA_CUDA=1` for Nvidia GPUs on Linux).
gemma-3-27b-medglitter-ero-Q4_K_S-GGUF
negative-starlike-v2-Q4_K_S-GGUF
ToastyPigeon/negative-starlike-v2-Q4KS-GGUF
This model was converted to GGUF format from `ToastyPigeon/negative-starlike-v2` using llama.cpp via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.

Use with llama.cpp: install llama.cpp through brew (works on Mac and Linux). Note: you can also use this checkpoint directly through the usage steps listed in the llama.cpp repo.

Step 1: Clone llama.cpp from GitHub.
Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag, along with other hardware-specific flags (e.g. `LLAMA_CUDA=1` for Nvidia GPUs on Linux).
sparkly-3.2-train
workspace/aibox-standalone-pool/axolotl/glitterms32-v2-ckpts

This model is a fine-tuned version of Gryphe/Codex-24B-Small-3.2 on the ToastyPigeon/cowriter-instruct, the allura-org/EU01-S2, the allenai/tulu-3-sft-personas-instruction-following, the ToastyPigeon/mixed-medical-reasoning-formatted, the ToastyPigeon/steve-and-marvin, the ToastyPigeon/new-story-dataset, the allura-org/fujin-instruct-v2, the ToastyPigeon/some-rp-extended, the ToastyPigeon/gutenberg-sft, the ToastyPigeon/SpringDragon and the ToastyPigeon/some-erotica datasets.

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 69
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 8
- total_train_batch_size: 16
- total_eval_batch_size: 2
- optimizer: OptimizerNames.PAGED_ADAMW_8BIT with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- training_steps: 10

Framework versions:
- PEFT 0.15.2
- Transformers 4.51.3
- PyTorch 2.7.0+cu128
- Datasets 3.5.1
- Tokenizers 0.21.1
another-strange-nemo-model
psyonic-cetacean-20b-v2-4.0bpw-h6-exl2
BlackMagic-7B
mistral-small-adventure-qlora
granite-3.3-8b-creative
ms-type2-rp
TQ2.5-0.5B-Summary-ep1
not-for-human-consumption
ms-rp-test-revisit-e1
qwen2.5-32b-unnamed-test-model
g2-9b-creative-16k-Q6_K-GGUF
ToastyPigeon/g2-9b-creative-16k-Q6K-GGUF
This model was converted to GGUF format from `Columbidae/g2-9b-creative-16k` using llama.cpp via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.

Use with llama.cpp: install llama.cpp through brew (works on Mac and Linux). Note: you can also use this checkpoint directly through the usage steps listed in the llama.cpp repo.

Step 1: Clone llama.cpp from GitHub.
Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag, along with other hardware-specific flags (e.g. `LLAMA_CUDA=1` for Nvidia GPUs on Linux).
gemma-2-24b-instruct-0.5ep-Q4_K_S-GGUF
ToastyPigeon/gemma-2-24b-instruct-0.5ep-Q4KS-GGUF
This model was converted to GGUF format from `Columbidae/gemma-2-24b-instruct-0.5ep` using llama.cpp via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.

Use with llama.cpp: install llama.cpp through brew (works on Mac and Linux). Note: you can also use this checkpoint directly through the usage steps listed in the llama.cpp repo.

Step 1: Clone llama.cpp from GitHub.
Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag, along with other hardware-specific flags (e.g. `LLAMA_CUDA=1` for Nvidia GPUs on Linux).
g3-12b-storyteller-v0.1-epoch1
g3-12b-multimerge-test
g3-12b-pt-inkstruct-epoch1-mm-Q6_K-GGUF
g3-12b-inkstructfetti-Q6_K-GGUF
g3-27b-part1-glitter
g3-27b-merge-A-Q4_K_S-GGUF
gemma-3-27b-experiment-storyteller
another-gemma3-abomination
gemma3-negative-starlike-Q4_K_S-GGUF
negative-starlike-v2
This is a merge of pre-trained language models created using mergekit, using the Linear merge method. The following models were included in the merge:
- ToastyPigeon/gemma3-27b-glitterlike-v2
- ToastyPigeon/negative-confetti

The following YAML configuration was used to produce this model:
anti-starlike-Q4_K_S-GGUF
ToastyPigeon/anti-starlike-Q4KS-GGUF
This model was converted to GGUF format from `ToastyPigeon/anti-starlike` using llama.cpp via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.

Use with llama.cpp: install llama.cpp through brew (works on Mac and Linux). Note: you can also use this checkpoint directly through the usage steps listed in the llama.cpp repo.

Step 1: Clone llama.cpp from GitHub.
Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag, along with other hardware-specific flags (e.g. `LLAMA_CUDA=1` for Nvidia GPUs on Linux).
qwen3-18b-completion-trained-Q6_K-GGUF
ToastyPigeon/qwen3-18b-completion-trained-Q6K-GGUF
This model was converted to GGUF format from `allura-forge/qwen3-18b-completion-trained` using llama.cpp via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.

Use with llama.cpp: install llama.cpp through brew (works on Mac and Linux). Note: you can also use this checkpoint directly through the usage steps listed in the llama.cpp repo.

Step 1: Clone llama.cpp from GitHub.
Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag, along with other hardware-specific flags (e.g. `LLAMA_CUDA=1` for Nvidia GPUs on Linux).
a-glm-train-mid-backup
40% epoch checkpoint (~40M tokens seen). Producing some interesting output, but inconsistent; a potential target for stabilizing RL. Saving this in case it gets worse later.
a-glm-train-0.7ep-backup
another-glm-train
glm-books-qlora-2-2ep
another-glm-train-2-epochs
ST-Presets-Mistral-Small
Beeper-King-22B
g3-12b-storyteller-v0.2-textonly
Gemma-3-Confetti-12B
Qwen3.5-27B-Marvin-DPO-V2
SpringDragon-NeMo-QLoRA-ep1
Captain-Adventure-32B
qwen-rp-test-h-qlora
Sto-vo-kor-12B-LoRA
g3-12b-rp-system-v0.1
gemma-3-27b-experiment-v2-merge-B
nemo-12b-instruct-creative
Qwen3-16B-A3B-MixedData
gemma3-27b-v2-confettilike
gemma3-27b-v2-starlike
This is a merge of pre-trained language models created using mergekit, using the Linear merge method. The following models were included in the merge:
- confettilike-mm
- glitterlike-mm

The following YAML configuration was used to produce this model:
gemma3-27b-glitterlike-v2
gemma3-negative-glitter
This is a merge of pre-trained language models created using mergekit, using the Linear merge method. The following models were included in the merge:
- ToastyPigeon/gemma3-27b-v2-glitterlike
- ToastyPigeon/medgemma-27b-abliterated-multimodal

The following YAML configuration was used to produce this model: