adamo1139
Yi-34B-200K-AEZAKMI-v2
This model is licensed under Apache 2.0 and is tagged as a large language model (LLM).
Yi-34B-AEZAKMI-v1
Mistral-7B-AEZAKMI-v1
Yi-6B-200K-AEZAKMI-v2
Danube3-4b-4chan-HESOYAM-2510
yi-34b-200k-rawrr-dpo-1
Yi-34B-200K-rawrr1-LORA-DPO-experimental-r3
Mistral-7B-AEZAKMI-v2
Yi-34B-200K-AEZAKMI-RAW-1701
aya-expanse-8b-ungated
Danube3-4b-4chan-HESOYAM-2510-GGUF
magnum-v2-4b-gguf-lowctx
Poziomka-Baza
This is a merge of the checkpoints located here: adamo1139/poziomkapretrainhf

Example prompts (in Polish, since the model is Polish-only):
- `Jeśli kiedyś będziesz przejeżdzał koło Krakowa, koniecznie odwiedź naszą Pizzerię. Oferujemy`
- `Jesień tuż przed nami. To czas na odświeżenie garderoby i zakup`
- `Szkoła Podstawowa nr.1 imienia Jana Pawła II w Kutnie zorganizowała`
stable-diffusion-3.5-large-turbo-ungated
stable-diffusion-3.5-medium-ungated
danube3-4b-hesoyam-2208-gguf
danube3-4b-turtle-2208-gguf
GPT-OSS-20B-HESOYAM-1108-WIP-CHATML-GGUF
poziomka-sft-full-2025-10-20-4
Apertus-8B-Instruct-2509-ungated
stable-diffusion-3.5-large-ungated
danube3-4b-aezakmi-2408-gguf
Bielik-4.5B-v3-Instruct-ungated
Bielik 4.5B v3 Instruct BF16 checkpoint without the gating mechanism, making it easier and faster to work with.
danube3-500m-turtle-2108-gguf
Danube3-500M-4chan-archive-0709-GGUF
poziomka-lora-instruct-alpha
poziomka base trained for 1200 steps on adamo1139/temppoziomkasft at 2k ctx with LoRA rank 32
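A hypothetical sketch of the LoRA setup described above, using peft; only the rank (32), context length (2k), step count (1200) and dataset come from the blurb, while `lora_alpha` and the target modules are assumptions:

```python
# Hypothetical reconstruction of the described run: LoRA rank 32,
# 2k context, 1200 steps on adamo1139/temppoziomkasft.
from peft import LoraConfig

lora_config = LoraConfig(
    r=32,                        # stated: LoRA rank 32
    lora_alpha=32,               # assumption, not stated in the blurb
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
    task_type="CAUSAL_LM",
)
MAX_SEQ_LEN = 2048               # stated: 2k ctx
MAX_STEPS = 1200                 # stated: 1200 steps
```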
danube3-500m-hesoyam-2108-gguf
Mistral-7B-4chan-GaLore-SFT-0911-GGUF
Experimental-DeepSeek-V2-Coder-Lite-JUMP-alpha1-GGUF
Experimental-DeepSeek-Coder-V2-Lite-Jump-alpha2-GGUF
DeepSeek-V2-Lite-Chat-ARM-GGUF
DeepSeek-V2-Lite-ARM-GGUF
Bielik-Guard-0.1B-v1.0-ungated
Bielik Guard, codenamed Sójka, without the gating mechanism.
Hermes-3-Llama-3.1-8B-FP8-Dynamic
DeepSeek-R1-0528-AWQ
danube3-4b-aezakmi-toxic-2908-gguf
Mistral-7b-Magpie-Qwen2-2211-GGUF
PS_AD_O365_Mistral_superCOT_7B_03_QLoRA_GGUF
Yi-34B-200K-HESOYAM-TURTLE-0208-4CHAN
PS_AD_O365_SpicyBoros_7B_01_GGUF
Apertus-70B-Instruct-2509-ungated
Yi 34B 200K AEZAKMI RAW TOXIC XLCTX 2303
This model was renamed from adamo1139/Yi-34B-200K-AEZAKMI-XLCTX-v3 to adamo1139/Yi-34B-200K-AEZAKMI-RAW-TOXIC-XLCTX-2303 on 2024-03-30.

I am not happy with how often this model starts enumerating lists, and I plan to improve the toxic DPO dataset to fix it. Because of this, I don't think it deserves to be called AEZAKMI v3; it will just be the next testing iteration of AEZAKMI RAW TOXIC. I think I will upload one EXL2 quant before moving on to a different training run.

This is the Yi-34B 200K XLCTX base model fine-tuned on the RAWrr_v2 (DPO), AEZAKMI-3-6 (SFT) and unalignment/toxic-dpo-0.1 (DPO) datasets. Training took around 20-30 hours total on an RTX 3090 Ti; all fine-tuning was done locally. It's like airoboros but with less gptslop, no refusals and less of the typical language used by RLHFed OpenAI models, with extra spiciness. Say goodbye to "It's important to remember"!

Prompt format is standard ChatML. Don't expect it to be good at math or riddles, or to be crazy smart. My end goal with AEZAKMI is to create a cozy free chatbot. The cost of this fine-tune is about $5-$10 in electricity. The base model used for fine-tuning was the Yi-34B-200K model shared by 01.ai, the newer version that has improved long-context needle-in-a-haystack retrieval. They didn't give it a new name, and giving it numbers would mess up the AEZAKMI naming scheme by adding a second number, so I will be calling it XLCTX.

I had to lower `max_position_embeddings` in config.json and `model_max_length` for training to start, otherwise I was OOMing straight away. This attempt had both `max_position_embeddings` and `model_max_length` set to 4096, which worked perfectly fine. I then reverted this to 200000 once I was uploading it, so I think it should keep the long-context capabilities of the base model.

In my testing it seems less unhinged than adamo1139/Yi-34b-200K-AEZAKMI-RAW-TOXIC-2702 and maybe a touch less uncensored, but still very much uncensored even with the default system prompt "A chat." If you want to see the training scripts, let me know and I will upload them. LoRAs are uploaded here: adamo1139/Yi-34B-200K-AEZAKMI-XLCTX-v3-LoRA. EXL2 quants are coming soon; I think I will start by uploading a 4bpw quant in a few days.

I recommend using the ChatML format, as this was used during the fine-tune. Here's the prompt format you should use (see the sketch after the benchmark table below); you can set a different system message, and since the model was trained on the SystemChat dataset, it should respect system prompts fine. This model loves making numbered lists, to exhaustion. It's more of an assistant feel than a human feel, at least with the system prompt "A chat."

Long context wasn't tested yet; it should work fine though, so feel free to give me feedback about it.

Thanks to the unsloth and Hugging Face teams for providing the software packages used during fine-tuning. Thanks to Jon Durbin, abacusai, huggingface, sandex, NobodyExistsOnTheInternet and Nous-Research for open-sourcing the datasets I included in the AEZAKMI dataset. AEZAKMI is basically a mix of open-source datasets I found on HF, so without them this would not be possible at all.

Open LLM Leaderboard Evaluation Results

Detailed results can be found here.

| Metric                            | Value |
|-----------------------------------|------:|
| Avg.                              | 64.39 |
| AI2 Reasoning Challenge (25-Shot) | 64.85 |
| HellaSwag (10-Shot)               | 84.76 |
| MMLU (5-Shot)                     | 74.48 |
| TruthfulQA (0-shot)               | 37.14 |
| Winogrande (5-shot)               | 81.06 |
| GSM8k (5-shot)                    | 44.05 |
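A minimal sketch of the ChatML prompt format referenced above; the exact block was stripped from the card, so the system message "A chat." is taken from the card's own example and the rest is the standard ChatML layout:

```
<|im_start|>system
A chat.<|im_end|>
<|im_start|>user
{your prompt here}<|im_end|>
<|im_start|>assistant
```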
GPT-OSS-20B-HESOYAM-1108-WIP-CHATML
BasicEconomics-SpicyBoros-2.2-7B-QLORA-v0.2-GGUF
Mistral-7B-Magpie-Ultra-GaLore-0711-GGUF
Yi-6B-200K-AEZAKMI-v2-6bpw-exl2
BasicEconomics-SpicyBoros-2.2-7B-QLORA-v0.1-GGUF
BasicEconomics-SpicyBoros-2.2-7B-QLORA-v0.3-GGUF
BasicEconomics-Mistral-7B-QLORA-v0.4-GGUF
PS_AD_O365_CodeLlama_7B_05_QLoRA_GGUF
PS_AD_O365_Dolphin_7B_06_QLoRA_GGUF
Qwen2-VL-2B
We're excited to unveil Qwen2-VL, the latest iteration of our Qwen-VL model, representing nearly a year of innovation.

> [!Important]
> This is the base pretrained model of Qwen2-VL-2B without instruction tuning.

- SoTA understanding of images of various resolutions & ratios: Qwen2-VL achieves state-of-the-art performance on visual understanding benchmarks, including MathVista, DocVQA, RealWorldQA, MTVQA, etc.
- Understanding videos of 20min+: Qwen2-VL can understand videos over 20 minutes long for high-quality video-based question answering, dialog, content creation, etc.
- Agent that can operate your mobiles, robots, etc.: with its abilities of complex reasoning and decision making, Qwen2-VL can be integrated with devices like mobile phones, robots, etc., for automatic operation based on the visual environment and text instructions.
- Multilingual support: to serve global users, besides English and Chinese, Qwen2-VL now supports understanding texts in different languages inside images, including most European languages, Japanese, Korean, Arabic, Vietnamese, etc.
- Naive Dynamic Resolution: unlike before, Qwen2-VL can handle arbitrary image resolutions, mapping them into a dynamic number of visual tokens, offering a more human-like visual processing experience.
- Multimodal Rotary Position Embedding (M-ROPE): decomposes the positional embedding into parts to capture 1D textual, 2D visual, and 3D video positional information, enhancing its multimodal processing capabilities.

We have three models with 2, 7 and 72 billion parameters. This repo contains the pretrained 2B Qwen2-VL model. The code for Qwen2-VL has been merged into the latest Hugging Face `transformers`, and we advise you to install the latest version with `pip install -U transformers`; otherwise you might encounter a `KeyError: 'qwen2_vl'` error. If you find our work helpful, feel free to give us a cite.
Yi-34B-200K-AEZAKMI-RAW-2301
Mistral-Small-24B-Instruct-2501-ungated
EuroLLM 9B Instruct Ungated
PS_AD_O365_Mistral_7B_02_GGUF
Yi-1.5-9B-base-mirror
DeepSeek-V2.5-1210-AWQ
Qwen-Image-Edit-fused-anime-lightning-8steps
Qwen Image Edit with the flymyAI Anime LoRA (made for non-Edit Qwen Image) fused in, alongside the Lightning 8-step LoRA, also fused in.
poziomka-malutka
Poziomka-malutka is a language model trained exclusively on Polish. The model has seen 5 billion tokens and was trained from scratch using Megatron-LM. It uses the BailingV2MoE architecture, with 128 experts, 2 of which are active for each token. It is a `base`-type model, so it does not support a chat template.
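A minimal sketch of using a base (non-chat) model via raw Polish-prompt continuation; the repo id `adamo1139/poziomka-malutka` and the need for `trust_remote_code` (the BailingV2MoE architecture may not be in mainline transformers) are assumptions, not confirmed by the card:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id for this checkpoint.
repo = "adamo1139/poziomka-malutka"
tok = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True, device_map="auto")

# Base model: plain continuation of a Polish prompt, no chat template.
inputs = tok("Jesień tuż przed nami. To czas na", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=50)
print(tok.decode(out[0], skip_special_tokens=True))
```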
poziomka_5_2309_4_iter35200-alpha
The model is still training as of 26 September; this is an intermediate checkpoint, with 75.4% of the run already completed.
Qwen2-VL-7B-Sydney
Yi-34b-200K-AEZAKMI-RAW-TOXIC-2702
Yi-6B-200K-AEZAKMI-v2-LoRA
Yi-34B-200K-rawrr1-LORA-DPO-experimental-r2
yi-34b-200k-aezakmi-v2-rawrr-v1-run1-experimental-LoRA
Yi-6B-200K-rawrr1-run2-LORA-DPO-experimental
Yi-1.5-34B-32K-uninstruct1-1106
Yi-1.5-34B-32K-rebased-1406
Experimental-Yi-Coder-9B-JUMP-0509-alpha
Yi-9K-200K-AEZKAMI-RAW-TOXIC-GGUF
DeepSeek-R1-Distill-Qwen-1.5B-5bpw-exl2
Apertus-70B-2509-ungated
szypulka4_15_09_2025_test
yi-34b-200k-rawrr-dpo-2
Yi-34B-200K-AEZAKMI-v2-exl2-4.65bpw
Yi-34B-200K-AEZAKMI-v2-LoRA
Yi-34B-AEZAKMI-v1-exl2-4.65bpw
Yi-6B-200K-AEZAKMI-v2-rawrr1-DPO-LoRA
Yi-34B-200K-AEZAKMI-RAW-2901
Yi-34B-200K-AEZAKMI-RAW-2901-4-65bpw-EXL2
Yi-34B-200K-XLCTX-AEZAKMI-RAW-2904
Yi-34B-200K-HESOYAM-0905
Yi-1.5-34B-base-mirror
Yi-34B-200K-HESOYAM-2206
Yi-34B-200K-HESOYAM-rawrr_stage2-2306
yi-34b-200k-uninstruct1-3007
Experimental-DeepSeek-V2-Coder-Lite-JUMP-alpha1-LORA
OpenHermes-2.5-Mistral-7B-FP8-Dynamic
Hermes-3-Llama-3.1-8B-Static-FP8-KV
aya-expanse-32b-ungated
DeepSeek-R1-Distill-Qwen-1.5B-6bpw-exl2
DeepSeek-R1-Zero-AWQ
This is a 4-bit AWQ quantization of the DeepSeek-R1-Zero 671B model, suitable for GPU nodes like 8xA100/8xH20/8xH100 with vLLM and SGLang. You can run this model on 8x H100 80GB using vLLM with `vllm serve adamo1139/DeepSeek-R1-Zero-AWQ --tensor-parallel-size 8`.
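Once the server is up, vLLM exposes an OpenAI-compatible API; a minimal sketch of querying it, assuming vLLM's default port 8000 (the prompt is illustrative):

```python
# Minimal sketch: query the vLLM OpenAI-compatible endpoint started above.
# Base URL assumes vLLM's default port 8000; adjust if you pass --port.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.completions.create(
    model="adamo1139/DeepSeek-R1-Zero-AWQ",
    prompt="Explain AWQ quantization in two sentences.",
    max_tokens=128,
)
print(resp.choices[0].text)
```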
Apertus-8B-2509-ungated
szypulka3_14_09_2025_test
An MoE model based on the Ling V2 MoE arch, with 32 experts, 2 activated, no shared experts and no dense layers, trained on about 100M tokens of the FineWeb-2 pol-Latn split, using the tokenizer taken from EuroLLM 1.7B. This is just a test to validate the pipeline before training a bigger 4B 256-expert model on 100B tokens.
szypulka5_15_09_2025_test
szypulka6_15_09_2025_test
Stable Diffusion 3 Medium Ungated
Same as the official repo, all hashes match. Just ungated. You can also download via torrent.

Stable Diffusion 3 Medium is a Multimodal Diffusion Transformer (MMDiT) text-to-image model that features greatly improved performance in image quality, typography, complex prompt understanding, and resource-efficiency. For more technical details, please refer to the Research paper.

Please note: this model is released under the Stability Non-Commercial Research Community License. For a Creator License or an Enterprise License visit Stability.ai or contact us for commercial licensing details.

- Developed by: Stability AI
- Model type: MMDiT text-to-image generative model
- Model Description: This is a model that can be used to generate images based on text prompts. It is a Multimodal Diffusion Transformer (https://arxiv.org/abs/2403.03206) that uses three fixed, pretrained text encoders (OpenCLIP-ViT/G, CLIP-ViT/L and T5-xxl).
- Non-commercial Use: Stable Diffusion 3 Medium is released under the Stability AI Non-Commercial Research Community License. The model is free to use for non-commercial purposes such as academic research.
- Commercial Use: This model is not available for commercial use without a separate commercial license from Stability. We encourage professional artists, designers, and creators to use our Creator License. Please visit https://stability.ai/license to learn more.

For local or self-hosted use, we recommend ComfyUI for inference. Stable Diffusion 3 Medium is available on our Stability API Platform. Stable Diffusion 3 models and workflows are available on Stable Assistant and on Discord via Stable Artisan.

- ComfyUI: https://github.com/comfyanonymous/ComfyUI
- StableSwarmUI: https://github.com/Stability-AI/StableSwarmUI
- Tech report: https://stability.ai/news/stable-diffusion-3-research-paper
- Demo: Huggingface Space is coming soon...

We used synthetic data and filtered publicly available data to train our models. The model was pre-trained on 1 billion images. The fine-tuning data includes 30M high-quality aesthetic images focused on specific visual content and style, as well as 3M preference data images.

We have prepared three packaging variants of the SD3 Medium model, each equipped with the same set of MMDiT & VAE weights, for user convenience (a loading sketch follows at the end of this card):

- `sd3_medium.safetensors` includes the MMDiT and VAE weights but does not include any text encoders.
- `sd3_medium_incl_clips_t5xxlfp8.safetensors` contains all necessary weights, including an fp8 version of the T5XXL text encoder, offering a balance between quality and resource requirements.
- `sd3_medium_incl_clips.safetensors` includes all necessary weights except for the T5XXL text encoder. It requires minimal resources, but the model's performance will differ without the T5XXL text encoder.

The `text_encoders` folder contains the three text encoders and their original model card links for user convenience. All components within the `text_encoders` folder (and their equivalents embedded in other packings) are subject to their respective original licenses. The `example_workflows` folder contains example comfy workflows.

Intended uses include the following: generation of artworks and use in design and other artistic processes; applications in educational or creative tools; research on generative models, including understanding the limitations of generative models. All uses of the model should be in accordance with our Acceptable Use Policy.

The model was not trained to be factual or true representations of people or events.
As such, using the model to generate such content is out-of-scope of the abilities of this model.

As part of our safety-by-design and responsible AI deployment approach, we implement safety measures throughout the development of our models, from the time we begin pre-training a model to the ongoing development, fine-tuning, and deployment of each model. We have implemented a number of safety mitigations that are intended to reduce the risk of severe harms; however, we recommend that developers conduct their own testing and apply additional mitigations based on their specific use cases. For more about our approach to Safety, please visit our Safety page.

Our evaluation methods include structured evaluations and internal and external red-teaming testing for specific, severe harms such as child sexual abuse and exploitation, extreme violence and gore, sexually explicit content, and non-consensual nudity. Testing was conducted primarily in English and may not cover all possible harms. As with any model, the model may, at times, produce inaccurate, biased or objectionable responses to user prompts.

- Harmful content: We have used filtered data sets when training our models and implemented safeguards that attempt to strike the right balance between usefulness and preventing harm. However, this does not guarantee that all possible harmful content has been removed. The model may, at times, generate toxic or biased content. All developers and deployers should exercise caution and implement content safety guardrails based on their specific product policies and application use cases.
- Misuse: Technical limitations and developer and end-user education can help mitigate against malicious applications of models. All users are required to adhere to our Acceptable Use Policy, including when applying fine-tuning and prompt engineering mechanisms. Please reference the Stability AI Acceptable Use Policy for information on violative uses of our products.
- Privacy violations: Developers and deployers are encouraged to adhere to privacy regulations with techniques that respect data privacy.

Please report any issues with the model or contact us:

- Safety issues: [email protected]
- Security issues: [email protected]
- Privacy issues: [email protected]
- License and general: https://stability.ai/license
- Enterprise license: https://stability.ai/enterprise
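The loading sketch referenced above: a hedged example of loading the all-in-one packaging via diffusers' single-file support, under the assumptions that you have a recent diffusers with `StableDiffusion3Pipeline.from_single_file` and have downloaded `sd3_medium_incl_clips_t5xxlfp8.safetensors` from this repo (the local path below is illustrative):

```python
# Minimal sketch, not an official example: load the all-in-one SD3 Medium
# packaging (MMDiT + VAE + CLIP encoders + fp8 T5XXL) with diffusers'
# single-file loader, then generate one image.
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_single_file(
    "sd3_medium_incl_clips_t5xxlfp8.safetensors",  # illustrative local path
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    "a photo of a cat holding a sign that says hello world",
    num_inference_steps=28,
    guidance_scale=7.0,
).images[0]
image.save("sd3_hello_world.png")
```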