Daemontatox

16 models

Zirel 3

Zirel-3 is a specialized finetune of cerebras/GLM-4.5-Air-REAP-82B-A12B, a memory-efficient 82B-parameter Mixture-of-Experts (MoE) model compressed using the novel REAP (Router-weighted Expert Activation Pruning) technique.

The base model is a compressed variant of GLM-4.5-Air that:

- Maintains near-identical performance while being 25% lighter (compressed from 110B to 82B total parameters)
- Uses 82B total parameters (~12B activated per forward pass)
- Employs the REAP pruning method, which outperforms expert merging, especially on generative tasks
- Retains full capabilities: code generation, agentic workflows, repository-scale understanding, and function calling
- Achieves drop-in compatibility with vanilla vLLM (no custom patches required)

REAP (Router-weighted Expert Activation Pruning) is a one-shot MoE compression method that:

- Prunes low-impact experts based on router gate values and expert activation norms
- Preserves the router's independent control over the remaining experts
- Significantly outperforms expert merging on generative benchmarks (code, creative writing, math)
- Maintains 95-97% of baseline model quality even at high compression ratios

Paper: REAP the Experts: Why Pruning Prevails for One-Shot MoE Compression (Lasby et al., 2025)

This finetune was trained on a custom curated dataset designed to enhance the model's capabilities across multiple domains, including instruction following, reasoning, and domain-specific knowledge. The training process builds upon the strong foundation of the REAP-compressed GLM-4.5-Air base model.

- Total Parameters: 82B (~12B active)
- Architecture: Sparse Mixture-of-Experts (SMoE)
- Context Length: 128K tokens
- Precision: BF16/FP16 compatible
- License: MIT

vLLM provides significantly faster inference with built-in optimizations for MoE models.

Notes and limitations:

- This is a large MoE model requiring substantial compute resources
- Performance may vary based on hardware and optimization settings
- May inherit biases present in the training data
- Requires careful prompt engineering for optimal results

If you use this model, please cite both the base model and the REAP paper. This model builds upon:

- Cerebras Research for the REAP compression method and the GLM-4.5-Air-REAP base model
- The original GLM-4.5-Air by Zhipu AI
- The open-source AI community for tooling and infrastructure
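The compression figures quoted above ("25% lighter", "~12B activated per forward pass") can be checked with a line of arithmetic; a quick sketch (all numbers are taken from the card itself):

```python
# Quick check of the compression figures: 110B -> 82B total parameters,
# with ~12B activated per forward pass.
total_before, total_after, active = 110e9, 82e9, 12e9

reduction = 1 - total_after / total_before   # fraction of parameters removed
active_fraction = active / total_after       # fraction of weights used per token

print(f"reduction: {reduction:.1%}, active per pass: {active_fraction:.1%}")
```

This works out to roughly a 25.5% reduction in total parameters, consistent with the "25% lighter" claim, with only about 14.6% of the remaining weights active on any given forward pass.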

license:apache-2.0

Zirel 2

license:apache-2.0

SRA-LLM

Daemontatox/SRA-LLM is a fine-tuned variant of openai/gpt-oss-120b, the latest open-source release from OpenAI. This fine-tune transforms the base model into a sarcastic, intellectually sharp reasoning assistant, built to cut through noise, refuse nonsense, and produce concise logical analyses delivered with wit.

The project began as an exploration into refusal behavior in large language models, which often either over-comply (hallucinating) or under-comply (refusing useful answers). By embedding sarcasm and skepticism into the training signal, the model adopts a personality that is both entertaining and practically useful in pushing through unnecessary refusals. While it is not a magic bullet (refusals still occur), the results during evaluation were noticeably better than the base model, with an added benefit: the assistant is funny.

Two common failure modes motivated the project:

- Over-politeness / over-refusal: "I can't help with that…"
- Over-confidence / hallucination: confidently making things up.

This project asked: what if we train the model to be sarcastically logical, skeptical, and blunt? The hypothesis was that sarcasm and critical reasoning could loosen refusal tendencies while keeping the assistant firmly grounded in step-by-step logical analysis, offering practical conclusions with just enough bite to stay interesting.

The exact instruction prompt used to fine-tune the model encoded the reasoning framework, communication protocol, and personality traits. This prompt proved most effective during fine-tuning experiments, resulting in fewer unnecessary refusals and more consistent reasoning chains.

Example exchange:

> User: Why do people create overly complicated project plans?
>
> Assistant: So you're telling me humans prefer a spaghetti maze over a straight line because they're allergic to simplicity? Let's audit this. 1. Assumption: complexity = control. False, it usually = confusion. 2. Assumption: big plans prevent failure. Actually, they just ensure bigger failures. 3. Reality: people love performative productivity. Verdict: keep it simple or enjoy drowning in Gantt charts.

Recommended uses:

- Critical reasoning tasks: debate prep, identifying flaws in arguments, sanity checks.
- Problem decomposition: root cause analysis, decision-making breakdowns.
- Testing refusal behavior: model alignment experiments.
- Creative brainstorming with an edge: sarcastic commentary on otherwise dry analysis.

Not recommended for formal corporate settings (unless sarcasm is the culture).

Evaluation notes:

- Refusal reduction: in structured testing, the model refused ~30% fewer times than the base gpt-oss-120b.
- Reasoning quality: step-by-step analysis was more consistent than the base model, though sarcasm occasionally shortened explanations.
- User reception: human testers reported that answers felt "sharper, more honest, and entertaining."

Caveats: humor is subjective, and sarcasm may alienate some users. The model is not calibrated for high-stakes or emotionally sensitive settings.

Architecture: decoder-only transformer, 120B parameters
Tags: text-generation-inference, transformers, unsloth, gptoss

@misc{daemontatoxsrallm,
  title = {SRA-LLM: Sarcastic Reasoning Assistant},
  author = {Daemontatox},
  year = {2025},
  howpublished = {Hugging Face},
  url = {https://huggingface.co/Daemontatox/SRA-LLM}
}
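The "~30% fewer refusals" figure implies counting refusal-style outputs over a fixed test set. A minimal sketch of such a check; the marker phrases and the sample outputs below are illustrative assumptions, not the card's actual evaluation harness:

```python
# Sketch: estimate refusal rate by matching common refusal phrases.
# Both the marker list and the sample outputs are assumptions for illustration.
REFUSAL_MARKERS = ("i can't help", "i cannot help", "i won't", "as an ai")

def refusal_rate(outputs: list) -> float:
    """Fraction of outputs that look like refusals."""
    refused = sum(any(m in o.lower() for m in REFUSAL_MARKERS) for o in outputs)
    return refused / len(outputs)

base = ["I can't help with that.", "Sure: step 1...", "As an AI, I won't."]
tuned = ["So you're asking me to...? Fine. Step 1...", "Sure: step 1...", "I can't help with that."]
print(refusal_rate(base), refusal_rate(tuned))
```

Real evaluations would need a much larger prompt set and manual review, since phrase matching misses soft refusals and false-positives on quoted text.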

license:apache-2.0

Zireal-0

license:apache-2.0

Llama-Opus-Z8

llama

HydraCoder

HydraCoder is a state-of-the-art Rust-specialized coding model built on Qwen/Qwen3-Coder-30B-A3B-Instruct, designed for high-fidelity, idiomatic Rust code generation, completion, and repair. This is the strongest pure Rust model to date, specifically fine-tuned on real-world projects, crates, compiler patterns, and Rust best practices.

- Focused on Rust: trained on diverse idiomatic Rust repositories, including tokio, serde, actix, clap, and async ecosystems.
- Instruction-tuned: accepts natural instructions like "write a TCP server" or "convert this struct to JSON".
- Zero-shot capable: performs well without examples, and adapts to many Rust-specific patterns like lifetimes, Result, traits, ownership, and borrow checking.

Training configuration:

- Max sequence length: 8192
- LoRA: r = 32, alpha = 64, bias = none, dropout = 0.01
- Learning rate: 2e-4 / 2e-5, depending on your dataset
- Epochs: 2
- LR scheduler: cosine
- Weight decay: 0.05
- Warmup ratio: 0.02

Model details:

- Base model: Qwen/Qwen3-Coder-30B-A3B-Instruct
- Fine-tuned model: Daemontatox/HydraCoder
- Model type: Mixture-of-Experts (2/8 active experts)
- Parameters: ~30B (with 2 active experts, ~7.5B per step)
- Domain specialization: idiomatic Rust code
- Training tooling: Unsloth + Hugging Face TRL
- License: Apache 2.0

Example prompt: Write a simple multithreaded web server in Rust that serves "Hello, world!" to any GET request.

You can run inference using transformers and the text-generation pipeline.

Evaluation focus:

- Rust code (HumanEval / MBPP in Rust): correctly compiling and idiomatic
- Crate-specific patterns: understands macros, derive attributes, and lifetimes

Limitations:

- Trained for Rust only; not suited for general-purpose multi-language tasks.
- May hallucinate external crate names or imports if they are not in the prompt.
- Not guaranteed to pass the Rust compiler unless the prompt includes full context.

Released under the Apache 2.0 License. Free for research and commercial use with attribution.
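Since the card points at transformers' text-generation pipeline, a minimal sketch of assembling the chat input for the example prompt; the system prompt and generation settings here are assumptions, not values from the card:

```python
# Sketch: chat messages for HydraCoder via transformers' text-generation pipeline.
# The system prompt is an assumption; the user task is the card's example prompt.
def rust_messages(task: str) -> list:
    return [
        {"role": "system", "content": "You are HydraCoder, an expert Rust assistant. Reply with idiomatic, compiling Rust."},
        {"role": "user", "content": task},
    ]

messages = rust_messages(
    'Write a simple multithreaded web server in Rust that serves "Hello, world!" to any GET request.'
)

# With the model downloaded, generation would look roughly like:
#   from transformers import pipeline
#   pipe = pipeline("text-generation", model="Daemontatox/HydraCoder")
#   print(pipe(messages, max_new_tokens=1024)[0]["generated_text"])
print(messages[1]["role"])
```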

license:apache-2.0

Zeril-Hera

llama

Tiny-OR1-Rust

A lightweight Rust code assistant model for code generation, completion, and explanation.

Tiny-OR1-Rust is a specialized language model fine-tuned from Qwen3-1.7B for Rust programming tasks. Built on the efficient Qwen3 architecture, this 1.7B-parameter model provides effective code generation, completion, and explanation capabilities specifically tailored for the Rust programming language while maintaining a compact footprint.

- Model Name: Tiny-OR1-Rust
- Developer: Daemontatox
- Model Type: Code Generation / Text-to-Code
- Language: Rust
- Architecture: Qwen3-based Transformer
- Parameters: 1.7B
- Base Model: Qwen3-1.7B
- Training Dataset: Tesslate/RustDataset

Primary Use Cases

- Code Generation: generate Rust code from natural language descriptions
- Code Completion: complete partial Rust code snippets
- Code Explanation: explain Rust code functionality and concepts
- Learning Assistant: help developers learn Rust programming patterns and best practices

Intended Users

- Rust developers and learners
- Students studying systems programming
- Developers transitioning to Rust from other languages
- Code editors and IDEs integrating Rust assistance

The model was trained on the Tesslate/RustDataset, which contains:

- Rust source code from various projects
- Code documentation and comments
- Rust programming examples and tutorials
- Community-contributed Rust code snippets

The model demonstrates strong performance in:

- Generating syntactically correct Rust code
- Understanding Rust-specific concepts (ownership, borrowing, lifetimes)
- Providing contextually appropriate code completions
- Explaining Rust programming patterns

Limitations

- Domain Specificity: optimized for Rust code; may not perform well on other programming languages
- Model Size: being a "tiny" model, it may have limitations with very complex code generation tasks
- Context Length: a limited context window may affect performance on very long code sequences
- Specialized Knowledge: may not have extensive knowledge of very recent Rust features or niche crates
- The model generates code based on training data patterns and may reproduce coding practices from the dataset
- Users should review and test generated code before using it in production environments
- The model should not be used as a substitute for understanding fundamental programming concepts

For questions, issues, or contributions, please contact [your contact information or GitHub profile].

- Thanks to the Tesslate team for providing the Rust dataset
- Built upon the excellent Qwen3-1.7B foundation model by Alibaba Cloud
- Special recognition to the Rust community for their contributions to open-source Rust code

This model is part of ongoing efforts to make Rust programming more accessible through AI assistance.
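The card's advice to review and test generated code before production use can be partly automated with a compile check. A minimal sketch, assuming `rustc` is on PATH (it falls back gracefully when it is not):

```python
# Sketch: compile-check a generated Rust snippet before accepting it,
# as the card recommends. Requires rustc on PATH; returns None if missing.
import pathlib
import shutil
import subprocess
import tempfile
from typing import Optional

def compiles(rust_src: str) -> Optional[bool]:
    """True/False if rustc is available, else None."""
    if shutil.which("rustc") is None:
        return None
    with tempfile.TemporaryDirectory() as d:
        src = pathlib.Path(d) / "main.rs"
        src.write_text(rust_src)
        res = subprocess.run(
            ["rustc", "--edition=2021", "-o", str(pathlib.Path(d) / "main"), str(src)],
            capture_output=True,
        )
        return res.returncode == 0

snippet = 'fn main() { println!("Hello, Rust!"); }'
result = compiles(snippet)
print(result)
```

A fuller harness would run `cargo check` inside a scratch project so that external crate dependencies resolve, but a bare `rustc` pass already catches syntax and borrow-checker errors in self-contained snippets.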

license:apache-2.0

OR1-Behemoth

license:apache-2.0

SmolLM-EMC2

SmolLM-EMC2 is a specialized fine-tuned language model based on Hugging Face's SmolLM3-3B architecture, optimized for enhanced reasoning capabilities and computational thinking tasks. The model demonstrates improved performance in logical reasoning, mathematical problem-solving, and structured analytical tasks while maintaining the compact efficiency of the base SmolLM3 framework.

- Model Name: Daemontatox/SmolLM-EMC2
- Base Model: HuggingFaceTB/SmolLM3-3B
- Model Type: Causal Language Model (decoder-only transformer)
- Parameters: ~3 billion
- Architecture: SmolLM3 (optimized transformer architecture)
- License: Apache 2.0
- Language: English
- Developer: Daemontatox

Training Framework

- Framework: Unsloth + Hugging Face TRL
- Training Speed: 2x faster than standard fine-tuning approaches
- Fine-tuning Method: parameter-efficient fine-tuning with optimized memory usage

Training Objective

The model was fine-tuned to enhance:

- Analytical reasoning and step-by-step problem decomposition
- Mathematical and logical thinking capabilities
- Structured response generation with clear reasoning chains
- Multi-step problem-solving across diverse domains

Training Data Characteristics

- Curated datasets emphasizing reasoning patterns
- Multi-domain problem-solving examples
- Structured analytical workflows
- Mathematical and logical reasoning tasks

Primary Strengths

1. Enhanced Reasoning: superior performance on multi-step logical problems
2. Structured Analysis: clear decomposition of complex tasks into manageable components
3. Mathematical Competency: improved arithmetic and algebraic reasoning
4. Systematic Thinking: consistent application of analytical frameworks

Recommended Applications

- Educational Support: tutoring and explanation of complex concepts
- Research Assistant: hypothesis generation and analytical framework development
- Problem-Solving: multi-step reasoning in technical domains
- Code Analysis: understanding and explaining algorithmic logic (especially Rust/Python)
- Academic Writing: structured argument development and analysis

Performance Domains

- Mathematical reasoning and computation
- Logical puzzle solving
- Scientific methodology and experimental design
- Technical documentation and explanation
- Strategic planning and decision-making frameworks

Inference Requirements

- Minimum VRAM: 6GB (FP16)
- Recommended VRAM: 8GB+ for optimal performance
- CPU RAM: 8GB minimum
- Quantization Support: compatible with 4-bit and 8-bit quantization

Optimal Prompting Strategy

For best results, use structured prompts that encourage analytical thinking.

Benchmarks

- Mathematical Reasoning: improved performance on GSM8K-style problems
- Logical Reasoning: enhanced accuracy on multi-step inference tasks
- Code Understanding: superior performance on algorithmic explanation tasks
- Analytical Tasks: consistent structured reasoning across domains

Limitations

- Context Window: limited to 2048 tokens
- Domain Scope: optimized for analytical tasks; may show reduced performance on creative writing
- Computational Resources: requires adequate VRAM for optimal inference speed
- Factual Knowledge: knowledge cutoff inherited from base model training data

Intended Use

- Educational and research applications
- Analytical and problem-solving assistance
- Technical documentation and explanation
- Academic and professional development tools

Limitations and Biases

- May inherit biases from base model and fine-tuning data
- Performance varies across different cultural and linguistic contexts
- Should not replace human judgment in critical decision-making
- Requires validation of outputs in high-stakes applications

Responsible Use Guidelines

- Verify important factual claims independently
- Use as a reasoning assistant, not an authoritative source
- Consider potential biases in analytical frameworks
- Maintain human oversight in critical applications

Acknowledgements

- Base Model: Hugging Face team for SmolLM3-3B
- Training Framework: Unsloth team for optimized fine-tuning capabilities
- Infrastructure: Hugging Face Transformers and TRL libraries

Version History

- v1.0: initial release with enhanced reasoning capabilities
- Future Updates: planned improvements in context length and domain-specific performance
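The structured-prompting strategy the card recommends can be sketched as a simple template; the section headings below are illustrative assumptions, not a format the card specifies:

```python
# Sketch: a structured analytical prompt template for SmolLM-EMC2.
# The step wording is an assumption; adapt it to your task.
TEMPLATE = """Problem: {problem}

Work through this step by step:
1. Restate the problem in your own words.
2. List the known quantities and constraints.
3. Decompose the problem into sub-steps and solve each.
4. State the final answer clearly.
"""

prompt = TEMPLATE.format(
    problem="A train travels 120 km in 1.5 hours. What is its average speed?"
)
print(prompt)
```

Keeping the template well under the 2048-token context window leaves room for the model's own reasoning chain in the completion.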

license:apache-2.0

FerrisMind

Model Details

- Model name: Daemontatox/FerrisMind
- Developed by: Daemontatox
- Year released: 2025
- License: apache-2.0
- Base model: unsloth/qwen3-coder-30b-a3b-instruct
- Model type: instruction-tuned large language model for code generation, specifically designed to mimic hybrid thinking and utilize it in coding instruct models.

Model Summary

FerrisMind is a finetuned variant of Qwen3 Coder Flash, specialized for Rust programming. It was trained using GRPO in an attempt to mimic hybrid thinking and utilize it in coding instruct models. It is optimized for:

- Idiomatic Rust generation
- High-performance and memory-safe code practices
- Fast inference and completion speed
- Practical coding assistant tasks, from boilerplate scaffolding to compiler-level optimizations

Intended Use

- Rust development assistance
- Generating idiomatic and production-ready Rust code
- Accelerating prototyping and compiler-level workflows
- Educational use for learning Rust best practices

Out of Scope

- Non-code general conversation
- Unsafe or malicious code generation

Training

- Finetuned from: unsloth/qwen3-coder-30b-a3b-instruct
- Objective: specialization in Rust code generation and idiomatic best practices, mimicking hybrid thinking.
- Methods: instruction tuning with GRPO and domain-specific data

Limitations

- May generate non-compiling Rust code in complex cases

Example Usage

```rust
// Example: Async file reader in idiomatic Rust
use tokio::fs::File;
use tokio::io::{self, AsyncReadExt};

#[tokio::main]
async fn main() -> io::Result<()> {
    let mut file = File::open("example.txt").await?;
    let mut contents = String::new();
    file.read_to_string(&mut contents).await?;
    println!("File content: {}", contents);
    Ok(())
}
```

license:apache-2.0

mini-overthinker

⚠️ Highly experimental model; might not work as expected.

🧠 Daemontatox/mini-overthinker

A highly experimental attempt to fine-tune Magistral (Mistral) for enhanced staged reasoning with self-reflective thinking patterns.

- Base Model: `unsloth/magistral-small-2506`
- Fine-tuned by: `Daemontatox`
- Model Name: `Daemontatox/mini-overthinker`
- License: Apache 2.0
- Language: English
- Status: 🔬 Experimental – not intended for production use.

> This model is not designed for production. It is an experimental prototype to explore cognitive-loop-style reasoning with reflection. It may behave unpredictably, hallucinate, or fail to follow standard instruction formats. Use only for research and prototyping.

The model is trained to:

- Think in staged batches.
- Insert intermediate reasoning steps.
- Pause to self-reflect on its own outputs.
- Encourage Theory-of-Mind-like behavior via structured thinking templates.

Inspired by the SUPERTHINKER design used in `HelpingAI/Dhanishtha-2.0-SUPERTHINKER`, this model attempts a similar multi-phase thought process in a lightweight setup.

> Special thanks to the creators of `HelpingAI/Dhanishtha-2.0-SUPERTHINKER` for the dataset structure and inspiration behind this staged reasoning approach.

Known issues:

- Requires explicit token triggers (` `, ` `, etc.)
- May hallucinate or get stuck in loops.
- Behavior can degrade in zero-shot usage.
- Not benchmarked; no alignment or safety tuning applied.

Suitable for:

- Research in cognitive loops
- LLM agent architecture prototyping
- Simulating multi-phase reasoning

Not suitable for:

- Real-world deployment
- Safety-critical tasks
- Answer quality evaluation without verification

license:apache-2.0

HydraMind

license:apache-2.0

SOCAM-V1

SOCAM-V1 (Social Cognitive Agent Model – V1) is a fine-tuned large language model built on top of Qwen/Qwen3-30B-A3B-Instruct. The model is trained to function as a Cognitive State Machine, extracting cognitive chains from natural social utterances based on Theory of Mind (ToM) reasoning. This provides an interpretable representation of a user's cognitive state, supporting applications in dialogue systems, emotional support agents, and multi-agent cognitive architectures.

Training details:

- Dataset: ~45k structured samples with fields: situation, clue, thought, action, emotion
- Emotions restricted to: Love, Surprise, Joyful, Sad, Angry, Fearful
- Learning rate: 2e-4 (cosine schedule, warmup ratio 0.02)
- Hardware: H100-class GPU (8-bit quantization for feasibility)

Capabilities:

- Converts free-text utterances into structured cognitive chains.
- Outputs deterministic JSON for easy downstream parsing.

Limitations:

- The model may misclassify ambiguous emotions (e.g., Sad vs. Fearful).
- Outputs depend on the quality of the SOCAM dataset and may reflect dataset biases.
- Always validate JSON outputs before downstream use.

Intended applications:

- Multi-agent cognitive architectures (Tracker, Updater, Reviewer, Responder).
- Dialogue systems requiring interpretable cognitive reasoning.

@misc{socam2025,
  title = {SOCAM-V1: A Cognitive State Machine for Theory of Mind Reasoning},
  author = {Ammar Alnagar},
  year = {2025},
  howpublished = {\url{https://huggingface.co/Daemontatox/SOCAM-V1}}
}

Training libraries: Unsloth, TRL, Hugging Face Transformers
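Since the card says to validate JSON outputs before downstream use, here is a minimal validator sketch. The field names and the emotion set come from the card; the sample utterance and chain values are invented for illustration:

```python
# Sketch: validate a SOCAM-V1 cognitive chain before downstream use.
# Field names and the emotion set are from the card; the sample is invented.
import json

FIELDS = ("situation", "clue", "thought", "action", "emotion")
EMOTIONS = {"Love", "Surprise", "Joyful", "Sad", "Angry", "Fearful"}

def parse_chain(raw: str) -> dict:
    """Parse model output and reject chains missing fields or using unknown emotions."""
    chain = json.loads(raw)
    missing = [f for f in FIELDS if f not in chain]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if chain["emotion"] not in EMOTIONS:
        raise ValueError(f"unexpected emotion: {chain['emotion']!r}")
    return chain

raw = ('{"situation": "Friend cancelled plans", "clue": "short text message", '
       '"thought": "They may be upset with me", '
       '"action": "ask if everything is okay", "emotion": "Sad"}')
print(parse_chain(raw)["emotion"])
```

A validator like this is what makes the "deterministic JSON" output useful to downstream agents (Tracker, Updater, Reviewer, Responder): malformed chains fail fast instead of propagating.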

license:apache-2.0

Cogito-Maximus

This model, Cogito-Maximus, is a fine-tuned version of the `unsloth/qwen2.5-72b-instruct` base model, optimized for advanced text generation tasks. It leverages the power of Unsloth and Hugging Face's TRL (Transformer Reinforcement Learning) library to achieve faster training and improved performance.

Key Features

- Base Model: `unsloth/qwen2.5-72b-instruct`
- Training Acceleration: trained 2x faster using Unsloth.
- Fine-Tuning Framework: utilizes Hugging Face's TRL library.
- Optimized for Inference: ready for deployment in text-generation tasks with efficient inference capabilities.
- License: Apache-2.0

Developed by

- Author: Daemontatox
- Organization: Independent Contributor

Tags

- Text Generation Inference
- Transformers
- Unsloth
- Qwen2
- TRL

License

This model is released under the Apache-2.0 License, which allows for free use, modification, and distribution, provided the original license and copyright notice are included.

Base Model

The model is derived from `unsloth/qwen2.5-72b-instruct`, a version of the Qwen2.5-72B instruction-tuned model. The base model is optimized for efficiency using bitsandbytes (bnb) 4-bit quantization.

Training Process

- Framework: the model was fine-tuned using Unsloth, a library designed to accelerate the training of large language models.
- Acceleration: training was completed 2x faster compared to traditional methods, thanks to Unsloth's optimizations.
- Reinforcement Learning: fine-tuning incorporated techniques from Hugging Face's TRL library, enabling advanced instruction tuning and alignment with human preferences.

Primary Use Case

This model is designed for text generation tasks, including but not limited to:

- Instruction-following
- Question answering
- Content creation
- Dialogue systems

Limitations

- The model is trained primarily on English data and may not perform as well in other languages.
- While fine-tuned for instruction-following, outputs should be reviewed for accuracy and relevance in critical applications.

Installation

To use this model, ensure you have the following libraries installed:
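The card's installation list is cut off. As an assumption based on the libraries the card names elsewhere (Transformers, TRL, bitsandbytes), a minimal import check might look like:

```python
# Sketch: verify assumed dependencies are importable before loading the model.
# The package list is an assumption; the card's actual list is truncated.
import importlib.util

ASSUMED_DEPS = ["transformers", "trl", "bitsandbytes"]
missing = [pkg for pkg in ASSUMED_DEPS if importlib.util.find_spec(pkg) is None]
print("missing packages:", missing)
```

Anything reported missing would be installed with pip before attempting to load the 4-bit-quantized checkpoint.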

license:apache-2.0

Zireal-R1

license:apache-2.0