EpistemeAI

156 models • 45 total models in database

Sort by:

metatune-gpt20b-R0

Model Card We release open-weight metatune-gpt20b, fine tuned version of OpenAI's gpt-oss-20b model, this is one of the first public release recursive self improving AI. - Generates new data for itself, - Evaluates its performance, and - Adjusts its own hyperparameters based on improvement metrics. - Fine tune automaticlaly using unsloth SFT tuning techniques Use cases: - genuinely demonstrate scientific and mathematical understanding at a postdoctoral level. - coding - - Topics: Euler–Lagrange equation, vector calculus, statistical mechanics additional information Due to recursive self improvement method, there is no final model, but improved model, this is a 5th metacycle(generation) improved checkpoint model. Guardrails: - generally, please set reasoning = "high", it will usually prevent jailbreaking and prompt injection - use safety gpt oss 20b for guardrails before this model: openai/gpt-oss-safeguard-20b You can use `gpt-oss-120b` and `gpt-oss-20b` with Transformers. If you use the Transformers chat template, it will automatically apply the harmony response format. If you use `model.generate` directly, you need to apply the harmony format manually using the chat template or use our openai-harmony package. To get started, install the necessary dependencies to setup your environment: Once, setup you can proceed to run the model by running the snippet below: You can adjust the reasoning level that suits your task across three levels: Low: Fast responses for general dialogue. Medium: Balanced speed and detail. High: Deep and detailed analysis. The reasoning level can be set in the system prompts, e.g., "Reasoning: high". The gpt-oss models are excellent for: Web browsing (using built-in browsing tools) Function calling with defined schemas Agentic operations like browser tasks Both gpt-oss models can be fine-tuned for a variety of specialized use cases. This smaller model `gpt-oss-20b` can be fine-tuned on consumer hardware, whereas the larger `gpt-oss-120b` can be fine-tuned on a single H100 node. Benchmark These benchmark are current benchmark and not final benchmark, due to recursive fine tuning techniques self improves over time: hf (pretrained=EpistemeAI/metatune-gpt20b-R0,parallelize=True,dtype=bfloat16), genkwargs: (temperature=1,topp=1,maxnewtokens=1000), limit: 30.0, numfewshot: 5, batchsize: 1 | Tasks |metatune|MiniMax M1 80k|Llama 4 Maverick| |:----------------------------|:-----|:-----|:----- | |gsm8kcot |0.91 | - | - | |gpqadiamondcotnshot |0.722 |0.70 |0.67| |winigrande |0.785| - |-| |hellaswag |0.421| - |-| |arcchallenge |0.349| - |-| Thank you - OpenAI - Unsloth - Google Colab - Nvidia for A100 - Developed by: EpistemeAI - License: apache-2.0 - Finetuned from model : unsloth/gpt-oss-20b-unsloth-bnb-4bit This gptoss model was trained 2x faster with Unsloth and Huggingface's TRL library.

NaNK

license:apache-2.0

277

Dolphin-Llama-3.1-8B-orpo-v0.1-4bit-gguf

NaNK

llama

251

Episteme-gptoss-20b-RL

Summary This EpistemeAI model is based on GPT-OSS-20B and has been fine-tuned using the Unsloth RL framework to optimize inference efficiency while mitigating vulnerabilities such as reward hacking during reinforcement learning from human feedback (RLHF)–style training. The fine-tuning process emphasizes alignment robustness and efficiency, ensuring the model preserves its reasoning depth without incurring excessive computational overhead. The model to deliver 3x faster inference for gpt-oss-rl at ~ 21 tokens/s. For BF16, this model also achieves the fastest inference (~30 tokens/s) - This model were trained on OpenAI's harmony response format and should only be used with the harmony format as it will not work correctly otherwise. Permissive Apache 2.0 license: Build freely without copyleft restrictions or patent risk—ideal for experimentation, customization, and commercial deployment. Configurable reasoning effort: Easily adjust the reasoning effort (low, medium, high) based on your specific use case and latency needs. Full chain-of-thought: Gain complete access to the model’s reasoning process, facilitating easier debugging and increased trust in outputs. It’s not intended to be shown to end users. Fine-tunable: Fully customize models to your specific use case through parameter fine-tuning. Agentic capabilities: Use the models’ native capabilities for function calling, web browsing, Python code execution, and Structured Outputs. MXFP4 quantization: The models were post-trained with MXFP4 quantization of the MoE weights, making `gpt-oss-120b` run on a single 80GB GPU (like NVIDIA H100 or AMD MI300X) and the `gpt-oss-20b` model run within 16GB of memory. All evals were performed with the same MXFP4 quantization. You can use 'gpt-oss-20b-rl` with Transformers. If you use the Transformers chat template, it will automatically apply the harmony response format. If you use `model.generate` directly, you need to apply the harmony format manually using the chat template or use our openai-harmony package. To get started, install the necessary dependencies to setup your environment: Once, setup you can proceed to run the model by running the snippet below: | Tasks | Version | Filter | n-shot | Metric | Epistgptoss-20b-rl | gpt-oss-20 | DeepSeek-V3.2-Exp |GLM-4.6| |---------------------|---------|------------------|--------|------------|--------------- |------------|-------------------|-------| | gsm8k (cot) | 3 | flexible-extract | 5 | exactmatch| 0.956 | 0.78 | - | - | | gpqadiamond (cot) | 2 | flexible-extract | 5 | exactmatch| 0.8538+ | 0.666 | 0.799 | 0.829 | | mmlu | 2 | none | | acc | 0.8528+ | 0.853 | 0.85 | - | | humaneval | 1 | createtest | 0 | pass@1 | 0.8452+ | 0.73 | - | - | | mmlucollegebiology| 1 | none | 2 | acc | 1.0 | 1.0 | - | - | | mmluproxbiology | 1 | none | 5 | acc | 0.8452 | | - | - | | mmluproxcomputerscience | 1 | none | 5 | acc | 0.7851 | | - | - | | Math | | | | | | | - | - | | | | AIME2025I+II | 1 | none | 5 | acc | 0.9495 | 0.63 | 0.893 | 0.93 | - Benchmark used The Language Model Evaluation Harness Score from Artificialanalysis.ai - Developed by: EpistemeAI - License: apache-2.0 - Finetuned from model : unsloth/gpt-oss-20b-unsloth-bnb-4bit This gptoss model was trained 2x faster with Unsloth and Huggingface's TRL library.

NaNK

license:apache-2.0

230

PD_gemma-3n-E4B-v2

NaNK

—

202

VCoder-120b-1.0

NaNK

license:apache-2.0

197

metatune-gpt20b-R1.1

NaNK

license:apache-2.0

144

rsi-gpt-oss-120bv2-8bit

NaNK

license:apache-2.0

126

VibeCoder-20b-RL1_0

Summary This is an improved version using RL of the vibe-code LLM. It’s optimized to produce both natural-language and code completions directly from loosely structured, “vibe coding” prompts. Compared to earlier-generation LLMs, it has a lower prompt-engineering overhead and smoother latent-space interpolation, making it easier to guide toward usable code. The following capabilities can be leveraged: - Agentic capabilities: Use the OpenAI's gpt oss 20b models’ native capabilities for function calling, web browsing, Python code execution, and Structured Outputs. - This model were trained on our harmony response format and should only be used with the harmony format as it will not work correctly otherwise. This is a first-generation vibe-code LLM. It’s optimized to produce both natural-language and code completions directly from loosely structured, “vibe coding” prompts. Unlike earlier LLMs that demanded rigid prompt engineering, vibe-code interaction lowers the overhead: you can sketch intent, describe functionality in free-form language, or mix pseudo-code with natural text. The model interpolates smoothly in latent space, making it easier to guide toward usable and executable code. - Low Prompt-Engineering Overhead Accepts incomplete or intuitive instructions, reducing the need for explicit formatting or rigid templates. - Latent-Space Interpolation Transitions fluidly between natural-language reasoning and syntax-aware code generation. Produces semantically coherent code blocks even when the prompt is under-specified. - Multi-Domain Support Handles a broad range of programming paradigms: Python, JavaScript, C++, shell scripting, and pseudo-code scaffolding. - Context-Sensitive Completion Leverages attention mechanisms to maintain coherence across multi-turn coding sessions. - Syntax-Aware Decoding Biases output distribution toward syntactically valid tokens, improving out-of-the-box executability of code. - Probabilistic Beam & Sampling Controls Supports temperature scaling, top-k, and nucleus (top-p) sampling to modulate creativity vs. determinism. - Hybrid Text + Code Responses Generates inline explanations, design rationales, or docstrings alongside code for improved readability and maintainability. - Generate Product Requirements Documents (PRDs) - - Automatically creates detailed Product Requirements Documents (PRDs) that outline the purpose, features, user stories, technical considerations, and success metrics for new products or features. These PRDs serve as a single source of truth for product managers, engineers, and designers, ensuring alignment across teams, reducing miscommunication, and accelerating the product development lifecycle. The system can structure PRDs with sections such as problem statements, goals, assumptions, dependencies, user flows, and acceptance criteria, making them ready for direct integration into project management tools. | Tasks | Version | n-shot | Metric | VibeCoder-20b-0.02-DRL | gpt-oss-20 | Qwen 3 235B | |---------------------------|----------|--------|----------------|:---------------------------|:---------|:--------------| | gsm8kcot | 3 | 3 | exactmatch ↑ | 0.8452 | 0.78 | 0.82 | | humaneval | 1 | 0 | exactmatch ↑ | 0.933 | 0.73 | 0.92 | | mmlucollegebiology | 1 | 0 | exactmatch ↑ | 1.000 | — | — | | mmluHScomputerscience | 1 | 0 | exactmatch ↑ | 1.000 | — | — | | computersecurity | 1 | 2 | acc ↑ | 0.8528 | — | — | | collegecomputerscience | 1 | 2 | acc ↑ | 0.8528 | — | — | |gpqadiamondcotnshot | 2 | 5 |exactmatch|↑ | 0.7633 | 0.715 | — | Notes: - The `(+value)` indicates delta over baseline evaluation. - Metrics marked with `↑` denote that higher is better. - Dashes (`—`) indicate results not yet reported or evaluated. - Rapid prototyping & exploratory coding - Creative coding workflows with minimal boilerplate - Educational contexts where explanation + code matter equally - Interactive REPLs, notebooks, or editor assistants that thrive on loose natural-language input - Not tuned for production-grade formal verification. - May require post-processing or linting to ensure strict compliance with project coding standards. - Designed for “fast prototyping vibes”, not for long-horizon enterprise-scale codebases. You can use `gpt-oss-120b` and `gpt-oss-20b` with Transformers. If you use the Transformers chat template, it will automatically apply the harmony response format. If you use `model.generate` directly, you need to apply the harmony format manually using the chat template or use our openai-harmony package. To get started, install the necessary dependencies to setup your environment: Once, setup you can proceed to run the model by running the snippet below: - Developed by: EpistemeAI - License: apache-2.0 - Finetuned from model : unsloth/gpt-oss-20b-unsloth-bnb-4bit This gptoss model was trained 2x faster with Unsloth and Huggingface's TRL library. Citation If you use this dataset in your work, please cite it as:

NaNK

license:apache-2.0

107

metatune-gpt20b-R1

NaNK

license:apache-2.0

104

metatune-gpt20b

NaNK

license:apache-2.0

RSI-AI-V1.1

NaNK

license:apache-2.0

SAI-DeepCoder-14B-Preview-v1.0-GGUF

NaNK

license:mit

LexiVox

NaNK

llama

DeepCoder-14B-Preview-GGUF

NaNK

license:apache-2.0

Alpaca-Llama3.1-8B.gguf

NaNK

llama

rsi-gpt-oss-20b

NaNK

license:apache-2.0

Llama-3.2-3B-Agent007-Coder

Base model: unsloth/llama-3.2-3b-instruct-bnb-4bit. Language: English.

NaNK

llama

Iceball-Phi-3.5-mini-instruct-shareGPT

NaNK

llama

RSI-AI-V1.0

NaNK

license:apache-2.0

rsi-gpt-oss-20b-v1.1R1-16b

NaNK

license:apache-2.0

Mistral-Nemo-Instruct-12B-Philosophy-Math

Core purpose is to provide instruction in philosophy and mathematics using the base model unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit with support for the English language.

NaNK

license:apache-2.0

Fireball-Meta-Llama-3.1-8B-Instruct-Agent-0.003-128K-f16-gguf

NaNK

llama

Llama-3.2-3B-Agent007-Coder-gguf

NaNK

llama

EmbeddingsG300M-ft

—

EpistemeAI-MathLlama-3.1-8B-V0.2-gguf

NaNK

license:apache-2.0

EpistemeAI-codegemma-2-9b

NaNK

—

metatune-gpt20b-R1.09

NaNK

license:apache-2.0

Fireball-Alpaca-Llama3.1.07-8B-Philos-Math-KTO-beta

Language model designed for philosophical and mathematical inquiries.

NaNK

llama

SADeepCoder-1.5B-Preview-v1

NaNK

license:mit

VibeCoder-20B-alpha-0.001

NaNK

license:apache-2.0

gpt-oss-20b-finetune-multilanguage

- Developed by: EpistemeAI - License: apache-2.0 - Finetuned from model : unsloth/gpt-oss-20b-unsloth-bnb-4bit This gptoss model was trained 2x faster with Unsloth and Huggingface's TRL library.

NaNK

license:apache-2.0

Alpaca-CodeLlama-3.1-8B

NaNK

llama

SAI-DeepMathCoder-14B-Preview-v1.0-geopolitical-unbiased

NaNK

license:mit

Fireball-Mistral-Nemo-Base-2407-sft-v2-f16-gguf

NaNK

license:apache-2.0

Fireball-Llama-3.1-8B-Instruct-v1-16bit

NaNK

llama

MathCoder-Llama3.1-8B-8bit-gguf

NaNK

llama

Athena-gemma-2-2b-it-f16-gguf

NaNK

license:apache-2.0

Fireball-Meta-Llama-3.1-8B-Instruct-Agent-0.003-gguf

NaNK

llama

gpt-oss-20b-stem-distilled-reasoning

NaNK

license:apache-2.0

Alpaca-CodeLlama-3.1-8B-16bit-gguf

NaNK

llama

Fireball-Mistral-Nemo-Base-2407-sft-v2.1-f16-gguf

license:apache-2.0

Fireball-3.1-8B-ORPO-gguf

NaNK

llama

gpt-oss-20b-language-thinking

- Developed by: EpistemeAI - License: apache-2.0 - Finetuned from model : unsloth/gpt-oss-20b-unsloth-bnb-4bit This gptoss model was trained 2x faster with Unsloth and Huggingface's TRL library.

NaNK

license:apache-2.0

ReasoningCore-3B-R01

Text generation inference model based on EpistemeAI's ReasoningCore-3B-0.

NaNK

llama

Fireball-Llama-3.1-8B-Instruct-v1-8bit-gguf

NaNK

llama

gpt-oss-20b-2-multilingual-reasoning

Welcome to the gpt-oss series, OpenAI’s open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases. We’re releasing two flavors of these open models: - `gpt-oss-120b` — for production, general purpose, high reasoning use cases that fit into a single 80GB GPU (like NVIDIA H100 or AMD MI300X) (117B parameters with 5.1B active parameters) - `gpt-oss-20b` — for lower latency, and local or specialized use cases (21B parameters with 3.6B active parameters) Both models were trained on our harmony response format and should only be used with the harmony format as it will not work correctly otherwise. > [!NOTE] > This model card is dedicated to the smaller `gpt-oss-20b` model. Check out `gpt-oss-120b` for the larger model. Permissive Apache 2.0 license: Build freely without copyleft restrictions or patent risk—ideal for experimentation, customization, and commercial deployment. Configurable reasoning effort: Easily adjust the reasoning effort (low, medium, high) based on your specific use case and latency needs. Full chain-of-thought: Gain complete access to the model’s reasoning process, facilitating easier debugging and increased trust in outputs. It’s not intended to be shown to end users. Fine-tunable: Fully customize models to your specific use case through parameter fine-tuning. Agentic capabilities: Use the models’ native capabilities for function calling, web browsing, Python code execution, and Structured Outputs. MXFP4 quantization: The models were post-trained with MXFP4 quantization of the MoE weights, making `gpt-oss-120b` run on a single 80GB GPU (like NVIDIA H100 or AMD MI300X) and the `gpt-oss-20b` model run within 16GB of memory. All evals were performed with the same MXFP4 quantization. This model is specially trained for Multilingual Thinking. The response will be more speciailized in a given language.\ - Shows reasoning in different languaage, example: reasoning language: French You can use `gpt-oss-120b` and `gpt-oss-20b` with Transformers. If you use the Transformers chat template, it will automatically apply the harmony response format. If you use `model.generate` directly, you need to apply the harmony format manually using the chat template or use our openai-harmony package. To get started, install the necessary dependencies to setup your environment: Once, setup you can proceed to run the model by running the snippet below: Alternatively, you can run the model via `Transformers Serve` to spin up a OpenAI-compatible webserver: Learn more about how to use gpt-oss with Transformers. vLLM recommends using uv for Python dependency management. You can use vLLM to spin up an OpenAI-compatible webserver. The following command will automatically download the model and start the server. To learn about how to use this model with PyTorch and Triton, check out our reference implementations in the gpt-oss repository. If you are trying to run gpt-oss on consumer hardware, you can use Ollama by running the following commands after installing Ollama. If you are using LM Studio you can use the following commands to download. Check out our awesome list for a broader collection of gpt-oss resources and inference partners. You can download the model weights from the Hugging Face Hub directly from Hugging Face CLI: You can adjust the reasoning level that suits your task across three levels: Low: Fast responses for general dialogue. Medium: Balanced speed and detail. High: Deep and detailed analysis. The reasoning level can be set in the system prompts, e.g., "Reasoning: high". The gpt-oss models are excellent for: Web browsing (using built-in browsing tools) Function calling with defined schemas Agentic operations like browser tasks Both gpt-oss models can be fine-tuned for a variety of specialized use cases. This smaller model `gpt-oss-20b` can be fine-tuned on consumer hardware, whereas the larger `gpt-oss-120b` can be fine-tuned on a single H100 node. See our collection for all versions of gpt-oss including GGUF, 4-bit & 16-bit formats. See Unsloth Dynamic 2.0 GGUFs for our quantization benchmarks. - Read our Blog about gpt-oss support: unsloth.ai/blog/gpt-oss - View the rest of our notebooks in our docs here. - Thank you to the llama.cpp team for their work on supporting this model. We wouldn't be able to release quants without them! The F32 quant is MXFP4 upcasted to BF16 for every single layer and is unquantized. - Developed by: EpistemeAI - License: apache-2.0 - Finetuned from model : unsloth/gpt-oss-20b-unsloth-bnb-4bit This gptoss model was trained 2x faster with Unsloth and Huggingface's TRL library.

NaNK

license:apache-2.0

Fireball-MathCoder-Llama-3.1-8B-v1dpo-16bit

NaNK

llama

Fireball-Llama-3.1-8B-Intruct-v1dpo-16bit

NaNK

llama

gpt-oss-20b-unsloth-Multilingual-Thinking-H

NaNK

license:apache-2.0

gpt-oss-20b-unsloth-puzzle-V3

NaNK

license:apache-2.0

gpt-oss-20b-unsloth-puzzle-25V1

NaNK

license:apache-2.0

Reasoning-Llama-3.2-1B-Instruct-v1.3-plus

NaNK

llama

ReasoningCore-Llama-3.2-1B-r1

NaNK

llama

Fireball-Mistral-Nemo-Base-2407-V2-f16-gguf

license:apache-2.0

PD_gemma-3n-E2B

Mode card Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. This Plant Disease Gemma model is fine tuned version of the Gemma 3n E2B, it is fine-tuned with the plant disease dataset. This model specializes in the scientific analysis of plant diseases in image of plants. Most models lack accurate information on plant diseases. The purpose is to fine-tune the Gemma 3n model to specialize in scientific plant disease. - Input: - Text string, such as a question, a prompt, or a document to be summarized - Images, normalized to 256x256, 512x512, or 768x768 resolution and encoded to 256 tokens each - Audio data encoded to 6.25 tokens per second from a single channel - Total input context of 32K tokens - Output: - Generated text in response to the input, such as an answer to a question, analysis of image content, or a summary of a document - Total output length up to 32K tokens, subtracting the request input tokens Below, there are some code snippets on how to get quickly started with running the model. First, install the Transformers library. Gemma 3n is supported starting from transformers 4.53.0. Then, copy the snippet from the section that is relevant for your use case. You can initialize the model and processor for inference with `pipeline` as follows. With instruction-tuned models, you need to use chat templates to process our inputs first. Then, you can pass it to the pipeline. Model parameter: Model size: 8.39B Tensor type: BF16 Training Dataset - Dataset name: minhhungg/plant-disease-dataset - 70,295 rows - 70,295 24bit, 256x256 images of plant disease, questions and answers LoRa and Training Parameters - LoRA Adapter Parameters - r = 32, loraalpha = 32, loradropout = 0, bias = "none", randomstate = 3407 - Training Parameters - perdevicetrainbatchsize = 1, gradientaccumulationsteps = 4, gradientcheckpointing = True, gradientcheckpointingkwargs = {"usereentrant": False}, - maxgradnorm = 0.3, warmupratio = 0.03, maxsteps = 60, learningrate = 2e-4, loggingsteps = 1, savestrategy="steps", optim = "adamwtorchfused", weightdecay = 0.01, - lrschedulertype = "cosine", seed = 3407, These models have certain limitations that users should be aware of. Open generative models have a wide range of applications across various industries and domains. The following list of potential uses is not comprehensive. The purpose of this list is to provide contextual information about the possible use-cases that the model creators considered as part of model training and development. - Content Creation and Communication - Text Generation: Generate creative text formats such as poems, scripts, code, marketing copy, and email drafts. - Chatbots and Conversational AI: Power conversational interfaces for customer service, virtual assistants, or interactive applications. - Text Summarization: Generate concise summaries of a text corpus, research papers, or reports. - Image Data Extraction: Extract, interpret, and summarize visual data for text communications. - Audio Data Extraction: Transcribe spoken language, translate speech to text in other languages, and analyze sound-based data. - Research and Education - Natural Language Processing (NLP) and generative model Research: These models can serve as a foundation for researchers to experiment with generative models and NLP techniques, develop algorithms, and contribute to the advancement of the field. - Language Learning Tools: Support interactive language learning experiences, aiding in grammar correction or providing writing practice. - Knowledge Exploration: Assist researchers in exploring large bodies of data by generating summaries or answering questions about specific topics. - Training Data - The quality and diversity of the training data significantly influence the model's capabilities. Biases or gaps in the training data can lead to limitations in the model's responses. - The scope of the training dataset determines the subject areas the model can handle effectively. - Context and Task Complexity - Models are better at tasks that can be framed with clear prompts and instructions. Open-ended or highly complex tasks might be challenging. - A model's performance can be influenced by the amount of context provided (longer context generally leads to better outputs, up to a certain point). - Language Ambiguity and Nuance - Natural language is inherently complex. Models might struggle to grasp subtle nuances, sarcasm, or figurative language. - Factual Accuracy - Models generate responses based on information they learned from their training datasets, but they are not knowledge bases. They may generate incorrect or outdated factual statements. - Common Sense - Models rely on statistical patterns in language. They might lack the ability to apply common sense reasoning in certain situations. The development of generative models raises several ethical concerns. In creating an open model, we have carefully considered the following: - Bias and Fairness - Generative models trained on large-scale, real-world text and image data can reflect socio-cultural biases embedded in the training material. These models underwent careful scrutiny, input data pre-processing described and posterior evaluations reported in this card. - Misinformation and Misuse - Generative models can be misused to generate text that is false, misleading, or harmful. - Guidelines are provided for responsible use with the model, see the Responsible Generative AI Toolkit. - Transparency and Accountability: - This model card summarizes details on the models' architecture, capabilities, limitations, and evaluation processes. - A responsibly developed open model offers the opportunity to share innovation by making generative model technology accessible to developers and researchers across the AI ecosystem. - Perpetuation of biases: It's encouraged to perform continuous monitoring (using evaluation metrics, human review) and the exploration of de-biasing techniques during model training, fine-tuning, and other use cases. - Generation of harmful content: Mechanisms and guidelines for content safety are essential. Developers are encouraged to exercise caution and implement appropriate content safety safeguards based on their specific product policies and application use cases. - Misuse for malicious purposes: Technical limitations and developer and end-user education can help mitigate against malicious applications of generative models. Educational resources and reporting mechanisms for users to flag misuse are provided. Prohibited uses of Gemma models are outlined in the Gemma Prohibited Use Policy. - Privacy violations: Models were trained on data filtered for removal of certain personal information and other sensitive data. Developers are encouraged to adhere to privacy regulations with privacy-preserving techniques. special fine-tuned for vision plant disease detection and scientific solution. - Developed by: EpistemeAI - License: apache-2.0 - Finetuned from model : unsloth/gemma-3n-e2b-unsloth-bnb-4bit This gemma3n model was trained 2x faster with Unsloth and Huggingface's TRL library.

NaNK

license:cc-by-4.0

gpt-oss-20b-unsloth-Multilingual-Thinking-V2

- Developed by: EpistemeAI - License: apache-2.0 - Finetuned from model : unsloth/gpt-oss-20b-unsloth-bnb-4bit This gptoss model was trained 2x faster with Unsloth and Huggingface's TRL library.

NaNK

license:apache-2.0

ReasoningCore-Llama-3.2-3B-r1-v1_2

NaNK

llama

Fireball-Llama-3.1-8B-v1dpo-16bit-gguf

NaNK

llama

Fireball-Nemo-Base-2407-v1dpo-f16-gguf

NaNK

license:apache-2.0

Fireball-Llama-3.1-8B-Philos-Math-v0.1-f16-gguf

NaNK

llama

Fireball-Meta-Llama-3.1-8B-Instruct-Math-f16-gguf

NaNK

llama

Hercules1-8B-E4B-it

NaNK

license:apache-2.0

gpt-oss-20b-mxfp4

This is Mxfp4 version of unsloth/gpt-oss-20b-unsloth-bnb-4bit for easier fine tuning - Developed by: EpistemeAI - License: apache-2.0 - Finetuned from model : unsloth/gpt-oss-20b-unsloth-bnb-4bit This gptoss model was trained 2x faster with Unsloth and Huggingface's TRL library.

NaNK

license:apache-2.0

Fireball-12B-v1.13a-philosophers

EpistemeAI Fireball is a language model based on EpistemeAI Fireball Mistral Nemo 12B cot orcas, designed for philosophical inquiries and discussions.

NaNK

license:apache-2.0

Fireball-Mistral-Nemo-Base-2407-sft-v1-8b-gguf

NaNK

license:apache-2.0

Gemma3n-E4B-ap

NaNK

license:apache-2.0

gpt-oss-20b-mmlustem

NaNK

—

Iceball-Phi-3.5-mini-instruct-shareGPT-v1.00-f16-gguf

llama

ReasoningCore-3B-T1-V1

Text generation inference model based on EpistemeAI/ReasoningCore-3B-T1_1.

NaNK

llama

Fireball-R1-Llama-3.1-8B

Language model with capabilities in English. Licensed under Llama 3.1.

NaNK

llama

SAI-DeepMathCoder-14B-Preview-v1.0-geopolitical-unbiased-gguf

NaNK

license:apache-2.0

Hercules1-8B-E2B-it

NaNK

license:apache-2.0

PD_gemma-3n-E4B-4bit

NaNK

license:apache-2.0

gpt-oss-20b-2-mmlustem-2

Early experiment on self generated synthetic fine tuning techniques. Specialize with STEM and science for science purpose AI. Specialize with STEM and science for science purpose AI. This idea captures the need to design artificial intelligence systems that aren’t just generalists but are deeply tuned for scientific exploration and problem-solving. By focusing on science, technology, engineering, and mathematics, such AI can move beyond surface-level pattern recognition and instead tackle real challenges in physics, biology, chemistry, and mathematics with rigor. Imagine AI models that assist in discovering new materials, predicting protein folding with precision, optimizing renewable energy systems, or solving abstract mathematical conjectures. These are not applications where shallow training suffices—this requires an AI mindset that mirrors the scientific method: hypothesize, test, refine, and explain. A purpose-built science AI would act less like a chatbot and more like a laboratory collaborator, accelerating the pace of discovery while remaining grounded in evidence and reproducibility. - Developed by: Thomas YIu - Model type: GPT, gpt oss 20b - Language(s) (NLP): English and others - License: apache-2.0 - Finetuned from model [optional]: unsloth/gpt-oss-20b-unsloth-bnb-4bit - Repository: [More Information Needed] - Paper [optional]: [More Information Needed] - Demo [optional]: [More Information Needed] Specialized large language model fine-tuned for STEM (Science, Technology, Engineering, and Mathematics) domains. Improved MMLU-STEM performance by 30% through special fine-tuning of GPT-OSS-20B with a self-generated dataset containing reasoning traces and domain-specific multiple-choice questions. Direct Use - Answering science and engineering multiple-choice questions with higher accuracy. - Providing reasoning traces in mathematics and STEM domains. - Assisting as a study aid for researchers, engineers, and students in technical fields. Downstream Use (optional) - Reasoning engine for tutoring systems in physics, math, chemistry, or engineering. - Core component in scientific research assistants (hypothesis testing, summarizing papers). - Backend for exam preparation platforms and evaluation pipelines. Out-of-Scope Use - High-stakes decision-making without human verification (e.g., medical diagnoses, autonomous lab control). - Non-STEM general knowledge or commonsense tasks outside the model’s training domain. - Applications requiring ethical or social judgment. Bias, Risks, and Limitations - The model is biased toward STEM reasoning tasks and may underperform on humanities or everyday reasoning. - Risk of hallucinated precision: outputs may appear mathematically rigorous but contain subtle errors. - Users should treat results as hypotheses, not ground truth. Recommendations - Always apply human oversight in professional or research-grade applications. - For safe deployment, pair the model with verification tools (e.g., symbolic solvers, fact-checkers). Training Data - Self-generated STEM dataset (MMLU-style Q&A + reasoning traces). - Balanced coverage of physics, chemistry, biology, computer science, and mathematics. Training Procedure - Preprocessing: Tokenization, reasoning trace generation, Alpaca-style formatting. - Training regime: bf16 mixed precision - Batch size: 2 per device (gradient accumulation = 4) - Learning rate: 2e-4 with cosine scheduler - Epochs: 4 - Optimizer: AdamW 8-bit Compute - Model size: 20B parameters - Fine-tuning time: ~24 GPU-hours on 8×A100-40GB - Checkpoint size: ~40GB (smaller if LoRA adapters used) Testing Data - MMLU-STEM subset (10k+ science and engineering multiple-choice questions). Metrics - Accuracy (primary). - Reasoning consistency (qualitative). | Domain | Baseline GPT-OSS-20B | Fine-Tuned GPT-OSS-20B | Δ Improvement | |----------------|----------------------|-------------------------|---------------| | Mathematics | 52% | 69% | +17% | | Physics | 48% | 64% | +16% | | Chemistry | 50% | 66% | +16% | | Biology | 55% | 70% | +15% | | Comp. Science | 58% | 72% | +14% | | Average | 53% | 69% | +16% | Summary: Fine-tuning with STEM-specialized data produced substantial gains in domain-specific reasoning, particularly in mathematics and physics. - Hardware Type: 8× NVIDIA A100-40GB - Hours used: ~24 - Cloud Provider: [specify, e.g., AWS/GCP/Azure] - Region: [specify, e.g., us-west-2] - Carbon Emitted: Estimate ≈ XX kg CO2eq (calculated with ML Impact Calculator) Model Architecture - Decoder-only Transformer (GPT-OSS-20B). - Fine-tuned for causal LM objective with instruction-response data. Compute Infrastructure - Hardware: 8× A100-40GB GPUs (NVLink). - Software: PyTorch, Hugging Face Transformers, TRL, Unsloth. - Precision: bf16 mixed precision. - Optimizer: AdamW 8-bit. If you use this model in your research, please cite:

NaNK

—

Fireball-Meta-Llama-3.1-8B-Instruct-Agent-0.003-128K-code-ds-auto

NaNK

llama

ReasoningCore-3B-0

Language: English. License: Llama 3.2.

NaNK

llama

Reasoning-Llama-3.1-CoT-RE1-NMT-V3-ORPO

llama

OpenReason-Llama-3.2-1B-rs1

NaNK

llama

OpenReason-Llama-3.2-3B-rs1.01

NaNK

llama

Fireball-R1-Llama-3.1-8B-Medical-COT

Text generation inference model based on EpistemeAI's Fireball R1 Llama 3.1 8B.

NaNK

llama

Iceball-Phi-3.5-mini-instruct-shareGPT-f16-gguf

llama

Reasoning-Llama-3.1-CoT-RE1-NMT-V2-ORPO

Text generation inference model based on EpistemeAI's Reasoning Llama 3.1 CoT RE1 NMT V2.

NaNK

llama

Fireball-R1.1-Llama-3.1-8B

Text generation inference model based on EpistemeAI/Fireball-R1-Llama-3.1-8B.

NaNK

llama

Athena-codegemma-2-9b-v1.1a

NaNK

license:apache-2.0

Reasoning-Llama-3.1-CoT-RE1-NMT

This model is designed for reasoning tasks and utilizes datasets such as AI-MO/NuminaMath-TIR and bespokelabs/Bespoke-Stratos-17k.

llama

Fireball-Alpaca-Llama3.1.08-8B-C-R1-KTO-Relection-gguf

NaNK

llama

Reasoning-Llama-3.2-1B-Instruct-v1.3

Text generation inference model based on EpistemeAI's Reasoning Llama 3.2 1B Instruct version 1.2.

NaNK

llama

ReasoningCore-Llama-3.2-3B-r1

NaNK

llama

VibeCoder-20B-1.1

- Developed by: EpistemeAI - License: apache-2.0 - Finetuned from model : unsloth/gpt-oss-20b-unsloth-bnb-4bit This gptoss model was trained 2x faster with Unsloth and Huggingface's TRL library.

NaNK

license:apache-2.0

Athena-gemma-2-9b-it-Philos

NaNK

license:apache-2.0

Fireball-Llama-3.11-8B-v1orpo

NaNK

llama

Fireball-Llama-3.11-8B-v2orpo

NaNK

llama

Fireball-Llama-3.11-8B-v2r1orpo

NaNK

llama

MathCoder-Llama3.1-8B-cot

NaNK

llama

Athena-codegemma-2-9b

NaNK

license:apache-2.0

Athena-gemma-2-2b-it-Philos-KTO-f16-gguf

NaNK

license:apache-2.0

ReasoningCore-3B-RE1-V2

Language model for English with a license for Llama 3.2.

NaNK

llama

Hercules-Coder-E4B-it-V2

This fine-tuned model was fine-tune with code force dataset - open-r1/codeforces-cots. This will increase coding intelligence of the gemma 3n-e4b. The model will be used for coding purposes, especially for on device code editing. | Tasks |Version| Filter |n-shot| Metric | |Value | |---------|------:|-----------|-----:|---------|---|-----:| |mbpp | 1|none | 1|passat1|↑ |0.6318| - Developed by: EpistemeAI - License: apache-2.0 - Finetuned from model : unsloth/gemma-3n-e4b-it-unsloth-bnb-4bit This gemma3n model was trained 2x faster with Unsloth and Huggingface's TRL library.

NaNK

license:apache-2.0

gpt-oss-20b-stem-4bit

Welcome to the gpt-oss series, OpenAI’s open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases. We’re releasing two flavors of these open models: - `gpt-oss-20b` — for lower latency, and local or specialized use cases (21B parameters with 3.6B active parameters) Both models were trained on our harmony response format and should only be used with the harmony format as it will not work correctly otherwise. > [!NOTE] > This model card is dedicated to the smaller `gpt-oss-20b` model. Check out `gpt-oss-120b` for the larger model. Permissive Apache 2.0 license: Build freely without copyleft restrictions or patent risk—ideal for experimentation, customization, and commercial deployment. Configurable reasoning effort: Easily adjust the reasoning effort (low, medium, high) based on your specific use case and latency needs. Full chain-of-thought: Gain complete access to the model’s reasoning process, facilitating easier debugging and increased trust in outputs. It’s not intended to be shown to end users. Fine-tunable: Fully customize models to your specific use case through parameter fine-tuning. Agentic capabilities: Use the models’ native capabilities for function calling, web browsing, Python code execution, and Structured Outputs. MXFP4 quantization: The models were post-trained with MXFP4 quantization of the MoE weights, making `gpt-oss-120b` run on a single 80GB GPU (like NVIDIA H100 or AMD MI300X) and the `gpt-oss-20b` model run within 16GB of memory. All evals were performed with the same MXFP4 quantization. This model is specially trained for STEM. - This model is trainted with STEM dataset to improve on STEM You can use `gpt-oss-120b` and `gpt-oss-20b` with Transformers. If you use the Transformers chat template, it will automatically apply the harmony response format. If you use `model.generate` directly, you need to apply the harmony format manually using the chat template or use our openai-harmony package. To get started, install the necessary dependencies to setup your environment: Once, setup you can proceed to run the model by running the snippet below: Alternatively, you can run the model via `Transformers Serve` to spin up a OpenAI-compatible webserver: Learn more about how to use gpt-oss with Transformers. vLLM recommends using uv for Python dependency management. You can use vLLM to spin up an OpenAI-compatible webserver. The following command will automatically download the model and start the server. To learn about how to use this model with PyTorch and Triton, check out our reference implementations in the gpt-oss repository. If you are trying to run gpt-oss on consumer hardware, you can use Ollama by running the following commands after installing Ollama. If you are using LM Studio you can use the following commands to download. Check out our awesome list for a broader collection of gpt-oss resources and inference partners. You can download the model weights from the Hugging Face Hub directly from Hugging Face CLI: You can adjust the reasoning level that suits your task across three levels: Low: Fast responses for general dialogue. Medium: Balanced speed and detail. High: Deep and detailed analysis. The reasoning level can be set in the system prompts, e.g., "Reasoning: high". The gpt-oss models are excellent for: Web browsing (using built-in browsing tools) Function calling with defined schemas Agentic operations like browser tasks Both gpt-oss models can be fine-tuned for a variety of specialized use cases. This smaller model `gpt-oss-20b` can be fine-tuned on consumer hardware, whereas the larger `gpt-oss-120b` can be fine-tuned on a single H100 node. See our collection for all versions of gpt-oss including GGUF, 4-bit & 16-bit formats. See Unsloth Dynamic 2.0 GGUFs for our quantization benchmarks. - Read our Blog about gpt-oss support: unsloth.ai/blog/gpt-oss - View the rest of our notebooks in our docs here. - Thank you to the llama.cpp team for their work on supporting this model. We wouldn't be able to release quants without them! The F32 quant is MXFP4 upcasted to BF16 for every single layer and is unquantized. - Developed by: EpistemeAI - License: apache-2.0 - Finetuned from model : unsloth/gpt-oss-20b-unsloth-bnb-4bit This gptoss model was trained 2x faster with Unsloth and Huggingface's TRL library.

NaNK

license:apache-2.0

Fireball-3.1-8B-ORPO

NaNK

llama

EpistemeAI-codegemma-2-9b-ultra

NaNK

—

OpenCodeLlama-3.1-8B

NaNK

llama

Fireball-12B-v1.0

NaNK

license:apache-2.0

ReasoningCore-3B-T1_1

Text generation inference model based on EpistemeAI/Reasoning-Llama-3.2-3B-Math-Instruct-RE1-ORPO.

NaNK

llama

ReasoningCore-3B-RE1-V2B

Text generation inference model based on EpistemeAI/ReasoningCore-3B-RE1-V2A.

NaNK

llama

Fireball-MathCoder-Llama-3.1-8B-v1dpo-4bit

NaNK

llama

DeepThink-Phi4-gguf-all

NaNK

llama

gpt-oss-20b-unsloth-arcprize

NaNK

license:apache-2.0

gpt-oss-20b-2-mmlustem-2-mxfp4

- Developed by: [More Information Needed] - Funded by [optional]: [More Information Needed] - Shared by [optional]: [More Information Needed] - Model type: [More Information Needed] - Language(s) (NLP): [More Information Needed] - License: [More Information Needed] - Finetuned from model [optional]: [More Information Needed] - Repository: [More Information Needed] - Paper [optional]: [More Information Needed] - Demo [optional]: [More Information Needed] Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019). - Hardware Type: [More Information Needed] - Hours used: [More Information Needed] - Cloud Provider: [More Information Needed] - Compute Region: [More Information Needed] - Carbon Emitted: [More Information Needed]

NaNK

—

Iceball-Phi-3.5-mini-instruct-shareGPT-v1.0

llama

Athena-gemma-2-9b-it

NaNK

license:apache-2.0

Fireball-12B

Base model: EpistemeAI Fireball Mistral Nemo Base 2407 SFT v2.2a language: English.

NaNK

license:apache-2.0

Athena-codegemma-2-9b-v1

NaNK

license:apache-2.0

Athene-codegemma-2-7b-it-alpaca-v1.1

NaNK

license:apache-2.0

Athene-codegemma-2-7b-it-alpaca-v1.3

Base model: Athene Codegemma 2.7B IT Alpaca 1.1. Language: English.

NaNK

license:apache-2.0

ReasoningCore-3B-T1_2

NaNK

llama

ReasoningCore-3B-Instruct-r01-Reflect-Math

This is a reasoning and reflect instruction-tuned generative model in 3B size (text in/text out). Model Architecture: Llama 3.2 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) with GRPO fine tuning using unsloth, to align with human preferences for helpfulness and safety. Fine tune with Numina math dataset. Starting with `transformers >= 4.43.0` onward, you can run conversational inference using the Transformers `pipeline` abstraction or by leveraging the Auto classes with the `generate()` function. Make sure to update your transformers installation via `pip install --upgrade transformers`. - Developed by: EpistemeAI - License: apache-2.0 - Finetuned from model : EpistemeAI/ReasoningCore-3B-Instruct-r01-Reflect This llama model was trained 2x faster with Unsloth and Huggingface's TRL library.

NaNK

llama