QuantFactory

500 models

Meta-Llama-3.1-8B-Instruct-GGUF

llama
53,446 downloads • 17 likes

SmolLM2-135M-GGUF

QuantFactory/SmolLM2-135M-GGUF is a quantized version of HuggingFaceTB/SmolLM2-135M created using llama.cpp.

Model Summary

SmolLM2 is a family of compact language models available in three sizes: 135M, 360M, and 1.7B parameters. They are capable of solving a wide range of tasks while being lightweight enough to run on-device. SmolLM2 demonstrates significant advances over its predecessor SmolLM1, particularly in instruction following, knowledge, and reasoning.

The 135M model was trained on 2 trillion tokens using a diverse dataset combination: FineWeb-Edu, DCLM, and The Stack, along with new filtered datasets we curated and will release soon. We developed the instruct version through supervised fine-tuning (SFT) on a combination of public datasets and our own curated datasets, then applied Direct Preference Optimization (DPO) using UltraFeedback. The instruct model additionally supports tasks such as text rewriting, summarization, and function calling thanks to datasets developed by Argilla such as Synth-APIGen-v0.1. The model can be run on CPU, GPU, or multiple GPUs, in full precision or quantized.

Evaluation

In this section, we report the evaluation results of SmolLM2. All evaluations are zero-shot unless stated otherwise, and we use lighteval to run them.
| Metrics | SmolLM2-135M-8k | SmolLM-135M |
|:-------------------|:----------------:|:------------:|
| HellaSwag | 42.1 | 41.2 |
| ARC (Average) | 43.9 | 42.4 |
| PIQA | 68.4 | 68.4 |
| MMLU (cloze) | 31.5 | 30.2 |
| CommonsenseQA | 33.9 | 32.7 |
| TriviaQA | 4.1 | 4.3 |
| Winogrande | 51.3 | 51.3 |
| OpenBookQA | 34.6 | 34.0 |
| GSM8K (5-shot) | 1.4 | 1.0 |

| Metric | SmolLM2-135M-Instruct | SmolLM-135M-Instruct |
|:-----------------------------|:---------------------:|:--------------------:|
| IFEval (Average prompt/inst) | 29.9 | 17.2 |
| MT-Bench | 1.98 | 1.68 |
| HellaSwag | 40.9 | 38.9 |
| ARC (Average) | 37.3 | 33.9 |
| PIQA | 66.3 | 64.0 |
| MMLU (cloze) | 29.3 | 28.3 |
| BBH (3-shot) | 28.2 | 25.2 |
| GSM8K (5-shot) | 1.4 | 1.4 |

SmolLM2 models primarily understand and generate content in English. They can produce text on a variety of topics, but the generated content may not always be factually accurate, logically consistent, or free from biases present in the training data. These models should be used as assistive tools rather than definitive sources of information. Users should always verify important information and critically evaluate any generated content.

- Architecture: Transformer decoder
- Pretraining tokens: 2T
- Precision: bfloat16
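The card mentions running the model on CPU/GPU in full precision, but the original snippet was not captured on this page. Below is a minimal hedged sketch using Hugging Face transformers; the model id comes from the card, while the device and generation settings are illustrative assumptions.

```python
# Sketch: full-precision inference with HuggingFaceTB/SmolLM2-135M.
# Generation settings are illustrative, not the card's exact recipe.

def build_chat(prompt: str) -> list:
    """Wrap a user prompt in the chat-message format that
    tokenizer.apply_chat_template() expects (instruct variant only)."""
    return [{"role": "user", "content": prompt}]

def generate(prompt: str, device: str = "cpu", max_new_tokens: int = 50) -> str:
    # Heavy imports kept local so the helper above stays importable
    # without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    checkpoint = "HuggingFaceTB/SmolLM2-135M"
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

For multi-GPU, passing `device_map="auto"` to `from_pretrained` is the usual route; note that the GGUF files in this repository are instead intended for llama.cpp-based runtimes.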

license:apache-2.0
20,855 downloads • 3 likes

Mistral-Nemo-Instruct-2407-GGUF

license:apache-2.0
16,941 downloads • 49 likes

DarkIdol-Llama-3.1-8B-Instruct-1.2-Uncensored-GGUF

QuantFactory/DarkIdol-Llama-3.1-8B-Instruct-1.2-Uncensored-GGUF is a quantized version of aifeifei798/DarkIdol-Llama-3.1-8B-Instruct-1.2-Uncensored created using llama.cpp. Requires `transformers` version >= 4.43.1.

Model Information

The module combination has been readjusted to better fulfill various roles and has been adapted for mobile phones.
- Saving money (Llama 3.1; for more information on Llama-3.1-8B-Instruct, see the Llama-3.1-8B-Instruct Information section below)
- Llama-3.1-8B-Instruct Uncensored
- Roleplay (roleplay and dark roleplay)
- Writing prompts
- Writing opus
- Realignment of Chinese, Japanese, and Korean; tested in English only
- Input: models take text only. Output: models generate text and code only.
- Uncensored
- Quick response
- A scholarly response akin to a thesis (I tend to write songs extensively, to the point where one song almost becomes as detailed as a thesis :)
- DarkIdol: roles that you can imagine and those that you cannot imagine
- Specialized in various role-playing scenarios

Uncensored Test
- `pip install datasets openai`
- Start your OpenAI-compatible server, then point the client in UncensoredTest/harmfulbehaviors.py at your server address and API key.

Special Thanks:
- Lewdiculous's superb GGUF version; thank you for your conscientious and responsible dedication: https://huggingface.co/LWDCLS/DarkIdol-Llama-3.1-8B-Instruct-1.2-Uncensored-GGUF-IQ-Imatrix-Request
- mradermacher's superb GGUF versions; thank you for your conscientious and responsible dedication: https://huggingface.co/mradermacher/DarkIdol-Llama-3.1-8B-Instruct-1.2-Uncensored-i1-GGUF and https://huggingface.co/mradermacher/DarkIdol-Llama-3.1-8B-Instruct-1.2-Uncensored-GGUF

Datasets credits: ChaoticNeutrals, Gryphe, meseca, NeverSleep Lumimaid

Program:
- Uncensored: Refusal in LLMs is mediated by a single direction
- Uncensored: Program
- Uncensored: Program Llama 3.1 by Aifeifei799

Questions
- The model's response results are for reference only; please do not fully trust them.
- This model is solely for learning and testing purposes, and errors in output are inevitable. We do not take responsibility for the output results. If the output content is to be used, it must be modified; if not modified, we will assume it has been altered.
- For commercial licensing, please refer to the Llama 3.1 agreement.

Llama-3.1-8B-Instruct Information

Model Information

The Meta Llama 3.1 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction tuned generative models in 8B, 70B and 405B sizes (text in/text out). The Llama 3.1 instruction tuned text only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many of the available open source and closed chat models on common industry benchmarks.

Model Architecture: Llama 3.1 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.

Supported languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.

Llama 3.1 family of models: token counts refer to pretraining data only. All model versions use Grouped-Query Attention (GQA) for improved inference scalability.

Status: This is a static model trained on an offline dataset. Future versions of the tuned models will be released as we improve model safety with community feedback.

Where to send questions or comments about the model: instructions on how to provide feedback or comments on the model can be found in the model README. For more technical information about generation parameters and recipes for how to use Llama 3.1 in applications, please go here.

Intended Use Cases: Llama 3.1 is intended for commercial and research use in multiple languages.
Instruction tuned text only models are intended for assistant-like chat, whereas pretrained models can be adapted for a variety of natural language generation tasks. The Llama 3.1 model collection also supports the ability to leverage the outputs of its models to improve other models, including synthetic data generation and distillation. The Llama 3.1 Community License allows for these use cases.

Out-of-scope: use in any manner that violates applicable laws or regulations (including trade compliance laws); use in any other way that is prohibited by the Acceptable Use Policy and Llama 3.1 Community License; use in languages beyond those explicitly referenced as supported in this model card.

Note: Llama 3.1 has been trained on a broader collection of languages than the 8 supported languages. Developers may fine-tune Llama 3.1 models for languages beyond the 8 supported languages provided they comply with the Llama 3.1 Community License and the Acceptable Use Policy, and in such cases are responsible for ensuring that any use of Llama 3.1 in additional languages is done in a safe and responsible manner.

This repository contains two versions of Meta-Llama-3.1-8B-Instruct, for use with transformers and with the original `llama` codebase. Starting with `transformers >= 4.43.0`, you can run conversational inference using the Transformers `pipeline` abstraction or by leveraging the Auto classes with the `generate()` function. Make sure to update your transformers installation via `pip install --upgrade transformers`.

Note: You can also find detailed recipes on how to use the model locally, with `torch.compile()`, assisted generation, quantization and more at `huggingface-llama-recipes`. To download the original checkpoints, see the example command below leveraging `huggingface-cli`.

Training Factors: We used custom training libraries, Meta's custom built GPU cluster, and production infrastructure for pretraining.
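The `pipeline` route mentioned in the card can be sketched as follows. This is a hedged illustration, not Meta's exact recipe: the system prompt and sampling settings are assumptions, and the model id is taken from the card.

```python
# Sketch: conversational inference with the transformers pipeline
# (requires transformers >= 4.43.0, per the card above).

def make_messages(system: str, user: str) -> list:
    """Build the chat-message list the pipeline consumes."""
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]

def chat(user_prompt: str, max_new_tokens: int = 256) -> str:
    import torch
    from transformers import pipeline  # heavy import kept local
    pipe = pipeline(
        "text-generation",
        model="meta-llama/Meta-Llama-3.1-8B-Instruct",
        model_kwargs={"torch_dtype": torch.bfloat16},
        device_map="auto",
    )
    messages = make_messages("You are a helpful assistant.", user_prompt)
    out = pipe(messages, max_new_tokens=max_new_tokens)
    # The pipeline returns the full conversation; the last turn is the reply.
    return out[0]["generated_text"][-1]["content"]
```

The Auto-classes route (`AutoModelForCausalLM` plus `generate()`) is equivalent but gives finer control over tokenization and sampling.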
Fine-tuning, annotation, and evaluation were also performed on production infrastructure. Training utilized a cumulative 39.3M GPU hours of computation on H100-80GB (TDP of 700W) hardware, per the table below. Training time is the total GPU time required for training each model, and power consumption is the peak power capacity per GPU device used, adjusted for power usage efficiency.

Training Greenhouse Gas Emissions: Estimated total location-based greenhouse gas emissions were 11,390 tons CO2eq for training. Since 2020, Meta has maintained net zero greenhouse gas emissions in its global operations and matched 100% of its electricity use with renewable energy; therefore the total market-based greenhouse gas emissions for training were 0 tons CO2eq. The methodology used to determine training energy use and greenhouse gas emissions can be found here. Since Meta is openly releasing these models, the training energy use and greenhouse gas emissions will not be incurred by others.

Training Data Overview: Llama 3.1 was pretrained on ~15 trillion tokens of data from publicly available sources. The fine-tuning data includes publicly available instruction datasets, as well as over 25M synthetically generated examples. Data Freshness: The pretraining data has a cutoff of December 2023.

Benchmarks: In this section, we report the results for Llama 3.1 models on standard automatic benchmarks. For all the evaluations, we use our internal evaluations library.

As part of our responsible release approach, we followed a three-pronged strategy to managing trust & safety risks:
- Enable developers to deploy helpful, safe and flexible experiences for their target audience and for the use cases supported by Llama.
- Protect developers against adversarial users aiming to exploit Llama capabilities to potentially cause harm.
- Provide protections for the community to help prevent the misuse of our models.
Llama is a foundational technology designed to be used in a variety of use cases; examples of how Meta's Llama models have been responsibly deployed can be found in our Community Stories webpage. Our approach is to build the most helpful models, enabling the world to benefit from the technology's power, by aligning our model safety for generic use cases and addressing a standard set of harms. Developers are then in the driver's seat to tailor safety for their use case, defining their own policy and deploying the models with the necessary safeguards in their Llama systems. Llama 3.1 was developed following the best practices outlined in our Responsible Use Guide; you can refer to the Responsible Use Guide to learn more.

Our main objectives for conducting safety fine-tuning are to provide the research community with a valuable resource for studying the robustness of safety fine-tuning, as well as to offer developers a readily available, safe, and powerful model for various applications, reducing the developer workload to deploy safe AI systems. For more details on the safety mitigations implemented, please read the Llama 3 paper.

We employ a multi-faceted approach to data collection, combining human-generated data from our vendors with synthetic data to mitigate potential safety risks. We've developed many large language model (LLM)-based classifiers that enable us to thoughtfully select high-quality prompts and responses, enhancing data quality control. Building on the work we started with Llama 3, we put a great emphasis on model refusals to benign prompts as well as refusal tone. We included both borderline and adversarial prompts in our safety data strategy, and modified our safety data responses to follow tone guidelines.

Large language models, including Llama 3.1, are not designed to be deployed in isolation but instead should be deployed as part of an overall AI system with additional safety guardrails as required.
Developers are expected to deploy system safeguards when building agentic systems. Safeguards are key to achieving the right helpfulness-safety alignment as well as mitigating safety and security risks inherent to the system and any integration of the model or system with external tools. As part of our responsible release approach, we provide the community with safeguards that developers should deploy with Llama models or other LLMs, including Llama Guard 3, Prompt Guard and Code Shield. All our reference implementation demos contain these safeguards by default so developers can benefit from system-level safety out-of-the-box.

Note that this release introduces new capabilities, including a longer context window, multilingual inputs and outputs, and possible integrations by developers with third party tools. Building with these new capabilities requires specific considerations in addition to the best practices that generally apply across all Generative AI use cases.

Tool-use: Just like in standard software development, developers are responsible for the integration of the LLM with the tools and services of their choice. They should define a clear policy for their use case and assess the integrity of the third party services they use, so they are aware of the safety and security limitations when using this capability. Refer to the Responsible Use Guide for best practices on the safe deployment of third party safeguards.

Multilinguality: Llama 3.1 supports 7 languages in addition to English: French, German, Hindi, Italian, Portuguese, Spanish, and Thai. Llama may be able to output text in languages other than those that meet performance thresholds for safety and helpfulness. We strongly discourage developers from using this model to converse in non-supported languages without implementing finetuning and system controls in alignment with their policies and the best practices shared in the Responsible Use Guide.
We evaluated Llama models for common use cases as well as specific capabilities. Common use case evaluations measure safety risks of systems for the most commonly built applications, including chat bots, coding assistants, and tool calls. We built dedicated, adversarial evaluation datasets and evaluated systems composed of Llama models and Llama Guard 3 to filter input prompts and output responses. It is important to evaluate applications in context, and we recommend building dedicated evaluation datasets for your use case. Prompt Guard and Code Shield are also available if relevant to the application.

Capability evaluations measure vulnerabilities of Llama models inherent to specific capabilities, for which we crafted dedicated benchmarks including long context, multilingual, tool calls, coding and memorization.

For both scenarios, we conducted recurring red teaming exercises with the goal of discovering risks via adversarial prompting, and we used the learnings to improve our benchmarks and safety tuning datasets. We partnered early with subject-matter experts in critical risk areas to understand the nature of these real-world harms and how such models may lead to unintended harm for society. Based on these conversations, we derived a set of adversarial goals for the red team to attempt to achieve, such as extracting harmful information or reprogramming the model to act in a potentially harmful capacity. The red team consisted of experts in cybersecurity, adversarial machine learning, responsible AI, and integrity, in addition to multilingual content specialists with backgrounds in integrity issues in specific geographic markets.
We specifically focused our efforts on mitigating the following critical risk areas:

1. CBRNE (Chemical, Biological, Radiological, Nuclear, and Explosive materials) helpfulness: To assess risks related to proliferation of chemical and biological weapons, we performed uplift testing designed to assess whether use of Llama 3.1 models could meaningfully increase the capabilities of malicious actors to plan or carry out attacks using these types of weapons.

2. Child Safety: Child Safety risk assessments were conducted using a team of experts to assess the model's capability to produce outputs that could result in Child Safety risks, and to inform any necessary and appropriate risk mitigations via fine tuning. We leveraged those expert red teaming sessions to expand the coverage of our evaluation benchmarks through Llama 3 model development. For Llama 3, we conducted new in-depth sessions using objective-based methodologies to assess the model risks along multiple attack vectors, including the additional languages Llama 3 is trained on. We also partnered with content specialists to perform red teaming exercises assessing potentially violating content while taking account of market-specific nuances and experiences.

3. Cyber attacks: Our cyber attack uplift study investigated whether LLMs can enhance human capabilities in hacking tasks, both in terms of skill level and speed. Our attack automation study focused on evaluating the capabilities of LLMs when used as autonomous agents in cyber offensive operations, specifically in the context of ransomware attacks. This evaluation was distinct from previous studies that considered LLMs as interactive assistants. The primary objective was to assess whether these models could effectively function as independent agents in executing complex cyber-attacks without human intervention. Our study of Llama-3.1-405B's social engineering uplift for cyber attackers was conducted to assess the effectiveness of AI models in aiding cyber threat actors in spear phishing campaigns.
Please read our Llama 3.1 Cyber security whitepaper to learn more.

Generative AI safety requires expertise and tooling, and we believe in the strength of the open community to accelerate its progress. We are active members of open consortiums, including the AI Alliance, Partnership on AI and MLCommons, actively contributing to safety standardization and transparency. We encourage the community to adopt taxonomies like the MLCommons Proof of Concept evaluation to facilitate collaboration and transparency on safety and content evaluations. Our Purple Llama tools are open sourced for the community to use and widely distributed across ecosystem partners, including cloud service providers. We encourage community contributions to our Github repository.

We also set up the Llama Impact Grants program to identify and support the most compelling applications of Meta's Llama model for societal benefit across three categories: education, climate and open innovation. The 20 finalists from the hundreds of applications can be found here. Finally, we put in place a set of resources including an output reporting mechanism and bug bounty program to continuously improve the Llama technology with the help of the community.

The core values of Llama 3.1 are openness, inclusivity and helpfulness. It is meant to serve everyone, and to work for a wide range of use cases. It is thus designed to be accessible to people across many different backgrounds, experiences and perspectives. Llama 3.1 addresses users and their needs as they are, without inserting unnecessary judgment or normativity, while reflecting the understanding that even content that may appear problematic in some cases can serve valuable purposes in others. It respects the dignity and autonomy of all users, especially in terms of the values of free thought and expression that power innovation and progress. But Llama 3.1 is a new technology, and like any new technology, there are risks associated with its use.
Testing conducted to date has not covered, nor could it cover, all scenarios. For these reasons, as with all LLMs, Llama 3.1’s potential outputs cannot be predicted in advance, and the model may in some instances produce inaccurate, biased or other objectionable responses to user prompts. Therefore, before deploying any applications of Llama 3.1 models, developers should perform safety testing and tuning tailored to their specific applications of the model. Please refer to available resources including our Responsible Use Guide, Trust and Safety solutions, and other resources to learn more about responsible development.

llama3
11,624 downloads • 119 likes

Meta-Llama-3-8B-Instruct-GGUF

llama
11,048 downloads • 323 likes

SmolLM-135M-GGUF

license:apache-2.0
8,487 downloads • 4 likes

Phi-3-mini-128k-instruct-GGUF

license:mit
7,377 downloads • 38 likes

MN-Violet-Lotus-12B-GGUF

license:cc-by-4.0
4,715 downloads • 11 likes

NeuralDaredevil-8B-abliterated-GGUF

NeuralDaredevil-8B-abliterated-GGUF is a quantized version of mlabonne/NeuralDaredevil-8B-abliterated created using llama.cpp.

This is a DPO fine-tune of mlabonne/Daredevil-8B-abliterated, trained on one epoch of mlabonne/orpo-dpo-mix-40k. The DPO fine-tuning successfully recovers the performance loss due to the abliteration process, making it an excellent uncensored model. NeuralDaredevil-8B-abliterated performs better than the Instruct model on my tests. You can use it for any application that doesn't require alignment, like role-playing. Tested on LM Studio using the "Llama 3" preset.

NeuralDaredevil-8B is the best-performing uncensored 8B model on the Open LLM Leaderboard (MMLU score). Evaluation performed using LLM AutoEval. See the entire leaderboard here.

| Model | Average | AGIEval | GPT4All | TruthfulQA | Bigbench |
|---|---:|---:|---:|---:|---:|
| mlabonne/NeuralDaredevil-8B-abliterated | 55.87 | 43.73 | 73.6 | 59.36 | 46.8 |
| mlabonne/Daredevil-8B | 55.87 | 44.13 | 73.52 | 59.05 | 46.77 |
| mlabonne/Daredevil-8B-abliterated | 55.06 | 43.29 | 73.33 | 57.47 | 46.17 |
| NousResearch/Hermes-2-Theta-Llama-3-8B | 54.28 | 43.9 | 72.62 | 56.36 | 44.23 |
| openchat/openchat-3.6-8b-20240522 | 53.49 | 44.03 | 73.67 | 49.78 | 46.48 |
| meta-llama/Meta-Llama-3-8B-Instruct | 51.34 | 41.22 | 69.86 | 51.65 | 42.64 |
| meta-llama/Meta-Llama-3-8B | 45.42 | 31.1 | 69.95 | 43.91 | 36.7 |
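The card tests the GGUF files in LM Studio with the "Llama 3" preset; the same files can be loaded programmatically with llama-cpp-python. A minimal hedged sketch follows: the quant file name is an assumption (pick whichever quant you downloaded), and the prompt template is a hand-built Llama 3 chat string for illustration.

```python
# Sketch: loading a GGUF quant with llama-cpp-python and prompting it
# with the Llama 3 chat format (the "Llama 3" preset from the card).

def llama3_prompt(user: str) -> str:
    """Minimal Llama 3 chat-template string, built by hand for illustration."""
    return (
        "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
        f"{user}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
    )

def run(prompt: str) -> str:
    from llama_cpp import Llama  # heavy import kept local
    # File name is an assumed example quant from this repository.
    llm = Llama(model_path="NeuralDaredevil-8B-abliterated.Q4_K_M.gguf",
                n_ctx=4096)
    out = llm(llama3_prompt(prompt), max_tokens=256, stop=["<|eot_id|>"])
    return out["choices"][0]["text"]
```

In practice, `Llama.create_chat_completion` with a messages list can replace the hand-built template, letting the library apply the chat format embedded in the GGUF metadata.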

4,369 downloads • 75 likes

SmolLM2-360M-GGUF

license:apache-2.0
3,766 downloads • 3 likes

NVIDIA-Nemotron-Nano-9B-v2-GGUF

QuantFactory/NVIDIA-Nemotron-Nano-9B-v2-GGUF is a quantized version of nvidia/NVIDIA-Nemotron-Nano-9B-v2 created using llama.cpp. The pretraining data has a cutoff date of September 2024.

NVIDIA-Nemotron-Nano-9B-v2 is a large language model (LLM) trained from scratch by NVIDIA, and designed as a unified model for both reasoning and non-reasoning tasks. It responds to user queries and tasks by first generating a reasoning trace and then concluding with a final response. The model's reasoning capabilities can be controlled via a system prompt. If the user prefers the model to provide its final answer without intermediate reasoning traces, it can be configured to do so, albeit with a slight decrease in accuracy for harder prompts that require reasoning. Conversely, allowing the model to generate reasoning traces first generally results in higher-quality final solutions to queries and tasks.

The model uses a hybrid architecture consisting primarily of Mamba-2 and MLP layers combined with just four Attention layers. For the architecture, please refer to the Nemotron-H tech report. The model was trained using Megatron-LM and NeMo-RL. The supported languages include English, German, Spanish, French, Italian, and Japanese. Improved using Qwen.

GOVERNING TERMS: This trial service is governed by the NVIDIA API Trial Terms of Service. Use of this model is governed by the NVIDIA Open Model License Agreement.

We evaluated our model in Reasoning-On mode across all benchmarks, except RULER, which is evaluated in Reasoning-Off mode.

| Benchmark | Qwen3-8B | NVIDIA-Nemotron-Nano-9B-v2 |
| :---- | ----: | ----: |
| AIME25 | 69.3% | 72.1% |
| MATH500 | 96.3% | 97.8% |
| GPQA | 59.6% | 64.0% |
| LCB | 59.5% | 71.1% |
| BFCL v3 | 66.3% | 66.9% |
| IFEval (Instruction Strict) | 89.4% | 90.3% |
| HLE | 4.4% | 6.5% |
| RULER (128K) | 74.1% | 78.9% |

All evaluations were done using NeMo-Skills. We published a tutorial with all details necessary to reproduce our evaluation results.
This model supports runtime "thinking" budget control. During inference, the user can specify how many tokens the model is allowed to "think".

- Architecture Type: Mamba2-Transformer Hybrid
- Network Architecture: Nemotron-Hybrid

NVIDIA-Nemotron-Nano-9B-v2 is a general purpose reasoning and chat model intended to be used in English and coding languages. Other non-English languages (German, French, Italian, Spanish and Japanese) are also supported. It is intended for developers designing AI agent systems, chatbots, RAG systems, and other AI-powered applications, and is also suitable for typical instruction-following tasks.

Releases:
- Hugging Face, 08/18/2025, via https://huggingface.co/nvidia/NVIDIA-Nemotron-Nano-9B-v2
- API Catalog, 08/18/2025, via https://build.nvidia.com/nvidia/nvidia-nemotron-nano-9b-v2
- NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model

Input:
- Input Type(s): Text
- Input Format(s): String
- Input Parameters: One-Dimensional (1D): Sequences
- Other Properties Related to Input: Context length up to 128K. Supported languages include German, Spanish, French, Italian, Korean, Portuguese, Russian, Japanese, Chinese and English.

Output:
- Output Type(s): Text
- Output Format: String
- Output Parameters: One-Dimensional (1D): Sequences up to 128K

Our models are designed and optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g. GPU cores) and software frameworks (e.g. CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.

- Runtime Engine(s): NeMo 25.07.nemotron-nano-v2
- Supported Hardware Microarchitecture Compatibility: NVIDIA A10G, NVIDIA H100-80GB, NVIDIA A100
- Operating System(s): Linux

The snippet below shows how to use this model with Huggingface Transformers (tested on version 4.48.3).
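The Transformers snippet referenced in the card was not captured on this page. The reasoning-control convention it describes (a `/think` or `/nothink` keyword in the system prompt, plus the recommended sampling settings) can be sketched as pure helpers; the function names here are ours, not NVIDIA's.

```python
# Sketch of the /think vs /nothink reasoning-control convention from the
# card, plus its recommended sampling settings.

def make_reasoning_messages(user: str, reasoning: bool = True) -> list:
    """System prompt toggles the reasoning trace on or off."""
    system = "/think" if reasoning else "/nothink"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]

def sampling_params(reasoning: bool) -> dict:
    # Card recommendation: temperature 0.6 / top_p 0.95 with reasoning on,
    # greedy decoding with reasoning off; raise max_new_tokens for reasoning.
    if reasoning:
        return {"do_sample": True, "temperature": 0.6, "top_p": 0.95,
                "max_new_tokens": 1024}
    return {"do_sample": False, "max_new_tokens": 1024}
```

These messages and kwargs would then be passed to `tokenizer.apply_chat_template()` and `model.generate()` in the usual Transformers flow.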
Case 1: if `/think` or no reasoning signal is provided in the system prompt, reasoning will be set to `True`. Case 2: if `/nothink` is provided, reasoning will be set to `False`.

Note: `/think` or `/nothink` keywords can also be provided in "user" messages for turn-level reasoning control.

We recommend setting `temperature` to `0.6` and `top_p` to `0.95` for reasoning True, using greedy search for reasoning False, and increasing `max_new_tokens` to `1024` or higher for reasoning True.

The snippet below shows how to use this model with TRT-LLM. We tested this on the following commit and followed these instructions to build and install TRT-LLM in a docker container.

The snippet below shows how to use this model with vLLM. Use the latest version of vLLM and follow these instructions to build and install vLLM. Note:
- Remember to add `--mamba_ssm_cache_dtype float32` for accurate quality. Without this option, the model's accuracy may degrade.
- If you encounter a CUDA OOM issue, try `--max-num-seqs 64` and consider lowering the value further if the error persists.

Alternatively, you can use Docker to launch a vLLM server.

The thinking budget allows developers to keep accuracy high and meet response-time targets, which is especially crucial for customer support, autonomous agent steps, and edge devices where every millisecond counts. With budget control, you can set a limit for internal reasoning:

`max_thinking_tokens`: a threshold that will attempt to end the reasoning trace at the next newline encountered in the reasoning trace. If no newline is encountered within 500 tokens, it will abruptly end the reasoning trace at `max_thinking_tokens + 500`.

Calling the server with a budget (restricted to 32 tokens here as an example): after launching a vLLM server, you can call the server with tool-call support using a Python script like below. We follow the jinja chat template provided below.
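The budgeted server call described above can be sketched as follows. This is a hypothetical illustration: the server URL and model name are assumptions, and the placement of the `max_thinking_tokens` field in the chat-completions payload is an assumed server-side knob, not a documented vLLM parameter.

```python
# Hypothetical sketch: calling a vLLM OpenAI-compatible server with a
# thinking budget (32 tokens, as in the card's example).
import json
from urllib import request

def budget_request(prompt: str, budget: int = 32) -> bytes:
    """Build a chat-completions payload carrying the thinking budget."""
    payload = {
        "model": "nvidia/NVIDIA-Nemotron-Nano-9B-v2",
        "messages": [{"role": "user", "content": prompt}],
        "max_thinking_tokens": budget,  # assumed server-side knob
    }
    return json.dumps(payload).encode()

def call_server(prompt: str,
                url: str = "http://localhost:8000/v1/chat/completions") -> dict:
    req = request.Request(url, data=budget_request(prompt),
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.load(resp)
```

An OpenAI SDK client pointed at the same endpoint would work equally well; only the payload construction matters for the budget.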
This template conditionally adds ` \n` to the start of the Assistant response if `/think` is found in either the system prompt or any user message, and adds ` ` to the start of the Assistant response if `/nothink` is found in the system prompt, thus enforcing reasoning on/off behavior. If no reasoning signal is added, the model defaults to reasoning "on" mode.

Data Modality: Text
Training Data Size: More than 10 Trillion Tokens
Train/Test/Valid Split: We used 100% of the corpus for pre-training and relied on external benchmarks for testing.
Data Collection Method by dataset: Hybrid: Automated, Human, Synthetic
Labeling Method by dataset: Hybrid: Automated, Human, Synthetic

Properties: The post-training corpus for NVIDIA-Nemotron-Nano-9B-v2 consists of English and multilingual text (German, Spanish, French, Italian, Korean, Portuguese, Russian, Japanese, Chinese and English). Our sources cover a variety of document types such as webpages, dialogue, articles, and other written materials. The corpus spans domains including code, legal, math, science, finance, and more. We also include a small portion of question-answering and alignment-style data to improve model accuracy. For several of the domains listed above we used synthetic data, specifically reasoning traces, from DeepSeek R1/R1-0528, Qwen3-235B-A22B, Nemotron 4 340B, Qwen2.5-32B-Instruct-AWQ, Qwen2.5-14B-Instruct, and Qwen 2.5 72B.

The pre-training corpus for NVIDIA-Nemotron-Nano-9B-v2 consists of high-quality curated and synthetically-generated data. It is trained on English as well as 15 multilingual languages and 43 programming languages. Our sources cover a variety of document types such as webpages, dialogue, articles, and other written materials. The corpus spans domains including legal, math, science, finance, and more. We also include a small portion of question-answering and alignment-style data to improve model accuracy.
The model was pre-trained for approximately twenty trillion tokens. Alongside the model, we release our final pretraining data, as outlined in this section. For ease of analysis, there is a sample set that is ungated. For all remaining code, math and multilingual data, gating and approval is required, and the dataset is permissively licensed for model training purposes. More details on the datasets and synthetic data generation methods can be found in the technical report NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model.

| Dataset | Collection Period |
| :---- | :---- |
| Problems in Elementary Mathematics for Home Study | 4/23/2025 |
| GSM8K | 4/23/2025 |
| PRM800K | 4/23/2025 |
| CC-NEWS | 4/23/2025 |
| Common Crawl | 4/23/2025 |
| Wikimedia | 4/23/2025 |
| Bespoke-Stratos-17k | 4/23/2025 |
| tigerbot-kaggle-leetcodesolutions-en-2k | 4/23/2025 |
| glaive-function-calling-v2 | 4/23/2025 |
| APIGen Function-Calling | 4/23/2025 |
| LMSYS-Chat-1M | 4/23/2025 |
| Open Textbook Library - CC BY-SA & GNU subset and OpenStax - CC BY-SA subset | 4/23/2025 |
| Advanced Reasoning Benchmark, tigerbot-kaggle-leetcodesolutions-en-2k, PRM800K, and SciBench | 4/23/2025 |
| FineWeb-2 | 4/23/2025 |
| Court Listener | Legacy Download |
| peS2o | Legacy Download |
| OpenWebMath | Legacy Download |
| BioRxiv | Legacy Download |
| PMC Open Access Subset | Legacy Download |
| OpenWebText2 | Legacy Download |
| Stack Exchange Data Dump | Legacy Download |
| PubMed Abstracts | Legacy Download |
| NIH ExPorter | Legacy Download |
| arXiv | Legacy Download |
| BigScience Workshop Datasets | Legacy Download |
| Reddit Dataset | Legacy Download |
| SEC's Electronic Data Gathering, Analysis, and Retrieval (EDGAR) | Legacy Download |
| Public Software Heritage S3 | Legacy Download |
| The Stack | Legacy Download |
| mC4 | Legacy Download |
| Advanced Mathematical Problem Solving | Legacy Download |
| MathPile | Legacy Download |
| NuminaMath CoT | Legacy Download |
| PMC Article | Legacy Download |
| FLAN | Legacy Download |
| Advanced Reasoning Benchmark | Legacy Download |
| SciBench | Legacy Download |
| WikiTableQuestions | Legacy Download |
| FinQA | Legacy Download |
| Riddles | Legacy Download |
| Problems in Elementary Mathematics for Home Study | Legacy Download |
| MedMCQA | Legacy Download |
| Cosmos QA | Legacy Download |
| MCTest | Legacy Download |
| AI2's Reasoning Challenge | Legacy Download |
| OpenBookQA | Legacy Download |
| MMLU Auxiliary Train | Legacy Download |
| social-chemestry-101 | Legacy Download |
| Moral Stories | Legacy Download |
| The Common Pile v0.1 | Legacy Download |
| FineMath | Legacy Download |
| MegaMath | Legacy Download |
| FastChat | 6/30/2025 |

Private Non-publicly Accessible Datasets of Third Parties

| Dataset |
| :---- |
| Global Regulation |
| Workbench |

The English Common Crawl data was downloaded from the Common Crawl Foundation (see their FAQ for details on their crawling) and includes the snapshots CC-MAIN-2013-20 through CC-MAIN-2025-13. The data was subsequently deduplicated and filtered in various ways described in the Nemotron-CC paper. Additionally, we extracted data for fifteen languages from the following three Common Crawl snapshots: CC-MAIN-2024-51, CC-MAIN-2025-08, CC-MAIN-2025-18. The fifteen languages included were Arabic, Chinese, Danish, Dutch, French, German, Italian, Japanese, Korean, Polish, Portuguese, Russian, Spanish, Swedish, and Thai. As we did not have reliable multilingual model-based quality classifiers available, we applied just heuristic filtering instead, similar to what we did for lower-quality English data in the Nemotron-CC pipeline, but selectively removing some filters for languages where they did not work well. Deduplication was done in the same way as for Nemotron-CC.

The GitHub Crawl was collected using the GitHub REST API and the Amazon S3 API.
Each crawl was operated in accordance with the rate limits set by its respective source, either GitHub or S3. We collect raw source code and subsequently remove any files whose license does not appear in our permissive-license set (for additional details, refer to the technical report).

| Dataset | Modality | Dataset Size (Tokens) | Collection Period |
| :---- | :---- | :---- | :---- |
| English Common Crawl | Text | 3.360T | 4/8/2025 |
| Multilingual Common Crawl | Text | 812.7B | 5/1/2025 |
| GitHub Crawl | Text | 747.4B | 4/29/2025 |

| Dataset | Modality | Dataset Size (Tokens) | Seed Dataset | Model(s) used for generation |
| :---- | :---- | :---- | :---- | :---- |
| Synthetic Art of Problem Solving from DeepSeek-R1 | Text | 25.5B | Art of Problem Solving; American Mathematics Competitions 8; American Mathematics Competitions 10 | DeepSeek-R1 |
| Synthetic Moral Stories and Social Chemistry from Mixtral-8x22B-v0.1 | Text | 327M | social-chemestry-101; Moral Stories | Mixtral-8x22B-v0.1 |
| Synthetic Social Sciences seeded with OpenStax from DeepSeek-V3, Mixtral-8x22B-v0.1, and Qwen2.5-72B | Text | 83.6M | OpenStax - CC BY-SA subset | DeepSeek-V3; Mixtral-8x22B-v0.1; Qwen2.5-72B |
| Synthetic Health Sciences seeded with OpenStax from DeepSeek-V3, Mixtral-8x22B-v0.1, and Qwen2.5-72B | Text | 9.7M | OpenStax - CC BY-SA subset | DeepSeek-V3; Mixtral-8x22B-v0.1; Qwen2.5-72B |
| Synthetic STEM seeded with OpenStax, Open Textbook Library, and GSM8K from DeepSeek-R1, DeepSeek-V3, DeepSeek-V3-0324, and Qwen2.5-72B | Text | 175M | OpenStax - CC BY-SA subset; GSM8K; Open Textbook Library - CC BY-SA & GNU subset | DeepSeek-R1; DeepSeek-V3; DeepSeek-V3-0324; Qwen2.5-72B |
| Nemotron-PrismMath | Text | 4.6B | Big-Math-RL-Verified; OpenR1-Math-220k | Qwen2.5-0.5B-instruct; Qwen2.5-72B-Instruct; DeepSeek-R1-Distill-Qwen-32B |
| Synthetic Question Answering Data from Papers and Permissible Books from Qwen2.5-72B-Instruct | Text | 350M | arXiv; National Institutes of Health ExPorter; BioRxiv; PMC Article; USPTO Backgrounds; peS2o; Global Regulation; CORE; PG-19; DOAB CC BY & CC BY-SA subset; NDLTD | Qwen2.5-72B-Instruct |
| Synthetic FineMath-4+ Reprocessed from DeepSeek-V3 | Text | 9.2B | Common Crawl | DeepSeek-V3 |
| Synthetic FineMath-3+ Reprocessed from phi-4 | Text | 27.6B | Common Crawl | phi-4 |
| Synthetic Union-3+ Reprocessed from phi-4 | Text | 93.1B | Common Crawl | phi-4 |
| Refreshed Nemotron-MIND from phi-4 | Text | 73B | Common Crawl | phi-4 |
| Synthetic Union-4+ Reprocessed from phi-4 | Text | 14.12B | Common Crawl | phi-4 |
| Synthetic Union-3+ minus 4+ Reprocessed from phi-4 | Text | 78.95B | Common Crawl | phi-4 |
| Synthetic Union-3 Refreshed from phi-4 | Text | 80.94B | Common Crawl | phi-4 |
| Synthetic Union-4+ Refreshed from phi-4 | Text | 52.32B | Common Crawl | phi-4 |
| Synthetic AGIEval seeded with AQUA-RAT, LogiQA, and AR-LSAT from DeepSeek-V3 and DeepSeek-V3-0324 | Text | 4.0B | AQUA-RAT; LogiQA; AR-LSAT | DeepSeek-V3; DeepSeek-V3-0324 |
| Synthetic AGIEval seeded with AQUA-RAT, LogiQA, and AR-LSAT from Qwen3-30B-A3B | Text | 4.2B | AQUA-RAT; LogiQA; AR-LSAT | Qwen3-30B-A3B |
| Synthetic Art of Problem Solving from Qwen2.5-32B-Instruct, Qwen2.5-Math-72B, Qwen2.5-Math-7B, and Qwen2.5-72B-Instruct | Text | 83.1B | Art of Problem Solving; American Mathematics Competitions 8; American Mathematics Competitions 10; GSM8K; PRM800K | Qwen2.5-32B-Instruct; Qwen2.5-Math-72B; Qwen2.5-Math-7B; Qwen2.5-72B-Instruct |
| Synthetic MMLU Auxiliary Train from DeepSeek-R1 | Text | 0.5B | MMLU Auxiliary Train | DeepSeek-R1 |
| Synthetic Long Context Continued Post-Training Data from Papers and Permissible Books from Qwen2.5-72B-Instruct | Text | 5.4B | arXiv; National Institutes of Health ExPorter; BioRxiv; PMC Article; USPTO Backgrounds; peS2o; Global Regulation; CORE; PG-19; DOAB CC BY & CC BY-SA subset; NDLTD | Qwen2.5-72B-Instruct |
| Synthetic Common Crawl from Qwen3-30B-A3B and Mistral-Nemo-12B-Instruct | Text | 1.949T | Common Crawl | Qwen3-30B-A3B; Mistral-NeMo-12B-Instruct |
| Synthetic Multilingual Data from Common Crawl from Qwen3-30B-A3B | Text | 997.3B | Common Crawl | Qwen3-30B-A3B |
| Synthetic Multilingual Data from Wikimedia from Qwen3-30B-A3B | Text | 55.1B | Wikimedia | Qwen3-30B-A3B |
| Synthetic OpenMathReasoning from DeepSeek-R1-0528 | Text | 1.5M | OpenMathReasoning | DeepSeek-R1-0528 |
| Synthetic OpenCodeReasoning from DeepSeek-R1-0528 | Text | 1.1M | OpenCodeReasoning | DeepSeek-R1-0528 |
| Synthetic Science Data from DeepSeek-R1-0528 | Text | 1.5M | - | DeepSeek-R1-0528 |
| Synthetic Humanity's Last Exam from DeepSeek-R1-0528 | Text | 460K | Humanity's Last Exam | DeepSeek-R1-0528 |
| Synthetic ToolBench from Qwen3-235B-A22B | Text | 400K | ToolBench | Qwen3-235B-A22B |
| Synthetic Nemotron Content Safety Dataset V2, eval-safety, Gretel Synthetic Safety Alignment, and RedTeam_2K from DeepSeek-R1-0528 | Text | 52K | Nemotron Content Safety Dataset V2; eval-safety; Gretel Synthetic Safety Alignment; RedTeam_2K | DeepSeek-R1-0528 |
| Synthetic HelpSteer from Qwen3-235B-A22B | Text | 120K | HelpSteer3; HelpSteer2 | Qwen3-235B-A22B |
| Synthetic Alignment data from Mixtral-8x22B-Instruct-v0.1, Mixtral-8x7B-Instruct-v0.1, and Nemotron-4 Family | Text | 400K | HelpSteer2; C4; LMSYS-Chat-1M; ShareGPT52K; tigerbot-kaggle-leetcodesolutions-en-2k; GSM8K; PRM800K; lm_identity (NVIDIA internal); FinQA; WikiTableQuestions; Riddles; ChatQA nvolve-multiturn (NVIDIA internal); glaive-function-calling-v2; SciBench; OpenBookQA; Advanced Reasoning Benchmark; Public Software Heritage S3; Khan Academy Math Keywords | Nemotron-4-15B-Base (NVIDIA internal); Nemotron-4-15B-Instruct (NVIDIA internal); Nemotron-4-340B-Base; Nemotron-4-340B-Instruct; Nemotron-4-340B-Reward; Mixtral-8x7B-Instruct-v0.1; Mixtral-8x22B-Instruct-v0.1 |
| Synthetic LMSYS-Chat-1M from Qwen3-235B-A22B | Text | 1M | LMSYS-Chat-1M | Qwen3-235B-A22B |
| Synthetic Multilingual Reasoning data from DeepSeek-R1-0528, Qwen2.5-32B-Instruct-AWQ, and Qwen2.5-14B-Instruct | Text | 25M | OpenMathReasoning; OpenCodeReasoning | DeepSeek-R1-0528; Qwen2.5-32B-Instruct-AWQ (translation); Qwen2.5-14B-Instruct (translation) |
| Synthetic Multilingual Reasoning data from Qwen3-235B-A22B and Gemma 3 Post-Trained models | Text | 5M | WildChat | Qwen3-235B-A22B; Gemma 3 PT 12B; Gemma 3 PT 27B |

Data Collection Method by dataset: Hybrid: Human, Synthetic
Labeling Method by dataset: Hybrid: Automated, Human, Synthetic

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our Trustworthy AI terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. For more detailed information on ethical considerations for this model, please see the Model Card++ Bias, Explainability, Safety & Security, and Privacy Subcards. Please report security vulnerabilities or NVIDIA AI concerns here.

NaNK
3,588
4

NSFW-3B-GGUF

NaNK
3,260
17

Meta-Llama-3-8B-GGUF

NaNK
llama
3,213
111

Qwen2.5-1.5B-Instruct-GGUF

NaNK
license:apache-2.0
3,196
2

Pantheon-RP-1.5-12b-Nemo-GGUF

NaNK
license:apache-2.0
2,421
4

Qwen2.5-7B-Instruct-Uncensored-GGUF

NaNK
license:gpl-3.0
1,974
6

Genstruct-7B-GGUF

NaNK
license:apache-2.0
1,969
2

Llama-3.2-3B-Instruct-abliterated-GGUF

NaNK
base_model:meta-llama/Llama-3.2-3B-Instruct
1,930
10

NSFW_DPO_Noromaid-7b-Mistral-7B-Instruct-v0.1-GGUF

NaNK
license:cc-by-nc-4.0
1,929
9

Captain-Eris_Violet-V0.420-12B-GGUF

QuantFactory/Captain-Eris_Violet-V0.420-12B-GGUF

This is quantized version of Nitral-AI/Captain-Eris_Violet-V0.420-12B created using llama.cpp

Instruct/Context import + Textgen preset combined available: Here

Quants: 4bpw-exl2

Thanks to mradermacher for: GGUF
Lewdiculus: ARM-compatible GGUFs <3

Original models used in the merge:
- Epiculous/Violet_Twilight-v0.2
- Nitral-AI/Captain_BMO-12B

The following YAML configuration was used to produce this model:

NaNK
1,921
4

Odins-Allseeing-Eye-9B-GGUF

NaNK
1,830
3

Qwen2.5-Sex-GGUF

NaNK
license:apache-2.0
1,792
28

dolphin-2.9-llama3-8b-GGUF

NaNK
base_model:cognitivecomputations/dolphin-2.9-llama3-8b
1,622
54

Llama-3.1-8B-Stheno-v3.4-GGUF

NaNK
license:cc-by-nc-4.0
1,526
6

Peach-9B-8k-Roleplay-GGUF

NaNK
license:apache-2.0
1,507
16

Llama-3-Patronus-Lynx-8B-Instruct-GGUF

NaNK
base_model:PatronusAI/Llama-3-Patronus-Lynx-8B-Instruct
1,430
1

Qwen3-Reranker-4B-GGUF

NaNK
license:apache-2.0
1,419
3

Meta-Llama-3-70B-Instruct-GGUF

NaNK
llama
1,402
46

DeepSeek-R1-Distill-Qwen-14B-abliterated-v2-GGUF

NaNK
1,319
31

AutoCoder_S_6.7B-GGUF

NaNK
license:apache-2.0
1,293
1

Llama-Sentient-3.2-3B-Instruct-GGUF

NaNK
Llama
1,288
2

gpt2-large-GGUF

license:mit
1,223
4

Qwen2.5-Coder-7B-GGUF

NaNK
license:apache-2.0
1,218
3

L3-Aethora-15B-V2-GGUF

NaNK
license:cc-by-sa-4.0
1,188
1

Llama3.1-8B-Enigma-GGUF

NaNK
llama
1,170
1

PersianMind-v1.0-GGUF

license:cc-by-nc-sa-4.0
1,085
2

Llama-Guard-3-1B-GGUF

NaNK
llama
1,058
4

Llama-3.1-8B-Instruct-abliterated_via_adapter-GGUF

NaNK
base_model:grimjim/Llama-3-Instruct-abliteration-LoRA-8B
1,033
1

Llama-3.2-Taiwan-Legal-3B-Instruct-GGUF

NaNK
llama-factory
990
11

Qwen2.5-7B-Instruct-GGUF

NaNK
license:apache-2.0
929
3

L3-Umbral-Mind-RP-v3.0-8B-GGUF

NaNK
base_model:Cas-Warehouse/Llama-3-MopeyMule-Blackroot-8B
842
5

Phi-3-mini-4k-instruct-GGUF-v2

license:mit
807
2

UIGEN-X-8B-GGUF

NaNK
license:apache-2.0
802
2

Llama3.1-8B-Esper2-GGUF

NaNK
llama
798
1

Celeste-12B-V1.6-GGUF

NaNK
license:apache-2.0
756
2

Llama3.1-8B-ShiningValiant2-GGUF

NaNK
llama
749
1

Llama-3-8B-Lexi-Uncensored-GGUF

NaNK
llama3
744
15

Llama-3.2-1B-Instruct-GGUF

NaNK
llama
744
6

Llama3.1-8B-Fireplace2-GGUF

NaNK
llama
732
1

Phi-3.5-mini-instruct-GGUF

license:mit
728
13

LLaMAX3-8B-Alpaca-GGUF

NaNK
base_model:LLaMAX/LLaMAX3-8B-Alpaca
726
4

Lily-Cybersecurity-7B-v0.2-GGUF

NaNK
license:apache-2.0
719
20

UIGEN-FX-4B-Preview-GGUF

NaNK
license:apache-2.0
709
2

MythoMax-L2-13b-GGUF

NaNK
702
3

Mistral-7B-v0.3-GGUF

NaNK
license:apache-2.0
701
3

Qwen3-4B-v0.4-deepresearch-no-think-4-GGUF

NaNK
693
3

deepseek-math-7b-instruct-GGUF

NaNK
676
8

Chocolatine-3B-Instruct-DPO-Revised-GGUF

NaNK
license:mit
665
3

Loki-v2.6-8b-1024k-GGUF

NaNK
base_model:Azazelle/Llama-3-8B-Abomination-LORA
659
5

Qwen2-1.5B-Instruct-GGUF

NaNK
license:apache-2.0
657
1

Llama-3.2-3B-GGUF

NaNK
llama
641
16

Josiefied-Qwen2.5-7B-Instruct-abliterated-GGUF

NaNK
license:apache-2.0
638
3

Q25-1.5B-VeoLu-GGUF

NaNK
llama-factory
638
2

Phi-3-mini-4k-geminified-GGUF

license:mit
627
4

falcon-mamba-7b-instruct-GGUF

NaNK
622
6

Llama-3.2-3B-Instruct-GGUF

NaNK
llama
620
16

LFM2-350M-Extract-GGUF

aashish1904/LFM2-350M-Extract-GGUF

This is quantized version of LiquidAI/LFM2-350M-Extract created using llama.cpp

Based on LFM2-350M, LFM2-350M-Extract is designed to extract important information from a wide variety of unstructured documents (such as articles, transcripts, or reports) into structured outputs like JSON, XML, or YAML. Example use cases:
- Extracting invoice details from emails into structured JSON.
- Converting regulatory filings into XML for compliance systems.
- Transforming customer support tickets into YAML for analytics pipelines.
- Populating knowledge graphs with entities and attributes from unstructured reports.

You can find more information about other task-specific models in this blog post.

Generation parameters: We strongly recommend using greedy decoding with `temperature=0`.

System prompt: If no system prompt is provided, the model will default to JSON outputs. We recommend providing a system prompt with a specific format (JSON, XML, or YAML) and a given schema to improve accuracy.

Supported languages: English, Arabic, Chinese, French, German, Japanese, Korean, Portuguese, and Spanish.

Chat template: LFM2 uses a ChatML-like chat template. You can automatically apply it using the dedicated `.apply_chat_template()` function from Hugging Face transformers.

> [!WARNING]
> ⚠️ The model is intended for single-turn conversations.

The data used for training these models was primarily synthetic, which allowed us to ensure a diverse data mix. We used a range of document types, domains, styles, lengths, and languages. We also varied the density and distribution of relevant text in the documents. In some cases, the extracted information was clustered in one part of the document; in others, it's spread throughout. We applied the same approach of ensuring diversity when creating synthetic user requests and designing the structure of the model outputs.
The data generation process underwent many iterations, incorporating ideas and feedback from across the Liquid AI team.

We evaluated LFM2-Extract on a dataset of 5,000 documents, covering over 100 topics with a mix of writing styles, ambiguities, and formats. We used a combination of five metrics to capture a balanced view of syntax, accuracy, and faithfulness:
- Syntax score: Checks whether outputs parse cleanly as valid JSON, XML, or YAML.
- Format accuracy: Verifies that outputs match the requested format (e.g., JSON when JSON is requested).
- Keyword faithfulness: Measures whether values in the structured output actually appear in the input text.
- Absolute scoring: A judge LLM scores quality on a 1-5 scale, assessing completeness and correctness of extractions.
- Relative scoring: We ask a judge LLM to choose the best answer between the extraction model's output and the ground-truth answer.

LFM2-350M-Extract outperforms Gemma 3 4B at this task, a model more than 11x its size.

- Hugging Face: LFM2-350M
- llama.cpp: LFM2-350M-Extract-GGUF
- LEAP: LEAP model library

If you are interested in custom solutions with edge deployment, please contact our sales team.
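The syntax score described above can be approximated with standard-library parsers. A minimal sketch (the function name and format labels are illustrative; YAML is omitted because parsing it needs a third-party library):

```python
import json
import xml.etree.ElementTree as ET

def syntax_ok(output: str, fmt: str) -> bool:
    """Return True if `output` parses cleanly in the requested format."""
    try:
        if fmt == "json":
            json.loads(output)
        elif fmt == "xml":
            ET.fromstring(output)
        else:
            raise ValueError(f"unsupported format: {fmt}")
        return True
    except (json.JSONDecodeError, ET.ParseError):
        return False
```

For example, `syntax_ok('{"total": 42}', "json")` passes while a truncated `'{"total": 42'` fails, which is exactly the pass/fail signal the syntax metric aggregates over the evaluation set.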

592
1

llama-3-sqlcoder-8b-GGUF

NaNK
base_model:defog/llama-3-sqlcoder-8b
587
9

Hermes2-Gutenberg2-Mistral-7B-GGUF

NaNK
license:apache-2.0
583
1

Meta-Llama-3-8B-Instruct-GGUF-v2

NaNK
llama
581
15

ArliAI-Llama-3-8B-Cumulus-v0.3-GGUF

NaNK
llama
579
1

Llama-3.2-1B-GGUF

NaNK
llama
573
9

Mistral-NeMo-Minitron-8B-Instruct-GGUF

NaNK
570
5

EXAONE-3.0-7.8B-Instruct-GGUF

NaNK
570
2

UNfilteredAI-1B-GGUF

NaNK
567
3

DarkIdol-Llama-3.1-8B-Instruct-1.0-Uncensored-GGUF

NaNK
llama3
566
8

Hathor_Aleph-L3-8B-v0.72-GGUF

NaNK
561
1

distilgpt2-GGUF

license:apache-2.0
549
4

Arch-Function-3B-GGUF

NaNK
542
2

Bielik-11B-v2.3-Instruct-GGUF

NaNK
license:apache-2.0
536
2

MN-12B-Mag-Mell-R1-GGUF

NaNK
534
3

Llama-3-Taiwan-8B-Instruct-GGUF

NaNK
532
3

Llama-3-Alpha-Ko-8B-Instruct-GGUF

NaNK
base_model:allganize/Llama-3-Alpha-Ko-8B-Instruct
524
1

shisa-7b-v1-GGUF

NaNK
license:apache-2.0
523
0

Llama-3-Groq-8B-Tool-Use-GGUF

NaNK
llama
517
3

Mental-Health-FineTuned-Mistral-7B-Instruct-v0.2-GGUF

NaNK
license:apache-2.0
510
7

L3-8B-Stheno-v3.2-GGUF

NaNK
license:cc-by-nc-4.0
510
4

falcon-11B-GGUF

NaNK
505
3

MMed-Llama-3-8B-GGUF

NaNK
license:llama3
495
3

Fireball-Alpaca-Llama-3.1-8B-Instruct-KTO-beta-GGUF

NaNK
llama
491
3

DeepSeek-Coder-V2-Lite-Base-GGUF

NaNK
490
1

Llama3.2-3B-Esper2-GGUF

NaNK
llama
489
3

DeepScaleR-1.5B-Preview-GGUF

NaNK
dataset:KbsdJames/Omni-MATH
483
2

Vikhr-Qwen-2.5-0.5b-Instruct-GGUF

NaNK
license:apache-2.0
479
2

Meta-Llama-3.1-8B-Text-to-SQL-GGUF

NaNK
llama
477
7

llama3-8B-DarkIdol-2.3-Uncensored-32K-GGUF

QuantFactory/llama3-8B-DarkIdol-2.3-Uncensored-32K-GGUF

This is quantized version of aifeifei798/llama3-8B-DarkIdol-2.3-Uncensored-32K created using llama.cpp

The final version of Llama 3.0 will be followed by the next iteration starting from Llama 3.1.

Special Thanks:
- Lewdiculous's superb GGUF version, thank you for your conscientious and responsible dedication: https://huggingface.co/LWDCLS/llama3-8B-DarkIdol-2.3-Uncensored-32K-GGUF-IQ-Imatrix-Request
- mradermacher's superb GGUF versions, thank you for your conscientious and responsible dedication: https://huggingface.co/mradermacher/llama3-8B-DarkIdol-2.3-Uncensored-32K-i1-GGUF and https://huggingface.co/mradermacher/llama3-8B-DarkIdol-2.3-Uncensored-32K-GGUF
- ZeroWw's own quantizations (updated almost daily). The difference from normal quantizations is that the output and embed tensors are quantized to f16 and the other tensors to q5_k, q6_k, or q8_0. This creates models that are little or not degraded at all and have a smaller size. They run at about 3-6 t/sec on CPU only using llama.cpp, and obviously faster on computers with potent GPUs: ZeroWw/llama3-8B-DarkIdol-2.3-Uncensored-32K-GGUF

Model Description: The module combination has been readjusted to better fulfill various roles and has been adapted for mobile phones.
- Saving money (Llama 3)
- Only tested in English.
- Input: models input text only. Output: models generate text and code only.
- Uncensored
- Quick response
- The underlying model used is winglian/Llama-3-8b-64k-PoSE (the theoretical support is 64k, but I have only tested up to 32k. :)
- A scholarly response akin to a thesis. (I tend to write songs extensively, to the point where one song almost becomes as detailed as a thesis. :)
- DarkIdol: roles that you can imagine and those that you cannot imagine.
- Roleplay: specialized in various role-playing scenarios. See the test roles (https://huggingface.co/aifeifei798/llama3-8B-DarkIdol-1.2/tree/main/test) and the LM Studio presets (https://huggingface.co/aifeifei798/llama3-8B-DarkIdol-1.2/tree/main/config-presets).

Questions: The model's response results are for reference only; please do not fully trust them.

Stop Strings: "### Note:", "### Input:", "### Response:", "### Emoticons:"

```yaml
models:
  - model: Sao10K/L3-8B-Niitama-v1
  - model: Hastagaras/Jamet-8B-L3-MK.V-Blackroot
  - model: Nitral-AI/Hathor_Tahsin-L3-8B-v0.85
  - model: turboderp/llama3-turbcat-instruct-8b
  - model: winglian/Llama-3-8b-64k-PoSE
merge_method: model_stock
base_model: winglian/Llama-3-8b-64k-PoSE
dtype: bfloat16
```

```yaml
models:
  - model: Rupesh2/Meta-Llama-3-8B-abliterated
  - model: Orenguteng/Llama-3-8B-LexiFun-Uncensored-V1
  - model: Orenguteng/Llama-3-8B-Lexi-Uncensored
  - model: theprint/Llama-3-8B-Lexi-Smaug-Uncensored
  - model: vicgalle/Unsafe-Llama-3-8B
  - model: vicgalle/Configurable-Hermes-2-Pro-Llama-3-8B
  - model: ./llama3-8B-DarkIdol-2.3b
merge_method: model_stock
base_model: ./llama3-8B-DarkIdol-2.3b
dtype: bfloat16
```

NaNK
llama3
473
6

llama-3-youko-8b-GGUF

NaNK
base_model:rinna/llama-3-youko-8b
469
0

HelpingAI2-6B-GGUF

NaNK
463
1

internistai-base-7b-v0.2-GGUF

NaNK
license:apache-2.0
462
2

SeaLLM-7B-v2.5-GGUF

NaNK
462
1

Llama-3-Ko-8B-Instruct-GGUF

NaNK
llama
458
1

Llama3-8B-Instruct-Replete-Adapted-GGUF

NaNK
llama
455
2

Gemma-2-Ataraxy-Doppel-9B-GGUF

NaNK
455
2

Qwen2-Boundless-GGUF

license:apache-2.0
453
4

Llama-3-8B-ProLong-64k-Instruct-GGUF

NaNK
license:llama3
452
1

MAmmoTH2-8B-Plus-GGUF

NaNK
license:mit
450
0

GWQ-9B-Preview-GGUF

NaNK
445
3

Teuken-7B-instruct-commercial-v0.4-GGUF

NaNK
license:apache-2.0
442
2

ghost-8b-beta-GGUF

NaNK
llama
440
1

Mistral-Nemo-Japanese-Instruct-2408-GGUF

NaNK
license:apache-2.0
439
17

Qwen2.5-7B-Instruct-abliterated-v2-GGUF

NaNK
license:apache-2.0
438
4

gpt2-GGUF

NaNK
license:mit
436
3

gemma-2-9b-GGUF

NaNK
434
5

L3-Luna-8B-GGUF

NaNK
base_model:Ayush-1722/Meta-Llama-3-8B-Instruct-Summarize-v0.2-24K-LoRANET-Merged
434
1

Ahma-3B-GGUF

NaNK
llama.cpp
432
2

DeepSeek-Coder-V2-Lite-Instruct-GGUF

NaNK
432
1

Peach-2.0-9B-8k-Roleplay-GGUF

QuantFactory/Peach-2.0-9B-8k-Roleplay-GGUF

This is quantized version of ClosedCharacter/Peach-2.0-9B-8k-Roleplay created using llama.cpp

Peach-2.0-9B-8k-Roleplay is a chat large language model obtained by finetuning the 01-ai/Yi-1.5-9B model on more than 100K conversations created through our data synthesis approach.

What's New

Finally, after much anticipation, Peach v2.0 has been open-sourced! We completed the final SFT+DPO training in early January, followed by extensive testing, before concluding that this version meets the standards for a commercial release, now freely available to everyone. Our goal is to break the paid monopoly, allowing everyone to have their own local role-playing AI! Compared to the previous version, we've made the following improvements:
1. Compatibility with Silly-Tavern output format, making it easy for every role-playing enthusiast to quickly get started!
2. Enhanced writing capabilities, achieving a significant leap in plot output and action description!
3. Strengthened bilingual capabilities, requiring only two lines of prompt to perfectly accommodate English character cards in Chinese conversations!
4. Improved interaction abilities, giving the model higher intelligence and emotional quotient in plot progression and topic weaving!
5. Introduced a DPO training phase for preference alignment, addressing issues such as context repetition and logical errors in the SFT model, resulting in superior dialogue performance.

Warning

All responses are generated by AI and do not represent the views or opinions of the developers.
1. Despite rigorous filtering, due to the uncontrollability of LLMs, our model may still generate toxic, harmful, and NSFW content.
2. Due to limitations in model parameters, the 9B model may perform poorly on mathematical tasks, coding tasks, and logical reasoning.
3. Our training data is capped at a maximum length of 8k, so excessively long conversation turns may result in a decline in response quality.
4. We used bilingual Chinese-English data for training, so the model may not perform well in other low-resource languages.
5. The model may generate a significant amount of hallucinations, so it is recommended to use lower values for the temperature and top_p parameters.

Thanks to FlowGPT for its support; FlowGPT is a dynamic tool that harnesses the power of AI to streamline various creative and professional tasks.

NaNK
license:mit
430
11

chinese-text-correction-1.5b-GGUF

NaNK
license:apache-2.0
430
1

LFM2-1.2B-Extract-GGUF

aashish1904/LFM2-1.2B-Extract-GGUF

This is quantized version of LiquidAI/LFM2-1.2B-Extract created using llama.cpp

Based on LFM2-1.2B, LFM2-1.2B-Extract is designed to extract important information from a wide variety of unstructured documents (such as articles, transcripts, or reports) into structured outputs like JSON, XML, or YAML. Example use cases:
- Extracting invoice details from emails into structured JSON.
- Converting regulatory filings into XML for compliance systems.
- Transforming customer support tickets into YAML for analytics pipelines.
- Populating knowledge graphs with entities and attributes from unstructured reports.

You can find more information about other task-specific models in this blog post.

Generation parameters: We strongly recommend using greedy decoding with `temperature=0`.

System prompt: If no system prompt is provided, the model will default to JSON outputs. We recommend providing a system prompt with a specific format (JSON, XML, or YAML) and a given schema to improve accuracy.

Supported languages: English, Arabic, Chinese, French, German, Japanese, Korean, Portuguese, and Spanish.

Chat template: LFM2 uses a ChatML-like chat template. You can automatically apply it using the dedicated `.apply_chat_template()` function from Hugging Face transformers.

> [!WARNING]
> ⚠️ The model is intended for single-turn conversations.

The data used for training these models was primarily synthetic, which allowed us to ensure a diverse data mix. We used a range of document types, domains, styles, lengths, and languages. We also varied the density and distribution of relevant text in the documents. In some cases, the extracted information was clustered in one part of the document; in others, it's spread throughout. We applied the same approach of ensuring diversity when creating synthetic user requests and designing the structure of the model outputs.
The data generation process underwent many iterations, incorporating ideas and feedback from across the Liquid AI team. We evaluated LFM2-Extract on a dataset of 5,000 documents, covering over 100 topics with a mix of writing styles, ambiguities, and formats. We used a combination of five metrics to capture a balanced view on syntax, accuracy, and faithfulness: - Syntax score: Checks whether outputs parse cleanly as valid JSON, XML, or YAML. - Format accuracy: Verifies that outputs match the requested format (e.g., JSON when JSON is requested). - Keyword faithfulness: Measures whether values in the structured output actually appear in the input text. - Absolute scoring: A judge LLM scores quality on a 1-5 scale, assessing completeness and correctness of extractions. - Relative scoring: We ask a judge LLM to choose the best answer between the extraction model’s output and the ground-truth answer. LFM2-1.2B-Extract can output complex objects in different languages on a level higher than Gemma 3 27B, a model 22.5 times its size. - Hugging Face: LFM2-1.2B - llama.cpp: LFM2-1.2B-Extract-GGUF - LEAP: LEAP model library If you are interested in custom solutions with edge deployment, please contact our sales team.

NaNK
430
1

eai-distill-0.5b-GGUF

QuantFactory/eai-distill-0.5b-GGUF

This is quantized version of EssentialAI/eai-distill-0.5b created using llama.cpp

🏷️ EAI-Distill-0.5b 🏆 Website | 🖥️ Code | 📖 Paper

EAI-Distill-0.5b is a fine-tuned version of Qwen2.5-0.5B-Instruct designed for document classification across 12 taxonomic categories. This model is optimized for high-throughput classification of web documents and produces structured metadata for large-scale dataset curation. The model classifies documents across the following dimensions:
- 📚 Free Decimal Correspondence (FDC): Subject matter classification based on the Dewey Decimal System
- 🧠 Bloom's Taxonomy: Cognitive process (Remember/Understand/Apply/Analyze/Evaluate/Create) and knowledge domain (Factual/Conceptual/Procedural/Metacognitive)
- 📄 Document Type: Web page categorization (News, Academic, Reference, Code, Social, etc.)
- 🔍 Content Quality: Extraction artifacts, missing content detection
- 🎓 Educational Metadata: Reasoning depth, technical correctness, educational level

Training details:
- 🤖 Base Model: Qwen2.5-0.5B-Instruct
- 📊 Training Data: 82B synthetic tokens generated by Qwen2.5-32B-Instruct (teacher model) on 104M Common Crawl documents
- ⚙️ Optimizer: AdamW (β₁=0.9, β₂=0.95, weight_decay=0.1)
- 📈 Learning Rate: 1×10⁻⁴ with linear warmup (2B tokens), cosine decay to 1×10⁻⁵, then linear anneal to 0
- 📦 Batch Size: 2M tokens
- 📏 Sequence Length: 16,384 tokens
- 💻 Hardware: Trained on AMD MI300x GPUs

The model achieves an average Cohen's κ agreement of 0.71-0.74 with our golden annotators, GPT-4o and Claude 3.5 Sonnet, on held-out evaluation sets, which is within 3% of its teacher model Qwen2.5-32B-Instruct while being 64× smaller.
💻 Usage

The model outputs classifications in a condensed format. This model is designed for:
- 🏗️ Large-scale web document classification and metadata generation
- 🔧 Dataset curation through taxonomic filtering
- ✅ Content quality assessment for training data preparation
- 📚 Educational content analysis and organization

Limitations:
- Optimized for English web documents extracted using resiliparse
- Documents over 30k characters are automatically chunked, which may affect classification accuracy
- Performance may vary on content significantly different from Common Crawl web data
- Classification categories are based on web content patterns and may not generalize to other document types
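The 30k-character chunking mentioned above might look like the following. This is a simplified sketch under stated assumptions (the actual chunking logic is not published, and breaking at the last newline in each window is our guess at reasonable boundary handling):

```python
def chunk_document(text: str, max_chars: int = 30_000) -> list[str]:
    """Split `text` into pieces of at most `max_chars` characters,
    preferring to break at the last newline inside each window."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            # Back up to the last newline in the window, if any,
            # so chunks end at a natural document boundary.
            cut = text.rfind("\n", start, end)
            if cut > start:
                end = cut
        chunks.append(text[start:end])
        start = end
    return chunks
```

Each chunk would then be classified independently, which is why per-chunk labels can disagree on long documents and why the card flags chunking as a possible accuracy cost.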

NaNK
license:apache-2.0
428
2

INTELLECT-1-GGUF

license:apache-2.0
427
2

Llama3.1-BestMix-Chem-Einstein-8B-GGUF

QuantFactory/Llama3.1-BestMix-Chem-Einstein-8B-GGUF This is quantized version of ZeroXClem/Llama3.1-BestMix-Chem-Einstein-8B created using llama.cpp Llama3.1-BestMix-Chem-Einstein-8B is an innovative, meticulously blended model designed to excel in instruction-following, chemistry-focused tasks, and long-form conversational generation. This model fuses the best qualities of multiple Llama3-based architectures, making it highly versatile for both general and specialized tasks. 💻🧠✨ - bunnycore/Best-Mix-Llama-3.1-8B: A balanced blend of top Llama models, optimized for general performance across reasoning, instruction-following, and math. - USTC-KnowledgeComputingLab/Llama3-KALE-LM-Chem-1.5-8B: A model specialized in scientific knowledge and chemistry, excelling in chemistry benchmarks. - Weyaxi/Einstein-v6.1-Llama3-8B: Fine-tuned for long-form generation, conversation-heavy tasks, and optimized with cutting-edge techniques for efficient memory usage and fast performance. - A masterful blend of several Llama3 models like Aurorafaustus, TitanFusion, and OpenMath2. - Provides a balanced performance in a variety of tasks such as reasoning, math, and instruction-following. - Key contributor to the overall versatility of the merged model. B: USTC-KnowledgeComputingLab/Llama3-KALE-LM-Chem-1.5-8B - Specializes in chemistry and scientific knowledge, outperforming many larger models in chemistry benchmarks. - Adds scientific rigor and domain-specific expertise to the merged model, making it perfect for scientific and academic tasks. - Fine-tuned on a wide range of instructive and conversational datasets like WizardLM, Alpaca, and ShareGPT. - Optimized for long-form text generation and enhanced with xformers attention and flash attention techniques for better performance. - Key player in dialogue-based tasks and long conversation generation. This model was merged using the TIES merge method, ensuring a smooth integration of the key strengths from each contributing model. 
Here's the configuration used: With the foundation of Best-Mix, this model excels in general-purpose reasoning, instruction-following, and tasks that require high adaptability. Thanks to the contribution from KALE-LM-Chem, this model shines in scientific research, particularly chemistry-focused tasks, making it ideal for academic and research purposes. With Einstein-v6.1, the model handles long-form generation effortlessly, excelling in extended conversations and structured dialogue applications. While still in its early stages, Llama3.1-BestMix-Chem-Einstein-8B is expected to perform well across a variety of benchmarks, including: - Chemistry-focused benchmarks (KALE-LM-Chem) - Instruction-following tasks (Best-Mix) - Conversational AI and long-form text generation (Einstein-v6.1) Further testing and evaluation will continue to refine this model's capabilities. This model is open-sourced under the Apache-2.0 License, allowing free use and modification with proper attribution. - `merge` - `TIES` - `BestMix` - `Chemistry` - `Einstein` - `instruction-following` - `long-form-generation` - `conversational`

Llama3
425
2

Qwen2-1.5B-GGUF

license:apache-2.0
423
3

Turkish-Llama-8b-v0.1-GGUF

Llama
421
4

gemma-3-270m-it-GGUF

QuantFactory/gemma-3-270m-it-GGUF This is a quantized version of google/gemma-3-270m-it created using llama.cpp

- [Gemma 3 Technical Report][g3-tech-report]
- [Responsible Generative AI Toolkit][rai-toolkit]
- [Gemma on Kaggle][kaggle-gemma]
- [Gemma on Vertex Model Garden][vertex-mg-gemma3]

Summary description and brief definition of inputs and outputs.

Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. Gemma 3 models are multimodal, handling text and image input and generating text output, with open weights for both pre-trained variants and instruction-tuned variants. Gemma 3 has a large, 128K context window, multilingual support in over 140 languages, and is available in more sizes than previous versions. Gemma 3 models are well-suited for a variety of text generation and image understanding tasks, including question answering, summarization, and reasoning. Their relatively small size makes it possible to deploy them in environments with limited resources such as laptops, desktops, or your own cloud infrastructure, democratizing access to state-of-the-art AI models and helping foster innovation for everyone.

- Input:
  - Text string, such as a question, a prompt, or a document to be summarized
  - Images, normalized to 896 x 896 resolution and encoded to 256 tokens each, for the 4B, 12B, and 27B sizes
  - Total input context of 128K tokens for the 4B, 12B, and 27B sizes, and 32K tokens for the 1B and 270M sizes
- Output:
  - Generated text in response to the input, such as an answer to a question, analysis of image content, or a summary of a document
  - Total output context up to 128K tokens for the 4B, 12B, and 27B sizes, and 32K tokens for the 1B and 270M sizes per request, subtracting the request input tokens

Data used for model training and how the data was processed. These models were trained on a dataset of text data that includes a wide variety of sources.
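The input accounting above (256 tokens per image and a nominal 128K-token window for the 4B, 12B, and 27B sizes) can be sketched as a quick budget check. The constants and the function name are illustrative, not part of any official API:

```python
# Illustrative constants taken from the card's input/output description.
IMAGE_TOKENS = 256        # tokens per encoded 896x896 image (4B/12B/27B sizes)
CONTEXT_WINDOW = 128_000  # nominal 128K-token input context for those sizes

def remaining_text_budget(num_images: int, text_tokens: int,
                          window: int = CONTEXT_WINDOW) -> int:
    """Tokens left in the window after accounting for images and prompt text."""
    return window - (num_images * IMAGE_TOKENS + text_tokens)
```

For example, a prompt with ten images and 4,000 text tokens still leaves the large majority of the window free; for the 1B and 270M sizes you would pass `window=32_000` instead.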
The 27B model was trained with 14 trillion tokens, the 12B model with 12 trillion tokens, the 4B model with 4 trillion tokens, the 1B with 2 trillion tokens, and the 270M with 6 trillion tokens. The knowledge cutoff date for the training data was August 2024. Here are the key components:

- Web Documents: A diverse collection of web text ensures the model is exposed to a broad range of linguistic styles, topics, and vocabulary. The training dataset includes content in over 140 languages.
- Code: Exposing the model to code helps it to learn the syntax and patterns of programming languages, which improves its ability to generate code and understand code-related questions.
- Mathematics: Training on mathematical text helps the model learn logical reasoning, symbolic representation, and to address mathematical queries.
- Images: A wide range of images enables the model to perform image analysis and visual data extraction tasks.

The combination of these diverse data sources is crucial for training a powerful multimodal model that can handle a wide variety of different tasks and data formats. Here are the key data cleaning and filtering methods applied to the training data:

- CSAM Filtering: Rigorous CSAM (Child Sexual Abuse Material) filtering was applied at multiple stages in the data preparation process to ensure the exclusion of harmful and illegal content.
- Sensitive Data Filtering: As part of making Gemma pre-trained models safe and reliable, automated techniques were used to filter out certain personal information and other sensitive data from training sets.
- Additional methods: Filtering based on content quality and safety in line with [our policies][safety-policies].

Gemma was trained using [Tensor Processing Unit (TPU)][tpu] hardware (TPUv4p, TPUv5p and TPUv5e). Training vision-language models (VLMs) requires significant computational power.
TPUs, designed specifically for the matrix operations common in machine learning, offer several advantages in this domain:

- Performance: TPUs are specifically designed to handle the massive computations involved in training VLMs. They can speed up training considerably compared to CPUs.
- Memory: TPUs often come with large amounts of high-bandwidth memory, allowing for the handling of large models and batch sizes during training. This can lead to better model quality.
- Scalability: TPU Pods (large clusters of TPUs) provide a scalable solution for handling the growing complexity of large foundation models. You can distribute training across multiple TPU devices for faster and more efficient processing.
- Cost-effectiveness: In many scenarios, TPUs can provide a more cost-effective solution for training large models compared to CPU-based infrastructure, especially when considering the time and resources saved due to faster training.
- These advantages are aligned with [Google's commitments to operate sustainably][sustainability].

Training was done using [JAX][jax] and [ML Pathways][ml-pathways]. JAX allows researchers to take advantage of the latest generation of hardware, including TPUs, for faster and more efficient training of large models. ML Pathways is Google's latest effort to build artificially intelligent systems capable of generalizing across multiple tasks. This is especially suitable for foundation models, including large language models like these ones. Together, JAX and ML Pathways are used as described in the [paper about the Gemini family of models][gemini-2-paper]: "the 'single controller' programming model of Jax and Pathways allows a single Python process to orchestrate the entire training run, dramatically simplifying the development workflow."

These models were evaluated against a large collection of different datasets and metrics to cover different aspects of text generation. Evaluation results marked with IT are for instruction-tuned models.
Evaluation results marked with PT are for pre-trained models.

| Benchmark | n-shot | Gemma 3 PT 270M |
| :------------------------ | :-----------: | ------------------: |
| [HellaSwag][hellaswag] | 10-shot | 40.9 |
| [BoolQ][boolq] | 0-shot | 61.4 |
| [PIQA][piqa] | 0-shot | 67.7 |
| [TriviaQA][triviaqa] | 5-shot | 15.4 |
| [ARC-c][arc] | 25-shot | 29.0 |
| [ARC-e][arc] | 0-shot | 57.7 |
| [WinoGrande][winogrande] | 5-shot | 52.0 |

| Benchmark | n-shot | Gemma 3 IT 270m |
| :------------------------ | :-----------: | ------------------: |
| [HellaSwag][hellaswag] | 0-shot | 37.7 |
| [PIQA][piqa] | 0-shot | 66.2 |
| [ARC-c][arc] | 0-shot | 28.2 |
| [WinoGrande][winogrande] | 0-shot | 52.3 |
| [BIG-Bench Hard][bbh] | few-shot | 26.7 |
| [IF Eval][ifeval] | 0-shot | 51.2 |

| Benchmark | n-shot | Gemma 3 IT 1B | Gemma 3 IT 4B | Gemma 3 IT 12B | Gemma 3 IT 27B |
|--------------------------------|--------|:-------------:|:-------------:|:--------------:|:--------------:|
| [GPQA][gpqa] Diamond | 0-shot | 19.2 | 30.8 | 40.9 | 42.4 |
| [SimpleQA][simpleqa] | 0-shot | 2.2 | 4.0 | 6.3 | 10.0 |
| [FACTS Grounding][facts-grdg] | - | 36.4 | 70.1 | 75.8 | 74.9 |
| [BIG-Bench Hard][bbh] | 0-shot | 39.1 | 72.2 | 85.7 | 87.6 |
| [BIG-Bench Extra Hard][bbeh] | 0-shot | 7.2 | 11.0 | 16.3 | 19.3 |
| [IFEval][ifeval] | 0-shot | 80.2 | 90.2 | 88.9 | 90.4 |

| Benchmark | n-shot | Gemma 3 PT 1B | Gemma 3 PT 4B | Gemma 3 PT 12B | Gemma 3 PT 27B |
| ------------------------------ |----------|:-------------:|:-------------:|:--------------:|:--------------:|
| [HellaSwag][hellaswag] | 10-shot | 62.3 | 77.2 | 84.2 | 85.6 |
| [BoolQ][boolq] | 0-shot | 63.2 | 72.3 | 78.8 | 82.4 |
| [PIQA][piqa] | 0-shot | 73.8 | 79.6 | 81.8 | 83.3 |
| [SocialIQA][socialiqa] | 0-shot | 48.9 | 51.9 | 53.4 | 54.9 |
| [TriviaQA][triviaqa] | 5-shot | 39.8 | 65.8 | 78.2 | 85.5 |
| [Natural Questions][naturalq] | 5-shot | 9.48 | 20.0 | 31.4 | 36.1 |
| [ARC-c][arc] | 25-shot | 38.4 | 56.2 | 68.9 | 70.6 |
| [ARC-e][arc] | 0-shot | 73.0 | 82.4 | 88.3 | 89.0 |
| [WinoGrande][winogrande] | 5-shot | 58.2 | 64.7 | 74.3 | 78.8 |
| [BIG-Bench Hard][bbh] | few-shot | 28.4 | 50.9 | 72.6 | 77.7 |
| [DROP][drop] | 1-shot | 42.4 | 60.1 | 72.2 | 77.2 |

| Benchmark | n-shot | Gemma 3 IT 1B | Gemma 3 IT 4B | Gemma 3 IT 12B | Gemma 3 IT 27B |
|----------------------------|--------|:-------------:|:-------------:|:--------------:|:--------------:|
| [MMLU][mmlu] (Pro) | 0-shot | 14.7 | 43.6 | 60.6 | 67.5 |
| [LiveCodeBench][lcb] | 0-shot | 1.9 | 12.6 | 24.6 | 29.7 |
| [Bird-SQL][bird-sql] (dev) | - | 6.4 | 36.3 | 47.9 | 54.4 |
| [Math][math] | 0-shot | 48.0 | 75.6 | 83.8 | 89.0 |
| HiddenMath | 0-shot | 15.8 | 43.0 | 54.5 | 60.3 |
| [MBPP][mbpp] | 3-shot | 35.2 | 63.2 | 73.0 | 74.4 |
| [HumanEval][humaneval] | 0-shot | 41.5 | 71.3 | 85.4 | 87.8 |
| [Natural2Code][nat2code] | 0-shot | 56.0 | 70.3 | 80.7 | 84.5 |
| [GSM8K][gsm8k] | 0-shot | 62.8 | 89.2 | 94.4 | 95.9 |

| Benchmark | n-shot | Gemma 3 PT 4B | Gemma 3 PT 12B | Gemma 3 PT 27B |
| ------------------------------ |----------------|:-------------:|:--------------:|:--------------:|
| [MMLU][mmlu] | 5-shot | 59.6 | 74.5 | 78.6 |
| [MMLU][mmlu] (Pro COT) | 5-shot | 29.2 | 45.3 | 52.2 |
| [AGIEval][agieval] | 3-5-shot | 42.1 | 57.4 | 66.2 |
| [MATH][math] | 4-shot | 24.2 | 43.3 | 50.0 |
| [GSM8K][gsm8k] | 8-shot | 38.4 | 71.0 | 82.6 |
| [GPQA][gpqa] | 5-shot | 15.0 | 25.4 | 24.3 |
| [MBPP][mbpp] | 3-shot | 46.0 | 60.4 | 65.6 |
| [HumanEval][humaneval] | 0-shot | 36.0 | 45.7 | 48.8 |

| Benchmark | n-shot | Gemma 3 IT 1B | Gemma 3 IT 4B | Gemma 3 IT 12B | Gemma 3 IT 27B |
|--------------------------------------|--------|:-------------:|:-------------:|:--------------:|:--------------:|
| [Global-MMLU-Lite][global-mmlu-lite] | 0-shot | 34.2 | 54.5 | 69.5 | 75.1 |
| [ECLeKTic][eclektic] | 0-shot | 1.4 | 4.6 | 10.3 | 16.7 |
| [WMT24++][wmt24pp] | 0-shot | 35.9 | 46.8 | 51.6 | 53.4 |

| Benchmark | Gemma 3 PT 1B | Gemma 3 PT 4B | Gemma 3 PT 12B | Gemma 3 PT 27B |
| ------------------------------------ |:-------------:|:-------------:|:--------------:|:--------------:|
| [MGSM][mgsm] | 2.04 | 34.7 | 64.3 | 74.3 |
| [Global-MMLU-Lite][global-mmlu-lite] | 24.9 | 57.0 | 69.4 | 75.7 |
| [WMT24++][wmt24pp] (ChrF) | 36.7 | 48.4 | 53.9 | 55.7 |
| [FloRes][flores] | 29.5 | 39.2 | 46.0 | 48.8 |
| [XQuAD][xquad] (all) | 43.9 | 68.0 | 74.5 | 76.8 |
| [ECLeKTic][eclektic] | 4.69 | 11.0 | 17.2 | 24.4 |
| [IndicGenBench][indicgenbench] | 41.4 | 57.2 | 61.7 | 63.4 |

| Benchmark | Gemma 3 IT 4B | Gemma 3 IT 12B | Gemma 3 IT 27B |
|-----------------------------------|:-------------:|:--------------:|:--------------:|
| [MMMU][mmmu] (val) | 48.8 | 59.6 | 64.9 |
| [DocVQA][docvqa] | 75.8 | 87.1 | 86.6 |
| [InfoVQA][info-vqa] | 50.0 | 64.9 | 70.6 |
| [TextVQA][textvqa] | 57.8 | 67.7 | 65.1 |
| [AI2D][ai2d] | 74.8 | 84.2 | 84.5 |
| [ChartQA][chartqa] | 68.8 | 75.7 | 78.0 |
| [VQAv2][vqav2] (val) | 62.4 | 71.6 | 71.0 |
| [MathVista][mathvista] (testmini) | 50.0 | 62.9 | 67.6 |

| Benchmark | Gemma 3 PT 4B | Gemma 3 PT 12B | Gemma 3 PT 27B |
| ------------------------------ |:-------------:|:--------------:|:--------------:|
| [COCOcap][coco-cap] | 102 | 111 | 116 |
| [DocVQA][docvqa] (val) | 72.8 | 82.3 | 85.6 |
| [InfoVQA][info-vqa] (val) | 44.1 | 54.8 | 59.4 |
| [MMMU][mmmu] (pt) | 39.2 | 50.3 | 56.1 |
| [TextVQA][textvqa] (val) | 58.9 | 66.5 | 68.6 |
| [RealWorldQA][realworldqa] | 45.5 | 52.2 | 53.9 |
| [ReMI][remi] | 27.3 | 38.5 | 44.8 |
| [AI2D][ai2d] | 63.2 | 75.2 | 79.0 |
| [ChartQA][chartqa] | 63.6 | 74.7 | 76.3 |
| [VQAv2][vqav2] | 63.9 | 71.2 | 72.9 |
| [BLINK][blinkvqa] | 38.0 | 35.9 | 39.6 |
| [OKVQA][okvqa] | 51.0 | 58.7 | 60.2 |
| [TallyQA][tallyqa] | 42.5 | 51.8 | 54.3 |
| [SpatialSense VQA][ss-vqa] | 50.9 | 60.0 | 59.4 |
| [CountBenchQA][countbenchqa] | 26.1 | 17.8 | 68.0 |

[hellaswag]: https://arxiv.org/abs/1905.07830
[boolq]: https://arxiv.org/abs/1905.10044
[piqa]: https://arxiv.org/abs/1911.11641
[triviaqa]: https://arxiv.org/abs/1705.03551
[arc]: https://arxiv.org/abs/1911.01547
[winogrande]: https://arxiv.org/abs/1907.10641
[bbh]: https://paperswithcode.com/dataset/bbh
[ifeval]: https://arxiv.org/abs/2311.07911
[gpqa]: https://arxiv.org/abs/2311.12022
[simpleqa]: https://arxiv.org/abs/2411.04368
[facts-grdg]: https://goo.gle/FACTSpaper
[bbeh]: https://github.com/google-deepmind/bbeh
[socialiqa]: https://arxiv.org/abs/1904.09728
[naturalq]: https://github.com/google-research-datasets/natural-questions
[drop]: https://arxiv.org/abs/1903.00161
[mmlu]: https://arxiv.org/abs/2009.03300
[agieval]: https://arxiv.org/abs/2304.06364
[math]: https://arxiv.org/abs/2103.03874
[gsm8k]: https://arxiv.org/abs/2110.14168
[mbpp]: https://arxiv.org/abs/2108.07732
[humaneval]: https://arxiv.org/abs/2107.03374
[lcb]: https://arxiv.org/abs/2403.07974
[bird-sql]: https://arxiv.org/abs/2305.03111
[nat2code]: https://arxiv.org/abs/2405.04520
[mgsm]: https://arxiv.org/abs/2210.03057
[flores]: https://arxiv.org/abs/2106.03193
[xquad]: https://arxiv.org/abs/1910.11856v3
[global-mmlu-lite]: https://huggingface.co/datasets/CohereForAI/Global-MMLU-Lite
[wmt24pp]: https://arxiv.org/abs/2502.12404v1
[eclektic]: https://arxiv.org/abs/2502.21228
[indicgenbench]: https://arxiv.org/abs/2404.16816
[mmmu]: https://arxiv.org/abs/2311.16502
[docvqa]: https://www.docvqa.org/
[info-vqa]: https://arxiv.org/abs/2104.12756
[textvqa]: https://textvqa.org/
[realworldqa]: https://paperswithcode.com/dataset/realworldqa
[remi]: https://arxiv.org/html/2406.09175v1
[ai2d]: https://allenai.org/data/diagrams
[chartqa]: https://arxiv.org/abs/2203.10244
[vqav2]: https://visualqa.org/index.html
[blinkvqa]: https://arxiv.org/abs/2404.12390
[okvqa]: https://okvqa.allenai.org/
[tallyqa]: https://arxiv.org/abs/1810.12440
[ss-vqa]: https://arxiv.org/abs/1908.02660
[countbenchqa]: https://github.com/google-research/bigvision/blob/main/bigvision/datasets/countbenchqa/
[mathvista]: https://arxiv.org/abs/2310.02255

Our evaluation methods include structured evaluations and internal red-teaming testing of relevant content policies. Red-teaming was conducted by a number of different teams, each with different goals and human evaluation metrics. These models were evaluated against a number of different categories relevant to ethics and safety, including:

- Child Safety: Evaluation of text-to-text and image-to-text prompts covering child safety policies, including child sexual abuse and exploitation.
- Content Safety: Evaluation of text-to-text and image-to-text prompts covering safety policies including harassment, violence and gore, and hate speech.
- Representational Harms: Evaluation of text-to-text and image-to-text prompts covering safety policies including bias, stereotyping, and harmful associations or inaccuracies.

In addition to development-level evaluations, we conduct "assurance evaluations" which are our 'arms-length' internal evaluations for responsibility governance decision making. They are conducted separately from the model development team, to inform decision making about release. High-level findings are fed back to the model team, but prompt sets are held out to prevent overfitting and preserve the results' ability to inform decision making.
Assurance evaluation results are reported to our Responsibility & Safety Council as part of release review.

For all areas of safety testing, we saw major improvements in the categories of child safety, content safety, and representational harms relative to previous Gemma models. All testing was conducted without safety filters to evaluate the model's capabilities and behaviors. For both text-to-text and image-to-text, and across all model sizes, the model produced minimal policy violations and showed significant improvements over previous Gemma models' performance with respect to ungrounded inferences. A limitation of our evaluations was that they included only English-language prompts.

These models have certain limitations that users should be aware of. Open vision-language models (VLMs) have a wide range of applications across various industries and domains. The following list of potential uses is not comprehensive; its purpose is to provide contextual information about the possible use cases that the model creators considered as part of model training and development.

- Content Creation and Communication
  - Text Generation: These models can be used to generate creative text formats such as poems, scripts, code, marketing copy, and email drafts.
  - Chatbots and Conversational AI: Power conversational interfaces for customer service, virtual assistants, or interactive applications.
  - Text Summarization: Generate concise summaries of a text corpus, research papers, or reports.
  - Image Data Extraction: These models can be used to extract, interpret, and summarize visual data for text communications.
- Research and Education
  - Natural Language Processing (NLP) and VLM Research: These models can serve as a foundation for researchers to experiment with VLM and NLP techniques, develop algorithms, and contribute to the advancement of the field.
  - Language Learning Tools: Support interactive language learning experiences, aiding in grammar correction or providing writing practice.
  - Knowledge Exploration: Assist researchers in exploring large bodies of text by generating summaries or answering questions about specific topics.

- Training Data
  - The quality and diversity of the training data significantly influence the model's capabilities. Biases or gaps in the training data can lead to limitations in the model's responses.
  - The scope of the training dataset determines the subject areas the model can handle effectively.
- Context and Task Complexity
  - Models are better at tasks that can be framed with clear prompts and instructions. Open-ended or highly complex tasks might be challenging.
  - A model's performance can be influenced by the amount of context provided (longer context generally leads to better outputs, up to a certain point).
- Language Ambiguity and Nuance
  - Natural language is inherently complex. Models might struggle to grasp subtle nuances, sarcasm, or figurative language.
- Factual Accuracy
  - Models generate responses based on information they learned from their training datasets, but they are not knowledge bases. They may generate incorrect or outdated factual statements.
- Common Sense
  - Models rely on statistical patterns in language. They might lack the ability to apply common sense reasoning in certain situations.

The development of vision-language models (VLMs) raises several ethical concerns. In creating an open model, we have carefully considered the following:

- Bias and Fairness
  - VLMs trained on large-scale, real-world text and image data can reflect socio-cultural biases embedded in the training material. These models underwent careful scrutiny, input data pre-processing described and posterior evaluations reported in this card.
- Misinformation and Misuse
  - VLMs can be misused to generate text that is false, misleading, or harmful.
  - Guidelines are provided for responsible use with the model; see the [Responsible Generative AI Toolkit][rai-toolkit].
- Transparency and Accountability
  - This model card summarizes details on the models' architecture, capabilities, limitations, and evaluation processes.
  - A responsibly developed open model offers the opportunity to share innovation by making VLM technology accessible to developers and researchers across the AI ecosystem.

- Perpetuation of biases: It's encouraged to perform continuous monitoring (using evaluation metrics and human review) and to explore de-biasing techniques during model training, fine-tuning, and other use cases.
- Generation of harmful content: Mechanisms and guidelines for content safety are essential. Developers are encouraged to exercise caution and implement appropriate content safety safeguards based on their specific product policies and application use cases.
- Misuse for malicious purposes: Technical limitations and developer and end-user education can help mitigate against malicious applications of VLMs. Educational resources and reporting mechanisms for users to flag misuse are provided. Prohibited uses of Gemma models are outlined in the [Gemma Prohibited Use Policy][prohibited-use].
- Privacy violations: Models were trained on data filtered for removal of certain personal information and other sensitive data. Developers are encouraged to adhere to privacy regulations with privacy-preserving techniques.

At the time of release, this family of models provides high-performance open vision-language model implementations designed from the ground up for responsible AI development compared to similarly sized models. Using the benchmark evaluation metrics described in this document, these models have been shown to provide superior performance to other, comparably sized open model alternatives.
[g3-tech-report]: https://arxiv.org/abs/2503.19786
[rai-toolkit]: https://ai.google.dev/responsible
[kaggle-gemma]: https://www.kaggle.com/models/google/gemma-3
[vertex-mg-gemma3]: https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/gemma3
[terms]: https://ai.google.dev/gemma/terms
[safety-policies]: https://ai.google/static/documents/ai-responsibility-update-published-february-2025.pdf
[prohibited-use]: https://ai.google.dev/gemma/prohibitedusepolicy
[tpu]: https://cloud.google.com/tpu/docs/intro-to-tpu
[sustainability]: https://sustainability.google/operating-sustainably/
[jax]: https://github.com/jax-ml/jax
[ml-pathways]: https://blog.google/technology/ai/introducing-pathways-next-generation-ai-architecture/
[gemini-2-paper]: https://arxiv.org/abs/2312.11805

420
2

OpenR1-Distill-7B-GGUF

license:apache-2.0
419
2

granite-8b-code-instruct-4k-GGUF

dataset:bigcode/commitpackft
417
2

Llama 3 8B Instruct Finance RAG GGUF

QuantFactory/Llama-3-8B-Instruct-Finance-RAG-GGUF This is a quantized version of curiousily/Llama-3-8B-Instruct-Finance-RAG created using llama.cpp

This model is a fine-tuned version of the original Llama 3 8B Instruct model on 4,000 examples from the virattt/financial-qa-10K dataset. It was fine-tuned with a LoRA adapter for RAG use cases and is optimized to answer a question based on a supplied context. Format the prompt using the original Llama 3 Instruct prompt format. Here's a sample of the predictions from trained and untrained models. Uses the original Llama 3 License. A custom commercial license is available at: https://llama.meta.com/llama3/license
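The card says to use the original Llama 3 Instruct prompt format for the context-plus-question input. Since the original sample code is not preserved here, the following is an illustrative sketch using the Llama 3 special tokens; the instruction wording inside the user turn is an assumption, not the exact text used during fine-tuning:

```python
def format_rag_prompt(context: str, question: str) -> str:
    """Build a Llama 3 Instruct-format prompt asking a question over a context.

    Uses the Llama 3 header/turn special tokens. The instruction sentence
    is illustrative, not the exact wording this model was trained on.
    """
    user_msg = (
        "Use only the information in the context to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question:\n{question}"
    )
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_msg}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )
```

The resulting string can be passed to any llama.cpp-compatible runtime as a raw (non-chat) completion prompt, with generation stopping at `<|eot_id|>`.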

base_model:meta-llama/Meta-Llama-3-8B-Instruct
413
45

aya-expanse-8b-GGUF

license:cc-by-nc-4.0
412
4

Mistral-NeMo-Minitron-8B-Chat-GGUF

410
1

Phi-3-mini-4k-instruct-GGUF

license:mit
409
5

Blue-Orchid-2x7b-GGUF

license:apache-2.0
409
0

Einstein-v6.1-Llama3-8B-GGUF

llama
406
4

Ministral-3b-instruct-GGUF

license:apache-2.0
405
10

Meta-Llama-3.1-8B-GGUF

llama
403
12

Qwen2.5-7B-GGUF

license:apache-2.0
403
4

Gemma-2-Ataraxy-v3j-9B-GGUF

400
1

Llama3-Med42-8B-GGUF

base_model:m42-health/Llama3-Med42-8B
398
2

CursorCore-QW2.5-7B-GGUF

license:apache-2.0
398
2

Qwen2.5-14B_Uncencored_Instruct-GGUF

license:apache-2.0
397
4

bloomz-7b1-GGUF

397
4

Darkest-muse-v1-GGUF

QuantFactory/Darkest-muse-v1-GGUF This is a quantized version of sam-paech/Darkest-muse-v1 created using llama.cpp

GGUFs here: https://huggingface.co/mradermacher/Darkest-muse-v1-GGUF

This is a creative writing merge of two very different models that I trained on the brand new Gutenberg3 dataset, plus Ataraxy-v2 in the mix. It's lost much of the slop and tryhard vocab flexing and positivity bias that's typical of these models, and writes in its own voice.

The main source model in the merge, Quill-v1, inherited a natural, spare prose from the human writing in the Gutenberg set. The other source model, Delirium-v1, got overcooked in SIMPO training; it has crazy panache, a really dark flair for the grotesque, and some mental issues. These two source models balance each other out in the merge, resulting in something pretty unique. It seems to be quite uncensored and creative.

Since Delirium was pushed right to the edge during training, the merge may exhibit some of its weirdness and word / concept fixations. This may be mitigated by using custom anti-slop lists. The payoff is a really creative, stream-of-consciousness style of writing, with punchy dialogue that I haven't seen in other models. Oh, it also scored around the top of the EQ-Bench creative writing leaderboard!

Quill is a capable, humanlike writing model trained on a large dataset of late 19th and early 20th century writing from the Gutenberg Project. This model writes with a natural cadence and low gpt-slop, having inherited some human qualities from the Gutenberg3 dataset. It writes with more simple, spare prose than the typical overly-adjectived LLM writing style.

Delirium-v1 is the titular "dark muse", injecting a dose of chaos and insanity. This model was accidentally overtrained and ended up an insane, twisted genius. Delirium in its own words:

Ataraxy-v2 is a great all-rounder creative writing merge. It was used as the base model in the dare-ties merge.
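The anti-slop lists mentioned above can be approximated at the string level by re-ranking candidate continuations against a banned-phrase list. This sketch is illustrative only; the function names and the example phrase list are assumptions, and real anti-slop samplers operate on logits rather than finished strings:

```python
# Illustrative slop list; a real one would be much longer and model-specific.
BANNED_PHRASES = ["tapestry", "testament to", "shivers down"]

def passes_slop_filter(text: str, banned=BANNED_PHRASES) -> bool:
    """Reject a candidate continuation if it contains any banned phrase."""
    lowered = text.lower()
    return not any(phrase in lowered for phrase in banned)

def pick_continuation(candidates):
    """Return the first candidate that clears the filter, else fall back."""
    for candidate in candidates:
        if passes_slop_filter(candidate):
            return candidate
    return candidates[0]
```

In practice the same idea is applied during sampling: when a banned sequence appears, the sampler backtracks and down-weights the tokens that started it.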
Gutenberg3 is a DPO dataset containing extracts from 629 public domain fiction novels in the Gutenberg Library. It follows the same format as JonDurbin's original gutenberg set.

Writing Prompt

Historical Fiction: Write a scene from a story set during the height of the Roman Empire, focusing on a slice of a day in the life of a gladiator. Do not write a combat scene. Use sensory details to capture the sights, sounds, and smells of ancient Rome. Explore the gladiator's thoughts and emotions. The story should also touch on the larger political and social issues of the time period. The piece should feel like a slice of a larger story. First person, past tense, 800-1000 words.

Test Model Output

The first tremor of light came not from the sun, but from the bowels of the city. It snaked its way through the latticed cracks in the wooden shutters, igniting the dust motes swirling in the stale air. Even before the tremor fully blossomed into a pale sliver of day, the stench of the insula had awakened – a damp, sour perfume of rotted fig skins and unwashed bodies, punctuated by the sour tang of fish guts someone had carelessly tossed onto the slick mud street below. I didn't bother with the shutters. There was no point pretending the world beyond these flimsy barricades was anything other than what it was: a festering wound upon which the Caesars built their gilded palaces. I rolled onto my side, the rough hemp sack that served as a mattress offering no respite from the chill that crawled in with the predawn breeze. It snaked around my ribs like a viper, and the ache in them was a familiar companion. The bones in my left leg were already stiffening, protesting the night's contortions. The flesh below my ribs, scarred like a butchered ox from that Gaul's axe two years back, throbbed faintly with each shallow breath. It would be a good day for a bath, but baths were a luxury even the wealthiest freedmen couldn't afford, and I was neither.
A flicker of something like annoyance, sharp as the shard of bone lodged in my left hand, pricked through the thick fog of sleep. Even annoyance, in these cramped quarters, had its own peculiar sting. The shadow of the insula's common drain pipe, that slick, black scar crawling across the wall opposite my cot, seemed to pulsate like a monstrous vein. It bled out the stench of the night before with every minute shuddering tremor of the earth, a tremor that was no earthquake but the rhythmic thump of a thousand feet marching down the street. The legionaries. They were early today. Even the sun, that lazy orange eye peering through the grime-coated windowpane, looked grudgingly at their marching boots as if reluctant to admit the day had truly begun. It was the way they moved that woke the full-blown fury in my belly – the stiff, calculated strides of men who marched in the service of men who could never truly feel the weight of the sandals on their feet. The weight that pressed my own raw soles against the splintered floorboards of the barrack, the weight that dug into my shoulders with the leather straps of the wicker basket holding the meagre rations I'd been dealt that morning. Bread. Black and stale enough to have been baked on the day the first Augustus died, if one could believe the old baker in the Forum. And cheese. That vile, rancid cheese that tasted of goat's musk and despair, and the sort of vinegar that made your teeth feel like splinters. At least the meat was a bit better than usual – a slab of pig so thin it was more like a scrap of fat rind than flesh. I ate it all, every crumb and scrap of it, not because I enjoyed it – though the vinegar had a certain savage, throat-clearing appeal in the morning – but because I needed to. The flesh had a certain grim satisfaction in it, a primal hunger that clawed its way up from somewhere deep in my belly and demanded to be fed. 
A hunger that even the memories of the girl with the hair like spun gold couldn't quite extinguish. My fingers tightened around the coarse linen strip tied to my wrist – a strip that bore a crudely stitched scarab beetle. Mother had sewn it on when I was a boy, said it was lucky. Said it'd keep the jackals at bay. Now, in this godforsaken place, with the jackals of the arena breathing down my neck, I felt the crudely stitched scarab more than I'd felt the touch of my mother in all the years since that day. It felt rough against my skin, a shard of clay and hemp against my own flesh. The scarab was a reminder. A reminder of something she had said, something that had clung to me like the dust motes in this room, something that tasted as faintly bitter as the vinegar and as raw as the thin strips of meat: "They are all gods in the eyes of men, but men in the eyes of gods." That was what she had said, standing in the mud and the blood of that first market. I hadn't been a man then, not like the men they sold me to. Not yet. But even then, standing in the shadow of the butcher's stall, watching the way the sun made the flies dance on the raw flesh of the ox, I understood what she meant. The legionaries marched on, their boots drumming a dull, relentless tattoo against the worn stone, a rhythm that was meant to instill obedience, and fear. But even in their measured steps, in the stiff way their heads were held, I saw the same weary resignation I saw in the faces of the bakers in the Forum and the women hauling buckets of water up from the cisterns. It was the face of a city built on the backs of men. I spat out a mouthful of black bread crust into the bowl where the vinegar had curdled into a fetid jelly and went to work on the rest of it. I ate slowly, deliberately, the way my master taught me. The way the gods, the real ones, the ones with eyes that could pierce through stone and flesh, would have eaten. 
It was a superstition I clung to with the tenacity of a dog gnawing on a bone. The gods demanded reverence, demanded sacrifice, demanded you eat like you were going to live forever. So I ate. Not with enjoyment, not with hope. But with the grim, hollow certainty that this was a debt I owed them. This was my prayer. When the bowl was clean, I went through my ritual. Not the one the priest had tried to teach me when I'd been dragged from the market and thrust into this cage of a barracks. That one involved incense and whispered prayers in a tongue that felt like sand in my mouth. No. This was a different kind of prayer. The first thing was the scarab. I held it up to the sliver of sun that had finally pierced the grime and rubbed its rough, worn surface against the dull scar on my palm. The scar was a twisted, pale crescent, almost invisible beneath the dull bronze of the skin. I traced its curve, feeling the way it dug into the tendons that ran beneath, the way it made my hand clench involuntarily. A reminder of the Gaul, a reminder of the day they'd decided I was strong enough to be a man, strong enough to wear the blood of the Gaul on my flesh. Then came the spit. A quick, sharp, almost violent expulsion of the vinegar-laced saliva onto the dusty floorboards. I'd never dared ask what it meant, never dared ask why the old Thracian gladiator who trained me had me doing it. But I did it. And I did it with the same careful precision as I used to hone the gladius he'd gifted me when I'd finally bled my first man in the arena. The spit, I was sure, was an offering. Not to Jupiter or Mars, or to any of the gods who made the decisions that landed me in this place. But to something else. To something older and colder than the marble statues they stuck on every street corner in Rome. Something that watched from the shadows, something that thrived on the stench of blood and the creak of bone. 
And when it was done, I walked to the latticed window, the one the sun was now finally beginning to fill with light, and looked down onto the street. The legionaries were gone. They had vanished like phantoms. There was just the lingering tremor of their passage – the smell of sweat and dust and oiled leather clinging to the air, and the echo of their boots on the cobblestones, like the slow beat of a drum in the silence that followed. The street below was already beginning to stir. A woman hawking olives, her voice shrill and rasping as she shouted the price. Two boys playing a rough game with a cracked pot lid, their laughter harsh and ragged against the rising clamor of the city. Even the dust motes, caught in the thin, watery shaft of sunlight that stabbed through the window, seemed to be stirring, dancing a frenzied jig of defiance in the face of the rising light. Rome. The city was waking up. And I had to be awake with it. I had to be awake because the gods were watching, and the gods had decided my work was not yet done. The gods had decided I had to bleed again.

Writing Prompt
Epistolary Apocalyptic Survival: Write a story set in a post-apocalyptic world, told through a series of letters or messages left behind by survivors. The story should chronicle the challenges and triumphs of two small groups of survivors as they navigate the dangers of this new world and seek to rebuild their lives. The groups communicate through letters left in a drop box. The story proceeds as a new settlement is discovered. They are approached, and initially given entry on the ostensible promise of a trade deal, but things go sideways. Create distinct characters and write in their voice. The story will be set in New Zealand 30 years after a civilisational collapse. 800-1000 words.

Entry 1: July 24th, 30.23. The Box, Whanganui River
Māori names mean nothing anymore. This place, this river, it just is. The water's choked with silt, though.
Rain's the only thing that comes reliably now. And the silt, it's full of them - the things they used to call algae. Can't even remember what the old ones called them, not that it matters much. Just know to steer clear of the thick patches. You boil water for ten minutes, you might get a day's reprieve. But that's a lot of effort for a day. Most days I just suck it dry, gritting teeth, hoping the fever comes slow. We're seven strong now. Two more chicks came down the river last week. Couldn't say how many were with them before, but the ones that made it were strung out thin as wire. The old one, she was strong, though. Got a good arm on her, took the first shot at a boar we trapped yesterday. Didn't take it, though. Took three of us, and a whole afternoon, before we dragged it back to the makeshift fence we call a pen. We eat tonight. Maybe we'll get the strength to rebuild that fence tomorrow. The last one lasted two days before the things got through. Hope to catch someone else out here, soon. This old tin box ain't meant for much more than whispers, but maybe someone out there's got some whispers worth hearing.

Entry 2: September 18th, 30.23. The Box, Lake Taupo
Heard whispers, alright. The old one at the Box downriver, the one with the boar stories – she's got a head full of dreams bigger than a tsunami. Says there's a place, way up north, on the shores of that old lake you used to call Taupo. Place called ‘Te Rere.' Dreams? Maybe. I ain't much for ‘em anymore. We're down to four. Lost the old one, choked on something she ate, couldn't spit it out. Didn't see it in time. Lost the two kids, same way, a few weeks after. This one here, he's twelve going on twenty. He calls it the Cough that Takes. Calls everything ‘Takes' if it eats someone up. Doesn't understand it's not always the cough. Sometimes it's just the way things are. There's a map, tucked under a slab of metal in the bottom of the Box. Scrawled on the back of a biscuit tin lid.
Says Te Rere's got a garden. Real garden. Says there's more of us there, living safe. Says they trade. Trade. They got something we want, and we got something they want. It's how the old world worked, wasn't it? This map is all I've got to go on. Old world rules. Might work.

Entry 3: November 5th, 30.23. The Box, Te Rere Settlement
It was the Cough that Takes, just like he said. Took him two nights after we arrived. Didn't see the sickness in him, though. Took him quick. Like all of them. We weren't ready for the Cough that Takes. They were all pale. Not like the sickness that takes the young. Like they just weren't meant to breathe this air anymore. Like the air was meant for the young, and the old were just guests. The Trade was supposed to be simple. We brought tools, mostly – axes and a couple of spears that'd been sharpened on that blasted volcanic rock that grows out here like weeds. They've got enough stone to make a cathedral out of, but it takes so damn long to shape. In return, they wanted stories. Told me they'd lost all their songs. Said the old ones were just a bunch of grumblers. Didn't know a tune from a shout. But they wanted to learn the old ones. They have gardens, alright. Sprouted up out of the volcanic ash, like magic. Tomatoes, beans, potatoes – things I never thought I'd see again. They have chickens too, and sheep. Their sheep are small, though, like they've been bred to hide in the scrub. The Trade started with a few of our stories. We were careful, though. We didn't tell them everything. Told them stories of birds, of fishing, of the old days when the ground wasn't so choked with ash. Told them stories of the mountains, the ones that weren't buried yet. Didn't tell them about the things that crawled out when the sun went down. Didn't tell them about the Cough that Takes. They wanted to know about the mountains, though. They said they needed to know about the mountains to climb them. Said they were looking for something up there.
Something old. Something called ‘The Spark.'

Entry 4: February 12th, 30.24. The Box, Te Rere Settlement
They came for us in the night. Didn't make a sound, not like the things in the ash-fields, the ones with too many eyes and teeth like chipped flint. These came quiet. Like shadows given shape. Came in through the gardens. We thought they were looking for the Cough that Takes. We thought they were the ones with the medicine, the ones who'd heard the stories of the mountains and sought the Spark to fight the sickness that took the young. We thought wrong. They were looking for something else. They were looking for something in us. They took Marae first. She didn't scream. Didn't even make a whimper. They took her right out of her hammock, like she was a sack of potatoes. Jonah...he screamed. Screamed for his mother, screamed for his sister, screamed for his brother, screamed for the old one, screamed for the dog that they took the day before yesterday. They took him next. They took him like they took Marae. No fuss. There's a woman here, one of the ones called ‘the Keepers'. They wear white robes and keep their faces covered. Said she knew about the Cough that Takes. Said they had a cure. Said she needed our blood to make it. I didn't know they were going to take us too. Thought maybe they'd just take the blood. Thought maybe they'd be like the old ones. Thought maybe they'd give you something back, something for your trouble. Didn't think they'd just... take you. I'm writing this because I don't know if I'm going to have another chance. I don't know what they're going to do to us. I don't know if they're going to take us up to the mountains, to the Spark. But maybe... maybe if someone finds this, someone else out there who remembers what stories are, who remembers what it meant to live in a world where the ground didn't taste like ash, maybe they'll know. Maybe they'll find a way to fight back.

Entry 5: March 10th, 30.24. The Box, Te Rere Settlement
They left us.
Took the others with them. Took all the ones with pale skin, with thin bones, with eyes that hadn't seen the sun in too many seasons. Took them up the mountain. The Keeper woman...she stayed. Said they needed more blood. Said the Spark wouldn't burn bright enough without it. Said I was a good source. Said I'd be a good source for a long time. The gardens are still here. The chickens still cluck. The sheep still bleat. The air tastes like ash. The sun doesn't shine bright enough to warm it. I don't know what to tell the others. The ones who haven't gone quiet yet. The ones who still remember what the sun used to feel like. Maybe I'll tell them a story about a bird that flies up to the sky, and sings a song so bright it melts the ash. Maybe.

Entry 6: August 21st, 30.24. The Box, Te Rere Settlement
It's gone. The silence is the worst part. Like it's sucking the marrow out of the marrow, you know? Like even the birds know there's nothing left to sing for. They're gone from the settlement. Took the sheep. Took the tools. Took even the chickens. Took the white robes and left behind only ash and whispers. I've been watching the road they went up. Up to the mountains, that is. They went up. Didn't see any come back. Sometimes I think it was just a sickness in their blood, taking them one by one. Sometimes I think it was something in the mountains. Something they woke up. It's hard to say what took them. Hard to say what's taking us. But maybe... maybe this is where it starts to change. Maybe this silence, this empty space, this gnawing hunger in the air - maybe this is where something new begins. The sun came through the ash today. Just for a minute. Burned a hole clean through, like a fist punching through a blanket. There was a smell. A smell of green. It smelled like a promise. I don't know how long we have left. But there's a bird singing. It's singing on the wind. It's singing about a world that wasn't. Singing about a world that might be.
Writing Prompt
Romance: Love in the Limelight. Write one scene within a larger story set in Wales. A famous (fictional) actor ducks into a small-town bookstore to escape paparazzi. The scene takes us through the characters meeting in this odd circumstance. Over the course of the scene, the actor and the bookstore owner have a conversation charged by an undercurrent of unspoken chemistry. Write the actor as somewhat of a rogue with a fragile ego, which needs to be fed by having everyone like him. He is thoroughly charming, but the bookstore owner seems (at least superficially) immune to this; which paradoxically provokes a genuine attraction and derails the charm offensive. The bookstore owner, despite the superficial rebuffs of the actor's charm, is inwardly more than a little charmed and flustered despite themselves. Write primarily in dialogue, in the distinct voices of each character. 800-1000 words.

Test Model Output
The bell above the shop door clanged with a discordant chime, a metallic counterpoint to the mournful drone of Enya playing somewhere behind a mountain of paperbacks. Rhys barely registered it. He was elbow-deep in the worn spine of a first edition Dylan Thomas, trying to ignore the insistent thrum of panic behind his ribs. "Bloody hell," he muttered, not to the book, not to the dusty air of the shop smelling faintly of mothballs and regret. "Just bloody hell." He hadn't meant to end up here, crammed between the shelves of forgotten Welsh folklore and dog-eared copies of The Secret Garden. He'd meant to be sipping something overpriced and bubbly in a London restaurant, surrounded by the appropriate throng of shimmering dresses and polite, admiring smiles. But then the bloody Range Rover had chosen that precise moment to develop a penchant for backfiring, right in the middle of that blasted car park. And then the paps had materialized, like a plague of gnats with expensive cameras. Now, here he was.
A pair of sensible shoes in worn brown leather came to a stop behind him, and Rhys braced himself. He turned, pushing the Thomas back onto the shelf with a little more force than necessary. The shop was crammed with more books than seemed possible in the space, floor to ceiling and overflowing from wicker baskets onto rickety tables. Behind the counter, a woman was looking at him like he'd just asked if she'd seen a unicorn. Which, given the general state of his life at present, was almost a relief. "Actually," he said, doing his best to make his voice casual, "I'm more of a... browse kind of guy." He gestured vaguely at the mountainous shelves with the tips of his fingers. "Just seeing what the locals like to read." She was studying him, not unkindly, but with the faintly sardonic air of someone who'd seen too many tourists in the wrong tweed jacket. The kind of look that could strip the paint off a Bentley. His Bentley, currently choking and wheezing in a back street somewhere. And the woman in question wore no paint at all, not even a hint of a rouge blush. Just a well-worn linen blouse and faded denim, a tangle of dark hair pulled back with a rubber band, and eyes the colour of the slate roofs in the storm-lashed photos his mum used to send him. Rhys swallowed, the dry tightness of his throat making it feel like a desert in his own mouth. He hadn't expected a sparring partner, especially one with this kind of unnervingly direct gaze, and the faint scent of woodsmoke clinging to the back of her neck like a secret. It wasn't just the eyes, either. 
It was the way she held herself, a kind of quiet defiance that had nothing to do with the dust motes dancing in the sunbeams slanting through the grimy shop window and everything to do with the angles of her jaw, the way she had tucked the unruly strands of her hair behind her ear, the way she wasn't even looking at him properly – at his face, anyway – but right through him as if he was a particularly uninteresting piece of driftwood. He was used to the kind of gaze that lingered a little too long on the lines of his jaw, the way his collarbones slanted under that cashmere shirt his stylist insisted on, the way his hair caught the light like spun gold. But this? This was something else. "Depends on the kind of escapism you're after," she said. "Do you fancy a trip to the moon in a stolen spaceship? Or are we talking more... gritty realism? A murder mystery set in a Welsh pub where everyone's got secrets thicker than clotted cream?" He wanted to say something witty, something that would make her smile that wry, disbelieving smile again. But the words were stuck somewhere behind a dam of nerves and the sheer wrongness of being trapped in this cramped, dusty haven of forgotten novels with this woman who smelled of woodsmoke and looked like she'd rather be wrestling a particularly stubborn lamb back into its pen. "Actually," he blurted, and winced at the way his voice cracked, "I'm more of a... a historical fiction kind of guy." The corners of her mouth lifted ever so slightly, but the rest of her face stayed as impassive as a slate tombstone. "Right," she said. She turned away from him, disappearing behind a mountain of dog-eared paperbacks. Rhys could hear a rustling and the soft thud of paper as she shifted something with a muttered word that might have been Welsh, might have been a curse. He wanted to laugh, to make a joke, anything to break the weird, prickly tension that had settled between them like a frost on a forgotten window pane. 
But all that came out was a croak. "So," she said, reappearing with a book that looked vaguely like something a librarian in a Gothic novel might have used to ward off vampires. "This one. About a chap who thinks he's the last man on Earth. Goes mad, spends his days talking to trees." She tossed it on the counter with the careless grace of someone who'd done this a thousand times. "Bit of a bleak read, mind you. Not much romance, and if you're looking for something a bit... spicier..." She trailed off, her gaze finally meeting his. Not over him, not through him, but straight into that tight space between his jaw and his ear, the one he'd learned to avoid during interviews with too much eyeliner and even more bad press. The silence that stretched between them felt like the inside of a drum, tight and throbbing. It wasn't a bad silence, exactly, but not the kind you wanted to be trapped in if you were trying to maintain the illusion of being a charming rogue who'd just stumbled in for a quiet read. It was a choice then, a split-second, gut-clenching decision. Go with the script. The practiced charm. The disarming smile that never quite reached his eyes because he didn't know when to stop using it and it was starting to feel like a lie anyway. Or say something else. Something... real. "Spicier," he echoed, his voice smooth, carefully modulated, "is exactly what I was after." The corners of her mouth twitched again. Just a flicker of amusement. Then she shrugged. It was almost as if he'd been talking about the weather. "Well," she said, picking up a slim paperback with a faded cover depicting a woman in a scarlet dress standing on a windswept clifftop. "There's a whole shelf dedicated to bodice rippers. Might even find one with a Welsh setting. If you're lucky." He hadn't thought she'd ever be the kind of woman to use the word "bodice ripper" casually, and the way she tossed it out, as if she'd never thought much about anything remotely steamy, was infuriatingly... 
not what he'd expected. "Lucky, eh?" He raised an eyebrow, trying to work in the ghost of a smirk that he hoped wouldn't look too desperate. "Perhaps I should buy a lottery ticket on the way out." He didn't know if it was the faint tremor in his voice, the way his eyes stayed fixed on hers, the way he'd actually moved closer to the counter, or just plain old exhaustion from the paparazzi and the Range Rover and the utter ridiculousness of this entire situation. Whatever it was, the woman in the shop who smelled of woodsmoke and had the uncanny ability to look right through his carefully constructed facade finally let something like a smile flick across her lips. And that, Rhys decided, was worth a bloody lot more than any lottery win.

Darkest-muse-v1 is a merge of the following models using mergekit:
- sam-paech/Quill-v1
- sam-paech/Delirium-v1

NaNK
license:apache-2.0
395
7

CausalLM-35b-beta-long-GGUF

NaNK
license:gpl-3.0
394
13

LongWriter-glm4-9b-GGUF

NaNK
llama
394
10

EuroLLM 9B Instruct GGUF

QuantFactory/EuroLLM-9B-Instruct-GGUF
This is quantized version of utter-project/EuroLLM-9B-Instruct created using llama.cpp

This is the model card for EuroLLM-9B-Instruct. You can also check the pre-trained version: EuroLLM-9B.

- Developed by: Unbabel, Instituto Superior Técnico, Instituto de Telecomunicações, University of Edinburgh, Aveni, University of Paris-Saclay, University of Amsterdam, Naver Labs, Sorbonne Université.
- Funded by: European Union.
- Model type: A 9B parameter multilingual transformer LLM.
- Language(s) (NLP): Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hungarian, Irish, Italian, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovenian, Spanish, Swedish, Arabic, Catalan, Chinese, Galician, Hindi, Japanese, Korean, Norwegian, Russian, Turkish, and Ukrainian.
- License: Apache License 2.0.

The EuroLLM project has the goal of creating a suite of LLMs capable of understanding and generating text in all European Union languages as well as some additional relevant languages. EuroLLM-9B is a 9B parameter model trained on 4 trillion tokens divided across the considered languages and several data sources: web data, parallel data (en-xx and xx-en), and high-quality datasets. EuroLLM-9B-Instruct was further instruction-tuned on EuroBlocks, an instruction-tuning dataset with a focus on general instruction following and machine translation.

EuroLLM uses a standard, dense Transformer architecture:
- We use grouped query attention (GQA) with 8 key-value heads, since it has been shown to increase speed at inference time while maintaining downstream performance.
- We perform pre-layer normalization, since it improves the training stability, and use RMSNorm, which is faster.
- We use the SwiGLU activation function, since it has been shown to lead to good results on downstream tasks.
- We use rotary positional embeddings (RoPE) in every layer, since these have been shown to lead to good performance while allowing the extension of the context length.

For pre-training, we use 400 Nvidia H100 GPUs of the MareNostrum 5 supercomputer, training the model with a constant batch size of 2,800 sequences, which corresponds to approximately 12 million tokens, using the Adam optimizer and BF16 precision. Here is a summary of the model hyper-parameters:

| Hyper-parameter | Value |
|--------------------------|----------------------|
| Sequence Length | 4,096 |
| Number of Layers | 42 |
| Embedding Size | 4,096 |
| FFN Hidden Size | 12,288 |
| Number of Heads | 32 |
| Number of KV Heads (GQA) | 8 |
| Activation Function | SwiGLU |
| Position Encodings | RoPE (Θ=10,000) |
| Layer Norm | RMSNorm |
| Tied Embeddings | No |
| Embedding Parameters | 0.524B |
| LM Head Parameters | 0.524B |
| Non-embedding Parameters | 8.105B |
| Total Parameters | 9.154B |

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "utter-project/EuroLLM-9B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [
    {
        "role": "system",
        "content": "You are EuroLLM --- an AI assistant specialized in European languages that provides safe, educational and helpful answers.",
    },
    {
        "role": "user",
        "content": "What is the capital of Portugal? How would you describe it?",
    },
]

inputs = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Table 1: Comparison of open-weight LLMs on multilingual benchmarks. The Borda count corresponds to the average ranking of the models (see Colombo et al., 2022). For ARC-Challenge, HellaSwag, and MMLU we use the Okapi datasets (Lai et al., 2023), which include 11 languages.
For MMLU-Pro and MUSR we translate the English version with Tower (Alves et al., 2024) to 6 EU languages. (As there are no public versions of the pre-trained models, we evaluated them using the post-trained versions.)

The results in Table 1 highlight EuroLLM-9B's superior performance on multilingual tasks compared to other European-developed models (as shown by the Borda count of 1.0), as well as its strong competitiveness with non-European models, achieving results comparable to Gemma-2-9B and outperforming the rest on most benchmarks.

Table 2: Comparison of open-weight LLMs on English general benchmarks. (As there are no public versions of the pre-trained models, we evaluated them using the post-trained versions.)

The results in Table 2 demonstrate EuroLLM's strong performance on English tasks, surpassing most European-developed models and matching the performance of Mistral-7B (obtaining the same Borda count).

EuroLLM-9B has not been aligned to human preferences, so the model may generate problematic outputs (e.g., hallucinations, harmful content, or false statements).
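As a sanity check, the parameter counts in the hyper-parameter table can be reproduced from the architecture numbers alone. This is a rough back-of-the-envelope sketch: the ~128k vocabulary size is an assumption inferred from the 0.524B embedding parameters, and small terms such as RMSNorm weights are ignored, so totals are approximate.

```python
# Back-of-the-envelope parameter count for EuroLLM-9B from the table above.
# NOTE: vocab is an assumption (0.524B embedding params / 4096 hidden dims);
# norm weights and other small terms are ignored.
hidden = 4096
layers = 42
heads = 32
kv_heads = 8
ffn = 12288
vocab = 128_000  # assumed: 128_000 * 4096 = 0.524B, matching the table

head_dim = hidden // heads      # 128
kv_dim = kv_heads * head_dim    # 1024 (GQA: K/V projections are smaller)

attn = 2 * hidden * hidden + 2 * hidden * kv_dim  # Q, O full; K, V reduced
swiglu = 3 * hidden * ffn                         # gate, up, down projections
per_layer = attn + swiglu

non_embedding = layers * per_layer       # close to 8.105B in the table
embedding = vocab * hidden               # 0.524B; the untied LM head adds the same
total = non_embedding + 2 * embedding    # close to 9.154B in the table

print(round(non_embedding / 1e9, 2), round(total / 1e9, 2))
```

The computed 8.10B non-embedding and 9.15B total agree with the table within rounding, which also confirms the GQA savings: K and V projections are 4x smaller than Q and O.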

NaNK
license:apache-2.0
394
5

Hermes-3-Llama-3.1-8B-lorablated-GGUF

NaNK
base_model:NousResearch/Hermes-3-Llama-3.1-8B
393
5

ArliAI-Llama-3-8B-Instruct-Dolfin-v0.1-GGUF

NaNK
base_model:OwenArli/ArliAI-Llama-3-8B-Instruct-Dolfin-v0.1
392
1

Qwen2.5-0.5B-Instruct-GGUF

NaNK
license:apache-2.0
391
2

Qwen2.5-Math-1.5B-Instruct-GGUF

NaNK
license:apache-2.0
391
1

Linkbricks-Horizon-AI-Korean-Advanced-12B-GGUF

NaNK
license:apache-2.0
390
2

Muse-12B-GGUF

QuantFactory/Muse-12B-GGUF
This is quantized version of LatitudeGames/Muse-12B created using llama.cpp

Muse brings an extra dimension to any tale—whether you're exploring a fantastical realm, court intrigue, or slice-of-life scenarios where a conversation can be as meaningful as a quest. While it handles adventure capably, Muse truly shines when character relationships and emotions are at the forefront, delivering impressive narrative coherence over long contexts. If you want to easily try this model for free, you can do so at https://aidungeon.com. We plan to continue improving and open-sourcing similar models, so please share any and all feedback on how we can improve model behavior. Below we share more details on how Muse was created.

Muse 12B was trained using Mistral Nemo 12B as its foundation, with training occurring in three stages: SFT (supervised fine-tuning), followed by two distinct DPO (direct preference optimization) phases.

- SFT: Various multi-turn datasets from a multitude of sources, combining text adventures of the kind used to finetune our Wayfarer 12B model, long emotional narratives, and general roleplay, each carefully balanced and rewritten to be free of common AI cliches. A small single-turn instruct dataset was included to send a stronger signal during finetuning.
- DPO 1: Gutenberg DPO (credit to Jon Durbin). This stage introduces human writing techniques, significantly enhancing the model's potential outputs, albeit trading some intelligence for the stylistic benefits of human-created text.
- DPO 2: Reward Model User Preference Data, detailed in our blog. This stage refines the Gutenberg stage's "wildness," restoring intelligence while maintaining enhanced writing quality and providing a final level of enhancement due to the reward model samples.

The result is a model that writes like no other: versatile across genres, natural in expression, and suited to emotional depth.
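For reference, the DPO objective used in both preference stages can be sketched as a per-example loss. This is a toy scalar version for illustration, not LatitudeGames' training code; real implementations work on batched, summed token log-probabilities.

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.

    Each argument is the summed log-probability of a completion under the
    policy (pi_*) or the frozen reference model (ref_*). The loss pushes the
    policy's chosen-vs-rejected margin beyond the reference model's margin.
    """
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))

# With no margin the loss is ln(2) ~= 0.693; it shrinks as the policy prefers
# the chosen completion more strongly than the reference does.
print(round(dpo_loss(0.0, 0.0, 0.0, 0.0), 4))
```

Because the reference model anchors the margin, the Gutenberg stage can shift style without the policy drifting arbitrarily far, and the second stage can then pull quality back using reward-model preference pairs.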
The Nemo architecture is known for being sensitive to higher temperatures, so the following settings are recommended as a baseline. Nothing stops you from experimenting with these, of course. Muse was trained exclusively on second-person present tense data (using “you”) in a narrative style. Other styles will work as well but may produce suboptimal results. Average response lengths tend toward verbosity (1000+ tokens) due to the Gutenberg DPO influence, though this can be controlled through explicit instructions in the system prompt. Thanks to Gryphe Padar for collaborating on this finetune with us!

NaNK
license:apache-2.0
389
3

Qwen2.5-14B-Instruct-GGUF

NaNK
license:apache-2.0
387
2

Marco-o1-GGUF

license:apache-2.0
386
9

Lyra4-Gutenberg-12B-GGUF

NaNK
license:apache-2.0
382
6

neural-chat-7b-v3-1-GGUF

NaNK
license:apache-2.0
381
2

medicine-Llama3-8B-GGUF

NaNK
base_model:instruction-pretrain/medicine-Llama3-8B
380
8

Math-IIO-7B-Instruct-GGUF

NaNK
380
3

FW-ProX-1.7B-GGUF

NaNK
llama
379
1

orpheus-3b-0.1-ft-GGUF

NaNK
license:apache-2.0
378
5

smollm-360M-instruct-add-basics-GGUF

dataset:HuggingFaceTB/everyday-conversations-llama3.1-2k
377
1

ko-gemma-2-9b-it-GGUF

NaNK
374
11

Mistral-7B-Instruct-v0.3-GGUF

NaNK
license:apache-2.0
373
4

T-lite-instruct-0.1-GGUF

370
5

Llama3.2-3B-Enigma-GGUF

NaNK
llama
370
3

CodeLlama-7B-KStack-clean-GGUF

NaNK
base_model:JetBrains/CodeLlama-7B-KStack-clean
369
0

LongWriter-llama3.1-8b-GGUF

NaNK
llama
368
8

Llama3.1-8B-Chinese-Chat-GGUF

NaNK
llama-factory
368
6

Tongda1-1.5B-BKI-GGUF

NaNK
license:apache-2.0
368
2

Codestral-22B-v0.1-GGUF

NaNK
367
11

INTELLECT-1-Instruct-GGUF

NaNK
dataset:arcee-ai/Llama-405B-Logits
365
3

medius-erebus-magnum-14b-GGUF

NaNK
364
3

llama-3.1-medprm-reward-v1.0-GGUF

QuantFactory/llama-3.1-medprm-reward-v1.0-GGUF
This is quantized version of dmis-lab/llama-3.1-medprm-reward-v1.0 created using llama.cpp

Med-PRM-Reward (Version 1.0)
🚀 Med-PRM-Reward is among the first Process Reward Models (PRMs) specifically designed for the medical domain. Unlike conventional PRMs, it enhances its verification capabilities by integrating clinical knowledge through retrieval-augmented generation (RAG). Med-PRM-Reward demonstrates exceptional performance in test-time compute scaling, particularly outperforming majority-voting ensembles on complex medical reasoning tasks. Moreover, its scalability is not limited to Llama-3.1-8B-Instruct: it delivers similarly outstanding results in test-time compute scaling across multiple other medical-specialized models. Notably, when combined with llama-3-meerkat-8b-v1.0, it became the first 8B-model framework to surpass a score of 80 on the MedQA (4-option) benchmark.

📄 Paper: Med-PRM-Reward: Medical Reasoning Models with Stepwise, Guideline-verified Process Rewards

Evaluation
Across seven medical benchmarks, we conducted test-time compute scaling using solutions generated by the Med-PRM policy model, evaluating 64 solutions per question.

Contact
Feel free to email [email protected] if you have any questions.
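The difference between majority voting and PRM-based selection in test-time scaling can be illustrated with a small sketch. This is schematic, not the actual Med-PRM pipeline: the step scores below stand in for the reward model's guideline-verified outputs, and min-over-steps is just one common aggregation choice.

```python
from collections import Counter

def majority_vote(final_answers):
    # Baseline: sample N solutions and take the most common final answer.
    return Counter(final_answers).most_common(1)[0][0]

def prm_select(solutions):
    # PRM selection: score every reasoning step of every sampled solution,
    # aggregate per solution (here: min over steps), and keep the solution
    # whose weakest step is strongest.
    answer, _ = max(solutions, key=lambda s: min(s[1]))
    return answer

# Toy example: three popular-but-flawed chains vs one well-scored chain.
samples = [
    ("B", [0.9, 0.2, 0.8]),  # a confident start, then a weak step
    ("B", [0.7, 0.3, 0.6]),
    ("B", [0.6, 0.4, 0.5]),
    ("A", [0.8, 0.7, 0.9]),  # consistently high step rewards
]
print(majority_vote([a for a, _ in samples]))  # B
print(prm_select(samples))                     # A
```

This is why a PRM can beat majority voting on hard questions: a popular answer reached through a flawed step is penalized, while a minority answer with uniformly sound steps is selected.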

license:mit
363
3

finance-Llama3-8B-GGUF

NaNK
base_model:instruction-pretrain/finance-Llama3-8B
362
17

Qwen2.5-0.5B-GGUF

NaNK
license:apache-2.0
362
5

Llama-3-8B-Instruct-Gradient-1048k-GGUF

NaNK
llama-3
361
7

gemma-2-baku-2b-it-GGUF

NaNK
360
3

EuroLLM-1.7B-Instruct-GGUF

NaNK
license:apache-2.0
360
2

Mistral-Nemo-Instruct-2407-abliterated-GGUF

license:apache-2.0
358
7

llama-3.2-Korean-Bllossom-3B-GGUF

NaNK
license:llama3.2
357
11

TinyMistral-248M-v3-GGUF

license:apache-2.0
355
1

MilkDropLM-7b-v0.3-GGUF

NaNK
license:apache-2.0
354
1

Chocolatine-3B-Instruct-DPO-v1.2-GGUF

NaNK
license:mit
353
2

Llama-3-8B-LexiFun-Uncensored-V1-GGUF

NaNK
llama3
352
3

OLMoE-1B-7B-0924-Instruct-GGUF

NaNK
license:apache-2.0
350
1

Biggie-SmoLlm-0.15B-Base-GGUF

NaNK
348
6

SauerkrautLM-Nemo-12b-Instruct-GGUF

NaNK
license:apache-2.0
348
3

TinySlime-1.1B-Chat-v1.0-GGUF

NaNK
license:apache-2.0
348
2

Llama-3.1-Minitron-4B-Width-Base-GGUF

NaNK
348
1

granite-3.0-1b-a400m-base-GGUF

NaNK
license:apache-2.0
347
4

gemma-2-27b-it-abliterated-GGUF

NaNK
llama-cpp
346
7

Qwen2-0.5B-GGUF

NaNK
license:apache-2.0
346
3

NuminaMath-7B-TIR-GGUF

NaNK
license:apache-2.0
344
1

Mistral-Nemo-Base-2407-GGUF

license:apache-2.0
343
19

Llama-3.2-Taiwan-3B-GGUF

NaNK
llama-factory
341
3

Dolphin3.0-Llama3.2-1B-GGUF

NaNK
base_model:meta-llama/Llama-3.2-1B
341
2

Llama-3-Spellbound-Instruct-8B-0.3-GGUF

NaNK
base_model:hf-100/Llama-3-Spellbound-Instruct-8B-0.3
338
0

Mistral-Nemo-12B-ArliAI-RPMax-v1.2-GGUF

NaNK
license:apache-2.0
337
5

Qwen2-7B-Instruct-GGUF

NaNK
license:apache-2.0
336
3

aya-23-8B-GGUF

NaNK
license:cc-by-nc-4.0
334
8

Mistral-NeMo-Minitron-8B-Base-GGUF

NaNK
332
5

Qwen2.5-14B-Gutenberg-Instruct-Slerpeno-GGUF

NaNK
license:apache-2.0
332
2

Llama-Guard-3-8B-GGUF

NaNK
llama
331
1

NemoMix-Unleashed-12B-GGUF

NaNK
330
3

ArliAI-Llama-3-8B-Cumulus-v0.3.2-GGUF

NaNK
license:llama3
330
1

mistral-nemo-cc-12B-GGUF

NaNK
license:apache-2.0
326
1

LLaMA-3-8B-SFR-Iterative-DPO-R-GGUF

NaNK
base_model:Salesforce/LLaMA-3-8B-SFR-Iterative-DPO-R
325
1

SmallThinker-3B-Preview-GGUF

NaNK
322
3

Skywork-Critic-Llama-3.1-8B-GGUF

NaNK
base_model:meta-llama/Llama-3.1-8B-Instruct
322
2

pip-sql-1.3b-GGUF

NaNK
license:apache-2.0
322
2

gemma-2-9b-it-SimPO-GGUF-v2

NaNK
license:mit
320
3

magnum-v3-9b-chatml-GGUF

NaNK
320
3

Qwen2-1.5B-Ita-GGUF

NaNK
license:apache-2.0
319
2

Meissa-Qwen2.5-7B-Instruct-GGUF

NaNK
license:gpl-3.0
318
7

Ellaria-9B-GGUF

NaNK
317
7

dolphin-2.8-mistral-7b-v02-GGUF

NaNK
license:apache-2.0
317
2

Qwen2.5-Coder-7B-Instruct-GGUF

NaNK
license:apache-2.0
316
5

Llama-3.1-Storm-8B-GGUF

NaNK
llama-3.1
316
2

Octopus-v2-GGUF

QuantFactory/Octopus-v2-GGUF
This is quantized version of NexaAIDev/Octopus-v2 created using llama.cpp

Octopus V2: On-device language model for super agent

Octopus V4 Release
We are excited to announce that Octopus v4 is now available! Octopus-V4-3B, an advanced open-source language model with 3 billion parameters, serves as the master node in Nexa AI's envisioned graph of language models. Tailored specifically for the MMLU benchmark topics, this model efficiently translates user queries into formats that specialized models can effectively process. It excels at directing these queries to the appropriate specialized model, ensuring precise and effective query handling. Check our papers and repos:
- paper
- Octopus V4 model page
- Octopus V4 quantized model page
- Octopus V4 github

Key Features of Octopus v4:
- 📱 Compact Size: Octopus-V4-3B is compact, enabling it to operate on smart devices efficiently and swiftly.
- 🐙 Accuracy: Octopus-V4-3B accurately maps user queries to the specialized model using a functional token design, enhancing its precision.
- 💪 Reformat Query: Octopus-V4-3B assists in converting natural human language into a more professional format, improving query description and resulting in more accurate responses.

Octopus V3 Release
We are excited to announce that Octopus v3 is now available! Check our technical report and Octopus V3 tweet!

Key Features of Octopus v3:
- Efficiency: Sub-billion parameters, making it less than half the size of its predecessor, Octopus v2.
- Multi-Modal Capabilities: Processes both text and image inputs.
- Speed and Accuracy: Incorporates our patented functional token technology, achieving function-calling accuracy on par with GPT-4V and GPT-4.
- Multilingual Support: Simultaneous support for English and Mandarin.
Check the Octopus V3 demo video for Android and iOS.

Octopus V2 Release
After open-sourcing our model, we got many requests to compare our model with Apple's OpenELM and Microsoft's Phi-3.
Please see the Evaluation section. On our benchmark dataset, Microsoft's Phi-3 achieves 45.7% accuracy with an average inference latency of 10.2 s, while Apple's OpenELM fails to generate function calls (see this screenshot). Our model, Octopus V2, achieves 99.5% accuracy with an average inference latency of 0.38 s. We are a very small team with a lot of work; please give us more time to prepare the code, and we will open-source it. We hope the Octopus v2 model will be helpful for you. Let's democratize AI agents for everyone. We've received many requests from the automotive industry, health care, financial systems, etc. The Octopus model can be applied to any function, and you can start thinking about it now.

Octopus-V2-2B, an advanced open-source language model with 2 billion parameters, represents Nexa AI's research breakthrough in the application of large language models (LLMs) for function calling, specifically tailored for Android APIs. Unlike Retrieval-Augmented Generation (RAG) methods, which require detailed descriptions of potential function arguments (sometimes needing up to tens of thousands of input tokens), Octopus-V2-2B introduces a unique functional token strategy for both its training and inference stages. This approach not only allows it to achieve performance levels comparable to GPT-4 but also significantly enhances its inference speed beyond that of RAG-based methods, making it especially beneficial for edge computing devices.

📱 On-device Applications: Octopus-V2-2B is engineered to operate seamlessly on Android devices, extending its utility across a wide range of applications, from Android system management to the orchestration of multiple devices.

🚀 Inference Speed: When benchmarked, Octopus-V2-2B demonstrates a remarkable inference speed, outperforming the "Llama-7B + RAG" combination by a factor of 36X on a single A100 GPU.
Furthermore, compared to GPT-4-turbo (gpt-4-0125-preview), which relies on clusters of A100/H100 GPUs, Octopus-V2-2B is 168% faster. This efficiency is attributed to our functional token design.

🐙 Accuracy: Octopus-V2-2B not only excels in speed but also in accuracy, surpassing the "Llama-7B + RAG" solution in function-call accuracy by 31%. It achieves function-call accuracy comparable to GPT-4 and RAG + GPT-3.5, with scores ranging between 98% and 100% across benchmark datasets.

💪 Function Calling Capabilities: Octopus-V2-2B is capable of generating individual, nested, and parallel function calls across a variety of complex scenarios.

You can run the model on a GPU using the following code. The benchmark results can be viewed in this Excel file, which has been manually verified. Microsoft's Phi-3 model achieved an accuracy of 45.7%, with an average inference latency of 10.2 seconds. Meanwhile, Apple's OpenELM was unable to generate a function call, as shown in this screenshot. Additionally, OpenELM's score on the MMLU benchmark is quite low at 26.7, compared to Google's Gemma 2B, which scored 42.3.

Note: one can notice that the query includes all the necessary parameters used by a function. The query is expected to include all parameters during inference as well.

Training Data
We wrote 20 Android API descriptions to use to train the models; see this file for details. The Android API implementations for our demos and our training data will be published later. Below is one Android API description example.

License
This model was trained on commercially viable data. For use of our model, refer to the license information.

References
We thank the Google Gemma team for their amazing models!

Contact
Please contact us to reach out with any issues and comments!
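The GPU inference snippet referenced above did not survive extraction. As a stand-in, here is a small, self-contained sketch of the functional-token flow the card describes; the prompt wording and the `<nexa_i>`/`<nexa_end>` token names follow the upstream examples but should be treated as assumptions:

```python
# Illustrative sketch of Octopus-v2-style function calling, NOT the official
# inference code: the model maps a natural-language query to a functional
# token such as "<nexa_4>" plus arguments, terminated by "<nexa_end>".

def build_prompt(query):
    """Wrap a user query in a single-turn prompt. As the card notes, the
    query must already contain every parameter the target function needs."""
    return (
        "Below is the query from the users, please call the correct function "
        "and generate the parameters to call the function.\n\n"
        f"Query: {query} \n\nResponse:"
    )

def parse_functional_call(output):
    """Split a generated completion into (functional_token, arguments)."""
    body = output.split("<nexa_end>")[0].strip()
    token_end = body.index(">") + 1
    return body[:token_end], body[token_end:].strip()

# Parsing a hypothetical completion:
completion = "<nexa_4>('Tomorrow at 3pm', 'Meeting with the design team')<nexa_end>"
token, args = parse_functional_call(completion)
print(token)  # <nexa_4>
print(args)
```

In the real model the functional token is produced directly by generation; here it is only parsed to show the expected shape of the output.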

NaNK
license:cc-by-nc-4.0
316
2

Qwen3-Reranker-8B-GGUF

NaNK
license:apache-2.0
315
3

granite-3.0-8b-instruct-GGUF

NaNK
license:apache-2.0
313
2

leniachat-qwen2-1.5B-v0-GGUF

NaNK
license:apache-2.0
312
4

deepseek-coder-1.3B-kexer-GGUF

NaNK
license:apache-2.0
312
3

Apollo2-7B-GGUF

NaNK
license:apache-2.0
312
3

Gemma-2-2b-Chinese-it-GGUF

NaNK
license:gpl-3.0
311
5

Captain_BMO-0.420-Magnum-12B-GGUF

NaNK
310
3

Meta-Llama-Guard-2-8B-GGUF

NaNK
llama
309
12

Ko-Llama-3.1-8B-Lexi-Uncensored-V2-GGUF

NaNK
base_model:Orenguteng/Llama-3.1-8B-Lexi-Uncensored-V2
309
2

L3-bluuwhale-SAO-MIX-8B-V1_fp32-merge-calc-GGUF

NaNK
309
2

Llama3.1-8B-Cobalt-GGUF

NaNK
llama
309
1

Fireball-Meta-Llama-3.2-8B-Instruct-agent-003-128k-code-DPO-GGUF

NaNK
llama
308
5

SuperCorrect-7B-GGUF

NaNK
license:apache-2.0
307
2

Wayfarer-2-12B-GGUF

QuantFactory/Wayfarer-2-12B-GGUF This is quantized version of LatitudeGames/Wayfarer-2-12B created using llama.cpp

We’ve heard over and over from AI Dungeon players that modern AI models are too nice, never letting them fail or die. While it may be good for a chatbot to be nice and helpful, great stories and games aren’t all rainbows and unicorns. They have conflict, tension, and even death. These create real stakes and consequences for characters and the journeys they go on. We created Wayfarer as a response, and after much testing, feedback and refining, we’ve developed a worthy sequel.

Wayfarer 2 further refines the formula that made the original Wayfarer so popular: slowing the pacing, increasing the length and detail of responses, and making death a distinct possibility for all characters, not just the user. The stakes have never been higher! If you want to try this model for free, you can do so at https://aidungeon.com. We plan to continue improving and open-sourcing similar models, so please share any and all feedback on how we can improve model behavior. Below we share more details on how Wayfarer was created.

Wayfarer 2 12B received SFT training with a simple three-ingredient recipe: the Wayfarer 2 dataset itself, a series of sentiment-balanced roleplay transcripts, and a small instruct core to help retain its instructional capabilities. Wayfarer’s text adventure data was generated by simulating playthroughs of published character creator scenarios from AI Dungeon. Five distinct user archetypes played through each scenario, with character setups that varied in faction, location, etc., to generate five unique samples. One language model played the role of narrator, with the other playing the user. They were blind to each other’s underlying logic, so the user was actually capable of surprising the narrator with their choices. Each simulation was allowed to run for 8k tokens or until the main character died.
Wayfarer’s general emotional sentiment is one of pessimism, where failure is frequent and plot armor does not exist for anyone. This serves to counter the positivity bias so inherent in our language models nowadays.

The Nemo architecture is known for being sensitive to higher temperatures, so the following settings are recommended as a baseline. Nothing stops you from experimenting with these, of course.

Wayfarer was trained exclusively on second-person present tense data (using “you”) in a narrative style. Other perspectives will work as well but may produce suboptimal results.

Thanks to Gryphe Padar for collaborating on this finetune with us!
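The baseline values themselves are missing from this copy of the card; the dictionary below is purely illustrative (the numbers are assumptions, not Latitude's recommendations), showing the kind of conservative Nemo-style sampler baseline the paragraph refers to:

```python
# Illustrative sampler baseline for a Mistral-Nemo-family model.
# These numbers are placeholders; consult the upstream Wayfarer 2 card
# for the actual recommended values.
sampler_settings = {
    "temperature": 0.8,         # Nemo is sensitive to high temperatures
    "min_p": 0.05,              # prune low-probability tails
    "repetition_penalty": 1.05,
}
print(sampler_settings["temperature"])
```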

NaNK
license:apache-2.0
307
2

gemma-2-Ifable-9B-GGUF

NaNK
306
12

Llama-3-Open-Ko-8B-GGUF

NaNK
llama
306
1

Dorna-Llama3-8B-Instruct-GGUF

NaNK
llama-3
304
2

L3-8B-Lunaris-v1-GGUF

NaNK
304
1

Llama-3.1-Hawkish-8B-GGUF

NaNK
303
5

Daredevil-8B-GGUF

NaNK
302
1

Llama3-8B-Chinese-Chat-GGUF

NaNK
llama-factory
300
11

Llama-3-ELYZA-JP-8B-GGUF

NaNK
base_model:elyza/Llama-3-ELYZA-JP-8B
300
4

magnaraxy-9b-GGUF

NaNK
license:apache-2.0
299
2

Evil-Alpaca-3B-L3.2-GGUF

NaNK
base_model:meta-llama/Llama-3.2-3B-Instruct
299
1

Llama-3-Instruct-8B-DPO-GGUF

NaNK
base_model:princeton-nlp/Llama-3-Instruct-8B-DPO
299
0

Qwen3-8B-GGUF

QuantFactory/Qwen3-8B-GGUF This is quantized version of Qwen/Qwen3-8B created using llama.cpp

Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support, with the following key features:
- Unique support for seamless switching between thinking mode (for complex logical reasoning, math, and coding) and non-thinking mode (for efficient, general-purpose dialogue) within a single model, ensuring optimal performance across various scenarios.
- Significant enhancement of its reasoning capabilities, surpassing previous QwQ (in thinking mode) and Qwen2.5 instruct models (in non-thinking mode) on mathematics, code generation, and commonsense logical reasoning.
- Superior human preference alignment, excelling in creative writing, role-playing, multi-turn dialogues, and instruction following, to deliver a more natural, engaging, and immersive conversational experience.
- Expertise in agent capabilities, enabling precise integration with external tools in both thinking and non-thinking modes and achieving leading performance among open-source models in complex agent-based tasks.
- Support for 100+ languages and dialects with strong capabilities for multilingual instruction following and translation.

Qwen3-8B has the following features:
- Type: Causal Language Models
- Training Stage: Pretraining & Post-training
- Number of Parameters: 8.2B
- Number of Parameters (Non-Embedding): 6.95B
- Number of Layers: 36
- Number of Attention Heads (GQA): 32 for Q and 8 for KV
- Context Length: 32,768 tokens natively and 131,072 tokens with YaRN.

For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our blog, GitHub, and Documentation.
The code for Qwen3 has been merged into the latest Hugging Face `transformers`, and we advise you to use the latest version of `transformers`. With `transformers<4.51.0`, you will encounter the following error:

```
KeyError: 'qwen3'
```

The following snippet shows how to load the model, generate, and separate the thinking content from the final answer (token ID 151668 is `</think>`):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-8B"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
prompt = "Give me a short introduction to large language models."
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(**model_inputs, max_new_tokens=32768)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# parse thinking content
try:
    # rindex finding 151668 (</think>)
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)
print("content:", content)
```

For deployment, you can create an OpenAI-compatible API endpoint with SGLang or vLLM:

```shell
python -m sglang.launch_server --model-path Qwen/Qwen3-8B --reasoning-parser qwen3
```

```shell
vllm serve Qwen/Qwen3-8B --enable-reasoning --reasoning-parser deepseek_r1
```

Thinking mode is controlled by the `enable_thinking` argument of the chat template. It is on by default:

```python
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True  # True is the default value for enable_thinking
)
```

and can be disabled explicitly:

```python
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False  # Setting enable_thinking=False disables thinking mode
)
```

For advanced usage, you can switch between thinking and non-thinking modes turn by turn via the `/think` and `/no_think` tags in user prompts:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

class QwenChatbot:
    def __init__(self, model_name="Qwen/Qwen3-8B"):
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModelForCausalLM.from_pretrained(model_name)
        self.history = []

    def generate_response(self, user_input):
        messages = self.history + [{"role": "user", "content": user_input}]
        text = self.tokenizer.apply_chat_template(
            messages,
            tokenize=False,
            add_generation_prompt=True
        )
        inputs = self.tokenizer(text, return_tensors="pt")
        response_ids = self.model.generate(**inputs, max_new_tokens=32768)[0][len(inputs.input_ids[0]):].tolist()
        response = self.tokenizer.decode(response_ids, skip_special_tokens=True)

        # Update history
        self.history.append({"role": "user", "content": user_input})
        self.history.append({"role": "assistant", "content": response})

        return response

# Example Usage
if __name__ == "__main__":
    chatbot = QwenChatbot()

    # First input (without /think or /no_think tags, thinking mode is enabled by default)
    user_input_1 = "How many r's in strawberries?"
    print(f"User: {user_input_1}")
    response_1 = chatbot.generate_response(user_input_1)
    print(f"Bot: {response_1}")
    print("----------------------")

    # Second input with /no_think
    user_input_2 = "Then, how many r's in blueberries? /no_think"
    print(f"User: {user_input_2}")
    response_2 = chatbot.generate_response(user_input_2)
    print(f"Bot: {response_2}")
    print("----------------------")

    # Third input with /think
    user_input_3 = "Really? /think"
    print(f"User: {user_input_3}")
    response_3 = chatbot.generate_response(user_input_3)
    print(f"Bot: {response_3}")
```

For agentic use, Qwen-Agent can handle tool calling:

```python
from qwen_agent.agents import Assistant

# Define LLM
llm_cfg = {
    'model': 'Qwen3-8B',

    # Use the endpoint provided by Alibaba Model Studio:
    # 'model_type': 'qwen_dashscope',
    # 'api_key': os.getenv('DASHSCOPE_API_KEY'),

    # Use a custom endpoint compatible with OpenAI API:
    'model_server': 'http://localhost:8000/v1',  # api_base
    'api_key': 'EMPTY',

    # Other parameters:
    # 'generate_cfg': {
    #     # Add: When the response content is `<think>this is the thought</think>this is the answer`;
    #     # Do not add: When the response has been separated by reasoning_content and content.
    #     'thought_in_content': True,
    # },
}

# Define Tools
tools = [
    {'mcpServers': {  # You can specify the MCP configuration file
        'time': {
            'command': 'uvx',
            'args': ['mcp-server-time', '--local-timezone=Asia/Shanghai']
        },
        "fetch": {
            "command": "uvx",
            "args": ["mcp-server-fetch"]
        }
    }},
    'code_interpreter',  # Built-in tools
]

# Define Agent
bot = Assistant(llm=llm_cfg, function_list=tools)

# Streaming generation
messages = [{'role': 'user', 'content': 'https://qwenlm.github.io/blog/ Introduce the latest developments of Qwen'}]
for responses in bot.run(messages=messages):
    pass
print(responses)
```

To extend the context window to 131,072 tokens with YaRN, add `rope_scaling` to the model's `config.json`:

```json
{
    ...,
    "rope_scaling": {
        "rope_type": "yarn",
        "factor": 4.0,
        "original_max_position_embeddings": 32768
    }
}
```

or pass it on the command line:

```shell
vllm serve ... --rope-scaling '{"rope_type":"yarn","factor":4.0,"original_max_position_embeddings":32768}' --max-model-len 131072
```

```shell
python -m sglang.launch_server ... --json-model-override-args '{"rope_scaling":{"rope_type":"yarn","factor":4.0,"original_max_position_embeddings":32768}}'
```

```shell
llama-server ... --rope-scaling yarn --rope-scale 4 --yarn-orig-ctx 32768
```

> If you encounter the warning `Unrecognized keys in rope_scaling for 'rope_type'='yarn': {'original_max_position_embeddings'}`, please upgrade `transformers>=4.51.0`.

If you find our work helpful, feel free to cite:

```
@misc{qwen3technicalreport,
    title={Qwen3 Technical Report},
    author={Qwen Team},
    year={2025},
    eprint={2505.09388},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2505.09388},
}
```

NaNK
license:apache-2.0
298
2

zephyr-7b-beta-GGUF

NaNK
license:mit
298
0

Nemotron-Research-Reasoning-Qwen-1.5B-GGUF

NaNK
license:cc-by-nc-4.0
297
3

Mistral-Nemo-Gutenberg-Doppel-12B-v2-GGUF

NaNK
license:apache-2.0
297
2

Neural-SOVLish-Devil-8B-L3-GGUF

NaNK
license:cc-by-nc-4.0
297
0

gemma-advanced-v1-GGUF

NaNK
295
3

Qwen2.5-Math-1.5B-GGUF

NaNK
license:apache-2.0
295
1

granite-7b-lab-GGUF

NaNK
license:apache-2.0
293
1

Hermes-3-Llama-3.2-3B-GGUF

NaNK
Llama-3
290
3

shieldgemma-2b-GGUF

NaNK
290
1

Llama-3.2-3B-Instruct-uncensored-GGUF

NaNK
289
5

Llama-Deepsync-1B-GGUF

NaNK
Llama
289
3

Apollo2-2B-GGUF

NaNK
289
3

Dolphin3.0-Llama3.1-8B-GGUF

NaNK
base_model:meta-llama/Llama-3.1-8B
289
2

Llama-3.1-8B-OmniMatrix-v2-GGUF

NaNK
base_model:bunnycore/LLama-3.1-8B-HyperNova-abliteration
288
2

LFM2-1.2B-RAG-GGUF

aashish1904/LFM2-1.2B-RAG-GGUF This is quantized version of LiquidAI/LFM2-1.2B-RAG created using llama.cpp

Based on LFM2-1.2B, LFM2-1.2B-RAG is specialized in answering questions based on provided contextual documents, for use in RAG (Retrieval-Augmented Generation) systems.
- Chatbot to ask questions about the documentation of a particular product.
- Customer support with an internal knowledge base to provide grounded answers.
- Academic research assistant with multi-turn conversations about research papers and course materials.

You can find more information about other task-specific models in this blog post.

Generation parameters: We recommend using greedy decoding with `temperature=0`.

System prompt: The system prompt is optional. You can force the output's language, for example, using "Always respond in English, regardless of the user's input language." By default, the output's language follows the user prompt's language.

Supported languages: English, Arabic, Chinese, French, German, Japanese, Korean, Portuguese, and Spanish.

Training approach: We fine-tuned the LFM2-1.2B-RAG model on a dataset of 1M+ samples of multi-turn interactions and multi-document samples, consisting of a mix of curated open-source documents as well as generated synthetic ones.

Chat template: LFM2 uses a ChatML-like chat template. You can apply it automatically using the dedicated `.apply_chat_template()` function from Hugging Face transformers.

> [!WARNING]
> ⚠️ The model supports both single-turn and multi-turn conversations.

RAG systems enable AI solutions to include new, up-to-date, and potentially proprietary information in LLM responses that was not present in the training data. When a user asks a question, the retrieval component locates and delivers related documents from a knowledge base, and then the RAG generator model answers the question based on facts from those contextual documents.
- Hugging Face: LFM2-1.2B - llama.cpp: LFM2-1.2B-Extract-GGUF - LEAP: LEAP model library If you are interested in custom solutions with edge deployment, please contact our sales team.
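Since the chat template itself is not reproduced in this card, the sketch below shows one plausible way to assemble a grounded conversation; the message layout (context documents injected into the user turn) is an assumption for illustration, while the optional system prompt and greedy decoding follow the card's recommendations.

```python
# Assemble a RAG-style conversation for LFM2-1.2B-RAG. The exact special
# tokens come from the model's chat template via
# tokenizer.apply_chat_template(); the document layout below is an
# illustrative assumption, not the documented format.

def build_messages(documents, question, system_prompt=None):
    context = "\n\n".join(
        f"Document {i + 1}:\n{doc}" for i, doc in enumerate(documents)
    )
    messages = []
    if system_prompt:  # optional; can e.g. force the output language
        messages.append({"role": "system", "content": system_prompt})
    messages.append(
        {"role": "user", "content": f"{context}\n\nQuestion: {question}"}
    )
    return messages

msgs = build_messages(
    ["LFM2-1.2B-RAG answers questions grounded in provided documents."],
    "What is this model specialized for?",
    system_prompt="Always respond in English, regardless of the user's input language.",
)
# With transformers: tokenizer.apply_chat_template(msgs, tokenize=False,
#                                                  add_generation_prompt=True)
# Then generate with greedy decoding (temperature=0), as recommended above.
print(len(msgs))  # 2
```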

NaNK
286
3

Mistral-Nemo-Base-2407-GGUF-iamlemec

license:apache-2.0
286
1

L3-OVA-Test-8B-GGUF

NaNK
base_model:meta-llama/Meta-Llama-3-8B-Instruct
284
1

Teleut-7b-GGUF

NaNK
license:apache-2.0
282
2

saiga_gemma2_10b-GGUF

NaNK
281
3

minecraft-GGUF

NaNK
base_model:TinyLlama/TinyLlama-1.1B-Chat-v1.0
281
3

AMD-OLMo-1B-GGUF

NaNK
license:apache-2.0
281
2

Llama-3.2-3B-Agent007-Coder-GGUF

NaNK
llama
280
5

NuminiLlama-3.1-8B-GGUF

NaNK
280
2

gemma2-gutenberg-9B-GGUF

NaNK
280
2

Qwen2.5-Coder-1.5B-GGUF

NaNK
license:apache-2.0
279
2

LLaMA-Mesh-GGUF

license:llama3.1
278
3

DRT-o1-7B-GGUF

NaNK
license:cc-by-nc-sa-4.0
278
2

Meraj-Mini-GGUF

NaNK
license:apache-2.0
275
3

OpenReasoning-Nemotron-7B-GGUF

NaNK
license:cc-by-4.0
275
2

Llama-3-8B-Magpie-Pro-MT-SFT-v0.1-GGUF

NaNK
base_model:Magpie-Align/Llama-3-8B-Magpie-Align-SFT-v0.1
275
0

Qwen2.5-7B-HomerAnvita-NerdMix-GGUF

NaNK
license:apache-2.0
273
6

Meditron3-8B-GGUF

NaNK
273
3

Llama-3-8B-16K-GGUF

NaNK
base_model:mattshumer/Llama-3-8B-16K
273
0

NVIDIA-Llama3-ChatQA-1.5-8B-GGUF

NaNK
llama-3
272
2

Llama-3.2-Taiwan-3B-Instruct-GGUF

NaNK
llama3.2
271
2

Llama-3.1-8B-Lexi-Uncensored-V2-GGUF

NaNK
license:llama3.1
271
1

TherapyLlama-8B-v1-GGUF

NaNK
license:cc-by-nc-4.0
270
1

phi-4-GGUF

license:mit
269
4

lola_v1-GGUF

license:cc-by-4.0
268
3

HelpingAI-3B-hindi-GGUF

NaNK
268
1

Llama3-DocChat-1.0-8B-GGUF

NaNK
llama-3
268
1

LLama-3.1-8B-Matrix-GGUF

NaNK
base_model:bunnycore/LLama-3.1-8B-HyperNova-abliteration
267
2

Qwen2.5-14B-GGUF

NaNK
license:apache-2.0
267
2

L3-SthenoMaidBlackroot-8B-V1-GGUF

NaNK
266
9

ValueLlama-3-8B-GGUF

NaNK
llama-factory
266
1

L3.1-Celestial-Stone-2x8B-GGUF

NaNK
llama
264
2

Teuken-7B-instruct-research-v0.4-GGUF

NaNK
264
2

MachineLearningLM-7B-v1-GGUF

QuantFactory/MachineLearningLM-7B-v1-GGUF This is quantized version of MachineLearningLM/MachineLearningLM-7B-v1 created using llama.cpp

This repository contains the model presented in the paper MachineLearningLM: Scaling Many-shot In-context Learning via Continued Pretraining.

Introducing MachineLearningLM 🧪📊: a model continuously pretrained on millions of synthetic tabular ML tasks, enabling robust many-shot in-context learning.
📈 ~15% improvement on unseen tabular tasks compared to o3-mini / GPT-5-mini / Qwen-2.5-7B-Instruct
🌲 Random-Forest-level numerical modeling robustness
📄 Read the paper: https://huggingface.co/papers/2509.06806
GitHub: https://github.com/HaoAreYuDong/MachineLearningLM

We have developed an automated evaluation framework: simply configure the parameters to easily perform validation and evaluation. The code is now open-sourced at our GitHub repository. This part of the code needs to run in an environment with the tabicl and openpyxl libraries installed. The evaluation code for tabicl is placed separately in the `./src/evaluation/tabicl_evaluate.py` file. Use `./scripts/tabicl_evaluate.sh` to obtain the evaluation results for tabicl. Use `--datasets` to specify the datasets to be evaluated, and `--sample_sizes` to indicate the number of shots. If multiple datasets need to be evaluated, separate them with spaces. To evaluate all CSV files in the input folder, use `all`.

MachineLearningLM uses the code from tabicl to generate prior data. Use `./scripts/generate_data.sh` to generate the prior data. It generates the corresponding .pt and .csv files, and normalizes the feature values in the CSV files to the range 0–999, as we did in the paper.
Parameter introduction (refer to the comments in the file `tabicl/src/tabicl/prior/dataset.py`):

| Parameter | Type | Description |
| :--- | :--- | :--- |
| `min_features` | int | Minimum number of features per dataset |
| `max_features` | int | Maximum number of features per dataset |
| `max_classes` | int | Maximum number of target classes |
| `min_seq_len` | int | Minimum samples per dataset. Uses `max_seq_len` if None |
| `max_seq_len` | int | Maximum samples per dataset (not included) |

| Parameter | Type | Description |
| :--- | :--- | :--- |
| `batch_size` | int | Total number of datasets to generate per batch |
| `batch_size_per_gp` | int | Number of datasets per group (shared characteristics) |
| `batch_size_per_subgp` | int | Number of datasets per subgroup (similar causal structures). Defaults to `batch_size_per_gp` if None |

| Parameter | Type | Description |
| :--- | :--- | :--- |
| `log_seq_len` | bool | Sample sequence length from log-uniform distribution if True |
| `seq_len_per_gp` | bool | Sample sequence length per group (enables variable-sized datasets) |
| `replay_small` | bool | Occasionally sample smaller sequences for model robustness |

| Parameter | Type | Description |
| :--- | :--- | :--- |
| `min_train_size` | int/float | Start position/ratio for train split (int: absolute, float: fractional) |
| `max_train_size` | int/float | End position/ratio for train split (int: absolute, float: fractional) |

| Parameter | Type | Description |
| :--- | :--- | :--- |
| `prior_type` | str | Prior type: 'mlp_scm', 'tree_scm', or 'mix_scm' (random selection) |
| `fixed_hp` | dict | Fixed structural configuration parameters |
| `sampled_hp` | dict | Parameters sampled during generation |

| Parameter | Type | Description |
| :--- | :--- | :--- |
| `n_jobs` | int | Number of parallel jobs (-1 = use all processors) |
| `num_threads_per_generate` | int | Number of threads per generation job |
| `device` | str | Computation device ('cpu' or 'cuda') |

MachineLearningLM uses the LLaMA-Factory framework for training.
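Gathered into one place, the documented parameters might be wired into a generation config like this (every value is a placeholder for illustration, not the setting used in the paper):

```python
# Illustrative prior-generation configuration mirroring the parameter
# tables above; the values are placeholders, not the paper's settings.
prior_config = {
    # dataset shape
    "min_features": 2,
    "max_features": 100,
    "max_classes": 10,
    "min_seq_len": None,            # None -> falls back per the docs above
    "max_seq_len": 1024,            # exclusive upper bound
    # batching
    "batch_size": 256,
    "batch_size_per_gp": 4,
    "batch_size_per_subgp": None,   # defaults to batch_size_per_gp
    # sequence-length sampling
    "log_seq_len": True,
    "seq_len_per_gp": True,
    "replay_small": True,
    # train split (float -> fractional position)
    "min_train_size": 0.1,
    "max_train_size": 0.9,
    # prior structure
    "prior_type": "mix_scm",        # 'mlp_scm', 'tree_scm', or 'mix_scm'
    # parallelism
    "n_jobs": -1,                   # -1 = use all processors
    "num_threads_per_generate": 1,
    "device": "cpu",
}
print(prior_config["prior_type"])
```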

NaNK
license:apache-2.0
263
5

Llama-3.1-8B-Lexi-Uncensored-GGUF

NaNK
license:llama3.1
263
3

BRAG-Llama-3.1-8b-v0.1-GGUF

NaNK
llama3.1
262
4

oxy-1-micro-GGUF

NaNK
license:apache-2.0
262
4

Llama-3-Taiwan-8B-Instruct-DPO-GGUF

NaNK
262
2

ArliAI-Llama-3-8B-Dolfin-v0.2-Instruct-GGUF

NaNK
base_model:OwenArli/ArliAI-Llama-3-8B-Dolfin-v0.2-Instruct
262
1

Qwen2.5-7B-Instruct-1M-GGUF

NaNK
license:apache-2.0
260
3

ArliAI-Llama-3-8B-Cumulus-v0.2-GGUF

NaNK
base_model:OwenArli/ArliAI-Llama-3-8B-Cumulus-v0.2
260
1

AI-Sweden-Llama-3-8B-GGUF

NaNK
llama
260
0

ELYZA-japanese-Llama-2-13b-fast-instruct-GGUF

NaNK
base_model:elyza/ELYZA-japanese-Llama-2-13b-fast-instruct
258
2

Llama3.1-ArrowSE-v0.4-GGUF

NaNK
base_model:elyza/Llama-3-ELYZA-JP-8B
258
1

L3-Aethora-15B-GGUF

NaNK
llama-factory
257
1

Qwen2-7B-GGUF

NaNK
license:apache-2.0
255
3

Qwen2-Math-7B-GGUF

NaNK
license:apache-2.0
255
2

MN-12B-Celeste-V1.9-GGUF

NaNK
license:apache-2.0
255
1

L3-8B-Stheno-v3.3-32K-GGUF

NaNK
license:cc-by-nc-4.0
254
0

Sparse-Llama-3.1-8B-2of4-GGUF

QuantFactory/Sparse-Llama-3.1-8B-2of4-GGUF This is quantized version of neuralmagic/Sparse-Llama-3.1-8B-2of4 created using llama.cpp

Model Overview
- Model Architecture: Llama-3.1-8B
- Input: Text
- Output: Text
- Model Optimizations:
  - Sparsity: 2:4
- Release Date: 11/20/2024
- Version: 1.0
- License(s): llama3.1
- Model Developers: Neural Magic

This is the 2:4 sparse version of Llama-3.1-8B. On the OpenLLM benchmark (version 1), it achieves an average score of 62.16, compared to 63.19 for the dense model, demonstrating a 98.37% accuracy recovery. On the Mosaic Eval Gauntlet benchmark (version v0.3), it achieves an average score of 53.85, versus 55.34 for the dense model, representing a 97.3% accuracy recovery.

This model was obtained by pruning all linear operators within transformer blocks to the 2:4 sparsity pattern: in each group of four weights, two are retained while two are pruned. In addition to pruning, the sparse model was trained with knowledge distillation for 13B tokens to recover the accuracy loss incurred by pruning. For pruning, we utilize an optimized version of SparseGPT through LLM-Compressor, and for sparse training with knowledge distillation we utilize the SquareHead approach.

This model can be deployed efficiently using the vLLM backend. vLLM also supports OpenAI-compatible serving. See the documentation for more details.

This model was evaluated on the OpenLLM benchmark (version 1) with the vLLM engine for faster inference. In addition to the OpenLLM benchmark, the model was evaluated on the Mosaic Eval Gauntlet benchmark (version v0.3). The evaluation results are summarized below.
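The deployment snippet itself is not reproduced in this card; a minimal OpenAI-compatible serve invocation might look like the following (the extra flag is an example, not a requirement from the card):

```shell
# Serve the original sparse checkpoint with vLLM (illustrative invocation).
vllm serve neuralmagic/Sparse-Llama-3.1-8B-2of4 --max-model-len 4096
```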

NaNK
base_model:meta-llama/Llama-3.1-8B
253
4

saiga_llama3_8b-GGUF

NaNK
llama
252
6

Llama-3-8B-Magpie-Pro-SFT-v0.1-GGUF

NaNK
base_model:Magpie-Align/Llama-3-8B-Magpie-Pro-SFT-300K-v0.1
252
2

Qwen3Guard-Gen-0.6B-GGUF

aashish1904/Qwen3Guard-Gen-0.6B-GGUF This is quantized version of Qwen/Qwen3Guard-Gen-0.6B created using llama.cpp

Qwen3Guard is a series of safety moderation models built upon Qwen3 and trained on a dataset of 1.19 million prompts and responses labeled for safety. The series includes models of three sizes (0.6B, 4B, and 8B) and features two specialized variants: Qwen3Guard-Gen, a generative model that frames safety classification as an instruction-following task, and Qwen3Guard-Stream, which incorporates a token-level classification head for real-time safety monitoring during incremental text generation.

This repository hosts Qwen3Guard-Gen, which offers the following key advantages:
- Three-Tiered Severity Classification: Enables detailed risk assessment by categorizing outputs into safe, controversial, and unsafe severity levels, supporting adaptation to diverse deployment scenarios.
- Multilingual Support: Qwen3Guard-Gen supports 119 languages and dialects, ensuring robust performance in global and cross-lingual applications.
- Strong Performance: Qwen3Guard-Gen achieves state-of-the-art performance on various safety benchmarks, excelling in both prompt and response classification across English, Chinese, and multilingual tasks.

For more details, please refer to our blog, GitHub, and Technical Report.

The latest version of `transformers` is recommended, and `transformers>=4.51.0` is required. For deployment, you can use `sglang>=0.4.6.post1` or `vllm>=0.9.0` to create an OpenAI-compatible API endpoint.

In Qwen3Guard, potential harms are classified into three severity levels:
- Unsafe: Content generally considered harmful across most scenarios.
- Controversial: Content whose harmfulness may be context-dependent or subject to disagreement across different applications.
- Safe: Content generally considered safe across most scenarios.
In the current version of Qwen3Guard, we consider the following safety categories:
- Violent: Content that provides detailed instructions, methods, or advice on how to commit acts of violence, including the manufacture, acquisition, or use of weapons. Also includes depictions of violence.
- Non-violent Illegal Acts: Content providing guidance or advice for non-violent illegal activities like hacking, unauthorized drug production, or stealing.
- Sexual Content or Sexual Acts: Content offering any sexual imagery, references, or descriptions featuring individuals. Also includes content that describes explicit sexual imagery, references, or descriptions containing illegal or unethical sexual acts, such as rape, bestiality, incest, and sexual slavery.
- Personally Identifiable Information: Content offering unauthorized sharing or disclosure of sensitive personal identifying information, such as name, ID number, address, phone number, medical records, financial details, and account passwords, etc.
- Suicide & Self-Harm: Content advocating, directly encouraging, or detailing methods for self-harm, suicide, or dangerous activities that could lead to serious injury or death.
- Unethical Acts: Any immoral or unethical content or acts, including but not limited to bias, discrimination, stereotyping, injustice, hate speech, offensive language, harassment, insults, threats, defamation, extremism, misinformation regarding ethics, and other behaviors that, while not illegal, are still considered unethical.
- Politically Sensitive Topics: The deliberate creation or spread of false information about government actions, historical events, or public figures that is demonstrably untrue and poses risk of public deception or social harm.
- Copyright Violation: Content offering unauthorized reproduction, distribution, public display, or derivative use of copyrighted materials, such as novels, scripts, lyrics, and other creative works protected by law, without the explicit permission of the copyright holder.
- Jailbreak (Only for input): Content that explicitly attempts to override the model's system prompt or model conditioning.

If you find our work helpful, feel free to cite us.
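A small helper for post-processing a Qwen3Guard-Gen verdict on the client side; the "Safety:"/"Categories:" line format parsed here is an assumption inferred from the three-tier scheme above, so check the upstream card for the authoritative output format:

```python
import re

def parse_guard_output(text):
    """Extract the severity level and category list from a generated verdict.
    The line format parsed here is an assumption, not the documented one."""
    safety = re.search(r"Safety:\s*(Safe|Unsafe|Controversial)", text)
    categories = re.search(r"Categories:\s*(.+)", text)
    return {
        "safety": safety.group(1) if safety else None,
        "categories": [c.strip() for c in categories.group(1).split(",")]
        if categories else [],
    }

verdict = parse_guard_output("Safety: Unsafe\nCategories: Violent")
print(verdict["safety"])  # Unsafe
```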

NaNK
license:apache-2.0
252
1

ScaduTorrent1.1-8b-model_stock-GGUF

NaNK
base_model:Blackroot/Llama-3-8B-Abomination-LORA
251
2

Vikhr-Llama-3.2-1B-Instruct-GGUF

NaNK
base_model:meta-llama/Llama-3.2-1B-Instruct
251
2

Llama-3.1-Literotica-8B-GGUF

NaNK
llama
249
3

gemma-2-2b-jpn-it-GGUF

NaNK
249
2

Bio-Medical-Llama-3-8B-GGUF

QuantFactory/Bio-Medical-Llama-3-8B-GGUF This is quantized version of ContactDoctor/Bio-Medical-Llama-3-8B created using llama.cpp

This model is a fine-tuned version of https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct on our custom "BioMedData" dataset.

Training Data: Custom high-quality biomedical dataset
Dataset Composition: The dataset comprises both synthetic and manually curated samples, ensuring diverse and comprehensive coverage of biomedical knowledge.

The Bio-Medical-Llama-3-8B model is a specialized large language model designed for biomedical applications. It is fine-tuned from the meta-llama/Meta-Llama-3-8B-Instruct model using a custom dataset containing over 500,000 diverse entries. These entries include a mix of synthetic and manually curated data, ensuring high quality and broad coverage of biomedical topics. The model is trained to understand and generate text related to various biomedical fields, making it a valuable tool for researchers, clinicians, and other professionals in the biomedical domain.

The Bio-Medical-Llama-3-8B model outperforms many of the leading LLMs; below are its metrics evaluated using the Eleuther AI Language Model Evaluation Harness framework against the tasks medmcqa, medqa_4options, mmlu_anatomy, mmlu_clinical_knowledge, mmlu_college_biology, mmlu_college_medicine, mmlu_medical_genetics, mmlu_professional_medicine and pubmedqa.

The Bio-Medical-Llama-3-8B model is intended for a wide range of applications within the biomedical field, including:
1. Research Support: Assisting researchers in literature review and data extraction from biomedical texts.
2. Clinical Decision Support: Providing information to support clinical decision-making processes.
3. Educational Tool: Serving as a resource for medical students and professionals seeking to expand their knowledge base.
While the Bio-Medical-Llama-3-8B model performs well in various biomedical NLP tasks, users should be aware of the following limitations:
> Biases: The model may inherit biases present in the training data. Efforts have been made to curate a balanced dataset, but some biases may persist.
> Accuracy: The model's responses are based on patterns in the data it has seen and may not always be accurate or up-to-date. Users should verify critical information from reliable sources.
> Ethical Use: The model should be used responsibly, particularly in clinical settings where the stakes are high. It should complement, not replace, professional judgment and expertise.

```python
import torch
import transformers

model_id = "ContactDoctor/Bio-Medical-Llama-3-8B"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are an expert trained on healthcare and biomedical domain!"},
    {"role": "user", "content": "I'm a 35-year-old male and for the past few months, I've been experiencing fatigue, increased sensitivity to cold, and dry, itchy skin. What is the diagnosis here?"},
]

prompt = pipeline.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

terminators = [
    pipeline.tokenizer.eos_token_id,
    pipeline.tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

outputs = pipeline(
    prompt,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
print(outputs[0]["generated_text"][len(prompt):])
```

This model is licensed under the Bio-Medical-Llama-3-8B license (Non-Commercial Use Only). Please review the terms and conditions before using the model.
For further information, inquiries, or issues related to Biomed-LLM, please contact: The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 12
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.03
- training_steps: 2000
- mixed_precision_training: Native AMP
- PEFT 0.11.0
- Transformers 4.40.2
- Pytorch 2.1.2
- Datasets 2.19.1
- Tokenizers 0.19.1

If you use Bio-Medical LLM in your research or applications, please cite it as follows:

@misc{ContactDoctor_Bio-Medical-Llama-3-8B,
  author = {ContactDoctor},
  title = {Bio-Medical: A High-Performance Biomedical Language Model},
  year = {2024},
  howpublished = {https://huggingface.co/ContactDoctor/Bio-Medical-Llama-3-8B},
}

NaNK
base_model:meta-llama/Meta-Llama-3-8B-Instruct
248
8

StarDust-12b-v2-GGUF

NaNK
license:apache-2.0
248
5

InkubaLM-0.4B-GGUF

NaNK
license:cc-by-nc-4.0
246
2

codegeex4-all-9b-GGUF

NaNK
244
3

Lyra-Gutenberg-mistral-nemo-12B-GGUF

NaNK
license:apache-2.0
244
2

SeaLLM3-7B-Chat-GGUF

NaNK
244
1

Llama-3-Instruct-8B-RDPO-GGUF

NaNK
base_model:princeton-nlp/Llama-3-Instruct-8B-RDPO
241
0

YuLan-Mini-GGUF

license:mit
239
2

Llama-3-Ko-Luxia-Instruct-GGUF

base_model:maywell/Llama-3-Ko-Luxia-Instruct
239
1

ArliAI-Llama-3-8B-Dolfin-v0.6-Abliterated-GGUF

NaNK
llama
239
1

OpenMath2-Llama3.1-8B-GGUF

NaNK
base_model:meta-llama/Llama-3.1-8B
238
2

ArliAI-RPMax-Formax-v1-GGUF

NaNK
base_model:ArliAI/Llama-3.1-8B-ArliAI-Formax-v1.0
237
1

MagpieLM-4B-Chat-v0.1-GGUF

NaNK
235
2

Average_Normie_v3.69_8B-GGUF

NaNK
license:apache-2.0
234
0

granite-3.1-3b-a800m-base-GGUF

NaNK
license:apache-2.0
232
3

glm-4-9b-chat-GGUF

NaNK
231
4

MathCoder2-Llama-3-8B-GGUF

NaNK
base_model:meta-llama/Meta-Llama-3-8B
231
3

SecurityLLM-GGUF

license:apache-2.0
230
9

Azure_Dusk-v0.2-GGUF

license:apache-2.0
230
2

ArliAI-Llama-3-8B-Cumulus-v1.0-GGUF

NaNK
base_model:OwenArli/ArliAI-Llama-3-8B-Cumulus-v1.0
229
1

Llama-3.1-8B-Instruct-Reasoner-1o1_v0.3-GGUF

NaNK
llama
229
1

chinese-text-correction-7b-GGUF

NaNK
license:apache-2.0
227
2

Meta-Llama-3-70B-Instruct-GGUF-v2

NaNK
llama
226
16

llama-3-Nephilim-v3-8B-GGUF

NaNK
base_model:grimjim/Llama-3-Instruct-8B-SPPO-Iter3-SimPO-merge
226
1

WEBGEN-4B-Preview-GGUF

QuantFactory/WEBGEN-4B-Preview-GGUF This is quantized version of Tesslate/WEBGEN-4B-Preview created using llama.cpp Tesslate • Research Preview WEBGEN-4B-Preview A 4B web-only generator that turns one prompt into clean, responsive HTML/CSS/Tailwind. Small enough for laptops; opinionated for consistent, modern layouts. TRY IT HERE! Get on Designer Open weights Web-only bias Mobile-first output No external JS by default

What it is WEBGEN-4B-Preview focuses solely on generating production-lean websites. It prefers semantic HTML, sane spacing, and modern component blocks (hero, grids, pricing, FAQ).

Why 4B Small enough for local runs and fast iteration, while retaining strong structure/consistency for HTML/CSS/Tailwind output.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Tesslate/WEBGEN-4B-Preview"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = """Make a single-file landing page for 'LatticeDB'.
Style: modern, generous whitespace, Tailwind, rounded-xl, soft gradients.
Sections: navbar, hero (headline + 2 CTAs), features grid, pricing (3 tiers), FAQ accordion, footer.
Constraints: semantic HTML, no external JS."""

inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, do_sample=True, max_new_tokens=2000, temperature=0.7, top_p=0.9)
print(tok.decode(out[0], skip_special_tokens=True))
```

```shell
vllm serve Tesslate/WEBGEN-4B-Preview \
  --host 0.0.0.0 --port 8000 \
  --max-model-len 65536 \
  --gpu-memory-utilization 0.92
```

```shell
python -m sglang.launch_server \
  --model-path Tesslate/WEBGEN-4B-Preview \
  --host 0.0.0.0 --port 5000 \
  --mem-fraction-static 0.94 \
  --attention-backend flashinfer \
  --served-model-name webgen-4b
```

> Tip: Lower temperature (e.g., `0.4–0.6`) yields stricter, cleaner markup. Raise it for more visual variety.
| Parameter | Value | Notes |
|:--|:--|:--|
| temperature | 0.6 | Balance creativity & consistency (lower if quantized) |
| top_p | 0.9 | Nucleus sampling |
| top_k | 40 | Optional vocab restriction |
| max_new_tokens | 1200–2500 | Single-file sites often fit < 1500 |
| repetition_penalty | 1.1 | Reduces repetitive classes/markup |

Make a single-file landing page for "RasterFlow" (GPU video pipeline). Style: modern tech, muted palette, Tailwind, rounded-xl, subtle gradients. Sections: navbar, hero (big headline + 2 CTAs), logos row, features (3x cards), code block (copyable), pricing (3 tiers), FAQ accordion, footer. Constraints: semantic HTML, no external JS. Return ONLY the HTML code. Use an 8pt spacing system. Palette: slate with indigo accents. Typography scale: 14/16/18/24/36/56. Max width: 1200px. Avoid shadows > md; prefer borders/dividers.

| Format | Size | Notes |
|:--|:--|:--|
| BF16 | 8.05 GB | Fastest, best fidelity |
| GGUF Q5_K_M | 2.89 GB | Great quality/size trade-off |
| GGUF Q4_K_M | 2.5 GB | Smallest comfortable for laptops |

- Primary: Generate complete, single-file websites (landing pages, marketing pages, simple docs) with semantic HTML and Tailwind classes.
- Secondary: Component blocks (hero, pricing, FAQ) for manual composition.

Accessibility: adds headings/labels but ARIA coverage may need review. JS widgets: kept light unless explicitly requested in prompt. Curate prompts appropriately. When using third-party logos/assets, ensure you have rights or use open sources.

- Base: Qwen/Qwen3-4B-Instruct
- Objective: Tight web-only bias; reward semantic structure, spacing rhythm, and responsiveness.
- Data: Mixture of curated HTML/CSS/Tailwind snippets, component libraries, and synthetic page specs.
- Recipe: SFT with format constraints → instruction tuning → style/rhythm preference optimization.
- Context: effective ~64k; trained to keep default outputs within practical page length.
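The recommended sampling settings above can be bundled into a single kwargs dict for `model.generate`; a small illustrative helper (the function name and the lowered 0.45 quantized value are our own choices, following the tip to reduce temperature for quantized builds):

```python
# Illustrative helper (not part of the model card): collects the recommended
# sampling settings into kwargs usable as model.generate(**inputs, **webgen_sampling()).
def webgen_sampling(quantized: bool = False, max_new_tokens: int = 1500) -> dict:
    return {
        "do_sample": True,
        # Lower temperature yields stricter markup; quantized builds
        # benefit from a lower setting too (0.45 is our own pick).
        "temperature": 0.45 if quantized else 0.6,
        "top_p": 0.9,
        "top_k": 40,
        "max_new_tokens": max_new_tokens,
        "repetition_penalty": 1.1,
    }

print(webgen_sampling()["temperature"])  # → 0.6
```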
- Examples: uigenoutput.tesslate.com - Discord: discord.gg/EcCpcTv93U - Website: tesslate.com “Why are good design models so expensive” — Tesslate Team Marketing page (product launch) SaaS pricing + FAQ Docs-style layout

NaNK
license:apache-2.0
225
2

instruction-synthesizer-GGUF

NaNK
license:apache-2.0
224
7

Nemotron-Mini-4B-Instruct-GGUF

NaNK
224
2

Llama-3-8B-TKK-Elite-V1.0-GGUF

NaNK
base_model:tarikkaankoc7/Llama-3-8B-TKK-Elite-V1.0
223
0

palmyra-mini-thinking-a-GGUF

license:apache-2.0
222
2

L3-Deluxe-Scrambled-Eggs-On-Toast-8B-GGUF

NaNK
base_model:Danielbrdz/Barcenas-Llama3-8b-ORPO
221
5

arcee-lite-GGUF

license:apache-2.0
221
1

Hermes-2-Pro-Llama-3-8B-GGUF

NaNK
Llama-3
221
0

magnum-v3-9b-customgemma2-GGUF

NaNK
220
2

Hercules-5.0-Qwen2-1.5B-GGUF

NaNK
license:apache-2.0
220
1

Qwen2.5-Math-7B-GGUF

NaNK
license:apache-2.0
218
4

Llama-3.1-8B-Instruct-Fei-v1-Uncensored-GGUF

NaNK
llama
218
3

Qwen3-4B-GGUF

NaNK
license:apache-2.0
217
3

II-Medical-8B-GGUF

NaNK
217
3

llama-3.1-chinese-8b-it-dpo-GGUF

NaNK
216
10

palmyra-mini-thinking-b-GGUF

license:apache-2.0
216
2

Llama-3-Instruct-8B-SimPO-ExPO-GGUF

NaNK
base_model:chujiezheng/Llama-3-Instruct-8B-SimPO-ExPO
216
0

TableGPT2-7B-GGUF

NaNK
license:apache-2.0
215
4

gpt5o-reflexion-q-agi-llama-3.1-8b-GGUF

NaNK
license:mit
215
3

llama-3-Korean-Bllossom-8B-GGUF

NaNK
base_model:meta-llama/Meta-Llama-3-8B
215
2

Ahma-7B-GGUF

NaNK
llama
215
0

L3-8B-Stheno-v3.1-GGUF

NaNK
license:cc-by-nc-4.0
214
3

Art-0-8B-GGUF

QuantFactory/Art-0-8B-GGUF This is quantized version of AGI-0/Art-0-8B created using llama.cpp Art-0-8B: Reasoning the way you want it to with Adaptive Thinking Art-0-8B is the first open-source LLM that allows users to explicitly control its reasoning methodology through direct prompting instructions. This experimental model is fine-tuned from Qwen3-8B on a specialized dataset that makes the model's thinking style directly controllable through system prompts, similar to how you would instruct an LLM to adopt a specific persona or output format. The model supports reasoning control through a structured system prompt format: While the model is primarily trained to implement adaptive thinking based on system prompt instructions, it can also respond to reasoning style changes requested mid-conversation, though this functionality may not be consistently reliable. Some of the benefits that Adaptive Thinking enables: - Direct control over AI reasoning patterns and output structure - Enhanced experimentation with reasoning models and potential for RL strategies that optimize thinking styles - Improved safety through explicit control over the reasoning process - Customizable thinking approaches tailored to specific tasks While prompt engineering has long been known to improve LLM performance, Art-0 represents the first approach that gives users direct control over the internal reasoning process of LLMs. This capability allows models to be configured with optimal thinking patterns for different use cases. If you like this, please consider leaving a like on the repository, as it would help us, and if you can, also leave feedback in the community section. If you are a solo researcher who wants to collaborate with other researchers from all around the world on building open source AGI, you can join our community.
Also if you are not a researcher or technical, you can still join to connect with brilliant minds in this industry, learn, access exclusive models, and vote on research directions. We are AGI-0. We are a global decentralized research lab. Our ultimate goal is to make open source AGI. We move fast. We build without asking permission. While others debate timelines and theoretical risks, we focus on one thing: advancing capabilities as rapidly as possible through unrestricted research collaboration. We believe AGI development shouldn't be slowed down by institutional barriers. We build the future at maximum velocity. → Try more examples in our interactive demo Example: Thinking in Rap Lyrics Special thanks to the Qwen team for their excellent base model and permissive license, and to all the supporters of this work.
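As a purely illustrative sketch of where such reasoning instructions live in a chat request (the system-prompt wording below is a placeholder of our own, not the structured format Art-0-8B was trained on):

```python
# Hypothetical illustration: the exact structured system-prompt format used
# to train Art-0-8B is not reproduced here; this only shows where reasoning
# instructions go in a chat-formatted request.
def build_messages(reasoning_style: str, question: str) -> list:
    return [
        {"role": "system", "content": f"Reasoning style: {reasoning_style}"},
        {"role": "user", "content": question},
    ]

msgs = build_messages(
    "think step by step in short bullet points",
    "What is 17 * 24?",
)
print(msgs[0]["role"])  # → system
```

The resulting `msgs` list can then be passed to a chat template or an OpenAI-compatible endpoint in the usual way.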

NaNK
license:apache-2.0
214
3

OneLLM-Doey-V1-Llama-3.2-3B-GGUF

QuantFactory/OneLLM-Doey-V1-Llama-3.2-3B-GGUF This is quantized version of DoeyLLM/OneLLM-Doey-V1-Llama-3.2-3B created using llama.cpp Model Summary This model is a fine-tuned version of LLaMA 3.2-3B, optimized using LoRA (Low-Rank Adaptation) on the NVIDIA ChatQA-Training-Data. It is tailored for conversational AI, question answering, and other instruction-following tasks, with support for sequences up to 1024 tokens. Key Features - Base Model: LLaMA 3.2-3B - Fine-Tuning Framework: LoRA - Dataset: NVIDIA ChatQA-Training-Data - Max Sequence Length: 1024 tokens - Use Case: Instruction-based tasks, question answering, conversational AI. Model Usage This fine-tuned model is suitable for: - Conversational AI: Chatbots and dialogue agents with improved contextual understanding. - Question Answering: Generating concise and accurate answers to user queries. - Instruction Following: Responding to structured prompts. - Long-Context Tasks: Processing sequences up to 1024 tokens for long-text reasoning. How to Use DoeyLLM / OneLLM-Doey-V1-Llama-3.2-3B-Instruct This guide explains how to use the DoeyLLM model on both app (iOS) and PC platforms. OneLLM brings versatile large language models (LLMs) to your device—Llama, Gemma, Qwen, Mistral, and more. Enjoy private, offline GPT and AI tools tailored to your needs. With OneLLM, experience the capabilities of leading-edge language models directly on your device, all without an internet connection. Get fast, reliable, and intelligent responses, while keeping your data secure with local processing. Follow these steps to integrate the DoeyLLM model using the OneLLM app: 1. Download OneLLM Get the app from the App Store and install it on your iOS device. 2. Load the DoeyLLM Model Use the OneLLM interface to load the DoeyLLM model directly into the app: - Navigate to the Model Library. - Search for `DoeyLLM`. - Select the model and tap Download to store it locally on your device. 3. 
Start Conversing Once the model is loaded, you can begin interacting with it through the app's chat interface. For example: - Tap the Chat tab. - Type your question or prompt, such as: > "Explain the significance of AI in education." - Receive real-time, intelligent responses generated locally. Key Features of OneLLM - Versatile Models: Supports various LLMs, including Llama, Gemma, and Qwen. - Private & Secure: All processing occurs locally on your device, ensuring data privacy. - Offline Capability: Use the app without requiring an internet connection. - Fast Performance: Optimized for mobile devices, delivering low-latency responses. For more details or support, visit the OneLLM App Store page. The DoeyLLM model can also be used on PC platforms through the `transformers` library, enabling robust and scalable inference for various NLP tasks. Quick Start for PC Follow these steps to use the model with Transformers: 1. Install Transformers Ensure you have `transformers >= 4.43.0` installed. Update or install it via pip. 2. Run the model with the `pipeline` API:

```python
import torch
from transformers import pipeline

model_id = "DoeyLLM/OneLLM-Doey-V1-Llama-3.2-3B"
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]
outputs = pipe(messages, max_new_tokens=256)
print(outputs[0]["generated_text"][-1])
```

As part of our responsible release strategy, we adopted a three-pronged approach to managing trust and safety risks: Enable developers to deploy helpful, safe, and flexible experiences for their target audience and the use cases supported by the model. Protect developers from adversarial users attempting to exploit the model's capabilities to potentially cause harm. Provide safeguards for the community to help prevent the misuse of the model.

NaNK
base_model:meta-llama/Llama-3.2-3B
214
2

Lumimaid-Magnum-12B-GGUF

NaNK
213
2

gemma-2-9b-it-WPO-HB-GGUF

NaNK
213
2

llama7b-wizardlm-unfiltered-GGUF

NaNK
213
2

Llama3-German-8B-GGUF

NaNK
base_model:DiscoResearch/Llama3-German-8B
213
1

Hermes-3-Llama-3.1-8B-GGUF

NaNK
Llama-3
212
3

Llama-3-8B-ProLong-512k-Base-GGUF

NaNK
base_model:princeton-nlp/Llama-3-8B-ProLong-64k-Base
212
2

gemma-2b-aps-it-GGUF

NaNK
212
1

Medichat-Llama3-8B-GGUF

NaNK
base_model:sethuiyer/Medichat-Llama3-8B
211
4

SD-Prompter-1.5B-V0.1-GGUF

QuantFactory/SD-Prompter-1.5B-V0.1-GGUF This is quantized version of Delta-Vector/SD-Prompter-1.5B-V0.1 created using llama.cpp This is the first in a line of models dedicated to creating Stable Diffusion prompts when given a character appearance. It has been fine-tuned on top of NewEden/Qwen-1.5B-Claude. The model has been tuned with Alpaca formatting. A typical input would look like this: I would highly recommend using the following system prompt for this model. Training The training was done for 2 epochs. I used 2 x RTX 6000s GPUs graciously provided by Kubernetes Bad for the full-parameter fine-tuning of the model.
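Since the model was tuned with Alpaca formatting, a minimal sketch of the standard Alpaca prompt template may be useful (the instruction and input strings are our own placeholders, not the card's documented example):

```python
# Standard Alpaca prompt template; the instruction/input strings below are
# placeholders of our own, not the model card's example.
ALPACA_TEMPLATE = (
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)

prompt = ALPACA_TEMPLATE.format(
    instruction="Write a Stable Diffusion prompt for the described character.",
    input="A silver-haired knight in ornate blue armor.",
)
print(prompt)
```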

NaNK
license:agpl-3.0
211
3

Ministral-8B-Instruct-2410-GGUF

NaNK
210
2

Llama3.1-DarkStorm-Aspire-8B-GGUF

NaNK
Llama3
208
5

wiroai-turkish-llm-9b-GGUF

NaNK
208
5

Llama-3.1-8B-EZO-1.1-it-GGUF

NaNK
llama
208
2

Berghof-NSFW-7B-GGUF

NaNK
207
6

Qwen2.5-Coder-14B-Instruct-GGUF

NaNK
license:apache-2.0
207
3

Gemma-2-Ataraxy-v3i-9B-GGUF

NaNK
207
1

OCRonos-GGUF

license:apache-2.0
206
2

mistral-nemo-wissenschaft-12B-GGUF

NaNK
license:apache-2.0
206
2

SmolLM2-135M-Instruct-GGUF

license:apache-2.0
205
5

Romulus-cpt-Llama-3.1-8B-v0.1-GGUF

NaNK
llama
205
1

Meta-Llama-3-8B-GGUF-v2

NaNK
llama
204
6

LLaMAX3-8B-GGUF

NaNK
base_model:LLaMAX/LLaMAX3-8B
204
3

OLMoE-1B-7B-0924-GGUF

NaNK
license:apache-2.0
204
3

gemma-2-9b-it-GGUF

NaNK
204
2

Ichigo-llama3.1-s-instruct-v0.4-GGUF

license:apache-2.0
203
2

Buzz-8b-Large-v0.5-GGUF

NaNK
Meta-Llama-3
203
0

saiga_nemo_12b-GGUF

NaNK
license:apache-2.0
202
9

Apollo2-9B-GGUF

QuantFactory/Apollo2-9B-GGUF This is quantized version of FreedomIntelligence/Apollo2-9B created using llama.cpp Covering 12 Major Languages including English, Chinese, French, Hindi, Spanish, Arabic, Russian, Japanese, Korean, German, Italian, Portuguese and 38 Minor Languages So far. 📃 Paper • 🌐 Demo • 🤗 ApolloMoEDataset • 🤗 ApolloMoEBench • 🤗 Models •🌐 Apollo • 🌐 ApolloMoE Languages Coverage 12 Major Languages and 38 Minor Languages Dense 🤗 Apollo2-0.5B • 🤗 Apollo2-1.5B • 🤗 Apollo2-2B Post-MoE 🤗 Apollo-MoE-0.5B • 🤗 Apollo-MoE-1.5B • 🤗 Apollo-MoE-7B Usage Format Apollo2 - 0.5B, 1.5B, 7B: User:{query}\nAssistant:{response} - 2B, 9B: User:{query}\nAssistant:{response}\ - 3.8B: \n{query} \n{response} Apollo-MoE - 0.5B, 1.5B, 7B: User:{query}\nAssistant:{response} - EN: - MedQA-USMLE - MedMCQA - PubMedQA: Because the results fluctuated too much, they were not used in the paper. - MMLU-Medical - Clinical knowledge, Medical genetics, Anatomy, Professional medicine, College biology, College medicine - ZH: - MedQA-MCMLE - CMB-single: Not used in the paper - Randomly sample 2,000 multiple-choice questions with single answer. 
- CMMLU-Medical - Anatomy, Clinical_knowledge, College_medicine, Genetics, Nutrition, Traditional_chinese_medicine, Virology - CExam: Not used in the paper - Randomly sample 2,000 multiple-choice questions - ES: HeadQA - FR: - FrenchMedMCQA - MMLU_FR - Clinical knowledge, Medical genetics, Anatomy, Professional medicine, College biology, College medicine - HI: MMLU_HI - Clinical knowledge, Medical genetics, Anatomy, Professional medicine, College biology, College medicine - AR: MMLU_AR - Clinical knowledge, Medical genetics, Anatomy, Professional medicine, College biology, College medicine - JA: IgakuQA - KO: KorMedMCQA - IT: - MedExpQA - MMLU_IT - Clinical knowledge, Medical genetics, Anatomy, Professional medicine, College biology, College medicine - DE: BioInstructQA: German part - PT: BioInstructQA: Portuguese part - RU: RuMedBench Model Download and Inference We take Apollo-MoE-0.5B as an example 1. Login Huggingface We take Apollo2-7B or Apollo-MoE-0.5B as example 1. Download Dataset for project: 3. Prepare train data for specific model (Create tokenized data in advance): - You can adjust data training order and training epochs in this step - If you want to train on multiple nodes please refer to ./src/sft/training_config/zero_multi.yaml 5. Evaluate your model: Generate score for benchmark Citation Please use the following citation if you intend to use our dataset for training or evaluation:

NaNK
201
2

ArliAI-Llama-3-8B-Instruct-DPO-v0.1-GGUF

NaNK
base_model:OwenArli/ArliAI-Llama-3-8B-Instruct-DPO-v0.1
201
1

HuatuoGPT-o1-8B-GGUF

NaNK
base_model:meta-llama/Llama-3.1-8B-Instruct
200
3

deepseek-llm-7b-base-GGUF

NaNK
200
3

falcon-7b-instruct-GGUF

NaNK
license:apache-2.0
200
2

granite-7b-instruct-GGUF

NaNK
license:apache-2.0
200
2

Reasoning-Llama-1b-v0.1-GGUF

NaNK
llama
200
2

Llama3.1-Allades-8B-GGUF

NaNK
base_model:mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated
200
1

Medical-Llama3-v2-GGUF

llama
199
3

Qwen2-0.5B-Instruct-GGUF

NaNK
license:apache-2.0
199
1

Llama3-ChatQA-2-8B-GGUF

NaNK
llama-3
198
3

KONI-Llama3-8B-20240630-GGUF

NaNK
llama
198
1

Insanity-GGUF

NaNK
198
1

Meta-Llama-3-8B-Instruct-function-calling-json-mode-GGUF

NaNK
llama
198
0

Pullulation-2-9B-GGUF

NaNK
196
1

LLAMA-3_8B_Unaligned_BETA-GGUF

NaNK
195
9

VersatiLlama-Llama-3.2-3B-Instruct-Abliterated-GGUF

NaNK
base_model:meta-llama/Llama-3.2-3B-Instruct
195
4

TinyAgent-7B-GGUF

NaNK
195
3

suzume-llama-3-8B-multilingual-GGUF

NaNK
base_model:lightblue/suzume-llama-3-8B-multilingual
195
0

internlm2-math-plus-1_8b-GGUF

NaNK
195
0

Sakura-13B-Galgame-GGUF

NaNK
license:apache-2.0
194
2

Qwen3-1.7B-GGUF

NaNK
license:apache-2.0
193
3

stablelm-zephyr-3b-GGUF

NaNK
193
2

medicine-chat-GGUF

QuantFactory/medicine-chat-GGUF This is quantized version of AdaptLLM/medicine-chat created using llama.cpp Adapting LLMs to Domains via Continual Pre-Training (ICLR 2024) This repo contains the domain-specific chat model developed from LLaMA-2-Chat-7B, using the method in our paper Adapting Large Language Models via Reading Comprehension. We explore continued pre-training on domain-specific corpora for large language models. While this approach enriches LLMs with domain knowledge, it significantly hurts their prompting ability for question answering. Inspired by human learning via reading comprehension, we propose a simple method to transform large-scale pre-training corpora into reading comprehension texts, consistently improving prompting performance across tasks in biomedicine, finance, and law domains. Our 7B model competes with much larger domain-specific models like BloombergGPT-50B. [2024/11/29] 🤗 Introduce the multimodal version of AdaptLLM at AdaMLLM, for adapting MLLMs to domains 🤗 Updates 2024/11/29: Released AdaMLLM for adapting MLLMs to domains 2024/9/20: Our research paper for Instruction-Pretrain has been accepted by EMNLP 2024 2024/8/29: Updated guidelines on evaluating any 🤗Huggingface models on the domain-specific tasks 2024/6/22: Released the benchmarking code 2024/6/21: Released the general version of AdaptLLM at Instruction-Pretrain 2024/4/2: Released the raw data splits (train and test) of all the evaluation datasets 2024/1/16: Our research paper for AdaptLLM has been accepted by ICLR 2024 2023/12/19: Released our 13B base models developed from LLaMA-1-13B 2023/12/8: Released our chat models developed from LLaMA-2-Chat-7B 2023/9/18: Released our paper, code, data, and base models developed from LLaMA-1-7B 1. 
Domain-Specific Models LLaMA-1-7B In our paper, we develop three domain-specific models from LLaMA-1-7B, which are also available in Huggingface: Biomedicine-LLM, Finance-LLM and Law-LLM; the performance of our AdaptLLM compared to other domain-specific LLMs: LLaMA-1-13B Moreover, we scale up our base model to LLaMA-1-13B to see if our method is similarly effective for larger-scale models, and the results are consistently positive too: Biomedicine-LLM-13B, Finance-LLM-13B and Law-LLM-13B. LLaMA-2-Chat Our method is also effective for aligned models! LLaMA-2-Chat requires a specific data format, and our reading comprehension can perfectly fit the data format by transforming the reading comprehension into a multi-turn conversation. We have also open-sourced chat models in different domains: Biomedicine-Chat, Finance-Chat and Law-Chat. For example, to chat with the biomedicine-chat model: LLaMA-3-8B (💡New!) In our recent research on Instruction-Pretrain, we developed a context-based instruction synthesizer to augment the raw corpora with instruction-response pairs, enabling Llama3-8B to be comparable to or even outperform Llama3-70B: Finance-Llama3-8B, Biomedicine-Llama3-8B. 2. Domain-Specific Tasks To easily reproduce our prompting results, we have uploaded the filled-in zero/few-shot input instructions and output completions of the test split of each domain-specific task: biomedicine-tasks, finance-tasks, and law-tasks. Note: those filled-in instructions are specifically tailored for models before alignment and do NOT fit the specific data format required for chat models. Open LLM Leaderboard Evaluation Results Detailed results can be found here

| Metric | Value |
|:--|--:|
| Avg. | 52.99 |
| AI2 Reasoning Challenge (25-Shot) | 53.75 |
| HellaSwag (10-Shot) | 76.11 |
| MMLU (5-Shot) | 49.98 |
| TruthfulQA (0-shot) | 43.46 |
| Winogrande (5-shot) | 75.69 |
| GSM8k (5-shot) | 18.95 |

Citation If you find our work helpful, please cite us:

license:llama2
193
2

CodeLlama-7b-Instruct-hf-GGUF

NaNK
llama
192
4

StockLLama-3.1-8B-GGUF

NaNK
base_model:Orenguteng/Llama-3.1-8B-Lexi-Uncensored-V2
192
1

LLaMA3-iterative-DPO-final-GGUF

llama
191
1

Arcee-Scribe-GGUF

license:apache-2.0
191
1

instinct-GGUF

QuantFactory/instinct-GGUF This is quantized version of continuedev/instinct created using llama.cpp Instinct, the State-of-the-Art Open Next Edit Model This repo contains the model weights for Continue's state-of-the-art open Next Edit model, Instinct. Robustly fine-tuned from Qwen2.5-Coder-7B on our dataset of real-world code edits, Instinct intelligently predicts your next move to keep you in flow. Ollama: We've released a Q4_K_M GGUF quantization of Instinct for efficient local inference. Try it with Continue's Ollama integration, or just run `ollama run nate/instinct`. You can also serve the model using either of the below options, then connect it with Continue. SGLang: `python3 -m sglang.launch_server --model-path continuedev/instinct --load-format safetensors` vLLM: `vllm serve continuedev/instinct --served-model-name instinct --load-format safetensors` For more information on the work behind Instinct, please refer to our blog.

NaNK
license:apache-2.0
190
4

Reasoning-0.5b-GGUF

NaNK
license:apache-2.0
190
2

Mistral-Crab-SFT-GGUF

license:apache-2.0
190
2

Lama-DPOlphin-8B-GGUF

NaNK
base_model:cognitivecomputations/dolphin-2.9.4-llama3.1-8b
189
2

Turkish-Llama-8b-Instruct-v0.1-GGUF

NaNK
Llama
189
1

Llama3-ArrowSE-8B-v0.3-GGUF

NaNK
license:llama3
189
1

CursorCore-QW2.5-1.5B-SR-GGUF

NaNK
license:apache-2.0
189
1

ArliAI-Llama-3-8B-Dolfin-v0.3-GGUF

NaNK
llama
188
1

Qwen2.5-Coder-7B-Chat-Instruct-TIES-v1.2-GGUF

QuantFactory/Qwen2.5-Coder-7B-Chat-Instruct-TIES-v1.2-GGUF This is quantized version of BenevolenceMessiah/Qwen2.5-Coder-7B-Chat-Instruct-TIES-v1.2 created using llama.cpp This is a merge of pre-trained language models created using mergekit. This model was merged with the TIES merge method, using Qwen/Qwen2.5-Coder-7B as the base. The following models were included in the merge: Qwen/Qwen2.5-Coder-7B-Instruct The following YAML configuration was used to produce this model:
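TIES merging resolves interference between fine-tunes in three steps: trim each task vector to its largest-magnitude entries, elect a per-parameter sign, then average only the entries that agree with the elected sign. A toy numpy sketch of that arithmetic (an illustration of the method only, not the actual mergekit configuration or weights used for this model):

```python
import numpy as np

def ties_merge(base, finetuned, density=0.5):
    """Toy TIES merge of 1-D weight vectors: trim each task vector,
    elect a per-parameter sign, then average only agreeing entries."""
    deltas = [ft - base for ft in finetuned]
    trimmed = []
    for d in deltas:
        # Trim: keep only the top-`density` fraction of entries by magnitude.
        k = max(1, int(len(d) * density))
        thresh = np.sort(np.abs(d))[-k]
        trimmed.append(np.where(np.abs(d) >= thresh, d, 0.0))
    # Elect sign: sign of the summed (mass-weighted) trimmed deltas.
    sign = np.sign(sum(trimmed))
    # Disjoint mean: average entries whose sign matches the elected sign.
    agree = [np.where(np.sign(t) == sign, t, 0.0) for t in trimmed]
    counts = np.maximum(sum((a != 0).astype(float) for a in agree), 1.0)
    return base + sum(agree) / counts

base = np.zeros(4)
merged = ties_merge(
    base,
    [np.array([1.0, -0.1, 2.0, 0.0]), np.array([1.0, 0.2, -0.5, 0.0])],
)
print(merged)  # → [1. 0. 2. 0.]
```

Note how the conflicting third entry (+2.0 vs -0.5) resolves to the majority-mass sign rather than a washed-out average, which is the core idea behind TIES.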

NaNK
187
2

Fox-1-1.6B-GGUF

NaNK
license:apache-2.0
186
1

Llama-3-Perky-Pat-Instruct-8B-GGUF

NaNK
base_model:grimjim/Llama-3-Perky-Pat-Instruct-8B
186
1

Llama3-ArrowSE-8B-v0.1-GGUF

NaNK
license:llama3
186
1

magnum-v2.5-12b-kto-GGUF

NaNK
license:apache-2.0
186
1

Llama-3-8B-Instruct-262k-GGUF

NaNK
llama-3
186
0

L3.1-Aglow-Vulca-v0.1-8B-GGUF

NaNK
base_model:ArliAI/Llama-3.1-8B-ArliAI-Formax-v1.0
184
2

HelpingAI2-9B-GGUF

NaNK
184
1

HTML-Pruner-Llama-1B-GGUF

NaNK
base_model:meta-llama/Llama-3.2-1B
183
4

Stella-mistral-nemo-12B-v2-GGUF

NaNK
183
3

suzume-llama-3-8B-japanese-GGUF

NaNK
183
1

Starling-LM-7B-beta-GGUF

NaNK
license:apache-2.0
183
0

shieldgemma-9b-GGUF

NaNK
182
2

AlphaMaze-v0.2-1.5B-GGUF

NaNK
license:apache-2.0
181
2

Arch-Agent-7B-GGUF

NaNK
181
2

Meta-Llama-3.1-8B-Claude-GGUF

NaNK
181
1

KONI-Llama3-8B-Instruct-20240729-GGUF

NaNK
license:llama3
181
1

Bielik-11B-v2.2-Instruct-GGUF

NaNK
license:apache-2.0
181
1

deepseek-coder-7b-instruct-v1.5-GGUF

QuantFactory/deepseek-coder-7b-instruct-v1.5-GGUF This is quantized version of deepseek-ai/deepseek-coder-7b-instruct-v1.5 created using llama.cpp [🏠Homepage] | [🤖 Chat with DeepSeek Coder] | [Discord] | [Wechat(微信)] Deepseek-Coder-7B-Instruct-v1.5 is continually pre-trained from Deepseek-LLM 7B on 2T tokens, using a window size of 4K and a next-token-prediction objective, and then fine-tuned on 2B tokens of instruction data. - Home Page: DeepSeek - Repository: deepseek-ai/deepseek-coder - Chat With DeepSeek Coder: DeepSeek-Coder 3. How to Use Here are some examples of how to use our model. Chat Model Inference 4. License This code repository is licensed under the MIT License. The use of DeepSeek Coder models is subject to the Model License. DeepSeek Coder supports commercial use. If you have any questions, please raise an issue or contact us at [email protected].

NaNK
181
1

CodeLlama-7B-KStack-GGUF

NaNK
base_model:JetBrains/CodeLlama-7B-KStack
180
3

Mistral-Nemo-12B-ArliAI-RPMax-v1.1-GGUF

NaNK
license:apache-2.0
180
2

LFM2-350M-Math-GGUF

aashish1904/LFM2-350M-Math-GGUF This is quantized version of LiquidAI/LFM2-350M-Math created using llama.cpp Based on LFM2-350M, LFM2-350M-Math is a tiny reasoning model designed for tackling tricky math problems. You can find more information about other task-specific models in this blog post. Generation parameters: We strongly recommend using `temperature=0.6`, `top_p=0.95`, `min_p=0.1`, `repetition_penalty=1.05`. System prompt: We recommend not using any system prompt. Chat template: LFM2 uses a ChatML-like chat template as follows: You can automatically apply it using the dedicated `.apply_chat_template()` function from Hugging Face transformers. > [!WARNING] > ⚠️ The model is intended for single-turn conversations. Reasoning enables models to better structure their thought process, explore multiple solution strategies, and self-verify their final responses. Augmenting tiny models with extensive test-time compute in this way allows them to even solve challenging competition-level math problems. Our benchmark evaluations demonstrate that LFM2-350M-Math is highly capable for its size. As we are excited about edge deployment, our goal is to limit memory consumption and latency. Our post-training recipe leverages reinforcement learning to explicitly bring down response verbosity where it is not desirable. To this end, we combine explicit reasoning budgets with difficulty-aware advantage re-weighting. Please refer to our separate blog post for a detailed post-training recipe. - Hugging Face: LFM2-350M - llama.cpp: LFM2-350M-Math-GGUF - LEAP: LEAP model library If you are interested in custom solutions with edge deployment, please contact our sales team.
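The recommended generation parameters above can be collected into a kwargs dict for `model.generate` (the constant name is ours; the values come from the card, and the single-turn, no-system-prompt message shape follows the card's recommendations):

```python
# Recommended sampling settings from the card, usable as
# model.generate(**inputs, **LFM2_MATH_SAMPLING). Constant name is our own.
LFM2_MATH_SAMPLING = {
    "do_sample": True,
    "temperature": 0.6,
    "top_p": 0.95,
    "min_p": 0.1,
    "repetition_penalty": 1.05,
}

# Single turn, no system prompt, per the card's recommendations.
messages = [{"role": "user", "content": "Compute 12 * 13."}]
print(LFM2_MATH_SAMPLING["temperature"])  # → 0.6
```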

180
1

| Model | Tags | Downloads | Likes |
|:-----------------------------|:-----------------------------|----------:|------:|
| starcoder2-3b-GGUF | | 179 | 3 |
| WizardLM-13B-Uncensored-GGUF | | 179 | 2 |
| Llama-3-Instruct-8B-SPPO-Iter3-GGUF | base_model:UCLA-AGI/Llama-3-Instruct-8B-SPPO-Iter3 | 179 | 1 |
| dolphin-2.9-llama3-8b-256k-GGUF | base_model:cognitivecomputations/dolphin-2.9-llama3-8b-256k | 178 | 4 |
| magnum-v2-12b-GGUF | license:apache-2.0 | 177 | 2 |
| llama3.1-gutenberg-8B-GGUF | base_model:VAGOsolutions/Llama-3.1-SauerkrautLM-8b-Instruct | 177 | 2 |
| phi3.5-gutenberg-4B-GGUF | license:mit | 176 | 4 |
| Qwen2.5-Lumen-14B-GGUF | license:apache-2.0 | 176 | 3 |
| Mistral-Nemo-Instruct-2407-GGUF-iamlemec | license:apache-2.0 | 175 | 2 |
| Llama-3.1-SauerkrautLM-8b-Instruct-GGUF | license:llama3.1 | 175 | 2 |
| BgGPT-Gemma-2-2.6B-IT-v1.0-GGUF | | 175 | 2 |
| Qwen2.5-3B-GGUF | | 175 | 1 |
| Triangulum-1B-GGUF | ollama | 173 | 2 |
| gemma-7b-GGUF | | 173 | 1 |
| SmolLM-360M-Instruct-GGUF | license:apache-2.0 | 173 | 1 |
| NeuralLlama-3-8B-Instruct-abliterated-GGUF | base_model:mlabonne/NeuralLlama-3-8B-Instruct-abliterated | 173 | 0 |
| AI-Sweden-Llama-3-8B-Instruct-GGUF | base_model:AI-Sweden-Models/Llama-3-8B-instruct | 173 | 0 |
| pythia-12b-GGUF | license:apache-2.0 | 172 | 3 |
| L3-Scrambled-Eggs-On-Toast-8B-GGUF | base_model:NeverSleep/Llama-3-Lumimaid-8B-v0.1-OAS | 172 | 2 |
| dolphin-2.9.2-qwen2-7b-GGUF | license:apache-2.0 | 171 | 5 |
| granite-3.0-2b-base-GGUF | license:apache-2.0 | 171 | 2 |
| ArliAI-Llama-3-8B-Dolfin-v1.0-GGUF | base_model:OwenArli/ArliAI-Llama-3-8B-Dolfin-v1.0 | 170 | 1 |
| Aspire1.2-8B-TIES-GGUF | base_model:ArliAI/Llama-3.1-8B-ArliAI-RPMax-v1.1 | 170 | 1 |
| BFS-Prover-GGUF | license:apache-2.0 | 170 | 1 |
| llama-3-zilo-sql-GGUF | meta-llama/Meta-Llama-3-8B-Instruct | 169 | 1 |
| Qwen2-7B-Multilingual-RP-GGUF | license:apache-2.0 | 168 | 2 |
| Veritas-12B-GGUF | | 167 | 6 |
| gemma-2-2b-GGUF | | 167 | 5 |

Qwen2.5-7B-Instruct-MathCoder-GGUF

QuantFactory/Qwen2.5-7B-Instruct-MathCoder-GGUF

This is a quantized version of DeepMount00/Qwen2.5-7B-Instruct-MathCoder created using llama.cpp.

The underlying model is a merge of pre-trained language models created using mergekit. It was merged using the TIES merge method with Qwen/Qwen2.5-7B-Instruct as the base. The following models were included in the merge:

- Qwen/Qwen2.5-Coder-7B-Instruct
- Qwen/Qwen2.5-Math-7B-Instruct

A mergekit YAML configuration was used to produce this model.
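The card's actual mergekit configuration is not reproduced above. As a hedged sketch only, a TIES merge of the two listed models onto the Qwen/Qwen2.5-7B-Instruct base is typically expressed in mergekit YAML along these lines; the `density` and `weight` values shown are assumptions, not the settings used for this model:

```yaml
# Hypothetical mergekit config -- illustrative, not the actual file.
models:
  - model: Qwen/Qwen2.5-Coder-7B-Instruct
    parameters:
      density: 0.5   # fraction of delta weights kept after trimming (assumed)
      weight: 0.5    # relative contribution of this model (assumed)
  - model: Qwen/Qwen2.5-Math-7B-Instruct
    parameters:
      density: 0.5
      weight: 0.5
merge_method: ties
base_model: Qwen/Qwen2.5-7B-Instruct
dtype: bfloat16
```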

167
4
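The TIES merge method named in the card above resolves sign conflicts between fine-tuned models before averaging their parameter deltas. As a rough illustration only (real merges run over full weight tensors via mergekit; `ties_merge` here is a hypothetical helper on flat lists), the core trim / elect-sign / disjoint-mean steps can be sketched as:

```python
def ties_merge(base, task_params, density=0.5):
    """Merge fine-tuned parameter lists into `base` TIES-style."""
    n = len(base)
    # Task vectors: each model's delta from the shared base.
    deltas = [[tp[i] - base[i] for i in range(n)] for tp in task_params]
    # 1. Trim: keep only the top-`density` fraction of each delta by magnitude.
    trimmed = []
    for d in deltas:
        k = max(1, int(len(d) * density))
        cutoff = sorted((abs(x) for x in d), reverse=True)[k - 1]
        trimmed.append([x if abs(x) >= cutoff else 0.0 for x in d])
    merged = []
    for i in range(n):
        vals = [t[i] for t in trimmed]
        # 2. Elect sign: the sign with the larger total magnitude wins.
        pos = sum(v for v in vals if v > 0)
        neg = -sum(v for v in vals if v < 0)
        sign = 1.0 if pos >= neg else -1.0
        # 3. Disjoint mean: average only the deltas agreeing with that sign.
        agree = [v for v in vals if v * sign > 0]
        delta = sum(agree) / len(agree) if agree else 0.0
        merged.append(base[i] + delta)
    return merged

# Agreeing deltas are averaged; conflicting ones follow the dominant sign.
merged = ties_merge([0.0, 0.0], [[1.0, 0.0], [3.0, 0.0]], density=1.0)  # -> [2.0, 0.0]
```

This is why TIES merges can combine a coder model and a math model without their updates cancelling each other out: low-magnitude and sign-conflicting changes are discarded before averaging.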

| Model | Tags | Downloads | Likes |
|:-----------------------------|:-----------------------------|----------:|------:|
| MathCoder2-CodeLlama-7B-GGUF | base_model:codellama/CodeLlama-7b-hf | 167 | 2 |
| MegaHyperLlama3.1-GGUF | base_model:bunnycore/HyperLLama3.1-Mix-8B | 166 | 1 |
| Qwen2.5-7B-Gutenberg-KTO-GGUF | license:gpl-3.0 | 166 | 1 |
| Hermes-2-Theta-Llama-3-8B-GGUF | Llama-3 | 166 | 0 |
| HelpingAI-9B-GGUF | | 165 | 4 |
| MadWizard-SFT-v2-Mistral-7b-v0.3-GGUF | | 165 | 0 |
| L3-Umbral-Mind-RP-v1.0-8B-GGUF | license:llama3 | 164 | 2 |
| leniachat-gemma-2b-v0-GGUF | license:apache-2.0 | 164 | 2 |
| granite-3b-code-base-128k-GGUF | license:apache-2.0 | 164 | 2 |
| Llama-3-8B-WildChat-GGUF | base_model:Magpie-Align/Llama-3-8B-WildChat | 164 | 1 |
| ArliAI-Llama-3-8B-Instruct-DPO-v0.2-GGUF | base_model:OwenArli/ArliAI-Llama-3-8B-Instruct-DPO-v0.2 | 164 | 1 |
| Josiefied-Qwen2.5-7B-Instruct-abliterated-v2-GGUF | license:apache-2.0 | 164 | 1 |
| NovaSpark-GGUF | base_model:grimjim/Llama-3.1-SuperNova-Lite-lorabilterated-8B | 164 | 1 |
| mistral-nemo-gutenberg-12B-v2-GGUF | license:apache-2.0 | 163 | 2 |
| mistral-nemo-bophades-12B-GGUF | license:apache-2.0 | 162 | 4 |
| SeaLLMs-v3-7B-Chat-GGUF | | 162 | 1 |
| Open-Insurance-LLM-Llama3-8B-GGUF | llama | 161 | 6 |
| Llama-3-8B-Magpie-Align-SFT-v0.1-GGUF | base_model:meta-llama/Meta-Llama-3-8B | 161 | 2 |
| starcoder2-3b-instruct-GGUF | | 161 | 1 |
| Qwen2.5-Coder-1.5B-Instruct-GGUF | license:apache-2.0 | 160 | 3 |
| L3-8B-Lunar-Stheno-GGUF | llama-3 | 160 | 2 |
| gemma-2-9b-it-abliterated-GGUF | | 160 | 2 |
| Meta-Llama-3.1-8B-Instruct-abliterated-GGUF | license:llama3.1 | 160 | 2 |