arcee-ai
Trinity-Large-Preview-W4A16
Trinity-Mini-GGUF
AFM-4.5B-Base
AFM-4.5B-Base is the 4.5 billion parameter base model underlying AFM-4.5B, Arcee.ai's instruction-tuned model designed for enterprise-grade performance across diverse deployment environments, from cloud to edge. The base model was trained on a dataset of 8 trillion tokens: 6.5 trillion tokens of general pretraining data followed by 1.5 trillion tokens of midtraining data with an enhanced focus on mathematical reasoning and code generation. Following pretraining, the instruction-tuned variant underwent supervised fine-tuning on high-quality instruction datasets and was further refined through reinforcement learning, both against verifiable rewards and for human preference. We use a modified version of TorchTitan for pretraining, Axolotl for supervised fine-tuning, and a modified version of Verifiers for reinforcement learning.

The development of AFM-4.5B prioritized data quality as a fundamental requirement for robust model performance. We collaborated with DatologyAI, a company specializing in large-scale data curation. DatologyAI's curation pipeline integrates a suite of proprietary techniques: model-based quality filtering, embedding-based curation, target-distribution matching, source mixing, and synthetic data. Their expertise enabled the creation of a curated dataset tailored to support strong real-world performance.

The model architecture follows a standard decoder-only transformer design based on Vaswani et al., incorporating several key modifications for enhanced performance and efficiency. Notable architectural features include grouped-query attention for improved inference efficiency and ReLU^2 activation functions in place of SwiGLU, which enable sparsification while maintaining or exceeding benchmark performance.

The model available in this repo is the base model following merging and context extension.

Model Architecture: ArceeForCausalLM
Parameters: 4.5B
Training Tokens: 8T
License: Apache-2.0

You can use the model directly with the `transformers` library.
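Since this is a base (non-instruct) checkpoint, it is used for plain text completion rather than chat. A minimal sketch with the `transformers` library, assuming the repo id `arcee-ai/AFM-4.5B-Base` and a machine with enough memory for a 4.5B model, might look like:

```python
def complete(prompt: str, model_id: str = "arcee-ai/AFM-4.5B-Base") -> str:
    """Plain text completion with a base model: no chat template is applied."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.5)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

The base model simply continues the prompt, so it suits few-shot prompting and further fine-tuning rather than direct question answering.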
Trinity-Large-Thinking
Trinity-Mini
AFM-4.5B
AFM-4.5B is a 4.5 billion parameter instruction-tuned model developed by Arcee.ai, designed for enterprise-grade performance across diverse deployment environments, from cloud to edge. The base model was trained on a dataset of 8 trillion tokens: 6.5 trillion tokens of general pretraining data followed by 1.5 trillion tokens of midtraining data with an enhanced focus on mathematical reasoning and code generation. Following pretraining, the model underwent supervised fine-tuning on high-quality instruction datasets and was further refined through reinforcement learning, both against verifiable rewards and for human preference. We use a modified version of TorchTitan for pretraining, Axolotl for supervised fine-tuning, and a modified version of Verifiers for reinforcement learning.

The development of AFM-4.5B prioritized data quality as a fundamental requirement for robust model performance. We collaborated with DatologyAI, a company specializing in large-scale data curation. DatologyAI's curation pipeline integrates a suite of proprietary techniques: model-based quality filtering, embedding-based curation, target-distribution matching, source mixing, and synthetic data. Their expertise enabled the creation of a curated dataset tailored to support strong real-world performance.

The model architecture follows a standard decoder-only transformer design based on Vaswani et al., incorporating several key modifications for enhanced performance and efficiency. Notable architectural features include grouped-query attention for improved inference efficiency and ReLU^2 activation functions in place of SwiGLU, which enable sparsification while maintaining or exceeding benchmark performance.

The model available in this repo is the instruct model following supervised fine-tuning and reinforcement learning.
View our documentation for more details: https://docs.arcee.ai/arcee-foundation-models/introduction-to-arcee-foundation-models

Model Architecture: ArceeForCausalLM
Parameters: 4.5B
Training Tokens: 8T
License: Apache-2.0

Recommended settings:
temperature: 0.5
top_k: 50
top_p: 0.95
repeat_penalty: 1.1

Note on benchmarks: the reasoning modes of Qwen3 and SmolLM cause their scores to vary widely from suite to suite; the figures we publish come from our internal harness with identical hyperparameters, so be sure to also consult their officially reported scores. SmolLM has also just released its own benchmark results.

You can use the model directly with the `transformers` library. We recommend a lower temperature, around 0.5, for optimal performance. You can also access this model directly via the Together Playground. Support for llama.cpp and Intel OpenVINO is available.
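The recommended settings above map directly onto `transformers` generation arguments (the card's `repeat_penalty` is `repetition_penalty` in `transformers`). A minimal chat sketch, assuming the repo id `arcee-ai/AFM-4.5B` and sufficient memory for a 4.5B model:

```python
# Recommended sampling settings from the model card.
GENERATION_KWARGS = dict(temperature=0.5, top_k=50, top_p=0.95, repetition_penalty=1.1)

def chat(user_message: str, model_id: str = "arcee-ai/AFM-4.5B") -> str:
    """Single-turn chat using the model's built-in chat template."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )
    messages = [{"role": "user", "content": user_message}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, **GENERATION_KWARGS)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
```

Setting `do_sample=True` is required for the temperature and top-k/top-p settings to take effect; greedy decoding ignores them.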
Trinity-Nano-Base
SuperNova-Medius-GGUF
Trinity-Large-Preview
Virtuoso-Medium-v2-GGUF
Llama-3.1-SuperNova-Lite-GGUF
Arcee-VyLinh
Arcee-Nova-GGUF
Homunculus-GGUF
AFM-4.5B-GGUF
AFM-4.5B is a 4.5 billion parameter instruction-tuned model developed by Arcee.ai, designed for enterprise-grade performance across diverse deployment environments, from cloud to edge. The base model was trained on a dataset of 8 trillion tokens: 6.5 trillion tokens of general pretraining data followed by 1.5 trillion tokens of midtraining data with an enhanced focus on mathematical reasoning and code generation. Following pretraining, the model underwent supervised fine-tuning on high-quality instruction datasets and was further refined through reinforcement learning, both against verifiable rewards and for human preference. We use a modified version of TorchTitan for pretraining, Axolotl for supervised fine-tuning, and a modified version of Verifiers for reinforcement learning.

The development of AFM-4.5B prioritized data quality as a fundamental requirement for robust model performance. We collaborated with DatologyAI, a company specializing in large-scale data curation. DatologyAI's curation pipeline integrates a suite of proprietary techniques: model-based quality filtering, embedding-based curation, target-distribution matching, source mixing, and synthetic data. Their expertise enabled the creation of a curated dataset tailored to support strong real-world performance.

The model architecture follows a standard decoder-only transformer design based on Vaswani et al., incorporating several key modifications for enhanced performance and efficiency. Notable architectural features include grouped-query attention for improved inference efficiency and ReLU^2 activation functions in place of SwiGLU, which enable sparsification while maintaining or exceeding benchmark performance.

The model available in this repo is the instruct model following supervised fine-tuning and reinforcement learning.
Model Architecture: ArceeForCausalLM
Parameters: 4.5B
Training Tokens: 8T
License: Apache-2.0

Recommended settings:
temperature: 0.5
top_k: 50
top_p: 0.95
repeat_penalty: 1.1

Note on benchmarks: the reasoning modes of Qwen3 and SmolLM cause their scores to vary widely from suite to suite; the figures we publish come from our internal harness with identical hyperparameters, so be sure to also consult their officially reported scores. SmolLM has also just released its own benchmark results.
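For the GGUF files in this repo, the recommended settings carry over directly to llama.cpp. A minimal sketch using the `llama-cpp-python` bindings, where the quantization filename is a placeholder (pick whichever GGUF file from this repo you downloaded):

```python
def chat_gguf(user_message: str, gguf_path: str = "AFM-4.5B-Q4_K_M.gguf") -> str:
    """Single-turn chat against a local GGUF file via llama-cpp-python.

    Requires: pip install llama-cpp-python
    The default gguf_path is a placeholder filename, not a guaranteed artifact name.
    """
    from llama_cpp import Llama

    llm = Llama(model_path=gguf_path, n_ctx=4096)
    result = llm.create_chat_completion(
        messages=[{"role": "user", "content": user_message}],
        temperature=0.5,
        top_k=50,
        top_p=0.95,
        repeat_penalty=1.1,
        max_tokens=256,
    )
    return result["choices"][0]["message"]["content"]
```

The same settings work with the `llama-cli` binary via `--temp 0.5 --top-k 50 --top-p 0.95 --repeat-penalty 1.1`.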
Virtuoso-Small-GGUF
Llama-Spark-GGUF
Arcee-SuperNova-v1
Arcee-SuperNova-v1 (70B) is a merged model built from multiple advanced training approaches. At its core is a distillation of Llama-3.1-405B-Instruct into Llama-3.1-70B-Instruct, produced with our DistillKit to preserve instruction-following strength while reducing size. Alongside this, another Llama-3.1-70B model was instruction-tuned on synthetic data from our Evol-Kit pipeline, improving precision and adherence across diverse queries; updates were integrated mid-epoch for smoother performance gains. A third variant underwent Direct Preference Optimization (DPO) to better align with human feedback; while its contribution was smaller, it helped refine final alignment. The resulting Arcee-SuperNova combines all three, delivering strong human-preference alignment and state-of-the-art instruction-following ability.

- Architecture Base: Llama-3.1-70B-Instruct
- Parameter Count: 70B
- License: Llama 3

Suggested use cases:
- General intelligence and instruction following
- Serving as a base to be retrained over time using Reinforcement Learning from Human Feedback (RLHF)
- Mathematical applications and queries

Arcee-SuperNova-v1 (70B) is released under the Llama 3 license. You are free to use, modify, and distribute this model in both commercial and non-commercial applications, subject to the terms and conditions of the license. If you have questions or would like to share your experiences using Arcee-SuperNova-v1 (70B), please connect with us on social media. We're excited to see what you build, and how this model helps you innovate!
Virtuoso-Large-GGUF
Virtuoso-Large (72B) is our most powerful and versatile general-purpose model, designed to excel at handling complex and varied tasks across domains. With state-of-the-art performance, it offers unparalleled capability for nuanced understanding, contextual adaptability, and high accuracy.

- Architecture Base: Qwen2.5-72B
- Parameter Count: 72B
- License: Qwen

Suggested use cases:
- Advanced content creation, such as technical writing and creative storytelling
- Data summarization and report generation for cross-functional domains
- Detailed knowledge synthesis and deep-dive insights from diverse datasets
- Multilingual support for international operations and communications

Virtuoso-Large (72B) is released under the Qwen license. If you have questions or would like to share your experiences using Virtuoso-Large (72B), please connect with us on social media. We're excited to see what you build, and how this model helps you innovate!
Llama-3-SEC-Chat
arcee-lite-GGUF
Arcee-Blitz-GGUF
Arcee-Spark-GGUF
Arcee-SuperNova-v1-GGUF
Trinity-Mini-Base
KidRails
Trinity-Large-Preview-FP8-Block
Caller-GGUF
Arcee-Maestro-7B-Preview-GGUF
Llama-3-SEC-Base
Mistral-7B-Instruct-v0.2-sliced-24-layer
Arcee-Scribe
Arcee-Scribe-GGUF
DeepSeek-V3-0324-bf16
DeepSeek-R1-bf16
Biomistral-Calme-Instruct-7b
Meraj-Mini
Trinity-Large-TrueBase
Arcee-Agent
Llama-3-SEC-Chat-GGUF
Trinity-Nano-Preview-W4A16
gemma-3b-it-expanded
arcee-lite
Virtuoso-Medium-v2
Virtuoso-Large
gemma-7b-slerp
gemma-7b-zephyr-alpaca-it-ties
Hermes-Mistral-Legal-Slerp
Qwen2.5-32B-Instruct-FP8
saul-mistral-v0.2-7b-slerp
GLM-4-32B-Base-32K
Virtuoso-Lite-GGUF
Patent-Instruct-7b
Biomistral-Exp-Slerp
CS-Calme-Instruct-7b
saul-zephyr-7b-slerp
Saul-Nous-Hermes-2-Mistral-7B-DPO-slerp
Llama-3-OpenBioLLM-JSL-8B-SLERP
Mistral-Instruct-Orca-Slerp
SEC-MBX-7B-DPO
arcee-blitz-caller-beta
Mistral-Hermes-Support-Ties
Calme-Instruct-Extended
Patent-Instruct-Pro
Patent-Instruct-LLaMA-Pro
saul-zephyr-7b-ties
Saul-Nous-Hermes-2-Mistral-7B-DPO-Ties
sec-mistral-7b-instruct-v2
Gemma-merged-2B-ties
Saul-Base-Calme-7B-Instruct-slerp
PMC_LLaMA_Vicuna_13B_Slerp
Mistral-7B-Instruct-v0.2-expanded
Mistral-7B-Instruct-v0.2-expanded-sec-1.6B-tokens
Legal-Saul-Multiverse-7b
saul-mistral-instruct-v0.1-7b-ties
SEC-1.6-Calme-7B-Instruct
Llama-3-Base-Instruct-Slerp
Mistral-Lora-Adapter-CS-Slerp
gemma-7b-it-zaphyr-slerp
Gemma-Openchat-SauerkrautLM
Customer-Support-Clown-7b
Clown-Saul-Extended
Patent-Instruct-Extended-40
arcee-sec-mistral-7b
Saul-Instruct-Mistral-7B-Instruct-v0.2-Slerp
Llama-3-MegaMed-8B-Model-Stock
Hermes-2-Pro-WizardMath-7B-SLERP
Caller
Caller (32B) is a robust model engineered for seamless integrations and optimized for managing complex tool-based interactions and API function calls. Its strength lies in precise execution, intelligent orchestration, and effective communication between systems, making it indispensable for sophisticated automation pipelines.

- Architecture Base: Qwen2.5-32B
- Parameter Count: 32B
- License: Apache-2.0

Suggested use cases:
- Managing integrations between CRMs, ERPs, and other enterprise systems
- Running multi-step workflows with intelligent condition handling
- Orchestrating external tool interactions such as calendar scheduling, email parsing, or data extraction
- Real-time monitoring and diagnostics in IoT or SaaS environments

A GGUF version is available here.

License: Caller (32B) is released under the Apache-2.0 License. You are free to use, modify, and distribute this model in both commercial and non-commercial applications, subject to the terms and conditions of the license. If you have questions or would like to share your experiences using Caller (32B), please connect with us on social media. We're excited to see what you build, and how this model helps you innovate!
Trinity-Tokenizer
Virtuoso-Lite-PreDistill
Hermes-Mistral-Saul-Slerp
Saul-Legal-Calme-Instruct
gemma-10b-it-expanded
llama_from_mistral_instruct_v2
saul-mistral-v0.1-7b-slerp
Saul-Base-Clown-7B-Instruct-slerp
Patent-Llama-7B-Chat-Slerp
Patent-Base-7b
gemma-7b-alpaca-zaphyr-slerp
Saul-Instruct-Clown-7b
Calme-Clown-Extended
Patent-Instruct-Extended
Patent-Instruct-Barcenas-Orca-2
Patent-Base-InternLM2-7B-Ties
mistral-v2-sec-dolphin
Patent-Base-Llama-2-Chat-7B-Slerp
MyAlee-Qwen-Instruct-v2-16k-v1
Arcee-Spark-FP32
BioMistral-merged-zephyr
Biomistral-Clown-Slerp
SEC-1.6-MBX-7B-DPO
Saul-Instruct-Extended
Patent-Instruct-Orca-2-Model-Stock
Patent-Instruct-Internlm2-7B-Ties
Patent-Base-Orca-2-7B-Slerp
Llama-3-8B-Instruct-Base-Slerp
Clown-DPO-Extended
MedLLaMA-Vicuna-13B-Slerp
sec-mistral-v2-Hercules
myalee-v3-L31-8B
MistralProSupportSlerp
Alpaca-Dragon-Smaug-Slerp
mistral-sliced
Customer-Support-Clown-Extended
zilo-instruct-v2-sft-filtered
BioMistral-merged-instruct
Gemma-Zephyr-Dolly-Chat-Slerp
Patent-Base-Orca-2-7B-Ties
Patent-Instruct-Llama-2-Chat-7B-Slerp
Llama-3-Medical-JSL-WiNGPT2-SLERP
Homunculus
WitchLM-1.5B
SEC-Calme-7B-Instruct
teeny-tiny-mixtral
sec-mistral-7b-instruct-1.6-epoch
patent-evol-merge
cpt-16B-auto-sft-ties-post-merge-auto-dpo
gemma-7b-alpaca-it-ties
Patent-Instruct-Orca-2
AFM-4.5B-Preview
Patent-Base-Barcenas-Orca-2-7B-Slerp
Arcee-Blitz-AWQ
SuperNova-Medius-FP8
Meraj-Mini-FP8
deepseek-v2-chat-0628-awq
Llama-3.1-SuperNova-Lite-FP8
Trinity-Large-Base
AFM-4.5B-Base-Pre-Anneal
AFM-4.5B-Base-Pre-Anneal is an early checkpoint of AFM-4.5B-Base, the 4.5 billion parameter base model developed by Arcee.ai for enterprise-grade performance across diverse deployment environments, from cloud to edge. This checkpoint was trained on 6.5 trillion tokens of general pretraining data. We use a modified version of TorchTitan for pretraining.

The development of AFM-4.5B prioritized data quality as a fundamental requirement for robust model performance. We collaborated with DatologyAI, a company specializing in large-scale data curation. DatologyAI's curation pipeline integrates a suite of proprietary techniques: model-based quality filtering, embedding-based curation, target-distribution matching, source mixing, and synthetic data. Their expertise enabled the creation of a curated dataset tailored to support strong real-world performance.

The model architecture follows a standard decoder-only transformer design based on Vaswani et al., incorporating several key modifications for enhanced performance and efficiency. Notable architectural features include grouped-query attention for improved inference efficiency and ReLU^2 activation functions in place of SwiGLU, which enable sparsification while maintaining or exceeding benchmark performance.

The model available in this repo is the base model before it was annealed on math and code, and before merging and context extension.

Model Architecture: ArceeForCausalLM
Parameters: 4.5B
Training Tokens: 6.5T (this checkpoint precedes the math-and-code annealing stage and uses only the general dataset)
License: Apache-2.0

You can use the model directly with the `transformers` library.
Arcee-VyLinh-GGUF
Trinity-Mini-W4A16
llama-8b-sft-qlora
Virtuoso-Lite-4bit-mlx
The model mlx-community/Virtuoso-Lite-4bit was converted to MLX format from arcee-ai/Virtuoso-Lite using mlx-lm version 0.21.1.
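The MLX conversions can be run with the `mlx-lm` package on Apple Silicon. A minimal sketch (the 4-bit repo id comes from the conversion note above; `mlx-lm` must be installed and only runs on Apple hardware):

```python
def generate_mlx(prompt: str, model_id: str = "mlx-community/Virtuoso-Lite-4bit") -> str:
    """Generate text from an MLX-converted model. Requires Apple Silicon
    and: pip install mlx-lm
    """
    from mlx_lm import load, generate

    model, tokenizer = load(model_id)  # downloads and loads the MLX weights
    return generate(model, tokenizer, prompt=prompt, max_tokens=128)
```

The same pattern applies to the 3-bit, 6-bit, 8-bit, and bf16 conversions listed here; only the repo id changes.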
Virtuoso-Medium-v2-3bit-mlx
Virtuoso-Lite-3bit-mlx
The model mlx-community/Virtuoso-Lite-3bit was converted to MLX format from arcee-ai/Virtuoso-Lite using mlx-lm version 0.21.1.
Virtuoso-Lite-6bit-mlx
Virtuoso-Medium-v2-6bit-mlx
Virtuoso-Medium-v2-bf16-mlx
Arcee-Nova-AWQ
Virtuoso-Lite-8bit-mlx
Virtuoso-Medium-v2-4bit-mlx
Virtuoso-Medium-v2-8bit-mlx
WitchLM-1.5B-GGUF
Arcee-Maestro-7B-Preview-AWQ
zilo-sft-qlora
Trinity-Nano-Base-Pre-Anneal
Trinity-Mini-Base-Pre-Anneal
Trinity-Large-Thinking-NVFP4
Trinity-Large-Preview-FP8
AFM-4.5B-ov
This model repository contains 16-bit, 8-bit, and 4-bit versions of AFM-4.5B optimized with the Intel OpenVINO toolkit. The original model was converted with Intel OpenVINO 2025.03 and Hugging Face Optimum Intel. You can easily try the model with this code snippet:

AFM-4.5B is a 4.5 billion parameter instruction-tuned model developed by Arcee.ai, designed for enterprise-grade performance across diverse deployment environments, from cloud to edge. The base model was trained on a dataset of 8 trillion tokens: 6.5 trillion tokens of general pretraining data followed by 1.5 trillion tokens of midtraining data with an enhanced focus on mathematical reasoning and code generation. Following pretraining, the model underwent supervised fine-tuning on high-quality instruction datasets and was further refined through reinforcement learning, both against verifiable rewards and for human preference. We use a modified version of TorchTitan for pretraining, Axolotl for supervised fine-tuning, and a modified version of Verifiers for reinforcement learning.

The development of AFM-4.5B prioritized data quality as a fundamental requirement for robust model performance. We collaborated with DatologyAI, a company specializing in large-scale data curation. DatologyAI's curation pipeline integrates a suite of proprietary techniques: model-based quality filtering, embedding-based curation, target-distribution matching, source mixing, and synthetic data. Their expertise enabled the creation of a curated dataset tailored to support strong real-world performance.

The model architecture follows a standard decoder-only transformer design based on Vaswani et al., incorporating several key modifications for enhanced performance and efficiency. Notable architectural features include grouped-query attention for improved inference efficiency and ReLU^2 activation functions in place of SwiGLU, which enable sparsification while maintaining or exceeding benchmark performance.

The model available in this repo is the instruct model following supervised fine-tuning and reinforcement learning.

Model Architecture: ArceeForCausalLM
Parameters: 4.5B
Training Tokens: 8T
License: Apache-2.0

Recommended settings:
temperature: 0.5
top_k: 50
top_p: 0.95
repeat_penalty: 1.1

Note on benchmarks: the reasoning modes of Qwen3 and SmolLM cause their scores to vary widely from suite to suite; the figures we publish come from our internal harness with identical hyperparameters, so be sure to also consult their officially reported scores. SmolLM has also just released its own benchmark results.
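The snippet referenced at the top of this card is not shown above; a minimal sketch with Hugging Face Optimum Intel, assuming the repo id `arcee-ai/AFM-4.5B-ov` and an installed `optimum[openvino]`, could look like:

```python
def generate_openvino(prompt: str, model_id: str = "arcee-ai/AFM-4.5B-ov") -> str:
    """Run the OpenVINO-optimized model via Optimum Intel.

    Requires: pip install "optimum[openvino]"
    """
    from optimum.intel import OVModelForCausalLM
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = OVModelForCausalLM.from_pretrained(model_id)  # loads the OpenVINO IR
    messages = [{"role": "user", "content": prompt}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    )
    outputs = model.generate(
        inputs,
        max_new_tokens=128,
        do_sample=True,
        temperature=0.5,
        top_k=50,
        top_p=0.95,
        repetition_penalty=1.1,
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

`OVModelForCausalLM` mirrors the `transformers` generation API, so the recommended sampling settings carry over unchanged.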