Salesforce

--- language: en license: mit tags: - vision - image-to-text - image-captioning - visual-question-answering pipeline_tag: image-to-text inference: false ---

NaNK

license:mit

472,967

blip-vqa-base

--- pipeline_tag: 'visual-question-answering' tags: - visual-question-answering inference: false languages: - en license: bsd-3-clause ---

license:bsd-3-clause

285,201

182

moirai-moe-1.0-R-base

--- license: cc-by-nc-4.0 pipeline_tag: time-series-forecasting tags: - time series - forecasting - pretrained models - foundation models - time series foundation models - time-series ---

license:cc-by-nc-4.0

211,368

codegen-350M-mono

license:bsd-3-clause

138,727

moirai-2.0-R-small

This model is designed for time series forecasting. It is licensed under CC BY-NC 4.0 and is categorized as a pretrained foundation model for time series applications.

license:cc-by-nc-4.0

105,888

codet5p-110m-embedding

license:bsd-3-clause

90,360

blip2-flan-t5-xl

license:mit

72,074

moirai-1.1-R-large

This is new updated version of Moirai-1.0-R (https://huggingface.co/Salesforce/moirai-1.0-R-large). The Moirai-1.1-R model achieved significant improvements (~20%) for low-frequency cases like Yearly and Quarterly data in Normalised Mean Absolute Error (NMAE) for 40 datasets on the Monash repository. This release is for research purposes only in support of an academic paper. Our models, datasets, and code are not specifically designed or evaluated for all downstream purposes. We strongly recommend users evaluate and address potential concerns related to accuracy, safety, and fairness before deploying this model. We encourage users to consider the common limitations of AI, comply with applicable laws, and leverage best practices when selecting use cases, particularly for high-risk scenarios where errors or misuse could significantly impact people’s lives, rights, or safety. For further guidance on use cases, refer to our AUP and AI AUP.

license:cc-by-nc-4.0

57,869

moirai-1.1-R-base

This is new updated version of Moirai-1.0-R (https://huggingface.co/Salesforce/moirai-1.0-R-base). The new Moirai model achieved significant improvements (~20%) for low-frequency cases like Yearly and Quarterly data in Normalised Mean Absolute Error (NMAE) for 40 datasets on the Monash repository. This release is for research purposes only in support of an academic paper. Our models, datasets, and code are not specifically designed or evaluated for all downstream purposes. We strongly recommend users evaluate and address potential concerns related to accuracy, safety, and fairness before deploying this model. We encourage users to consider the common limitations of AI, comply with applicable laws, and leverage best practices when selecting use cases, particularly for high-risk scenarios where errors or misuse could significantly impact people’s lives, rights, or safety. For further guidance on use cases, refer to our AUP and AI AUP.

license:cc-by-nc-4.0

49,289

moirai-1.1-R-small

This is new updated version of Moirai-1.0-R (https://huggingface.co/Salesforce/moirai-1.0-R-base). The Moirai-1.1-R model achieved significant improvements (~20%) for low-frequency cases like Yearly and Quarterly data in Normalised Mean Absolute Error (NMAE) for 40 datasets on the Monash repository. This release is for research purposes only in support of an academic paper. Our models, datasets, and code are not specifically designed or evaluated for all downstream purposes. We strongly recommend users evaluate and address potential concerns related to accuracy, safety, and fairness before deploying this model. We encourage users to consider the common limitations of AI, comply with applicable laws, and leverage best practices when selecting use cases, particularly for high-risk scenarios where errors or misuse could significantly impact people’s lives, rights, or safety. For further guidance on use cases, refer to our AUP and AI AUP.

license:cc-by-nc-4.0

46,383

codet5-small

Pre-trained CodeT5 model. It was introduced in the paper CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation by Yue Wang, Weishi Wang, Shafiq Joty, Steven C.H. Hoi and first released in this repository. Disclaimer: The team releasing CodeT5 did not write a model card for this model so this model card has been written by the Hugging Face team (more specifically, nielsr). "We present CodeT5, a unified pre-trained encoder-decoder Transformer model that better leverages the code semantics conveyed from the developer-assigned identifiers. Our model employs a unified framework to seamlessly support both code understanding and generation tasks and allows for multi-task learning. Besides, we propose a novel identifier-aware pre-training task that enables the model to distinguish which code tokens are identifiers and to recover them when they are masked. Furthermore, we propose to exploit the user-written code comments with a bimodal dual generation task for better NL-PL alignment. Comprehensive experiments show that CodeT5 significantly outperforms prior methods on understanding tasks such as code defect detection and clone detection, and generation tasks across various directions including PL-NL, NL-PL, and PL-PL. Further analysis reveals that our model can better capture semantic information from code." This repository contains the pre-trained model only, so you can use this model for masked span prediction, as shown in the code example below. However, the main use of this model is to fine-tune it for a downstream task of interest, such as: code summarization code generation code translation code refinement code defect detection code clone detection. See the model hub to look for fine-tuned versions on a task that interests you. The CodeT5 model was pretrained on CodeSearchNet Husain et al., 2019. Additionally, the authors collected two datasets of C/CSharp from BigQuery1 to ensure that all downstream tasks have overlapped programming languages with the pre-training data. In total, around 8.35 million instances are used for pretraining. This model uses a code-specific BPE (Byte-Pair Encoding) tokenizer. One can prepare text (or code) for the model using RobertaTokenizer, with the files from this repository. For evaluation results on several downstream benchmarks, we refer to the paper. This release is for research purposes only in support of an academic paper. Our models, datasets, and code are not specifically designed or evaluated for all downstream purposes. We strongly recommend users evaluate and address potential concerns related to accuracy, safety, and fairness before deploying this model. We encourage users to consider the common limitations of AI, comply with applicable laws, and leverage best practices when selecting use cases, particularly for high-risk scenarios where errors or misuse could significantly impact people’s lives, rights, or safety. For further guidance on use cases, refer to our AUP and AI AUP.

license:apache-2.0

24,793

blip-vqa-capfilt-large

license:bsd-3-clause

23,145

codet5-base

Pre-trained CodeT5 model. It was introduced in the paper CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation by Yue Wang, Weishi Wang, Shafiq Joty, Steven C.H. Hoi and first released in this repository. Disclaimer: The team releasing CodeT5 did not write a model card for this model so this model card has been written by the Hugging Face team (more specifically, nielsr). "We present CodeT5, a unified pre-trained encoder-decoder Transformer model that better leverages the code semantics conveyed from the developer-assigned identifiers. Our model employs a unified framework to seamlessly support both code understanding and generation tasks and allows for multi-task learning. Besides, we propose a novel identifier-aware pre-training task that enables the model to distinguish which code tokens are identifiers and to recover them when they are masked. Furthermore, we propose to exploit the user-written code comments with a bimodal dual generation task for better NL-PL alignment. Comprehensive experiments show that CodeT5 significantly outperforms prior methods on understanding tasks such as code defect detection and clone detection, and generation tasks across various directions including PL-NL, NL-PL, and PL-PL. Further analysis reveals that our model can better capture semantic information from code." This repository contains the pre-trained model only, so you can use this model for (among other tasks) masked span prediction, as shown in the code example below. However, the main use of this model is to fine-tune it for a downstream task of interest, such as: code summarization code generation code translation code refinement code defect detection code clone detection. Supervised datasets for code can be found here. See the model hub to look for fine-tuned versions on a task that interests you. The CodeT5 model was pretrained on CodeSearchNet Husain et al., 2019. Additionally, the authors collected two datasets of C/CSharp from BigQuery1 to ensure that all downstream tasks have overlapped programming languages with the pre-training data. In total, around 8.35 million instances are used for pretraining. This model uses a code-specific BPE (Byte-Pair Encoding) tokenizer trained using the HuggingFace Tokenizers library. One can prepare text (or code) for the model using RobertaTokenizer, with the files from this repository. For evaluation results on several downstream benchmarks, we refer to the paper. This release is for research purposes only in support of an academic paper. Our models, datasets, and code are not specifically designed or evaluated for all downstream purposes. We strongly recommend users evaluate and address potential concerns related to accuracy, safety, and fairness before deploying this model. We encourage users to consider the common limitations of AI, comply with applicable laws, and leverage best practices when selecting use cases, particularly for high-risk scenarios where errors or misuse could significantly impact people’s lives, rights, or safety. For further guidance on use cases, refer to our AUP and AI AUP.

license:apache-2.0

22,083

132

blip2-opt-6.7b

BLIP-2 model, leveraging OPT-6.7b (a large language model with 6.7 billion parameters). It was introduced in the paper BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders a...

NaNK

license:mit

17,106

SFR-Embedding-2_R

license:cc-by-nc-4.0

16,949

blip-itm-large-coco

license:bsd-3-clause

16,506

instructblip-vicuna-7b

NaNK

—

14,476

blip2-itm-vit-g

license:mit

13,818

SFR-Embedding-Code-400M_R

license:cc-by-nc-4.0

12,584

codet5p-220m

license:bsd-3-clause

12,468

moirai-1.0-R-small

license:cc-by-nc-4.0

10,809

instructblip-flan-t5-xl

license:mit

10,304

Llama-xLAM-2-8b-fc-r-gguf

This repo provides the GGUF format for the Llama-xLAM-2-8b-fc-r model. Here's a link to original model Llama-xLAM-2-8b-fc-r. Large Action Models (LAMs) are advanced language models designed to enhance decision-making by translating user intentions into executable actions. As the brains of AI agents, LAMs autonomously plan and execute tasks to achieve specific goals, making them invaluable for automating workflows across diverse domains. This model release is for research purposes only. The new xLAM-2 series, built on our most advanced data synthesis, processing, and training pipelines, marks a significant leap in multi-turn conversation and tool usage. Trained using our novel APIGen-MT framework, which generates high-quality training data through simulated agent-human interactions. Our models achieve state-of-the-art performance on BFCL and τ-bench benchmarks, outperforming frontier models like GPT-4o and Claude 3.5. Notably, even our smaller models demonstrate superior capabilities in multi-turn scenarios while maintaining exceptional consistency across trials. We've also refined the chat template and vLLM integration, making it easier to build advanced AI agents. Compared to previous xLAM models, xLAM-2 offers superior performance and seamless deployment across applications. Comparative performance of larger xLAM-2-fc-r models (8B-70B, trained with APIGen-MT data) against state-of-the-art baselines on function-calling (BFCL v3, as of date 04/02/2025) and agentic (τ-bench) capabilities. Table of Contents - Model Series - Using GGUF Files - Benchmark Results - Citation xLAM series are significantly better at many things including general tasks and function calling. For the same number of parameters, the model have been fine-tuned across a wide range of agent tasks and scenarios, all while preserving the capabilities of the original model. | Model | # Total Params | Context Length | Category | Download Model | Download GGUF files | |------------------------|----------------|------------|-------|----------------|----------| | Llama-xLAM-2-70b-fc-r | 70B | 128k | Multi-turn Conversation, Function-calling | 🤗 Link | NA | | Llama-xLAM-2-8b-fc-r | 8B | 128k | Multi-turn Conversation, Function-calling | 🤗 Link | 🤗 Link | | xLAM-2-32b-fc-r | 32B | 32k (max 128k) | Multi-turn Conversation, Function-calling | 🤗 Link | NA | | xLAM-2-3b-fc-r | 3B | 32k (max 128k) | Multi-turn Conversation, Function-calling | 🤗 Link | 🤗 Link | | xLAM-2-1b-fc-r | 1B | 32k (max 128k) | Multi-turn Conversation, Function-calling | 🤗 Link | 🤗 Link | Note: The default context length for Qwen-2.5-based models is 32k, but you can use techniques like YaRN (Yet Another Recursive Network) to achieve maximum 128k context length. Please refer to here for more details. You can also explore our previous xLAM series here. The `-fc` suffix indicates that the models are fine-tuned for function calling tasks, while the `-r` suffix signifies a research release. ✅ All models are fully compatible with vLLM and Transformers-based inference frameworks. For scenarios requiring more efficient inference or deployment on resource-constrained devices, we provide GGUF versions of our models, which are compatible with llama.cpp and similar frameworks. 1. Install llama.cpp framework from the source here 2. Run the inference task as shown below. For configuration of generation-related parameters, refer to llama.cpp documentation Performance comparison of different models on BFCL leaderboard. The rank is based on the overall accuracy, which is a weighted average of different evaluation categories. "FC" stands for function-calling mode in contrast to using a customized "prompt" to extract the function calls. Success Rate (pass@1) on τ-bench benchmark averaged across at least 5 trials. Our xLAM-2-70b-fc-r model achieves an overall success rate of 56.2% on τ-bench, significantly outperforming the base Llama 3.1 70B Instruct model (38.2%) and other open-source models like DeepSeek v3 (40.6%). Notably, our best model even outperforms proprietary models such as GPT-4o (52.9%) and approaches the performance of more recent models like Claude 3.5 Sonnet (new) (60.1%). Pass^k curves measuring the probability that all 5 independent trials succeed for a given task, averaged across all tasks for τ-retail (left) and τ-airline (right) domains. Higher values indicate better consistency of the models. This release is for research purposes only in support of an academic paper. Our models, datasets, and code are not specifically designed or evaluated for all downstream purposes. We strongly recommend users evaluate and address potential concerns related to accuracy, safety, and fairness before deploying this model. We encourage users to consider the common limitations of AI, comply with applicable laws, and leverage best practices when selecting use cases, particularly for high-risk scenarios where errors or misuse could significantly impact people's lives, rights, or safety. For further guidance on use cases, refer to our AUP and AI AUP. For all Llama relevant models, please also follow corresponding Llama license and terms. Meta Llama 3 is licensed under the Meta Llama 3 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved. If you use our model or dataset in your work, please cite our paper: Additionally, please check our other amazing works regarding xLAM series and consider citing them as well:

NaNK

llama

8,398

blip2-itm-vit-g-coco

license:mit

8,009

blip2-flan-t5-xxl

license:mit

7,684

moirai-moe-1.0-R-small

license:cc-by-nc-4.0

7,508

codet5-base-multi-sum

license:bsd-3-clause

7,017

ctrl

license:bsd-3-clause

6,846

moirai-1.0-R-large

license:cc-by-nc-4.0

5,255

Llama-xLAM-2-8b-fc-r

Large Action Models (LAMs) are advanced language models designed to enhance decision-making by translating user intentions into executable actions. As the brains of AI agents, LAMs autonomously plan and execute tasks to achieve specific goals, making them invaluable for automating workflows across diverse domains. This model release is for research purposes only. The new xLAM-2 series, built on our most advanced data synthesis, processing, and training pipelines, marks a significant leap in multi-turn conversation and tool usage. Trained using our novel APIGen-MT framework, which generates high-quality training data through simulated agent-human interactions. Our models achieve state-of-the-art performance on BFCL and τ-bench benchmarks, outperforming frontier models like GPT-4o and Claude 3.5. Notably, even our smaller models demonstrate superior capabilities in multi-turn scenarios while maintaining exceptional consistency across trials. We've also refined the chat template and vLLM integration, making it easier to build advanced AI agents. Compared to previous xLAM models, xLAM-2 offers superior performance and seamless deployment across applications. Comparative performance of larger xLAM-2-fc-r models (8B-70B, trained with APIGen-MT data) against state-of-the-art baselines on function-calling (BFCL v3, as of date 04/02/2025) and agentic (τ-bench) capabilities. Table of Contents - Usage - Basic Usage with Huggingface Chat Template - Using vLLM for Inference - Setup and Serving - Testing with OpenAI API - Benchmark Results - Citation xLAM series are significant better at many things including general tasks and function calling. For the same number of parameters, the model have been fine-tuned across a wide range of agent tasks and scenarios, all while preserving the capabilities of the original model. | Model | # Total Params | Context Length | Category | Download Model | Download GGUF files | |------------------------|----------------|------------|-------|----------------|----------| | Llama-xLAM-2-70b-fc-r | 70B | 128k | Multi-turn Conversation, Function-calling | 🤗 Link | NA | | Llama-xLAM-2-8b-fc-r | 8B | 128k | Multi-turn Conversation, Function-calling | 🤗 Link | 🤗 Link | | xLAM-2-32b-fc-r | 32B | 32k (max 128k) | Multi-turn Conversation, Function-calling | 🤗 Link | NA | | xLAM-2-3b-fc-r | 3B | 32k (max 128k) | Multi-turn Conversation, Function-calling | 🤗 Link | 🤗 Link | | xLAM-2-1b-fc-r | 1B | 32k (max 128k) | Multi-turn Conversation, Function-calling | 🤗 Link | 🤗 Link | Note: The default context length for Qwen-2.5-based models is 32k, but you can use techniques like YaRN (Yet Another Recursive Network) to achieve maximum 128k context length. Please refer to here for more details. You can also explore our previous xLAM series here. The `-fc` suffix indicates that the models are fine-tuned for function calling tasks, while the `-r` suffix signifies a research release. ✅ All models are fully compatible with vLLM and Transformers-based inference frameworks. - Transformers 4.46.1 (or later) - PyTorch 2.5.1+cu124 (or later) - Datasets 3.1.0 (or later) - Tokenizers 0.20.3 (or later) The new xLAM models are designed to work seamlessly with the Hugging Face Transformers library and utilize natural chat templates for an easy and intuitive conversational experience. Below are examples of how to use these models. The xLAM models can also be efficiently served using vLLM for high-throughput inference. Please use `vllm>=0.6.5` since earlier versions will cause degraded performance for Qwen-based models. 2. Download the tool parser plugin to your local path: Note: Ensure that the tool parser plugin file is downloaded and that the path specified in `--tool-parser-plugin` correctly points to your local copy of the file. The xLAM series models all utilize the same tool call parser, so you only need to download it once for all models. Here's a minimal example to test tool usage with the served endpoint: For more advanced configurations and deployment options, please refer to the vLLM documentation. Performance comparison of different models on BFCL leaderboard. The rank is based on the overall accuracy, which is a weighted average of different evaluation categories. "FC" stands for function-calling mode in contrast to using a customized "prompt" to extract the function calls. Success Rate (pass@1) on τ-bench benchmark averaged across at least 5 trials. Our xLAM-2-70b-fc-r model achieves an overall success rate of 56.2% on τ-bench, significantly outperforming the base Llama 3.1 70B Instruct model (38.2%) and other open-source models like DeepSeek v3 (40.6%). Notably, our best model even outperforms proprietary models such as GPT-4o (52.9%) and approaches the performance of more recent models like Claude 3.5 Sonnet (new) (60.1%). Pass^k curves measuring the probability that all 5 independent trials succeed for a given task, averaged across all tasks for τ-retail (left) and τ-airline (right) domains. Higher values indicate better consistency of the models. This release is for research purposes only in support of an academic paper. Our models, datasets, and code are not specifically designed or evaluated for all downstream purposes. We strongly recommend users evaluate and address potential concerns related to accuracy, safety, and fairness before deploying this model. We encourage users to consider the common limitations of AI, comply with applicable laws, and leverage best practices when selecting use cases, particularly for high-risk scenarios where errors or misuse could significantly impact people's lives, rights, or safety. For further guidance on use cases, refer to our AUP and AI AUP. For all Llama relevant models, please also follow corresponding Llama license and terms. Meta Llama 3 is licensed under the Meta Llama 3 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved. If you use our model or dataset in your work, please cite our paper: Additionally, please check our other awesome related works regarding xLAM series and consider citing them as well:

NaNK

llama

4,718

codegen-350M-multi

license:bsd-3-clause

4,111

blip-itm-large-flickr

license:bsd-3-clause

3,194

codet5p-770m

license:bsd-3-clause

2,610

moirai-1.0-R-base

license:cc-by-nc-4.0

2,277

SFR-Embedding-Code-2B_R

NaNK

license:cc-by-nc-4.0

1,920

GTA1-32B

Reinforcement learning (RL) (e.g., GRPO) helps with grounding because of its inherent objective alignment—rewarding successful clicks—rather than encouraging long textual Chain-of-Thought (CoT) reasoning. Unlike approaches that rely heavily on verbose CoT reasoning, GRPO directly incentivizes actionable and grounded responses. Based on findings from our blog, we share state-of-the-art GUI grounding models trained using GRPO. We follow the standard evaluation protocol and benchmark our model on three challenging datasets. Our method consistently achieves the best results among all open-source model families. Below are the comparative results: | Model | Size | Open Source | ScreenSpot-V2 | ScreenSpotPro | OSWORLD-G | OSWORLD-G-Refined | |-------------------|:--------:|:---------------:|:-----------------:|:-----------------:|:-----------------:|:-----------------:| | OpenAI CUA | — | ❌ | 87.9 | 23.4 | — | — | | Claude 3.7 | — | ❌ | 87.6 | 27.7 | — | — | | JEDI-7B | 7B | ✅ | 91.7 | 39.5 | 54.1 | — | | SE-GUI | 7B | ✅ | 90.3 | 47.0 | — | — | | UI-TARS | 7B | ✅ | 91.6 | 35.7 | 47.5 | — | | UI-TARS-1.5 | 7B | ✅ | 89.7 | 42.0 | 52.8 | 64.2 | | UGround-v1-7B | 7B | ✅ | — | 31.1 | — | 36.4 | | Qwen2.5-VL-32B-Instruct | 32B | ✅ | 91.9 | 48.0 | 46.5 | 59.6 | | UGround-v1-72B | 72B | ✅ | — | 34.5 | — | — | | Qwen2.5-VL-72B-Instruct | 72B | ✅ | 94.00 | 53.3 | — | 62.2 | | UI-TARS | 72B | ✅ | 90.3 | 38.1 | — | — | | OpenCUA | 7B | ✅ | 92.3 | 50.0 | 55.3 | 68.3 | | OpenCUA | 32B | ✅ | 93.4 | 55.3 | 59.6 | 70.2 | | GTA1-2507 (Ours) | 7B | ✅ | 92.4 (∆ +2.7) | 50.1 (∆ +8.1) | 55.1 (∆ +2.3) | 67.7 (∆ +3.5) | | GTA1 (Ours) | 7B | ✅ | 93.4 (∆ +0.1) | 55.5 (∆ +5.5) | 60.1 (∆ +4.8) | 68.8 (∆ +0.5) | | GTA1 (Ours) | 32B | ✅ | 95.2 (∆ +1.8) | 63.6 (∆ +8.3) | 65.2 (∆ +5.6) | 72.2 (∆ +2.0) | > Note: > - Model size is indicated in billions (B) of parameters. > - A dash (—) denotes results that are currently unavailable. > - A superscript asterisk (﹡) denotes our evaluated result. > - UI-TARS-1.5 7B, OpenCUA-7B, and OpenCUA-32B are applied as our baseline models. > - ∆ indicates the performance improvement (∆) of our model compared to its baseline. We evaluate our models on the OSWorld and OSWorld-Verified benchmarks following the standard evaluation protocol. The results demonstrate strong performance across both datasets. | Agent Model | Step | OSWorld | OSWorld-Verified | |-----------------|:--------:|:-----------:|:-------------------:| | Proprietary Models | | Claude 3.7 Sonnet | 100 | 28.0 | — | | OpenAI CUA 4o | 200 | 38.1 | — | | UI-TARS-1.5 | 100 | 42.5 | 41.8 | | OpenAI CUA o3 | 200 | 42.9 | — | | Open-Source Models | | Aria-UI w/ GPT-4o | 15 | 15.2 | — | | Aguvis-72B w/ GPT-4o | 15 | 17.0 | — | | UI-TARS-72B-SFT | 50 | 18.8 | — | | Agent S w/ Claude-3.5-Sonnet | 15 | 20.5 | — | | Agent S w/ GPT-4o | 15 | 20.6 | — | | UI-TARS-72B-DPO | 15 | 22.7 | — | | UI-TARS-72B-DPO | 50 | 24.6 | — | | UI-TARS-1.5-7B | 100 | 26.9 | 27.4 | | Jedi-7B w/ o3 | 100 | — | 51.0 | | Jedi-7B w/ GPT-4o | 100 | 27.0 | — | | Agent S2 w/ Claude-3.7-Sonnet | 50 | 34.5 | — | | Agent S2 w/ Gemini-2.5-Pro | 50 | 41.4 | 45.8 | | Agent S2.5 w/ o3 | 100 | — | 56.0 | | Agent S2.5 w/ GPT-5 | 100 | — | 58.4 | | CoAct-1 w/o3 & o4mini & OpenAI CUA 4o | 150 | — | 60.8 | | GTA1-7B-2507 w/ o3 | 100 | 45.2 | 53.1 | | GTA1-7B-2507 w/ GPT-5 | 100 | — | 61.0 | | GTA1-32B w/ o3 | 100 | — | 55.4 | | GTA1-32B w/ GPT-5 | 100 | — | 63.4 | We also evaluate our models on the WindowsAgentArena benchmark, demonstrating strong performance in Windows-specific GUI automation tasks. | Agent Model | Step | Success Rate | |-----------------|:--------:|:---------------:| | Kimi-VL | 15 | 10.4 | | WAA | — | 19.5 | | Jedi w/ GPT-4o | 100 | 33.7 | | GTA1-7B-2507 w/ o3 | 100 | 47.9 | | GTA1-7B-2507 w/ GPT-5 | 100 | 49.2 | | GTA1-32B w/ o3 | 100 | 51.2 | | GTA1-32B w/ GPT-5 | 100 | 50.6 | Inference Below is a code snippet demonstrating how to run inference using a trained model. This model is released for research and educational purposes. While our model demonstrates strong performance on GUI benchmarks, users should carefully evaluate its suitability for their specific use cases. Important Considerations: - Accuracy Limitations: Like all AI systems, this model may produce incorrect outputs or fail to accurately identify GUI elements in certain scenarios. - Safety and Security: Exercise caution when deploying GUI automation agents, especially in production environments where incorrect actions could affect system integrity or data security. - Human Oversight: We recommend maintaining appropriate human supervision when using this model for automated GUI interactions. - Compliance: Users are responsible for ensuring their use of this model complies with applicable laws, regulations, and organizational policies. Recommended Best Practices: - Thoroughly test the model in controlled environments before production deployment - Implement safeguards and error handling mechanisms - Consider the potential impact of automated actions on user systems and data - Regularly monitor and validate model performance in your specific domain For further guidance on use cases, refer to our AUP and AI AUP. If you're using any GTA model or find it helpful in your research, please cite it as follows:

NaNK

license:mit

1,450

codet5p-770m-py

license:bsd-3-clause

1,382

codet5-large

license:bsd-3-clause

1,377

xLAM-2-3b-fc-r

NaNK

llama

1,178

SweRankEmbed-Small

license:cc-by-nc-4.0

1,090

xLAM-2-3b-fc-r-gguf

NaNK

llama

1,064

safety-flan-t5-base

—

1,015

xgen-mm-phi3-mini-instruct-interleave-r-v1.5

license:apache-2.0

983

codegen-16B-multi

blip2-flan-t5-xl-coco

license:mit

870

xgen-7b-8k-inst

Official research release for the family of XGen models (`7B`) by Salesforce AI Research: Title: Long Sequence Modeling with XGen: A 7B LLM Trained on 8K Input Sequence Length Authors: Erik Nijkamp\, Tian Xie\, Hiroaki Hayashi\, Bo Pang\, Congying Xia\, Chen Xing, Jesse Vig, Semih Yavuz, Philippe Laban, Ben Krause, Senthil Purushwalkam, Tong Niu, Wojciech Kryscinski, Lidiya Murakhovs'ka, Prafulla Kumar Choubey, Alex Fabbri, Ye Liu, Rui Meng, Lifu Tu, Meghana Bhat, Chien-Sheng Wu, Silvio Savarese, Yingbo Zhou, Shafiq Rayhan Joty, Caiming Xiong. Correspondence to: Shafiq Rayhan Joty, Caiming Xiong Base models XGen-7B-4K-Base: XGen-7B model pre-trained under 4K sequence length. License: Apache-2.0 XGen-7B-8K-Base: XGen-7B model pre-trained under 8K sequence length. License: Apache-2.0 Supervised finetuned model on public domain instructional data. Released for research purpose only. The training data for the models are tokenized with OpenAI Tiktoken library. To use this model, install the package via `pip`: The models can be used as auto-regressive samplers as follows: This release is for research purposes only in support of an academic paper. Our models, datasets, and code are not specifically designed or evaluated for all downstream purposes. We strongly recommend users evaluate and address potential concerns related to accuracy, safety, and fairness before deploying this model. We encourage users to consider the common limitations of AI, comply with applicable laws, and leverage best practices when selecting use cases, particularly for high-risk scenarios where errors or misuse could significantly impact people’s lives, rights, or safety. For further guidance on use cases, refer to our AUP and AI AUP.

NaNK

llama

859

codegen-6B-multi

NaNK

license:bsd-3-clause

855

codegen-2B-multi

NaNK

license:bsd-3-clause

817

codegen-2B-mono

NaNK

license:bsd-3-clause

808

xgen-7b-8k-base

CoDA-v0-Instruct

license:cc-by-nc-4.0

774

XLAM 2 1b Fc R

NaNK

llama

770

codegen-6B-nl

NaNK

license:bsd-3-clause

711

codegen-16B-nl

NaNK

license:bsd-3-clause

708

codegen-350M-nl

license:bsd-3-clause

701

xgen-mm-phi3-mini-instruct-r-v1

license:cc-by-nc-4.0

699

185

instructblip-vicuna-13b

NaNK

—

629

instructblip-flan-t5-xxl

license:mit

598

codet5p-220m-py

license:bsd-3-clause

416

xgen-small-4B-instruct-r

NaNK

llama

404

GTA1-7B

NaNK

license:mit

375

xLAM-2-1b-fc-r-gguf

NaNK

llama

360

codegen-6B-mono

NaNK

license:bsd-3-clause

352

CoDA-v0-Base

Try CoDA · Paper · Model Collection · GitHub Repository Welcome to CoDA, Salesforce AI Research's diffusion-based language model designed for powerful code generation and bidirectional context understanding. We're releasing CoDA as a lightweight yet capable model: - `CoDA-1.7B-Base` — diffusion foundation model with bidirectional diffusion architecture, ideal for further fine-tuning and RL training - `CoDA-1.7B-Instruct` — optimized for code generation tasks with bidirectional diffusion modeling (1.7B parameters) CoDA leverages discrete diffusion processes to enable understanding of both past and future tokens, making it uniquely suited for code completion and generation tasks where context flows in both directions. > [!NOTE] > This model card is dedicated to the `CoDA-1.7B-Base` model. Check out our model collection for other variants. Bidirectional Context Understanding: Leverage discrete diffusion processes to understand both past and future tokens, enabling superior code completion. Confidence-Guided Sampling: Maintain competitive inference latency through intelligent sampling strategies that balance quality and speed. Lightweight Architecture: Achieve strong performance with only 1.7B parameters, making it accessible for researchers with limited computational resources. Full Training Pipeline: Complete reproducible training pipeline from pre-training to fine-tuning, enabling customization for specific domains. Optimized for Code: Specifically designed and trained for code generation tasks, with strong performance on HumanEval, MBPP, and other coding benchmarks. - Model Size: 1.7B parameters - Architecture: Diffusion-based language model - Training: TPU-based pre-training with GPU fine-tuning - Primary Use: Code generation and completion tasks - Bidirectional Context: Diffusion modeling enables understanding of both past and future tokens - Confidence-Guided Sampling: Maintains competitive inference latency through intelligent sampling - Lightweight Design: Achieves strong performance with fewer parameters than comparable models - Open Training Pipeline: Fully reproducible training from pre-training to fine-tuning CoDA-1.7B-Instruct demonstrates competitive performance on standard code generation benchmarks: | Model | HumanEval | HumanEval+ | MBPP | MBPP+ | EvalPlus | |-------|-----------|------------|------|-------|----------| | CoDA-Base | 29.3 | 23.8 | 35.2 | 46.0 | 34.9 | | CoDA-Instruct | 54.3 | 47.6 | 47.2 | 63.2 | 55.4 | | Dream-Base | 56.7 | 50.0 | 68.7 | 57.4 | 53.7 | | Dream-7B-Instruct | 57.9 | 53.7 | 68.3 | 56.1 | 54.9 | | LLaDA-8B-Instruct | 35.4 | 31.7 | 31.5 | 28.6 | 30.2 | 🎯 Key Finding: CoDA-1.7B-Instruct matches or surpasses diffusion models up to 7B parameters while maintaining significantly lower computational requirements. CoDA offers an advantageous balance between inference speed and accuracy compared to larger diffusion models. Three-stage training: (1) Pre-training with bidirectional masking, (2) Post-training with instruction format, (3) Inference with progressive denoising. For production deployment, we provide serving with OpenAI-compatible APIs: Customize generation behavior with environment variables: Recommended Settings: - Fast inference: `STEPS=64`, `TEMPERATURE=0.0` - Quality generation: `STEPS=128`, `TEMPERATURE=0.7`, `TOPP=0.9` - High quality: `STEPS=256`, `TEMPERATURE=0.5`, `TOPP=0.95` The complete training pipeline is available in our repository: Technical report coming soon. For now, please cite: - 📄 Technical Report: technicalreport.pdf - 💻 Code Repository: github.com/SalesforceAIResearch/CoDA - 🤗 Model Hub: Salesforce CoDA collection We thank Lingpeng Kong for insightful discussions and Jialei Chen for technical support with TPU infrastructure.

license:cc-by-nc-4.0

346

codet5p-6b

NaNK

license:bsd-3-clause

345

xgen-mm-vid-phi3-mini-r-v1.5-32tokens-8frames

license:cc-by-nc-4.0

300

codet5p-2b

NaNK

license:bsd-3-clause

279

GTA1-7B-2507

NaNK

license:apache-2.0

261

Llama-xLAM-2-70b-fc-r

NaNK

llama

257

codegen2-1B_P

NaNK

license:apache-2.0

224

xLAM-7b-r

NaNK

license:cc-by-nc-4.0

224

mixqg-3b

NaNK

—

222

blip-itm-base-flickr

license:bsd-3-clause

215

xgen-mm-vid-phi3-mini-r-v1.5-128tokens-8frames

license:cc-by-nc-4.0

195

codet5-base-codexglue-sum-java

license:bsd-3-clause

182

codet5p-16b

NaNK

license:bsd-3-clause

163

xgen-small-9B-instruct-r

NaNK

llama

160

XLAM 1b Fc R

[Homepage] | [APIGen Paper] | [ActionStudio Paper] | [Discord] | [Dataset] | [Github] Welcome to the xLAM model family! Large Action Models (LAMs) are advanced large language models designed to enhance decision-making and translate user intentions into executable actions that interact with the world. LAMs autonomously plan and execute tasks to achieve specific goals, serving as the brains of AI agents. They have the potential to automate workflow processes across various domains, making them invaluable for a wide range of applications. Table of Contents - Model Series - Repository Overview - Benchmark Results - Usage - Basic Usage with Huggingface - Usage with vLLM - License - Citation We provide a series of xLAMs in different sizes to cater to various applications, including those optimized for function-calling and general agent applications: | Model | # Total Params | Context Length |Release Date | Category | Download Model | Download GGUF files | |------------------------|----------------|----------------|----|----|----------------|----------| | xLAM-7b-r | 7.24B | 32k | Sep. 5, 2024|General, Function-calling | 🤗 Link | -- | | xLAM-8x7b-r | 46.7B | 32k | Sep. 5, 2024|General, Function-calling | 🤗 Link | -- | | xLAM-8x22b-r | 141B | 64k | Sep. 5, 2024|General, Function-calling | 🤗 Link | -- | | xLAM-1b-fc-r | 1.35B | 16k | July 17, 2024 | Function-calling| 🤗 Link | 🤗 Link | | xLAM-7b-fc-r | 6.91B | 4k | July 17, 2024| Function-calling| 🤗 Link | 🤗 Link | | xLAM-v0.1-r | 46.7B | 32k | Mar. 18, 2024 |General, Function-calling | 🤗 Link | -- | The `fc` series of models are optimized for function-calling capability, providing fast, accurate, and structured responses based on input queries and available APIs. These models are fine-tuned based on the deepseek-coder models and are designed to be small enough for deployment on personal devices like phones or computers. We also provide their quantized GGUF files for efficient deployment and execution. GGUF is a file format designed to efficiently store and load large language models, making GGUF ideal for running AI models on local devices with limited resources, enabling offline functionality and enhanced privacy. This repository is focused on our tiny `xLAM-1b-fc-r` model, which is optimized for function-calling and can be easily deployed on personal devices. Function-calling, or tool use, is one of the key capabilities for AI agents. It requires the model not only understand and generate human-like text but also to execute functional API calls based on natural language instructions. This extends the utility of LLMs beyond simple conversation tasks to dynamic interactions with a variety of digital services and applications, such as retrieving weather information, managing social media platforms, and handling financial services. The instructions will guide you through the setup, usage, and integration of `xLAM-1b-fc-r` with HuggingFace and vLLM. We will first introduce the basic usage, and then walk through the provided tutorial and example scripts in the examples folder. - Transformers 4.41.0 - Pytorch 2.3.0+cu121 - Datasets 2.19.1 - Tokenizers 0.19.1 We mainly test our function-calling models on the Berkeley Function-Calling Leaderboard (BFCL), which offers a comprehensive evaluation framework for assessing LLMs' function-calling capabilities across various programming languages and application domains like Java, JavaScript, and Python. Performance comparison on the BFCL benchmark as of date 07/18/2024. Evaluated with temperature=0.001 and topp=1 Our xLAM-7b-fc-r secures the 3rd place with an overall accuracy of 88.24% on the leaderboard, outperforming many strong models. Notably, our xLAM-1b-fc-r model is the only tiny model with less than 2B parameters on the leaderboard, but still achieves a competitive overall accuracy of 78.94% and outperforming GPT3-Turbo and many larger models. Both models exhibit balanced performance across various categories, showing their strong function-calling capabilities despite their small sizes. See our paper and Github repo for more detailed analysis. To use the `xLAM-1b-fc-r` model from Huggingface, please first install the `transformers` library: We use the following example to illustrate how to use our model to perform function-calling tasks. Please note that, our model works best with our provided prompt format. It allows us to extract JSON output that is similar to the function-calling mode of ChatGPT. { "toolcalls": [ {"name": "funcname1", "arguments": {"argument1": "value1", "argument2": "value2"}}, ... (more tool calls as required) ] } ` Then you should be able to see the following output string in JSON format: We highly recommend to use our provided prompt format and helper functions to yield the best function-calling performance of our model. We provide example scripts to deploy our model with `vllm` and run inferences. First, install the required packages: The example scripts are located in the examples folder. To build prompts using the chat template and output formatted prompts ready for various test cases, run: Options: - `--temperature`: Default 0.3 - `--topp`: Default 1.0 - `--maxtokens`: Default 512 This test script provides a handler implementation that can be easily applied to your customized function-calling applications. To test the xLAM model directly with the vLLM library, run: Options are the same as for the endpoint test. This test script also provides a handler implementation that can be easily applied to your customized function-calling applications. These examples are designed to be flexible and easily integrated into your own projects. Feel free to modify the scripts to suit your specific needs and applications. You can adjust test queries or API definitions in each script to test different scenarios or model capabilities. Additional customization tips: - Modify the `--dtype` parameter when serving the model based on your GPU capacity. - Refer to the vLLM documentation for more detailed configuration options. - Explore the `demo.ipynb` file for a comprehensive description of the entire workflow, including how to execute APIs. These resources provide a robust foundation for integrating xLAM models into your applications, allowing for tailored and efficient deployment. `xLAM-1b-fc-r` is distributed under the CC-BY-NC-4.0 license, with additional terms specified in the Deepseek license. This release is for research purposes only in support of an academic paper. Our models, datasets, and code are not specifically designed or evaluated for all downstream purposes. We strongly recommend users evaluate and address potential concerns related to accuracy, safety, and fairness before deploying this model. We encourage users to consider the common limitations of AI, comply with applicable laws, and leverage best practices when selecting use cases, particularly for high-risk scenarios where errors or misuse could significantly impact people’s lives, rights, or safety. For further guidance on use cases, refer to our AUP and AI AUP. If you find this repo helpful, please cite our paper:

NaNK

llama

146

codegen25-7b-multi_P

codegen-16B-mono

xgen-mm-phi3-mini-instruct-dpo-r-v1.5

license:apache-2.0

129

xLAM-7b-fc-r-gguf

NaNK

license:cc-by-nc-4.0

124

xLAM-2-32b-fc-r

NaNK

llama

121

xgen-small-4B-base-r

NaNK

llama

107

codet5p-220m-bimodal

license:bsd-3-clause

102

xLAM-1b-fc-r-gguf

NaNK

license:cc-by-nc-4.0

codegen2-7B_P

NaNK

license:apache-2.0

dialogstudio-t5-base-v1.0

license:apache-2.0

Llama Fin 8b

💰 Demystifying Domain-adaptive Post-training for Financial LLMs This is the finance-specific large language model trained using the recipe described in our paper: 📄 Demystifying Domain-adaptive Post-training for Financial LLMs For more details, please check the following resources: - 🌐 Project Page: https://vincent950129.github.io/adapt-llm/ - 📚 Training Data: https://huggingface.co/datasets/Salesforce/FinTrain - 🧠 Evaluation Data: https://huggingface.co/datasets/Salesforce/FinEval - 💻 Code Repository: https://github.com/SalesforceAIResearch/FinDAP Ethical Considerations Users need to make their own assessment regarding any obligations or responsibilities under the corresponding licenses or terms and conditions pertaining to the original datasets and data. This release is for research purposes only in support of an academic paper. If you find our project helpful, please consider citing our paper 😊

NaNK

llama

xgen-mm-phi3-mini-base-r-v1.5

license:apache-2.0

cogalign-internvl2_5-mpo-1b

NaNK

license:cc-by-nc-4.0

mixqg-base

—

FARE-20B

Foundational Automatic Evaluators: Scaling Multi-Task Generative Evaluator Training for Reasoning-Centric Domains Paper: arXiv link Authors: Austin Xu, Xuan-Phi Nguyen, Yilun Zhou, Chien-Sheng Wu, Caiming Xiong, Shafiq Joty FARE-20B is a multi-task evaluator model finetuned from gpt-oss-20B. It is trained on a large-scale multi-task, multi-domain data mixture using rejection-sampling SFT to perform the following evaluation tasks: Pairwise comparisons, step-level evaluation, reference-based verification, reference-free verification, and single-rating assessment. Usage > [!IMPORTANT] > The FARE family of evaluators has been trained with specific system and user prompt templates. We provide examples below for two evaluation tasks: Pairwise comparisons and step-level error identification evaluation. For other tasks, we provide prompt templates in our paper (Appendix E). Example inference with SGLang For FARE-20B (gpt-oss variant), our evaluations were conducted with SGLang. We provide a minimal working example below with pairwise evaluation. For example usage with vLLM, see FARE-8B Ethics disclaimer for Salesforce AI models, data, code This release is for research purposes only in support of an academic paper. Our models, datasets, and code are not specifically designed or evaluated for all downstream purposes. We strongly recommend users evaluate and address potential concerns related to accuracy, safety, and fairness before deploying this model. We encourage users to consider the common limitations of AI, comply with applicable laws, and leverage best practices when selecting use cases, particularly for high-risk scenarios where errors or misuse could significantly impact people’s lives, rights, or safety. For further guidance on use cases, refer to our standard AUP and AI AUP.

NaNK

license:cc-by-nc-4.0

codet5-large-ntp-py

license:bsd-3-clause

FARE 8B

Foundational Automatic Evaluators: Scaling Multi-Task Generative Evaluator Training for Reasoning-Centric Domains Paper: arXiv link Authors: Austin Xu, Xuan-Phi Nguyen, Yilun Zhou, Chien-Sheng Wu, Caiming Xiong, Shafiq Joty FARE-8B is a multi-task evaluator model finetuned from Qwen-8B. It is trained on a large-scale multi-task, multi-domain data mixture using rejection-sampling SFT to perform the following evaluation tasks: Pairwise comparisons, step-level evaluation, reference-based verification, reference-free verification, and single-rating assessment. Usage > [!IMPORTANT] > The FARE family of evaluators has been trained with specific system and user prompt templates. We provide examples below for two evaluation tasks: Pairwise comparisons and step-level error identification evaluation. For other tasks, we provide prompt templates in our paper (Appendix E). Example inference with vLLM For FARE-8B (Qwen-3 variant), our evaluations were conducted with vLLM. We provide a minimal working example below with pairwise evaluation. For example usage with SGLang, see FARE-20B Ethics disclaimer for Salesforce AI models, data, code This release is for research purposes only in support of an academic paper. Our models, datasets, and code are not specifically designed or evaluated for all downstream purposes. We strongly recommend users evaluate and address potential concerns related to accuracy, safety, and fairness before deploying this model. We encourage users to consider the common limitations of AI, comply with applicable laws, and leverage best practices when selecting use cases, particularly for high-risk scenarios where errors or misuse could significantly impact people’s lives, rights, or safety. For further guidance on use cases, refer to our standard AUP and AI AUP.

NaNK

license:cc-by-nc-4.0

SweRankLLM-Small

NaNK

license:cc-by-nc-4.0

codegen2-3_7B_P

NaNK

license:apache-2.0

instructcodet5p-16b

NaNK

license:bsd-3-clause

xgen-mm-phi3-mini-instruct-singleimg-r-v1.5

license:apache-2.0

xLAM-v0.1-r

license:cc-by-nc-4.0

xgen-mm-phi3-mini-base-r-v1

license:apache-2.0

LLaMA-3-8B-SFR-Iterative-DPO-R

We release a state-of-the-art instruct model of its class, Llama-3-8B-SFR-Iterative-DPO-R. On all three widely-used instruct model benchmarks: Alpaca-Eval-V2, MT-Bench, Chat-Arena-Hard, our model outperforms all models of similar size, e.g., LLaMA-3-8B-it, and most large open-sourced models, e.g., Mixtral-8x7B-it.

NaNK

llama

codegen-2B-nl

NaNK

license:bsd-3-clause

xLAM-7b-fc-r

NaNK

llama

grappa_large_jnt

—

xLAM-8x7b-r

NaNK

license:cc-by-nc-4.0

codegen25-7b-instruct_P

NaNK

llama

WQRM-PRE

license:mit

dialogstudio-t5-large-v1.0

license:apache-2.0

qa_consolidation

license:apache-2.0

E1-Math-1.5B

NaNK

license:cc-by-nc-4.0

cogalign-internvl2_5-mpo-4b

NaNK

license:cc-by-nc-4.0

xgen-7b-4k-base

NaNK

llama

SweRankEmbed-Large

NaNK

license:cc-by-nc-4.0

codegen2-16B_P

NaNK

license:apache-2.0

WQRM

license:mit

BLIP3o-NEXT-GRPO-TexT-3B

This is BLIP3o-NEXT-GRPO-TexT checkpoint trained on the BLIP3o-NEXT-SFT. Clone the repo (if you haven’t already) and install the environment: ``` git clone https://github.com/JiuhaiChen/BLIP3o.git

NaNK

license:apache-2.0

codet5-base-codexglue-sum-python

license:bsd-3-clause

bart-large-xsum-samsum

—

codet5-base-codexglue-defect

license:bsd-3-clause

LLaMA-3-8B-SFR-SFT-R

NaNK

llama

codet5-base-codexglue-translate-java-cs

license:bsd-3-clause

codegen25-7b-mono_P

NaNK

llama

codet5-base-codexglue-translate-cs-java

license:bsd-3-clause

LLaMA-3-8B-SFR-RM-R

NaNK

llama

discord_qg

license:apache-2.0

codet5-base-codexglue-refine-medium

license:bsd-3-clause

BLIP3o-NEXT-edit-VAE

This is BLIP3o-NEXT-edit-VAE checkpoint trained on the BLIP3o-NEXT-SFT and use VAE as condition. Clone the repo (if you haven’t already) and install the environment: and switch to BLIP3o-NEXT-edit branch to do the inference.

license:apache-2.0

qaconv-roberta-large-squad2

—

SweRankLLM-Large

NaNK

license:cc-by-nc-4.0

qaconv-unifiedqa-t5-large

—

codet5-base-codexglue-concode

license:bsd-3-clause

xLAM-8x22b-r

NaNK

license:cc-by-nc-4.0

llama3-siglip-mantis-taco-8b

NaNK

base_model:TIGER-Lab/Mantis-8B-siglip-llama3

codet5-base-codexglue-sum-php

license:bsd-3-clause

E1-Code-14B

NaNK

license:cc-by-nc-4.0

dialogstudio-t5-3b-v1.0

NaNK

license:apache-2.0

codet5-base-codexglue-refine-small

license:bsd-3-clause

qwen2-siglip-llava-ov-taco-7b

NaNK

license:cc-by-nc-4.0

cogalign-llava-ov-0_5b

NaNK

license:cc-by-nc-4.0

codet5-base-codexglue-sum-go

license:bsd-3-clause

bic_simple_edit_id

license:apache-2.0

squality-socratic-books-30M

license:bsd-3-clause

socratic-books-30M

license:bsd-3-clause

qaconv-unifiedqa-t5-3b

NaNK

—

codet5-base-codexglue-sum-ruby

license:bsd-3-clause

E1-AceReason-14B

NaNK

license:cc-by-nc-4.0

mixqg-large

—

LLaMA-3-8B-SFR-Iterative-DPO-Concise-R

NaNK

llama

llama3-clip-pretrained-mantis-taco-8b

NaNK

base_model:TIGER-Lab/Mantis-8B-clip-llama3-pretraind

codet5-base-codexglue-sum-javascript

license:bsd-3-clause

bart-large-swipe-clean

license:apache-2.0

xgen-small-r

license:cc-by-nc-4.0

Elastic-Reasoning

—

discord_qa

license:apache-2.0

BLIP

license:bsd

E1-Math-7B

NaNK

license:cc-by-nc-4.0

xgen-small-9B-base-r

NaNK

llama

BLIP3o-NEXT-Pretrain-3B

NaNK

license:apache-2.0

BLIP3o-NEXT-GRPO-Geneval-3B

This is BLIP3o-NEXT-GRPO-Geneval checkpoint trained on the BLIP3o-NEXT-SFT. Clone the repo (if you haven’t already) and install the environment: ``` git clone https://github.com/JiuhaiChen/BLIP3o.git

NaNK

license:apache-2.0

cods-bart-large-xsum-samsum

—

qaconv-unifiedqa-t5-base

—

safety-flan-t5-small

license:cc-by-nc-4.0

BLIP3o-NEXT-SFT-3B

This is BLIP3o-NEXT-SFT checkpoint trained on BLIP3o-NEXT-Pretrain. Clone the repo (if you haven’t already) and install the environment: ``` git clone https://github.com/JiuhaiChen/BLIP3o.git

NaNK

license:apache-2.0

Salesforce

blip-image-captioning-base

blip-image-captioning-large

blip2-opt-2.7b

blip2-opt-2.7b-coco

blip-vqa-base

moirai-moe-1.0-R-base

codegen-350M-mono

moirai-2.0-R-small

codet5p-110m-embedding

blip2-flan-t5-xl

moirai-1.1-R-large

moirai-1.1-R-base

moirai-1.1-R-small

SFR-Embedding-Mistral

blip2-opt-6.7b-coco

blip-itm-base-coco

codet5-small

blip-vqa-capfilt-large

codet5-base

blip2-opt-6.7b

SFR-Embedding-2_R

blip-itm-large-coco

instructblip-vicuna-7b

blip2-itm-vit-g

SFR-Embedding-Code-400M_R

codet5p-220m

moirai-1.0-R-small

instructblip-flan-t5-xl

Llama-xLAM-2-8b-fc-r-gguf

blip2-itm-vit-g-coco

blip2-flan-t5-xxl

moirai-moe-1.0-R-small

codet5-base-multi-sum

ctrl

moirai-1.0-R-large

Llama-xLAM-2-8b-fc-r

codegen-350M-multi

blip-itm-large-flickr

codet5p-770m

moirai-1.0-R-base

SFR-Embedding-Code-2B_R

GTA1-32B

codet5p-770m-py

codet5-large

xLAM-2-3b-fc-r

SweRankEmbed-Small

xLAM-2-3b-fc-r-gguf

safety-flan-t5-base

xgen-mm-phi3-mini-instruct-interleave-r-v1.5

codegen-16B-multi

blip2-flan-t5-xl-coco

xgen-7b-8k-inst

codegen-6B-multi

codegen-2B-multi

codegen-2B-mono

xgen-7b-8k-base

CoDA-v0-Instruct

XLAM 2 1b Fc R

codegen-6B-nl

codegen-16B-nl

codegen-350M-nl

xgen-mm-phi3-mini-instruct-r-v1

instructblip-vicuna-13b

instructblip-flan-t5-xxl

codet5p-220m-py

xgen-small-4B-instruct-r

GTA1-7B

xLAM-2-1b-fc-r-gguf

codegen-6B-mono

CoDA-v0-Base

codet5p-6b

xgen-mm-vid-phi3-mini-r-v1.5-32tokens-8frames

codet5p-2b

GTA1-7B-2507

Llama-xLAM-2-70b-fc-r

codegen2-1B_P

xLAM-7b-r

mixqg-3b

blip-itm-base-flickr