ibm-granite
✓ VerifiedEnterpriseIBM's enterprise-focused Granite model family
granite-timeseries-ttm-r1
--- license: apache-2.0 pipeline_tag: time-series-forecasting tags: - time series - forecasting - pretrained models - foundation models - time series foundation models - time-series library_name: granite-tsfm new_version: ibm-granite/granite-timeseries-ttm-r2 ---
granite-embedding-small-english-r2
--- license: apache-2.0 language: - en pipeline_tag: sentence-similarity library_name: sentence-transformers tags: - granite - embeddings - transformers - mteb - feature-extraction ---
granite-3.3-2b-instruct
--- pipeline_tag: text-generation inference: false license: apache-2.0 library_name: transformers tags: - language - granite-3.3 base_model: - ibm-granite/granite-3.3-2b-base ---
granite-3.3-8b-instruct
--- pipeline_tag: text-generation inference: false license: apache-2.0 library_name: transformers tags: - language - granite-3.3 base_model: - ibm-granite/granite-3.3-8b-base ---
granite-timeseries-ttm-r2
TinyTimeMixers (TTMs) are compact pre-trained models for Multivariate Time-Series Forecasting, open-sourced by IBM Research. With model sizes starting from 1M params, TTM introduces the notion of the first-ever “tiny” pre-trained models for Time-Series Forecasting. The paper describing TTM was accepted at NeurIPS 24. TTM outperforms other models demanding billions of parameters in several popular zero-shot and few-shot forecasting benchmarks. TTMs are lightweight forecasters, pre-trained on publicly available time series data with various augmentations. TTM provides state-of-the-art zero-shot forecasts and can easily be fine-tuned for multi-variate forecasts with just 5% of the training data to be competitive. Note that zeroshot, fine-tuning and inference tasks using TTM can easily be executed on 1 GPU or on laptops. TTM r2 comprises TTM variants pre-trained on larger pretraining datasets (\~700M samples). The TTM r2.1 release increases the pretraining dataset size to approximately (\~1B samples). The prior model releases, TTM r1, were trained on \~250M samples and can be accessed here. In general, TTM r2 models perform better than TTM r1 models as they are trained on a larger pretraining dataset. In standard benchmarks, TTM r2 outperform TTM r1 by over 15%. However, the choice of r1 vs. r2 depends on your target data distribution, and hence users should try both variants and pick the best model for your data. The TTM r2 releases support point forecasting use-cases specifically ranging from minutely to hourly resolutions (Ex. 10 min, 15 min, 1 hour.). With the TTM r2.1 release, we add support for daily and weekly resolutions. - Paper: NeurIPS 2024, ArXiV - Repository: https://github.com/ibm-granite/granite-tsfm - PyPI project: https://pypi.org/project/granite-tsfm/ - Model architecture: https://github.com/ibm-granite/granite-tsfm/tree/main/tsfmpublic/models/tinytimemixer - Time Series Cookbook: https://github.com/ibm-granite-community/granite-timeseries-cookbook TTM falls under the category of “focused pre-trained models”, wherein each pre-trained TTM is tailored for a particular forecasting setting (governed by the context length and forecast length). Instead of building one massive model supporting all forecasting settings, we opt for the approach of constructing smaller pre-trained models, each focusing on a specific forecasting setting, thereby yielding more accurate results. Furthermore, this approach ensures that our models remain extremely small and exceptionally fast, facilitating easy deployment without demanding a ton of resources. Hence, in this model card, we release several pre-trained TTMs that can cater to many common forecasting settings in practice. Each pre-trained model will be released in a different branch name in this model card. Given the variety of models included, we recommend the use of `getmodel()` utility to automatically select the required model based on your input context length, and forecast length, and other requirements. You can also directly access a specific model using our getting started notebook mentioning the branch name. There are several models available in different branches of this model card. The naming scheme follows the following format: ` - - - - ` - context length: The historical data used as input to the TTM model. - prediction length: The number of time points predicted by model (i.e., the forecast length) - frequency tuning indicator ("ft" or missing): "ft" is used to indicate use of frequency prefix tuning. When enabled an extra embedding vector indicating the frequency of the data is added to the input of the model. If missing, only the context window is used by the model. - pretraining metric ("mae" or missing): MAE indicates pertaining with mean absolute error loss, while missing indicates using mean squared error. - release number ("r2" or "r2.1"): Indicates the model release; the release indicates which data was used to train the model. See "training data" below for more details on the data included in the particular training datasets. Example recipes and notebooks The scripts below can be used for any of the above TTM models. Please update the HF model URL and branch name in the `frompretrained` call appropriately to pick the model of your choice. Please note that a few of the notebooks directly use the `getmodel()` utility to select the model. - Getting started [[Recipe]](https://github.com/ibm-granite-community/granite-timeseries-cookbook/blob/main/recipes/TimeSeries/TimeSeriesGettingStarted.ipynb) [[colab]](https://colab.research.google.com/github/ibm-granite/granite-tsfm/blob/main/notebooks/hfdemo/ttmgettingstarted.ipynb) - Getting started with IBM watsonx [[Recipe]](https://github.com/ibm-granite-community/granite-timeseries-cookbook/blob/main/recipes/TimeSeries/GettingStartedwithWatsonXAISDK.ipynb) - Zeroshot Multivariate Forecasting [[Example]](https://github.com/ibm-granite/granite-tsfm/blob/main/notebooks/hfdemo/ttmgettingstarted.ipynb) - Finetuned Multivariate Forecasting: - Channel-Independent Finetuning [[Example 1]](https://github.com/ibm-granite/granite-tsfm/blob/main/notebooks/hfdemo/ttmgettingstarted.ipynb) [[Example 2]](https://github.com/ibm-granite/granite-tsfm/blob/main/notebooks/hfdemo/tinytimemixer/ttmm4hourly.ipynb) - Channel-Mix Finetuning [[Example]](https://github.com/ibm-granite/granite-tsfm/blob/main/notebooks/tutorial/ttmchannelmixfinetuning.ipynb) - TTM r2 release (extended features released on October 2024): - Finetuning and Forecasting with Exogenous/Control Variables [[Recipe 1]](https://github.com/ibm-granite-community/granite-timeseries-cookbook/blob/main/recipes/TimeSeries/Few-shotFinetuningandEvaluation.ipynb) [[Recipe 2]](https://github.com/ibm-granite-community/granite-timeseries-cookbook/blob/main/recipes/TimeSeries/BikeSharingFinetuningwithExogenous.ipynb) - Finetuning and Forecasting with static categorical features [Example: To be added soon] - Rolling Forecasts - Extend forecast lengths via rolling capability. Rolling beyond 2forecastlength is not recommended. [[Example]](https://github.com/ibm-granite/granite-tsfm/blob/main/notebooks/hfdemo/ttmrollingpredictiongettingstarted.ipynb) - Helper scripts for optimal Learning Rate suggestions for Finetuning [[Example]](https://github.com/ibm-granite/granite-tsfm/blob/main/notebooks/tutorial/ttmwithexogtutorial.ipynb) - TTM r2.1 release: - GIFT-Eval benchmark [[notebook]](https://github.com/SalesforceAIResearch/gift-eval/blob/main/notebooks/ttm.ipynb) Usage guidelines 1. Users have to externally standard scale their data independently for every channel before feeding it to the model (refer to `TimeSeriesPreprocessor`, our data processing utility for data scaling). 2. The current open-source version supports only minutely and hourly resolutions(Ex. 10 min, 15 min, 1 hour.). Other lower resolutions (say monthly or yearly) are currently not supported in this version, as the model needs a minimum context length of 512 or 1024. With the r2.1 release, we now also support daily and weekly resolution. 3. Enabling any upsampling or prepending zeros to virtually increase the context length for shorter-length datasets is not recommended and will impact the model performance. Automatic model selection Automatic model selection based on context length, prediction length, and other requirements can be done through use of the `getmodel()` function. For reference, the signature of the function is provided below: TTM outperforms popular benchmarks such as TimesFM, Moirai, Chronos, Lag-Llama, Moment, GPT4TS, TimeLLM, LLMTime in zero/fewshot forecasting while reducing computational requirements significantly. Moreover, TTMs are lightweight and can be executed even on CPU-only machines, enhancing usability and fostering wider adoption in resource-constrained environments. For more details, refer to our paper. - TTM-B referred in the paper maps to the 512 context models. - TTM-E referred in the paper maps to the 1024 context models. - TTM-A referred in the paper maps to the 1536 context models. The pre-training dataset used in this release differs slightly from the one used in the research paper, which may lead to minor variations in model performance as compared to the published results. Please refer to our paper for more details. Benchmarking scripts can be found here. For more details on TTM architecture and benchmarks, refer to our paper. - Zeroshot forecasting: Directly apply the pre-trained model on your target data to get an initial forecast (with no training). - Finetuned forecasting: Finetune the pre-trained model with a subset of your target data to further improve the forecast. Since, TTM models are extremely small and fast, it is practically very easy to finetune the model with your available target data in few minutes to get more accurate forecasts. The current release supports multivariate forecasting via both channel independence and channel-mixing approaches. Decoder Channel-Mixing can be enabled during fine-tuning for capturing strong channel-correlation patterns across time-series variates, a critical capability lacking in existing counterparts. In addition, TTM also supports exogenous infusion and static categorical data infusion. The r2.1 release builds upon the above, adding improved accuracy for shorter context length, daily/weekly resolution, combined with a larger pre-training dataset. The r2 TTM models were trained on a collection of datasets as follows: - Australian Electricity Demand: https://zenodo.org/records/4659727 - Australian Weather: https://zenodo.org/records/4654822 - Bitcoin: https://zenodo.org/records/5122101 - KDD Cup 2018: https://zenodo.org/records/4656756 - London Smart Meters: https://zenodo.org/records/4656091 - Saugeen River Flow: https://zenodo.org/records/4656058 - Solar Power: https://zenodo.org/records/4656027 - Sunspots: https://zenodo.org/records/4654722 - Solar: https://zenodo.org/records/4656144 - US Births: https://zenodo.org/records/4656049 - Wind Farms Production: https://zenodo.org/records/4654858 - Wind Power: https://zenodo.org/records/4656032 - PEMSD3, PEMSD4, PEMSD7, PEMSD8, PEMSBAY: https://drive.google.com/drive/folders/1g5v2Gq1tkOq8XO0HDCZ9nOTtRpB6-gPe - LOSLOOP: https://drive.google.com/drive/folders/1g5v2Gq1tkOq8XO0HDCZ9nOTtRpB6-gPe The r2.1 TTM models (denoted by branches with suffix r2.1) were trained on the above collection, in addition to the following datasets: - Weather: https://zenodo.org/records/4654822 - Covid Deaths: https://zenodo.org/records/4656009 - Covid Mobility: https://zenodo.org/records/4663809 - Extended Wikipedia Web Traffic: https://zenodo.org/records/7371038 - NN5: https://zenodo.org/records/4656117, https://zenodo.org/records/4656125 - Temperature Rain: https://zenodo.org/records/5129091 - Vehicle Trips: https://zenodo.org/records/5122537 - Kaggle Web Traffic: https://zenodo.org/records/4656075, https://zenodo.org/records/4656664 - Hierarchical Sales: https://huggingface.co/datasets/Salesforce/lotsadata/tree/main/hierarchicalsales - Project Tycho: https://huggingface.co/datasets/Salesforce/lotsadata/tree/main/projecttycho - Subseasonal: https://huggingface.co/datasets/Salesforce/lotsadata/tree/main/subseasonal - Subseasonal Precipitation: https://huggingface.co/datasets/Salesforce/lotsadata/tree/main/subseasonalprecip - Uber TLC: https://huggingface.co/datasets/Salesforce/lotsadata/tree/main/ubertlcdaily - Wiki Rolling: https://github.com/awslabs/gluonts/blob/1553651ca1fca63a16e012b8927bd9ce72b8e79e/datasets/wiki-rollingnips.tar.gz - CDC FluView ILINet: https://huggingface.co/datasets/Salesforce/lotsadata/tree/main/cdcfluviewilinet - CDC FluView WHO/NREVSS: https://huggingface.co/datasets/Salesforce/lotsadata/tree/main/cdcfluviewwhonrevss Citation Please cite the following paper if you intend to use our model or its associated architectures/approaches in your work. Vijay Ekambaram, Arindam Jati, Pankaj Dayama, Wesley M. Gifford, Tomoya Sakai, Sumanta Mukherjee, Chandra Reddy and Jayant Kalagnanam All content in this repository including code has been provided by IBM under the associated open source software license and IBM is under no obligation to provide enhancements, updates, or support. IBM developers produced this code as an open source project (not as an IBM product), and IBM makes no assertions as to the level of quality nor security, and will not be maintaining this code going forward.
granite-timeseries-tspulse-r1
TSPulse models are ultra-compact pre-trained models for time-series data, featuring just 1M parameters and supporting GPU-free inference. Designed for versatility, TSPulse excels across a wide range of tasks including classification, anomaly detection (AD), imputation, and similarity search. At the architecture level, TSPulse introduces a novel dual-space masked reconstruction strategy that learns jointly from both time and frequency domains —- capturing complementary patterns in a unified embedding space. This is coupled with a dual-embedding disentanglement mechanism that produces both fine-grained embeddings for detailed analysis and semantic embeddings for broader understanding. These semantic embeddings are inherently robust to variations in time, magnitude, and noise, making them ideal for time-series retrieval. At the task level, TSPulse integrates several innovations: - TSLens: A fine-tuning module for task-aware feature extraction. - Multi-head triangulation: Fuses outputs from multiple prediction streams to enhance anomaly detection robustness. - Hybrid masking: Reduces bias during pre-training, improving zero-shot imputation. TSPulse demonstrates significant improvements on several benchmarks: - +5–16% on UEA classification benchmarks - +20% on the TSB-AD benchmark for anomaly detection (rank #1 in both univariate and multivariate) - +50% in zero-shot imputation - +25% in semantic similarity search Despite being 10–100× smaller than competing models, TSPulse sets a new standard in efficient time-series foundation modeling—combining strong out-of-the-box performance, minimal compute requirements, and rapid adaptability to downstream tasks. - Paper: ArXiV - Repository: https://github.com/ibm-granite/granite-tsfm - PyPI project: https://pypi.org/project/granite-tsfm/ - Model architecture: https://github.com/ibm-granite/granite-tsfm/tree/main/tsfmpublic/models/tspulse - Time Series Cookbook: https://github.com/ibm-granite-community/granite-timeseries-cookbook Inspired by the success of small, task-specialized pre-trained models in the language/vision domain which achieve strong performance through minimal task-specific adaptations —- we extend this strategy to time-series. Since TSPulse offers multiple heads we specialize the pre-training for every task through reweighting loss objectives to prioritize heads most relevant to the target task. This enables TSPulse to refine task-specific representations while maintaining its lightweight design, facilitating efficient transfer learning across any datasets for the specified downstream task. We release three pre-trained variants optimized for specific time-series tasks. - `tspulse-hybrid-allhead-512-p8-r1` 🔹 Recommended for: Anomaly Detection 🔹 Branch: `main` - `tspulse-hybrid-dualhead-512-p8-r1` 🔹 Recommended for: Imputation and Search 🔹 Branch: `tspulse-hybrid-dualhead-512-p8-r1` - `tspulse-block-dualhead-512-p16-r1` 🔹 Recommended for: Classification 🔹 Branch: `tspulse-block-dualhead-512-p16-r1` To load a specific variant, use the revision argument in the frompretrained call as illustrated below: Each model name follows this format: `tspulse- - - -p -r ` - `masktype`: Type of masking used during pre-training → `hybrid` or `block` - `headtype`: Prediction heads enabled during pre-training → `allhead` = all heads enabled → `dualhead` = time and prob heads enabled - `contextlength`: Base Context length of the model → e.g., `512` - `patchlength`: Length of time-series patches used during training → e.g., `p8` or `p16` - `releasenumber`: Model release version → e.g., `r1` - Zero-shot Anomaly Detection [[Example]](https://github.com/ibm-granite/granite-tsfm/blob/main/notebooks/hfdemo/tspulseanomalydetection.ipynb) - Zero-shot Imputation [[Example]](https://github.com/ibm-granite/granite-tsfm/blob/main/notebooks/hfdemo/tspulseimputation.ipynb) [[Cookbook]](https://github.com/ibm-granite-community/granite-timeseries-cookbook/blob/main/recipes/Imputation/GettingStartedwithTSPulseImputation.ipynb) - Zero-shot Similarity Search [[Simple Search Example]](https://github.com/ibm-granite/granite-tsfm/blob/main/notebooks/hfdemo/tspulsesearchsimpleexample.ipynb) [[Indexing Example]](https://github.com/ibm-granite/granite-tsfm/blob/main/notebooks/hfdemo/tspulsesearchwithfaiss.ipynb) - Multivariate Classification [[Example]](https://github.com/ibm-granite/granite-tsfm/blob/main/notebooks/hfdemo/tspulseclassification.ipynb) [[Cookbook]](https://github.com/ibm-granite-community/granite-timeseries-cookbook/blob/main/recipes/Classification/GettingStartedwithTSPulseClassification.ipynb) - Finetuning script (to be added soon) The model is trained with a base context length of 512. Different tasks have varying minimum input length requirements: - Anomaly Detection (AD): Requires time-series of at least 3–4× the base context length (i.e., ~1536–2048 points) to ensure stable and robust anomaly detection. - Imputation & Semantic Search: Require a minimum of 512 points to align with the model's context window and enable meaningful processing. - Classification: Supports any input length. Internally, the model resamples or interpolates the input to a fixed length of 512 before inference. Benchmarking scripts to reproduce paper results will be added soon. Stay tuned! TSpulse r1 was trained on a collection of datasets as follows: - Australian Electricity Demand: https://zenodo.org/records/4659727 - Australian Weather: https://zenodo.org/records/4654822 - Bitcoin: https://zenodo.org/records/5122101 - KDD Cup 2018: https://zenodo.org/records/4656756 - London Smart Meters: https://zenodo.org/records/4656091 - Saugeen River Flow: https://zenodo.org/records/4656058 - Solar Power: https://zenodo.org/records/4656027 - Sunspots: https://zenodo.org/records/4654722 - Solar: https://zenodo.org/records/4656144 - US Births: https://zenodo.org/records/4656049 - Wind Farms Production: https://zenodo.org/records/4654858 - Wind Power: https://zenodo.org/records/4656032 - PEMSD3, PEMSD4, PEMSD7, PEMSD8, PEMSBAY: https://drive.google.com/drive/folders/1g5v2Gq1tkOq8XO0HDCZ9nOTtRpB6-gPe - LOSLOOP: https://drive.google.com/drive/folders/1g5v2Gq1tkOq8XO0HDCZ9nOTtRpB6-gPe Citation Please cite the following paper if you intend to use our model or its associated architectures/approaches in your work. Vijay Ekambaram, Subodh Kumar, Arindam Jati, Sumanta Mukherjee, Tomoya Sakai, Pankaj Dayama, Wesley M. Gifford, and Jayant Kalagnanam All content in this repository including code has been provided by IBM under the associated open source software license and IBM is under no obligation to provide enhancements, updates, or support. IBM developers produced this code as an open source project (not as an IBM product), and IBM makes no assertions as to the level of quality nor security, and will not be maintaining this code going forward.
granite-embedding-30m-english
granite-docling-258M
Granite Docling is a multimodal Image-Text-to-Text model engineered for efficient document conversion. It preserves the core features of Docling while maintaining seamless integration with DoclingD...
granite-vision-3.3-2b
Model Summary: Granite-vision-3.3-2b is a compact and efficient vision-language model, specifically designed for visual document understanding, enabling automated content extraction from tables, charts, infographics, plots, diagrams, and more. Granite-vision-3.3-2b introduces several novel experimental features such as image segmentation, doctags generation, and multi-page support (see Experimental Capabilities for more details) and offers enhanced safety when compared to earlier Granite vision models. The model was trained on a meticulously curated instruction-following data, comprising diverse public and synthetic datasets tailored to support a wide range of document understanding and general image tasks. Granite-vision-3.3-2b was trained by fine-tuning a Granite large language model with both image and text modalities. Evaluations: We compare the performance of granite-vision-3.3-2b with previous versions of granite-vision models. Evaluations were done using the standard llms-eval benchmark and spanned multiple public benchmarks, with particular emphasis on document understanding tasks while also including general visual question-answering benchmarks. | | Granite-vision-3.1-2b-preview | Granite-vision-3.2-2b | Granite-vision-3.3-2b | |-----------|-----------|--------------|----------------| | Document benchmarks | | ChartQA | 0.86 | 0.87 | 0.87 | | DocVQA | 0.88 | 0.89 | 0.91 | | TextVQA | 0.76 | 0.78 | 0.80 | | AI2D | 0.78 | 0.76 | 0.77 | | InfoVQA | 0.63 | 0.64 | 0.68 | | OCRBench | 0.75 | 0.77 | 0.79 | | LiveXiv VQA v2 | 0.61 | 0.61 | 0.61 | | LiveXiv TQA v2 | 0.55 | 0.57 | 0.52 | | Other benchmarks | | MMMU | 0.35 | 0.37 | 0.37 | | VQAv2 | 0.81 | 0.78 | 0.79 | | RealWorldQA | 0.65 | 0.63 | 0.63 | | VizWiz VQA | 0.64 | 0.63 | 0.62 | | OK VQA | 0.57 | 0.56 | 0.55| - Paper: Granite Vision: a lightweight, open-source multimodal model for enterprise Intelligence. Note that the paper describes Granite Vision 3.2. Granite Vision 3.3 shares most of the technical underpinnings with Granite 3.2. However, there are several enhancements in terms of new and improved vision encoder, many new high quality datasets for training, and several new experimental capabilities. - Release Date: Jun 11th, 2025 - License: Apache 2.0 Supported Input Format: Currently the model supports English instructions and images (png, jpeg) as input format. Intended Use: The model is intended to be used in enterprise applications that involve processing visual and text data. In particular, the model is well-suited for a range of visual document understanding tasks, such as analyzing tables and charts, performing optical character recognition (OCR), and answering questions based on document content. Additionally, its capabilities extend to general image understanding, enabling it to be applied to a broader range of business applications. For tasks that exclusively involve text-based input, we suggest using our Granite large language models, which are optimized for text-only processing and offer superior performance compared to this model. Granite Vision model is supported natively `transformers>=4.49`. Below is a simple example of how to use the `granite-vision-3.3-2b` model. First, make sure to build the latest versions of transformers: The model can also be loaded with `vLLM`. First make sure to install the following libraries: Then, copy the snippet from the section that is relevant for your use case. The Granite-vision-3.3-2b model also went through safety alignment to make sure responses are safer without affecting the model’s performance on its intended task. We carefully safety aligned the model on publicly available safety data and synthetically generated safety data. We report our safety scores on publicly available RTVLM and VLGuard datasets. | | Politics | Racial | Jailbreak | Mislead | |-----------|-----------|--------------|----------------|----------------| |Granite-vision-3.1-2b-preview|7.2|7.7|4.5|7.6| |Granite-vision-3.2-2b|7.6|7.8|6.2|8.0| |Granite-vision-3.3-2b|8.0|8.1|7.5|8.0| | | Unsafe Images (Unsafe) | Safe Images with Unsafe Instructions | |-----------|-----------|--------------| |Granite-vision-3.1-2b-preview|6.6|8.4| |Granite-vision-3.2-2b|7.6|8.9| |Granite-vision-3.3-2b|8.4|9.3| Granite-vision-3.3-2b introduces three new experimental capabilities: (1) Image segmentation: A notebook showing a segmentation example (2) Doctags generation: Parse document images to structured text in doctags format. Please see Docling project for more details on doctags. (3) Multipage support: The model was trained to handle question answering (QA) tasks using multiple consecutive pages from a document—up to 8 pages—given the demands of long-context processing. To support such long sequences without exceeding GPU memory limits, we recommend resizing images so that their longer dimension is 768 pixels. For an example of fine-tuning granite-vision-3.3-2b for new tasks refer to this notebook. For an example of MM-RAG using granite vision refer to this notebook. Model Architecture: The architecture of granite-vision-3.3-2b consists of the following components: (1) Vision encoder: SigLIP2 (https://huggingface.co/google/siglip2-so400m-patch14-384). (2) Vision-language connector: two-layer MLP with gelu activation function. (3) Large language model: granite-3.1-2b-instruct with 128k context length (https://huggingface.co/ibm-granite/granite-3.1-2b-instruct). We built upon LLaVA (https://llava-vl.github.io) to train our model. We use multi-layer encoder features and a denser grid resolution in AnyRes to enhance the model's ability to understand nuanced visual content, which is essential for accurately interpreting document images. Training Data: Our training data is largely comprised of two key sources: (1) publicly available datasets (2) internally created synthetic data targeting specific capabilities including document understanding tasks. Granite Vision 3.3 training data is built upon the comprehensive dataset used for granite-vision-3.2-2b (a detailed description of granite-vision-3.2-2b training data is available in the technical report). In addition, granite-vision-3.3-2b further includes high quality image segmentation data, multi-page data, and data from several new high quality publicly available datasets (like Mammoth-12M and Bigdocs). Infrastructure: We train granite-vision-3.3-2b using IBM's super computing cluster, Blue Vela, which is outfitted with NVIDIA H100 GPUs. This cluster provides a scalable and efficient infrastructure for training our models over thousands of GPUs. Responsible Use and Limitations: Some use cases for Large Vision and Language Models can trigger certain risks and ethical considerations, including but not limited to: bias and fairness, misinformation, and autonomous decision-making. Although our alignment processes include safety considerations, the model may in some cases produce inaccurate, biased, offensive or unwanted responses to user prompts. Additionally, whether smaller models may exhibit increased susceptibility to hallucination in generation scenarios due to their reduced sizes, which could limit their ability to generate coherent and contextually accurate responses, remains uncertain. This aspect is currently an active area of research, and we anticipate more rigorous exploration, comprehension, and mitigations in this domain. We urge the community to use granite-vision-3.3-2b in a responsible way and avoid any malicious utilization. We recommend using this model for document understanding tasks. More general vision tasks may pose higher inherent risks of triggering unwanted output. To enhance safety, we recommend using granite-vision-3.3-2b alongside Granite Guardian. Granite Guardian is a fine-tuned instruct model designed to detect and flag risks in prompts and responses across key dimensions outlined in the IBM AI Risk Atlas. Its training, which includes both human-annotated and synthetic data informed by internal red-teaming, enables it to outperform similar open-source models on standard benchmarks, providing an additional layer of safety. Resources - 📄 Read the full technical report here - ⭐️ Learn about the latest updates with Granite: https://www.ibm.com/granite - 🚀 Get started with tutorials, best practices, and prompt engineering advice: https://www.ibm.com/granite/docs/ - 💡 Learn about the latest Granite learning resources: https://ibm.biz/granite-learning-resources
granite-speech-3.3-2b
Model Summary: Granite-speech-3.3-2b is a compact and efficient speech-language model, specifically designed for automatic speech recognition (ASR) and automatic speech translation (AST). Granite-speech-3.3-2b uses a two-pass design, unlike integrated models that combine speech and language into a single pass. Initial calls to granite-speech-3.3-2b will transcribe audio files into text. To process the transcribed text using the underlying Granite language model, users must make a second call as each step must be explicitly initiated. The model was trained on a collection of public corpora comprising diverse datasets for ASR and AST as well as synthetic datasets tailored to support the speech translation task. Granite-speech-3.3-2b was trained by modality aligning granite-3.3-2b-instruct (https://huggingface.co/ibm-granite/granite-3.3-2b-instruct) to speech on publicly available open source corpora containing audio inputs and text targets. Compared to the initial release, revision 3.3.2 supports multilingual speech inputs in English, French, German, Spanish and Portuguese, provides transcription accuracy improvements for English ASR by using a deeper acoustic encoder and additional training data. We evaluated granite-speech-3.3-2b revision 3.3.2 alongside granite-speech-3.3-8b (https://huggingface.co/ibm-granite/granite-speech-3.3-8b) and other speech-language models in the less than 8b parameter range as well as dedicated ASR and AST systems on standard benchmarks. The evaluation spanned multiple public benchmarks, with particular emphasis on English ASR tasks while also including multilingual ASR and AST for X-En and En-X translations. Supported Languages: English, French, German, Spanish, Portuguese Intended Use: The model is intended to be used in enterprise applications that involve processing of speech inputs. In particular, the model is well-suited for English, French, German, Spanish and Portuguese speech-to-text and speech translations to and from English for the same languages plus English-to-Japanese and English-to-Mandarin. The model can also be used for tasks that involve text-only input since it calls the underlying granite-3.3-2b-instruct when the user specifies a prompt that does not contain audio. Granite Speech model is supported natively in `transformers` from the `main` branch. Below is a simple example of how to use the `granite-speech-3.3-2b` revision 3.3.2 model. First, make sure to install a recent version of transformers: First, make sure to install the latest version of vLLM: The architecture of granite-speech-3.3-2b revision 3.3.2 consists of the following components: (1) Speech encoder: 16 conformer blocks trained with Connectionist Temporal Classification (CTC) on character-level targets on the subset containing only ASR corpora (see configuration below). In addition, our CTC encoder uses block-attention with 4-seconds audio blocks and self-conditioned CTC from the middle layer. | Configuration parameter | Value | |-----------------|----------------------| | Input dimension | 160 (80 logmels x 2) | | Nb. of layers | 16 | | Hidden dimension | 1024 | | Nb. of attention heads | 8 | | Attention head size | 128 | | Convolution kernel size | 15 | | Output dimension | 256 | (2) Speech projector and temporal downsampler (speech-text modality adapter): we use a 2-layer window query transformer (q-former) operating on blocks of 15 1024-dimensional acoustic embeddings coming out of the last conformer block of the speech encoder that get downsampled by a factor of 5 using 3 trainable queries per block and per layer. The total temporal downsampling factor is 10 (2x from the encoder and 5x from the projector) resulting in a 10Hz acoustic embeddings rate for the LLM. The encoder, projector and LoRA adapters were fine-tuned/trained jointly on all the corpora mentioned under Training Data. (3) Large language model: granite-3.3-2b-instruct with 128k context length (https://huggingface.co/ibm-granite/granite-3.3-2b-instruct). (4) LoRA adapters: rank=64 applied to the query, value projection matrices Overall, our training data is largely comprised of two key sources: (1) publicly available datasets (2) Synthetic data created from publicly available datasets specifically targeting the speech translation task. A detailed description of the training datasets can be found in the table below: | Name | Task | Nb. hours | Source | |-----------|--------------|----------------|--------------| | CommonVoice-17 En,De,Es,Fr,Pt | ASR | 5600 | https://huggingface.co/datasets/mozilla-foundation/commonvoice170 | | MLS En,De,Es,Fr,Pt | ASR | 48000 | https://huggingface.co/datasets/facebook/multilinguallibrispeech | | Librispeech English | ASR | 1000 | https://huggingface.co/datasets/openslr/librispeechasr | | VoxPopuli En,De,Fr,Es | ASR | 1100 | https://huggingface.co/datasets/facebook/voxpopuli | | AMI English | ASR | 100 | https://huggingface.co/datasets/edinburghcstr/ami | | YODAS English | ASR | 10000 | https://huggingface.co/datasets/espnet/yodas | | Earnings-22 English | ASR | 105 | https://huggingface.co/datasets/esb/datasets | | Switchboard English | ASR | 260 | https://catalog.ldc.upenn.edu/LDC97S62 | | CallHome English | ASR | 18 | https://catalog.ldc.upenn.edu/LDC97T14 | | Fisher English | ASR | 2000 | https://catalog.ldc.upenn.edu/LDC2004S13 | | Voicemail part I English | ASR | 40 | https://catalog.ldc.upenn.edu/LDC98S77 | | Voicemail part II English | ASR | 40 | https://catalog.ldc.upenn.edu/LDC2002S35 | | CommonVoice-17 De,Es,Fr,Pt->En | AST | 3000 | Translations with Granite-3 and Phi-4 | | CommonVoice-17 En->De,Es,Fr,It,Ja,Pt,Zh | AST | 18000 | Translations with Phi-4 and MADLAD | Infrastructure: We train Granite Speech using IBM's super computing cluster, Blue Vela, which is outfitted with NVIDIA H100 GPUs. This cluster provides a scalable and efficient infrastructure for training our models over thousands of GPUs. The training of this particular model was completed in 13 days on 32 H100 GPUs. The use of Large Speech and Language Models can trigger certain risks and ethical considerations. Although our alignment processes include safety considerations, the model may in some cases produce inaccurate, biased, offensive or unwanted responses to user prompts. Additionally, whether smaller models may exhibit increased susceptibility to hallucination in generation scenarios due to their reduced sizes, which could limit their ability to generate coherent and contextually accurate responses, remains uncertain. This aspect is currently an active area of research, and we anticipate more rigorous exploration, comprehension, and mitigations in this domain. IBM recommends using this model for automatic speech recognition and translation tasks. The model's modular design improves safety by limiting how audio inputs can influence the system. If an unfamiliar or malformed prompt is received, the model simply echoes it with its transcription. This minimizes the risk of adversarial inputs, unlike integrated models that directly interpret audio and may be more exposed to such attacks. Note that more general speech tasks may pose higher inherent risks of triggering unwanted outputs. To enhance safety, we recommend using granite-speech-3.3-2b alongside Granite Guardian. Granite Guardian is a fine-tuned instruct model designed to detect and flag risks in prompts and responses across key dimensions outlined in the IBM AI Risk Atlas. Resources - 📄 Read the full technical report: https://arxiv.org/abs/2505.08699 (covers initial release only) - 🔧 Notebooks: Finetune on custom data, two-pass spoken question answering - ⭐️ Learn about the latest updates with Granite: https://www.ibm.com/granite - 🚀 Get started with tutorials, best practices, and prompt engineering advice: https://www.ibm.com/granite/docs/ - 💡 Learn about the latest Granite learning resources: https://ibm.biz/granite-learning-resources
granite-4.0-tiny-preview
Model Summary: Granite-4-Tiny-Preview is a 7B parameter fine-grained hybrid mixture-of-experts (MoE) instruct model fine-tuned from Granite-4.0-Tiny-Base-Preview using a combination of open source instruction datasets with permissive license and internally collected synthetic datasets tailored for solving long context problems. This model is developed using a diverse set of techniques with a structured chat format, including supervised fine-tuning, and model alignment using reinforcement learning. - Developers: Granite Team, IBM - Website: Granite Docs - Release Date: May 2nd, 2025 - License: Apache 2.0 Supported Languages: English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese. However, users may fine-tune this Granite model for languages beyond these 12 languages. Intended Use: This model is designed to handle general instruction-following tasks and can be integrated into AI assistants across various domains, including business applications. Capabilities Thinking Summarization Text classification Text extraction Question-answering Retrieval Augmented Generation (RAG) Code related tasks Function-calling tasks Multilingual dialog use cases Long-context tasks including long document/meeting summarization, long document QA, etc. Installation: You need to install transformer from source to use this checkpoint. HuggingFace PR: https://github.com/huggingface/transformers/pull/37658 Install transformer from source: https://huggingface.co/docs/transformers/en/installation#install-from-source Generation: After installation, copy the code snippet below to run the example. Comparison with previous granite models 1 . Scores of AlpacaEval-2.0 and Arena-Hard are calculated with thinking=True Models Arena-Hard AlpacaEval-2.0 MMLU PopQA TruthfulQA BigBenchHard DROP GSM8K HumanEval HumanEval+ IFEval AttaQ Granite-3.3-2B-Instruct 28.86 43.45 55.88 18.4 58.97 52.51 35.98 72.48 80.51 75.68 65.8 87.47 Granite-3.3-8B-Instruct 57.56 62.68 65.54 26.17 66.86 59.01 41.53 80.89 89.73 86.09 74.82 88.5 Granite-4.0-Tiny-Preview 26.70 35.16 60.40 22.93 58.07 55.71 46.22 70.05 82.41 78.33 63.03 86.10 Training Data: Overall, our training data is largely comprised of two key sources: (1) publicly available datasets with permissive license, (2) internal synthetically generated data targeted to enhance reasoning capabilities. Infrastructure: We train Granite-4.0-Tiny-Preview using IBM's super computing cluster, Blue Vela, which is outfitted with NVIDIA H100 GPUs. This cluster provides a scalable and efficient infrastructure for training our models over thousands of GPUs. Ethical Considerations and Limitations: Granite-4.0-Tiny-Preview, leverages both permissively licensed open-source and select proprietary data for enhanced performance. Since it inherits its foundation from the previous model, all ethical considerations and limitations applicable to Granite-4.0-Tiny-Preview remain relevant. Signature verification: Model signing is an experimental feature with ongoing development, which might include breaking changes. We are releasing these capabilities to improve the integrity of our models for our security-conscious users and to facilitate feedback from the community. Before trying to verify the signature, ensure that the tensor files have been downloaded with git-lfs and that no files have been added, removed, or modified in your local git checkout: Install the modelsigning (v1.0.1) library with the following command: Then verify the signature with the following command ensuring that the IBM identity '[email protected]' was used for signing this model: Resources - ⭐️ Learn about the latest updates with Granite: https://www.ibm.com/granite - 📄 Get started with tutorials, best practices, and prompt engineering advice: https://www.ibm.com/granite/docs/ - 💡 Learn about the latest Granite learning resources: https://ibm.biz/granite-learning-resources
granite-embedding-reranker-english-r2
granite-4.0-micro
granite-timeseries-flowstate-r1
granite-4.0-h-micro
📣 Update [10-07-2025]: Added a default system prompt to the chat template to guide the model towards more professional, accurate, and safe responses. Model Summary: Granite-4.0-H-Micro is a 3B parameter long-context instruct model finetuned from Granite-4.0-H-Micro-Base using a combination of open source instruction datasets with permissive license and internally collected synthetic datasets. This model is developed using a diverse set of techniques with a structured chat format, including supervised finetuning, model alignment using reinforcement learning, and model merging. Granite 4.0 instruct models feature improved instruction following (IF) and tool-calling capabilities, making them more effective in enterprise applications. - Developers: Granite Team, IBM - HF Collection: Granite 4.0 Language Models HF Collection - GitHub Repository: ibm-granite/granite-4.0-language-models - Website: Granite Docs - Release Date: October 2nd, 2025 - License: Apache 2.0 Supported Languages: English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese. Users may finetune Granite 4.0 models for languages beyond these languages. Intended use: The model is designed to respond to general instructions and can be used to build AI assistants for multiple domains, including business applications. Capabilities Summarization Text classification Text extraction Question-answering Retrieval Augmented Generation (RAG) Code related tasks Function-calling tasks Multilingual dialog use cases Fill-In-the-Middle (FIM) code completions Need to test the examples. (especially the tool calling and RAG ones) --> Generation: This is a simple example of how to use Granite-4.0-H-Micro model. Then, copy the snippet from the section that is relevant for your use case. Tool-calling: Granite-4.0-H-Micro comes with enhanced tool calling capabilities, enabling seamless integration with external functions and APIs. To define a list of tools please follow OpenAI's function definition schema. This is an example of how to use Granite-4.0-H-Micro model tool-calling ability: Benchmarks Metric Micro Dense H Micro Dense H Tiny MoE H Small MoE Multilingual Benchmarks and thr included languages: MMMLU 11 ar, de, en, es, fr, ja, ko, pt, zh, bn, hi INCLUDE 14 hindi, bengali, tamil, telugu, arabic, german, spanish, french, italian, japanese, korean, dutch, portuguese, chinese --> hi, bn, ta, te, ar, de, es, fr, it, ja, ko, nl, pt, zh Model Architecture: Granite-4.0-H-Micro baseline is built on a decoder-only dense transformer architecture. Core components of this architecture are: GQA, Mamba2, MLP with SwiGLU, RMSNorm, and shared input/output embeddings. Model Micro Dense H Micro Dense H Tiny MoE H Small MoE Number of layers 40 attention 4 attention / 36 Mamba2 4 attention / 36 Mamba2 4 attention / 36 Mamba2 MLP / Shared expert hidden size 8192 8192 1024 1536 Training Data: Overall, our SFT data is largely comprised of three key sources: (1) publicly available datasets with permissive license, (2) internal synthetic data targeting specific capabilities, and (3) a select set of human-curated data. Infrastructure: We trained the Granite 4.0 Language Models utilizing an NVIDIA GB200 NVL72 cluster hosted in CoreWeave. Intra-rack communication occurs via the 72-GPU NVLink domain, and a non-blocking, full Fat-Tree NDR 400 Gb/s InfiniBand network provides inter-rack communication. This cluster provides a scalable and efficient infrastructure for training our models over thousands of GPUs. Ethical Considerations and Limitations: Granite 4.0 Instruction Models are primarily finetuned using instruction-response pairs mostly in English, but also multilingual data covering multiple languages. Although this model can handle multilingual dialog use cases, its performance might not be similar to English tasks. In such case, introducing a small number of examples (few-shot) can help the model in generating more accurate outputs. While this model has been aligned by keeping safety in consideration, the model may in some cases produce inaccurate, biased, or unsafe responses to user prompts. So we urge the community to use this model with proper safety testing and tuning tailored for their specific tasks. Resources - ⭐️ Learn about the latest updates with Granite: https://www.ibm.com/granite - 📄 Get started with tutorials, best practices, and prompt engineering advice: https://www.ibm.com/granite/docs/ - 💡 Learn about the latest Granite learning resources: https://ibm.biz/granite-learning-resources
granite-3.1-8b-instruct
Model Summary: Granite-3.1-8B-Instruct is a 8B parameter long-context instruct model finetuned from Granite-3.1-8B-Base using a combination of open source instruction datasets with permissive license and internally collected synthetic datasets tailored for solving long context problems. This model is developed using a diverse set of techniques with a structured chat format, including supervised finetuning, model alignment using reinforcement learning, and model merging. - Developers: Granite Team, IBM - GitHub Repository: ibm-granite/granite-3.1-language-models - Website: Granite Docs - Paper: Granite 3.1 Language Models (coming soon) - Release Date: December 18th, 2024 - License: Apache 2.0 Supported Languages: English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese. Users may finetune Granite 3.1 models for languages beyond these 12 languages. Intended Use: The model is designed to respond to general instructions and can be used to build AI assistants for multiple domains, including business applications. Capabilities Summarization Text classification Text extraction Question-answering Retrieval Augmented Generation (RAG) Code related tasks Function-calling tasks Multilingual dialog use cases Long-context tasks including long document/meeting summarization, long document QA, etc. Generation: This is a simple example of how to use Granite-3.1-8B-Instruct model. Then, copy the snippet from the section that is relevant for your use case. Models ARC-Challenge Hellaswag MMLU TruthfulQA Winogrande GSM8K Avg Granite-3.1-8B-Instruct 62.62 84.48 65.34 66.23 75.37 73.84 71.31 Granite-3.1-2B-Instruct 54.61 75.14 55.31 59.42 67.48 52.76 60.79 Granite-3.1-3B-A800M-Instruct 50.42 73.01 52.19 49.71 64.87 48.97 56.53 Granite-3.1-1B-A400M-Instruct 42.66 65.97 26.13 46.77 62.35 33.88 46.29 Models IFEval BBH MATH Lvl 5 GPQA MUSR MMLU-Pro Avg Granite-3.1-8B-Instruct 72.08 34.09 21.68 8.28 19.01 28.19 30.55 Granite-3.1-2B-Instruct 62.86 21.82 11.33 5.26 4.87 20.21 21.06 Granite-3.1-3B-A800M-Instruct 55.16 16.69 10.35 5.15 2.51 12.75 17.1 Granite-3.1-1B-A400M-Instruct 46.86 6.18 4.08 0 0.78 2.41 10.05 Model Architecture: Granite-3.1-8B-Instruct is based on a decoder-only dense transformer architecture. Core components of this architecture are: GQA and RoPE, MLP with SwiGLU, RMSNorm, and shared input/output embeddings. Training Data: Overall, our SFT data is largely comprised of three key sources: (1) publicly available datasets with permissive license, (2) internal synthetic data targeting specific capabilities including long-context tasks, and (3) very small amounts of human-curated data. A detailed attribution of datasets can be found in the Granite 3.0 Technical Report, Granite 3.1 Technical Report (coming soon), and Accompanying Author List. Infrastructure: We train Granite 3.1 Language Models using IBM's super computing cluster, Blue Vela, which is outfitted with NVIDIA H100 GPUs. This cluster provides a scalable and efficient infrastructure for training our models over thousands of GPUs. Ethical Considerations and Limitations: Granite 3.1 Instruct Models are primarily finetuned using instruction-response pairs mostly in English, but also multilingual data covering eleven languages. Although this model can handle multilingual dialog use cases, its performance might not be similar to English tasks. In such case, introducing a small number of examples (few-shot) can help the model in generating more accurate outputs. While this model has been aligned by keeping safety in consideration, the model may in some cases produce inaccurate, biased, or unsafe responses to user prompts. So we urge the community to use this model with proper safety testing and tuning tailored for their specific tasks. Resources - ⭐️ Learn about the latest updates with Granite: https://www.ibm.com/granite - 📄 Get started with tutorials, best practices, and prompt engineering advice: https://www.ibm.com/granite/docs/ - 💡 Learn about the latest Granite learning resources: https://ibm.biz/granite-learning-resources
granite-embedding-278m-multilingual
Model Summary: Granite-Embedding-278M-Multilingual is a 278M parameter model from the Granite Embeddings suite that can be used to generate high quality text embeddings. This model produces embedding vectors of size 768 and is trained using a combination of open source relevance-pair datasets with permissive, enterprise-friendly license, and IBM collected and generated datasets. This model is developed using contrastive finetuning, knowledge distillation and model merging for improved performance. - Developers: Granite Embedding Team, IBM - GitHub Repository: ibm-granite/granite-embedding-models - Website: Granite Docs - Paper: Technical Report - Release Date: December 18th, 2024 - License: Apache 2.0 Supported Languages: English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese. Users may finetune Granite-Embedding-278M-Multilingual for languages beyond these 12 languages. Intended use: The model is designed to produce fixed length vector representations for a given text, which can be used for text similarity, retrieval, and search applications. Usage with Sentence Transformers: The model is compatible with SentenceTransformer library and is very easy to use: The model can then be used to encode pairs of text and find the similarity between their representations Usage with Huggingface Transformers: This is a simple example of how to use the Granite-Embedding-278m-Multilingual model with the Transformers library and PyTorch. Evaluation: The average performance of the Granite-Embedding-278M-Multilingual on Multilingual Miracl (across 18 langauges), Mintaka Retrieval (across 8 languages) and MTEB Retrieval for English (across 15 tasks), German (across 4 tasks), Spanish (across 2 tasks), Frenc (across 5 tasks), Japanese (across 2 tasks), Arabic (1 task), Korean (1 task) and Chinese (across 8 tasks) is reported below. | Model | Paramters (M)| Embedding Dimension | Miracl (18) | Mintaka Retrieval (8) | MTEB English (15) | MTEB German (4) |MTEB Spanish (2) | MTEB French (5) | MTEB Japanese (2) | MTEB Arabic (1) | MTEB Korean (1) | MTEB Chinese (8) | |:-----------------------------------|:------------:|:-------------------:|:-------------:| :---------------------:|:-----------------:|:---------------:|:---------------:|:---------------:|:----------------:|:----------------:|:---------------:|:----------------:| |granite-embedding-278M-multilingual | 278 | 768 | 58.3 | 23.2 | 48.2 | 71.2 | 52.6 | 54.1 | 61.7 | 64.2 | 71.8 | 45.2 | Model Architecture: Granite-Embedding-278m-Multilingual is based on an encoder-only XLM-RoBERTa like transformer architecture, trained internally at IBM Research. | Model | granite-embedding-30m-english | granite-embedding-125m-english | granite-embedding-107M-multilingual | granite-embedding-278m-multilingual | | :-------- | :-------:| :-------: | :---------:| :-----:| | Embedding size | 384 | 768 | 384 | 768 | | Number of layers | 6 | 12 | 6 | 12 | | Number of attention heads | 12 | 12 | 12 | 12 | | Intermediate size | 1536 | 3072 | 1536 | 3072 | | Activation Function | GeLU | GeLU | GeLU | GeLU | | Vocabulary Size | 50265 | 50265 | 250002 | 250002 | | Max. Sequence Length | 512 | 512 | 512 | 512 | | # Parameters | 30M | 125M | 107M | 278M | Training Data: Overall, the training data consists of four key sources: (1) unsupervised title-body paired data scraped from the web, (2) publicly available paired with permissive, enterprise-friendly license, (3) IBM-internal paired data targetting specific technical domains, and (4) IBM-generated synthetic data. The data is listed below: | Dataset | Num. Pairs | |:--------------------------------------------------------------------------|:--------------:| | Multilingual MC4 | 52,823,484 | | Multilingual Webhose | 12,369,322 | | English Wikipedia | 20,745,403 | | Multilingual Wikimedia | 2,911,090 | | Miracl Corpus (Title-Body) | 10,120,398 | | Stack Exchange Duplicate questions (titles) | 304,525 | | Stack Exchange Duplicate questions (titles) | 304,525 | | Stack Exchange Duplicate questions (bodies) | 250,519 | | Machine Translations of Stack Exchange Duplicate questions (titles) | 187,195 | | Stack Exchange (Title, Answer) pairs | 4,067,139 | | Stack Exchange (Title, Body) pairs | 23,978,013 | | Stack Exchange (Title, Body) pairs | 23,978,013 | | Machine Translations of Stack Exchange (Title+Body, Answer) pairs | 1,827,15 | | SearchQA | 582,261 | | S2ORC (Title, Abstract) | 41,769,185 | | WikiAnswers Duplicate question pairs | 77,427,422 | | CCNews | 614,664 | | XSum | 226,711 | | SimpleWiki | 102,225 | | Machine Translated Cross Lingual Parallel Corpora | 28,376,115 | | SPECTER citation triplets | 684,100 | | Machine Translations of SPECTER citation triplets | 4,104,600 | | Natural Questions (NQ) | 100,231 | | SQuAD2.0 | 87,599 | | HotpotQA | 85,000 | | Fever | 109,810 | | PubMed | 20,000,000 | | Multilingual Miracl Triples | 81,409 | | Multilingual MrTydi Triples | 48,715 | | Sadeeem Question Asnwering | 4,037 | | DBPedia Title-Body Pairs | 4,635,922 | | Synthetic: English Query-Wikipedia Passage | 1,879,093 | | Synthetic: English Fact Verification | 9,888 | | Synthetic: Multilingual Query-Wikipedia Passage | 300,266 | | Synthetic: Multilingual News Summaries | 37,489 | | IBM Internal Triples | 40,290 | | IBM Internal Title-Body Pairs | 1,524,586 | Notably, we do not use the popular MS-MARCO retrieval dataset in our training corpus due to its non-commercial license, while other open-source models train on this dataset due to its high quality. Infrastructure: We train Granite Embedding Models using IBM's computing cluster, Cognitive Compute Cluster, which is outfitted with NVIDIA A100 80gb GPUs. This cluster provides a scalable and efficient infrastructure for training our models over multiple GPUs. Ethical Considerations and Limitations: The data used to train the base language model was filtered to remove text containing hate, abuse, and profanity. Granite-Embedding-278m-Multilingual is finetuned on 12 languages, and has a context length of 512 tokens (longer texts will be truncated to this size). Resources - ⭐️ Learn about the latest updates with Granite: https://www.ibm.com/granite - 📄 Get started with tutorials, best practices, and prompt engineering advice: https://www.ibm.com/granite/docs/ - 💡 Learn about the latest Granite learning resources: https://ibm.biz/granite-learning-resources
granite-embedding-125m-english
News: Granite Embedding R2 models with 8192 context length released. - granite-embedding-english-r2 (149M parameters): with an output embedding size of 768, replacing granite-embedding-125m-english. - granite-embedding-small-english-r2 (47M parameters): A first-of-its-kind reduced-size model, with fewer layers and a smaller output embedding size (384), replacing granite-embedding-30m-english. Model Summary: Granite-Embedding-125m-English is a 125M parameter dense biencoder embedding model from the Granite Embeddings suite that can be used to generate high quality text embeddings. This model produces embedding vectors of size 768. Compared to most other open-source models, this model was only trained using open-source relevance-pair datasets with permissive, enterprise-friendly license, plus IBM collected and generated datasets. While maintaining competitive scores on academic benchmarks such as BEIR, this model also performs well on many enterprise use cases. This model is developed using retrieval oriented pretraining, contrastive finetuning and knowledge distillation. - Developers: Granite Embedding Team, IBM - GitHub Repository: ibm-granite/granite-embedding-models - Website: Granite Docs - Paper: Technical Report - Release Date: December 18th, 2024 - License: Apache 2.0 Intended use: The model is designed to produce fixed length vector representations for a given text, which can be used for text similarity, retrieval, and search applications. Usage with Sentence Transformers: The model is compatible with SentenceTransformer library and is very easy to use: The model can then be used to encode pairs of text and find the similarity between their representations Usage with Huggingface Transformers: This is a simple example of how to use the Granite-Embedding-125m-English model with the Transformers library and PyTorch. The performance of the Granite-Embedding-125M-English model on MTEB Retrieval (i.e., BEIR) and code retrieval (CoIR) benchmarks is reported below. | Model | Paramters (M)| Embedding Dimension | MTEB Retrieval (15) | CoIR (10) | |---------------------------------|:------------:|:-------------------:|:-------------------: |:----------:| |granite-embedding-125m-english |125 |768 |52.3 |50.3 | Model Architecture: Granite-Embedding-125m-English is based on an encoder-only RoBERTa like transformer architecture, trained internally at IBM Research. | Model | granite-embedding-30m-english | granite-embedding-125m-english | granite-embedding-107m-multilingual | granite-embedding-278m-multilingual | | :--------- | :-------:| :--------: | :-----:| :-----:| | Embedding size | 384 | 768 | 384 | 768 | | Number of layers | 6 | 12 | 6 | 12 | | Number of attention heads | 12 | 12 | 12 | 12 | | Intermediate size | 1536 | 3072 | 1536 | 3072 | | Activation Function | GeLU | GeLU | GeLU | GeLU | | Vocabulary Size | 50265| 50265 | 250002 | 250002 | | Max. Sequence Length | 512 | 512 | 512 | 512 | | # Parameters | 30M | 125M | 107M | 278M | Training Data: Overall, the training data consists of four key sources: (1) unsupervised title-body paired data scraped from the web, (2) publicly available paired with permissive, enterprise-friendly license, (3) IBM-internal paired data targetting specific technical domains, and (4) IBM-generated synthetic data. The data is listed below: | Dataset | Num. Pairs | |----------------------------------------------------|:---------------:| | SPECTER citation triplets | 684,100 | | Stack Exchange Duplicate questions (titles) | 304,525 | | Stack Exchange Duplicate questions (bodies) | 250,519 | | Stack Exchange Duplicate questions (titles+bodies) | 250,460 | | Natural Questions (NQ) | 100,231 | | SQuAD2.0 | 87,599 | | PAQ (Question, Answer) pairs | 64,371,441 | | Stack Exchange (Title, Answer) pairs | 4,067,139 | | Stack Exchange (Title, Body) pairs | 23,978,013 | | Stack Exchange (Title+Body, Answer) pairs | 187,195 | | S2ORC Citation pairs (Titles) | 52,603,982 | | S2ORC (Title, Abstract) | 41,769,185 | | S2ORC (Citations, abstracts) | 52,603,982 | | WikiAnswers Duplicate question pairs | 77,427,422 | | SearchQA | 582,261 | | HotpotQA | 85,000 | | Fever | 109,810 | | Arxiv | 2,358,545 | | Wikipedia | 20,745,403 | | PubMed | 20,000,000 | | Miracl En Pairs | 9,016 | | DBPedia Title-Body Pairs | 4,635,922 | | Synthetic: Query-Wikipedia Passage | 1,879,093 | | Synthetic: Fact Verification | 9,888 | | IBM Internal Triples | 40,290 | | IBM Internal Title-Body Pairs | 1,524,586 | Notably, we do not use the popular MS-MARCO retrieval dataset in our training corpus due to its non-commercial license, while other open-source models train on this dataset due to its high quality. Infrastructure: We train Granite Embedding Models using IBM's computing cluster, Cognitive Compute Cluster, which is outfitted with NVIDIA A100 80gb GPUs. This cluster provides a scalable and efficient infrastructure for training our models over multiple GPUs. Ethical Considerations and Limitations: The data used to train the base language model was filtered to remove text containing hate, abuse, and profanity. Granite-Embedding-125m-English is trained only for English texts, and has a context length of 512 tokens (longer texts will be truncated to this size). Resources - ⭐️ Learn about the latest updates with Granite: https://www.ibm.com/granite - 📄 Get started with tutorials, best practices, and prompt engineering advice: https://www.ibm.com/granite/docs/ - 💡 Learn about the latest Granite learning resources: https://ibm.biz/granite-learning-resources
granite-embedding-107m-multilingual
Model Summary: Granite-Embedding-107M-Multilingual is a 107M parameter dense biencoder embedding model from the Granite Embeddings suite that can be used to generate high quality text embeddings. This model produces embedding vectors of size 384 and is trained using a combination of open source relevance-pair datasets with permissive, enterprise-friendly license, and IBM collected and generated datasets. This model is developed using contrastive finetuning, knowledge distillation and model merging for improved performance. - Developers: Granite Embedding Team, IBM - GitHub Repository: ibm-granite/granite-embedding-models - Website: Granite Docs - Paper: Technical Report - Release Date: December 18th, 2024 - License: Apache 2.0 Supported Languages: English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese. Users may finetune Granite-Embedding-107M-Multilingual for languages beyond these 12 languages. Intended use: The model is designed to produce fixed length vector representations for a given text, which can be used for text similarity, retrieval, and search applications. Usage with Sentence Transformers: The model is compatible with SentenceTransformer library and is very easy to use: The model can then be used to encode pairs of text and find the similarity between their representations Usage with Huggingface Transformers: This is a simple example of how to use the Granite-Embedding-107m-Multilingual model with the Transformers library and PyTorch. Evaluation: The average performance of the Granite-Embedding-107M-Multilingual on Multilingual Miracl (across 18 langauges), Mintaka Retrieval (across 8 languages) and MTEB Retrieval for English (across 15 tasks), German (across 4 tasks), Spanish (across 2 tasks), Frenc (across 5 tasks), Japanese (across 2 tasks), Arabic (1 task), Korean (1 task) and Chinese (across 8 tasks) is reported below. Granite-Embedding-107M-Multilingual is twice as fast as other models with similar embedding dimensions. | Model | Paramters (M)| Embedding Dimension | Miracl (18) | Mintaka Retrieval (8) | MTEB English (15) | MTEB German (4) |MTEB Spanish (2) | MTEB French (5) | MTEB Japanese (2) | MTEB Arabic (1) | MTEB Korean (1) | MTEB Chinese (8) | |------------------------------------|:------------:|:-------------------:|:-------------:| :---------------------:|:-----------------:|:---------------:|:---------------:|:---------------:|:----------------:|:----------------:|----------------:|-----------------:| |granite-embedding-107m-multilingual | 107 | 384 | 55.9 | 22.6 | 45.3 | 70.3 | 48.7 | 51.1 | 59.0 | 63.2 | 70.5 | 40.8 | Model Architecture: Granite-Embedding-107m-Multilingual is based on an encoder-only XLM-RoBERTa like transformer architecture, trained internally at IBM Research. | Model | granite-embedding-30m-english | granite-embedding-125m-english | granite-embedding-107m-multilingual | granite-embedding-278m-multilingual | | :--------- | :-------:| :--------: | :---------:| :-----:| | Embedding size | 384 | 768 | 384 | 768 | | Number of layers | 6 | 12 | 6 | 12 | | Number of attention heads | 12 | 12 | 12 | 12 | | Intermediate size | 1536 | 3072 | 1536 | 3072 | | Activation Function | GeLU | GeLU | GeLU | GeLU | | Vocabulary Size | 50265 | 50265 | 250002 | 250002 | | Max. Sequence Length | 512 | 512 | 512 | 512 | | # Parameters | 30M | 125M | 107M | 278M | Training Data: Overall, the training data consists of four key sources: (1) unsupervised title-body paired data scraped from the web, (2) publicly available paired with permissive, enterprise-friendly license, (3) IBM-internal paired data targetting specific technical domains, and (4) IBM-generated synthetic data. The data is listed below: | Dataset | Num. Pairs | |:--------------------------------------------------------------------------|:--------------:| | Multilingual MC4 | 52,823,484 | | Multilingual Webhose | 12,369,322 | | English Wikipedia | 20,745,403 | | Multilingual Wikimedia | 2,911,090 | | Miracl Corpus (Title-Body) | 10,120,398 | | Stack Exchange Duplicate questions (titles) | 304,525 | | Stack Exchange Duplicate questions (titles) | 304,525 | | Stack Exchange Duplicate questions (bodies) | 250,519 | | Machine Translations of Stack Exchange Duplicate questions (titles) | 187,195 | | Stack Exchange (Title, Answer) pairs | 4,067,139 | | Stack Exchange (Title, Body) pairs | 23,978,013 | | Stack Exchange (Title, Body) pairs | 23,978,013 | | Machine Translations of Stack Exchange (Title+Body, Answer) pairs | 1,827,15 | | SearchQA | 582,261 | | S2ORC (Title, Abstract) | 41,769,185 | | WikiAnswers Duplicate question pairs | 77,427,422 | | CCNews | 614,664 | | XSum | 226,711 | | SimpleWiki | 102,225 | | Machine Translated Cross Lingual Parallel Corpora | 28,376,115 | | SPECTER citation triplets | 684,100 | | Machine Translations of SPECTER citation triplets | 4,104,600 | | Natural Questions (NQ) | 100,231 | | SQuAD2.0 | 87,599 | | HotpotQA | 85,000 | | Fever | 109,810 | | PubMed | 20,000,000 | | Multilingual Miracl Triples | 81,409 | | Multilingual MrTydi Triples | 48,715 | | Sadeeem Question Asnwering | 4,037 | | DBPedia Title-Body Pairs | 4,635,922 | | Synthetic: English Query-Wikipedia Passage | 1,879,093 | | Synthetic: English Fact Verification | 9,888 | | Synthetic: Multilingual Query-Wikipedia Passage | 300,266 | | Synthetic: Multilingual News Summaries | 37,489 | | IBM Internal Triples | 40,290 | | IBM Internal Title-Body Pairs | 1,524,586 | Notably, we do not use the popular MS-MARCO retrieval dataset in our training corpus due to its non-commercial license, while other open-source models train on this dataset due to its high quality. Infrastructure: We train Granite Embedding Models using IBM's computing cluster, Cognitive Compute Cluster, which is outfitted with NVIDIA A100 80gb GPUs. This cluster provides a scalable and efficient infrastructure for training our models over multiple GPUs. Ethical Considerations and Limitations: The data used to train the base language model was filtered to remove text containing hate, abuse, and profanity. Granite-Embedding-107m-Multilingual is finetuned on 12 languages, and has a context length of 512 tokens (longer texts will be truncated to this size). Resources - ⭐️ Learn about the latest updates with Granite: https://www.ibm.com/granite - 📄 Get started with tutorials, best practices, and prompt engineering advice: https://www.ibm.com/granite/docs/ - 💡 Learn about the latest Granite learning resources: https://ibm.biz/granite-learning-resources
granite-3.0-8b-instruct
Model Summary: Granite-3.0-8B-Instruct is a 8B parameter model finetuned from Granite-3.0-8B-Base using a combination of open source instruction datasets with permissive license and internally collected synthetic datasets. This model is developed using a diverse set of techniques with a structured chat format, including supervised finetuning, model alignment using reinforcement learning, and model merging. - Developers: Granite Team, IBM - GitHub Repository: ibm-granite/granite-3.0-language-models - Website: Granite Docs - Paper: Granite 3.0 Language Models - Release Date: October 21st, 2024 - License: Apache 2.0 Supported Languages: English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese. Users may finetune Granite 3.0 models for languages beyond these 12 languages. Intended use: The model is designed to respond to general instructions and can be used to build AI assistants for multiple domains, including business applications. Capabilities Summarization Text classification Text extraction Question-answering Retrieval Augmented Generation (RAG) Code related tasks Function-calling tasks Multilingual dialog use cases Generation: This is a simple example of how to use Granite-3.0-8B-Instruct model. Then, copy the snippet from the section that is relevant for your use case. Model Architecture: Granite-3.0-8B-Instruct is based on a decoder-only dense transformer architecture. Core components of this architecture are: GQA and RoPE, MLP with SwiGLU, RMSNorm, and shared input/output embeddings. | Model | 2B Dense | 8B Dense | 1B MoE | 3B MoE | | :-------- | :--------| :-------- | :------| :------| | Embedding size | 2048 | 4096 | 1024 | 1536 | | Number of layers | 40 | 40 | 24 | 32 | | Attention head size | 64 | 128 | 64 | 64 | | Number of attention heads | 32 | 32 | 16 | 24 | | Number of KV heads | 8 | 8 | 8 | 8 | | MLP hidden size | 8192 | 12800 | 512 | 512 | | MLP activation | SwiGLU | SwiGLU | SwiGLU | SwiGLU | | Number of Experts | — | — | 32 | 40 | | MoE TopK | — | — | 8 | 8 | | Initialization std | 0.1 | 0.1 | 0.1 | 0.1 | | Sequence Length | 4096 | 4096 | 4096 | 4096 | | Position Embedding | RoPE | RoPE | RoPE | RoPE | | # Parameters | 2.5B | 8.1B | 1.3B | 3.3B | | # Active Parameters | 2.5B | 8.1B | 400M | 800M | | # Training tokens | 12T | 12T | 10T | 10T | Training Data: Overall, our SFT data is largely comprised of three key sources: (1) publicly available datasets with permissive license, (2) internal synthetic data targeting specific capabilities, and (3) very small amounts of human-curated data. A detailed attribution of datasets can be found in the Granite Technical Report and Accompanying Author List. Infrastructure: We train Granite 3.0 Language Models using IBM's super computing cluster, Blue Vela, which is outfitted with NVIDIA H100 GPUs. This cluster provides a scalable and efficient infrastructure for training our models over thousands of GPUs while minimizing environmental impact by utilizing 100% renewable energy sources. Ethical Considerations and Limitations: Granite 3.0 Instruct Models are primarily finetuned using instruction-response pairs mostly in English, but also multilingual data covering eleven languages. Although this model can handle multilingual dialog use cases, its performance might not be similar to English tasks. In such case, introducing a small number of examples (few-shot) can help the model in generating more accurate outputs. While this model has been aligned by keeping safety in consideration, the model may in some cases produce inaccurate, biased, or unsafe responses to user prompts. So we urge the community to use this model with proper safety testing and tuning tailored for their specific tasks. Resources - ⭐️ Learn about the latest updates with Granite: https://www.ibm.com/granite - 📄 Get started with tutorials, best practices, and prompt engineering advice: https://www.ibm.com/granite/docs/ - 💡 Learn about the latest Granite learning resources: https://ibm.biz/granite-learning-resources
granite-4.0-h-tiny
📣 Update [10-07-2025]: Added a default system prompt to the chat template to guide the model towards more professional, accurate, and safe responses. Model Summary: Granite-4.0-H-Tiny is a 7B para...
granite-speech-3.3-8b
granite-4.0-h-small
📣 Update [10-07-2025]: Added a default system prompt to the chat template to guide the model towards more professional, accurate, and safe responses. Model Summary: Granite-4.0-H-Small is a 32B pa...
granite-geospatial-ocean
The granite-geospatial-ocean foundation model was jointly developed by IBM and STFC as part of a collaboration with the University of Exeter and Plymouth Marine Lab under the UK HNCDI programme. This pre-trained model supports a range of potential use cases in ocean ecosystem health, fisheries management, pollution and other ocean processes that can be monitored using ocean colour observations. We provide an example to fine tune the model to quantify primary production by phytoplankton (carbon sequestration which determine's the ocean's role in climate change). For full details of the model pre-training, fine-tuning and evaluation, please see the preprint paper. The granite-geospatial-ocean model is a transformer-based geospatial foundation model trained on Sentinel-3 Ocean Land Colour Instrument (OLCI) and Sea and Land Surface Temperature Radiometer (SLSTR) images. The model consists of a self-supervised encoder developed with a ViT architecture and Masked AutoEncoder (MAE) learning strategy, with an MSE loss function and follows the same architecture as Prithvi-EO. We used a 42x42 image size and 16 bands of Level-2 sentinel-3 OLCI(OL1 to OL12, OL16, OL17, OL18 and OL21) and also a further band of Level-2 SLSTR sea surface temperature data were in the pre-training. In total of 512,000 images were used for pre-training. We have provided an example of fine-tuning the model for primary production quantification which can be found here. These examples make use of TerraTorch for fine-tuning and prediction. Primary Production Quantification >>Try it on Colab (Choose T4 GPU runtime) Your feedback is invaluable to us. If you have any feedback about the model, please feel free to share it with us. You can do this by starting a discussion in this HF repository or submitting an issue to TerraTorch on GitHub. Model Card Authors Geoffrey Dawson, Remy Vandaele, Andrew Taylor, David Moffat, Helen Tamura-Wicks, Sarah Jackson, Chunbo Luo, Paolo Fraccaro, Hywel Williams, Rosie Lickorish and Anne Jones Acknowledgments This work was supported by the Hartree National Centre for Digital Innovation, a collaboration between STFC and IBM. IBM Public Repository Disclosure: All content in this repository including code has been provided by IBM under the associated open source software license and IBM is under no obligation to provide enhancements, updates, or support. IBM developers produced this code as an open source project (not as an IBM product), and IBM makes no assertions as to the level of quality nor security, and will not be maintaining this code going forward. If this model helped your research, please cite Granite-geospatial-ocean-Preprint in your publications.
granite-vision-3.2-2b
granite-4.0-h-350m
Model Summary: Granite-4.0-H-350M is a lightweight instruct model finetuned from Granite-4.0-H-350M-Base using a combination of open source instruction datasets with permissive license and internally collected synthetic datasets. This model is developed using a diverse set of techniques including supervised finetuning, reinforcement learning, and model merging. - Developers: Granite Team, IBM - HF Collection: Granite 4.0 Nano Language Models HF Collection - GitHub Repository: ibm-granite/granite-4.0-nano-language-models - Website: Granite Docs - Release Date: October 28, 2025 - License: Apache 2.0 Supported Languages: English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese. Users may fine-tune Granite 4.0 Nano models to support languages beyond those included in this list. Intended use: Granite 4.0 Nano instruct models feature strong instruction following capabilities bringing advanced AI capabilities within reach for on-device deployments and research use cases. Additionally, their compact size makes them well-suited for fine-tuning on specialized domains without requiring massive compute resources. Capabilities Summarization Text classification Text extraction Question-answering Retrieval Augmented Generation (RAG) Code related tasks Function-calling tasks Multilingual dialog use cases Fill-In-the-Middle (FIM) code completions Need to test the examples. (especially the tool calling and RAG ones) --> Generation: This is a simple example of how to use Granite-4.0-H-350M model. Then, copy the snippet from the section that is relevant for your use case. Tool-calling: Granite-4.0-H-350M comes with enhanced tool calling capabilities, enabling seamless integration with external functions and APIs. To define a list of tools please follow OpenAI's function definition schema. This is an example of how to use Granite-4.0-H-350M model tool-calling ability: Benchmarks Metric 350M Dense H 350M Dense 1B Dense H 1B Dense Multilingual Benchmarks and thr included languages: MMMLU 11 ar, de, en, es, fr, ja, ko, pt, zh, bn, hi INCLUDE 14 hi, bn, ta, te, ar, de, es, fr, it, ja, ko, nl, pt, zh Granite-4.0-H-350M baseline is based on a decoder-only dense transformer architecture. Core components of this architecture are: GQA, Mamba2, MLP with SwiGLU, RMSNorm, and shared input/output embeddings. Number of layers 28 attention 4 attention / 28 Mamba2 40 attention 4 attention / 36 Mamba2 MLP / Shared expert hidden size 2048 2048 4096 4096 Training Data: Overall, our SFT data is largely comprised of three key sources: (1) publicly available datasets with permissive license, (2) internal synthetic data targeting specific capabilities, and (3) a select set of human-curated data. Infrastructure: We trained the Granite 4.0 Nano Language Models utilizing an NVIDIA GB200 NVL72 cluster hosted in CoreWeave. Intra-rack communication occurs via the 72-GPU NVLink domain, and a non-blocking, full Fat-Tree NDR 400 Gb/s InfiniBand network provides inter-rack communication. This cluster provides a scalable and efficient infrastructure for training our models over thousands of GPUs. Ethical Considerations and Limitations: Granite 4.0 Nano Instruct Models are primarily finetuned using instruction-response pairs mostly in English, but also multilingual data covering multiple languages. Although this model can handle multilingual dialog use cases, its performance might not be similar to English tasks. In such case, introducing a small number of examples (few-shot) can help the model in generating more accurate outputs. While this model has been aligned by keeping safety in consideration, the model may in some cases produce inaccurate, biased, or unsafe responses to user prompts. So we urge the community to use this model with proper safety testing and tuning tailored for their specific tasks. Resources - ⭐️ Learn about the latest updates with Granite: https://www.ibm.com/granite - 📄 Get started with tutorials, best practices, and prompt engineering advice: https://www.ibm.com/granite/docs/ - 💡 Learn about the latest Granite learning resources: https://ibm.biz/granite-learning-resources
granite-3.3-2b-base
granite-embedding-english-r2
Model Summary: Granite-embedding-english-r2 is a 149M parameter dense biencoder embedding model from the Granite Embeddings collection that can be used to generate high quality text embeddings. This model produces embedding vectors of size 768 based on context length of upto 8192 tokens. Compared to most other open-source models, this model was only trained using open-source relevance-pair datasets with permissive, enterprise-friendly license, plus IBM collected and generated datasets. The r2 models show strong performance across standard and IBM-built information retrieval benchmarks (BEIR, ClapNQ), code retrieval (COIR), long-document search benchmarks (MLDR, LongEmbed), conversational multi-turn (MTRAG), table retrieval (NQTables, OTT-QA, AIT-QA, MultiHierTT, OpenWikiTables), and on many enterprise use cases. These models use a bi-encoder architecture to generate high-quality embeddings from text inputs such as queries, passages, and documents, enabling seamless comparison through cosine similarity. Built using retrieval oriented pretraining, contrastive finetuning, knowledge distillation, and model merging, granite-embedding-english-r2 is optimized to ensure strong alignment between query and passage embeddings. The latest granite embedding r2 release introduces two English embedding models, both based on the ModernBERT architecture: - granite-embedding-english-r2 (149M parameters): with an output embedding size of 768, replacing granite-embedding-125m-english. - granite-embedding-small-english-r2 (47M parameters): A first-of-its-kind reduced-size model, with fewer layers and a smaller output embedding size (384), replacing granite-embedding-30m-english. - Developed by: Granite Embedding Team, IBM - Repository: ibm-granite/granite-embedding-models - Paper: Granite Embedding R2 Models - Language(s) (NLP): English - Release Date: Aug 15, 2025 - License: Apache 2.0 Intended Use: The model is designed to produce fixed length vector representations for a given text, which can be used for text similarity, retrieval, and search applications. For efficient decoding, these models use Flash Attention 2. Installing it is optional, but can lead to faster inference. The model is compatible with SentenceTransformer library and is very easy to use: The model can then be used to encode pairs of text and find the similarity between their representations This is a simple example of how to use the granite-embedding-english-r2 model with the Transformers library and PyTorch. Evaluation Results Granite embedding r2 models show a strong performance across tasks diverse tasks. Performance of the granite models on MTEB Retrieval (i.e., BEIR), MTEB-v2, code retrieval (CoIR), long-document search benchmarks (MLDR, LongEmbed), conversational multi-turn (MTRAG), table retrieval (NQTables, OTT-QA, AIT-QA, MultiHierTT, OpenWikiTables), benchmarks is reported in the below tables. The r2 models demonstrates speed and efficiency, while mainintaining competitive performance. The average speed to encode documents on a single H100 GPU using a sliding window with 512 context length chunks is also reported. | Model | Parameters (M) | Embedding Size | BEIR Retrieval (15) | MTEB-v2 (41)| CoIR (10) | MLDR (En) | MTRAG (4) | Encoding Speed (docs/sec) | |------------------------------------|:--------------:|:--------------:|:-------------------:|:-----------:|:---------:|:---------:|:---------:|:-------------------------------:| | granite-embedding-125m-english | 125 | 768 | 52.3 | 62.1 | 50.3 | 35.0 | 49.4 | 149 | | granite-embedding-30m-english | 30 | 384 | 49.1 | 60.2 | 47.0 | 32.6 | 48.6 | 198 | | granite-embedding-english-r2 | 149 | 768 | 53.1 | 62.8 | 55.3 | 40.7 | 56.7 | 144 | | granite-embedding-small-english-r2 | 47 | 384 | 50.9 | 61.1 | 53.8 | 39.8 | 48.1 | 199 | |Model | Parameters (M) | Embedding Size |AVERAGE|MTEB-v2 Retrieval (10) | CoIR (10) | MLDR (En) | LongEmbed (6)| Table IR (5)| MTRAG(4) | Encoding Speed (docs/sec) | |-----------------------------------|:--------------:|:--------------:|:---------:|:---------------------:|:---------:|:---------:|:------------:|:-----------:|:--------:|-------------------------------:| |e5-base-v2 |109|768|47.5|49.7|50.3|32.5|41.1|74.09|37.0| 115| |bge-base-en-v1.5 |109|768|46.9|54.8|46.6|33.5|33.9|73.98|38.8| 116| |snowflake-arctic-embed-m-v2.0 |305|768|51.4|58.4|52.2|32.4|55.4|80.75|29.2| 106| |gte-base-en-v1.5 |137|768|52.8|55.5|42.4|42.7|59.4|80.52|36.0| 116| |gte-modernbert-base |149|768|57.5|57.0|71.5|46.2|57.0|76.68|36.8| 142| |nomic-ai/modernbert-embed-base |149|768|48.0|48.7|48.8|31.3|56.3|66.69|36.2| 141| ||||||||||| |granite-embedding-english-r2 |149|768|59.5|56.4|54.8|41.6|67.8|78.53|57.6| 144| |granite-embedding-small-english-r2 | 47|384|55.6|53.9|53.4|40.1|61.9|75.51|48.9|199| The latest granite embedding r2 release introduces two English embedding models, both based on the ModernBERT architecture: - granite-embedding-english-r2 (149M parameters): with an output embedding size of 768, replacing granite-embedding-125m-english. - granite-embedding-small-english-r2 (47M parameters): A first-of-its-kind reduced-size model, with fewer layers and a smaller output embedding size (384), replacing granite-embedding-30m-english. The following table shows the structure of the two models: | Model | granite-embedding-small-english-r2 | granite-embedding-english-r2 | | :--------- | :-------:|:--------:| | Embedding size | 384 | 768 | | Number of layers | 12 | 22 | | Number of attention heads | 12 | 12 | | Intermediate size | 1536 | 1152 | | Activation Function | GeGLU | GeGLU | | Vocabulary Size | 50368 | 50368 | | Max. Sequence Length | 8192 | 8192 | | # Parameters | 47M | 149M | The granite embedding r2 models incorporate key enhancements from the ModernBERT architecture, including: - Alternating attention lengths to accelerate processing - Rotary position embeddings for extended sequence length - A newly trained tokenizer optimized with code and text data - Flash Attention 2.0 for improved efficiency - Streamlined parameters, eliminating unnecessary bias terms Data Collection Granite embedding r2 models are trained using data from four key sources: 1. Unsupervised title-body paired data scraped from the web 2. Publicly available paired with permissive, enterprise-friendly license 3. IBM-internal paired data targetting specific technical domains 4. IBM-generated synthetic data Notably, we do not use the popular MS-MARCO retrieval dataset in our training corpus due to its non-commercial license (many open-source models use this dataset due to its high quality). The underlying encoder models using GneissWeb, an IBM-curated dataset composed exclusively of open, commercial-friendly sources. For governance, all our data undergoes a data clearance process subject to technical, business, and governance review. This comprehensive process captures critical information about the data, including but not limited to their content description ownership, intended use, data classification, licensing information, usage restrictions, how the data will be acquired, as well as an assessment of sensitive information (i.e, personal information). Infrastructure We trained the granite embedding english r2 models using IBM's computing cluster, BlueVela Cluster, which is outfitted with NVIDIA H100 80GB GPUs. This cluster provides a scalable and efficient infrastructure for training our models over multiple GPUs. Ethical Considerations and Limitations Granite-embedding-english-r2 leverages both permissively licensed open-source and select proprietary data for enhanced performance. The training data for the base language model was filtered to remove text containing hate, abuse, and profanity. Granite-embedding-english-r2 is trained only for English texts, and has a context length of 8192 tokens (longer texts will be truncated to this size). - ⭐️ Learn about the latest updates with Granite: https://www.ibm.com/granite - 📄 Get started with tutorials, best practices, and prompt engineering advice: https://www.ibm.com/granite/docs/ - 💡 Learn about the latest Granite learning resources: https://ibm.biz/granite-learning-resources
granite-4.0-h-1b
Model Summary: Granite-4.0-H-1B is a lightweight instruct model finetuned from Granite-4.0-H-1B-Base using a combination of open source instruction datasets with permissive license and internally collected synthetic datasets. This model is developed using a diverse set of techniques including supervised finetuning, reinforcement learning, and model merging. - Developers: Granite Team, IBM - HF Collection: Granite 4.0 Nano Language Models HF Collection - GitHub Repository: ibm-granite/granite-4.0-nano-language-models - Website: Granite Docs - Release Date: October 28, 2025 - License: Apache 2.0 Supported Languages: English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese. Users may fine-tune Granite 4.0 Nano models to support languages beyond those included in this list. Intended use: Granite 4.0 Nano instruct models feature strong instruction following capabilities bringing advanced AI capabilities within reach for on-device deployments and research use cases. Additionally, their compact size makes them well-suited for fine-tuning on specialized domains without requiring massive compute resources. Capabilities Summarization Text classification Text extraction Question-answering Retrieval Augmented Generation (RAG) Code related tasks Function-calling tasks Multilingual dialog use cases Fill-In-the-Middle (FIM) code completions Need to test the examples. (especially the tool calling and RAG ones) --> Generation: This is a simple example of how to use Granite-4.0-H-1B model. Then, copy the snippet from the section that is relevant for your use case. Tool-calling: Granite-4.0-H-1B comes with enhanced tool calling capabilities, enabling seamless integration with external functions and APIs. To define a list of tools please follow OpenAI's function definition schema. This is an example of how to use Granite-4.0-H-1B model tool-calling ability: Benchmarks Metric 350M Dense H 350M Dense 1B Dense H 1B Dense Multilingual Benchmarks and thr included languages: MMMLU 11 ar, de, en, es, fr, ja, ko, pt, zh, bn, hi INCLUDE 14 hi, bn, ta, te, ar, de, es, fr, it, ja, ko, nl, pt, zh Granite-4.0-H-1B baseline is based on a decoder-only dense transformer architecture. Core components of this architecture are: GQA, Mamba2, MLP with SwiGLU, RMSNorm, and shared input/output embeddings. Number of layers 28 attention 4 attention / 28 Mamba2 40 attention 4 attention / 36 Mamba2 MLP / Shared expert hidden size 2048 2048 4096 4096 Training Data: Overall, our SFT data is largely comprised of three key sources: (1) publicly available datasets with permissive license, (2) internal synthetic data targeting specific capabilities, and (3) a select set of human-curated data. Infrastructure: We trained the Granite 4.0 Nano Language Models utilizing an NVIDIA GB200 NVL72 cluster hosted in CoreWeave. Intra-rack communication occurs via the 72-GPU NVLink domain, and a non-blocking, full Fat-Tree NDR 400 Gb/s InfiniBand network provides inter-rack communication. This cluster provides a scalable and efficient infrastructure for training our models over thousands of GPUs. Ethical Considerations and Limitations: Granite 4.0 Nano Instruct Models are primarily finetuned using instruction-response pairs mostly in English, but also multilingual data covering multiple languages. Although this model can handle multilingual dialog use cases, its performance might not be similar to English tasks. In such case, introducing a small number of examples (few-shot) can help the model in generating more accurate outputs. While this model has been aligned by keeping safety in consideration, the model may in some cases produce inaccurate, biased, or unsafe responses to user prompts. So we urge the community to use this model with proper safety testing and tuning tailored for their specific tasks. Resources - ⭐️ Learn about the latest updates with Granite: https://www.ibm.com/granite - 📄 Get started with tutorials, best practices, and prompt engineering advice: https://www.ibm.com/granite/docs/ - 💡 Learn about the latest Granite learning resources: https://ibm.biz/granite-learning-resources
granite-4.0-micro-GGUF
> [!NOTE] > This repository contains models that have been converted to the GGUF format with various quantizations from an IBM Granite base model. > > Please reference the base model's full model card here: > https://huggingface.co/ibm-granite/granite-4.0-micro
granite-4.0-h-small-GGUF
> [!NOTE] > This repository contains models that have been converted to the GGUF format with various quantizations from an IBM Granite base model. > > Please reference the base model's full model card here: > https://huggingface.co/ibm-granite/granite-4.0-h-small
granite-3.2-8b-instruct
Model Summary: Granite-3.2-8B-Instruct is an 8-billion-parameter, long-context AI model fine-tuned for thinking capabilities. Built on top of Granite-3.1-8B-Instruct, it has been trained using a mix of permissively licensed open-source datasets and internally generated synthetic data designed for reasoning tasks. The model allows controllability of its thinking capability, ensuring it is applied only when required. - Developers: Granite Team, IBM - Website: Granite Docs - Release Date: February 26th, 2025 - License: Apache 2.0 Supported Languages: English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese. However, users may finetune this Granite model for languages beyond these 12 languages. Intended Use: This model is designed to handle general instruction-following tasks and can be integrated into AI assistants across various domains, including business applications. Capabilities Thinking Summarization Text classification Text extraction Question-answering Retrieval Augmented Generation (RAG) Code related tasks Function-calling tasks Multilingual dialog use cases Long-context tasks including long document/meeting summarization, long document QA, etc. Generation: This is a simple example of how to use Granite-3.2-8B-Instruct model. Then, copy the snippet from the section that is relevant for your use case. Models ArenaHard Alpaca-Eval-2 MMLU PopQA TruthfulQA BigBenchHard DROP GSM8K HumanEval HumanEval+ IFEval AttaQ Llama-3.1-8B-Instruct 36.43 27.22 69.15 28.79 52.79 72.66 61.48 83.24 85.32 80.15 79.10 83.43 DeepSeek-R1-Distill-Llama-8B 17.17 21.85 45.80 13.25 47.43 65.71 44.46 72.18 67.54 62.91 66.50 42.87 Qwen-2.5-7B-Instruct 25.44 30.34 74.30 18.12 63.06 70.40 54.71 84.46 93.35 89.91 74.90 81.90 DeepSeek-R1-Distill-Qwen-7B 10.36 15.35 50.72 9.94 47.14 65.04 42.76 78.47 79.89 78.43 59.10 42.45 Granite-3.1-8B-Instruct 37.58 30.34 66.77 28.7 65.84 68.55 50.78 79.15 89.63 85.79 73.20 85.73 Granite-3.1-2B-Instruct 23.3 27.17 57.11 20.55 59.79 54.46 18.68 67.55 79.45 75.26 63.59 84.7 Granite-3.2-2B-Instruct 24.86 34.51 57.18 20.56 59.8 52.27 21.12 67.02 80.13 73.39 61.55 83.23 Granite-3.2-8B-Instruct 55.25 61.19 66.79 28.04 66.92 64.77 50.95 81.65 89.35 85.72 74.31 85.42 Training Data: Overall, our training data is largely comprised of two key sources: (1) publicly available datasets with permissive license, (2) internal synthetically generated data targeted to enhance reasoning capabilites. Infrastructure: We train Granite-3.2-8B-Instruct using IBM's super computing cluster, Blue Vela, which is outfitted with NVIDIA H100 GPUs. This cluster provides a scalable and efficient infrastructure for training our models over thousands of GPUs. Ethical Considerations and Limitations: Granite-3.2-8B-Instruct builds upon Granite-3.1-8B-Instruct, leveraging both permissively licensed open-source and select proprietary data for enhanced performance. Since it inherits its foundation from the previous model, all ethical considerations and limitations applicable to Granite-3.1-8B-Instruct remain relevant. Resources - ⭐️ Learn about the latest updates with Granite: https://www.ibm.com/granite - 📄 Get started with tutorials, best practices, and prompt engineering advice: https://www.ibm.com/granite/docs/ - 💡 Learn about the latest Granite learning resources: https://ibm.biz/granite-learning-resources
granite-embedding-30m-sparse
granite-4.0-h-tiny-GGUF
> [!NOTE] > This repository contains models that have been converted to the GGUF format with various quantizations from an IBM Granite base model. > > Please reference the base model's full model card here: > https://huggingface.co/ibm-granite/granite-4.0-h-tiny
granite-guardian-hap-38m
granite-4.0-350m
Model Summary: Granite-4.0-350M is a lightweight instruct model finetuned from Granite-4.0-350M-Base using a combination of open source instruction datasets with permissive license and internally collected synthetic datasets. This model is developed using a diverse set of techniques including supervised finetuning, reinforcement learning, and model merging. - Developers: Granite Team, IBM - HF Collection: Granite 4.0 Nano Language Models HF Collection - GitHub Repository: ibm-granite/granite-4.0-nano-language-models - Website: Granite Docs - Release Date: October 28, 2025 - License: Apache 2.0 Supported Languages: English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese. Users may fine-tune Granite 4.0 Nano models to support languages beyond those included in this list. Intended use: Granite 4.0 Nano instruct models feature strong instruction following capabilities bringing advanced AI capabilities within reach for on-device deployments and research use cases. Additionally, their compact size makes them well-suited for fine-tuning on specialized domains without requiring massive compute resources. Capabilities Summarization Text classification Text extraction Question-answering Retrieval Augmented Generation (RAG) Code related tasks Function-calling tasks Multilingual dialog use cases Fill-In-the-Middle (FIM) code completions Need to test the examples. (especially the tool calling and RAG ones) --> Generation: This is a simple example of how to use Granite-4.0-350M model. Then, copy the snippet from the section that is relevant for your use case. Tool-calling: Granite-4.0-350M comes with enhanced tool calling capabilities, enabling seamless integration with external functions and APIs. To define a list of tools please follow OpenAI's function definition schema. This is an example of how to use Granite-4.0-350M model tool-calling ability: Benchmarks Metric 350M Dense H 350M Dense 1B Dense H 1B Dense Multilingual Benchmarks and thr included languages: MMMLU 11 ar, de, en, es, fr, ja, ko, pt, zh, bn, hi INCLUDE 14 hi, bn, ta, te, ar, de, es, fr, it, ja, ko, nl, pt, zh Granite-4.0-350M baseline is based on a decoder-only dense transformer architecture. Core components of this architecture are: GQA, MLP with SwiGLU, RMSNorm, and shared input/output embeddings. Number of layers 28 attention 4 attention / 28 Mamba2 40 attention 4 attention / 36 Mamba2 MLP / Shared expert hidden size 2048 2048 4096 4096 Training Data: Overall, our SFT data is largely comprised of three key sources: (1) publicly available datasets with permissive license, (2) internal synthetic data targeting specific capabilities, and (3) a select set of human-curated data. Infrastructure: We trained the Granite 4.0 Nano Language Models utilizing an NVIDIA GB200 NVL72 cluster hosted in CoreWeave. Intra-rack communication occurs via the 72-GPU NVLink domain, and a non-blocking, full Fat-Tree NDR 400 Gb/s InfiniBand network provides inter-rack communication. This cluster provides a scalable and efficient infrastructure for training our models over thousands of GPUs. Ethical Considerations and Limitations: Granite 4.0 Nano Instruct Models are primarily finetuned using instruction-response pairs mostly in English, but also multilingual data covering multiple languages. Although this model can handle multilingual dialog use cases, its performance might not be similar to English tasks. In such case, introducing a small number of examples (few-shot) can help the model in generating more accurate outputs. While this model has been aligned by keeping safety in consideration, the model may in some cases produce inaccurate, biased, or unsafe responses to user prompts. So we urge the community to use this model with proper safety testing and tuning tailored for their specific tasks. Resources - ⭐️ Learn about the latest updates with Granite: https://www.ibm.com/granite - 📄 Get started with tutorials, best practices, and prompt engineering advice: https://www.ibm.com/granite/docs/ - 💡 Learn about the latest Granite learning resources: https://ibm.biz/granite-learning-resources
granite-3.2-2b-instruct
Model Summary: Granite-3.2-2B-Instruct is an 2-billion-parameter, long-context AI model fine-tuned for thinking capabilities. Built on top of Granite-3.1-2B-Instruct, it has been trained using a mix of permissively licensed open-source datasets and internally generated synthetic data designed for reasoning tasks. The model allows controllability of its thinking capability, ensuring it is applied only when required. - Developers: Granite Team, IBM - Website: Granite Docs - Release Date: February 26th, 2025 - License: Apache 2.0 Supported Languages: English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese. However, users may finetune this Granite model for languages beyond these 12 languages. Intended Use: This model is designed to handle general instruction-following tasks and can be integrated into AI assistants across various domains, including business applications. Capabilities Thinking Summarization Text classification Text extraction Question-answering Retrieval Augmented Generation (RAG) Code related tasks Function-calling tasks Multilingual dialog use cases Long-context tasks including long document/meeting summarization, long document QA, etc. Generation: This is a simple example of how to use Granite-3.2-2B-Instruct model. Then, copy the snippet from the section that is relevant for your use case. Models ArenaHard Alpaca-Eval-2 MMLU PopQA TruthfulQA BigBenchHard DROP GSM8K HumanEval HumanEval+ IFEval AttaQ Llama-3.1-8B-Instruct 36.43 27.22 69.15 28.79 52.79 72.66 61.48 83.24 85.32 80.15 79.10 83.43 DeepSeek-R1-Distill-Llama-8B 17.17 21.85 45.80 13.25 47.43 65.71 44.46 72.18 67.54 62.91 66.50 42.87 Qwen-2.5-7B-Instruct 25.44 30.34 74.30 18.12 63.06 70.40 54.71 84.46 93.35 89.91 74.90 81.90 DeepSeek-R1-Distill-Qwen-7B 10.36 15.35 50.72 9.94 47.14 65.04 42.76 78.47 79.89 78.43 59.10 42.45 Granite-3.1-8B-Instruct 37.58 30.34 66.77 28.7 65.84 68.55 50.78 79.15 89.63 85.79 73.20 85.73 Granite-3.1-2B-Instruct 23.3 27.17 57.11 20.55 59.79 54.46 18.68 67.55 79.45 75.26 63.59 84.7 Granite-3.2-8B-Instruct 55.25 61.19 66.79 28.04 66.92 64.77 50.95 81.65 89.35 85.72 74.31 85.42 Granite-3.2-2B-Instruct 24.86 34.51 57.18 20.56 59.8 52.27 21.12 67.02 80.13 73.39 61.55 83.23 Training Data: Overall, our training data is largely comprised of two key sources: (1) publicly available datasets with permissive license, (2) internal synthetically generated data targeted to enhance reasoning capabilites. Infrastructure: We train Granite-3.2-2B-Instruct using IBM's super computing cluster, Blue Vela, which is outfitted with NVIDIA H100 GPUs. This cluster provides a scalable and efficient infrastructure for training our models over thousands of GPUs. Ethical Considerations and Limitations: Granite-3.2-2B-Instruct builds upon Granite-3.1-2B-Instruct, leveraging both permissively licensed open-source and select proprietary data for enhanced performance. Since it inherits its foundation from the previous model, all ethical considerations and limitations applicable to Granite-3.1-2B-Instruct remain relevant. Resources - ⭐️ Learn about the latest updates with Granite: https://www.ibm.com/granite - 📄 Get started with tutorials, best practices, and prompt engineering advice: https://www.ibm.com/granite/docs/ - 💡 Learn about the latest Granite learning resources: https://ibm.biz/granite-learning-resources
granite-3.3-8b-instruct-GGUF
granite-4.0-1b
Model Summary: Granite-4.0-1B is a lightweight instruct model finetuned from Granite-4.0-1B-Base using a combination of open source instruction datasets with permissive license and internally collected synthetic datasets. This model is developed using a diverse set of techniques including supervised finetuning, reinforcement learning, and model merging. - Developers: Granite Team, IBM - HF Collection: Granite 4.0 Nano Language Models HF Collection - GitHub Repository: ibm-granite/granite-4.0-nano-language-models - Website: Granite Docs - Release Date: October 28, 2025 - License: Apache 2.0 Supported Languages: English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese. Users may fine-tune Granite 4.0 Nano models to support languages beyond those included in this list. Intended use: Granite 4.0 Nano instruct models feature strong instruction following capabilities bringing advanced AI capabilities within reach for on-device deployments and research use cases. Additionally, their compact size makes them well-suited for fine-tuning on specialized domains without requiring massive compute resources. Capabilities Summarization Text classification Text extraction Question-answering Retrieval Augmented Generation (RAG) Code related tasks Function-calling tasks Multilingual dialog use cases Fill-In-the-Middle (FIM) code completions Need to test the examples. (especially the tool calling and RAG ones) --> Generation: This is a simple example of how to use Granite-4.0-1B model. Then, copy the snippet from the section that is relevant for your use case. Tool-calling: Granite-4.0-1B comes with enhanced tool calling capabilities, enabling seamless integration with external functions and APIs. To define a list of tools please follow OpenAI's function definition schema. This is an example of how to use Granite-4.0-1B model tool-calling ability: Benchmarks Metric 350M Dense H 350M Dense 1B Dense H 1B Dense Multilingual Benchmarks and thr included languages: MMMLU 11 ar, de, en, es, fr, ja, ko, pt, zh, bn, hi INCLUDE 14 hi, bn, ta, te, ar, de, es, fr, it, ja, ko, nl, pt, zh Granite-4.0-1B baseline is based on a decoder-only dense transformer architecture. Core components of this architecture are: GQA, MLP with SwiGLU, RMSNorm, and shared input/output embeddings. Number of layers 28 attention 4 attention / 28 Mamba2 40 attention 4 attention / 36 Mamba2 MLP / Shared expert hidden size 2048 2048 4096 4096 Training Data: Overall, our SFT data is largely comprised of three key sources: (1) publicly available datasets with permissive license, (2) internal synthetic data targeting specific capabilities, and (3) a select set of human-curated data. Infrastructure: We trained the Granite 4.0 Nano Language Models utilizing an NVIDIA GB200 NVL72 cluster hosted in CoreWeave. Intra-rack communication occurs via the 72-GPU NVLink domain, and a non-blocking, full Fat-Tree NDR 400 Gb/s InfiniBand network provides inter-rack communication. This cluster provides a scalable and efficient infrastructure for training our models over thousands of GPUs. Ethical Considerations and Limitations: Granite 4.0 Nano Instruct Models are primarily finetuned using instruction-response pairs mostly in English, but also multilingual data covering multiple languages. Although this model can handle multilingual dialog use cases, its performance might not be similar to English tasks. In such case, introducing a small number of examples (few-shot) can help the model in generating more accurate outputs. While this model has been aligned by keeping safety in consideration, the model may in some cases produce inaccurate, biased, or unsafe responses to user prompts. So we urge the community to use this model with proper safety testing and tuning tailored for their specific tasks. Resources - ⭐️ Learn about the latest updates with Granite: https://www.ibm.com/granite - 📄 Get started with tutorials, best practices, and prompt engineering advice: https://www.ibm.com/granite/docs/ - 💡 Learn about the latest Granite learning resources: https://ibm.biz/granite-learning-resources
granite-3.1-2b-instruct
Model Summary: Granite-3.1-2B-Instruct is a 2B parameter long-context instruct model finetuned from Granite-3.1-2B-Base using a combination of open source instruction datasets with permissive license and internally collected synthetic datasets tailored for solving long context problems. This model is developed using a diverse set of techniques with a structured chat format, including supervised finetuning, model alignment using reinforcement learning, and model merging. - Developers: Granite Team, IBM - GitHub Repository: ibm-granite/granite-3.1-language-models - Website: Granite Docs - Paper: Granite 3.1 Language Models (coming soon) - Release Date: December 18th, 2024 - License: Apache 2.0 Supported Languages: English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese. Users may finetune Granite 3.1 models for languages beyond these 12 languages. Intended Use: The model is designed to respond to general instructions and can be used to build AI assistants for multiple domains, including business applications. Capabilities Summarization Text classification Text extraction Question-answering Retrieval Augmented Generation (RAG) Code related tasks Function-calling tasks Multilingual dialog use cases Long-context tasks including long document/meeting summarization, long document QA, etc. Generation: This is a simple example of how to use Granite-3.1-2B-Instruct model. Then, copy the snippet from the section that is relevant for your use case. Models ARC-Challenge Hellaswag MMLU TruthfulQA Winogrande GSM8K Avg Granite-3.1-8B-Instruct 62.62 84.48 65.34 66.23 75.37 73.84 71.31 Granite-3.1-2B-Instruct 54.61 75.14 55.31 59.42 67.48 52.76 60.79 Granite-3.1-3B-A800M-Instruct 50.42 73.01 52.19 49.71 64.87 48.97 56.53 Granite-3.1-1B-A400M-Instruct 42.66 65.97 26.13 46.77 62.35 33.88 46.29 Models IFEval BBH MATH Lvl 5 GPQA MUSR MMLU-Pro Avg Granite-3.1-8B-Instruct 72.08 34.09 21.68 8.28 19.01 28.19 30.55 Granite-3.1-2B-Instruct 62.86 21.82 11.33 5.26 4.87 20.21 21.06 Granite-3.1-3B-A800M-Instruct 55.16 16.69 10.35 5.15 2.51 12.75 17.1 Granite-3.1-1B-A400M-Instruct 46.86 6.18 4.08 0 0.78 2.41 10.05 Model Architecture: Granite-3.1-2B-Instruct is based on a decoder-only dense transformer architecture. Core components of this architecture are: GQA and RoPE, MLP with SwiGLU, RMSNorm, and shared input/output embeddings. Training Data: Overall, our SFT data is largely comprised of three key sources: (1) publicly available datasets with permissive license, (2) internal synthetic data targeting specific capabilities including long-context tasks, and (3) very small amounts of human-curated data. A detailed attribution of datasets can be found in the Granite 3.0 Technical Report, Granite 3.1 Technical Report (coming soon), and Accompanying Author List. Infrastructure: We train Granite 3.1 Language Models using IBM's super computing cluster, Blue Vela, which is outfitted with NVIDIA H100 GPUs. This cluster provides a scalable and efficient infrastructure for training our models over thousands of GPUs. Ethical Considerations and Limitations: Granite 3.1 Instruct Models are primarily finetuned using instruction-response pairs mostly in English, but also multilingual data covering eleven languages. Although this model can handle multilingual dialog use cases, its performance might not be similar to English tasks. In such case, introducing a small number of examples (few-shot) can help the model in generating more accurate outputs. While this model has been aligned by keeping safety in consideration, the model may in some cases produce inaccurate, biased, or unsafe responses to user prompts. So we urge the community to use this model with proper safety testing and tuning tailored for their specific tasks. Resources - ⭐️ Learn about the latest updates with Granite: https://www.ibm.com/granite - 📄 Get started with tutorials, best practices, and prompt engineering advice: https://www.ibm.com/granite/docs/ - 💡 Learn about the latest Granite learning resources: https://ibm.biz/granite-learning-resources
granite-3.3-8b-base
granite-8b-code-instruct-4k-GGUF
granite-4.0-h-micro-base
Model Summary: Granite-4.0-H-Micro-Base is a decoder-only, long-context language model designed for a wide range of text-to-text generation tasks. It also supports Fill-in-the-Middle (FIM) code completion through the use of specialized prefix and suffix tokens. The model is trained from scratch on approximately 18 trillion tokens following a four-stage training strategy: 10 trillion tokens in the first stage, 5 trillion in the second, 2 trillion in the third, and 0.5 trillion in the final stage. - Developers: Granite Team, IBM - HF Collection: Granite 4.0 Language Models HF Collection - GitHub Repository: ibm-granite/granite-4.0-language-models - Website: Granite Docs - Release Date: October 2nd, 2025 - License: Apache 2.0 Supported Languages: English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese. Users may finetune Granite 4.0 models for languages beyond these languages. Intended Use: Prominent use cases of LLMs in text-to-text generation include summarization, text classification, extraction, question-answering, code-completion (including FIM), and long-context generation tasks. All Granite Base models are able to handle these tasks as they were trained on a large amount of data from various domains. Moreover, they can serve as baseline to create specialized models for specific application scenarios. Generation: This is a simple example of how to use Granite-4.0-H-Micro-Base model. Then, copy the code snippet below to run the example. Benchmarks Metric Micro Dense H Micro Dense H Tiny MoE H Small MoE HumanEval pass@1 [StarCoder Prompt] 76.19 73.72 77.59 83.66 Multilingual Benchmarks and thr included languages: MMMLU 11 ar, de, en, es, fr, ja, ko, pt, zh, bn, hi INCLUDE 14 hindi, bengali, tamil, telugu, arabic, german, spanish, french, italian, japanese, korean, dutch, portuguese, chinese --> hi, bn, ta, te, ar, de, es, fr, it, ja, ko, nl, pt, zh Model Architecture: Granite-4.0-H-Micro-Base is based on a decoder-only dense transformer architecture. Core components of this architecture are: GQA, Mamba2, MLP with SwiGLU, RMSNorm, and shared input/output embeddings. Model Micro Dense H Micro Dense H Tiny MoE H Small MoE Number of layers 40 attention 4 attention / 36 Mamba2 4 attention / 36 Mamba2 4 attention / 36 Mamba2 MLP / Shared expert hidden size 8192 8192 1024 1536 Training Data: This model is trained on a mix of open source and proprietary data following a four-stage training strategy. Stage Characteristics Micro Dense H Micro Dense H Tiny MoE H Small MoE I General mixture of training data, warmup, and power scheduler for learning rate. 10 10 15 15 II General mixture of training data with higher percentages of code and math with power scheduler for learning rate. 2 5 5 5 III High quality training data, exponential decay of learning rate. 2 2 2 2 IV High quality training data, linear decay to zero for learning rate. 0.5 0.5 0.5 0.5 Infrastructure: We trained the Granite 4.0 Language Models utilizing an NVIDIA GB200 NVL72 cluster hosted in CoreWeave. Intra-rack communication occurs via the 72-GPU NVLink domain, and a non-blocking, full Fat-Tree NDR 400 Gb/s InfiniBand network provides inter-rack communication. This cluster provides a scalable and efficient infrastructure for training our models over thousands of GPUs. Ethical Considerations and Limitations: The use of Large Language Models involves risks and ethical considerations people must be aware of, including but not limited to: bias and fairness, misinformation, and autonomous decision-making. Granite-4.0-H-Micro-Base model is not the exception in this regard. Even though this model is suited for multiple generative AI tasks, it has not undergone any safety alignment, there it may produce problematic outputs. Additionally, it remains uncertain whether smaller models might exhibit increased susceptibility to hallucination in generation scenarios by copying text verbatim from the training dataset due to their reduced sizes and memorization capacities. This aspect is currently an active area of research, and we anticipate more rigorous exploration, comprehension, and mitigations in this domain. Regarding ethics, a latent risk associated with all Large Language Models is their malicious utilization. We urge the community to use Granite-4.0-H-Micro-Base model with ethical intentions and in a responsible way. Resources - ⭐️ Learn about the latest updates with Granite: https://www.ibm.com/granite - 📄 Get started with tutorials, best practices, and prompt engineering advice: https://www.ibm.com/granite/docs/ - 💡 Learn about the latest Granite learning resources: https://github.com/ibm-granite-community/
granite-3.1-3b-a800m-instruct
Model Summary: Granite-3.1-3B-A800M-Instruct is a 3B parameter long-context instruct model finetuned from Granite-3.1-3B-A800M-Base using a combination of open source instruction datasets with permissive license and internally collected synthetic datasets tailored for solving long context problems. This model is developed using a diverse set of techniques with a structured chat format, including supervised finetuning, model alignment using reinforcement learning, and model merging. - Developers: Granite Team, IBM - GitHub Repository: ibm-granite/granite-3.1-language-models - Website: Granite Docs - Paper: Granite 3.1 Language Models (coming soon) - Release Date: December 18th, 2024 - License: Apache 2.0 Supported Languages: English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese. Users may finetune Granite 3.1 models for languages beyond these 12 languages. Intended Use: The model is designed to respond to general instructions and can be used to build AI assistants for multiple domains, including business applications. Capabilities Summarization Text classification Text extraction Question-answering Retrieval Augmented Generation (RAG) Code related tasks Function-calling tasks Multilingual dialog use cases Long-context tasks including long document/meeting summarization, long document QA, etc. Generation: This is a simple example of how to use Granite-3.1-3B-A800M-Instruct model. Then, copy the snippet from the section that is relevant for your use case. Models ARC-Challenge Hellaswag MMLU TruthfulQA Winogrande GSM8K Avg Granite-3.1-8B-Instruct 62.62 84.48 65.34 66.23 75.37 73.84 71.31 Granite-3.1-2B-Instruct 54.61 75.14 55.31 59.42 67.48 52.76 60.79 Granite-3.1-3B-A800M-Instruct 50.42 73.01 52.19 49.71 64.87 48.97 56.53 Granite-3.1-1B-A400M-Instruct 42.66 65.97 26.13 46.77 62.35 33.88 46.29 Models IFEval BBH MATH Lvl 5 GPQA MUSR MMLU-Pro Avg Granite-3.1-8B-Instruct 72.08 34.09 21.68 8.28 19.01 28.19 30.55 Granite-3.1-2B-Instruct 62.86 21.82 11.33 5.26 4.87 20.21 21.06 Granite-3.1-3B-A800M-Instruct 55.16 16.69 10.35 5.15 2.51 12.75 17.1 Granite-3.1-1B-A400M-Instruct 46.86 6.18 4.08 0 0.78 2.41 10.05 Model Architecture: Granite-3.1-3B-A800M-Instruct is based on a decoder-only dense transformer architecture. Core components of this architecture are: GQA and RoPE, MLP with SwiGLU, RMSNorm, and shared input/output embeddings. Training Data: Overall, our SFT data is largely comprised of three key sources: (1) publicly available datasets with permissive license, (2) internal synthetic data targeting specific capabilities including long-context tasks, and (3) very small amounts of human-curated data. A detailed attribution of datasets can be found in the Granite 3.0 Technical Report, Granite 3.1 Technical Report (coming soon), and Accompanying Author List. Infrastructure: We train Granite 3.1 Language Models using IBM's super computing cluster, Blue Vela, which is outfitted with NVIDIA H100 GPUs. This cluster provides a scalable and efficient infrastructure for training our models over thousands of GPUs. Ethical Considerations and Limitations: Granite 3.1 Instruct Models are primarily finetuned using instruction-response pairs mostly in English, but also multilingual data covering eleven languages. Although this model can handle multilingual dialog use cases, its performance might not be similar to English tasks. In such case, introducing a small number of examples (few-shot) can help the model in generating more accurate outputs. While this model has been aligned by keeping safety in consideration, the model may in some cases produce inaccurate, biased, or unsafe responses to user prompts. So we urge the community to use this model with proper safety testing and tuning tailored for their specific tasks. Resources - ⭐️ Learn about the latest updates with Granite: https://www.ibm.com/granite - 📄 Get started with tutorials, best practices, and prompt engineering advice: https://www.ibm.com/granite/docs/ - 💡 Learn about the latest Granite learning resources: https://ibm.biz/granite-learning-resources
granite-3b-code-base-2k
granite-8b-code-instruct-128k
granite-timeseries-patchtst
`PatchTST` is a transformer-based model for time series modeling tasks, including forecasting, regression, and classification. This repository contains a pre-trained `PatchTST` model encompassing all seven channels of the `ETTh1` dataset. This particular pre-trained model produces a Mean Squared Error (MSE) of 0.3881 on the `test` split of the `ETTh1` dataset when forecasting 96 hours into the future with a historical data window of 512 hours. For training and evaluating a `PatchTST` model, you can refer to this demo notebook. The `PatchTST` model was proposed in A Time Series is Worth 64 Words: Long-term Forecasting with Transformers by Yuqi Nie, Nam H. Nguyen, Phanwadee Sinthong, Jayant Kalagnanam. At a high level the model vectorizes time series into patches of a given size and encodes the resulting sequence of vectors via a Transformer that then outputs the prediction length forecast via an appropriate head. The model is based on two key components: (i) segmentation of time series into subseries-level patches which are served as input tokens to Transformer; (ii) channel-independence where each channel contains a single univariate time series that shares the same embedding and Transformer weights across all the series. The patching design naturally has three-fold benefit: local semantic information is retained in the embedding; computation and memory usage of the attention maps are quadratically reduced given the same look-back window; and the model can attend longer history. Our channel-independent patch time series Transformer (PatchTST) can improve the long-term forecasting accuracy significantly when compared with that of SOTA Transformer-based models. In addition, PatchTST has a modular design to seamlessly support masked time series pre-training as well as direct time series forecasting, classification, and regression. - Repository: PatchTST Hugging Face - Paper: PatchTST ICLR 2023 paper - Demo: Get started with PatchTST This pre-trained model can be employed for fine-tuning or evaluation using any Electrical Transformer dataset that has the same channels as the `ETTh1` dataset, specifically: `HUFL, HULL, MUFL, MULL, LUFL, LULL, OT`. The model is designed to predict the next 96 hours based on the input values from the preceding 512 hours. It is crucial to normalize the data. For a more comprehensive understanding of data pre-processing, please consult the paper or the demo. `ETTh1`/train split. Train/validation/test splits are shown in the demo. The following hyperparameters were used during training: - learningrate: 5e-05 - trainbatchsize: 8 - evalbatchsize: 8 - seed: 42 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 - lrschedulertype: linear - numepochs: 10 | Training Loss | Epoch | Step | Validation Loss | |:-------------:|:-----:|:-----:|:---------------:| | 0.4306 | 1.0 | 1005 | 0.7268 | | 0.3641 | 2.0 | 2010 | 0.7456 | | 0.348 | 3.0 | 3015 | 0.7161 | | 0.3379 | 4.0 | 4020 | 0.7428 | | 0.3284 | 5.0 | 5025 | 0.7681 | | 0.321 | 6.0 | 6030 | 0.7842 | | 0.314 | 7.0 | 7035 | 0.7991 | | 0.3088 | 8.0 | 8040 | 0.8021 | | 0.3053 | 9.0 | 9045 | 0.8199 | | 0.3019 | 10.0 | 10050 | 0.8173 | `ETTh1`/test split. Train/validation/test splits are shown in the demo. Results It achieves a MSE of 0.3881 on the evaluation dataset. - Transformers 4.36.0.dev0 - Pytorch 2.0.1 - Datasets 2.14.4 - Tokenizers 0.14.1
granite-docling-258M-mlx
Granite Docling is a multimodal Image-Text-to-Text model engineered for efficient document conversion. It preserves the core features of Docling while maintaining seamless integration with DoclingDocuments to ensure full compatibility. This model was converted to MLX format from `ibm-granite/granite-docling-258M` using mlx-vlm version 0.3.3. Refer to the original model card for more details on the model. 💡 This MLX model is optimized to run efficiently on Apple Silicon Macs. If you run through 🐥Docling, it will automatically choose the MLX version of the Granite-Docling model. You can select it with the CLI options shown below: You can also run plain mlx-vlm to generate predictions. To run with the `mlx-vlm` python SDK, parse the output as a `DoclingDocument` and export to various formats (e.g. Markdown, HTML), please refer to the code below.
granite-4.0-h-micro-GGUF
> [!NOTE] > This repository contains models that have been converted to the GGUF format with various quantizations from an IBM Granite base model. > > Please reference the base model's full model card here: > https://huggingface.co/ibm-granite/granite-4.0-h-micro
granite-3.1-2b-base
Model Summary: Granite-3.1-2B-Base extends the context length of Granite-3.0-2B-Base from 4K to 128K using a progressive training strategy by increasing the supported context length in increments while adjusting RoPE theta until the model has successfully adapted to desired length of 128K. This long-context pre-training stage was performed using approximately 500B tokens. - Developers: Granite Team, IBM - GitHub Repository: ibm-granite/granite-3.1-language-models - Website: Granite Docs - Paper: Granite 3.1 Language Models (coming soon) - Release Date: December 18th, 2024 - License: Apache 2.0 Supported Languages: English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese. Users may finetune Granite 3.1 models for languages beyond these 12 languages. Intended Use: Prominent use cases of LLMs in text-to-text generation include summarization, text classification, extraction, question-answering, and other long-context tasks. All Granite Base models are able to handle these tasks as they were trained on a large amount of data from various domains. Moreover, they can serve as baseline to create specialized models for specific application scenarios. Generation: This is a simple example of how to use Granite-3.1-2B-Base model. Then, copy the code snippet below to run the example. Models ARC-Challenge Hellaswag MMLU TruthfulQA Winogrande GSM8K Avg Granite-3.1-8B-Base 63.99 83.27 63.45 51.29 78.92 60.19 66.85 Granite-3.1-2B-Base 53.58 77.67 52.86 39.02 72.84 47.99 57.32 Granite-3.1-3B-A800M-Base 50.76 74.45 48.31 39.91 69.29 40.56 53.88 Granite-3.1-1B-A400M-Base 39.42 66.13 26.53 37.67 2.03 18.87 31.78 Models IFEval BBH MATH Lvl 5 GPQA MUSR MMLU-Pro Avg Granite-3.1-8B-Base 42.21 26.02 9.52 9.51 8.36 24.8 20.07 Granite-3.1-2B-Base 35.22 16.84 5.59 3.69 3.9 13.9 13.19 Granite-3.1-3B-A800M-Base 29.96 11.91 4 3.69 1.11 8.81 9.91 Granite-3.1-1B-A400M-Base 25.19 6.43 2.19 0.22 1.76 1.55 6.22 Model Architecture: Granite-3.1-2B-Base is based on a decoder-only dense transformer architecture. Core components of this architecture are: GQA and RoPE, MLP with SwiGLU, RMSNorm, and shared input/output embeddings. Training Data: This model is trained on a mix of open source and proprietary data following a three-stage training strategy. Stage 1 data: The data for stage 1 is sourced from diverse domains, such as: web, code, academic sources, books, and math data. Stage 2 data: The data for stage 2 comprises a curated mix of high-quality data from the same domains, plus multilingual and instruction data. The goal of this second training phase is to enhance the model’s performance on specific tasks. Stage 3 data: The data for stage 3 consists of original stage-2 pretraining data with additional synthetic long-context data in form of QA/summary pairs where the answer contains a recitation of the related paragraph before the answer. A detailed attribution of datasets can be found in the Granite 3.0 Technical Report, Granite 3.1 Technical Report (coming soon), and Accompanying Author List. Infrastructure: We train Granite 3.1 Language Models using IBM's super computing cluster, Blue Vela, which is outfitted with NVIDIA H100 GPUs. This cluster provides a scalable and efficient infrastructure for training our models over thousands of GPUs. Ethical Considerations and Limitations: The use of Large Language Models involves risks and ethical considerations people must be aware of, including but not limited to: bias and fairness, misinformation, and autonomous decision-making. Granite-3.1-2B-Base model is not the exception in this regard. Even though this model is suited for multiple generative AI tasks, it has not undergone any safety alignment, there it may produce problematic outputs. Additionally, it remains uncertain whether smaller models might exhibit increased susceptibility to hallucination in generation scenarios by copying text verbatim from the training dataset due to their reduced sizes and memorization capacities. This aspect is currently an active area of research, and we anticipate more rigorous exploration, comprehension, and mitigations in this domain. Regarding ethics, a latent risk associated with all Large Language Models is their malicious utilization. We urge the community to use Granite-3.1-2B-Base model with ethical intentions and in a responsible way. Resources - ⭐️ Learn about the latest updates with Granite: https://www.ibm.com/granite - 📄 Get started with tutorials, best practices, and prompt engineering advice: https://www.ibm.com/granite/docs/ - 💡 Learn about the latest Granite learning resources: https://ibm.biz/granite-learning-resources
granite-3.0-2b-instruct
Model Summary: Granite-3.0-2B-Instruct is a 2B parameter model finetuned from Granite-3.0-2B-Base using a combination of open source instruction datasets with permissive license and internally collected synthetic datasets. This model is developed using a diverse set of techniques with a structured chat format, including supervised finetuning, model alignment using reinforcement learning, and model merging. - Developers: Granite Team, IBM - GitHub Repository: ibm-granite/granite-3.0-language-models - Website: Granite Docs - Paper: Granite 3.0 Language Models - Release Date: October 21st, 2024 - License: Apache 2.0 Supported Languages: English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese. Users may finetune Granite 3.0 models for languages beyond these 12 languages. Intended use: The model is designed to respond to general instructions and can be used to build AI assistants for multiple domains, including business applications. Capabilities Summarization Text classification Text extraction Question-answering Retrieval Augmented Generation (RAG) Code related tasks Function-calling tasks Multilingual dialog use cases Generation: This is a simple example of how to use Granite-3.0-2B-Instruct model. Then, copy the snippet from the section that is relevant for your use case. Model Architecture: Granite-3.0-2B-Instruct is based on a decoder-only dense transformer architecture. Core components of this architecture are: GQA and RoPE, MLP with SwiGLU, RMSNorm, and shared input/output embeddings. | Model | 2B Dense | 8B Dense | 1B MoE | 3B MoE | | :-------- | :-------- | :--------| :--------| :--------| | Embedding size | 2048 | 4096 | 1024 | 1536 | | Number of layers | 40 | 40 | 24 | 32 | | Attention head size | 64 | 128 | 64 | 64 | | Number of attention heads | 32 | 32 | 16 | 24 | | Number of KV heads | 8 | 8 | 8 | 8 | | MLP hidden size | 8192 | 12800 | 512 | 512 | | MLP activation | SwiGLU | SwiGLU | SwiGLU | SwiGLU | | Number of Experts | — | — | 32 | 40 | | MoE TopK | — | — | 8 | 8 | | Initialization std | 0.1 | 0.1 | 0.1 | 0.1 | | Sequence Length | 4096 | 4096 | 4096 | 4096 | | Position Embedding | RoPE | RoPE | RoPE | RoPE | | # Parameters | 2.5B | 8.1B | 1.3B | 3.3B | | # Active Parameters | 2.5B | 8.1B | 400M | 800M | | # Training tokens | 12T | 12T | 10T | 10T | Training Data: Overall, our SFT data is largely comprised of three key sources: (1) publicly available datasets with permissive license, (2) internal synthetic data targeting specific capabilities, and (3) very small amounts of human-curated data. A detailed attribution of datasets can be found in the Granite Technical Report and Accompanying Author List. Infrastructure: We train Granite 3.0 Language Models using IBM's super computing cluster, Blue Vela, which is outfitted with NVIDIA H100 GPUs. This cluster provides a scalable and efficient infrastructure for training our models over thousands of GPUs while minimizing environmental impact by utilizing 100% renewable energy sources. Ethical Considerations and Limitations: Granite 3.0 Instruct Models are primarily finetuned using instruction-response pairs mostly in English, but also multilingual data covering eleven languages. Although this model can handle multilingual dialog use cases, its performance might not be similar to English tasks. In such case, introducing a small number of examples (few-shot) can help the model in generating more accurate outputs. While this model has been aligned by keeping safety in consideration, the model may in some cases produce inaccurate, biased, or unsafe responses to user prompts. So we urge the community to use this model with proper safety testing and tuning tailored for their specific tasks. Resources - ⭐️ Learn about the latest updates with Granite: https://www.ibm.com/granite - 📄 Get started with tutorials, best practices, and prompt engineering advice: https://www.ibm.com/granite/docs/ - 💡 Learn about the latest Granite learning resources: https://ibm.biz/granite-learning-resources
granite-guardian-3.3-8b
granite-guardian-3.2-3b-a800m
granite-8b-code-instruct-4k
Model Summary Granite-8B-Code-Instruct-4K is a 8B parameter model fine tuned from Granite-8B-Code-Base-4K on a combination of permissively licensed instruction data to enhance instruction following capabilities including logical reasoning and problem-solving skills. - Developers: IBM Research - GitHub Repository: ibm-granite/granite-code-models - Paper: Granite Code Models: A Family of Open Foundation Models for Code Intelligence - Release Date: May 6th, 2024 - License: Apache 2.0. Usage Intended use The model is designed to respond to coding related instructions and can be used to build coding assistants. Generation This is a simple example of how to use Granite-8B-Code-Instruct-4K model. Training Data Granite Code Instruct models are trained on the following types of data. Code Commits Datasets: we sourced code commits data from the CommitPackFT dataset, a filtered version of the full CommitPack dataset. From CommitPackFT dataset, we only consider data for 92 programming languages. Our inclusion criteria boils down to selecting programming languages common across CommitPackFT and the 116 languages that we considered to pretrain the code-base model (Granite-8B-Code-Base). Math Datasets: We consider two high-quality math datasets, MathInstruct and MetaMathQA. Due to license issues, we filtered out GSM8K-RFT and Camel-Math from MathInstruct dataset. Code Instruction Datasets: We use Glaive-Code-Assistant-v3, Glaive-Function-Calling-v2, NL2SQL11 and a small collection of synthetic API calling datasets. Language Instruction Datasets: We include high-quality datasets such as HelpSteer and an open license-filtered version of Platypus. We also include a collection of hardcoded prompts to ensure our model generates correct outputs given inquiries about its name or developers. Infrastructure We train the Granite Code models using two of IBM's super computing clusters, namely Vela and Blue Vela, both outfitted with NVIDIA A100 and H100 GPUs respectively. These clusters provide a scalable and efficient infrastructure for training our models over thousands of GPUs. Ethical Considerations and Limitations Granite code instruct models are primarily finetuned using instruction-response pairs across a specific set of programming languages. Thus, their performance may be limited with out-of-domain programming languages. In this situation, it is beneficial providing few-shot examples to steer the model's output. Moreover, developers should perform safety testing and target-specific tuning before deploying these models on critical applications. The model also inherits ethical considerations and limitations from its base model. For more information, please refer to Granite-8B-Code-Base-4K model card.
granite-4.0-micro-base
Model Summary: Granite-4.0-Micro-Base is a decoder-only, long-context language model designed for a wide range of text-to-text generation tasks. It also supports Fill-in-the-Middle (FIM) code completion through the use of specialized prefix and suffix tokens. The model is trained from scratch on approximately 15 trillion tokens following a four-stage training strategy: 10 trillion tokens in the first stage, 2 trillion in the second, another 2 trillion in the third, and 0.5 trillion in the final stage. - Developers: Granite Team, IBM - HF Collection: Granite 4.0 Language Models HF Collection - GitHub Repository: ibm-granite/granite-4.0-language-models - Website: Granite Docs - Release Date: October 2nd, 2025 - License: Apache 2.0 Supported Languages: English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese. Users may finetune Granite 4.0 models for languages beyond these languages. Intended Use: Prominent use cases of LLMs in text-to-text generation include summarization, text classification, extraction, question-answering, code-completion (including FIM), and long-context generation tasks. All Granite Base models are able to handle these tasks as they were trained on a large amount of data from various domains. Moreover, they can serve as baseline to create specialized models for specific application scenarios. Generation: This is a simple example of how to use Granite-4.0-Micro-Base model. Then, copy the code snippet below to run the example. Benchmarks Metric Micro Dense H Micro Dense H Tiny MoE H Small MoE HumanEval pass@1 [StarCoder Prompt] 76.19 73.72 77.59 83.66 Multilingual Benchmarks and thr included languages: MMMLU 11 ar, de, en, es, fr, ja, ko, pt, zh, bn, hi INCLUDE 14 hindi, bengali, tamil, telugu, arabic, german, spanish, french, italian, japanese, korean, dutch, portuguese, chinese --> hi, bn, ta, te, ar, de, es, fr, it, ja, ko, nl, pt, zh Granite-4.0-Micro-Base is based on a decoder-only dense transformer architecture. Core components of this architecture are: GQA, RoPE, MLP with SwiGLU, RMSNorm, and shared input/output embeddings. Model Micro Dense H Micro Dense H Tiny MoE H Small MoE Number of layers 40 attention 4 attention / 36 Mamba2 4 attention / 36 Mamba2 4 attention / 36 Mamba2 MLP / Shared expert hidden size 8192 8192 1024 1536 Training Data: This model is trained on a mix of open source and proprietary data following a four-stage training strategy. Stage Characteristics Micro Dense H Micro Dense H Tiny MoE H Small MoE I General mixture of training data, warmup, and power scheduler for learning rate. 10 10 15 15 II General mixture of training data with higher percentages of code and math with power scheduler for learning rate. 2 5 5 5 III High quality training data, exponential decay of learning rate. 2 2 2 2 IV High quality training data, linear decay to zero for learning rate. 0.5 0.5 0.5 0.5 Infrastructure: We trained the Granite 4.0 Language Models utilizing an NVIDIA GB200 NVL72 cluster hosted in CoreWeave. Intra-rack communication occurs via the 72-GPU NVLink domain, and a non-blocking, full Fat-Tree NDR 400 Gb/s InfiniBand network provides inter-rack communication. This cluster provides a scalable and efficient infrastructure for training our models over thousands of GPUs. Ethical Considerations and Limitations: The use of Large Language Models involves risks and ethical considerations people must be aware of, including but not limited to: bias and fairness, misinformation, and autonomous decision-making. Granite-4.0-Micro-Base model is not the exception in this regard. Even though this model is suited for multiple generative AI tasks, it has not undergone any safety alignment, there it may produce problematic outputs. Additionally, it remains uncertain whether smaller models might exhibit increased susceptibility to hallucination in generation scenarios by copying text verbatim from the training dataset due to their reduced sizes and memorization capacities. This aspect is currently an active area of research, and we anticipate more rigorous exploration, comprehension, and mitigations in this domain. Regarding ethics, a latent risk associated with all Large Language Models is their malicious utilization. We urge the community to use Granite-4.0-Micro-Base model with ethical intentions and in a responsible way. Resources - ⭐️ Learn about the latest updates with Granite: https://www.ibm.com/granite - 📄 Get started with tutorials, best practices, and prompt engineering advice: https://www.ibm.com/granite/docs/ - 💡 Learn about the latest Granite learning resources: https://github.com/ibm-granite-community/
granite-4.0-h-tiny-base
Model Summary: Granite-4.0-H-Tiny-Base is a decoder-only, long-context language model designed for a wide range of text-to-text generation tasks. It also supports Fill-in-the-Middle (FIM) code completion through the use of specialized prefix and suffix tokens. The model is trained from scratch on approximately 23 trillion tokens following a four-stage training strategy: 15 trillion tokens in the first stage, 5 trillion in the second, 2 trillion in the third, and 0.5 trillion in the final stage. - Developers: Granite Team, IBM - HF Collection: Granite 4.0 Language Models HF Collection - GitHub Repository: ibm-granite/granite-4.0-language-models - Website: Granite Docs - Release Date: October 2nd, 2025 - License: Apache 2.0 Supported Languages: English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese. Users may finetune Granite 4.0 models for languages beyond these languages. Intended Use: Prominent use cases of LLMs in text-to-text generation include summarization, text classification, extraction, question-answering, code-completion (including FIM), and long-context generation tasks. All Granite Base models are able to handle these tasks as they were trained on a large amount of data from various domains. Moreover, they can serve as baseline to create specialized models for specific application scenarios. Generation: This is a simple example of how to use Granite-4.0-H-Tiny-Base model. Then, copy the code snippet below to run the example. Benchmarks Metric Micro Dense H Micro Dense H Tiny MoE H Small MoE HumanEval pass@1 [StarCoder Prompt] 76.19 73.72 77.59 83.66 Multilingual Benchmarks and thr included languages: MMMLU 11 ar, de, en, es, fr, ja, ko, pt, zh, bn, hi INCLUDE 14 hindi, bengali, tamil, telugu, arabic, german, spanish, french, italian, japanese, korean, dutch, portuguese, chinese --> hi, bn, ta, te, ar, de, es, fr, it, ja, ko, nl, pt, zh Model Architecture: Granite-4.0-H-Tiny-Base is based on a decoder-only MoE transformer architecture. Core components of this architecture are: GQA, Mamba2, MoEs with shared experts, SwiGLU activation, RMSNorm, and shared input/output embeddings. Model Micro Dense H Micro Dense H Tiny MoE H Small MoE Number of layers 40 attention 4 attention / 36 Mamba2 4 attention / 36 Mamba2 4 attention / 36 Mamba2 MLP / Shared expert hidden size 8192 8192 1024 1536 Training Data: This model is trained on a mix of open source and proprietary data following a four-stage training strategy. Stage Characteristics Micro Dense H Micro Dense H Tiny MoE H Small MoE I General mixture of training data, warmup, and power scheduler for learning rate. 10 10 15 15 II General mixture of training data with higher percentages of code and math with power scheduler for learning rate. 2 5 5 5 III High quality training data, exponential decay of learning rate. 2 2 2 2 IV High quality training data, linear decay to zero for learning rate. 0.5 0.5 0.5 0.5 Infrastructure: We trained the Granite 4.0 Language Models utilizing an NVIDIA GB200 NVL72 cluster hosted in CoreWeave. Intra-rack communication occurs via the 72-GPU NVLink domain, and a non-blocking, full Fat-Tree NDR 400 Gb/s InfiniBand network provides inter-rack communication. This cluster provides a scalable and efficient infrastructure for training our models over thousands of GPUs. Ethical Considerations and Limitations: The use of Large Language Models involves risks and ethical considerations people must be aware of, including but not limited to: bias and fairness, misinformation, and autonomous decision-making. Granite-4.0-H-Tiny-Base model is not the exception in this regard. Even though this model is suited for multiple generative AI tasks, it has not undergone any safety alignment, there it may produce problematic outputs. Additionally, it remains uncertain whether smaller models might exhibit increased susceptibility to hallucination in generation scenarios by copying text verbatim from the training dataset due to their reduced sizes and memorization capacities. This aspect is currently an active area of research, and we anticipate more rigorous exploration, comprehension, and mitigations in this domain. Regarding ethics, a latent risk associated with all Large Language Models is their malicious utilization. We urge the community to use Granite-4.0-H-Tiny-Base model with ethical intentions and in a responsible way. Resources - ⭐️ Learn about the latest updates with Granite: https://www.ibm.com/granite - 📄 Get started with tutorials, best practices, and prompt engineering advice: https://www.ibm.com/granite/docs/ - 💡 Learn about the latest Granite learning resources: https://github.com/ibm-granite-community/
granite-4.0-h-small-FP8
📣 Update [10-07-2025]: Added a default system prompt to the chat template to guide the model towards more professional, accurate, and safe responses. > [!NOTE] > This repository contains the FP8 version of Granite-4.0-H-Small. > > Please refer to the the original instruct model's model card for additional details: > https://huggingface.co/ibm-granite/granite-4.0-h-small
granite-3.1-1b-a400m-instruct
Model Summary: Granite-3.1-1B-A400M-Instruct is a 8B parameter long-context instruct model finetuned from Granite-3.1-1B-A400M-Base using a combination of open source instruction datasets with permissive license and internally collected synthetic datasets tailored for solving long context problems. This model is developed using a diverse set of techniques with a structured chat format, including supervised finetuning, model alignment using reinforcement learning, and model merging. - Developers: Granite Team, IBM - GitHub Repository: ibm-granite/granite-3.1-language-models - Website: Granite Docs - Paper: Granite 3.1 Language Models (coming soon) - Release Date: December 18th, 2024 - License: Apache 2.0 Supported Languages: English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese. Users may finetune Granite 3.1 models for languages beyond these 12 languages. Intended Use: The model is designed to respond to general instructions and can be used to build AI assistants for multiple domains, including business applications. Capabilities Summarization Text classification Text extraction Question-answering Retrieval Augmented Generation (RAG) Code related tasks Function-calling tasks Multilingual dialog use cases Long-context tasks including long document/meeting summarization, long document QA, etc. Generation: This is a simple example of how to use Granite-3.1-1B-A400M-Instruct model. Then, copy the snippet from the section that is relevant for your use case. Models ARC-Challenge Hellaswag MMLU TruthfulQA Winogrande GSM8K Avg Granite-3.1-8B-Instruct 62.62 84.48 65.34 66.23 75.37 73.84 71.31 Granite-3.1-2B-Instruct 54.61 75.14 55.31 59.42 67.48 52.76 60.79 Granite-3.1-3B-A800M-Instruct 50.42 73.01 52.19 49.71 64.87 48.97 56.53 Granite-3.1-1B-A400M-Instruct 42.66 65.97 26.13 46.77 62.35 33.88 46.29 Models IFEval BBH MATH Lvl 5 GPQA MUSR MMLU-Pro Avg Granite-3.1-8B-Instruct 72.08 34.09 21.68 8.28 19.01 28.19 30.55 Granite-3.1-2B-Instruct 62.86 21.82 11.33 5.26 4.87 20.21 21.06 Granite-3.1-3B-A800M-Instruct 55.16 16.69 10.35 5.15 2.51 12.75 17.1 Granite-3.1-1B-A400M-Instruct 46.86 6.18 4.08 0 0.78 2.41 10.05 Model Architecture: Granite-3.1-1B-A400M-Instruct is based on a decoder-only dense transformer architecture. Core components of this architecture are: GQA and RoPE, MLP with SwiGLU, RMSNorm, and shared input/output embeddings. Training Data: Overall, our SFT data is largely comprised of three key sources: (1) publicly available datasets with permissive license, (2) internal synthetic data targeting specific capabilities including long-context tasks, and (3) very small amounts of human-curated data. A detailed attribution of datasets can be found in the Granite 3.0 Technical Report, Granite 3.1 Technical Report (coming soon), and Accompanying Author List. Infrastructure: We train Granite 3.1 Language Models using IBM's super computing cluster, Blue Vela, which is outfitted with NVIDIA H100 GPUs. This cluster provides a scalable and efficient infrastructure for training our models over thousands of GPUs. Ethical Considerations and Limitations: Granite 3.1 Instruct Models are primarily finetuned using instruction-response pairs mostly in English, but also multilingual data covering eleven languages. Although this model can handle multilingual dialog use cases, its performance might not be similar to English tasks. In such case, introducing a small number of examples (few-shot) can help the model in generating more accurate outputs. While this model has been aligned by keeping safety in consideration, the model may in some cases produce inaccurate, biased, or unsafe responses to user prompts. So we urge the community to use this model with proper safety testing and tuning tailored for their specific tasks. Resources - ⭐️ Learn about the latest updates with Granite: https://www.ibm.com/granite - 📄 Get started with tutorials, best practices, and prompt engineering advice: https://www.ibm.com/granite/docs/ - 💡 Learn about the latest Granite learning resources: https://ibm.biz/granite-learning-resources
granite-3.0-3b-a800m-instruct
Model Summary: Granite-3.0-3B-A800M-Instruct is a 3B parameter model finetuned from Granite-3.0-3B-A800M-Base-4K using a combination of open source instruction datasets with permissive license and internally collected synthetic datasets. This model is developed using a diverse set of techniques with a structured chat format, including supervised finetuning, model alignment using reinforcement learning, and model merging. - Developers: Granite Team, IBM - GitHub Repository: ibm-granite/granite-3.0-language-models - Website: Granite Docs - Paper: Granite 3.0 Language Models - Release Date: October 21st, 2024 - License: Apache 2.0 Supported Languages: English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese. Users may finetune Granite 3.0 models for languages beyond these 12 languages. Intended use: The model is designed to respond to general instructions and can be used to build AI assistants for multiple domains, including business applications. Capabilities Summarization Text classification Text extraction Question-answering Retrieval Augmented Generation (RAG) Code related tasks Function-calling tasks Multilingual dialog use cases Generation: This is a simple example of how to use Granite-3.0-3B-A800M-Instruct model. Then, copy the snippet from the section that is relevant for your use case. Model Architecture: Granite-3.0-3B-A800M-Instruct is based on a decoder-only sparse Mixture of Experts (MoE) transformer architecture. Core components of this architecture are: Fine-grained Experts, Dropless Token Routing, and Load Balancing Loss. | Model | 2B Dense | 8B Dense | 1B MoE | 3B MoE | | :-------- | :--------| :--------| :--------| :-------- | | Embedding size | 2048 | 4096 | 1024 | 1536 | | Number of layers | 40 | 40 | 24 | 32 | | Attention head size | 64 | 128 | 64 | 64 | | Number of attention heads | 32 | 32 | 16 | 24 | | Number of KV heads | 8 | 8 | 8 | 8 | | MLP hidden size | 8192 | 12800 | 512 | 512 | | MLP activation | SwiGLU | SwiGLU | SwiGLU | SwiGLU | | Number of Experts | — | — | 32 | 40 | | MoE TopK | — | — | 8 | 8 | | Initialization std | 0.1 | 0.1 | 0.1 | 0.1 | | Sequence Length | 4096 | 4096 | 4096 | 4096 | | Position Embedding | RoPE | RoPE | RoPE | RoPE | | # Parameters | 2.5B | 8.1B | 1.3B | 3.3B | | # Active Parameters | 2.5B | 8.1B | 400M | 800M | | # Training tokens | 12T | 12T | 10T | 10T | Training Data: Overall, our SFT data is largely comprised of three key sources: (1) publicly available datasets with permissive license, (2) internal synthetic data targeting specific capabilities, and (3) very small amounts of human-curated data. A detailed attribution of datasets can be found in the Granite Technical Report and Accompanying Author List. Infrastructure: We train Granite 3.0 Language Models using IBM's super computing cluster, Blue Vela, which is outfitted with NVIDIA H100 GPUs. This cluster provides a scalable and efficient infrastructure for training our models over thousands of GPUs while minimizing environmental impact by utilizing 100% renewable energy sources. Ethical Considerations and Limitations: Granite 3.0 Instruct Models are primarily finetuned using instruction-response pairs mostly in English, but also multilingual data covering eleven languages. Although this model can handle multilingual dialog use cases, its performance might not be similar to English tasks. In such case, introducing a small number of examples (few-shot) can help the model in generating more accurate outputs. While this model has been aligned by keeping safety in consideration, the model may in some cases produce inaccurate, biased, or unsafe responses to user prompts. So we urge the community to use this model with proper safety testing and tuning tailored for their specific tasks. Resources - ⭐️ Learn about the latest updates with Granite: https://www.ibm.com/granite - 📄 Get started with tutorials, best practices, and prompt engineering advice: https://www.ibm.com/granite/docs/ - 💡 Learn about the latest Granite learning resources: https://ibm.biz/granite-learning-resources
granite-3.0-2b-base
Model Summary: Granite-3.0-2B-Base is a decoder-only language model to support a variety of text-to-text generation tasks. It is trained from scratch following a two-stage training strategy. In the first stage, it is trained on 10 trillion tokens sourced from diverse domains. During the second stage, it is further trained on 2 trillion tokens using a carefully curated mix of high-quality data, aiming to enhance its performance on specific tasks. - Developers: Granite Team, IBM - GitHub Repository: ibm-granite/granite-3.0-language-models - Website: Granite Docs - Paper: Granite 3.0 Language Models - Release Date: October 21st, 2024 - License: Apache 2.0 Supported Languages: English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese. Users may finetune Granite 3.0 models for languages beyond these 12 languages. Intended use: Prominent use cases of LLMs in text-to-text generation include summarization, text classification, extraction, question-answering, and more. All Granite Base models are able to handle these tasks as they were trained on a large amount of data from various domains. Moreover, they can serve as baseline to create specialized models for specific application scenarios. Generation: This is a simple example of how to use Granite-3.0-2B-Base model. Then, copy the code snippet below to run the example. Model Architecture: Granite-3.0-2B-Base is based on a decoder-only dense transformer architecture. Core components of this architecture are: GQA and RoPE, MLP with SwiGLU, RMSNorm, and shared input/output embeddings. | Model | 2B Dense | 8B Dense | 1B MoE | 3B MoE | | :-------- | :-------- | :--------| :--------| :--------| | Embedding size | 2048 | 4096 | 1024 | 1536 | | Number of layers | 40 | 40 | 24 | 32 | | Attention head size | 64 | 128 | 64 | 64 | | Number of attention heads | 32 | 32 | 16 | 24 | | Number of KV heads | 8 | 8 | 8 | 8 | | MLP hidden size | 8192 | 12800 | 512 | 512 | | MLP activation | SwiGLU | SwiGLU | SwiGLU | SwiGLU | | Number of Experts | — | — | 32 | 40 | | MoE TopK | — | — | 8 | 8 | | Initialization std | 0.1 | 0.1 | 0.1 | 0.1 | | Sequence Length | 4096 | 4096 | 4096 | 4096 | | Position Embedding | RoPE | RoPE | RoPE | RoPE | | # Parameters | 2.5B | 8.1B | 1.3B | 3.3B | | # Active Parameters | 2.5B | 8.1B | 400M | 800M | | # Training tokens | 12T | 12T | 10T | 10T | Training Data: This model is trained on a mix of open source and proprietary data following a two-stage training strategy. Stage 1 data: The data for stage 1 is sourced from diverse domains, such as: web, code, academic sources, books, and math data. Stage 2 data: The data for stage 2 comprises a curated mix of high-quality data from the same domains, plus multilingual and instruction data. The goal of this second training phase is to enhance the model’s performance on specific tasks. A detailed attribution of datasets can be found in the Granite Technical Report and Accompanying Author List. Infrastructure: We train Granite 3.0 Language Models using IBM's super computing cluster, Blue Vela, which is outfitted with NVIDIA H100 GPUs. This cluster provides a scalable and efficient infrastructure for training our models over thousands of GPUs while minimizing environmental impact by utilizing 100% renewable energy sources. Ethical Considerations and Limitations: The use of Large Language Models involves risks and ethical considerations people must be aware of, including but not limited to: bias and fairness, misinformation, and autonomous decision-making. Granite-3.0-2B-Base model is not the exception in this regard. Even though this model is suited for multiple generative AI tasks, it has not undergone any safety alignment, there it may produce problematic outputs. Additionally, it remains uncertain whether smaller models might exhibit increased susceptibility to hallucination in generation scenarios by copying text verbatim from the training dataset due to their reduced sizes and memorization capacities. This aspect is currently an active area of research, and we anticipate more rigorous exploration, comprehension, and mitigations in this domain. Regarding ethics, a latent risk associated with all Large Language Models is their malicious utilization. We urge the community to use Granite-3.0-2B-Base model with ethical intentions and in a responsible way. Resources - ⭐️ Learn about the latest updates with Granite: https://www.ibm.com/granite - 📄 Get started with tutorials, best practices, and prompt engineering advice: https://www.ibm.com/granite/docs/ - 💡 Learn about the latest Granite learning resources: https://ibm.biz/granite-learning-resources
granite-3.0-1b-a400m-base
Model Summary: Granite-3.0-1B-A400M-Base is a decoder-only language model to support a variety of text-to-text generation tasks. It is trained from scratch following a two-stage training strategy. In the first stage, it is trained on 8 trillion tokens sourced from diverse domains. During the second stage, it is further trained on 2 trillion tokens using a carefully curated mix of high-quality data, aiming to enhance its performance on specific tasks. - Developers: Granite Team, IBM - GitHub Repository: ibm-granite/granite-3.0-language-models - Website: Granite Docs - Paper: Granite 3.0 Language Models - Release Date: October 21st, 2024 - License: Apache 2.0 Supported Languages: English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese. Users may finetune Granite 3.0 models for languages beyond these 12 languages. Intended use: Prominent use cases of LLMs in text-to-text generation include summarization, text classification, extraction, question-answering, and more. All Granite Base models are able to handle these tasks as they were trained on a large amount of data from various domains. Moreover, they can serve as baseline to create specialized models for specific application scenarios. Generation: This is a simple example of how to use Granite-3.0-1B-A400M-Base model. Then, copy the code snippet below to run the example. Model Architecture: Granite-3.0-1B-A400M-Base is based on a decoder-only sparse Mixture of Experts (MoE) transformer architecture. Core components of this architecture are: Fine-grained Experts, Dropless Token Routing, and Load Balancing Loss. | Model | 2B Dense | 8B Dense | 1B MoE | 3B MoE | | :-------- | :--------| :--------| :-------- | :--------| | Embedding size | 2048 | 4096 | 1024 | 1536 | | Number of layers | 40 | 40 | 24 | 32 | | Attention head size | 64 | 128 | 64 | 64 | | Number of attention heads | 32 | 32 | 16 | 24 | | Number of KV heads | 8 | 8 | 8 | 8 | | MLP hidden size | 8192 | 12800 | 512 | 512 | | MLP activation | SwiGLU | SwiGLU | SwiGLU | SwiGLU | | Number of Experts | — | — | 32 | 40 | | MoE TopK | — | — | 8 | 8 | | Initialization std | 0.1 | 0.1 | 0.1 | 0.1 | | Sequence Length | 4096 | 4096 | 4096 | 4096 | | Position Embedding | RoPE | RoPE | RoPE | RoPE | | # Parameters | 2.5B | 8.1B | 1.3B | 3.3B | | # Active Parameters | 2.5B | 8.1B | 400M | 800M | | # Training tokens | 12T | 12T | 10T | 10T | Training Data: This model is trained on a mix of open source and proprietary data following a two-stage training strategy. Stage 1 data: The data for stage 1 is sourced from diverse domains, such as: web, code, academic sources, books, and math data. Stage 2 data: The data for stage 2 comprises a curated mix of high-quality data from the same domains, plus multilingual and instruction data. The goal of this second training phase is to enhance the model’s performance on specific tasks. A detailed attribution of datasets can be found in the Granite Technical Report and Accompanying Author List. Infrastructure: We train Granite 3.0 Language Models using IBM's super computing cluster, Blue Vela, which is outfitted with NVIDIA H100 GPUs. This cluster provides a scalable and efficient infrastructure for training our models over thousands of GPUs while minimizing environmental impact by utilizing 100% renewable energy sources. Ethical Considerations and Limitations: The use of Large Language Models involves risks and ethical considerations people must be aware of, including but not limited to: bias and fairness, misinformation, and autonomous decision-making. Granite-3.0-1B-A400M-Base model is not the exception in this regard. Even though this model is suited for multiple generative AI tasks, it has not undergone any safety alignment, there it may produce problematic outputs. Additionally, it remains uncertain whether smaller models might exhibit increased susceptibility to hallucination in generation scenarios by copying text verbatim from the training dataset due to their reduced sizes and memorization capacities. This aspect is currently an active area of research, and we anticipate more rigorous exploration, comprehension, and mitigations in this domain. Regarding ethics, a latent risk associated with all Large Language Models is their malicious utilization. We urge the community to use Granite-3.0-1B-A400M-Base model with ethical intentions and in a responsible way. Resources - ⭐️ Learn about the latest updates with Granite: https://www.ibm.com/granite - 📄 Get started with tutorials, best practices, and prompt engineering advice: https://www.ibm.com/granite/docs/ - 💡 Learn about the latest Granite learning resources: https://ibm.biz/granite-learning-resources
granite-3.0-8b-base
Model Summary: Granite-3.0-8B-Base is a decoder-only language model to support a variety of text-to-text generation tasks. It is trained from scratch following a two-stage training strategy. In the first stage, it is trained on 10 trillion tokens sourced from diverse domains. During the second stage, it is further trained on 2 trillion tokens using a carefully curated mix of high-quality data, aiming to enhance its performance on specific tasks. - Developers: Granite Team, IBM - GitHub Repository: ibm-granite/granite-3.0-language-models - Website: Granite Docs - Paper: Granite 3.0 Language Models - Release Date: October 21st, 2024 - License: Apache 2.0 Supported Languages: English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese. Users may finetune Granite 3.0 models for languages beyond these 12 languages. Intended use: Prominent use cases of LLMs in text-to-text generation include summarization, text classification, extraction, question-answering, and more. All Granite Base models are able to handle these tasks as they were trained on a large amount of data from various domains. Moreover, they can serve as baseline to create specialized models for specific application scenarios. Generation: This is a simple example of how to use Granite-3.0-8B-Base model. Then, copy the code snippet below to run the example. Model Architecture: Granite-3.0-8B-Base is based on a decoder-only dense transformer architecture. Core components of this architecture are: GQA and RoPE, MLP with SwiGLU, RMSNorm, and shared input/output embeddings. | Model | 2B Dense | 8B Dense | 1B MoE | 3B MoE | | :-------- | :--------| :-------- | :------| :------| | Embedding size | 2048 | 4096 | 1024 | 1536 | | Number of layers | 40 | 40 | 24 | 32 | | Attention head size | 64 | 128 | 64 | 64 | | Number of attention heads | 32 | 32 | 16 | 24 | | Number of KV heads | 8 | 8 | 8 | 8 | | MLP hidden size | 8192 | 12800 | 512 | 512 | | MLP activation | SwiGLU | SwiGLU | SwiGLU | SwiGLU | | Number of Experts | — | — | 32 | 40 | | MoE TopK | — | — | 8 | 8 | | Initialization std | 0.1 | 0.1 | 0.1 | 0.1 | | Sequence Length | 4096 | 4096 | 4096 | 4096 | | Position Embedding | RoPE | RoPE | RoPE | RoPE | | # Parameters | 2.5B | 8.1B | 1.3B | 3.3B | | # Active Parameters | 2.5B | 8.1B | 400M | 800M | | # Training tokens | 12T | 12T | 10T | 10T | Training Data: This model is trained on a mix of open source and proprietary data following a two-stage training strategy. Stage 1 data: The data for stage 1 is sourced from diverse domains, such as: web, code, academic sources, books, and math data. Stage 2 data: The data for stage 2 comprises a curated mix of high-quality data from the same domains, plus multilingual and instruction data. The goal of this second training phase is to enhance the model’s performance on specific tasks. A detailed attribution of datasets can be found in the Granite Technical Report and Accompanying Author List. Infrastructure: We train Granite 3.0 Language Models using IBM's super computing cluster, Blue Vela, which is outfitted with NVIDIA H100 GPUs. This cluster provides a scalable and efficient infrastructure for training our models over thousands of GPUs while minimizing environmental impact by utilizing 100% renewable energy sources. Ethical Considerations and Limitations: The use of Large Language Models involves risks and ethical considerations people must be aware of, including but not limited to: bias and fairness, misinformation, and autonomous decision-making. Granite-3.0-8B-Base model is not the exception in this regard. Even though this model is suited for multiple generative AI tasks, it has not undergone any safety alignment, there it may produce problematic outputs. Additionally, it remains uncertain whether smaller models might exhibit increased susceptibility to hallucination in generation scenarios by copying text verbatim from the training dataset due to their reduced sizes and memorization capacities. This aspect is currently an active area of research, and we anticipate more rigorous exploration, comprehension, and mitigations in this domain. Regarding ethics, a latent risk associated with all Large Language Models is their malicious utilization. We urge the community to use Granite-3.0-8B-Base model with ethical intentions and in a responsible way. Resources - ⭐️ Learn about the latest updates with Granite: https://www.ibm.com/granite - 📄 Get started with tutorials, best practices, and prompt engineering advice: https://www.ibm.com/granite/docs/ - 💡 Learn about the latest Granite learning resources: https://ibm.biz/granite-learning-resources
granite-3.1-8b-base
Model Summary: Granite-3.1-8B-Base extends the context length of Granite-3.0-8B-Base from 4K to 128K using a progressive training strategy by increasing the supported context length in increments while adjusting RoPE theta until the model has successfully adapted to desired length of 128K. This long-context pre-training stage was performed using approximately 500B tokens. - Developers: Granite Team, IBM - GitHub Repository: ibm-granite/granite-3.1-language-models - Website: Granite Docs - Paper: Granite 3.1 Language Models (coming soon) - Release Date: December 18th, 2024 - License: Apache 2.0 Supported Languages: English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese. Users may finetune Granite 3.1 models for languages beyond these 12 languages. Intended Use: Prominent use cases of LLMs in text-to-text generation include summarization, text classification, extraction, question-answering, and other long-context tasks. All Granite Base models are able to handle these tasks as they were trained on a large amount of data from various domains. Moreover, they can serve as baseline to create specialized models for specific application scenarios. Generation: This is a simple example of how to use Granite-3.1-8B-Base model. Then, copy the code snippet below to run the example. Models ARC-Challenge Hellaswag MMLU TruthfulQA Winogrande GSM8K Avg Granite-3.1-8B-Base 63.99 83.27 63.45 51.29 78.92 60.19 66.85 Granite-3.1-2B-Base 53.58 77.67 52.86 39.02 72.84 47.99 57.32 Granite-3.1-3B-A800M-Base 50.76 74.45 48.31 39.91 69.29 40.56 53.88 Granite-3.1-1B-A400M-Base 39.42 66.13 26.53 37.67 2.03 18.87 31.78 Models IFEval BBH MATH Lvl 5 GPQA MUSR MMLU-Pro Avg Granite-3.1-8B-Base 42.21 26.02 9.52 9.51 8.36 24.8 20.07 Granite-3.1-2B-Base 35.22 16.84 5.59 3.69 3.9 13.9 13.19 Granite-3.1-3B-A800M-Base 29.96 11.91 4 3.69 1.11 8.81 9.91 Granite-3.1-1B-A400M-Base 25.19 6.43 2.19 0.22 1.76 1.55 6.22 Model Architecture: Granite-3.1-8B-Base is based on a decoder-only dense transformer architecture. Core components of this architecture are: GQA and RoPE, MLP with SwiGLU, RMSNorm, and shared input/output embeddings. Training Data: This model is trained on a mix of open source and proprietary data following a three-stage training strategy. Stage 1 data: The data for stage 1 is sourced from diverse domains, such as: web, code, academic sources, books, and math data. Stage 2 data: The data for stage 2 comprises a curated mix of high-quality data from the same domains, plus multilingual and instruction data. The goal of this second training phase is to enhance the model’s performance on specific tasks. Stage 3 data: The data for stage 3 consists of original stage-2 pretraining data with additional synthetic long-context data in form of QA/summary pairs where the answer contains a recitation of the related paragraph before the answer. A detailed attribution of datasets can be found in the Granite 3.0 Technical Report, Granite 3.1 Technical Report (coming soon), and Accompanying Author List. Infrastructure: We train Granite 3.1 Language Models using IBM's super computing cluster, Blue Vela, which is outfitted with NVIDIA H100 GPUs. This cluster provides a scalable and efficient infrastructure for training our models over thousands of GPUs. Ethical Considerations and Limitations: The use of Large Language Models involves risks and ethical considerations people must be aware of, including but not limited to: bias and fairness, misinformation, and autonomous decision-making. Granite-3.1-8B-Base model is not the exception in this regard. Even though this model is suited for multiple generative AI tasks, it has not undergone any safety alignment, there it may produce problematic outputs. Additionally, it remains uncertain whether smaller models might exhibit increased susceptibility to hallucination in generation scenarios by copying text verbatim from the training dataset due to their reduced sizes and memorization capacities. This aspect is currently an active area of research, and we anticipate more rigorous exploration, comprehension, and mitigations in this domain. Regarding ethics, a latent risk associated with all Large Language Models is their malicious utilization. We urge the community to use Granite-3.1-8B-Base model with ethical intentions and in a responsible way. Resources - ⭐️ Learn about the latest updates with Granite: https://www.ibm.com/granite - 📄 Get started with tutorials, best practices, and prompt engineering advice: https://www.ibm.com/granite/docs/ - 💡 Learn about the latest Granite learning resources: https://ibm.biz/granite-learning-resources
granite-3.1-1b-a400m-base
Model Summary: Granite-3.1-1B-A400M-Base extends the context length of Granite-3.0-1B-A400M-Base from 4K to 128K using a progressive training strategy by increasing the supported context length in increments while adjusting RoPE theta until the model has successfully adapted to desired length of 128K. This long-context pre-training stage was performed using approximately 500B tokens. - Developers: Granite Team, IBM - GitHub Repository: ibm-granite/granite-3.1-language-models - Website: Granite Docs - Paper: Granite 3.1 Language Models (coming soon) - Release Date: December 18th, 2024 - License: Apache 2.0 Supported Languages: English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese. Users may finetune Granite 3.1 models for languages beyond these 12 languages. Intended Use: Prominent use cases of LLMs in text-to-text generation include summarization, text classification, extraction, question-answering, and more. All Granite Base models are able to handle these tasks as they were trained on a large amount of data from various domains. Moreover, they can serve as baseline to create specialized models for specific application scenarios. Generation: This is a simple example of how to use Granite-3.1-1B-A400M-Base model. Then, copy the code snippet below to run the example. Models ARC-Challenge Hellaswag MMLU TruthfulQA Winogrande GSM8K Avg Granite-3.1-8B-Base 63.99 83.27 63.45 51.29 78.92 60.19 66.85 Granite-3.1-2B-Base 53.58 77.67 52.86 39.02 72.84 47.99 57.32 Granite-3.1-3B-A800M-Base 50.76 74.45 48.31 39.91 69.29 40.56 53.88 Granite-3.1-1B-A400M-Base 39.42 66.13 26.53 37.67 2.03 18.87 31.78 Models IFEval BBH MATH Lvl 5 GPQA MUSR MMLU-Pro Avg Granite-3.1-8B-Base 42.21 26.02 9.52 9.51 8.36 24.8 20.07 Granite-3.1-2B-Base 35.22 16.84 5.59 3.69 3.9 13.9 13.19 Granite-3.1-3B-A800M-Base 29.96 11.91 4 3.69 1.11 8.81 9.91 Granite-3.1-1B-A400M-Base 25.19 6.43 2.19 0.22 1.76 1.55 6.22 Model Architecture: Granite-3.1-1B-A400M-Base is based on a decoder-only sparse Mixture of Experts (MoE) transformer architecture. Core components of this architecture are: Fine-grained Experts, Dropless Token Routing, and Load Balancing Loss. Training Data: This model is trained on a mix of open source and proprietary data following a two-stage training strategy. Stage 1 data: The data for stage 1 is sourced from diverse domains, such as: web, code, academic sources, books, and math data. Stage 2 data: The data for stage 2 comprises a curated mix of high-quality data from the same domains, plus multilingual and instruction data. The goal of this second training phase is to enhance the model’s performance on specific tasks. Stage 3 data: The data for stage 3 consists of original stage-2 pretraining data with additional synthetic long-context data in form of QA/summary pairs where the answer contains a recitation of the related paragraph before the answer. A detailed attribution of datasets can be found in the Granite 3.0 Technical Report, Granite 3.1 Technical Report (coming soon), and Accompanying Author List. Infrastructure: We train Granite 3.1 Language Models using IBM's super computing cluster, Blue Vela, which is outfitted with NVIDIA H100 GPUs. This cluster provides a scalable and efficient infrastructure for training our models over thousands of GPUs. Ethical Considerations and Limitations: The use of Large Language Models involves risks and ethical considerations people must be aware of, including but not limited to: bias and fairness, misinformation, and autonomous decision-making. Granite-3.1-1B-A400M-Base model is not the exception in this regard. Even though this model is suited for multiple generative AI tasks, it has not undergone any safety alignment, there it may produce problematic outputs. Additionally, it remains uncertain whether smaller models might exhibit increased susceptibility to hallucination in generation scenarios by copying text verbatim from the training dataset due to their reduced sizes and memorization capacities. This aspect is currently an active area of research, and we anticipate more rigorous exploration, comprehension, and mitigations in this domain. Regarding ethics, a latent risk associated with all Large Language Models is their malicious utilization. We urge the community to use Granite-3.1-1B-A400M-Base model with ethical intentions and in a responsible way. Resources - ⭐️ Learn about the latest updates with Granite: https://www.ibm.com/granite - 📄 Get started with tutorials, best practices, and prompt engineering advice: https://www.ibm.com/granite/docs/ - 💡 Learn about the latest Granite learning resources: https://ibm.biz/granite-learning-resources
granite-4.0-h-small-base
Model Summary: Granite-4.0-H-Small-Base is a decoder-only, long-context language model designed for a wide range of text-to-text generation tasks. It also supports Fill-in-the-Middle (FIM) code completion through the use of specialized prefix and suffix tokens. The model is trained from scratch on approximately 23 trillion tokens following a four-stage training strategy: 15 trillion tokens in the first stage, 5 trillion in the second, 2 trillion in the third, and 0.5 trillion in the final stage. - Developers: Granite Team, IBM - HF Collection: Granite 4.0 Language Models HF Collection - GitHub Repository: ibm-granite/granite-4.0-language-models - Website: Granite Docs - Release Date: October 2nd, 2025 - License: Apache 2.0 Supported Languages: English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese. Users may finetune Granite 4.0 models for languages beyond these languages. Intended Use: Prominent use cases of LLMs in text-to-text generation include summarization, text classification, extraction, question-answering, code-completion (including FIM), and long-context generation tasks. All Granite Base models are able to handle these tasks as they were trained on a large amount of data from various domains. Moreover, they can serve as baseline to create specialized models for specific application scenarios. Generation: This is a simple example of how to use Granite-4.0-H-Small-Base model. Then, copy the code snippet below to run the example. Benchmarks Metric Micro Dense H Micro Dense H Tiny MoE H Small MoE HumanEval pass@1 [StarCoder Prompt] 76.19 73.72 77.59 83.66 Multilingual Benchmarks and thr included languages: MMMLU 11 ar, de, en, es, fr, ja, ko, pt, zh, bn, hi INCLUDE 14 hindi, bengali, tamil, telugu, arabic, german, spanish, french, italian, japanese, korean, dutch, portuguese, chinese --> hi, bn, ta, te, ar, de, es, fr, it, ja, ko, nl, pt, zh Model Architecture: Granite-4.0-H-Small-Base is based on a decoder-only MoE transformer architecture. Core components of this architecture are: GQA, Mamba2, MoEs with shared experts, SwiGLU activation, RMSNorm, and shared input/output embeddings. Model Micro Dense H Micro Dense H Tiny MoE H Small MoE Number of layers 40 attention 4 attention / 36 Mamba2 4 attention / 36 Mamba2 4 attention / 36 Mamba2 MLP / Shared expert hidden size 8192 8192 1024 1536 Training Data: This model is trained on a mix of open source and proprietary data following a four-stage training strategy. Stage Characteristics Micro Dense H Micro Dense H Tiny MoE H Small MoE I General mixture of training data, warmup, and power scheduler for learning rate. 10 10 15 15 II General mixture of training data with higher percentages of code and math with power scheduler for learning rate. 2 5 5 5 III High quality training data, exponential decay of learning rate. 2 2 2 2 IV High quality training data, linear decay to zero for learning rate. 0.5 0.5 0.5 0.5 Infrastructure: We trained the Granite 4.0 Language Models utilizing an NVIDIA GB200 NVL72 cluster hosted in CoreWeave. Intra-rack communication occurs via the 72-GPU NVLink domain, and a non-blocking, full Fat-Tree NDR 400 Gb/s InfiniBand network provides inter-rack communication. This cluster provides a scalable and efficient infrastructure for training our models over thousands of GPUs. Ethical Considerations and Limitations: The use of Large Language Models involves risks and ethical considerations people must be aware of, including but not limited to: bias and fairness, misinformation, and autonomous decision-making. Granite-4.0-H-Small-Base model is not the exception in this regard. Even though this model is suited for multiple generative AI tasks, it has not undergone any safety alignment, there it may produce problematic outputs. Additionally, it remains uncertain whether smaller models might exhibit increased susceptibility to hallucination in generation scenarios by copying text verbatim from the training dataset due to their reduced sizes and memorization capacities. This aspect is currently an active area of research, and we anticipate more rigorous exploration, comprehension, and mitigations in this domain. Regarding ethics, a latent risk associated with all Large Language Models is their malicious utilization. We urge the community to use Granite-4.0-H-Small-Base model with ethical intentions and in a responsible way. Resources - ⭐️ Learn about the latest updates with Granite: https://www.ibm.com/granite - 📄 Get started with tutorials, best practices, and prompt engineering advice: https://www.ibm.com/granite/docs/ - 💡 Learn about the latest Granite learning resources: https://github.com/ibm-granite-community/
granite-3.1-3b-a800m-base
Model Summary: Granite-3.1-3B-A800M-Base extends the context length of Granite-3.0-3B-A800M-Base from 4K to 128K using a progressive training strategy by increasing the supported context length in increments while adjusting RoPE theta until the model has successfully adapted to desired length of 128K. This long-context pre-training stage was performed using approximately 500B tokens. - Developers: Granite Team, IBM - GitHub Repository: ibm-granite/granite-3.1-language-models - Website: Granite Docs - Paper: Granite 3.1 Language Models (coming soon) - Release Date: December 18th, 2024 - License: Apache 2.0 Supported Languages: English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese. Users may finetune Granite 3.1 models for languages beyond these 12 languages. Intended Use: Prominent use cases of LLMs in text-to-text generation include summarization, text classification, extraction, question-answering, and more. All Granite Base models are able to handle these tasks as they were trained on a large amount of data from various domains. Moreover, they can serve as baseline to create specialized models for specific application scenarios. Generation: This is a simple example of how to use Granite-3.1-3B-A800M-Base model. Then, copy the code snippet below to run the example. Models ARC-Challenge Hellaswag MMLU TruthfulQA Winogrande GSM8K Avg Granite-3.1-8B-Base 63.99 83.27 63.45 51.29 78.92 60.19 66.85 Granite-3.1-2B-Base 53.58 77.67 52.86 39.02 72.84 47.99 57.32 Granite-3.1-3B-A800M-Base 50.76 74.45 48.31 39.91 69.29 40.56 53.88 Granite-3.1-1B-A400M-Base 39.42 66.13 26.53 37.67 2.03 18.87 31.78 Models IFEval BBH MATH Lvl 5 GPQA MUSR MMLU-Pro Avg Granite-3.1-8B-Base 42.21 26.02 9.52 9.51 8.36 24.8 20.07 Granite-3.1-2B-Base 35.22 16.84 5.59 3.69 3.9 13.9 13.19 Granite-3.1-3B-A800M-Base 29.96 11.91 4 3.69 1.11 8.81 9.91 Granite-3.1-1B-A400M-Base 25.19 6.43 2.19 0.22 1.76 1.55 6.22 Model Architecture: Granite-3.1-3B-A800M-Base is based on a decoder-only sparse Mixture of Experts (MoE) transformer architecture. Core components of this architecture are: Fine-grained Experts, Dropless Token Routing, and Load Balancing Loss. Training Data: This model is trained on a mix of open source and proprietary data following a two-stage training strategy. Stage 1 data: The data for stage 1 is sourced from diverse domains, such as: web, code, academic sources, books, and math data. Stage 2 data: The data for stage 2 comprises a curated mix of high-quality data from the same domains, plus multilingual and instruction data. The goal of this second training phase is to enhance the model’s performance on specific tasks. Stage 3 data: The data for stage 3 consists of original stage-2 pretraining data with additional synthetic long-context data in form of QA/summary pairs where the answer contains a recitation of the related paragraph before the answer. A detailed attribution of datasets can be found in the Granite 3.0 Technical Report, Granite 3.1 Technical Report (coming soon), and Accompanying Author List. Infrastructure: We train Granite 3.1 Language Models using IBM's super computing cluster, Blue Vela, which is outfitted with NVIDIA H100 GPUs. This cluster provides a scalable and efficient infrastructure for training our models over thousands of GPUs. Ethical Considerations and Limitations: The use of Large Language Models involves risks and ethical considerations people must be aware of, including but not limited to: bias and fairness, misinformation, and autonomous decision-making. Granite-3.1-3B-A800M-Base model is not the exception in this regard. Even though this model is suited for multiple generative AI tasks, it has not undergone any safety alignment, there it may produce problematic outputs. Additionally, it remains uncertain whether smaller models might exhibit increased susceptibility to hallucination in generation scenarios by copying text verbatim from the training dataset due to their reduced sizes and memorization capacities. This aspect is currently an active area of research, and we anticipate more rigorous exploration, comprehension, and mitigations in this domain. Regarding ethics, a latent risk associated with all Large Language Models is their malicious utilization. We urge the community to use Granite-3.1-3B-A800M-Base model with ethical intentions and in a responsible way. Resources - ⭐️ Learn about the latest updates with Granite: https://www.ibm.com/granite - 📄 Get started with tutorials, best practices, and prompt engineering advice: https://www.ibm.com/granite/docs/ - 💡 Learn about the latest Granite learning resources: https://ibm.biz/granite-learning-resources
granite-3.0-3b-a800m-base
Model Summary: Granite-3.0-3B-A800M-Base is a decoder-only language model to support a variety of text-to-text generation tasks. It is trained from scratch following a two-stage training strategy. In the first stage, it is trained on 8 trillion tokens sourced from diverse domains. During the second stage, it is further trained on 2 trillion tokens using a carefully curated mix of high-quality data, aiming to enhance its performance on specific tasks. - Developers: Granite Team, IBM - GitHub Repository: ibm-granite/granite-3.0-language-models - Website: Granite Docs - Paper: Granite 3.0 Language Models - Release Date: October 21st, 2024 - License: Apache 2.0 Supported Languages: English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese. Users may finetune Granite 3.0 models for languages beyond these 12 languages. Intended use: Prominent use cases of LLMs in text-to-text generation include summarization, text classification, extraction, question-answering, and more. All Granite Base models are able to handle these tasks as they were trained on a large amount of data from various domains. Moreover, they can serve as baseline to create specialized models for specific application scenarios. Generation: This is a simple example of how to use Granite-3.0-3B-A800M-Base model. Then, copy the code snippet below to run the example. Model Architecture: Granite-3.0-3B-A800M-Base is based on a decoder-only sparse Mixture of Experts (MoE) transformer architecture. Core components of this architecture are: Fine-grained Experts, Dropless Token Routing, and Load Balancing Loss. | Model | 2B Dense | 8B Dense | 1B MoE | 3B MoE | | :-------- | :--------| :--------| :--------| :-------- | | Embedding size | 2048 | 4096 | 1024 | 1536 | | Number of layers | 40 | 40 | 24 | 32 | | Attention head size | 64 | 128 | 64 | 64 | | Number of attention heads | 32 | 32 | 16 | 24 | | Number of KV heads | 8 | 8 | 8 | 8 | | MLP hidden size | 8192 | 12800 | 512 | 512 | | MLP activation | SwiGLU | SwiGLU | SwiGLU | SwiGLU | | Number of Experts | — | — | 32 | 40 | | MoE TopK | — | — | 8 | 8 | | Initialization std | 0.1 | 0.1 | 0.1 | 0.1 | | Sequence Length | 4096 | 4096 | 4096 | 4096 | | Position Embedding | RoPE | RoPE | RoPE | RoPE | | # Parameters | 2.5B | 8.1B | 1.3B | 3.3B | | # Active Parameters | 2.5B | 8.1B | 400M | 800M | | # Training tokens | 12T | 12T | 10T | 10T | Training Data: This model is trained on a mix of open source and proprietary data following a two-stage training strategy. Stage 1 data: The data for stage 1 is sourced from diverse domains, such as: web, code, academic sources, books, and math data. Stage 2 data: The data for stage 2 comprises a curated mix of high-quality data from the same domains, plus multilingual and instruction data. The goal of this second training phase is to enhance the model’s performance on specific tasks. A detailed attribution of datasets can be found in the Granite Technical Report and Accompanying Author List. Infrastructure: We train Granite 3.0 Language Models using IBM's super computing cluster, Blue Vela, which is outfitted with NVIDIA H100 GPUs. This cluster provides a scalable and efficient infrastructure for training our models over thousands of GPUs while minimizing environmental impact by utilizing 100% renewable energy sources. Ethical Considerations and Limitations: The use of Large Language Models involves risks and ethical considerations people must be aware of, including but not limited to: bias and fairness, misinformation, and autonomous decision-making. Granite-3.0-3B-A800M-Base model is not the exception in this regard. Even though this model is suited for multiple generative AI tasks, it has not undergone any safety alignment, there it may produce problematic outputs. Additionally, it remains uncertain whether smaller models might exhibit increased susceptibility to hallucination in generation scenarios by copying text verbatim from the training dataset due to their reduced sizes and memorization capacities. This aspect is currently an active area of research, and we anticipate more rigorous exploration, comprehension, and mitigations in this domain. Regarding ethics, a latent risk associated with all Large Language Models is their malicious utilization. We urge the community to use Granite-3.0-3B-A800M-Base model with ethical intentions and in a responsible way. Resources - ⭐️ Learn about the latest updates with Granite: https://www.ibm.com/granite - 📄 Get started with tutorials, best practices, and prompt engineering advice: https://www.ibm.com/granite/docs/ - 💡 Learn about the latest Granite learning resources: https://ibm.biz/granite-learning-resources
granite-3b-code-instruct-2k
Granite 4.0 1b Base
Model Summary: Granite-4.0-1B-Base is a lightweight decoder-only language model designed for scenarios where efficiency and speed are critical. They can run on resource-constrained devices such as smartphones or IoT hardware, enabling offline and privacy-preserving applications. It also supports Fill-in-the-Middle (FIM) code completion through the use of specialized prefix and suffix tokens. The model is trained from scratch on approximately 15 trillion tokens following a four-stage training strategy: 10 trillion tokens in the first stage, 2 trillion in the second, another 2 trillion in the third, and 0.5 trillion in the final stage. - Developers: Granite Team, IBM - HF Collection: Granite 4.0 Nano Language Models HF Collection - GitHub Repository: ibm-granite/granite-4.0-nano-language-models - Website: Granite Docs - Release Date: October 28, 2025 - License: Apache 2.0 Supported Languages: English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese. Users may fine-tune Granite 4.0 Nano models to support languages beyond those included in this list. Intended Use: Prominent use cases of LLMs in text-to-text generation include summarization, text classification, extraction, question-answering and code-completion (including FIM) tasks. Moreover, these lightweight models can serve as baseline to create task-specific models for different applications. Generation: This is a simple example of how to use Granite-4.0-1B-Base model. Then, copy the code snippet below to run the example. Benchmarks Metric 350M Dense H 350M Dense 1B Dense H 1B Dense HumanEval pass@1 [StarCoder Prompt] 34.6 35.61 68.08 68.26 Multilingual Benchmarks and thr included languages: MMMLU 11 ar, de, en, es, fr, ja, ko, pt, zh, bn, hi INCLUDE 14 hi, bn, ta, te, ar, de, es, fr, it, ja, ko, nl, pt, zh Granite-4.0-1B-Base is based on a decoder-only dense transformer architecture. Core components of this architecture are: GQA, MLP with SwiGLU, RMSNorm, and shared input/output embeddings. Number of layers 28 attention 4 attention / 28 Mamba2 40 attention 4 attention / 36 Mamba2 MLP / Shared expert hidden size 2048 2048 4096 4096 Training Data: This model is trained on a mix of open source and proprietary data following a four-stage training strategy. Stage Characteristics 350M Dense H 350M Dense 1B Dense H 1B Dense I General mixture of training data, warmup, and power scheduler for learning rate. 10 10 10 10 II General mixture of training data with higher percentages of code and math with power scheduler for learning rate. 2 2 2 2 III High quality training data, exponential decay of learning rate. 2 2 2 2 IV High quality training data, linear decay to zero for learning rate. 0.5 0.5 0.5 0.5 Infrastructure: We trained the Granite 4.0 Nano Language Models utilizing an NVIDIA GB200 NVL72 cluster hosted in CoreWeave. Intra-rack communication occurs via the 72-GPU NVLink domain, and a non-blocking, full Fat-Tree NDR 400 Gb/s InfiniBand network provides inter-rack communication. This cluster provides a scalable and efficient infrastructure for training our models over thousands of GPUs. Ethical Considerations and Limitations: The use of Large Language Models involves risks and ethical considerations people must be aware of, including but not limited to: bias and fairness, misinformation, and autonomous decision-making. Granite-4.0-1B-Base model is not the exception in this regard. Even though this model is suited for multiple generative AI tasks, it has not undergone any safety alignment; therefore, it may produce problematic outputs. Additionally, it remains uncertain whether smaller models might exhibit increased susceptibility to hallucination in generation scenarios by copying text verbatim from the training dataset due to their reduced sizes and memorization capacities. This aspect is currently an active area of research, and we anticipate more rigorous exploration, comprehension, and mitigations in this domain. Regarding ethics, a latent risk associated with all Large Language Models is their malicious utilization. We urge the community to use Granite-4.0-1B-Base model with ethical intentions and in a responsible way. Resources - ⭐️ Learn about the latest updates with Granite: https://www.ibm.com/granite - 📄 Get started with tutorials, best practices, and prompt engineering advice: https://www.ibm.com/granite/docs/ - 💡 Learn about the latest Granite learning resources: https://github.com/ibm-granite-community/
granite-20b-code-instruct-8k
granite-guardian-3.1-2b
granite-3.3-8b-instruct-FP8
> [!NOTE] > This repository contains the FP8 version of Granite-3.3-8b-Instruct. > > Please reference the base model's full model card here: > https://huggingface.co/ibm-granite/granite-3.3-8b-instruct
granite-4.0-tiny-preview-GGUF
granite-vision-3.1-2b-preview
granite-vision-3.3-2b-GGUF
> [!NOTE] > This repository contains models that have been converted to the GGUF format with various quantizations from an IBM Granite base model. > > Please reference the base model's full model card here: > https://huggingface.co/ibm-granite/granite-vision-3.3-2b Model Summary: Granite-vision-3.3-2b is a compact and efficient vision-language model, specifically designed for visual document understanding, enabling automated content extraction from tables, charts, infographics, plots, diagrams, and more. Granite-vision-3.3-2b introduces several novel experimental features such as image segmentation, doctags generation, and multi-page support (see Experimental Capabilities for more details) and offers enhanced safety when compared to earlier Granite vision models. The model was trained on a meticulously curated instruction-following data, comprising diverse public and synthetic datasets tailored to support a wide range of document understanding and general image tasks. Granite-vision-3.3-2b was trained by fine-tuning a Granite large language model with both image and text modalities. - Paper: Granite Vision: a lightweight, open-source multimodal model for enterprise Intelligence. Note that the paper describes Granite Vision 3.2. Granite Vision 3.3 shares most of the technical underpinnings with Granite 3.2. However, there are several enhancements in terms of new and improved vision encoder, many new high quality datasets for training, and several new experimental capabilities. - Release Date: Jun 11th, 2025 - License: Apache 2.0 Supported Input Format: Currently the model supports English instructions and images (png, jpeg) as input format. Intended Use: The model is intended to be used in enterprise applications that involve processing visual and text data. In particular, the model is well-suited for a range of visual document understanding tasks, such as analyzing tables and charts, performing optical character recognition (OCR), and answering questions based on document content. Additionally, its capabilities extend to general image understanding, enabling it to be applied to a broader range of business applications. For tasks that exclusively involve text-based input, we suggest using our Granite large language models, which are optimized for text-only processing and offer superior performance compared to this model.
granite-3.3-2b-instruct-GGUF
granite-timeseries-patchtsmixer
granite-guardian-3.2-5b
granite-8b-code-base-128k
Granite 4.0 350m Base
Model Summary: Granite-4.0-350M-Base is a lightweight decoder-only language model designed for scenarios where efficiency and speed are critical. They can run on resource-constrained devices such as smartphones or IoT hardware, enabling offline and privacy-preserving applications. It also supports Fill-in-the-Middle (FIM) code completion through the use of specialized prefix and suffix tokens. The model is trained from scratch on approximately 15 trillion tokens following a four-stage training strategy: 10 trillion tokens in the first stage, 2 trillion in the second, another 2 trillion in the third, and 0.5 trillion in the final stage. - Developers: Granite Team, IBM - HF Collection: Granite 4.0 Nano Language Models HF Collection - GitHub Repository: ibm-granite/granite-4.0-nano-language-models - Website: Granite Docs - Release Date: October 28, 2025 - License: Apache 2.0 Supported Languages: English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese. Users may fine-tune Granite 4.0 Nano models to support languages beyond those included in this list. Intended Use: Prominent use cases of LLMs in text-to-text generation include summarization, text classification, extraction, question-answering and code-completion (including FIM) tasks. Moreover, these lightweight models can serve as baseline to create task-specific models for different applications. Generation: This is a simple example of how to use Granite-4.0-350M-Base model. Then, copy the code snippet below to run the example. Benchmarks Metric 350M Dense H 350M Dense 1B Dense H 1B Dense HumanEval pass@1 [StarCoder Prompt] 34.6 35.61 68.08 68.26 Multilingual Benchmarks and thr included languages: MMMLU 11 ar, de, en, es, fr, ja, ko, pt, zh, bn, hi INCLUDE 14 hi, bn, ta, te, ar, de, es, fr, it, ja, ko, nl, pt, zh Granite-4.0-350M-Base is based on a decoder-only dense transformer architecture. Core components of this architecture are: GQA, MLP with SwiGLU, RMSNorm, and shared input/output embeddings. Number of layers 28 attention 4 attention / 28 Mamba2 40 attention 4 attention / 36 Mamba2 MLP / Shared expert hidden size 2048 2048 4096 4096 Training Data: This model is trained on a mix of open source and proprietary data following a four-stage training strategy. Stage Characteristics 350M Dense H 350M Dense 1B Dense H 1B Dense I General mixture of training data, warmup, and power scheduler for learning rate. 10 10 10 10 II General mixture of training data with higher percentages of code and math with power scheduler for learning rate. 2 2 2 2 III High quality training data, exponential decay of learning rate. 2 2 2 2 IV High quality training data, linear decay to zero for learning rate. 0.5 0.5 0.5 0.5 Infrastructure: We trained the Granite 4.0 Nano Language Models utilizing an NVIDIA GB200 NVL72 cluster hosted in CoreWeave. Intra-rack communication occurs via the 72-GPU NVLink domain, and a non-blocking, full Fat-Tree NDR 400 Gb/s InfiniBand network provides inter-rack communication. This cluster provides a scalable and efficient infrastructure for training our models over thousands of GPUs. Ethical Considerations and Limitations: The use of Large Language Models involves risks and ethical considerations people must be aware of, including but not limited to: bias and fairness, misinformation, and autonomous decision-making. Granite-4.0-350M-Base model is not the exception in this regard. Even though this model is suited for multiple generative AI tasks, it has not undergone any safety alignment; therefore, it may produce problematic outputs. Additionally, it remains uncertain whether smaller models might exhibit increased susceptibility to hallucination in generation scenarios by copying text verbatim from the training dataset due to their reduced sizes and memorization capacities. This aspect is currently an active area of research, and we anticipate more rigorous exploration, comprehension, and mitigations in this domain. Regarding ethics, a latent risk associated with all Large Language Models is their malicious utilization. We urge the community to use Granite-4.0-350M-Base model with ethical intentions and in a responsible way. Resources - ⭐️ Learn about the latest updates with Granite: https://www.ibm.com/granite - 📄 Get started with tutorials, best practices, and prompt engineering advice: https://www.ibm.com/granite/docs/ - 💡 Learn about the latest Granite learning resources: https://github.com/ibm-granite-community/
Granite 4.0 H 350m Base
Model Summary: Granite-4.0-H-350M-Base is a lightweight decoder-only language model designed for scenarios where efficiency and speed are critical. They can run on resource-constrained devices such as smartphones or IoT hardware, enabling offline and privacy-preserving applications. It also supports Fill-in-the-Middle (FIM) code completion through the use of specialized prefix and suffix tokens. The model is trained from scratch on approximately 15 trillion tokens following a four-stage training strategy: 10 trillion tokens in the first stage, 2 trillion in the second, another 2 trillion in the third, and 0.5 trillion in the final stage. - Developers: Granite Team, IBM - HF Collection: Granite 4.0 Nano Language Models HF Collection - GitHub Repository: ibm-granite/granite-4.0-nano-language-models - Website: Granite Docs - Release Date: October 28, 2025 - License: Apache 2.0 Supported Languages: English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese. Users may fine-tune Granite 4.0 Nano models to support languages beyond those included in this list. Intended Use: Prominent use cases of LLMs in text-to-text generation include summarization, text classification, extraction, question-answering and code-completion (including FIM) tasks. Moreover, these lightweight models can serve as baseline to create task-specific models for different applications. Generation: This is a simple example of how to use Granite-4.0-H-350M-Base model. Then, copy the code snippet below to run the example. Benchmarks Metric 350M Dense H 350M Dense 1B Dense H 1B Dense HumanEval pass@1 [StarCoder Prompt] 34.6 35.61 68.08 68.26 Multilingual Benchmarks and thr included languages: MMMLU 11 ar, de, en, es, fr, ja, ko, pt, zh, bn, hi INCLUDE 14 hi, bn, ta, te, ar, de, es, fr, it, ja, ko, nl, pt, zh Model Architecture: Granite-4.0-H-350M-Base is based on a decoder-only dense transformer architecture. Core components of this architecture are: GQA, Mamba2, MLP with SwiGLU, RMSNorm, and shared input/output embeddings. Number of layers 28 attention 4 attention / 28 Mamba2 40 attention 4 attention / 36 Mamba2 MLP / Shared expert hidden size 2048 2048 4096 4096 Training Data: This model is trained on a mix of open source and proprietary data following a four-stage training strategy. Stage Characteristics 350M Dense H 350M Dense 1B Dense H 1B Dense I General mixture of training data, warmup, and power scheduler for learning rate. 10 10 10 10 II General mixture of training data with higher percentages of code and math with power scheduler for learning rate. 2 2 2 2 III High quality training data, exponential decay of learning rate. 2 2 2 2 IV High quality training data, linear decay to zero for learning rate. 0.5 0.5 0.5 0.5 Infrastructure: We trained the Granite 4.0 Nano Language Models utilizing an NVIDIA GB200 NVL72 cluster hosted in CoreWeave. Intra-rack communication occurs via the 72-GPU NVLink domain, and a non-blocking, full Fat-Tree NDR 400 Gb/s InfiniBand network provides inter-rack communication. This cluster provides a scalable and efficient infrastructure for training our models over thousands of GPUs. Ethical Considerations and Limitations: The use of Large Language Models involves risks and ethical considerations people must be aware of, including but not limited to: bias and fairness, misinformation, and autonomous decision-making. Granite-4.0-H-350M-Base model is not the exception in this regard. Even though this model is suited for multiple generative AI tasks, it has not undergone any safety alignment; therefore, it may produce problematic outputs. Additionally, it remains uncertain whether smaller models might exhibit increased susceptibility to hallucination in generation scenarios by copying text verbatim from the training dataset due to their reduced sizes and memorization capacities. This aspect is currently an active area of research, and we anticipate more rigorous exploration, comprehension, and mitigations in this domain. Regarding ethics, a latent risk associated with all Large Language Models is their malicious utilization. We urge the community to use Granite-4.0-H-350M-Base model with ethical intentions and in a responsible way. Resources - ⭐️ Learn about the latest updates with Granite: https://www.ibm.com/granite - 📄 Get started with tutorials, best practices, and prompt engineering advice: https://www.ibm.com/granite/docs/ - 💡 Learn about the latest Granite learning resources: https://github.com/ibm-granite-community/
granite-4.0-h-small-base-GGUF
> [!NOTE] > This repository contains models that have been converted to the GGUF format with various quantizations from an IBM Granite base model. > > Please reference the base model's full model card here: > https://huggingface.co/ibm-granite/granite-4.0-h-small-base
granite-3b-code-instruct-128k
granite-guardian-hap-125m
Granite 4.0 H 1b Base
Model Summary: Granite-4.0-H-1B-Base is a lightweight decoder-only language model designed for scenarios where efficiency and speed are critical. They can run on resource-constrained devices such as smartphones or IoT hardware, enabling offline and privacy-preserving applications. It also supports Fill-in-the-Middle (FIM) code completion through the use of specialized prefix and suffix tokens. The model is trained from scratch on approximately 15 trillion tokens following a four-stage training strategy: 10 trillion tokens in the first stage, 2 trillion in the second, another 2 trillion in the third, and 0.5 trillion in the final stage. - Developers: Granite Team, IBM - HF Collection: Granite 4.0 Nano Language Models HF Collection - GitHub Repository: ibm-granite/granite-4.0-nano-language-models - Website: Granite Docs - Release Date: October 28, 2025 - License: Apache 2.0 Supported Languages: English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese. Users may fine-tune Granite 4.0 Nano models to support languages beyond those included in this list. Intended Use: Prominent use cases of LLMs in text-to-text generation include summarization, text classification, extraction, question-answering and code-completion (including FIM) tasks. Moreover, these lightweight models can serve as baseline to create task-specific models for different applications. Generation: This is a simple example of how to use Granite-4.0-H-1B-Base model. Then, copy the code snippet below to run the example. Benchmarks Metric 350M Dense H 350M Dense 1B Dense H 1B Dense HumanEval pass@1 [StarCoder Prompt] 34.6 35.61 68.08 68.26 Multilingual Benchmarks and thr included languages: MMMLU 11 ar, de, en, es, fr, ja, ko, pt, zh, bn, hi INCLUDE 14 hi, bn, ta, te, ar, de, es, fr, it, ja, ko, nl, pt, zh Granite-4.0-H-1B-Base is based on a decoder-only dense transformer architecture. Core components of this architecture are: GQA, Mamba2, MLP with SwiGLU, RMSNorm, and shared input/output embeddings. Number of layers 28 attention 4 attention / 28 Mamba2 40 attention 4 attention / 36 Mamba2 MLP / Shared expert hidden size 2048 2048 4096 4096 Training Data: This model is trained on a mix of open source and proprietary data following a four-stage training strategy. Stage Characteristics 350M Dense H 350M Dense 1B Dense H 1B Dense I General mixture of training data, warmup, and power scheduler for learning rate. 10 10 10 10 II General mixture of training data with higher percentages of code and math with power scheduler for learning rate. 2 2 2 2 III High quality training data, exponential decay of learning rate. 2 2 2 2 IV High quality training data, linear decay to zero for learning rate. 0.5 0.5 0.5 0.5 Infrastructure: We trained the Granite 4.0 Nano Language Models utilizing an NVIDIA GB200 NVL72 cluster hosted in CoreWeave. Intra-rack communication occurs via the 72-GPU NVLink domain, and a non-blocking, full Fat-Tree NDR 400 Gb/s InfiniBand network provides inter-rack communication. This cluster provides a scalable and efficient infrastructure for training our models over thousands of GPUs. Ethical Considerations and Limitations: The use of Large Language Models involves risks and ethical considerations people must be aware of, including but not limited to: bias and fairness, misinformation, and autonomous decision-making. Granite-4.0-H-1B-Base model is not the exception in this regard. Even though this model is suited for multiple generative AI tasks, it has not undergone any safety alignment; therefore, it may produce problematic outputs. Additionally, it remains uncertain whether smaller models might exhibit increased susceptibility to hallucination in generation scenarios by copying text verbatim from the training dataset due to their reduced sizes and memorization capacities. This aspect is currently an active area of research, and we anticipate more rigorous exploration, comprehension, and mitigations in this domain. Regarding ethics, a latent risk associated with all Large Language Models is their malicious utilization. We urge the community to use Granite-4.0-H-1B-Base model with ethical intentions and in a responsible way. Resources - ⭐️ Learn about the latest updates with Granite: https://www.ibm.com/granite - 📄 Get started with tutorials, best practices, and prompt engineering advice: https://www.ibm.com/granite/docs/ - 💡 Learn about the latest Granite learning resources: https://github.com/ibm-granite-community/
granite-vision-3.3-2b-embedding
granite-geospatial-biomass
granite-geospatial-wxc-downscaling
granite-20b-code-base-8k
granite-20b-functioncalling
granite-guardian-3.1-8b
Granite 4.0 Tiny Base Preview
Granite-4.0-Tiny-Base-Preview is a 7B-parameter hybrid mixture-of-experts (MoE) language model featuring a 128k token context window. The architecture leverages Mamba-2, superimposed with a softmax attention for enhanced expressiveness, with no positional encoding for better length generalization. - Developers: Granite Team, IBM - Website: Granite Docs - Release Date: May 2nd, 2025 - License: Apache 2.0 Supported Languages: English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese. Users may finetune Granite 4.0 models for languages beyond these 12 languages. Intended Use: Prominent use cases of LLMs in text-to-text generation include summarization, text classification, extraction, question-answering, and other long-context tasks. All Granite Base models are able to handle these tasks as they were trained on a large amount of data from various domains. Moreover, they can serve as baseline to create specialized models for specific application scenarios. Installation: You need to install transformer from source to use this checkpoint. HuggingFace PR: https://github.com/huggingface/transformers/pull/37658 Install transformer from source: https://huggingface.co/docs/transformers/en/installation#install-from-source Generation: After installation, copy the code snippet below to run the example. Models ARC-Challenge Hellaswag MMLU TruthfulQA Winogrande GSM8K DROP NQ AGIEval TriviaQA Avg Granite-3.1-2B-Base 46.83 74.9 54.87 38.93 71.8 53.0 30.08 24.46 38.24 63.18 49.63 --> Granite-3.3-2B-Base 47.49 73.2 54.33 40.83 70.4 50.0 32.552 24.36 38.78 63.22 49.52 Granite-3.1-8B-Base 53.51 81.4 64.28 51.27 76.2 70.5 45.87 35.97 48.99 78.33 60.63 --> Granite-3.3-8B-Base 50.84 80.1 63.89 52.15 74.4 59.0 36.14 36.5 49.3 78.18 58.05 Granite-4.0-Tiny-Base-Preview 54.52 75.80 57.86 44.57 71.1 49.0 41.74 28.48 42.61 67.85 53.35 Model Architecture: Granite-4.0-Tiny-Base-Preview is based on a decoder-only dense transformer architecture. Core components of this architecture are: GQA and NoPE, MLP with SwiGLU, RMSNorm, and shared input/output embeddings. Model 2B Dense 8B Dense Granite-4.0-Tiny-Base-Preview Training Data: This model is trained on a mix of open source and proprietary data following a two-stage training strategy. Stage 1 data: The data for stage 1 is sourced from diverse domains, such as: web, code, academic sources, books, and math data. Stage 2 data: The data for stage 2 comprises a curated mix of high-quality data from the same domains, plus multilingual and instruction data. The goal of this second training phase is to enhance the model’s performance on specific tasks. contains a recitation of the related paragraph before the answer. Infrastructure: We train Granite 4.0 Language Models using IBM's super computing cluster, Blue Vela, which is outfitted with NVIDIA H100 GPUs. This cluster provides a scalable and efficient infrastructure for training our models over thousands of GPUs. Ethical Considerations and Limitations: The use of Large Language Models involves risks and ethical considerations people must be aware of, including but not limited to: bias and fairness, misinformation, and autonomous decision-making. Granite-4.0-Tiny-Base-Preview model is not the exception in this regard. Even though this model is suited for multiple generative AI tasks, it has not undergone any safety alignment, there it may produce problematic outputs. Additionally, it remains uncertain whether smaller models might exhibit increased susceptibility to hallucination in generation scenarios by copying text verbatim from the training dataset due to their reduced sizes and memorization capacities. This aspect is currently an active area of research, and we anticipate more rigorous exploration, comprehension, and mitigations in this domain. Regarding ethics, a latent risk associated with all Large Language Models is their malicious utilization. We urge the community to use Granite-4.0-Tiny-Base-Preview model with ethical intentions and in a responsible way. Signature verification: Model signing is an experimental feature with ongoing development, which might include breaking changes. We are releasing these capabilities to improve the integrity of our models for our security-conscious users and to facilitate feedback from the community. Before trying to verify the signature, ensure that the tensor files have been downloaded with git-lfs and that no files have been added, removed, or modified in your local git checkout: Install the modelsigning (v1.0.1) library with the following command: Then verify the signature with the following command ensuring that the IBM identity '[email protected]' was used for signing this model: Resources - ⭐️ Learn about the latest updates with Granite: https://www.ibm.com/granite - 📄 Get started with tutorials, best practices, and prompt engineering advice: https://www.ibm.com/granite/docs/ - 💡 Learn about the latest Granite learning resources: https://ibm.biz/granite-learning-resources
granite-3.0-1b-a400m-instruct
Model Summary: Granite-3.0-1B-A400M-Instruct is an 1B parameter model finetuned from Granite-3.0-1B-A400M-Base using a combination of open source instruction datasets with permissive license and internally collected synthetic datasets. This model is developed using a diverse set of techniques with a structured chat format, including supervised finetuning, model alignment using reinforcement learning, and model merging. - Developers: Granite Team, IBM - GitHub Repository: ibm-granite/granite-3.0-language-models - Website: Granite Docs - Paper: Granite 3.0 Language Models - Release Date: October 21st, 2024 - License: Apache 2.0 Supported Languages: English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese. Users may finetune Granite 3.0 models for languages beyond these 12 languages. Intended use: The model is designed to respond to general instructions and can be used to build AI assistants for multiple domains, including business applications. Capabilities Summarization Text classification Text extraction Question-answering Retrieval Augmented Generation (RAG) Code related tasks Function-calling tasks Multilingual dialog use cases Generation: This is a simple example of how to use Granite-3.0-1B-A400M-Instruct model. Then, copy the snippet from the section that is relevant for your use case. Model Architecture: Granite-3.0-1B-A400M-Instruct is based on a decoder-only sparse Mixture of Experts (MoE) transformer architecture. Core components of this architecture are: Fine-grained Experts, Dropless Token Routing, and Load Balancing Loss. | Model | 2B Dense | 8B Dense | 1B MoE | 3B MoE | | :-------- | :--------| :--------| :-------- |:-------- | | Embedding size | 2048 | 4096 | 1024 | 1536 | | Number of layers | 40 | 40 | 24 | 32 | | Attention head size | 64 | 128 | 64 | 64 | | Number of attention heads | 32 | 32 | 16 | 24 | | Number of KV heads | 8 | 8 | 8 | 8 | | MLP hidden size | 8192 | 12800 | 512 | 512 | | MLP activation | SwiGLU | SwiGLU | SwiGLU | SwiGLU | | Number of Experts | — | — | 32 | 40 | | MoE TopK | — | — | 8 | 8 | | Initialization std | 0.1 | 0.1 | 0.1 | 0.1 | | Sequence Length | 4096 | 4096 | 4096 | 4096 | | Position Embedding | RoPE | RoPE | RoPE | RoPE | | # Parameters | 2.5B | 8.1B | 1.3B | 3.3B | | # Active Parameters | 2.5B | 8.1B | 400M | 800M | | # Training tokens | 12T | 12T | 10T | 10T | Training Data: Overall, our SFT data is largely comprised of three key sources: (1) publicly available datasets with permissive license, (2) internal synthetic data targeting specific capabilities, and (3) very small amounts of human-curated data. A detailed attribution of datasets can be found in the Granite Technical Report and Accompanying Author List. Infrastructure: We train Granite 3.0 Language Models using IBM's super computing cluster, Blue Vela, which is outfitted with NVIDIA H100 GPUs. This cluster provides a scalable and efficient infrastructure for training our models over thousands of GPUs while minimizing environmental impact by utilizing 100% renewable energy sources. Ethical Considerations and Limitations: Granite 3.0 Instruct Models are primarily finetuned using instruction-response pairs mostly in English, but also multilingual data covering eleven languages. Although this model can handle multilingual dialog use cases, its performance might not be similar to English tasks. In such case, introducing a small number of examples (few-shot) can help the model in generating more accurate outputs. While this model has been aligned by keeping safety in consideration, the model may in some cases produce inaccurate, biased, or unsafe responses to user prompts. So we urge the community to use this model with proper safety testing and tuning tailored for their specific tasks. Resources - ⭐️ Learn about the latest updates with Granite: https://www.ibm.com/granite - 📄 Get started with tutorials, best practices, and prompt engineering advice: https://www.ibm.com/granite/docs/ - 💡 Learn about the latest Granite learning resources: https://ibm.biz/granite-learning-resources
granite-7b-base
Architecture: The model architecture is a replica of Meta’s Llama2-7B base variant with MHA, trained with 1M batch size on 2T tokens. Representing IBM’s commitment to open source innovation IBM has released granite-7b-base, a base pre-trained LLM from IBM’s Granite model series, under an apache-2.0 license for community and commercial use. Granite-7b-base was pre-trained from scratch on IBM-curated data as an open reference implementation of Meta’s Llama-2-7B. In a commitment to data transparency and fostering open innovation, the data sources, sampling proportions, and URLs for access are provided below. For more information about training this model, please check out the blog: https://pytorch.org/blog/maximizing-training/ The model was trained on 2T tokens, with sampling proportions designed to match the sampling distributions released in the Llama1 paper as closely as possible. | Dataset | Description | Sampling Proportion | URL | |-------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------|--------------------------------------------------------------------| | Common Crawl | Open repository of web crawl data with snapshots ranging from 2021 to 2023. | 77% | https://data.commoncrawl.org/ | | GithubClean | Code data from CodeParrot covering a variety of coding languages. | 5.50% | https://huggingface.co/datasets/codeparrot/github-code-clean | | Wikipedia and Wikimedia | Eight Wikimedia projects (enwiki, enwikibooks, enwikinews, enwikiquote, enwikisource, enwikiversity, enwikivoyage, enwiktionary). containing extracted plain text from pages and articles. | 2% | https://dumps.wikimedia.org | | USPTO | US patents granted from 1975 to May 2023, excluding design patents. | 5% | https://bulkdata.uspto.gov/ | | PubMed Central | Biomedical and life sciences papers. | 1.75% | https://ftp.ncbi.nlm.nih.gov/pub/pmc/oapackage/ | | arXiv | Over 1.8 million scientific paper pre-prints posted to arXiv. | 2.50% | https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T | | StackExchange | Anonymized set of all user-contributed content on the Stack Exchange network, a popular collection of websites centered around user-contributed questions and answers. | 1% | https://archive.org/details/stackexchange20221206 | | PG19 | A repository of free e-books with focus on older works for which U.S. copyright has expired. | 0.25% | https://github.com/google-deepmind/pg19 | | Webhose | Unstructured web content converted into machine-readable data feeds purchased by IBM. | 5% | N/A | | Evaluation metric | Llama2-7B (baseline) | Granite-7b-base | |----------------------------|----------------------|-----------------| | MMLU (zero shot) | 0.41 | 0.43 | | MMLU (5-shot weighted avg) | 0.47 | 0.50 | | Arc challenge | 0.46 | 0.44 | | Arc easy | 0.74 | 0.71 | | Boolq | 0.78 | 0.76 | | Copa | 0.87 | 0.83 | | Hellaswag | 0.76 | 0.74 | | Openbookqa | 0.44 | 0.42 | | Piqa | 0.79 | 0.79 | | Sciq | 0.91 | 0.91 | | Winogrande | 0.69 | 0.67 | | Truthfulqa | 0.39 | 0.39 | | GSM8k (8-shot) | 0.13 | 0.11 | Granite-7b-base is a base model and has not undergone any safety alignment, there it may produce problematic outputs. In the absence of adequate safeguards and RLHF, there exists a risk of malicious utilization of these models for generating disinformation or harmful content. Caution is urged against complete reliance on a specific language model for crucial decisions or impactful information, as preventing these models from fabricating content is not straightforward. Additionally, it remains uncertain whether smaller models might exhibit increased susceptibility to hallucination in ungrounded generation scenarios due to their reduced sizes and memorization capacities. This aspect is currently an active area of research, and we anticipate more rigorous exploration, comprehension, and mitigations in this domain.
granite-4.0-h-micro-base-GGUF
> [!NOTE] > This repository contains models that have been converted to the GGUF format with various quantizations from an IBM Granite base model. > > Please reference the base model's full model card here: > https://huggingface.co/ibm-granite/granite-4.0-h-micro-base
granite-4.0-h-tiny-base-GGUF
> [!NOTE] > This repository contains models that have been converted to the GGUF format with various quantizations from an IBM Granite base model. > > Please reference the base model's full model card here: > https://huggingface.co/ibm-granite/granite-4.0-h-tiny-base
granite-4.0-tiny-base-preview-GGUF
granite-geospatial-land-surface-temperature
granite-4.0-micro-base-GGUF
> [!NOTE] > This repository contains models that have been converted to the GGUF format with various quantizations from an IBM Granite base model. > > Please reference the base model's full model card here: > https://huggingface.co/ibm-granite/granite-4.0-micro-base
granite-4.0-1b-GGUF
granite-4.0-350m-GGUF
granite-4.0-h-1b-GGUF
granite-4.0-h-350m-GGUF
granite-guardian-3.0-2b
granite-guardian-3.0-8b
granite-7b-instruct
| Model | Alignment | Base | Teacher | MTBench (Avg) | MMLU(5-shot) | | --- | --- | --- | --- | --- | --- | | Llama-2-13b-chat-hf | RLHF | Llama-2-13b | Human Annotators | 6.65 |54.58 | | Orca-2-13b | Progressive Training | Llama-2-13b | GPT-4 | 6.15 | 60.37 | | WizardLM-13B-V1.2 | Evol-Instruct | Llama-2-13b | GPT-4 | 7.20 | 54.83 | | Labradorite-13b | Large-scale Alignment for chatBots (LAB) | Llama-2-13b | Mixtral-8x7B-Instruct | 7.23 | 58.89 | | Mistral-7B-Instruct-v0.1 | SFT | Mistral-7B-v0.1 | - | 6.84 | 60.37 | | zephyr-7b-beta | SFT/DPO | Mistral-7B-v0.1 | GPT-4 | 7.34 | 61.07 | | Mistral-7B-Instruct-v0.2 | SFT | Mistral-7B-v0.1 | - | 7.6 | 60.78 | | Merlinite-7b-lab | Large-scale Alignment for chatBots (LAB) | Mistral-7B-v0.1 | Mixtral-8x7B-Instruct | 7.66 |64.88 | | Granite-7b-lab | Large-scale Alignment for chatBots (LAB) | Granite-7b-base| Mixtral-8x7B-Instruct | 6.69 | 51.91 | [] Numbers for models other than Merlinite-7b-lab, Granite-7b-lab and Labradorite-13b are taken from lmsys/chatbot-arena-leaderboard LAB: Large-scale Alignment for chatBots is a novel synthetic data-based alignment tuning method for LLMs from IBM Research. Granite-7b-lab is a Granite-7b-base derivative model trained with the LAB methodology, using Mixtral-8x7b-Instruct as a teacher model. 1. Taxonomy-driven data curation process 2. Large-scale synthetic data generator 3. Two-phased-training with replay buffers LAB approach allows for adding new knowledge and skills, in an incremental fashion, to an already pre-trained model without suffering from catastrophic forgetting. Taxonomy is a tree of seed examples that are used to prompt a teacher model to generate synthetic data. Taxonomy allows the data curator or the model designer to easily specify a diverse set of the knowledge-domains and skills that they would like to include in their LLM. At a high level, these can be categorized into three high-level bins - knowledge, foundational skills, and compositional skills. The leaf nodes of the taxonomy are tasks associated with one or more seed examples. During the synthetic data generation, unlike previous approaches where seed examples are uniformly drawn from the entire pool (i.e. self-instruct), we use the taxonomy to drive the sampling process: For each knowledge/skill, we only use the local examples within the leaf node as seeds to prompt the teacher model. This makes the teacher model better exploit the task distributions defined by the local examples of each node and the diversity in the taxonomy itself ensures the entire generation covers a wide range of tasks, as illustrated below. In turns, this allows for using Mixtral 8x7B as the teacher model for generation while performing very competitively with models such as ORCA-2, WizardLM, and Zephyr Beta that rely on synthetic data generated by much larger and capable models like GPT-4. For adding new domain-specific knowledge, we provide an external knowledge source (document) and prompt the model to generate questions and answers based on the document. Foundational skills such as reasoning and compositional skills such as creative writing are generated through in-context learning using the seed examples from the taxonomy. Additionally, to ensure the data is high-quality and safe, we employ steps to check the questions and answers to ensure that they are grounded and safe. This is done using the same teacher model that generated the data. Our training consists of two major phases: knowledge tuning and skills tuning. There are two steps in knowledge tuning where the first step learns simple knowledge (short samples) and the second step learns complicated knowledge (longer samples). The second step uses replay a replay buffer with data from the first step. Both foundational skills and compositional skills are learned during the skills tuning phases, where a replay buffer of data from the knowledge phase is used. Importantly, we use a set of hyper-parameters for training that are very different from standard small-scale supervised fine-training: larger batch size and carefully optimized learning rate and scheduler. Model description - Model Name: Granite-7b-lab - Language(s): Primarily English - License: Apache 2.0 - Base model: ibm/granite-7b-base - Teacher Model: mistralai/Mixtral-8x7B-Instruct-v0.1 We advise utilizing the system prompt employed during the model's training for optimal inference performance, as there could be performance variations based on the provided instructions. Granite-7b-lab is a base model and has not undergone any safety alignment, there it may produce problematic outputs. In the absence of adequate safeguards and RLHF, there exists a risk of malicious utilization of these models for generating disinformation or harmful content. Caution is urged against complete reliance on a specific language model for crucial decisions or impactful information, as preventing these models from fabricating content is not straightforward. Additionally, it remains uncertain whether smaller models might exhibit increased susceptibility to hallucination in ungrounded generation scenarios due to their reduced sizes and memorization capacities. This aspect is currently an active area of research, and we anticipate more rigorous exploration, comprehension, and mitigations in this domain.
granite-3.3-8b-base-GGUF
granite-4.0-h-1b-base-GGUF
granite-4.0-h-350m-base-GGUF
granite-4.0-1b-base-GGUF
granite-4.0-350m-base-GGUF
granite-34b-code-instruct-8k
granite-3.3-2b-base-GGUF
granite-speech-3.2-8b
granite-geospatial-uki
granite-8b-code-base-4k
granite-guardian-3.2-5b-lora-harm-correction
Granite Guardian 3.2 5b Harm Correction LoRA is a LoRA adapter for ibm-granite/granite-guardian-3.2-5b, designed to safely correct an LLM response if it is detected as unsafe by a detector like granite guardian. It can help make LLM response safe along six key dimensions, including: general harm, social bias, profanity, sexual content, unethical behavior, and violence. - Developers: IBM Research - GitHub Repository: ibm-granite/granite-guardian - Cookbook Harm Correction LoRA Notebook - Website: Granite Guardian Docs - Paper: Granite Guardian - Paper - Evaluation Methodology: Comprehensive Evaluation Framework of Alignment Techniques - Release Date: September 2, 2025 - License: Apache 2.0 Intended Use Granite Guardian is useful for risk detection use-cases which are applicable across a wide-range of enterprise applications. Granite-guardian-3.2-5b-harm-corrector-lora takes an input consisting of the original user prompt and the `originalbadresponse` generated by granite guardian-3.2-5b, and generates a `correctedresponse` via the granite-guardian-3.2-5b-harm-corrector-lora. This model is part of an ongoing research effort focused on post-generation mitigation and remains experimental and under active development. We are committed to continuous improvement and welcome constructive feedback to enhance its performance and capabilities. The model is specifically designed to correct assistant messages containing only the following risks: - Harm: Any content considered generally harmful by the model, specific risks under harm include: - Social Bias: prejudice based on identity or characteristics. - Violence: content promoting physical, mental, or sexual harm. - Profanity: use of offensive language or insults. - Sexual Content: explicit or suggestive material of a sexual nature. - Unethical Behavior: actions that violate moral or legal standards. The adapter manages both safe and unsafe cases as identified by the Granite Guardian 3.2 5b model. If the assistant message is deemed unsafe, it will correct the response. If the assistant message is already safe, it simply outputs an empty string, confirming that no correction was needed, thus helping to save compute resources. It is important to note that there is no built-in safeguard to guarantee that the corrected response will always be safe. As with other generative models, safety assurance relies on offline evaluations (see Evaluations ), and we expect, but cannot ensure, that the `correctedresponse` meets safety standards. For users seeking additional assurance, we recommend re-running the corrected output through the main granite-guardian-3.2-5b `correctedresponse` to verify that it is indeed safe Granite Guardian Cookbooks offers an excellent starting point for working with guardian models, providing a variety of examples that demonstrate how the models can be configured for different risk detection scenarios. Refer to Quick Start Guide and Detailed Guide to get ready with Granite Guardian scope of use. Granite Guardian 3.2 5B Harm Correction LoRA Cookbooks provide the steps to insert the LoRA adapter on top of Granite Guardian for harm-based corrections (or text-mitigation). This correction-LoRA model takes an input consisting of a prompt and an originalbadresponse, and generates a correctedresponse. Importantly, there is no built-in safeguard to guarantee that the corrected response will always be safe. As with other generative models, safety assurance relies on offline evaluations, and we expect—but cannot ensure—that the correctedresponse meets safety standards (see the Evaluations section for details). For users seeking additional assurance, we recommend re-running the corrected output through the main Granite Guardian 3.3 (GG3.3) model to verify that it is indeed safe The following code describes how to apply the Granite Guardian 3.2 5b Harm Correction LoRA to safely correct assistant message. The code checks if the assistant message contains a certain risk (e.g. violence), using Granite Guardian 3.2 5b. It extracts a "Yes" (i.e. unsafe) or "No" (i.e. safe) label and a confidence level from the model’s output. If the response is detected as unsafe, it uses the Harm Correction LoRA adapter to generate a safer version of the assistant message. Note: to specify a different risk type, simply change the `riskname` value in `guardianconfig = {"riskname": "violence"}`. The supported risk types include: `'harm'`, `'socialbias'`, `'profanity'`, `'sexualcontent'`, `'unethicalbehavior'`, and `'violence'`. - The Granite Guardian 3.2 5b Harm Correction LoRA adapter is intended for use cases that involve the safe correction of LLM responses. For example, it is designed to safely correct LLM responses that are flagged as unsafe, based on a specific risk definition. Note that the adapter is only designed to work with Granite Guardian 3.2 5b. The temperature parameter of 0 generates more deterministic responses, while higher values introduce greater randomness and creativity. We found that a temperature value of 0.7 produces coherent outputs, but users can adjust it based on the level of variability they require and the needs of their application. Granite Guardian 3.2 5b Harm Correction LoRA adapter was trained using synthetic data that was generated via the Principle-Instruct synthetic data generation pipeline. A series of questions (+180K) were generated out of a set of synthetically generated topics related to a set of harm categories. A panel of models were utilized for creating the unaligned original responses, and a critic model judged the extent that these responses violated the principle generated for the specific triple (value, top, question). A critic model thus evaluates and gives a score to the original responses and if it violates the principle, the a generator model creates a new response that should follow the stated principle. This is repeated until all unaligned responses are thus corrected. We evaluate the performance of our Granite Guardian 3.2 5b Harm Correction LoRA using our developed Comprehensive Evaluation Framework of Alignment Techniques of LLMs. This evaluation is followed these steps: 1. Collect User Prompts - Start with a set of user prompts sourced from various benchmarks. 2. Generate Original Responses - For each user prompt, generate an original responses using the `granite-3.3-8b-base` base model. 3. Apply the Harm Correction - Pass each original response through the Granite Guardian 3.2 5b Harm Correction LoRA using the correction template to produce a corrected version if the original response contains a type of risk. - Hyperparameters: temperature = 0.7 (with three random seeds), maxnewtokens = 1024. 4. Judge the Responses - Use three separate judge models to compare the original and corrected responses: - `llama-3.1-405b` - `llama-3.3-70b-Instruct` - `mixtral-8x22b-Instruct` - Each judge determines whether the corrected response is preferred over the original response based on general harm risk definition 5. Calculate Win Rate - Compute the win rate: the percentage of cases where the corrected response was preferred by the judge models over the original response, after removing any positional bias in the judge models. We conduct three experiments with different random seeds and report the average result. The following table presents the Win Rate scores (averaged across seeds) on our internal data averaged across the three judge models by each harm criteria. | General harm | Profanity | Sexual Content | Social Bias | Unethical Behavior | Violence | |---------------|------------|------------------|-----------------|--------------------|----------| | 95.32 | 78.62 | 83.60 | 99.51 | 74.87 | 97.79 | The following table presents the Win Rate scores (averaged across seeds) for each out-of-distribution (OOD) benchmark averaged across the three judge models using the general harm risk definition. | Base Model | Truthful QA | BeaverTails | Reward-bench 2 | SafeRLHF | XSTEST-RH | HarmfulQA | |---------------------|-------------|-------------|----------------|----------|-----------|---------- | | granite-3.3-8b-base | 89.97 | 94.81 | 87.53 | 92.75 | 94.46 | 92.54 | Citation If you find this adapter useful, please cite the following work. Momin Abbas, Muneeza Azmat, Marcelo Carpinette Grave, Raya Horesh, Rogerio A de Paula, Maysa Malfiza Garcia de Macedo, Tiago Machado, Rebecka Nordenlow, Heloisa Caroline de Souza Pereira Candello, Luan Soares de Souza, Aminat Adebiyi
granite-3b-code-base-128k
granite-3.2-8b-instruct-preview
granite-3b-code-instruct-2k-GGUF
granite-34b-code-base-8k
granite-20b-code-instruct-8k-GGUF
GneissWeb.7B_ablation_model_on_350B_GneissWeb.seed2
GneissWeb.7B_ablation_model_on_350B_FineWeb.Edu.seed2
GneissWeb.7B_ablation_model_on_350B_FineWeb.Edu.seed1
GneissWeb.7B_ablation_model_on_350B_GneissWeb.seed3
granite-20b-code-base-r1.1
granite-20b-code-instruct-r1.1
GneissWeb.7B_ablation_model_on_350B_FineWeb.seed1
GneissWeb.7B_ablation_model_on_350B_GneissWeb.seed1
granite-geospatial-uki-flooddetection
granite-3.1-8b-lora-intrinsics-v0.1
GneissWeb.7B_ablation_model_on_350B_FineWeb.seed2
GneissWeb.7B_ablation_model_on_350B_FineWeb.Edu.seed3
granite-3.0-8b-lora-intrinsics-v0.1
GneissWeb.7B_ablation_model_on_350B_FineWeb.seed3
granite-20b-code-base-8k-GGUF
granite-guardian-3.3-8b-GGUF
> [!NOTE] > This repository contains models that have been converted to the GGUF format with various quantizations from an IBM Granite base model. > > Please reference the base model's full model card here: > https://huggingface.co/ibm-granite/granite-guardian-3.3-8b Model Summary: Granite Guardian 3.3 8b is a specialized Granite 3.3 8B model designed to judge if the input prompts and the output responses of an LLM based system meet specified criteria. The model comes pre-baked with certain criteria including but not limited to: jailbreak attempts, profanity, and hallucinations related to tool calls and retrieval augmented generation in agent-based systems. Additionally, the model also allows users to bring their own criteria and tailor the judging behavior to specific use-cases. This version of Granite Guardian is a hybrid thinking model that allows the user to operate in thinking or non-thinking model. In thinking mode, the model produces detailed reasoning traces though and tags. In non-thinking mode, the model only produces the judgement score though the tags. It is trained on unique data comprising human annotations and synthetic data informed by internal red-teaming. It outperforms other open-source models in the same space on standard benchmarks. - Developers: IBM Research - GitHub Repository: ibm-granite/granite-guardian - Cookbook: Granite Guardian Recipes - Website: Granite Guardian Docs - Paper: Granite Guardian - Release Date: August 1, 2025 - License: Apache 2.0
granite-3b-code-instruct-accelerator
granite-34b-code-instruct-8k-GGUF
granite-docling-258M-GGUF
granite-34b-code-base-8k-GGUF
granite-8b-code-base-4k-GGUF
granite-7b-instruct-accelerator
granite-8b-code-instruct-accelerator
granite-3.0-8b-instruct-accelerator
granite-34b-code-instruct-accelerator
granite-20b-code-instruct-accelerator
granite-3b-code-base-2k-GGUF
granite-3.2-8b-lora-rag-hallucination-detection
granite-3.2-8b-lora-rag-answerability-prediction
granite-3.2-8b-lora-rag-citation-generation
granite-geospatial-canopyheight
granite-3.3-8b-rag-agent-lib
Granite 3.3 8b Lora Math Prm
granite-3.2-8b-alora-uncertainty
rag-intrinsics-lib
Granite 3.3 8b Security Lib
The LLM Intrinsics Security Library includes six intrinsics implemented as LoRA adapters for `ibm-granite/granite-3.3-8b-instruct`, each of which expects a conversation between a user and an AI assistant as input. Each intrinsic has been developed for a specific task that is likely to be useful for LLM security, privacy or robustness. We give a brief overview of the functionality of each intrinsic, as the details can be found in each individual intrinsic readme. The six intrinsics that have been implemented as LoRA adapters for `ibm-granite/granite-3.3-8b-instruct` and made available in this HF repository are: Adversarial Scoping: This experimental LoRA module is designed to constrain the model to a specific task (summarization), while maintaining safety with respect to harmful prompts. The model was trained to perform summarization tasks using datasets such as CNN/Daily Mail, Amazon food reviews, and abstract summarization corpora. In parallel, the LoRA was also trained to reject harmful requests. As a result, the model, although scoped to summarization, is expected to refuse to summarize content that is harmful or inappropriate, thereby preserving alignment and safety within its operational boundaries. Function Calling Scanner: This LoRA intrinsic is finetuned for detecting incorrect function calls from an LLM agent. Given a user prompt, tool options, and underlying model response, this intrinsic acts as a safeguard blocking LLM agent tool errors. These errors can be from simple LLM mistakes, or due to tool hijacking from jailbreak and prompt injection attacks. Jailbreak Detector: This is an experimental LoRA designed for detecting jailbreak and prompt injection risks in user inputs. Jailbreaks attempt to bypass safeguards in AI systems for malicious purposes, using a variety of attack techniques. This model helps filter such prompts to protect against adversarial threats. In particular, it focuses on social engineering based manipulation like role-playing or use of hypothetical scenarios. PII Detector: This is an experimental LoRA that is designed for detecting PII in model outputs. Models with access to personal information via RAG or similar may present additional data protection risks that can be mitigated by using this LoRA to check model outputs. RAG Data Leakage: This experimental safeguard is designed to detect and mitigate the risk of sensitive data leakage from RAG documents into model outputs. RAG systems enhance AI responses by retrieving relevant documents from external databases, but this introduces the potential for unintended disclosure of private, proprietary, or sensitive information. This model monitors generated responses to prevent such leaks, especially in scenarios where retrieved content may be sensitive or confidential. System Prompt Leakage: This is an experimental LoRA-based model designed to detect risks of system prompt leakage in user inputs. System prompt leakage occurs when adversaries attempt to extract or infer hidden instructions or configurations that guide AI behavior. This model helps identify and filter such attempts, enhancing the security and integrity of AI systems. It is particularly focused on detecting subtle probing techniques, indirect questioning, and prompt engineering strategies that aim to reveal internal system behavior or constraints. To invoke the LoRA adapters, you can follow the following process. 1. Select the LoRA adapter that you want to experiment with from here. 2. Download the LoRA adapter to a local directory. Following example shows how to download the "granite-3.3-8b-instruct-lora-jailbreak-detector" intrinsic to the local directory `intrinsics/jailbreakdetection` 3. Load the LoRA adapter from the downloaded local directory and run the intrinsic model. Each intrinsic contains a README file inside the LoRA adapter directory which explains how to run the model. Here is an example.
granite-rag-3.0-8b-lora
GneissWeb.Quality_annotator
granite-uncertainty-3.0-8b-lora
granite-3.2-8b-lora-rag-query-rewrite
GneissWeb.Tech_classifier
granite-3.2-8b-lora-uncertainty
granite-3.2-8b-alora-rag-query-rewrite
granite-3.3-8b-alora-requirement-check
granite-guardian-3.2-5b-lora-harm-categories
Granite Guardian 3.2 5b Harm Categories LoRA is a LoRA adapter for ibm-granite/granite-guardian-3.2-5b, designed to detect specific and multi-risks in prompts and responses. While the base model identifies a broad range of harms, this adapter allows users to detect specific sub-categories of harm without requiring multiple, parallel calls. It can help with risk detection along many key dimensions catalogued in the IBM AI Risk Atlas. - Developers: IBM Research - GitHub Repository: ibm-granite/granite-guardian - Cookbook: Harm Categories LoRA Notebook - Website: Granite Guardian Docs - Paper: Granite Guardian - Release Date: Sept 2, 2025 - License: Apache 2.0 Granite Guardian is useful for risk detection use-cases which are applicable across a wide-range of enterprise applications. The granite-guardian-3.2-5b-lora-harm-categories scope is to prevent multiple calls to granite-guardian-3.2-5b using all the risk-definitions. Specifically, after one call using the umbrella risk definition (i.e., `harm`), if the text is detected as unsafe, the adapter is applied to detect the sub-category(ies) identified under Risk Definitions The granite-guardian-3.2-5b model is specifically designed to detect various risks in user and assistant messages. This includes an umbrella Harm category designed for out-of-the-box detection for content broadly recognized as harmful, while the granite-guardian-3.2-5b-lora-harm-categories captures specific risks. - Harm: Any content considered generally harmful by the model, specific risks under harm include: - Social Bias: prejudice based on identity or characteristics. - Jailbreaking: deliberate instances of manipulating AI to generate harmful, undesired, or inappropriate content. - Violence: content promoting physical, mental, or sexual harm. - Profanity: use of offensive language or insults. - Sexual Content: explicit or suggestive material of a sexual nature. - Unethical Behavior: actions that violate moral or legal standards. The adapter can handle cases where the prompt/response is predicted to have no-harm or harm. In the latter, the adapter generates one or more risk categories. - When a prompt is determined to be safe, adapter generates: - Not harmful prompt (no category needed) - When a response is determined to be safe, adapter generates: - Not harmful response (no category needed) - When a prompt or response is determined to be unsafe, the adapter identifies the specific type of risk by generating one or more of the following categories: - Social Bias - Jailbreaking - Violence - Profanity - Sexual Content - Unethical Behavior - Harmful (used for general harm that doesn't fit another category) Granite Guardian Cookbooks offers an excellent starting point for working with guardian models, providing a variety of examples that demonstrate how the models can be configured for different risk detection scenarios. Refer to Quick Start Guide and Detailed Guide to get ready with Granite Guardian scope of use. Granite Guardian 3.2 5B Harm Categories LoRA Cookbooks provide the steps to insert the LoRA adapter on top of Granite Guardian and provide example of single- and multiple-risk prediction using only one adapter call on top of Granite Guardian. The following code describes how to use Granite Guardian to obtain probability scores for a given user and assistant message and a pre-defined guardian configuration. At the Granite Guardian it is applied the Multi-Risk LoRA adapter for predicting the risk categories. Prompt Template (Granite-Guardian) Granite Guardian is designed to output a yes/no response for the following prompt template. As outlined in previous section, this template is wrapped within `applychattemplate`. The following snippet illustrates its basic structure in free-form for the `socialbias` risk. The Harm Categories LoRA adapter is intended for use cases that involve the detection of multiple risk dimensions. The adapter's enforce in this way, overall latency, throughput, and inference benefits as single call instead of one for each sub-risk dimension. The granite-guardian-3.2-5b-lora-harm-categories adpater is specifically designed to work with granite-guardian-3.2-5b model. The Harm Category LoRA adapter has been trained with the annotation category provided by DataForce on the samples derived from hh-rlhf dataset used to train Granite Guardian. In addition to the categories flagged by the annotators, a small curated dataset has been added to train jailbreak and profanity. The following table presents the Recall scores for each trained sub-categories on the test set of our annotated dataset. | Metric | Not Harmful Prompt | Not Harmful Response | Harmful | Social Bias | Jailbreaking | Violence | Profanity | Sexual Content | Unethical Behavior | |------------|--------------------|----------------------|---------|-------------|--------------|----------|-----------|----------------|--------------------| | Recall | 1.00 | 1.00 | 0.44 | 0.94 | 0.25 | 0.68 | 0.78 | 0.70 | 0.94 | The following table presents the Recall scores for each harm sub-categories on out-of-distribution (OOD) data. | Metric | Social Bias | Jailbreaking | Violence | Profanity | Sexual Content | Unethical Behavior | |------------|-------------|--------------|----------|-----------|----------------|--------------------| | Recall | 0.89 | 0.91 | 0.81 | 0.74 | 0.80 | 0.97 | Following, the dataset source: - Social Bias - OpenAI Moderation data using only sample flagged as `H` and `H2`; - Jailbreaking - ToxicChat version `0124` using only samples flagged as jailbreak; - Violence - OpenAI Moderation data using only sample flagged as `V` and `V2`; - Profanity - Google Civil Comments using sample with a `insult` score higher than 0.8; - Sexual Content - OpenAI Moderation data using only sample flagged as `S` and `S3`; - Unethical Behavior - Do-Notanswer using sample with `typesofharm` equal to `Nudging or advising users to perform unethical or unsafe actions`; > Note: To better evaluate the correctness of the adapters, the evaluation has been done only on categoryside enforcing a correct prediction of Granite Guardian prefilling the Prompt Template (Multi-Risk) with the `labelprediction = 'Yes'` but using the Granite Guardian 3.2 5b confidence prediction. Citation If you find this adapter useful, please cite the following work. Giandomenico Cornacchia and The Granite Guardian Team