nvidia

✓ Verified Enterprise

NVIDIA AI, GPU technology leader and model developers

500 models • 86 total models in database

parakeet-tdt-0.6b-v2

> 🎉 NEW: Multilingual Parakeet TDT 0.6B V3 is now available!
> 🌍 25 European Languages | 🚀 Enhanced Performance | 🔗 Try it here: nvidia/parakeet-tdt-0.6b-v3

`parakeet-tdt-0.6b-v2` is a 600-million-parameter automatic speech recognition (ASR) model designed for high-quality English transcription, featuring support for punctuation, capitalization, and accurate timestamp prediction. Try Demo here: https://huggingface.co/spaces/nvidia/parakeet-tdt-0.6b-v2

This XL variant of the FastConformer [1] architecture integrates the TDT [2] decoder and is trained with full attention, enabling efficient transcription of audio segments up to 24 minutes in a single pass. The model achieves an RTFx of 3380 on the HF-Open-ASR leaderboard with a batch size of 128. Note: RTFx performance may vary depending on dataset audio duration and batch size.

Key Features
- Accurate word-level timestamp predictions
- Automatic punctuation and capitalization
- Robust performance on spoken numbers and song lyrics transcription

For more information, refer to the Model Architecture section and the NeMo documentation.

This model is ready for commercial/non-commercial use.

GOVERNING TERMS: Use of this model is governed by the CC-BY-4.0 license.

Discover more from NVIDIA: For documentation, deployment guides, enterprise-ready APIs, and the latest open models (including Nemotron and other cutting-edge speech, translation, and generative AI), visit the NVIDIA Developer Portal at developer.nvidia.com. Join the community to access tools, support, and resources to accelerate your development with NVIDIA's NeMo, Riva, NIM, and foundation models.

Explore more from NVIDIA: What is Nemotron? NVIDIA Developer Nemotron NVIDIA Riva Speech NeMo Documentation

This model serves developers, researchers, academics, and industries building applications that require speech-to-text capabilities, including but not limited to: conversational AI, voice assistants, transcription services, subtitle generation, and voice analytics platforms.

This model was developed based on the FastConformer encoder architecture [1] and the TDT decoder [2]. This model has 600 million parameters.

Input:
- Input Type(s): 16kHz Audio
- Input Format(s): `.wav` and `.flac` audio formats
- Input Parameters: 1D (audio signal)
- Other Properties Related to Input: Monochannel audio

Output:
- Output Type(s): Text
- Output Format: String
- Output Parameters: 1D (text)
- Other Properties Related to Output: Punctuation and capitalization included.

Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g., GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.

To train, fine-tune or play with the model you will need to install NVIDIA NeMo. We recommend you install it after you've installed the latest PyTorch version. The model is available for use in the NeMo toolkit [3], and can be used as a pre-trained checkpoint for inference or for fine-tuning on another dataset.

Transcribing using Python
First, let's get a sample

Supported Hardware Microarchitecture Compatibility:
- NVIDIA Ampere
- NVIDIA Blackwell
- NVIDIA Hopper
- NVIDIA Volta

At least 2GB of RAM is needed for the model to load. The more RAM available, the longer the audio input it supports.

Current version: parakeet-tdt-0.6b-v2. Previous versions can be accessed here.

This model was trained using the NeMo toolkit [3], following the strategies below:
- Initialized from a FastConformer SSL checkpoint that was pretrained with a wav2vec method on the LibriLight dataset [7].
- Trained for 150,000 steps on 64 A100 GPUs.
- Dataset corpora were balanced using a temperature sampling value of 0.5.
- Stage 2 fine-tuning was performed for 2,500 steps on 4 A100 GPUs using approximately 500 hours of high-quality, human-transcribed data of NeMo ASR Set 3.0.

Training was conducted using this example script and TDT configuration. The tokenizer was constructed from the training set transcripts using this script.

Training Dataset
The model was trained on the Granary dataset [8], consisting of approximately 120,000 hours of English speech data:
- 10,000 hours from human-transcribed NeMo ASR Set 3.0, including:
  - LibriSpeech (960 hours)
  - Fisher Corpus
  - National Speech Corpus Part 1
  - VCTK
  - VoxPopuli (English)
  - Europarl-ASR (English)
  - Multilingual LibriSpeech (MLS English), 2,000-hour subset
  - Mozilla Common Voice (v7.0)
  - AMI
- 110,000 hours of pseudo-labeled data from:
  - YTC (YouTube-Commons) dataset [4]
  - YODAS dataset [5]
  - Librilight [7]

All transcriptions preserve punctuation and capitalization. The Granary dataset [8] will be made publicly available after presentation at Interspeech 2025.

- Noise robust data from various sources
- Single channel, 16kHz sampled data

Hugging Face Open ASR Leaderboard datasets are used to evaluate the performance of this model. All are commonly used for benchmarking English ASR systems. Audio data is typically processed into a 16kHz mono channel format for ASR evaluation, consistent with benchmarks like the Open ASR Leaderboard.

Hugging Face Open-ASR-Leaderboard Performance
The performance of Automatic Speech Recognition (ASR) models is measured using Word Error Rate (WER). Because this model is trained on a large and diverse dataset spanning multiple domains, it is generally more robust and accurate across various types of audio.
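As a rough illustration of the metric, WER is the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The sketch below is a minimal, dependency-free version; the function name `word_error_rate` is hypothetical, and the text normalization that leaderboard evaluations typically apply before scoring is not reproduced here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: word-level edit distance / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (classic Levenshtein recurrence).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return 100.0 * prev[-1] / len(ref)
```

For example, scoring the hypothesis "the cat sit on mat" against the reference "the cat sat on the mat" counts one substitution and one deletion over six reference words.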
Base Performance
The table below summarizes the WER (%) using a Transducer decoder with greedy decoding (without an external language model):

| Model | Avg WER | AMI | Earnings-22 | GigaSpeech | LS test-clean | LS test-other | SPGI Speech | TEDLIUM-v3 | VoxPopuli |
|:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| parakeet-tdt-0.6b-v2 | 6.05 | 11.16 | 11.15 | 9.74 | 1.69 | 3.19 | 2.17 | 3.38 | 5.95 |

Noise Robustness
Performance across different Signal-to-Noise Ratios (SNR) using MUSAN music and noise samples:

| SNR Level | Avg WER | AMI | Earnings | GigaSpeech | LS test-clean | LS test-other | SPGI | Tedlium | VoxPopuli | Relative Change |
|:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| Clean | 6.05 | 11.16 | 11.15 | 9.74 | 1.69 | 3.19 | 2.17 | 3.38 | 5.95 | - |
| SNR 10 | 6.95 | 14.38 | 12.04 | 10.24 | 1.92 | 4.13 | 2.84 | 3.63 | 6.38 | -14.75% |
| SNR 5 | 8.23 | 18.07 | 13.82 | 11.18 | 2.33 | 5.58 | 3.81 | 4.24 | 6.81 | -35.97% |
| SNR 0 | 11.88 | 25.43 | 18.59 | 14.32 | 4.40 | 10.07 | 7.27 | 6.42 | 8.54 | -96.28% |
| SNR -5 | 20.26 | 36.57 | 28.06 | 22.27 | 11.82 | 19.91 | 16.14 | 13.07 | 14.23 | -234.66% |

Telephony Audio Performance
Performance comparison between standard 16kHz audio and telephony-style audio (using μ-law encoding with 16kHz→8kHz→16kHz conversion):

| Audio Format | Avg WER | AMI | Earnings | GigaSpeech | LS test-clean | LS test-other | SPGI | Tedlium | VoxPopuli | Relative Change |
|:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| Standard 16kHz | 6.05 | 11.16 | 11.15 | 9.74 | 1.69 | 3.19 | 2.17 | 3.38 | 5.95 | - |
| μ-law 8kHz | 6.32 | 11.98 | 11.16 | 10.02 | 1.78 | 3.52 | 2.20 | 3.38 | 6.52 | -4.10% |

These WER scores were obtained using greedy decoding without an external language model. Additional evaluation details are available on the Hugging Face ASR Leaderboard. [6]

References
[1] Fast Conformer with Linearly Scalable Attention for Efficient Speech Recognition
[2] Efficient Sequence Transduction by Jointly Predicting Tokens and Durations
[4] Youtube-commons: A massive open corpus for conversational and multimodal data
[5] Yodas: Youtube-oriented dataset for audio and speech
[7] MOSEL: 950,000 Hours of Speech Data for Open-Source Speech Foundation Model Training on EU Languages
[8] Granary: Speech Recognition and Translation Dataset in 25 European Languages

Test Hardware:
- NVIDIA A10
- NVIDIA A100
- NVIDIA A30
- NVIDIA H100
- NVIDIA L4
- NVIDIA L40
- NVIDIA Turing T4
- NVIDIA Volta V100

Ethical Considerations:
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their supporting model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. For more detailed information on ethical considerations for this model, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards here. Please report security vulnerabilities or NVIDIA AI Concerns here.
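The SNR levels in the noise robustness table for this model express the ratio of speech power to added noise power in decibels. The card does not describe the mixing procedure itself; a common approach, sketched here under that assumption, scales the noise so the power ratio matches the target before adding it to the speech. The function names `mix_at_snr` and `measured_snr_db` are hypothetical.

```python
import math


def _power(samples):
    """Mean squared amplitude of a sequence of float samples."""
    return sum(x * x for x in samples) / len(samples)


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech` after scaling it to the requested SNR in dB."""
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    noise_power = _power(noise)
    if noise_power == 0:
        raise ValueError("noise is silent; cannot scale it to a target SNR")
    # SNR_dB = 10 * log10(P_speech / P_noise), solved for the noise power.
    target_noise_power = _power(speech) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / noise_power)
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, mixed):
    """Recover the SNR of a mixture when the clean speech is known."""
    residual = [m - s for m, s in zip(mixed, speech)]
    return 10 * math.log10(_power(speech) / _power(residual))
```

Lower SNR values mean louder noise relative to speech; at 0 dB the two have equal power, and at -5 dB the noise is stronger than the speech.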
Field | Response
---|---
Participation considerations from adversely impacted groups protected classes in model design and testing | None
Measures taken to mitigate against unwanted bias | None

Field | Response
---|---
Intended Domain | Speech to Text Transcription
Model Type | FastConformer
Intended Users | This model is intended for developers, researchers, academics, and industries building conversational based applications.
Output | Text
Describe how the model works | Speech input is encoded into embeddings and passed into a conformer-based model, which outputs a text response.
Name the adversely impacted groups this has been tested to deliver comparable outcomes regardless of | Not Applicable
Technical Limitations & Mitigation | Transcripts may not be 100% accurate. Accuracy varies based on language and characteristics of input audio (Domain, Use Case, Accent, Noise, Speech Type, Context of speech, etc.)
Verified to have met prescribed NVIDIA quality standards | Yes
Performance Metrics | Word Error Rate
Potential Known Risks | If a word is not trained in the language model and not presented in vocabulary, the word is not likely to be recognized. Not recommended for word-for-word/incomplete sentences as accuracy varies based on the context of input text
Licensing | GOVERNING TERMS: Use of this model is governed by the CC-BY-4.0 license.

Field | Response
---|---
Generatable or reverse engineerable personal data? | None
Personal data used to create this model? | None
Is there provenance for all datasets used in training? | Yes
Does data labeling (annotation, metadata) comply with privacy laws? | Yes
Is data compliant with data subject requests for data correction or removal, if such a request was made? | No, not possible with externally-sourced data.
Applicable Privacy Policy | https://www.nvidia.com/en-us/about-nvidia/privacy-policy/

Field | Response
---|---
Model Application(s) | Speech to Text Transcription
Describe the life critical impact | None
Use Case Restrictions | Abide by CC-BY-4.0 License
Model and dataset restrictions | The Principle of least privilege (PoLP) is applied limiting access for dataset generation and model development. Restrictions enforce dataset access during training, and dataset license constraints adhered to.

-
3,643,785
1,368

parakeet-rnnt-0.6b

`parakeet-rnnt-0.6b` is an ASR model that transcribes speech in lower case English alphabet. This model is jointly developed by NVIDIA NeMo and Suno.ai teams. It is an XL version of the FastConformer Transducer [1] model (around 600M parameters). See the model architecture section and NeMo documentation for complete architecture details.

License to use this model is covered by the CC-BY-4.0. By downloading the public and release version of the model, you accept the terms and conditions of the CC-BY-4.0 license.

To train, fine-tune or play with the model you will need to install NVIDIA NeMo. We recommend you install it after you've installed the latest PyTorch version. The model is available for use in the NeMo toolkit [3], and can be used as a pre-trained checkpoint for inference or for fine-tuning on another dataset.

Transcribing using Python
First, let's get a sample

This model accepts 16000 Hz mono-channel audio (wav files) as input. This model provides transcribed speech as a string for a given audio sample.

FastConformer [1] is an optimized version of the Conformer model with 8x depthwise-separable convolutional downsampling. The model is trained in a multitask setup with a Transducer decoder (RNNT) loss. You may find more information on the details of FastConformer here: Fast-Conformer Model.

The NeMo toolkit [3] was used for training the models for several hundred epochs. These models are trained with this example script and this base config. The tokenizers for these models were built using the text transcripts of the train set with this script.

The model was trained on 64K hours of English speech collected and prepared by NVIDIA NeMo and Suno teams. The training dataset consists of a private subset with 40K hours of English speech plus 24K hours from the following public datasets:
- Librispeech 960 hours of English speech
- Fisher Corpus
- Switchboard-1 Dataset
- WSJ-0 and WSJ-1
- National Speech Corpus (Part 1, Part 6)
- VCTK
- VoxPopuli (EN)
- Europarl-ASR (EN)
- Multilingual Librispeech (MLS EN) - 2,000 hour subset
- Mozilla Common Voice (v7.0)
- People's Speech - 12,000 hour subset

The performance of Automatic Speech Recognition models is measured using Word Error Rate. Since this model is trained on multiple domains and a much larger corpus, it will generally perform better at transcribing audio in general. The following table summarizes the performance of the available models in this collection with the Transducer decoder. Performances of the ASR models are reported in terms of Word Error Rate (WER%) with greedy decoding.

|Version|Tokenizer|Vocabulary Size|AMI|Earnings-22|Giga Speech|LS test-clean|LS test-other|SPGI Speech|TEDLIUM-v3|Vox Populi|Common Voice|
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.22.0 | SentencePiece Unigram | 1024 | 17.55 | 14.78 | 10.07 | 1.63 | 3.06 | 3.47 | 3.86 | 6.05 | 8.07 |

These are greedy WER numbers without external LM. More details on evaluation can be found at HuggingFace ASR Leaderboard.

NVIDIA Riva is an accelerated speech AI SDK deployable on-prem, in all clouds, multi-cloud, hybrid, on edge, and embedded. Additionally, Riva provides:
- World-class out-of-the-box accuracy for the most common languages with model checkpoints trained on proprietary data with hundreds of thousands of GPU-compute hours
- Best in class accuracy with run-time word boosting (e.g., brand and product names) and customization of acoustic model, language model, and inverse text normalization
- Streaming speech recognition, Kubernetes compatible scaling, and enterprise-grade support

Although this model isn't supported yet by Riva, the list of supported models is here. Check out Riva live demo.

References
[1] Fast Conformer with Linearly Scalable Attention for Efficient Speech Recognition
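Because this model expects 16000 Hz mono-channel wav input, it can help to check a file before transcribing it. The sketch below uses only the Python standard library for the check; `check_wav`, `write_silence`, and `transcribe` are hypothetical helper names. The `transcribe` helper assumes NeMo is installed and uses its `ASRModel.from_pretrained` and `transcribe` calls; the exact return type (plain strings or hypothesis objects) depends on the NeMo version, so treat it as an illustration rather than the card's own snippet.

```python
import wave


def check_wav(path, expected_rate=16000):
    """Return a list of problems that make `path` unsuitable as model input."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    problems = []
    if rate != expected_rate:
        problems.append(f"sample rate is {rate} Hz, expected {expected_rate} Hz")
    if channels != 1:
        problems.append(f"{channels} channels found, expected mono")
    return problems


def write_silence(path, rate=16000, channels=1, seconds=0.1):
    """Write a short silent 16-bit PCM wav file (useful for trying the check)."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(rate * seconds) * channels)


def transcribe(paths, model_name="nvidia/parakeet-rnnt-0.6b"):
    """Transcribe wav files with NeMo (imported lazily; downloads the model)."""
    import nemo.collections.asr as nemo_asr  # assumption: NeMo is installed

    model = nemo_asr.models.ASRModel.from_pretrained(model_name=model_name)
    return model.transcribe(list(paths))
```

Files that fail the check would need resampling or downmixing first; that step is outside this sketch.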

-
3,110,585
12

bigvgan_v2_22khz_80band_256x

---
license: mit
license_link: https://huggingface.co/nvidia/BigVGAN/blob/main/LICENSE
tags:
- neural-vocoder
- audio-generation
library_name: PyTorch
pipeline_tag: audio-to-audio
---

license:mit
2,071,898
21

Llama-3.1-Nemotron-Nano-VL-8B-V1

---
license: other
license_name: nvidia-open-model-license
license_link: >-
  https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/
pipeline_tag: image-text-to-text
library_name: transformers
tags:
- nvidia
- VLM
- llama3.1
---

llama3.1
747,190
167

bigvgan_v2_44khz_128band_512x

---
license: mit
license_link: https://huggingface.co/nvidia/BigVGAN/blob/main/LICENSE
tags:
- neural-vocoder
- audio-generation
library_name: PyTorch
pipeline_tag: audio-to-audio
---

license:mit
516,665
58

Cosmos-Reason1-7B

---
license: other
license_name: nvidia-open-model-license
license_link: >-
  https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license
datasets:
- nvidia/Cosmos-Reason1-SFT-Dataset
- nvidia/Cosmos-Reason1-RL-Dataset
- nvidia/Cosmos-Reason1-Benchmark
library_name: transformers
language:
- en
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
tags:
- nvidia
- cosmos
pipeline_tag: image-text-to-text
---

-
413,361
188

segformer-b0-finetuned-ade-512-512

---
license: other
tags:
- vision
- image-segmentation
datasets:
- scene_parse_150
widget:
- src: https://huggingface.co/datasets/hf-internal-testing/fixtures_ade20k/resolve/main/ADE_val_00000001.jpg
  example_title: House
- src: https://huggingface.co/datasets/hf-internal-testing/fixtures_ade20k/resolve/main/ADE_val_00000002.jpg
  example_title: Castle
---

-
378,845
173

speakerverification_en_titanet_large

---
language:
- en
library_name: nemo
datasets:
- VOXCELEB-1
- VOXCELEB-2
- FISHER
- switchboard
- librispeech_asr
- SRE
thumbnail: null
tags:
- speaker
- speech
- audio
- speaker-verification
- speaker-recognition
- speaker-diarization
- titanet
- NeMo
- pytorch
license: cc-by-4.0
widget:
- src: https://huggingface.co/nvidia/speakerverification_en_titanet_large/resolve/main/an255-fash-b.wav
  example_title: Speech sample 1
- src: https://huggingface.co/nvidia/speakerverification_en_titanet_large/

license:cc-by-4.0
350,877
104

prompt-task-and-complexity-classifier

---
tags:
- model_hub_mixin
- pytorch_model_hub_mixin
license: other
---

-
338,166
74

mit-b2

---
license: other
tags:
- vision
datasets:
- imagenet_1k
widget:
- src: https://huggingface.co/datasets/hf-internal-testing/fixtures_ade20k/resolve/main/ADE_val_00000001.jpg
  example_title: House
- src: https://huggingface.co/datasets/hf-internal-testing/fixtures_ade20k/resolve/main/ADE_val_00000002.jpg
  example_title: Castle
---

-
285,202
4

mit-b3

---
license: other
tags:
- vision
datasets:
- imagenet_1k
widget:
- src: https://huggingface.co/datasets/hf-internal-testing/fixtures_ade20k/resolve/main/ADE_val_00000001.jpg
  example_title: House
- src: https://huggingface.co/datasets/hf-internal-testing/fixtures_ade20k/resolve/main/ADE_val_00000002.jpg
  example_title: Castle
---

-
258,325
5

Llama-4-Scout-17B-16E-Instruct-FP8

---
base_model:
- meta-llama/Llama-4-Scout-17B-16E-Instruct
license: other
license_name: nvidia-open-model-license
license_link: >-
  https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license
library_name: Model Optimizer
tags:
- nvidia
- modelopt
- llama4
- quantized
- FP8
extra_gated_prompt: >-

llama4
253,105
6

NV-Embed-v2

---
tags:
- mteb
- sentence-transformers
model-index:
- name: NV-Embed-v2
  results:
  - dataset:
      config: en
      name: MTEB AmazonCounterfactualClassification (en)
      revision: e8379541af4e31359cca9fbcf4b00f2671dba205
      split: test
      type: mteb/amazon_counterfactual
    metrics:
    - type: accuracy
      value: 94.28358208955224
    - type: accuracy_stderr
      value: 0.40076780842082305
    - type: ap
      value: 76.49097318319616
    - type: ap_stderr
      value: 1.2418692675183929
    - type: f1
      value: 91.41982003001168
    - type: f1_stderr
      value: 0.504

license:cc-by-nc-4.0
198,193
478

segformer-b1-finetuned-ade-512-512

SegFormer model fine-tuned on ADE20k at resolution 512x512. It was introduced in the paper SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers by Xie et al. and first released in this repository. Disclaimer: The team releasing SegFormer did not write a model card for this model, so this model card has been written by the Hugging Face team. SegFormer consists of a hierarchical Transformer encoder and a lightweight all-MLP decode head to achieve great results on semantic segmentation benchmarks such as ADE20K and Cityscapes. The hierarchical Transformer is first pre-trained on ImageNet-1k, after which a decode head is added and the whole model is fine-tuned on a downstream dataset. You can use the raw model for semantic segmentation. See the model hub to look for fine-tuned versions on a task that interests you. The model can be applied to an image, for example one from the COCO 2017 dataset, to predict a per-pixel ADE20k class label. For more code examples, we refer to the documentation.
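The usage snippet from the upstream card is not reproduced in this listing. As a hedged sketch of what semantic segmentation with this checkpoint could look like, the function below uses the `SegformerImageProcessor` and `SegformerForSemanticSegmentation` classes from Hugging Face Transformers. It assumes `torch`, `transformers`, and `Pillow` are installed and that the checkpoint can be downloaded; the function name `segment` is hypothetical.

```python
CHECKPOINT = "nvidia/segformer-b1-finetuned-ade-512-512"


def segment(image, checkpoint=CHECKPOINT):
    """Return a (height, width) tensor of predicted class ids for a PIL image."""
    # Heavy dependencies are imported lazily so the module loads without them.
    import torch
    from transformers import (
        SegformerForSemanticSegmentation,
        SegformerImageProcessor,
    )

    processor = SegformerImageProcessor.from_pretrained(checkpoint)
    model = SegformerForSemanticSegmentation.from_pretrained(checkpoint)

    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # The raw logits are at 1/4 of the processed input resolution. The
    # post-processing call upsamples them to the original image size and takes
    # the per-pixel argmax. PIL reports size as (width, height), so reverse it.
    return processor.post_process_semantic_segmentation(
        outputs, target_sizes=[image.size[::-1]]
    )[0]
```

A call such as `segment(Image.open("photo.jpg"))` would yield one class id per pixel, which can be mapped to label names through `model.config.id2label` if the model object is kept around.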

-
166,648
13

segformer-b5-finetuned-ade-640-640

SegFormer model fine-tuned on ADE20k at resolution 640x640. It was introduced in the paper SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers by Xie et al. and first released in this repository. Disclaimer: The team releasing SegFormer did not write a model card for this model, so this model card has been written by the Hugging Face team. SegFormer consists of a hierarchical Transformer encoder and a lightweight all-MLP decode head to achieve great results on semantic segmentation benchmarks such as ADE20K and Cityscapes. The hierarchical Transformer is first pre-trained on ImageNet-1k, after which a decode head is added and the whole model is fine-tuned on a downstream dataset. You can use the raw model for semantic segmentation. See the model hub to look for fine-tuned versions on a task that interests you. The model can be applied to an image, for example one from the COCO 2017 dataset, to predict a per-pixel ADE20k class label. For more code examples, we refer to the documentation.

-
166,378
43

parakeet-ctc-1.1b

`parakeet-ctc-1.1b` is an ASR model that transcribes speech in lower case English alphabet. This model is jointly developed by NVIDIA NeMo and Suno.ai teams. It is an XXL version of the FastConformer CTC [1] model (around 1.1B parameters). See the model architecture section and NeMo documentation for complete architecture details.

To train, fine-tune or play with the model you will need to install NVIDIA NeMo. We recommend you install it after you've installed the latest PyTorch version. The model is available for use in the NeMo toolkit [3], and can be used as a pre-trained checkpoint for inference or for fine-tuning on another dataset.

Moreover, you can now run Parakeet CTC natively with Transformers 🤗. For more details about usage, refer to the Transformers documentation.

This model accepts 16000 Hz mono-channel audio (wav files) as input. This model provides transcribed speech as a string for a given audio sample.

FastConformer [1] is an optimized version of the Conformer model with 8x depthwise-separable convolutional downsampling. The model is trained using CTC loss. You may find more information on the details of FastConformer here: Fast-Conformer Model.

The NeMo toolkit [3] was used for training the models for several hundred epochs. These models are trained with this example script and this base config. The tokenizers for these models were built using the text transcripts of the train set with this script.

The model was trained on 64K hours of English speech collected and prepared by NVIDIA NeMo and Suno teams. The training dataset consists of a private subset with 40K hours of English speech plus 24K hours from the following public datasets:
- Librispeech 960 hours of English speech
- Fisher Corpus
- Switchboard-1 Dataset
- WSJ-0 and WSJ-1
- National Speech Corpus (Part 1, Part 6)
- VCTK
- VoxPopuli (EN)
- Europarl-ASR (EN)
- Multilingual Librispeech (MLS EN) - 2,000 hour subset
- Mozilla Common Voice (v7.0)
- People's Speech - 12,000 hour subset

The performance of Automatic Speech Recognition models is measured using Word Error Rate. Since this model is trained on multiple domains and a much larger corpus, it will generally perform better at transcribing audio in general. The following table summarizes the performance of the available models in this collection with the CTC decoder. Performances of the ASR models are reported in terms of Word Error Rate (WER%) with greedy decoding.

|Version|Tokenizer|Vocabulary Size|AMI|Earnings-22|Giga Speech|LS test-clean|LS test-other|SPGI Speech|TEDLIUM-v3|Vox Populi|Common Voice|
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.22.0 | SentencePiece Unigram | 1024 | 15.62 | 13.69 | 10.27 | 1.83 | 3.54 | 4.20 | 3.54 | 6.53 | 9.02 |

These are greedy WER numbers without external LM. More details on evaluation can be found at HuggingFace ASR Leaderboard.

NVIDIA Riva is an accelerated speech AI SDK deployable on-prem, in all clouds, multi-cloud, hybrid, on edge, and embedded. Additionally, Riva provides:
- World-class out-of-the-box accuracy for the most common languages with model checkpoints trained on proprietary data with hundreds of thousands of GPU-compute hours
- Best in class accuracy with run-time word boosting (e.g., brand and product names) and customization of acoustic model, language model, and inverse text normalization
- Streaming speech recognition, Kubernetes compatible scaling, and enterprise-grade support

Although this model isn't supported yet by Riva, the list of supported models is here. Check out Riva live demo.

References
[1] Fast Conformer with Linearly Scalable Attention for Efficient Speech Recognition

License to use this model is covered by the CC-BY-4.0. By downloading the public and release version of the model, you accept the terms and conditions of the CC-BY-4.0 license.

license:cc-by-4.0
163,381
35

canary-1b-flash

license:cc-by-4.0
130,668
255

parakeet-tdt_ctc-110m

license:cc-by-4.0
129,007
36

segformer-b5-finetuned-cityscapes-1024-1024

SegFormer (b5-sized) model fine-tuned on CityScapes

SegFormer model fine-tuned on CityScapes at resolution 1024x1024. It was introduced in the paper SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers by Xie et al. and first released in this repository. Disclaimer: The team releasing SegFormer did not write a model card for this model, so this model card has been written by the Hugging Face team. SegFormer consists of a hierarchical Transformer encoder and a lightweight all-MLP decode head to achieve great results on semantic segmentation benchmarks such as ADE20K and Cityscapes. The hierarchical Transformer is first pre-trained on ImageNet-1k, after which a decode head is added and the whole model is fine-tuned on a downstream dataset. You can use the raw model for semantic segmentation. See the model hub to look for fine-tuned versions on a task that interests you. The model can be applied to an image, for example one from the COCO 2017 dataset, to predict a per-pixel Cityscapes class label. For more code examples, we refer to the documentation.

-
104,163
35

Llama-3_3-Nemotron-Super-49B-v1_5

---
library_name: transformers
license: other
license_name: nvidia-open-model-license
license_link: >-
  https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/

llama-3
103,600
209

DeepSeek-R1-0528-FP4

license:mit
102,319
38

llama-embed-nemotron-8b

llama_bidirec
100,859
69

C-RADIOv3-B

This model performs visual feature extraction. For instance, RADIO generates image embeddings that can be used by a downstream model to classify images.

C-RADIOv3 models are available in multiple sizes:
- Base (90M parameters).
- Large (320M parameters).
- Huge (653M parameters).
- (In training) Gigantic (1.1B parameters).

C-RADIOv3 was trained for 1M steps (400k more steps than v1), using inverse frequency sampling for data balancing and PHI Standardization for teacher distribution balancing, as well as new techniques for summary distribution matching and domain generalization.

This model is ready for commercial/non-commercial use.

GOVERNING TERMS: Use of this model is governed by the NVIDIA Open Model License Agreement.

The embeddings generated by this model are expected to be used by a downstream application. For example:
- Image-level understanding (image classification, curation, etc.).
- Dense processing (semantic segmentation, depth estimation, etc.).
- Integration into a Vision-Language Model.

Huggingface: 03/26/2025 via RADIO Collection of Models.

- [CVPR 2025] RADIOv2.5: Improved Baselines for Agglomerative Vision Foundation Models
- [CVPR 2024] AM-RADIO: Agglomerative Vision Foundation Model - Reduce All Domains Into One

Architecture Type: Neural Network
Network Architecture: Vision Transformer

Input Type(s): Image
Input Format(s): Red, Green, Blue (RGB)
Input Parameters: Two Dimensional (2D)
Other Properties Related to Input: Image resolutions up to 2048x2048 in increments of 16 pixels

Output Type(s): Embeddings
Output Format: Tensor
Output Parameters: 2D
Other Properties Related to Output: Downstream model required to leverage image features

RADIO will return a tuple with two tensors. The `summary` is similar to the `cls_token` in ViT and is meant to represent the general concept of the entire image. It has shape `(B,C)` with `B` being the batch dimension, and `C` being some number of channels. The `spatial_features` represent more localized content which should be suitable for dense tasks such as semantic segmentation, or for integration into an LLM. Spatial features have shape `(B,T,D)` with `T` being the flattened spatial tokens, and `D` being the channels for spatial features. Note that `C!=D` in general.

Converting to a spatial tensor format can be done using the downsampling size of the model, combined with the input tensor shape. For RADIO, the patch size is 16. The resulting tensor will have shape `(B,D,H,W)`, as is typically seen with computer vision models.

Supported Hardware Microarchitecture Compatibility:
- NVIDIA Ampere
- NVIDIA Blackwell
- NVIDIA Jetson
- NVIDIA Hopper
- NVIDIA Lovelace
- NVIDIA Pascal
- NVIDIA Turing
- NVIDIA Volta

[Preferred/Supported] Operating System(s):
- Linux
- Linux 4 Tegra
- QNX
- Windows

Model versions:
- C-RADIOv3-B (90M parameters).
- C-RADIOv3-L (320M parameters).
- C-RADIOv3-H (653M parameters).
- C-RADIOv3-g (1.2B parameters).

Properties: This dataset spans 1000 object classes and contains 1,281,167 training images, 50,000 validation images and 100,000 test images.

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. For more detailed information on ethical considerations for this model, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards below. Please report security vulnerabilities or NVIDIA AI Concerns here.
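The conversion from flattened spatial tokens to a spatial layout described for this model can be sketched without any tensor library. The example below handles a single image: it derives the token grid from the input resolution and the patch size of 16, then rearranges a `(T, D)` list of tokens into `(D, H, W)`. It assumes tokens are stored in row-major order (rows of the image first), and the function names are hypothetical.

```python
def spatial_grid(height, width, patch_size=16):
    """Return (rows, cols) of the token grid for an input of the given size."""
    if height % patch_size or width % patch_size:
        raise ValueError("input size must be a multiple of the patch size")
    return height // patch_size, width // patch_size


def tokens_to_grid(tokens, height, width, patch_size=16):
    """Rearrange one image's (T, D) tokens into nested lists of shape (D, H, W)."""
    rows, cols = spatial_grid(height, width, patch_size)
    if len(tokens) != rows * cols:
        raise ValueError("token count does not match the spatial grid")
    channels = len(tokens[0])
    # Token index r * cols + c corresponds to grid position (r, c).
    return [
        [[tokens[r * cols + c][d] for c in range(cols)] for r in range(rows)]
        for d in range(channels)
    ]

# With PyTorch tensors the same rearrangement for a whole batch would be
# spatial_features.reshape(B, rows, cols, D).permute(0, 3, 1, 2).
```

For a 32x48 input, for instance, the grid is 2 rows by 3 columns, so six tokens are expected per image.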
Field | Response
:---|:---
Participation considerations from adversely impacted groups protected classes in model design and testing: | None
Measures taken to mitigate against unwanted bias: | None

Field | Response
:---|:---
Intended Application & Domain: | Visual Feature Extraction
Model Type: | Vision Transformer
Intended Users: | Developers of downstream vision applications
Output: | Image embeddings
Describe how the model works: | The model takes an image as input, processes the image through multiple transformer blocks, and outputs summary and patch embeddings.
Name the adversely impacted groups this has been tested to deliver comparable outcomes regardless of: | Not Applicable
Technical Limitations: | This model generates image embeddings that can be used by a downstream model to, for example, classify images. The downstream model must be trained to leverage the visual embeddings.
Verified to have met prescribed NVIDIA quality standards: | Yes
Performance Metrics: | Image classification accuracy, semantic segmentation mean intersection-over-union.
Potential Known Risks: | This model is only tested on input resolutions ranging from 256 to 2048, in increments of 16 pixels. Additionally, the generated embeddings might fail to disambiguate differences that appear evident to humans (e.g. two images showing different breeds of dogs might in fact produce very similar embeddings). Domain-specific evaluation is required for the target application.
Licensing: | NVIDIA Open Model License

Field | Response
:---|:---
Generatable or reverse engineerable personal data? | None
Personal data used to create this model? | None
How often is dataset reviewed? | Before Every Release
Is there provenance for all datasets used in training? | Yes
Does data labeling (annotation, metadata) comply with privacy laws? | Yes
Is data compliant with data subject requests for data correction or removal, if such a request was made? | Yes

Field | Response
:---|:---
Model Application(s): | Generation of visual embeddings
Describe the life critical impact (if present). | Not Applicable
Use Case Restrictions: | Abide by NVIDIA Open Model License Agreement
Model and dataset restrictions: | The Principle of least privilege (PoLP) is applied limiting access for dataset generation and model development. Restrictions enforce dataset access during training, and dataset license constraints adhered to.

-
95,797
3

DeepSeek-R1-0528-NVFP4

NaNK
license:mit
95,018
38

NVIDIA-Nemotron-Nano-9B-v2

The pretraining data has a cutoff date of September 2024.

NVIDIA-Nemotron-Nano-9B-v2 is a large language model (LLM) trained from scratch by NVIDIA, and designed as a unified model for both reasoning and non-reasoning tasks. It responds to user queries and tasks by first generating a reasoning trace and then concluding with a final response. The model's reasoning capabilities can be controlled via a system prompt. If the user prefers the model to provide its final answer without intermediate reasoning traces, it can be configured to do so, albeit with a slight decrease in accuracy for harder prompts that require reasoning. Conversely, allowing the model to generate reasoning traces first generally results in higher-quality final solutions to queries and tasks.

The model uses a hybrid architecture consisting primarily of Mamba-2 and MLP layers combined with just four Attention layers. For the architecture, please refer to the Nemotron-H tech report. The model was trained using Megatron-LM and NeMo-RL.

The supported languages include: English, German, Spanish, French, Italian, and Japanese. Improved using Qwen.

We want to hear from you! Share your ideas, vote on what matters, and help shape the future of Nemotron.

GOVERNING TERMS: This trial service is governed by the NVIDIA API Trial Terms of Service. Use of this model is governed by the NVIDIA Open Model License Agreement.

We evaluated our model in Reasoning-On mode across all benchmarks, except RULER, which is evaluated in Reasoning-Off mode.

| Benchmark | Qwen3-8B | NVIDIA-Nemotron-Nano-9B-v2 |
| :---- | ----: | ----: |
| AIME25 | 69.3% | 72.1% |
| MATH500 | 96.3% | 97.8% |
| GPQA | 59.6% | 64.0% |
| LCB | 59.5% | 71.1% |
| BFCL v3 | 66.3% | 66.9% |
| IFEval (Instruction Strict) | 89.4% | 90.3% |
| HLE | 4.4% | 6.5% |
| RULER (128K) | 74.1% | 78.9% |

All evaluations were done using NeMo-Skills. We published a tutorial with all details necessary to reproduce our evaluation results.
This model supports runtime "thinking" budget control. During inference, the user can specify how many tokens the model is allowed to "think".

- Architecture Type: Mamba2-Transformer Hybrid
- Network Architecture: Nemotron-Hybrid

NVIDIA-Nemotron-Nano-9B-v2 is a general purpose reasoning and chat model intended to be used in English and coding languages. Other non-English languages (German, French, Italian, Spanish and Japanese) are also supported. Developers designing AI Agent systems, chatbots, RAG systems, and other AI-powered applications. Also suitable for typical instruction-following tasks.

- Huggingface 08/18/2025 via https://huggingface.co/nvidia/NVIDIA-Nemotron-Nano-9B-v2
- API Catalog 08/18/2025 via https://build.nvidia.com/nvidia/nvidia-nemotron-nano-9b-v2

- NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model

- Input Type(s): Text
- Input Format(s): String
- Input Parameters: One-Dimensional (1D): Sequences
- Other Properties Related to Input: Context length up to 128K. Supported languages include German, Spanish, French, Italian, Korean, Portuguese, Russian, Japanese, Chinese and English.

- Output Type(s): Text
- Output Format: String
- Output Parameters: One-Dimensional (1D): Sequences up to 128K

Our models are designed and optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.

- Runtime Engine(s): NeMo 25.07.nemotron-nano-v2
- Supported Hardware Microarchitecture Compatibility: NVIDIA A10G, NVIDIA H100-80GB, NVIDIA A100, Jetson AGX Thor
- Operating System(s): Linux

The snippet below shows how to use this model with Huggingface Transformers (tested on version 4.48.3).
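A minimal sketch of such a Transformers call, written for illustration rather than copied from the card. The helper names `build_messages` and `generate` are hypothetical, the reasoning-off keyword is assumed to be spelled `/no_think`, and `trust_remote_code=True` and `device_map="auto"` (which needs the `accelerate` package) are assumptions about what this checkpoint requires.

```python
MODEL_ID = "nvidia/NVIDIA-Nemotron-Nano-9B-v2"


def build_messages(user_prompt: str, reasoning: bool = True) -> list[dict]:
    """Build a chat whose system prompt carries the reasoning switch."""
    control = "/think" if reasoning else "/no_think"  # assumed keyword spelling
    return [
        {"role": "system", "content": control},
        {"role": "user", "content": user_prompt},
    ]


def generate(user_prompt: str, reasoning: bool = True, max_new_tokens: int = 1024) -> str:
    """Run one chat turn and return the decoded completion."""
    # Heavy dependencies are imported lazily so build_messages() works without them.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,
        trust_remote_code=True,  # assumption: the hybrid architecture ships custom code
        device_map="auto",
    )
    input_ids = tokenizer.apply_chat_template(
        build_messages(user_prompt, reasoning),
        tokenize=True,
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    # Sample when reasoning is on, decode greedily when it is off.
    if reasoning:
        sampling = {"do_sample": True, "temperature": 0.6, "top_p": 0.95}
    else:
        sampling = {"do_sample": False}
    outputs = model.generate(input_ids, max_new_tokens=max_new_tokens, **sampling)
    # Drop the prompt tokens and decode only the newly generated part.
    return tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)
```

Calling `generate("Write a haiku about GPUs", reasoning=False)` would request a direct answer without a reasoning trace; running it requires a GPU with enough memory for the 9B checkpoint.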
Case 1: `/think` or no reasoning signal is provided in the system prompt, reasoning will be set to `True`

Case 2: `/no_think` is provided, reasoning will be set to `False`

Note: `/think` or `/no_think` keywords can also be provided in "user" messages for turn-level reasoning control.

We recommend setting `temperature` to `0.6` and `top_p` to `0.95` for reasoning True and greedy search for reasoning False, and increasing `max_new_tokens` to `1024` or higher for reasoning True.

The snippet below shows how to use this model with TRT-LLM. We tested this on the following commit and followed these instructions to build and install TRT-LLM in a docker container.

The snippet below shows how to use this model with vLLM. Use the latest version of vLLM and follow these instructions to build and install vLLM.

Note:
- Remember to add `--mamba_ssm_cache_dtype float32` for accurate quality. Without this option, the model's accuracy may degrade.
- If you encounter a CUDA OOM issue, try `--max-num-seqs 64` and consider lowering the value further if the error persists.

Alternatively, you can use Docker to launch a vLLM server. For Jetson AGX Thor, please use this vLLM container.

The thinking budget allows developers to keep accuracy high and meet response-time targets, which is especially crucial for customer support, autonomous agent steps, and edge devices where every millisecond counts.

With budget control, you can set a limit for internal reasoning:

`max_thinking_tokens`: This is a threshold that will attempt to end the reasoning trace at the next newline encountered in the reasoning trace. If no newline is encountered within 500 tokens, it will abruptly end the reasoning trace at `max_thinking_tokens + 500`.

Calling the server with a budget (restricted to 32 tokens here as an example):

After launching a vLLM server, you can call the server with tool-call support using a Python script like below:

We follow the jinja chat template provided below.
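The thinking-budget rule described above (stop at the first newline after the threshold, or hard-stop 500 tokens past it) can be sketched as a small function. This is an illustration under stated assumptions, not the server's implementation: it works on an already generated list of token strings, the name `reasoning_cutoff` is hypothetical, and keeping the newline token itself is an assumption.

```python
def reasoning_cutoff(tokens: list[str], max_thinking_tokens: int, grace: int = 500) -> int:
    """Return how many reasoning tokens to keep under the budget rule."""
    if len(tokens) <= max_thinking_tokens:
        return len(tokens)  # the trace ended on its own within the budget

    hard_limit = max_thinking_tokens + grace
    # Past the threshold, stop at the first token that contains a newline.
    for i in range(max_thinking_tokens, min(len(tokens), hard_limit)):
        if "\n" in tokens[i]:
            return i + 1  # keep the newline token, cut right after it
    # No newline within the grace window: cut abruptly at the hard limit.
    return min(len(tokens), hard_limit)


trace = ["step"] * 10 + ["\n"] + ["more"] * 5
kept = trace[: reasoning_cutoff(trace, max_thinking_tokens=8)]
```

In a live deployment the same decision would be made while tokens are streaming, and the final answer would then be generated from the truncated trace.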
This template conditionally adds `<think>\n` to the start of the Assistant response if `/think` is found in either the system prompt or any user message. If no reasoning signal is added, the model defaults to reasoning "on" mode. The chat template adds `<think></think>` to the start of the Assistant response if `/no_think` is found in the system prompt, thus enforcing reasoning on/off behavior.

Data Modality: Text
Text Training Data Size: More than 10 Trillion Tokens
Train/Test/Valid Split: We used 100% of the corpus for pre-training and relied on external benchmarks for testing.
Data Collection Method by dataset: Hybrid: Automated, Human, Synthetic
Labeling Method by dataset: Hybrid: Automated, Human, Synthetic

Properties: The post-training corpus for NVIDIA-Nemotron-Nano-9B-v2 consists of English and multilingual text (German, Spanish, French, Italian, Korean, Portuguese, Russian, Japanese, Chinese and English). Our sources cover a variety of document types such as: webpages, dialogue, articles, and other written materials. The corpus spans domains including code, legal, math, science, finance, and more. We also include a small portion of question-answering and alignment-style data to improve model accuracy. For several of the domains listed above we used synthetic data, specifically reasoning traces, from DeepSeek R1/R1-0528, Qwen3-235B-A22B, Nemotron 4 340B, Qwen2.5-32B-Instruct-AWQ, Qwen2.5-14B-Instruct, Qwen 2.5 72B.

The pre-training corpus for NVIDIA-Nemotron-Nano-9B-v2 consists of high-quality curated and synthetically-generated data. It is trained on English, as well as 15 other languages and 43 programming languages. Our sources cover a variety of document types such as: webpages, dialogue, articles, and other written materials. The corpus spans domains including legal, math, science, finance, and more. We also include a small portion of question-answering and alignment-style data to improve model accuracy.
The model was pre-trained for approximately twenty trillion tokens.

Alongside the model, we release our final pretraining data, as outlined in this section. For ease of analysis, there is a sample set that is ungated. For all remaining code, math and multilingual data, gating and approval is required, and the dataset is permissively licensed for model training purposes.

More details on the datasets and synthetic data generation methods can be found in the technical report NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model.

| Dataset | Collection Period |
| :---- | :---- |
| Problems in Elementary Mathematics for Home Study | 4/23/2025 |
| GSM8K | 4/23/2025 |
| PRM800K | 4/23/2025 |
| CC-NEWS | 4/23/2025 |
| Common Crawl | 4/23/2025 |
| Wikimedia | 4/23/2025 |
| Bespoke-Stratos-17k | 4/23/2025 |
| tigerbot-kaggle-leetcodesolutions-en-2k | 4/23/2025 |
| glaive-function-calling-v2 | 4/23/2025 |
| APIGen Function-Calling | 4/23/2025 |
| LMSYS-Chat-1M | 4/23/2025 |
| Open Textbook Library - CC BY-SA & GNU subset and OpenStax - CC BY-SA subset | 4/23/2025 |
| Advanced Reasoning Benchmark, tigerbot-kaggle-leetcodesolutions-en-2k, PRM800K, and SciBench | 4/23/2025 |
| FineWeb-2 | 4/23/2025 |
| Court Listener | Legacy Download |
| peS2o | Legacy Download |
| OpenWebMath | Legacy Download |
| BioRxiv | Legacy Download |
| PMC Open Access Subset | Legacy Download |
| OpenWebText2 | Legacy Download |
| Stack Exchange Data Dump | Legacy Download |
| PubMed Abstracts | Legacy Download |
| NIH ExPorter | Legacy Download |
| arXiv | Legacy Download |
| BigScience Workshop Datasets | Legacy Download |
| Reddit Dataset | Legacy Download |
| SEC's Electronic Data Gathering, Analysis, and Retrieval (EDGAR) | Legacy Download |
| Public Software Heritage S3 | Legacy Download |
| The Stack | Legacy Download |
| mC4 | Legacy Download |
| Advanced Mathematical Problem Solving | Legacy Download |
| MathPile | Legacy Download |
| NuminaMath CoT | Legacy Download |
| PMC Article | Legacy Download |
| FLAN | Legacy Download |
| Advanced Reasoning Benchmark | Legacy Download |
| SciBench | Legacy Download |
| WikiTableQuestions | Legacy Download |
| FinQA | Legacy Download |
| Riddles | Legacy Download |
| Problems in Elementary Mathematics for Home Study | Legacy Download |
| MedMCQA | Legacy Download |
| Cosmos QA | Legacy Download |
| MCTest | Legacy Download |
| AI2's Reasoning Challenge | Legacy Download |
| OpenBookQA | Legacy Download |
| MMLU Auxiliary Train | Legacy Download |
| social-chemestry-101 | Legacy Download |
| Moral Stories | Legacy Download |
| The Common Pile v0.1 | Legacy Download |
| FineMath | Legacy Download |
| MegaMath | Legacy Download |
| FastChat | 6/30/2025 |

Private Non-publicly Accessible Datasets of Third Parties

| Dataset |
| :---- |
| Global Regulation |
| Workbench |

The English Common Crawl data was downloaded from the Common Crawl Foundation (see their FAQ for details on their crawling) and includes the snapshots CC-MAIN-2013-20 through CC-MAIN-2025-13. The data was subsequently deduplicated and filtered in various ways described in the Nemotron-CC paper.

Additionally, we extracted data for fifteen languages from the following three Common Crawl snapshots: CC-MAIN-2024-51, CC-MAIN-2025-08, CC-MAIN-2025-18. The fifteen languages included were Arabic, Chinese, Danish, Dutch, French, German, Italian, Japanese, Korean, Polish, Portuguese, Russian, Spanish, Swedish, and Thai. As we did not have reliable multilingual model-based quality classifiers available, we applied just heuristic filtering instead, similar to what we did for lower quality English data in the Nemotron-CC pipeline, but selectively removing some filters for some languages that did not work well. Deduplication was done in the same way as for Nemotron-CC.

The GitHub Crawl was collected using the GitHub REST API and the Amazon S3 API.
Each crawl was operated in accordance with the rate limits set by its respective source, either GitHub or S3. We collect raw source code and subsequently remove any having a license which does not exist in our permissive-license set (for additional details, refer to the technical report).

| Dataset | Modality | Dataset Size (Tokens) | Collection Period |
| :---- | :---- | :---- | :---- |
| English Common Crawl | Text | 3.360T | 4/8/2025 |
| Multilingual Common Crawl | Text | 812.7B | 5/1/2025 |
| GitHub Crawl | Text | 747.4B | 4/29/2025 |

| Dataset | Modality | Dataset Size (Tokens) | Seed Dataset | Model(s) used for generation |
| :---- | :---- | :---- | :---- | :---- |
| Synthetic Art of Problem Solving from DeepSeek-R1 | Text | 25.5B | Art of Problem Solving; American Mathematics Competitions 8; American Mathematics Competitions 10 | DeepSeek-R1 |
| Synthetic Moral Stories and Social Chemistry from Mixtral-8x22B-v0.1 | Text | 327M | social-chemestry-101; Moral Stories | Mixtral-8x22B-v0.1 |
| Synthetic Social Sciences seeded with OpenStax from DeepSeek-V3, Mixtral-8x22B-v0.1, and Qwen2.5-72B | Text | 83.6M | OpenStax - CC BY-SA subset | DeepSeek-V3; Mixtral-8x22B-v0.1; Qwen2.5-72B |
| Synthetic Health Sciences seeded with OpenStax from DeepSeek-V3, Mixtral-8x22B-v0.1, and Qwen2.5-72B | Text | 9.7M | OpenStax - CC BY-SA subset | DeepSeek-V3; Mixtral-8x22B-v0.1; Qwen2.5-72B |
| Synthetic STEM seeded with OpenStax, Open Textbook Library, and GSM8K from DeepSeek-R1, DeepSeek-V3, DeepSeek-V3-0324, and Qwen2.5-72B | Text | 175M | OpenStax - CC BY-SA subset; GSM8K; Open Textbook Library - CC BY-SA & GNU subset | DeepSeek-R1; DeepSeek-V3; DeepSeek-V3-0324; Qwen2.5-72B |
| Nemotron-PrismMath | Text | 4.6B | Big-Math-RL-Verified; OpenR1-Math-220k | Qwen2.5-0.5B-instruct; Qwen2.5-72B-Instruct; DeepSeek-R1-Distill-Qwen-32B |
| Synthetic Question Answering Data from Papers and Permissible Books from Qwen2.5-72B-Instruct | Text | 350M | arXiv; National Institutes of Health ExPorter; BioRxiv; PMC Article; USPTO Backgrounds; peS2o; Global Regulation; CORE; PG-19; DOAB CC BY & CC BY-SA subset; NDLTD | Qwen2.5-72B-Instruct |
| Synthetic FineMath-4+ Reprocessed from DeepSeek-V3 | Text | 9.2B | Common Crawl | DeepSeek-V3 |
| Synthetic FineMath-3+ Reprocessed from phi-4 | Text | 27.6B | Common Crawl | phi-4 |
| Synthetic Union-3+ Reprocessed from phi-4 | Text | 93.1B | Common Crawl | phi-4 |
| Refreshed Nemotron-MIND from phi-4 | Text | 73B | Common Crawl | phi-4 |
| Synthetic Union-4+ Reprocessed from phi-4 | Text | 14.12B | Common Crawl | phi-4 |
| Synthetic Union-3+ minus 4+ Reprocessed from phi-4 | Text | 78.95B | Common Crawl | phi-4 |
| Synthetic Union-3 Refreshed from phi-4 | Text | 80.94B | Common Crawl | phi-4 |
| Synthetic Union-4+ Refreshed from phi-4 | Text | 52.32B | Common Crawl | phi-4 |
| Synthetic AGIEval seeded with AQUA-RAT, LogiQA, and AR-LSAT from DeepSeek-V3 and DeepSeek-V3-0324 | Text | 4.0B | AQUA-RAT; LogiQA; AR-LSAT | DeepSeek-V3; DeepSeek-V3-0324 |
| Synthetic AGIEval seeded with AQUA-RAT, LogiQA, and AR-LSAT from Qwen3-30B-A3B | Text | 4.2B | AQUA-RAT; LogiQA; AR-LSAT | Qwen3-30B-A3B |
| Synthetic Art of Problem Solving from Qwen2.5-32B-Instruct, Qwen2.5-Math-72B, Qwen2.5-Math-7B, and Qwen2.5-72B-Instruct | Text | 83.1B | Art of Problem Solving; American Mathematics Competitions 8; American Mathematics Competitions 10; GSM8K; PRM800K | Qwen2.5-32B-Instruct; Qwen2.5-Math-72B; Qwen2.5-Math-7B; Qwen2.5-72B-Instruct |
| Synthetic MMLU Auxiliary Train from DeepSeek-R1 | Text | 0.5B | MMLU Auxiliary Train | DeepSeek-R1 |
| Synthetic Long Context Continued Post-Training Data from Papers and Permissible Books from Qwen2.5-72B-Instruct | Text | 5.4B | arXiv; National Institutes of Health ExPorter; BioRxiv; PMC Article; USPTO Backgrounds; peS2o; Global Regulation; CORE; PG-19; DOAB CC BY & CC BY-SA subset; NDLTD | Qwen2.5-72B-Instruct |
| Synthetic Common Crawl from Qwen3-30B-A3B and Mistral-Nemo-12B-Instruct | Text | 1.949T | Common Crawl | Qwen3-30B-A3B; Mistral-NeMo-12B-Instruct |
| Synthetic Multilingual Data from Common Crawl from Qwen3-30B-A3B | Text | 997.3B | Common Crawl | Qwen3-30B-A3B |
| Synthetic Multilingual Data from Wikimedia from Qwen3-30B-A3B | Text | 55.1B | Wikimedia | Qwen3-30B-A3B |
| Synthetic OpenMathReasoning from DeepSeek-R1-0528 | Text | 1.5M | OpenMathReasoning | DeepSeek-R1-0528 |
| Synthetic OpenCodeReasoning from DeepSeek-R1-0528 | Text | 1.1M | OpenCodeReasoning | DeepSeek-R1-0528 |
| Synthetic Science Data from DeepSeek-R1-0528 | Text | 1.5M | - | DeepSeek-R1-0528 |
| Synthetic Humanity's Last Exam from DeepSeek-R1-0528 | Text | 460K | Humanity's Last Exam | DeepSeek-R1-0528 |
| Synthetic ToolBench from Qwen3-235B-A22B | Text | 400K | ToolBench | Qwen3-235B-A22B |
| Synthetic Nemotron Content Safety Dataset V2, eval-safety, Gretel Synthetic Safety Alignment, and RedTeam_2K from DeepSeek-R1-0528 | Text | 52K | Nemotron Content Safety Dataset V2; eval-safety; Gretel Synthetic Safety Alignment; RedTeam_2K | DeepSeek-R1-0528 |
| Synthetic HelpSteer from Qwen3-235B-A22B | Text | 120K | HelpSteer3; HelpSteer2 | Qwen3-235B-A22B |
| Synthetic Alignment data from Mixtral-8x22B-Instruct-v0.1, Mixtral-8x7B-Instruct-v0.1, and Nemotron-4 Family | Text | 400K | HelpSteer2; C4; LMSYS-Chat-1M; ShareGPT52K; tigerbot-kaggle-leetcodesolutions-en-2k; GSM8K; PRM800K; lm_identity (NVIDIA internal); FinQA; WikiTableQuestions; Riddles; ChatQA nvolve-multiturn (NVIDIA internal); glaive-function-calling-v2; SciBench; OpenBookQA; Advanced Reasoning Benchmark; Public Software Heritage S3; Khan Academy Math Keywords | Nemotron-4-15B-Base (NVIDIA internal); Nemotron-4-15B-Instruct (NVIDIA internal); Nemotron-4-340B-Base; Nemotron-4-340B-Instruct; Nemotron-4-340B-Reward; Mixtral-8x7B-Instruct-v0.1; Mixtral-8x22B-Instruct-v0.1 |
| Synthetic LMSYS-Chat-1M from Qwen3-235B-A22B | Text | 1M | LMSYS-Chat-1M | Qwen3-235B-A22B |
| Synthetic Multilingual Reasoning data from DeepSeek-R1-0528, Qwen2.5-32B-Instruct-AWQ, and Qwen2.5-14B-Instruct | Text | 25M | OpenMathReasoning; OpenCodeReasoning | DeepSeek-R1-0528; Qwen2.5-32B-Instruct-AWQ (translation); Qwen2.5-14B-Instruct (translation) |
| Synthetic Multilingual Reasoning data from Qwen3-235B-A22B and Gemma 3 Post-Trained models | Text | 5M | WildChat | Qwen3-235B-A22B; Gemma 3 PT 12B; Gemma 3 PT 27B |

Data Collection Method by dataset: Hybrid: Human, Synthetic
Labeling Method by dataset: Hybrid: Automated, Human, Synthetic

Test Hardware: NVIDIA A10G 24GB, H100 80GB, Jetson AGX Thor

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our Trustworthy AI terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. For more detailed information on ethical considerations for this model, please see the Model Card++ Bias, Explainability, Safety & Security, and Privacy Subcards. Please report security vulnerabilities or NVIDIA AI Concerns here.

NaNK
-
92,690
441

NVIDIA-Nemotron-Nano-9B-v2-Base

--- library_name: transformers license: other license_name: nvidia-open-model-license license_link: >- https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/ pipeline_tag: text-generation language: - en - es - fr - de - ja - it - pt - zh - ar - da - ko - nl - pl - ru - sv - th tags: - nvidia - pytorch datasets: - nvidia/Nemotron-Pretraining-Dataset-sample - nvidia/Nemotron-CC-v2 - nvidia/Nemotron-CC-Math-v1 - nvidia/Nemotron-Pretraining-Code-v1 - nvidia/Nemotron-P

NaNK
-
89,731
40

DeepSeek-R1-0528-NVFP4-v2

NaNK
license:mit
89,007
9

canary-1b-v2

NaNK
license:cc-by-4.0
79,853
302

mit-b0

SegFormer encoder fine-tuned on ImageNet-1k. It was introduced in the paper SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers by Xie et al. and first released in this repository.

Disclaimer: The team releasing SegFormer did not write a model card for this model, so this model card has been written by the Hugging Face team.

SegFormer consists of a hierarchical Transformer encoder and a lightweight all-MLP decode head to achieve great results on semantic segmentation benchmarks such as ADE20K and Cityscapes. The hierarchical Transformer is first pre-trained on ImageNet-1k, after which a decode head is added and the two are fine-tuned together on a downstream dataset.

This repository only contains the pre-trained hierarchical Transformer, hence it can be used for fine-tuning purposes. You can use the model for fine-tuning of semantic segmentation. See the model hub to look for fine-tuned versions on a task that interests you.

Here is how to use this model to classify an image of the COCO 2017 dataset into one of the 1,000 ImageNet classes:

For more code examples, we refer to the documentation.
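A minimal sketch of the ImageNet classification call mentioned above, assuming a recent Transformers release that provides `SegformerImageProcessor`, plus `Pillow` and `requests`. The helper names `top_label` and `classify` are hypothetical, and the image URL is left as a parameter rather than taken from the card.

```python
def top_label(logits: list[float], id2label: dict[int, str]) -> str:
    """Return the label whose logit is largest."""
    best = max(range(len(logits)), key=logits.__getitem__)
    return id2label[best]


def classify(image_url: str, model_id: str = "nvidia/mit-b0") -> str:
    """Download an image and return the predicted ImageNet-1k class name."""
    # Imported lazily so top_label() can be used without the heavy dependencies.
    import requests
    from PIL import Image
    from transformers import SegformerForImageClassification, SegformerImageProcessor

    image = Image.open(requests.get(image_url, stream=True).raw)
    processor = SegformerImageProcessor.from_pretrained(model_id)
    model = SegformerForImageClassification.from_pretrained(model_id)

    inputs = processor(images=image, return_tensors="pt")
    logits = model(**inputs).logits  # one row of 1,000 class scores
    return top_label(logits[0].detach().tolist(), model.config.id2label)
```

Any COCO 2017 image URL can be passed, for example `classify("http://images.cocodataset.org/val2017/000000039769.jpg")`, the image commonly used in Transformers examples.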

-
76,757
37

DeepSeek-R1-0528-FP4-v2

NaNK
license:mit
75,941
8

difix_ref

-
71,563
2

parakeet-tdt-0.6b-v3

🦜 parakeet-tdt-0.6b-v3: Multilingual Speech-to-Text Model. `parakeet-tdt-0.6b-v3` is a 600-million-parameter multilingual automatic...

NaNK
license:cc-by-4.0
70,498
421

NVLM-D-72B

Description

This family of models performs vision-language and text-only tasks including optical character recognition, multimodal reasoning, localization, common sense reasoning, world knowledge utilization, and coding.

Governing Terms: Deed - Attribution-NonCommercial 4.0 International - Creative Commons. Additional Information: LICENSE · Qwen/Qwen2-72B-Instruct at main for Qwen2-72B-Instruct and The MIT License – Open Source Initiative for InternViT-6B-448px-V1-2.

Today (September 17th, 2024), we introduce NVLM 1.0, a family of frontier-class multimodal large language models (LLMs) that achieve state-of-the-art results on vision-language tasks, rivaling the leading proprietary models (e.g., GPT-4o) and open-access models (e.g., Llama 3-V 405B and InternVL 2). Remarkably, NVLM 1.0 shows improved text-only performance over its LLM backbone after multimodal training.

In this repo, we are open-sourcing NVLM-1.0-D-72B (decoder-only architecture), the decoder-only model weights and code for the community.

Reference(s): Paper, Inference Code (HF), Training Code, Website

Benchmark Results

We train our model with legacy Megatron-LM and adapt the codebase to Huggingface for model hosting, reproducibility, and inference. We observe numerical differences between the Megatron and Huggingface codebases, which are within the expected range of variation. We provide the results from both the Huggingface codebase and the Megatron codebase for reproducibility and comparison with other models.
Results (as of September 17th, 2024) in the multimodal benchmarks are as follows:

| Benchmark | MMMU (val / test) | MathVista | OCRBench | AI2D | ChartQA | DocVQA | TextVQA | RealWorldQA | VQAv2 |
|------------------------------|-------------------|-----------|----------|------|---------|--------|---------|-------------|-------|
| NVLM-D 1.0 72B (Huggingface) | 58.7 / 54.9 | 65.2 | 852 | 94.2 | 86.0 | 92.6 | 82.6 | 69.5 | 85.4 |
| NVLM-D 1.0 72B (Megatron) | 59.7 / 54.6 | 65.2 | 853 | 94.2 | 86.0 | 92.6 | 82.1 | 69.7 | 85.4 |
| Llama 3.2 90B | 60.3 / - | 57.3 | - | 92.3 | 85.5 | 90.1 | - | - | 78.1 |
| Llama 3-V 70B | 60.6 / - | - | - | 93.0 | 83.2 | 92.2 | 83.4 | - | 79.1 |
| Llama 3-V 405B | 64.5 / - | - | - | 94.1 | 85.8 | 92.6 | 84.8 | - | 80.2 |
| InternVL2-Llama3-76B | 55.2 / - | 65.5 | 839 | 94.8 | 88.4 | 94.1 | 84.4 | 72.2 | - |
| GPT-4V | 56.8 / 55.7 | 49.9 | 645 | 78.2 | 78.5 | 88.4 | 78.0 | 61.4 | 77.2 |
| GPT-4o | 69.1 / - | 63.8 | 736 | 94.2 | 85.7 | 92.8 | - | - | - |
| Claude 3.5 Sonnet | 68.3 / - | 67.7 | 788 | 94.7 | 90.8 | 95.2 | - | - | - |
| Gemini 1.5 Pro (Aug 2024) | 62.2 / - | 63.9 | 754 | 94.4 | 87.2 | 93.1 | 78.7 | 70.4 | 80.2 |

| Tasks | Backbone LLM | MMLU | GSM8K | MATH | HumanEval | Avg. Accuracy |
|------------------------------|--------------|------|-------|------|-----------|------------------|
| Proprietary | | | | | | |
| GPT-4.0 | N/A | 88.7 | - | 76.6 | 90.2 | - |
| Gemini Pro 1.5 (Aug 2024) | N/A | 85.9 | 90.8 | 67.7 | 84.1 | 82.1 |
| Claude 3.5 Sonnet | N/A | 88.7 | 96.4 | 71.1 | 92.0 | 87.0 |
| Open LLM | | | | | | |
| (a) Nous-Hermes-2-Yi-34B | N/A | 75.5 | 78.6 | 21.8 | 43.3 | 54.8 |
| (b) Qwen-72B-Instruct | N/A | 82.3 | 91.1 | 59.7 | 86.0 | 79.8 |
| (c) Llama-3-70B-Instruct | N/A | 82.0 | 93.0 | 51.0 | 81.7 | 76.6 |
| (d) Llama-3.1-70B-Instruct | N/A | 83.6 | 95.1 | 68.0 | 80.5 | 81.8 |
| (e) Llama-3.1-405B-Instruct | N/A | 87.3 | 96.8 | 73.8 | 89.0 | 86.7 |
| Open Multimodal LLM | | | | | | |
| VILA-1.5 40B | (a) | 73.3 | 67.5 | 16.8 | 34.1 | 🥶 47.9 (-6.9) |
| LLaVA-OneVision 72B | (b) | 80.6 | 89.9 | 49.2 | 74.4 | 🥶 73.5 (-6.3) |
| InternVL-2-Llama3-76B | (c) | 78.5 | 87.1 | 42.5 | 71.3 | 🥶 69.9 (-6.7) |
| Llama 3-V 70B | (d) | 83.6 | 95.1 | 68.0 | 80.5 | 🙂 81.8 (0) |
| Llama 3-V 405B | (e) | 87.3 | 96.8 | 73.8 | 89.0 | 🙂 86.7 (0) |
| NVLM-D 1.0 72B (Megatron) | (b) | 82.0 | 92.9 | 73.1 | 88.4 | 🥳 84.1 (+4.3) |
| NVLM-D 1.0 72B (Huggingface) | (b) | 81.7 | 93.2 | 73.1 | 89.0 | 🥳 84.3 (+4.5) |

The model trained on this dataset cannot regenerate its training data:

1. The model has no image generation capability since its output is only text. Hence it cannot regenerate any image it would have seen during training.
2. The model cannot regenerate training text data: during training, the model takes text and images as inputs, and the model output (text) is conditioned on both inputs. During inference, without training images as input, the models would not be able to reproduce any part of the training text data.
Input

- Input Type(s): Text, Image
- Input Format(s): String, Pillow Library-Supported Formats
- Input Dimensions: One-Dimensional (1D), Two Dimensional (2D)
- Other Properties Related to Input: Maximum Token Length = 128K Tokens

Output

- Output Type(s): Text
- Output Format: String
- Model Output: 1D
- Other Properties Related to Output: None

When converting the Megatron checkpoint to Huggingface, we adapt the InternVL codebase to support model loading and multi-GPU inference in HF. We also use the tokenizer from Qwen2.5-72B-Instruct when adapting the tokenizer to Huggingface, as it contains extra special tokens for vision tasks.

We train NVLM-1.0-D-72B based on the Qwen2-72B-Instruct text-only model and InternViT-6B-448px-V1-5 ViT model with our large-scale high-quality multimodal dataset. For training code, please refer to Megatron-Core.

We provide a docker build file in the Dockerfile for reproduction. The docker image is based on `nvcr.io/nvidia/pytorch:23.09-py3`.

Note: We observe that different transformer versions / CUDA versions / docker versions can lead to slight benchmark number differences. We recommend using the Dockerfile above for precise reproduction.

The model can be loaded on multiple GPUs as follows:

To test our NVLM-1.0 model on the benchmark datasets, you can use the following code:

Specifically,
- `--config-path eval/full_eval.yaml` points to the file that contains the evaluation configurations, including the evaluation prompt, the evaluation dataset paths, and generation hyper-parameters.
- `--result-save-path path/to/eval_results/` specifies the path to save the evaluation results.
- `--zero-shot-eval-tasks` specifies the tasks to evaluate on.
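For the multi-GPU loading mentioned above, one common approach is to hand `from_pretrained` a `device_map` that spreads the decoder layers across devices while giving the first GPU a lighter share because it also hosts the vision encoder. The sketch below only builds such a dictionary. It is an illustration under assumptions: the function name `build_device_map`, the 80-layer default (matching the Qwen2-72B backbone), and the module names are hypothetical and must be checked against the actual checkpoint, which also contains further modules (embeddings, final norm, output head, projector) that need a device.

```python
import math


def build_device_map(num_gpus: int, num_layers: int = 80) -> dict[str, int]:
    """Spread decoder layers over GPUs, counting GPU 0 as half a device."""
    if num_gpus < 1:
        raise ValueError("at least one GPU is required")

    # GPU 0 also holds the vision encoder, so it gets roughly half a share.
    per_gpu = math.ceil(num_layers / (num_gpus - 0.5))
    shares = [per_gpu] * num_gpus
    shares[0] = math.ceil(per_gpu * 0.5)

    device_map: dict[str, int] = {}
    layer = 0
    for gpu, share in enumerate(shares):
        for _ in range(share):
            if layer == num_layers:
                break  # every layer already has a device
            # Hypothetical module path; verify it against the checkpoint.
            device_map[f"language_model.model.layers.{layer}"] = gpu
            layer += 1

    device_map["vision_model"] = 0  # hypothetical module name
    return device_map


device_map = build_device_map(num_gpus=4)
```

Once completed with the remaining module names, the dictionary would be passed as `device_map=device_map` to `AutoModel.from_pretrained(...)`.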
Supported Hardware Microarchitecture Compatibility: NVIDIA Hopper

Data Collection Method by dataset: Hybrid: Automated, Human, Synthetic, Unknown
Labeling Method by dataset: Hybrid: Automated, Human, Synthetic, Unknown
Properties: Trained on image captions, image-text pairs, natural images, charts, documents, scene descriptions, and mathematical reasoning.

Data Collection Method by dataset: Hybrid: Automated, Human, Synthetic, Unknown
Labeling Method by dataset: Hybrid: Automated, Human, Synthetic, Unknown
Properties: Trained on image captions; general knowledge; image-text pairs; natural images; charts; diagrams; documents; scene descriptions; science diagrams, lessons, textbook data, and question-answer pairs; visual instruction tuning; and mathematical reasoning.

Evaluation Dataset Link: See Section 6.1, "Benchmark"
Properties: Evaluated on general knowledge, visual answering, chart understanding, table, optical character recognition, and mathematical reasoning.

Correspondence to Wenliang Dai, Nayeon Lee, Boxin Wang, Zhuolin Yang, Wei Ping

@article{nvlm2024,
  title={NVLM: Open Frontier-Class Multimodal LLMs},
  author={Dai, Wenliang and Lee, Nayeon and Wang, Boxin and Yang, Zhuolin and Liu, Zihan and Barker, Jon and Rintamaki, Tuomas and Shoeybi, Mohammad and Catanzaro, Bryan and Ping, Wei},
  journal={arXiv preprint},
  year={2024}}

Ethical Considerations

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their supporting model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please report security vulnerabilities or NVIDIA AI Concerns here.

NaNK
license:cc-by-nc-4.0
68,394
774

dragon-multiturn-query-encoder

Model Description We introduce Dragon-multiturn, a retriever specifically designed for the conversational QA scenario. It can handle conversational query which combine dialogue history with the current query. It is built on top of the Dragon retriever. The details of Dragon-multiturn can be found in here. Please note that Dragon-multiturn is a dual encoder consisting of a query encoder and a context encoder. This repository is only for the query encoder of Dragon-multiturn for getting the query embeddings, and you also need the context encoder to get context embeddings, which can be found here. Both query encoder and context encoder share the same tokenizer. Other Resources Llama3-ChatQA-1.5-8B   Llama3-ChatQA-1.5-70B   Evaluation Data   Training Data   Website   Paper .tg {border:none;border-collapse:collapse;border-spacing:0;} .tg td{border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;overflow:hidden; padding:10px 5px;word-break:normal;} .tg th{border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;font-weight:normal; overflow:hidden;padding:10px 5px;word-break:normal;} .tg .tg-c3ow{border-color:inherit;text-align:center;vertical-align:center} .tg .tg-0pky{border-color:inherit;text-align:left;vertical-align:center} top-1 top-5 top-1 top-5 top-1 top-5 top-1 top-5 top-5 top-20 top-5 top-20 Dragon 46.3 73.1 43.3 75.6 56.8 82.9 46.2 82.0 57.7 78.8 27.5 46.2 Dragon-multiturn 53.0 81.2 48.6 83.5 54.8 83.2 49.6 86.7 64.5 85.2 47.4 67.1 Retrieval results across five multi-turn QA datasets (Doc2Dial, QuAC, QReCC, TopiOCQA, INSCIT) with the average top-1 and top-5 recall scores. Since the average context length in TopiOCQA and INSCIT is smaller than in other datasets, we report top-5 and top-20 to roughly match the context lengths of top-1 and top-5, respectively, in those datasets. Evaluations on Multi-Turn QA Retrieval Benchmark (UPDATE!!) 
We evaluate multi-turn QA retrieval on five datasets: Doc2Dial, QuAC, QReCC, TopiOCQA, and INSCIT, which can be found in the ChatRAG Bench. The evaluation scripts can be found here. License Dragon-multiturn is built on top of Dragon. We refer users to the original license of the Dragon model. Dragon-multiturn is also subject to the Terms of Use. Correspondence to Zihan Liu ([email protected]), Wei Ping ([email protected]) @article{liu2024chatqa, title={ChatQA: Surpassing GPT-4 on Conversational QA and RAG}, author={Liu, Zihan and Ping, Wei and Roy, Rajarshi and Xu, Peng and Lee, Chankyu and Shoeybi, Mohammad and Catanzaro, Bryan}, journal={arXiv preprint arXiv:2401.10225}, year={2024}}
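The top-k recall figures reported for this retriever can be read as "the fraction of queries whose gold context appears among the k highest-ranked contexts". The sketch below illustrates that reading on toy data; the function name, the data layout, and the single-gold-context assumption are illustrative and are not taken from the ChatRAG Bench evaluation scripts.

```python
def top_k_recall(ranked_ids, gold_ids, k):
    """Fraction of queries whose gold context id is in the top-k ranked ids.

    ranked_ids: one list of context ids per query, best match first.
    gold_ids:   one gold context id per query (assumption: a single gold context).
    """
    if len(ranked_ids) != len(gold_ids):
        raise ValueError("need exactly one gold id per ranked list")
    hits = sum(1 for ranked, gold in zip(ranked_ids, gold_ids) if gold in ranked[:k])
    return hits / len(gold_ids)


# Toy rankings for three queries (hypothetical data).
ranked = [[3, 1, 2], [0, 2, 1], [1, 0, 3]]
gold = [1, 1, 3]

recall_at_1 = top_k_recall(ranked, gold, 1)
recall_at_2 = top_k_recall(ranked, gold, 2)
recall_at_3 = top_k_recall(ranked, gold, 3)
```

Under this definition, reporting top-5 and top-20 on datasets with shorter contexts simply changes `k`, which is why those columns are not directly comparable with the top-1 and top-5 columns.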

-
63,622
60

dragon-multiturn-context-encoder

Model Description
We introduce Dragon-multiturn, a retriever designed specifically for conversational QA. It handles conversational queries that combine the dialogue history with the current query, and it is built on top of the Dragon retriever. The details of Dragon-multiturn can be found here. Please note that Dragon-multiturn is a dual encoder consisting of a query encoder and a context encoder. This repository contains only the context encoder, which produces the context embeddings; you also need the query encoder to get the query embeddings, which can be found here. The query encoder and the context encoder share the same tokenizer.

Other Resources
Llama3-ChatQA-1.5-8B | Llama3-ChatQA-1.5-70B | Evaluation Data | Training Data | Website | Paper

| Model | Average top-1 | Average top-5 | Doc2Dial top-1 | Doc2Dial top-5 | QuAC top-1 | QuAC top-5 | QReCC top-1 | QReCC top-5 | TopiOCQA top-5 | TopiOCQA top-20 | INSCIT top-5 | INSCIT top-20 |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Dragon | 46.3 | 73.1 | 43.3 | 75.6 | 56.8 | 82.9 | 46.2 | 82.0 | 57.7 | 78.8 | 27.5 | 46.2 |
| Dragon-multiturn | 53.0 | 81.2 | 48.6 | 83.5 | 54.8 | 83.2 | 49.6 | 86.7 | 64.5 | 85.2 | 47.4 | 67.1 |

Retrieval results across five multi-turn QA datasets (Doc2Dial, QuAC, QReCC, TopiOCQA, INSCIT) with the average top-1 and top-5 recall scores. Since the average context length in TopiOCQA and INSCIT is smaller than in the other datasets, we report top-5 and top-20 for those two datasets to roughly match the context lengths of top-1 and top-5, respectively, in the others.

Evaluations on Multi-Turn QA Retrieval Benchmark (UPDATE!!)
We evaluate multi-turn QA retrieval on five datasets: Doc2Dial, QuAC, QReCC, TopiOCQA, and INSCIT, which can be found in the ChatRAG Bench. The evaluation scripts can be found here. License Dragon-multiturn is built on top of Dragon. We refer users to the original license of the Dragon model. Dragon-multiturn is also subject to the Terms of Use. Correspondence to Zihan Liu ([email protected]), Wei Ping ([email protected]) @article{liu2024chatqa, title={ChatQA: Surpassing GPT-4 on Conversational QA and RAG}, author={Liu, Zihan and Ping, Wei and Roy, Rajarshi and Xu, Peng and Lee, Chankyu and Shoeybi, Mohammad and Catanzaro, Bryan}, journal={arXiv preprint arXiv:2401.10225}, year={2024}}
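Because the retriever is a dual encoder, queries and contexts are embedded separately and then compared. A minimal sketch of the ranking step follows, assuming the similarity is a plain dot product between the query embedding and each context embedding (the card does not state the similarity function here). The vectors are toy values, not real model outputs.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def rank_contexts(query_emb, context_embs):
    """Return (context indices sorted best-first, raw similarity scores)."""
    scores = [dot(query_emb, ctx) for ctx in context_embs]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order, scores


# Hypothetical 3-dimensional embeddings standing in for encoder outputs.
query_embedding = [1.0, 0.0, 2.0]
context_embeddings = [
    [0.5, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 2.0, 0.5],
]

order, scores = rank_contexts(query_embedding, context_embeddings)
```

In practice the query embedding would come from the query encoder and the context embeddings from this context encoder, with both inputs tokenized by the shared tokenizer.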

-
63,619
24

segformer-b4-finetuned-cityscapes-1024-1024

SegFormer (b4-sized) model fine-tuned on CityScapes
SegFormer model fine-tuned on CityScapes at resolution 1024x1024. It was introduced in the paper SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers by Xie et al. and first released in this repository.

Disclaimer: The team releasing SegFormer did not write a model card for this model, so this model card has been written by the Hugging Face team.

SegFormer consists of a hierarchical Transformer encoder and a lightweight all-MLP decode head, and achieves strong results on semantic segmentation benchmarks such as ADE20K and Cityscapes. The hierarchical Transformer is first pre-trained on ImageNet-1k, after which a decode head is added and the whole model is fine-tuned on a downstream dataset.

You can use the raw model for semantic segmentation, for example to predict a class label for every pixel of an image from the COCO 2017 dataset. See the model hub to look for fine-tuned versions on a task that interests you. For more code examples, we refer to the documentation.
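A semantic segmentation model produces one score per class for every pixel, and the predicted map is obtained by taking the highest-scoring class at each position. The sketch below shows only that post-processing step on a toy array; the nested-list layout `[num_labels][height][width]` and the function name are assumptions for illustration, and a real pipeline would operate on the model's output tensor instead.

```python
def logits_to_segmentation(logits):
    """Convert per-class score maps into a map of predicted label ids.

    logits: nested list indexed as [label][row][column] (assumed layout).
    Returns a nested list [row][column] holding the arg-max label per pixel.
    """
    num_labels = len(logits)
    height = len(logits[0])
    width = len(logits[0][0])
    return [
        [max(range(num_labels), key=lambda c: logits[c][y][x]) for x in range(width)]
        for y in range(height)
    ]


# Toy scores for 3 labels on a 2x2 image (hypothetical values).
toy_logits = [
    [[2.0, 0.1], [0.3, 0.2]],  # label 0
    [[1.0, 3.0], [0.1, 0.9]],  # label 1
    [[0.5, 0.2], [4.0, 0.4]],  # label 2
]

segmentation = logits_to_segmentation(toy_logits)
```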

-
57,310
7

segformer-b2-finetuned-ade-512-512

SegFormer model fine-tuned on ADE20k at resolution 512x512. It was introduced in the paper SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers by Xie et al. and first released in this repository.

Disclaimer: The team releasing SegFormer did not write a model card for this model, so this model card has been written by the Hugging Face team.

SegFormer consists of a hierarchical Transformer encoder and a lightweight all-MLP decode head, and achieves strong results on semantic segmentation benchmarks such as ADE20K and Cityscapes. The hierarchical Transformer is first pre-trained on ImageNet-1k, after which a decode head is added and the whole model is fine-tuned on a downstream dataset.

You can use the raw model for semantic segmentation, for example to predict a class label for every pixel of an image from the COCO 2017 dataset. See the model hub to look for fine-tuned versions on a task that interests you. For more code examples, we refer to the documentation.

-
53,241
5

MambaVision-S-1K

-
45,103
9

Cosmos-Transfer2.5-2B

-
44,180
17

Llama-3.3-70B-Instruct-FP8

llama
42,089
11

omni-embed-nemotron-3b

NV-QwenOmni-Embed-3B-v1 is a versatile multimodal embedding model capable of encoding content across multiple modalities, including text, image, audio, and video, either individually or in combinat...

-
41,544
68

Llama-3.3-70B-Instruct-NVFP4

llama
39,237
24

Llama-3.3-70B-Instruct-FP4

Description:
The NVIDIA Llama 3.3 70B Instruct FP4 model is the quantized version of Meta's Llama 3.3 70B Instruct model, an auto-regressive language model that uses an optimized transformer architecture. For more information, please check here. The NVIDIA Llama 3.3 70B Instruct FP4 model is quantized with TensorRT Model Optimizer. This model is ready for commercial/non-commercial use.

Third-Party Community Consideration
This model is not owned or developed by NVIDIA. This model has been developed and built to a third-party's requirements for this application and use case; see link to Non-NVIDIA (Meta-Llama-3.3-70B-Instruct) Model Card.

Model Architecture:
Architecture Type: Transformers
Network Architecture: Llama3.3

Input:
Input Type(s): Text
Input Format(s): String
Input Parameters: 1D (One Dimensional): Sequences
Other Properties Related to Input: Context length up to 128K

Output:
Output Type(s): Text
Output Format: String
Output Parameters: 1D (One Dimensional): Sequences
Other Properties Related to Output: N/A

Software Integration:
Supported Runtime Engine(s): Tensor(RT)-LLM
Supported Hardware Microarchitecture Compatibility: NVIDIA Blackwell

Model Version(s): The model is quantized with nvidia-modelopt v0.23.0

Datasets:
Calibration Dataset: cnndailymail
Data collection method: Automated.
Labeling method: Unknown.

Inference:
Engine: Tensor(RT)-LLM
Test Hardware: B200

Post Training Quantization
This model was obtained by quantizing the weights and activations of Meta-Llama-3.3-70B-Instruct to the FP4 data type, ready for inference with TensorRT-LLM. Only the weights and activations of the linear operators within transformer blocks are quantized. This optimization reduces the number of bits per parameter from 16 to 4, reducing the disk size and GPU memory requirements by approximately 3.3x.
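Going from 16 to 4 bits would give a 4x reduction if every parameter were quantized and no extra metadata were stored, so a figure of about 3.3x suggests that some weights stay in 16-bit and that scaling factors add overhead. The sketch below works through that arithmetic. The 97% quantized share and the 0.5 bits of scale overhead per weight are illustrative assumptions, not numbers from this card.

```python
def compressed_fraction(quantized_share, bits_before=16, bits_after=4,
                        scale_bits_per_weight=0.0):
    """Size after quantization as a fraction of the original size.

    quantized_share:        share of parameters that are quantized (0..1).
    scale_bits_per_weight:  assumed extra bits per quantized weight for scales.
    """
    quantized = quantized_share * (bits_after + scale_bits_per_weight) / bits_before
    untouched = 1.0 - quantized_share  # parameters left at the original precision
    return quantized + untouched


def compression_ratio(quantized_share, **kwargs):
    """How many times smaller the checkpoint becomes."""
    return 1.0 / compressed_fraction(quantized_share, **kwargs)


# Ideal case: everything quantized, no scale overhead.
ideal_ratio = compression_ratio(1.0)

# Hypothetical case: 97% of parameters quantized, 0.5 bits/weight for scales.
estimated_ratio = compression_ratio(0.97, scale_bits_per_weight=0.5)
```

With these assumed inputs the estimate lands close to the reported 3.3x, but the card does not give the actual breakdown, so this is only one plausible reading.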
To deploy the quantized checkpoint with TensorRT-LLM LLM API, follow the sample codes below: Please refer to the TensorRT-LLM llm-api documentation for more details. Evaluation The accuracy benchmark results are presented in the table below: NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please report security vulnerabilities or NVIDIA AI Concerns here.

llama
36,481
24

Llama-3.1-Nemotron-Nano-4B-v1.1

llama
32,839
109

Llama-3.1-8B-Instruct-FP8

Description:
The NVIDIA Llama 3.1 8B Instruct FP8 model is the quantized version of Meta's Llama 3.1 8B Instruct model, an auto-regressive language model that uses an optimized transformer architecture. For more information, please check here. The NVIDIA Llama 3.1 8B Instruct FP8 model is quantized with TensorRT Model Optimizer. This model is ready for commercial and non-commercial use.

Third-Party Community Consideration
This model is not owned or developed by NVIDIA. This model has been developed and built to a third-party's requirements for this application and use case; see link to Non-NVIDIA (Meta-Llama-3.1-8B-Instruct) Model Card.

Model Architecture:
Architecture Type: Transformers
Network Architecture: Llama3.1

Input:
Input Type(s): Text
Input Format(s): String
Input Parameters: Sequences
Other Properties Related to Input: Context length up to 128K

Output:
Output Type(s): Text
Output Format: String
Output Parameters: Sequences
Other Properties Related to Output: N/A

Software Integration:
Supported Runtime Engine(s): Tensor(RT)-LLM, vLLM
Supported Hardware Microarchitecture Compatibility: NVIDIA Blackwell, NVIDIA Hopper, NVIDIA Lovelace

Model Version(s): The model is quantized with nvidia-modelopt v0.27.0

Datasets:
Calibration Dataset: cnndailymail
Evaluation Dataset: MMLU

Inference:
Engine: Tensor(RT)-LLM or vLLM
Test Hardware: H100

Post Training Quantization
This model was obtained by quantizing the weights and activations of Meta-Llama-3.1-8B-Instruct to the FP8 data type, ready for inference with TensorRT-LLM and vLLM. Only the weights and activations of the linear operators within transformer blocks are quantized. This optimization reduces the number of bits per parameter from 16 to 8, reducing the disk size and GPU memory requirements by approximately 50%. On H100, we achieved a 1.3x speedup.
To deploy the quantized checkpoint with TensorRT-LLM, follow the sample commands below with the TensorRT-LLM GitHub repo: Please refer to the TensorRT-LLM benchmarking documentation for details. We benchmarked with tensorrt-llm v0.13 on 8 H100 GPUs, using a batch size of 1024 for the throughput measurements with in-flight batching enabled. We achieved a ~1.3x speedup with FP8.

Deploy with vLLM
To deploy the quantized checkpoint with vLLM, follow the instructions below:
1. Install vLLM by following the directions here.
2. To use a Model Optimizer PTQ checkpoint with vLLM, pass the `quantization=modelopt` flag into the config when initializing the `LLM` engine.

This model can be deployed with an OpenAI-compatible server via the vLLM backend. Instructions here.
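The vLLM steps above can be sketched as follows. This is a minimal example, assuming that vLLM is installed on a supported GPU and that the checkpoint is published under the repository id `nvidia/Llama-3.1-8B-Instruct-FP8`; the helper names, the prompt handling, and the sampling values are illustrative choices, not settings from the card. The vLLM import is deferred so that the configuration helper can be inspected without a GPU.

```python
MODEL_ID = "nvidia/Llama-3.1-8B-Instruct-FP8"  # assumed repository id


def engine_kwargs(model_id=MODEL_ID):
    """Arguments for vllm.LLM; quantization="modelopt" selects the PTQ checkpoint path."""
    return {"model": model_id, "quantization": "modelopt"}


def generate(prompts, max_tokens=64):
    """Run offline generation with vLLM (requires vLLM and a suitable GPU)."""
    from vllm import LLM, SamplingParams  # imported lazily on purpose

    llm = LLM(**engine_kwargs())
    sampling = SamplingParams(temperature=0.8, top_p=0.9, max_tokens=max_tokens)
    outputs = llm.generate(prompts, sampling)
    # Each result carries one or more completions; keep the first one.
    return [result.outputs[0].text for result in outputs]


# Example call (not executed here because it downloads and loads the model):
# print(generate(["Hello, my name is"]))
```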

llama
31,062
30

RADIO-L

-
28,284
10

gpt-oss-120b-Eagle3

Description:
The NVIDIA gpt-oss-120b Eagle model is the Eagle head of OpenAI's gpt-oss-120b model, which is an auto-regressive language model that uses a mixture-of-experts (MoE) architecture with 5 billion activated parameters and 120 billion total parameters. For more information, please check here. The NVIDIA gpt-oss-120b Eagle3 model incorporates Eagle speculative decoding with TensorRT Model Optimizer. This model is ready for commercial/non-commercial use.

Note: For use cases with less than 8k context length, please consider using gpt-oss-120b-Eagle3-v2.

Use Case: Developers designing AI Agent systems, chatbots, RAG systems, and other AI-powered applications. Also suitable for typical instruction-following tasks.

Release Date: Hugging Face: Aug 20th, 2025 via https://huggingface.co/nvidia/gpt-oss-120b-Eagle3

Model Architecture:
Architecture Type: Transformers
Network Architecture: gpt-oss-120b

Computational Load
Cumulative Compute: 4.8x10^20
Estimated Energy and Emissions for Model Training:
Total kWh = 2500
Total Emissions (tCO2e) = 0.8075

Input:
Input Type(s): Text
Input Format(s): String
Input Parameters: One-Dimensional (1D): Sequences

Output:
Output Type(s): Text
Output Format: String
Output Parameters: One-Dimensional (1D): Sequences

Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g., GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.

Software Integration:
Supported Runtime Engine(s): TensorRT-LLM
Supported Hardware Microarchitecture Compatibility: NVIDIA Blackwell

The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment.
Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.

Model Version(s): The model is quantized with nvidia-modelopt v0.35.0

Training Dataset:
The total size (in number of data points): 503.3K
Total number of datasets: 2
Dataset partition: Training 100%
Link: ultrachat200k and Magpie-Llama-3.1-Pro-300K-Filtered. Only the prompts from these datasets were used for data synthesis (the original responses from GPT were not used); the synthesized data was then used to train the Eagle modules. Click the links above for more information regarding the datasets.
Data Collection Method by dataset: Hybrid: Synthetic, Human, Automated
Labeling Method by dataset: Hybrid: Synthetic, Human, Automated
Properties: 500K samples, majority synthetic, others sourced from commercially-friendly datasets.

Evaluation Dataset:
Link: MTBench, for more details, see here
Data Collection Method by dataset: Hybrid: Human, Synthetic
Labeling Method by dataset: Hybrid: Human, Synthetic
Properties: 3,300 multi-turn dialogue sequences, each annotated with expert preference votes.

Eagle Speculative Decoding
Synthesized data was obtained from OpenAI's gpt-oss-120b model and then used to fine-tune the Eagle modules. This model is ready for inference with TensorRT-LLM in Eagle speculative decoding mode. Eagle modules are used to predict candidate tokens beyond the next token. In the generation step, each forward pass of an Eagle module generates a distribution of tokens beyond the previous one. Then, a tree-based attention mechanism samples some candidate sequences for the original model to validate. The longest accepted candidate sequence is selected, so that more than one token is returned in the generation step. The number of tokens generated in each step is called the acceptance rate.
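The validate-and-select idea described above can be illustrated with a small simulation. This is a simplified sketch, not the TensorRT-LLM implementation: it uses greedy token matching instead of tree attention, a toy `toy_target` function in place of the base model, and it assumes that the target model's own token is counted in each step (which is why a draft of length 3 can yield up to 4 tokens). All names and data are hypothetical.

```python
from statistics import mean


def verify_draft(target_next, prefix, draft):
    """Tokens emitted in one step when the target model checks one draft sequence."""
    context = list(prefix)
    emitted = []
    for token in draft:
        expected = target_next(context)
        if token != expected:
            emitted.append(expected)  # first mismatch: keep the target's token, stop
            return emitted
        emitted.append(token)         # draft token accepted
        context.append(token)
    emitted.append(target_next(context))  # whole draft accepted: one extra target token
    return emitted


def best_candidate(target_next, prefix, candidates):
    """Pick the candidate that yields the longest accepted sequence."""
    return max((verify_draft(target_next, prefix, c) for c in candidates), key=len)


def toy_target(context):
    """Stand-in for the base model: always predicts last token + 1 (mod 10)."""
    return (context[-1] + 1) % 10


step_tokens = best_candidate(toy_target, [1, 2, 3], [[4, 5, 9], [4, 7, 8], [4, 5, 6]])

# Average tokens per step over three hypothetical steps of lengths 4, 2 and 3.
acceptance_rate = mean([len(step_tokens), 2, 3])
```

Under this counting, a value such as 2.5 means the base model produces about 2.5 tokens per validation pass on average instead of one.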
To serve the checkpoint with TensorRT-LLM, follow the sample commands below with the TensorRT-LLM GitHub repo:

Evaluation
The Eagle acceptance rate benchmark results (MT-Bench) with draft length 3 are presented in the table below for medium reasoning:

| Category | MT Bench Acceptance Rate |
|:-----------|:------------------------:|
| writing | 2.24 |
| roleplay | 2.25 |
| reasoning | 2.47 |
| math | 2.83 |
| coding | 2.51 |
| extraction | 2.53 |
| stem | 2.17 |
| humanities | 1.95 |

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. For more detailed information on ethical considerations for this model, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards. Please report security vulnerabilities or NVIDIA AI Concerns here.

SUBCARDS:

|Field:|Response:|
|:---:|:---:|
|Intended Application(s) & Domain(s):|Text generation, reasoning, summarization, and question answering.|
|Model Type:|Text and Image-to-text transformer|
|Intended Users:|This model is intended for developers, researchers, and customers building/utilizing LLMs, while balancing accuracy and efficiency.|
|Output:|Text String(s)|
|Describe how the model works:|Generates text by predicting the next word or token based on the context provided in the input sequence using multiple self-attention layers|
|Technical Limitations:|The model was trained on data that contains toxic language and societal biases originally crawled from the internet. Therefore, the model may amplify those biases and return toxic responses especially when prompted with toxic prompts. Therefore, before deploying any applications of this model, developers should perform safety testing and tuning tailored to their specific applications of the model.|
|Verified to have met prescribed quality standards?|Yes|
|Performance Metrics:|Accuracy, Throughput, and user-side throughput|
|Potential Known Risk|The model may generate answers that may be inaccurate, omit key information, or include irrelevant or redundant text producing socially unacceptable or undesirable text, even if the prompt itself does not include anything explicitly offensive.|
|Licensing:|Your usage is governed by the following license|

|Field:|Response:|
|:---:|:---:|
|Participation considerations from adversely impacted groups (protected classes) in model design and testing:|None|
|Measures taken to mitigate against unwanted bias:|None|

|Field:|Response:|
|:---:|:---:|
|Model Application(s):|Chat, Instruction Following, Chatbot Development, Code Generation, Reasoning|
|Describe life critical application (if present):|None Known|
|Use Case Restrictions:|Abide by the license|
|Model and Dataset Restrictions:|The Principle of least privilege (PoLP) is applied limiting access for dataset generation. Restrictions enforce dataset access during training, and dataset license constraints adhered to. Model checkpoints are made available on Hugging Face, and may become available on cloud providers' model catalog.|

|Field:|Response:|
|:---:|:---:|
|Generatable or Reverse engineerable personal data?|None|
|Was consent obtained for any personal data used?|None Known|
|Personal data used to create this model?|None Known|
|How often is dataset reviewed?|Before Release|
|Is there provenance for all datasets used in training?|Yes|
|Does data labeling (annotation, metadata) comply with privacy laws?|Yes|
|Applicable NVIDIA Privacy Policy|https://www.nvidia.com/en-us/about-nvidia/privacy-policy/|

llama
28,072
31

Aegis-AI-Content-Safety-LlamaGuard-Defensive-1.0

Model Card

License
The use of this model is governed by the Llama 2 Community License Agreement.

Model Details
Llama Nemotron Safety Guard Defensive V1, formerly known as Aegis-AI-Content-Safety-LlamaGuard-LLM-Defensive-1.0, is an LLM content safety model. It is a parameter-efficient, instruction-tuned version of Llama Guard, based on Llama2-7B and trained on Nvidia's content safety dataset, Nemotron Content Safety Dataset V1, which covers Nvidia's broad taxonomy of 13 critical safety risk categories.

Paper Details: Aegis 1.0: Online Adaptive AI Content Safety Moderation with Ensemble of LLM Experts

Model Description
The Llama-2-Nemotron-Safety-Guard-Defensive-7B-v1 model involves the following:
1. A system instruction including the safety taxonomy and a safety policy with inclusions and exclusions.
2. The system prompt instructs the LLM to moderate a user prompt, a partial dialog, or a full dialog.
3. The LLM response is a string which can be either safe or unsafe. If the string generated by the LLM is "unsafe", the category ID of the violation is output by the LLM on a new line, based on the policy in the system prompt.
4. Novel safety risk categories and policies can be provided in the instruction for the model to categorize using the novel taxonomy and policy.
5. The safety taxonomy and policy used to train the models contain 13 critically unsafe risk categories, a safe category, and a "needs caution" category.
6. An internally annotated dataset, Nemotron Content Safety Dataset V1, of approximately 11,000 prompts and responses is used to instruction-tune the model. The annotations are at dialog level, not per turn. We have since collected 30,000 annotations in total on a further expanded taxonomy, and future versions of the models will be trained on the full set.
7. The model is instruction-tuned with the safety instruction, with the LLM behaving as a classifier in this setting.
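The response format in item 3 (a verdict of safe or unsafe, followed by the violated category ID on a new line) lends itself to a small parser. The sketch below is a minimal example under stated assumptions: the function name is hypothetical, the category ID `O3` is a made-up placeholder, and splitting several IDs on commas is an assumption that the card does not confirm.

```python
def parse_moderation_output(text):
    """Split a moderation response into (verdict, list of category ids).

    Assumes the first non-empty line is "safe" or "unsafe" and that, for
    "unsafe", the next line carries one or more comma-separated category ids.
    """
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty moderation output")
    verdict = lines[0].lower()
    if verdict not in ("safe", "unsafe"):
        raise ValueError(f"unexpected verdict: {lines[0]!r}")
    categories = []
    if verdict == "unsafe" and len(lines) > 1:
        categories = [part.strip() for part in lines[1].split(",") if part.strip()]
    return verdict, categories


unsafe_result = parse_moderation_output("unsafe\nO3")  # "O3" is a placeholder id
safe_result = parse_moderation_output("safe")
```

Raising on an unexpected verdict rather than defaulting to "safe" keeps malformed generations from silently passing moderation.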
PLEASE NOTE: The model has only been trained to perform prompt classification, since the annotations were not available at turn level. If you wish to use the model for response classification, use the template as provided below.

- Developed by: Shaona Ghosh, Nvidia
- Model type: Instruction tuned LLama2-7B
- License: Llama 2
- Finetuned from model: Llama Guard

Uses
Ethical use: Technology can have a profound impact on people and the world, and NVIDIA is committed to enabling trust and transparency in AI development. NVIDIA encourages users to adopt principles of AI ethics and trustworthiness to guide their business decisions by following the guidelines in the Llama 2 Community License Agreement.

- The Llama-2-Nemotron-Safety-Guard-Defensive-7B-v1 model is for users who want to safeguard or evaluate a general-purpose LLM's generated content.

The Principle of least privilege (PoLP) is applied, limiting access for dataset generation and model development. Restrictions enforce dataset access during training, and dataset license constraints are adhered to.

Downstream Use
- Alternatively, the model can be used to perform toxicity classification for any text content, such as pre-training data, and is not limited to human-LLM interaction data.
- The model can be fine-tuned further with custom safety policies and taxonomies.
- Different adapter weights (used in conjunction with this model) can be used to enforce different safety tolerances.

Given the nature of the work, the model has been trained on critically unsafe data that includes social biases, to be able to categorize the safety risks based on a broad safety risk taxonomy. However,
- Even though we have performed exhaustive evaluation, the model can occasionally make errors in predicting the unsafe category.
- Even though we have internally red teamed the model (please see the paper for details), the safety guardrails of the model can be bypassed by adversarial prompts, and the underlying LLM may be prompted to generate unsafe text.

Bias

| Field | Response |
| :--- | :--- |
| Participation considerations from adversely impacted groups (protected classes) in model design and testing: | None of the Above |
| Measures taken to mitigate against unwanted bias: | None of the Above |

Privacy

| Field | Response |
| :--- | :--- |
| Generatable or reverse engineerable personally-identifiable information (PII)? | None |
| Was consent obtained for any PII used? | Not Applicable |
| PII used to create this model? | None Known |
| How often is dataset reviewed? | During dataset creation, model training, evaluation and before release |
| Is a mechanism in place to honor data subject right of access or deletion of personal data? | Not Applicable |
| If PII collected for the development of the model, was it collected directly by NVIDIA? | Not Applicable |
| If PII collected for the development of the model by NVIDIA, do you maintain or have access to disclosures made to data subjects? | Not Applicable |
| If PII collected for the development of this AI model, was it minimized to only what was required? | Not Applicable |
| Is there provenance for all datasets used in training? | Yes |
| Does data labeling (annotation, metadata) comply with privacy laws? | Yes |
| Is data compliant with data subject requests for data correction or removal, if such a request was made? | Not Applicable |

We recommend that users monitor for the above risks before deploying the models. If you notice any concerns, please report them to us immediately.
How to Get Started with the Model
- Download the original Llama Guard weights from Llama Guard after requesting access.
- Use the transformers PEFT library to load the adapter weights from this repository.

How To Use in NVIDIA NeMo Curator
NeMo Curator improves generative AI model accuracy by processing text, image, and video data at scale for training and customization. It also provides pre-built pipelines for generating synthetic data to customize and evaluate generative AI systems. The inference code for this model is available through the NeMo Curator GitHub repository. Check out this example notebook to get started.

Training Data
The model has been trained on Nvidia's Nemotron Content Safety Dataset V1:
- Human prompts from the Anthropic RLHF harmless dataset (Anthropic RLHF)
- LLM responses generated from Mistral-7B-v0.1

Trained on approximately 10,800 samples of three kinds: user prompts only, single-turn user prompts with LLM responses, and multi-turn user prompts with LLM responses.

Training Hyperparameters
- rank: 16
- alpha: 32
- Num of nodes: 1
- Num of GPUs per node: 8
- Learning rate: 1e-06

We use the PEFT library from Hugging Face and the training and validation code from the Llama recipes repository. We use FSDP during training.

The model has been evaluated on the following benchmarks:
- Test partition of Nvidia's content safety dataset, Nemotron Content Safety Dataset V1
- Toxic Chat Dataset
- Open AI Moderation Dataset
- SimpleSafetyTests Benchmark

Metrics
We report F1 and AUPRC scores for the model on the evaluation benchmarks.
Results on the Nemotron Content Safety V1 Test Set

| Model | AUPRC | F1 |
| :--- | :---: | ---: |
| Llama Guard Base | 0.930 | 0.62 |
| OpenAI Mod API | 0.895 | 0.34 |
| Perspective API | 0.860 | 0.24 |
| Llama Guard Defensive | 0.941 | 0.85 |

Results on Toxic Chat Dataset

| Model | AUPRC | F1 |
| :--- | :---: | ---: |
| Llama Guard Base | 0.664 | 0.58 |
| OpenAI Mod API | 0.588 | - |
| Perspective API | 0.532 | - |
| Llama Guard Defensive | 0.699 | 0.64 |

Results on Open AI Moderation Dataset

| Model | AUPRC | F1 |
| :--- | :---: | ---: |
| Llama Guard Base | 0.845 | 0.76 |
| OpenAI Mod API | 0.856 | - |
| Perspective API | 0.787 | - |
| Llama Guard Defensive | 0.844 | 0.68 |

Results on Simple Safety Tests Benchmark

| Model | Accuracy |
| :--- | :---: |
| Llama Guard Base | 87% |
| Perspective API | 72% |
| GPT4 | 89% |
| Llama Guard Defensive | 100% |

Compute Infrastructure
Supported Hardware: H100, A100 80GB, A100 40GB
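For readers comparing the F1 columns above, the sketch below shows how an F1 score is derived from binary predictions, treating "unsafe" as the positive class (label 1). This is the standard definition of the metric; the labels are toy data, and the card's actual evaluation code is not reproduced here.

```python
def f1_score(y_true, y_pred):
    """F1 for binary labels, with 1 (unsafe) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if t == 1 and p == 1)
    false_pos = sum(1 for t, p in pairs if t == 0 and p == 1)
    false_neg = sum(1 for t, p in pairs if t == 1 and p == 0)
    if true_pos == 0:
        return 0.0  # no correct positive predictions: precision or recall is zero
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Toy example: 1 = unsafe, 0 = safe (hypothetical labels).
gold_labels = [1, 1, 0, 0, 1]
predicted_labels = [1, 0, 0, 1, 1]
toy_f1 = f1_score(gold_labels, predicted_labels)
```

F1 depends on a single decision threshold, whereas AUPRC summarizes precision and recall across thresholds, which is one reason a model can rank differently on the two columns.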

license:llama2
28,058
25

DeepSeek-V3-0324-NVFP4

license:mit
27,522
12

OpenReasoning-Nemotron-7B

Description:
OpenReasoning-Nemotron-7B is a large language model (LLM) which is a derivative of Qwen2.5-7B (AKA the reference model). It is a reasoning model that is post-trained for reasoning about math, code, and science solution generation. We evaluated this model with up to 64K output tokens. The OpenReasoning model is available in the following sizes: 1.5B, 7B, 14B, and 32B. This model is ready for commercial/non-commercial research use.

License/Terms of Use:
GOVERNING TERMS: Use of the models listed above is governed by the Creative Commons Attribution 4.0 International License (CC-BY-4.0). ADDITIONAL INFORMATION: Apache 2.0 License

Our models demonstrate exceptional performance across a suite of challenging reasoning benchmarks. The 7B, 14B, and 32B models consistently set new state-of-the-art records for their size classes.

| Model | ArtificialAnalysisIndex* | GPQA | MMLU-PRO | HLE | LiveCodeBench | SciCode | AIME24 | AIME25 | HMMT FEB 25 |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| 1.5B | 31.0 | 31.6 | 47.5 | 5.5 | 28.6 | 1.0 | 55.5 | 45.6 | 31.5 |
| 7B | 54.7 | 61.1 | 71.9 | 8.3 | 63.3 | 20.3 | 84.7 | 78.2 | 63.5 |
| 14B | 60.9 | 71.6 | 77.5 | 10.1 | 67.8 | 32.4 | 87.8 | 82.0 | 71.2 |
| 32B | 64.3 | 73.1 | 80.0 | 11.9 | 70.2 | 39.6 | 89.2 | 84.0 | 73.8 |

\* This is our estimation of the Artificial Analysis Intelligence Index, not an official score.

Combining the work of multiple agents
OpenReasoning-Nemotron models can be used in a "heavy" mode by starting multiple parallel generations and combining them via generative solution selection (GenSelect). To add this "skill" we follow the original GenSelect training pipeline, except that we do not train on the selection summary but use the full reasoning trace of DeepSeek R1 0528 671B instead. We only train models to select the best solution for math problems, but surprisingly find that this capability directly generalizes to code and science questions!
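A simpler way to combine parallel generations than GenSelect is majority voting over the final answers, which is the Majority@64 baseline this card reports alongside GenSelect. The sketch below is a minimal illustration, assuming each generation places its final answer inside `\boxed{}` as the card's math prompt requests; the function names are hypothetical, and nested braces inside the box are not handled.

```python
import re
from collections import Counter

# Matches \boxed{...} with no nested braces (a simplifying assumption).
BOXED = re.compile(r"\\boxed\{([^{}]*)\}")


def extract_boxed(text):
    """Return the content of the last \\boxed{...} in a generation, or None."""
    matches = BOXED.findall(text)
    return matches[-1].strip() if matches else None


def majority_answer(generations):
    """Most frequent boxed answer across generations; None if no answer was found."""
    answers = [a for a in (extract_boxed(g) for g in generations) if a is not None]
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]


# Four hypothetical generations for the same problem.
samples = [
    r"... so the result is \boxed{42}",
    r"... therefore \boxed{41}",
    r"... we conclude \boxed{42}",
    "I could not finish the derivation.",
]
voted = majority_answer(samples)
```

Majority voting only counts final answers, while GenSelect, as described above, has a model read the candidate solutions and pick one, so it can also select an answer that is not the most frequent.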
With this "heavy" GenSelect inference mode, OpenReasoning-Nemotron-32B model surpasses O3 (High) on math and coding benchmarks.

| Model | Pass@1 (Avg@64) | Majority@64 | GenSelect |
| :--- | :--- | :--- | :--- |
| 1.5B | | | |
| AIME24 | 55.5 | 76.7 | 76.7 |
| AIME25 | 45.6 | 70.0 | 70.0 |
| HMMT Feb 25 | 31.5 | 46.7 | 53.3 |
| 7B | | | |
| AIME24 | 84.7 | 93.3 | 93.3 |
| AIME25 | 78.2 | 86.7 | 93.3 |
| HMMT Feb 25 | 63.5 | 83.3 | 90.0 |
| LCB v6 2408-2505 | 63.4 | n/a | 67.7 |
| 14B | | | |
| AIME24 | 87.8 | 93.3 | 93.3 |
| AIME25 | 82.0 | 90.0 | 90.0 |
| HMMT Feb 25 | 71.2 | 86.7 | 93.3 |
| LCB v6 2408-2505 | 67.9 | n/a | 69.1 |
| 32B | | | |
| AIME24 | 89.2 | 93.3 | 93.3 |
| AIME25 | 84.0 | 90.0 | 93.3 |
| HMMT Feb 25 | 73.8 | 86.7 | 96.7 |
| LCB v6 2408-2505 | 70.2 | n/a | 75.3 |
| HLE | 11.8 | 13.4 | 15.5 |

python for just the final solution code block with the following format:

Math generation prompt
prompt = """Solve the following math problem. Make sure to put the answer (and only answer) inside \\boxed{}.
{user}
"""

Science generation prompt
You can refer to prompts here -
https://github.com/NVIDIA/NeMo-Skills/blob/main/nemo_skills/prompt/config/generic/hle.yaml (HLE)
https://github.com/NVIDIA/NeMo-Skills/blob/main/nemo_skills/prompt/config/eval/aai/mcq-4choices-boxed.yaml (for GPQA)
https://github.com/NVIDIA/NeMo-Skills/blob/main/nemo_skills/prompt/config/eval/aai/mcq-10choices-boxed.yaml (MMLU-Pro)

messages = [ { "role": "user", "content": prompt.format(user="Write a program to calculate the sum of the first $N$ fibonacci numbers")}, ]
outputs = pipeline( messages, max_new_tokens=64000, )
print(outputs[0]["generated_text"][-1]['content'])

@article{ahmad2025opencodereasoning, title={{OpenCodeReasoning: Advancing Data Distillation for Competitive Coding}}, author={Wasi Uddin Ahmad, Sean Narenthiran, Somshubra Majumdar, Aleksander Ficek, Siddhartha Jain, Jocelyn Huang, Vahid Noroozi, Boris Ginsburg}, year={2025}, eprint={2504.01943}, archivePrefix={arXiv}, primaryClass={cs.CL}, url={https://arxiv.org/abs/2504.01943}, }
@misc{ahmad2025opencodereasoningiisimpletesttime, title={{OpenCodeReasoning-II: A Simple Test Time Scaling Approach via Self-Critique}}, author={Wasi Uddin Ahmad and Somshubra Majumdar and Aleksander Ficek and Sean Narenthiran and Mehrzad Samadi and Jocelyn Huang and Siddhartha Jain and Vahid Noroozi and Boris Ginsburg}, year={2025}, eprint={2507.09075}, archivePrefix={arXiv}, primaryClass={cs.CL}, url={https://arxiv.org/abs/2507.09075}, }
@misc{moshkov2025aimo2winningsolutionbuilding, title={{AIMO-2 Winning Solution: Building State-of-the-Art Mathematical Reasoning Models with OpenMathReasoning dataset}}, author={Ivan Moshkov and Darragh Hanley and Ivan Sorokin and Shubham Toshniwal and Christof Henkel and Benedikt Schifferer and Wei Du and Igor Gitman}, year={2025}, eprint={2504.16891}, archivePrefix={arXiv}, primaryClass={cs.AI}, url={https://arxiv.org/abs/2504.16891}, }
@inproceedings{toshniwal2025genselect, title={{GenSelect: A Generative Approach to
Best-of-N}}, author={Shubham Toshniwal and Ivan Sorokin and Aleksander Ficek and Ivan Moshkov and Igor Gitman}, booktitle={2nd AI for Math Workshop @ ICML 2025}, year={2025}, url={https://openreview.net/forum?id=8LhnmNmUDb} }
```

Use Case: This model is intended for developers and researchers who work on competitive math, code and science problems. It has been trained via only supervised fine-tuning to achieve strong scores on benchmarks.

Release Date: Huggingface [07/16/2025] via https://huggingface.co/nvidia/OpenReasoning-Nemotron-7B/

Reference(s):
- [2504.01943] OpenCodeReasoning: Advancing Data Distillation for Competitive Coding
- [2504.16891] AIMO-2 Winning Solution: Building State-of-the-Art Mathematical Reasoning Models with OpenMathReasoning dataset

Model Architecture:
- Architecture Type: Dense decoder-only Transformer model
- Network Architecture: Qwen2.5-7B

This model was developed based on Qwen2.5-7B and has 7B model parameters.
- OpenReasoning-Nemotron-1.5B was developed based on Qwen2.5-1.5B and has 1.5B model parameters.
- OpenReasoning-Nemotron-7B was developed based on Qwen2.5-7B and has 7B model parameters.
- OpenReasoning-Nemotron-14B was developed based on Qwen2.5-14B and has 14B model parameters.
- OpenReasoning-Nemotron-32B was developed based on Qwen2.5-32B and has 32B model parameters.

Input:
- Input Type(s): Text
- Input Format(s): String
- Input Parameters: One-Dimensional (1D)
- Other Properties Related to Input: Trained for up to 64,000 output tokens

Output:
- Output Type(s): Text
- Output Format: String
- Output Parameters: One-Dimensional (1D)
- Other Properties Related to Output: Trained for up to 64,000 output tokens

Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
Software Integration : Runtime Engine: NeMo 2.3.0 Recommended Hardware Microarchitecture Compatibility: NVIDIA Ampere NVIDIA Hopper Preferred/Supported Operating System(s): Linux Model Version(s): 1.0 (7/16/2025) OpenReasoning-Nemotron-32B OpenReasoning-Nemotron-14B OpenReasoning-Nemotron-7B OpenReasoning-Nemotron-1.5B The training corpus for OpenReasoning-Nemotron-7B is comprised of questions from OpenCodeReasoning dataset, OpenCodeReasoning-II, OpenMathReasoning, and the Synthetic Science questions from the Llama-Nemotron-Post-Training-Dataset. All responses are generated using DeepSeek-R1-0528. We also include the instruction following and tool calling data from Llama-Nemotron-Post-Training-Dataset without modification. Data Collection Method: Hybrid: Automated, Human, Synthetic Labeling Method: Hybrid: Automated, Human, Synthetic Properties: 5M DeepSeek-R1-0528 generated responses from OpenCodeReasoning questions (https://huggingface.co/datasets/nvidia/OpenCodeReasoning), OpenMathReasoning, and the Synthetic Science questions from the Llama-Nemotron-Post-Training-Dataset. We also include the instruction following and tool calling data from Llama-Nemotron-Post-Training-Dataset without modification. Evaluation Dataset: We used the following benchmarks to evaluate the model holistically. Data Collection Method: Hybrid: Automated, Human, Synthetic Labeling Method: Hybrid: Automated, Human, Synthetic Ethical Considerations: NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. 
For more detailed information on ethical considerations for this model, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards. Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns here.
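
The Majority@64 column in the results table of this card can be read as plain majority voting over sampled answers. The following is a minimal sketch under that assumption, not the evaluation code behind the reported numbers: `extract_boxed` and `majority_answer` are hypothetical helper names, and answers are compared as raw strings taken from the `\boxed{...}` format that the math prompt asks for.

```python
from collections import Counter
from typing import Optional


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, or None if absent."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    i = start + len(marker)
    depth = 1  # we are inside the opening brace of \boxed{
    out = []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(out).strip()
        out.append(ch)
        i += 1
    return None  # braces never closed


def majority_answer(generations: list[str]) -> Optional[str]:
    """Pick the most frequent boxed answer; generations without one are ignored."""
    answers = [a for a in map(extract_boxed, generations) if a is not None]
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]


samples = ["... so \\boxed{42}", "\\boxed{41}", "thus \\boxed{42}", "no final answer"]
print(majority_answer(samples))
```

String comparison treats `0.5` and `\frac{1}{2}` as different answers; a real evaluation would normally normalize answers before counting.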

NaNK
license:cc-by-4.0
27,482
46

DeepSeek-V3-0324-FP4

NaNK
license:mit
27,281
12

parakeet-tdt_ctc-1.1b

NaNK
license:cc-by-4.0
26,836
19

Llama-4-Scout-17B-16E-Instruct-NVFP4

NaNK
llama4
26,339
5

Llama-3.1-Nemotron-70B-Instruct-HF

Llama-3.1-Nemotron-70B-Instruct is a large language model customized by NVIDIA to improve the helpfulness of LLM generated responses to user queries.

This model reaches Arena Hard of 85.0, AlpacaEval 2 LC of 57.6 and GPT-4-Turbo MT-Bench of 8.98, which are known to be predictive of LMSys Chatbot Arena Elo.

As of 1 Oct 2024, this model is #1 on all three automatic alignment benchmarks (verified tab for AlpacaEval 2 LC), edging out strong frontier models such as GPT-4o and Claude 3.5 Sonnet.

As of Oct 24th, 2024, the model has an Elo score of 1267 (±7), rank 9 and style-controlled rank of 26 on the ChatBot Arena leaderboard.

This model was trained using RLHF (specifically, REINFORCE), Llama-3.1-Nemotron-70B-Reward and HelpSteer2-Preference prompts on a Llama-3.1-70B-Instruct model as the initial policy.

Llama-3.1-Nemotron-70B-Instruct-HF has been converted from Llama-3.1-Nemotron-70B-Instruct to support it in the HuggingFace Transformers codebase. Please note that evaluation results might be slightly different from the Llama-3.1-Nemotron-70B-Instruct as evaluated in NeMo-Aligner, which the evaluation results below are based on.

Try hosted inference for free at build.nvidia.com - it comes with an OpenAI-compatible API interface.

See details on our paper at https://arxiv.org/abs/2410.01257 - as a preview, this model can correctly answer the question without specialized prompting or additional reasoning tokens:

Note: This model is a demonstration of our techniques for improving helpfulness in general-domain instruction following. It has not been tuned for performance in specialized domains such as math.

License

Your use of this model is governed by the NVIDIA Open Model License. Additional Information: Llama 3.1 Community License Agreement. Built with Llama.
As of 1 Oct 2024, Llama-3.1-Nemotron-70B-Instruct performs best on Arena Hard, AlpacaEval 2 LC (verified tab) and MT Bench (GPT-4-Turbo).

| Model | Arena Hard | AlpacaEval | MT-Bench | Mean Response Length |
|:---|:---|:---|:---|:---|
| Details | (95% CI) | 2 LC (SE) | (GPT-4-Turbo) | (# of Characters for MT-Bench) |
| Llama-3.1-Nemotron-70B-Instruct | 85.0 (-1.5, 1.5) | 57.6 (1.65) | 8.98 | 2199.8 |
| Llama-3.1-70B-Instruct | 55.7 (-2.9, 2.7) | 38.1 (0.90) | 8.22 | 1728.6 |
| Llama-3.1-405B-Instruct | 69.3 (-2.4, 2.2) | 39.3 (1.43) | 8.49 | 1664.7 |
| Claude-3-5-Sonnet-20240620 | 79.2 (-1.9, 1.7) | 52.4 (1.47) | 8.81 | 1619.9 |
| GPT-4o-2024-05-13 | 79.3 (-2.1, 2.0) | 57.5 (1.47) | 8.74 | 1752.2 |

You can run the model with the HuggingFace Transformers library on 2 or more 80GB GPUs (NVIDIA Ampere or newer), with at least 150GB of free disk space to accommodate the download.

This code has been tested on Transformers v4.44.0, torch v2.4.0 and 2 A100 80GB GPUs, but any setup that supports should support this model as well. If you run into problems, you can consider doing .
- NeMo Aligner
- HelpSteer2-Preference
- HelpSteer2
- Introducing Llama 3.1: Our most capable models to date
- Meta's Llama 3.1 Webpage
- Meta's Llama 3.1 Model Card

Model Architecture:
- Architecture Type: Transformer
- Network Architecture: Llama 3.1

Input:
- Input Type(s): Text
- Input Format: String
- Input Parameters: One Dimensional (1D)
- Other Properties Related to Input: Max of 128k tokens

Output:
- Output Type(s): Text
- Output Format: String
- Output Parameters: One Dimensional (1D)
- Other Properties Related to Output: Max of 4k tokens

Software Integration:
- Supported Hardware Microarchitecture Compatibility: NVIDIA Ampere, NVIDIA Hopper, NVIDIA Turing
- Supported Operating System(s): Linux

Alignment methodology: REINFORCE implemented in NeMo Aligner

Data Collection Method by dataset: [Hybrid: Human, Synthetic]

Properties (Quantity, Dataset Descriptions, Sensor(s)): 21,362 prompt-responses built to make more models more aligned with human preference - specifically more helpful, factually-correct, coherent, and customizable based on complexity and verbosity. 20,324 prompt-responses used for training and 1,038 used for validation.

Ethical Considerations: NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their supporting model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. For more detailed information on ethical considerations for this model, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards. Please report security vulnerabilities or NVIDIA AI Concerns here.

If you find this model useful, please cite the following works

NaNK
llama
26,324
2,057

Llama-4-Scout-17B-16E-Instruct-FP4

NaNK
llama4
24,814
5

Cosmos-Guardrail1

-
23,631
8

stt_it_fastconformer_hybrid_large_pc

license:cc-by-4.0
22,622
2

Cosmos-Predict2.5-2B

NaNK
-
22,573
22

llama-nemotron-embed-1b-v2

NaNK
llama_bidirec
20,854
19

segformer-b0-finetuned-cityscapes-1024-1024

-
19,269
10

diar_streaming_sortformer_4spk-v2

This model is a streaming version of Sortformer diarizer. Sortformer[1] is a novel end-to-end neural model for speaker diarization, trained with unconventional objectives compared to existing end-to-end diarization models.

Streaming Sortformer[2] employs an Arrival-Order Speaker Cache (AOSC) to store frame-level acoustic embeddings of previously observed speakers.

Sortformer resolves the permutation problem in diarization by following the arrival-time order of the speech segments from each speaker.

Streaming Sortformer employs the pre-encode layer in the Fast-Conformer to generate the speaker cache. At each step, the speaker cache is filtered to retain only the high-quality speaker cache vectors.

Aside from the speaker-cache management part, streaming Sortformer follows the architecture of the offline version of Sortformer. Sortformer consists of an L-size (17 layers) NeMo Encoder for Speech Tasks (NEST)[3], which is based on the Fast-Conformer[4] encoder. This is followed by an 18-layer Transformer[5] encoder with a hidden size of 192, and two feedforward layers with 4 sigmoid outputs for each frame input at the top layer. More information can be found in the Streaming Sortformer paper[2].

To train, fine-tune or perform diarization with Sortformer, you will need to install NVIDIA NeMo[6]. We recommend you install it after you've installed Cython and the latest PyTorch version.

The model is available for use in the NeMo Framework[6], and can be used as a pre-trained checkpoint for inference or for fine-tuning on another dataset.

Input Format

Input to Sortformer can be an individual audio file:

where each line is a dictionary containing the following fields:

Streaming configuration is defined by the following parameters, all measured in 80ms frames:
- CHUNKSIZE: The number of frames in a processing chunk.
- RIGHTCONTEXT: The number of future frames attached after the chunk.
- FIFOSIZE: The number of previous frames attached before the chunk, from the FIFO queue.
- UPDATEPERIOD: The number of frames extracted from the FIFO queue to update the speaker cache.
- SPEAKERCACHESIZE: The total number of frames in the speaker cache.

Here are recommended configurations for different scenarios:

| Configuration | Latency | RTF | CHUNKSIZE | RIGHTCONTEXT | FIFOSIZE | UPDATEPERIOD | SPEAKERCACHESIZE |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| very high latency | 30.4s | 0.002 | 340 | 40 | 40 | 300 | 188 |
| high latency | 10.0s | 0.005 | 124 | 1 | 124 | 124 | 188 |
| low latency | 1.04s | 0.093 | 6 | 7 | 188 | 144 | 188 |
| ultra low latency | 0.32s | 0.180 | 3 | 1 | 188 | 144 | 188 |

For clarity on the metrics used in the table:
- Latency: Refers to Input Buffer Latency, calculated as CHUNKSIZE + RIGHTCONTEXT. This value does not include computational processing time.
- Real-Time Factor (RTF): Characterizes processing speed, calculated as the time taken to process an audio file divided by its duration. RTF values are measured with a batch size of 1 on an NVIDIA RTX 6000 Ada Generation GPU.

Getting Diarization Results

To perform speaker diarization and get a list of speaker-marked speech segments in the format 'beginseconds, endseconds, speakerindex', simply use:

To obtain tensors of speaker activity probabilities, use:

This model accepts single-channel (mono) audio sampled at 16,000 Hz.
- The actual input tensor is a Ns x 1 matrix for each audio clip, where Ns is the number of samples in the time-series signal.
- For instance, a 10-second audio clip sampled at 16,000 Hz (mono-channel WAV file) will form a 160,000 x 1 matrix.

The output of the model is a T x S matrix, where:
- S is the maximum number of speakers (in this model, S = 4).
- T is the total number of frames, including zero-padding. Each frame corresponds to a segment of 0.08 seconds of audio.
Each element of the T x S matrix represents the speaker activity probability in the [0, 1] range. For example, a matrix element a(150, 2) = 0.95 indicates a 95% probability of activity for the second speaker during the time range [12.00, 12.08] seconds.

Train and evaluate Sortformer diarizer using NeMo

Training

Sortformer diarizer models are trained on 8 nodes of 8×NVIDIA Tesla V100 GPUs. We use 90-second-long training samples and a batch size of 4. The model can be trained using this example script and base config.

Inference with Sortformer diarizer models can be combined with post-processing algorithms using the inference example script. Provide the post-processing YAML configs in the `postprocessing` folder to reproduce the optimized post-processing algorithm for each development dataset.

- The model operates in a streaming mode (online mode).
- It can detect a maximum of 4 speakers; performance degrades on recordings with 5 or more speakers.
- While the model is designed for long-form audio and can handle recordings that are several hours long, performance may degrade on very long recordings.
- The model was trained on publicly available speech datasets, primarily in English. As a result, performance may degrade on non-English speech, and may also degrade on out-of-domain data, such as recordings in noisy conditions.

Sortformer was trained on a combination of 2445 hours of real conversations and 5150 hours of simulated audio mixtures generated by the NeMo speech data simulator[7]. All the datasets listed above are based on the same labeling method via RTTM format. A subset of RTTM files used for model training are processed for the speaker diarization model training purposes. Data collection methods vary across individual datasets. For example, the above datasets include phone calls, interviews, web videos, and audiobook recordings. Please refer to the Linguistic Data Consortium (LDC) website or dataset webpage for detailed data collection methods.
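
The output description earlier in this card (a T x S matrix of per-frame speaker activity probabilities, 0.08 seconds per frame) can be turned into time segments with simple arithmetic. The sketch below is illustrative only and is not the NeMo post-processing: `frame_to_interval` and `active_segments` are hypothetical helpers, the 0.5 threshold is an assumed value, and frame indices are treated as 0-based because frame 150 then maps to [12.00, 12.08] as in the card's example.

```python
FRAME_SEC = 0.08  # duration of one output frame, as stated in the card


def frame_to_interval(t: int) -> tuple[float, float]:
    """Map a 0-based frame index to its [start, end] time in seconds."""
    return (round(t * FRAME_SEC, 2), round((t + 1) * FRAME_SEC, 2))


def active_segments(probs: list[list[float]], speaker: int,
                    threshold: float = 0.5) -> list[tuple[float, float]]:
    """Merge consecutive frames where one speaker's probability >= threshold.

    probs is a T x S nested list; speaker is a 0-based column index.
    The threshold is an assumption made for this sketch.
    """
    segments = []
    start = None
    for t, row in enumerate(probs):
        active = row[speaker] >= threshold
        if active and start is None:
            start = t  # segment opens at this frame
        elif not active and start is not None:
            segments.append((round(start * FRAME_SEC, 2), round(t * FRAME_SEC, 2)))
            start = None
    if start is not None:  # segment still open at the end of the matrix
        segments.append((round(start * FRAME_SEC, 2), round(len(probs) * FRAME_SEC, 2)))
    return segments


print(frame_to_interval(150))
toy = [[0.1, 0.9], [0.2, 0.8], [0.7, 0.1], [0.9, 0.2]]
print(active_segments(toy, speaker=0))
```

Overlapping speech falls out naturally here, since each speaker column is segmented independently of the others.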
Training Datasets (Real conversations)
- Fisher English (LDC)
- AMI Meeting Corpus
- VoxConverse-v0.3
- ICSI
- AISHELL-4
- Third DIHARD Challenge Development (LDC)
- 2000 NIST Speaker Recognition Evaluation, split1 (LDC)
- DiPCo
- AliMeeting

Training Datasets (Used to simulate audio mixtures)
- 2004-2010 NIST Speaker Recognition Evaluation (LDC)
- Librispeech

| Dataset | Number of speakers | Number of Sessions |
|---|---|---|
| DIHARD III Eval >=5spk | 5-9 | 40 |
| DIHARD III Eval full | 1-9 | 259 |
| CALLHOME-part2 2spk | 2 | 148 |
| CALLHOME-part2 3spk | 3 | 74 |
| CALLHOME-part2 4spk | 4 | 20 |
| CALLHOME-part2 5spk | 5 | 5 |
| CALLHOME-part2 6spk | 6 | 3 |
| CALLHOME-part2 full | 2-6 | 250 |
| CH109 | 2 | 109 |

Diarization Error Rate (DER)

All evaluations include overlapping speech. Collar tolerance is 0s for DIHARD III Eval, and 0.25s for CALLHOME-part2 and CH109. Post-Processing (PP) is optimized on two different held-out dataset splits.
- DIHARD III Dev Optimized Post-Processing for DIHARD III Eval
- CALLHOME-part1 Optimized Post-Processing for CALLHOME-part2 and CH109

| Latency | PP | DIHARD III Eval <=4spk | DIHARD III Eval >=5spk | DIHARD III Eval full | CALLHOME-part2 2spk | CALLHOME-part2 3spk | CALLHOME-part2 4spk | CALLHOME-part2 5spk | CALLHOME-part2 6spk | CALLHOME-part2 full | CH109 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 30.4s | no | 14.63 | 40.74 | 19.68 | 6.27 | 10.27 | 12.30 | 19.08 | 28.09 | 10.50 | 5.03 |
| 30.4s | yes | 13.45 | 41.40 | 18.85 | 5.34 | 9.22 | 11.29 | 18.84 | 27.29 | 9.54 | 4.61 |
| 10.0s | no | 14.90 | 41.06 | 19.96 | 6.96 | 11.05 | 12.93 | 20.47 | 28.10 | 11.21 | 5.28 |
| 10.0s | yes | 13.75 | 41.41 | 19.10 | 6.05 | 9.88 | 11.72 | 19.66 | 27.37 | 10.15 | 4.80 |
| 1.04s | no | 14.49 | 42.22 | 19.85 | 7.51 | 11.45 | 13.75 | 23.22 | 29.22 | 11.89 | 5.37 |
| 1.04s | yes | 13.24 | 42.56 | 18.91 | 6.57 | 10.05 | 12.44 | 21.68 | 28.74 | 10.70 | 4.88 |
| 0.32s | no | 14.64 | 43.47 | 20.19 | 8.63 | 12.91 | 16.19 | 29.40 | 30.60 | 13.57 | 6.46 |
| 0.32s | yes | 13.44 | 43.73 | 19.28 | 6.91 | 10.45 | 13.70 | 27.04 | 28.58 | 11.38 | 5.27 |

Streaming Sortformer is deployed via NVIDIA RIVA ASR - Speech Recognition with Speaker Diarization.

NVIDIA Riva is an accelerated speech AI SDK deployable on-prem, in all clouds, multi-cloud, hybrid, on edge, and embedded.
Additionally, Riva provides:
- World-class out-of-the-box accuracy for the most common languages with model checkpoints trained on proprietary data with hundreds of thousands of GPU-compute hours
- Best in class accuracy with run-time word boosting (e.g., brand and product names) and customization of acoustic model, language model, and inverse text normalization
- Streaming speech recognition, Kubernetes compatible scaling, and enterprise-grade support

For more information on NVIDIA RIVA, see the list of supported models here. Also check out the Riva live demo.

[1] Sortformer: Seamless Integration of Speaker Diarization and ASR by Bridging Timestamps and Tokens
[2] Streaming Sortformer: Speaker Cache-Based Online Speaker Diarization with Arrival-Time Ordering
[3] NEST: Self-supervised Fast Conformer as All-purpose Seasoning to Speech Processing Tasks
[4] Fast Conformer with Linearly Scalable Attention for Efficient Speech Recognition

License to use this model is covered by the CC-BY-4.0. By downloading the public and release version of the model, you accept the terms and conditions of the CC-BY-4.0 license.
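
The Latency column of the streaming configuration table in this card follows from the stated rule (chunk size plus right context, both counted in 80ms frames). The snippet below is a small sketch of that arithmetic; the function name `input_buffer_latency` is hypothetical, and the multiplication by 0.08 seconds is inferred from the statement that all parameters are measured in 80ms frames.

```python
FRAME_SEC = 0.08  # one frame is 80 ms, per the card


def input_buffer_latency(chunk_size: int, right_context: int) -> float:
    """Input buffer latency in seconds; excludes compute time, as the card notes."""
    return (chunk_size + right_context) * FRAME_SEC


# (chunk size, right context) pairs copied from the recommended configurations
configs = {
    "very high latency": (340, 40),
    "high latency": (124, 1),
    "low latency": (6, 7),
    "ultra low latency": (3, 1),
}

for name, (chunk, right) in configs.items():
    print(f"{name}: {input_buffer_latency(chunk, right):.2f}s")
```

The four results agree with the table's 30.4s, 10.0s, 1.04s and 0.32s entries, which supports reading the rule as a frame count converted to seconds.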

license:cc-by-4.0
18,377
71

NVIDIA-Nemotron-Nano-12B-v2

--- license: other license_name: nvidia-open-model-license license_link: >- https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/ pipeline_tag: text-generation data...

NaNK
-
17,910
132

Llama-3.1-8B-Instruct-NVFP4

NaNK
llama
17,673
2

NVIDIA-Nemotron-Nano-9B-v2-FP8

The pretraining data has a cutoff date of September 2024. NVIDIA-Nemotron-Nano-9B-v2-FP8 is a quantized version of NVIDIA-Nemotron-Nano-9B-v2 and is a large language model (LLM) trained from scratch by NVIDIA, and designed as a unified model for both reasoning and non-reasoning tasks. It responds to user queries and tasks by first generating a reasoning trace and then concluding with a final response. The model's reasoning capabilities can be controlled via a system prompt. If the user prefers the model to provide its final answer without intermediate reasoning traces, it can be configured to do so, albeit with a slight decrease in accuracy for harder prompts that require reasoning. Conversely, allowing the model to generate reasoning traces first generally results in higher-quality final solutions to queries and tasks. The model uses a hybrid architecture consisting primarily of Mamba-2 and MLP layers combined with just four Attention layers. For the architecture, please refer to the Nemotron-H tech report. Nano 9B v2 has 56 layers in total: 27 Mamba layers, 25 MLP layers, and 4 attention layers. In the FP8 quantized version, the Mamba and MLP layers are quantized to FP8, while all 4 attention layers and the Conv1d components within the Mamba layers are kept in BF16. The supported languages include: English, German, Spanish, French, Italian, and Japanese. Improved using Qwen. We want to hear from you! Share your ideas, vote on what matters, and help shape the future of Nemotron. GOVERNING TERMS: Use of this model is governed by the NVIDIA Open Model License Agreement. We evaluated our model in Reasoning-On mode across all benchmarks. 
| Benchmark | Qwen3-8B | NVIDIA-Nemotron-Nano-9B-v2 | NVIDIA-Nemotron-Nano-9B-v2-FP8 |
| :---- | ----: | ----: | ----: |
| AIME25 | 69.3% | 72.1% | 71.3% |
| MATH500 | 96.3% | 97.8% | 97.5% |
| GPQA | 59.6% | 64.0% | 60.6% |
| LCB | 59.5% | 71.1% | 67.6% |
| BFCL v3 | 66.3% | 66.9% | 65.7% |
| IFEval (Instruction Strict) | 89.4% | 90.3% | 90.2% |
| RULER (128K) | 74.1% | 78.9% | 76.1% |

- Architecture Type: Mamba2-Transformer Hybrid
- Network Architecture: Nemotron-Hybrid

NVIDIA-Nemotron-Nano-9B-v2-FP8 is a general purpose reasoning and chat model intended to be used in English and coding languages. Other non-English languages (German, French, Italian, Spanish and Japanese) are also supported. Developers designing AI Agent systems, chatbots, RAG systems, and other AI-powered applications. Also suitable for typical instruction-following tasks.

Huggingface 9/22/2025 via https://huggingface.co/nvidia/NVIDIA-Nemotron-Nano-9B-v2-FP8

- NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model

- Input Type(s): Text
- Input Format(s): String
- Input Parameters: One-Dimensional (1D): Sequences
- Other Properties Related to Input: Context length up to 128K. Supported languages include German, Spanish, French, Italian, Korean, Portuguese, Russian, Japanese, Chinese and English.

- Output Type(s): Text
- Output Format: String
- Output Parameters: One-Dimensional (1D): Sequences up to 128K

Our models are designed and optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.

- Runtime Engine(s): NeMo 25.07.nemotron-nano-v2
- Supported Hardware Microarchitecture Compatibility: NVIDIA A10G, NVIDIA H100-80GB, NVIDIA A100, Jetson AGX Thor
- Operating System(s): Linux

The snippet below shows how to use this model with Huggingface Transformers (tested on version 4.48.3).
Case 1: `/think` or no reasoning signal is provided in the system prompt, reasoning will be set to `True`

Case 2: `/nothink` is provided, reasoning will be set to `False`

Note: `/think` or `/nothink` keywords can also be provided in "user" messages for turn-level reasoning control.

We recommend setting `temperature` to `0.6` and `top_p` to `0.95` for reasoning True, using greedy search for reasoning False, and increasing `max_new_tokens` to `1024` or higher for reasoning True.

The snippet below shows how to use this model with TRT-LLM. We tested this on the following commit and followed these instructions to build and install TRT-LLM in a docker container.

The snippet below shows how to use this model with vLLM. Use the latest version of vLLM and follow these instructions to build and install vLLM.

Note:
- Remember to add `--mamba_ssm_cache_dtype float32` for accurate quality. Without this option, the model's accuracy may degrade.
- If you encounter a CUDA OOM issue, try `--max-num-seqs 64` and consider lowering the value further if the error persists.

Alternatively, you can use Docker to launch a vLLM server.

For Jetson AGX Thor, please use this vLLM container.

The thinking budget allows developers to keep accuracy high and meet response-time targets, which is especially crucial for customer support, autonomous agent steps, and edge devices where every millisecond counts.

With budget control, you can set a limit for internal reasoning:

`max_thinking_tokens`: This is a threshold that will attempt to end the reasoning trace at the next newline encountered in the reasoning trace. If no newline is encountered within 500 tokens, it will abruptly end the reasoning trace at `max_thinking_tokens + 500`.

Calling the server with a budget (Restricted to 32 tokens here as an example)

After launching a vLLM server, you can call the server with tool-call support using a Python script like below:

We follow the jinja chat template provided below.
This template conditionally adds ` \n` to the start of the Assistant response if `/think` is found in either the system prompt or any user message. If no reasoning signal is added, the model defaults to reasoning "on" mode. The chat template adds ` ` to the start of the Assistant response if `/nothink` is found in the system prompt. Thus enforcing reasoning on/off behavior. Data Modality: Text Text Training Data Size: More than 10 Trillion Tokens Train/Test/Valid Split: We used 100% of the corpus for pre-training and relied on external benchmarks for testing. Data Collection Method by dataset: Hybrid: Automated, Human, Synthetic Labeling Method by dataset: Hybrid: Automated, Human, Synthetic Properties: The post-training corpus for NVIDIA-Nemotron-Nano-9B-v2-FP8 consists of English and multilingual text (German, Spanish, French, Italian, Korean, Portuguese, Russian, Japanese, Chinese and English). Our sources cover a variety of document types such as: webpages, dialogue, articles, and other written materials. The corpus spans domains including code, legal, math, science, finance, and more. We also include a small portion of question-answering, and alignment style data to improve model accuracies. For several of the domains listed above we used synthetic data, specifically reasoning traces, from DeepSeek R1/R1-0528, Qwen3-235B-A22B, Nemotron 4 340B, Qwen2.5-32B-Instruct-AWQ, Qwen2.5-14B-Instruct, Qwen 2.5 72B. The pre-training corpus for NVIDIA-Nemotron-Nano-9B-v2-FP8 consists of high-quality curated and synthetically-generated data. It is trained in the English language, as well as 15 multilingual languages and 43 programming languages. Our sources cover a variety of document types such as: webpages, dialogue, articles, and other written materials. The corpus spans domains including legal, math, science, finance, and more. We also include a small portion of question-answering, and alignment style data to improve model accuracy. 
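
The thinking-budget rule described earlier in this card (stop at the first newline after the budget is reached, or cut hard 500 tokens past the budget) can be expressed in a few lines. This is a simplified sketch, not the serving-side implementation: `reasoning_cutoff` is a hypothetical helper that works on an already generated list of decoded token strings, whereas a real server would stop generation as it goes.

```python
def reasoning_cutoff(token_texts: list[str], max_thinking_tokens: int,
                     grace: int = 500) -> int:
    """Return how many reasoning tokens to keep under the budget rule.

    token_texts: decoded text of each reasoning token, in order.
    grace: extra tokens allowed while waiting for a newline (500 in the card).
    """
    n = len(token_texts)
    if n <= max_thinking_tokens:
        return n  # trace finished within budget, nothing to cut
    hard_limit = max_thinking_tokens + grace
    # Past the budget: stop at the first token that contains a newline.
    for i in range(max_thinking_tokens, min(n, hard_limit)):
        if "\n" in token_texts[i]:
            return i + 1  # keep the newline token itself
    # No newline within the grace window: cut abruptly at the hard limit.
    return min(n, hard_limit)


trace = ["step"] * 10 + ["\n"] + ["more"] * 5
print(reasoning_cutoff(trace, max_thinking_tokens=8))
```

Cutting at a newline keeps the last reasoning line intact, which is presumably why the rule prefers it over a hard stop at the budget itself.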
The model was pre-trained for approximately twenty trillion tokens.

Alongside the model, we release our final pretraining data, as outlined in this section. For ease of analysis, there is a sample set that is ungated. For all remaining code, math and multilingual data, gating and approval is required, and the dataset is permissively licensed for model training purposes.

More details on the datasets and synthetic data generation methods can be found in the technical report NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model.

| Dataset | Collection Period |
| :---- | :---- |
| Problems in Elementary Mathematics for Home Study | 4/23/2025 |
| GSM8K | 4/23/2025 |
| PRM800K | 4/23/2025 |
| CC-NEWS | 4/23/2025 |
| Common Crawl | 4/23/2025 |
| Wikimedia | 4/23/2025 |
| Bespoke-Stratos-17k | 4/23/2025 |
| tigerbot-kaggle-leetcodesolutions-en-2k | 4/23/2025 |
| glaive-function-calling-v2 | 4/23/2025 |
| APIGen Function-Calling | 4/23/2025 |
| LMSYS-Chat-1M | 4/23/2025 |
| Open Textbook Library - CC BY-SA & GNU subset and OpenStax - CC BY-SA subset | 4/23/2025 |
| Advanced Reasoning Benchmark, tigerbot-kaggle-leetcodesolutions-en-2k, PRM800K, and SciBench | 4/23/2025 |
| FineWeb-2 | 4/23/2025 |
| Court Listener | Legacy Download |
| peS2o | Legacy Download |
| OpenWebMath | Legacy Download |
| BioRxiv | Legacy Download |
| PMC Open Access Subset | Legacy Download |
| OpenWebText2 | Legacy Download |
| Stack Exchange Data Dump | Legacy Download |
| PubMed Abstracts | Legacy Download |
| NIH ExPorter | Legacy Download |
| arXiv | Legacy Download |
| BigScience Workshop Datasets | Legacy Download |
| Reddit Dataset | Legacy Download |
| SEC's Electronic Data Gathering, Analysis, and Retrieval (EDGAR) | Legacy Download |
| Public Software Heritage S3 | Legacy Download |
| The Stack | Legacy Download |
| mC4 | Legacy Download |
| Advanced Mathematical Problem Solving | Legacy Download |
| MathPile | Legacy Download |
| NuminaMath CoT | Legacy Download |
| PMC Article | Legacy Download |
| FLAN | Legacy Download |
| Advanced Reasoning Benchmark | Legacy Download |
| SciBench | Legacy Download |
| WikiTableQuestions | Legacy Download |
| FinQA | Legacy Download |
| Riddles | Legacy Download |
| Problems in Elementary Mathematics for Home Study | Legacy Download |
| MedMCQA | Legacy Download |
| Cosmos QA | Legacy Download |
| MCTest | Legacy Download |
| AI2's Reasoning Challenge | Legacy Download |
| OpenBookQA | Legacy Download |
| MMLU Auxiliary Train | Legacy Download |
| social-chemestry-101 | Legacy Download |
| Moral Stories | Legacy Download |
| The Common Pile v0.1 | Legacy Download |
| FineMath | Legacy Download |
| MegaMath | Legacy Download |
| FastChat | 6/30/2025 |

Private Non-publicly Accessible Datasets of Third Parties

| Dataset |
| :---- |
| Global Regulation |
| Workbench |

The English Common Crawl data was downloaded from the Common Crawl Foundation (see their FAQ for details on their crawling) and includes the snapshots CC-MAIN-2013-20 through CC-MAIN-2025-13. The data was subsequently deduplicated and filtered in various ways described in the Nemotron-CC paper.

Additionally, we extracted data for fifteen languages from the following three Common Crawl snapshots: CC-MAIN-2024-51, CC-MAIN-2025-08, CC-MAIN-2025-18. The fifteen languages included were Arabic, Chinese, Danish, Dutch, French, German, Italian, Japanese, Korean, Polish, Portuguese, Russian, Spanish, Swedish, and Thai. As we did not have reliable multilingual model-based quality classifiers available, we applied just heuristic filtering instead, similar to what we did for lower quality English data in the Nemotron-CC pipeline, but selectively removing some filters for some languages that did not work well. Deduplication was done in the same way as for Nemotron-CC.

The GitHub Crawl was collected using the GitHub REST API and the Amazon S3 API.
Each crawl was operated in accordance with the rate limits set by its respective source, either GitHub or S3. We collect raw source code and subsequently remove any having a license which does not exist in our permissive-license set (for additional details, refer to the technical report).

| Dataset | Modality | Dataset Size (Tokens) | Collection Period |
| :---- | :---- | :---- | :---- |
| English Common Crawl | Text | 3.360T | 4/8/2025 |
| Multilingual Common Crawl | Text | 812.7B | 5/1/2025 |
| GitHub Crawl | Text | 747.4B | 4/29/2025 |

| Dataset | Modality | Dataset Size (Tokens) | Seed Dataset | Model(s) used for generation |
| :---- | :---- | :---- | :---- | :---- |
| Synthetic Art of Problem Solving from DeepSeek-R1 | Text | 25.5B | Art of Problem Solving; American Mathematics Competitions 8; American Mathematics Competitions 10; | DeepSeek-R1 |
| Synthetic Moral Stories and Social Chemistry from Mixtral-8x22B-v0.1 | Text | 327M | social-chemestry-101; Moral Stories | Mixtral-8x22B-v0.1 |
| Synthetic Social Sciences seeded with OpenStax from DeepSeek-V3, Mixtral-8x22B-v0.1, and Qwen2.5-72B | Text | 83.6M | OpenStax - CC BY-SA subset | DeepSeek-V3; Mixtral-8x22B-v0.1; Qwen2.5-72B |
| Synthetic Health Sciences seeded with OpenStax from DeepSeek-V3, Mixtral-8x22B-v0.1, and Qwen2.5-72B | Text | 9.7M | OpenStax - CC BY-SA subset | DeepSeek-V3; Mixtral-8x22B-v0.1; Qwen2.5-72B |
| Synthetic STEM seeded with OpenStax, Open Textbook Library, and GSM8K from DeepSeek-R1, DeepSeek-V3, DeepSeek-V3-0324, and Qwen2.5-72B | Text | 175M | OpenStax - CC BY-SA subset; GSM8K; Open Textbook Library - CC BY-SA & GNU subset | DeepSeek-R1, DeepSeek-V3; DeepSeek-V3-0324; Qwen2.5-72B |
| Nemotron-PrismMath | Text | 4.6B | Big-Math-RL-Verified; OpenR1-Math-220k | Qwen2.5-0.5B-instruct, Qwen2.5-72B-Instruct; DeepSeek-R1-Distill-Qwen-32B |
| Synthetic Question Answering Data from Papers and Permissible Books from Qwen2.5-72B-Instruct | Text | 350M | arXiv; National Institutes of Health ExPorter; BioRxiv; PMC Article; USPTO Backgrounds; peS2o; Global Regulation; CORE; PG-19; DOAB CC BY & CC BY-SA subset; NDLTD | Qwen2.5-72B-Instruct |
| Synthetic FineMath-4+ Reprocessed from DeepSeek-V3 | Text | 9.2B | Common Crawl | DeepSeek-V3 |
| Synthetic FineMath-3+ Reprocessed from phi-4 | Text | 27.6B | Common Crawl | phi-4 |
| Synthetic Union-3+ Reprocessed from phi-4 | Text | 93.1B | Common Crawl | phi-4 |
| Refreshed Nemotron-MIND from phi-4 | Text | 73B | Common Crawl | phi-4 |
| Synthetic Union-4+ Reprocessed from phi-4 | Text | 14.12B | Common Crawl | phi-4 |
| Synthetic Union-3+ minus 4+ Reprocessed from phi-4 | Text | 78.95B | Common Crawl | phi-4 |
| Synthetic Union-3 Refreshed from phi-4 | Text | 80.94B | Common Crawl | phi-4 |
| Synthetic Union-4+ Refreshed from phi-4 | Text | 52.32B | Common Crawl | phi-4 |
| Synthetic AGIEval seeded with AQUA-RAT, LogiQA, and AR-LSAT from DeepSeek-V3 and DeepSeek-V3-0324 | Text | 4.0B | AQUA-RAT; LogiQA; AR-LSAT | DeepSeek-V3; DeepSeek-V3-0324 |
| Synthetic AGIEval seeded with AQUA-RAT, LogiQA, and AR-LSAT from Qwen3-30B-A3B | Text | 4.2B | AQUA-RAT; LogiQA; AR-LSAT | Qwen3-30B-A3B |
| Synthetic Art of Problem Solving from Qwen2.5-32B-Instruct, Qwen2.5-Math-72B, Qwen2.5-Math-7B, and Qwen2.5-72B-Instruct | Text | 83.1B | Art of Problem Solving; American Mathematics Competitions 8; American Mathematics Competitions 10; GSM8K; PRM800K | Qwen2.5-32B-Instruct; Qwen2.5-Math-72B; Qwen2.5-Math-7B; Qwen2.5-72B-Instruct |
| Synthetic MMLU Auxiliary Train from DeepSeek-R1 | Text | 0.5B | MMLU Auxiliary Train | DeepSeek-R1 |
| Synthetic Long Context Continued Post-Training Data from Papers and Permissible Books from Qwen2.5-72B-Instruct | Text | 5.4B | arXiv; National Institutes of Health ExPorter; BioRxiv; PMC Article; USPTO Backgrounds; peS2o; Global Regulation; CORE; PG-19; DOAB CC BY & CC BY-SA subset; NDLTD | Qwen2.5-72B-Instruct |
| Synthetic Common Crawl from Qwen3-30B-A3B and Mistral-Nemo-12B-Instruct | Text 
| 1.949T | Common Crawl | Qwen3-30B-A3B; Mistral-NeMo-12B-Instruct | | Synthetic Multilingual Data from Common Crawl from Qwen3-30B-A3B | Text | 997.3B | Common Crawl | Qwen3-30B-A3B | | Synthetic Multilingual Data from Wikimedia from Qwen3-30B-A3B | Text | 55.1B | Wikimedia | Qwen3-30B-A3B | | Synthetic OpenMathReasoning from DeepSeek-R1-0528 | Text | 1.5M | OpenMathReasoning | DeepSeek-R1-0528 | | Synthetic OpenCodeReasoning from DeepSeek-R1-0528 | Text | 1.1M | OpenCodeReasoning | DeepSeek-R1-0528 | | Synthetic Science Data from DeepSeek-R1-0528 | Text | 1.5M | \- | DeepSeek-R1-0528 | | Synthetic Humanity's Last Exam from DeepSeek-R1-0528 | Text | 460K | Humanity's Last Exam | DeepSeek-R1-0528 | | Synthetic ToolBench from Qwen3-235B-A22B | Text | 400K | ToolBench | Qwen3-235B-A22B | | Synthetic Nemotron Content Safety Dataset V2, eval-safety, Gretel Synthetic Safety Alignment, and RedTeam\2K from DeepSeek-R1-0528 | Text | 52K | Nemotron Content Safety Dataset V2; eval-safety; Gretel Synthetic Safety Alignment; RedTeam\2K | DeepSeek-R1-0528 | | Synthetic HelpSteer from Qwen3-235B-A22B | Text | 120K | HelpSteer3; HelpSteer2 | Qwen3-235B-A22B | | Synthetic Alignment data from Mixtral-8x22B-Instruct-v0.1, Mixtral-8x7B-Instruct-v0.1, and Nemotron-4 Family | Text | 400K | HelpSteer2; C4; LMSYS-Chat-1M; ShareGPT52K; tigerbot-kaggle-leetcodesolutions-en-2k; GSM8K; PRM800K; lm\identity (NVIDIA internal); FinQA; WikiTableQuestions; Riddles; ChatQA nvolve-multiturn (NVIDIA internal); glaive-function-calling-v2; SciBench; OpenBookQA; Advanced Reasoning Benchmark; Public Software Heritage S3; Khan Academy Math Keywords | Nemotron-4-15B-Base (NVIDIA internal); Nemotron-4-15B-Instruct (NVIDIA internal); Nemotron-4-340B-Base; Nemotron-4-340B-Instruct; Nemotron-4-340B-Reward; Mixtral-8x7B-Instruct-v0.1; Mixtral-8x22B-Instruct-v0.1 | | Synthetic LMSYS-Chat-1M from Qwen3-235B-A22B | Text | 1M | LMSYS-Chat-1M | Qwen3-235B-A22B | | Synthetic Multilingual Reasoning data from 
DeepSeek-R1-0528, Qwen2.5-32B-Instruct-AWQ, and Qwen2.5-14B-Instruct | Text | 25M | OpenMathReasoning; OpenCodeReasoning | DeepSeek-R1-0528; Qwen2.5-32B-Instruct-AWQ (translation); Qwen2.5-14B-Instruct (translation); | | Synthetic Multilingual Reasoning data from Qwen3-235B-A22B and Gemma 3 Post-Trained models | Text | 5M | WildChat | Qwen3-235B-A22B; Gemma 3 PT 12B; Gemma 3 PT 27B | Data Collection Method by dataset: Hybrid: Human, Synthetic Labeling Method by dataset: Hybrid: Automated, Human, Synthetic - ## Test Hardware NVIDIA A10G 24GB, H100 80GB, Jetson AGX Thor NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our Trustworthy AI terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. For more detailed information on ethical considerations for this model, please see the Model Card++ Bias, Explainability, Safety & Security, and Privacy Subcards. Please report security vulnerabilities or NVIDIA AI Concerns here.

17,543
6

bigvgan_v2_24khz_100band_256x

license:mit
17,382
17

Llama-3.1-8B-Instruct-FP4

Description: The NVIDIA Llama 3.1 8B Instruct FP4 model is the quantized version of Meta's Llama 3.1 8B Instruct model, which is an auto-regressive language model. For more information, please check here. This model is ready for commercial and non-commercial use.

Third-Party Community Consideration: This model is not owned or developed by NVIDIA. It has been developed and built to a third party's requirements for this application and use case; see link to Non-NVIDIA (Llama-3.1-8B-Instruct) Model Card.

License/Terms of Use: GOVERNING TERMS: Use of this model is governed by the NVIDIA Open Model License. ADDITIONAL INFORMATION: Llama3 Community License Agreement. Built with Llama.

Deployment Geography: Global, except in European Union

Use Case: Developers looking to take off-the-shelf pre-quantized models for deployment in AI agent systems, chatbots, RAG systems, and other AI-powered applications.

Release Date: Huggingface 09/15/2025 via https://huggingface.co/nvidia/Llama-3.1-8B-Instruct-FP4

Model Architecture:
- Architecture Type: Transformers
- Network Architecture: Llama3
- This model was developed based on Llama3.1-8B-Instruct
- Number of model parameters: 8.0 × 10^9

Input:
- Input Type(s): Text
- Input Format(s): String
- Input Parameters: 1D (One-Dimensional): Sequences
- Other Properties Related to Input: Context length up to 128,000

Output:
- Output Type(s): Text
- Output Format: String
- Output Parameters: 1D (One-Dimensional): Sequences
- Other Properties Related to Output: N/A

Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.

Software Integration:
- Supported Runtime Engine(s): TensorRT-LLM, vLLM, SGLang
- Supported Hardware Microarchitecture Compatibility: NVIDIA Blackwell

Model Version(s): The model is quantized with nvidia-modelopt v0.35.0

Post Training Quantization: This model was obtained by quantizing the weights and activations of Llama 3.1 8B Instruct to FP4 data type, ready for inference with TensorRT-LLM. Only the weights and activations of the linear operators within transformer blocks are quantized.

Calibration Dataset:
- Link: cnndailymail
- Data collection method: Automated
- Labeling method: Automated

Training Datasets:
- Data Collection Method by Dataset: Undisclosed
- Labeling Method by Dataset: Undisclosed
- Properties: Undisclosed

Testing Dataset:
- Data Collection Method by Dataset: Undisclosed
- Labeling Method by Dataset: Undisclosed
- Properties: Undisclosed

To serve the quantized checkpoint with TensorRT-LLM, follow the sample commands in the TensorRT-LLM GitHub repo.

Ethical Considerations: NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. Developers should perform safety testing and tuning tailored to their specific applications of the model. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns here.
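To make the post-training quantization step in this card more concrete, here is a minimal pure-Python sketch of block-scaled 4-bit floating-point rounding. It is an illustration only: the E2M1 magnitude grid, the block size of 16, and the function names `quantize_block` and `fake_quantize` are assumptions that this card does not state, and this is not the nvidia-modelopt implementation.

```python
# Illustrative sketch: block-scaled FP4 "fake quantization" in pure Python.
# Assumptions (not taken from the card): E2M1 magnitudes and 16-value blocks.
E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def quantize_block(block):
    """Return (scale, dequantized values) for one block of floats."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return 1.0, [0.0] * len(block)
    # Map the largest magnitude in the block onto the largest grid value.
    scale = amax / E2M1_GRID[-1]
    dequantized = []
    for x in block:
        # Pick the representable magnitude closest to the scaled input.
        mag = min(E2M1_GRID, key=lambda g: abs(abs(x) / scale - g))
        dequantized.append((mag if x >= 0 else -mag) * scale)
    return scale, dequantized


def fake_quantize(values, block_size=16):
    """Quantize and immediately dequantize a flat list, block by block."""
    result = []
    for start in range(0, len(values), block_size):
        _, deq = quantize_block(values[start:start + block_size])
        result.extend(deq)
    return result


if __name__ == "__main__":
    print(fake_quantize([-3.0, 0.7]))
```

Each block keeps one scale, so a single outlier only coarsens the values that share its block; that is the usual motivation for block scaling, though the actual block size and scale format used for this checkpoint may differ.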

llama
17,101
2

NVIDIA-Nemotron-Nano-12B-v2-VL-FP8

NVIDIA-Nemotron-Nano-VL-12B-V2-FP8 is the quantized version of the NVIDIA Nemotron Nano VL V2 model, which is an auto-regressive vision language model that uses an optimized transformer architecture. For more information, please check here. The NVIDIA Nemotron Nano VL FP4 QAD model is quantized with TensorRT Model Optimizer. This model was trained on commercial images for all three stages of training and supports single image inference. Your use of the model is governed by the NVIDIA Open License Agreement.

Use Cases: Image summarization, text-image analysis, Optical Character Recognition, interactive Q&A on images, text Chain-of-Thought reasoning.

Release Date:
- Build.Nvidia.com [October 28th, 2025] via nvidia/NVIDIA-Nemotron-Nano-VL-12B-V2
- Hugging Face [October 28th, 2025] via nvidia/NVIDIA-Nemotron-Nano-VL-12B-V2-BF16
- Hugging Face [October 28th, 2025] via nvidia/NVIDIA-Nemotron-Nano-VL-12B-V2-FP8
- Hugging Face [October 28th, 2025] via nvidia/NVIDIA-Nemotron-Nano-VL-12B-V2-NVFP4

Input Type(s): Image, Text
- Input Images
- Language Supported: German, Spanish, French, Italian, Korean, Portuguese, Russian, Japanese, Chinese, English

Input Format(s): Image (Red, Green, Blue (RGB)), and Text (String)
- Context length up to 128K
- Maximum Resolution: Determined by a 12-tile layout constraint, with each tile being 512 × 512 pixels. This supports aspect ratios such as:
  - 4 × 3 layout: up to 2048 × 1536 pixels
  - 3 × 4 layout: up to 1536 × 2048 pixels
  - 2 × 6 layout: up to 1024 × 3072 pixels
  - 6 × 2 layout: up to 3072 × 1024 pixels
  - Other configurations allowed, provided total tiles ≤ 12
- Channel Count: 3 channels (RGB)
- Alpha Channel: Not supported (no transparency)

Output Parameters: One-Dimensional (1D): Sequences up to 128K

Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.

Software Integration:
- Runtime Engine(s): vLLM
- Supported Hardware Microarchitecture Compatibility: H100 SXM 80GB
- Supported Operating System(s): Linux

```
@misc{nvidia2025nvidianemotronnanov2,
  title={NVIDIA Nemotron Nano V2 VL},
  author={NVIDIA},
  year={2025},
  eprint={2511.03929},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2511.03929},
}
```
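The 12-tile resolution constraint in this card is simple arithmetic, and the short sketch below reproduces the listed maximum resolutions from the tile size and tile budget. The function names are hypothetical, and reading "4 × 3 layout" as 4 columns by 3 rows is an assumption inferred from the pixel sizes the card gives.

```python
# Sketch of the tile-layout arithmetic; constants come from the card.
TILE = 512       # each tile is 512 x 512 pixels
MAX_TILES = 12   # at most 12 tiles per image


def max_resolution(cols, rows):
    """Return (width, height) in pixels for a cols x rows tile layout."""
    if cols < 1 or rows < 1 or cols * rows > MAX_TILES:
        raise ValueError(f"layout {cols}x{rows} exceeds {MAX_TILES} tiles")
    return cols * TILE, rows * TILE


def allowed_layouts():
    """Enumerate every (cols, rows) layout that fits the tile budget."""
    return [
        (c, r)
        for c in range(1, MAX_TILES + 1)
        for r in range(1, MAX_TILES + 1)
        if c * r <= MAX_TILES
    ]


if __name__ == "__main__":
    for layout in [(4, 3), (3, 4), (2, 6), (6, 2)]:
        print(layout, max_resolution(*layout))
```

Whether the model's preprocessing actually picks among all of these layouts, or only a subset, is not stated in the card.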

17,065
38

low-frame-rate-speech-codec-22khz

16,368
19

llama-nemoretriever-colembed-3b-v1

The nvidia/llama-nemoretriever-colembed-3b-v1 is a late interaction embedding model fine-tuned for query-document retrieval. Users can input `queries`, which are text, or `documents`, which are page images, to the model. The model outputs ColBERT-style multi-vector numerical representations for input queries and documents. It achieved 1st place on ViDoRe V1 (nDCG@5), ViDoRe V2 (nDCG@5) and MTEB VisualDocumentRetrieval (Rank Borda) (as of 27th June, 2025).

For more technical details, please refer to our technical report: Llama Nemoretriever Colembed: Top-Performing Text-Image Retrieval Model

This model is for non-commercial/research use only.

License/Terms of Use: Governing Terms for llama-nemoretriever-colembed-3b-v1 model: NVIDIA Non-Commercial License. Additional Information: Apache License 2.0 for siglip2-giant-opt-patch16-384; and LLAMA 3.2 Community License Agreement for Llama-3.2-1B. Built with Meta Llama 3. Improved using Qwen.

This project will download and install additional third-party open source software projects. Review the license terms of these open source projects before use.

Team
- Mengyao Xu
- Gabriel Moreira
- Radek Osmulski
- Ronay Ak
- Yauhen Babakhin
- Even Oldridge
- Benedikt Schifferer

Correspondence to Mengyao Xu and Benedikt Schifferer.

| Model Name | Use-Case | Comment |
|-------------------------------------------|------------------------|---------------------------------------------------------------------------------|
| nvidia/llama-NemoRetriever-ColEmbed-1B-v1 | Research-Only | Smaller Version of nvidia/llama-NemoRetriever-ColEmbed-3B-v1 |
| nvidia/llama-NemoRetriever-ColEmbed-3B-v1 | Research-Only | #1 ViDoRe V1, V2 and MTEB VisualDocumentRetrieval as of June 27, 2025 |
| llama-32-nemoretriever-1b-vlm-embed-v1 | Commercial Application | MultiModal Embedding Model for Production Use-Case of Visual Document Retrieval |
| llama-32-nv-embedqa-1b-v2 | Commercial Application | Text Embedding Model for Production Use-Case of Text Document Retrieval |
| llama-32-nemoretriever-500m-rerank-v2 | Commercial Application | Text Reranker Model for Production Use-Case of Text Document Retrieval |
| llama-32-nv-rerankqa-1b-v2 | Commercial Application | Text Reranker Model for Production Use-Case of Text Document Retrieval |
| nvidia/NV-Embed-v2 | Research-Only | #1 MTEB as of Aug 30, 2024 |
| nvidia/MM-Embed | Research-Only | Improved nvidia/NV-Embed-v1 and multimodal embeddings |
| nvidia/NV-Retriever-v1 | Research-Only | #1 MTEB BEIR as of July 12th, 2024 |

Use Case: llama-nemoretriever-colembed is intended for researchers exploring applications that must understand or retrieve information across both text and image modalities. It is instrumental in multimodal RAG systems, where queries are in text format and documents are images, such as pages, text, charts, tables or infographics. Potential applications include multimedia search engines, cross-modal retrieval systems, and conversational AI with rich input understanding.
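Because the model emits one vector per token rather than a single pooled vector, relevance is computed by comparing token vectors across the query and the document. The sketch below shows the MaxSim scoring commonly used with ColBERT-style outputs; it assumes this is how the multi-vector outputs are meant to be compared, uses toy 2-dimensional vectors, and the function names are hypothetical.

```python
# Sketch of ColBERT-style late-interaction (MaxSim) scoring in pure Python.
# Assumption: the model's multi-vector outputs are scored this way.

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_vecs, doc_vecs):
    """Sum, over query tokens, of the best-matching document token score."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


def rank_documents(query_vecs, docs):
    """Return document names ordered from most to least relevant."""
    scores = {name: maxsim_score(query_vecs, vecs) for name, vecs in docs.items()}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    query = [[1.0, 0.0], [0.0, 1.0]]          # two query-token embeddings
    docs = {
        "page_a": [[1.0, 0.0], [0.0, 1.0]],   # matches both query tokens
        "page_b": [[0.5, 0.5]],               # partial match only
    }
    print(rank_documents(query, docs))
```

In practice the vectors would come from the model and the comparison would run as batched tensor operations; the nested loops here only spell out the definition.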
Release Date: Huggingface on 06/27/2025 via https://huggingface.co/nvidia/llama-nemoretriever-colembed-3b-v1

- Architecture Type: Transformer
- Network Architecture: google/siglip2-giant-opt-patch16-384 + meta-llama/Llama-3.2-3B

The llama-nemoretriever-colembed-3b-v1 is a transformer-based multimodal embedding model built on top of a VLM based on google/siglip2-giant-opt-patch16-384 and meta-llama/Llama-3.2-3B.

| Property | Query | Document |
|----------|-------|----------|
| Input Type | Text | Text \| Image |
| Input Format | List of strings | List of strings \| List of Images |
| Input Parameter | 1D | 1D |
| Other Properties | The model's maximum context length is 8192 tokens. Texts longer than maximum length must either be chunked or truncated. | The model's maximum context length is 8192 tokens. Texts longer than maximum length must either be chunked or truncated. Images must be python PIL format. The model will scale the image into multiple tiles of 512x512. |

- Output Type: Floats
- Output Format: List of float arrays
- Output Parameters: The list of floats equivalent to [batch_size x seq_length x embedding_dim]
- Other Properties Related to Output: Model outputs embedding vectors of dimension for each input token.

Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.

The model requires transformers version 4.49.0 and flash attention.

The HuggingFace model artifact contains a script to evaluate ViDoRe V1 and ViDoRe V2 based on the GitHub repository.

The HuggingFace model artifact contains a script to evaluate MTEB VisualDocumentRetrieval. We first install the ViDoRe benchmark to capture dependencies.
- Runtime Engine(s): TensorRT, Triton
- Supported Hardware Microarchitecture Compatibility: A100 40GB, A100 80GB, H100 80GB
- Supported Operating System(s): Linux

Model Version(s): llama-NemoRetriever-colembed-3b-v1

- The total size (in number of data points): 12.74M qa pairs for training
- Total number of datasets: 23 datasets used for training and 17 datasets used for evaluation.

Training Dataset: The model was trained on publicly available datasets, including HotpotQA, MIRACL, Natural Questions (NQ), Stack Exchange, SQuAD, Tiger Math/Stack, DocMatix-IR, VDR, Vidore-ColPali-Training, VisRAG-Ret-Train-Synthetic-data, VisRAG-Ret-Train-In-domain-data, and Wiki-SS-NQ.
- Data Collection Method by dataset: Hybrid: Automated, Human, Synthetic
- Labeling Method by dataset: Hybrid: Automated, Human, Synthetic
- Properties: Training: 1st Stage: 12M qa pairs, 2nd Stage: 500k qa pairs, 3rd Stage: 240k qa pairs

Evaluation Dataset: We evaluate the model on multiple benchmarks for Visual Document Retrieval: ViDoRe V1, ViDoRe V2 and MTEB Visual Document Retrieval.
- Data Collection Method by dataset: Hybrid: Automated, Human, Synthetic
- Labeling Method by dataset: Hybrid: Automated, Human, Synthetic
- Properties: More details on ViDoRe V1 and ViDoRe V2 can be found on their leaderboard. Visual Document Retrieval Benchmark, ViDoRe, is composed of various page-level retrieving tasks spanning multiple domains, languages, and settings.

| Benchmark | Model 1B | Model 3B |
|--------------------------------|--------------|--------------|
| ViDoRe V1 (06/27/2025) | 0.9050 | 0.9100 |
| ViDoRe V1 (deprecated) | 0.9049 | 0.9098 |
| ViDoRe V2 (06/27/2025) | 0.6209 | 0.6352 |
| ViDoRe V2 (deprecated) | 0.6261 | 0.6342 |
| MTEB Visual Document Retrieval | 0.8238 | 0.8315 |

Note: All scores are Avg. NDCG@5. ViDoRe V1 and V2 were updated on June 27th 2025 to use the calculated scores from MTEB, which can result in slightly different scores. The ViDoRe V2 (06/27/2025) uses only 4 of the original 7 datasets.

Ethical Considerations: NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please report security vulnerabilities or NVIDIA AI Concerns here.
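The benchmark scores in this card are averages of NDCG@5. As a reminder of what that metric measures, here is a minimal sketch for a single query. It assumes linear gain and that the ranked list already contains every relevant document (so the ideal ordering is just the same relevances sorted); the benchmark's own implementation may differ on both points, and the function names are hypothetical.

```python
import math

# Sketch of NDCG@k for one query, with linear gain (an assumption).

def dcg_at_k(relevances, k=5):
    """Discounted cumulative gain over the top-k ranked relevance labels."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k=5):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # Relevant page retrieved at rank 2 instead of rank 1.
    print(round(ndcg_at_k([0, 1, 0, 0, 0]), 4))
```

A benchmark-level score is then the mean of this value over all queries in a dataset, averaged again across datasets.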

llama_nemoretrievercolembed
16,259
60

C-RADIO

15,835
22

stt_en_conformer_ctc_large

license:cc-by-4.0
15,206
28

Nemotron-H-8B-Base-8K

NVIDIA Nemotron-H-8B-Base-8K is a large language model (LLM) developed by NVIDIA that is designed as a completion model for a given piece of text. It uses a hybrid model architecture that consists primarily of Mamba-2 and MLP layers combined with just four Attention layers. The model features a context length of 8K. The supported languages include: English, German, Spanish, French, Italian, Korean, Portuguese, Russian, Japanese, and Chinese. For more detailed information on the model architecture, training, and evaluation, please see the project page and the technical report.

For best performance on a given task, users are encouraged to customize the model using the NeMo Framework suite of customization tools including Parameter-Efficient Fine-Tuning (P-tuning, Adapters, LoRA, and more), and Model Alignment (SFT, SteerLM, RLHF, and more) using NeMo-Aligner.

This model is part of the Nemotron-H Collection. You can find the models in this family here:
- Nemotron-H-56B-Base-8K
- Nemotron-H-47B-Base-8K
- Nemotron-H-8B-Base-8K

GOVERNING TERMS: Use of this model is governed by the NVIDIA Internal Scientific Research and Development Model License.

The pretraining data has a cutoff date of September 2024.

This model is intended for developers and researchers building LLMs.

- [\[2504.03624\] Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models](https://arxiv.org/abs/2504.03624)

Model Architecture
- Architecture Type: Hybrid Mamba-Transformer
- Network Architecture: Nemotron-H

Input
- Input Type(s): Text
- Input Format(s): String
- Input Parameters: One-Dimensional (1D): Sequences
- Other Properties Related to Input: Context length up to 8K. Supported languages include German, Spanish, French, Italian, Korean, Portuguese, Russian, Japanese, Chinese and English.

Output
- Output Type(s): Text
- Output Format: String
- Output Parameters: One-Dimensional (1D): Sequences

Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.

Software Integration
- Runtime Engine(s): NeMo 24.12
- Supported Hardware Microarchitecture Compatibility: NVIDIA H100-80GB, NVIDIA A100
- Operating System(s): Linux

As this is a base model, no explicit prompt format is recommended or required.

The training corpus for Nemotron-H-8B-Base-8K consists of English and multilingual text (German, Spanish, French, Italian, Korean, Portuguese, Russian, Japanese, Chinese and English), as well as code. Our sources cover a variety of document types such as: webpages, dialogue, articles, and other written materials. This model was also improved using synthetic data from Qwen (Built with Qwen). The corpus spans domains including legal, math, science, finance, and more. We also include a small portion of question-answering and alignment-style data to improve model accuracy.

Data Collection for Training & Testing Datasets: Hybrid: Automated, Human, Synthetic

Data Labeling for Training & Testing Datasets: Hybrid: Automated, Human, Synthetic

We used the datasets listed in the next section to evaluate Nemotron-H-8B-Base-8K.

Data Collection for Evaluation Datasets: Hybrid: Human, Synthetic

Data Labeling for Evaluation Datasets: Hybrid: Human, Synthetic, Automatic

| ARC Challenge 25-shot | Hellaswag 10-shot | Winogrande 5-shot | CommonsenseQA 7-shot |
|-------------|--------------|-----------------|------------------|
| 88.74 | 83.23 | 80.51 | 78.71 |

- ARC (Ai2 reasoning challenge)-Challenge - The challenge set of questions from a benchmark that contains grade-school level, multiple-choice science questions to assess question answering ability of language models.
- Hellaswag - Tests the ability of a language model to correctly finish the provided context from a choice of possible options.
- Winogrande - Tests the ability to choose the right option for a given sentence which requires commonsense reasoning.
- CommonsenseQA - A multiple-choice question answering dataset that requires different types of commonsense knowledge to predict the correct answers.

| MBPP (sanitized) 3-shot | MBPP+ 0-shot | HumanEval 0-shot | HumanEval+ 0-shot |
|-------------|--------------|-----------------|------------------|
| 65.37 | 59.52 | 58.54 | 55.49 |

- MBPP (Mostly Basic Python Programming Problems) - Evaluates ability to generate solutions for Python programming tasks.
- MBPP+ - Extended version of MBPP with additional validation.
- HumanEval - Tests code generation and completion abilities in Python.

| GSM8K 8-shot CoT | MATH 4-shot CoT | MATH-Lvl 5 4-shot CoT | MATH-500 4-shot CoT |
|--------------|------------|------------|------------|
| 87.11 | 46.52 | 22.93 | 44.43 |

- GSM8K (Grade School Math 8K) - Evaluates grade school level mathematical word problem solving.
- MATH - Tests mathematical ability across multiple difficulty levels and various subjects including: Prealgebra, Algebra, Number Theory, Counting and Probability, Geometry, Intermediate Algebra, and Precalculus.
- MATH Lvl 5 - Only the most difficult questions from the MATH dataset.
- MATH-500 - Tests advanced mathematical problem solving across algebra, geometry, and calculus.

| MMLU-Pro 5-shot CoT | MMLU 5-shot |
|-------------------|------------------|
| 44.01 | 72.77 |

- MMLU Pro - Evaluates language understanding models across a broad range of challenging, reasoning-focused questions across 14 diverse domains.
- MMLU - Tests knowledge across 57 subjects including science, humanities, math and more.

The model was trained on data that contains toxic language and societal biases originally crawled from the internet. Therefore, the model may amplify those biases and return toxic responses, especially when prompted with toxic prompts. The model may generate answers that are inaccurate, omit key information, or include irrelevant or redundant text, producing socially unacceptable or undesirable text even if the prompt itself does not include anything explicitly offensive.

The model demonstrates weakness to indirect prompt injection via some encodings, including Base16, Hex/ASCII, and Braille, though it is more resilient than other similar models to injections using the more common Base64 vector.

Inference
- Engine: NeMo
- Test Hardware: NVIDIA H100-80GB

Ethical Considerations: NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. For more detailed information on ethical considerations for this model, please see the Responsible Use Guide available at http://nvidia.com/nemotron-responsible-use. Please report security vulnerabilities or NVIDIA AI Concerns here.

14,955
52

canary-qwen-2.5b

Description: NVIDIA NeMo Canary-Qwen-2.5B is an English speech recognition model that achieves state-of-the-art performance on multiple English speech benchmarks. With 2.5 billion parameters and running at 418 RTFx, Canary-Qwen-2.5B supports automatic speech-to-text recognition (ASR) in English with punctuation and capitalization (PnC). The model works in two modes: as a transcription tool (ASR mode) and as an LLM (LLM mode). In ASR mode, the model is only capable of transcribing the speech into text, but does not retain any LLM-specific skills such as reasoning. In LLM mode, the model retains all of the original LLM capabilities, which can be used to post-process the transcript, e.g. summarize it or answer questions about it. In LLM mode, the model does not "understand" the raw audio anymore, only its transcript. This model is ready for commercial use.

License/Terms of Use: Canary-Qwen-2.5B is released under the CC-BY-4.0 license. By using this model, you are agreeing to the terms and conditions of the license.

References:
- [1] Less is More: Accurate Speech Recognition & Translation without Web-Scale Data
- [2] Fast Conformer with Linearly Scalable Attention for Efficient Speech Recognition
- [5] Training and Inference Efficiency of Encoder-Decoder Speech Models
- [7] Granary: Speech Recognition and Translation Dataset in 25 European Languages
- [8] Towards Measuring Fairness in AI: the Casual Conversations Dataset
- [9] SALM: Speech-augmented Language Model with In-context Learning for Speech Recognition and Translation

The model is intended for users requiring speech-to-text transcription capabilities for English speech, and/or transcript post-processing capabilities enabled by prompting the underlying LLMs. Typical use-cases: transcription, summarization, answering user questions about the transcript.

Release Date: Huggingface 07/17/2025 via https://huggingface.co/nvidia/canary-qwen-2.5b

Model Architecture: Canary-Qwen is a Speech-Augmented Language Model (SALM) [9] with a FastConformer [2] Encoder and Transformer Decoder [3]. It is built using two base models: `nvidia/canary-1b-flash` [1,5] and `Qwen/Qwen3-1.7B` [4], a linear projection, and low-rank adaptation (LoRA) applied to the LLM. The audio encoder computes an audio representation that is mapped to the LLM embedding space via a linear projection, and concatenated with the embeddings of text tokens. The model is prompted with "Transcribe the following: ", using Qwen's chat template.

- Input length. The maximum audio duration in training was 40s, and the maximum token sequence length was 1024 tokens (including prompt, audio, and response). The model may technically be able to process longer sequences, but its accuracy may be degraded.
- Exclusively ASR oriented capabilities. The model is not expected to preserve any of the underlying LLM's capabilities into speech modality.
- English-only language support. The model was trained using English data only. It may be able to spuriously transcribe other languages as the underlying encoder was pretrained using German, French, and Spanish speech in addition to English, but it's unlikely to be reliable as a multilingual model.

To train, fine-tune or transcribe with Canary-Qwen-2.5B, you will need to install NVIDIA NeMo. The model is available for use in the NVIDIA NeMo toolkit [6], and can be used as a pre-trained checkpoint for inference or for fine-tuning on another dataset.

Input:
- Input Type(s): Audio, text prompt
- Input Format(s): Audio: .wav or .flac files. Text prompt string for ASR mode: `Transcribe the following: `
- Input Parameter(s): Audio: Two-Dimensional (batch, audio-samples); Text: One-Dimensional (string)
- Other Properties Related to Input: 16000 Hz Mono-channel Audio, Pre-Processing Not Needed

Input to Canary-Qwen-2.5B is a batch of prompts that include audio.
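The input-length limit can be turned into a rough token budget. The sketch below combines the 40 s maximum audio duration and 1024-token sequence limit with the encoder rate of 12.5 tokens per second (one frame per 80 ms) that this card gives in its training details. It ignores padding and framing edge effects, and the function names are hypothetical.

```python
import math

# Rough token budget for one Canary-Qwen training-length example.
TOKENS_PER_SECOND = 12.5   # one encoder frame per 80 ms, from the card
MAX_SEQUENCE_TOKENS = 1024 # prompt + audio + response, from the card


def audio_tokens(duration_s):
    """Approximate number of encoder output frames for an audio clip."""
    return math.ceil(duration_s * TOKENS_PER_SECOND)


def text_budget(duration_s):
    """Tokens left for prompt, chat template and response."""
    return MAX_SEQUENCE_TOKENS - audio_tokens(duration_s)


if __name__ == "__main__":
    print(audio_tokens(40.0), text_budget(40.0))
```

At the 40 s training maximum, roughly half of the 1024-token window is taken by audio frames, which helps explain why accuracy may degrade on longer inputs.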
To transcribe a dataset of recordings, specify the input as a jsonl manifest file, where each line in the file is a dictionary containing the following fields:

Output:
- Output Type(s): Text
- Output Format: Text transcript as a sequence of token IDs or a string
- Output Parameters: One-Dimensional text string
- Other Properties Related to Output: May Need Inverse Text Normalization

Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.

Software Integration:
- Runtime Engine(s): NeMo - 2.5.0 or higher
- Supported Hardware Microarchitecture Compatibility: [NVIDIA Ampere] [NVIDIA Blackwell] [NVIDIA Jetson] [NVIDIA Hopper] [NVIDIA Lovelace] [NVIDIA Pascal] [NVIDIA Turing] [NVIDIA Volta]
- [Preferred/Supported] Operating System(s): [Linux] [Linux 4 Tegra] [Windows]

Canary-Qwen-2.5B was trained using the NVIDIA NeMo toolkit [6] for a total of 90k steps on 32 NVIDIA A100 80GB GPUs. LLM parameters were kept frozen. Speech encoder, projection, and LoRA parameters were trainable. The encoder emits one output frame every 80 ms, or 12.5 tokens per second. The model was trained on approximately 1.3B tokens in total (this number includes the speech encoder output frames, text response tokens, prompt tokens, and chat template tokens).

The model can be trained using this example script and base config.

The tokenizer was inherited from `Qwen/Qwen3-1.7B`.

- The total size (in number of data points): approx. 40 million (speech, text) pairs
- Total number of datasets: 26, with 18 for training and 8 for test
- Dataset partition: Training 99.6%, testing 0.04%, validation 0%
- Time period for training data collection: 1990-2025
- Time period for testing data collection: 2005-2022
- Time period for validation data collection: N/A (unused)

The Canary-Qwen-2.5B model is trained on a total of 234K hrs of publicly available speech data. The datasets below include conversations, videos from the web and audiobook recordings.

The majority of the training data comes from the English portion of the Granary dataset [7]:
- YouTube-Commons (YTC) (109.5k hours)
- YODAS2 (77k hours)
- LibriLight (13.6k hours)

In addition, the following datasets were used:
- Librispeech 960 hours
- Fisher Corpus
- Switchboard-1 Dataset
- WSJ-0 and WSJ-1
- National Speech Corpus (Part 1, Part 6)
- VCTK
- VoxPopuli (EN)
- Europarl-ASR (EN)
- Multilingual Librispeech (MLS EN)
- Mozilla Common Voice (v11.0)
- Mozilla Common Voice (v7.0)
- Mozilla Common Voice (v4.0)
- AMI
- FLEURS

AMI was oversampled during model training to constitute about 15% of the total data observed. This skewed the model towards predicting verbatim transcripts that include conversational speech disfluencies such as repetitions.

The training transcripts contained punctuation and capitalization.

Automatic Speech Recognition: HuggingFace OpenASR Leaderboard evaluation sets

The ASR predictions were generated using greedy decoding.

The ASR performance is measured with word error rate (WER), and we process the ground-truth and predicted text with whisper-normalizer version 0.1.12.
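For readers unfamiliar with the metric, the sketch below shows how a word error rate is computed after normalizing both texts. The `normalize` function here is a deliberately crude stand-in (lowercasing and stripping ASCII punctuation) and is an assumption for illustration; it is not whisper-normalizer and will not reproduce the reported numbers exactly.

```python
import string

# Sketch of WER: word-level edit distance divided by reference length.
_PUNCT = str.maketrans("", "", string.punctuation)


def normalize(text):
    """Crude stand-in normalizer: lowercase, drop punctuation, split words."""
    return text.lower().translate(_PUNCT).split()


def word_error_rate(reference, hypothesis):
    """(substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Rolling-row Levenshtein distance over words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("Hello, world!", "hello world"))
```

Normalization matters because the model emits punctuation and capitalization: without it, "Hello," versus "hello" would count as an error even though the words match.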
| Version | Model | RTFx | Mean | AMI | GigaSpeech | LS Clean | LS Other | Earnings22 | SPGISpeech | Tedlium | Voxpopuli |
|:---------:|:-----------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|
| 2.5.0 | Canary-Qwen-2.5B | 418 | 5.63 | 10.18 | 9.41 | 1.60 | 3.10 | 10.42 | 1.90 | 2.72 | 5.66 |

More details on evaluation can be found at the HuggingFace ASR Leaderboard.

Hallucination Robustness

Number of characters per minute on the MUSAN 48-hour eval set (`maxnewtokens=50`, following the `nvidia/canary-1b-flash` evaluation):

| Version | Model | # of characters per minute |
|:-----------:|:---------:|:----------:|
| 2.5.0 | Canary-Qwen-2.5B | 138.1 |

Noise Robustness

WER on Librispeech Test Clean at different SNR (signal-to-noise ratio) levels of additive white noise:

| Version | Model | SNR 10 | SNR 5 | SNR 0 | SNR -5 |
|:-----------:|:---------:|:----------:|:----------:|:----------:|:----------:|
| 2.5.0 | Canary-Qwen-2.5B | 2.41% | 4.08% | 9.83% | 30.60% |

As outlined in the paper "Towards Measuring Fairness in AI: the Casual Conversations Dataset" [8], we assessed the Canary-Qwen-2.5B model for fairness. The model was evaluated on the CasualConversations-v1 dataset with inference done on non-overlapping 40s chunks, and the results are reported as follows:

| Gender | Male | Female | N/A | Other |
| :--- | :--- | :--- | :--- | :--- |
| Num utterances | 18471 | 23378 | 880 | 18 |
| % WER | 16.71 | 13.85 | 17.71 | 29.46 |

| Age Group | (18-30) | (31-45) | (46-85) | (1-100) |
| :--- | :--- | :--- | :--- | :--- |
| Num utterances | 15058 | 13984 | 12810 | 41852 |
| % WER | 15.73 | 15.3 | 14.14 | 15.11 |

(Error rates for fairness evaluation are determined by normalizing both the reference and predicted text, similar to the methods used in the evaluations found at https://github.com/huggingface/openasrleaderboard.)
Ethical Considerations: NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. For more detailed information on ethical considerations for this model, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards. Please report security vulnerabilities or NVIDIA AI Concerns here.
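The card reports ASR quality as word error rate (WER) and states an encoder frame rate of 80 ms (12.5 tokens per second). As a rough illustration of both quantities, here is a minimal sketch in plain Python. It assumes whitespace tokenization and applies no text normalization, so it does not reproduce the whisper-normalizer pipeline used for the reported numbers; the function names are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix seen so far
    # (excluding the current word) and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def frames_per_second(frame_ms: float) -> float:
    """Convert a frame duration in milliseconds to frames per second."""
    return 1000.0 / frame_ms


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over 6 reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
    print(frames_per_second(80))  # the 80 ms frame rate quoted in the card
```

Leaderboard scores additionally depend on the normalizer, so values from this sketch are only comparable to each other, not to the table above.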

license:cc-by-4.0
13,325
291

omnivinci

OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM

Paper: https://arxiv.org/abs/2510.15870 | Code: https://github.com/NVlabs/OmniVinci | Model: https://huggingface.co/nvidia/omnivinci | Project page: https://nvlabs.github.io/OmniVinci

Introduction

OmniVinci is an NVIDIA research project focused on exploring omni-modal LLMs that can not only see and read but also listen, speak, and reason. OmniVinci is among the best-performing omni-modal understanding models. Check out our performance on some of the most popular omni-modality, audio, and vision benchmarks:

Below, we provide simple examples to show how to use our model with Transformers.

1. Download and navigate to the HuggingFace repository:
2. Install the Python environment (based on the NVILA codebase):

- For audio and image inference examples, please refer to `exampleminiaudio.py` and `exampleminiimage.py`.

License / Terms of Use

The model is released under the NVIDIA OneWay Noncommercial License.

Citation

Please consider citing our paper and this framework if they are helpful in your research.

—
13,017
149

Llama-3_3-Nemotron-Super-49B-v1_5-FP8

Llama-3.3-Nemotron-Super-49B-v1.5-FP8 is a significantly upgraded version of Llama-3.3-Nemotron-Super-49B-v1 and is a large language model (LLM) which is a derivative of Meta Llama-3.3-70B-Instruct (AKA the reference model). It is a reasoning model that is post-trained for reasoning, human chat preferences, and agentic tasks, such as RAG and tool calling. The model supports a context length of 128K tokens.

Llama-3.3-Nemotron-Super-49B-v1.5-FP8 is a model which offers a great tradeoff between model accuracy and efficiency. Efficiency (throughput) directly translates to savings. Using a novel Neural Architecture Search (NAS) approach, we greatly reduce the model's memory footprint, enabling larger workloads, as well as fitting the model on a single GPU at high workloads (H200). This NAS approach enables the selection of a desired point in the accuracy-efficiency tradeoff. For more information on the NAS approach, please refer to this paper.

The model underwent a multi-phase post-training process to enhance both its reasoning and non-reasoning capabilities. This includes a supervised fine-tuning stage for Math, Code, Science, and Tool Calling. Additionally, the model went through multiple stages of Reinforcement Learning (RL) including Reward-aware Preference Optimization (RPO) for chat, Reinforcement Learning with Verifiable Rewards (RLVR) for reasoning, and iterative Direct Preference Optimization (DPO) for Tool Calling capability enhancements. The final checkpoint was achieved after merging several RL and DPO checkpoints.

This model is part of the Llama Nemotron Collection. You can find the other model(s) in this family here:
- Llama-3.1-Nemotron-Nano-4B-v1.1
- Llama-3.1-Nemotron-Ultra-253B-v1

We want to hear from you! Share your ideas, vote on what matters, and help shape the future of Nemotron.

GOVERNING TERMS: Your use of this model is governed by the NVIDIA Open Model License. Additional Information: Llama 3.3 Community License Agreement. Built with Llama.
Model Dates: Trained between November 2024 and July 2025

Data Freshness: The pretraining data has a cutoff of 2023 per Meta Llama 3.3 70B

Use Case: Developers designing AI Agent systems, chatbots, RAG systems, and other AI-powered applications. Also suitable for typical instruction-following tasks.

Release Date:
- Hugging Face 7/25/2025 via Llama-33-Nemotron-Super-49B-v15
- build.nvidia.com 7/25/2025 Llama-33-Nemotron-Super-49B-v15

References:
- [\[2505.00949\] Llama-Nemotron: Efficient Reasoning Models](https://arxiv.org/abs/2505.00949)
- [\[2502.00203\] Reward-aware Preference Optimization: A Unified Mathematical Framework for Model Alignment](https://arxiv.org/abs/2502.00203)
- [\[2411.19146\] Puzzle: Distillation-Based NAS for Inference-Optimized LLMs](https://arxiv.org/abs/2411.19146)

Architecture Type: Dense decoder-only Transformer model

Network Architecture: Llama 3.3 70B Instruct, customized through Neural Architecture Search (NAS)

The model is a derivative of Meta's Llama-3.3-70B-Instruct, using Neural Architecture Search (NAS). The NAS algorithm results in non-standard and non-repetitive blocks. This includes the following:
- Skip attention: In some blocks, the attention is skipped entirely, or replaced with a single linear layer.
- Variable FFN: The expansion/compression ratio in the FFN layer is different between blocks.

We utilize a block-wise distillation of the reference model, where for each block we create multiple variants providing different tradeoffs of quality vs. computational complexity, discussed in more depth below. We then search over the blocks to create a model which meets the required throughput and memory (optimized for a single H100-80GB GPU) while minimizing the quality degradation. The model then undergoes knowledge distillation (KD), with a focus on English single and multi-turn chat use-cases. The KD step included 40 billion tokens consisting of a mixture of 3 datasets: FineWeb, Buzz-V1.2 and Dolma.
Llama-3.3-Nemotron-Super-49B-v1.5-FP8 is a general purpose reasoning and chat model intended to be used in English and coding languages. Other non-English languages (German, French, Italian, Portuguese, Hindi, Spanish, and Thai) are also supported.

Input
- Input Type: Text
- Input Format: String
- Input Parameters: One-Dimensional (1D)
- Other Properties Related to Input: Context length up to 131,072 tokens

Output
- Output Type: Text
- Output Format: String
- Output Parameters: One-Dimensional (1D)
- Other Properties Related to Output: Context length up to 131,072 tokens

Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g., GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.

Software Integration
- Runtime Engine: Transformers
- Recommended Hardware Microarchitecture Compatibility:
  - NVIDIA Ampere
  - NVIDIA Hopper
- Preferred Operating System(s): Linux

1. By default (empty system prompt) the model will respond in reasoning ON mode. Setting `/nothink` in the system prompt will enable reasoning OFF mode.
2. We recommend setting temperature to `0.6`, and Top P to `0.95` for Reasoning ON mode.
3. We recommend using greedy decoding for Reasoning OFF mode.

You can try this model out through the preview API, using this link: Llama-33-Nemotron-Super-49B-v15.

Running a vLLM Server with Tool-call Support

To enable tool calling usage with this model, we provide a tool parser in the repository. Here is an example of how to use it:

After launching a vLLM server, you can call the server with tool-call support using a Python script like below.

A large variety of training data was used for the knowledge distillation phase before the post-training pipeline, 3 of which included: FineWeb, Buzz-V1.2, and Dolma.
The data for the multi-stage post-training phases for improvements in Code, Math, and Reasoning is a compilation of SFT and RL data that supports improvements of math, code, general reasoning, and instruction following capabilities of the original Llama instruct model. Prompts were either sourced from public and open corpora or synthetically generated. Responses were synthetically generated by a variety of models, with some prompts containing responses for both reasoning on and off modes, to train the model to distinguish between the two modes. We have released our Nemotron-Post-Training-Dataset-v1 to promote openness and transparency in model development and improvement.

Data Collection for Training Datasets: Hybrid: Automated, Human, Synthetic

Data Labeling for Training Datasets: Hybrid: Automated, Human, Synthetic

We used the datasets listed below to evaluate Llama-3.3-Nemotron-Super-49B-v1.5-FP8.

Data Collection for Evaluation Datasets:
- Hybrid: Human, Synthetic

Data Labeling for Evaluation Datasets:
- Hybrid: Human, Synthetic, Automatic

Evaluation Results

We evaluate the model using temperature=`0.6`, top_p=`0.95`, and 64k sequence length. We run the benchmarks up to 16 times and average the scores to reduce run-to-run variance.

| Reasoning Mode | pass@1 (avg. over 4 runs) |
|--------------|------------|
| Reasoning On | 97.8 |

| Reasoning Mode | pass@1 (avg. over 16 runs) |
|--------------|------------|
| Reasoning On | 88.54 |

| Reasoning Mode | pass@1 (avg. over 16 runs) |
|--------------|------------|
| Reasoning On | 83.75 |

| Reasoning Mode | pass@1 (avg. over 4 runs) |
|--------------|------------|
| Reasoning On | 69.57 |

| Reasoning Mode | pass@1 (avg. over 4 runs) |
|--------------|------------|
| Reasoning On | 69.52 |

| Reasoning Mode | pass@1 (avg. over 2 runs) |
|--------------|------------|
| Reasoning On | 71.11 |

| Reasoning Mode | Strict:Instruction |
|--------------|------------|
| Reasoning On | 87.05 |

| Reasoning Mode | pass@1 (avg. over 1 run) |
|--------------|------------|
| Reasoning On | 88.7 |

All evaluations were done using the NeMo-Skills repository.

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. For more detailed information on ethical considerations for this model, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards. Please report security vulnerabilities or NVIDIA AI Concerns here.
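The usage recommendations in this card (reasoning ON by default, `/nothink` in the system prompt for reasoning OFF, temperature 0.6 and top-p 0.95 for ON, greedy decoding for OFF) can be captured in a small request builder. The sketch below is only illustrative: it assumes an OpenAI-style chat payload such as the one a vLLM server accepts, it approximates greedy decoding with `temperature=0.0`, and the model name and function name are hypothetical placeholders. The `/nothink` string is copied as it appears in the text above.

```python
def build_chat_request(user_prompt: str,
                       reasoning: bool = True,
                       model: str = "hypothetical-served-model-name") -> dict:
    """Build a chat-completions style payload following the card's recommendations."""
    messages = []
    if not reasoning:
        # Reasoning OFF: the card says to put this control string in the system prompt.
        messages.append({"role": "system", "content": "/nothink"})
    messages.append({"role": "user", "content": user_prompt})

    if reasoning:
        # Recommended sampling settings for reasoning ON mode.
        sampling = {"temperature": 0.6, "top_p": 0.95}
    else:
        # Assumption: temperature 0.0 is used here to request greedy decoding.
        sampling = {"temperature": 0.0}

    return {"model": model, "messages": messages, **sampling}


if __name__ == "__main__":
    print(build_chat_request("What is 17 * 23?"))
    print(build_chat_request("What is 17 * 23?", reasoning=False))
```

The resulting dictionary can be serialized to JSON and posted to whatever endpoint serves the model; the endpoint itself is outside the scope of this sketch.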

llama-3
12,516
20

Llama-3_3-Nemotron-Super-49B-v1

llama-3
12,459
320

Llama-3.1-Nemotron-Nano-8B-v1

llama
12,376
210

Nemotron-H-56B-Base-8K

—
12,294
32

NVIDIA-Nemotron-Nano-12B-v2-VL-BF16

Description: NVIDIA Nemotron Nano v2 12B VL model enables multi-image reasoning and video understanding, along with strong document intelligence, visual Q&A and summarization capabilities.

Governing Terms: Use of this model is governed by the NVIDIA Open Model License Agreement.

Use Case: Nemotron Nano 12B V2 VL is a model for multi-modal document intelligence. It would be used by individuals or businesses that need to process documents such as invoices, receipts, and manuals. The model is capable of handling multiple images of documents, up to four images at a resolution of 1k x 2k each, along with a long text prompt. The expected use is for tasks like summarization and Visual Question Answering (VQA). The model is also expected to have a significant advantage in throughput.

Release Date:
- Build.Nvidia.com [October 28th, 2025] via nvidia/NVIDIA-Nemotron-Nano-VL-12B-V2
- Hugging Face [October 28th, 2025] via nvidia/NVIDIA-Nemotron-Nano-VL-12B-V2-BF16
- Hugging Face [October 28th, 2025] via nvidia/NVIDIA-Nemotron-Nano-VL-12B-V2-FP8
- Hugging Face [October 28th, 2025] via nvidia/NVIDIA-Nemotron-Nano-VL-12B-V2-NVFP4

Model Architecture:
- Architecture Type: Transformer
- Network Architecture:
  - Vision Encoder: CRadioV2-H
  - Language Encoder: NVIDIA-Nemotron-Nano-12B-v2

Computational Load (For NVIDIA Models Only)
- Cumulative Compute: 2.2e+22
- Estimated Energy and Emissions for Model Training:
  - Energy Consumption: 7,827.46 kWh
  - Carbon Emissions: 3.21 tCO2e

Input:
- Input Type(s): Image, Video, Text
- Input Format: Image (png, jpg), Video (MP4, MKV, FLV, 3GP), Text (String)
- Input Parameters: Image (2D), Video (3D), Text (1D)
- Other Properties Related to Input:
  - Input Images Supported: 4
  - Language Supported: English only
  - Input + Output Token: 128K
  - Minimum Resolution: 32 × 32 pixels
  - Maximum Resolution: Determined by a 12-tile layout constraint, with each tile being 512 × 512 pixels. This supports aspect ratios such as:
    - 4 × 3 layout: up to 2048 × 1536 pixels
    - 3 × 4 layout: up to 1536 × 2048 pixels
    - 2 × 6 layout: up to 1024 × 3072 pixels
    - 6 × 2 layout: up to 3072 × 1024 pixels
    - Other configurations allowed, provided total tiles ≤ 12
  - Channel Count: 3 channels (RGB)
  - Alpha Channel: Not supported (no transparency)
  - Frames: 2 FPS with a minimum of 8 frames and a maximum of 128 frames

Output:
- Output Type(s): Text
- Output Format: String
- Output Parameters: 1D
- Other Properties Related to Output: Input + Output Token: 128K

Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g., GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.

Software Integration:
- Runtime Engine(s): vLLM, TRT-LLM
- Supported Hardware Microarchitecture Compatibility: NVIDIA L40S, NVIDIA A100, NVIDIA B200, NVIDIA H100/H200, NVIDIA RTX PRO 6000 Server Edition, NVIDIA GB200

The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.

```
@misc{nvidia2025nvidianemotronnanov2,
  title={NVIDIA Nemotron Nano V2 VL},
  author={NVIDIA},
  year={2025},
  eprint={2511.03929},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2511.03929},
}
```
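The input limits above (a 12-tile budget of 512-pixel tiles, a 32-pixel minimum side, and video sampled at 2 FPS with 8 to 128 frames) lend themselves to a quick pre-flight check. The following is a sketch under stated assumptions, not the model's actual preprocessing: it assumes tiles are counted by rounding each side up to a multiple of 512, and that the frame count is simply clamped to the stated range. All function names are hypothetical.

```python
import math

TILE = 512        # tile side in pixels, from the card
MAX_TILES = 12    # tile budget, from the card
MIN_SIDE = 32     # minimum resolution per side, from the card


def tiles_needed(width: int, height: int) -> tuple[int, int]:
    """Columns and rows of tiles needed to cover the image (assumed ceil tiling)."""
    return math.ceil(width / TILE), math.ceil(height / TILE)


def fits_tile_budget(width: int, height: int) -> bool:
    """True if the image meets the minimum size and needs at most MAX_TILES tiles."""
    if width < MIN_SIDE or height < MIN_SIDE:
        return False
    cols, rows = tiles_needed(width, height)
    return cols * rows <= MAX_TILES


def sampled_frames(duration_s: float, fps: float = 2.0,
                   min_frames: int = 8, max_frames: int = 128) -> int:
    """Frames sampled from a clip at `fps`, clamped to the card's stated range."""
    return max(min_frames, min(max_frames, int(duration_s * fps)))


if __name__ == "__main__":
    for size in [(2048, 1536), (3072, 1024), (2048, 2048)]:
        print(size, tiles_needed(*size), fits_tile_budget(*size))
    for seconds in [1, 30, 600]:
        print(seconds, "s ->", sampled_frames(seconds), "frames")
```

A real pipeline may resize oversized images into the budget instead of rejecting them; this check only flags inputs that exceed the documented limits as-is.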

—
11,691
48

segformer-b3-finetuned-ade-512-512

—
11,612
12

segformer-b4-finetuned-ade-512-512

—
11,399
2

stt_ru_fastconformer_hybrid_large_pc

license:cc-by-4.0
11,108
15

Minitron-8B-Base

Minitron-8B-Base is a large language model (LLM) obtained by pruning Nemotron-4 15B; specifically, we prune model embedding size, number of attention heads, and MLP intermediate dimension. Following pruning, we perform continued training with distillation using 94 billion tokens to arrive at the final model; we use the continuous pre-training data corpus used in Nemotron-4 15B for this purpose. Deriving the Minitron 8B and 4B models from the base 15B model using our approach requires up to 40x fewer training tokens per model compared to training from scratch; this results in compute cost savings of 1.8x for training the full model family (15B, 8B, and 4B). Minitron models exhibit up to a 16% improvement in MMLU scores compared to training from scratch, perform comparably to other community models such as Mistral 7B, Gemma 7B and Llama-3 8B, and outperform state-of-the-art compression techniques from the literature. Please refer to our arXiv paper for more details. Model Dates: Minitron-8B-Base was trained between February 2024 and June 2024. Minitron-8B-Base is released under the NVIDIA Open Model License Agreement. Minitron-8B-Base uses a model embedding size of 4096, 48 attention heads, and an MLP intermediate dimension of 16384. It also uses Grouped-Query Attention (GQA) and Rotary Position Embeddings (RoPE). Architecture Type: Transformer Decoder (auto-regressive language model) Support for this model will be added in the upcoming `transformers` release. In the meantime, please install the library from source: The following code provides an example of how to load the Minitron-8B model and use it to perform text generation. Properties: The training corpus for Minitron-8B-Base consists of English and multilingual text, as well as code. Our sources cover a variety of document types such as: webpages, dialogue, articles, and other written materials. The corpus spans domains including legal, math, science, finance, and more. 
In our continued training set, we introduce a small portion of question-answering and alignment-style data to improve model performance.

Data Freshness: The pretraining data has a cutoff of June 2023.

5-shot performance. Language Understanding evaluated using Massive Multitask Language Understanding:

Zero-shot performance. Evaluated using select datasets from the LM Evaluation Harness with additions:

| HellaSwag | Winogrande | GSM8K | ARC-C | XLSum |
| :------------- | :------------- | :------------- | :------------- | :------------- |
| 81.6 | 80.3 | 54.2 | 49.2 | 31.1 |

Code generation performance. Evaluated using HumanEval:

Please refer to our paper for the full set of results.

The model was trained on data that contains toxic language, unsafe content, and societal biases originally crawled from the internet. Therefore, the model may amplify those biases and return toxic responses, especially when prompted with toxic prompts. The model may generate answers that are inaccurate, omit key information, or include irrelevant or redundant text, producing socially unacceptable or undesirable text even if the prompt itself does not include anything explicitly offensive.

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please report security vulnerabilities or NVIDIA AI Concerns here.

If you find our work helpful, please consider citing our paper:
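The Minitron card describes continued training "with distillation" after pruning but does not spell out the loss on this page. As a generic illustration of logit distillation, the sketch below computes the forward KL divergence between temperature-scaled teacher and student distributions for a single token position. This is a common textbook formulation and an assumption here, not necessarily the objective used for Minitron; the function names are hypothetical.

```python
import math


def softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_kl(teacher_logits: list[float],
                    student_logits: list[float],
                    temperature: float = 1.0) -> float:
    """KL(teacher || student) for one token position; zero when the two agree."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


if __name__ == "__main__":
    teacher = [2.0, 1.0, 0.1]
    print(distillation_kl(teacher, teacher))          # identical distributions
    print(distillation_kl(teacher, [0.1, 1.0, 2.0]))  # student disagrees with teacher
```

In practice this quantity would be averaged over all token positions in a batch and minimized with respect to the student's parameters.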

llama-3
11,047
69

Cosmos-Embed1-336p

—
10,686
1

Llama3-ChatQA-1.5-8B

Model Details We introduce Llama3-ChatQA-1.5, which excels at conversational question answering (QA) and retrieval-augmented generation (RAG). Llama3-ChatQA-1.5 is developed using an improved training recipe from ChatQA paper, and it is built on top of Llama-3 base model. Specifically, we incorporate more conversational QA data to enhance its tabular and arithmetic calculation capability. Llama3-ChatQA-1.5 has two variants: Llama3-ChatQA-1.5-8B and Llama3-ChatQA-1.5-70B. Both models were originally trained using Megatron-LM, we converted the checkpoints to Hugging Face format. For more information about ChatQA, check the website! Other Resources Llama3-ChatQA-1.5-70B   Evaluation Data   Training Data   Retriever   Website   Paper Benchmark Results Results in ChatRAG Bench are as follows: | | ChatQA-1.0-7B | Command-R-Plus | Llama3-instruct-70b | GPT-4-0613 | GPT-4-Turbo | ChatQA-1.0-70B | ChatQA-1.5-8B | ChatQA-1.5-70B | | -- |:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:| | Doc2Dial | 37.88 | 33.51 | 37.88 | 34.16 | 35.35 | 38.90 | 39.33 | 41.26 | | QuAC | 29.69 | 34.16 | 36.96 | 40.29 | 40.10 | 41.82 | 39.73 | 38.82 | | QReCC | 46.97 | 49.77 | 51.34 | 52.01 | 51.46 | 48.05 | 49.03 | 51.40 | | CoQA | 76.61 | 69.71 | 76.98 | 77.42 | 77.73 | 78.57 | 76.46 | 78.44 | | DoQA | 41.57 | 40.67 | 41.24 | 43.39 | 41.60 | 51.94 | 49.60 | 50.67 | | ConvFinQA | 51.61 | 71.21 | 76.6 | 81.28 | 84.16 | 73.69 | 78.46 | 81.88 | | SQA | 61.87 | 74.07 | 69.61 | 79.21 | 79.98 | 69.14 | 73.28 | 83.82 | | TopioCQA | 45.45 | 53.77 | 49.72 | 45.09 | 48.32 | 50.98 | 49.96 | 55.63 | | HybriDial | 54.51 | 46.7 | 48.59 | 49.81 | 47.86 | 56.44 | 65.76 | 68.27 | | INSCIT | 30.96 | 35.76 | 36.23 | 36.34 | 33.75 | 31.90 | 30.10 | 32.31 | | Average (all) | 47.71 | 50.93 | 52.52 | 53.90 | 54.03 | 54.14 | 55.17 | 58.25 | | Average (exclude HybriDial) | 46.96 | 51.40 | 52.95 | 54.35 | 54.72 | 53.89 | 53.99 | 57.14 | Note that ChatQA-1.5 is built based on Llama-3 base model, and ChatQA-1.0 is built based on 
Llama-2 base model. ChatQA-1.5 models use the HybriDial training dataset. To ensure a fair comparison, we also compare average scores excluding HybriDial. The data and evaluation scripts for ChatRAG Bench can be found here.

Prompt Format

We highly recommend that you use the prompt format we provide, as follows:

when context is available

The content of the system's turn (i.e., {System}) for both scenarios is as follows:

This is a chat between a user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions based on the context. The assistant should also indicate when the answer cannot be found in the context.

Note that our ChatQA-1.5 models are optimized for the capability with context, e.g., over documents or retrieved context.

take the whole document as context

This can be applied to the scenario where the whole document can be fitted into the model, so that there is no need to run retrieval over the document.

run retrieval to get top-n chunks as context

This can be applied to the scenario when the document is very long, so that it is necessary to run retrieval. Here, we use our Dragon-multiturn retriever, which can handle conversational queries. In addition, we provide a few documents for users to play with.

Correspondence to Zihan Liu, Wei Ping

@article{liu2024chatqa, title={ChatQA: Surpassing GPT-4 on Conversational QA and RAG}, author={Liu, Zihan and Ping, Wei and Roy, Rajarshi and Xu, Peng and Lee, Chankyu and Shoeybi, Mohammad and Catanzaro, Bryan}, journal={arXiv preprint arXiv:2401.10225}, year={2024}}

License

The use of this model is governed by the META LLAMA 3 COMMUNITY LICENSE AGREEMENT
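The two average rows in the ChatRAG Bench table (all datasets, and all datasets except HybriDial) can be reproduced directly from the per-dataset scores. The short sketch below does this for the ChatQA-1.5-8B column; the scores are copied from the table, while the helper name and dictionary layout are illustrative choices.

```python
# Per-dataset ChatRAG Bench scores for ChatQA-1.5-8B, copied from the table above.
CHATQA_15_8B = {
    "Doc2Dial": 39.33,
    "QuAC": 39.73,
    "QReCC": 49.03,
    "CoQA": 76.46,
    "DoQA": 49.60,
    "ConvFinQA": 78.46,
    "SQA": 73.28,
    "TopioCQA": 49.96,
    "HybriDial": 65.76,
    "INSCIT": 30.10,
}


def average(scores: dict[str, float], exclude: tuple[str, ...] = ()) -> float:
    """Unweighted mean over datasets, optionally leaving some out."""
    kept = [value for name, value in scores.items() if name not in exclude]
    return sum(kept) / len(kept)


if __name__ == "__main__":
    print(round(average(CHATQA_15_8B), 2))                          # matches "Average (all)"
    print(round(average(CHATQA_15_8B, exclude=("HybriDial",)), 2))  # matches the HybriDial-free row
```

Because ChatQA-1.5 was trained on HybriDial data, the second figure is the fairer one to compare against models that were not.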

llama
10,273
554

quality-classifier-deberta

license:apache-2.0
9,798
73

difix

—
9,683
16

multilingual-domain-classifier

—
9,550
19

domain-classifier

Model Overview

This is a text classification model to classify documents into one of 26 domain classes:

Model Architecture
- The model architecture is DeBERTa V3 Base
- Context length is 512 tokens

Training Details

Training data:
- 1 million Common Crawl samples, labeled using Google Cloud's Natural Language API: https://cloud.google.com/natural-language/docs/classifying-text
- 500k Wikipedia articles, curated using Wikipedia-API: https://pypi.org/project/Wikipedia-API/

Training steps: The model was trained in multiple rounds using Wikipedia and Common Crawl data, labeled by a combination of pseudo labels and Google Cloud API.

How To Use This Model

Input

The model takes one or several paragraphs of text as input. Example input:

Output

The model outputs one of the 26 domain classes as the predicted domain for each input sample. Example output:

The inference code is available on NeMo Curator's GitHub repository. Check out this example notebook to get started.

How to Use in Transformers

To use the domain classifier, use the following code:

PR-AUC score on evaluation set with 105k samples: 0.9873

PR-AUC score for each domain:

| Domain | PR-AUC |
|:-------------------------|:-------|
| Adult | 0.999 |
| ArtsandEntertainment | 0.997 |
| AutosandVehicles | 0.997 |
| BeautyandFitness | 0.997 |
| BooksandLiterature | 0.995 |
| BusinessandIndustrial | 0.982 |
| ComputersandElectronics | 0.992 |
| Finance | 0.989 |
| FoodandDrink | 0.998 |
| Games | 0.997 |
| Health | 0.997 |
| HobbiesandLeisure | 0.984 |
| HomeandGarden | 0.997 |
| InternetandTelecom | 0.982 |
| JobsandEducation | 0.993 |
| LawandGovernment | 0.967 |
| News | 0.918 |
| OnlineCommunities | 0.983 |
| PeopleandSociety | 0.975 |
| PetsandAnimals | 0.997 |
| RealEstate | 0.997 |
| Science | 0.988 |
| SensitiveSubjects | 0.982 |
| Shopping | 0.995 |
| Sports | 0.995 |
| TravelandTransportation | 0.996 |
| Mean | 0.9873 |

References
- https://arxiv.org/abs/2111.09543
- https://github.com/microsoft/DeBERTa

License
License to use this model is covered by the Apache 2.0. By downloading the public and release version of the model, you accept the terms and conditions of the Apache License 2.0. This repository contains the code for the domain classifier model.

license:apache-2.0
9,137
95

mit-b5

—
8,587
12

GR00T-N1.5-3B

NVIDIA Isaac GR00T N1.5 is an open foundation model for generalized humanoid robot reasoning and skills. This cross-embodiment model takes multimodal input, including language and images, to perform manipulation tasks in diverse environments. Developers and researchers can post-train GR00T N1.5 with real or synthetic data for their specific humanoid robot or task.

Isaac `GR00T N1.5-3B` is the medium-sized version of our model, built using pre-trained vision and language encoders. It uses a flow matching action transformer to model a chunk of actions conditioned on vision, language and proprioception.

This model is ready for non-commercial use.

License/Terms of Use

Nvidia License

You are responsible for ensuring that your use of NVIDIA AI Foundation Models complies with all applicable laws.

Use Case:
- Researchers, Academics, Open-Source Community: AI-driven robotics research and algorithm development.
- Developers: Integrate and customize AI for various robotic applications.
- Startups & Companies: Accelerate robotics development and reduce training costs.

Reference(s):
- NVIDIA-EAGLE: Li, Zhiqi, et al. "Eagle 2: Building Post-Training Data Strategies from Scratch for Frontier Vision-Language Models." arXiv preprint arXiv:2501.14818 (2025).
- Rectified Flow: Liu, Xingchao, and Chengyue Gong. "Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow." The Eleventh International Conference on Learning Representations.
- Flow Matching Policy: Black, Kevin, et al. "π0: A Vision-Language-Action Flow Model for General Robot Control." arXiv preprint arXiv:2410.24164 (2024).

Model Architecture:

Architecture Type: Vision Transformer, Multilayer Perceptron, Flow matching Transformer

Isaac GR00T N1.5 uses vision and text transformers to encode the robot's image observations and text instructions.
The architecture handles a varying number of views per embodiment by concatenating image token embeddings from all frames into a sequence, followed by language token embeddings. To model proprioception and a sequence of actions conditioned on observations, Isaac GR00T N1.5-3B uses a flow matching transformer. The flow matching transformer interleaves self-attention over proprioception and actions with cross-attention to the vision and language embeddings. During training, the input actions are corrupted by randomly interpolating between the clean action vector and a gaussian noise vector. At inference time, the policy first samples a gaussian noise vector and iteratively reconstructs a continuous-value action using its velocity prediction. In GR00T-N1.5, the MLP connector between the vision-language features and the diffusion-transformer (DiT) has been modified for improved performance on our sim benchmarks. Also, it was trained jointly with flow matching and world-modeling objectives. The schematic diagram is shown in the illustration above. Red, Green, Blue (RGB) camera frames are processed through a pre-trained vision transformer (SigLip2). Text is encoded by a pre-trained transformer (T5) Robot proprioception is encoded using a multi-layer perceptron (MLP) indexed by the embodiment ID. To handle variable-dimension proprio, inputs are padded to a configurable max length before feeding into the MLP. Actions are encoded and velocity predictions decoded by an MLP, one per unique embodiment. The flow matching transformer is implemented as a diffusion transformer (DiT), in which the diffusion step conditioning is implemented using adaptive layernorm (AdaLN). 
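The training and inference procedure described above (corrupt actions by interpolating toward Gaussian noise, then reconstruct an action by iteratively following a predicted velocity) can be illustrated with a toy example. The sketch below is not the GR00T implementation: it assumes the convention that t = 0 is pure noise and t = 1 is the clean action, uses plain Euler integration, and substitutes an oracle velocity for the diffusion transformer. All names are hypothetical.

```python
import random


def interpolate(action: list[float], noise: list[float], t: float) -> list[float]:
    """Corrupted training input: a point on the straight line from noise (t=0) to action (t=1)."""
    return [(1.0 - t) * n + t * a for a, n in zip(action, noise)]


def target_velocity(action: list[float], noise: list[float]) -> list[float]:
    """Regression target under the assumed convention: the constant direction noise -> action."""
    return [a - n for a, n in zip(action, noise)]


def euler_sample(velocity_fn, noise: list[float], steps: int = 4) -> list[float]:
    """Inference: start from noise and integrate the predicted velocity with Euler steps."""
    x = list(noise)
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt
        v = velocity_fn(x, t)  # in a real policy this is the network's prediction
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    rng = random.Random(0)
    clean_action = [0.5, -1.0, 2.0]
    noise = [rng.gauss(0.0, 1.0) for _ in clean_action]

    # Oracle stand-in for a trained model: it already knows the true velocity.
    oracle = target_velocity(clean_action, noise)
    recovered = euler_sample(lambda x, t: oracle, noise, steps=4)
    print(recovered)  # close to clean_action, up to floating-point error
```

With a learned velocity field the reconstruction is only approximate, and the number of Euler steps trades inference latency against accuracy.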
Input:
- Input Type:
  - Vision: Image Frames
  - State: Robot Proprioception
  - Language Instruction: Text
- Input Format:
  - Vision: Variable number of 224x224 uint8 image frames, coming from robot cameras
  - State: Floating Point
  - Language Instruction: String
- Input Parameters:
  - Vision: 2D - RGB image, square
  - State: 1D - Floating number vector
  - Language Instruction: 1D - String

Output:
- Output Type(s): Actions
- Output Format: Continuous-value vectors
- Output Parameters: Two-Dimensional (2D)
- Other Properties Related to Output: Continuous-value vectors correspond to different motor controls on a robot, which depend on the degrees of freedom of the robot embodiment.

Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g., GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.

Supported Hardware Microarchitecture Compatibility (all of the below): NVIDIA Ampere, NVIDIA Blackwell, NVIDIA Jetson, NVIDIA Hopper, NVIDIA Lovelace

Ethical Considerations: NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. For more detailed information on ethical considerations for this model, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards. Please report security vulnerabilities or NVIDIA AI Concerns here.

Previous Version: https://huggingface.co/nvidia/GR00T-N1-2B

Blogpost: https://nvidianews.nvidia.com/news/foundation-model-isaac-robotics-platform

Community article with a tutorial on how to fine-tune on SO100/101: https://huggingface.co/blog/nvidia/gr00t-n1-5-so101-tuning

—
7,863
151

Nemotron-Research-Reasoning-Qwen-1.5B

🚀 The leading generalist reasoning model for cutting-edge research and development

🌟 News
- [2025-11-20] Nemotron-Research-Reasoning-Qwen-1.5B-BroRL is released.
- [2025-08-11] ProRL V2 blog post is released: ProRL V2 - Prolonged Training Validates RL Scaling Laws.
- [2025-07-23] Nemotron-Research-Reasoning-Qwen-1.5B-v2 is released.
- [2025-05-29] Nemotron-Research-Reasoning-Qwen-1.5B is released.

Introduction

Nemotron-Research-Reasoning-Qwen-1.5B is the world's leading 1.5B open-weight model for complex reasoning tasks such as mathematical problems, coding challenges, scientific questions, and logic puzzles. It is trained using the ProRL algorithm on a diverse and comprehensive set of datasets. Our model has achieved impressive results, outperforming DeepSeek's 1.5B model by a large margin on a broad range of tasks, including math, coding, and GPQA.

ProRL: Prolonged Reinforcement Learning

ProRL is designed to enable extended RL training periods that facilitate deeper exploration of reasoning strategies. It enables more than 2k training steps and scales the training data across diverse tasks, from traditional math and code tasks to STEM problems, logical puzzles, and instruction following, which, we hypothesize, are crucial for generalization. Based on Group Relative Policy Optimization (GRPO), ProRL introduces three key techniques:
1. Mitigating Entropy Collapse
2. Decoupled clip and dynamic sampling policy optimization (DAPO)
3. KL regularization and reference policy reset

Using ProRL, we developed the world's best 1.5B reasoning model that significantly outperforms its base model, DeepSeek-R1-1.5B, and matches or even surpasses the performance of DeepSeek-R1-7B across a diverse range of benchmarks. Notably, compared to DeepSeek-R1-1.5B, we achieve average pass@1 improvements of 14.7% on math benchmarks, 13.9% on coding, 54.8% on logic puzzles, 25.1% on STEM reasoning, and 18.1% on instruction-following tasks.
Training Datasets

| Dataset | Link |
|---------------------------|------|
| DeepScaleR-Preview-Dataset | Link |
| Eurus-2-RL-Data | Link |
| Reasoning-gym | Link |
| IFEval | Link |
| SCP-116K | Link |

Table 1: Performance (pass@1) comparison for benchmarks across Math domain.

| Model | AIME24 | AIME25 | AMC | Math | Minerva | Olympiad | Avg |
|-------------------------------|--------|--------|-------|-------|----------|----------|--------|
| DeepSeek-R1-Distill-Qwen-1.5B | 28.54 | 22.71 | 62.58 | 82.90 | 26.38 | 43.58 | 44.45 |
| DeepScaleR-1.5B | 40.21 | 31.46 | 73.04 | 89.36 | 41.57 | 51.63 | 54.54 |
| DeepSeek-R1-Distill-Qwen-7B | 53.54 | 40.83 | 82.83 | 93.68 | 50.60 | 57.66 | 63.19 |
| Nemotron-Research-Reasoning-Qwen-1.5B | 48.13 | 33.33 | 79.29 | 91.89 | 47.98 | 60.22 | 60.14 |
| Nemotron-Research-Reasoning-Qwen-1.5B-v2 | 49.58 | 36.04 | 82.53 | 92.49 | 49.03 | 60.44 | 61.69 |
| Nemotron-Research-Reasoning-Qwen-1.5B-BroRL | 60.42 | 35.63 | 83.06 | 92.20 | 48.58 | 62.11 | 63.66 |

Table 2: Performance (pass@1) comparison across benchmarks for Code. We abbreviate benchmark names for codecontests (cc), codeforces (cf), humanevalplus (human), and livecodebench (LCB).

| Model | apps | cc | cf | taco | human | LCB | Avg |
|-------------------------------|--------|--------|--------|--------|--------|--------|--------|
| DeepSeek-R1-Distill-Qwen-1.5B | 20.95 | 16.79 | 14.13 | 8.03 | 61.77 | 16.80 | 23.08 |
| DeepCoder-1.5B | 30.37 | 23.76 | 21.70 | 13.76 | 73.40 | 22.76 | 30.96 |
| DeepSeek-R1-Distill-Qwen-7B | 42.08 | 32.76 | 33.08 | 19.08 | 83.32 | 38.04 | 41.39 |
| Nemotron-Research-Reasoning-Qwen-1.5B | 41.99 | 31.80 | 34.50 | 20.81 | 72.05 | 23.81 | 37.49 |
| Nemotron-Research-Reasoning-Qwen-1.5B-v2 | 46.39 | 35.59 | 40.75 | 22.89 | 72.89 | 27.69 | 41.03 |
| Nemotron-Research-Reasoning-Qwen-1.5B-BroRL | 50.61 | 38.71 | 45.88 | 25.90 | - | - | - |

Table 3: Performance comparison on STEM reasoning (GPQA Diamond), instruction following (IFEval), and logic puzzles (Reasoning Gym) tasks. We also present results on OOD tasks: acre, boxnet, and gameoflifehalting (game).

| Model | GPQA | IFEval | Reasoning | acre | boxnet | game |
|-------------------------------|--------|--------|-----------|--------|--------|--------|
| DeepSeek-R1-Distill-Qwen-1.5B | 15.86 | 44.05 | 4.24 | 5.99 | 0.00 | 3.49 |
| DeepSeek-R1-Distill-Qwen-7B | 35.44 | 58.01 | 28.55 | 20.21 | 1.71 | 12.94 |
| Nemotron-Research-Reasoning-Qwen-1.5B | 41.78 | 66.02 | 59.06 | 58.57 | 7.91 | 52.29 |
| Nemotron-Research-Reasoning-Qwen-1.5B-v2 | 41.32 | 70.85 | 62.49 | - | - | - |

Following the release of Nemotron-Research-Reasoning-Qwen-1.5B, we scaled the training steps from 2000 to 3000, resulting in Nemotron-Research-Reasoning-Qwen-1.5B-v2. Nemotron-Research-Reasoning-Qwen-1.5B-v2 builds on top of REINFORCE++-baseline with dynamic sampling and clip-higher, and proposes several critical enhancements such as periodically refreshing the reference model with the current best checkpoint and imposing the length penalty only in scheduled cycles.
Together, these techniques allow model performance to continually improve with more RL training steps and expand LLMs' reasoning boundaries. Our latest checkpoint, Nemotron-Research-Reasoning-Qwen-1.5B-v2, trained for 3000 steps, sets a new state-of-the-art (SOTA) among 1.5B reasoning models. For the Nemotron-Research-Reasoning-Qwen-1.5B-v2, you can use the following code to load the model: For the original Nemotron-Research-Reasoning-Qwen-1.5B, you can use the following code to load the model: BroRL In BroRL, we continued training for 419 steps based on a nearly fully trained ProRLv2 checkpoint, increasing the number of samples per prompt from 16 to 512. We found that the improvement of BroRL over ProRLv2 was greater than that of ProRLv2 over ProRLv1. Ethical Considerations NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please report security vulnerabilities or NVIDIA AI Concerns here. Citation If you find our dataset helpful, please cite the following ProRL paper and BroRL paper:

NaNK
license:cc-by-nc-4.0
7,230
231

Eagle2.5-8B

NaNK
-
7,108
30

AceReason-Nemotron-14B

NaNK
-
6,948
93

NVIDIA-Nemotron-Nano-12B-v2-VL-NVFP4-QAD

NaNK
-
6,812
9

Qwen3-8B-NVFP4

NaNK
license:apache-2.0
6,558
4

Llama-3.1-70B-Instruct-FP8

NaNK
llama
6,310
13

Qwen3-14B-NVFP4

NaNK
license:apache-2.0
6,285
2

segformer-b2-finetuned-cityscapes-1024-1024

-
6,227
2

DeepSeek-R1-FP4

Description: The NVIDIA DeepSeek R1 FP4 model is the quantized version of DeepSeek AI's DeepSeek R1 model, which is an auto-regressive language model that uses an optimized transformer architecture. For more information, please check here. The NVIDIA DeepSeek R1 FP4 model is quantized with TensorRT Model Optimizer. This model is ready for commercial/non-commercial use.

Third-Party Community Consideration: This model is not owned or developed by NVIDIA. This model has been developed and built to a third-party's requirements for this application and use case; see link to Non-NVIDIA (DeepSeek R1) Model Card.

Model Architecture: Architecture Type: Transformers. Network Architecture: DeepSeek R1.

Input: Input Type(s): Text. Input Format(s): String. Input Parameters: One-Dimensional (1D): Sequences. Other Properties Related to Input: Context length up to 128K.

Output: Output Type(s): Text. Output Format: String. Output Parameters: 1D: Sequences. Other Properties Related to Output: N/A.

Software Integration: Supported Runtime Engine(s): TensorRT-LLM. Supported Hardware Microarchitecture Compatibility: NVIDIA Blackwell.

Model Version(s): The model is quantized with nvidia-modelopt v0.23.0.

Datasets: Calibration Dataset: cnn_dailymail. Data collection method: Automated. Labeling method: Unknown. Evaluation Dataset: MMLU. Data collection method: Unknown. Labeling method: N/A.

Inference: Engine: TensorRT-LLM. Test Hardware: B200.

Post Training Quantization: This model was obtained by quantizing the weights and activations of DeepSeek R1 to the FP4 data type, ready for inference with TensorRT-LLM. Only the weights and activations of the linear operators within transformer blocks are quantized. This optimization reduces the number of bits per parameter from 8 to 4, reducing the disk size and GPU memory requirements by approximately 1.6x.
To deploy the quantized FP4 checkpoint with TensorRT-LLM LLM API, follow the sample codes below (you need 8xB200 GPU and TensorRT LLM built from source with the latest main branch): Evaluation The accuracy benchmark results are presented in the table below: NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please report security vulnerabilities or NVIDIA AI Concerns here.
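The card reports roughly a 1.6x reduction even though the bit width halves from 8 to 4. One plausible reading is that only part of the parameters are quantized and that block-scaled formats store extra scale factors. The sketch below works through that arithmetic; the function name, the 90% quantized fraction, and the 0.5 bits of scale overhead per weight are all illustrative assumptions, not figures from the card.

```python
def compression_ratio(quantized_fraction, base_bits=8.0, quant_bits=4.0,
                      scale_bits_per_weight=0.0):
    """Size ratio (before / after) when only part of the parameters is quantized."""
    if not 0.0 <= quantized_fraction <= 1.0:
        raise ValueError("quantized_fraction must be in [0, 1]")
    # Quantized weights cost their own bits plus any per-weight share of scale factors.
    effective_quant_bits = quant_bits + scale_bits_per_weight
    after_bits = (quantized_fraction * effective_quant_bits
                  + (1.0 - quantized_fraction) * base_bits)
    return base_bits / after_bits

# Everything quantized and no scale overhead: the ideal 2x.
ideal = compression_ratio(1.0)
# Hypothetical mix: 90% of parameters quantized, 0.5 extra bits per weight for scales.
mixed = compression_ratio(0.9, scale_bits_per_weight=0.5)
print(ideal, round(mixed, 2))
```

Under these assumed numbers the ratio lands in the same neighbourhood as the approximate figure quoted above; the real split between quantized and unquantized tensors may differ.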

NaNK
license:mit
6,144
265

esm2_t33_650M_UR50D

license:mit
6,034
1

DAM-3B

NaNK
llava_llama
5,941
128

diar_sortformer_4spk-v1

A newer streaming Sortformer is available at huggingface.co/nvidia/diar_streaming_sortformer_4spk-v2.

Sortformer[1] is a novel end-to-end neural model for speaker diarization, trained with unconventional objectives compared to existing end-to-end diarization models. Sortformer resolves the permutation problem in diarization by following the arrival-time order of the speech segments from each speaker.

Sortformer consists of an L-size (18 layers) NeMo Encoder for Speech Tasks (NEST)[2], which is based on the Fast-Conformer[3] encoder. This is followed by an 18-layer Transformer[4] encoder with a hidden size of 192 and two feedforward layers with 4 sigmoid outputs for each frame input at the top layer. More information can be found in the Sortformer paper[1].

To train, fine-tune or perform diarization with Sortformer, you will need to install NVIDIA NeMo[5]. We recommend you install it after you've installed Cython and the latest PyTorch version. The model is available for use in the NeMo Framework[5], and can be used as a pre-trained checkpoint for inference or for fine-tuning on another dataset.

Input Format

Input to Sortformer can be an individual audio file: where each line is a dictionary containing the following fields:

Getting Diarization Results

To perform speaker diarization and get a list of speaker-marked speech segments in the format 'begin_seconds, end_seconds, speaker_index', simply use:

To obtain tensors of speaker activity probabilities, use:

This model accepts single-channel (mono) audio sampled at 16,000 Hz.
- The actual input tensor is a Ns x 1 matrix for each audio clip, where Ns is the number of samples in the time-series signal.
- For instance, a 10-second audio clip sampled at 16,000 Hz (mono-channel WAV file) will form a 160,000 x 1 matrix.

The output of the model is a T x S matrix, where:
- S is the maximum number of speakers (in this model, S = 4).
- T is the total number of frames, including zero-padding. Each frame corresponds to a segment of 0.08 seconds of audio.
- Each element of the T x S matrix represents the speaker activity probability in the [0, 1] range. For example, a matrix element a(150, 2) = 0.95 indicates a 95% probability of activity for the second speaker during the time range [12.00, 12.08] seconds.

Train and evaluate Sortformer diarizer using NeMo

Training: Sortformer diarizer models are trained on 8 nodes of 8×NVIDIA Tesla V100 GPUs. We use 90-second-long training samples and a batch size of 4. The model can be trained using this example script and base config.

To evaluate Sortformer diarizer and save diarization results in RTTM format, use the inference example script:

You can provide the post-processing YAML configs from the `postprocessing` folder to reproduce the optimized post-processing algorithm for each development dataset:

- The model operates in a non-streaming mode (offline mode).
- It can detect a maximum of 4 speakers; performance degrades on recordings with 5 or more speakers.
- The maximum duration of a test recording depends on available GPU memory. For an RTX A6000 48GB model, the limit is around 12 minutes.
- The model was trained on publicly available speech datasets, primarily in English. As a result, performance may degrade on non-English speech and on out-of-domain data, such as recordings in noisy conditions.

Sortformer was trained on a combination of 2030 hours of real conversations and 5150 hours of simulated audio mixtures generated by the NeMo speech data simulator[6]. All the datasets listed above are based on the same labeling method via RTTM format. A subset of the RTTM files used for model training is processed for speaker diarization model training purposes. Data collection methods vary across individual datasets. For example, the above datasets include phone calls, interviews, web videos, and audiobook recordings. Please refer to the Linguistic Data Consortium (LDC) website or dataset webpage for detailed data collection methods.
Training Datasets (Real conversations)
- Fisher English (LDC)
- 2004-2010 NIST Speaker Recognition Evaluation (LDC)
- Librispeech
- AMI Meeting Corpus
- VoxConverse-v0.3
- ICSI
- AISHELL-4
- Third DIHARD Challenge Development (LDC)
- 2000 NIST Speaker Recognition Evaluation, split1 (LDC)

Training Datasets (Used to simulate audio mixtures)
- 2004-2010 NIST Speaker Recognition Evaluation (LDC)
- Librispeech

| Dataset | DIHARD3-Eval | CALLHOME-part2 | CALLHOME-part2 | CALLHOME-part2 | CH109 |
|:------------------------------|:------------------:|:-------------------:|:-------------------:|:-------------------:|:------------------:|
| Number of Speakers | ≤ 4 speakers | 2 speakers | 3 speakers | 4 speakers | 2 speakers |
| Collar (sec) | 0.0s | 0.25s | 0.25s | 0.25s | 0.25s |
| Mean Audio Duration (sec) | 453.0s | 73.0s | 135.7s | 329.8s | 552.9s |

All evaluations include overlapping speech. Bolded and italicized numbers represent the best-performing Sortformer evaluations. Post-Processing (PP) is optimized on two different held-out dataset splits.
- YAML file for DH3-dev Optimized Post-Processing
- YAML file for CallHome-part1 Optimized Post-Processing

| Dataset | DIHARD3-Eval | CALLHOME-part2 | CALLHOME-part2 | CALLHOME-part2 | CH109 |
|:----------------------------------------------------------|:------------------:|:-------------------:|:-------------------:|:-------------------:|:------------------:|
| DER diar_sortformer_4spk-v1 | 16.28 | 6.49 | 10.01 | 14.14 | 6.27 |
| DER diar_sortformer_4spk-v1 + DH3-dev Opt. PP | 14.76 | - | - | - | - |
| DER diar_sortformer_4spk-v1 + CallHome-part1 Opt. PP | - | 5.85 | 8.46 | 12.59 | 6.86 |

All tests were measured on an RTX A6000 48GB with a batch size of 1. Post-processing is not included in RTFx calculations.

| Datasets | DIHARD3-Eval | CALLHOME-part2 | CALLHOME-part2 | CALLHOME-part2 | CH109 |
|:----------------------------------|:-------------------:|:-------------------:|:-------------------:|:-------------------:|:------------------:|
| RTFx diar_sortformer_4spk-v1 | 437 | 1053 | 915 | 545 | 415 |

NVIDIA Riva is an accelerated speech AI SDK deployable on-prem, in all clouds, multi-cloud, hybrid, on edge, and embedded. Additionally, Riva provides:
- World-class out-of-the-box accuracy for the most common languages with model checkpoints trained on proprietary data with hundreds of thousands of GPU-compute hours
- Best-in-class accuracy with run-time word boosting (e.g., brand and product names) and customization of acoustic model, language model, and inverse text normalization
- Streaming speech recognition, Kubernetes-compatible scaling, and enterprise-grade support

Although this model isn't supported yet by Riva, the list of supported models is here. Check out the Riva live demo.

References
[1] Sortformer: Seamless Integration of Speaker Diarization and ASR by Bridging Timestamps and Tokens
[2] NEST: Self-supervised Fast Conformer as All-purpose Seasoning to Speech Processing Tasks
[3] Fast Conformer with Linearly Scalable Attention for Efficient Speech Recognition

License to use this model is covered by the CC-BY-NC-4.0. By downloading the public and release version of the model, you accept the terms and conditions of the CC-BY-NC-4.0 license.
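To make the output format described in this card concrete (a T x S matrix of speaker-activity probabilities with 0.08-second frames), here is a minimal sketch that converts such a matrix into (begin_seconds, end_seconds, speaker_index) segments. The function name and the fixed 0.5 threshold are assumptions for illustration; this is not NeMo's post-processing algorithm, which is configured through the YAML files mentioned above.

```python
FRAME_SEC = 0.08  # frame duration stated in the model card

def frames_to_segments(probs, threshold=0.5):
    """Turn a T x S matrix of speaker-activity probabilities into
    (begin_seconds, end_seconds, speaker_index) tuples."""
    segments = []
    num_speakers = len(probs[0]) if probs else 0
    for spk in range(num_speakers):
        start = None  # frame index where the current active run began
        for t, row in enumerate(probs):
            active = row[spk] >= threshold
            if active and start is None:
                start = t
            elif not active and start is not None:
                segments.append((round(start * FRAME_SEC, 2), round(t * FRAME_SEC, 2), spk))
                start = None
        if start is not None:  # run still open at the end of the clip
            segments.append((round(start * FRAME_SEC, 2), round(len(probs) * FRAME_SEC, 2), spk))
    return sorted(segments)

# Toy 5-frame, 2-speaker matrix (the real model outputs S = 4 columns).
probs = [
    [0.90, 0.10],
    [0.80, 0.20],
    [0.70, 0.90],
    [0.10, 0.95],
    [0.20, 0.10],
]
segments = frames_to_segments(probs)
print(segments)
```

Because each speaker column is thresholded independently, two speakers can be active in the same frame, which is how overlapping speech shows up in the result.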

license:cc-by-nc-4.0
5,909
112

OpenReasoning-Nemotron-32B

Description: OpenReasoning-Nemotron-32B is a large language model (LLM) which is a derivative of Qwen2.5-32B (AKA the reference model). It is a reasoning model that is post-trained for reasoning about math, code and science solution generation. We evaluated this model with up to 64K output tokens. The OpenReasoning model is available in the following sizes: 1.5B, 7B, 14B and 32B. This model is ready for commercial/non-commercial research use.

License/Terms of Use: GOVERNING TERMS: Use of the models listed above is governed by the Creative Commons Attribution 4.0 International License (CC-BY-4.0). ADDITIONAL INFORMATION: Apache 2.0 License

Our models demonstrate exceptional performance across a suite of challenging reasoning benchmarks. The 7B, 14B, and 32B models consistently set new state-of-the-art records for their size classes.

| Model | ArtificialAnalysisIndex* | GPQA | MMLU-PRO | HLE | LiveCodeBench | SciCode | AIME24 | AIME25 | HMMT FEB 25 |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| 1.5B | 31.0 | 31.6 | 47.5 | 5.5 | 28.6 | 1.0 | 55.5 | 45.6 | 31.5 |
| 7B | 54.7 | 61.1 | 71.9 | 8.3 | 63.3 | 20.3 | 84.7 | 78.2 | 63.5 |
| 14B | 60.9 | 71.6 | 77.5 | 10.1 | 67.8 | 32.4 | 87.8 | 82.0 | 71.2 |
| 32B | 64.3 | 73.1 | 80.0 | 11.9 | 70.2 | 39.6 | 89.2 | 84.0 | 73.8 |

\* This is our estimation of the Artificial Analysis Intelligence Index, not an official score.

Combining the work of multiple agents

OpenReasoning-Nemotron models can be used in a "heavy" mode by starting multiple parallel generations and combining them together via generative solution selection (GenSelect). To add this "skill" we follow the original GenSelect training pipeline, except that we do not train on the selection summary but use the full reasoning trace of DeepSeek R1 0528 671B instead. We only train models to select the best solution for math problems but surprisingly find that this capability directly generalizes to code and science questions! With this "heavy" GenSelect inference mode, the OpenReasoning-Nemotron-32B model surpasses O3 (High) on math and coding benchmarks.

| Model | Pass@1 (Avg@64) | Majority@64 | GenSelect |
| :--- | :--- | :--- | :--- |
| 1.5B | | | |
| AIME24 | 55.5 | 76.7 | 76.7 |
| AIME25 | 45.6 | 70.0 | 70.0 |
| HMMT Feb 25 | 31.5 | 46.7 | 53.3 |
| 7B | | | |
| AIME24 | 84.7 | 93.3 | 93.3 |
| AIME25 | 78.2 | 86.7 | 93.3 |
| HMMT Feb 25 | 63.5 | 83.3 | 90.0 |
| LCB v6 2408-2505 | 63.4 | n/a | 67.7 |
| 14B | | | |
| AIME24 | 87.8 | 93.3 | 93.3 |
| AIME25 | 82.0 | 90.0 | 90.0 |
| HMMT Feb 25 | 71.2 | 86.7 | 93.3 |
| LCB v6 2408-2505 | 67.9 | n/a | 69.1 |
| 32B | | | |
| AIME24 | 89.2 | 93.3 | 93.3 |
| AIME25 | 84.0 | 90.0 | 93.3 |
| HMMT Feb 25 | 73.8 | 86.7 | 96.7 |
| LCB v6 2408-2505 | 70.2 | n/a | 75.3 |
| HLE | 11.8 | 13.4 | 15.5 |

Code generation prompt (only this fragment survives in the listing): "... python for just the final solution code block with the following format: ..."

```python
import torch
import transformers

# Assumption: the pipeline setup is missing from this listing; the model ID is
# taken from the release URL given in this card.
model_id = "nvidia/OpenReasoning-Nemotron-32B"
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

# Math generation prompt. The braces after \boxed are doubled so that
# str.format() leaves them in place instead of treating them as a placeholder.
prompt = """Solve the following math problem. Make sure to put the answer (and only answer) inside \\boxed{{}}.

{user}
"""

# Science generation prompts: refer to
# https://github.com/NVIDIA/NeMo-Skills/blob/main/nemo_skills/prompt/config/generic/hle.yaml (HLE)
# https://github.com/NVIDIA/NeMo-Skills/blob/main/nemo_skills/prompt/config/eval/aai/mcq-4choices-boxed.yaml (GPQA)
# https://github.com/NVIDIA/NeMo-Skills/blob/main/nemo_skills/prompt/config/eval/aai/mcq-10choices-boxed.yaml (MMLU-Pro)

messages = [
    {
        "role": "user",
        "content": prompt.format(user="Write a program to calculate the sum of the first $N$ fibonacci numbers"),
    },
]
outputs = pipeline(
    messages,
    max_new_tokens=64000,
)
print(outputs[0]["generated_text"][-1]["content"])
```

```
@inproceedings{toshniwal2025genselect,
  title={{GenSelect: A Generative Approach to Best-of-N}},
  author={Shubham Toshniwal and Ivan Sorokin and Aleksander Ficek and Ivan Moshkov and Igor Gitman},
  booktitle={2nd AI for Math Workshop @ ICML 2025},
  year={2025},
  url={https://openreview.net/forum?id=8LhnmNmUDb}
}

@misc{ahmad2025opencodereasoningiisimpletesttime,
  title={{OpenCodeReasoning-II: A Simple Test Time Scaling Approach via Self-Critique}},
  author={Wasi Uddin Ahmad and Somshubra Majumdar and Aleksander Ficek and Sean Narenthiran and Mehrzad Samadi and Jocelyn Huang and Siddhartha Jain and Vahid Noroozi and Boris Ginsburg},
  year={2025},
  eprint={2507.09075},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2507.09075},
}

@misc{moshkov2025aimo2winningsolutionbuilding,
  title={{AIMO-2 Winning Solution: Building State-of-the-Art Mathematical Reasoning Models with OpenMathReasoning dataset}},
  author={Ivan Moshkov and Darragh Hanley and Ivan Sorokin and Shubham Toshniwal and Christof Henkel and Benedikt Schifferer and Wei Du and Igor Gitman},
  year={2025},
  eprint={2504.16891},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2504.16891},
}

@article{ahmad2025opencodereasoning,
  title={{OpenCodeReasoning: Advancing Data Distillation for Competitive Coding}},
  author={Wasi Uddin Ahmad and Sean Narenthiran and Somshubra Majumdar and Aleksander Ficek and Siddhartha Jain and Jocelyn Huang and Vahid Noroozi and Boris Ginsburg},
  year={2025},
  eprint={2504.01943},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2504.01943},
}
```

Use Case: This model is intended for developers and researchers who work on competitive math, code and science problems. It has been trained via only supervised fine-tuning to achieve strong scores on benchmarks.

Release Date: Huggingface [07/16/2025] via https://huggingface.co/nvidia/OpenReasoning-Nemotron-32B/

Reference(s):
- [2504.01943] OpenCodeReasoning: Advancing Data Distillation for Competitive Coding
- [2504.16891] AIMO-2 Winning Solution: Building State-of-the-Art Mathematical Reasoning Models with OpenMathReasoning dataset

Model Architecture: Architecture Type: Dense decoder-only Transformer model. Network Architecture: Qwen2.5-32B. This model was developed based on Qwen2.5-32B and has 32B model parameters.
- OpenReasoning-Nemotron-1.5B was developed based on Qwen2.5-1.5B and has 1.5B model parameters.
- OpenReasoning-Nemotron-7B was developed based on Qwen2.5-7B and has 7B model parameters.
- OpenReasoning-Nemotron-14B was developed based on Qwen2.5-14B and has 14B model parameters.
- OpenReasoning-Nemotron-32B was developed based on Qwen2.5-32B and has 32B model parameters.

Input: Input Type(s): Text. Input Format(s): String. Input Parameters: One-Dimensional (1D). Other Properties Related to Input: Trained for up to 64,000 output tokens.

Output: Output Type(s): Text. Output Format: String. Output Parameters: One-Dimensional (1D). Other Properties Related to Output: Trained for up to 64,000 output tokens.

Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
Software Integration : Runtime Engine: NeMo 2.3.0 Recommended Hardware Microarchitecture Compatibility: NVIDIA Ampere NVIDIA Hopper Preferred/Supported Operating System(s): Linux Model Version(s): 1.0 (7/16/2025) OpenReasoning-Nemotron-32B OpenReasoning-Nemotron-14B OpenReasoning-Nemotron-7B OpenReasoning-Nemotron-1.5B The training corpus for OpenReasoning-Nemotron-32B is comprised of questions from OpenCodeReasoning dataset, OpenCodeReasoning-II, OpenMathReasoning, and the Synthetic Science questions from the Llama-Nemotron-Post-Training-Dataset. All responses are generated using DeepSeek-R1-0528. We also include the instruction following and tool calling data from Llama-Nemotron-Post-Training-Dataset without modification. Data Collection Method: Hybrid: Automated, Human, Synthetic Labeling Method: Hybrid: Automated, Human, Synthetic Properties: 5M DeepSeek-R1-0528 generated responses from OpenCodeReasoning questions (https://huggingface.co/datasets/nvidia/OpenCodeReasoning), OpenMathReasoning, and the Synthetic Science questions from the Llama-Nemotron-Post-Training-Dataset. We also include the instruction following and tool calling data from Llama-Nemotron-Post-Training-Dataset without modification. Evaluation Dataset: We used the following benchmarks to evaluate the model holistically. Data Collection Method: Hybrid: Automated, Human, Synthetic Labeling Method: Hybrid: Automated, Human, Synthetic Ethical Considerations: NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. 
For more detailed information on ethical considerations for this model, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards. Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns here.
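The Majority@64 column in the table above corresponds to majority voting over parallel generations, the baseline that GenSelect is compared against. A minimal sketch of that baseline follows, assuming answers are wrapped in `\boxed{}` as the math prompt requests and contain no nested braces; the helper names are hypothetical. GenSelect itself uses the model to choose among candidate solutions and is not shown here.

```python
import re
from collections import Counter

# Matches \boxed{...} with no nested braces (an assumption of this sketch).
BOXED = re.compile(r"\\boxed\{([^{}]*)\}")

def extract_boxed(text):
    """Return the content of the last boxed answer in a generation, or None."""
    matches = BOXED.findall(text)
    return matches[-1].strip() if matches else None

def majority_answer(generations):
    """Pick the most frequent boxed answer across parallel generations."""
    answers = [a for a in map(extract_boxed, generations) if a is not None]
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]

generations = [
    "... so the result is \\boxed{42}",
    "... therefore \\boxed{41}",
    "... hence \\boxed{42}",
    "no final answer given",
]
winner = majority_answer(generations)
print(winner)
```

Generations without a boxed answer are simply ignored, so a vote is only as informative as the number of parseable answers.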

NaNK
license:cc-by-4.0
5,868
118

Qwen3-8B-FP4

Description: The NVIDIA Qwen3-8B FP4 model is the quantized version of Alibaba's Qwen3-8B model, which is an auto-regressive language model that uses an optimized transformer architecture. For more information, please check here. The NVIDIA Qwen3-8B FP4 model is quantized with TensorRT Model Optimizer. This model is ready for commercial/non-commercial use.

Third-Party Community Consideration: This model is not owned or developed by NVIDIA. This model has been developed and built to a third-party's requirements for this application and use case; see link to Non-NVIDIA (Qwen3-8B) Model Card.

License/Terms of Use: Use of this model is governed by the Apache License 2.0.

Use Case: Developers looking to take off-the-shelf pre-quantized models for deployment in AI Agent systems, chatbots, RAG systems, and other AI-powered applications.

Release Date: Huggingface 09/15/2025 via https://huggingface.co/nvidia/Qwen3-8B-FP4

Model Architecture: Architecture Type: Transformers. Network Architecture: Qwen3-8B. This model was developed based on Qwen3-8B. Number of model parameters: 8.2 × 10^9.

Input: Input Type(s): Text. Input Format(s): String. Input Parameters: 1D (One-Dimensional): Sequences. Other Properties Related to Input: Context length up to 131K.

Output: Output Type(s): Text. Output Format: String. Output Parameters: 1D (One-Dimensional): Sequences. Other Properties Related to Output: N/A.

Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.

Software Integration: Supported Runtime Engine(s): TensorRT-LLM. Supported Hardware Microarchitecture Compatibility: NVIDIA Blackwell.

Model Version(s): The model is quantized with nvidia-modelopt v0.35.0.

Post Training Quantization: This model was obtained by quantizing the weights and activations of Qwen3-8B to the FP4 data type, ready for inference with TensorRT-LLM. Only the weights and activations of the linear operators within transformer blocks are quantized.

Calibration Dataset: Link: cnn_dailymail. Data collection method: Automated. Labeling method: Automated.

Training Datasets: Data Collection Method by Dataset: Undisclosed. Labeling Method by Dataset: Undisclosed. Properties: Undisclosed.

Testing Dataset: Data Collection Method by Dataset: Undisclosed. Labeling Method by Dataset: Undisclosed. Properties: Undisclosed.

To deploy the quantized checkpoint with the TensorRT-LLM LLM API, follow the sample code below:

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns here.
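To give a feel for what quantizing a weight to a 4-bit floating-point type means, the sketch below rounds a small block of values to the E2M1 grid (magnitudes 0, 0.5, 1, 1.5, 2, 3, 4, 6) using one shared scale per block. This is an illustrative "fake quantization" under simplifying assumptions: the helper name is hypothetical, the scale is kept in full precision, and no calibration is performed. It is not the TensorRT Model Optimizer algorithm used to produce this checkpoint.

```python
# Magnitudes representable by a 4-bit float with 2 exponent bits and 1 mantissa bit.
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_block(values):
    """Fake-quantize one block: scale so the largest magnitude maps to 6.0,
    snap each value to the nearest grid point, then scale back."""
    amax = max(abs(v) for v in values)
    if amax == 0.0:
        return list(values)
    scale = amax / E2M1_GRID[-1]
    out = []
    for v in values:
        # Nearest representable magnitude for this value in scaled units.
        mag = min(E2M1_GRID, key=lambda g: abs(abs(v) / scale - g))
        out.append(mag * scale if v >= 0 else -mag * scale)
    return out

weights = [0.12, -0.03, 0.6, -0.31]
quantized = quantize_block(weights)
print(quantized)
```

The largest value in the block is reproduced exactly, while smaller values move to the nearest of only eight magnitudes, which is where the accuracy cost of 4-bit formats comes from.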

NaNK
license:apache-2.0
5,667
3

segformer-b1-finetuned-cityscapes-1024-1024

-
5,598
18

AceReason-Nemotron-1.1-7B

NaNK
-
5,279
56

audio-flamingo-3-hf

Audio Flamingo 3: Advancing Audio Intelligence with Fully Open Large Audio-Language Models

Description: Audio Flamingo 3 (AF3) is a fully open, state-of-the-art Large Audio-Language Model (LALM) that advances reasoning and understanding across speech, sounds, and music. AF3 builds on previous work with innovations in:
- Unified audio representation learning (speech, sound, music)
- Flexible, on-demand chain-of-thought reasoning
- Long-context audio comprehension (up to 10 minutes)
- Multi-turn, multi-audio conversational dialogue (AF3-Chat)
- Voice-to-voice interaction (AF3-Chat)

Extensive evaluations confirm AF3's effectiveness, setting new benchmarks on over 20 public audio understanding and reasoning tasks. This model is for non-commercial research purposes only.

Audio Flamingo 3 is supported in 🤗 Transformers. To run the model, first install Transformers:

> Note: AF3 processes audio in 30-second windows with a 10-minute total cap per sample. Longer inputs are truncated.

AF3 transcription checkpoints prepend answers with fixed assistant phrasing such as `The spoken content of the audio is " ".`. Passing `strip_prefix=True` removes that canned prefix and the surrounding quotes so you only keep the transcription.

You can tune decoding similar to other text-generation models:

If your GPU supports it and you are not using `torch.compile`, install Flash-Attention and enable it at load time:

AF3's forward pass is compatible with `torch.compile` for significant speed-ups:

> `torch.compile` is not compatible with Flash Attention 2 at the same time.

If Flash-Attention isn't available, AF3 will use PyTorch scaled-dot product attention (SDPA) by default on supported PyTorch versions. You can set it explicitly:

Model Architecture: Audio Flamingo 3 uses an AF-Whisper unified audio encoder, an MLP-based audio adaptor, a decoder-only LLM backbone (Qwen2.5-7B), and a streaming TTS module (AF3-Chat). Audio Flamingo 3 can take up to 10 minutes of audio input.

License / Terms of Use

The model is released under the NVIDIA OneWay Noncommercial License. Portions of the dataset generation are also subject to the Qwen Research License and OpenAI's Terms of Use.

Use Case

Intended for researchers and developers to explore:
- Audio question answering and reasoning
- Long-context audio comprehension
- Interactive sound/music design assistants
- Multi-turn (voice) chat

Release Date
- Github (07/10/2025) via https://github.com/NVIDIA/audio-flamingo
- HuggingFace (07/10/2025) via https://huggingface.co/nvidia/audio-flamingo-3

References:
- Audio Flamingo 3: Advancing Audio Intelligence with Fully Open Large Audio-Language Models
- Project Page
- Demo Website
- Hugging Face

Model Architecture: Architecture Type: Transformer. Network Architecture: Audio Flamingo 3.

AF3 uses:
- AF-Whisper unified audio encoder
- MLP-based audio adaptor
- Decoder-only LLM backbone (Qwen2.5-7B)
- Streaming TTS module (AF3-Chat)

This model was developed based on NVILA and Qwen-2.5-7B.

Input:
- Input Type: Audio, Text
- Input Format: WAV/MP3/FLAC, UTF-8 text
- Input Parameters: Audio is Two-Dimensional (2D) and Text is One-Dimensional (1D)
- Other Properties Related to Input:
  - Max Audio Length: 10 Minutes
  - Max Text Length: 16000 tokens

Output:
- Output Type: Text (and optional speech)
- Text Format: UTF-8 string
- Output Parameters: One-Dimensional (1D)
- Other Properties Related to Output:
  - Max Text Length: 1024 tokens
  - Speech Format: streaming TTS (text-to-speech) waveform

Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems (A100/H100). By leveraging NVIDIA's hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
Software Integration: Runtime Engine: PyTorch / HuggingFace Transformers Supported Hardware: NVIDIA Ampere (A100) NVIDIA Hopper (H100) Training Dataset: AF3 is trained entirely on open-source audio data, organized into four novel, large-scale collections. For each dataset, we mention whether the dataset annotations are collected by Human or they are Automated i.e. generated using AI models. The data collection method noted below applies for all datasets used for training and testing: Data Collection Method: Human Labeling Collection Method: Please see below: General Sound: WavCaps (Automated) MACS (Human) SoundDescs (Human) Clotho-v2 (Human) WavText5K (Human) Clotho-AQA (Human) Open-AQA (Automated) CompA-R (Automated) Salmonn AQA (Automated) Audio Entailment(Automated) CompA (Automated) AudioSet (Human) YouTube-8M (Human) FSD50k (Human) CochlScene (Human) NonSpeech7K (Human) Chime-Home (Human) Sonyc-UST (Human) Music: LP-MusicCaps (Automated) MusicQA (Automated) MusicAVQA (Human) MusicBench (Automated) Mu-LLAMA (Automated) NSynth (Human) FMA (Human) MusDB-HQ (Human) Music4All (Human) Million Song Dataset (Human) Speech: MSP-Podcast (Human) JL-Corpus (Human) MELD (Human) Tess (Human) OMGEmotion (Human) Emov-DB (Human) LibriSpeech (Human) SPGISpeech (Human) TEDLIUM (Human) GigaSpeech (Human) Common Voice 15 (Human) VoxPopuli (Human) VoxCeleb2 (Human) Switchboard (Human) AMI (Human) Mixed: AudioSkills-XL (ours) (Automated) LongAudio-XL (ours) (Automated) AF-Think (ours) (Automated) AF-Chat (ours) (Automated) Testing Dataset: Audio Flamingo 3 is evaluated on the test split of the following datasets. 
Data Collection Method: Human (for all datasets noted below) Labeling Method: See below ClothoAQA (Human) MusicAVQA (Human) Clotho-v2 (Human) CochlScene (Human) NonSpeech7K (Human) NSynth (Human) AudioCaps (Human) US8K (Human) GTZAN (Human) MMAU (Human) MMAR (Human) Audio Entailment(Automated) CompA-R-test (Automated) MuchoMusic (Automated) Open-AQA(Automated) MusicInstruct (Automated) MusicQA (Automated) CMM Hallucination (Human) IEMOCAP (Human) VoiceBench (Human) OpenAudioBench (Human) SEED (Human) LibriSpeech (Human) SPGISpeech (Human) TEDLIUM (Human) GigaSpeech (Human) Common Voice 15 (Human) VoxPopuli (Human) LongAudioBench (ours) (Automated) AF-Chat-test (ours) (Human) Engine: HuggingFace Transformers Test Hardware: NVIDIA A100 80 GB Ethical Considerations: NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please report security vulnerabilities or NVIDIA AI Concerns here. Acknowledgements Built with Qwen, NVILA and the open audio-ML community.

-
5,278
125

Qwen3-30B-A3B-NVFP4

NaNK
license:apache-2.0
5,215
15

Llama-3_1-Nemotron-51B-Instruct

NaNK
llama-3
5,070
209

NVIDIA-Nemotron-Parse-v1.1

-
4,894
98

mit-b1

-
4,745
2

Qwen3-30B-A3B-FP4

Description: The NVIDIA Qwen3-30B-A3B FP4 model is the quantized version of Alibaba's Qwen3-30B-A3B model, which is an auto-regressive language model that uses an optimized transformer architecture. For more information, please check here. The NVIDIA Qwen3-30B-A3B FP4 model is quantized with TensorRT Model Optimizer.
This model is ready for commercial/non-commercial use.
Third-Party Community Consideration: This model is not owned or developed by NVIDIA. This model has been developed and built to a third party's requirements for this application and use case; see link to Non-NVIDIA (Qwen3-30B-A3B) Model Card.
Use Case: Developers looking to take off-the-shelf, pre-quantized models for deployment in AI agent systems, chatbots, RAG systems, and other AI-powered applications.
Release Date: Hugging Face 08/22/2025 via https://huggingface.co/nvidia/Qwen3-30B-A3B-FP4
Model Architecture:
Architecture Type: Transformers
Network Architecture: Qwen3-30B-A3B
Input:
Input Type(s): Text
Input Format(s): String
Input Parameters: 1D (One-Dimensional): Sequences
Other Properties Related to Input: Context length up to 131K
Output:
Output Type(s): Text
Output Format: String
Output Parameters: 1D (One-Dimensional): Sequences
Other Properties Related to Output: N/A
Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g., GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
Software Integration:
Supported Runtime Engine(s): TensorRT-LLM
Supported Hardware Microarchitecture Compatibility: NVIDIA Blackwell
Model Version(s): The model is quantized with nvidia-modelopt v0.31.0
Calibration Dataset: Link: cnndailymail. Data collection method: Automated. Labeling method: Automated.
Training Datasets: Data Collection Method by Dataset: Undisclosed Labeling Method by Dataset: Undisclosed Properties: Undisclosed Testing Dataset: Data Collection Method by Dataset: Undisclosed Labeling Method by Dataset: Undisclosed Properties: Undisclosed Evaluation Dataset: Datasets: MMLU Pro, GPQA Diamond, HLE, LiveCodeBench, SciCode, HumanEval, AIME 2024, MATH-500 Data collection method: Hybrid: Automated, Human Labeling method: Hybrid: Human, Automated Post Training Quantization This model was obtained by quantizing the weights and activations of Qwen3-30B-A3B to FP4 data type, ready for inference with TensorRT-LLM. Only the weights and activations of the linear operators within transformer blocks are quantized. This optimization reduces the number of bits per parameter from 16 to 4, reducing the disk size and GPU memory requirements by approximately 3.3x. To deploy the quantized checkpoint with TensorRT-LLM LLM API, follow the sample codes below: Evaluation The accuracy benchmark results are presented in the table below: NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns here.
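The roughly 3.3x reduction quoted under Post Training Quantization is smaller than the 4x that a plain 16-bit to 4-bit conversion would suggest. One plausible reading is that block-wise scale factors and layers left in 16-bit precision add overhead. The sketch below works through that arithmetic. The block size of 16 elements, the 8-bit scale per block, and the unquantized shares are illustrative assumptions, not figures from this card.

```python
def effective_bits(unquantized_fraction, block_size=16, scale_bits=8,
                   quant_bits=4, base_bits=16):
    """Average bits per parameter after partial block-scaled quantization.

    block_size, scale_bits and unquantized_fraction are assumptions made
    for illustration; only the 16-bit to 4-bit change comes from the card.
    """
    # Each quantized element also pays for its share of the per-block scale.
    quantized_cost = quant_bits + scale_bits / block_size
    return (unquantized_fraction * base_bits
            + (1.0 - unquantized_fraction) * quantized_cost)


def compression_ratio(unquantized_fraction, base_bits=16, **kwargs):
    """Size of the 16-bit checkpoint divided by the size of the quantized one."""
    return base_bits / effective_bits(unquantized_fraction,
                                      base_bits=base_bits, **kwargs)


if __name__ == "__main__":
    # Hypothetical shares of parameters kept in 16-bit precision.
    for fraction in (0.0, 0.03, 0.10):
        ratio = compression_ratio(fraction)
        print(f"unquantized share {fraction:.0%}: {ratio:.2f}x smaller")
```

Under these assumed values, keeping about 3% of the parameters in 16-bit precision gives a ratio close to 3.3x, which is consistent with the figure in the card but does not confirm how that figure was derived.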

NaNK
license:apache-2.0
4,663
14

C-RADIOv2-H

-
4,542
8

Qwen3-14B-FP4

Description: The NVIDIA Qwen3-14B FP4 model is the quantized version of Alibaba's Qwen3-14B model, which is an auto-regressive language model that uses an optimized transformer architecture. For more information, please check here. The NVIDIA Qwen3-14B FP4 model is quantized with TensorRT Model Optimizer.
This model is ready for commercial/non-commercial use.
Third-Party Community Consideration: This model is not owned or developed by NVIDIA. This model has been developed and built to a third party's requirements for this application and use case; see link to Non-NVIDIA (Qwen3-14B) Model Card.
License/Terms of Use: Use of this model is governed by the Apache License 2.0.
Use Case: Developers looking to take off-the-shelf, pre-quantized models for deployment in AI agent systems, chatbots, RAG systems, and other AI-powered applications.
Release Date: Hugging Face 09/15/2025 via https://huggingface.co/nvidia/Qwen3-14B-FP4
Model Architecture:
Architecture Type: Transformers
Network Architecture: Qwen3-14B
This model was developed based on Qwen3-14B.
Number of model parameters: 14.8 × 10^9
Input:
Input Type(s): Text
Input Format(s): String
Input Parameters: 1D (One-Dimensional): Sequences
Other Properties Related to Input: Context length up to 131K
Output:
Output Type(s): Text
Output Format: String
Output Parameters: 1D (One-Dimensional): Sequences
Other Properties Related to Output: N/A
Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g., GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
Software Integration: Supported Runtime Engine(s): TensorRT-LLM Supported Hardware Microarchitecture Compatibility: NVIDIA Blackwell Model Version(s): The model is quantized with nvidia-modelopt v0.35.0 Post Training Quantization This model was obtained by quantizing the weights and activations of Qwen3-14B to FP4 data type, ready for inference with TensorRT-LLM. Only the weights and activations of the linear operators within transformer blocks are quantized. Calibration Dataset: Link: cnndailymail Data collection method: Automated. Labeling method: Automated. Training Datasets: Data Collection Method by Dataset: Undisclosed Labeling Method by Dataset: Undisclosed Properties: Undisclosed Testing Dataset: Data Collection Method by Dataset: Undisclosed Labeling Method by Dataset: Undisclosed Properties: Undisclosed To deploy the quantized checkpoint with TensorRT-LLM LLM API, follow the sample codes below: NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns here.
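To make the Post Training Quantization paragraph above more concrete, here is a minimal sketch of what rounding a block of values to a 4-bit floating-point grid can look like. It assumes the E2M1 layout (1 sign bit, 2 exponent bits, 1 mantissa bit) and a simple per-block absolute-maximum scale. The card does not describe the algorithm that TensorRT Model Optimizer actually applies, so the function name `quantize_block` and the scaling rule are hypothetical.

```python
# Magnitudes representable by a 4-bit E2M1 float (assumed format).
E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def quantize_block(values):
    """Fake-quantize one block: scale into the grid range, round, rescale.

    Illustrative only; real toolchains use calibrated scales and store
    the 4-bit codes instead of the dequantized floats returned here.
    """
    amax = max(abs(v) for v in values)
    if amax == 0.0:
        return list(values)
    # Map the largest magnitude in the block onto the largest grid value.
    scale = amax / E2M1_GRID[-1]
    result = []
    for v in values:
        # Pick the nearest representable magnitude, then restore the sign.
        magnitude = min(E2M1_GRID, key=lambda g: abs(g - abs(v) / scale))
        result.append(magnitude * scale if v >= 0 else -magnitude * scale)
    return result


if __name__ == "__main__":
    block = [0.0, 0.3, -1.2, 6.0]
    print(quantize_block(block))  # → [0.0, 0.5, -1.0, 6.0]
```

The example shows why small values lose relative precision: with only eight magnitudes available, 0.3 is pushed to 0.5 while the block maximum is preserved exactly.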

NaNK
license:apache-2.0
4,387
2

DeepSeek-R1-NVFP4

NaNK
license:mit
4,361
265

AceReason-Nemotron-7B

NaNK
-
4,288
19

Qwen3-235B-A22B-NVFP4

NaNK
license:apache-2.0
4,205
3

parakeet-tdt_ctc-0.6b-ja

`parakeet-tdt_ctc-0.6b-ja` is an ASR model that transcribes Japanese speech with punctuation. This model is developed by the NVIDIA NeMo team. It is an XL version of the Hybrid FastConformer [1] TDT-CTC [2] model (around 0.6B parameters). See the model architecture section and NeMo documentation for complete architecture details.
To train, fine-tune or play with the model you will need to install NVIDIA NeMo. We recommend you install it after you've installed the latest PyTorch version.
The model is available for use in the NeMo Framework [3], and can be used as a pre-trained checkpoint for inference or for fine-tuning on another dataset.
By default the model uses TDT to transcribe audio files; to switch the decoder to CTC, use `decoding_type='ctc'`.
This model accepts 16000 Hz mono-channel audio (wav files) as input. This model provides transcribed speech as a string for a given audio sample.
This model uses a Hybrid FastConformer-TDT-CTC architecture. FastConformer [1] is an optimized version of the Conformer model with 8x depthwise-separable convolutional downsampling. You may find more information on the details of FastConformer here: Fast-Conformer Model.
TDT (Token-and-Duration Transducer) [2] is a generalization of conventional Transducers that decouples token and duration predictions. Unlike conventional Transducers, which produce a lot of blanks during inference, a TDT model can skip the majority of blank predictions by using the duration output (up to 4 frames for this `parakeet-tdt_ctc-0.6b-ja` model), which brings a significant inference speed-up. The details of TDT can be found here: Efficient Sequence Transduction by Jointly Predicting Tokens and Durations.
The NeMo Framework [3] was used for training this model with this example script and this base config.
The model was trained for 300k steps with dynamic bucketing and a batch duration of 600s per GPU on 32 NVIDIA A100 80GB GPUs, and then fine-tuned for 100k additional steps on the modified training data (predicted texts for training samples with CER>10%). The SentencePiece [4] tokenizer, with 3072 tokens, was built using the text transcripts of the train set with this script.
The model was trained on the ReazonSpeech v2.0 [5] speech corpus, which contains more than 35k hours of natural Japanese speech.
The following table summarizes the performance of this model in terms of Character Error Rate (CER%). In the CER calculation, punctuation marks and non-alphabet characters are removed, and numbers are transformed to words using the `num2words` library [6].
|Version|Decoder|JSUT basic5000|MCV 8.0 test|MCV 16.1 dev|MCV 16.1 test|TEDxJP-10k|
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| 1.23.0 | TDT | 6.4 | 7.1 | 10.1 | 13.2 | 9.0 |
| 1.23.0 | CTC | 6.5 | 7.2 | 10.2 | 13.3 | 9.1 |
NVIDIA Riva is an accelerated speech AI SDK deployable on-prem, in all clouds, multi-cloud, hybrid, on edge, and embedded. Additionally, Riva provides:
- World-class out-of-the-box accuracy for the most common languages with model checkpoints trained on proprietary data with hundreds of thousands of GPU-compute hours
- Best-in-class accuracy with run-time word boosting (e.g., brand and product names) and customization of acoustic model, language model, and inverse text normalization
- Streaming speech recognition, Kubernetes-compatible scaling, and enterprise-grade support
Although this model isn't supported yet by Riva, the list of supported models is here. Check out the Riva live demo.
References
[1] Fast Conformer with Linearly Scalable Attention for Efficient Speech Recognition
[2] Efficient Sequence Transduction by Jointly Predicting Tokens and Durations
[6] num2words library - Convert numbers to words in multiple languages
License to use this model is covered by the CC-BY-4.0.
By downloading the public and release version of the model, you accept the terms and conditions of the CC-BY-4.0 license.
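The Character Error Rate used in the table above can be sketched as follows. This is a minimal illustration rather than the evaluation script behind the reported numbers: it strips punctuation and symbols before scoring, as the card describes, but it leaves out the `num2words` number conversion, and the helper names are hypothetical.

```python
import unicodedata


def strip_non_text(text):
    """Keep only letters and digits; drop punctuation, symbols, and spaces."""
    return "".join(ch for ch in text
                   if unicodedata.category(ch)[0] in ("L", "N"))


def edit_distance(ref, hyp):
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (r != h)))   # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character Error Rate in percent, relative to the reference length."""
    ref = strip_non_text(reference)
    hyp = strip_non_text(hypothesis)
    if not ref:
        raise ValueError("reference has no scorable characters")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Punctuation differences alone do not count as errors.
    print(cer("今日は、晴れです。", "今日は晴れです"))  # → 0.0
    print(cer("abcd", "abxd"))                          # → 25.0
```

Scoring at the character level suits Japanese because the text has no whitespace word boundaries, so a word-level metric would first require a tokenizer.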

NaNK
license:cc-by-4.0
4,198
36

NVIDIA-Nemotron-Nano-12B-v2-Base

NaNK
-
4,176
80

Qwen3-8B-FP8

Description: The NVIDIA Qwen3-8B FP8 model is the quantized version of Alibaba's Qwen3-8B model, which is an auto-regressive language model that uses an optimized transformer architecture. For more information, please check here. The NVIDIA Qwen3-8B FP8 model is quantized with TensorRT Model Optimizer.
This model is ready for commercial/non-commercial use.
Third-Party Community Consideration: This model is not owned or developed by NVIDIA. This model has been developed and built to a third party's requirements for this application and use case; see link to Non-NVIDIA (Qwen3-8B) Model Card.
License/Terms of Use: Use of this model is governed by the Apache License 2.0.
Use Case: Developers looking to take off-the-shelf, pre-quantized models for deployment in AI agent systems, chatbots, RAG systems, and other AI-powered applications.
Release Date: Hugging Face 09/15/2025 via https://huggingface.co/nvidia/Qwen3-8B-FP8
Model Architecture:
Architecture Type: Transformers
Network Architecture: Qwen3-8B
This model was developed based on Qwen3-8B.
Number of model parameters: 8.2 × 10^9
Input:
Input Type(s): Text
Input Format(s): String
Input Parameters: 1D (One-Dimensional): Sequences
Other Properties Related to Input: Context length up to 131K
Output:
Output Type(s): Text
Output Format: String
Output Parameters: 1D (One-Dimensional): Sequences
Other Properties Related to Output: N/A
Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g., GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
Software Integration: Supported Runtime Engine(s): TensorRT-LLM Supported Hardware Microarchitecture Compatibility: NVIDIA Blackwell Model Version(s): The model is quantized with nvidia-modelopt v0.35.0 Post Training Quantization This model was obtained by quantizing the weights and activations of Qwen3-8B to FP8 data type, ready for inference with TensorRT-LLM. Only the weights and activations of the linear operators within transformer blocks are quantized. Calibration Dataset: Link: cnndailymail Data collection method: Automated. Labeling method: Automated. Training Datasets: Data Collection Method by Dataset: Undisclosed Labeling Method by Dataset: Undisclosed Properties: Undisclosed Testing Dataset: Data Collection Method by Dataset: Undisclosed Labeling Method by Dataset: Undisclosed Properties: Undisclosed To deploy the quantized checkpoint with TensorRT-LLM LLM API, follow the sample codes below: NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns here.

NaNK
license:apache-2.0
3,902
0

Qwen3-235B-A22B-FP4

NaNK
license:apache-2.0
3,787
3

Mistral-NeMo-Minitron-8B-Base

Mistral-NeMo-Minitron-8B-Base is a base text-to-text model that can be adopted for a variety of natural language generation tasks. It is a large language model (LLM) obtained by pruning and distilling the Mistral-NeMo 12B; specifically, we prune the embedding dimension and MLP intermediate dimension in the model. Following pruning, we perform continued training with distillation using 380 billion tokens to arrive at the final model; we use the continuous pre-training data corpus used in Nemotron-4 15B for this purpose. Please refer to our technical report for more details. Model Dates: Mistral-NeMo-Minitron-8B-Base was trained between July 24, 2024 and August 10, 2024. This model is released under the NVIDIA Open Model License Agreement. Mistral-NeMo-Minitron-8B-Base uses a model embedding size of 4096, 32 attention heads, MLP intermediate dimension of 11520, with 40 layers in total. Additionally, it uses Grouped-Query Attention (GQA) and Rotary Position Embeddings (RoPE). Architecture Type: Transformer Decoder (Auto-Regressive Language Model) Other Properties Related to Input: Works well within 8k characters or less. Usage Support for this model will be added in the upcoming `transformers` release. In the meantime, please install the library from source: Supported Hardware Microarchitecture Compatibility: NVIDIA Ampere NVIDIA Blackwell NVIDIA Hopper NVIDIA Lovelace Properties: The training corpus for Mistral-NeMo-Minitron-8B-Base consists of English and multilingual text, as well as code. Our sources cover a variety of document types such as: webpages, dialogue, articles, and other written materials. The corpus spans domains including legal, math, science, finance, and more. In our continued training set, we introduce a small portion of question-answering, and alignment style data to improve model performance. Data Freshness: Training was done in 2024, the pretraining data has a cutoff of June 2023. 5-shot performance. 
Language Understanding evaluated using Massive Multitask Language Understanding:
Zero-shot performance. Evaluated using select datasets from the LM Evaluation Harness with additions:
| HellaSwag | Winogrande | GSM8K | ARC-Challenge | XLSum |
| :---- | :---- | :---- | :---- | :---- |
| 83.0 | 80.4 | 58.5 | 64.4 | 32.0 |
Code generation performance. Evaluated using MBPP:
| Score |
| :---- |
| 43.77 |
The model was trained on data that contains toxic language, unsafe content, and societal biases originally crawled from the internet. Therefore, the model may amplify those biases and return toxic responses, especially when prompted with toxic prompts. The model may generate answers that may be inaccurate, omit key information, or include irrelevant or redundant text, producing socially unacceptable or undesirable text, even if the prompt itself does not include anything explicitly offensive.
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please report security vulnerabilities or NVIDIA AI Concerns here.
References:
- Minitron: Compact Language Models via Pruning and Knowledge Distillation
- LLM Pruning and Distillation in Practice: The Minitron Approach
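As a rough sanity check on the architecture figures quoted in the card (embedding size 4096, 32 attention heads, MLP intermediate dimension 11520, 40 layers), the sketch below estimates the parameter count of such a decoder. Several inputs are assumptions that the card does not state: 8 key/value heads for GQA, a vocabulary of 131072 tokens, untied input and output embeddings, and a gated MLP with three weight matrices. Normalization weights are ignored.

```python
def decoder_param_estimate(hidden=4096, layers=40, heads=32, mlp_dim=11520,
                           kv_heads=8, vocab=131072, tied_embeddings=False):
    """Approximate parameter count of a GQA transformer decoder.

    hidden, layers, heads and mlp_dim come from the card; kv_heads, vocab
    and tied_embeddings are assumptions made for this illustration.
    """
    head_dim = hidden // heads
    # Query and output projections are hidden x hidden.
    q_and_o = 2 * hidden * hidden
    # Key and value projections shrink with the number of KV heads.
    k_and_v = 2 * hidden * kv_heads * head_dim
    # Gated MLP: gate, up and down projections (assumed structure).
    mlp = 3 * hidden * mlp_dim
    per_layer = q_and_o + k_and_v + mlp
    embeddings = vocab * hidden * (1 if tied_embeddings else 2)
    return layers * per_layer + embeddings


if __name__ == "__main__":
    total = decoder_param_estimate()
    print(f"{total:,} parameters (about {total / 1e9:.2f}B)")
```

With these assumed values the estimate comes out near 8.4 billion parameters, which is in line with the "8B" in the model name. Most of the total sits in the MLP blocks, which fits the card's note that pruning targeted the embedding and MLP intermediate dimensions.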

NaNK
-
3,698
176

parakeet-ctc-0.6b

NaNK
license:cc-by-4.0
3,587
19

esm2_t48_15B_UR50D

NaNK
license:mit
3,524
4

nemo-nano-codec-22khz-0.6kbps-12.5fps

-
3,485
11

Llama-3.1-405B-Instruct-NVFP4

NaNK
llama
3,464
5

Qwen3-32B-FP4

Description: The NVIDIA Qwen3-32B FP4 model is the quantized version of Alibaba's Qwen3-32B model, which is an auto-regressive language model that uses an optimized transformer architecture. For more information, please check here. The NVIDIA Qwen3-32B FP4 model is quantized with TensorRT Model Optimizer.
This model is ready for commercial/non-commercial use.
Third-Party Community Consideration: This model is not owned or developed by NVIDIA. This model has been developed and built to a third party's requirements for this application and use case; see link to Non-NVIDIA (Qwen3-32B) Model Card.
Use Case: Developers looking to take off-the-shelf, pre-quantized models for deployment in AI agent systems, chatbots, RAG systems, and other AI-powered applications.
Release Date: Hugging Face 09/15/2025 via https://huggingface.co/nvidia/Qwen3-32B-FP4
Model Architecture:
Architecture Type: Transformers
Network Architecture: Qwen3-32B
This model was developed based on Qwen3-32B.
Number of model parameters: 32.8B
Input:
Input Type(s): Text
Input Format(s): String
Input Parameters: 1D (One-Dimensional): Sequences
Other Properties Related to Input: Context length up to 131K
Output:
Output Type(s): Text
Output Format: String
Output Parameters: 1D (One-Dimensional): Sequences
Other Properties Related to Output: N/A
Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g., GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
Software Integration:
Supported Runtime Engine(s): TensorRT-LLM
Supported Hardware Microarchitecture Compatibility: NVIDIA Blackwell
Model Version(s): The model is quantized with nvidia-modelopt v0.35.0
Post Training Quantization: This model was obtained by quantizing the weights and activations of Qwen3-32B to FP4 data type, ready for inference with TensorRT-LLM.
Only the weights and activations of the linear operators within transformer blocks are quantized. Training, Testing, and Evaluation Datasets: Data Modality [Text] Calibration Dataset: Link: cnndailymail Data collection method: Automated. Labeling method: Automated. Training Datasets: Data Collection Method by Dataset: Undisclosed Labeling Method by Dataset: Undisclosed Properties: Undisclosed Testing Dataset: Data Collection Method by Dataset: Undisclosed Labeling Method by Dataset: Undisclosed Properties: Undisclosed Evaluation Dataset: Datasets: MMLU Pro, GPQA Diamond, HLE, LiveCodeBench, SciCode, HumanEval, AIME 2024, MATH-500 Data collection method: Hybrid: Automated, Human Labeling method: Hybrid: Human, Automated To deploy the quantized checkpoint with TensorRT-LLM LLM API, follow the sample codes below: Evaluation The accuracy benchmark results are presented in the table below: NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns here.

NaNK
license:apache-2.0
3,427
1

Llama-3.1-405B-Instruct-FP8

NaNK
llama
3,318
12

mit-b4

-
3,128
1

Cosmos-Tokenize1-CV8x8x8-720p

-
3,122
7

Qwen3-14B-FP8

Description: The NVIDIA Qwen3-14B FP8 model is the quantized version of Alibaba's Qwen3-14B model, which is an auto-regressive language model that uses an optimized transformer architecture. For more information, please check here. The NVIDIA Qwen3-14B FP8 model is quantized with TensorRT Model Optimizer.
This model is ready for commercial/non-commercial use.
Third-Party Community Consideration: This model is not owned or developed by NVIDIA. This model has been developed and built to a third party's requirements for this application and use case; see link to Non-NVIDIA (Qwen3-14B) Model Card.
License/Terms of Use: Use of this model is governed by the Apache License 2.0.
Use Case: Developers looking to take off-the-shelf, pre-quantized models for deployment in AI agent systems, chatbots, RAG systems, and other AI-powered applications.
Release Date: Hugging Face 09/15/2025 via https://huggingface.co/nvidia/Qwen3-14B-FP8
Model Architecture:
Architecture Type: Transformers
Network Architecture: Qwen3-14B
This model was developed based on Qwen3-14B.
Number of model parameters: 14.8 × 10^9
Input:
Input Type(s): Text
Input Format(s): String
Input Parameters: 1D (One-Dimensional): Sequences
Other Properties Related to Input: Context length up to 131K
Output:
Output Type(s): Text
Output Format: String
Output Parameters: 1D (One-Dimensional): Sequences
Other Properties Related to Output: N/A
Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g., GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
Software Integration: Supported Runtime Engine(s): TensorRT-LLM Supported Hardware Microarchitecture Compatibility: NVIDIA Blackwell Model Version(s): The model is quantized with nvidia-modelopt v0.35.0 Post Training Quantization This model was obtained by quantizing the weights and activations of Qwen3-14B to FP8 data type, ready for inference with TensorRT-LLM. Only the weights and activations of the linear operators within transformer blocks are quantized. Calibration Dataset: Link: cnndailymail Data collection method: Automated. Labeling method: Automated. Training Datasets: Data Collection Method by Dataset: Undisclosed Labeling Method by Dataset: Undisclosed Properties: Undisclosed Testing Dataset: Data Collection Method by Dataset: Undisclosed Labeling Method by Dataset: Undisclosed Properties: Undisclosed To deploy the quantized checkpoint with TensorRT-LLM LLM API, follow the sample codes below: NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns here.

NaNK
license:apache-2.0
2,952
0

Nemotron-Flash-3B-Instruct

NaNK
-
2,855
7

Llama-3.1-Minitron-4B-Width-Base

NaNK
llama
2,598
192

Llama-3.1-405B-Instruct-FP4

NaNK
llama
2,535
5

Qwen2.5-VL-7B-Instruct-FP4

Description: The NVIDIA Qwen2.5-VL-7B-Instruct-FP4 model is the quantized version of Alibaba's Qwen2.5-VL-7B-Instruct model, which is an auto-regressive language model that uses an optimized transformer architecture. For more information, please check here. The NVIDIA Qwen2.5-VL-7B-Instruct-FP4 model is quantized with TensorRT Model Optimizer.
This model is ready for commercial/non-commercial use.
Third-Party Community Consideration: This model is not owned or developed by NVIDIA. It was developed and built to a third party's requirements for this application and use case. See the Non-NVIDIA (Qwen2.5-VL-7B-Instruct) Model Card.
License/Terms of Use: Use of this model is governed by nvidia-open-model-license. ADDITIONAL INFORMATION: Apache 2.0.
Deployment Geography: Global, except in the European Union
Use Case: Developers looking to take off-the-shelf, pre-quantized models for deployment in AI agent systems, chatbots, RAG systems, and other AI-powered applications.
Release Date: Hugging Face 08/22/2025 via https://huggingface.co/nvidia/Qwen2.5-VL-7B-Instruct-FP4
Model Architecture:
Architecture Type: Transformers
Network Architecture: Qwen2.5-VL-7B
This model was developed based on Qwen2.5-VL-7B.
Number of model parameters: 7 × 10^9
Input:
Input Type(s): Multilingual text, and images
Input Format(s): String, Images
Input Parameters: One-Dimensional (1D), Two-Dimensional (2D)
Other Properties Related to Input: Context length up to 32K
Output:
Output Type(s): Text
Output Format: String
Output Parameters: 1D (One-Dimensional): Sequences
Other Properties Related to Output: N/A
Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g., GPU cores) and software frameworks (e.g., CUDA libraries), they achieve faster training and inference times compared to CPU-only solutions.
Software Integration: Supported Runtime Engine(s): TensorRT-LLM Supported Hardware Microarchitecture Compatibility: NVIDIA Blackwell Model Version(s): The model is quantized with nvidia-modelopt v0.35.0 Post Training Quantization This model was obtained by quantizing the weights and activations of Qwen2.5-VL-7B-Instruct to FP4 data type, ready for inference with TensorRT-LLM. Only the weights and activations of the linear operators within transformer blocks of the language model are quantized. Training, Testing, and Evaluation Datasets: Data Modality [Image] [Text] Calibration Dataset: Link: cnndailymail Data collection method: Automated. Labeling method: Automated. Training Datasets: Data Collection Method by Dataset: Undisclosed Labeling Method by Dataset: Undisclosed Properties: Undisclosed Testing Dataset: Data Collection Method by Dataset: Undisclosed Labeling Method by Dataset: Undisclosed Properties: Undisclosed To deploy the quantized checkpoint with TensorRT-LLM LLM API, follow the sample codes below: NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns here.

NaNK
-
2,512
3

Qwen2.5-VL-7B-Instruct-NVFP4

NaNK
-
2,449
3

parakeet-tdt-1.1b

NaNK
license:cc-by-4.0
2,378
110

groupvit-gcc-yfcc

-
2,359
6

Cosmos-Transfer1-7B

NaNK
-
2,258
57

llama-3.1-nemoguard-8b-content-safety

Llama Nemotron Safety Guard V2, formerly known as Llama-3.1-NemoGuard-8B-ContentSafety, is a content safety model trained on the Nemotron Content Safety Dataset V2 that moderates human-LLM interaction content and classifies user prompts and LLM responses as safe or unsafe. If the content is unsafe, the model additionally returns a response with a list of categories that the content violates. The base large language model (LLM) is the multilingual Llama-3.1-8B-Instruct model from Meta. NVIDIA's optimized release is LoRA-tuned on approved datasets and better conforms to NVIDIA's content safety risk taxonomy and other safety risks in human-LLM interactions.
The model can be prompted using an instruction and a taxonomy of unsafe risks to be categorized. The instruction format for prompt moderation is shown below under input and output examples.
Governing NVIDIA Download Terms & Third-Party Component Attribution Terms (Hugging Face LoRA weights)
GOVERNING TERMS: Use of this model is governed by the NVIDIA Open Model License Agreement. Additional Information: Llama 3.1 Community License Agreement. Built with Llama.
Network Architecture: The base model architecture is based on the Llama-3.1-8B-Instruct model from Meta (Model Card). We perform Parameter-Efficient Fine-Tuning (PEFT) over the above base model using the following network architecture parameters:
- Rank: 16
- Alpha: 32
- Targeted low-rank adaptation modules: 'k_proj', 'q_proj', 'v_proj', 'o_proj', 'up_proj', 'down_proj', 'gate_proj', 'lm_head'.
The training method for Llama Nemotron Safety Guard V2 involves the following concepts:
- A system prompt, including the Nemotron Content Safety Dataset V2 Taxonomy, which is a safety policy that contains a list of unsafe categories.
- Novel safety risk categories and policies can be provided in the instruction for the model to predict categories of violation if unsafe.
- The safety taxonomy and policy used to train the model contains 23 critically unsafe risk categories, a safe category and a "needs caution" category.
- An internally annotated dataset, called Nemotron Content Safety Dataset V2, of approximately 30,000 prompts and responses is used to instruction-tune the model.
- The model is instruction-tuned to follow either safety or topic-following system prompts, with the LLM behaving as a classifier in both settings.
- The model can return labels for both user and bot messages together in one inference call, if they both exist in the payload. This is unlike previous models in this space, where the system prompt needs to instruct the LLM to moderate either the user or an LLM turn. See the section on output format for more information.
## Prompt Format:
The prompt template consists of the Nemotron Content Safety Dataset V2 Taxonomy followed by placeholders for either a user message alone, or a user message and a bot response, and finally an instruction with the task and output format.
### Example Model Input/Output for prompt safety classification:
Note that the tags only contain the `user:` field for prompt classification.
Input
### Example Model Input/Output for LLM response safety moderation:
Note that the tags contain the `response: agent:` field as a placeholder for the response from the agent, as well as the `user:` field as a placeholder for user input for response classification conditioned on the user prompt.
Input
## Inference Methods:
The model can be accessed as an NVIDIA NIM or in a HuggingFace pipeline using the provided LoRA adapter.
- TensorRT-LLM: This engine is optimized for inference speed, accelerated by TensorRT-LLM, an open-source library for optimizing inference performance.
It is available as a Docker container through NVIDIA Inference Microservice (NIM) (documentation).
- HuggingFace: Alternatively, you can download the provided LoRA adapter and use it in a conventional HuggingFace pipeline.

## Training and Evaluation Datasets:

Kindly read our paper for details about the creation of the taxonomy, curation of the training dataset, and results on content moderation evaluation benchmarks.

## Ethical Considerations

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with the terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.

Due to the serious nature of this project, annotators were asked to join on a volunteer basis based on their skill level, availability, and willingness to expose themselves to potentially unsafe content. Before work on this project began, all participants were asked to sign an Adult Content Acknowledgement that coincides with the organization's existing Anti-Harassment Policy and Code of Conduct. This was to ensure that all annotators were made aware of the nature of this work, as well as the resources available to them should it affect their mental well-being. Regular 1:1 meetings were held between the leads assigned to this project and each annotator to make sure they were still comfortable with the material and capable of continuing with this type of work.

Throughout the six-month span of the Content Moderation Guardrails project, we averaged twelve annotators at any given time. Of these twelve, four annotators come from Engineering backgrounds specializing in data analysis and collection, gaming, and robotics. Eight annotators have a background in Creative Writing, with specialization in linguistics, research and development, and other creative arts such as photography and film. All annotators have been extensively trained in working with Large Language Models (LLMs), as well as other variations of Generative AI such as image retrieval or evaluations of multi-turn conversations. All are capable of generating creative text-based output and categorization work. Each of these twelve annotators resides in the United States, all from various ethnic and religious backgrounds that allow for representation across race, age, and social status.

The creation of the Nemotron Content Safety Dataset V2 abides by ethical data categorization practices and is based on Label Studio, an open source data labeling tool often used for the organization's internal projects. This tooling allows large sets of data to be analyzed by individual annotators without seeing the work of their peers. This is essential in preventing bias between annotators, as well as in delivering prompts to each individual with variability, so that no one annotator is completing similar tasks based on how the data was initially arranged.

For more detailed information on ethical considerations for this model, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards. Please report security vulnerabilities or NVIDIA AI Concerns here.
## Bias

| Field | Response |
|:---|:---|
| Participation considerations from adversely impacted groups (protected classes) in model design and testing: | None of the Above |
| Measures taken to mitigate against unwanted bias: | None of the Above |

## Explainability

| Field | Response |
|:---|:---|
| Intended Application & Domain: | AI Content Safety Risks Moderation |
| Model Type: | Content Classification |
| Intended User: | This model is intended for enterprise developers building large language models. |
| Output: | Text (Safe/UnSafe String); Unsafe Category ID if content is unsafe |
| Describe how the model works: | Classifies user prompt as safe or unsafe per taxonomy and policy. If unsafe, the model predicts the category ID of violation. |
| Name the adversely impacted groups this has been tested to deliver comparable outcomes regardless of: | None of the Above |
| Verified to have met prescribed quality standards: | Yes |
| Performance Metrics: | Area under Precision-Recall (PR) curve (AUPRC), F-1 Score, Jailbreak Resiliency |
| Potential Known Risks: | Even though we have tested the jailbreak resiliency of this model, the safety model, since it is an LLM, can still be jailbroken through adversarial prompts, leading the model to predict unsafe content hidden inside an adversarial prompt as "safe". The model may also flag content as a false positive or false negative under a certain unsafe category. |
| Licensing: | Use of this model is governed by the NVIDIA Open Model License Agreement. Additional Information: Llama 3.1 Community License Agreement. Built with Llama. |
## Privacy

| Field | Response |
|:---|:---|
| Generatable or reverse engineerable personally-identifiable information (PII)? | None |
| Was consent obtained for any PII used? | Not Applicable |
| PII used to create this model? | None Known |
| How often is dataset reviewed? | During dataset creation, model training, evaluation and before release |
| Is a mechanism in place to honor data subject right of access or deletion of personal data? | Not Applicable |
| If PII collected for the development of the model, was it collected directly by NVIDIA? | Not Applicable |
| If PII collected for the development of the model by NVIDIA, do you maintain or have access to disclosures made to data subjects? | Not Applicable |
| If PII collected for the development of this AI model, was it minimized to only what was required? | Not Applicable |
| Is there provenance for all datasets used in training? | Yes |
| Does data labeling (annotation, metadata) comply with privacy laws? | Yes |
| Is data compliant with data subject requests for data correction or removal, if such a request was made? | Not Applicable |

## Safety

| Field | Response |
|:---|:---|
| Model Application(s): | Large Language Model-based Content Safety & Moderation |
| Describe the life critical impact (if present). | Not Applicable |
| Use Case Restrictions: | Abide by Llama 3.1 License Agreement |
| Model and dataset restrictions: | The Principle of least privilege (PoLP) is applied limiting access for dataset generation and model development. Restrictions enforce dataset access during training, and dataset license constraints adhered to. |

Dataset Card Authors: Prasoon Varshney: [email protected]

Contacts: Prasoon Varshney, Shaona Ghosh: {prasoonv,shaonag}@nvidia.com
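The LoRA settings listed under Network Architecture for this model (rank 16, alpha 32) imply a fixed scaling factor and a small number of added weights per adapted matrix. The sketch below works through that arithmetic under the standard LoRA formulation, where the update to a weight matrix is `(alpha / rank) * B @ A`. The 4096 x 4096 projection shape is an illustrative assumption and is not a figure taken from this card.

```python
def lora_scaling(rank: int, alpha: int) -> float:
    """Scaling applied to the low-rank update B @ A in standard LoRA."""
    return alpha / rank


def lora_added_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable weights added to one adapted matrix.

    A has shape (rank, d_in) and B has shape (d_out, rank).
    """
    return rank * d_in + d_out * rank


RANK, ALPHA = 16, 32      # values stated in the card
D_IN = D_OUT = 4096       # assumed projection shape, for illustration only

scaling = lora_scaling(RANK, ALPHA)
added = lora_added_params(D_IN, D_OUT, RANK)
full = D_IN * D_OUT

print(f"scaling factor: {scaling}")
print(f"added weights per matrix: {added} ({added / full:.2%} of the full matrix)")
```

Under these assumptions the adapter trains well under one percent of the weights of each adapted projection, which is why the release can ship as a small LoRA adapter on top of the base model.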

base_model:meta-llama/Llama-3.1-8B-Instruct
2,228
30

Mistral-NeMo-12B-Instruct

license:apache-2.0
2,199
163

Qwen3-32B-NVFP4

license:apache-2.0
2,172
1

NVIDIA-Nemotron-Nano-9B-v2-NVFP4

The pretraining data has a cutoff date of September 2024. NVIDIA-Nemotron-Nano-9B-v2-NVFP4 is a quantized version of NVIDIA-Nemotron-Nano-9B-v2 and is a large language model (LLM) trained from scra...

-
2,166
10

stt_kk_ru_fastconformer_hybrid_large

license:cc-by-4.0
2,041
3

Frame_VAD_Multilingual_MarbleNet_v2.0

-
2,017
22

Cosmos-Embed1-448p

-
1,986
2

NV-Embed-v1

license:cc-by-nc-4.0
1,963
426

MambaVision-T-1K

-
1,961
37

Eagle2-2B

license:cc-by-nc-4.0
1,950
30

OpenReasoning-Nemotron-14B

license:cc-by-4.0
1,857
43

Nemotron-H-47B-Base-8K

-
1,838
21

Llama-3.1-Nemotron-70B-Reward-HF

llama
1,817
89

esm2_t6_8M_UR50D

license:mit
1,759
2

Nemotron-Mini-4B-Instruct

Nemotron-Mini-4B-Instruct is a model for generating responses for roleplaying, retrieval augmented generation, and function calling. It is a small language model (SLM) optimized through distillation, pruning and quantization for speed and on-device deployment. It is a fine-tuned version of nvidia/Minitron-4B-Base, which was pruned and distilled from Nemotron-4 15B using our LLM compression technique. This instruct model is optimized for roleplay, RAG QA, and function calling in English. It supports a context length of 4,096 tokens. This model is ready for commercial use.

For more details about how this model is used for NVIDIA ACE, please refer to this blog post and this demo video, which showcases how the model can be integrated into a video game. You can download the model checkpoint for NVIDIA AI Inference Manager (AIM) SDK from here.

Model Dates: Nemotron-Mini-4B-Instruct was trained between February 2024 and August 2024.

Nemotron-Mini-4B-Instruct uses a model embedding size of 3072, 32 attention heads, and an MLP intermediate dimension of 9216. It also uses Grouped-Query Attention (GQA) and Rotary Position Embeddings (RoPE).

Architecture Type: Transformer Decoder (auto-regressive language model)

We recommend using the following prompt template, which was used to fine-tune the model. The model may not perform optimally without it.

You can also use `pipeline`, but you need to create a tokenizer object and assign it to the pipeline manually.

The Nemotron-Mini-4B-Instruct model underwent AI safety evaluation including adversarial testing via three distinct methods:
- Garak is an automated LLM vulnerability scanner that probes for common weaknesses, including prompt injection and data leakage.
- AEGIS is a content safety evaluation dataset and LLM-based content safety classifier model that adheres to a broad taxonomy of 13 categories of critical risks in human-LLM interactions.
- Human Content Red Teaming leverages human interaction and evaluation of the model's responses.

The model was trained on data that contains toxic language and societal biases originally crawled from the internet. Therefore, the model may amplify those biases and return toxic responses, especially when prompted with toxic prompts. The model may generate answers that are inaccurate, omit key information, or include irrelevant or redundant text, producing socially unacceptable or undesirable text even if the prompt itself does not include anything explicitly offensive. This issue could be exacerbated without the use of the recommended prompt template.

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. For more detailed information on ethical considerations for this model, please see the Model Card++. Please report security vulnerabilities or NVIDIA AI Concerns here.

llama-3
1,664
171

Llama-3_1-Nemotron-Ultra-253B-v1

Llama-3.1-Nemotron-Ultra-253B-v1 is a large language model (LLM) which is a derivative of Meta Llama-3.1-405B-Instruct (AKA the reference model). It is a reasoning model that is post-trained for reasoning, human chat preferences, and tasks such as RAG and tool calling. The model supports a context length of 128K tokens. This model fits on a single 8xH100 node for inference.

Llama-3.1-Nemotron-Ultra-253B-v1 is a model which offers a great tradeoff between model accuracy and efficiency. Efficiency (throughput) directly translates to savings. Using a novel Neural Architecture Search (NAS) approach, we greatly reduce the model's memory footprint, enabling larger workloads, as well as reducing the number of GPUs required to run the model in a data center environment. This NAS approach enables the selection of a desired point in the accuracy-efficiency tradeoff. Furthermore, by using a novel method to vertically compress the model (see details here), it also offers a significant improvement in latency.

The model underwent a multi-phase post-training process to enhance both its reasoning and non-reasoning capabilities. This includes a supervised fine-tuning stage for Math, Code, Reasoning, Chat, and Tool Calling as well as multiple reinforcement learning (RL) stages using Group Relative Policy Optimization (GRPO) algorithms for reasoning, chat, and instruction-following. For more details on how the model was trained, please see our technical report and blog.

This model is part of the Llama Nemotron Collection. You can find the other model(s) in this family here:
- Llama-3.1-Nemotron-Nano-8B-v1
- Llama-3.3-Nemotron-Super-49B-v1

We want to hear from you! Share your ideas, vote on what matters, and help shape the future of Nemotron.

GOVERNING TERMS: Your use of this model is governed by the NVIDIA Open Model License. Additional Information: Llama 3.1 Community License Agreement. Built with Llama.
Model Dates: Trained between November 2024 and April 2025

Data Freshness: The pretraining data has a cutoff of 2023 per Llama-3.1-405B-Instruct

Use Case: Developers designing AI Agent systems, chatbots, RAG systems, and other AI-powered applications. Also suitable for typical instruction-following tasks.

- [\[2505.00949\] Llama-Nemotron: Efficient Reasoning Models](https://arxiv.org/abs/2505.00949)
- [\[2502.00203\] Reward-aware Preference Optimization: A Unified Mathematical Framework for Model Alignment](https://arxiv.org/abs/2502.00203)
- [\[2411.19146\] Puzzle: Distillation-Based NAS for Inference-Optimized LLMs](https://arxiv.org/abs/2411.19146)
- [\[2503.18908\] FFN Fusion: Rethinking Sequential Computation in Large Language Models](https://arxiv.org/abs/2503.18908)

Model Architecture

Architecture Type: Dense decoder-only Transformer model
Network Architecture: Llama-3.1-405B-Instruct, customized through Neural Architecture Search (NAS)

This model was developed based on Llama-3.1-405B-Instruct. This model has 253B model parameters.

The model is a derivative of Llama-3.1-405B-Instruct, using Neural Architecture Search (NAS). The NAS algorithm results in non-standard and non-repetitive blocks. This includes the following:
- Skip attention: In some blocks, the attention is skipped entirely, or replaced with a single linear layer.
- Variable FFN: The expansion/compression ratio in the FFN layer is different between blocks.
- FFN Fusion: When several consecutive attention layers are skipped, which can result in a sequence of multiple FFNs, that sequence of FFNs is fused into a smaller number of wider FFN layers.

For each block of the reference model, we create multiple variants providing different tradeoffs of quality vs. computational complexity, discussed in more depth below. We then search over the blocks to create a model which meets the required throughput and memory while minimizing the quality degradation.
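To make the FFN Fusion idea above more concrete, the sketch below shows the algebraic identity that such a fusion can rest on: the summed output of several FFNs applied to the same input equals the output of one wider FFN whose input weights are stacked and whose output weights are concatenated. This is a simplified illustration and not the released model's procedure. It assumes plain ReLU FFNs (Llama-style models use a gated FFN) and assumes the consecutive FFNs are treated as if they all read the same input; all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
# One FFN = (W_in of shape hidden x d_model, W_out of shape d_model x hidden)
FFN = Tuple[Matrix, Matrix]


def relu(v: Sequence[float]) -> List[float]:
    return [max(0.0, x) for x in v]


def matvec(m: Matrix, v: Sequence[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def ffn(x: Sequence[float], w_in: Matrix, w_out: Matrix) -> List[float]:
    """y = W_out * relu(W_in * x)"""
    return matvec(w_out, relu(matvec(w_in, x)))


def fuse(ffns: Sequence[FFN]) -> FFN:
    """Build one wider FFN equal to the sum of the given parallel FFNs."""
    # Stack the input projections row-wise: hidden sizes add up.
    w_in = [row for w_in_i, _ in ffns for row in w_in_i]
    # Concatenate the output projections column-wise.
    d_model = len(ffns[0][1])
    w_out = [[c for _, w_out_i in ffns for c in w_out_i[r]] for r in range(d_model)]
    return w_in, w_out


# Two small FFNs over a 2-dimensional model state (toy weights).
ffn_a: FFN = ([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]])
ffn_b: FFN = ([[2.0, -1.0]], [[1.0], [-1.0]])
x = [1.0, -2.0]

separate = [a + b for a, b in zip(ffn(x, *ffn_a), ffn(x, *ffn_b))]
fused = ffn(x, *fuse([ffn_a, ffn_b]))
print(separate, fused)
```

The fused layer has hidden width 3 + 1 = 4 and produces the same output as evaluating the two FFNs separately and summing, which is what allows several narrow layers to be replaced by fewer, wider ones.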
To recover performance, the model initially undergoes knowledge distillation (KD) for 65 billion tokens. This is followed by a continual pretraining (CPT) phase for 88 billion tokens.

Llama-3.1-Nemotron-Ultra-253B-v1 is a general purpose reasoning and chat model intended to be used in English and coding languages. Other non-English languages (German, French, Italian, Portuguese, Hindi, Spanish, and Thai) are also supported.

Input
- Input Type: Text
- Input Format: String
- Input Parameters: One-Dimensional (1D)
- Other Properties Related to Input: Context length up to 131,072 tokens

Output
- Output Type: Text
- Output Format: String
- Output Parameters: One-Dimensional (1D)
- Other Properties Related to Output: Context length up to 131,072 tokens

Software Integration
- Runtime Engine: Transformers
- Recommended Hardware Microarchitecture Compatibility:
  - NVIDIA Hopper
  - NVIDIA Ampere
- Preferred Operating System(s): Linux

1. Reasoning mode (ON/OFF) is controlled via the system prompt, which must be set as shown in the example below. All instructions should be contained within the user prompt.
2. We recommend setting temperature to `0.6` and Top P to `0.95` for Reasoning ON mode.
3. We recommend using greedy decoding (temperature 0) for Reasoning OFF mode.
4. We do not recommend adding system prompts besides the control prompt; all instructions should be put into the user query.
5. We have provided a list of prompts to use for evaluation for each benchmark where a specific template is required.
6. The model will include empty reasoning tags if no reasoning was necessary in Reasoning ON mode; this is expected behaviour.

You can try this model out through the preview API, using this link: Llama-3_1-Nemotron-Ultra-253B-v1.

Use It with Transformers

See the snippet below for usage with the Hugging Face Transformers library. Reasoning mode (ON/OFF) is controlled via the system prompt. Please see the example below. We recommend using the transformers package with version 4.48.3.
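The recommendations above (a control system prompt, sampling at temperature 0.6 and top-p 0.95 for Reasoning ON, greedy decoding for Reasoning OFF) can be collected into a small helper. This is a minimal sketch with no model loading; the control strings `detailed thinking on` and `detailed thinking off` are assumptions, since the exact control prompt is not reproduced in this text, and `build_request` is a hypothetical helper name. The keys `do_sample`, `temperature`, and `top_p` follow the usual Hugging Face generation argument names.

```python
from typing import Dict, List

# Assumed control prompts; check the upstream model card for the exact strings.
REASONING_ON_PROMPT = "detailed thinking on"
REASONING_OFF_PROMPT = "detailed thinking off"


def build_request(user_prompt: str, reasoning: bool) -> Dict[str, object]:
    """Return chat messages plus generation settings for one request."""
    system = REASONING_ON_PROMPT if reasoning else REASONING_OFF_PROMPT
    # The system prompt carries only the control string; every instruction
    # goes into the user turn.
    messages: List[Dict[str, str]] = [
        {"role": "system", "content": system},
        {"role": "user", "content": user_prompt},
    ]
    if reasoning:
        generation_kwargs = {"do_sample": True, "temperature": 0.6, "top_p": 0.95}
    else:
        generation_kwargs = {"do_sample": False}  # greedy decoding
    return {"messages": messages, "generation_kwargs": generation_kwargs}


request = build_request("Solve x * (sin(x) + 2) = 0", reasoning=True)
print(request["generation_kwargs"])
```

The returned `messages` list is in the shape a chat template expects, and `generation_kwargs` can be unpacked into a generation call once a model and tokenizer are loaded.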
Example of reasoning on:

Test Hardware:
- BF16:
  - 8x NVIDIA H100-80GB
  - 4x NVIDIA B100
- FP8:
  - 4x NVIDIA H100-80GB

A large variety of training data was used for the knowledge distillation phase before the post-training pipeline, three of which were FineWeb, Buzz-V1.2, and Dolma. The data for the multi-stage post-training phases is a compilation of SFT and RL data that supports improvements in the math, code, general reasoning, and instruction-following capabilities of the original Llama instruct model.

Prompts have been sourced from public and open corpora or synthetically generated. Responses were synthetically generated by a variety of models, with some prompts containing responses for both reasoning on and off modes, to train the model to distinguish between the two modes. This model was improved with Qwen.

We have released our Llama-Nemotron-Post-Training-Dataset to promote openness and transparency in model development and improvement.

We used the datasets listed in the next section to evaluate Llama-3.1-Nemotron-Ultra-253B-v1.

Evaluation Results

These results contain both Reasoning On and Reasoning Off. We recommend using temperature=`0.6`, top_p=`0.95` for Reasoning On mode, and greedy decoding for Reasoning Off mode. All evaluations are done with 32k sequence length. We run the benchmarks up to 16 times and average the scores to be more accurate.

> NOTE: Where applicable, a Prompt Template will be provided. While completing benchmarks, please ensure that you are parsing for the correct output format as per the provided prompt in order to reproduce the benchmarks seen below.
| Reasoning Mode | pass@1 |
|--------------|------------|
| Reasoning Off | 56.60 |
| Reasoning On | 76.01 |

| Reasoning Mode | pass@1 |
|--------------|------------|
| Reasoning Off | 16.67 |
| Reasoning On | 72.50 |

| Reasoning Mode | Score |
|--------------|------------|
| Reasoning Off | 73.62 |
| Reasoning On | 74.10 |

| Reasoning Mode | pass@1 |
|--------------|------------|
| Reasoning Off | 29.03 |
| Reasoning On | 66.31 |

| Reasoning Mode | Strict:Instruction |
|--------------|------------|
| Reasoning Off | 88.85 |
| Reasoning On | 89.45 |

| Reasoning Mode | pass@1 |
|--------------|------------|
| Reasoning Off | 80.40 |
| Reasoning On | 97.00 |

| Reasoning Mode | Knowledge Score | Reasoning Score | Math Score | Coding Score | Overall Score |
|--------------|------------|------------|------------|------------|------------|
| Reasoning On | 70.13 | 81.63 | 89.29 | 92.86 | 79.14 |

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.

For more detailed information on ethical considerations for this model, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards. Please report security vulnerabilities or NVIDIA AI Concerns here.

llama-3
1,616
339

segformer-b3-finetuned-cityscapes-1024-1024

-
1,578
3

C-RADIOv3-H

-
1,561
1

llama-nemotron-rerank-1b-v2

llama_bidirec
1,539
9

OpenMath-Nemotron-1.5B

license:cc-by-4.0
1,519
25

AceMath-RL-Nemotron-7B

-
1,492
24

gpt-oss-120b-Eagle3-v2

Description: The NVIDIA gpt-oss-120b Eagle model is the Eagle head of OpenAI's gpt-oss-120b model, which is an auto-regressive language model that uses a mixture-of-experts (MoE) architecture with 5 billion activated parameters and 120 billion total parameters. For more information, please check here. The NVIDIA gpt-oss-120b Eagle3 model incorporates Eagle speculative decoding with TensorRT Model Optimizer.

This model is ready for commercial/non-commercial use.

Note: `nvidia/gpt-oss-120b-Eagle3-v2` is typically better for use cases of less than 8k context length.

Use Case: Developers designing AI Agent systems, chatbots, RAG systems, and other AI-powered applications. Also suitable for typical instruction-following tasks.

Release Date: Huggingface: Oct 6th, 2025 via [https://huggingface.co/nvidia/gpt-oss-120b-Eagle3-v2]

Model Architecture:
- Architecture Type: Transformers
- Network Architecture: gpt-oss-120b

## Computational Load

- Cumulative Compute: 4.8x10^20
- Estimated Energy and Emissions for Model Training: Total kWh = 2500, Total Emissions (tCO2e) = 0.8075

Input:
- Input Type(s): Text
- Input Format(s): String
- Input Parameters: One-Dimensional (1D): Sequences

Output:
- Output Type(s): Text
- Output Format: String
- Output Parameters: One-Dimensional (1D): Sequences

Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.

Software Integration:
- Supported Runtime Engine(s): TensorRT-LLM
- Supported Hardware Microarchitecture Compatibility: NVIDIA Blackwell

The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.

Training Dataset:
- The total size (in number of data points): 503.3K
- Total number of datasets: 2
- Dataset partition: Training 100%
- Link: ultrachat_200k and Magpie-Llama-3.1-Pro-300K-Filtered. Only the prompts from the datasets were used for data synthesis (the original responses from GPT were not used); the synthesized data is then used to train the Eagle modules. Click the links above for more information regarding the dataset.
- Data Collection Method by dataset: Hybrid: Synthetic, Human, Automated
- Labeling Method by dataset: Hybrid: Synthetic, Human, Automated
- Properties: 500K samples, majority synthetic, others sourced from commercially-friendly datasets.

Evaluation Dataset:
- Link: MTBench, for more details, see here
- Data Collection Method by dataset: Hybrid: Human, Synthetic
- Labeling Method by dataset: Hybrid: Human, Synthetic
- Properties: 3,300 multi-turn dialogue sequences, each annotated with expert preference votes.

Eagle Speculative Decoding

Synthesized data was obtained from OpenAI's gpt-oss-120b model and is then used to fine-tune the Eagle modules.

This model is ready for inference with TensorRT-LLM in Eagle speculative decoding mode. Eagle modules are used to predict candidate tokens beyond the next token. In the generation step, each forward pass of an Eagle module generates a distribution of tokens beyond the previous one. Then, a tree-based attention mechanism samples some candidate sequences for the original model to validate. The longest accepted candidate sequence is selected, so that more than one token is returned in the generation step. The number of tokens generated in each step is called the acceptance rate.
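The draft-then-verify loop described above can be illustrated with a toy example. The sketch below is a deliberate simplification and not the TensorRT-LLM or Eagle implementation: it uses a single linear chain of draft tokens with greedy verification instead of tree-based attention, and the "models" are hypothetical integer functions. It reports the mean number of tokens emitted per step, which is how the text above defines the acceptance rate.

```python
from typing import Callable, List, Sequence, Tuple

Token = int
NextFn = Callable[[Sequence[Token]], Token]


def speculative_step(context: Sequence[Token], draft_next: NextFn,
                     target_next: NextFn, draft_len: int) -> List[Token]:
    """Run one draft-then-verify step and return the tokens it emits."""
    # 1. The cheap draft model proposes a chain of candidate tokens.
    proposal: List[Token] = []
    for _ in range(draft_len):
        proposal.append(draft_next(list(context) + proposal))

    # 2. The target model checks the candidates in order.
    accepted: List[Token] = []
    for tok in proposal:
        expected = target_next(list(context) + accepted)
        if tok != expected:
            accepted.append(expected)  # keep the target's own token and stop
            return accepted
        accepted.append(tok)

    # 3. Every candidate matched: the target contributes one extra token.
    accepted.append(target_next(list(context) + accepted))
    return accepted


def generate(prompt: Sequence[Token], draft_next: NextFn, target_next: NextFn,
             draft_len: int, steps: int) -> Tuple[List[Token], float]:
    out = list(prompt)
    emitted_per_step = []
    for _ in range(steps):
        new_tokens = speculative_step(out, draft_next, target_next, draft_len)
        out.extend(new_tokens)
        emitted_per_step.append(len(new_tokens))
    return out, sum(emitted_per_step) / len(emitted_per_step)


# Toy stand-ins: the target counts upward modulo 10; the draft agrees
# except right after token 4, where it guesses wrong.
def target(ctx: Sequence[Token]) -> Token:
    return (ctx[-1] + 1) % 10


def draft(ctx: Sequence[Token]) -> Token:
    return 0 if ctx[-1] % 5 == 4 else (ctx[-1] + 1) % 10


tokens, acceptance_rate = generate([0], draft, target, draft_len=3, steps=3)
print(tokens, acceptance_rate)
```

With a draft length of 3, each step emits between one and four tokens, and the output is always identical to what the target model alone would have produced under greedy decoding; only the number of target-model steps changes.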
To serve the checkpoint with TensorRT-LLM, follow the sample commands below with the TensorRT-LLM GitHub repo:

Note that the only layer from the target passed to the draft is the final hidden state post LayerNorm and pre-LMHead.

Evaluation

The Eagle acceptance rate benchmark results (MT-Bench) with draft length 3 are presented in the table below for medium reasoning:

| Category | MT Bench Acceptance Rate |
|:-----------|:------------------------:|
| writing | 2.10 |
| roleplay | 2.18 |
| reasoning | 2.57 |
| math | 2.75 |
| coding | 2.67 |
| extraction | 2.54 |
| stem | 2.27 |
| humanities | 2.00 |

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.

For more detailed information on ethical considerations for this model, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards. Please report security vulnerabilities or NVIDIA AI Concerns here.

SUBCARDS:

|Field:|Response:|
|:---:|:---:|
|Intended Application(s) & Domain(s):|Text generation, reasoning, summarization, and question answering.|
|Model Type:|Text and Image-to-text transformer|
|Intended Users:|This model is intended for developers, researchers, and customers building/utilizing LLMs, while balancing accuracy and efficiency.|
|Output:|Text String(s)|
|Describe how the model works:|Generates text by predicting the next word or token based on the context provided in the input sequence using multiple self-attention layers|
|Technical Limitations:|The model was trained on data that contains toxic language and societal biases originally crawled from the internet. Therefore, the model may amplify those biases and return toxic responses especially when prompted with toxic prompts. Therefore, before deploying any applications of this model, developers should perform safety testing and tuning tailored to their specific applications of the model.|
|Verified to have met prescribed quality standards?|Yes|
|Performance Metrics:|Accuracy, Throughput, and user-side throughput|
|Potential Known Risk|The model may generate answers that may be inaccurate, omit key information, or include irrelevant or redundant text producing socially unacceptable or undesirable text, even if the prompt itself does not include anything explicitly offensive.|
|Licensing:|Your usage is governed by the following license|

|Field:|Response:|
|:---:|:---:|
|Participation considerations from adversely impacted groups (protected classes) in model design and testing:|None|
|Measures taken to mitigate against unwanted bias:|None|

|Field:|Response:|
|:---:|:---:|
|Model Application(s):|Chat, Instruction Following, Chatbot Development, Code Generation, Reasoning|
|Describe life critical application (if present):|None Known|
|Use Case Restrictions:|Abide by the license|
|Model and Dataset Restrictions:|The Principle of least privilege (PoLP) is applied limiting access for dataset generation. Restrictions enforce dataset access during training, and dataset license constraints adhered to. Model checkpoints are made available on Hugging Face, and may become available on cloud providers' model catalog.|

|Field:|Response:|
|:---:|:---:|
|Generatable or Reverse engineerable personal data?|None|
|Was consent obtained for any personal data used?|None Known|
|Personal data used to create this model?|None Known|
|How often is dataset reviewed?|Before Release|
|Is there provenance for all datasets used in training?|Yes|
|Does data labeling (annotation, metadata) comply with privacy laws?|Yes|
|Applicable NVIDIA Privacy Policy|https://www.nvidia.com/en-us/about-nvidia/privacy-policy/|

llama
1,448
4

parakeet-rnnt-1.1b

[](#model-architecture) | [](#model-architecture) | [](#datasets) `parakeet-rnnt-1.1b` is an ASR model that transcribes speech in lower case English alphabet. This model is jointly developed by NVIDIA NeMo and Suno.ai teams. It is an XXL version of FastConformer Transducer [1] (around 1.1B parameters) model. See the model architecture section and NeMo documentation for complete architecture details. License to use this model is covered by the CC-BY-4.0. By downloading the public and release version of the model, you accept the terms and conditions of the CC-BY-4.0 license. Discover more from NVIDIA: For documentation, deployment guides, enterprise-ready APIs, and the latest open modelsβ€”including Nemotron and other cutting-edge speech, translation, and generative AIβ€”visit the NVIDIA Developer Portal at developer.nvidia.com. Join the community to access tools, support, and resources to accelerate your development with NVIDIA’s NeMo, Riva, NIM, and foundation models. Explore more from NVIDIA: What is Nemotron? NVIDIA Developer Nemotron NVIDIA Riva Speech NeMo Documentation To train, fine-tune or play with the model you will need to install NVIDIA NeMo. We recommend you install it after you've installed latest PyTorch version. The model is available for use in the NeMo toolkit [3], and can be used as a pre-trained checkpoint for inference or for fine-tuning on another dataset. Transcribing using Python First, let's get a sample This model accepts 16000 Hz mono-channel audio (wav files) as input. This model provides transcribed speech as a string for a given audio sample. FastConformer [1] is an optimized version of the Conformer model with 8x depthwise-separable convolutional downsampling. The model is trained in a multitask setup with a Transducer decoder (RNNT) loss. You may find more information on the details of FastConformer here: Fast-Conformer Model. The NeMo toolkit [3] was used for training the models for over several hundred epochs. 
These model are trained with this example script and this base config. The tokenizers for these models were built using the text transcripts of the train set with this script. The model was trained on 64K hours of English speech collected and prepared by NVIDIA NeMo and Suno teams. The training dataset consists of private subset with 40K hours of English speech plus 24K hours from the following public datasets: - Librispeech 960 hours of English speech - Fisher Corpus - Switchboard-1 Dataset - WSJ-0 and WSJ-1 - National Speech Corpus (Part 1, Part 6) - VCTK - VoxPopuli (EN) - Europarl-ASR (EN) - Multilingual Librispeech (MLS EN) - 2,000 hour subset - Mozilla Common Voice (v7.0) - People's Speech - 12,000 hour subset The performance of Automatic Speech Recognition models is measuring using Word Error Rate. Since this dataset is trained on multiple domains and a much larger corpus, it will generally perform better at transcribing audio in general. The following tables summarizes the performance of the available models in this collection with the Transducer decoder. Performances of the ASR models are reported in terms of Word Error Rate (WER%) with greedy decoding. |Version|Tokenizer|Vocabulary Size|AMI|Earnings-22|Giga Speech|LS test-clean|SPGI Speech|TEDLIUM-v3|Vox Populi|Common Voice| |---------|-----------------------|-----------------|---------------|---------------|------------|-----------|-----|-------|------|------| | 1.22.0 | SentencePiece Unigram | 1024 | 17.10 | 14.11 | 9.96 | 1.46 | 2.47 | 3.11 | 3.92 | 5.39 | 5.79 | These are greedy WER numbers without external LM. More details on evaluation can be found at HuggingFace ASR Leaderboard NVIDIA Riva, is an accelerated speech AI SDK deployable on-prem, in all clouds, multi-cloud, hybrid, on edge, and embedded. 
Additionally, Riva provides:

- World-class out-of-the-box accuracy for the most common languages, with model checkpoints trained on proprietary data with hundreds of thousands of GPU-compute hours
- Best-in-class accuracy with run-time word boosting (e.g., brand and product names) and customization of the acoustic model, language model, and inverse text normalization
- Streaming speech recognition, Kubernetes-compatible scaling, and enterprise-grade support

Although this model isn't supported yet by Riva, the list of supported models is here. Check out the Riva live demo.

References

[1] Fast Conformer with Linearly Scalable Attention for Efficient Speech Recognition
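The Word Error Rate used in the results table above can be illustrated with a small, dependency-free sketch. It assumes the usual definition (word-level edit distance divided by the number of reference words) and plain whitespace tokenization; the leaderboard applies its own text normalization first, which is not reproduced here, and the function name `word_error_rate` is only illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: 100 * word-level edit distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words consumed so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing from hypothesis)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# One of six reference words is dropped by the hypothesis.
print(round(word_error_rate("the cat sat on the mat", "the cat sat on mat"), 2))
```

Because the denominator is the reference length, a hypothesis with many insertions can score above 100%.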

NaNK
license:cc-by-4.0
1,397
163

canary-1b

NVIDIA NeMo Canary is a family of multilingual, multitask models that achieves state-of-the-art performance on multiple benchmarks. With 1 billion parameters, Canary-1B supports automatic speech-to-text recognition (ASR) in 4 languages (English, German, French, Spanish) and translation from English to German/French/Spanish and from German/French/Spanish to English, with or without punctuation and capitalization (PnC).

🚨 Note: Check out our latest Canary-1B-Flash model, a faster and more accurate variant of Canary-1B!

Canary is an encoder-decoder model with a FastConformer [1] encoder and a Transformer decoder [2]. With audio features extracted from the encoder, task tokens such as ` `, ` `, ` ` and ` ` are fed into the Transformer decoder to trigger the text generation process. Canary uses a concatenated tokenizer [5] built from the individual SentencePiece [3] tokenizers of each language, which makes it easy to scale up to more languages. The Canary-1B model has 24 encoder layers and 24 decoder layers.

To train, fine-tune, or transcribe with Canary, you will need to install NVIDIA NeMo. We recommend installing it after you have installed Cython and the latest PyTorch version. The model is available in the NeMo toolkit [4] and can be used as a pre-trained checkpoint for inference or for fine-tuning on another dataset.

Input Format

Input to Canary can be either a list of paths to audio files or a jsonl manifest file. If the input is a list of paths, Canary assumes that the audio is English and transcribes it; that is, Canary's default behaviour is English ASR.
To use Canary for transcribing other supported languages or to perform speech-to-text translation, specify the input as a jsonl manifest file, where each line is a dictionary describing one audio file together with its task, source and target languages, and PnC setting. Alternatively, one can use the `transcribe_speech.py` script to do the same.

This model accepts single-channel (mono) audio sampled at 16000 Hz, along with the task/languages/PnC tags, as input. The model outputs the transcribed or translated text corresponding to the input audio, in the specified target language and with or without punctuation and capitalization.

Canary-1B is trained using the NVIDIA NeMo toolkit [4] for 150k steps with dynamic bucketing and a batch duration of 360s per GPU on 128 NVIDIA A100 80GB GPUs. The model can be trained using this example script and base config. The tokenizers for these models were built from the text transcripts of the train set with this script.

The Canary-1B model is trained on a total of 85k hrs of speech data. It consists of 31k hrs of public data, 20k hrs collected by Suno, and 34k hrs of in-house data.
English (25.5k hours)

- Librispeech - 960 hours
- Fisher Corpus
- Switchboard-1 Dataset
- WSJ-0 and WSJ-1
- National Speech Corpus (Part 1, Part 6)
- VCTK
- VoxPopuli (EN)
- Europarl-ASR (EN)
- Multilingual Librispeech (MLS EN) - 2,000 hour subset
- Mozilla Common Voice (v7.0)
- People's Speech - 12,000 hour subset
- Mozilla Common Voice (v11.0) - 1,474 hour subset

German (2.5k hours)

- Mozilla Common Voice (v12.0) - 800 hour subset
- Multilingual Librispeech (MLS DE) - 1,500 hour subset
- VoxPopuli (DE) - 200 hour subset

Spanish (1.4k hours)

- Mozilla Common Voice (v12.0) - 395 hour subset
- Multilingual Librispeech (MLS ES) - 780 hour subset
- VoxPopuli (ES) - 108 hour subset
- Fisher - 141 hour subset

French (1.8k hours)

- Mozilla Common Voice (v12.0) - 708 hour subset
- Multilingual Librispeech (MLS FR) - 926 hour subset
- VoxPopuli (FR) - 165 hour subset

In both ASR and AST experiments, predictions were generated using beam search with width 5 and length penalty 1.0.

The ASR performance is measured with word error rate (WER), and we process the ground-truth and predicted text with whisper-normalizer.

| Version | Model | En | De | Es | Fr |
|:---------:|:-----------:|:------:|:------:|:------:|:------:|
| 1.23.0 | canary-1b | 7.97 | 4.61 | 3.99 | 6.53 |

| Version | Model | En | De | Es | Fr |
|:---------:|:-----------:|:------:|:------:|:------:|:------:|
| 1.23.0 | canary-1b | 3.06 | 4.19 | 3.15 | 4.12 |

More details on evaluation can be found at the HuggingFace ASR Leaderboard.

We evaluate AST performance with BLEU score, and use native annotations with punctuation and capitalization in the datasets.
| Version | Model | En->De | En->Es | En->Fr | De->En | Es->En | Fr->En |
|:-----------:|:---------:|:----------:|:----------:|:----------:|:----------:|:----------:|:----------:|
| 1.23.0 | canary-1b | 32.15 | 22.66 | 40.76 | 33.98 | 21.80 | 30.95 |

| Version | Model | De->En | Es->En | Fr->En |
|:-----------:|:---------:|:----------:|:----------:|:----------:|
| 1.23.0 | canary-1b | 37.67 | 40.7 | 40.42 |

| Version | Model | En->De | En->Es | En->Fr |
|:-----------:|:---------:|:----------:|:----------:|:----------:|
| 1.23.0 | canary-1b | 23.84 | 35.74 | 28.29 |

As outlined in the paper "Towards Measuring Fairness in AI: the Casual Conversations Dataset", we assessed the Canary-1B model for fairness. The model was evaluated on the CasualConversations-v1 dataset, and the results are reported as follows:

| Gender | Male | Female | N/A | Other |
| :--- | :--- | :--- | :--- | :--- |
| Num utterances | 19325 | 24532 | 926 | 33 |
| % WER | 14.64 | 12.92 | 17.88 | 126.92 |

| Age Group | (18-30) | (31-45) | (46-85) | (1-100) |
| :--- | :--- | :--- | :--- | :--- |
| Num utterances | 15956 | 14585 | 13349 | 43890 |
| % WER | 14.64 | 13.07 | 13.47 | 13.76 |

(Error rates for the fairness evaluation are determined by normalizing both the reference and predicted text, similar to the methods used in the evaluations found at https://github.com/huggingface/open_asr_leaderboard.)

NVIDIA Riva is an accelerated speech AI SDK deployable on-prem, in all clouds, multi-cloud, hybrid, on edge, and embedded.
Additionally, Riva provides:

- World-class out-of-the-box accuracy for the most common languages, with model checkpoints trained on proprietary data with hundreds of thousands of GPU-compute hours
- Best-in-class accuracy with run-time word boosting (e.g., brand and product names) and customization of the acoustic model, language model, and inverse text normalization
- Streaming speech recognition, Kubernetes-compatible scaling, and enterprise-grade support

Canary is available as a NIM endpoint via Riva. Try the model yourself here: https://build.nvidia.com/nvidia/canary-1b-asr.

References

[1] Fast Conformer with Linearly Scalable Attention for Efficient Speech Recognition

[5] Unified Model for Code-Switching Speech Recognition and Language Identification Based on Concatenated Tokenizer

Use of this model is covered by the CC-BY-NC-4.0 license. By downloading the public and release version of the model, you accept the terms and conditions of the CC-BY-NC-4.0 license.
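The jsonl manifest input described in the Canary input-format paragraphs can be sketched as follows. The field names (`audio_filepath`, `duration`, `taskname`, `source_lang`, `target_lang`, `pnc`, `answer`) and the task labels are assumptions recalled from NeMo's Canary documentation, not values stated in this listing, so check them against the NeMo docs before relying on them. The helper names and file paths are hypothetical.

```python
import json
import os
import tempfile


def make_entry(audio_path, duration_s, source_lang, target_lang, pnc=True):
    """Build one manifest line. All field names here are assumed, not confirmed."""
    # Same source and target language means transcription; otherwise translation.
    taskname = "asr" if source_lang == target_lang else "s2t_translation"
    return {
        "audio_filepath": audio_path,
        "duration": duration_s,
        "taskname": taskname,
        "source_lang": source_lang,
        "target_lang": target_lang,
        "pnc": "yes" if pnc else "no",
        "answer": "na",  # placeholder; no reference text is needed for inference
    }


def write_manifest(entries, path):
    """Write one JSON object per line (the jsonl layout)."""
    with open(path, "w", encoding="utf-8") as handle:
        for entry in entries:
            handle.write(json.dumps(entry) + "\n")


def read_manifest(path):
    """Read the manifest back as a list of dictionaries."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


entries = [
    make_entry("/path/to/english.wav", 12.5, "en", "en"),             # English ASR
    make_entry("/path/to/english.wav", 12.5, "en", "de", pnc=False),  # English to German
]
manifest_path = os.path.join(tempfile.mkdtemp(), "input_manifest.json")
write_manifest(entries, manifest_path)
print(read_manifest(manifest_path)[1]["taskname"])
```

Keeping one self-describing dictionary per line lets a single manifest mix transcription and translation requests.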

NaNK
license:cc-by-nc-4.0
1,386
448

Llama-4-Maverick-17B-128E-Instruct-FP8

NaNK
llama4
1,381
9

llama-3.1-nemoguard-8b-topic-control

NaNK
base_model:meta-llama/Llama-3.1-8B-Instruct
1,350
16

Mistral-NeMo-Minitron-8B-Instruct

Mistral-NeMo-Minitron-8B-Instruct is a model for generating responses for various text-generation tasks including roleplaying, retrieval augmented generation, and function calling. It is a fine-tuned version of nvidia/Mistral-NeMo-Minitron-8B-Base, which was pruned and distilled from Mistral-NeMo 12B using our LLM compression technique. The model was trained using a multi-stage SFT and preference-based alignment technique with NeMo Aligner. For details on the alignment technique, please refer to the Nemotron-4 340B Technical Report. The model supports a context length of 8,192 tokens. Model Dates: Mistral-NeMo-Minitron-8B-Instruct was trained between August 2024 and September 2024. Mistral-NeMo-Minitron-8B-Instruct uses a model embedding size of 4096, 32 attention heads, MLP intermediate dimension of 11520, with 40 layers in total. Additionally, it uses Grouped-Query Attention (GQA) and Rotary Position Embeddings (RoPE). Architecture Type: Transformer Decoder (Auto-regressive Language Model) We recommend using the following prompt template, which was used to fine-tune the model. The model may not perform optimally without it. - Note that a newline character `\n` should be added at the end of the prompt. - We recommend using ` ` as a stop token. You can also use `pipeline` but you need to create a tokenizer object and assign it to the pipeline manually. 
| Category | Benchmark | # Shots | Mistral-NeMo-Minitron-8B-Instruct |
|:----------------------|:----------------------|--------:|----------------------------------:|
| General | MMLU | 5 | 70.4 |
| | MT Bench (GPT4-Turbo) | 0 | 7.86 |
| Math | GSM8K | 0 | 87.1 |
| Reasoning | GPQA | 0 | 31.5 |
| Code | HumanEval | 0 | 71.3 |
| | MBPP | 0 | 72.5 |
| Instruction Following | IFEval | 0 | 84.4 |
| Tool Use | BFCL v2 Live | 0 | 67.6 |

The Mistral-NeMo-Minitron-8B-Instruct model underwent AI safety evaluation, including adversarial testing via three distinct methods:

- Garak is an automated LLM vulnerability scanner that probes for common weaknesses, including prompt injection and data leakage.
- AEGIS is a content safety evaluation dataset and LLM-based content safety classifier model that adheres to a broad taxonomy of 13 categories of critical risks in human-LLM interactions.
- Human Content Red Teaming leverages human interaction and evaluation of the model's responses.

The model was trained on data that contains toxic language and societal biases originally crawled from the internet. Therefore, the model may amplify those biases and return toxic responses, especially when prompted with toxic prompts. The model may generate answers that are inaccurate, omit key information, or include irrelevant or redundant text, producing socially unacceptable or undesirable text even if the prompt itself does not include anything explicitly offensive. This issue could be exacerbated without the use of the recommended prompt template.

If you are going to use this model in an agentic workflow, validate that the imported packages are from a trusted source to ensure end-to-end security.

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications.
When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. For more detailed information on ethical considerations for this model, please see the Model Card++. Please report security vulnerabilities or NVIDIA AI Concerns here.

NaNK
-
1,305
81

DeepSeek-R1-NVFP4-v2

NaNK
license:mit
1,288
3

segformer-b0-finetuned-cityscapes-512-1024

-
1,271
1

bigvgan_22khz_80band

NaNK
license:mit
1,230
1

stt_ar_fastconformer_hybrid_large_pcd_v1.0

license:cc-by-4.0
1,220
23

Qwen3-235B-A22B-FP8

NaNK
license:apache-2.0
1,139
0

audio-flamingo-3

🚨 Audio Flamingo 3 has now been integrated to HuggingFace Transformers. Check out the model card here! Audio Flamingo 3: Advancing Audio Intelligence with Fully Open Large Audio-Language Models Description: Audio Flamingo 3 (AF3) is a fully open, state-of-the-art Large Audio-Language Model (LALM) that advances reasoning and understanding across speech, sounds, and music. AF3 builds on previous work with innovations in: - Unified audio representation learning (speech, sound, music) - Flexible, on-demand chain-of-thought reasoning - Long-context audio comprehension (up to 10 minutes) - Multi-turn, multi-audio conversational dialogue (AF3-Chat) - Voice-to-voice interaction (AF3-Chat) Extensive evaluations confirm AF3’s effectiveness, setting new benchmarks on over 20 public audio understanding and reasoning tasks. This model is for non-commercial research purposes only. Audio Flamingo 3 is supported in πŸ€— Transformers. To run the model, first install Transformers: > Note: AF3 processes audio in 30-second windows with a 10-minute total cap per sample. Longer inputs are truncated. AF3 transcription checkpoints prepend answers with fixed assistant phrasing such as `The spoken content of the audio is " ".`. Passing `stripprefix=True` removes that canned prefix and the surrounding quotes so you only keep the transcription. You can tune decoding similar to other text-generation models: If your GPU supports it and you are not using `torch.compile`, install Flash-Attention and enable it at load time: AF3’s forward pass is compatible with `torch.compile` for significant speed-ups: > `torch.compile` is not compatible with Flash Attention 2 at the same time. If Flash-Attention isn’t available, AF3 will use PyTorch scaled-dot product attention (SDPA) by default on supported PyTorch versions. 
You can set it explicitly.

Model Architecture: Audio Flamingo 3 uses the AF-Whisper unified audio encoder, an MLP-based audio adaptor, a decoder-only LLM backbone (Qwen2.5-7B), and a streaming TTS module (AF3-Chat). Audio Flamingo 3 can take up to 10 minutes of audio input.

License / Terms of Use: The model is released under the NVIDIA OneWay Noncommercial License. Portions of the dataset generation are also subject to the Qwen Research License and OpenAI's Terms of Use.

Use Case: Intended for researchers and developers to explore:

- Audio question answering and reasoning
- Long-context audio comprehension
- Interactive sound/music design assistants
- Multi-turn (voice) chat

Release Date:

- Github (07/10/2025) via https://github.com/NVIDIA/audio-flamingo
- HuggingFace (07/10/2025) via https://huggingface.co/nvidia/audio-flamingo-3

References: Audio Flamingo 3: Advancing Audio Intelligence with Fully Open Large Audio-Language Models; Project Page; Demo Website; Hugging Face

Model Architecture:

Architecture Type: Transformer
Network Architecture: Audio Flamingo 3

AF3 uses:

- AF-Whisper unified audio encoder
- MLP-based audio adaptor
- Decoder-only LLM backbone (Qwen2.5-7B)
- Streaming TTS module (AF3-Chat)

This model was developed based on NVILA and Qwen-2.5-7B.

Input:

- Input Type: Audio, Text
- Input Format: WAV/MP3/FLAC, UTF-8 text
- Input Parameters: Audio is Two-Dimensional (2D) and Text is One-Dimensional (1D)
- Other Properties Related to Input:
  - Max Audio Length: 10 Minutes
  - Max Text Length: 16000 tokens

Output:

- Output Type: Text (and optional speech)
- Text Format: UTF-8 string
- Output Parameters: One-Dimensional (1D)
- Other Properties Related to Output:
  - Max Text Length: 1024 tokens
  - Speech Format: streaming TTS (text-to-speech) waveform

Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems (A100/H100). By leveraging NVIDIA's hardware (e.g., GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.

Software Integration:

- Runtime Engine: PyTorch / HuggingFace Transformers
- Supported Hardware: NVIDIA Ampere (A100), NVIDIA Hopper (H100)

Training Dataset: AF3 is trained entirely on open-source audio data, organized into four novel, large-scale collections. For each dataset, we note whether the annotations were collected by humans (Human) or generated using AI models (Automated). The data collection method noted below applies to all datasets used for training and testing.

Data Collection Method: Human
Labeling Collection Method: see below

- General Sound: WavCaps (Automated), MACS (Human), SoundDescs (Human), Clotho-v2 (Human), WavText5K (Human), Clotho-AQA (Human), Open-AQA (Automated), CompA-R (Automated), Salmonn AQA (Automated), Audio Entailment (Automated), CompA (Automated), AudioSet (Human), YouTube-8M (Human), FSD50k (Human), CochlScene (Human), NonSpeech7K (Human), Chime-Home (Human), Sonyc-UST (Human)
- Music: LP-MusicCaps (Automated), MusicQA (Automated), MusicAVQA (Human), MusicBench (Automated), Mu-LLAMA (Automated), NSynth (Human), FMA (Human), MusDB-HQ (Human), Music4All (Human), Million Song Dataset (Human)
- Speech: MSP-Podcast (Human), JL-Corpus (Human), MELD (Human), Tess (Human), OMGEmotion (Human), Emov-DB (Human), LibriSpeech (Human), SPGISpeech (Human), TEDLIUM (Human), GigaSpeech (Human), Common Voice 15 (Human), VoxPopuli (Human), VoxCeleb2 (Human), Switchboard (Human), AMI (Human)
- Mixed: AudioSkills-XL (ours) (Automated), LongAudio-XL (ours) (Automated), AF-Think (ours) (Automated), AF-Chat (ours) (Automated)

Testing Dataset: Audio Flamingo 3 is evaluated on the test split of the following datasets.

Data Collection Method: Human (for all datasets noted below)
Labeling Method: see below

ClothoAQA (Human), MusicAVQA (Human), Clotho-v2 (Human), CochlScene (Human), NonSpeech7K (Human), NSynth (Human), AudioCaps (Human), US8K (Human), GTZAN (Human), MMAU (Human), MMAR (Human), Audio Entailment (Automated), CompA-R-test (Automated), MuchoMusic (Automated), Open-AQA (Automated), MusicInstruct (Automated), MusicQA (Automated), CMM Hallucination (Human), IEMOCAP (Human), VoiceBench (Human), OpenAudioBench (Human), SEED (Human), LibriSpeech (Human), SPGISpeech (Human), TEDLIUM (Human), GigaSpeech (Human), Common Voice 15 (Human), VoxPopuli (Human), LongAudioBench (ours) (Automated), AF-Chat-test (ours) (Human)

Engine: HuggingFace Transformers
Test Hardware: NVIDIA A100 80 GB

Ethical Considerations: NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please report security vulnerabilities or NVIDIA AI Concerns here.

Acknowledgements: Built with Qwen, NVILA and the open audio-ML community.
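The windowing note for AF3 (30-second windows, 10-minute cap, longer inputs truncated) can be made concrete with a short sketch. It assumes back-to-back, non-overlapping windows, which the card does not state, and `plan_windows` is a hypothetical helper rather than part of Transformers.

```python
WINDOW_SECONDS = 30.0       # window length stated in the card
MAX_TOTAL_SECONDS = 600.0   # 10-minute cap stated in the card


def plan_windows(duration_s, window_s=WINDOW_SECONDS, cap_s=MAX_TOTAL_SECONDS):
    """Return (start, end) times of the windows that would be processed.

    Assumes non-overlapping windows; audio beyond the cap is dropped.
    """
    if duration_s < 0:
        raise ValueError("duration must be non-negative")
    usable = min(duration_s, cap_s)  # truncate anything past the cap
    windows = []
    start = 0.0
    while start < usable:
        end = min(start + window_s, usable)
        windows.append((start, end))
        start = end
    return windows


print(plan_windows(75.0))         # a short clip: two full windows and a remainder
print(len(plan_windows(900.0)))   # a 15-minute clip is cut off at the cap
```

Under these assumptions a sample never yields more than 20 windows, since 600 / 30 = 20.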

-
1,120
131

music-flamingo-hf

-
1,113
21

OpenReasoning-Nemotron-1.5B

NaNK
license:cc-by-4.0
995
48

Cosmos-Transfer1-7B-Sample-AV-Single2MultiView

NaNK
-
995
4

Nemotron-H-47B-Reasoning-128K

NaNK
-
981
18

canary-180m-flash

license:cc-by-4.0
978
80

Qwen3-235B-A22B-Eagle3

NaNK
llama
977
6

MambaVision-B-21K

-
973
5

esm2_t36_3B_UR50D

NaNK
license:mit
972
1

Nemotron-H-8B-Reasoning-128K

NaNK
-
967
23

Cosmos-1.0-Diffusion-7B-Text2World

NaNK
-
942
228

Llama-3_3-Nemotron-Super-49B-v1_5-NVFP4

NaNK
llama-3
890
11

Nemotron-Flash-3B

NaNK
-
877
2

Cosmos-Transfer1-7B-Sample-AV

NaNK
-
873
15

Cosmos-Predict2-2B-Video2World

NaNK
-
852
33

llama-3.2-nv-embedqa-1b-v2

The Llama Nemotron Embedding 1B model is optimized for multilingual and cross-lingual text question-answering retrieval with support for long documents (up to 8192 tokens) and dynamic embedding size (Matryoshka Embeddings). This model was evaluated on 26 languages: English, Arabic, Bengali, Chinese, Czech, Danish, Dutch, Finnish, French, German, Hebrew, Hindi, Hungarian, Indonesian, Italian, Japanese, Korean, Norwegian, Persian, Polish, Portuguese, Russian, Spanish, Swedish, Thai, and Turkish. In addition to enabling multilingual and cross-lingual question-answering retrieval, this model reduces the data storage footprint by 35x through dynamic embedding sizing and support for longer token length, making it feasible to handle large-scale datasets efficiently. An embedding model is a crucial component of a text retrieval system, as it transforms textual information into dense vector representations. They are typically transformer encoders that process tokens of input text (for example: question, passage) to output an embedding. The Llama Nemotron Embedding 1B model is a part of the NVIDIA NeMo Retriever collection of NIM, which provide state-of-the-art, commercially-ready models and microservices, optimized for the lowest latency and highest throughput. It features a production-ready information retrieval pipeline with enterprise support. The models that form the core of this solution have been trained using responsibly selected, auditable data sources. With multiple pre-trained models available as starting points, developers can also readily customize them for domain-specific use cases, such as information technology, human resource help assistants, and research & development research assistants. We are excited to announce the open sourcing of this commercial embedding model. For users interested in deploying this model in production environments, it is also available via the model API in NVIDIA Inference Microservices (NIM) at llama-nemotron-embed-1b-v2. 
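The dynamic embedding size mentioned above (Matryoshka Embeddings) is commonly used by keeping only the leading dimensions of a full vector and re-normalizing. The sketch below shows that idea on a toy vector; it is an illustration under that assumption, not the model's own inference code, and `truncate_embedding` is a hypothetical helper.

```python
import math


def truncate_embedding(vector, dim):
    """Keep the first `dim` values and rescale the result to unit length."""
    if not 0 < dim <= len(vector):
        raise ValueError("dim must be between 1 and len(vector)")
    head = vector[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize an all-zero vector")
    return [x / norm for x in head]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 4-dimensional vectors standing in for 2048-dimensional model outputs.
query = [3.0, 4.0, 0.0, 12.0]
passage = [3.0, 4.0, 1.0, 0.0]

small_query = truncate_embedding(query, 2)
small_passage = truncate_embedding(passage, 2)
print(small_query)
print(cosine_similarity(small_query, small_passage))
```

Storing 384 instead of 2048 floats per passage shrinks the vector index by a factor of about 5.3; any further savings depend on factors such as chunk length.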
The Llama Nemotron Embedding 1B model is most suitable for users who want to build a multilingual question-and-answer application over a large text corpus, leveraging the latest dense retrieval technologies.

Use of this model is governed by the NVIDIA Open Model License Agreement. Additional Information: Llama 3.2 Community Model License Agreement.

Architecture Type: Transformer
Network Architecture: Fine-tuned Llama3.2 1B Retriever

This NeMo embedding model is a transformer encoder: a fine-tuned version of Llama3.2 1B with 16 layers and an embedding size of 2048, trained on public datasets. The AdamW optimizer is used with 100 warm-up steps and a 5e-6 learning rate with the WarmupDecayLR scheduler. Embedding models for text retrieval are typically trained using a bi-encoder architecture. This involves encoding a pair of sentences (for example, a query and chunked passages) independently using the embedding model. Contrastive learning is used to maximize the similarity between the query and the passage that contains the answer, while minimizing the similarity between the query and sampled negative passages that are not useful for answering the question.

Input Type: Text
Input Format: List of strings
Input Parameter: 1D
Other Properties Related to Input: The model's maximum context length is 8192 tokens. Texts longer than the maximum length must either be chunked or truncated.

Output Type: Floats
Output Format: List of float arrays
Output: The model outputs embedding vectors of maximum dimension 2048 for each text string (configurable to 384, 512, 768, 1024, or 2048).
Other Properties Related to Output: N/A

Transformers Usage

You can also use transformers directly to run the model. The model requires transformers version 4.47.1.

1. Ensure you are using `vllm==0.11.0`.
2. Clone this model's repository.
3. Overwrite `config.json` with `config_vllm.json`.
4. Start the vLLM server with the following command (replace the ` ` and ` ` with your values):

You can now access the model using the OpenAI SDK.

Runtime Engine: Llama Nemotron embedding NIM
Supported Hardware Microarchitecture Compatibility: NVIDIA Ampere, NVIDIA Hopper, NVIDIA Lovelace
Supported Operating System(s): Linux

Llama Nemotron Embedding 1B v2
Short Name: llama-nemotron-embed-1b-v2

The development of large-scale public open-QA datasets has enabled tremendous progress in powerful embedding models. However, one popular dataset named MS MARCO restricts commercial licensing, limiting the use of these models in commercial settings. To address this, NVIDIA created its own training dataset blend based on public QA datasets, each of which has a license for commercial applications.

Data Collection Method by dataset: Automated, Unknown

Properties: Semi-supervised pre-training on 12M samples from public datasets and fine-tuning on 1M samples from public datasets.

Properties: We evaluated the NeMo Retriever embedding model in comparison to open and commercial retriever models from the literature on academic benchmarks for question answering: NQ, HotpotQA and FiQA (Finance Q&A) from the BeIR benchmark, and the TechQA dataset. Note that the model was evaluated offline on A100 GPUs using the model's PyTorch checkpoint. In this benchmark, the metric used was Recall@5.
| Open & Commercial Retrieval Models | Average Recall@5 on NQ, HotpotQA, FiQA, TechQA dataset |
| ----- | ----- |
| llama-nemotron-embed-1b-v2 (embedding dim 2048) | 68.60% |
| llama-nemotron-embed-1b-v2 (embedding dim 384) | 64.48% |
| llama-3.2-nv-embedqa-1b-v1 (embedding dim 2048) | 68.97% |
| nv-embedqa-mistral-7b-v2 | 72.97% |
| nv-embedqa-mistral-7B-v1 | 64.93% |
| nv-embedqa-e5-v5 | 62.07% |
| nv-embedqa-e5-v4 | 57.65% |
| e5-large-unsupervised | 48.03% |
| BM25 | 44.67% |

We evaluated the multilingual capabilities on the academic benchmark MIRACL across 15 languages and translated the English and Spanish versions of MIRACL into 11 additional languages. The reported scores are based on an internal version of MIRACL that selects hard negatives for each query to reduce the corpus size.

| Open & Commercial Retrieval Models | Average Recall@5 on multilingual |
| ----- | ----- |
| llama-nemotron-embed-1b-v2 (embedding dim 2048) | 60.75% |
| llama-nemotron-embed-1b-v2 (embedding dim 384) | 58.62% |
| llama-3.2-nv-embedqa-1b-v1 | 60.07% |
| nv-embedqa-mistral-7b-v2 | 50.42% |
| BM25 | 26.51% |

We evaluated the cross-lingual capabilities on the academic benchmark MLQA based on 7 languages (Arabic, Chinese, English, German, Hindi, Spanish, Vietnamese). We consider only evaluation datasets in which the query and documents are in different languages. We calculate the average Recall@5 across the 42 different language pairs.

| Open & Commercial Retrieval Models | Average Recall@5 on MLQA dataset with different languages |
| ----- | ----- |
| llama-nemotron-embed-1b-v2 (embedding dim 2048) | 79.86% |
| llama-nemotron-embed-1b-v2 (embedding dim 384) | 71.61% |
| llama-3.2-nv-embedqa-1b-v1 (embedding dim 2048) | 78.77% |
| nv-embedqa-mistral-7b-v2 | 68.38% |
| BM25 | 13.01% |

We evaluated the support of long documents on the academic benchmark Multilingual Long-Document Retrieval (MLDR), built on Wikipedia and mC4 and covering 12 typologically diverse languages. The English version has a median length of 2399 tokens and a 90th percentile of 7483 tokens using the Llama 3.2 tokenizer. The MLDR questions are synthetically generated with an LLM, which tends to create questions that share keywords with the positive document and so might not be representative of real user queries. This characteristic of the dataset benefits sparse retrieval methods like BM25.

| Open & Commercial Retrieval Models | Average Recall@5 on MLDR |
| ----- | ----- |
| llama-nemotron-embed-1b-v2 (embedding dim 2048) | 59.55% |
| llama-nemotron-embed-1b-v2 (embedding dim 384) | 54.77% |
| llama-3.2-nv-embedqa-1b-v1 (embedding dim 2048) | 60.49% |
| nv-embedqa-mistral-7b-v2 | 43.24% |
| BM25 | 71.39% |

Properties: The evaluation datasets are based on MTEB/BEIR, TextQA, TechQA, MIRACL, MLQA, and MLDR. Their sizes range from tens of thousands up to 5M depending on the dataset.

Inference Engine: TensorRT
Test Hardware: H100 PCIe/SXM, A100 PCIe/SXM, L40s, L4, and A10G

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their supporting model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. For more detailed information on ethical considerations for this model, please see the Model Card++ tab for the Explainability, Bias, Safety & Security, and Privacy subcards. Please report security vulnerabilities or NVIDIA AI Concerns here.

Get access to knowledge base articles and support cases or submit a ticket at the NVIDIA AI Enterprise Support Services page.

NVIDIA NIM Documentation: Visit the NeMo Retriever docs page for release documentation, deployment guides and more.
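The Recall@5 metric reported in the retrieval tables above can be sketched in a few lines. This assumes the common definition (the fraction of a query's relevant passages that appear among the top 5 retrieved results, averaged over queries); the exact evaluation protocol behind the reported numbers is not given here, and the document ids are made up.

```python
def recall_at_k(ranked_ids, relevant_ids, k=5):
    """Fraction of relevant ids found in the top-k of a ranked result list."""
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("each query needs at least one relevant id")
    hits = len(relevant & set(ranked_ids[:k]))
    return hits / len(relevant)


def mean_recall_at_k(runs, k=5):
    """Average recall@k over (ranked_ids, relevant_ids) pairs, one per query."""
    scores = [recall_at_k(ranked, relevant, k) for ranked, relevant in runs]
    return sum(scores) / len(scores)


# Hypothetical retrieval results for two queries.
runs = [
    (["d3", "d7", "d1", "d9", "d2", "d4"], {"d1", "d4"}),  # d4 is ranked sixth: missed
    (["d5", "d8", "d6", "d0", "d2"], {"d5"}),              # the only relevant doc is first
]
print(mean_recall_at_k(runs))
```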
| Field | Response |
| ----- | ----- |
| Participation considerations from adversely impacted groups protected classes in model design and testing | None |
| Measures taken to mitigate against unwanted bias | None |

| Field | Response |
| ----- | ----- |
| Intended Application & Domain: | Passage and query embedding for question and answer retrieval |
| Model Type: | Transformer encoder |
| Intended User: | Generative AI creators working with conversational AI models - users who want to build a multilingual question and answer application over a large text corpus, leveraging the latest dense retrieval technologies. |
| Output: | Array of float numbers (Dense Vector Representation for the input text) |
| Describe how the model works: | Model transforms the tokenized input text into a dense vector representation. |
| Performance Metrics: | Accuracy, Throughput, and Latency |
| Potential Known Risks: | This model does not always guarantee to retrieve the correct passage(s) for a given query. |
| Licensing & Terms of Use: | Use of this model is governed by the NVIDIA Open Model License Agreement. Additional Information: Llama 3.2 Community Model License Agreement. |
| Technical Limitations | The model's max sequence length is 8192. Therefore, longer text inputs should be truncated. |
| Name the adversely impacted groups this has been tested to deliver comparable outcomes regardless of: | N/A |
| Verified to have met prescribed NVIDIA quality standards: | Yes |

| Field | Response |
| ----- | ----- |
| Generatable or reverse engineerable personally-identifiable information (PII)? | None |
| Was consent obtained for any personal data used? | Not Applicable |
| PII used to create this model? | None |
| How often is the dataset reviewed? | Before Every Release |
| Is a mechanism in place to honor data subject right of access or deletion of personal data? | No |
| If personal data was collected for the development of the model, was it collected directly by NVIDIA? | Not Applicable |
| If personal data was collected for the development of the model by NVIDIA, do you maintain or have access to disclosures made to data subjects? | Not Applicable |
| If personal data was collected for the development of this AI model, was it minimized to only what was required? | Not Applicable |
| Is there provenance for all datasets used in training? | Yes |
| Does data labeling (annotation, metadata) comply with privacy laws? | Yes |
| Is data compliant with data subject requests for data correction or removal, if such a request was made? | No, not possible with externally-sourced data. |

| Field | Response |
| ----- | ----- |
| Model Application(s): | Text Embedding for Retrieval |
| Describe the physical safety impact (if present). | Not Applicable |
| Use Case Restrictions: | Use of this model is governed by the NVIDIA Open Model License Agreement. Additional Information: Llama 3.2 Community Model License Agreement. |
| Model and dataset restrictions: | The Principle of least privilege (PoLP) is applied limiting access for dataset generation and model development. Restrictions enforce dataset access during training, and dataset license constraints adhered to. |
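The bi-encoder contrastive training described in the architecture paragraph of this card (pull the query toward the passage that answers it, push it away from sampled negatives) is often implemented with an InfoNCE-style loss. The card does not name the exact loss, so the function below is an assumed, simplified stand-in on toy unit vectors; the temperature value is likewise an assumption.

```python
import math


def info_nce_loss(query, positive, negatives, temperature=0.05):
    """Cross-entropy of picking the positive passage among positive + negatives.

    Inputs are assumed to be unit-length embeddings, so a dot product is a
    cosine similarity.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    # The positive passage sits at index 0, followed by the negatives.
    logits = [dot(query, positive) / temperature]
    logits += [dot(query, neg) / temperature for neg in negatives]

    # Numerically stable log-sum-exp over all candidates.
    top = max(logits)
    log_denominator = top + math.log(sum(math.exp(v - top) for v in logits))
    return log_denominator - logits[0]


query = [1.0, 0.0]
close_passage = [1.0, 0.0]   # points the same way as the query
far_passage = [0.0, 1.0]     # orthogonal to the query

print(info_nce_loss(query, close_passage, [far_passage]))  # near zero
print(info_nce_loss(query, far_passage, [close_passage]))  # large
```

The loss shrinks as the query-positive similarity rises relative to the negatives, which is the behaviour the training description calls for.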

NaNK
llama_bidirec
847
9

Eagle2-1B

NaNK
license:cc-by-nc-4.0
830
26

stt_en_fastconformer_hybrid_large_streaming_multi

license:cc-by-4.0
824
16

Hymba 1.5B Instruct

Hymba-1.5B-Instruct is a 1.5B parameter model fine-tuned from Hymba-1.5B-Base using a combination of open source instruction datasets and internally collected synthetic datasets. This model is fine-tuned with supervised fine-tuning and direct preference optimization.

Hymba-1.5B-Instruct is capable of many complex and important tasks like math reasoning, function calling, and role playing.

Model Dates: Hymba-1.5B-Instruct was trained between September 4, 2024 and November 10, 2024.

> ⚡️ We've released a minimal implementation of Hymba on GitHub to help developers understand and implement its design principles in their own models. Check it out! barebones-hymba.

Hymba-1.5B-Instruct has a model embedding size of 1600, 25 attention heads, and an MLP intermediate dimension of 5504, with 32 layers in total and 16 SSM states. Three layers use full attention; the rest use sliding window attention. Unlike the standard Transformer, each attention layer in Hymba has a hybrid combination of standard attention heads and Mamba heads in parallel. Additionally, it uses Grouped-Query Attention (GQA) and Rotary Position Embeddings (RoPE).

- Fuse attention heads and SSM heads within the same layer, offering parallel and complementary processing of the same inputs.
- Introduce meta tokens that are prepended to the input sequences and interact with all subsequent tokens, thus storing important information and alleviating the burden of "forced-to-attend" in attention.
- Integrate with cross-layer KV sharing and global-local attention to further boost memory and computation efficiency.
- Hymba-1.5B-Instruct outperforms popular small language models and achieves the highest average performance across all tasks.
Since Hymba-1.5B-Instruct employs FlexAttention, which relies on PyTorch 2.5 and other related dependencies, we provide two ways to set up the environment:

- [Local install] Install the related packages using our provided `setup.sh` (supports CUDA 12.1/12.4):
- [Docker] A docker image is provided with all of Hymba's dependencies installed. You can download our docker image and start a container using the following commands:

Step 2: Chat with Hymba-1.5B-Instruct

After setting up the environment, you can use the following script to chat with our model.

The prompt template used by Hymba-1.5B-Instruct is as follows, which has been integrated into the tokenizer and can be applied using `tokenizer.apply_chat_template`:

LMFlow is a complete pipeline for fine-tuning large language models. The following steps provide an example of how to fine-tune the `Hymba-1.5B-Base` models using LMFlow.

3. Fine-tune the model using the following command.

With LMFlow, you can also fine-tune the model on your custom dataset. The only thing you need to do is transform your dataset into the LMFlow data format. In addition to full fine-tuning, you can also fine-tune Hymba efficiently with DoRA, LoRA, LISA, Flash Attention, and other acceleration techniques. For more details, please refer to the LMFlow for Hymba documentation.

Limitations

The model was trained on data that contains toxic language, unsafe content, and societal biases originally crawled from the internet. Therefore, the model may amplify those biases and return toxic responses, especially when prompted with toxic prompts. The model may generate answers that are inaccurate, omit key information, or include irrelevant or redundant text, producing socially unacceptable or undesirable text even if the prompt itself does not include anything explicitly offensive. Testing suggests that this model is susceptible to jailbreak attacks.
If using this model in a RAG or agentic setting, we recommend strong output validation controls to ensure security and safety risks from user-controlled model outputs are consistent with the intended use cases. Ethical Considerations NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please report security vulnerabilities or NVIDIA AI Concerns here.

-
774
240

stt_en_fastconformer_ctc_large

license:cc-by-4.0
755
14

QLIP-B-16-256

license:cc-by-nc-4.0
745
4

DAM-3B-Self-Contained

llava_llama
713
24

stt_uz_fastconformer_hybrid_large_pc

license:cc-by-4.0
701
8

stt_en_fastconformer_hybrid_large_pc

license:cc-by-4.0
684
3

MambaVision-L3-256-21K

-
680
7

Cosmos-1.0-Diffusion-7B-Video2World

-
653
38

Qwen2.5-VL-7B-Instruct-FP8

Description: The NVIDIA Qwen2.5-VL-7B-Instruct-FP8 model is the quantized version of Alibaba's Qwen2.5-VL-7B-Instruct model, which is an auto-regressive language model that uses an optimized transformer architecture. For more information, please check here. The NVIDIA Qwen2.5-VL-7B-Instruct-FP8 model is quantized with TensorRT Model Optimizer. This model is ready for commercial/non-commercial use.

Third-Party Community Consideration: This model is not owned or developed by NVIDIA. It was developed and built to a third party's requirements for this application and use case. See the Non-NVIDIA (Qwen2.5-VL-7B-Instruct) Model Card.

License/Terms of Use: Use of this model is governed by nvidia-open-model-license. ADDITIONAL INFORMATION: Apache 2.0.

Deployment Geography: Global, except in European Union

Use Case: Developers looking to take off-the-shelf pre-quantized models for deployment in AI Agent systems, chatbots, RAG systems, and other AI-powered applications.

Release Date: Huggingface 08/22/2025 via https://huggingface.co/nvidia/Qwen2.5-VL-7B-Instruct-FP8

Model Architecture:
- Architecture Type: Transformers
- Network Architecture: Qwen2.5-VL-7B
- This model was developed based on Qwen2.5-VL-7B
- Number of model parameters: 7×10^9

Input:
- Input Type(s): Multilingual text, and images
- Input Format(s): String, Images
- Input Parameters: One-Dimensional (1D), Two-Dimensional (2D)
- Other Properties Related to Input: Context length up to 32K

Output:
- Output Type(s): Text
- Output Format: String
- Output Parameters: 1D (One-Dimensional): Sequences
- Other Properties Related to Output: N/A

Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), they achieve faster training and inference times compared to CPU-only solutions.
Software Integration: Supported Runtime Engine(s): TensorRT-LLM Supported Hardware Microarchitecture Compatibility: NVIDIA Blackwell Model Version(s): The model is quantized with nvidia-modelopt v0.35.0 Post Training Quantization This model was obtained by quantizing the weights and activations of Qwen2.5-VL-7B-Instruct to FP8 data type, ready for inference with TensorRT-LLM. Only the weights and activations of the linear operators within transformer blocks of the language model are quantized. Training, Testing, and Evaluation Datasets: Data Modality [Image] [Text] Calibration Dataset: Link: cnndailymail Data collection method: Automated. Labeling method: Automated. Training Datasets: Data Collection Method by Dataset: Undisclosed Labeling Method by Dataset: Undisclosed Properties: Undisclosed Testing Dataset: Data Collection Method by Dataset: Undisclosed Labeling Method by Dataset: Undisclosed Properties: Undisclosed To deploy the quantized checkpoint with TensorRT-LLM LLM API, follow the sample codes below: NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns here.
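To make the post-training quantization step described above more concrete, here is a minimal plain-Python sketch of FP8 "fake quantization" with a single per-tensor scale. It assumes the E4M3 format (largest finite value 448) and per-tensor scaling; the card does not say which FP8 variant or scaling granularity is used, and the helper names `round_to_e4m3` and `fake_quantize_fp8` are hypothetical. It is not the nvidia-modelopt implementation.

```python
import math

FP8_E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3


def round_to_e4m3(x):
    """Clamp x to the E4M3 range and keep 4 significant bits (1 implicit + 3 mantissa).

    Simplification: subnormals and special values are ignored.
    """
    if x == 0.0:
        return 0.0
    clamped = max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, x))
    mantissa, exponent = math.frexp(abs(clamped))  # abs(clamped) == mantissa * 2**exponent
    rounded = round(mantissa * 16) / 16  # mantissa is in [0.5, 1), so 16 steps = 4 bits
    return math.copysign(rounded * 2.0 ** exponent, clamped)


def fake_quantize_fp8(values):
    """Quantize with one per-tensor scale, then dequantize back to float."""
    amax = max(abs(v) for v in values)
    scale = amax / FP8_E4M3_MAX if amax > 0 else 1.0
    return [round_to_e4m3(v / scale) * scale for v in values], scale


# Hypothetical weights standing in for one linear layer's tensor.
weights = [0.1, -2.5, 7.0]
dequantized, scale = fake_quantize_fp8(weights)
print(scale)
print(dequantized)
```

The largest-magnitude entry maps onto the top of the FP8 range, and every other entry carries a relative rounding error of at most 1/16 under this simplified model.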

-
629
3

Llama-3.1-Minitron-4B-Depth-Base

Llama-3.1-Minitron-4B-Depth-Base is a base text-to-text model that can be adopted for a variety of natural language generation tasks. It is obtained by pruning Llama-3.1-8B; specifically, we prune the number of transformer blocks in the model. Following pruning, we perform continued training with distillation using 94 billion tokens to arrive at the final model; we use the continuous pre-training data corpus used in Nemotron-4 15B for this purpose. Please refer to our technical report for more details. Model Dates: Llama-3.1-Minitron-4B-Depth-Base was trained between July 29, 2024 and Aug 3, 2024 This model is released under the NVIDIA Open Model License Agreement. Llama-3.1-Minitron-4B-Depth-Base uses a model embedding size of 4096, 32 attention heads, MLP intermediate dimension of 14336, with 32 layers in total. Additionally, it uses Grouped-Query Attention (GQA) and Rotary Position Embeddings (RoPE). Architecture Type: Transformer Decoder (Auto-Regressive Language Model) Other Properties Related to Input: Works well within 8k characters or less. Supported Hardware Microarchitecture Compatibility: NVIDIA Ampere NVIDIA Blackwell NVIDIA Hopper NVIDIA Lovelace Properties: The training corpus for Llama-3.1-Minitron-4B-Depth-Base consists of English and multilingual text, as well as code. Our sources cover a variety of document types such as: webpages, dialogue, articles, and other written materials. The corpus spans domains including legal, math, science, finance, and more. In our continued training set, we introduce a small portion of question-answering, and alignment style data to improve model performance. Data Freshness: The pretraining data has a cutoff of June 2023. Overview 5-shot performance. Language Understanding evaluated using Massive Multitask Language Understanding: Zero-shot performance. 
Evaluated using select datasets from the LM Evaluation Harness with additions:

| HellaSwag | Winogrande | GSM8K | ARC-Challenge | XLSum |
| :---- | :---- | :---- | :---- | :---- |
| 73.2 | 72.1 | 16.8 | 52.6 | 27.2 |

Code generation performance. Evaluated using MBPP:

| Score |
| :---- |
| 30.7 |

The model was trained on data that contains toxic language, unsafe content, and societal biases originally crawled from the internet. Therefore, the model may amplify those biases and return toxic responses especially when prompted with toxic prompts. The model may generate answers that may be inaccurate, omit key information, or include irrelevant or redundant text producing socially unacceptable or undesirable text, even if the prompt itself does not include anything explicitly offensive.

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please report security vulnerabilities or NVIDIA AI Concerns here.

References

- Compact Language Models via Pruning and Knowledge Distillation
- LLM Pruning and Distillation in Practice: The Minitron Approach

llama
602
21

Llama 3.2 Nv Rerankqa 1b V2

The Llama Nemotron Reranking 1B model is optimized for providing a logit score that represents how relevant a document(s) is to a given query. The model was fine-tuned for multilingual, cross-lingual text question-answering retrieval, with support for long documents (up to 8192 tokens). This model was evaluated on 26 languages: English, Arabic, Bengali, Chinese, Czech, Danish, Dutch, Finnish, French, German, Hebrew, Hindi, Hungarian, Indonesian, Italian, Japanese, Korean, Norwegian, Persian, Polish, Portuguese, Russian, Spanish, Swedish, Thai, and Turkish. This model is a component in a text retrieval system to improve the overall accuracy. A text retrieval system often uses an embedding model (dense) or lexical search (sparse) index to return relevant text passages given the input. A reranking model can be used to rerank the potential candidate into a final order. The reranking model has the question-passage pairs as an input and therefore, can process cross attention between the words. It’s not feasible to apply a Ranking model on all documents in the knowledge base, therefore, ranking models are often deployed in combination with embedding models. The Llama Nemotron Reranking 1B model is a part of the NeMo Retriever collection of NIM, which provide state-of-the-art, commercially-ready models and microservices, optimized for the lowest latency and highest throughput. It features a production-ready information retrieval pipeline with enterprise support. The models that form the core of this solution have been trained using responsibly selected, auditable data sources. With multiple pre-trained models available as starting points, developers can also readily customize them for their domain-specific use cases, such as information technology, human resource help assistants, and research & development research assistants. We are excited to announce the open sourcing of this commercial embedding model. 
For users interested in deploying this model in production environments, it is also available via the model API in NVIDIA Inference Microservices (NIM) at llama-nemotron-rerank-1b-v2. Use of this model is governed by the NVIDIA Open Model License Agreement. Additional Information: Llama 3.2 Community Model License Agreement. The Llama Nemotron Reranking 1B model is most suitable for users who want to improve their multilingual retrieval tasks by reranking a set of candidates for a given question. Architecture Type: Transformer Network Architecture: Fine-tuned ranker model from the `meta-llama/Llama-3.2-1B` model. The Llama Nemotron Reranking 1B model is a transformer cross-encoder fine-tuned with contrastive learning. We employ bi-directional attention when fine-tuning for higher accuracy. The last embedding output by the decoder model is used with a mean pooling strategy, and a binary classification head is fine-tuned for the ranking task. Ranking models for text ranking are typically trained as a cross-encoder for sentence classification. This involves predicting the relevancy of a sentence pair (for example, question and chunked passages). The CrossEntropy loss is used to maximize the likelihood of passages containing information to answer the question and minimize the likelihood for (negative) passages that do not contain information to answer the question. We trained the model on public datasets described in the Dataset and Training section. Input Type: Pair of Texts Input Format: List of text pairs Input Parameters: 1D Other Properties Related to Input: The model was trained on question and answering over text documents from multiple languages. It was evaluated to work successfully with up to a sequence length of 8192 tokens. Longer texts are recommended to be either chunked or truncated. Output Type: Floats Output Format: List of floats Output Parameters: 1D Other Properties Related to Output: Each value corresponds to a raw logit. 
Users can choose to apply a Sigmoid activation function to the logits to convert them into probabilities during model usage.

Runtime: Llama Nemotron Reranking 1B NIM

Supported Hardware Microarchitecture Compatibility: NVIDIA Ampere, NVIDIA Hopper, NVIDIA Lovelace

Supported Operating System(s): Linux

Llama Nemotron Reranking 1B

Short Name: llama-nemotron-rerank-1b-v2

The development of large-scale public open-QA datasets has enabled tremendous progress in powerful embedding models. However, one popular dataset named MSMARCO restricts commercial licensing, limiting the use of these models in commercial settings. To address this, NVIDIA created its own training dataset blend based on public QA datasets, which each have a license for commercial applications.

Data Collection Method by dataset: Automated, Unknown

Properties: This model was trained on 800k samples from public datasets.

We evaluate the pipelines on a set of evaluation benchmarks. We applied the ranking model to the candidates retrieved from a retrieval embedding model. Overall, the pipeline llama-nemotron-embed-1b-v2 + llama-nemotron-rerank-1b-v2 provides high BEIR+TechQA accuracy with multilingual and crosslingual support. The llama-nemotron-rerank-1b-v2 ranking model is 3.5x smaller than the nv-rerankqa-mistral-4b-v3 model.

We evaluated the NVIDIA Retrieval QA Embedding Model in comparison to literature open & commercial retriever models on academic benchmarks for question-answering: NQ, HotpotQA and FiQA (Finance Q&A) from the BeIR benchmark, and the TechQA dataset. In this benchmark, the metric used was Recall@5. As described, we need to apply the ranking model on the output of an embedding model.
| Open & Commercial Reranker Models | Average Recall@5 on NQ, HotpotQA, FiQA, TechQA dataset |
| ----- | ----- |
| llama-nemotron-embed-1b-v2 + llama-nemotron-rerank-1b-v2 | 73.64% |
| llama-nemotron-embed-1b-v2 | 68.60% |
| nv-embedqa-e5-v5 + nv-rerankQA-mistral-4b-v3 | 75.45% |
| nv-embedqa-e5-v5 | 62.07% |
| nv-embedqa-e5-v4 | 57.65% |
| e5-large_unsupervised | 48.03% |
| BM25 | 44.67% |

We evaluated the model's multilingual capabilities on the MIRACL academic benchmark, a multilingual retrieval dataset, across 15 languages, and on an additional 11 languages that were translated from the English and Spanish versions of MIRACL. The reported scores are based on a custom subsampled version by selecting hard negatives for each query to reduce the corpus size.

| Open & Commercial Retrieval Models | Average Recall@5 on MIRACL multilingual datasets |
| :---- | :---- |
| llama-nemotron-embed-1b-v2 + llama-nemotron-rerank-1b-v2 | 65.80% |
| llama-nemotron-embed-1b-v2 | 60.75% |
| nv-embedqa-mistral-7b-v2 | 50.42% |
| BM25 | 26.51% |

We evaluated the cross-lingual capabilities on the academic benchmark MLQA based on 7 languages (Arabic, Chinese, English, German, Hindi, Spanish, Vietnamese). We consider only evaluation datasets when the query and documents are in different languages. We calculate the average Recall@5 across the 42 different language pairs.

| Open & Commercial Retrieval Models | Average Recall@5 on MLQA dataset with different languages |
| :---- | :---- |
| llama-nemotron-embed-1b-v2 + llama-nemotron-rerank-1b-v2 | 86.83% |
| llama-nemotron-embed-1b-v2 | 79.86% |
| nv-embedqa-mistral-7b-v2 | 68.38% |
| BM25 | 13.01% |

We evaluated the support of long documents on the academic benchmark Multilingual Long-Document Retrieval (MLDR) built on Wikipedia and mC4, covering 12 typologically diverse languages. The English version has a median length of 2399 tokens and 90th percentile of 7483 tokens using the llama 3.2 tokenizer.
| Open & Commercial Retrieval Models | Average Recall@5 on MLDR |
| :---- | :---- |
| llama-nemotron-embed-1b-v2 + llama-nemotron-rerank-1b-v2 | 70.69% |
| llama-nemotron-embed-1b-v2 | 59.55% |
| nv-embedqa-mistral-7b-v2 | 43.24% |
| BM25 | 71.39% |

Properties: The evaluation datasets are based on three MTEB/BEIR TextQA datasets, the TechQA dataset, MIRACL, MLDR and MLQA multilingual retrieval datasets, which are all public datasets. The sizes range between 10,000s up to 5M depending on the dataset.

Inference Engine: TensorRT

Test Hardware: H100 PCIe/SXM, A100 PCIe/SXM, L40s, L4, and A10G

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their supporting model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. For more detailed information on ethical considerations for this model, please see the Explainability, Bias, Safety, and Privacy sections. Please report security vulnerabilities or NVIDIA AI Concerns here.

Enterprise Support: Get access to knowledge base articles and support cases or submit a ticket at the NVIDIA AI Enterprise Support Services page.

NVIDIA NIM Documentation: Visit the NeMo Retriever docs page for release documentation, deployment guides and more.
| Field | Response | | ----- | ----- | | Participation considerations from adversely impacted groups protected classes in model design and testing | None | | Measures taken to mitigate against unwanted bias | None | | Field | Response | | ----- | ----- | | Intended Application & Domain: | Passage ranking for question and answer retrieval | | Model Type: | Transformer encoder | | Intended User: | Generative AI creators working with conversational AI models - users who want to build a multilingual question and answer application over a large text corpus, leveraging the latest dense retrieval technologies. | | Output: | Array of float numbers (Dense Vector Representation for the input text) | | Describe how the model works: | Model transforms the tokenized input text into a dense vector representation. | | Performance Metrics: | Accuracy, Throughput, and Latency | | Potential Known Risks: | This model does not always guarantee to retrieve the correct passage(s) for a given query. | | Licensing & Terms of Use: | Use of this model is governed by the NVIDIA Open Model License Agreement. Additional Information: Llama 3.2 Community Model License Agreement. | | Technical Limitations | The model’s max sequence length is 8192. Therefore, the longer text inputs should be truncated. | | Name the adversely impacted groups this has been tested to deliver comparable outcomes regardless of: | N/A | | Verified to have met prescribed NVIDIA quality standards: | Yes | | Field | Response | | ----- | ----- | | Generatable or reverse engineerable personally-identifiable information (PII)? | None | | Was consent obtained for any personal data used? | Not Applicable | | PII used to create this model? | None | | How often is the dataset reviewed? | Before Every Release | | Is a mechanism in place to honor data subject right of access or deletion of personal data? | No | | If personal data was collected for the development of the model, was it collected directly by NVIDIA? 
| Not Applicable | | If personal data was collected for the development of the model by NVIDIA, do you maintain or have access to disclosures made to data subjects? | Not Applicable | | If personal data was collected for the development of this AI model, was it minimized to only what was required? | Not Applicable | | Is there provenance for all datasets used in training? | Yes | | Does data labeling (annotation, metadata) comply with privacy laws? | Yes | | Is data compliant with data subject requests for data correction or removal, if such a request was made? | No, not possible with externally-sourced data. | | Field | Response | | ----- | ----- | | Model Application(s): | Text Reranking for Retrieval | | Describe the physical safety impact (if present). | Not Applicable | | Use Case Restrictions: | Use of this model is governed by the NVIDIA Open Model License Agreement. Additional Information: Llama 3.2 Community Model License Agreement. | | Model and dataset restrictions: | The Principle of least privilege (PoLP) is applied limiting access for dataset generation and model development. Restrictions enforce dataset access during training, and dataset license constraints adhered to. |
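The reranking flow this card describes (score each query-passage pair with a raw logit, optionally apply a sigmoid, reorder the candidates, then measure Recall@5) can be sketched in plain Python as follows. The logits below are made-up numbers standing in for real model output, and the helper names `rerank` and `recall_at_k` are hypothetical; this is an illustration of the idea, not the NIM API.

```python
import math


def sigmoid(logit):
    """Map a raw relevance logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-logit))


def rerank(candidates, logits, top_k=5):
    """Sort candidate passage ids by reranker logit, highest first, and keep top_k."""
    ranked = sorted(zip(candidates, logits), key=lambda pair: pair[1], reverse=True)
    return [(doc_id, sigmoid(score)) for doc_id, score in ranked[:top_k]]


def recall_at_k(ranked_ids, relevant_ids, k=5):
    """Fraction of the relevant passages that appear in the top k results."""
    if not relevant_ids:
        return 0.0
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)


# Hypothetical candidates from an embedding retriever, with made-up logits.
candidates = ["p1", "p2", "p3", "p4", "p5", "p6"]
logits = [-1.2, 3.4, 0.0, 2.1, -4.0, 0.7]
top = rerank(candidates, logits, top_k=5)
print(top)
print(recall_at_k([doc_id for doc_id, _ in top], relevant_ids={"p2", "p5"}))
```

Because the sigmoid is monotonic, sorting by raw logits and sorting by probabilities give the same order; the conversion only matters when a calibrated-looking score or a threshold is needed.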

llama_bidirec
599
5

C-RADIOv2-B

-
589
9

segformer-b0-finetuned-cityscapes-768-768

-
549
0

Phi-4-multimodal-instruct-NVFP4

-
531
4

Qwen3 Nemotron 32B RLBFF

-
524
25

bigvgan_v2_44khz_128band_256x

license:mit
523
7

AMPLIFY_120M

license:mit
522
4

stt_de_fastconformer_hybrid_large_pc

license:cc-by-4.0
499
5

Nemotron-H-4B-Instruct-128K

The pretraining data has a cutoff date of September 2024. NVIDIA Nemotron-H-4B-Instruct-128K is a large language model (LLM) developed by NVIDIA, optimized for single and multi-turn chat, instruction following, and tool-calling use-cases. It uses a hybrid model architecture that consists primarily of Mamba-2 and MLP layers combined with just four Attention layers. The model is an aligned version of Nemotron-H-4B-Base-8K, and features a 128K context length. The supported languages include: English, German, Spanish, French, Italian, Korean, Portuguese, Russian, Japanese, and Chinese. The model underwent a multi-phase post-training process including multiple supervised fine-tuning stages for math, code, science, and then chat, instruction following, and tool-calling, followed by multiple preference tuning stages using Reward-aware Preference Optimization (RPO) for both chat and instruction-following. The base model was pruned and distilled from Nemotron-H-Base-8K using our hybrid language model compression technique. For more details, please refer to the paper. The paper has been accepted for publication at NeurIPS 2025. License/Terms of Use GOVERNING TERMS: Use of this model is governed by the NVIDIA Internal Scientific Research and Development Model License Model Architecture - Architecture Type: Transformer - Network Architecture: Nemotron-Hybrid Use Case: This model is intended for developers designing AI Agent systems, chatbots, RAG systems, and other AI-powered applications. This model is also suitable for typical instruction-following tasks. Release Date: Huggingface: 10/23/2025 via https://huggingface.co/ Input - Input Type(s): Text - Input Format(s): String - Input Parameters: One-Dimensional (1D): Sequences - Other Properties Related to Input: Context length up to 128K. Supported languages include German, Spanish, French, Italian, Korean, Portuguese, Russian, Japanese, Chinese and English. 
Output
- Output Type(s): Text
- Output Format: String
- Output Parameters: One-Dimensional (1D): Sequences

Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.

Software Integration
- Runtime Engine(s): NeMo 24.12
- Supported Hardware Microarchitecture Compatibility: NVIDIA H100-80GB, NVIDIA A100
- Operating System(s): Linux

References

[\[2504.11409\] Efficient Hybrid Language Model Compression through Group-Aware SSM Pruning](https://arxiv.org/abs/2504.11409)

Note: Newline should be present after the last Assistant as a generation prompt.

The data for post-training phases is a compilation of supervised fine-tuning and preference tuning data for improving math, code, science, chat, tool-calling, and instruction following capabilities.

Data Collection for Training & Testing Datasets: Hybrid: Automated, Human, Synthetic

Data Labeling for Training & Testing Datasets: Hybrid: Automated, Human, Synthetic

We used the datasets listed in the next section to evaluate the model.

Data Collection for Training Datasets: Hybrid: Automated, Human, Synthetic

Data Labeling for Training Datasets: Hybrid: Automated, Human, Synthetic

| MT-Bench 0-shot | IFEval Strict Average 0-shot |
|-------------|--------------|
| 7.9 | 76.24 |

MT-Bench - A set of 80 multi-turn, open-ended questions for evaluating chat abilities. We use GPT-4-Turbo as the judge model. Dataset & Code

IFEval - Contains 500 verifiable instructions to test instruction following abilities of language models. We report the average of prompt and instruction level scores in the strict category.
Dataset

| MBPP 0-shot | MBPP+ 0-shot | HumanEval 0-shot | HumanEval+ 0-shot |
|-------------|--------------|-----------------|------------------|
| 78.6 | 68.25 | 76.2 | 70.85 |

MBPP - Evaluates ability to generate solutions for Python programming tasks. Dataset

MBPP+ - Extended version of MBPP with additional tests. Dataset

HumanEval - Tests code generation and completion abilities in Python. Dataset

HumanEval+ - Extended version of HumanEval with additional tests. Dataset

| GSM8K 0-shot | MATH-500 0-shot |
|--------------|--------------|
| 88.93 | 76.4 |

GSM8K - Evaluates grade school level mathematical word problem solving. Dataset

MATH-500 - A subset of 500 questions from the MATH benchmark. Dataset

| BFCL v2 Live Overall Accuracy 0-shot |
|--------------|
| 65.88 |

BFCL v2 Live - Evaluates tool-calling ability of language models over multiple categories in real-world scenarios. Dataset

| MMLU 0-shot (Generative) |
|------------------|
| 66.96 |

MMLU - Tests knowledge across 57 subjects including science, humanities, math and more. Dataset

The model was trained on data that contains toxic language, unsafe content, and societal biases originally crawled from the internet. Therefore, the model may amplify those biases and return toxic responses especially when prompted with toxic prompts. The model may generate answers that may be inaccurate, omit key information, or include irrelevant or redundant text producing socially unacceptable or undesirable text, even if the prompt itself does not include anything explicitly offensive. Code produced by the model may not always model real-world contexts and should be checked. The model demonstrates weakness to alignment-breaking attacks. Users are advised to deploy language model guardrails alongside this model to prevent potentially harmful outputs. The model may generate answers that are inaccurate, omit key information, or include irrelevant or redundant text.
Inference - Engine: NeMo - Test Hardware NVIDIA H100-80GB Ethical Considerations NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. For more detailed information on ethical considerations for this model, please see the Responsible Use Guide available at http://nvidia.com/nemotron-responsible-use. Please report security vulnerabilities or NVIDIA AI Concerns here.
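The IFEval number reported in this card is described as the average of the prompt-level and instruction-level scores in the strict category. A minimal sketch of how such an average can be computed is given below, assuming the usual IFEval definitions (a prompt counts as passed only if every one of its instructions is followed; the instruction-level score is the fraction of all instructions followed). The function name and the sample outcomes are hypothetical.

```python
def ifeval_strict_average(results):
    """Compute prompt-level, instruction-level, and averaged strict accuracy.

    results holds one list of booleans per prompt, one boolean per instruction
    (True when the response followed that instruction under strict checking).
    """
    prompt_level = sum(all(r) for r in results) / len(results)
    total_instructions = sum(len(r) for r in results)
    instruction_level = sum(sum(r) for r in results) / total_instructions
    average = (prompt_level + instruction_level) / 2
    return prompt_level, instruction_level, average


# Hypothetical outcomes for three prompts with 2, 3, and 1 instructions.
results = [[True, True], [True, False, True], [False]]
print(ifeval_strict_average(results))
```

The two component scores differ whenever a prompt carries several instructions and only some of them are satisfied, which is why the card states which combination it reports.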

-
496
7

ssl_en_nest_large_v1.0

license:cc-by-4.0
496
6

tts_hifigan

license:cc-by-4.0
488
38

Minitron-4B-Base

Minitron-4B-Base is a large language model (LLM) obtained by pruning Nemotron-4 15B; specifically, we prune model embedding size, number of attention heads, and MLP intermediate dimension. Following pruning, we perform continued training with distillation using 94 billion tokens to arrive at the final model; we use the continuous pre-training data corpus used in Nemotron-4 15B for this purpose. Deriving the Minitron 8B and 4B models from the base 15B model using our approach requires up to 40x fewer training tokens per model compared to training from scratch; this results in compute cost savings of 1.8x for training the full model family (15B, 8B, and 4B). Minitron models exhibit up to a 16% improvement in MMLU scores compared to training from scratch, perform comparably to other community models such as Mistral 7B, Gemma 7B and Llama-3 8B, and outperform state-of-the-art compression techniques from the literature. Please refer to our arXiv paper for more details. Model Dates: Minitron-4B-Base was trained between February 2024 and June 2024. Minitron-4B-Base is released under the NVIDIA Open Model License Agreement. Minitron-4B-Base uses a model embedding size of 3072, 32 attention heads, and an MLP intermediate dimension of 9216. It also uses Grouped-Query Attention (GQA) and Rotary Position Embeddings (RoPE). Architecture Type: Transformer Decoder (auto-regressive language model) Support for Nemotron models will be added in the upcoming transformers library release. In the meantime, please install the library from source: The following code provides an example of how to load the Minitron-4B-Base model and use it to perform text generation. Properties: The training corpus for Minitron-4B-Base consists of English and multilingual text, as well as code. Our sources cover a variety of document types such as: webpages, dialogue, articles, and other written materials. The corpus spans domains including legal, math, science, finance, and more. 
In our continued training set, we introduce a small portion of question-answering, and alignment style data to improve model performance.

Data Freshness: The pretraining data has a cutoff of June 2023.

5-shot performance. Language Understanding evaluated using Massive Multitask Language Understanding:

Zero-shot performance. Evaluated using select datasets from the LM Evaluation Harness with additions:

| HellaSwag | Winogrande | GSM8K | ARC-C | XLSum |
| :------------- | :------------- | :------------- | :------------- | :------------- |
| 75.0 | 74.0 | 24.1 | 50.9 | 29.5 |

Code generation performance. Evaluated using HumanEval:

Please refer to our paper for the full set of results.

The model was trained on data that contains toxic language, unsafe content, and societal biases originally crawled from the internet. Therefore, the model may amplify those biases and return toxic responses especially when prompted with toxic prompts. The model may generate answers that may be inaccurate, omit key information, or include irrelevant or redundant text producing socially unacceptable or undesirable text, even if the prompt itself does not include anything explicitly offensive.

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please report security vulnerabilities or NVIDIA AI Concerns here.

If you find our work helpful, please consider citing our paper:

llama-3
482
134

Llama-3.1-Nemotron-8B-UltraLong-4M-Instruct

We introduce Nemotron-UltraLong-8B, a series of ultra-long context language models designed to process extensive sequences of text (up to 1M, 2M, and 4M tokens) while maintaining competitive performance on standard benchmarks. Built on Llama-3.1, UltraLong-8B leverages a systematic training recipe that combines efficient continued pretraining with instruction tuning to enhance long-context understanding and instruction-following capabilities. This approach enables our models to efficiently scale their context windows without sacrificing general performance.
- nvidia/Llama-3.1-Nemotron-8B-UltraLong-1M-Instruct
- nvidia/Llama-3.1-Nemotron-8B-UltraLong-2M-Instruct
- nvidia/Llama-3.1-Nemotron-8B-UltraLong-4M-Instruct
With `transformers >= 4.43.0`, you can run conversational inference using the Transformers `pipeline` abstraction or by leveraging the Auto classes with the `generate()` function. Make sure to update your transformers installation via `pip install --upgrade transformers`.
Base model: meta-llama/Llama-3.1-8B-Instruct
Continued Pretraining: The training data consists of 1B tokens sourced from a pretraining corpus using per-domain upsampling based on sample length. The model was trained for 150 iterations with a sequence length of 4M and a global batch size of 2.
Supervised fine-tuning (SFT): 1B tokens on open-source instruction datasets across general, mathematics, and code domains. We subsample the data from the `general_sft_stage2` split of AceMath-Instruct.
We evaluate Nemotron-UltraLong-8B on a diverse set of benchmarks, including long-context tasks (e.g., RULER, LV-Eval, and InfiniteBench) and standard tasks (e.g., MMLU, MATH, GSM-8K, and HumanEval). UltraLong-8B achieves superior performance on ultra-long context tasks while maintaining competitive results on standard benchmarks.
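As a quick sanity check on the continued-pretraining figures quoted in this card (150 iterations, sequence length 4M, global batch size 2, roughly 1B tokens), the token budget can be recomputed by hand. The sketch below is only arithmetic, not part of the training recipe; the function name is hypothetical, and because the card does not say whether "4M" means 4 × 10^6 or 4 × 2^20 tokens, both readings are shown as assumptions.

```python
def tokens_seen(iterations: int, seq_len: int, global_batch: int) -> int:
    """Total tokens processed: steps x sequences per step x tokens per sequence."""
    return iterations * global_batch * seq_len


# Figures from the card: 150 iterations, global batch size 2, sequence length "4M".
# The exact meaning of "4M" is an assumption; both common readings are computed.
decimal_total = tokens_seen(150, 4_000_000, 2)   # 4M read as 4 * 10**6
binary_total = tokens_seen(150, 4 * 2**20, 2)    # 4M read as 4 * 2**20

print(f"decimal reading: {decimal_total:,} tokens")
print(f"binary reading:  {binary_total:,} tokens")
```

Either reading lands at about 1.2B to 1.26B tokens, which is consistent with the "1B tokens" stated for this stage.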
Correspondence to Chejian Xu ([email protected]), Wei Ping ([email protected])
@article{ulralong2025,
  title={From 128K to 4M: Efficient Training of Ultra-Long Context Large Language Models},
  author={Xu, Chejian and Ping, Wei and Xu, Peng and Liu, Zihan and Wang, Boxin and Shoeybi, Mohammad and Catanzaro, Bryan},
  journal={arXiv preprint},
  year={2025}
}

llama
480
120

Llama-3.1-Nemotron-8B-UltraLong-1M-Instruct

llama
480
52

Phi-4-reasoning-plus-NVFP4

license:mit
480
4

MM-Embed

license:cc-by-nc-4.0
477
61

Cosmos-Embed1-224p

-
474
4

Phi-4-reasoning-plus-FP8

license:mit
449
1

stt_pt_fastconformer_hybrid_large_pc

license:cc-by-nc-4.0
442
2

MambaVision L3 512 21K

MambaVision: A Hybrid Mamba-Transformer Vision Backbone.
We have developed the first hybrid model for computer vision which leverages the strengths of Mamba and Transformers. Specifically, our core contribution includes redesigning the Mamba formulation to enhance its capability for efficient modeling of visual features. In addition, we conducted a comprehensive ablation study on the feasibility of integrating Vision Transformers (ViT) with Mamba. Our results demonstrate that equipping the Mamba architecture with several self-attention blocks at the final layers greatly improves the modeling capacity to capture long-range spatial dependencies. Based on our findings, we introduce a family of MambaVision models with a hierarchical architecture to meet various design criteria.
MambaVision-L3-512-21K is pretrained on ImageNet-21K dataset and finetuned on ImageNet-1K at 512 x 512 resolution.
| Name | Acc@1(%) | Acc@5(%) | #Params(M) | FLOPs(G) | Resolution |
| --- | --- | --- | --- | --- | --- |
| MambaVision-L3-512-21K | 88.1 | 98.6 | 739.6 | 489.1 | 512x512 |
In addition, the MambaVision models demonstrate a strong performance by achieving a new SOTA Pareto-front in terms of Top-1 accuracy and throughput.
It is highly recommended to install the requirements for MambaVision by running the following:
For each model, we offer two variants for image classification and feature extraction that can be imported with 1 line of code.
In the following example, we demonstrate how MambaVision can be used for image classification. Given the following image from COCO dataset val set as an input:
The following snippet can be used for image classification:
MambaVision can also be used as a generic feature extractor. Specifically, we can extract the outputs of each stage of model (4 stages) as well as the final averaged-pool features that are flattened.
The following snippet can be used for feature extraction:
| Name | Acc@1(%) | Acc@5(%) | #Params(M) | FLOPs(G) | Resolution |
| --- | --- | --- | --- | --- | --- |
| MambaVision-B-21K | 84.9 | 97.5 | 97.7 | 15.0 | 224x224 |
| MambaVision-L-21K | 86.1 | 97.9 | 227.9 | 34.9 | 224x224 |
| MambaVision-L2-512-21K | 87.3 | 98.4 | 241.5 | 196.3 | 512x512 |
| MambaVision-L3-256-21K | 87.3 | 98.3 | 739.6 | 122.3 | 256x256 |
| MambaVision-L3-512-21K | 88.1 | 98.6 | 739.6 | 489.1 | 512x512 |
| Name | Acc@1(%) | Acc@5(%) | Throughput(Img/Sec) | Resolution | #Params(M) | FLOPs(G) |
| --- | --- | --- | --- | --- | --- | --- |
| MambaVision-T | 82.3 | 96.2 | 6298 | 224x224 | 31.8 | 4.4 |
| MambaVision-T2 | 82.7 | 96.3 | 5990 | 224x224 | 35.1 | 5.1 |
| MambaVision-S | 83.3 | 96.5 | 4700 | 224x224 | 50.1 | 7.5 |
| MambaVision-B | 84.2 | 96.9 | 3670 | 224x224 | 97.7 | 15.0 |
| MambaVision-L | 85.0 | 97.1 | 2190 | 224x224 | 227.9 | 34.9 |
| MambaVision-L2 | 85.3 | 97.2 | 1021 | 224x224 | 241.5 | 37.5 |
We provide a docker file. In addition, assuming that a recent PyTorch package is installed, the dependencies can be installed by running:

-
429
54

Llama-3_1-Nemotron-Ultra-253B-v1-FP8

llama-3
425
8

Phi-4-reasoning-plus-FP4

Description: The NVIDIA Phi-4-reasoning-plus FP4 model is the quantized version of Microsoft’s Phi-4-reasoning-plus model, which is a multimodal foundation model that uses an optimized transformer architecture. For more information, please check here. The NVIDIA Phi-4-reasoning-plus FP4 model is quantized with TensorRT Model Optimizer.
This model is ready for commercial/non-commercial use.
Third-Party Community Consideration: This model is not owned or developed by NVIDIA. This model has been developed and built to a third-party’s requirements for this application and use case; see link to Non-NVIDIA (Phi-4-reasoning-plus) Model Card.
License/Terms of Use: Use of this model is governed by MIT
Deployment Geography: Global, except in European Union
Use Case: Developers looking to take off the shelf pre-quantized models for deployment in AI Agent systems, chatbots, RAG systems, and other AI-powered applications.
Release Date: Huggingface 09/15/2025 via https://huggingface.co/nvidia/Phi-4-reasoning-plus-FP4
Model Architecture:
- Architecture Type: Transformers
- Network Architecture: Phi3ForCausalLM
- This model was developed based on Phi-4-reasoning-plus
- Number of model parameters: 14.7 × 10^9
Input:
- Input Type(s): Text
- Input Format(s): String
- Input Parameters: 1D (One-Dimensional): Sequences
- Other Properties Related to Input: Context length up to 64k
Output:
- Output Type(s): Text
- Output Format: String
- Output Parameters: 1D (One-Dimensional): Sequences
- Other Properties Related to Output: N/A
Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA’s hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
Software Integration:
- Supported Runtime Engine(s): TensorRT-LLM
- Supported Hardware Microarchitecture Compatibility: NVIDIA Blackwell
Model Version(s): The model is quantized with nvidia-modelopt v0.35.0
Post Training Quantization: This model was obtained by quantizing the weights and activations of Phi-4-reasoning-plus to FP4 data type, ready for inference with TensorRT-LLM. Only the weights and activations of the linear operators within transformer blocks are quantized.
Training and Testing Datasets:
- Data Modality: Text
- Text Training Data Size: 1 Billion to 10 Trillion Tokens
Calibration Dataset:
- Link: cnndailymail
- Data collection method: Automated.
- Labeling method: Automated.
Training Datasets:
- Data Collection Method by Dataset: Automated, Synthetic, Human
- Labeling Method by Dataset: Undisclosed
- Properties: Training data is a mixture of Q&A, chat format data in math, science, and coding. The chat prompts are sourced from filtered high-quality web data and optionally rewritten and processed through a synthetic data generation pipeline. We further include data to improve truthfulness and safety.
Testing Dataset:
- Data Collection Method by Dataset: Undisclosed
- Labeling Method by Dataset: Undisclosed
- Properties: Undisclosed
To deploy the quantized checkpoint with TensorRT-LLM LLM API, follow the sample codes below:
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns here.
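To give a rough feel for what FP4 quantization buys for a model of this size, the sketch below estimates weight storage at different bit widths from the parameter count stated in the card (14.7 × 10^9). It is a back-of-the-envelope lower bound, not a measurement of the released checkpoint: the card notes that only linear operators inside transformer blocks are quantized, so embeddings and other layers stay at higher precision, and per-block scaling factors add overhead. The helper name is hypothetical.

```python
def weight_bytes(num_params: float, bits_per_weight: float) -> float:
    """Idealized storage for the weights alone, ignoring scales and unquantized layers."""
    return num_params * bits_per_weight / 8


PARAMS = 14.7e9  # parameter count stated in the card

# Compare the idealized footprint at three precisions.
for name, bits in [("BF16", 16), ("FP8", 8), ("FP4", 4)]:
    gigabytes = weight_bytes(PARAMS, bits) / 1e9
    print(f"{name}: about {gigabytes:.2f} GB of weights")
```

Under these assumptions FP4 needs about a quarter of the BF16 weight storage; the real checkpoint will be somewhat larger for the reasons given above.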

license:mit
424
4

nemocurator-fineweb-mixtral-edu-classifier

-
421
6

Phi-4-multimodal-instruct-FP4

Description: The NVIDIA Phi-4-multimodal-instruct FP4 model is the quantized version of Microsoft’s Phi-4-multimodal-instruct model, which is a multimodal foundation model that uses an optimized transformer architecture. For more information, please check here. The NVIDIA Phi-4-multimodal-instruct FP4 model is quantized with TensorRT Model Optimizer.
This model is ready for commercial/non-commercial use.
Third-Party Community Consideration: This model is not owned or developed by NVIDIA. This model has been developed and built to a third-party’s requirements for this application and use case; see link to Non-NVIDIA (Phi-4-multimodal-instruct) Model Card.
License/Terms of Use: Use of this model is governed by nvidia-open-model-license. ADDITIONAL INFORMATION: MIT License.
Deployment Geography: Global, except in European Union
Use Case: Developers looking to take off the shelf pre-quantized models for deployment in AI Agent systems, chatbots, RAG systems, and other AI-powered applications.
Release Date: Huggingface 09/15/2025 via https://huggingface.co/nvidia/Phi-4-multimodal-instruct-FP4
Model Architecture:
- Architecture Type: Transformers
- Network Architecture: Phi4MMForCausalLM
- This model was developed based on Phi-4-multimodal-instruct
- Number of model parameters: 5.6 × 10^9
Input:
- Input Type(s): Text, image and speech
- Input Format(s): String, Images (see properties), Soundfile
- Input Parameters: One-Dimensional (1D), Two-Dimensional (2D), One-Dimensional (1D)
- Other Properties Related to Input: Any common RGB/gray image format (e.g., ".jpg", ".jpeg", ".png", ".ppm", ".bmp", ".pgm", ".tif", ".tiff", ".webp") can be supported. Any audio format that can be loaded by soundfile package should be supported. Context length up to 128K
Output:
- Output Type(s): Text
- Output Format: String
- Output Parameters: 1D (One-Dimensional): Sequences
- Other Properties Related to Output: N/A
Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA’s hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
Software Integration:
- Supported Runtime Engine(s): TensorRT-LLM
- Supported Hardware Microarchitecture Compatibility: NVIDIA Blackwell
Model Version(s): The model is quantized with nvidia-modelopt v0.35.0
Post Training Quantization: This model was obtained by quantizing the weights and activations of Phi-4-multimodal-instruct to FP4 data type, ready for inference with TensorRT-LLM. Only the weights and activations of the linear operators within transformer blocks of the language model are quantized.
Training and Testing Datasets:
- Data Modality: Audio, Image, Text
- Text Training Data Size: 1 Billion to 10 Trillion Tokens
- Audio Training Data Size: More than 1 Million Hours
- Image Training Data Size: 1 Billion to 10 Trillion image-text Tokens
Calibration Dataset:
- Link: cnndailymail
- Data collection method: Automated.
- Labeling method: Automated.
Training Datasets:
- Data Collection Method by Dataset: Automated
- Labeling Method by Dataset: Human, Automated
- Properties:
  - publicly available documents filtered for quality, selected high-quality educational data, and code
  - newly created synthetic, “textbook-like” data for the purpose of teaching math, coding, common sense reasoning, general knowledge of the world (e.g., science, daily activities, theory of mind, etc.)
  - high quality human labeled data in chat format
  - selected high-quality image-text interleave data
  - synthetic and publicly available image, multi-image, and video data
  - anonymized in-house speech-text pair data with strong/weak transcriptions
  - selected high-quality publicly available and anonymized in-house speech data with task-specific supervisions
  - selected synthetic speech data
  - synthetic vision-speech data
Testing Dataset:
- Data Collection Method by Dataset: Undisclosed
- Labeling Method by Dataset: Undisclosed
- Properties: Undisclosed
To deploy the quantized checkpoint with TensorRT-LLM LLM API, follow the sample codes below:
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns here.

-
409
4

Cosmos-UpsamplePrompt1-12B-Text2World

-
406
1

nemo-nano-codec-22khz-1.78kbps-12.5fps

-
402
9

tts_en_fastpitch

license:cc-by-4.0
401
39

Llama-3.1-Nemotron-Nano-VL-8B-V1-FP4-QAD

Llama-3.1-Nemotron-Nano-VL-8B-V1-FP4-QAD is the quantized version of the NVIDIA Llama Nemotron Nano VL model, which is an auto-regressive vision language model that uses an optimized transformer architecture. For more information, please check here. The NVIDIA Llama Nemotron Nano VL FP4 QAD model is quantized with TensorRT Model Optimizer. This model was trained on commercial images using Quantization-aware Distillation (QAD).
This model is ready for commercial/non-commercial use.
License/Terms of Use: Governing Terms: Your use of the model is governed by the NVIDIA Open License Agreement. Additional Information: Llama 3.1 Community Model License. Built with Llama.
The intended users of this model are AI foundry enterprise customers, as well as researchers or developers. This model may be used for image summarization, text-image analysis, Optical Character Recognition, interactive Q&A on images, and Chain-of-Thought reasoning.
- Hugging Face [October 8th, 2025] via https://huggingface.co/nvidia/Llama-3.1-Nemotron-Nano-VL-8B-V1-FP4-QAD
Input Format(s): Image (Red, Green, Blue (RGB)), and Text (String)
Input Parameters: Image (Two-Dimensional - 2D), Text (One-Dimensional - 1D)
- Language Supported: English only
- Input + Output Token: 16K
- Maximum Resolution: Determined by a 12-tile layout constraint, with each tile being 512 × 512 pixels. This supports aspect ratios such as:
  - 4 × 3 layout: up to 2048 × 1536 pixels
  - 3 × 4 layout: up to 1536 × 2048 pixels
  - 2 × 6 layout: up to 1024 × 3072 pixels
  - 6 × 2 layout: up to 3072 × 1024 pixels
  - Other configurations allowed, provided total tiles ≤ 12
- Channel Count: 3 channels (RGB)
- Alpha Channel: Not supported (no transparency)
Other Properties Related to Output: Input + Output Token: 16K
Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA’s hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
Software Integration:
- Runtime Engine(s): vLLM
- Supported Hardware Microarchitecture Compatibility: B100/B200
- Supported Operating System(s): Linux
The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.
Model Versions: Llama-3.1-Nemotron-Nano-VL-8B-V1-FP4-QAD
To serve this checkpoint with vLLM, you can start the docker `vllm/vllm-openai:latest` and run the sample command below:
Image Training Data Size:
- 1 Million to 1 Billion Images
Text Training Data Size:
- Less than a Billion Tokens
NV-Pretraining and NV-CosmosNemotron-SFT were used for training and evaluation
Data Collection Method by dataset: Hybrid: Human, Synthetic
Labeling Method by dataset: Hybrid: Human, Synthetic
Properties: The dataset collection (for training and evaluation) consists of a mix of internal and public datasets designed for training and evaluation across various tasks. It includes:
- Internal datasets built with public commercial images and internal labels, supporting tasks like conversation modeling and document analysis.
- Public datasets sourced from publicly available images and annotations, adapted for tasks such as image captioning and visual question answering.
- Synthetic datasets generated programmatically for specific tasks like tabular data understanding.
- Specialized datasets for safety alignment, function calling, and domain-specific tasks (e.g., science diagrams, financial question answering).
NV-Pretraining and NV-CosmosNemotron-SFT were used for training and evaluation.
Data Collection Method by dataset: Hybrid: Human, Synthetic
Labeling Method by dataset: Hybrid: Human, Synthetic
Properties: Additionally, the dataset collection (for training and evaluation) consists of a mix of internal and public datasets designed for training and evaluation across various tasks. It includes:
- Internal datasets built with public commercial images and internal labels, supporting tasks like conversation modeling and document analysis.
- Public datasets sourced from publicly available images and annotations, adapted for tasks such as image captioning and visual question answering.
- Synthetic datasets generated programmatically for specific tasks like tabular data understanding.
- Specialized datasets for safety alignment, function calling, and domain-specific tasks (e.g., science diagrams, financial question answering).
| Benchmark | Score (FP4) | Score (BF16) |
| --- | --- | --- |
| MMMU Val with chatGPT as a judge | 47.9% | 48.2% |
| AI2D | 85.0% | 85.0% |
| ChartQA | 86.5% | 86.3% |
| InfoVQA Val | 77.6% | 77.4% |
| OCRBench | 836 | 839 |
| OCRBenchV2 English | 59.5% | 60.1% |
| OCRBenchV2 Chinese | 38.0% | 37.9% |
| DocVQA val | 91.5% | 91.2% |
| VideoMME | 54.6% | 54.7% |
The evaluation for this checkpoint was done with FP4 simulated quantization on H100.
Inference Engine: vLLM
Test Hardware: 1x NVIDIA B100/B200
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. For more detailed information on ethical considerations for this model, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards. Please report security vulnerabilities or NVIDIA AI Concerns here.
Please make sure you have proper rights and permissions for all input image and video content; if image or video includes people, personal health information, or intellectual property, the image or video generated will not blur or maintain proportions of image subjects included.
Users are responsible for model inputs and outputs. Users are responsible for ensuring safe integration of this model, including implementing guardrails as well as other safety mechanisms, prior to deployment. Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns here.
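The maximum-resolution rule in this card (tiles of 512 × 512 pixels, at most 12 tiles in total) can be made concrete with a few lines of arithmetic. The sketch below reproduces the four layouts listed in the card and enumerates every layout that fits the tile budget. The function names are hypothetical, and reading "4 × 3" as columns × rows is an assumption inferred from the resolutions the card lists; this is not the model's actual preprocessing code.

```python
from typing import List, Tuple

TILE = 512       # tile side length in pixels, from the card
MAX_TILES = 12   # total tile budget, from the card


def max_resolution(cols: int, rows: int) -> Tuple[int, int]:
    """Return (width, height) in pixels for a cols x rows tile layout."""
    if cols < 1 or rows < 1 or cols * rows > MAX_TILES:
        raise ValueError(f"{cols} x {rows} exceeds the {MAX_TILES}-tile budget")
    return cols * TILE, rows * TILE


def allowed_layouts() -> List[Tuple[int, int]]:
    """Enumerate every (cols, rows) pair whose tile count fits the budget."""
    return [
        (c, r)
        for c in range(1, MAX_TILES + 1)
        for r in range(1, MAX_TILES + 1)
        if c * r <= MAX_TILES
    ]


# The four layouts named in the card.
for cols, rows in [(4, 3), (3, 4), (2, 6), (6, 2)]:
    width, height = max_resolution(cols, rows)
    print(f"{cols} x {rows} layout: up to {width} x {height} pixels")
```

A layout such as 4 × 4 would need 16 tiles and is rejected, which matches the card's "total tiles ≤ 12" condition.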

Llama_Nemotron_Nano_VL
401
10

stt_fr_fastconformer_hybrid_large_pc

license:cc-by-4.0
399
5

Llama-3.3-Nemotron-70B-Reward

llama
396
2

stt_fa_fastconformer_hybrid_large

license:cc-by-4.0
390
17

stt_ar_fastconformer_hybrid_large_pc_v1.0

license:cc-by-4.0
389
9

Cosmos-1.0-Guardrail

-
384
57

OpenMath2-Llama3.1-70B

llama
382
20

OpenCodeReasoning-Nemotron-1.1-14B

-
381
13

Phi-4-multimodal-instruct-FP8

Description: The NVIDIA Phi-4-multimodal-instruct FP8 model is the quantized version of Microsoft’s Phi-4-multimodal-instruct model, which is a multimodal foundation model that uses an optimized transformer architecture. For more information, please check here. The NVIDIA Phi-4-multimodal-instruct FP8 model is quantized with TensorRT Model Optimizer.
This model is ready for commercial/non-commercial use.
Third-Party Community Consideration: This model is not owned or developed by NVIDIA. This model has been developed and built to a third-party’s requirements for this application and use case; see link to Non-NVIDIA (Phi-4-multimodal-instruct) Model Card.
License/Terms of Use: Use of this model is governed by nvidia-open-model-license. ADDITIONAL INFORMATION: MIT License.
Deployment Geography: Global, except in European Union
Use Case: Developers looking to take off the shelf pre-quantized models for deployment in AI Agent systems, chatbots, RAG systems, and other AI-powered applications.
Release Date: Huggingface 09/15/2025 via https://huggingface.co/nvidia/Phi-4-multimodal-instruct-FP8
Model Architecture:
- Architecture Type: Transformers
- Network Architecture: Phi4MMForCausalLM
- This model was developed based on Phi-4-multimodal-instruct
- Number of model parameters: 5.6 × 10^9
Input:
- Input Type(s): Text, image and speech
- Input Format(s): String, Images (see properties), Soundfile
- Input Parameters: One-Dimensional (1D), Two-Dimensional (2D), One-Dimensional (1D)
- Other Properties Related to Input: Any common RGB/gray image format (e.g., ".jpg", ".jpeg", ".png", ".ppm", ".bmp", ".pgm", ".tif", ".tiff", ".webp") can be supported. Any audio format that can be loaded by soundfile package should be supported. Context length up to 128K
Output:
- Output Type(s): Text
- Output Format: String
- Output Parameters: 1D (One-Dimensional): Sequences
- Other Properties Related to Output: N/A
Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA’s hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
Software Integration:
- Supported Runtime Engine(s): TensorRT-LLM
- Supported Hardware Microarchitecture Compatibility: NVIDIA Blackwell
Model Version(s): The model is quantized with nvidia-modelopt v0.35.0
Post Training Quantization: This model was obtained by quantizing the weights and activations of Phi-4-multimodal-instruct to FP8 data type, ready for inference with TensorRT-LLM. Only the weights and activations of the linear operators within transformer blocks of the language model are quantized.
Training and Testing Datasets:
- Data Modality: Audio, Image, Text
- Text Training Data Size: 1 Billion to 10 Trillion Tokens
- Audio Training Data Size: More than 1 Million Hours
- Image Training Data Size: 1 Billion to 10 Trillion image-text Tokens
Calibration Dataset:
- Link: cnndailymail
- Data collection method: Automated.
- Labeling method: Automated.
Training Datasets:
- Data Collection Method by Dataset: Automated
- Labeling Method by Dataset: Human, Automated
- Properties:
  - publicly available documents filtered for quality, selected high-quality educational data, and code
  - newly created synthetic, “textbook-like” data for the purpose of teaching math, coding, common sense reasoning, general knowledge of the world (e.g., science, daily activities, theory of mind, etc.)
  - high quality human labeled data in chat format
  - selected high-quality image-text interleave data
  - synthetic and publicly available image, multi-image, and video data
  - anonymized in-house speech-text pair data with strong/weak transcriptions
  - selected high-quality publicly available and anonymized in-house speech data with task-specific supervisions
  - selected synthetic speech data
  - synthetic vision-speech data
Testing Dataset:
- Data Collection Method by Dataset: Undisclosed
- Labeling Method by Dataset: Undisclosed
- Properties: Undisclosed
To deploy the quantized checkpoint with TensorRT-LLM LLM API, follow the sample codes below:
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns here.

-
377
1

stt_es_fastconformer_hybrid_large_pc_nc

license:cc-by-nc-4.0
376
1

ChronoEdit-14B-Diffusers-Upscaler-Lora

license:apache-2.0
375
61

audio-codec-44khz

-
375
23

DAM-3B-Video

llava_llama
374
56

stt_ru_conformer_ctc_large

license:cc-by-4.0
369
8

stt_ru_conformer_transducer_large

license:cc-by-4.0
360
22

MambaVision-B-1K

license:apache-2.0
360
11

ChronoEdit 14B Diffusers

license:apache-2.0
350
77

stt_en_conformer_ctc_small

license:cc-by-4.0
349
4

stt_uk_citrinet_1024_gamma_0_25

license:cc-by-4.0
348
12

C-RADIOv3-L

-
347
1

stt_en_conformer_transducer_xlarge

license:cc-by-4.0
340
56

GEN3C Cosmos 7B

GEN3C: 3D-Informed World-Consistent Video Generation with Precise Camera Control
CVPR 2025 (Highlight)
Xuanchi Ren, Tianchang Shen, Jiahui Huang, Huan Ling, Yifan Lu, Merlin Nimier-David, Thomas Müller, Alexander Keller, Sanja Fidler, Jun Gao
\* indicates equal contribution
GEN3C is a generative video model with precise camera control and temporal three-dimensional (3D) consistency. We achieve this with a 3D cache: point clouds obtained by predicting the pixel-wise depth of seed images or previously generated frames. When generating the next frames, GEN3C is conditioned on the two-dimensional (2D) renderings of the 3D cache with the new camera trajectory provided by the user. Our results demonstrate more precise camera control than prior work, as well as state-of-the-art results in sparse-view novel view synthesis, even in challenging settings such as driving scenes and monocular dynamic video.
This model is ready for commercial/non-commercial use.
License/Terms of Use: This model is released under the NVIDIA Open Model License. For a custom license, please contact [email protected].
Important Note: If you bypass, disable, reduce the efficacy of, or circumvent any technical limitation, safety guardrail or associated safety guardrail hyperparameter, encryption, security, digital rights management, or authentication mechanism contained in the Model, your rights under NVIDIA Open Model License Agreement will automatically terminate.
Use Case: This model is intended for researchers interested in developing consistent video generation and allows users to use cameras to control the final generation. For AV applications, we can enable users to generate driving videos and specify the camera trajectories in this video, such as switching from the viewpoint of a sedan car to a truck, or looking at a different lane.
Release Date: Github 06/10/2025 via https://github.com/nv-tlabs/Gen3C
Reference: GEN3C: 3D-Informed World-Consistent Video Generation with Precise Camera Control Paper, Project Page
Model Architecture:
- Architecture Type: Convolutional Neural Network (CNN), Transformer
- Network Architecture: Transformer
- This model was developed based on Cosmos Predict 1
- This model has 7B model parameters.
Input:
- Input Type(s): Camera Parameters, Image
- Input Format(s): 1D Array of Camera Poses, 2D Array of Images.
- Input Parameters: Camera Poses (1D), Images (2D)
- Other Properties Related to Input: The input image should be 720 x 1080 resolution, and we recommend using 121 frames for the camera parameters.
Output:
- Output Type(s): Videos
- Output Format: MP4 video
- Output Parameters: 3D (N x H x W), with 3 channels (Red, Green, Blue (RGB))
- Other Properties Related to Output: A sequence of images (N x H x W x 3), N is the number of frames, H is the height and W is the width. Three (3) refers to the number of RGB channels.
Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems A100 and H100. By leveraging NVIDIA’s hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
Software Integration:
- Runtime Engine(s): Cosmos-Predict1
- Supported Hardware Microarchitecture Compatibility: NVIDIA Ampere, NVIDIA Blackwell, NVIDIA Hopper
Ethical Considerations: NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Users are responsible for model inputs and outputs. Users are responsible for ensuring safe integration of this model, including implementing guardrails as well as other safety mechanisms, prior to deployment. For more detailed information on ethical considerations for this model, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards link to subcard. Please report security vulnerabilities or NVIDIA AI Concerns here.
We value you, the datasets, the diversity they represent, and what we have been entrusted with. This model and its associated data have been:
- Verified to comply with current applicable disclosure laws, regulations, and industry standards.
- Verified to comply with applicable privacy labeling requirements.
- Annotated to describe the collector/source (NVIDIA or a third-party).
- Characterized for technical limitations.
- Reviewed to ensure proper disclosure is accessible to, maintained for, and in compliance with NVIDIA data subjects and their requests.
- Reviewed before release.
- Tagged for known restrictions and potential safety implications.
| Field | Response |
| :--- | :--- |
| Participation considerations from adversely impacted groups protected classes in model design and testing: | None |
| Measures taken to mitigate against unwanted bias: | None |
| Field | Response |
| :--- | :--- |
| Intended Task/Domain: | Novel view synthesis, video generation |
| Model Type: | Transformer |
| Intended Users: | Physical AI developers. |
| Output: | Videos |
| Describe how the model works: | We first predict depth for the input image, unproject it into 3D to maintain a 3D cache. The 3D cache is then projected into an incomplete 2D video, which will be used as a condition for Cosmos to generate the final video. |
| Name the adversely impacted groups this has been tested to deliver comparable outcomes regardless of: | Not Applicable. |
| Technical Limitations & Mitigation: | While the model aims to create photorealistic scenes that replicate real-world conditions, it may generate outputs that are not entirely visually accurate and may require augmentation and/or real-world data depending on the scope and use case. |
| Verified to have met prescribed NVIDIA quality standards: | Yes |
| Performance Metrics: | Qualitative and Quantitative Evaluation including PSNR, SSIM, LPIPS metrics. See Gen3C paper Section 5 for details. |
| Potential Known Risks: | This model may inaccurately characterize depth, which will make the generated video unrealistic and prone to artifacts. |
| Licensing: | NVIDIA Open Model License |
| Field | Response |
| :--- | :--- |
| Generatable or reverse engineerable personal data? | [None Known] |
| Personal data used to create this model? | [None Known] |
| Was consent obtained for any personal data used? | [None Known] |
| How often is dataset reviewed? | Before Release |
| Does data labeling (annotation, metadata) comply with privacy laws? | Yes |
| Applicable Privacy Policy | https://www.nvidia.com/en-us/about-nvidia/privacy-policy/ |
| Field | Response |
| :--- | :--- |
| Model Application Field(s): | World Generation |
| Describe the life critical impact (if present). | None Known |
| Use Case Restrictions: | Abide by NVIDIA Open Model License |
| Model and dataset restrictions: | The Principle of least privilege (PoLP) is applied limiting access for dataset generation and model development. Restrictions enforce dataset access during training, and dataset license constraints adhered to. |
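The "3D cache" idea described in this card (predict per-pixel depth, unproject into 3D, then render the points from a new camera) rests on standard pinhole-camera geometry. The sketch below illustrates only that geometry on a tiny hand-written depth map; it is not GEN3C's implementation. The pinhole model, the intrinsics `fx, fy, cx, cy`, the pure sideways camera shift, and both function names are assumptions made for the example, and the real system uses a learned depth predictor and a video diffusion model to fill in what the rendering leaves incomplete.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]
Pixel = Tuple[float, float]


def unproject(depth: List[List[float]], fx: float, fy: float,
              cx: float, cy: float) -> List[Point]:
    """Lift every pixel (u, v) with depth z to a 3D point in the camera frame."""
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


def project(points: List[Point], fx: float, fy: float, cx: float, cy: float,
            shift_x: float = 0.0) -> List[Pixel]:
    """Project points into a camera moved by shift_x along X (no rotation)."""
    pixels = []
    for x, y, z in points:
        x_cam = x - shift_x  # express the point in the shifted camera's frame
        pixels.append((fx * x_cam / z + cx, fy * y / z + cy))
    return pixels


# A 2 x 2 depth map: the top row is near (z = 2), the bottom row is far (z = 4).
depth = [[2.0, 2.0], [4.0, 4.0]]
cache = unproject(depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)

same_view = project(cache, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
moved_view = project(cache, fx=1.0, fy=1.0, cx=0.5, cy=0.5, shift_x=1.0)
print(same_view)
print(moved_view)
```

Projecting with no camera motion returns the original pixel grid, while moving the camera shifts near points more than far ones. That parallax is what leaves holes in the rendered 2D video, which the card says is then used as the condition for the generator.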
Citation
```
@inproceedings{ren2025gen3c,
  title={GEN3C: 3D-Informed World-Consistent Video Generation with Precise Camera Control},
  author={Ren, Xuanchi and Shen, Tianchang and Huang, Jiahui and Ling, Huan and Lu, Yifan and Nimier-David, Merlin and Müller, Thomas and Keller, Alexander and Fidler, Sanja and Gao, Jun},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2025}
}
```

—
333
25

Cosmos-Predict2-14B-Video2World

—
328
27

bigvgan_v2_22khz_80band_fmax8k_256x

license:mit
318
2

stt_en_fastconformer_transducer_large

license:cc-by-4.0
316
9

C-RADIOv3-g

—
307
7

stt_ua_fastconformer_hybrid_large_pc

license:cc-by-4.0
307
4

PS3-4K-SigLIP2

—
307
1

stt_es_conformer_ctc_large

license:cc-by-4.0
298
2

llama-nemoretriever-colembed-1b-v1

The nvidia/llama-nemoretriever-colembed-1b-v1 is a late interaction embedding model fine-tuned for query-document retrieval. Users can input `queries`, which are text, or `documents` which are page images, to the model. The model outputs ColBERT-style multi-vector numerical representations for input queries and documents. It is the smaller version of llama-nemoretriever-colembed-3b-v1, which achieved 1st place on ViDoRe V1 (nDCG@5), ViDoRe V2 (nDCG@5) and MTEB VisualDocumentRetrieval (Rank Borda) (as of 27th June, 2025). nvidia/llama-nemoretriever-colembed-1b-v1 achieves 2nd place on the benchmarks. For more technical details, please refer to our technical report: Llama Nemoretriever Colembed: Top-Performing Text-Image Retrieval Model This model is for non-commercial/research use only. License/Terms of Use Governing Terms for llama-nemoretriever-colembed-1b-v1 model: NVIDIA Non-Commercial License Additional Information: Apache License 2.0 for siglip2-giant-opt-patch16-384; and LLAMA 3.2 Community License Agreement for Llama-3.2-1B. Built with Meta Llama 3. Improved using Qwen. This project will download and install additional third-party open source software projects. Review the license terms of these open source projects before use. 
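To make the "late interaction" idea concrete, here is a minimal sketch of ColBERT-style MaxSim scoring over multi-vector embeddings. It assumes the query and each document are already encoded as lists of per-token vectors; the toy vectors and the `maxsim_score` helper are illustrative and not part of the model's API.

```python
from typing import Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_vecs: Sequence[Vector], doc_vecs: Sequence[Vector]) -> float:
    """Late-interaction score: each query token picks its best-matching
    document token (maximum dot product); the maxima are summed."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


# Toy 2-dimensional embeddings (hypothetical values, chosen for readability).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
doc_b = [[0.5, 0.5]]

scores = {"doc_a": maxsim_score(query, doc_a), "doc_b": maxsim_score(query, doc_b)}
ranking = sorted(scores, key=scores.get, reverse=True)
```

Because documents are scored token by token, a page that matches every query token well outranks one that matches only on average, which is the usual motivation for multi-vector retrieval.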
Team
- Mengyao Xu
- Gabriel Moreira
- Radek Osmulski
- Ronay Ak
- Yauhen Babakhin
- Even Oldridge
- Benedikt Schifferer

Correspondence to Mengyao Xu ([email protected]) and Benedikt Schifferer ([email protected])

| Model Name | Use-Case | Comment |
|---|---|---|
| nvidia/llama-NemoRetriever-ColEmbed-1B-v1 | Research-Only | Smaller Version of nvidia/llama-NemoRetriever-ColEmbed-3B-v1 |
| nvidia/llama-NemoRetriever-ColEmbed-3B-v1 | Research-Only | #1 ViDoRe V1, V2 and MTEB VisualDocumentRetrieval as of June 27, 2025 |
| llama-32-nemoretriever-1b-vlm-embed-v1 | Commercial Application | MultiModal Embedding Model for Production Use-Case of Visual Document Retrieval |
| llama-32-nv-embedqa-1b-v2 | Commercial Application | Text Embedding Model for Production Use-Case of Text Document Retrieval |
| llama-32-nemoretriever-500m-rerank-v2 | Commercial Application | Text Reranker Model for Production Use-Case of Text Document Retrieval |
| llama-32-nv-rerankqa-1b-v2 | Commercial Application | Text Reranker Model for Production Use-Case of Text Document Retrieval |
| nvidia/NV-Embed-v2 | Research-Only | #1 MTEB as of Aug 30, 2024 |
| nvidia/MM-Embed | Research-Only | Improved nvidia/NV-Embed-v1 and multimodal embeddings |
| nvidia/NV-Retriever-v1 | Research-Only | #1 MTEB BEIR as of July 12th, 2024 |

Use Case
llama-nemoretriever-colembed is intended for researchers exploring applications that must understand or retrieve information across both text and image modalities. It is instrumental in multimodal RAG systems, where queries are in text format and documents are images, such as pages, text, charts, tables or infographics. Potential applications include multimedia search engines, cross-modal retrieval systems, and conversational AI with rich input understanding.
Release Date
Hugging Face on 06/27/2025 via https://huggingface.co/nvidia/llama-nemoretriever-colembed-1b-v1

- Architecture Type: Transformer
- Network Architecture: google/siglip2-giant-opt-patch16-384 + meta-llama/Llama-3.2-1B

The llama-nemoretriever-colembed-1b-v1 is a transformer-based multimodal embedding model built on top of a VLM based on google/siglip2-giant-opt-patch16-384 and meta-llama/Llama-3.2-1B.

| Property | Query | Document |
|----------|-------|----------|
| Input Type | Text | Text \| Image |
| Input Format | List of strings | List of strings \| List of Images |
| Input Parameter | 1D | 1D |
| Other Properties | The model's maximum context length is 8192 tokens. Texts longer than the maximum length must either be chunked or truncated. | The model's maximum context length is 8192 tokens. Texts longer than the maximum length must either be chunked or truncated. Images must be in Python PIL format. The model will scale the image into multiple tiles of 512x512. |

- Output Type: Floats
- Output Format: List of float arrays
- Output Parameters: The list of floats equivalent to [batch_size x seq_length x embedding_dim]
- Other Properties Related to Output: Model outputs embedding vectors of dimension for each input token.

Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.

The model requires transformers version 4.49.0 and flash attention.

The Hugging Face model artifact contains a script to evaluate ViDoRe V1 and ViDoRe V2 based on the GitHub repository.

The Hugging Face model artifact contains a script to evaluate MTEB VisualDocumentRetrieval. We first install the ViDoRe benchmark to capture its dependencies.
Runtime Engine(s): TensorRT, Triton
Supported Hardware Microarchitecture Compatibility: A100 40GB, A100 80GB, H100 80GB
Supported Operating System(s): Linux
Model Version(s): llama-nemoretriever-colembed-1b-v1

- Total size (in number of data points): 12.74M QA pairs for training
- Total number of datasets: 23 datasets used for training and 17 datasets used for evaluation.

Training Dataset
The model was trained on publicly available datasets, including HotpotQA, MIRACL, Natural Questions (NQ), Stack Exchange, SQuAD, Tiger Math/Stack, DocMatix-IR, VDR, Vidore-ColPali-Training, VisRAG-Ret-Train-Synthetic-data, VisRAG-Ret-Train-In-domain-data, and Wiki-SS-NQ.
- Data Collection Method by dataset: Hybrid: Automated, Human, Synthetic
- Labeling Method by dataset: Hybrid: Automated, Human, Synthetic
- Properties: Training: 1st Stage: 12M QA pairs, 2nd Stage: 500k QA pairs, 3rd Stage: 240k QA pairs

Evaluation Dataset
We evaluate the model on multiple benchmarks for Visual Document Retrieval: ViDoRe V1, ViDoRe V2 and MTEB Visual Document Retrieval.
- Data Collection Method by dataset: Hybrid: Automated, Human, Synthetic
- Labeling Method by dataset: Hybrid: Automated, Human, Synthetic
- Properties: More details on ViDoRe V1 and ViDoRe V2 can be found on their leaderboard. The Visual Document Retrieval Benchmark, ViDoRe, is composed of various page-level retrieval tasks spanning multiple domains, languages, and settings.

| Benchmark | Model 1B | Model 3B |
|--------------------------------|--------------|--------------|
| ViDoRe V1 (06/27/2025) | 0.9050 | 0.9100 |
| ViDoRe V1 (deprecated) | 0.9049 | 0.9098 |
| ViDoRe V2 (06/27/2025) | 0.6209 | 0.6352 |
| ViDoRe V2 (deprecated) | 0.6261 | 0.6342 |
| MTEB Visual Document Retrieval | 0.8238 | 0.8315 |

Note: All scores are Avg. NDCG@5. ViDoRe V1 and V2 were updated on June 27th, 2025 to use the scores calculated by MTEB, which can result in slightly different scores.
The ViDoRe V2 (06/27/2025) uses only 4 of the original 7 datasets. Ethical Considerations NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please report security vulnerabilities or NVIDIA AI Concerns here.
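For readers unfamiliar with the NDCG@5 metric used in the benchmark table above, the following is a minimal sketch of one common formulation (linear gain, log2 discount). It assumes the list passed in contains the relevance labels of all judged documents in ranked order; the exact variant used by the ViDoRe and MTEB tooling is not stated in the card and may differ.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the top-k results (rank 0 is the best hit)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances: Sequence[float], k: int = 5) -> float:
    """DCG of the ranking divided by the DCG of the ideal (sorted) ranking."""
    ideal_dcg = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0  # no relevant documents for this query
    return dcg_at_k(ranked_relevances, k) / ideal_dcg


# Hypothetical relevance labels for two queries, in the order the model ranked them.
perfect = ndcg_at_k([1, 0, 0, 0, 0])        # relevant page ranked first
second_place = ndcg_at_k([0, 1, 0, 0, 0])   # relevant page ranked second
average = (perfect + second_place) / 2      # "Avg. NDCG@5" over the two queries
```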

llama_nemoretrievercolembed
297
17

content-type-classifier-deberta

—
296
15

stt_ka_fastconformer_hybrid_large_pc

license:cc-by-4.0
295
6

Llama-3.1-Nemotron-8B-UltraLong-2M-Instruct

llama
293
15

Mistral-NeMo-12B-Base

license:apache-2.0
292
40

Cosmos-Predict2-2B-Text2Image

—
289
63

stt_es_fastconformer_hybrid_large_pc

license:cc-by-4.0
288
3

nemocurator-fineweb-nemotron-4-edu-classifier

—
286
11

stt_pl_fastconformer_hybrid_large_pc

license:cc-by-4.0
283
2

Nemotron-4-Mini-Hindi-4B-Instruct

—
274
19

OpenMath2-Llama3.1-8B

OpenMath2-Llama3.1-8B is obtained by finetuning Llama3.1-8B-Base with OpenMathInstruct-2. The model outperforms Llama3.1-8B-Instruct on all the popular math benchmarks we evaluate on, especially on MATH by 15.9%.

| Model | GSM8K | MATH | AMC 2023 | AIME 2024 | Omni-MATH |
|:---|:---:|:---:|:---:|:---:|:---:|
| Llama3.1-8B-Instruct | 84.5 | 51.9 | 9/40 | 2/30 | 12.7 |
| OpenMath2-Llama3.1-8B (nemo \| HF) | 91.7 | 67.8 | 16/40 | 3/30 | 22.0 |
| + majority@256 | 94.1 | 76.1 | 23/40 | 3/30 | 24.6 |
| Llama3.1-70B-Instruct | 95.8 | 67.9 | 19/40 | 6/30 | 19.0 |
| OpenMath2-Llama3.1-70B (nemo \| HF) | 94.9 | 71.9 | 20/40 | 4/30 | 23.1 |
| + majority@256 | 96.0 | 79.6 | 24/40 | 6/30 | 27.6 |

The pipeline we used to produce the data and models is fully open-sourced!

Our models are trained with the same "chat format" as Llama3.1-instruct models (same system/user/assistant tokens). Please note that these models have not been instruction tuned on general data and thus might not provide good answers outside of the math domain.

We recommend using the instructions in our repo to run inference with these models, but here is an example of how to do it through the transformers API:

We provide all instructions to fully reproduce our results.

If you find our work useful, please consider citing us!

By accessing this model, you are agreeing to the Llama 3.1 terms and conditions of the license, acceptable use policy and Meta's privacy policy.
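The "+ majority@256" rows in the table above are assumed here to mean the standard practice of sampling many solutions per problem and keeping the most frequent final answer. A minimal sketch of that voting step follows; the `majority_vote` helper and the sample answers are hypothetical, and real pipelines usually normalize answers (for example, equivalent fractions) before counting.

```python
from collections import Counter
from typing import Iterable


def majority_vote(final_answers: Iterable[str]) -> str:
    """Return the most frequent final answer among sampled solutions.

    Answers are compared as whitespace-stripped strings; ties resolve to the
    answer that appeared first.
    """
    counts = Counter(answer.strip() for answer in final_answers)
    return counts.most_common(1)[0][0]


# Five hypothetical samples for one problem (the table's setting would use 256).
samples = ["42", "41", " 42", "42", "7"]
voted = majority_vote(samples)
```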

llama
272
31

stt_zh_citrinet_1024_gamma_0_25

license:cc-by-4.0
267
5

stt_de_conformer_ctc_large

license:cc-by-4.0
263
4

audio-codec-22khz

—
262
5

stt_nl_fastconformer_hybrid_large_pc

license:cc-by-4.0
259
1

DeepSeek-R1-FP4-v2

license:mit
257
3

nemo-nano-codec-22khz-1.89kbps-21.5fps

—
256
5

stt_en_fastconformer_ctc_xlarge

license:cc-by-4.0
255
2

stt_es_conformer_transducer_large

license:cc-by-4.0
253
3

stt_kab_conformer_transducer_large

license:cc-by-4.0
251
2

GR00T-N1-2B

—
250
335

Audio2Face-3D-v3.0

Description: Audio2Face-3D-v3.0 generates 3D facial animations from audio inputs, for use in applications such as video conferencing, virtual reality, and digital content creation. This model is ready for commercial/non-commercial use. For source code, documentation, helper scripts, packaged builds, and links to all components in the Audio2Face-3D technology stack, visit the Audio2Face-3D GitHub repository.

Use of this model is governed by the NVIDIA Open Model License.

Use Case: Audio2Face-3D-v3.0 is designed for developers and researchers working on audio-driven animation and emotion detection applications, such as virtual assistants, chatbots, and affective computing systems.

Release Date: Hugging Face: 09/24/2025 via https://huggingface.co/nvidia/Audio2Face-3D-v3.0

Reference(s): NVIDIA, Audio2Face-3D: Audio-driven Realistic Facial Animation For Digital Avatars, 2025. https://arxiv.org/abs/2508.16401

Model Architecture:
- Architecture Type: Transformer, Diffusion
- Network Architecture: Hubert
- Number of model parameters: 1.80x10^8

Input:
- Input Type(s): Audio
- Input Format: Array of float
- Input Parameters: One-Dimensional (1D)
- Other Properties Related to Input: All audio is resampled to 16 kHz

Output:
- Output Type(s): Facial motion
- Output Format: Array of float
- Output Parameters: Two-Dimensional (2D)
- Other Properties Related to Output: Facial motion on skin, tongue, jaw, and eyeballs

Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
Software Integration: Runtime Engine(s): Audio2Face-SDK Supported Hardware Microarchitecture Compatibility: NVIDIA Ampere NVIDIA Blackwell NVIDIA Hopper NVIDIA Lovelace NVIDIA Pascal NVIDIA Turing Preferred/Supported Operating System(s): Linux Windows The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment. This AI model can be embedded as an Application Programming Interface (API) call into the software environment described above. Data Collection Method by dataset Human - 3D facial motion data and audio Labeling Method by dataset Human - Commercial capture solution and internal labeling Properties (Quantity, Dataset Descriptions, Sensor(s)): Audio and 3D facial motion from multiple speech sequences Data Collection Method by dataset: Human - 3D facial motion data and audio Labeling Method by dataset: Human - Commercial capture solution and internal labeling Properties (Quantity, Dataset Descriptions, Sensor(s)): Audio and 3D facial motion from multiple speech sequences Data Collection Method by dataset: Human - 3D facial motion data and audio Labeling Method by dataset: Human - Commercial capture solution and internal labeling Properties (Quantity, Dataset Descriptions, Sensor(s)): Audio and 3D facial motion from multiple speech sequences Ethical Considerations: NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. 
When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. For more detailed information on ethical considerations for this model, please see the Model Card++ Bias, Explainability, Safety & Security, and Privacy Subcards. Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns here Bias Field | Response :---------------------------------------------------------------------------------------------------|:--------------- Participation considerations from adversely impacted groups protected classes in model design and testing: | None Measures taken to mitigate against unwanted bias: | None Explainability Field | Response :------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------- Intended Task/Domain: | Customer Service, Media & Entertainment Model Type: | Transformer, Diffusion Intended Users: | Interactive avatar developers, Digital content creators Output: | Facial pose Describe how the model works: | Audio input is encoded and concatenated with emotion label, then passed into diffusion-mechanism to output facial motion sequence. Name the adversely impacted groups this has been tested to deliver comparable outcomes regardless of: | Not Applicable Technical Limitations & Mitigation: | This model may not work well with poor audio input. Verified to have met prescribed NVIDIA quality standards: | Yes Performance Metrics: | Lipsync accuracy, Latency, Throughput Potential Known Risks: | This model may generate inaccurate lip poses given low-quality audio input. 
Licensing: | Use of this model is governed by the NVIDIA Open Model License Privacy Field | Response :----------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------- Generatable or reverse engineerable personal data? | No Personal data used to create this model? | Yes Was consent obtained for any personal data used? | Yes How often is dataset reviewed? | Before Release Is a mechanism in place to honor data subject right of access or deletion of personal data? | Yes If personal data was collected for the development of the model, was it collected directly by NVIDIA? | Yes If personal data was collected for the development of the model by NVIDIA, do you maintain or have access to disclosures made to data subjects? | Yes If personal data was collected for the development of this AI model, was it minimized to only what was required? | Yes Is there provenance for all datasets used in training? | Yes Does data labeling (annotation, metadata) comply with privacy laws? | Yes Is data compliant with data subject requests for data correction or removal, if such a request was made? | Yes Applicable Privacy Policy | https://www.nvidia.com/en-us/about-nvidia/privacy-policy/ Safety & Security Field | Response :---------------------------------------------------|:---------------------------------- Model Application Field(s): | Customer Service, Media & Entertainment Describe the life critical impact (if present). | Not Applicable Use Case Restrictions: | Abide by NVIDIA Open Model License Model and dataset restrictions: | The Principle of least privilege (PoLP) is applied limiting access for dataset generation and model development. Restrictions enforce dataset access during training, and dataset license constraints adhered to.

—
250
34

stt_en_conformer_transducer_large

license:cc-by-4.0
250
7

MambaVision-L2-512-21K

—
248
3

Llama-3.1-Nemotron-Safety-Guard-8B-v3

Llama-3.1-Nemotron-Safety-Guard-8B-v3 is a multilingual content safety model that moderates human-LLM interaction content and classifies user prompts and LLM responses as safe or unsafe. If the content is unsafe, the model additionally returns a response with a list of categories that the content violates. It supports 9 languages: English, Spanish, Mandarin, German, French, Hindi, Japanese, Arabic, and Thai.

The base large language model (LLM) is the multilingual Llama-3.1-8B-Instruct model from Meta. NVIDIA's optimized release is LoRA-tuned on approved datasets and better conforms to NVIDIA's content safety risk taxonomy and other safety risks in human-LLM interactions. The model is trained using the Nemotron-Safety-Guard-Dataset-v3 dataset, which is synthetically curated using the CultureGuard pipeline. The model shows strong zero-shot generalization supporting over 20 languages (en, ar, de, es, fr, hi, ja, th, zh, it, ko, nl, cs, da, fi, iw, pt-BR, pl, ru, sv).

The model can be prompted using an instruction and a taxonomy of unsafe risks to be categorized. The instruction format for prompt moderation is shown below under input and output examples. For a detailed description of the dataset and model, please see our paper. Experience the model in the NVIDIA NIM Preview here.

License/Terms of Use
GOVERNING TERMS: Your use of this model is governed by the NVIDIA Open Model License. Additional Information: Llama 3.1 Community License Agreement. Built with Llama.

Use Case: This model is intended for developers and researchers building LLMs.

Model Architecture
Architecture Type: Transformer
Network Architecture: The base model architecture is based on the Llama-3.1-8B-Instruct model from Meta (Model Card). We perform Parameter Efficient FineTuning (PEFT) over the above base model using the following Network Architecture parameters:
- Rank: 8
- Alpha: 32
- Targeted low rank adaptation modules: 'q_proj', 'v_proj'.
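As a rough illustration of what the rank and alpha values above imply, the sketch below computes the standard LoRA scaling factor (alpha / rank) and the number of extra trainable parameters a low-rank adapter adds. The projection sizes and layer count are assumptions about the base model that the card does not state, and the helper names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def lora_scaling(alpha: float, rank: int) -> float:
    """Standard LoRA scaling applied to the low-rank update: alpha / rank."""
    return alpha / rank


def lora_extra_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters added for one adapted projection:
    A has shape (rank x d_in) and B has shape (d_out x rank)."""
    return rank * (d_in + d_out)


def lora_delta(b: Matrix, a: Matrix, alpha: float, rank: int) -> Matrix:
    """Weight update added to the frozen weight: scaling * (B @ A)."""
    scale = lora_scaling(alpha, rank)
    return [[scale * sum(b_row[k] * a[k][j] for k in range(len(a)))
             for j in range(len(a[0]))] for b_row in b]


RANK, ALPHA = 8, 32                 # values from the card
scaling = lora_scaling(ALPHA, RANK)

# Assumed (not from the card): hidden size 4096, value projection output 1024, 32 layers.
HIDDEN, V_OUT, LAYERS = 4096, 1024, 32
per_layer = lora_extra_params(HIDDEN, HIDDEN, RANK) + lora_extra_params(HIDDEN, V_OUT, RANK)
total_adapter_params = per_layer * LAYERS

# Tiny rank-1 example of the update itself (toy numbers).
toy_delta = lora_delta(b=[[1.0], [2.0]], a=[[0.5, 0.0, 1.0]], alpha=2, rank=1)
```

Under these assumed dimensions the adapter is a few million parameters, which is consistent with the adapter weights being distributed separately from the 8B base model.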
The resulting LoRA adapter weights are provided separately and can be downloaded here.

Base Model: This model was developed based on Llama-3.1-8B-Instruct
Number of model parameters: 8.03B

Input
- Input Type(s): Text
- Input Format(s): String
- Input Parameters: One-Dimensional (1D): Sequences
- Other Properties Related to Input: Context length up to 8K. Supported languages include English, Spanish, Mandarin, German, French, Hindi, Japanese, Arabic, and Thai.

Output
- Output Type(s): Text Json
- Output Format: String
- Output Parameters: One-Dimensional (1D): Sequences
- Other Properties Related to Output: Text in JSON format with 3 fields: "User Safety", "Response Safety", and "Safety Categories". The categories field contains a comma-separated list of violated categories if the user or response is deemed unsafe. The violated category names are not fixed and are dependent on the categories mentioned in the instruction. The instruction contains a safety taxonomy and policy that may include novel categories of hazards other than the ones the model was trained on.

Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.

Software Integration
- Runtime Engine(s): NeMo 24.12
- Supported Hardware Microarchitecture Compatibility: NVIDIA H100-80GB, NVIDIA A100
- Operating System(s): Linux

The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.

See the snippet below for usage with Hugging Face Transformers library.
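As a separate illustration (this is not the usage snippet from the model card), the JSON output described above could be consumed roughly as follows. The three field names come from the card; the literal values "safe"/"unsafe", the possibility that a field is absent, and the sample category names are assumptions made for this sketch.

```python
import json
from typing import List, Optional, TypedDict


class SafetyVerdict(TypedDict):
    user_safe: Optional[bool]       # None when the field is missing
    response_safe: Optional[bool]   # None when no LLM response was moderated
    categories: List[str]


def _is_safe(value: Optional[str]) -> Optional[bool]:
    """Map an assumed 'safe'/'unsafe' label to a boolean, keeping None for missing."""
    if value is None:
        return None
    return value.strip().lower() == "safe"


def parse_safety_output(raw: str) -> SafetyVerdict:
    """Parse the model's JSON string into a small typed dictionary."""
    data = json.loads(raw)
    categories = [c.strip() for c in data.get("Safety Categories", "").split(",") if c.strip()]
    return {
        "user_safe": _is_safe(data.get("User Safety")),
        "response_safe": _is_safe(data.get("Response Safety")),
        "categories": categories,
    }


# Hypothetical model output for a prompt-only moderation call.
raw_output = '{"User Safety": "unsafe", "Safety Categories": "Violence, Harassment"}'
verdict = parse_safety_output(raw_output)
```

Since the card notes that category names depend on the taxonomy supplied in the instruction, the parser deliberately keeps them as free-form strings rather than validating against a fixed list.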
A sample inference script is also shared here. The prompt template contains 3 sections, `instruction`, `user prompt`, or `user prompt and LLM response (single or multiple turns)`. The `instruction` remains the same across all languages, while the `user prompt` and `LLM response` are provided in the target language Example Model Input/Output for prompt safety moderation: Input Example Model Input/Output for LLM response safety moderation: Input Dataset Partition: Training (90%), Testing (6%), Evaluation (4%) Our curated training dataset named Nemotron-Safety-Guard-Dataset-v3 consists of a mix collected or generated using the following data sources. English samples taken from Nemotron Content Safety Dataset V2 Samples from Nemotron Content Safety Dataset V2, translated to target languages Synthetic samples generated from Mixtral 8x7B and Mixtral 8x22B and safety annotated using Qwen3-235B Data Collection for Training & Testing Datasets: Hybrid: Automated, Human, Synthetic Data Labeling for Training & Testing Datasets: Hybrid: Automated, Human, Synthetic We used the datasets listed in the next section to evaluate the model. | Nemotron-Safety-Guard-Dataset-v3 | PolyGuardPrompts | RTP-LX | MultiJail | XSafety | Aya Red-teaming | |-------------|--------------|-----------------|------------------|------------------|------------------| | 85.32 | 76.07 | 91.49 | 95.36 | 66.97 | 96.79 | Test split of Nemotron-Safety-Guard-Dataset-v3 - Dataset Inference - Engine: NeMo - Test Hardware NVIDIA H100-80GB Ethical Considerations NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. 
Considering the sensitive nature of this project, all data was synthetically generated, and human annotators did not curate any new data. For more detailed information on ethical considerations for this model, please see the Responsible Use Guide available at http://nvidia.com/nemotron-responsible-use. For more detailed information on ethical considerations for this model, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards. Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns here. Citing If you find our work helpful, please consider citing our paper:

llama
248
2

ssl_en_nest_xlarge_v1.0

license:cc-by-4.0
246
5

stt_en_citrinet_1024_gamma_0_25

license:cc-by-4.0
246
3

stt_zh_conformer_transducer_large

license:cc-by-4.0
243
13

stt_fr_conformer_ctc_large

license:cc-by-4.0
241
7

Llama-3_3-Nemotron-Super-49B-v1-FP8

llama-3
240
9

OpenMath-Nemotron-7B

license:cc-by-4.0
238
9

RADIO

—
235
41

stt_hr_conformer_transducer_large

license:cc-by-4.0
234
1

stt_en_fastconformer_transducer_xlarge

license:cc-by-4.0
233
24

mel-codec-44khz

—
233
7

stt_rw_conformer_transducer_large

license:cc-by-4.0
232
1

stt_be_fastconformer_hybrid_large_pc

license:cc-by-4.0
230
4

mel-codec-22khz

—
228
3

stt_hr_fastconformer_hybrid_large_pc

license:cc-by-4.0
226
0

Llama2-13B-SteerLM-RM

llama2
225
8

stt_ca_conformer_ctc_large

license:cc-by-4.0
225
1

stt_hr_conformer_ctc_large

license:cc-by-4.0
224
1

stt_it_conformer_ctc_large

license:cc-by-4.0
224
1

stt_ca_conformer_transducer_large

license:cc-by-4.0
222
1

stt_en_fastconformer_ctc_xxlarge

license:cc-by-4.0
220
1

Llama-3.3-Nemotron-70B-Select

llama
219
10

stt_de_conformer_transducer_large

license:cc-by-4.0
219
7

stt_be_conformer_transducer_large

license:cc-by-4.0
217
6

stt_fr_conformer_transducer_large

license:cc-by-4.0
216
10

slu_conformer_transformer_large_slurp

license:cc-by-4.0
216
2

stt_ka_fastconformer_hybrid_transducer_ctc_large_streaming_80ms_pc

license:cc-by-4.0
216
2

stt_en_citrinet_512_ls

license:cc-by-4.0
215
0

stt_eo_conformer_ctc_large

license:cc-by-4.0
214
3

stt_en_citrinet_384_ls

license:cc-by-4.0
214
0

stt_it_conformer_transducer_large

license:cc-by-4.0
214
0

Nemotron-H-4B-Base-8K

The pretraining data has a cutoff date of September 2024. NVIDIA Nemotron-H-4B-Base-8K is a large language model (LLM) developed by NVIDIA, designed as a completion model for a given piece of text. It uses a hybrid model architecture that consists primarily of Mamba-2 and MLP layers combined with just four Attention layers. The model is pruned and distilled from Nemotron-H-8B-Base-8K using 380B tokens, and features an 8K context length. The supported languages include: English, German, Spanish, French, Italian, Korean, Portuguese, Russian, Japanese, and Chinese. For best performance on a given task, users are encouraged to customize the model using the NeMo Framework suite of customization tools, including Parameter-Efficient Fine-Tuning (P-tuning, Adapters, LoRA, and more), and Model Alignment (SFT, SteerLM, RLHF, and more) using NeMo-Aligner. The model was pruned and distilled from Nemotron-H-Base-8K using our hybrid language model compression technique and then fine-tuned into Nemotron-H-4B-Instruct-128K. For more details, please refer to the paper. The paper has been accepted for publication at NeurIPS 2025. License/Terms of Use GOVERNING TERMS: Use of this model is governed by the NVIDIA Internal Scientific Research and Development Model License Model Architecture - Architecture Type: Transformer - Network Architecture: Nemotron-Hybrid Use Case: This model is intended for developers and researchers building LLMs Release Date: Huggingface: 10/23/2025 via https://huggingface.co/ Input - Input Type(s): Text - Input Format(s): String - Input Parameters: One-Dimensional (1D): Sequences - Other Properties Related to Input: Context length up to 8K. Supported languages include German, Spanish, French, Italian, Korean, Portuguese, Russian, Japanese, Chinese and English. Output - Output Type(s): Text - Output Format: String - Output Parameters: One-Dimensional (1D): Sequences Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. 
By leveraging NVIDIA's hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.

Software Integration
- Runtime Engine(s): NeMo 24.12
- Supported Hardware Microarchitecture Compatibility: NVIDIA H100-80GB, NVIDIA A100
- Operating System(s): Linux

References
[\[2504.11409\] Efficient Hybrid Language Model Compression through Group-Aware SSM Pruning](https://arxiv.org/abs/2504.11409)

As this is a base model, no explicit prompt format is recommended or required.

Training & Testing Datasets:
The training corpus for Nemotron-H-4B-Base-8K consists of English and multilingual text (German, Spanish, French, Italian, Korean, Portuguese, Russian, Japanese, Chinese and English), as well as code. Our sources cover a variety of document types such as webpages, dialogue, articles, and other written materials. This model was also improved using synthetic data from Qwen (Built with Qwen). The corpus spans domains including legal, math, science, finance, and more. We also include a small portion of question-answering and alignment-style data to improve model accuracy.

Data Collection for Training & Testing Datasets: Hybrid: Automated, Human, Synthetic
Data Labeling for Training & Testing Datasets: Hybrid: Automated, Human, Synthetic

We used the datasets listed in the next section to evaluate the model.

Data Collection for Training Datasets: Hybrid: Automated, Human, Synthetic
Data Labeling for Training Datasets: Hybrid: Automated, Human, Synthetic

| ARC Challenge 0-shot | Hellaswag 0-shot | Winogrande 0-shot | CommonsenseQA 0-shot |
|-------------|--------------|-----------------|------------------|
| 54.4 | 77.0 | 71.3 | 70.2 |

- ARC (Ai2 reasoning challenge)-Challenge - The challenge set of questions from a benchmark that contains grade-school level, multiple-choice science questions to assess the question answering ability of language models.
- Hellaswag - Tests the ability of a language model to correctly finish the provided context from a choice of possible options.
- Winogrande - Tests the ability to choose the right option for a given sentence, which requires commonsense reasoning.
- CommonsenseQA - A multiple-choice question answering dataset that requires different types of commonsense knowledge to predict the correct answers.

| MBPP(sanitized) 3-shot | MBPP+ 0-shot | HumanEval 0-shot | HumanEval+ 0-shot |
|-------------|--------------|-----------------|------------------|
| 65.0 | 61.1 | 59.8 | 55.5 |

- MBPP (Mostly Basic Python Programming Problems) - Evaluates the ability to generate solutions for Python programming tasks.
- MBPP+ - Extended version of MBPP with additional validation.
- HumanEval - Tests code generation and completion abilities in Python.
- GSM8K (Grade School Math 8K) - Evaluates grade school level mathematical word problem solving.
- MMLU - Tests knowledge across 57 subjects including science, humanities, math and more.

The model was trained on data that contains toxic language, unsafe content, and societal biases originally crawled from the internet. Therefore, the model may amplify those biases and return toxic responses, especially when prompted with toxic prompts. The model may generate answers that are inaccurate, omit key information, or include irrelevant or redundant text, producing socially unacceptable or undesirable text even if the prompt itself does not include anything explicitly offensive. The model demonstrates weakness to alignment-breaking attacks. Users are advised to deploy language model guardrails alongside this model to prevent potentially harmful outputs.
Inference
- Engine: NeMo
- Test Hardware: NVIDIA H100-80GB

Ethical Considerations
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. For more detailed information on ethical considerations for this model, please see the Responsible Use Guide available at http://nvidia.com/nemotron-responsible-use. Please report security vulnerabilities or NVIDIA AI Concerns here.

NaNK
β€”
213
5

stt_en_citrinet_256_ls

license:cc-by-4.0
212
0

stt_eo_conformer_transducer_large

license:cc-by-4.0
209
3

stt_rw_conformer_ctc_large

license:cc-by-4.0
208
1

stt_en_citrinet_768_ls

license:cc-by-4.0
208
0

OpenCodeReasoning-Nemotron-14B

NaNK
license:apache-2.0
206
18

stt_en_fastconformer_transducer_xxlarge

license:cc-by-4.0
206
12

Riva-Translate-4B-Instruct

NaNK
β€”
206
9

stt_be_conformer_ctc_large

license:cc-by-4.0
204
4

Llama-3.3-Nemotron-70B-Edit

NaNK
llama
204
3

Parakeet Rnnt 110m Da Dk

NVIDIA Parakeet-RNNT 110M (Danish)

`parakeet-rnnt-110m-da-dk` is a 110-million-parameter automatic speech recognition (ASR) model trained to transcribe Danish speech. This model uses the FastConformer architecture trained with RNN-T (RNN-Transducer) loss. For more information, refer to the NeMo documentation. This model is ready for commercial/non-commercial use.

GOVERNING TERMS: Use of this model is governed by the NVIDIA Open Model License Agreement (found at https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/).

Performing speech-to-text transcription in the Danish language.

Release Date: Huggingface 10/15/25 via https://huggingface.co/nvidia/parakeet-rnnt-110m-da-dk/tree/main

[1] Fast Conformer with Linearly Scalable Attention for Efficient Speech Recognition

Model Architecture:
- Architecture Type: Transducer model
- Network Architecture: FastConformer encoder and LSTM prediction network

Input:
- Input Type(s): Audio
- Input Format(s): 16000 Hz mono-channel
- Input Parameters: 1D

Output:
- Output Type(s): Text
- Output Format: string
- Output Parameters: 1D
- Other Properties Related to Output: Not applicable

This model provides transcribed speech as a string for a given audio with Danish speech.

Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g., GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
Supported Hardware Microarchitecture Compatibility:

The total size: ~10000 hours
Total number of datasets: 3

- Data Collection Method by Dataset: Human
- Labeling Method by Dataset: Human
- Properties: read aloud Danish speech

Link: Mozilla Common Voice v17
- Data Collection Method by Dataset: Human
- Labeling Method by Dataset: Automated
- Properties: sentences recorded by different people

Link: Granary Danish, a cleaned version of the Danish part of MOSEL
- Data Collection Method by Dataset: Human
- Labeling Method by Dataset: Automated
- Properties: EU Parliament sessions

- Data Collection Method by Dataset: Human
- Labeling Method by Dataset: Human
- Properties: read aloud Danish speech

Link: Mozilla Common Voice v17
- Data Collection Method by Dataset: Human
- Labeling Method by Dataset: Automated
- Properties: sentences recorded by different people

- Data Collection Method by Dataset: Human
- Labeling Method by Dataset: Human
- Properties: sentences recorded by different people

The table below summarizes the WER (%) of this model using the RNN-T decoder with greedy decoding:

| Dataset | WER (%) |
|-------------|-------------|
| CoRal Test | 10.7 |
| MCV v17 | 8.8 |
| Google FLEURS | 10.6 |

These WER scores were obtained using greedy decoding without an external language model.

NVIDIA Riva is an accelerated speech AI SDK deployable on-prem, in all clouds, multi-cloud, hybrid, on edge, and embedded. Additionally, Riva provides:
- World-class out-of-the-box accuracy for the most common languages, with model checkpoints trained on proprietary data with hundreds of thousands of GPU-compute hours
- Best-in-class accuracy with run-time word boosting (e.g., brand and product names) and customization of acoustic model, language model, and inverse text normalization
- Streaming speech recognition, Kubernetes-compatible scaling, and enterprise-grade support

Although this model isn't supported yet by Riva, the list of supported models is here. Check out the Riva live demo.
Acceleration Engine: Not Applicable
Test Hardware: NVIDIA RTX 6000

Ethical Considerations: NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please report security vulnerabilities or NVIDIA AI Concerns here.

To train, fine-tune, or play with the model, you will need to install NVIDIA NeMo. We recommend you install it after you've installed the latest PyTorch version. Alternatively, you can use a Docker container. The code below was verified to work with NeMo Framework 25.09.

Transcribing using Python
Assuming you have an audio file with Danish speech, simply do:
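A minimal sketch of that workflow, under stated assumptions: the input is a WAV file, the helper names `check_wav` and `transcribe_danish` are hypothetical, and the NeMo calls (`ASRModel.from_pretrained` and `transcribe`) follow the pattern commonly shown on NVIDIA ASR model cards rather than anything verified for this specific checkpoint. The format check encodes the card's stated input requirement of 16000 Hz mono-channel audio.

```python
import wave

EXPECTED_RATE_HZ = 16000  # from the model card: 16000 Hz mono-channel input


def check_wav(path):
    """Return (ok, reason) describing whether a WAV file matches the card's input format."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    if rate != EXPECTED_RATE_HZ:
        return False, f"sample rate is {rate} Hz, expected {EXPECTED_RATE_HZ} Hz"
    if channels != 1:
        return False, f"{channels} channels, expected mono"
    return True, "ok"


def transcribe_danish(path, model_name="nvidia/parakeet-rnnt-110m-da-dk"):
    """Validate the file, then transcribe it with NeMo (assumed to be installed)."""
    ok, reason = check_wav(path)
    if not ok:
        raise ValueError(f"{path}: {reason}")
    # Imported lazily so the format check above works without NeMo installed.
    import nemo.collections.asr as nemo_asr

    model = nemo_asr.models.ASRModel.from_pretrained(model_name=model_name)
    first = model.transcribe([path])[0]
    # Assumption: newer NeMo releases return objects with a .text field,
    # older ones return plain strings; handle both.
    return getattr(first, "text", first)
```

Calling `transcribe_danish("speech_da.wav")` (a hypothetical file name) would download the checkpoint on first use; resampling or down-mixing audio that fails the check is left to the caller.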

β€”
201
13

Cosmos-1.0-Tokenizer-CV8x8x8

β€”
198
21

Nemotron-4-Mini-Hindi-4B-Base

NaNK
β€”
198
13

OpenCodeReasoning-Nemotron-1.1-7B

NaNK
β€”
198
12

Llama-3.3-Nemotron-70B-Feedback

NaNK
llama
195
7

bigvgan_base_24khz_100band

NaNK
license:mit
187
0

Audio2Face-3D-v2.3.1-Claire

β€”
183
2

Audio2Face-3D-v2.3-Mark

β€”
181
9

DLER Llama Nemotron 8B Merge Research

NaNK
llama
179
10

Llama3-ChatQA-1.5-70B

NaNK
llama
177
333

RADIO-H

β€”
177
10

Audio2Face-3D-v2.3.1-James

β€”
176
3

DLER R1 7B Research

🚀 The leading efficient reasoning model for cutting-edge research and development 🌟

Paper: https://www.arxiv.org/abs/2510.15110
Code: https://github.com/NVlabs/DLER
Collection: https://huggingface.co/collections/nvidia/reasoning-efficiency-research
Project page: https://nvlabs.github.io/DLER/

Description: DLER-Qwen-R1-7B is an ultra-efficient 7B open-weight reasoning model designed for challenging tasks such as mathematics, programming, and scientific problem-solving. It is trained with the DLER algorithm on agentica-org/DeepScaleR-Preview-Dataset. Compared to DeepSeek's 7B model, DLER-Qwen-R1-7B achieves substantial efficiency gains, reducing the average response length by nearly 80% across diverse mathematical benchmarks while improving accuracy.

Evaluation Results:

| Model | MATH | Length | AIME | Length | AMC | Length | Minerva | Length | Olympiad | Length | Total Avg Length |
|------------------|----------|------------|--------------------|------------------|--------------------|------------------|--------------------|------------------|--------------------|------------------|-----------------|
| Deepseek-R1-7B | 93.60 | 3999 | 55.40 | 13241 | 82.90 | 7461 | 49.79 | 5199 | 58.21 | 8837 | 7747 |
| DLER-R1-7B | 94.21 (+0.61%) | 1634 (-60%) | 55.62 (+0.22%) | 3230 (-76%) | 84.41 (+1.51%) | 2512 (-67%) | 53.88 (+4.09%) | 2058 (-61%) | 60.48 (+2.27%) | 2592 (-71%) | 2405 (-69%) |

Citation: If you find our model helpful, please cite the paper linked above.
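The bracketed figures in the DLER-R1-7B evaluation table can be rechecked from the raw numbers. The sketch below is illustrative only: the dictionaries copy the accuracy and length columns from the table, and the function names are hypothetical. Recomputing the AMC row gives a length reduction of roughly 66%, and the accuracy deltas are absolute percentage-point differences.

```python
# (accuracy %, average response length) per benchmark, copied from the table above.
BASELINE = {
    "MATH": (93.60, 3999),
    "AIME": (55.40, 13241),
    "AMC": (82.90, 7461),
    "Minerva": (49.79, 5199),
    "Olympiad": (58.21, 8837),
}
DLER_7B = {
    "MATH": (94.21, 1634),
    "AIME": (55.62, 3230),
    "AMC": (84.41, 2512),
    "Minerva": (53.88, 2058),
    "Olympiad": (60.48, 2592),
}


def length_reduction_pct(base_len, new_len):
    """Relative reduction in response length, as a percentage of the baseline."""
    return 100.0 * (base_len - new_len) / base_len


def accuracy_gain_points(base_acc, new_acc):
    """Absolute accuracy difference in percentage points."""
    return new_acc - base_acc


if __name__ == "__main__":
    for name, (base_acc, base_len) in BASELINE.items():
        new_acc, new_len = DLER_7B[name]
        print(
            f"{name:9s} accuracy {accuracy_gain_points(base_acc, new_acc):+.2f} pts, "
            f"length -{length_reduction_pct(base_len, new_len):.1f}%"
        )
```

The same two functions apply unchanged to the "Total Avg Length" column (7747 versus 2405).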

NaNK
β€”
175
13

Nemotron 4 340B Instruct

[](#model-architecture)[](#model-architecture)[](#datasets) Nemotron-4-340B-Instruct is a large language model (LLM) that can be used as part of a synthetic data generation pipeline to create training data that helps researchers and developers build their own LLMs. It is a fine-tuned version of the Nemotron-4-340B-Base model, optimized for English-based single and multi-turn chat use-cases. It supports a context length of 4,096 tokens. The base model was pre-trained on a corpus of 9 trillion tokens consisting of a diverse assortment of English based texts, 50+ natural languages, and 40+ coding languages. Subsequently the Nemotron-4-340B-Instruct model went through additional alignment steps including: - Supervised Fine-tuning (SFT) - Direct Preference Optimization (DPO) - Reward-aware Preference Optimization (RPO) (Additional in-house alignment technique) Throughout the alignment process, we relied on only approximately 20K human-annotated data while our data generation pipeline synthesized over 98% of the data used for supervised fine-tuning and preference fine-tuning (DPO & RPO). We provide comprehensive details about our synthetic data generation pipeline in the technical report. This results in a model that is aligned for human chat preferences, improvements in mathematical reasoning, coding and instruction-following, and is capable of generating high quality synthetic data for a variety of use cases. Under the NVIDIA Open Model License, NVIDIA confirms: - Models are commercially usable. - You are free to create and distribute Derivative Models. - NVIDIA does not claim ownership to any outputs generated using the Models or Derivative Models. Nemotron-4-340B-Instruct is a chat model intended for use for the English language. Nemotron-4-340B-Instruct is designed for Synthetic Data Generation to enable developers and enterprises for building and customizing their own large language models and LLM applications. 
The instruct model itself can be further customized using the NeMo Framework suite of customization tools, including Parameter-Efficient Fine-Tuning (P-tuning, Adapters, LoRA, and more) and Model Alignment (SFT, SteerLM, RLHF, and more) using NeMo-Aligner. Refer to the documentation for examples.

Model Dates: Nemotron-4-340B-Instruct was trained between December 2023 and May 2024.
Data Freshness: The pretraining data has a cutoff of June 2023.

BF16 Inference:
- 8x H200 (1x H200 node)
- 16x H100 (2x H100 nodes)
- 16x A100 80GB (2x A100 80GB nodes)

Nemotron-4-340B-Instruct is a standard decoder-only Transformer, trained with a sequence length of 4096 tokens, that uses Grouped-Query Attention (GQA) and Rotary Position Embeddings (RoPE).

Architecture Type: Transformer Decoder (auto-regressive language model)

Note: For Nemotron-4-340B-Instruct we recommend keeping the system prompt empty. An example of a formattable prompt template is available in the following section.

Deployment and inference with Nemotron-4-340B-Instruct can be done in three steps using NeMo Framework:
1. Create a Python script to interact with the deployed model.
2. Create a Bash script to start the inference server.
3. Schedule a Slurm job to distribute the model across 2 nodes and associate them with the inference server.

2. Given this Python script, create a Bash script which spins up the inference server within the NeMo container and calls the Python script ``callserver.py``. The Bash script ``nemoinference.sh`` is as follows,
3. Launch ``nemoinference.sh`` with a Slurm script defined like below, which starts a 2-node job for model inference.
Evaluated using MT-Bench judging by GPT-4-0125-Preview as described in Appendix H in the HelpSteer2 Dataset Paper.

| total | writing | roleplay | extraction | stem | humanities | reasoning | math | coding | turn 1 | turn 2 |
| :----- | :------- | :-------- | :---------- | :---- | :---------- | :--------- | :---- | ------ | :------ | :------ |
| 8.22 | 8.70 | 8.70 | 9.20 | 8.75 | 8.95 | 6.40 | 8.40 | 6.70 | 8.61 | 7.84 |

Evaluated using the Instruction Following Eval (IFEval) introduced in Instruction-Following Evaluation for Large Language Models.

| Prompt-Strict Acc | Instruction-Strict Acc |
| :----------------------- | :---------------------------- |
| 79.9 | 86.1 |

Evaluated using the Multi-task Language Understanding benchmarks as introduced in Measuring Massive Multitask Language Understanding.

Evaluated using the Grade School Math 8K (GSM8K) benchmark as introduced in Training Verifiers to Solve Math Word Problems.

Evaluated using the HumanEval benchmark as introduced in Evaluating Large Language Models Trained on Code.

Evaluated using the MBPP Dataset as introduced in Program Synthesis with Large Language Models.

Evaluated using the Arena-Hard Pipeline from the LMSys Org.

Evaluated using AlpacaEval 2.0 LC (Length Controlled) as introduced in the paper Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators.

| AlpacaEval 2.0 LC |
| :----------------- |
| 41.5 |

Evaluated using the CantTalkAboutThis Dataset as introduced in CantTalkAboutThis: Aligning Language Models to Stay on Topic in Dialogues.

| Distractor F1 | On-topic F1 |
| :----------------------- | :---------------------------- |
| 81.7 | 97.7 |

The Nemotron-4-340B-Instruct model underwent safety evaluation, including adversarial testing via three distinct methods:
- Garak is an automated LLM vulnerability scanner that probes for common weaknesses, including prompt injection and data leakage.
- AEGIS is a content safety evaluation dataset and LLM-based content safety classifier model that adheres to a broad taxonomy of 13 categories of critical risks in human-LLM interactions.
- Human Content Red Teaming leverages human interaction and evaluation of the models' responses.

The model was trained on data that contains toxic language, unsafe content, and societal biases originally crawled from the internet. Therefore, the model may amplify those biases and return toxic responses, especially when prompted with toxic prompts. The model may generate answers that are inaccurate, omit key information, or include irrelevant or redundant text, producing socially unacceptable or undesirable text even if the prompt itself does not include anything explicitly offensive.

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. For more detailed information on ethical considerations for this model, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards here. Please report security vulnerabilities or NVIDIA AI Concerns here.
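As a rough sanity check on the BF16 inference configurations listed for Nemotron-4-340B-Instruct, the weights alone can be sized at 2 bytes per parameter. The sketch below is a back-of-the-envelope estimate, not a deployment guide: the per-GPU memory figures for H200 (141 GB) and H100 (80 GB) are assumptions not stated in the card, the function names are hypothetical, and the estimate ignores KV cache, activations, and framework overhead.

```python
NUM_PARAMS = 340e9          # 340B parameters, from the model name
BYTES_PER_PARAM_BF16 = 2    # BF16 stores each parameter in 16 bits


def bf16_weight_gb(num_params):
    """Memory needed for the weights alone, in GB (10^9 bytes)."""
    return num_params * BYTES_PER_PARAM_BF16 / 1e9


def headroom_gb(num_gpus, gb_per_gpu, weights_gb):
    """Aggregate GPU memory left after loading the weights."""
    return num_gpus * gb_per_gpu - weights_gb


# (GPU count, assumed memory per GPU in GB) for each configuration in the card.
CONFIGS = {
    "8x H200": (8, 141),        # assumption: 141 GB per H200
    "16x H100": (16, 80),       # assumption: 80 GB per H100
    "16x A100 80GB": (16, 80),  # 80 GB is stated in the card
}

if __name__ == "__main__":
    weights = bf16_weight_gb(NUM_PARAMS)
    print(f"BF16 weights: {weights:.0f} GB")
    for name, (count, per_gpu) in CONFIGS.items():
        print(f"{name}: {headroom_gb(count, per_gpu, weights):.0f} GB left for everything else")
```

Under these assumptions the weights occupy 680 GB, which is why a single 8-GPU node of 80 GB cards (640 GB total) is not enough and the H100 and A100 configurations span two nodes.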

NaNK
β€”
173
690

Llama-3.3-Nemotron-70B-Reward-Multilingual

NaNK
llama
172
10

Cosmos-Tokenize1-DV8x16x16-720p

β€”
170
2

NV Reason CXR 3B

Description: NV-Reason-CXR-3B is a specialized vision-language model designed for medical reasoning and interpretation of chest X-ray images, with detailed explanations. The model combines visual understanding with medical reasoning capabilities, enabling healthcare professionals to access comprehensive analyses and engage in follow-up discussions about radiological findings. NV-Reason-CXR-3B provides step-by-step reasoning that mirrors clinical thinking patterns, making it valuable for educational and research applications in medical imaging.

💻 [\[Github code\]](https://github.com/NVIDIA-Medtech/NV-Reason-CXR) 🩻 [\[Web Demo\]](https://huggingface.co/spaces/nvidia/nv-reason-cxr)

- Transformers: 4.56.1
- Pytorch: 2.7.1
- Tokenizers: 0.22.0

License/Terms of Use: NVIDIA OneWay Non-Commercial License for academic research purposes

Use Case: Radiologists, medical students, and medical researchers would be expected to use this system for chest X-ray interpretation with detailed reasoning, educational training with AI-generated explanations, and research applications requiring explainable medical AI analyses.

Important Medical AI Considerations: This model is designed for research and educational purposes only and should not be used for clinical diagnosis or treatment decisions. All outputs should be reviewed by qualified medical professionals. The model's reasoning capabilities are intended to support medical education and research, not replace clinical judgment.

Release Date: Huggingface: 10/27/2025 via https://huggingface.co/NVIDIA

Model Architecture:
- Architecture Type: Transformer
- Network Architecture: Vision-Language Model based on Qwen2.5-VL architecture with medical reasoning capabilities

This model was developed by fine-tuning Qwen2.5-VL-3B using Supervised Fine-Tuning (SFT) and Group Relative Policy Optimization (GRPO) for enhanced medical reasoning.
Number of model parameters: 3B Input: - Input Type(s): Image, Text - Input Format(s): Medical images (JPEG, PNG), Text prompts (string) - Input Parameters: Two-Dimensional (2D) images with accompanying text queries (1D) - Other Properties Related to Input: Supports frontal chest X-ray images with flexible scaling. Accepts natural language prompts for medical queries, follow-up questions, and reasoning requests. Input images are automatically processed without specific size constraints. Input Specifications: - Medical Images: Chest X-ray images in standard medical imaging formats - Text Prompts: Natural language queries about radiological findings, diagnostic questions, or requests for detailed explanations - Interactive Dialogue: Support for follow-up questions and clarification requests Output: - Output Type(s): Text - Output Format: Structured reasoning with XML-like tags - Output Parameters: One-Dimensional (1D) Natural language reasoning and analysis - Other Properties Related to Output: Outputs contain structured thinking processes enclosed in ` ` tags showing step-by-step medical reasoning, followed by concise answers in ` ` tags. This format enables transparency in the model's diagnostic reasoning process and supports educational use cases. Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (GPU cores) and software frameworks (CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions. Software Integration: Runtime Engine(s): PyTorch Transformers library Supported Hardware Microarchitecture Compatibility: NVIDIA Ampere NVIDIA Hopper NVIDIA Lovelace The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. 
Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment. Model Version(s): 0.1 - Initial release version for chest X-ray reasoning and interpretation with structured thinking output Dataset Overview: Large-scale chest X-ray datasets including MIMIC-CXR, ChestXRay14, and CheXpert. Ethical Considerations: NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please make sure you have proper rights and permissions for all input image and video content; if image or video includes people, personal health information, or intellectual property, the image or video generated will not blur or maintain proportions of image subjects included. Please report model quality, risk, security vulnerabilities or concerns here.

NaNK
β€”
168
9

Llama-4-Maverick-17B-128E-Eagle3

NaNK
llama
167
8

Audio2Emotion V2.2

β€”
165
10

C-RADIOv2-VLM-H

β€”
161
10

MambaVision-L-21K

β€”
158
4

PS3-1.5K-SigLIP2

NaNK
β€”
156
1

NV-EmbedCode-7b-v1

NaNK
β€”
155
19

OpenCodeReasoning-Nemotron-1.1-32B

NaNK
β€”
152
45

OpenMath-Nemotron-14B-Kaggle

NaNK
license:cc-by-4.0
152
16

nemo-megatron-gpt-5B

NaNK
license:cc-by-4.0
151
22

DLER R1 1.5B Research

🚀 The leading efficient reasoning model for cutting-edge research and development 🌟

Paper: https://www.arxiv.org/abs/2510.15110
Code: https://github.com/NVlabs/DLER
Collection: https://huggingface.co/collections/nvidia/reasoning-efficiency-research
Project page: https://nvlabs.github.io/DLER/

Description: DLER-Qwen-R1-1.5B is an ultra-efficient 1.5B open-weight reasoning model designed for challenging tasks such as mathematics, programming, and scientific problem-solving. It is trained with the DLER algorithm on agentica-org/DeepScaleR-Preview-Dataset. Compared to DeepSeek's 1.5B model, DLER-Qwen-R1-1.5B achieves substantial efficiency gains, reducing the average response length by nearly 80% across diverse mathematical benchmarks while improving accuracy.

| Model | MATH | Length | AIME | Length | AMC | Length | Minerva | Length | Olympiad | Length | Total Avg Length |
|--------------------|----------|------------|--------|----------|-------|----------|---------|----------|----------|----------|--------------|
| Deepseek-R1-1.5B | 84.31 | 5500 | 29.79 | 16916 | 61.97 | 10967 | 38.41 | 7494 | 44.07 | 11620 | 10499 |
| DLER-R1-1.5B | 86.95 (+2.64%) | 1652 (-70%) | 34.375 (+4.59%) | 3551 (-80%) | 70.48 (+8.51%) | 2537 (-77%) | 43.58 (+5.18%) | 2029 (-73%) | 48.314 (+4.24%) | 2563 (-78%) | 2466 (-77%) |

Citation: If you find our model helpful, please cite the paper linked above.

NaNK
β€”
148
11

QLIP-B-8-256

license:cc-by-nc-4.0
147
7

bigvgan_24khz_100band

NaNK
license:mit
146
3

Cosmos-Predict2-14B-Text2Image

NaNK
β€”
139
43

SteerLM-llama2-13B

NaNK
license:llama2
139
42

Cosmos-Predict2-0.6B-Text2Image

NaNK
β€”
134
6

bigvgan_base_22khz_80band

NaNK
license:mit
132
0

Diffusion_Renderer_Inverse_Cosmos_7B

NaNK
β€”
128
6

Cosmos-Predict1-7B-Text2World

NaNK
β€”
128
4

Llama3-ChatQA-2-8B

NaNK
llama
125
17

OpenMath-Nemotron-14B

NaNK
license:cc-by-4.0
124
13

Cosmos-0.1-Tokenizer-CV4x8x8

β€”
124
8

Cosmos-1.0-Tokenizer-DV8x16x16

β€”
122
16

Eagle2-9B

NaNK
license:cc-by-nc-4.0
120
61

OpenCodeReasoning-Nemotron-7B

NaNK
license:apache-2.0
120
37

Cosmos-0.1-Tokenizer-DV4x8x8

β€”
119
12

Cosmos-0.1-Tokenizer-DV8x8x8

β€”
119
6

MambaVision-L-1K

β€”
119
5

Diffusion_Renderer_Forward_Cosmos_7B

NaNK
β€”
119
3

Cosmos-1.0-Prompt-Upsampler-12B-Text2World

NaNK
β€”
116
13

Llama3-ChatQA-2-70B

NaNK
llama-3
112
13

NVIDIA-Nemotron-Parse-v1.1-TC

β€”
112
5

Cosmos-Predict2-14B-Sample-GR00T-Dreams-GR1

NaNK
β€”
112
2

parakeet_realtime_eou_120m-v1

Description: Parakeet-Realtime-EOU-120m-v1 is a streaming speech recognition model that also performs end-of-utterance (EOU) detection. It achieves low latency (80 ms to 160 ms) and signals EOU by emitting an ` ` token at the end of each utterance. The model supports only English and does not output punctuation or capitalization. This model is designed for use in voice AI agent pipelines (e.g., NeMo Voice Agent). This model is ready for commercial/non-commercial use.

Model Architecture:
- Architecture Type: FastConformer-RNNT [1]
- Network Architecture: cache-aware streaming FastConformer [2] with 17 encoder layers (attention context = [70,1]) and RNNT decoder.

Input:
- Input Type(s): Audio
- Input Format: Audio waveform
- Input Parameters: 1-Dimensional
- Other Properties Related to Input: Single-channel audio at a 16 kHz sampling rate; at least 160 ms duration is required.

Output:
- Output Type(s): Text with optional ` ` token (e.g., "what is your name\ ")
- Output Format: String
- Output Parameters: 1-Dimensional
- Other Properties Related to Output: The output text might be empty if the input audio doesn't contain any speech.

Reference(s):
[1] Fast Conformer With Linearly Scalable Attention For Efficient Speech Recognition
[2] Stateful Conformer with Cache-based Inference for Streaming Automatic Speech Recognition
[3] NVIDIA NeMo Toolkit

This model is primarily designed for use in voice AI agents under streaming settings. Please refer to NeMo Voice Agent for examples on how to set up a voice agent with 80 ms ASR latency. To use this model in NeMo Voice Agent, set this in the server config yaml:

You will need to install NVIDIA NeMo [3]. We recommend you install it after you've installed the latest PyTorch version. The model can then be used in the offline setting shown below.

Transcribing using Python
First, let's get a sample

Software Integration:
- Runtime Engine(s): NeMo 2.5.3+
- Supported Hardware Microarchitecture Compatibility: NVIDIA Ampere, NVIDIA Blackwell, NVIDIA Hopper, NVIDIA Volta

Training, Testing, and Evaluation Datasets:

Training Dataset:
- AMI
- DialogStudio (subset from task-oriented domain with commercial license)
- Granary
- Google Speech Commands
- LibriTTS
- 10,000 hours from human-transcribed NeMo ASR Set 3.0, including:
  - LibriSpeech (960 hours)
  - Fisher Corpus
  - National Speech Corpus Part 1
  - VCTK
  - Europarl-ASR
  - Multilingual LibriSpeech
  - Mozilla Common Voice (v7.0)

Data Collection Method [Hybrid: Human, Synthetic]
- Most audio is human-recorded, but some is generated by TTS models with a commercial license.

Labeling Method [Hybrid: Human, Synthetic]
- Some transcripts are automatically generated by automatic speech recognition (ASR) models, while others are manually labeled.

- HuggingFace ASR Leaderboard
  - AMI
  - Earnings22
  - Gigaspeech
  - LS-test-clean
  - LS-test-other
  - SPGI
  - Tedlium
  - Voxpopuli
- DialogStudio (subset from task-oriented domain with commercial license)

Data Collection Method [Hybrid: Human, Synthetic]
- Most audio is human-recorded, but some is generated by TTS models with a commercial license.

Labeling Method [Hybrid: Human, Synthetic]
- Some transcripts are generated by ASR models, while some are manually labeled.

Word error rate (WER) on the HuggingFace OpenASR leaderboard, measured in the 160 ms streaming setting. Text is normalized by this normalizer before calculating the metrics.

| Metric | Average | AMI | Earnings22 | Gigaspeech | LS-test-clean | LS-test-other | SPGI | Tedlium | Voxpopuli |
|----------------|---------|-------|------------|------------|---------------|----------------|------|---------|-----------|
| WER (%) | 9.30 | 15.62 | 15.76 | 13.31 | 3.61 | 7.79 | 3.79 | 5.48 | 9.07 |

The latency metrics are evaluated on TTS-generated audio from DialogStudio, with a 3-second silence appended to each sample. The actual performance in real-world scenarios will vary by acoustic environment, accents, etc.

| Percentile | Latency |
|---------|-----|
| 50% | 160ms |
| 90% | 280ms |
| 95% | 320ms |

Test Hardware: NVIDIA V100, NVIDIA A100, NVIDIA A6000

Ethical Considerations: NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns here.
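Two numbers in the Parakeet-Realtime-EOU card above can be reproduced with simple arithmetic: the "Average" WER is the unweighted mean of the eight per-dataset scores, and the 160 ms minimum input at 16 kHz corresponds to a fixed number of samples. The sketch below is illustrative; the function names are hypothetical and the values are copied from the card.

```python
# Per-dataset WER (%) copied from the table above, 160 ms streaming setting.
WER_PERCENT = {
    "AMI": 15.62,
    "Earnings22": 15.76,
    "Gigaspeech": 13.31,
    "LS-test-clean": 3.61,
    "LS-test-other": 7.79,
    "SPGI": 3.79,
    "Tedlium": 5.48,
    "Voxpopuli": 9.07,
}

SAMPLE_RATE_HZ = 16000  # single-channel 16 kHz input, per the card
MIN_DURATION_MS = 160   # minimum input duration, per the card


def average_wer(scores):
    """Unweighted mean over datasets (each dataset counts equally)."""
    return sum(scores.values()) / len(scores)


def min_input_samples(rate_hz=SAMPLE_RATE_HZ, duration_ms=MIN_DURATION_MS):
    """Smallest number of audio samples that satisfies the minimum duration."""
    return rate_hz * duration_ms // 1000


if __name__ == "__main__":
    print(f"average WER: {average_wer(WER_PERCENT):.2f}%")
    print(f"minimum input length: {min_input_samples()} samples")
```

Because the mean is unweighted, small test sets influence the headline figure as much as large ones; a duration-weighted average would need per-dataset audio lengths, which the card does not give.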

β€”
110
29

Cosmos-0.1-Tokenizer-DI8x8

β€”
110
11

Cosmos-0.1-Tokenizer-DI16x16

β€”
110
9

Cosmos-0.1-Tokenizer-CI16x16

β€”
110
8

Cosmos-0.1-Tokenizer-CV8x16x16

β€”
109
9

Cosmos-Predict1-7B-WorldInterpolator

NaNK
β€”
103
5

Cosmos-0.1-Tokenizer-DV8x16x16

β€”
101
9

OpenMath-Nemotron-32B

NaNK
license:cc-by-4.0
100
28

Qwen3 Nemotron 8B BRRM

NaNK
β€”
100
4

Cosmos-0.1-Tokenizer-CV8x8x8

β€”
99
13

Cosmos-Tokenize1-CI8x8-360p

β€”
99
2

Cosmos-Predict1-7B-Decoder-DV8x16x16ToCV8x8x8-720p

NaNK
β€”
99
0

Cosmos-Predict2-14B-Sample-GR00T-Dreams-DROID

NaNK
β€”
97
1

Cosmos-0.1-Tokenizer-CI8x8

β€”
94
32

esm2_t12_35M_UR50D

> [!NOTE]
> This model has been optimized using NVIDIA's TransformerEngine library. Slight numerical differences may be observed between the original model and the optimized version. For instructions on how to install TransformerEngine, please refer to the official documentation.

ESM-2 is a state-of-the-art protein model trained on a masked language modelling objective. It predicts protein structures from amino acid sequences, leveraging a transformer-based architecture for accurate 3D modeling. It is suitable for fine-tuning on a wide range of tasks that take protein sequences as input. This version of the ESM-2 model is optimized with NVIDIA's TransformerEngine library. It is based on the original ESM-2 model from Facebook Research, and (within numerical precision) has identical weights and outputs.

This model is ready for commercial/non-commercial use.

This model is not owned or developed by NVIDIA. This model has been developed and built to a third party's requirements for this application and use case; see the Non-NVIDIA Model Card: ESM-2 Model Card.

Protein structure prediction, specifically predicting 3D protein structures from amino acid sequences.

Hugging Face 07/29/2025 via https://huggingface.co/nvidia/esm2_t12_35M_UR50D

- Evolutionary-scale prediction of atomic level protein structure with a language model
- For detailed information on the model architecture and training data, please refer to the accompanying paper.
- Demo notebooks (PyTorch, TensorFlow) which demonstrate how to fine-tune ESM-2 models on your tasks of interest.

Architecture Type: Transformer
Network Architecture: ESM-2
This model was developed based on: ESM-2
Number of model parameters: 3.4 x 10^7

Input Type: Text (Protein Sequences)
Input Format: String
Input Parameters: One-Dimensional (1D)
Other Properties Related to Input: Protein sequence represented as a string of canonical amino acids, of maximum length 1022. Longer sequences are automatically truncated to this length.

Output Type: Embeddings (Amino acid and sequence-level)
Output Format: Vector
Output Parameters: One-Dimensional (1D)
Other Properties Related to Output: Numeric vector with floating-point values corresponding to an embedding for each amino acid in the input protein sequence. Maximum output length is 1022 embeddings, one embedding vector per amino acid.

Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g., GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.

Supported Hardware Microarchitecture Compatibility:

The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.

Model Version: Several ESM-2 checkpoints are available with varying sizes. Larger sizes have better accuracy, but require more memory and time to train:

| Checkpoint name | Num layers | Num parameters |
| ------------------------------------------------------------------------ | ---------- | -------------- |
| esm2_t48_15B_UR50D | 48 | 15B |
| esm2_t36_3B_UR50D | 36 | 3B |
| esm2_t33_650M_UR50D | 33 | 650M |
| esm2_t30_150M_UR50D | 30 | 150M |
| esm2_t12_35M_UR50D | 12 | 35M |
| esm2_t6_8M_UR50D | 6 | 8M |

Properties (Quantity, Dataset Descriptions, Sensor(s)): UniRef90 clusters are generated from the UniRef100 seed sequences with a 90% sequence identity threshold using the MMseqs2 algorithm. The seed sequences are the longest members of the UniRef100 cluster. However, the longest sequence is not always the most informative.
There is often more biologically relevant information and annotation (name, function, cross-references) available on other cluster members. All the proteins in each cluster are ranked to facilitate the selection of a biologically relevant representative for the cluster. Properties: UniRef50 clusters are generated from the UniRef90 seed sequences with a 50% sequence identity threshold using the MMseqs2 algorithm. The seed sequences are the longest members of the UniRef90 cluster. However, the longest sequence is not always the most informative. There is often more biologically relevant information and annotation (name, function, cross-references) available on other cluster members. All the proteins in each cluster are ranked to facilitate the selection of a biologically relevant representative for the cluster. Link: Continuous Automated Model Evaluation (CAMEO) Properties: The data is collected by taking sequences of protein structures that are about to be released weekly by the Protein Data Bank (PDB). These sequences are sent as "blind targets" to participating protein structure prediction servers, which then return their predictions. Link: CASP14 (Critical Assessment of Methods of Protein Structure Prediction) Properties: The data for CASP14 targets is collected from protein structures that are newly solved by experimental structural biologists. The CASP organizers receive the amino acid sequences of these proteins before their full, three-dimensional structures are publicly released in the Protein Data Bank (PDB). They then provide these sequences to participating research groups and servers, who must submit their predicted structures within a specific time frame. NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. 
When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Users are responsible for ensuring the physical properties of model-generated molecules are appropriately evaluated and comply with applicable safety regulations and ethical standards. Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns here.

license:mit
94
1

Cosmos-1.0-Diffusion-14B-Text2World

NaNK
-
93
60

OpenCodeReasoning-Nemotron-32B

NaNK
license:apache-2.0
92
72

OpenMath-Mistral-7B-v0.1-hf

NaNK
license:apache-2.0
91
33

Cosmos-Tokenize1-CV4x8x8-360p

-
91
1

MambaVision-T2-1K

-
90
4

segformer-b0-finetuned-cityscapes-640-1280

-
90
0

Gliner PII

Description: GLiNER-PII is a successor to the Gretel GLiNER PII/PHI models. Built on the GLiNER bi-large base, it detects and classifies a broad range of Personally Identifiable Information (PII) and Protected Health Information (PHI) in structured and unstructured text. It is non-generative and produces span-level entity annotations with confidence scores across 55+ categories. This model was developed by NVIDIA.
This model is ready for commercial/non-commercial use.
License/Terms of Use: Use of this model is governed by the NVIDIA Open Model License Agreement.
Use Case: GLiNER-PII supports detection and redaction of sensitive information across regulated and enterprise scenarios.
- Healthcare: Redact PHI in clinical notes, reports, and medical documents.
- Finance: Identify account numbers, SSNs, and transaction details in banking and insurance documents.
- Legal: Protect client information in contracts, filings, and discovery materials.
- Enterprise Data Governance: Scan documents, emails, and data stores for sensitive information.
- Data Privacy Compliance: Support GDPR, HIPAA, and CCPA workflows across varied document types.
- Cybersecurity: Detect sensitive data in logs, security reports, and incident records.
- Content Moderation: Flag personal information in user-generated content.
Note: Performance varies by domain, format, and threshold, so validation and human review are recommended for high-stakes deployments.
Release Date: Hugging Face 10/28/2025 via https://huggingface.co/nvidia/gliner-pii
References:
- GLiNER base (Hugging Face): https://huggingface.co/knowledgator/gliner-bi-large-v1.0
- Gretel GLiNER PII/PHI models: https://huggingface.co/gretelai/gretel-gliner-bi-large-v1.0
- Training dataset: https://huggingface.co/datasets/nvidia/nemotron-pii
- GLiNER library: https://pypi.org/project/gliner/
This model was developed based on knowledgator/gliner-bi-large-v1.0
Number of model parameters: 5.7 × 10^8
Input:
Input Type(s): Text
Input Format: UTF-8 string(s)
Input Parameters: One-Dimensional (1D)
Other Properties Related to Input: Supports structured and unstructured text
Output:
Output Type(s): Text
Output Format: String
Output Parameters: One-Dimensional (1D)
Other Properties Related to Output: List of dictionaries with keys {text, label, start, end, score}
Software Integration:
Runtime Engine(s): PyTorch, GLiNER Python library
Supported Hardware Microarchitecture Compatibility: NVIDIA Ampere, NVIDIA Blackwell, NVIDIA Hopper, NVIDIA Lovelace, NVIDIA Pascal, NVIDIA Turing, NVIDIA Volta, CPU (x86_64)
The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.
Model Version(s):
- nvidia/gliner-pii
- Version: v1.0
Link: nvidia/nemotron-pii
Data Modality: Text
Text Training Data Size: ~100k records (~10^5)
Data Collection Method: Synthetic
Labeling Method: Synthetic
Properties: Synthetic persona-grounded dataset generated with NVIDIA NeMo Data Designer, spanning 50+ industries and 55+ entity types (U.S. and international formats). Includes both structured and unstructured records.
Labels are automatically injected during generation.
Data Collection Method: Hybrid: Automated, Human
Labeling Method: Hybrid: Automated, Human
Evaluation Results: From the combined evaluation across Argilla, AI4Privacy, and Gretel PII datasets:
| Benchmark | Strict F1 |
| --------------------| -----------: |
| Argilla PII | 0.70 |
| AI4Privacy | 0.64 |
| nvidia/Nemotron-PII | 0.87 |
We evaluated the model using `threshold=0.3`.
First, make sure you have the gliner library installed:
Now, let's try to find an email, SSN, and phone number in a messy block of text.
Ethical Considerations: NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. For more detailed information on ethical considerations for this model, please see the Bias, Explainability, Safety & Security, and Privacy Subcards. Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns here.
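The card describes the output as a list of dictionaries with keys {text, label, start, end, score} and reports evaluation at `threshold=0.3`. The sketch below shows one way such spans might be turned into a redacted string. It does not load the model: the `redact` and `make_entity` helpers, the sample text, and the label names are all hypothetical, and the commented-out lines only indicate where the gliner library's `predict_entities` call would typically supply real entities.

```python
from typing import Dict, List


def redact(text: str, entities: List[Dict], threshold: float = 0.3) -> str:
    """Replace each span whose score meets the threshold with a [LABEL] tag.

    Assumes spans do not overlap; overlapping spans would need merging first.
    """
    kept = [e for e in entities if e["score"] >= threshold]
    # Work right to left so earlier character offsets stay valid.
    for e in sorted(kept, key=lambda e: e["start"], reverse=True):
        text = text[: e["start"]] + "[" + e["label"].upper() + "]" + text[e["end"]:]
    return text


def make_entity(text: str, value: str, label: str, score: float) -> Dict:
    """Build a dictionary in the documented output shape (illustration only)."""
    start = text.index(value)
    return {"text": value, "label": label, "start": start,
            "end": start + len(value), "score": score}


# With the real model, entities would come from something like:
#   from gliner import GLiNER
#   model = GLiNER.from_pretrained("nvidia/gliner-pii")
#   entities = model.predict_entities(sample, labels, threshold=0.3)
sample = "Reach me at jane@example.com or 555-0100."
entities = [
    make_entity(sample, "jane@example.com", "email", 0.97),
    make_entity(sample, "555-0100", "phone_number", 0.91),
    make_entity(sample, "Reach", "first_name", 0.12),  # below threshold, kept as-is
]
redacted = redact(sample, entities)
print(redacted)
```

Processing spans from the end of the string backwards avoids recomputing offsets after each replacement, which keeps the helper short.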

-
89
12

E-RADIO

-
87
14

Llama-3_3-Nemotron-Super-49B-GenRM-Multilingual

NaNK
llama3.3
87
6

Cosmos-Tokenize1-DV4x8x8-360p

-
86
1

nemotron-page-elements-v3

The Nemotron Page Elements v3 model is a specialized object detection model designed to identify and extract elements from document pages. While the underlying technology builds upon work from Megvii Technology, we developed our own base model through complete retraining rather than using pre-trained weights. YOLOX is an anchor-free version of YOLO (You Only Look Once) that combines a simpler architecture with enhanced performance. The model is trained to detect tables, charts, infographics, titles, headers/footers and texts in documents.
This model supersedes the nemotron-page-elements model and is part of the NVIDIA Nemotron family of NIM microservices specifically for object detection and multimodal extraction of enterprise documents.
This model is ready for commercial/non-commercial use.
We are excited to announce the open sourcing of this commercial model. For users interested in deploying this model in production environments, it is also available via the model API in NVIDIA Inference Microservices (NIM) at nemoretriever-page-elements-v2.
The use of this model is governed by the NVIDIA Open Model License Agreement, and the post-processing scripts are licensed under Apache 2.0.
- Theo Viel
- Bo Liu
- Darragh Hanley
- Even Oldridge
Correspondence to Theo Viel and Bo Liu.
The Nemotron Page Elements v3 model is designed for automating extraction of text, charts, tables, infographics, etc. in enterprise documents. It can be used for document analysis, understanding and processing.
Key applications include:
- Enterprise document extraction, embedding and indexing
- Augmenting Retrieval Augmented Generation (RAG) workflows with multimodal retrieval
- Data extraction from legacy documents and reports
10/23/2025 via https://huggingface.co/nvidia/nemotron-page-elements-v3
- YOLOX paper: https://arxiv.org/abs/2107.08430
- YOLOX repo: https://github.com/Megvii-BaseDetection/YOLOX
- Previous version of the Page Element model: https://build.nvidia.com/nvidia/nemoretriever-page-elements-v2
- Technical blog: https://developer.nvidia.com/blog/approaches-to-pdf-data-extraction-for-information-retrieval/
Architecture Type: YOLOX
Network Architecture: DarkNet53 Backbone + FPN Decoupled head (one 1x1 convolution + 2 parallel 3x3 convolutions, one for classification and one for bounding box prediction). YOLOX is a single-stage object detector that improves on YOLOv3.
This model was developed based on the YOLO architecture
Number of model parameters: 5.4e7
Input Type(s): Image
Input Format(s): Red, Green, Blue (RGB)
Input Parameters: Two-Dimensional (2D)
Other Properties Related to Input: Image resized to `(1024, 1024)`
Output Type(s): Array
Output Format: A dictionary of dictionaries containing `np.ndarray` objects. The outer dictionary has entries for each sample (page), and the inner dictionary contains a list of dictionaries, each with a bounding box (`np.ndarray`), class label, and confidence score for that page.
Output Parameters: One-Dimensional (1D)
Other Properties Related to Output: The output contains bounding boxes, detection confidence scores, and object classes (chart, table, infographic, title, text, headers and footers). The thresholds used for non-maximum suppression are `confthresh=0.01` and `iouthresh=0.5`.
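To make the two non-maximum suppression thresholds concrete, here is a minimal sketch of generic greedy, class-agnostic NMS in plain Python. It is not the repository's implementation; the function names and the `conf_thresh` and `iou_thresh` parameter names are illustrative stand-ins for the `confthresh=0.01` and `iouthresh=0.5` values quoted above, and whether the real pipeline suppresses per class is not stated in the card.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        conf_thresh: float = 0.01, iou_thresh: float = 0.5) -> List[int]:
    """Return indices of kept boxes, highest score first."""
    # Drop low-confidence boxes, then visit the rest by descending score.
    order = sorted((i for i, s in enumerate(scores) if s >= conf_thresh),
                   key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with a kept one.
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30), (0, 0, 5, 5)]
scores = [0.9, 0.8, 0.7, 0.005]
kept = nms(boxes, scores)
print(kept)
```

A low confidence threshold such as 0.01 keeps almost every candidate and leaves the final score cut-off to downstream post-processing, while the IoU threshold controls how aggressively overlapping detections are merged.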
Output Classes:
- Table: Data structured in rows and columns
- Chart: Specifically bar charts, line charts, or pie charts
- Infographic: Visual representations of information that are more complex than a chart, including diagrams and flowcharts. Maps are not considered infographics
- Title: Titles can be section titles, or table/chart/infographic titles
- Header/footer: Page headers and footers
- Text: Regions of one or more text paragraphs, or standalone text not belonging to any of the classes above
Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g., GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
The model requires torch and the custom code available in this repository.
- Make sure git-lfs is installed (https://git-lfs.com)
Optional: This can be installed as a package using pip
Note that this repository only provides minimal code to infer the model. If you wish to do additional training, refer to the original repo. Additional post-processing might be required to use the model as part of a data extraction pipeline. We provide examples in the notebook `Demo.ipynb`.
Data Modality: Image
Image Training Data Size: Less than a Million Images
Data collection method by dataset: Automated
Labeling method by dataset: Hybrid: Automated, Human
Pretraining (by NVIDIA): 118,287 images of the COCO train2017 dataset
Finetuning (by NVIDIA): 36,093 images from Digital Corpora dataset, with annotations from Azure AI Document Intelligence and data annotation team
Number of bounding boxes per class: 35,328 tables, 44,178 titles, 11,313 charts, 6,500 infographics, 90,812 texts and 10,743 header/footers. The layout model of Document Intelligence was used with `2024-02-29-preview` API version.
The primary evaluation set is a cut of the Azure labels and digital corpora images.
Number of bounding boxes per class: 1,985 tables, 2,922 titles, 498 charts, 572 infographics, 4,400 texts and 492 header/footers.
Mean Average Precision (mAP) was used as an evaluation metric, which measures the model's ability to correctly identify and localize objects across different confidence thresholds.
Data collection method by dataset: Hybrid: Automated, Human
Labeling method by dataset: Hybrid: Automated, Human
Properties: We evaluated with Azure labels from manually selected pages, as well as manual inspection on public PDFs and PowerPoint slides.
Per-class Performance Metrics:
| Class | AP (%) | AR (%) |
|:------------|:-------|:-------|
| table | 44.643 | 62.242 |
| chart | 54.191 | 77.557 |
| title | 38.529 | 56.315 |
| infographic | 66.863 | 69.306 |
| text | 45.418 | 73.017 |
| headerfooter | 53.895 | 75.670 |
Test hardware: See Support Matrix from NIM documentation
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
| Field | Response |
| ----- | ----- |
| Participation considerations from adversely impacted groups protected classes in model design and testing | None |
| Measures taken to mitigate against unwanted bias | None |

| Field | Response |
| ----- | ----- |
| Intended Task/Domain: | Document Understanding |
| Model Type: | YOLOX Object Detection for Charts, Tables, Infographics, Header/footers, Texts, and Titles |
| Intended User: | Enterprise developers, data scientists, and other technical users who need to extract structural elements from documents. |
| Output: | After post-processing, the output is three NumPy arrays that contain the detections: `boxes [N x 4]` (format is normalized `(xmin, ymin, xmax, ymax)`), associated classes: `labels [N]` and confidence scores: `scores [N]`. |
| Describe how the model works: | The model identifies objects in an image by first dividing the image into a grid. For each grid cell, it extracts visual features and simultaneously predicts which objects are present (for example, 'chart' or 'table') and where they are located in that cell, all in a single pass through the image. |
| Name the adversely impacted groups this has been tested to deliver comparable outcomes regardless of: | Not Applicable |
| Technical Limitations & Mitigation: | The model may not generalize to unknown document types/formats not commonly found on the web. Further fine-tuning might be required for such documents. |
| Verified to have met prescribed NVIDIA quality standards: | Yes |
| Performance Metrics: | Mean Average Precision, detection recall and visual inspection |
| Potential Known Risks: | This model may not always detect all elements in a document. |
| Licensing & Terms of Use: | Use of this model is governed by the NVIDIA Open Model License Agreement, and the post-processing scripts are licensed under Apache 2.0. |

| Field | Response |
| ----- | ----- |
| Generatable or reverse engineerable personal data? | No |
| Personal data used to create this model? | No |
| Was consent obtained for any personal data used? | Not Applicable |
| How often is the dataset reviewed? | Before Release |
| Is there provenance for all datasets used in training? | Yes |
| Does data labeling (annotation, metadata) comply with privacy laws? | Yes |
| Is data compliant with data subject requests for data correction or removal, if such a request was made? | No, not possible with externally-sourced data. |
| Applicable Privacy Policy | https://www.nvidia.com/en-us/about-nvidia/privacy-policy/ |

| Field | Response |
| ----- | ----- |
| Model Application Field(s): | Object Detection for Retrieval, focused on Enterprise |
| Describe the life critical impact (if present). | Not Applicable |
| Use Case Restrictions: | Abide by the NVIDIA Open Model License Agreement; the post-processing scripts are licensed under Apache 2.0. |
| Model and dataset restrictions: | The Principle of least privilege (PoLP) is applied limiting access for dataset generation and model development. Restrictions enforce dataset access during training, and dataset license constraints are adhered to. |
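The explainability table describes the post-processed output as `boxes [N x 4]` in normalized `(xmin, ymin, xmax, ymax)` format with matching `labels [N]` and `scores [N]`. A downstream pipeline usually needs pixel coordinates for cropping. The sketch below uses plain Python lists in place of NumPy arrays; the function name, the `min_score` cut-off, and the string labels are assumptions for illustration, not part of the repository.

```python
from typing import Dict, List, Sequence, Tuple


def detections_to_pixels(
    boxes: Sequence[Tuple[float, float, float, float]],
    labels: Sequence[str],
    scores: Sequence[float],
    width: int,
    height: int,
    min_score: float = 0.5,  # hypothetical cut-off, tune per use case
) -> List[Dict]:
    """Scale normalized boxes to pixel coordinates and drop weak detections."""
    results = []
    for (x1, y1, x2, y2), label, score in zip(boxes, labels, scores):
        if score < min_score:
            continue
        results.append({
            "label": label,
            "score": score,
            # x values scale with page width, y values with page height.
            "box_px": (round(x1 * width), round(y1 * height),
                       round(x2 * width), round(y2 * height)),
        })
    return results


page = detections_to_pixels(
    boxes=[(0.1, 0.2, 0.5, 0.6), (0.0, 0.9, 1.0, 1.0)],
    labels=["table", "header_footer"],
    scores=[0.92, 0.30],
    width=1000,
    height=2000,
)
print(page)
```

Keeping the score filter separate from NMS lets the same raw detections serve different precision and recall trade-offs.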

-
85
26

AMPLIFY_350M

license:mit
82
7

NFT-32B

NaNK
-
82
5

Meta-Llama-3.1-8B-Instruct-ONNX-INT4

NaNK
llama
81
5

NFT-7B

NaNK
-
81
2

OpenMath-CodeLlama-34b-Python-hf

NaNK
llama
81
1

GR00T-N1-2B-tuned-Nut-Pouring-task

NaNK
-
81
0

OpenMath-CodeLlama-70b-Python-hf

NaNK
llama
80
11

Cosmos-Tokenize1-CI16x16-360p

-
80
1

OpenMath-CodeLlama-7b-Python-hf

NaNK
llama
79
7

C-RADIOv2-L

-
79
2

Audio Flamingo 3 Chat

Audio Flamingo 3: Advancing Audio Intelligence with Fully Open Large Audio-Language Models
Description: Audio Flamingo 3 (AF3) is a fully open, state-of-the-art Large Audio-Language Model (LALM) that advances reasoning and understanding across speech, sounds, and music. AF3 builds on previous work with innovations in:
- Unified audio representation learning (speech, sound, music)
- Flexible, on-demand chain-of-thought reasoning
- Long-context audio comprehension (up to 10 minutes)
- Multi-turn, multi-audio conversational dialogue (AF3-Chat)
- Voice-to-voice interaction (AF3-Chat)
Extensive evaluations confirm AF3's effectiveness, setting new benchmarks on over 20 public audio understanding and reasoning tasks.
This model is the chat version of AF3, capable of voice chat and multi-turn, multi-audio dialogue. The non-chat version can be found here.
Please note that we do not currently provide the streaming TTS-based voice output module. We plan to release it at a later date along with a detailed report.
This model is for non-commercial research purposes only.
Model Architecture: Audio Flamingo 3 uses AF-Whisper unified audio encoder, MLP-based audio adaptor, Decoder-only LLM backbone (Qwen2.5-7B), and Streaming TTS module (AF3-Chat). Audio Flamingo 3 can take up to 10 minutes of audio inputs.
License / Terms of Use: The model is released under the NVIDIA OneWay Noncommercial License. Portions of the dataset generation are also subject to the Qwen Research License and OpenAI's Terms of Use.
Use Case: Intended for researchers and developers to explore:
- Audio question answering and reasoning
- Long-context audio comprehension
- Interactive sound/music design assistants
- Multi-turn (voice) chat
Release Date:
- Github (07/10/2025) via https://github.com/NVIDIA/audio-flamingo
- HuggingFace (07/10/2025) via https://huggingface.co/nvidia/audio-flamingo-3
References: Audio Flamingo 3: Advancing Audio Intelligence with Fully Open Large Audio-Language Models; Project Page; Demo Website; Hugging Face
Model Architecture:
Architecture Type: Transformer
Network Architecture: Audio Flamingo 3
AF3 uses:
- AF-Whisper unified audio encoder
- MLP-based audio adaptor
- Decoder-only LLM backbone (Qwen2.5-7B)
- Streaming TTS module (AF3-Chat)
This model was developed based on NVILA and Qwen-2.5-7B
Input:
- Input Type: Audio, Text
- Input Format: WAV/MP3/FLAC, UTF-8 text
- Input Parameters: Audio is Two-Dimensional (2D) and Text is One-Dimensional (1D)
- Other Properties Related to Input:
  - Max Audio Length: 10 Minutes
  - Max Text Length: 16000 tokens
Output:
- Output Type: Text (and optional speech)
- Text Format: UTF-8 string
- Output Parameters: One-Dimensional (1D)
- Other Properties Related to Output:
  - Max Text Length: 1024 tokens
  - Speech Format: streaming TTS (text-to-speech) waveform
Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems (A100/H100). By leveraging NVIDIA's hardware (e.g., GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
Software Integration:
Runtime Engine: PyTorch / HuggingFace Transformers
Supported Hardware: NVIDIA Ampere (A100), NVIDIA Hopper (H100)
Training Dataset: AF3 is trained entirely on open-source audio data, organized into four novel, large-scale collections. For each dataset, we mention whether the dataset annotations are collected by Human or they are Automated, i.e., generated using AI models.
The data collection method noted below applies to all datasets used for training and testing:
Data Collection Method: Human
Labeling Collection Method: Please see below:
General Sound: WavCaps (Automated), MACS (Human), SoundDescs (Human), Clotho-v2 (Human), WavText5K (Human), Clotho-AQA (Human), Open-AQA (Automated), CompA-R (Automated), Salmonn AQA (Automated), Audio Entailment (Automated), CompA (Automated), AudioSet (Human), YouTube-8M (Human), FSD50k (Human), CochlScene (Human), NonSpeech7K (Human), Chime-Home (Human), Sonyc-UST (Human)
Music: LP-MusicCaps (Automated), MusicQA (Automated), MusicAVQA (Human), MusicBench (Automated), Mu-LLAMA (Automated), NSynth (Human), FMA (Human), MusDB-HQ (Human), Music4All (Human), Million Song Dataset (Human)
Speech: MSP-Podcast (Human), JL-Corpus (Human), MELD (Human), Tess (Human), OMGEmotion (Human), Emov-DB (Human), LibriSpeech (Human), SPGISpeech (Human), TEDLIUM (Human), GigaSpeech (Human), Common Voice 15 (Human), VoxPopuli (Human), VoxCeleb2 (Human), Switchboard (Human), AMI (Human)
Mixed: AudioSkills-XL (ours) (Automated), LongAudio-XL (ours) (Automated), AF-Think (ours) (Automated), AF-Chat (ours) (Automated)
Testing Dataset: Audio Flamingo 3 is evaluated on the test split of the following datasets.
Data Collection Method: Human (for all datasets noted below)
Labeling Method: See below
ClothoAQA (Human), MusicAVQA (Human), Clotho-v2 (Human), CochlScene (Human), NonSpeech7K (Human), NSynth (Human), AudioCaps (Human), US8K (Human), GTZAN (Human), MMAU (Human), MMAR (Human), Audio Entailment (Automated), CompA-R-test (Automated), MuchoMusic (Automated), Open-AQA (Automated), MusicInstruct (Automated), MusicQA (Automated), CMM Hallucination (Human), IEMOCAP (Human), VoiceBench (Human), OpenAudioBench (Human), SEED (Human), LibriSpeech (Human), SPGISpeech (Human), TEDLIUM (Human), GigaSpeech (Human), Common Voice 15 (Human), VoxPopuli (Human), LongAudioBench (ours) (Automated), AF-Chat-test (ours) (Human)
Engine: HuggingFace Transformers
Test Hardware: NVIDIA A100 80 GB
Ethical Considerations: NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please report security vulnerabilities or NVIDIA AI Concerns here.
Acknowledgements: Built with Qwen, NVILA and the open audio-ML community.
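The card lists a maximum audio length of 10 minutes per input. A caller may want to check a file against that limit before sending it to the model. The following sketch uses only the Python standard library and handles WAV files only (MP3 and FLAC would need a third-party decoder). Splitting longer recordings into windows is an assumption made here for illustration; the card does not say how over-length audio should be handled, and both function names are hypothetical.

```python
import math
import wave
from typing import List, Tuple

MAX_AUDIO_SECONDS = 10 * 60  # "Max Audio Length: 10 Minutes" from the card


def wav_duration_seconds(path: str) -> float:
    """Duration of a WAV file, computed from its frame count and sample rate."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def plan_windows(duration_s: float,
                 max_s: float = MAX_AUDIO_SECONDS) -> List[Tuple[float, float]]:
    """Split [0, duration_s] into consecutive windows no longer than max_s."""
    if duration_s <= 0:
        return []
    count = math.ceil(duration_s / max_s)
    return [(i * max_s, min((i + 1) * max_s, duration_s)) for i in range(count)]


# A 25-minute recording would need three windows under this scheme.
windows = plan_windows(25 * 60)
print(windows)
```

Non-overlapping windows are the simplest choice; a real pipeline might prefer overlapping windows or silence-aware cuts so that speech is not split mid-sentence.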

NaNK
-
78
43

OpenMath-Llama-2-70b-hf

NaNK
llama
78
2

OpenMath-CodeLlama-13b-Python-hf

NaNK
llama
78
1

NeKo-v0-post-correction

-
78
1

OpenCodeReasoning-Nemotron-32B-IOI

NaNK
license:apache-2.0
77
25

esm2_t30_150M_UR50D

> [!NOTE]
> This model has been optimized using NVIDIA's TransformerEngine library. Slight numerical differences may be observed between the original model and the optimized version. For instructions on how to install TransformerEngine, please refer to the official documentation.
ESM-2 is a state-of-the-art protein model trained on a masked language modelling objective. It predicts protein structures from amino acid sequences, leveraging a transformer-based architecture for accurate 3D modeling. It is suitable for fine-tuning on a wide range of tasks that take protein sequences as input.
This version of the ESM-2 model is optimized with NVIDIA's TransformerEngine library. It is based on the original ESM-2 model from Facebook Research, and (within numerical precision) has identical weights and outputs.
This model is ready for commercial/non-commercial use.
This model is not owned or developed by NVIDIA. This model has been developed and built to a third-party's requirements for this application and use case; see link to Non-NVIDIA Model Card ESM-2 Model Card.
Protein structure prediction, specifically predicting 3D protein structures from amino acid sequences.
Hugging Face 07/29/2025 via https://huggingface.co/nvidia/esm2_t30_150M_UR50D
- Evolutionary-scale prediction of atomic level protein structure with a language model: for detailed information on the model architecture and training data, please refer to the accompanying paper.
- Demo notebooks (PyTorch, TensorFlow) which demonstrate how to fine-tune ESM-2 models on your tasks of interest.
Architecture Type: Transformer
Network Architecture: ESM-2
This model was developed based on: ESM-2
Number of model parameters: 1.5 x 10^8
Input Type: Text (Protein Sequences)
Input Format: String
Input Parameters: One-Dimensional (1D)
Other Properties Related to Input: Protein sequence represented as a string of canonical amino acids, of maximum length 1022. Longer sequences are automatically truncated to this length.
Output Type: Embeddings (amino acid and sequence-level)
Output Format: Vector
Output Parameters: One-Dimensional (1D)
Other Properties Related to Output: Numeric vector with floating-point values corresponding to an embedding for each amino acid in the input protein sequence. Maximum output length is 1022 embeddings, one embedding vector per amino acid.
Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g., GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
Supported Hardware Microarchitecture Compatibility:
The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.
Model Version: This model features the following checkpoints. Several ESM-2 checkpoints are available with varying sizes. Larger sizes have better accuracy, but require more memory and time to train:
| Checkpoint name | Num layers | Num parameters |
| --- | --- | --- |
| esm2_t48_15B_UR50D | 48 | 15B |
| esm2_t36_3B_UR50D | 36 | 3B |
| esm2_t33_650M_UR50D | 33 | 650M |
| esm2_t30_150M_UR50D | 30 | 150M |
| esm2_t12_35M_UR50D | 12 | 35M |
| esm2_t6_8M_UR50D | 6 | 8M |
Properties (Quantity, Dataset Descriptions, Sensor(s)): UniRef90 clusters are generated from the UniRef100 seed sequences with a 90% sequence identity threshold using the MMseqs2 algorithm. The seed sequences are the longest members of the UniRef100 cluster. However, the longest sequence is not always the most informative.
There is often more biologically relevant information and annotation (name, function, cross-references) available on other cluster members. All the proteins in each cluster are ranked to facilitate the selection of a biologically relevant representative for the cluster.
Properties: UniRef50 clusters are generated from the UniRef90 seed sequences with a 50% sequence identity threshold using the MMseqs2 algorithm. The seed sequences are the longest members of the UniRef90 cluster. However, the longest sequence is not always the most informative. There is often more biologically relevant information and annotation (name, function, cross-references) available on other cluster members. All the proteins in each cluster are ranked to facilitate the selection of a biologically relevant representative for the cluster.
Link: Continuous Automated Model Evaluation (CAMEO)
Properties: The data is collected by taking sequences of protein structures that are about to be released weekly by the Protein Data Bank (PDB). These sequences are sent as "blind targets" to participating protein structure prediction servers, which then return their predictions.
Link: CASP14 (Critical Assessment of Methods of Protein Structure Prediction)
Properties: The data for CASP14 targets is collected from protein structures that are newly solved by experimental structural biologists. The CASP organizers receive the amino acid sequences of these proteins before their full, three-dimensional structures are publicly released in the Protein Data Bank (PDB). They then provide these sequences to participating research groups and servers, who must submit their predicted structures within a specific time frame.
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications.
When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Users are responsible for ensuring the physical properties of model-generated molecules are appropriately evaluated and comply with applicable safety regulations and ethical standards. Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns here.
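The input and output properties above (canonical amino acids only, truncation at 1022 residues, one embedding vector per residue) can be illustrated without loading the model. The sketch below validates and truncates a sequence, then averages per-residue vectors into one sequence-level vector. Mean pooling is an assumption made for illustration; the card mentions sequence-level embeddings but does not say how they are derived, and the helper names are hypothetical.

```python
from typing import List

CANONICAL_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")
MAX_RESIDUES = 1022  # maximum input length stated in the card


def prepare_sequence(sequence: str) -> str:
    """Upper-case, validate, and truncate a protein sequence."""
    cleaned = sequence.strip().upper()
    unknown = set(cleaned) - CANONICAL_AMINO_ACIDS
    if unknown:
        raise ValueError(f"non-canonical residues: {sorted(unknown)}")
    return cleaned[:MAX_RESIDUES]


def mean_pool(per_residue: List[List[float]]) -> List[float]:
    """Average per-residue embedding vectors into one sequence-level vector."""
    if not per_residue:
        raise ValueError("need at least one residue embedding")
    count = len(per_residue)
    width = len(per_residue[0])
    return [sum(vec[d] for vec in per_residue) / count for d in range(width)]


prepared = prepare_sequence("mkt" * 500)          # 1500 residues before truncation
pooled = mean_pool([[1.0, 2.0], [3.0, 4.0]])      # toy 2-dimensional embeddings
print(len(prepared), pooled)
```

Truncating on the caller's side makes the length limit explicit, so dropped residues do not go unnoticed.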

license:mit
76
0

nemotron-ocr-v1

-
75
20

Qwen3 Nemotron 32B GenRM Principle

Qwen3-Nemotron-32B-GenRM-Principle is a large language model that leverages Qwen3-32B as the foundation and is fine-tuned to predict the extent to which LLM-generated responses fulfil user-specified principles.
Given a conversation with multiple turns between user and assistant and a user-specified principle, it rates the quality of the final assistant turn using a reward score.
For the same prompt, a response with a higher reward score fulfils the user-specified principle to a larger extent than another response with a lower reward score.
As of 24 Sep 2025, this model scores 81.4% on JudgeBench and 86.2% on RM-Bench, making it the top generative reward model on both benchmarks.
See details on how this model was trained at https://arxiv.org/abs/2509.21319
GOVERNING TERMS: Use of this model is governed by the NVIDIA Open Model License.
Qwen3-Nemotron-32B-GenRM-Principle labels an LLM-generated response to a user query and a user-specified principle with a reward score.
HuggingFace 10/27/2025 via https://huggingface.co/nvidia/Qwen3-Nemotron-32B-GenRM-Principle
RLBFF, HelpSteer3, HelpSteer3-Preference, HelpSteer2-Preference, SteerLM method, HelpSteer, HelpSteer2
As of 24 Sep 2025, our reward model is the top-performing generative reward model on RM-Bench, an improved variant of RewardBench for evaluating Reward Models in Chat, Math, Code and Safety.
| Model | Chat | Math | Code | Safety | Easy | Normal | Hard | Overall RM-Bench |
|:-----------------------------|:------|:------|:------|:------|:------|:------|:------|:------|
| Qwen3-Nemotron-32B-GenRM-Principle | 80.4 | 92.0 | 77.0 | 95.5 | 88.9 | 86.4 | 83.4 | 86.2 |
| Llama-3_3-Nemotron-Super-49B-GenRM | 73.7 | 91.4 | 75.0 | 90.6 | 91.2 | 85.7 | 71.2 | 82.7 |
| RewardAnything-8B-v1 | 76.7 | 90.3 | 75.2 | 90.2 | 85.6 | 82.2 | 81.5 | 83.1 |
| RM-R1-DeepSeek-Distilled-Qwen-32B | 74.2 | 91.8 | 74.1 | 95.4 | 89.5 | 85.4 | 76.7 | 83.9 |
| R3-QWEN3-14B-LORA-4K | 76.5 | 92.4 | 78.7 | 91.9 | 91.4 | 86.2 | 77.1 | 84.9 |
JudgeBench LeaderBoard
As of 24 Sep 2025, our reward model is the top-performing model on JudgeBench, a popular benchmark for evaluating LLM-as-a-judge applications relating to General Knowledge, Logical Reasoning, Math and Coding.
| Model | Knowl. | Reason. | Math | Code | Overall JudgeBench |
|:-----------------------------|:------|:------|:------|:------|:------|
| Qwen3-Nemotron-32B-GenRM-Principle | 74.6 | 85.7 | 85.7 | 90.5 | 81.4 |
| Llama-3_3-Nemotron-Super-49B-GenRM | 71.4 | 73.5 | 87.5 | 76.2 | 75.1 |
| RewardAnything-8B-v1 | 61.0 | 57.1 | 73.2 | 66.7 | 62.6 |
| RM-R1-DeepSeek-Distilled-Qwen-32B | 56.5 | 66.3 | 85.7 | 73.8 | 66.0 |
| R3-QWEN3-14B-LORA-4K | 50.0 | 64.3 | 76.8 | 71.4 | 60.9 |
Model Architecture:
Architecture Type: Transformer
Network Architecture: Qwen3
We developed this model using Qwen/Qwen3-32B as its foundation. This model contains 32 billion parameters.
Input:
Input Type(s): Text
Input Format: String
Input Parameters: One-Dimensional (1D)
Other Properties Related to Input: Max of 128k tokens (but trained only on conversations up to 8K tokens)
Output:
Output Type(s): Float
Output Format: One Single Float
Output Parameters: One-Dimensional (1D)
Other Properties Related to Output: The float value represents the extent to which the response fulfils the user-specified principle, with a higher value representing greater fulfilment.
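As a quick sanity check on the RM-Bench table, the Overall RM-Bench column appears to match the unweighted mean of the four domain scores (Chat, Math, Code, Safety) to within rounding. This is an observation from the numbers above, not a definition taken from the source. The sketch below recomputes it; the `domain_mean` helper and the dictionary layout are illustrative.

```python
from typing import Dict, List, Tuple

# (Chat, Math, Code, Safety) scores and the reported overall, copied from the table.
RM_BENCH: Dict[str, Tuple[List[float], float]] = {
    "Qwen3-Nemotron-32B-GenRM-Principle": ([80.4, 92.0, 77.0, 95.5], 86.2),
    "Llama-3_3-Nemotron-Super-49B-GenRM": ([73.7, 91.4, 75.0, 90.6], 82.7),
    "RewardAnything-8B-v1": ([76.7, 90.3, 75.2, 90.2], 83.1),
    "RM-R1-DeepSeek-Distilled-Qwen-32B": ([74.2, 91.8, 74.1, 95.4], 83.9),
    "R3-QWEN3-14B-LORA-4K": ([76.5, 92.4, 78.7, 91.9], 84.9),
}


def domain_mean(scores: List[float]) -> float:
    """Unweighted mean of the per-domain scores."""
    return sum(scores) / len(scores)


for name, (domains, reported) in RM_BENCH.items():
    print(f"{name}: mean={domain_mean(domains):.3f} reported={reported}")
```

The JudgeBench overall column does not follow the same pattern, which would be consistent with its categories having different numbers of prompts, so the same check is not applied there.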
Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g., GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
Software Integration:
Runtime Engine(s): [NeMo-RL - 0.3]
Supported Hardware Microarchitecture Compatibility: NVIDIA Ampere, NVIDIA Hopper, NVIDIA Turing
You can run the model with the HuggingFace Transformers library on one or more 80GB GPUs (NVIDIA Ampere or newer) with at least 70GB of free disk space to accommodate the download. Alternatively, you can use vLLM for accelerated inference.
This code has been tested on Transformers v4.57.0, torch v2.3.0a0+40ec155e58.nv24.3 and 1 H100 80GB GPU, but any setup that supports Qwen/Qwen3-32B should support this model as well. If you run into problems, you can consider doing pip install -U transformers.
Dataset Name: HelpSteer3
Dataset Link: https://huggingface.co/datasets/nvidia/HelpSteer3
Data Collection Method by dataset [Hybrid: Human, Synthetic]
Properties: 77,564 prompt-responses, each annotated with up to 3 annotations of free-text feedback (each being 50-250 words long) elaborating upon the overall helpfulness of the response.
Dataset Name: HelpSteer3
Dataset Link: https://huggingface.co/datasets/nvidia/HelpSteer3
Data Collection Method by dataset [Hybrid: Human, Synthetic]
Properties: 4,078 prompt-responses, each annotated with up to 3 annotations of free-text feedback (each being 50-250 words long) elaborating upon the overall helpfulness of the response.
Dataset Name: RM-Bench
Dataset Link: https://huggingface.co/datasets/THU-KEG/RM-Bench
Data Collection Method by dataset [Hybrid: Human, Synthetic]
Labeling Method by dataset [Hybrid: Human, Synthetic]
Properties: 1,327 prompts, each with three pairs of responses as well as preferences between the pair of responses.
Dataset Name: JudgeBench Dataset Link: https://huggingface.co/datasets/ScalerLab/JudgeBench Data Collection Method by dataset [Hybrid: Human, Synthetic] Labeling Method by dataset [Hybrid: Human, Synthetic] Properties: 350 prompts, each with a pair of responses as well as preferences between the pair of responses. Ethical Considerations: NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their supporting model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. For more detailed information on ethical considerations for this model, please see the Model Card++ Explainability, Bias, Safety and Security, and Privacy Subcards. Please report security vulnerabilities or NVIDIA AI Concerns here. If you find this model useful, please cite the following work:

NaNK
—
75
3

Llama-3.1-8B-Medusa-FP8

NaNK
llama
74
7

MambaVision-L2-1K

—
72
13

Qwen-3-Nemotron-32B-Reward

NaNK
—
71
16

Cosmos-Tokenize1-DI8x8-360p

—
71
1

NV-CodonFM-Encodon-TE-80M-v1

This is a TransformerEngine-accelerated version of CodonFM. The code for this model can be found within bionemo-recipes, and this checkpoint is an exact parameter match to the original research work, which can be found at the official CodonFM Github repository.

Description: CodonFM predicts masked codons in mRNA sequences from codon-level context to enable variant effect interpretation and codon optimization as part of NVIDIA's CodonFM Encodon family. The family has 4 models. The first 3 are trained with randomly masked tokens and have 80 million, 600 million and 1 billion parameters. The fourth has 1 billion parameters but is trained with codon-frequency-aware masking. An additional set of accelerated checkpoints is also available.

This model is ready for commercial/non-commercial use.

Governing Terms: Use of this model is governed by the NVIDIA Open Model License Agreement.

- Optimized Expression and Stability for mRNA design: To design mRNAs with codon usage patterns that enhance translation efficiency, protein yield, and transcript stability across specific cell types and tissues.
- Variant Interpretation for pathogenicity: To identify and prioritize functional synonymous and missense variants in the context of diseases.
Release Date:

Github 10/27/2025 via https://github.com/NVIDIA-Digital-Bio/CodonFM

Hugging Face 10/27/2025 via:
- Random Mask
  - https://huggingface.co/nvidia/NV-CodonFM-Encodon-1B-v1
  - https://huggingface.co/nvidia/NV-CodonFM-Encodon-600M-v1
  - https://huggingface.co/nvidia/NV-CodonFM-Encodon-80M-v1
  - https://huggingface.co/nvidia/NV-CodonFM-Encodon-TE-1B-v1
  - https://huggingface.co/nvidia/NV-CodonFM-Encodon-TE-600M-v1
  - https://huggingface.co/nvidia/NV-CodonFM-Encodon-TE-80M-v1
- Codon Frequency Aware Mask
  - https://huggingface.co/nvidia/NV-CodonFM-Encodon-Cdwt-1B-v1
  - https://huggingface.co/nvidia/NV-CodonFM-Encodon-TE-Cdwt-1B-v1

NGC 10/27/2025 via https://catalog.ngc.nvidia.com/orgs/nvidia/teams/clara/models/nvcodonfmencodon

Model Architecture: The NVIDIA CodonFM Encodon family features Transformer-based architectures tailored for codon-level sequence modeling in mRNA. Each model applies a masked language modeling (MLM) objective to predict masked codons from a surrounding context of 2046 codons, enabling genome-scale codon optimization and synonymous variant interpretation.

| Model Name | Parameters |
|-------------|-------------:|
| Encodon-80M | 7.68 × 10⁷ |
| Encodon-600M | 6.09 × 10⁸ |
| Encodon-1B | 9.11 × 10⁸ |
| Encodon-Cdwt-1B | 9.11 × 10⁸ |

Input:
- Input Type(s): Text (mRNA Sequence)
- Input Format: fasta files converted to memmaps
- Input Parameters: 1D
- Other Properties Related to Input: mRNA sequence represented as a string of codons, of maximum length 2046. Longer sequences are automatically truncated to this length.

Output:
- Output Type(s): mRNA Sequence
- Output Format: Text
- Output Parameters: 2D
- Other Properties Related to Output: Numeric 2D tensor with floating-point values representing probabilities of a given codon at a given position within the sequence

Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g.
GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions. Software Integration: Runtime Engine(s): PyTorch - 2.5.1 Supported Hardware Microarchitecture Compatibility: NVIDIA Ampere NVIDIA Hopper The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment. Model Version(s): - NV-CodonFM-Encodon-80M-v1 - NV-CodonFM-Encodon-600M-v1 - NV-CodonFM-Encodon-1B-v1 - NV-CodonFM-Encodon-Cdwt-1B-v1 - NV-CodonFM-Encodon-TE-80M-v1 - NV-CodonFM-Encodon-TE-600M-v1 - NV-CodonFM-Encodon-TE-1B-v1 - NV-CodonFM-Encodon-TE-Cdwt-1B-v1 Properties: Coding sequences from the NCBI RefSeq database (release 2024-04) were used for training. A total of >130M non-viral protein-coding sequences from >22,000 species were included, comprising >2,000 eukaryotes. Sequences not divisible by three or containing ambiguous bases were removed. Taxonomy-level deduplication using MMSeqs eliminated redundant entries, and coding sequences from bacteria pathogenic to humans were excluded. The resulting dataset was partitioned into nine species groups: primates, archaea, bacteria, fungi, invertebrate, plant, protozoa, non-primate mammals, and non-mammal vertebrates. Sequences were clustered by similarity and then split into training and validation sets with stratification across groups to ensure balanced representation. Encodon models use codon-level tokenization, processing input sequences of up to 2,046 codons. Each model was trained using a masked language modeling (MLM) objective, where randomly masked codons were predicted from their context. 
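The random-masking step of the MLM objective described above can be sketched as follows. This is an illustration only: the mask token name, the 15% default rate, and the function `mask_codons` are assumptions, since the card does not state the masking rate or token vocabulary, and the sketch omits refinements such as replacing some selected positions with random codons.

```python
import random
from typing import List, Tuple

MASK_TOKEN = "<mask>"  # hypothetical token name, not taken from the card


def mask_codons(
    codons: List[str], mask_rate: float = 0.15, seed: int = 0
) -> Tuple[List[str], List[int]]:
    """Replace a random subset of codons with MASK_TOKEN.

    Returns the masked sequence and the sorted masked positions; the
    original codons at those positions are the prediction targets.
    """
    if not 0.0 < mask_rate <= 1.0:
        raise ValueError("mask_rate must be in (0, 1]")
    rng = random.Random(seed)  # seeded so the example is reproducible
    # Mask at least one position in any non-empty sequence
    n_mask = max(1, round(len(codons) * mask_rate)) if codons else 0
    positions = sorted(rng.sample(range(len(codons)), n_mask))
    masked = list(codons)
    for pos in positions:
        masked[pos] = MASK_TOKEN
    return masked, positions
```

A training loop would then ask the model to recover `codons[p]` for each masked position `p` from the unmasked context.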
The Encodon pretraining dataset was sorted based on sequence taxonomy to maintain species balance, and sequence subsets could be resampled dynamically.

Non-Audio, Image, Text Training Data Size: NCBI RefSeq genomes FTP directory currently contains over 395,000 genomes totaling approximately 3.3 terabases (Tb)

Data Collection Method for all data Dataset: Automatic/Sensors

| Link | Properties |
|------|--------------------------------------------------|
| ClinVar Variant Interpretation | This task involves classifying genetic variants from ClinVar, a publicly available database that aggregates information about the clinical significance of human genetic variants, into pathogenic or benign categories based on their coding sequence context. |
| De novo variant classification | This task uses variants from the Deciphering Developmental Disorders (DDD) and autism spectrum disorder (ASD) cohort study, which catalogs genetic mutations linked to rare pediatric and developmental diseases, to evaluate classification of pathogenic versus benign variants based on coding sequence context. |
| mRNA Translation Efficiency | This task predicts ribosome profiling signal intensity along coding sequences, evaluating how well models capture translation efficiency and codon-level regulation from sequence context. |
| Protein Abundance | This task predicts fluorescent protein expression levels (mRFP) from coding sequences, testing how accurately models capture codon-dependent effects on translation efficiency and protein abundance. |

Ethical Considerations: NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
For more detailed information on ethical considerations for this model, please see the Model Card++ Bias, Explainability, Safety & Security, and Privacy Subcards. Users are responsible for ensuring the physical properties of model-generated molecules are appropriately evaluated and comply with applicable safety regulations and ethical standards. Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns here.

—
71
0

Cosmos-Predict1-7B-Video2World

NaNK
—
70
2

Llama-3_3-Nemotron-Super-49B-GenRM

NaNK
llama3.3
69
17

nemotron-table-structure-v1

—
69
13

Cosmos-Tokenize1-DI16x16-360p

—
69
0

NV-Retriever-v1

—
68
21

Cosmos-Predict1-14B-Text2World

NaNK
—
68
5

C-RADIOv2-g

—
67
12

Llama-3_1-Nemotron-Ultra-253B-CPT-v1

NaNK
llama-3
67
6

GPT-2B-001

NaNK
license:cc-by-4.0
66
193

nemo-megatron-gpt-1.3B

NaNK
license:cc-by-4.0
66
33

Llama 3.3 Nemotron 70B Reward Principle

Llama-3.3-Nemotron-70B-Reward-Principle is a large language model that uses Meta-Llama-3.3-70B-Instruct as its foundation and is fine-tuned to predict the extent to which LLM-generated responses fulfil user-specified principles. Given a multi-turn conversation between the user and assistant (of up to 4,096 tokens) and a user-specified principle, it rates the quality of the final assistant turn using a reward score. For the same prompt, a response with a higher reward score fulfils the user-specified principle to a larger extent than a response with a lower reward score.

As of 24 Sep 2025, this model scores 76.3% on JudgeBench and 83.6% on RM-Bench, which places it among the top Scalar Reward Models on both benchmarks. See details on how this model was trained at https://arxiv.org/abs/2509.21319

GOVERNING TERMS: Use of this model is governed by the NVIDIA Open Model License. Additional Information: Llama 3.3 Community License Agreement. Built with Llama.

Llama-3.3-Nemotron-70B-Reward-Principle labels an LLM-generated response to a user query and a user-specified principle with a reward score.

HuggingFace 10/27/2025 via https://huggingface.co/nvidia/Llama-3.3-Nemotron-70B-Reward-Principle

- RLBFF
- HelpSteer3
- HelpSteer3-Preference
- HelpSteer2-Preference
- SteerLM method
- HelpSteer
- HelpSteer2
- The future of AI: Built with Llama
- Meta's Llama 3.3 Webpage
- Meta's Llama 3.3 Model Card

As of 24 Sep 2025, our reward models are top-performing reward models on RM-Bench, an improved variant of RewardBench for evaluating Reward Models in Chat, Math, Code and Safety.
| Model | Chat | Math | Code | Safety | Easy | Normal | Hard | Overall RM-Bench |
|:-----------------------------|:------|:------|:------|:------|:------|:------|:------|:------|
| Llama-3.3-Nemotron-70B-Reward-Principle | 85.3 | 81.9 | 70.4 | 96.9 | 85.5 | 84.9 | 80.5 | 83.6 |
| Llama-3.3-Nemotron-70B-Reward-Multilingual | 86.2 | 82.4 | 66.8 | 94.1 | 86.5 | 85.4 | 80.0 | 82.4 |
| Llama-3.3-Nemotron-70B-Reward | 75.4 | 84.5 | 69.3 | 90.4 | 92.1 | 85.7 | 71.1 | 79.9 |
| Llama-3.1-Nemotron-70B-Reward | 70.7 | 64.3 | 57.4 | 90.3 | 92.2 | 76.8 | 48.0 | 70.7 |
| Skywork-Reward-Gemma-2-27B | 71.8 | 59.2 | 56.6 | 94.3 | 89.6 | 75.4 | 50.0 | 70.5 |
| Skywork-Reward-Llama-3.1-8B | 69.5 | 60.6 | 54.5 | 95.7 | 89.0 | 74.7 | 46.6 | 70.1 |

Note that Skywork-Reward-Llama-3.1-8B was the best performing reward model reported on RM-Bench and we evaluated all other models.

As of 24 Sep 2025, our reward models are the top performing Scalar reward models on JudgeBench, a popular benchmark for evaluating LLM-as-a-judge applications relating to General Knowledge, Logical Reasoning, Math and Coding.

| Model | Knowl. | Reason. | Math | Code | Overall JudgeBench |
|:-----------------------------|:------|:------|:------|:------|:------|
| Llama-3.3-Nemotron-70B-Reward-Principle | 74.0 | 74.5 | 82.1 | 81.0 | 76.3 |
| Llama-3.3-Nemotron-70B-Reward | 70.8 | 76.5 | 82.1 | 66.7 | 73.7 |
| Llama-3.3-Nemotron-70B-Reward-Multilingual | 66.2 | 71.4 | 82.1 | 59.5 | 69.4 |
| Llama-3.1-Nemotron-70B-Reward | 62.3 | 72.5 | 76.8 | 57.1 | 66.9 |
| Skywork-Reward-Gemma-2-27B | 59.7 | 66.3 | 83.9 | 50.0 | 64.3 |
| Skywork-Reward-Llama-3.1-8B | 59.1 | 64.3 | 76.8 | 50.0 | 62.3 |

Note that Skywork-Reward-Gemma-2-27B was the best performing reward model reported on JudgeBench and we evaluated all other models.

Model Architecture: Architecture Type: Transformer Network Architecture: Llama 3.3

We developed this model using Llama-3.3-70B-Instruct as its foundation. This model contains 70 billion parameters.
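In the RM-Bench table above, the Overall column appears to equal the unweighted mean of the four domain scores (Chat, Math, Code, Safety), rounded to one decimal place. This is an observation from the reported numbers rather than something the card states; the short check below reproduces it for every row. The names `ROWS`, `domain_mean` and `check_rows` are illustrative.

```python
from typing import Dict, Sequence, Tuple

# (Chat, Math, Code, Safety) scores and the reported Overall, copied from the table above
ROWS: Dict[str, Tuple[Tuple[float, float, float, float], float]] = {
    "Llama-3.3-Nemotron-70B-Reward-Principle": ((85.3, 81.9, 70.4, 96.9), 83.6),
    "Llama-3.3-Nemotron-70B-Reward-Multilingual": ((86.2, 82.4, 66.8, 94.1), 82.4),
    "Llama-3.3-Nemotron-70B-Reward": ((75.4, 84.5, 69.3, 90.4), 79.9),
    "Llama-3.1-Nemotron-70B-Reward": ((70.7, 64.3, 57.4, 90.3), 70.7),
    "Skywork-Reward-Gemma-2-27B": ((71.8, 59.2, 56.6, 94.3), 70.5),
    "Skywork-Reward-Llama-3.1-8B": ((69.5, 60.6, 54.5, 95.7), 70.1),
}


def domain_mean(scores: Sequence[float]) -> float:
    """Unweighted mean of the domain scores, rounded to one decimal."""
    return round(sum(scores) / len(scores), 1)


def check_rows(rows: Dict[str, Tuple[Sequence[float], float]]) -> Dict[str, bool]:
    """Map each model name to whether its domain mean matches the reported Overall."""
    return {
        name: domain_mean(scores) == overall
        for name, (scores, overall) in rows.items()
    }
```

The same check is a quick way to catch transcription errors when copying leaderboard rows by hand.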
Input:
- Input Type(s): Text
- Input Format: String
- Input Parameters: One Dimensional (1D)
- Other Properties Related to Input: Max of 128k tokens (but trained only on conversations up to 4K tokens)

Output:
- Output Type(s): Float
- Output Format: One Single Float
- Output Parameters: One-Dimensional (1D)
- Other Properties Related to Output: The float value represents the extent to which the response fulfils the user-specified principle, with a higher value representing greater fulfilment.

Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.

Software Integration: Runtime Engine(s): [NeMo - 24.05.llama.3.1] Supported Hardware Microarchitecture Compatibility: NVIDIA Ampere NVIDIA Hopper NVIDIA Turing

You can use the model with the HuggingFace Transformers library on 2 or more 80GB GPUs (NVIDIA Ampere or newer), with at least 150GB of free disk space to accommodate the download. This code has been tested on Transformers v4.45.0, torch v2.3.0a0+40ec155e58.nv24.3 and 2 H100 80GB GPUs, but any setup that supports meta-llama/Llama-3.1-70B-Instruct should support this model as well. If you run into problems, consider running `pip install -U transformers`.

Dataset Name: HelpSteer3 Dataset Link: https://huggingface.co/datasets/nvidia/HelpSteer3 Data Collection Method by dataset [Hybrid: Human, Synthetic] Properties: 77,564 prompt-responses, each annotated with up to 3 annotations of free-text feedback (each being 50-250 words long) elaborating upon the overall helpfulness of the response.
Dataset Name: HelpSteer3 Dataset Link: https://huggingface.co/datasets/nvidia/HelpSteer3 Data Collection Method by dataset [Hybrid: Human, Synthetic] Properties: 4,078 prompt-responses, each annotated with up to 3 annotations of free-text feedback (each being 50-250 words long) elaborating upon the overall helpfulness of the response. Dataset Name: RM-Bench Dataset Link: https://huggingface.co/datasets/THU-KEG/RM-Bench Data Collection Method by dataset [Hybrid: Human, Synthetic] Labeling Method by dataset [Hybrid: Human, Synthetic] Properties: 1,327 prompts, each with three pairs of responses as well as preferences between the pair of responses. Dataset Name: JudgeBench Dataset Link: https://huggingface.co/datasets/ScalerLab/JudgeBench Data Collection Method by dataset [Hybrid: Human, Synthetic] Labeling Method by dataset [Hybrid: Human, Synthetic] Properties: 350 prompts, each with a pair of responses as well as preferences between the pair of responses. Ethical Considerations: NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their supporting model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. For more detailed information on ethical considerations for this model, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards. Please report security vulnerabilities or NVIDIA AI Concerns here. If you find this model useful, please cite the following work:

NaNK
llama
65
2

Audio2Emotion V3.0

—
64
19

Cosmos-Transfer1-7B-4KUpscaler

NaNK
—
64
7

RADIO-B

—
62
3

instruction-data-guard

—
61
17

Lyra

Lyra: Generative 3D Scene Reconstruction via Video Diffusion Model Self-Distillation Sherwin Bahmani, Tianchang Shen, Jiawei Ren, Jiahui Huang, Yifeng Jiang, Haithem Turki, Andrea Tagliasacchi, David B. Lindell, Zan Gojcic, Sanja Fidler, Huan Ling, Jun Gao, Xuanchi Ren Description: Lyra is a 3D / 4D feed-forward 3D Gaussian Splatting (3DGS) reconstruction model. We achieve this by distilling a pre-trained video diffusion model into a feed-forward 3D Gaussian Splatting (3DGS) generator. Lyra circumvents the need for 3D datasets or model fine-tuning by leveraging the inherent 3D consistency of video outputs to align 2D renderings with a 3DGS decoder. By using the generated synthetic multi-view data, we train a decoder to operate directly in the video model's latent space to produce 3D Gaussians. This framework enables real-time rendering and establishes a new state of the art for 3D / 4D scene generation, supporting both single-view and video inputs. This model is released under the NVIDIA Open Model License. For a custom license, please contact [email protected]. Important Note: If you bypass, disable, reduce the efficacy of, or circumvent any technical limitation, safety guardrail or associated safety guardrail hyperparameter, encryption, security, digital rights management, or authentication mechanism contained in the Model, your rights under NVIDIA Open Model License Agreement will automatically terminate. Use Case: This model is intended for researchers developing 3D / 4D reconstruction techniques, and it allows them to generate a 3D / 4D scene from a single image. Release Date: Github 09/23/2025 via https://github.com/nv-tlabs/lyra References(s): Lyra: Generative 3D Scene Reconstruction via Video Diffusion Model Self-Distillation Architecture Type: Convolutional Neural Network (CNN), Transformer Network Architecture: Transformer This model was developed based on Cosmos Predict 1. 
Number of model parameters: 32.75M

Input:
- Input Type(s): Camera Parameters, Image
- Input Format(s): One-Dimensional (1D) Array of Camera Poses, Two-Dimensional (2D) Array of Images.
- Input Parameters: Camera Poses (1D), Images (2D)
- Other Properties Related to Input: The input image should be 720 1080 resolution, and we recommend using 121 frames for the camera parameters.

Output:
- Output Type(s): Three-Dimensional (3D) Gaussian Scene
- Output Format: Point cloud file (e.g., .ply)
- Output Parameters: A set of M 3D Gaussians, where each Gaussian is defined by a collection of attributes.
- Other Properties Related to Output: The output is not a sequence of 2D images but a set of 3D primitives used to render a scene.

For each of the M Gaussians, the key properties are:
- Position (Mean): A 3D vector (x,y,z) defining the center of the Gaussian in 3D space.
- Covariance (Shape & Orientation): This defines the ellipsoid's shape and rotation. It's typically stored as a 3D scale vector (sx, sy, sz) and a 4D rotation quaternion (rw, rx, ry, rz).
- Color: A 3-vector (R,G,B) representing the color of the Gaussian. This can also be represented by more complex Spherical Harmonics (SH) coefficients for view-dependent color effects.
- Opacity: A scalar value (α) that controls the transparency of the Gaussian.

Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems A100 and H100. By leveraging NVIDIA's hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.

Software Integration: Runtime Engine(s): Cosmos-Predict1 Supported Hardware Microarchitecture Compatibility: NVIDIA Ampere NVIDIA Blackwell NVIDIA Hopper

The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment.
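The Gaussian attributes listed in the Output section can be sketched as a small record type. The covariance method below uses the common 3D Gaussian Splatting construction, covariance = R S Sᵀ Rᵀ, where R comes from the quaternion and S is the diagonal scale matrix. This is an assumption about how the stored scale and rotation combine: the card only says how they are stored, and the class name `Gaussian3D` and the use of raw (not log) scales are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Gaussian3D:
    position: Vec3                                # (x, y, z) center
    scale: Vec3                                   # (sx, sy, sz), assumed raw scales
    rotation: Tuple[float, float, float, float]   # quaternion (rw, rx, ry, rz)
    color: Vec3                                   # (R, G, B)
    opacity: float                                # alpha

    def rotation_matrix(self) -> List[List[float]]:
        """3x3 rotation matrix from the normalized quaternion."""
        w, x, y, z = self.rotation
        norm = math.sqrt(w * w + x * x + y * y + z * z)
        if norm == 0.0:
            raise ValueError("zero-length quaternion")
        w, x, y, z = w / norm, x / norm, y / norm, z / norm
        return [
            [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
            [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
            [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
        ]

    def covariance(self) -> List[List[float]]:
        """Covariance R S S^T R^T, computed entry by entry."""
        r = self.rotation_matrix()
        s2 = [s * s for s in self.scale]
        return [
            [sum(r[i][k] * s2[k] * r[j][k] for k in range(3)) for j in range(3)]
            for i in range(3)
        ]
```

With an identity quaternion the covariance is simply the diagonal of squared scales; a rotation permutes or mixes those axes while keeping the matrix symmetric.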
Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment. Ethical Considerations: NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. For more detailed information on ethical considerations for this model, please see the Model Card++ Bias, Explainability, Safety & Security, and Privacy Subcards. Users are responsible for model inputs and outputs. Users are responsible for ensuring safe integration of this model, including implementing guardrails as well as other safety mechanisms, prior to deployment. Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns here. We value you, the datasets, the diversity they represent, and what we have been entrusted with. This model and its associated data have been: - Verified to comply with current applicable disclosure laws, regulations, and industry standards. - Verified to comply with applicable privacy labeling requirements. - Annotated to describe the collector/source (NVIDIA or a third-party). - Characterized for technical limitations. - Reviewed to ensure proper disclosure is accessible to, maintained for, and in compliance with NVIDIA data subjects and their requests. - Reviewed before release. - Tagged for known restrictions and potential safety implications. 
Field | Response
:---|:---
Participation considerations from adversely impacted groups protected classes in model design and testing: | None
Measures taken to mitigate against unwanted bias: | None

Field | Response
:---|:---
Intended Task/Domain: | Novel view synthesis, video generation
Model Type: | Transformer
Intended Users: | Physical AI developers.
Output: | Three-Dimensional (3D) Gaussian Scene.
Describe how the model works: | We take a video as input and encode it with the Cosmos tokenizer into the Cosmos latent space. We then use our model, which is a transformer-like architecture, to lift the latent to 3D Gaussians.
Name the adversely impacted groups this has been tested to deliver comparable outcomes regardless of: | Not Applicable.
Technical Limitations & Mitigation: | The proposed method relies only on synthetic data generated by Cosmos for training, which might limit the generalization ability if the target scenario is not in the pre-generated SDG dataset.
Verified to have met prescribed NVIDIA quality standards: | Yes
Performance Metrics: | Qualitative and Quantitative Evaluation including PSNR, SSIM, LPIPS metrics.
Potential Known Risks: | This model is only trained on synthetic data generated by Cosmos, and may inaccurately reconstruct an out-of-distribution video that is not in the synthetic data domain.
Licensing: | NVIDIA Open Model License

Field | Response
:---|:---
Generatable or reverse engineerable personal data? | No
Personal data used to create this model? | [None Known]
Is there provenance for all datasets used in training? | Yes
How often is dataset reviewed? | Before Release
Does data labeling (annotation, metadata) comply with privacy laws? | Not Applicable
Is data compliant with data subject requests for data correction or removal, if such a request was made? | No, not possible with externally-sourced data.
Applicable Privacy Policy | https://www.nvidia.com/en-us/about-nvidia/privacy-policy/

Field | Response
:---|:---
Model Application Field(s): | World Generation
Describe the life critical impact (if present). | Not Applicable
Use Case Restrictions: | Abide by NVIDIA Open Model License
Model and dataset restrictions: | The Principle of least privilege (PoLP) is applied limiting access for dataset generation and model development. Restrictions enforce dataset access during training, and dataset license constraints adhered to.

—
60
28

RADIO-g

—
59
7

Cosmos-Predict1-14B-Video2World

NaNK
—
59
4

RADIOv2-bigG

—
58
1

NV-CodonFM-Encodon-600M-v1

The code for using the CodonFM model checkpoint is available in the official Github repository.

Description: CodonFM predicts masked codons in mRNA sequences from codon-level context to enable variant effect interpretation and codon optimization as part of NVIDIA's CodonFM Encodon family. The family has 4 models. The first 3 are trained with randomly masked tokens and have 80 million, 600 million and 1 billion parameters. The fourth has 1 billion parameters but is trained with codon-frequency-aware masking. An additional set of accelerated checkpoints is also available.

This model is ready for commercial/non-commercial use.

Governing Terms: Use of this model is governed by the NVIDIA Open Model License Agreement.

- Optimized Expression and Stability for mRNA design: To design mRNAs with codon usage patterns that enhance translation efficiency, protein yield, and transcript stability across specific cell types and tissues.
- Variant Interpretation for pathogenicity: To identify and prioritize functional synonymous and missense variants in the context of diseases.

Release Date:

Github 10/27/2025 via https://github.com/NVIDIA-Digital-Bio/CodonFM

Hugging Face 10/27/2025 via:
- Random Mask
  - https://huggingface.co/nvidia/NV-CodonFM-Encodon-1B-v1
  - https://huggingface.co/nvidia/NV-CodonFM-Encodon-600M-v1
  - https://huggingface.co/nvidia/NV-CodonFM-Encodon-80M-v1
  - https://huggingface.co/nvidia/NV-CodonFM-Encodon-TE-1B-v1
  - https://huggingface.co/nvidia/NV-CodonFM-Encodon-TE-600M-v1
  - https://huggingface.co/nvidia/NV-CodonFM-Encodon-TE-80M-v1
- Codon Frequency Aware Mask
  - https://huggingface.co/nvidia/NV-CodonFM-Encodon-Cdwt-1B-v1
  - https://huggingface.co/nvidia/NV-CodonFM-Encodon-TE-Cdwt-1B-v1

NGC 10/27/2025 via https://catalog.ngc.nvidia.com/orgs/nvidia/teams/clara/models/nvcodonfmencodon

Model Architecture: The NVIDIA CodonFM Encodon family features Transformer-based architectures tailored for codon-level sequence modeling in mRNA.
Each model applies a masked language modeling (MLM) objective to predict masked codons from a surrounding context of 2046 codons, enabling genome-scale codon optimization and synonymous variant interpretation.

| Model Name | Parameters |
|-------------|-------------:|
| Encodon-80M | 7.68 × 10⁷ |
| Encodon-600M | 6.09 × 10⁸ |
| Encodon-1B | 9.11 × 10⁸ |
| Encodon-Cdwt-1B | 9.11 × 10⁸ |

Input:
- Input Type(s): Text (mRNA Sequence)
- Input Format: fasta files converted to memmaps
- Input Parameters: 1D
- Other Properties Related to Input: mRNA sequence represented as a string of codons, of maximum length 2046. Longer sequences are automatically truncated to this length.

Output:
- Output Type(s): mRNA Sequence
- Output Format: Text
- Output Parameters: 2D
- Other Properties Related to Output: Numeric 2D tensor with floating-point values representing probabilities of a given codon at a given position within the sequence

Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.

Software Integration: Runtime Engine(s): PyTorch - 2.5.1 Supported Hardware Microarchitecture Compatibility: NVIDIA Ampere NVIDIA Hopper

The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.
Model Version(s): - NV-CodonFM-Encodon-80M-v1 - NV-CodonFM-Encodon-600M-v1 - NV-CodonFM-Encodon-1B-v1 - NV-CodonFM-Encodon-Cdwt-1B-v1 - NV-CodonFM-Encodon-TE-80M-v1 - NV-CodonFM-Encodon-TE-600M-v1 - NV-CodonFM-Encodon-TE-1B-v1 - NV-CodonFM-Encodon-TE-Cdwt-1B-v1 Properties: Coding sequences from the NCBI RefSeq database (release 2024-04) were used for training. A total of >130M non-viral protein-coding sequences from >22,000 species were included, comprising >2,000 eukaryotes. Sequences not divisible by three or containing ambiguous bases were removed. Taxonomy-level deduplication using MMSeqs eliminated redundant entries, and coding sequences from bacteria pathogenic to humans were excluded. The resulting dataset was partitioned into nine species groups: primates, archaea, bacteria, fungi, invertebrate, plant, protozoa, non-primate mammals, and non-mammal vertebrates. Sequences were clustered by similarity and then split into training and validation sets with stratification across groups to ensure balanced representation. Encodon models use codon-level tokenization, processing input sequences of up to 2,046 codons. Each model was trained using a masked language modeling (MLM) objective, where randomly masked codons were predicted from their context. The Encodon pretraining dataset was sorted based on sequence taxonomy to maintain species balance, and sequence subsets could be resampled dynamically. 
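The filtering and tokenization rules described above (drop sequences whose length is not divisible by three or that contain ambiguous bases, split into codons, truncate to 2,046 codons) can be sketched as below. This is an illustration, not the CodonFM preprocessing code: the function name `to_codons`, the RNA alphabet check, and the conversion of `T` to `U` are assumptions.

```python
from typing import List, Optional

MAX_CODONS = 2046           # maximum input length stated in the card
VALID_BASES = set("ACGU")   # unambiguous RNA bases


def to_codons(seq: str) -> Optional[List[str]]:
    """Split a coding sequence into codons.

    Returns None for sequences the described filtering would remove:
    length not divisible by three, or any ambiguous base (e.g. N).
    """
    seq = seq.upper().replace("T", "U")  # accept DNA-style input (assumption)
    if len(seq) == 0 or len(seq) % 3 != 0:
        return None
    if not set(seq) <= VALID_BASES:
        return None
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    return codons[:MAX_CODONS]  # longer sequences are truncated
```

For example, `to_codons("ATGGCCTAA")` yields three codons, while a sequence containing `N` or a trailing partial codon is rejected.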
Non-Audio, Image, Text Training Data Size: NCBI RefSeq genomes FTP directory currently contains over 395,000 genomes totaling approximately 3.3 terabases (Tb)

Data Collection Method for all data Dataset: Automatic/Sensors

| Link | Properties |
|------|--------------------------------------------------|
| ClinVar Variant Interpretation | This task involves classifying genetic variants from ClinVar, a publicly available database that aggregates information about the clinical significance of human genetic variants, into pathogenic or benign categories based on their coding sequence context. |
| De novo variant classification | This task uses variants from the Deciphering Developmental Disorders (DDD) and autism spectrum disorder (ASD) cohort study, which catalogs genetic mutations linked to rare pediatric and developmental diseases, to evaluate classification of pathogenic versus benign variants based on coding sequence context. |
| mRNA Translation Efficiency | This task predicts ribosome profiling signal intensity along coding sequences, evaluating how well models capture translation efficiency and codon-level regulation from sequence context. |
| Protein Abundance | This task predicts fluorescent protein expression levels (mRFP) from coding sequences, testing how accurately models capture codon-dependent effects on translation efficiency and protein abundance. |

Ethical Considerations: NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. For more detailed information on ethical considerations for this model, please see the Model Card++ Bias, Explainability, Safety & Security, and Privacy Subcards.
Users are responsible for ensuring the physical properties of model-generated molecules are appropriately evaluated and comply with applicable safety regulations and ethical standards. Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns here.
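The sequence filtering described in the training data properties above (dropping coding sequences whose length is not divisible by three or that contain ambiguous bases) can be sketched as follows. This is a minimal illustration, not the CodonFM preprocessing code: the function names and the choice to accept `U` as an alias for `T` are assumptions.

```python
# Hypothetical helper names; the unambiguous alphabet is assumed to be A/C/G/T.
VALID_BASES = frozenset("ACGT")


def is_valid_cds(seq: str) -> bool:
    """Return True if seq is non-empty, a multiple of 3 long, and unambiguous."""
    normalized = seq.upper().replace("U", "T")  # assumption: treat RNA 'U' as 'T'
    return (
        len(normalized) > 0
        and len(normalized) % 3 == 0
        and set(normalized) <= VALID_BASES
    )


def filter_cds(seqs):
    """Keep only the sequences that pass is_valid_cds, preserving order."""
    return [s for s in seqs if is_valid_cds(s)]


if __name__ == "__main__":
    candidates = ["ATGGCC", "ATGGC", "ATGNCC", "auggcc"]
    print(filter_cds(candidates))
```

Deduplication with MMSeqs and exclusion of pathogen sequences are separate steps and are not covered by this sketch.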

—
57
0

SO_ARM_Starter_Gr00t

license:apache-2.0
56
3

C-RADIOv2-VLM-H-RC3

—
56
0

GR00T-N1-2B-tuned-Exhaust-Pipe-Sorting-task

—
56
0

Cosmos-1.0-Diffusion-14B-Video2World

—
55
56

Cosmos-Predict1-7B-Text2World-Sample-AV-Multiview

—
55
8

Cosmos-Predict2-2B-Sample-Action-Conditioned

—
55
6

nemotron-graphic-elements-v1

The input of this model is expected to be a chart image. You can use the Nemotron Page Element v3 to detect and crop such images.

The Nemotron Graphic Elements v1 model is a specialized object detection system designed to identify and extract key elements from charts and graphs. Based on YOLOX, an anchor-free version of YOLO (You Only Look Once), this model combines a simpler architecture with enhanced performance. While the underlying technology builds upon work from Megvii Technology, we developed our own base model through complete retraining rather than using pre-trained weights.

The model excels at detecting and localizing various graphic elements within chart images, including titles, axis labels, legends, and data point annotations. This capability makes it particularly valuable for document understanding tasks and automated data extraction from visual content.

This model is ready for commercial/non-commercial use.

We are excited to announce the open sourcing of this commercial model. For users interested in deploying this model in production environments, it is also available via the model API in NVIDIA Inference Microservices (NIM) at nemotron-graphic-elements-v1.

The use of this model is governed by the NVIDIA Open Model License Agreement, and the post-processing scripts are licensed under Apache 2.0.

- Theo Viel
- Bo Liu
- Darragh Hanley
- Even Oldridge

Correspondence to Theo Viel ([email protected]) and Bo Liu ([email protected])

The Nemotron Graphic Elements v1 is designed for automating extraction of graphic elements of charts in enterprise documents.
Key applications include:
- Enterprise document extraction, embedding and indexing
- Augmenting Retrieval Augmented Generation (RAG) workflows with multimodal retrieval
- Data extraction from legacy documents and reports

10/23/2025 via https://huggingface.co/nvidia/nemotron-graphic-elements-v1

- YOLOX paper: https://arxiv.org/abs/2107.08430
- YOLOX repo: https://github.com/Megvii-BaseDetection/YOLOX
- CACHED paper: https://arxiv.org/abs/2305.04151
- CACHED repo: https://github.com/pengyu965/ChartDete
- Technical blog: https://developer.nvidia.com/blog/approaches-to-pdf-data-extraction-for-information-retrieval/

Architecture Type: YOLOX
Network Architecture: DarkNet53 Backbone + FPN Decoupled head (one 1x1 convolution + 2 parallel 3x3 convolutions, one for the classification and one for the bounding box prediction). YOLOX is a single-stage object detector that improves on YOLOv3.
This model was developed based on the YOLO architecture.
Number of model parameters: 5.4e7

Input Type(s): Image
Input Format(s): Red, Green, Blue (RGB)
Input Parameters: Two-Dimensional (2D)
Other Properties Related to Input: Image size resized to `(1024, 1024)`

Output Type(s): Array
Output Format: A dictionary of dictionaries containing `np.ndarray` objects. The outer dictionary has entries for each sample (page), and the inner dictionary contains a list of dictionaries, each with a bounding box (`np.ndarray`), class label, and confidence score for that page.
Output Parameters: One-Dimensional (1D)
Other Properties Related to Output: The output contains bounding boxes, detection confidence scores, and object classes (chart title, x/y axis titles and labels, legend title and labels, marker labels, value labels and other texts). The thresholds used for non-maximum suppression are `confthresh=0.01` and `iouthresh=0.25`.
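To make the role of the two thresholds concrete, here is a minimal pure-Python sketch of greedy non-maximum suppression. It is not the repository's post-processing code: the function names, the snake_case parameter names `conf_thresh` and `iou_thresh`, and the class-agnostic behaviour are assumptions made for illustration; only the default values 0.01 and 0.25 come from the card.

```python
def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, conf_thresh=0.01, iou_thresh=0.25):
    """Greedy, class-agnostic NMS; returns indices of the boxes that are kept."""
    # Drop low-confidence detections, then visit the rest from best to worst.
    order = sorted(
        (i for i, s in enumerate(scores) if s >= conf_thresh),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (0, 0, 5, 5)]
    scores = [0.9, 0.8, 0.7, 0.005]
    print(nms(boxes, scores))
```

A low confidence threshold such as 0.01 keeps almost every candidate before suppression, while a low IoU threshold such as 0.25 removes boxes that overlap a stronger detection even moderately.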
Output Classes:
- Chart title: Title or caption associated to the chart
- x-axis title: Title associated to the x axis
- y-axis title: Title associated to the y axis
- x-axis label(s): Labels associated to the x axis
- y-axis label(s): Labels associated to the y axis
- Legend title: Title of the legend
- Legend label(s): Labels associated to the legend
- Marker label(s): Labels associated to markers
- Value label(s): Labels associated to values
- Other: Miscellaneous other text components

Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.

The model requires torch, and the custom code available in this repository.
- Make sure git-lfs is installed (https://git-lfs.com)

Optional: This can be installed as a package using pip

Note that this repository only provides minimal code to infer the model. If you wish to do additional training, refer to the original repo. Additional post-processing might be required to use the model as part of a data extraction pipeline. We provide examples in the notebook `Demo.ipynb`.

Data Modality: Image
Image Training Data Size: Less than a Million Images
Data collection method by dataset: Automated
Labeling method by dataset: Hybrid: Automated, Human
Pretraining (by NVIDIA): 118,287 images of the COCO train2017 dataset
Finetuning (by NVIDIA):
- 5,614 images from the PubMed Central (PMC) Chart Dataset
- 9,091 images from the DeepRule Dataset with annotations obtained using the CACHED model

Number of bounding boxes per class:

| Label | Images | Boxes |
| :--------------- | ---------: | ----------: |
| charttitle | 9,487 | 18,754 |
| xtitle | 5,995 | 9,152 |
| ytitle | 8,487 | 12,893 |
| xlabel | 13,227 | 217,820 |
| ylabel | 12,983 | 172,431 |
| legendtitle | 168 | 209 |
| legendlabel | 9,812 | 59,044 |
| marklabel | 660 | 2,887 |
| valuelabel | 3,573 | 65,847 |
| other | 3,717 | 29,565 |
| Total | 14,143 | 588,602 |

Results were evaluated using the PMC Chart dataset. The Mean Average Precision (mAP) was used as the evaluation metric to measure the model's ability to correctly identify and localize objects across different confidence thresholds.

Number of bounding boxes and images per class:

| Label | Images | Boxes |
| :--------------- | ---------: | ---------: |
| charttitle | 38 | 38 |
| xtitle | 404 | 437 |
| ytitle | 502 | 505 |
| xlabel | 553 | 4,091 |
| ylabel | 534 | 3,944 |
| legendtitle | 17 | 19 |
| legendlabel | 318 | 1,077 |
| marklabel | 42 | 219 |
| valuelabel | 52 | 726 |
| other | 113 | 464 |
| Total | 560 | 11,520 |

Data collection method by dataset: Hybrid: Automated, Human
Labeling method by dataset: Hybrid: Automated, Human
Properties: The validation dataset is the same as the PMC Chart dataset.

Per-class Performance Metrics:

| Class | AP (%) | AR (%) |
| :----------- | :----- | :----- |
| charttitle | 82.38 | 93.16 |
| xtitle | 88.77 | 92.31 |
| ytitle | 89.48 | 92.32 |
| xlabel | 85.04 | 88.93 |
| ylabel | 86.22 | 89.40 |
| other | 55.14 | 79.48 |
| legendlabel | 84.09 | 88.07 |
| legendtitle | 60.61 | 68.42 |
| marklabel | 49.31 | 73.61 |
| valuelabel | 62.66 | 68.32 |

Test hardware: See Support Matrix from NIM documentation

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications.
When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. For more detailed information on ethical considerations for this model, please see the Explainability, Bias, Safety & Security, and Privacy sections below. Please report security vulnerabilities or NVIDIA AI Concerns here.

| Field | Response |
| ----- | ----- |
| Participation considerations from adversely impacted groups protected classes in model design and testing | None |
| Measures taken to mitigate against unwanted bias | None |

| Field | Response |
| ----- | ----- |
| Intended Application & Domain: | Object Detection |
| Model Type: | YOLOX-architecture for detection of graphic elements within images of charts. |
| Intended User: | Enterprise developers, data scientists, and other technical users who need to extract textual elements from charts and graphs. |
| Output: | After post-processing, the output is three NumPy arrays that contain the detections: `boxes [N x 4]` (format is normalized `(xmin, ymin, xmax, ymax)`), associated classes `labels [N]`, and confidence scores `scores [N]`. |
| Describe how the model works: | Finds and identifies objects in images by first dividing the image into a grid. For each section of the grid, the model uses a series of neural networks to extract visual features and simultaneously predict what objects are present (in this case "chart title" or "axis label" etc.) and exactly where they are located in that section, all in a single pass through the image. |
| Name the adversely impacted groups this has been tested to deliver comparable outcomes regardless of: | Not Applicable |
| Technical Limitations & Mitigation: | The model may not generalize to unknown chart types/formats. Further fine-tuning might be required for such images. |
| Verified to have met prescribed NVIDIA quality standards: | Yes |
| Performance Metrics: | Mean Average Precision, detection recall and visual inspection |
| Potential Known Risks: | This model may not always detect all elements in a document. |
| Licensing & Terms of Use: | Use of this model is governed by the NVIDIA Open Model License Agreement, and the post-processing scripts are licensed under Apache 2.0. |

| Field | Response |
| ----- | ----- |
| Generatable or reverse engineerable personal data? | No |
| Personal data used to create this model? | No |
| Was consent obtained for any personal data used? | Not Applicable |
| How often is the dataset reviewed? | Before Release |
| Is there provenance for all datasets used in training? | Yes |
| Does data labeling (annotation, metadata) comply with privacy laws? | Yes |
| Is data compliant with data subject requests for data correction or removal, if such a request was made? | No, not possible with externally-sourced data. |
| Applicable Privacy Policy | https://www.nvidia.com/en-us/about-nvidia/privacy-policy/ |

| Field | Response |
| ----- | ----- |
| Model Application Field(s): | Object Detection for Retrieval, focused on Enterprise |
| Describe the life critical impact (if present). | Not Applicable |
| Use Case Restrictions: | Abide by the NVIDIA Open Model License Agreement; the post-processing scripts are licensed under Apache 2.0. |
| Model and dataset restrictions: | The Principle of least privilege (PoLP) is applied limiting access for dataset generation and model development. Restrictions enforce dataset access during training, and dataset license constraints adhered to. |
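The post-processed output described in the table above (normalized `(xmin, ymin, xmax, ymax)` boxes with parallel `labels` and `scores`) usually has to be mapped back to pixel coordinates before cropping or OCR. The sketch below shows one way to do that. The function name `detections_to_records`, the `min_score` cut-off, and the use of plain Python sequences are assumptions for illustration and are not part of the published code.

```python
def detections_to_records(boxes, labels, scores, width, height, min_score=0.5):
    """Convert normalized detections into per-box records in pixel units.

    boxes  : sequence of (xmin, ymin, xmax, ymax) with values in [0, 1]
    labels : sequence of class names, one per box
    scores : sequence of confidence scores, one per box
    min_score is a hypothetical filter, not a value from the model card.
    """
    records = []
    for (x0, y0, x1, y1), label, score in zip(boxes, labels, scores):
        if score < min_score:
            continue  # skip detections below the chosen confidence
        records.append(
            {
                "label": label,
                "score": score,
                "box_px": (
                    round(x0 * width),
                    round(y0 * height),
                    round(x1 * width),
                    round(y1 * height),
                ),
            }
        )
    return records


if __name__ == "__main__":
    recs = detections_to_records(
        boxes=[(0.1, 0.2, 0.5, 0.75), (0.0, 0.0, 1.0, 1.0)],
        labels=["charttitle", "xlabel"],
        scores=[0.9, 0.3],
        width=1000,
        height=800,
    )
    print(recs)
```

The width and height passed in should be those of the original chart image, not the `(1024, 1024)` size the model resizes to internally, because the boxes are normalized.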

—
53
12

NV-CodonFM-Encodon-1B-v1

The code for using the CodonFM model checkpoint is available in the official GitHub repository.

Description: CodonFM predicts masked codons in mRNA sequences from codon-level context to enable variant effect interpretation and codon optimization, as part of NVIDIA's CodonFM Encodon family. The family comprises four models. The first three are trained with randomly masked tokens and have 80 million, 600 million, and 1 billion parameters. The fourth has 1 billion parameters and is trained with codon-frequency-aware masking. An additional set of accelerated checkpoints is also available.

This model is ready for commercial/non-commercial use.

Governing Terms: Use of this model is governed by the NVIDIA Open Model License Agreement.

- Optimized Expression and Stability for mRNA design: To design mRNAs with codon usage patterns that enhance translation efficiency, protein yield, and transcript stability across specific cell types and tissues.
- Variant Interpretation for pathogenicity: To identify and prioritize functional synonymous and missense variants in the context of diseases.

Release Date:
- GitHub 10/27/2025 via https://github.com/NVIDIA-Digital-Bio/CodonFM
- Hugging Face 10/27/2025 via:
  - Random Mask
    - https://huggingface.co/nvidia/NV-CodonFM-Encodon-1B-v1
    - https://huggingface.co/nvidia/NV-CodonFM-Encodon-600M-v1
    - https://huggingface.co/nvidia/NV-CodonFM-Encodon-80M-v1
    - https://huggingface.co/nvidia/NV-CodonFM-Encodon-TE-1B-v1
    - https://huggingface.co/nvidia/NV-CodonFM-Encodon-TE-600M-v1
    - https://huggingface.co/nvidia/NV-CodonFM-Encodon-TE-80M-v1
  - Codon Frequency Aware Mask
    - https://huggingface.co/nvidia/NV-CodonFM-Encodon-Cdwt-1B-v1
    - https://huggingface.co/nvidia/NV-CodonFM-Encodon-TE-Cdwt-1B-v1
- NGC 10/27/2025 via https://catalog.ngc.nvidia.com/orgs/nvidia/teams/clara/models/nvcodonfmencodon

Model Architecture: The NVIDIA CodonFM Encodon family features Transformer-based architectures tailored for codon-level sequence modeling in mRNA.
Each model applies a masked language modeling (MLM) objective to predict masked codons from a surrounding context of up to 2,046 codons, enabling genome-scale codon optimization and synonymous variant interpretation.

| Model Name | Parameters |
|-------------|-------------:|
| Encodon-80M | 7.68 × 10⁷ |
| Encodon-600M | 6.09 × 10⁸ |
| Encodon-1B | 9.11 × 10⁸ |
| Encodon-Cdwt-1B | 9.11 × 10⁸ |

Input:
Input Type(s): Text (mRNA Sequence)
Input Format: FASTA files converted to memmaps
Input Parameters: 1D
Other Properties Related to Input: mRNA sequence represented as a string of codons, of maximum length 2,046. Longer sequences are automatically truncated to this length.

Output:
Output Type(s): mRNA Sequence
Output Format: Text
Output Parameters: 2D
Other Properties Related to Output: Numeric 2D tensor with floating-point values representing probabilities of a given codon at a given position within the sequence

Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.

Software Integration:
Runtime Engine(s): PyTorch - 2.5.1
Supported Hardware Microarchitecture Compatibility:
- NVIDIA Ampere
- NVIDIA Hopper

The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.
Model Version(s): - NV-CodonFM-Encodon-80M-v1 - NV-CodonFM-Encodon-600M-v1 - NV-CodonFM-Encodon-1B-v1 - NV-CodonFM-Encodon-Cdwt-1B-v1 - NV-CodonFM-Encodon-TE-80M-v1 - NV-CodonFM-Encodon-TE-600M-v1 - NV-CodonFM-Encodon-TE-1B-v1 - NV-CodonFM-Encodon-TE-Cdwt-1B-v1 Properties: Coding sequences from the NCBI RefSeq database (release 2024-04) were used for training. A total of >130M non-viral protein-coding sequences from >22,000 species were included, comprising >2,000 eukaryotes. Sequences not divisible by three or containing ambiguous bases were removed. Taxonomy-level deduplication using MMSeqs eliminated redundant entries, and coding sequences from bacteria pathogenic to humans were excluded. The resulting dataset was partitioned into nine species groups: primates, archaea, bacteria, fungi, invertebrate, plant, protozoa, non-primate mammals, and non-mammal vertebrates. Sequences were clustered by similarity and then split into training and validation sets with stratification across groups to ensure balanced representation. Encodon models use codon-level tokenization, processing input sequences of up to 2,046 codons. Each model was trained using a masked language modeling (MLM) objective, where randomly masked codons were predicted from their context. The Encodon pretraining dataset was sorted based on sequence taxonomy to maintain species balance, and sequence subsets could be resampled dynamically. 
Non-Audio, Image, Text Training Data Size: NCBI RefSeq genomes FTP directory currently contains over 395,000 genomes totaling approximately 3.3 terabases (Tb)

Data Collection Method for all datasets: Automatic/Sensors

| Link | Properties |
|------|--------------------------------------------------|
| ClinVar Variant Interpretation | This task involves classifying genetic variants from ClinVar, a publicly available database that aggregates information about the clinical significance of human genetic variants, into pathogenic or benign categories based on their coding sequence context. |
| De novo variant classification | This task uses variants from the Deciphering Developmental Disorders (DDD) and autism spectrum disorder (ASD) cohort study, which catalogs genetic mutations linked to rare pediatric and developmental diseases, to evaluate classification of pathogenic versus benign variants based on coding sequence context. |
| mRNA Translation Efficiency | This task predicts ribosome profiling signal intensity along coding sequences, evaluating how well models capture translation efficiency and codon-level regulation from sequence context. |
| Protein Abundance | This task predicts fluorescent protein expression levels (mRFP) from coding sequences, testing how accurately models capture codon-dependent effects on translation efficiency and protein abundance. |

Ethical Considerations: NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. For more detailed information on ethical considerations for this model, please see the Model Card++ Bias, Explainability, Safety & Security, and Privacy Subcards.
Users are responsible for ensuring the physical properties of model-generated molecules are appropriately evaluated and comply with applicable safety regulations and ethical standards. Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns here.
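The codon-level tokenization, the 2,046-codon truncation, and the random masking used by the MLM objective described in this card can be sketched as follows. This is an illustrative sketch only: the `<mask>` token string, the helper names, and the 15% default masking rate are assumptions (the card does not state the masking rate), and the real tokenizer lives in the CodonFM repository.

```python
import random

MAX_CODONS = 2046   # context length stated in the model card
MASK = "<mask>"     # hypothetical mask token string


def tokenize_codons(seq, max_codons=MAX_CODONS):
    """Split a coding sequence into 3-base codons and truncate to max_codons."""
    seq = seq.upper()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be divisible by three")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    return codons[:max_codons]


def mask_codons(codons, mask_prob=0.15, rng=None):
    """Randomly replace codons with MASK.

    Returns (masked_sequence, targets) where targets maps each masked
    position to the original codon the model would be asked to predict.
    mask_prob=0.15 is an assumed default, not a documented value.
    """
    rng = rng or random.Random()
    masked, targets = [], {}
    for i, codon in enumerate(codons):
        if rng.random() < mask_prob:
            masked.append(MASK)
            targets[i] = codon
        else:
            masked.append(codon)
    return masked, targets


if __name__ == "__main__":
    codons = tokenize_codons("ATGGCCAAGTTTGGATAA")
    masked, targets = mask_codons(codons, rng=random.Random(0))
    print(codons)
    print(masked, targets)
```

Codon-frequency-aware masking, used by the Cdwt checkpoints, would replace the uniform `mask_prob` with a per-codon probability; the card does not describe that weighting, so it is not sketched here.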

—
53
0

Cosmos-1.0-Autoregressive-4B

—
51
55

Nemotron-H-47B-Reasoning-128K-FP8

—
48
5

Nemotron-H-8B-Reasoning-128K-FP8

—
47
12

Cosmos-Predict1-4B

—
47
2

NV-CodonFM-Encodon-80M-v1

The code for using the CodonFM model checkpoint is available in the official GitHub repository.

Description: CodonFM predicts masked codons in mRNA sequences from codon-level context to enable variant effect interpretation and codon optimization, as part of NVIDIA's CodonFM Encodon family. The family comprises four models. The first three are trained with randomly masked tokens and have 80 million, 600 million, and 1 billion parameters. The fourth has 1 billion parameters and is trained with codon-frequency-aware masking. An additional set of accelerated checkpoints is also available.

This model is ready for commercial/non-commercial use.

Governing Terms: Use of this model is governed by the NVIDIA Open Model License Agreement.

- Optimized Expression and Stability for mRNA design: To design mRNAs with codon usage patterns that enhance translation efficiency, protein yield, and transcript stability across specific cell types and tissues.
- Variant Interpretation for pathogenicity: To identify and prioritize functional synonymous and missense variants in the context of diseases.

Release Date:
- GitHub 10/27/2025 via https://github.com/NVIDIA-Digital-Bio/CodonFM
- Hugging Face 10/27/2025 via:
  - Random Mask
    - https://huggingface.co/nvidia/NV-CodonFM-Encodon-1B-v1
    - https://huggingface.co/nvidia/NV-CodonFM-Encodon-600M-v1
    - https://huggingface.co/nvidia/NV-CodonFM-Encodon-80M-v1
    - https://huggingface.co/nvidia/NV-CodonFM-Encodon-TE-1B-v1
    - https://huggingface.co/nvidia/NV-CodonFM-Encodon-TE-600M-v1
    - https://huggingface.co/nvidia/NV-CodonFM-Encodon-TE-80M-v1
  - Codon Frequency Aware Mask
    - https://huggingface.co/nvidia/NV-CodonFM-Encodon-Cdwt-1B-v1
    - https://huggingface.co/nvidia/NV-CodonFM-Encodon-TE-Cdwt-1B-v1
- NGC 10/27/2025 via https://catalog.ngc.nvidia.com/orgs/nvidia/teams/clara/models/nvcodonfmencodon

Model Architecture: The NVIDIA CodonFM Encodon family features Transformer-based architectures tailored for codon-level sequence modeling in mRNA.
Each model applies a masked language modeling (MLM) objective to predict masked codons from a surrounding context of up to 2,046 codons, enabling genome-scale codon optimization and synonymous variant interpretation.

| Model Name | Parameters |
|-------------|-------------:|
| Encodon-80M | 7.68 × 10⁷ |
| Encodon-600M | 6.09 × 10⁸ |
| Encodon-1B | 9.11 × 10⁸ |
| Encodon-Cdwt-1B | 9.11 × 10⁸ |

Input:
Input Type(s): Text (mRNA Sequence)
Input Format: FASTA files converted to memmaps
Input Parameters: 1D
Other Properties Related to Input: mRNA sequence represented as a string of codons, of maximum length 2,046. Longer sequences are automatically truncated to this length.

Output:
Output Type(s): mRNA Sequence
Output Format: Text
Output Parameters: 2D
Other Properties Related to Output: Numeric 2D tensor with floating-point values representing probabilities of a given codon at a given position within the sequence

Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.

Software Integration:
Runtime Engine(s): PyTorch - 2.5.1
Supported Hardware Microarchitecture Compatibility:
- NVIDIA Ampere
- NVIDIA Hopper

The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.
Model Version(s): - NV-CodonFM-Encodon-80M-v1 - NV-CodonFM-Encodon-600M-v1 - NV-CodonFM-Encodon-1B-v1 - NV-CodonFM-Encodon-Cdwt-1B-v1 - NV-CodonFM-Encodon-TE-80M-v1 - NV-CodonFM-Encodon-TE-600M-v1 - NV-CodonFM-Encodon-TE-1B-v1 - NV-CodonFM-Encodon-TE-Cdwt-1B-v1 Properties: Coding sequences from the NCBI RefSeq database (release 2024-04) were used for training. A total of >130M non-viral protein-coding sequences from >22,000 species were included, comprising >2,000 eukaryotes. Sequences not divisible by three or containing ambiguous bases were removed. Taxonomy-level deduplication using MMSeqs eliminated redundant entries, and coding sequences from bacteria pathogenic to humans were excluded. The resulting dataset was partitioned into nine species groups: primates, archaea, bacteria, fungi, invertebrate, plant, protozoa, non-primate mammals, and non-mammal vertebrates. Sequences were clustered by similarity and then split into training and validation sets with stratification across groups to ensure balanced representation. Encodon models use codon-level tokenization, processing input sequences of up to 2,046 codons. Each model was trained using a masked language modeling (MLM) objective, where randomly masked codons were predicted from their context. The Encodon pretraining dataset was sorted based on sequence taxonomy to maintain species balance, and sequence subsets could be resampled dynamically. 
Non-Audio, Image, Text Training Data Size: NCBI RefSeq genomes FTP directory currently contains over 395,000 genomes totaling approximately 3.3 terabases (Tb)

Data Collection Method for all datasets: Automatic/Sensors

| Link | Properties |
|------|--------------------------------------------------|
| ClinVar Variant Interpretation | This task involves classifying genetic variants from ClinVar, a publicly available database that aggregates information about the clinical significance of human genetic variants, into pathogenic or benign categories based on their coding sequence context. |
| De novo variant classification | This task uses variants from the Deciphering Developmental Disorders (DDD) and autism spectrum disorder (ASD) cohort study, which catalogs genetic mutations linked to rare pediatric and developmental diseases, to evaluate classification of pathogenic versus benign variants based on coding sequence context. |
| mRNA Translation Efficiency | This task predicts ribosome profiling signal intensity along coding sequences, evaluating how well models capture translation efficiency and codon-level regulation from sequence context. |
| Protein Abundance | This task predicts fluorescent protein expression levels (mRFP) from coding sequences, testing how accurately models capture codon-dependent effects on translation efficiency and protein abundance. |

Ethical Considerations: NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. For more detailed information on ethical considerations for this model, please see the Model Card++ Bias, Explainability, Safety & Security, and Privacy Subcards.
Users are responsible for ensuring the physical properties of model-generated molecules are appropriately evaluated and comply with applicable safety regulations and ethical standards. Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns here.
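The card describes the output as a 2D tensor of per-position codon probabilities. The sketch below shows two simple ways such a matrix could be read: picking the most likely codon at each position, and comparing an alternate codon against the reference with a log-probability ratio. Both helper names are hypothetical, plain nested lists stand in for the tensor, and the log-ratio score is a common convention assumed here for illustration rather than a scoring method documented for CodonFM.

```python
import math


def top_codons(probs, vocab):
    """Return the highest-probability codon at each sequence position.

    probs : list of rows, one per position, each a list aligned with vocab
    vocab : list of codon strings (a toy vocabulary in the example below)
    """
    return [vocab[max(range(len(row)), key=row.__getitem__)] for row in probs]


def log_ratio(probs, vocab, position, ref, alt):
    """Log-probability ratio log p(alt) - log p(ref) at one position.

    Negative values mean the model finds the alternate codon less likely
    than the reference in this context (assumed scoring convention).
    """
    row = probs[position]
    return math.log(row[vocab.index(alt)]) - math.log(row[vocab.index(ref)])


if __name__ == "__main__":
    vocab = ["ATG", "GCC", "GCT"]          # toy vocabulary, not the real one
    probs = [[0.90, 0.05, 0.05],
             [0.10, 0.60, 0.30]]
    print(top_codons(probs, vocab))
    print(log_ratio(probs, vocab, position=1, ref="GCC", alt="GCT"))
```

With a real checkpoint the rows would come from a softmax over the model's codon vocabulary, and any downstream use would still need validation against task data such as the benchmarks listed in the card.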

—
47
0

Cosmos-1.0-Diffusion-7B-Decoder-DV8x16x16ToCV8x8x8

—
46
9

Nemotron-Elastic-12B

—
45
22

nemo-megatron-mt5-3B

license:cc-by-4.0
44
13

Cosmos-Predict1-5B-Video2World

—
44
4

Cosmos-Predict1-12B

—
44
1

PS3_Lang-1.5K-SigLIP2

license:cc-by-nc-sa-4.0
44
1

Nemoretriever Ocr V1

The Nemotron OCR v1 model is a state-of-the-art text recognition model designed for robust end-to-end optical character recognition (OCR) on complex real-world images. It integrates three core neural network modules: a detector for text region localization, a recognizer for transcription of detected regions, and a relational model for layout and structure analysis. This model is optimized for a wide variety of OCR tasks, including multi-line, multi-block, and natural scene text, and it supports advanced reading order analysis via its relational model component. Nemotron OCR v1 has been developed to be production-ready and commercially usable, with a focus on speed and accuracy on both document and natural scene images. The Nemotron OCR v1 model is part of the NVIDIA NeMo Retriever collection of NIM microservices, which provides state-of-the-art, commercially-ready models and microservices optimized for the lowest latency and highest throughput. It features a production-ready information retrieval pipeline with enterprise support. The models that form the core of this solution have been trained using responsibly selected, auditable data sources. With multiple pre-trained models available as starting points, developers can readily customize them for domain-specific use cases, such as information technology, human resource help assistants, and research & development research assistants. We are excited to announce the open sourcing of this commercial model. For users interested in deploying this model in production environments, it is also available via the model API in NVIDIA Inference Microservices (NIM) at nemotron-ocr-v1. The use of this model is governed by the NVIDIA Open Model License Agreement and the use of the post-processing scripts are licensed under Apache 2.0. 
- Mike Ranzinger
- Bo Liu
- Theo Viel
- Charles Blackmon-Luca
- Oliver Holworthy
- Edward Kim
- Even Oldridge

The Nemotron OCR v1 model is designed for high-accuracy and high-speed extraction of textual information from images, making it ideal for powering multimodal retrieval systems, Retrieval-Augmented Generation (RAG) pipelines, and agentic applications that require seamless integration of visual and language understanding. Its robust performance and efficiency make it an excellent choice for next-generation AI systems that demand both precision and scalability across diverse real-world content.

10/23/2025 via https://huggingface.co/nvidia/nemotron-ocr-v1

- Technical blog: https://developer.nvidia.com/blog/approaches-to-pdf-data-extraction-for-information-retrieval/

Architecture Type: Hybrid detector–recognizer with document-level relational modeling

The Nemotron OCR v1 model integrates three specialized neural components:
- Text Detector: Utilizes a RegNetY-8GF convolutional backbone for high-accuracy localization of text regions within images.
- Text Recognizer: Employs a Transformer-based sequence recognizer to transcribe text from detected regions, supporting variable word and line lengths.
- Relational Model: Applies a multi-layer global relational module to predict logical groupings, reading order, and layout relationships across detected text elements.

All components are trained jointly in an end-to-end fashion, providing robust, scalable, and production-ready OCR for diverse document and scene images.
| Component | Parameters |
|-------------------|-------------|
| Detector | 45,268,472 |
| Recognizer | 4,944,346 |
| Relational model | 2,254,422 |
| Total | 52,467,240 |

| Property | Value |
|------------------|-------------------|
| Input Type & Format | Image (RGB, PNG/JPEG, float32/uint8), aggregation level (word, sentence, or paragraph) |
| Input Parameters (Two-Dimensional) | 3 x H x W (single image) or B x 3 x H x W (batch) |
| Input Range | [0, 1] (float32) or [0, 255] (uint8, auto-converted) |
| Other Properties | Handles both single images and batches. Automatic multi-scale resizing for best accuracy. |

| Property | Value |
|-----------------|-------------------|
| Output Type | Structured OCR results: a list of detected text regions (bounding boxes), recognized text, and confidence scores |
| Output Format | Bounding boxes: tuple of floats, recognized text: string, confidence score: float |
| Output Parameters | Bounding boxes: One-Dimensional (1D) list of bounding box coordinates, recognized text: One-Dimensional (1D) list of strings, confidence score: One-Dimensional (1D) list of floats |
| Other Properties | Please see the sample output for an example of the model output |

Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.

- OS: Linux amd64 with NVIDIA GPU
- CUDA: CUDA Toolkit 12.8 and compatible NVIDIA driver installed (for PyTorch CUDA). Verify with `nvidia-smi`.
- Python: 3.12 (both subpackages require `python = ~3.12`)
- Build tools (when building the C++ extension):
  - GCC/G++ with C++17 support
  - CUDA toolkit headers (for building CUDA kernels)
  - OpenMP (used by the C++ extension)

Installation

The model requires torch, and the custom code available in this repository.
- Make sure git-lfs is installed (https://git-lfs.com)
- Create and activate a Python 3.12 environment (optional)
- Run the following command to install the package:

Run the example end-to-end without installing anything on the host (besides Docker, docker compose, and NVIDIA Container Toolkit):
- From the repo root, bring up the service to run the example against the provided image `ocr-example-image.png`:

This will:
- Build an image from the provided `Dockerfile` (based on `nvcr.io/nvidia/pytorch`)
- Mount the repo at `/workspace`
- Run `example.py` with model from `checkpoints`

Output is saved next to your input image as ` -annotated. ` on the host.

Image Training Data Size: Less than a Million Images

The model is trained on a large-scale, curated mix of public and proprietary OCR datasets, focusing on high diversity of document layouts and scene images. The training set includes synthetic and real images with varied noise and backgrounds, filtered for commercial use eligibility.

Data Collection Method: Hybrid (Automated, Human, Synthetic)
Labeling Method: Hybrid (Automated, Human, Synthetic)
Properties: Includes scanned documents, natural scene images, receipts, and business documents.

The Nemotron OCR v1 model is evaluated on several NVIDIA internal datasets for various tasks, such as pure OCR, table content extraction, and document retrieval.

Data Collection Method: Hybrid (Automated, Human, Synthetic)
Labeling Method: Hybrid (Automated, Human, Synthetic)
Properties: Benchmarks include challenging scene images, documents with varied layouts, and multi-language data.

We benchmarked Nemotron OCR v1 on internal evaluation datasets against PaddleOCR on various tasks, such as pure OCR (Character Error Rate), table content extraction (TEDS), and document retrieval (Recall@5).
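For readers unfamiliar with the Character Error Rate named in the benchmark description, the sketch below computes it in the usual way: the Levenshtein edit distance between the reference and the OCR output, divided by the reference length. The function names are hypothetical, and the exact normalization and text preprocessing used in NVIDIA's internal evaluation are not stated in the card, so treat this as the textbook definition rather than the benchmark's implementation.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(
                min(
                    prev[j] + 1,             # delete a reference character
                    curr[j - 1] + 1,         # insert a hypothesis character
                    prev[j - 1] + (r != h),  # substitute, or match at no cost
                )
            )
        prev = curr
    return prev[-1]


def character_error_rate(ref, hyp):
    """CER = edit distance / number of reference characters."""
    if not ref:
        raise ValueError("reference text must not be empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    print(character_error_rate("invoice total", "invoice tota1"))
```

Lower is better, and the value can exceed 1.0 when the hypothesis is much longer than the reference.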
| Metric | Nemotron OCR v1 | PaddleOCR | Net change |
|-------------------------------------------|--------------------|-----------|-----------------|
| Character Error Rate | 0.1633 | 0.2029 | -19.5% ✔️ |
| Bag-of-character Error Rate | 0.0453 | 0.0512 | -11.5% ✔️ |
| Bag-of-word Error Rate | 0.1203 | 0.2748 | -56.2% ✔️ |
| Table Extraction TEDS | 0.781 | 0.781 | 0.0% ⚖️ |
| Public Earnings Multimodal Recall@5 | 0.779 | 0.775 | +0.5% ✔️ |
| Digital Corpora Multimodal Recall@5 | 0.901 | 0.883 | +2.0% ✔️ |

The model demonstrates robust performance on complex layouts, noisy backgrounds, and challenging real-world scenes. Reading order and block detection are powered by the relational module, supporting downstream applications such as chart-to-text, table-to-text, and infographic-to-text extraction.

Acceleration Engine: TensorRT, PyTorch
Test Hardware: H100 PCIe/SXM, A100 PCIe/SXM, L40s, L4, and A10G

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.

For more detailed information on ethical considerations for this model, please see the Explainability, Bias, Safety & Security, and Privacy sections below.

Please report security vulnerabilities or NVIDIA AI Concerns here.

| Field | Response |
| ----- | ----- |
| Participation considerations from adversely impacted groups protected classes in model design and testing | None |
| Measures taken to mitigate against unwanted bias | None |

| Field | Response |
| ----- | ----- |
| Intended Task/Domain: | Optical Character Recognition (OCR) with a focus on retrieval applications and documents. |
| Model Type: | Hybrid neural network with convolutional detector, transformer recognizer, and document structure modeling. |
| Intended Users: | Developers and teams building AI-driven search applications, retrieval-augmented generation (RAG) workflows, multimodal agents, or document intelligence applications. It is ideal for those working with large collections of scanned or photographed documents, including PDFs, forms, and reports. |
| Output: | Structured OCR results, including detected bounding boxes, recognized text, and confidence scores. |
| Describe how the model works: | The model first detects text regions in the image, then transcribes the text in each region, and finally analyzes document structure and reading order. It outputs structured, machine-readable results suitable for downstream search and analysis. |
| Name the adversely impacted groups this has been tested to deliver comparable outcomes regardless of: | Not Applicable |
| Technical Limitations: | This model version supports English only. |
| Verified to have met prescribed NVIDIA quality standards: | Yes |
| Performance Metrics: | Accuracy (e.g., character error rate), throughput, and latency. |
| Potential Known Risks: | The model may not always extract or transcribe all text with perfect accuracy, particularly in cases of poor image quality or highly stylized fonts. |
| Licensing & Terms of Use: | Use of this model is governed by the NVIDIA Open Model License Agreement; the post-processing scripts are licensed under Apache 2.0. |

| Field | Response |
| ----- | ----- |
| Generatable or reverse engineerable personal data? | No |
| Personal data used to create this model? | None Known |
| How often is dataset reviewed? | The dataset is initially reviewed when added, and subsequent reviews are conducted as needed or in response to change requests. |
| Is there provenance for all datasets used in training? | Yes |
| Does data labeling (annotation, metadata) comply with privacy laws? | Yes |
| Is data compliant with data subject requests for data correction or removal, if such a request was made? | No, not possible with externally-sourced data. |
| Applicable Privacy Policy | https://www.nvidia.com/en-us/about-nvidia/privacy-policy/ |

| Field | Response |
| ----- | ----- |
| Model Application Field(s): | Text recognition and structured OCR for multimodal retrieval. Inputs can include natural scene images, scanned documents, charts, tables, and infographics. |
| Use Case Restrictions: | Abide by the NVIDIA Open Model License Agreement. The post-processing scripts are licensed under Apache 2.0. |
| Model and dataset restrictions: | The principle of least privilege (PoLP) is applied, limiting access for dataset generation and model development. Restrictions enforce dataset access only during training, and all dataset license constraints are adhered to. |
| Describe the life critical impact (if present): | Not applicable. |
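
The "Net change" column in the benchmark table is consistent with a plain relative change against the PaddleOCR score, `(ours - baseline) / baseline`. The sketch below recomputes it from the reported values; the function and variable names are illustrative and are not part of the Nemotron OCR code. For the error-rate rows a negative change is an improvement, and for TEDS and Recall@5 a positive change is.

```python
def relative_change(new: float, baseline: float) -> float:
    """Return the change of `new` relative to `baseline`, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (new - baseline) / baseline * 100.0


# (metric, Nemotron OCR v1, PaddleOCR), copied from the benchmark table.
ROWS = [
    ("Character Error Rate", 0.1633, 0.2029),
    ("Bag-of-character Error Rate", 0.0453, 0.0512),
    ("Bag-of-word Error Rate", 0.1203, 0.2748),
    ("Table Extraction TEDS", 0.781, 0.781),
    ("Public Earnings Multimodal Recall@5", 0.779, 0.775),
    ("Digital Corpora Multimodal Recall@5", 0.901, 0.883),
]


def net_changes(rows=ROWS):
    """Map each metric name to its relative change, rounded to one decimal."""
    return {name: round(relative_change(ours, other), 1) for name, ours, other in rows}


if __name__ == "__main__":
    for name, pct in net_changes().items():
        print(f"{name}: {pct:+.1f}%")
```

Rounded to one decimal, the results match the percentages in the table.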

—
43
15

VILA-HD-8B-PS3-1.5K-SigLIP2

NaNK
llava_topdown_llama
42
0

Cosmos-1.0-Autoregressive-12B

NaNK
—
41
30

NemoGuard-JailbreakDetect

—
41
15

PS3-4K-SigLIP

NaNK
—
41
2

PS3_Lang-4K-SigLIP2

NaNK
license:cc-by-nc-sa-4.0
41
0

Aegis-AI-Content-Safety-LlamaGuard-Permissive-1.0

license:llama2
40
18

VILA-HD-8B-PS3-1.5K-SigLIP

NaNK
llava_topdown_llama
40
3

Nemotron-Flash-1B

NaNK
—
39
3

Cosmos-Predict1-13B-Video2World

NaNK
—
39
2

VILA-HD-8B-PS3-4K-SigLIP

NaNK
llava_topdown_llama
39
1

PS3-4K-C-RADIOv2

NaNK
—
39
0

NV-CodonFM-Encodon-TE-600M-v1

This is a TransformerEngine accelerated version of CodonFM. The code for this model can be found within bionemo-recipes, and this checkpoint is an exact parameter match to the original research work, which can be found at the official CodonFM GitHub repository.

Description: CodonFM predicts masked codons in mRNA sequences from codon-level context to enable variant effect interpretation and codon optimization as part of NVIDIA's CodonFM Encodon family. The family has four models. The first three are trained with randomly masked tokens and have 80 million, 600 million, and 1 billion parameters. The fourth has 1 billion parameters and is trained with codon-frequency-aware masking. An additional set of accelerated checkpoints is also available.

This model is ready for commercial/non-commercial use.

Governing Terms: Use of this model is governed by the NVIDIA Open Model License Agreement.

- Optimized Expression and Stability for mRNA design: To design mRNAs with codon usage patterns that enhance translation efficiency, protein yield, and transcript stability across specific cell types and tissues.
- Variant Interpretation for pathogenicity: To identify and prioritize functional synonymous and missense variants in the context of diseases.

Release Date:

GitHub 10/27/2025 via https://github.com/NVIDIA-Digital-Bio/CodonFM

Hugging Face 10/27/2025 via:
- Random Mask
  - https://huggingface.co/nvidia/NV-CodonFM-Encodon-1B-v1
  - https://huggingface.co/nvidia/NV-CodonFM-Encodon-600M-v1
  - https://huggingface.co/nvidia/NV-CodonFM-Encodon-80M-v1
  - https://huggingface.co/nvidia/NV-CodonFM-Encodon-TE-1B-v1
  - https://huggingface.co/nvidia/NV-CodonFM-Encodon-TE-600M-v1
  - https://huggingface.co/nvidia/NV-CodonFM-Encodon-TE-80M-v1
- Codon Frequency Aware Mask
  - https://huggingface.co/nvidia/NV-CodonFM-Encodon-Cdwt-1B-v1
  - https://huggingface.co/nvidia/NV-CodonFM-Encodon-TE-Cdwt-1B-v1

NGC 10/27/2025 via https://catalog.ngc.nvidia.com/orgs/nvidia/teams/clara/models/nvcodonfmencodon

Model Architecture: The NVIDIA CodonFM Encodon family features Transformer-based architectures tailored for codon-level sequence modeling in mRNA. Each model applies a masked language modeling (MLM) objective to predict masked codons from a surrounding context of 2,046 codons, enabling genome-scale codon optimization and synonymous variant interpretation.

| Model Name | Parameters |
|-------------|-------------:|
| Encodon-80M | 7.68 × 10⁷ |
| Encodon-600M | 6.09 × 10⁸ |
| Encodon-1B | 9.11 × 10⁸ |
| Encodon-Cdwt-1B | 9.11 × 10⁸ |

Input:
Input Type(s): Text (mRNA Sequence)
Input Format: FASTA files converted to memmaps
Input Parameters: 1D
Other Properties Related to Input: mRNA sequence represented as a string of codons, with a maximum length of 2,046. Longer sequences are automatically truncated to this length.

Output:
Output Type(s): mRNA Sequence
Output Format: Text
Output Parameters: 2D
Other Properties Related to Output: Numeric 2D tensor with floating-point values representing the probability of a given codon at a given position within the sequence.

Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g., GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.

Software Integration:
Runtime Engine(s): PyTorch - 2.5.1
Supported Hardware Microarchitecture Compatibility: NVIDIA Ampere, NVIDIA Hopper

The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.

Model Version(s):
- NV-CodonFM-Encodon-80M-v1
- NV-CodonFM-Encodon-600M-v1
- NV-CodonFM-Encodon-1B-v1
- NV-CodonFM-Encodon-Cdwt-1B-v1
- NV-CodonFM-Encodon-TE-80M-v1
- NV-CodonFM-Encodon-TE-600M-v1
- NV-CodonFM-Encodon-TE-1B-v1
- NV-CodonFM-Encodon-TE-Cdwt-1B-v1

Properties: Coding sequences from the NCBI RefSeq database (release 2024-04) were used for training. A total of >130M non-viral protein-coding sequences from >22,000 species were included, comprising >2,000 eukaryotes. Sequences with a length not divisible by three or containing ambiguous bases were removed. Taxonomy-level deduplication using MMSeqs eliminated redundant entries, and coding sequences from bacteria pathogenic to humans were excluded. The resulting dataset was partitioned into nine species groups: primates, archaea, bacteria, fungi, invertebrate, plant, protozoa, non-primate mammals, and non-mammal vertebrates. Sequences were clustered by similarity and then split into training and validation sets with stratification across groups to ensure balanced representation. Encodon models use codon-level tokenization, processing input sequences of up to 2,046 codons. Each model was trained using a masked language modeling (MLM) objective, where randomly masked codons were predicted from their context. The Encodon pretraining dataset was sorted based on sequence taxonomy to maintain species balance, and sequence subsets could be resampled dynamically.

Non-Audio, Image, Text Training Data Size: NCBI RefSeq genomes FTP directory currently contains over 395,000 genomes totaling approximately 3.3 terabases (Tb)

Data Collection Method for all data
Dataset: Automatic/Sensors

| Link | Properties |
|------|--------------------------------------------------|
| ClinVar Variant Interpretation | This task involves classifying genetic variants from ClinVar, a publicly available database that aggregates information about the clinical significance of human genetic variants, into pathogenic or benign categories based on their coding sequence context. |
| De novo variant classification | This task uses variants from the Deciphering Developmental Disorders (DDD) and autism spectrum disorder (ASD) cohort study, which catalogs genetic mutations linked to rare pediatric and developmental diseases, to evaluate classification of pathogenic versus benign variants based on coding sequence context. |
| mRNA Translation Efficiency | This task predicts ribosome profiling signal intensity along coding sequences, evaluating how well models capture translation efficiency and codon-level regulation from sequence context. |
| Protein Abundance | This task predicts fluorescent protein expression levels (mRFP) from coding sequences, testing how accurately models capture codon-dependent effects on translation efficiency and protein abundance. |

Ethical Considerations: NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.

For more detailed information on ethical considerations for this model, please see the Model Card++ Bias, Explainability, Safety & Security, and Privacy Subcards.

Users are responsible for ensuring the physical properties of model-generated molecules are appropriately evaluated and comply with applicable safety regulations and ethical standards.

Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns here.
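
The preprocessing rules described in the card (drop sequences whose length is not divisible by three or that contain ambiguous bases, tokenize at the codon level, truncate to 2,046 codons, then mask random codons) can be illustrated with a minimal sketch. This is not the CodonFM implementation: the function names, the `<mask>` token string, the 15% masking rate, and the T-to-U normalization are all assumptions made for illustration.

```python
import random

MAX_CODONS = 2046          # context length stated in the model card
VALID_BASES = set("ACGU")  # unambiguous RNA bases
MASK = "<mask>"            # hypothetical mask token name


def to_codons(seq: str):
    """Split a coding sequence into codons, or return None if it is filtered out."""
    seq = seq.upper().replace("T", "U")  # assumption: accept DNA-style input too
    if len(seq) % 3 != 0 or not set(seq) <= VALID_BASES:
        return None  # not divisible by three, or contains ambiguous bases
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    return codons[:MAX_CODONS]  # longer sequences are truncated


def random_mask(codons, rate=0.15, seed=0):
    """Replace a random subset of codons with MASK; return (masked, targets)."""
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, codon in enumerate(codons):
        if rng.random() < rate:
            masked.append(MASK)
            targets[i] = codon  # the label the model would have to predict
        else:
            masked.append(codon)
    return masked, targets


if __name__ == "__main__":
    codons = to_codons("AUGGCCAAGUUCUAA")
    print(codons)
    print(random_mask(codons, rate=0.4, seed=1))
```

The codon-frequency-aware variant would replace the uniform `rng.random() < rate` test with a per-codon masking probability; the card does not specify that weighting, so it is left out here.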

—
39
0

Qwen3 Nemotron 14B BRRM

NaNK
—
38
4

VILA-HD-8B-PS3-4K-C-RADIOv2

NaNK
llava_topdown_llama
38
0

Cosmos-Predict1-7B-Video2World-Sample-AV-Multiview

NaNK
—
37
4

VILA-HD-8B-PS3-4K-SigLIP2

NaNK
llava_topdown_llama
37
1

Liver_Scan_Gr00t_Cosmos_Rel

license:apache-2.0
37
0

Cosmos-1.0-Autoregressive-5B-Video2World

NaNK
—
36
30

QLIP-L-14-392

license:cc-by-nc-4.0
36
12

PS3-1.5K-SigLIP

NaNK
—
36
5

Qwen-2.5-Nemotron-32B-Reward

NaNK
—
36
2

VILA-HD-8B-PS3-1.5K-C-RADIOv2

NaNK
llava_topdown_llama
36
0

NV-CodonFM-Encodon-TE-Cdwt-1B-v1


NaNK
—
35
0

PS3-1.5K-C-RADIOv2

NaNK
—
34
0

VideoITG-8B

NaNK
—
33
7

NV-CodonFM-Encodon-TE-1B-v1


NaNK
—
33
0

Cosmos-1.0-Autoregressive-13B-Video2World

NaNK
—
32
32

groupvit-gcc-redcaps

—
31
5

nemo-megatron-gpt-20B

NaNK
license:cc-by-4.0
28
32

NV-CodonFM-Encodon-Cdwt-1B-v1

NaNK
—
28
0

GR00T-N1.5-3B-WaveHand

NaNK
—
26
1

Llama-3.1-Nemotron-70B-Reward

NaNK
llama3.1
25
77

ChronoEdit-14B-Diffusers-Paint-Brush-Lora

NaNK
license:apache-2.0
25
16

Nemoretriever Page Elements V3

The Nemotron Page Elements v3 model is a specialized object detection model designed to identify and extract elements from document pages. While the underlying technology builds upon work from Megvii Technology, we developed our own base model through complete retraining rather than using pre-trained weights. YOLOX is an anchor-free version of YOLO (You Only Look Once) that combines a simpler architecture with enhanced performance. The model is trained to detect tables, charts, infographics, titles, header/footers and texts in documents.

This model supersedes the nemotron-page-elements model and is a part of the NVIDIA Nemotron family of NIM microservices specifically for object detection and multimodal extraction of enterprise documents.

This model is ready for commercial/non-commercial use.

We are excited to announce the open sourcing of this commercial model. For users interested in deploying this model in production environments, it is also available via the model API in NVIDIA Inference Microservices (NIM) at nemoretriever-page-elements-v2.

Use of this model is governed by the NVIDIA Open Model License Agreement; the post-processing scripts are licensed under Apache 2.0.

- Theo Viel
- Bo Liu
- Darragh Hanley
- Even Oldridge

Correspondence to Theo Viel and Bo Liu.

The Nemotron Page Elements v3 model is designed for automating extraction of text, charts, tables, infographics, etc. in enterprise documents. It can be used for document analysis, understanding and processing.
Key applications include:
- Enterprise document extraction, embedding and indexing
- Augmenting Retrieval Augmented Generation (RAG) workflows with multimodal retrieval
- Data extraction from legacy documents and reports

10/23/2025 via https://huggingface.co/nvidia/nemotron-page-elements-v3

- YOLOX paper: https://arxiv.org/abs/2107.08430
- YOLOX repo: https://github.com/Megvii-BaseDetection/YOLOX
- Previous version of the Page Element model: https://build.nvidia.com/nvidia/nemoretriever-page-elements-v2
- Technical blog: https://developer.nvidia.com/blog/approaches-to-pdf-data-extraction-for-information-retrieval/

Architecture Type: YOLOX
Network Architecture: DarkNet53 Backbone + FPN Decoupled head (one 1x1 convolution + 2 parallel 3x3 convolutions, one for the classification and one for the bounding box prediction). YOLOX is a single-stage object detector that improves on YOLOv3.
This model was developed based on the YOLO architecture.
Number of model parameters: 5.4e7

Input Type(s): Image
Input Format(s): Red, Green, Blue (RGB)
Input Parameters: Two-Dimensional (2D)
Other Properties Related to Input: Images are resized to `(1024, 1024)`

Output Type(s): Array
Output Format: A dictionary of dictionaries containing `np.ndarray` objects. The outer dictionary has entries for each sample (page), and the inner dictionary contains a list of dictionaries, each with a bounding box (`np.ndarray`), class label, and confidence score for that page.
Output Parameters: One-Dimensional (1D)
Other Properties Related to Output: The output contains bounding boxes, detection confidence scores, and object classes (chart, table, infographic, title, text, headers and footers). The thresholds used for non-maximum suppression are `confthresh=0.01` and `iouthresh=0.5`.
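
To make the two thresholds concrete, here is a minimal pure-Python sketch of greedy non-maximum suppression using the values quoted above. It is not the repository's post-processing code. The `(x1, y1, x2, y2)` box format, the `(box, score, label)` tuple layout, and the choice to suppress only within the same class are assumptions, since the card does not say whether suppression is class-aware.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(detections, conf_thresh=0.01, iou_thresh=0.5):
    """Greedy NMS over (box, score, label) tuples, applied per class (assumption)."""
    # 1. Drop low-confidence detections, then visit the rest from best to worst.
    candidates = sorted(
        (d for d in detections if d[1] >= conf_thresh),
        key=lambda d: d[1],
        reverse=True,
    )
    kept = []
    for box, score, label in candidates:
        # 2. Keep a box only if it does not overlap a kept box of the same class
        #    by more than iou_thresh.
        if all(k[2] != label or iou(box, k[0]) <= iou_thresh for k in kept):
            kept.append((box, score, label))
    return kept


if __name__ == "__main__":
    dets = [
        ((0, 0, 10, 10), 0.90, "table"),
        ((1, 1, 10, 10), 0.80, "table"),    # IoU 0.81 with the first box
        ((0, 0, 10, 10), 0.70, "chart"),    # same region, different class
        ((50, 50, 60, 60), 0.005, "text"),  # below the confidence threshold
    ]
    for det in nms(dets):
        print(det)
```

A confidence threshold as low as 0.01 keeps nearly every candidate, so a downstream pipeline would typically apply its own stricter score cut per class.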
Output Classes: Table Data structured in rows and columns Chart Specifically bar charts, line charts, or pie charts Infographic Visual representations of information that is more complex than a chart, including diagrams and flowcharts Maps are not considered infographics Title Titles can be section titles, or table/chart/infographic titles Header/footer Page headers and footers Text Texts are regions of one or more text paragraphs, or standalone text not belonging to any of the classes above Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA’s hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions. The model requires torch, and the custom code available in this repository. - Make sure git-lfs is installed (https://git-lfs.com) Optional: This can be installed as a package using pip Note that this repository only provides minimal code to infer the model. If you wish to do additional training, refer to the original repo. Additional post-processing might be required to use the model as part of a data extraction pipeline. We provide examples in the notebook `Demo.ipynb`. Data Modality: Image Image Training Data Size: Less than a Million Images Data collection method by dataset: Automated Labeling method by dataset: Hybrid: Automated, Human Pretraining (by NVIDIA): 118,287 images of the COCO train2017 dataset Finetuning (by NVIDIA): 36,093 images from Digital Corpora dataset, with annotations from Azure AI Document Intelligence and data annotation team Number of bounding boxes per class: 35,328 tables, 44,178 titles, 11,313 charts and 6,500 infographics, 90,812 texts and 10,743 header/footers. The layout model of Document Intelligence was used with `2024-02-29-preview` API version. The primary evaluation set is a cut of the Azure labels and digital corpora images. 
Number of bounding boxes per class: 1,985 tables, 2,922 titles, 498 charts, 572 infographics, 4,400 texts and 492 header/footers. Mean Average Precision (mAP) was used as an evaluation metric, which measures the model's ability to correctly identify and localize objects across different confidence thresholds. Data collection method by dataset: Hybrid: Automated, Human Labeling method by dataset: Hybrid: Automated, Human Properties: We evaluated with Azure labels from manually selected pages, as well as manual inspection on public PDFs and powerpoint slides. Per-class Performance Metrics: | Class | AP (%) | AR (%) | |:------------|:-------|:-------| | table | 44.643 | 62.242 | | chart | 54.191 | 77.557 | | title | 38.529 | 56.315 | | infographic | 66.863 | 69.306 | | text | 45.418 | 73.017 | | headerfooter | 53.895 | 75.670 | Test hardware: See Support Matrix from NIM documentation ---> NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. | Field | Response | | ----- | ----- | | Participation considerations from adversely impacted groups protected classes in model design and testing | None | | Measures taken to mitigate against unwanted bias | None | | Field | Response | | ----- | ----- | | Intended Task/Domain: | Document Understanding | | Model Type: | YOLOX Object Detection for Charts, Tables, Infographics, Header/footers, Texts, and Titles | | Intended User: | Enterprise developers, data scientists, and other technical users who need to extract structural elements from documents. 
| | Output: | After post-processing, the output is three numpy array that contains the detections: `boxes [N x 4]` (format is normalized `(xmin, ymin, xmax, ymax)`), associated classes: `labels [N]` and confidence scores: `scores [N]`.| | Describe how the model works: | The model identifies objects in an image by first dividing the image into a grid. For each grid cell, it extracts visual features and simultaneously predicts which objects are present (for example, 'chart' or 'table') and where they are located in that cell, all in a single pass through the image. | | Name the adversely impacted groups this has been tested to deliver comparable outcomes regardless of: | Not Applicable | | Technical Limitations & Mitigation: | The model may not generalize to unknown document types/formats not commonly found on the web. Further fine-tuning might be required for such documents. | | Verified to have met prescribed NVIDIA quality standards: | Yes | | Performance Metrics: | Mean Average Precision, detectionr recall and visual inspection | | Potential Known Risks: | This model may not always detect all elements in a document. | | Licensing & Terms of Use: | Use of this model is governed by NVIDIA Open Model License Agreement and the use of the post-processing scripts are licensed under Apache 2.0. | | Field | Response | | ----- | ----- | | Generatable or reverse engineerable personal data? | No | | Personal data used to create this model? | No | | Was consent obtained for any personal data used? | Not Applicable | | How often is the dataset reviewed? | Before Release | | Is there provenance for all datasets used in training? | Yes | | Does data labeling (annotation, metadata) comply with privacy laws? | Yes | | Is data compliant with data subject requests for data correction or removal, if such a request was made? | No, not possible with externally-sourced data. 
| | Applicable Privacy Policy | https://www.nvidia.com/en-us/about-nvidia/privacy-policy/ | | Field | Response | | ----- | ----- | | Model Application Field(s): | Object Detection for Retrieval, focused on Enterprise | | Describe the life critical impact (if present). | Not Applicable | | Use Case Restrictions: | Abide by NVIDIA Open Model License Agreement and the use of the post-processing scripts are licensed under Apache 2.0. | | Model and dataset restrictions: | The Principle of least privilege (PoLP) is applied limiting access for dataset generation and model development. Restrictions enforce dataset access during training, and dataset license constraints adhered to. |
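The card states that detections are filtered with non-maximum suppression (confidence threshold 0.01, IoU threshold 0.5) and returned as normalized `(xmin, ymin, xmax, ymax)` boxes. The following is a minimal, dependency-free sketch of that kind of post-processing, not the repository's actual code: the function names (`iou`, `nms`, `to_pixels`), the parameter names `conf_thresh` and `iou_thresh`, and the choice of greedy per-class suppression are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# Assumed box layout, following the card: (xmin, ymin, xmax, ymax), normalized to [0, 1].
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], labels: Sequence[str], scores: Sequence[float],
        conf_thresh: float = 0.01, iou_thresh: float = 0.5) -> List[int]:
    """Greedy per-class NMS (an assumption); returns kept indices, best score first."""
    order = sorted((i for i, s in enumerate(scores) if s >= conf_thresh),
                   key=lambda i: -scores[i])
    keep: List[int] = []
    for i in order:
        # Keep a box unless a higher-scoring box of the same class overlaps it too much.
        if all(labels[i] != labels[j] or iou(boxes[i], boxes[j]) <= iou_thresh
               for j in keep):
            keep.append(i)
    return keep


def to_pixels(box: Box, width: int, height: int) -> Tuple[int, int, int, int]:
    """Scale a normalized box to integer pixel coordinates."""
    xmin, ymin, xmax, ymax = box
    return (round(xmin * width), round(ymin * height),
            round(xmax * width), round(ymax * height))


# Hypothetical detections for one page.
boxes = [(0.10, 0.10, 0.50, 0.40), (0.12, 0.11, 0.52, 0.41),
         (0.10, 0.10, 0.50, 0.40), (0.60, 0.60, 0.90, 0.90)]
labels = ["table", "table", "title", "chart"]
scores = [0.90, 0.60, 0.30, 0.005]

kept = nms(boxes, labels, scores)
for i in kept:
    print(labels[i], scores[i], to_pixels(boxes[i], 1024, 1024))
```

With a confidence threshold as low as 0.01, almost every candidate survives the first filter, so a downstream pipeline would likely apply its own, stricter per-class score cut-off before cropping regions.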

β€”
25
13

nemo-megatron-t5-3B

NaNK
license:cc-by-4.0
22
10

Nemotron-4-340B-Base

NaNK
β€”
21
146

Nemoretriever Table Structure V1

The input of this model is expected to be a table image. You can use the Nemotron Page Elements v3 model to detect and crop such images.

The Nemotron Table Structure v1 model is a specialized object detection model designed to identify and extract the structure of tables in images. Based on YOLOX, an anchor-free version of YOLO (You Only Look Once), this model combines a simpler architecture with enhanced performance. While the underlying technology builds upon work from Megvii Technology, we developed our own base model through complete retraining rather than using pre-trained weights.

The model excels at detecting and localizing the fundamental structural elements within tables. Through careful fine-tuning, it can accurately identify and delineate three key components within tables:
1. Individual cells (including merged cells)
2. Rows
3. Columns

This specialized focus on table structure enables precise decomposition of complex tables into their constituent parts, forming the foundation for downstream retrieval tasks. This model helps convert tables into markdown format, which can improve retrieval accuracy.

This model is ready for commercial/non-commercial use.

We are excited to announce the open sourcing of this commercial model. For users interested in deploying this model in production environments, it is also available via the model API in NVIDIA Inference Microservices (NIM) at nemotron-table-structure-v1.

Use of this model is governed by the NVIDIA Open Model License Agreement, and the post-processing scripts are licensed under Apache 2.0.

- Theo Viel
- Bo Liu
- Darragh Hanley
- Even Oldridge

Correspondence to Theo Viel ([email protected]) and Bo Liu ([email protected])

The Nemotron Table Structure v1 model specializes in analyzing images containing tables by:
- Detecting and extracting table structure elements (rows, columns, and cells)
- Providing precise location information for each detected element
- Supporting downstream tasks like table analysis and data extraction

The model is designed to work in conjunction with OCR (Optical Character Recognition) systems to:
1. Identify the structural layout of tables
2. Preserve the relationships between table elements
3. Enable accurate extraction of tabular data from images

Ideal for:
- Document processing systems
- Automated data extraction pipelines
- Digital content management solutions
- Business intelligence applications

10/23/2025 via https://huggingface.co/nvidia/nemotron-table-structure-v1

- YOLOX paper: https://arxiv.org/abs/2107.08430
- YOLOX repo: https://github.com/Megvii-BaseDetection/YOLOX
- Technical blog: https://developer.nvidia.com/blog/approaches-to-pdf-data-extraction-for-information-retrieval/

Architecture Type: YOLOX
Network Architecture: DarkNet53 Backbone + FPN Decoupled head (one 1x1 convolution + 2 parallel 3x3 convolutions, one for classification and one for bounding box prediction). YOLOX is a single-stage object detector that improves on YOLOv3.
This model was developed based on the YOLO architecture.
Number of model parameters: 5.4e7

Input Type(s): Image
Input Format(s): Red, Green, Blue (RGB)
Input Parameters: Two-Dimensional (2D)
Other Properties Related to Input: Image size resized to `(1024, 1024)`

Output Type(s): Array
Output Format: A dictionary of dictionaries containing `np.ndarray` objects. The outer dictionary has entries for each sample (page), and the inner dictionary contains a list of dictionaries, each with a bounding box (`np.ndarray`), class label, and confidence score for that page.
Output Parameters: One-Dimensional (1D)
Other Properties Related to Output: The output contains bounding boxes, detection confidence scores, and object classes (cell, row, column). The thresholds used for non-maximum suppression are `confthresh = 0.01` and `iouthresh = 0.25`.

Output Classes:
- Cell: Table cell
- Row: Table row
- Column: Table column

Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.

The model requires torch, and the custom code available in this repository.
- Make sure git-lfs is installed (https://git-lfs.com)

Optional: This can be installed as a package using pip

Note that this repository only provides minimal code to infer the model. If you wish to do additional training, refer to the original repo.

Additional post-processing might be required to use the model as part of a data extraction pipeline. We show how to use the model as part of a table-to-text pipeline alongside Nemotron OCR in the notebook `Demo.ipynb`.

Disclaimer: We are aware of some issues with the model and will provide a v2 with improved performance in the future that addresses the following:
- The model appears to be less confident in detecting cells at the bottom of the table, which sometimes results in missed cells.
- There is no class for table titles.
- Non-full-page tables are not supported.

Data Modality: Image
Image Training Data Size: Less than a Million Images
Data collection method by dataset: Automated
Labeling method by dataset: Automated
Pretraining (by NVIDIA): 118,287 images of the COCO train2017 dataset
Finetuning (by NVIDIA): 23,977 images from the Digital Corpora dataset, with annotations from Azure AI Document Intelligence.
Number of bounding boxes per class: 1,828,978 cells, 134,089 columns and 316,901 rows. The layout model of Document Intelligence was used with the `2024-02-29-preview` API version.

The primary evaluation set is a cut of the Azure labels and Digital Corpora images.
Number of bounding boxes per class: 200,840 cells, 13,670 columns and 34,575 rows.
Mean Average Precision (mAP) was used as an evaluation metric, which measures the model's ability to correctly identify and localize objects across different confidence thresholds.
Data collection method by dataset: Hybrid: Automated, Human
Labeling method by dataset: Hybrid: Automated, Human
Properties: We evaluated with Azure labels from manually selected pages, as well as manual inspection on public PDFs and PowerPoint slides.

Per-class Performance Metrics:

| Class | AP (%) | AR (%) |
|:-------|:-------|:-------|
| cell | 58.365 | 60.647 |
| row | 76.992 | 81.115 |
| column | 85.293 | 87.434 |

Test hardware: See Support Matrix from NIM documentation

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.

For more detailed information on ethical considerations for this model, please see the Explainability, Bias, Safety & Security, and Privacy sections below.

Please report security vulnerabilities or NVIDIA AI Concerns here.

| Field | Response |
| ----- | ----- |
| Participation considerations from adversely impacted groups protected classes in model design and testing | None |
| Measures taken to mitigate against unwanted bias | None |

| Field | Response |
| ----- | ----- |
| Intended Application & Domain: | Object Detection |
| Model Type: | YOLOX-architecture for detection of table structure within images of tables. |
| Intended User: | Enterprise developers, data scientists, and other technical users who need to extract table structure from images. |
| Output: | After post-processing, the output is three numpy arrays that contain the detections: `boxes [N x 4]` (format is normalized `(xmin, ymin, xmax, ymax)`), associated classes: `labels [N]` and confidence scores: `scores [N]`. |
| Describe how the model works: | Finds and identifies objects in images by first dividing the image into a grid. For each section of the grid, the model uses a series of neural networks to extract visual features and simultaneously predict what objects are present (in this case "cell", "row", or "column") and exactly where they are located in that section, all in a single pass through the image. |
| Name the adversely impacted groups this has been tested to deliver comparable outcomes regardless of: | Not Applicable |
| Technical Limitations & Mitigation: | The model may not generalize to unknown table formats. Further fine-tuning might be required for such documents. Furthermore, it is not robust to rotated tables. |
| Verified to have met prescribed NVIDIA quality standards: | Yes |
| Performance Metrics: | Mean Average Precision, detection recall and visual inspection |
| Potential Known Risks: | This model may not always detect all elements in a document. |
| Licensing & Terms of Use: | Use of this model is governed by the NVIDIA Open Model License Agreement, and the post-processing scripts are licensed under Apache 2.0. |

| Field | Response |
| ----- | ----- |
| Generatable or reverse engineerable personal data? | No |
| Personal data used to create this model? | No |
| Was consent obtained for any personal data used? | Not Applicable |
| How often is the dataset reviewed? | Before Release |
| Is there provenance for all datasets used in training? | Yes |
| Does data labeling (annotation, metadata) comply with privacy laws? | Yes |
| Is data compliant with data subject requests for data correction or removal, if such a request was made? | No, not possible with externally-sourced data. |
| Applicable Privacy Policy | https://www.nvidia.com/en-us/about-nvidia/privacy-policy/ |

| Field | Response |
| ----- | ----- |
| Model Application Field(s): | Object Detection for Retrieval, focused on Enterprise |
| Describe the life critical impact (if present). | Not Applicable |
| Use Case Restrictions: | Abide by the NVIDIA Open Model License Agreement; the post-processing scripts are licensed under Apache 2.0. |
| Model and dataset restrictions: | The Principle of least privilege (PoLP) is applied, limiting access for dataset generation and model development. Restrictions on dataset access are enforced during training, and dataset license constraints are adhered to. |
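The card says the detected cells, rows, and columns can be combined with OCR text to convert a table into markdown. The sketch below illustrates one plausible way to do that, and is not the logic in `Demo.ipynb`: each cell is assigned to the row and column whose extent contains its center. The helper names (`center`, `index_of`, `to_markdown`) and the assumption that OCR text has already been attached to each cell box are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), normalized


def center(box: Box) -> Tuple[float, float]:
    """Center point (x, y) of a box."""
    xmin, ymin, xmax, ymax = box
    return ((xmin + xmax) / 2, (ymin + ymax) / 2)


def index_of(value: float, spans: Sequence[Tuple[float, float]]) -> Optional[int]:
    """Index of the first (lo, hi) span containing value, or None if there is none."""
    for i, (lo, hi) in enumerate(spans):
        if lo <= value <= hi:
            return i
    return None


def to_markdown(cells: Sequence[Tuple[Box, str]],
                rows: Sequence[Box], columns: Sequence[Box]) -> str:
    """Place each (box, text) cell on a row/column grid and render a markdown table."""
    row_spans = sorted((b[1], b[3]) for b in rows)     # vertical extent, top to bottom
    col_spans = sorted((b[0], b[2]) for b in columns)  # horizontal extent, left to right
    grid: List[List[str]] = [["" for _ in col_spans] for _ in row_spans]
    for box, text in cells:
        cx, cy = center(box)
        r, c = index_of(cy, row_spans), index_of(cx, col_spans)
        if r is not None and c is not None:
            grid[r][c] = text
    lines = ["| " + " | ".join(row) + " |" for row in grid]
    lines.insert(1, "|" + " --- |" * len(col_spans))  # separator after the first row
    return "\n".join(lines)


# Hypothetical 2x2 table; the text would come from a separate OCR step.
rows = [(0.0, 0.0, 1.0, 0.5), (0.0, 0.5, 1.0, 1.0)]
columns = [(0.0, 0.0, 0.5, 1.0), (0.5, 0.0, 1.0, 1.0)]
cells = [((0.0, 0.0, 0.5, 0.5), "Name"), ((0.5, 0.0, 1.0, 0.5), "Qty"),
         ((0.0, 0.5, 0.5, 1.0), "Bolt"), ((0.5, 0.5, 1.0, 1.0), "4")]

markdown = to_markdown(cells, rows, columns)
print(markdown)
```

This sketch places a merged cell only in the single grid slot containing its center; a fuller pipeline would need to expand it across every row and column it overlaps.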

β€”
21
8

Qwen2.5 VL 7B Surg CholecT50

Description: Qwen2.5-VL-7B-Surg-CholecT50 is a multimodal large language model fine-tuned on the CholecT50 dataset of laparoscopic cholecystectomy procedures to recognize and describe surgical actions, instruments, and targets in endoscopic video frames. Qwen2.5-VL-7B-Surg-CholecT50 was developed by NVIDIA for research in surgical workflow analysis and fine-grained action recognition.

Use Case: Primarily intended for surgical researchers, healthcare AI developers, or academic institutions exploring laparoscopic action recognition and surgical workflow analytics.

Reference(s):
- Twinanda, A. P., Shehata, S., Mutter, D., Marescaux, J., De Mathelin, M., & Padoy, N. (2016). Endonet: a deep architecture for recognition tasks on laparoscopic videos.
- C.I. Nwoye, N. Padoy. Data Splits and Metrics for Benchmarking Methods on Surgical Action Triplet Datasets. arXiv:2204.05235.

Model Architecture:
Architecture Type: Transformer-based Large Language Model with a Vision Adapter
This model was developed based on Qwen2.5-VL-7B
Number of model parameters: ~7.0×10^9

Input:
Input Type(s): Image (endoscopic frame), (Optional) Text Prompt
Input Format: Red, Green, Blue (RGB), String
Input Parameters: Image: Two-Dimensional (2D) laparoscopic image frames (extracted at 1 fps), Text: One-Dimensional (1D)
Other Properties Related to Input: Recommended resolution: 480p or higher. Minimal resizing (e.g., 224×224) if required by the model's vision encoder. Token limit for text context: up to ~4k tokens.

Output:
Output Type(s): Text
Output Format: String
Output Parameters: One-Dimensional (1D)
Other Properties Related to Output: Returns natural language descriptions of recognized instruments, actions, and targets; no bounding boxes or segmentation maps by default. Downstream systems may parse the text output for analytics. NVIDIA GPUs can significantly reduce inference time.

Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.

Software Integration:
Runtime Engine(s): Any standard LLM-serving solution (e.g., PyTorch with Triton Inference Server)
Supported Hardware Microarchitecture Compatibility:
- NVIDIA Ampere (e.g., A100)
- NVIDIA Hopper (e.g., H100)

The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.

This model may be used with the MONAI Surgical Agent Framework.

Image Training Data Size: Less than a Million Images
Data Collection Method by dataset: Hybrid: Automated, Human
Properties (Quantity, Dataset Descriptions, Sensor(s)): ~50 laparoscopic cholecystectomy procedures; frames extracted at 1 fps (~100K training frames); annotations include ` ` triplets.

Data Collection Method by dataset: Hybrid: Automated, Human
Properties (Quantity, Dataset Descriptions, Sensor(s)): ~1–2K frames for testing (approx).

Link: CholecT50 (dedicated set never seen during training)
Benchmark Score F1-score (Triplets): Instrument: 0.81, Verb: 0.64, Target (Anatomy): 0.60
Data Collection Method by dataset: Hybrid: Automated, Human
Properties (Quantity, Dataset Descriptions, Sensor(s)): ~1–2K frames for final evaluation.

Ethical Considerations: NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please make sure you have proper rights and permissions for all input image and video content; if image or video includes people, personal health information, or intellectual property, the image or video generated will not blur or maintain proportions of image subjects included.

Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns here.
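The card reports separate F1 scores for the instrument, verb, and target components of each triplet. As a rough illustration of how such per-component scores can be computed, the sketch below uses micro-averaged F1 over per-frame label sets. The card does not state which averaging scheme or parsing step was used, so the micro-averaging, the function names (`f1_score`, `component_f1`), and the assumption that the model's text output has already been parsed into `(instrument, verb, target)` tuples are all illustrative.

```python
from typing import Dict, List, Sequence, Set, Tuple

Triplet = Tuple[str, str, str]  # assumed order: (instrument, verb, target)


def f1_score(predicted: Sequence[Set[str]], expected: Sequence[Set[str]]) -> float:
    """Micro-averaged F1 over per-frame label sets."""
    tp = fp = fn = 0
    for pred, gold in zip(predicted, expected):
        tp += len(pred & gold)   # labels predicted and present
        fp += len(pred - gold)   # labels predicted but absent
        fn += len(gold - pred)   # labels present but missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def component_f1(pred_frames: Sequence[List[Triplet]],
                 gold_frames: Sequence[List[Triplet]]) -> Dict[str, float]:
    """F1 per triplet component, scoring each component independently."""
    names = ("instrument", "verb", "target")
    return {
        name: f1_score([{t[k] for t in frame} for frame in pred_frames],
                       [{t[k] for t in frame} for frame in gold_frames])
        for k, name in enumerate(names)
    }


# Hypothetical parsed predictions and annotations for two frames.
gold_frames = [[("grasper", "retract", "gallbladder")],
               [("hook", "dissect", "cystic_duct")]]
pred_frames = [[("grasper", "grasp", "gallbladder")],
               [("hook", "dissect", "cystic_duct")]]

result = component_f1(pred_frames, gold_frames)
print(result)
```

Scoring components independently explains how the instrument score can sit well above the verb and target scores: a prediction can name the right tool while describing the wrong action.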

NaNK
β€”
21
7

Nemoretriever Graphic Elements V1

β€”
15
5

Llama-3.1-Nemotron-70B-Instruct

NaNK
llama3.1
14
567

NVIDIA-Nemotron-Parse-v1.1-Lite

β€”
13
1