HuggingFaceTB

✓ Verified · AI Startup

Hugging Face's technical team, platform leaders

80 models

SmolLM2-135M

Transformers · Apache 2.0 · English

llama
733,084
141

SmolLM2-360M-Instruct

Transformers · Apache 2.0 · English · text-generation · safetensors, onnx, transformers.js · base model: HuggingFaceTB/SmolLM2-360M

llama
393,195
154

SmolLM-135M

Transformers · Apache 2.0 · English · dataset: HuggingFaceTB/smollm-corpus

llama
377,386
230

SmolLM2-360M

Transformers · Apache 2.0 · English

llama
215,074
72

SmolVLM-256M-Instruct

Transformers · Apache 2.0 · English · image-text-to-text · datasets: HuggingFaceM4/the_cauldron, HuggingFaceM4/Docmatix · base models: HuggingFaceTB/SmolLM2-135M-Instruct, google/siglip-base-patch16-512

license:apache-2.0
197,333
297

SmolLM2-135M-Instruct

Transformers · Apache 2.0 · English · text-generation · safetensors, onnx, transformers.js · base model: HuggingFaceTB/SmolLM2-135M

llama
167,293
264

SmolVLM2-2.2B-Instruct

Transformers · Apache 2.0 · English · image-text-to-text, video-text-to-text · datasets: HuggingFaceM4/the_cauldron, HuggingFaceM4/Docmatix, lmms-lab/LLaVA-OneVision-Data, lmms-lab/M4-Instruct-Data, HuggingFaceFV/finevideo, MAmmoTH-VL/MAmmoTH-VL-Instruct-12M, lmms-lab/LLaVA-Video-178K, orrzohar/Video-STaR, Mutonix/Vript, TIGER-Lab/VISTA-400K, Enxin/MovieChat-1K_train, ShareGPT4Video/ShareGPT4Video · base model: HuggingFaceTB/SmolVLM-Ins

license:apache-2.0
166,702
281

SmolVLM2-500M-Video-Instruct

Transformers · Apache 2.0 · English · image-text-to-text · datasets: HuggingFaceM4/the_cauldron, HuggingFaceM4/Docmatix, lmms-lab/LLaVA-OneVision-Data, lmms-lab/M4-Instruct-Data, HuggingFaceFV/finevideo, MAmmoTH-VL/MAmmoTH-VL-Instruct-12M, lmms-lab/LLaVA-Video-178K, orrzohar/Video-STaR, Mutonix/Vript, TIGER-Lab/VISTA-400K, Enxin/MovieChat-1K_train, ShareGPT4Video/ShareGPT4Video · base model: HuggingFaceTB/SmolVLM-500M-Instruct

license:apache-2.0
125,449
103

SmolVLM-Instruct

Transformers · Apache 2.0 · English · image-text-to-text · datasets: HuggingFaceM4/the_cauldron, HuggingFaceM4/Docmatix · base models: HuggingFaceTB/SmolLM2-1.7B-Instruct, google/siglip-so400m-patch14-384

license:apache-2.0
91,126
559

SmolLM-360M

A model based on the Transformers library, licensed under Apache 2.0, designed for natural language processing tasks.

llama
73,830
67

SmolVLM2-256M-Video-Instruct

SmolVLM2-256M-Video is a lightweight multimodal model designed to analyze video content. The model processes videos, images, and text inputs to generate text outputs, whether answering questions about media files, comparing visual content, or transcribing text from images. Despite its compact size, it requires only 1.38 GB of GPU RAM for video inference. This efficiency makes it particularly well-suited for on-device applications that require domain-specific fine-tuning and where computational resources may be limited.

Model Summary
- Developed by: Hugging Face 🤗
- Model type: Multi-modal model (image/multi-image/video/text)
- Language(s) (NLP): English
- License: Apache 2.0
- Architecture: Based on Idefics3 (see technical summary)
- Demo: Video Highlight Generator
- Blog: Blog post

SmolVLM2 can be used for inference on multimodal (video/image/text) tasks where the input consists of text queries along with video or one or more images. Text and media files can be interleaved arbitrarily, enabling tasks like captioning, visual question answering, and storytelling based on visual content. The model does not support image or video generation. To fine-tune SmolVLM2 on a specific task, you can follow the fine-tuning tutorial.

We evaluated the performance of the SmolVLM2 family on the following scientific benchmarks:

| Size | Video-MME | MLVU | MVBench |
|------|-----------|------|---------|
| 2.2B | 52.1 | 55.2 | 46.27 |
| 500M | 42.2 | 47.3 | 39.73 |
| 256M | 33.7 | 40.6 | 32.7 |

You can use transformers to load, infer, and fine-tune SmolVLM. Make sure you have num2words, flash-attn, and the latest transformers installed. You preprocess your inputs using chat templates and pass them directly to the model. To use SmolVLM2 for video inference, make sure you have decord installed. You can interleave multiple media with text using chat templates.
SmolVLM is not intended for high-stakes scenarios or critical decision-making processes that affect an individual's well-being or livelihood. The model may produce content that appears factual but may not be accurate. Misuse includes, but is not limited to:
- Prohibited Uses:
  - Evaluating or scoring individuals (e.g., in employment, education, credit)
  - Critical automated decision-making
  - Generating unreliable factual content
- Malicious Activities:
  - Spam generation
  - Disinformation campaigns
  - Harassment or abuse
  - Unauthorized surveillance

SmolVLM2 is built upon SigLIP as the image encoder and SmolLM2 as the text decoder. We release the SmolVLM2 checkpoints under the Apache 2.0 license.

Citation information
You can cite us in the following way:

Training Data
SmolVLM2 used 3.3M samples for training, originally from ten different datasets: LLaVA-OneVision, M4-Instruct, MAmmoTH, LLaVA-Video-178K, FineVideo, Video-STaR, Vript, VISTA-400K, MovieChat, and ShareGPT4Video. In the following tables we give a general overview of the samples across modalities and the source of those samples.
| Data Type | Percentage |
|--------------|------------|
| Image | 34.4% |
| Text | 20.2% |
| Video | 33.0% |
| Multi-image | 12.3% |

Text Datasets

| Dataset | Percentage |
|--------------------------------------------|------------|
| llava-onevision/magpie_pro_ft3_80b_mt | 6.8% |
| llava-onevision/magpie_pro_ft3_80b_tt | 6.8% |
| llava-onevision/magpie_pro_qwen2_72b_tt | 5.8% |
| llava-onevision/mathqa | 0.9% |

Multi-image Datasets

| Dataset | Percentage |
|--------------------------------------------|------------|
| m4-instruct-data/m4_instruct_multiimage | 10.4% |
| mammoth/multiimage-cap6 | 1.9% |

Image Datasets

| Dataset | Percentage |
|--------------------------------------------|------------|
| llava-onevision/other | 17.4% |
| llava-onevision/vision_flan | 3.9% |
| llava-onevision/mavis_math_metagen | 2.6% |
| llava-onevision/mavis_math_rule_geo | 2.5% |
| llava-onevision/sharegpt4o | 1.7% |
| llava-onevision/sharegpt4v_coco | 1.5% |
| llava-onevision/image_textualization | 1.3% |
| llava-onevision/sharegpt4v_llava | 0.9% |
| llava-onevision/mapqa | 0.9% |
| llava-onevision/qa | 0.8% |
| llava-onevision/textocr | 0.8% |

Video Datasets

| Dataset | Percentage |
|--------------------------------------------|------------|
| llava-video-178k/1-2m | 7.3% |
| llava-video-178k/2-3m | 7.0% |
| other-video/combined | 5.7% |
| llava-video-178k/hound | 4.4% |
| llava-video-178k/0-30s | 2.4% |
| video-star/starb | 2.2% |
| vista-400k/combined | 2.2% |
| vript/long | 1.0% |
| ShareGPT4Video/all | 0.8% |
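The interleaved chat-template input described in the card can be sketched as plain Python data. This is a minimal sketch: the `"type"`/`"text"`/`"path"` keys follow the transformers multimodal chat-template convention, and the file paths are hypothetical placeholders; actually running inference would additionally require downloading the model weights.

```python
# A user turn whose content mixes one video with a text query.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "video", "path": "demo.mp4"},           # hypothetical media file
            {"type": "text", "text": "Describe this video."},
        ],
    }
]

# Media and text entries can be interleaved arbitrarily, e.g. comparing images:
comparison = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Compare these two images:"},
            {"type": "image", "path": "a.png"},              # hypothetical media file
            {"type": "image", "path": "b.png"},
        ],
    }
]

def media_types(msgs):
    """Collect content-part types in order, to make the interleaving visible."""
    return [part["type"] for m in msgs for part in m["content"]]

print(media_types(messages))    # ['video', 'text']
print(media_types(comparison))  # ['text', 'image', 'image']
```

In actual use, such a `messages` list is handed to the processor's `apply_chat_template`, which tokenizes the text and loads the referenced media in one step.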

license:apache-2.0
66,831
82

SmolLM3-3B

Transformers · Apache 2.0 · English, French, Spanish, Italian, Portuguese, Chinese, Arabic, Russian · base model: HuggingFaceTB/SmolLM3-3B-Base

license:apache-2.0
57,554
797

SmolVLM-500M-Instruct

Transformers · Apache 2.0 · English · image-text-to-text · datasets: HuggingFaceM4/the_cauldron, HuggingFaceM4/Docmatix · base models: HuggingFaceTB/SmolLM2-360M-Instruct, google/siglip-base-patch16-512

license:apache-2.0
37,830
182

SmolLM2-1.7B-Instruct

llama
36,139
682

SmolVLM-256M-Base

26,175
17

SmolLM-1.7B

This model is based on the transformers library and is licensed under Apache 2.0.

llama
21,810
176

SmolLM-135M-Instruct

This model is based on HuggingFaceTB/SmolLM-135M and is licensed under Apache 2.0.

llama
18,054
124

SmolLM2-1.7B

1. Model Summary
2. Evaluation
3. Limitations
4. Training
5. License
6. Citation

SmolLM2 is a family of compact language models available in three sizes: 135M, 360M, and 1.7B parameters. They are capable of solving a wide range of tasks while being lightweight enough to run on-device. More details in our paper: https://arxiv.org/abs/2502.02737v1

The 1.7B variant demonstrates significant advances over its predecessor SmolLM1-1.7B, particularly in instruction following, knowledge, reasoning, and mathematics. It was trained on 11 trillion tokens using a diverse dataset combination: FineWeb-Edu, DCLM, and The Stack, along with new mathematics and coding datasets that we curated and will release soon.

We developed the instruct version through supervised fine-tuning (SFT) using a combination of public datasets and our own curated datasets. We then applied Direct Preference Optimization (DPO) using UltraFeedback. The instruct model additionally supports tasks such as text rewriting, summarization, and function calling thanks to datasets developed by Argilla such as Synth-APIGen-v0.1. You can find the SFT dataset here: https://huggingface.co/datasets/HuggingFaceTB/smoltalk and the fine-tuning code in the alignment handbook.

For more details refer to: https://github.com/huggingface/smollm. You will find pre-training, post-training, evaluation, and local inference code, including running the model on CPU/GPU/multi-GPU in full precision.

In this section, we report the evaluation results of SmolLM2. All evaluations are zero-shot unless stated otherwise, and we use lighteval to run them.
| Metric | SmolLM2-1.7B | Llama-1B | Qwen2.5-1.5B | SmolLM1-1.7B |
|------------------|--------------|-------------|---------------|--------------|
| HellaSwag | 68.7 | 61.2 | 66.4 | 62.9 |
| ARC (Average) | 60.5 | 49.2 | 58.5 | 59.9 |
| PIQA | 77.6 | 74.8 | 76.1 | 76.0 |
| MMLU-Pro (MCF) | 19.4 | 11.7 | 13.7 | 10.8 |
| CommonsenseQA | 43.6 | 41.2 | 34.1 | 38.0 |
| TriviaQA | 36.7 | 28.1 | 20.9 | 22.5 |
| Winogrande | 59.4 | 57.8 | 59.3 | 54.7 |
| OpenBookQA | 42.2 | 38.4 | 40.0 | 42.4 |
| GSM8K (5-shot) | 31.0 | 7.2 | 61.3 | 5.5 |

| Metric | SmolLM2-1.7B-Instruct | Llama-1B-Instruct | Qwen2.5-1.5B-Instruct | SmolLM1-1.7B-Instruct |
|:-----------------------------|:---------------------:|:-----------------:|:----------------------:|:----------------------:|
| IFEval (Average prompt/inst) | 56.7 | 53.5 | 47.4 | 23.1 |
| MT-Bench | 6.13 | 5.48 | 6.52 | 4.33 |
| OpenRewrite-Eval (micro_avg RougeL) | 44.9 | 39.2 | 46.9 | n/a |
| HellaSwag | 66.1 | 56.1 | 60.9 | 55.5 |
| ARC (Average) | 51.7 | 41.6 | 46.2 | 43.7 |
| PIQA | 74.4 | 72.3 | 73.2 | 71.6 |
| MMLU-Pro (MCF) | 19.3 | 12.7 | 24.2 | 11.7 |
| BBH (3-shot) | 32.2 | 27.6 | 35.3 | 25.7 |
| GSM8K (5-shot) | 48.2 | 26.8 | 42.8 | 4.62 |

SmolLM2 models primarily understand and generate content in English. They can produce text on a variety of topics, but the generated content may not always be factually accurate, logically consistent, or free from biases present in the training data. These models should be used as assistive tools rather than definitive sources of information. Users should always verify important information and critically evaluate any generated content.

- Architecture: Transformer decoder
- Pretraining tokens: 11T
- Precision: bfloat16

llama
11,410
137

SmolLM3-3B-Base

1. Model Summary
2. How to use
3. Evaluation
4. Training
5. Limitations
6. License

SmolLM3 is a 3B-parameter language model designed to push the boundaries of small models. It supports 6 languages, advanced reasoning, and long context. SmolLM3 is a fully open model that offers strong performance at the 3B–4B scale.

SmolLM3-3B-Base is the base model after pretraining; you can find the instruct model at SmolLM3-3B. The model is a decoder-only transformer using GQA and NoPE. It was pretrained on 11.2T tokens with a staged curriculum of web, code, math, and reasoning data. Post-training included midtraining on 140B reasoning tokens, followed by supervised fine-tuning and alignment via Anchored Preference Optimization (APO).

Key features
- Instruct model optimized for hybrid reasoning
- Fully open model: open weights + full training details, including public data mixture and training configs
- Long context: trained on 64k context and supports up to 128k tokens using YaRN extrapolation
- Multilingual: 6 languages natively supported (English, French, Spanish, German, Italian, and Portuguese)

For more details refer to our blog post: https://hf.co/blog/smollm3

How to use

The modeling code for SmolLM3 is available in transformers `v4.53.0`, so make sure to upgrade your transformers version. You can also load the model with the latest `vllm`, which uses transformers as a backend. For local inference, you can use `llama.cpp`, `ONNX`, `MLX`, and `MLC`. You can find quantized checkpoints in this collection (https://huggingface.co/collections/HuggingFaceTB/smollm3-686d33c1fdffe8e635317e23).

The current `config.json` is set for a context length of up to 65,536 tokens. To handle longer inputs (128k or 256k), we utilize YaRN; you can change `max_position_embeddings` and `rope_scaling` accordingly.

In this section, we report the evaluation results of the SmolLM3 model. All evaluations are zero-shot unless stated otherwise, and we use lighteval to run them.
We highlight the best score in bold and underline the second-best score.

English benchmarks

Note: All evaluations are zero-shot unless stated otherwise. For the Ruler 64k evaluation, we apply YaRN to the Qwen models with 32k context to extrapolate the context length.

| Category | Metric | SmolLM3-3B | Qwen2.5-3B | Llama3-3.2B | Qwen3-1.7B-Base | Qwen3-4B-Base |
|---------|--------|------------|------------|--------------|------------------|---------------|
| Reasoning & Commonsense | HellaSwag | 76.15 | 74.19 | 75.52 | 60.52 | 74.37 |
| | ARC-CF (Average) | 65.61 | 59.81 | 58.58 | 55.88 | 62.11 |
| | Winogrande | 58.88 | 61.41 | 58.72 | 57.06 | 59.59 |
| | CommonsenseQA | 55.28 | 49.14 | 60.60 | 48.98 | 52.99 |
| Knowledge & Understanding | MMLU-CF (Average) | 44.13 | 42.93 | 41.32 | 39.11 | 47.65 |
| | MMLU Pro CF | 19.61 | 16.66 | 16.42 | 18.04 | 24.92 |
| | MMLU Pro MCF | 32.70 | 31.32 | 25.07 | 30.39 | 41.07 |
| | PIQA | 78.89 | 78.35 | 78.51 | 75.35 | 77.58 |
| | OpenBookQA | 40.60 | 40.20 | 42.00 | 36.40 | 42.40 |
| | BoolQ | 78.99 | 73.61 | 75.33 | 74.46 | 74.28 |
| Coding & math | HumanEval+ | 30.48 | 34.14 | 25.00 | 43.29 | 54.87 |
| | MBPP+ | 52.91 | 52.11 | 38.88 | 59.25 | 63.75 |
| | MATH (4-shot) | 46.10 | 40.10 | 7.44 | 41.64 | 51.20 |
| | GSM8k (5-shot) | 67.63 | 70.13 | 25.92 | 65.88 | 74.14 |
| Long context | Ruler 32k | 76.35 | 75.93 | 77.58 | 70.63 | 83.98 |
| | Ruler 64k | 67.85 | 64.90 | 72.93 | 57.18 | 60.29 |
| | Ruler 128k | 61.03 | 62.23 | 71.30 | 43.03 | 47.23 |

Multilingual benchmarks (main supported languages)

| Language | Metric | SmolLM3 3B Base | Qwen2.5-3B | Llama3.2 3B | Qwen3 1.7B Base | Qwen3 4B Base |
|---------|--------|------------------|------------|--------------|------------------|---------------|
| French | MLMM HellaSwag | 63.94 | 57.47 | 57.66 | 51.26 | 61.00 |
| | Belebele | 51.00 | 51.55 | 49.22 | 49.44 | 55.00 |
| | Global MMLU (CF) | 38.37 | 34.22 | 33.71 | 34.94 | 41.80 |
| | Flores-200 (5-shot) | 62.85 | 61.38 | 62.89 | 58.68 | 65.76 |
| Spanish | MLMM HellaSwag | 65.85 | 58.25 | 59.39 | 52.40 | 61.85 |
| | Belebele | 47.00 | 48.88 | 47.00 | 47.56 | 50.33 |
| | Global MMLU (CF) | 38.51 | 35.84 | 35.60 | 34.79 | 41.22 |
| | Flores-200 (5-shot) | 48.25 | 50.00 | 44.45 | 46.93 | 50.16 |
| German | MLMM HellaSwag | 59.56 | 49.99 | 53.19 | 46.10 | 56.43 |
| | Belebele | 48.44 | 47.88 | 46.22 | 48.00 | 53.44 |
| | Global MMLU (CF) | 35.10 | 33.19 | 32.60 | 32.73 | 38.70 |
| | Flores-200 (5-shot) | 56.60 | 50.63 | 54.95 | 52.58 | 50.48 |
| Italian | MLMM HellaSwag | 62.49 | 53.21 | 54.96 | 48.72 | 58.76 |
| | Belebele | 46.44 | 44.77 | 43.88 | 44.00 | 48.78 |
| | Global MMLU (CF) | 36.99 | 33.91 | 32.79 | 35.37 | 39.26 |
| | Flores-200 (5-shot) | 52.65 | 54.87 | 48.83 | 48.37 | 49.11 |
| Portuguese | MLMM HellaSwag | 63.22 | 57.38 | 56.84 | 50.73 | 59.89 |
| | Belebele | 47.67 | 49.22 | 45.00 | 44.00 | 50.00 |
| | Global MMLU (CF) | 36.88 | 34.72 | 33.05 | 35.26 | 40.66 |
| | Flores-200 (5-shot) | 60.93 | 57.68 | 54.28 | 56.58 | 63.43 |

The model has also been trained on Arabic (standard), Chinese, and Russian data, but has seen fewer tokens in these languages compared to the six above. We report the performance on these languages for information.

| Language | Metric | SmolLM3 3B Base | Qwen2.5-3B | Llama3.2 3B | Qwen3 1.7B Base | Qwen3 4B Base |
|---------|--------|------------------|------------|--------------|------------------|---------------|
| Arabic | Belebele | 40.22 | 44.22 | 45.33 | 42.33 | 51.78 |
| | Global MMLU (CF) | 28.57 | 28.81 | 27.67 | 29.37 | 31.85 |
| | Flores-200 (5-shot) | 40.22 | 39.44 | 44.43 | 35.82 | 39.76 |
| Chinese | Belebele | 43.78 | 44.56 | 49.56 | 48.78 | 53.22 |
| | Global MMLU (CF) | 36.16 | 33.79 | 39.57 | 38.56 | 44.55 |
| | Flores-200 (5-shot) | 29.17 | 33.21 | 31.89 | 25.70 | 32.50 |
| Russian | Belebele | 47.44 | 45.89 | 47.44 | 45.22 | 51.44 |
| | Global MMLU (CF) | 36.51 | 32.47 | 34.52 | 34.83 | 38.80 |
| | Flores-200 (5-shot) | 47.13 | 48.74 | 50.74 | 54.70 | 60.53 |

No Extended Thinking

Evaluation results of non-reasoning models and reasoning models in no-thinking mode. We highlight the best and second-best scores in bold.

| Category | Metric | SmolLM3-3B | Qwen2.5-3B | Llama3.1-3B | Qwen3-1.7B | Qwen3-4B |
|---------|--------|------------|------------|-------------|------------|----------|
| High school math competition | AIME 2025 | 9.3 | 2.9 | 0.3 | 8.0 | 17.1 |
| Math problem-solving | GSM-Plus | 72.8 | 74.1 | 59.2 | 68.3 | 82.1 |
| Competitive programming | LiveCodeBench v4 | 15.2 | 10.5 | 3.4 | 15.0 | 24.9 |
| Graduate-level reasoning | GPQA Diamond | 35.7 | 32.2 | 29.4 | 31.8 | 44.4 |
| Instruction following | IFEval | 76.7 | 65.6 | 71.6 | 74.0 | 68.9 |
| Alignment | MixEval Hard | 26.9 | 27.6 | 24.9 | 24.3 | 31.6 |
| Tool Calling | BFCL | 92.3 | - | 92.3 | 89.5 | 95.0 |
| Multilingual Q&A | Global MMLU | 53.5 | 50.54 | 46.8 | 49.5 | 65.1 |

Extended Thinking

Evaluation results in reasoning mode for SmolLM3 and Qwen3 models:

| Category | Metric | SmolLM3-3B | Qwen3-1.7B | Qwen3-4B |
|---------|--------|------------|------------|----------|
| High school math competition | AIME 2025 | 36.7 | 30.7 | 58.8 |
| Math problem-solving | GSM-Plus | 83.4 | 79.4 | 88.2 |
| Competitive programming | LiveCodeBench v4 | 30.0 | 34.4 | 52.9 |
| Graduate-level reasoning | GPQA Diamond | 41.7 | 39.9 | 55.3 |
| Instruction following | IFEval | 71.2 | 74.2 | 85.4 |
| Alignment | MixEval Hard | 30.8 | 33.9 | 38.0 |
| Tool Calling | BFCL | 88.8 | 88.8 | 95.5 |
| Multilingual Q&A | Global MMLU | 64.1 | 62.3 | 73.3 |

Training
- Architecture: Transformer decoder
- Pretraining tokens: 11T
- Precision: bfloat16
- GPUs: 384 H100
- Training framework: nanotron
- Data processing framework: datatrove
- Evaluation framework: lighteval
- Post-training framework: TRL

Open resources

Here is an infographic with all the training details.
- The datasets used for pretraining can be found in this collection; those used in mid-training and post-training will be released in the following weeks
- The training and evaluation configs and code can be found in the huggingface/smollm repository
- The training intermediate checkpoints are available at HuggingFaceTB/SmolLM3-3B-checkpoints

SmolLM3 can produce text on a variety of topics, but the generated content may not always be factually accurate, logically consistent, or free from biases present in the training data. These models should be used as assistive tools rather than definitive sources of information. Users should always verify important information and critically evaluate any generated content.
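The YaRN change the card describes amounts to editing two fields of `config.json`. A minimal sketch, assuming a doubling from the native 65,536-token context to 131,072 (the factor 2.0 is an assumption; scale it proportionally for longer contexts such as 256k):

```json
{
  "max_position_embeddings": 131072,
  "rope_scaling": {
    "rope_type": "yarn",
    "factor": 2.0,
    "original_max_position_embeddings": 65536
  }
}
```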

license:apache-2.0
11,316
134

SmolLM-360M-Instruct

License: Apache 2.0. Base model: HuggingFaceTB/SmolLM-360M.

llama
7,687
83

SmolLM2-360M-Instruct-GGUF

ngxson/SmolLM2-360M-Instruct-Q8_0-GGUF

This model was converted to GGUF format from `HuggingFaceTB/SmolLM2-360M-Instruct` using llama.cpp via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.

Use with llama.cpp

Step 1: Install llama.cpp through brew (works on Mac and Linux). Note: you can also use this checkpoint directly through the usage steps listed in the llama.cpp repo.

Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag along with other hardware-specific flags (for example, `LLAMA_CUDA=1` for NVIDIA GPUs on Linux).

llama-cpp
7,635
36

SmolLM-1.7B-Instruct

This model is based on HuggingFaceTB/SmolLM-1.7B and is licensed under Apache 2.0.

llama
6,401
117

SmolLM3-3B-checkpoints

We are releasing intermediate checkpoints of SmolLM3 to enable further research. For more details, check the SmolLM GitHub repo with the end-to-end training and evaluation code:
- ✓ Pretraining scripts (nanotron)
- ✓ Post-training code: SFT + APO (TRL/alignment-handbook)
- ✓ Evaluation scripts to reproduce all reported metrics

We release checkpoints every 40,000 steps, which equals 94.4B tokens. The GBS (global batch size) in tokens for SmolLM3-3B is 2,359,296. To calculate the number of tokens seen at a given step, multiply the step number by the GBS.

- Stage 1: steps 0 to 3,450,000 (86 checkpoints) config
- Stage 2: steps 3,450,000 to 4,200,000 (19 checkpoints) config
- Stage 3: steps 4,200,000 to 4,720,000 (13 checkpoints) config

For the additional 2 stages that extend the context length to 64k, we sample checkpoints every 4,000 steps (9.4B tokens), for a total of 10 checkpoints. We also release checkpoints at every step of our post-training recipe: mid-training, SFT, APO soup, and LC expert.
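The step-to-token conversion described above is just multiplication by the global batch size; a minimal sketch:

```python
# Global batch size in tokens for SmolLM3-3B, as stated in the card.
GBS_TOKENS = 2_359_296

def tokens_at_step(step: int) -> int:
    """Number of training tokens seen after `step` optimizer steps."""
    return step * GBS_TOKENS

# Checkpoints every 40,000 steps are ~94.4B tokens apart, matching the card.
print(tokens_at_step(40_000))                   # 94371840000
print(round(tokens_at_step(40_000) / 1e9, 1))   # 94.4
```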

license:apache-2.0
4,946
22

smollm-135M-instruct-v0.2-Q8_0-GGUF

llama-cpp
4,638
5

SmolLM2-1.7B-Instruct-GGUF

llama-cpp
3,755
45

SmolVLM-Base

license:apache-2.0
1,817
83

finemath-classifier

license:mit
1,445
12

smolvlm-app-config

1,348
0

SmolLM2-1.7B-Instruct-16k

llama
847
9

SmolLM3-3B-ONNX

license:apache-2.0
380
19

SmolVLM 500M Base

269
12

smollm-360M-instruct-add-basics-q0f16-MLC

249
0

SmolLM2-1.7B-intermediate-checkpoints

license:apache-2.0
212
2

smollm-360M-instruct-v0.2-Q8_0-GGUF

llama-cpp
192
12

SmolLM-360M-Instruct-ONNX-fp16

llama
190
0

SmolLM2-135M-intermediate-checkpoints

license:apache-2.0
160
2

smollm2-135M-SFT-Only

llama
142
1

cosmo-1b

llama
132
132

SmolVLM2-2.2B-Base

This is the base model for SmolVLM2-2.2B, a lightweight multimodal model designed to analyze video content. The model processes videos, images, and text inputs to generate text outputs, whether answering questions about media files, comparing visual content, or transcribing text from images. Despite its compact size, requiring only 5.2 GB of GPU RAM for video inference, it delivers robust performance on complex multimodal tasks. This efficiency makes it particularly well-suited for on-device applications where computational resources may be limited.

Model Summary
- Developed by: Hugging Face 🤗
- Model type: Multi-modal model (image/multi-image/video/text)
- Language(s) (NLP): English
- License: Apache 2.0
- Architecture: Based on Idefics3 (see technical summary)
- Demo: Video Highlight Generator
- Blog: Blog post

SmolVLM2 can be used for inference on multimodal (video/image/text) tasks where the input consists of text queries along with video or one or more images. Text and media files can be interleaved arbitrarily, enabling tasks like captioning, visual question answering, and storytelling based on visual content. The model does not support image or video generation. To fine-tune SmolVLM2 on a specific task, you can follow the fine-tuning tutorial.

You can use transformers to load, infer, and fine-tune SmolVLM. Make sure you have num2words, flash-attn, and the latest transformers installed. You preprocess your inputs using chat templates and pass them directly to the model. To use SmolVLM2 for video inference, make sure you have decord installed. You can interleave multiple media with text using chat templates.

SmolVLM is not intended for high-stakes scenarios or critical decision-making processes that affect an individual's well-being or livelihood. The model may produce content that appears factual but may not be accurate.
Misuse includes, but is not limited to:
- Prohibited Uses:
  - Evaluating or scoring individuals (e.g., in employment, education, credit)
  - Critical automated decision-making
  - Generating unreliable factual content
- Malicious Activities:
  - Spam generation
  - Disinformation campaigns
  - Harassment or abuse
  - Unauthorized surveillance

SmolVLM2 is built upon the shape-optimized SigLIP as the image encoder and SmolLM2 as the text decoder. We release the SmolVLM2 checkpoints under the Apache 2.0 license.

Citation information
You can cite us in the following way:

license:apache-2.0
107
9

SmolLM2-1.7B-sft-only

llama
103
0

smollm-360M-instruct-add-basics

llama
90
5

finemath-ablation-infiwebmath

llama
90
0

finemath-ablation-finemath-4plus

llama
88
1

finemath-ablation-finemath-3plus

llama
87
0

stack-edu-classifier-python

86
6

finemath-ablation-infiwebmath-4plus

llama
86
2

finemath-ablation-infiwebmath-3plus

llama
86
0

finemath-ablation-finemath-infimath-3plus

llama
85
0

finemath-ablation-finemath-infimath-4plus

llama
83
2

finemath-ablation-4plus-160B

llama
80
0

finemath-ablation-3plus-160B

llama
80
0

finemath-ablation-owm

This model is part of the 📐 FineMath ablations, where we continue pretraining the Llama-3.2-3B base model on different math datasets for 60B tokens. The model has 3.21B parameters and a 4096-token context length. It was trained on 60B tokens from OpenWebMath, tokenized using the `llama3` tokenizer.

This model was trained on English math data and is not instruction-tuned, making it intended for text completion in English with a focus on math. It is important to note that the primary intended use case of this model is to compare its performance with other models trained under the same conditions. This model is not necessarily the best possible outcome achievable with the given dataset.

We are releasing intermediate checkpoints for this model at intervals of every 10,000 training steps (10B tokens) in separate branches. The naming convention is `10B`. You can load a specific model revision with `transformers` using the `revision` argument, and you can list all the revisions for the model programmatically.

Training
- Architecture: Llama3
- Pretraining steps: 60k
- Pretraining tokens: 60B
- Precision: bfloat16

Software
- nanotron for training
- datatrove for tokenization
- lighteval for evaluation

Evaluation
We used the SmolLM2 setup to evaluate all our ablation models with `lighteval`. You can find the details here: https://github.com/huggingface/smollm/tree/main/evaluation#smollm2-base-models

Limitations
This model was predominantly trained on English math data, potentially limiting its performance in other languages. Furthermore, the model's behavior is influenced by the quality and diversity of its training data, which may include biases and harmful content.
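The per-10B-token branch naming described above can be sketched as plain string formatting. The exact branch names are an assumption extrapolated from the `10B` convention stated in the card:

```python
# Checkpoints every 10,000 steps correspond to 10B tokens, for a 60B-token run.
TOKENS_PER_INTERVAL_B = 10  # billions of tokens between checkpoints

def revision_names(total_b_tokens: int = 60) -> list:
    """Branch names following the `10B` convention from the card (assumed)."""
    return [
        f"{n}B"
        for n in range(TOKENS_PER_INTERVAL_B, total_b_tokens + 1, TOKENS_PER_INTERVAL_B)
    ]

print(revision_names())  # ['10B', '20B', '30B', '40B', '50B', '60B']
```

With `transformers`, one of these names would be passed as the `revision` argument when loading the model.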

llama
79
0

finemath-ablation-fwedu

llama
78
0

FineMath-Llama-3B

llama
72
16

SmolLM2-1.7B-Instruct-Q8-mlx

llama
53
4

python-edu-scorer

license:apache-2.0
51
27

SmolLM2-135M-Instruct-Q8-mlx

llama
51
2

SmolLM2-360M-Instruct-Q8-mlx

llama
49
1

stack-edu-classifier-cpp

47
0

stack-edu-classifier-csharp

44
1

stack-edu-classifier-javascript

44
0

SmolVLM-Synthetic

license:apache-2.0
43
12

stack-edu-classifier-markdown

42
1

stack-edu-classifier-typescript

41
0

stack-edu-classifier-shell

39
0

stack-edu-classifier-php

39
0

stack-edu-classifier-go

38
1

stack-edu-classifier-sql

38
1

stack-edu-classifier-c

38
0

stack-edu-classifier-rust

38
0

stack-edu-classifier-java

38
0

stack-edu-classifier-ruby

37
0

stack-edu-classifier-swift

37
0

SmolVLM-Instruct-DPO

license:apache-2.0
32
22

smollm-1.7B-instruct-v0.2-Q8_0-GGUF

llama-cpp
26
2

smollm-135M-instruct-add-basics-q0f16-MLC

6
0

smollm-1.7B-instruct-add-basics-q4f16_1-MLC

2
2

SmolLM2-nanotron-ckpt

license:apache-2.0
0
7

cosmo2-tokenizer

0
3

SmolLM2-360M-intermediate-checkpoints

license:apache-2.0
0
1