tiiuae
falcon-7b-instruct
---
datasets:
  - tiiuae/falcon-refinedweb
language:
  - en
inference: true
new_version: tiiuae/falcon-11B
widget:
  - text: "Hey Falcon! Any recommendations for my holidays in Abu Dhabi?"
    example_title: "Abu Dhabi Trip"
  - text: "What's the Everett interpretation of quantum mechanics?"
    example_title: "Q/A: Quantum & Answers"
  - text: "Give me a list of the top 10 dive sites you would recommend around the world."
    example_title: "Diving Top 10"
  - text: "Can you tell me more about deep-water soloing?"
    exa
falcon-7b
falcon-40b-instruct
Falcon-H1-0.5B-Base
falcon-mamba-tiny-dev
Falcon3-1B-Instruct
falcon-40b
Falcon3-3B-Instruct
The Falcon3 family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B parameters. Falcon3-3B-Instruct achieves strong results on reasoning, language understanding, instruction following, code, and mathematics tasks. It supports four languages (English, French, Spanish, Portuguese) and a context length of up to 32K.

## Model Details

- Transformer-based causal decoder-only architecture
- 22 decoder blocks
- Grouped Query Attention (GQA) for faster inference: 12 query heads and 4 key-value heads
- Wider head dimension: 256
- High RoPE value to support long-context understanding: 1000042
- Uses SwiGLU and RMSNorm
- 32K context length
- 131K vocab size
- Pruned and healed from Falcon3-7B-Base on only 100 gigatokens of web, code, STEM, high-quality, and multilingual data, using 1024 H100 GPU chips
- Post-trained on 1.2 million samples of STEM, conversational, code, safety, and function-call data
- Supports EN, FR, ES, PT
- Developed by Technology Innovation Institute
- License: TII Falcon-LLM License 2.0
- Model Release Date: December 2024

## Benchmarks

We report our internal pipeline benchmarks in the table below:

- We use the lm-evaluation-harness.
- We report raw scores obtained by applying the chat template and `fewshot_as_multiturn`.
- We use the same batch size across all models.

| Category | Benchmark | Llama-3.2-3B-Instruct | Qwen2.5-3B-Instruct | Nemotron-Mini-4B-Instruct | Falcon3-3B-Instruct |
|---|---|---|---|---|---|
| Reasoning | ARC Challenge (25-shot) | 50.9 | 55.0 | 56.2 | 55.5 |
| Common-sense understanding | PIQA (0-shot) | 74.6 | 73.8 | 74.6 | 75.6 |
| Instruction following | MT-Bench (avg) | 7.1 | 8.0 | 6.7 | 7.2 |

## Useful Links

- View our release blogpost.
- Feel free to join our Discord server if you have any questions or want to interact with our researchers and developers.

## Citation

If the Falcon3 family of models was helpful to your work, feel free to cite us.
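For quick experimentation, here is a minimal inference sketch using the standard `transformers` chat-template API. It is illustrative only, not an official snippet from this card; the `bfloat16` dtype and generation settings are assumptions.

```python
# A minimal inference sketch for Falcon3-3B-Instruct, assuming standard
# Hugging Face transformers usage; settings below are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/Falcon3-3B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 is a common choice here
    device_map="auto",
)

# Build the prompt with the model's chat template, as was done for the
# benchmark numbers reported above.
messages = [{"role": "user", "content": "Explain grouped query attention in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```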
Falcon3-7B-Instruct
falcon-rw-1b
falcon-mamba-7b-instruct
Falcon3-10B-Instruct
falcon-11B
Falcon2-11B is an 11B-parameter causal decoder-only model built by TII and trained on over 5,000B tokens of RefinedWeb enhanced with curated corpora. The model is made available under the TII Falcon License 2.0, a permissive Apache 2.0-based software license that includes an acceptable use policy promoting the responsible use of AI.

🤗 To get started with Falcon (inference, finetuning, quantization, etc.), we recommend reading this great blogpost from HF!

⚠️ This is a raw, pretrained model, which should be further finetuned for most use cases.

💥 Falcon LLMs require PyTorch 2.0 for use with `transformers`! For fast inference with Falcon, check out Text Generation Inference! Read more in this blogpost.

## Model Details

- Developed by: https://www.tii.ae
- Model type: Causal decoder-only
- Language(s) (NLP): English, German, Spanish, French, Italian, Portuguese, Polish, Dutch, Romanian, Czech, Swedish
- License: TII Falcon License 2.0

## Uses

Direct use: research on large language models; as a foundation for further specialization and finetuning for specific use cases (e.g., summarization, text generation, chatbots).

Out-of-scope use: production use without adequate assessment of risks and mitigation; any use cases which may be considered irresponsible or harmful.

## Bias, Risks, and Limitations

Falcon2-11B is trained mostly on English, but also on German, Spanish, French, Italian, Portuguese, Polish, Dutch, Romanian, Czech, and Swedish. It will not generalize appropriately to other languages. Furthermore, as it is trained on a large-scale corpus representative of the web, it will carry the stereotypes and biases commonly encountered online. We recommend that users of Falcon2-11B consider finetuning it for their specific tasks of interest, and that guardrails and appropriate precautions be taken for any production use.

## Training Details

Falcon2-11B was trained on over 5,000B tokens of RefinedWeb, a high-quality filtered and deduplicated web dataset, which we enhanced with curated corpora. Training followed a four-stage strategy: the first three stages focused on increasing the context length, from 2048 to 4096 and finally to 8192 tokens, and the last stage aimed to further enhance performance using only high-quality data. Overall, the data sources included RefinedWeb-English, RefinedWeb-Europe (cs, de, es, fr, it, nl, pl, pt, ro, sv), high-quality technical data, code data, and conversational data extracted from public sources.

| Stage   | Context length | Tokens |
|---------|----------------|--------|
| Stage 1 | 2048           | 4500 B |
| Stage 2 | 4096           | 250 B  |
| Stage 3 | 8192           | 250 B  |
| Stage 4 | 8192           | 500 B  |

The data was tokenized with the Falcon-7B/11B tokenizer.

Falcon2-11B was trained on 1024 A100 40GB GPUs for the majority of the training, using a 3D parallelism strategy (TP=8, PP=1, DP=128) combined with ZeRO and FlashAttention-2.

| Hyperparameter    | Value      | Comment                                                                        |
|-------------------|------------|--------------------------------------------------------------------------------|
| Precision         | `bfloat16` |                                                                                |
| Optimizer         | AdamW      |                                                                                |
| Max learning rate | 3.7e-4     | Following a linear warm-up, then cosine decay to 1.89e-5 across 4500 B tokens  |
| Weight decay      | 1e-1       |                                                                                |
| Z-loss            | 1e-4       |                                                                                |
| Batch size        | Variable   | Batch size was gradually increased during training                             |

## Evaluation

| English Benchmark       | Value |
|-------------------------|-------|
| ARC-Challenge (25-shot) | 59.73 |
| HellaSwag (10-shot)     | 82.91 |
| MMLU (5-shot)           | 58.37 |
| Winogrande (5-shot)     | 78.30 |
| TruthfulQA (0-shot)     | 52.56 |
| GSM8K (5-shot)          | 53.83 |
| ARC-Challenge (0-shot)  | 50.17 |
| ARC-Easy (0-shot)       | 77.78 |
| HellaSwag (0-shot)      | 82.07 |

We thank the leaderboard team at Hugging Face for providing an official evaluation of our model on the leaderboard tasks.

## Technical Specifications

Falcon2-11B is a causal decoder-only model trained on a causal language modeling task (i.e., predicting the next token). The architecture is broadly adapted from the GPT-3 paper (Brown et al., 2020), with the following differences:

- Positional embeddings: rotary (Su et al., 2021);
- Attention: multiquery (Shazeer et al., 2019) and FlashAttention-2 (Dao, 2023);
- Decoder block: parallel attention/MLP.

| Hyperparameter  | Value | Comment                |
|-----------------|-------|------------------------|
| Layers          | 60    |                        |
| `d_model`       | 4096  |                        |
| `head_dim`      | 128   |                        |
| Vocabulary      | 65024 |                        |
| Sequence length | 8192  | During stages 3 and 4  |

Falcon2-11B was trained on AWS SageMaker, using on average 1024 A100 40GB GPUs in 128 p4d instances. It was trained on a custom distributed training codebase, Gigatron, which uses a 3D parallelism approach combined with ZeRO, high-performance Triton kernels, and FlashAttention-2. More details about the distributed training strategy can be found in Almazrouei et al.

## License

Falcon2-11B is licensed under the TII Falcon License 2.0, a permissive Apache 2.0-based software license that includes an acceptable use policy promoting the responsible use of AI.
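For reference, a minimal `transformers` sketch for running the raw pretrained model (PyTorch 2.0+). This assumes standard pipeline usage; the prompt and sampling settings are illustrative, not taken from this card.

```python
# A minimal text-generation sketch for falcon-11B, assuming standard
# Hugging Face transformers usage on PyTorch >= 2.0; not an official snippet.
import torch
from transformers import AutoTokenizer, pipeline

model_id = "tiiuae/falcon-11B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
generator = pipeline(
    "text-generation",
    model=model_id,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,  # matches the training precision reported above
    device_map="auto",
)

# A raw pretrained model continues text rather than following instructions.
prompt = "The Falcon has landed in the Arabian desert, and"
outputs = generator(prompt, max_new_tokens=64, do_sample=True, top_k=10)
print(outputs[0]["generated_text"])
```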
Falcon-H1R-7B
Falcon-H1R-7B-GGUF
falcon-mamba-7b-instruct-Q4_K_M-GGUF
Falcon3-1B-Base
The Falcon3 family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B parameters. This repository contains Falcon3-1B-Base, which achieves strong results on reasoning, language understanding, instruction following, code, and mathematics tasks. Falcon3-1B-Base supports four languages (English, French, Spanish, Portuguese) and a context length of up to 4K. It was pruned in terms of depth, width, number of heads, and embedding channels from a larger 3B Falcon model, and was efficiently trained on only 80 GT using a knowledge distillation objective (see the sketch after this card).

⚠️ This is a raw, pretrained model, which should be further finetuned (SFT, RLHF, continued pretraining, etc.) for most use cases.

## Model Details

- Transformer-based causal decoder-only architecture
- 18 decoder blocks
- Grouped Query Attention (GQA) for faster inference: 8 query heads and 4 key-value heads
- Wider head dimension: 256
- High RoPE value to support long-context understanding: 1000042
- Uses SwiGLU and RMSNorm
- 4K context length
- 131K vocab size
- Pruned and healed using larger Falcon models (3B and 7B respectively) on only 80 gigatokens of web, code, STEM, high-quality, and multilingual data, using 256 H100 GPU chips
- Supports EN, FR, ES, PT
- Developed by Technology Innovation Institute
- License: TII Falcon-LLM License 2.0
- Model Release Date: December 2024

## Benchmarks

We report our internal pipeline benchmarks in the table below:

- We use the lm-evaluation-harness.
- We report raw scores.
- We use the same batch size across all models.

| Category | Benchmark | Llama-3.2-1B | Qwen2.5-1.5B | SmolLM2-1.7B | Falcon3-1B-Base |
|---|---|---|---|---|---|
| Reasoning | ARC Challenge (25-shot) | 40.2 | 54.8 | 54.1 | 48.1 |
| Common-sense understanding | PIQA (0-shot) | 74.5 | 76.0 | 77.5 | 74.5 |

## Useful Links

- View our release blogpost.
- Feel free to join our Discord server if you have any questions or want to interact with our researchers and developers.

## Citation

If the Falcon3 family of models was helpful to your work, feel free to cite us.
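To make the knowledge-distillation objective mentioned above concrete, here is a schematic PyTorch sketch of a standard KD loss (soft teacher targets blended with hard cross-entropy). This is not TII's training code; the temperature `T` and mixing weight `alpha` are illustrative assumptions.

```python
# A schematic knowledge-distillation loss for language models; illustrative
# only, NOT the actual Falcon3 training objective or its hyperparameters.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend a soft KL term against the teacher with the usual hard CE term.

    student_logits, teacher_logits: (batch, seq_len, vocab)
    labels: (batch, seq_len) target token ids, assumed already shifted
            for next-token prediction
    """
    # Soft targets: match the teacher's temperature-softened distribution.
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # standard rescaling so gradient magnitude is comparable across T

    # Hard targets: standard next-token cross-entropy on the data.
    ce = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        labels.view(-1),
    )
    return alpha * kd + (1.0 - alpha) * ce
```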
falcon-mamba-7b
Falcon3-7B-Base
Falcon-H1-7B-Instruct-GGUF
siglino-moe-0.3-0.6B
Falcon3-3B-Base
Falcon-H1-7B-Instruct
Falcon-H1-7B-Base
Falcon-H1-34B-Instruct
Falcon-H1-Tiny-R-90M-GGUF
Falcon-H1-Tiny-R-0.6B-GGUF
Falcon3-10B-Base
Falcon-H1-1.5B-Base
Falcon-Perception
Falcon-H1-1.5B-Instruct
Falcon-H1-0.5B-Instruct-GGUF
Falcon-H1-3B-Base
Falcon-H1-34B-Base
Falcon3-Mamba-7B-Instruct
Falcon-H1-1.5B-Deep-Base
Falcon-H1-1.5B-Deep-Instruct-GGUF
Falcon-H1-34B-Instruct-GGUF
Falcon-H1-1.5B-Instruct-GGUF
Falcon-H1-0.5B-Instruct
Falcon-E-1B-Base
Falcon-H1-1.5B-Deep-Instruct
Falcon-H1-3B-Instruct-GGUF
Falcon-H1-3B-Instruct
falcon-180B
Falcon-E-1B-Instruct
Falcon3-7B-Instruct-1.58bit
Falcon-E-3B-Base
Falcon-H1-Tiny-R-0.6B
falcon-11B-vlm
Falcon-E-3B-Instruct
Falcon3-1B-Instruct-GGUF
falcon-rw-7b
Falcon3-10B-Instruct-GGUF
Falcon3-7B-Instruct-GGUF
Falcon3-1B-Instruct-GPTQ-Int4
Falcon3-3B-Instruct-GGUF
Falcon3-Mamba-7B-Base
falcon-180B-chat
Falcon-H1-Tiny-R-90M
Falcon3-Mamba-7B-Base-GGUF
Falcon3-Mamba-7B-Instruct-GGUF
siglino-70M
Falcon3-1B-Instruct-1.58bit
siglino-30M
viscon-contextual-captioner
Falcon-OCR
Falcon-H1-34B-Instruct-GPTQ-Int4
siglino-0.6B
siglino-moe-0.15-0.6B
falcon-mamba-7b-pre-decay
Falcon3-7B-Instruct-1.58bit-GGUF
falcon-mamba-7b-instruct-4bit
Falcon-H1-34B-Instruct-GPTQ-Int8
Falcon3-10B-Instruct-GPTQ-Int4
Falcon3-3B-Instruct-1.58bit
Falcon3-1B-Instruct-1.58bit-GGUF
falcon-mamba-7b-instruct-Q8_0-GGUF
Falcon3-10B-Instruct-1.58bit
Falcon3-3B-Base-1.58bit
falcon-mamba-7b-Q8_0-GGUF
Falcon3-7B-Instruct-GPTQ-Int8
falcon-mamba-7b-4bit
Falcon3-7B-Instruct-GPTQ-Int4
Falcon-H1-0.5B-Instruct-GPTQ-Int4
Falcon3-10B-Base-1.58bit
0. TL;DR
1. Model Details
2. Training Details
3. Usage
4. Evaluation
5. Citation

## Model Details

- Developed by: https://www.tii.ae
- Model type: Causal decoder-only
- Architecture: Pure transformer (1.58-bit version)
- Language(s) (NLP): Mainly English
- License: TII Falcon License 2.0

## Training Details

The model has been trained following the training strategies from the recent 1-bit LLM HF blogpost and the 1-bit LLM paper. For more details about the training protocol of this model, please refer to the Falcon-3 technical report, section Compression.

## Usage

Currently, to use this model you can rely on either the Hugging Face `transformers` library or the BitNet library. You can also play with the model using the falcon-1.58bit playground (only for the 7B instruct version).

## Evaluation

We report our internal pipeline benchmarks in the following table. Note that the evaluation results are normalized scores from the v2 leaderboard tasks; the results of the original models reported in the blogpost are raw scores.

| Benchmark | Llama3-8B-1.58-100B-tokens | Falcon3-10B-Base-1.58bit |
|-----------|----------------------------|--------------------------|
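As a starting point, here is a minimal loading sketch for the Hugging Face `transformers` option mentioned above; treat it as an assumption-laden example rather than the card's official snippet, since 1.58-bit checkpoints may have additional requirements.

```python
# A minimal loading sketch for the 1.58-bit base model via transformers,
# one of the two options mentioned in the card; settings are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/Falcon3-10B-Base-1.58bit"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# As a base model, it is prompted with plain text to continue.
prompt = "The capital of the United Arab Emirates is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```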