tiiuae

131 models

Listing format: model name · license or architecture tag (where shown) · downloads · likes.

falcon-7b-instruct

---
datasets:
  - tiiuae/falcon-refinedweb
language:
  - en
inference: true
new_version: tiiuae/falcon-11B
widget:
  - text: "Hey Falcon! Any recommendations for my holidays in Abu Dhabi?"
    example_title: "Abu Dhabi Trip"
  - text: "What's the Everett interpretation of quantum mechanics?"
    example_title: "Q/A: Quantum & Answers"
  - text: "Give me a list of the top 10 dive sites you would recommend around the world."
    example_title: "Diving Top 10"
  - text: "Can you tell me more about deep-water soloing?"
    example_title: …

license:apache-2.0 · 83,433 · 1,022
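
The widget prompts above can be reproduced locally with the `transformers` text-generation pipeline. The snippet below is a minimal sketch rather than the card's own usage section; the dtype and device settings are assumptions you may need to adapt to your hardware.

```python
# Minimal inference sketch (assumed usage, not from the card): run one of the
# widget prompts through tiiuae/falcon-7b-instruct with the transformers pipeline.
import torch
from transformers import AutoTokenizer, pipeline

model_id = "tiiuae/falcon-7b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
generator = pipeline(
    "text-generation",
    model=model_id,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,  # assumption: bf16 weights to fit on a single GPU
    device_map="auto",
)

prompt = "Hey Falcon! Any recommendations for my holidays in Abu Dhabi?"
outputs = generator(prompt, max_new_tokens=200, do_sample=True, top_k=10)
print(outputs[0]["generated_text"])
```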

falcon-7b · license:apache-2.0 · 68,219 · 1,096
falcon-40b-instruct · license:apache-2.0 · 40,397 · 1,179
Falcon-H1-0.5B-Base · 22,594 · 16
falcon-mamba-tiny-dev · 17,752 · 1
Falcon3-1B-Instruct · llama · 14,100 · 42
falcon-40b · license:apache-2.0 · 9,223 · 2,433

Falcon3-3B-Instruct

Falcon3 family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B parameters. Falcon3-3B-Instruct achieves strong results on reasoning, language understanding, instruction following, code and mathematics tasks. Falcon3-3B-Instruct supports 4 languages (English, French, Spanish, Portuguese) and a context length of up to 32K.

Model Details
- Transformer-based causal decoder-only architecture
- 22 decoder blocks
- Grouped Query Attention (GQA) for faster inference: 12 query heads and 4 key-value heads
- Wider head dimension: 256
- High RoPE value to support long-context understanding: 1000042
- Uses SwiGLU and RMSNorm
- 32K context length
- 131K vocab size
- Pruned and healed from Falcon3-7B-Base on only 100 Gigatokens of web, code, STEM, high-quality and multilingual data, using 1024 H100 GPU chips
- Post-trained on 1.2 million samples of STEM, conversational, code, safety and function-call data
- Supports EN, FR, ES, PT
- Developed by Technology Innovation Institute
- License: TII Falcon-LLM License 2.0
- Model Release Date: December 2024

Benchmarks
We report our internal pipeline benchmarks in the following table:
- We use lm-evaluation-harness.
- We report raw scores obtained by applying the chat template and fewshot_as_multiturn.
- We use the same batch size across all models.

| Category | Benchmark | Llama-3.2-3B-Instruct | Qwen2.5-3B-Instruct | Nemotron-Mini-4B-Instruct | Falcon3-3B-Instruct |
|---|---|---|---|---|---|
| Reasoning | Arc Challenge (25-shot) | 50.9 | 55.0 | 56.2 | 55.5 |
| CommonSense Understanding | PIQA (0-shot) | 74.6 | 73.8 | 74.6 | 75.6 |
| Instructions following | MT-Bench (avg) | 7.1 | 8.0 | 6.7 | 7.2 |

Useful links
- View our release blogpost.
- Feel free to join our discord server if you have any questions or want to interact with our researchers and developers.

Citation
If the Falcon3 family of models was helpful to your work, feel free to cite it.

llama · 8,687 · 27
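
Since Falcon3-3B-Instruct is an instruct model, prompts are expected to go through the tokenizer's chat template. The sketch below is an assumed, minimal usage example (not taken from the card); the system prompt, dtype and device settings are illustrative.

```python
# Minimal chat sketch (assumed usage): build a prompt with the chat template,
# then generate with Falcon3-3B-Instruct.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/Falcon3-3B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain grouped query attention in two sentences."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```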

Falcon3-7B-Instruct · llama · 8,181 · 73
falcon-rw-1b · license:apache-2.0 · 8,100 · 116
falcon-mamba-7b-instruct · 7,779 · 68
Falcon3-10B-Instruct · llama · 7,457 · 117

falcon-11B

Falcon2-11B is an 11B parameters causal decoder-only model built by TII and trained on over 5,000B tokens of RefinedWeb enhanced with curated corpora. The model is made available under the TII Falcon License 2.0, the permissive Apache 2.0-based software license which includes an acceptable use policy that promotes the responsible use of AI.

🤗 To get started with Falcon (inference, finetuning, quantization, etc.), we recommend reading this great blogpost from HF!

⚠️ This is a raw, pretrained model, which should be further finetuned for most use cases.

💥 Falcon LLMs require PyTorch 2.0 for use with `transformers`! For fast inference with Falcon, check out Text Generation Inference! Read more in this blogpost.

- Developed by: https://www.tii.ae
- Model type: Causal decoder-only
- Language(s) (NLP): English, German, Spanish, French, Italian, Portuguese, Polish, Dutch, Romanian, Czech, Swedish
- License: TII Falcon License 2.0

Intended use: research on large language models; as a foundation for further specialization and finetuning for specific use cases (e.g., summarization, text generation, chatbot, etc.). Out of scope: production use without adequate assessment of risks and mitigation; any use cases which may be considered irresponsible or harmful.

Falcon2-11B is trained mostly on English, but also German, Spanish, French, Italian, Portuguese, Polish, Dutch, Romanian, Czech, Swedish. It will not generalize appropriately to other languages. Furthermore, as it is trained on large-scale corpora representative of the web, it will carry the stereotypes and biases commonly encountered online. We recommend users of Falcon2-11B consider finetuning it for the specific set of tasks of interest, and that guardrails and appropriate precautions be taken for any production use.

Falcon2-11B was trained over 5,000B tokens of RefinedWeb, a high-quality filtered and deduplicated web dataset which we enhanced with curated corpora. It followed a four-stage training strategy. The first three stages focused on increasing the context length, from 2048 to 4096 and finally to 8192 tokens. The last stage aimed to further enhance performance using only high-quality data. Overall, the data sources included RefinedWeb-English, RefinedWeb-Europe (cs, de, es, fr, it, nl, pl, pt, ro, sv), high-quality technical data, code data, and conversational data extracted from public sources.

| Stage | Context length | Tokens |
|---------|----------------|--------|
| Stage 1 | 2048 | 4500 B |
| Stage 2 | 4096 | 250 B |
| Stage 3 | 8192 | 250 B |
| Stage 4 | 8192 | 500 B |

The data was tokenized with the Falcon-7B/11B tokenizer.

Falcon2-11B was trained on 1024 A100 40GB GPUs for the majority of the training, using a 3D parallelism strategy (TP=8, PP=1, DP=128) combined with ZeRO and Flash-Attention 2.

| Hyperparameter | Value | Comment |
|--------------------|------------|-------------------------------------------|
| Precision | `bfloat16` | |
| Optimizer | AdamW | |
| Max learning rate | 3.7e-4 | Following a linear warm-up, then cosine decay to 1.89e-5 across 4500 B tokens |
| Weight decay | 1e-1 | |
| Z-loss | 1e-4 | |
| Batch size | Variable | Batch size was gradually increased during the training |

| English Benchmark | Value |
|-----------------------|-------|
| ARC-Challenge-25shots | 59.73 |
| HellaSwag-10shots | 82.91 |
| MMLU-5shots | 58.37 |
| Winogrande-5shots | 78.30 |
| TruthfulQA-0shot | 52.56 |
| GSM8k-5shots | 53.83 |
| ARC-Challenge-0shot | 50.17 |
| ARC-Easy-0shot | 77.78 |
| Hellaswag-0shot | 82.07 |

We thank the leaderboard team from HuggingFace for providing an official evaluation of our model on the leaderboard tasks.

Falcon2-11B is a causal decoder-only model trained on a causal language modeling task (i.e., predict the next token). The architecture is broadly adapted from the GPT-3 paper (Brown et al., 2020), with the following differences: positional embeddings: rotary (Su et al., 2021); attention: multiquery (Shazeer et al., 2019) and FlashAttention-2 (Dao, 2023); decoder-block: parallel attention/MLP.

| Hyperparameter | Value | Comment |
|-----------------|-------|------------------------|
| Layers | 60 | |
| `d_model` | 4096 | |
| `head_dim` | 128 | |
| Vocabulary | 65024 | |
| Sequence length | 8192 | During stages 3 and 4 |

Falcon2-11B was trained on AWS SageMaker, using on average 1024 A100 40GB GPUs in 128 p4d instances.

Falcon2-11B was trained using a custom distributed training codebase, Gigatron. It uses a 3D parallelism approach combined with ZeRO, high-performance Triton kernels and FlashAttention-2. More details about the distributed training strategy can be found in Almazrouei et al.

Falcon2-11B is licensed under the TII Falcon License 2.0, the permissive Apache 2.0-based software license which includes an acceptable use policy that promotes the responsible use of AI.

5,655 · 217
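
Since Falcon2-11B is a raw, pretrained model, it behaves as a plain next-token predictor rather than an instruction follower. The snippet below is an assumed, minimal completion-style example (not from the card); the prompt, dtype and device settings are illustrative, and PyTorch 2.0+ is required as noted above.

```python
# Completion-style sketch for the raw, pretrained Falcon2-11B (assumed usage).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/falcon-11B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "The Technology Innovation Institute is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.9)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```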

Falcon-H1R-7B · 4,795 · 188
Falcon-H1R-7B-GGUF · 4,580 · 32
falcon-mamba-7b-instruct-Q4_K_M-GGUF · 4,166 · 5

Falcon3-1B-Base

Falcon3 family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B parameters. This repository contains the Falcon3-1B-Base. It achieves strong results on reasoning, language understanding, instruction following, code and mathematics tasks. Falcon3-1B-Base supports 4 languages (English, French, Spanish, Portuguese) and a context length of up to 4K. It was pruned in terms of depth, width, number of heads, and embedding channels from a larger 3B Falcon model, and was efficiently trained on only 80 GT using a knowledge distillation objective.

⚠️ This is a raw, pretrained model, which should be further finetuned using SFT, RLHF, continued pretraining, etc. for most use cases.

Model Details
- Transformer-based causal decoder-only architecture
- 18 decoder blocks
- Grouped Query Attention (GQA) for faster inference: 8 query heads and 4 key-value heads
- Wider head dimension: 256
- High RoPE value to support long-context understanding: 1000042
- Uses SwiGLU and RMSNorm
- 4K context length
- 131K vocab size
- Pruned and healed using larger Falcon models (3B and 7B respectively) on only 80 Gigatokens of web, code, STEM, high-quality and multilingual data, using 256 H100 GPU chips
- Supports EN, FR, ES, PT
- Developed by Technology Innovation Institute
- License: TII Falcon-LLM License 2.0
- Model Release Date: December 2024

Benchmarks
We report our internal pipeline benchmarks in the following table:
- We use lm-evaluation-harness.
- We report raw scores.
- We use the same batch size across all models.

| Category | Benchmark | Llama-3.2-1B | Qwen2.5-1.5B | SmolLM2-1.7B | Falcon3-1B-Base |
|---|---|---|---|---|---|
| Reasoning | Arc Challenge (25-shot) | 40.2 | 54.8 | 54.1 | 48.1 |
| CommonSense Understanding | PIQA (0-shot) | 74.5 | 76.0 | 77.5 | 74.5 |

Useful links
- View our release blogpost.
- Feel free to join our discord server if you have any questions or want to interact with our researchers and developers.

Citation
If the Falcon3 family of models was helpful to your work, feel free to cite it.

llama · 3,889 · 27
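
As a raw base model, Falcon3-1B-Base continues text rather than following instructions, so few-shot prompting that demonstrates the desired pattern tends to work best. The sketch below is an assumed usage example (not from the card); the prompt and generation settings are illustrative, and the model is small enough to try on CPU.

```python
# Base-model sketch (assumed usage): few-shot completion with Falcon3-1B-Base.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/Falcon3-1B-Base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Show the pattern we want the base model to continue (EN/FR are supported languages).
prompt = (
    "English: cat -> French: chat\n"
    "English: dog -> French: chien\n"
    "English: bird -> French:"
)
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=5, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```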

falcon-mamba-7b · 3,526 · 240
Falcon3-7B-Base · llama · 3,081 · 34
Falcon-H1-7B-Instruct-GGUF · 2,848 · 19
siglino-moe-0.3-0.6B · license:apache-2.0 · 2,448 · 6
Falcon3-3B-Base · llama · 2,251 · 15
Falcon-H1-7B-Instruct · 2,239 · 28
Falcon-H1-7B-Base · 2,179 · 5
Falcon-H1-34B-Instruct · 2,107 · 47
Falcon-H1-Tiny-R-90M-GGUF · 2,024 · 7
Falcon-H1-Tiny-R-0.6B-GGUF · 1,844 · 4
Falcon3-10B-Base · llama · 1,803 · 40
Falcon-H1-1.5B-Base · 1,745 · 2
Falcon-Perception · license:apache-2.0 · 1,727 · 38

Falcon-H1-1.5B-Instruct · 1,683 · 13
Falcon-H1-0.5B-Instruct-GGUF · 1,658 · 8
Falcon-H1-3B-Base · 1,496 · 5
Falcon-H1-34B-Base · 1,447 · 13
Falcon3-Mamba-7B-Instruct · 1,435 · 31
Falcon-H1-1.5B-Deep-Base · 1,395 · 4
Falcon-H1-1.5B-Deep-Instruct-GGUF · 1,330 · 13
Falcon-H1-34B-Instruct-GGUF · 1,195 · 15
Falcon-H1-1.5B-Instruct-GGUF · 902 · 9
Falcon-H1-0.5B-Instruct · 853 · 28
Falcon-E-1B-Base · llama · 829 · 6
Falcon-H1-1.5B-Deep-Instruct · 813 · 29
Falcon-H1-3B-Instruct-GGUF · 760 · 12
Falcon-H1-3B-Instruct · 726 · 13
falcon-180B · 705 · 1,149

Falcon-E-1B-Instruct · llama · 660 · 9
Falcon3-7B-Instruct-1.58bit · llama · 652 · 15
Falcon-E-3B-Base · llama · 589 · 12
Falcon-H1-Tiny-R-0.6B · 570 · 14
falcon-11B-vlm · 517 · 47
Falcon-E-3B-Instruct · llama · 514 · 33
Falcon3-1B-Instruct-GGUF · 469 · 14
falcon-rw-7b · license:apache-2.0 · 439 · 17
Falcon3-10B-Instruct-GGUF · 368 · 21
Falcon3-7B-Instruct-GGUF · 352 · 15
Falcon3-1B-Instruct-GPTQ-Int4 · llama · 216 · 0
Falcon3-3B-Instruct-GGUF · 207 · 8
Falcon3-Mamba-7B-Base · 179 · 23
falcon-180B-chat · 175 · 545
Falcon-H1-Tiny-R-90M · 172 · 4

Falcon3-Mamba-7B-Base-GGUF · 168 · 5
Falcon3-Mamba-7B-Instruct-GGUF · 129 · 15
siglino-70M · license:apache-2.0 · 113 · 5
Falcon3-1B-Instruct-1.58bit · llama · 109 · 10
siglino-30M · license:apache-2.0 · 107 · 4
viscon-contextual-captioner · 107 · 0
Falcon-OCR · license:apache-2.0 · 96 · 25
Falcon-H1-34B-Instruct-GPTQ-Int4 · 92 · 2
siglino-0.6B · license:apache-2.0 · 83 · 10
siglino-moe-0.15-0.6B · license:apache-2.0 · 77 · 4
falcon-mamba-7b-pre-decay · 74 · 3

Falcon3-7B-Instruct-1.58bit-GGUF · 68 · 5
falcon-mamba-7b-instruct-4bit · 66 · 12
Falcon-H1-34B-Instruct-GPTQ-Int8 · 65 · 4
Falcon3-10B-Instruct-GPTQ-Int4 · llama · 63 · 0
Falcon3-3B-Instruct-1.58bit · llama · 56 · 11
Falcon3-1B-Instruct-1.58bit-GGUF · 52 · 2
falcon-mamba-7b-instruct-Q8_0-GGUF · 51 · 5
Falcon3-10B-Instruct-1.58bit · llama · 50 · 21
Falcon3-3B-Base-1.58bit · llama · 50 · 2
falcon-mamba-7b-Q8_0-GGUF · 49 · 2
Falcon3-7B-Instruct-GPTQ-Int8 · llama · 47 · 0
falcon-mamba-7b-4bit · 46 · 11
Falcon3-7B-Instruct-GPTQ-Int4 · llama · 46 · 1
Falcon-H1-0.5B-Instruct-GPTQ-Int4 · 44 · 0

Falcon3-10B-Base-1.58bit

Table of contents: 0. TL;DR · 1. Model Details · 2. Training Details · 3. Usage · 4. Evaluation · 5. Citation

- Developed by: https://www.tii.ae
- Model type: Causal decoder-only
- Architecture: Pure-transformer, 1.58bit version
- Language(s) (NLP): Mainly English
- License: TII Falcon License 2.0

The model has been trained following the training strategies from the recent 1-bit LLM HF blogpost and 1-bit LLM paper. For more details about the training protocol of this model, please refer to the Falcon-3 technical report, section Compression.

Currently, to use this model you can rely either on the Hugging Face transformers library or on the BitNet library. You can also play with the model using the falcon-1.58bit playground (only for the 7B instruct version).

Evaluation
We report our internal pipeline benchmarks in the following table. Note: evaluation results are normalized scores from v2 leaderboard tasks; the reported results of the original models in the blogpost are raw scores.

| Benchmark | Llama3-8B-1.58-100B-tokens | Falcon3-10B-Base-1.58bit |
|---|---|---|

llama · 43 · 8
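
The card names `transformers` as one of the two supported loading paths; the sketch below is an assumption-level example of that path, not taken from the card. The 1.58-bit checkpoints may require a recent transformers release with BitNet support, and the prompt and device settings are illustrative.

```python
# Loading sketch via transformers (assumed usage) for the 1.58-bit base checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/Falcon3-10B-Base-1.58bit"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "1.58-bit quantization means that each weight"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```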

Falcon-H1-7B-Instruct-GPTQ-Int8 · 43 · 0
Falcon3-3B-Instruct-GPTQ-Int8 · llama · 42 · 1
Falcon-E-3B-Instruct-GGUF · 41 · 13
Falcon3-3B-Instruct-GPTQ-Int4 · llama · 41 · 0
Falcon-H1-1.5B-Deep-Instruct-GPTQ-Int8 · 40 · 0
Falcon3-10B-Instruct-1.58bit-GGUF · 39 · 4
Falcon3-7B-Base-1.58bit · llama · 39 · 2
Falcon3-1B-Instruct-AWQ · llama · 38 · 0
falcon-mamba-7b-F16-GGUF · 37 · 1
Falcon-H1-3B-Instruct-GPTQ-Int4 · 36 · 0
Falcon3-1B-Instruct-GPTQ-Int8 · llama · 35 · 1
Falcon-H1-1.5B-Instruct-GPTQ-Int8 · 35 · 0
Falcon-E-1B-Instruct-GGUF · 33 · 6

Falcon3-10B-Instruct-AWQ · llama · 33 · 1
Falcon3-3B-Instruct-AWQ · llama · 33 · 0
Falcon-H1-1.5B-Instruct-GPTQ-Int4 · 33 · 0
Falcon-H1-1.5B-Deep-Instruct-GPTQ-Int4 · 33 · 0
Falcon-H1-7B-Instruct-GPTQ-Int4 · 33 · 0
Falcon-H1-Tiny-R-0.6B-pre-GRPO · 32 · 3
Falcon3-10B-Instruct-GPTQ-Int8 · llama · 32 · 2
Falcon3-7B-Instruct-AWQ · llama · 32 · 0
Falcon-H1-0.5B-Instruct-GPTQ-Int8 · 31 · 0
Falcon-H1-3B-Instruct-GPTQ-Int8 · 30 · 0
falcon-mamba-7b-Q4_K_M-GGUF · 28 · 1
falcon-mamba-7b-instruct-BF16-GGUF · 27 · 1
falcon-mamba-7b-BF16-GGUF · 24 · 2
Falcon3-3B-Instruct-1.58bit-GGUF · 22 · 1
falcon-mamba-7b-instruct-F16-GGUF · 16 · 2

Falcon-H1-Tiny-90M-Instruct-GGUF · 2 · 6
visper · license:cc-by-nc-2.0 · 0 · 9
Falcon-H1-Tiny-90M-Instruct · 0 · 8
Falcon-H1-Tiny-Tool-Calling-90M · 0 · 3
Falcon-H1-Tiny-Multilingual-100M-Instruct · 0 · 3
amoe-dense-L · license:apache-2.0 · 0 · 2
amoe-dense-S · license:apache-2.0 · 0 · 2
Falcon-H1-Tiny-90M-Base · 0 · 2
Falcon-H1-Tiny-Coder-90M · 0 · 2
Falcon-H1-Tiny-Coder-90M-GGUF · 0 · 2
Falcon-H1-Tiny-Tool-Calling-90M-GGUF · 0 · 2
Falcon-H1-Tiny-90M-Instruct-pre-DPO · 0 · 2
Dense 500m Arch1 · 0 · 2
amoe-ultrasparse · license:apache-2.0 · 0 · 1
Falcon-H1-Tiny-Multilingual-100M-Base · 0 · 1
Falcon-H1-Tiny-90M-Instruct-Curriculum · 0 · 1
Falcon-H1-Tiny-90M-Instruct-Curriculum-pre-DPO · 0 · 1