Fijik-3b-Instruct
3
3.0B
3 languages
license:apache-2.0
by
Pinkstack
Language Model
OTHER
3B params
New
0 downloads
Early-stage
Edge AI:
Mobile
Laptop
Server
7GB+ RAM
Mobile
Laptop
Server
Quick Summary
AI model with specialized capabilities.
Device Compatibility
Mobile
4-6GB RAM
Laptop
16GB RAM
Server
GPU
Minimum Recommended
3GB+ RAM
Code Examples
What is ittext

# What is it
This is a 1.0 Fijik series with **3 billion** parameters, dense 56 layer transformer LLM based on Qwen2.5, specifically, it was merged using Mergekit to be twice as large as Qwen2.5 1.5B.
After merging, we used a custom dataset mix meant for this model, to improve its performance even more.
- **Step 1 for fine-tuning via unsloth:** SFT on an estimated 20 million tokens. (more or less)
- **Step 2 for the fine-tuning via unsloth:** DPO for 2 epochs for even better instruction following.
After these two steps, we got a powerful model which has less parameters than llama 3.1 8B yet performs just as good if not better, Note that unlike our other recent models, it is not a thinking model, yet it can reason quite well. Our theory behind this model is that a smaller yet deeper model can outperform for it's size.
Alibaba qwen states that Qwen2.5 was pre-trained on up to 18 trillion high-quality tokens. This model supports up to **32768** input tokens and can generate up to **8192** tokens.
# What should Fijik be used for?
Fijik 1.0 3B is by design, meant to be a production-ready, general use, high-performance model, which is also small enough to be run at high token throughputs while minimising performance loss.
- We made some efforts at ensuring the model is safe while keeping it useable. In addition, it is sensitive to system prompts (in a good way, adheres to them well), so it is very customisable. We did not put in our fine-tuning data any information about the identity of the model; rather it just knows that it is a Large Language Model (LLM), but it does not know it is Fijik, unless you specify in the system prompt.
- Due to the large context of the model, It can be used for RAG, but like any other LLM out there, you should be aware that it *may* hallucinate.
- In our fine-tuning data we included quite a bit of creative writing examples, so the model is pretty good at it.
- Coding, Math: In our SFT, DPO fine-tuning data we have put an effort into improving coding and step-by-step math performance, while it is indeed not perfect, no LLM is.
# ExamplesWhat is ittext

# What is it
This is a 1.0 Fijik series with **3 billion** parameters, dense 56 layer transformer LLM based on Qwen2.5, specifically, it was merged using Mergekit to be twice as large as Qwen2.5 1.5B.
After merging, we used a custom dataset mix meant for this model, to improve its performance even more.
- **Step 1 for fine-tuning via unsloth:** SFT on an estimated 20 million tokens. (more or less)
- **Step 2 for the fine-tuning via unsloth:** DPO for 2 epochs for even better instruction following.
After these two steps, we got a powerful model which has less parameters than llama 3.1 8B yet performs just as good if not better, Note that unlike our other recent models, it is not a thinking model, yet it can reason quite well. Our theory behind this model is that a smaller yet deeper model can outperform for it's size.
Alibaba qwen states that Qwen2.5 was pre-trained on up to 18 trillion high-quality tokens. This model supports up to **32768** input tokens and can generate up to **8192** tokens.
# What should Fijik be used for?
Fijik 1.0 3B is by design, meant to be a production-ready, general use, high-performance model, which is also small enough to be run at high token throughputs while minimising performance loss.
- We made some efforts at ensuring the model is safe while keeping it useable. In addition, it is sensitive to system prompts (in a good way, adheres to them well), so it is very customisable. We did not put in our fine-tuning data any information about the identity of the model; rather it just knows that it is a Large Language Model (LLM), but it does not know it is Fijik, unless you specify in the system prompt.
- Due to the large context of the model, It can be used for RAG, but like any other LLM out there, you should be aware that it *may* hallucinate.
- In our fine-tuning data we included quite a bit of creative writing examples, so the model is pretty good at it.
- Coding, Math: In our SFT, DPO fine-tuning data we have put an effort into improving coding and step-by-step math performance, while it is indeed not perfect, no LLM is.
# ExamplesWhat is ittext

# What is it
This is a 1.0 Fijik series with **3 billion** parameters, dense 56 layer transformer LLM based on Qwen2.5, specifically, it was merged using Mergekit to be twice as large as Qwen2.5 1.5B.
After merging, we used a custom dataset mix meant for this model, to improve its performance even more.
- **Step 1 for fine-tuning via unsloth:** SFT on an estimated 20 million tokens. (more or less)
- **Step 2 for the fine-tuning via unsloth:** DPO for 2 epochs for even better instruction following.
After these two steps, we got a powerful model which has less parameters than llama 3.1 8B yet performs just as good if not better, Note that unlike our other recent models, it is not a thinking model, yet it can reason quite well. Our theory behind this model is that a smaller yet deeper model can outperform for it's size.
Alibaba qwen states that Qwen2.5 was pre-trained on up to 18 trillion high-quality tokens. This model supports up to **32768** input tokens and can generate up to **8192** tokens.
# What should Fijik be used for?
Fijik 1.0 3B is by design, meant to be a production-ready, general use, high-performance model, which is also small enough to be run at high token throughputs while minimising performance loss.
- We made some efforts at ensuring the model is safe while keeping it useable. In addition, it is sensitive to system prompts (in a good way, adheres to them well), so it is very customisable. We did not put in our fine-tuning data any information about the identity of the model; rather it just knows that it is a Large Language Model (LLM), but it does not know it is Fijik, unless you specify in the system prompt.
- Due to the large context of the model, It can be used for RAG, but like any other LLM out there, you should be aware that it *may* hallucinate.
- In our fine-tuning data we included quite a bit of creative writing examples, so the model is pretty good at it.
- Coding, Math: In our SFT, DPO fine-tuning data we have put an effort into improving coding and step-by-step math performance, while it is indeed not perfect, no LLM is.
# ExamplesWhat is ittext

# What is it
This is a 1.0 Fijik series with **3 billion** parameters, dense 56 layer transformer LLM based on Qwen2.5, specifically, it was merged using Mergekit to be twice as large as Qwen2.5 1.5B.
After merging, we used a custom dataset mix meant for this model, to improve its performance even more.
- **Step 1 for fine-tuning via unsloth:** SFT on an estimated 20 million tokens. (more or less)
- **Step 2 for the fine-tuning via unsloth:** DPO for 2 epochs for even better instruction following.
After these two steps, we got a powerful model which has less parameters than llama 3.1 8B yet performs just as good if not better, Note that unlike our other recent models, it is not a thinking model, yet it can reason quite well. Our theory behind this model is that a smaller yet deeper model can outperform for it's size.
Alibaba qwen states that Qwen2.5 was pre-trained on up to 18 trillion high-quality tokens. This model supports up to **32768** input tokens and can generate up to **8192** tokens.
# What should Fijik be used for?
Fijik 1.0 3B is by design, meant to be a production-ready, general use, high-performance model, which is also small enough to be run at high token throughputs while minimising performance loss.
- We made some efforts at ensuring the model is safe while keeping it useable. In addition, it is sensitive to system prompts (in a good way, adheres to them well), so it is very customisable. We did not put in our fine-tuning data any information about the identity of the model; rather it just knows that it is a Large Language Model (LLM), but it does not know it is Fijik, unless you specify in the system prompt.
- Due to the large context of the model, It can be used for RAG, but like any other LLM out there, you should be aware that it *may* hallucinate.
- In our fine-tuning data we included quite a bit of creative writing examples, so the model is pretty good at it.
- Coding, Math: In our SFT, DPO fine-tuning data we have put an effort into improving coding and step-by-step math performance, while it is indeed not perfect, no LLM is.
# ExamplesWhat is ittext

# What is it
This is a 1.0 Fijik series with **3 billion** parameters, dense 56 layer transformer LLM based on Qwen2.5, specifically, it was merged using Mergekit to be twice as large as Qwen2.5 1.5B.
After merging, we used a custom dataset mix meant for this model, to improve its performance even more.
- **Step 1 for fine-tuning via unsloth:** SFT on an estimated 20 million tokens. (more or less)
- **Step 2 for the fine-tuning via unsloth:** DPO for 2 epochs for even better instruction following.
After these two steps, we got a powerful model which has less parameters than llama 3.1 8B yet performs just as good if not better, Note that unlike our other recent models, it is not a thinking model, yet it can reason quite well. Our theory behind this model is that a smaller yet deeper model can outperform for it's size.
Alibaba qwen states that Qwen2.5 was pre-trained on up to 18 trillion high-quality tokens. This model supports up to **32768** input tokens and can generate up to **8192** tokens.
# What should Fijik be used for?
Fijik 1.0 3B is by design, meant to be a production-ready, general use, high-performance model, which is also small enough to be run at high token throughputs while minimising performance loss.
- We made some efforts at ensuring the model is safe while keeping it useable. In addition, it is sensitive to system prompts (in a good way, adheres to them well), so it is very customisable. We did not put in our fine-tuning data any information about the identity of the model; rather it just knows that it is a Large Language Model (LLM), but it does not know it is Fijik, unless you specify in the system prompt.
- Due to the large context of the model, It can be used for RAG, but like any other LLM out there, you should be aware that it *may* hallucinate.
- In our fine-tuning data we included quite a bit of creative writing examples, so the model is pretty good at it.
- Coding, Math: In our SFT, DPO fine-tuning data we have put an effort into improving coding and step-by-step math performance, while it is indeed not perfect, no LLM is.
# ExamplesWhat is ittext

# What is it
This is a 1.0 Fijik series with **3 billion** parameters, dense 56 layer transformer LLM based on Qwen2.5, specifically, it was merged using Mergekit to be twice as large as Qwen2.5 1.5B.
After merging, we used a custom dataset mix meant for this model, to improve its performance even more.
- **Step 1 for fine-tuning via unsloth:** SFT on an estimated 20 million tokens. (more or less)
- **Step 2 for the fine-tuning via unsloth:** DPO for 2 epochs for even better instruction following.
After these two steps, we got a powerful model which has less parameters than llama 3.1 8B yet performs just as good if not better, Note that unlike our other recent models, it is not a thinking model, yet it can reason quite well. Our theory behind this model is that a smaller yet deeper model can outperform for it's size.
Alibaba qwen states that Qwen2.5 was pre-trained on up to 18 trillion high-quality tokens. This model supports up to **32768** input tokens and can generate up to **8192** tokens.
# What should Fijik be used for?
Fijik 1.0 3B is by design, meant to be a production-ready, general use, high-performance model, which is also small enough to be run at high token throughputs while minimising performance loss.
- We made some efforts at ensuring the model is safe while keeping it useable. In addition, it is sensitive to system prompts (in a good way, adheres to them well), so it is very customisable. We did not put in our fine-tuning data any information about the identity of the model; rather it just knows that it is a Large Language Model (LLM), but it does not know it is Fijik, unless you specify in the system prompt.
- Due to the large context of the model, It can be used for RAG, but like any other LLM out there, you should be aware that it *may* hallucinate.
- In our fine-tuning data we included quite a bit of creative writing examples, so the model is pretty good at it.
- Coding, Math: In our SFT, DPO fine-tuning data we have put an effort into improving coding and step-by-step math performance, while it is indeed not perfect, no LLM is.
# ExamplesWhat is ittext

# What is it
This is a 1.0 Fijik series with **3 billion** parameters, dense 56 layer transformer LLM based on Qwen2.5, specifically, it was merged using Mergekit to be twice as large as Qwen2.5 1.5B.
After merging, we used a custom dataset mix meant for this model, to improve its performance even more.
- **Step 1 for fine-tuning via unsloth:** SFT on an estimated 20 million tokens. (more or less)
- **Step 2 for the fine-tuning via unsloth:** DPO for 2 epochs for even better instruction following.
After these two steps, we got a powerful model which has less parameters than llama 3.1 8B yet performs just as good if not better, Note that unlike our other recent models, it is not a thinking model, yet it can reason quite well. Our theory behind this model is that a smaller yet deeper model can outperform for it's size.
Alibaba qwen states that Qwen2.5 was pre-trained on up to 18 trillion high-quality tokens. This model supports up to **32768** input tokens and can generate up to **8192** tokens.
# What should Fijik be used for?
Fijik 1.0 3B is by design, meant to be a production-ready, general use, high-performance model, which is also small enough to be run at high token throughputs while minimising performance loss.
- We made some efforts at ensuring the model is safe while keeping it useable. In addition, it is sensitive to system prompts (in a good way, adheres to them well), so it is very customisable. We did not put in our fine-tuning data any information about the identity of the model; rather it just knows that it is a Large Language Model (LLM), but it does not know it is Fijik, unless you specify in the system prompt.
- Due to the large context of the model, It can be used for RAG, but like any other LLM out there, you should be aware that it *may* hallucinate.
- In our fine-tuning data we included quite a bit of creative writing examples, so the model is pretty good at it.
- Coding, Math: In our SFT, DPO fine-tuning data we have put an effort into improving coding and step-by-step math performance, while it is indeed not perfect, no LLM is.
# ExamplesWhat is ittext

# What is it
This is a 1.0 Fijik series with **3 billion** parameters, dense 56 layer transformer LLM based on Qwen2.5, specifically, it was merged using Mergekit to be twice as large as Qwen2.5 1.5B.
After merging, we used a custom dataset mix meant for this model, to improve its performance even more.
- **Step 1 for fine-tuning via unsloth:** SFT on an estimated 20 million tokens. (more or less)
- **Step 2 for the fine-tuning via unsloth:** DPO for 2 epochs for even better instruction following.
After these two steps, we got a powerful model which has less parameters than llama 3.1 8B yet performs just as good if not better, Note that unlike our other recent models, it is not a thinking model, yet it can reason quite well. Our theory behind this model is that a smaller yet deeper model can outperform for it's size.
Alibaba qwen states that Qwen2.5 was pre-trained on up to 18 trillion high-quality tokens. This model supports up to **32768** input tokens and can generate up to **8192** tokens.
# What should Fijik be used for?
Fijik 1.0 3B is by design, meant to be a production-ready, general use, high-performance model, which is also small enough to be run at high token throughputs while minimising performance loss.
- We made some efforts at ensuring the model is safe while keeping it useable. In addition, it is sensitive to system prompts (in a good way, adheres to them well), so it is very customisable. We did not put in our fine-tuning data any information about the identity of the model; rather it just knows that it is a Large Language Model (LLM), but it does not know it is Fijik, unless you specify in the system prompt.
- Due to the large context of the model, It can be used for RAG, but like any other LLM out there, you should be aware that it *may* hallucinate.
- In our fine-tuning data we included quite a bit of creative writing examples, so the model is pretty good at it.
- Coding, Math: In our SFT, DPO fine-tuning data we have put an effort into improving coding and step-by-step math performance, while it is indeed not perfect, no LLM is.
# ExamplesWhat is ittext

# What is it
This is a 1.0 Fijik series with **3 billion** parameters, dense 56 layer transformer LLM based on Qwen2.5, specifically, it was merged using Mergekit to be twice as large as Qwen2.5 1.5B.
After merging, we used a custom dataset mix meant for this model, to improve its performance even more.
- **Step 1 for fine-tuning via unsloth:** SFT on an estimated 20 million tokens. (more or less)
- **Step 2 for the fine-tuning via unsloth:** DPO for 2 epochs for even better instruction following.
After these two steps, we got a powerful model which has less parameters than llama 3.1 8B yet performs just as good if not better, Note that unlike our other recent models, it is not a thinking model, yet it can reason quite well. Our theory behind this model is that a smaller yet deeper model can outperform for it's size.
Alibaba qwen states that Qwen2.5 was pre-trained on up to 18 trillion high-quality tokens. This model supports up to **32768** input tokens and can generate up to **8192** tokens.
# What should Fijik be used for?
Fijik 1.0 3B is by design, meant to be a production-ready, general use, high-performance model, which is also small enough to be run at high token throughputs while minimising performance loss.
- We made some efforts at ensuring the model is safe while keeping it useable. In addition, it is sensitive to system prompts (in a good way, adheres to them well), so it is very customisable. We did not put in our fine-tuning data any information about the identity of the model; rather it just knows that it is a Large Language Model (LLM), but it does not know it is Fijik, unless you specify in the system prompt.
- Due to the large context of the model, It can be used for RAG, but like any other LLM out there, you should be aware that it *may* hallucinate.
- In our fine-tuning data we included quite a bit of creative writing examples, so the model is pretty good at it.
- Coding, Math: In our SFT, DPO fine-tuning data we have put an effort into improving coding and step-by-step math performance, while it is indeed not perfect, no LLM is.
# ExamplesWhat is ittext

# What is it
This is a 1.0 Fijik series with **3 billion** parameters, dense 56 layer transformer LLM based on Qwen2.5, specifically, it was merged using Mergekit to be twice as large as Qwen2.5 1.5B.
After merging, we used a custom dataset mix meant for this model, to improve its performance even more.
- **Step 1 for fine-tuning via unsloth:** SFT on an estimated 20 million tokens. (more or less)
- **Step 2 for the fine-tuning via unsloth:** DPO for 2 epochs for even better instruction following.
After these two steps, we got a powerful model which has less parameters than llama 3.1 8B yet performs just as good if not better, Note that unlike our other recent models, it is not a thinking model, yet it can reason quite well. Our theory behind this model is that a smaller yet deeper model can outperform for it's size.
Alibaba qwen states that Qwen2.5 was pre-trained on up to 18 trillion high-quality tokens. This model supports up to **32768** input tokens and can generate up to **8192** tokens.
# What should Fijik be used for?
Fijik 1.0 3B is by design, meant to be a production-ready, general use, high-performance model, which is also small enough to be run at high token throughputs while minimising performance loss.
- We made some efforts at ensuring the model is safe while keeping it useable. In addition, it is sensitive to system prompts (in a good way, adheres to them well), so it is very customisable. We did not put in our fine-tuning data any information about the identity of the model; rather it just knows that it is a Large Language Model (LLM), but it does not know it is Fijik, unless you specify in the system prompt.
- Due to the large context of the model, It can be used for RAG, but like any other LLM out there, you should be aware that it *may* hallucinate.
- In our fine-tuning data we included quite a bit of creative writing examples, so the model is pretty good at it.
- Coding, Math: In our SFT, DPO fine-tuning data we have put an effort into improving coding and step-by-step math performance, while it is indeed not perfect, no LLM is.
# ExamplesWhat is ittext

# What is it
This is a 1.0 Fijik series with **3 billion** parameters, dense 56 layer transformer LLM based on Qwen2.5, specifically, it was merged using Mergekit to be twice as large as Qwen2.5 1.5B.
After merging, we used a custom dataset mix meant for this model, to improve its performance even more.
- **Step 1 for fine-tuning via unsloth:** SFT on an estimated 20 million tokens. (more or less)
- **Step 2 for the fine-tuning via unsloth:** DPO for 2 epochs for even better instruction following.
After these two steps, we got a powerful model which has less parameters than llama 3.1 8B yet performs just as good if not better, Note that unlike our other recent models, it is not a thinking model, yet it can reason quite well. Our theory behind this model is that a smaller yet deeper model can outperform for it's size.
Alibaba qwen states that Qwen2.5 was pre-trained on up to 18 trillion high-quality tokens. This model supports up to **32768** input tokens and can generate up to **8192** tokens.
# What should Fijik be used for?
Fijik 1.0 3B is by design, meant to be a production-ready, general use, high-performance model, which is also small enough to be run at high token throughputs while minimising performance loss.
- We made some efforts at ensuring the model is safe while keeping it useable. In addition, it is sensitive to system prompts (in a good way, adheres to them well), so it is very customisable. We did not put in our fine-tuning data any information about the identity of the model; rather it just knows that it is a Large Language Model (LLM), but it does not know it is Fijik, unless you specify in the system prompt.
- Due to the large context of the model, It can be used for RAG, but like any other LLM out there, you should be aware that it *may* hallucinate.
- In our fine-tuning data we included quite a bit of creative writing examples, so the model is pretty good at it.
- Coding, Math: In our SFT, DPO fine-tuning data we have put an effort into improving coding and step-by-step math performance, while it is indeed not perfect, no LLM is.
# ExamplesWhat is ittext

# What is it
This is a 1.0 Fijik series with **3 billion** parameters, dense 56 layer transformer LLM based on Qwen2.5, specifically, it was merged using Mergekit to be twice as large as Qwen2.5 1.5B.
After merging, we used a custom dataset mix meant for this model, to improve its performance even more.
- **Step 1 for fine-tuning via unsloth:** SFT on an estimated 20 million tokens. (more or less)
- **Step 2 for the fine-tuning via unsloth:** DPO for 2 epochs for even better instruction following.
After these two steps, we got a powerful model which has less parameters than llama 3.1 8B yet performs just as good if not better, Note that unlike our other recent models, it is not a thinking model, yet it can reason quite well. Our theory behind this model is that a smaller yet deeper model can outperform for it's size.
Alibaba qwen states that Qwen2.5 was pre-trained on up to 18 trillion high-quality tokens. This model supports up to **32768** input tokens and can generate up to **8192** tokens.
# What should Fijik be used for?
Fijik 1.0 3B is by design, meant to be a production-ready, general use, high-performance model, which is also small enough to be run at high token throughputs while minimising performance loss.
- We made some efforts at ensuring the model is safe while keeping it useable. In addition, it is sensitive to system prompts (in a good way, adheres to them well), so it is very customisable. We did not put in our fine-tuning data any information about the identity of the model; rather it just knows that it is a Large Language Model (LLM), but it does not know it is Fijik, unless you specify in the system prompt.
- Due to the large context of the model, It can be used for RAG, but like any other LLM out there, you should be aware that it *may* hallucinate.
- In our fine-tuning data we included quite a bit of creative writing examples, so the model is pretty good at it.
- Coding, Math: In our SFT, DPO fine-tuning data we have put an effort into improving coding and step-by-step math performance, while it is indeed not perfect, no LLM is.
# ExamplesWhat is ittext

# What is it
This is a 1.0 Fijik series with **3 billion** parameters, dense 56 layer transformer LLM based on Qwen2.5, specifically, it was merged using Mergekit to be twice as large as Qwen2.5 1.5B.
After merging, we used a custom dataset mix meant for this model, to improve its performance even more.
- **Step 1 for fine-tuning via unsloth:** SFT on an estimated 20 million tokens. (more or less)
- **Step 2 for the fine-tuning via unsloth:** DPO for 2 epochs for even better instruction following.
After these two steps, we got a powerful model which has less parameters than llama 3.1 8B yet performs just as good if not better, Note that unlike our other recent models, it is not a thinking model, yet it can reason quite well. Our theory behind this model is that a smaller yet deeper model can outperform for it's size.
Alibaba qwen states that Qwen2.5 was pre-trained on up to 18 trillion high-quality tokens. This model supports up to **32768** input tokens and can generate up to **8192** tokens.
# What should Fijik be used for?
Fijik 1.0 3B is by design, meant to be a production-ready, general use, high-performance model, which is also small enough to be run at high token throughputs while minimising performance loss.
- We made some efforts at ensuring the model is safe while keeping it useable. In addition, it is sensitive to system prompts (in a good way, adheres to them well), so it is very customisable. We did not put in our fine-tuning data any information about the identity of the model; rather it just knows that it is a Large Language Model (LLM), but it does not know it is Fijik, unless you specify in the system prompt.
- Due to the large context of the model, It can be used for RAG, but like any other LLM out there, you should be aware that it *may* hallucinate.
- In our fine-tuning data we included quite a bit of creative writing examples, so the model is pretty good at it.
- Coding, Math: In our SFT, DPO fine-tuning data we have put an effort into improving coding and step-by-step math performance, while it is indeed not perfect, no LLM is.
# ExamplesWhat is ittext

# What is it
This is a 1.0 Fijik series with **3 billion** parameters, dense 56 layer transformer LLM based on Qwen2.5, specifically, it was merged using Mergekit to be twice as large as Qwen2.5 1.5B.
After merging, we used a custom dataset mix meant for this model, to improve its performance even more.
- **Step 1 for fine-tuning via unsloth:** SFT on an estimated 20 million tokens. (more or less)
- **Step 2 for the fine-tuning via unsloth:** DPO for 2 epochs for even better instruction following.
After these two steps, we got a powerful model which has less parameters than llama 3.1 8B yet performs just as good if not better, Note that unlike our other recent models, it is not a thinking model, yet it can reason quite well. Our theory behind this model is that a smaller yet deeper model can outperform for it's size.
Alibaba qwen states that Qwen2.5 was pre-trained on up to 18 trillion high-quality tokens. This model supports up to **32768** input tokens and can generate up to **8192** tokens.
# What should Fijik be used for?
Fijik 1.0 3B is by design, meant to be a production-ready, general use, high-performance model, which is also small enough to be run at high token throughputs while minimising performance loss.
- We made some efforts at ensuring the model is safe while keeping it useable. In addition, it is sensitive to system prompts (in a good way, adheres to them well), so it is very customisable. We did not put in our fine-tuning data any information about the identity of the model; rather it just knows that it is a Large Language Model (LLM), but it does not know it is Fijik, unless you specify in the system prompt.
- Due to the large context of the model, It can be used for RAG, but like any other LLM out there, you should be aware that it *may* hallucinate.
- In our fine-tuning data we included quite a bit of creative writing examples, so the model is pretty good at it.
- Coding, Math: In our SFT, DPO fine-tuning data we have put an effort into improving coding and step-by-step math performance, while it is indeed not perfect, no LLM is.
# ExamplesWhat is ittext

# What is it
This is a 1.0 Fijik series with **3 billion** parameters, dense 56 layer transformer LLM based on Qwen2.5, specifically, it was merged using Mergekit to be twice as large as Qwen2.5 1.5B.
After merging, we used a custom dataset mix meant for this model, to improve its performance even more.
- **Step 1 for fine-tuning via unsloth:** SFT on an estimated 20 million tokens. (more or less)
- **Step 2 for the fine-tuning via unsloth:** DPO for 2 epochs for even better instruction following.
After these two steps, we got a powerful model which has less parameters than llama 3.1 8B yet performs just as good if not better, Note that unlike our other recent models, it is not a thinking model, yet it can reason quite well. Our theory behind this model is that a smaller yet deeper model can outperform for it's size.
Alibaba qwen states that Qwen2.5 was pre-trained on up to 18 trillion high-quality tokens. This model supports up to **32768** input tokens and can generate up to **8192** tokens.
# What should Fijik be used for?
Fijik 1.0 3B is by design, meant to be a production-ready, general use, high-performance model, which is also small enough to be run at high token throughputs while minimising performance loss.
- We made some efforts at ensuring the model is safe while keeping it useable. In addition, it is sensitive to system prompts (in a good way, adheres to them well), so it is very customisable. We did not put in our fine-tuning data any information about the identity of the model; rather it just knows that it is a Large Language Model (LLM), but it does not know it is Fijik, unless you specify in the system prompt.
- Due to the large context of the model, It can be used for RAG, but like any other LLM out there, you should be aware that it *may* hallucinate.
- In our fine-tuning data we included quite a bit of creative writing examples, so the model is pretty good at it.
- Coding, Math: In our SFT, DPO fine-tuning data we have put an effort into improving coding and step-by-step math performance, while it is indeed not perfect, no LLM is.
# ExamplesWhat is ittext

# What is it
This is a 1.0 Fijik series with **3 billion** parameters, dense 56 layer transformer LLM based on Qwen2.5, specifically, it was merged using Mergekit to be twice as large as Qwen2.5 1.5B.
After merging, we used a custom dataset mix meant for this model, to improve its performance even more.
- **Step 1 for fine-tuning via unsloth:** SFT on an estimated 20 million tokens. (more or less)
- **Step 2 for the fine-tuning via unsloth:** DPO for 2 epochs for even better instruction following.
After these two steps, we got a powerful model which has less parameters than llama 3.1 8B yet performs just as good if not better, Note that unlike our other recent models, it is not a thinking model, yet it can reason quite well. Our theory behind this model is that a smaller yet deeper model can outperform for it's size.
Alibaba qwen states that Qwen2.5 was pre-trained on up to 18 trillion high-quality tokens. This model supports up to **32768** input tokens and can generate up to **8192** tokens.
# What should Fijik be used for?
Fijik 1.0 3B is by design, meant to be a production-ready, general use, high-performance model, which is also small enough to be run at high token throughputs while minimising performance loss.
- We made some efforts at ensuring the model is safe while keeping it useable. In addition, it is sensitive to system prompts (in a good way, adheres to them well), so it is very customisable. We did not put in our fine-tuning data any information about the identity of the model; rather it just knows that it is a Large Language Model (LLM), but it does not know it is Fijik, unless you specify in the system prompt.
- Due to the large context of the model, It can be used for RAG, but like any other LLM out there, you should be aware that it *may* hallucinate.
- In our fine-tuning data we included quite a bit of creative writing examples, so the model is pretty good at it.
- Coding, Math: In our SFT, DPO fine-tuning data we have put an effort into improving coding and step-by-step math performance, while it is indeed not perfect, no LLM is.
# ExamplesWhat is ittext

# What is it
This is a 1.0 Fijik series with **3 billion** parameters, dense 56 layer transformer LLM based on Qwen2.5, specifically, it was merged using Mergekit to be twice as large as Qwen2.5 1.5B.
After merging, we used a custom dataset mix meant for this model, to improve its performance even more.
- **Step 1 for fine-tuning via unsloth:** SFT on an estimated 20 million tokens. (more or less)
- **Step 2 for the fine-tuning via unsloth:** DPO for 2 epochs for even better instruction following.
After these two steps, we got a powerful model which has less parameters than llama 3.1 8B yet performs just as good if not better, Note that unlike our other recent models, it is not a thinking model, yet it can reason quite well. Our theory behind this model is that a smaller yet deeper model can outperform for it's size.
Alibaba qwen states that Qwen2.5 was pre-trained on up to 18 trillion high-quality tokens. This model supports up to **32768** input tokens and can generate up to **8192** tokens.
# What should Fijik be used for?
Fijik 1.0 3B is by design, meant to be a production-ready, general use, high-performance model, which is also small enough to be run at high token throughputs while minimising performance loss.
- We made some efforts at ensuring the model is safe while keeping it useable. In addition, it is sensitive to system prompts (in a good way, adheres to them well), so it is very customisable. We did not put in our fine-tuning data any information about the identity of the model; rather it just knows that it is a Large Language Model (LLM), but it does not know it is Fijik, unless you specify in the system prompt.
- Due to the large context of the model, It can be used for RAG, but like any other LLM out there, you should be aware that it *may* hallucinate.
- In our fine-tuning data we included quite a bit of creative writing examples, so the model is pretty good at it.
- Coding, Math: In our SFT, DPO fine-tuning data we have put an effort into improving coding and step-by-step math performance, while it is indeed not perfect, no LLM is.
# ExamplesWhat is ittext

# What is it
This is a 1.0 Fijik series with **3 billion** parameters, dense 56 layer transformer LLM based on Qwen2.5, specifically, it was merged using Mergekit to be twice as large as Qwen2.5 1.5B.
After merging, we used a custom dataset mix meant for this model, to improve its performance even more.
- **Step 1 for fine-tuning via unsloth:** SFT on an estimated 20 million tokens. (more or less)
- **Step 2 for the fine-tuning via unsloth:** DPO for 2 epochs for even better instruction following.
After these two steps, we got a powerful model which has less parameters than llama 3.1 8B yet performs just as good if not better, Note that unlike our other recent models, it is not a thinking model, yet it can reason quite well. Our theory behind this model is that a smaller yet deeper model can outperform for it's size.
Alibaba qwen states that Qwen2.5 was pre-trained on up to 18 trillion high-quality tokens. This model supports up to **32768** input tokens and can generate up to **8192** tokens.
# What should Fijik be used for?
Fijik 1.0 3B is by design, meant to be a production-ready, general use, high-performance model, which is also small enough to be run at high token throughputs while minimising performance loss.
- We made some efforts at ensuring the model is safe while keeping it useable. In addition, it is sensitive to system prompts (in a good way, adheres to them well), so it is very customisable. We did not put in our fine-tuning data any information about the identity of the model; rather it just knows that it is a Large Language Model (LLM), but it does not know it is Fijik, unless you specify in the system prompt.
- Due to the large context of the model, It can be used for RAG, but like any other LLM out there, you should be aware that it *may* hallucinate.
- In our fine-tuning data we included quite a bit of creative writing examples, so the model is pretty good at it.
- Coding, Math: In our SFT, DPO fine-tuning data we have put an effort into improving coding and step-by-step math performance, while it is indeed not perfect, no LLM is.
# ExamplesWhat is ittext

# What is it
This is a 1.0 Fijik series with **3 billion** parameters, dense 56 layer transformer LLM based on Qwen2.5, specifically, it was merged using Mergekit to be twice as large as Qwen2.5 1.5B.
After merging, we used a custom dataset mix meant for this model, to improve its performance even more.
- **Step 1 for fine-tuning via unsloth:** SFT on an estimated 20 million tokens. (more or less)
- **Step 2 for the fine-tuning via unsloth:** DPO for 2 epochs for even better instruction following.
After these two steps, we got a powerful model which has less parameters than llama 3.1 8B yet performs just as good if not better, Note that unlike our other recent models, it is not a thinking model, yet it can reason quite well. Our theory behind this model is that a smaller yet deeper model can outperform for it's size.
Alibaba qwen states that Qwen2.5 was pre-trained on up to 18 trillion high-quality tokens. This model supports up to **32768** input tokens and can generate up to **8192** tokens.
# What should Fijik be used for?
Fijik 1.0 3B is by design, meant to be a production-ready, general use, high-performance model, which is also small enough to be run at high token throughputs while minimising performance loss.
- We made some efforts at ensuring the model is safe while keeping it useable. In addition, it is sensitive to system prompts (in a good way, adheres to them well), so it is very customisable. We did not put in our fine-tuning data any information about the identity of the model; rather it just knows that it is a Large Language Model (LLM), but it does not know it is Fijik, unless you specify in the system prompt.
- Due to the large context of the model, It can be used for RAG, but like any other LLM out there, you should be aware that it *may* hallucinate.
- In our fine-tuning data we included quite a bit of creative writing examples, so the model is pretty good at it.
- Coding, Math: In our SFT, DPO fine-tuning data we have put an effort into improving coding and step-by-step math performance, while it is indeed not perfect, no LLM is.
# ExamplesWhat is ittext

# What is it
This is a 1.0 Fijik series with **3 billion** parameters, dense 56 layer transformer LLM based on Qwen2.5, specifically, it was merged using Mergekit to be twice as large as Qwen2.5 1.5B.
After merging, we used a custom dataset mix meant for this model, to improve its performance even more.
- **Step 1 for fine-tuning via unsloth:** SFT on an estimated 20 million tokens. (more or less)
- **Step 2 for the fine-tuning via unsloth:** DPO for 2 epochs for even better instruction following.
After these two steps, we got a powerful model which has less parameters than llama 3.1 8B yet performs just as good if not better, Note that unlike our other recent models, it is not a thinking model, yet it can reason quite well. Our theory behind this model is that a smaller yet deeper model can outperform for it's size.
Alibaba qwen states that Qwen2.5 was pre-trained on up to 18 trillion high-quality tokens. This model supports up to **32768** input tokens and can generate up to **8192** tokens.
# What should Fijik be used for?
Fijik 1.0 3B is by design, meant to be a production-ready, general use, high-performance model, which is also small enough to be run at high token throughputs while minimising performance loss.
- We made some efforts at ensuring the model is safe while keeping it useable. In addition, it is sensitive to system prompts (in a good way, adheres to them well), so it is very customisable. We did not put in our fine-tuning data any information about the identity of the model; rather it just knows that it is a Large Language Model (LLM), but it does not know it is Fijik, unless you specify in the system prompt.
- Due to the large context of the model, It can be used for RAG, but like any other LLM out there, you should be aware that it *may* hallucinate.
- In our fine-tuning data we included quite a bit of creative writing examples, so the model is pretty good at it.
- Coding, Math: In our SFT, DPO fine-tuning data we have put an effort into improving coding and step-by-step math performance, while it is indeed not perfect, no LLM is.
# ExamplesWhat is ittext

# What is it
This is a 1.0 Fijik series with **3 billion** parameters, dense 56 layer transformer LLM based on Qwen2.5, specifically, it was merged using Mergekit to be twice as large as Qwen2.5 1.5B.
After merging, we used a custom dataset mix meant for this model, to improve its performance even more.
- **Step 1 for fine-tuning via unsloth:** SFT on an estimated 20 million tokens. (more or less)
- **Step 2 for the fine-tuning via unsloth:** DPO for 2 epochs for even better instruction following.
After these two steps, we got a powerful model which has less parameters than llama 3.1 8B yet performs just as good if not better, Note that unlike our other recent models, it is not a thinking model, yet it can reason quite well. Our theory behind this model is that a smaller yet deeper model can outperform for it's size.
Alibaba qwen states that Qwen2.5 was pre-trained on up to 18 trillion high-quality tokens. This model supports up to **32768** input tokens and can generate up to **8192** tokens.
# What should Fijik be used for?
Fijik 1.0 3B is by design, meant to be a production-ready, general use, high-performance model, which is also small enough to be run at high token throughputs while minimising performance loss.
- We made some efforts at ensuring the model is safe while keeping it useable. In addition, it is sensitive to system prompts (in a good way, adheres to them well), so it is very customisable. We did not put in our fine-tuning data any information about the identity of the model; rather it just knows that it is a Large Language Model (LLM), but it does not know it is Fijik, unless you specify in the system prompt.
- Due to the large context of the model, It can be used for RAG, but like any other LLM out there, you should be aware that it *may* hallucinate.
- In our fine-tuning data we included quite a bit of creative writing examples, so the model is pretty good at it.
- Coding, Math: In our SFT, DPO fine-tuning data we have put an effort into improving coding and step-by-step math performance, while it is indeed not perfect, no LLM is.
# ExamplesWhat is ittext

# What is it
This is a 1.0 Fijik series with **3 billion** parameters, dense 56 layer transformer LLM based on Qwen2.5, specifically, it was merged using Mergekit to be twice as large as Qwen2.5 1.5B.
After merging, we used a custom dataset mix meant for this model, to improve its performance even more.
- **Step 1 for fine-tuning via unsloth:** SFT on an estimated 20 million tokens. (more or less)
- **Step 2 for the fine-tuning via unsloth:** DPO for 2 epochs for even better instruction following.
After these two steps, we got a powerful model which has less parameters than llama 3.1 8B yet performs just as good if not better, Note that unlike our other recent models, it is not a thinking model, yet it can reason quite well. Our theory behind this model is that a smaller yet deeper model can outperform for it's size.
Alibaba qwen states that Qwen2.5 was pre-trained on up to 18 trillion high-quality tokens. This model supports up to **32768** input tokens and can generate up to **8192** tokens.
# What should Fijik be used for?
Fijik 1.0 3B is by design, meant to be a production-ready, general use, high-performance model, which is also small enough to be run at high token throughputs while minimising performance loss.
- We made some efforts at ensuring the model is safe while keeping it useable. In addition, it is sensitive to system prompts (in a good way, adheres to them well), so it is very customisable. We did not put in our fine-tuning data any information about the identity of the model; rather it just knows that it is a Large Language Model (LLM), but it does not know it is Fijik, unless you specify in the system prompt.
- Due to the large context of the model, It can be used for RAG, but like any other LLM out there, you should be aware that it *may* hallucinate.
- In our fine-tuning data we included quite a bit of creative writing examples, so the model is pretty good at it.
- Coding, Math: In our SFT, DPO fine-tuning data we have put an effort into improving coding and step-by-step math performance, while it is indeed not perfect, no LLM is.
# ExamplesWhat is ittext

# What is it
This is a 1.0 Fijik series with **3 billion** parameters, dense 56 layer transformer LLM based on Qwen2.5, specifically, it was merged using Mergekit to be twice as large as Qwen2.5 1.5B.
After merging, we used a custom dataset mix meant for this model, to improve its performance even more.
- **Step 1 for fine-tuning via unsloth:** SFT on an estimated 20 million tokens. (more or less)
- **Step 2 for the fine-tuning via unsloth:** DPO for 2 epochs for even better instruction following.
After these two steps, we got a powerful model which has less parameters than llama 3.1 8B yet performs just as good if not better, Note that unlike our other recent models, it is not a thinking model, yet it can reason quite well. Our theory behind this model is that a smaller yet deeper model can outperform for it's size.
Alibaba qwen states that Qwen2.5 was pre-trained on up to 18 trillion high-quality tokens. This model supports up to **32768** input tokens and can generate up to **8192** tokens.
# What should Fijik be used for?
Fijik 1.0 3B is by design, meant to be a production-ready, general use, high-performance model, which is also small enough to be run at high token throughputs while minimising performance loss.
- We made some efforts at ensuring the model is safe while keeping it useable. In addition, it is sensitive to system prompts (in a good way, adheres to them well), so it is very customisable. We did not put in our fine-tuning data any information about the identity of the model; rather it just knows that it is a Large Language Model (LLM), but it does not know it is Fijik, unless you specify in the system prompt.
- Due to the large context of the model, It can be used for RAG, but like any other LLM out there, you should be aware that it *may* hallucinate.
- In our fine-tuning data we included quite a bit of creative writing examples, so the model is pretty good at it.
- Coding, Math: In our SFT, DPO fine-tuning data we have put an effort into improving coding and step-by-step math performance, while it is indeed not perfect, no LLM is.
# ExamplesWhat is ittext

# What is it
This is a 1.0 Fijik series with **3 billion** parameters, dense 56 layer transformer LLM based on Qwen2.5, specifically, it was merged using Mergekit to be twice as large as Qwen2.5 1.5B.
After merging, we used a custom dataset mix meant for this model, to improve its performance even more.
- **Step 1 for fine-tuning via unsloth:** SFT on an estimated 20 million tokens. (more or less)
- **Step 2 for the fine-tuning via unsloth:** DPO for 2 epochs for even better instruction following.
After these two steps, we got a powerful model which has less parameters than llama 3.1 8B yet performs just as good if not better, Note that unlike our other recent models, it is not a thinking model, yet it can reason quite well. Our theory behind this model is that a smaller yet deeper model can outperform for it's size.
Alibaba qwen states that Qwen2.5 was pre-trained on up to 18 trillion high-quality tokens. This model supports up to **32768** input tokens and can generate up to **8192** tokens.
# What should Fijik be used for?
Fijik 1.0 3B is by design, meant to be a production-ready, general use, high-performance model, which is also small enough to be run at high token throughputs while minimising performance loss.
- We made some efforts at ensuring the model is safe while keeping it useable. In addition, it is sensitive to system prompts (in a good way, adheres to them well), so it is very customisable. We did not put in our fine-tuning data any information about the identity of the model; rather it just knows that it is a Large Language Model (LLM), but it does not know it is Fijik, unless you specify in the system prompt.
- Due to the large context of the model, It can be used for RAG, but like any other LLM out there, you should be aware that it *may* hallucinate.
- In our fine-tuning data we included quite a bit of creative writing examples, so the model is pretty good at it.
- Coding, Math: In our SFT, DPO fine-tuning data we have put an effort into improving coding and step-by-step math performance, while it is indeed not perfect, no LLM is.
# ExamplesWhat is ittext

# What is it
This is a 1.0 Fijik series with **3 billion** parameters, dense 56 layer transformer LLM based on Qwen2.5, specifically, it was merged using Mergekit to be twice as large as Qwen2.5 1.5B.
After merging, we used a custom dataset mix meant for this model, to improve its performance even more.
- **Step 1 for fine-tuning via unsloth:** SFT on an estimated 20 million tokens. (more or less)
- **Step 2 for the fine-tuning via unsloth:** DPO for 2 epochs for even better instruction following.
After these two steps, we got a powerful model which has less parameters than llama 3.1 8B yet performs just as good if not better, Note that unlike our other recent models, it is not a thinking model, yet it can reason quite well. Our theory behind this model is that a smaller yet deeper model can outperform for it's size.
Alibaba qwen states that Qwen2.5 was pre-trained on up to 18 trillion high-quality tokens. This model supports up to **32768** input tokens and can generate up to **8192** tokens.
# What should Fijik be used for?
Fijik 1.0 3B is by design, meant to be a production-ready, general use, high-performance model, which is also small enough to be run at high token throughputs while minimising performance loss.
- We made some efforts at ensuring the model is safe while keeping it useable. In addition, it is sensitive to system prompts (in a good way, adheres to them well), so it is very customisable. We did not put in our fine-tuning data any information about the identity of the model; rather it just knows that it is a Large Language Model (LLM), but it does not know it is Fijik, unless you specify in the system prompt.
- Due to the large context of the model, It can be used for RAG, but like any other LLM out there, you should be aware that it *may* hallucinate.
- In our fine-tuning data we included quite a bit of creative writing examples, so the model is pretty good at it.
- Coding, Math: In our SFT, DPO fine-tuning data we have put an effort into improving coding and step-by-step math performance, while it is indeed not perfect, no LLM is.
# ExamplesWhat is ittext

# What is it
This is a 1.0 Fijik series with **3 billion** parameters, dense 56 layer transformer LLM based on Qwen2.5, specifically, it was merged using Mergekit to be twice as large as Qwen2.5 1.5B.
After merging, we used a custom dataset mix meant for this model, to improve its performance even more.
- **Step 1 for fine-tuning via unsloth:** SFT on an estimated 20 million tokens. (more or less)
- **Step 2 for the fine-tuning via unsloth:** DPO for 2 epochs for even better instruction following.
After these two steps, we got a powerful model which has less parameters than llama 3.1 8B yet performs just as good if not better, Note that unlike our other recent models, it is not a thinking model, yet it can reason quite well. Our theory behind this model is that a smaller yet deeper model can outperform for it's size.
Alibaba qwen states that Qwen2.5 was pre-trained on up to 18 trillion high-quality tokens. This model supports up to **32768** input tokens and can generate up to **8192** tokens.
# What should Fijik be used for?
Fijik 1.0 3B is by design, meant to be a production-ready, general use, high-performance model, which is also small enough to be run at high token throughputs while minimising performance loss.
- We made some efforts at ensuring the model is safe while keeping it useable. In addition, it is sensitive to system prompts (in a good way, adheres to them well), so it is very customisable. We did not put in our fine-tuning data any information about the identity of the model; rather it just knows that it is a Large Language Model (LLM), but it does not know it is Fijik, unless you specify in the system prompt.
- Due to the large context of the model, It can be used for RAG, but like any other LLM out there, you should be aware that it *may* hallucinate.
- In our fine-tuning data we included quite a bit of creative writing examples, so the model is pretty good at it.
- Coding, Math: In our SFT, DPO fine-tuning data we have put an effort into improving coding and step-by-step math performance, while it is indeed not perfect, no LLM is.
# ExamplesWhat is ittext

# What is it
This is a 1.0 Fijik series with **3 billion** parameters, dense 56 layer transformer LLM based on Qwen2.5, specifically, it was merged using Mergekit to be twice as large as Qwen2.5 1.5B.
After merging, we used a custom dataset mix meant for this model, to improve its performance even more.
- **Step 1 for fine-tuning via unsloth:** SFT on an estimated 20 million tokens. (more or less)
- **Step 2 for the fine-tuning via unsloth:** DPO for 2 epochs for even better instruction following.
After these two steps, we got a powerful model which has less parameters than llama 3.1 8B yet performs just as good if not better, Note that unlike our other recent models, it is not a thinking model, yet it can reason quite well. Our theory behind this model is that a smaller yet deeper model can outperform for it's size.
Alibaba qwen states that Qwen2.5 was pre-trained on up to 18 trillion high-quality tokens. This model supports up to **32768** input tokens and can generate up to **8192** tokens.
# What should Fijik be used for?
Fijik 1.0 3B is by design, meant to be a production-ready, general use, high-performance model, which is also small enough to be run at high token throughputs while minimising performance loss.
- We made some efforts at ensuring the model is safe while keeping it useable. In addition, it is sensitive to system prompts (in a good way, adheres to them well), so it is very customisable. We did not put in our fine-tuning data any information about the identity of the model; rather it just knows that it is a Large Language Model (LLM), but it does not know it is Fijik, unless you specify in the system prompt.
- Due to the large context of the model, It can be used for RAG, but like any other LLM out there, you should be aware that it *may* hallucinate.
- In our fine-tuning data we included quite a bit of creative writing examples, so the model is pretty good at it.
- Coding, Math: In our SFT, DPO fine-tuning data we have put an effort into improving coding and step-by-step math performance, while it is indeed not perfect, no LLM is.
# ExamplesWhat is ittext

# What is it
This is a 1.0 Fijik series with **3 billion** parameters, dense 56 layer transformer LLM based on Qwen2.5, specifically, it was merged using Mergekit to be twice as large as Qwen2.5 1.5B.
After merging, we used a custom dataset mix meant for this model, to improve its performance even more.
- **Step 1 for fine-tuning via unsloth:** SFT on an estimated 20 million tokens. (more or less)
- **Step 2 for the fine-tuning via unsloth:** DPO for 2 epochs for even better instruction following.
After these two steps, we got a powerful model which has less parameters than llama 3.1 8B yet performs just as good if not better, Note that unlike our other recent models, it is not a thinking model, yet it can reason quite well. Our theory behind this model is that a smaller yet deeper model can outperform for it's size.
Alibaba qwen states that Qwen2.5 was pre-trained on up to 18 trillion high-quality tokens. This model supports up to **32768** input tokens and can generate up to **8192** tokens.
# What should Fijik be used for?
Fijik 1.0 3B is by design, meant to be a production-ready, general use, high-performance model, which is also small enough to be run at high token throughputs while minimising performance loss.
- We made some efforts at ensuring the model is safe while keeping it useable. In addition, it is sensitive to system prompts (in a good way, adheres to them well), so it is very customisable. We did not put in our fine-tuning data any information about the identity of the model; rather it just knows that it is a Large Language Model (LLM), but it does not know it is Fijik, unless you specify in the system prompt.
- Due to the large context of the model, It can be used for RAG, but like any other LLM out there, you should be aware that it *may* hallucinate.
- In our fine-tuning data we included quite a bit of creative writing examples, so the model is pretty good at it.
- Coding, Math: In our SFT, DPO fine-tuning data we have put an effort into improving coding and step-by-step math performance, while it is indeed not perfect, no LLM is.
# ExamplesWhat is ittext

# What is it
This is a 1.0 Fijik series with **3 billion** parameters, dense 56 layer transformer LLM based on Qwen2.5, specifically, it was merged using Mergekit to be twice as large as Qwen2.5 1.5B.
After merging, we used a custom dataset mix meant for this model, to improve its performance even more.
- **Step 1 for fine-tuning via unsloth:** SFT on an estimated 20 million tokens. (more or less)
- **Step 2 for the fine-tuning via unsloth:** DPO for 2 epochs for even better instruction following.
After these two steps, we got a powerful model which has less parameters than llama 3.1 8B yet performs just as good if not better, Note that unlike our other recent models, it is not a thinking model, yet it can reason quite well. Our theory behind this model is that a smaller yet deeper model can outperform for it's size.
Alibaba qwen states that Qwen2.5 was pre-trained on up to 18 trillion high-quality tokens. This model supports up to **32768** input tokens and can generate up to **8192** tokens.
# What should Fijik be used for?
Fijik 1.0 3B is by design, meant to be a production-ready, general use, high-performance model, which is also small enough to be run at high token throughputs while minimising performance loss.
- We made some efforts at ensuring the model is safe while keeping it useable. In addition, it is sensitive to system prompts (in a good way, adheres to them well), so it is very customisable. We did not put in our fine-tuning data any information about the identity of the model; rather it just knows that it is a Large Language Model (LLM), but it does not know it is Fijik, unless you specify in the system prompt.
- Due to the large context of the model, It can be used for RAG, but like any other LLM out there, you should be aware that it *may* hallucinate.
- In our fine-tuning data we included quite a bit of creative writing examples, so the model is pretty good at it.
- Coding, Math: In our SFT, DPO fine-tuning data we have put an effort into improving coding and step-by-step math performance, while it is indeed not perfect, no LLM is.
# ExamplesWhat is ittext

# What is it
This is a 1.0 Fijik series with **3 billion** parameters, dense 56 layer transformer LLM based on Qwen2.5, specifically, it was merged using Mergekit to be twice as large as Qwen2.5 1.5B.
After merging, we used a custom dataset mix meant for this model, to improve its performance even more.
- **Step 1 for fine-tuning via unsloth:** SFT on an estimated 20 million tokens. (more or less)
- **Step 2 for the fine-tuning via unsloth:** DPO for 2 epochs for even better instruction following.
After these two steps, we got a powerful model which has less parameters than llama 3.1 8B yet performs just as good if not better, Note that unlike our other recent models, it is not a thinking model, yet it can reason quite well. Our theory behind this model is that a smaller yet deeper model can outperform for it's size.
Alibaba qwen states that Qwen2.5 was pre-trained on up to 18 trillion high-quality tokens. This model supports up to **32768** input tokens and can generate up to **8192** tokens.
# What should Fijik be used for?
Fijik 1.0 3B is by design, meant to be a production-ready, general use, high-performance model, which is also small enough to be run at high token throughputs while minimising performance loss.
- We made some efforts at ensuring the model is safe while keeping it useable. In addition, it is sensitive to system prompts (in a good way, adheres to them well), so it is very customisable. We did not put in our fine-tuning data any information about the identity of the model; rather it just knows that it is a Large Language Model (LLM), but it does not know it is Fijik, unless you specify in the system prompt.
- Due to the large context of the model, It can be used for RAG, but like any other LLM out there, you should be aware that it *may* hallucinate.
- In our fine-tuning data we included quite a bit of creative writing examples, so the model is pretty good at it.
- Coding, Math: In our SFT, DPO fine-tuning data we have put an effort into improving coding and step-by-step math performance, while it is indeed not perfect, no LLM is.
# ExamplesWhat is ittext

# What is it
This is a 1.0 Fijik series with **3 billion** parameters, dense 56 layer transformer LLM based on Qwen2.5, specifically, it was merged using Mergekit to be twice as large as Qwen2.5 1.5B.
After merging, we used a custom dataset mix meant for this model, to improve its performance even more.
- **Step 1 for fine-tuning via unsloth:** SFT on an estimated 20 million tokens. (more or less)
- **Step 2 for the fine-tuning via unsloth:** DPO for 2 epochs for even better instruction following.
After these two steps, we got a powerful model which has less parameters than llama 3.1 8B yet performs just as good if not better, Note that unlike our other recent models, it is not a thinking model, yet it can reason quite well. Our theory behind this model is that a smaller yet deeper model can outperform for it's size.
Alibaba qwen states that Qwen2.5 was pre-trained on up to 18 trillion high-quality tokens. This model supports up to **32768** input tokens and can generate up to **8192** tokens.
# What should Fijik be used for?
Fijik 1.0 3B is by design, meant to be a production-ready, general use, high-performance model, which is also small enough to be run at high token throughputs while minimising performance loss.
- We made some efforts at ensuring the model is safe while keeping it useable. In addition, it is sensitive to system prompts (in a good way, adheres to them well), so it is very customisable. We did not put in our fine-tuning data any information about the identity of the model; rather it just knows that it is a Large Language Model (LLM), but it does not know it is Fijik, unless you specify in the system prompt.
- Due to the large context of the model, It can be used for RAG, but like any other LLM out there, you should be aware that it *may* hallucinate.
- In our fine-tuning data we included quite a bit of creative writing examples, so the model is pretty good at it.
- Coding, Math: In our SFT, DPO fine-tuning data we have put an effort into improving coding and step-by-step math performance, while it is indeed not perfect, no LLM is.
# ExamplesWhat is ittext

# What is it
This is a 1.0 Fijik series with **3 billion** parameters, dense 56 layer transformer LLM based on Qwen2.5, specifically, it was merged using Mergekit to be twice as large as Qwen2.5 1.5B.
After merging, we used a custom dataset mix meant for this model, to improve its performance even more.
- **Step 1 for fine-tuning via unsloth:** SFT on an estimated 20 million tokens. (more or less)
- **Step 2 for the fine-tuning via unsloth:** DPO for 2 epochs for even better instruction following.
After these two steps, we got a powerful model which has less parameters than llama 3.1 8B yet performs just as good if not better, Note that unlike our other recent models, it is not a thinking model, yet it can reason quite well. Our theory behind this model is that a smaller yet deeper model can outperform for it's size.
Alibaba qwen states that Qwen2.5 was pre-trained on up to 18 trillion high-quality tokens. This model supports up to **32768** input tokens and can generate up to **8192** tokens.
# What should Fijik be used for?
Fijik 1.0 3B is by design, meant to be a production-ready, general use, high-performance model, which is also small enough to be run at high token throughputs while minimising performance loss.
- We made some efforts at ensuring the model is safe while keeping it useable. In addition, it is sensitive to system prompts (in a good way, adheres to them well), so it is very customisable. We did not put in our fine-tuning data any information about the identity of the model; rather it just knows that it is a Large Language Model (LLM), but it does not know it is Fijik, unless you specify in the system prompt.
- Due to the large context of the model, It can be used for RAG, but like any other LLM out there, you should be aware that it *may* hallucinate.
- In our fine-tuning data we included quite a bit of creative writing examples, so the model is pretty good at it.
- Coding, Math: In our SFT, DPO fine-tuning data we have put an effort into improving coding and step-by-step math performance, while it is indeed not perfect, no LLM is.
# ExamplesWhat is ittext

# What is it
This is a 1.0 Fijik series with **3 billion** parameters, dense 56 layer transformer LLM based on Qwen2.5, specifically, it was merged using Mergekit to be twice as large as Qwen2.5 1.5B.
After merging, we used a custom dataset mix meant for this model, to improve its performance even more.
- **Step 1 for fine-tuning via unsloth:** SFT on an estimated 20 million tokens. (more or less)
- **Step 2 for the fine-tuning via unsloth:** DPO for 2 epochs for even better instruction following.
After these two steps, we got a powerful model which has less parameters than llama 3.1 8B yet performs just as good if not better, Note that unlike our other recent models, it is not a thinking model, yet it can reason quite well. Our theory behind this model is that a smaller yet deeper model can outperform for it's size.
Alibaba qwen states that Qwen2.5 was pre-trained on up to 18 trillion high-quality tokens. This model supports up to **32768** input tokens and can generate up to **8192** tokens.
# What should Fijik be used for?
Fijik 1.0 3B is by design, meant to be a production-ready, general use, high-performance model, which is also small enough to be run at high token throughputs while minimising performance loss.
- We made some efforts at ensuring the model is safe while keeping it useable. In addition, it is sensitive to system prompts (in a good way, adheres to them well), so it is very customisable. We did not put in our fine-tuning data any information about the identity of the model; rather it just knows that it is a Large Language Model (LLM), but it does not know it is Fijik, unless you specify in the system prompt.
- Due to the large context of the model, It can be used for RAG, but like any other LLM out there, you should be aware that it *may* hallucinate.
- In our fine-tuning data we included quite a bit of creative writing examples, so the model is pretty good at it.
- Coding, Math: In our SFT, DPO fine-tuning data we have put an effort into improving coding and step-by-step math performance, while it is indeed not perfect, no LLM is.
# ExamplesCitationstext
{
title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing},
author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
year={2024},
eprint={2406.08464},
archivePrefix={arXiv},
primaryClass={cs.CL}
}Citationstext
{
title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing},
author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
year={2024},
eprint={2406.08464},
archivePrefix={arXiv},
primaryClass={cs.CL}
}Citationstext
{
title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing},
author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
year={2024},
eprint={2406.08464},
archivePrefix={arXiv},
primaryClass={cs.CL}
}Citationstext
{
title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing},
author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
year={2024},
eprint={2406.08464},
archivePrefix={arXiv},
primaryClass={cs.CL}
}Citationstext
{
title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing},
author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
year={2024},
eprint={2406.08464},
archivePrefix={arXiv},
primaryClass={cs.CL}
}Citationstext
{
title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing},
author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
year={2024},
eprint={2406.08464},
archivePrefix={arXiv},
primaryClass={cs.CL}
}Citationstext
{
title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing},
author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
year={2024},
eprint={2406.08464},
archivePrefix={arXiv},
primaryClass={cs.CL}
}Citationstext
{
title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing},
author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
year={2024},
eprint={2406.08464},
archivePrefix={arXiv},
primaryClass={cs.CL}
}Citationstext
{
title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing},
author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
year={2024},
eprint={2406.08464},
archivePrefix={arXiv},
primaryClass={cs.CL}
}Citationstext
{
title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing},
author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
year={2024},
eprint={2406.08464},
archivePrefix={arXiv},
primaryClass={cs.CL}
}Citationstext
{
title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing},
author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
year={2024},
eprint={2406.08464},
archivePrefix={arXiv},
primaryClass={cs.CL}
}Citationstext
{
title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing},
author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
year={2024},
eprint={2406.08464},
archivePrefix={arXiv},
primaryClass={cs.CL}
}Citationstext
{
title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing},
author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
year={2024},
eprint={2406.08464},
archivePrefix={arXiv},
primaryClass={cs.CL}
}Citationstext
{
title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing},
author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
year={2024},
eprint={2406.08464},
archivePrefix={arXiv},
primaryClass={cs.CL}
}Citationstext
{
title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing},
author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
year={2024},
eprint={2406.08464},
archivePrefix={arXiv},
primaryClass={cs.CL}
}Citationstext
{
title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing},
author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
year={2024},
eprint={2406.08464},
archivePrefix={arXiv},
primaryClass={cs.CL}
}Citationstext
{
title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing},
author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
year={2024},
eprint={2406.08464},
archivePrefix={arXiv},
primaryClass={cs.CL}
}Citationstext
{
title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing},
author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
year={2024},
eprint={2406.08464},
archivePrefix={arXiv},
primaryClass={cs.CL}
}Citationstext
{
title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing},
author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
year={2024},
eprint={2406.08464},
archivePrefix={arXiv},
primaryClass={cs.CL}
}Citationstext
{
title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing},
author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
year={2024},
eprint={2406.08464},
archivePrefix={arXiv},
primaryClass={cs.CL}
}Citationstext
{
title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing},
author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
year={2024},
eprint={2406.08464},
archivePrefix={arXiv},
primaryClass={cs.CL}
}Citationstext
{
title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing},
author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
year={2024},
eprint={2406.08464},
archivePrefix={arXiv},
primaryClass={cs.CL}
}Citationstext
{
title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing},
author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
year={2024},
eprint={2406.08464},
archivePrefix={arXiv},
primaryClass={cs.CL}
}Citationstext
{
title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing},
author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
year={2024},
eprint={2406.08464},
archivePrefix={arXiv},
primaryClass={cs.CL}
}Citationstext
{
title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing},
author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
year={2024},
eprint={2406.08464},
archivePrefix={arXiv},
primaryClass={cs.CL}
}Citationstext
{
title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing},
author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
year={2024},
eprint={2406.08464},
archivePrefix={arXiv},
primaryClass={cs.CL}
}Citationstext
{
title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing},
author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
year={2024},
eprint={2406.08464},
archivePrefix={arXiv},
primaryClass={cs.CL}
}Citationstext
{
title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing},
author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
year={2024},
eprint={2406.08464},
archivePrefix={arXiv},
primaryClass={cs.CL}
}Citationstext
{
title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing},
author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
year={2024},
eprint={2406.08464},
archivePrefix={arXiv},
primaryClass={cs.CL}
}Citationstext
{
title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing},
author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
year={2024},
eprint={2406.08464},
archivePrefix={arXiv},
primaryClass={cs.CL}
}Citationstext
{
title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing},
author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
year={2024},
eprint={2406.08464},
archivePrefix={arXiv},
primaryClass={cs.CL}
}Citationstext
{
title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing},
author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
year={2024},
eprint={2406.08464},
archivePrefix={arXiv},
primaryClass={cs.CL}
}Citationstext
{
title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing},
author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
year={2024},
eprint={2406.08464},
archivePrefix={arXiv},
primaryClass={cs.CL}
}Deploy This Model
Production-ready deployment in minutes
Together.ai
Instant API access to this model
Production-ready inference API. Start free, scale to millions.
Try Free APIReplicate
One-click model deployment
Run models in the cloud with simple API. No DevOps required.
Deploy NowDisclosure: We may earn a commission from these partners. This helps keep LLMYourWay free.