Fijik-3b-Instruct

Name: Fijik-3b-Instruct
Author: Pinkstack

3.0B

3 languages

license:apache-2.0

Pinkstack

Language Model

OTHER

3B params

New

0 downloads

Early-stage

Try on Hugging Face Add to Compare

Edge AI:

Mobile

Laptop

Server

7GB+ RAM

Mobile

Laptop

Server

Quick Summary

AI model with specialized capabilities.

Device Compatibility

Mobile

4-6GB RAM

Laptop

16GB RAM

Server

GPU

Minimum Recommended

3GB+ RAM

Code Examples

What is ittext

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6710ba6af1279fe0dfe33afe/m8o_qX4M5A5bd_qqSBQPN.png)

# What is it
    This is a 1.0 Fijik series with **3 billion** parameters, dense 56 layer transformer LLM based on Qwen2.5, specifically, it was merged using Mergekit to be twice as large as Qwen2.5 1.5B.

After merging, we used a custom dataset mix meant for this model, to improve its performance even more.
- **Step 1 for fine-tuning via unsloth:** SFT on an estimated 20 million tokens. (more or less)
- **Step 2 for the fine-tuning via unsloth:** DPO for 2 epochs for even better instruction following.
After these two steps, we got a powerful model which has less parameters than llama 3.1 8B yet performs just as good if not better, Note that unlike our other recent models, it is not a thinking model, yet it can reason quite well. Our theory behind this model is that a smaller yet deeper model can outperform for it's size.

Alibaba qwen states that Qwen2.5 was pre-trained on up to 18 trillion high-quality tokens. This model supports up to **32768** input tokens and can generate up to **8192** tokens.

# What should Fijik be used for?
Fijik 1.0 3B is by design, meant to be a production-ready, general use, high-performance model, which is also small enough to be run at high token throughputs while minimising performance loss.
- We made some efforts at ensuring the model is safe while keeping it useable. In addition, it is sensitive to system prompts (in a good way, adheres to them well), so it is very customisable. We did not put in our fine-tuning data any information about the identity of the model; rather it just knows that it is a Large Language Model (LLM), but it does not know it is Fijik, unless you specify in the system prompt.
- Due to the large context of the model, It can be used for RAG, but like any other LLM out there, you should be aware that it *may* hallucinate.
- In our fine-tuning data we included quite a bit of creative writing examples, so the model is pretty good at it.
- Coding, Math: In our SFT, DPO fine-tuning data we have put an effort into improving coding and step-by-step math performance, while it is indeed not perfect, no LLM is.
# Examples

What is ittext

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6710ba6af1279fe0dfe33afe/m8o_qX4M5A5bd_qqSBQPN.png)

# What is it
    This is a 1.0 Fijik series with **3 billion** parameters, dense 56 layer transformer LLM based on Qwen2.5, specifically, it was merged using Mergekit to be twice as large as Qwen2.5 1.5B.

After merging, we used a custom dataset mix meant for this model, to improve its performance even more.
- **Step 1 for fine-tuning via unsloth:** SFT on an estimated 20 million tokens. (more or less)
- **Step 2 for the fine-tuning via unsloth:** DPO for 2 epochs for even better instruction following.
After these two steps, we got a powerful model which has less parameters than llama 3.1 8B yet performs just as good if not better, Note that unlike our other recent models, it is not a thinking model, yet it can reason quite well. Our theory behind this model is that a smaller yet deeper model can outperform for it's size.

Alibaba qwen states that Qwen2.5 was pre-trained on up to 18 trillion high-quality tokens. This model supports up to **32768** input tokens and can generate up to **8192** tokens.

# What should Fijik be used for?
Fijik 1.0 3B is by design, meant to be a production-ready, general use, high-performance model, which is also small enough to be run at high token throughputs while minimising performance loss.
- We made some efforts at ensuring the model is safe while keeping it useable. In addition, it is sensitive to system prompts (in a good way, adheres to them well), so it is very customisable. We did not put in our fine-tuning data any information about the identity of the model; rather it just knows that it is a Large Language Model (LLM), but it does not know it is Fijik, unless you specify in the system prompt.
- Due to the large context of the model, It can be used for RAG, but like any other LLM out there, you should be aware that it *may* hallucinate.
- In our fine-tuning data we included quite a bit of creative writing examples, so the model is pretty good at it.
- Coding, Math: In our SFT, DPO fine-tuning data we have put an effort into improving coding and step-by-step math performance, while it is indeed not perfect, no LLM is.
# Examples

What is ittext

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6710ba6af1279fe0dfe33afe/m8o_qX4M5A5bd_qqSBQPN.png)

# What is it
    This is a 1.0 Fijik series with **3 billion** parameters, dense 56 layer transformer LLM based on Qwen2.5, specifically, it was merged using Mergekit to be twice as large as Qwen2.5 1.5B.

After merging, we used a custom dataset mix meant for this model, to improve its performance even more.
- **Step 1 for fine-tuning via unsloth:** SFT on an estimated 20 million tokens. (more or less)
- **Step 2 for the fine-tuning via unsloth:** DPO for 2 epochs for even better instruction following.
After these two steps, we got a powerful model which has less parameters than llama 3.1 8B yet performs just as good if not better, Note that unlike our other recent models, it is not a thinking model, yet it can reason quite well. Our theory behind this model is that a smaller yet deeper model can outperform for it's size.

Alibaba qwen states that Qwen2.5 was pre-trained on up to 18 trillion high-quality tokens. This model supports up to **32768** input tokens and can generate up to **8192** tokens.

# What should Fijik be used for?
Fijik 1.0 3B is by design, meant to be a production-ready, general use, high-performance model, which is also small enough to be run at high token throughputs while minimising performance loss.
- We made some efforts at ensuring the model is safe while keeping it useable. In addition, it is sensitive to system prompts (in a good way, adheres to them well), so it is very customisable. We did not put in our fine-tuning data any information about the identity of the model; rather it just knows that it is a Large Language Model (LLM), but it does not know it is Fijik, unless you specify in the system prompt.
- Due to the large context of the model, It can be used for RAG, but like any other LLM out there, you should be aware that it *may* hallucinate.
- In our fine-tuning data we included quite a bit of creative writing examples, so the model is pretty good at it.
- Coding, Math: In our SFT, DPO fine-tuning data we have put an effort into improving coding and step-by-step math performance, while it is indeed not perfect, no LLM is.
# Examples

What is ittext

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6710ba6af1279fe0dfe33afe/m8o_qX4M5A5bd_qqSBQPN.png)

# What is it
    This is a 1.0 Fijik series with **3 billion** parameters, dense 56 layer transformer LLM based on Qwen2.5, specifically, it was merged using Mergekit to be twice as large as Qwen2.5 1.5B.

After merging, we used a custom dataset mix meant for this model, to improve its performance even more.
- **Step 1 for fine-tuning via unsloth:** SFT on an estimated 20 million tokens. (more or less)
- **Step 2 for the fine-tuning via unsloth:** DPO for 2 epochs for even better instruction following.
After these two steps, we got a powerful model which has less parameters than llama 3.1 8B yet performs just as good if not better, Note that unlike our other recent models, it is not a thinking model, yet it can reason quite well. Our theory behind this model is that a smaller yet deeper model can outperform for it's size.

Alibaba qwen states that Qwen2.5 was pre-trained on up to 18 trillion high-quality tokens. This model supports up to **32768** input tokens and can generate up to **8192** tokens.

# What should Fijik be used for?
Fijik 1.0 3B is by design, meant to be a production-ready, general use, high-performance model, which is also small enough to be run at high token throughputs while minimising performance loss.
- We made some efforts at ensuring the model is safe while keeping it useable. In addition, it is sensitive to system prompts (in a good way, adheres to them well), so it is very customisable. We did not put in our fine-tuning data any information about the identity of the model; rather it just knows that it is a Large Language Model (LLM), but it does not know it is Fijik, unless you specify in the system prompt.
- Due to the large context of the model, It can be used for RAG, but like any other LLM out there, you should be aware that it *may* hallucinate.
- In our fine-tuning data we included quite a bit of creative writing examples, so the model is pretty good at it.
- Coding, Math: In our SFT, DPO fine-tuning data we have put an effort into improving coding and step-by-step math performance, while it is indeed not perfect, no LLM is.
# Examples

What is ittext

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6710ba6af1279fe0dfe33afe/m8o_qX4M5A5bd_qqSBQPN.png)

# What is it
    This is a 1.0 Fijik series with **3 billion** parameters, dense 56 layer transformer LLM based on Qwen2.5, specifically, it was merged using Mergekit to be twice as large as Qwen2.5 1.5B.

After merging, we used a custom dataset mix meant for this model, to improve its performance even more.
- **Step 1 for fine-tuning via unsloth:** SFT on an estimated 20 million tokens. (more or less)
- **Step 2 for the fine-tuning via unsloth:** DPO for 2 epochs for even better instruction following.
After these two steps, we got a powerful model which has less parameters than llama 3.1 8B yet performs just as good if not better, Note that unlike our other recent models, it is not a thinking model, yet it can reason quite well. Our theory behind this model is that a smaller yet deeper model can outperform for it's size.

Alibaba qwen states that Qwen2.5 was pre-trained on up to 18 trillion high-quality tokens. This model supports up to **32768** input tokens and can generate up to **8192** tokens.

# What should Fijik be used for?
Fijik 1.0 3B is by design, meant to be a production-ready, general use, high-performance model, which is also small enough to be run at high token throughputs while minimising performance loss.
- We made some efforts at ensuring the model is safe while keeping it useable. In addition, it is sensitive to system prompts (in a good way, adheres to them well), so it is very customisable. We did not put in our fine-tuning data any information about the identity of the model; rather it just knows that it is a Large Language Model (LLM), but it does not know it is Fijik, unless you specify in the system prompt.
- Due to the large context of the model, It can be used for RAG, but like any other LLM out there, you should be aware that it *may* hallucinate.
- In our fine-tuning data we included quite a bit of creative writing examples, so the model is pretty good at it.
- Coding, Math: In our SFT, DPO fine-tuning data we have put an effort into improving coding and step-by-step math performance, while it is indeed not perfect, no LLM is.
# Examples

What is ittext

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6710ba6af1279fe0dfe33afe/m8o_qX4M5A5bd_qqSBQPN.png)

# What is it
    This is a 1.0 Fijik series with **3 billion** parameters, dense 56 layer transformer LLM based on Qwen2.5, specifically, it was merged using Mergekit to be twice as large as Qwen2.5 1.5B.

After merging, we used a custom dataset mix meant for this model, to improve its performance even more.
- **Step 1 for fine-tuning via unsloth:** SFT on an estimated 20 million tokens. (more or less)
- **Step 2 for the fine-tuning via unsloth:** DPO for 2 epochs for even better instruction following.
After these two steps, we got a powerful model which has less parameters than llama 3.1 8B yet performs just as good if not better, Note that unlike our other recent models, it is not a thinking model, yet it can reason quite well. Our theory behind this model is that a smaller yet deeper model can outperform for it's size.

Alibaba qwen states that Qwen2.5 was pre-trained on up to 18 trillion high-quality tokens. This model supports up to **32768** input tokens and can generate up to **8192** tokens.

# What should Fijik be used for?
Fijik 1.0 3B is by design, meant to be a production-ready, general use, high-performance model, which is also small enough to be run at high token throughputs while minimising performance loss.
- We made some efforts at ensuring the model is safe while keeping it useable. In addition, it is sensitive to system prompts (in a good way, adheres to them well), so it is very customisable. We did not put in our fine-tuning data any information about the identity of the model; rather it just knows that it is a Large Language Model (LLM), but it does not know it is Fijik, unless you specify in the system prompt.
- Due to the large context of the model, It can be used for RAG, but like any other LLM out there, you should be aware that it *may* hallucinate.
- In our fine-tuning data we included quite a bit of creative writing examples, so the model is pretty good at it.
- Coding, Math: In our SFT, DPO fine-tuning data we have put an effort into improving coding and step-by-step math performance, while it is indeed not perfect, no LLM is.
# Examples

What is ittext

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6710ba6af1279fe0dfe33afe/m8o_qX4M5A5bd_qqSBQPN.png)

# What is it
    This is a 1.0 Fijik series with **3 billion** parameters, dense 56 layer transformer LLM based on Qwen2.5, specifically, it was merged using Mergekit to be twice as large as Qwen2.5 1.5B.

After merging, we used a custom dataset mix meant for this model, to improve its performance even more.
- **Step 1 for fine-tuning via unsloth:** SFT on an estimated 20 million tokens. (more or less)
- **Step 2 for the fine-tuning via unsloth:** DPO for 2 epochs for even better instruction following.
After these two steps, we got a powerful model which has less parameters than llama 3.1 8B yet performs just as good if not better, Note that unlike our other recent models, it is not a thinking model, yet it can reason quite well. Our theory behind this model is that a smaller yet deeper model can outperform for it's size.

Alibaba qwen states that Qwen2.5 was pre-trained on up to 18 trillion high-quality tokens. This model supports up to **32768** input tokens and can generate up to **8192** tokens.

# What should Fijik be used for?
Fijik 1.0 3B is by design, meant to be a production-ready, general use, high-performance model, which is also small enough to be run at high token throughputs while minimising performance loss.
- We made some efforts at ensuring the model is safe while keeping it useable. In addition, it is sensitive to system prompts (in a good way, adheres to them well), so it is very customisable. We did not put in our fine-tuning data any information about the identity of the model; rather it just knows that it is a Large Language Model (LLM), but it does not know it is Fijik, unless you specify in the system prompt.
- Due to the large context of the model, It can be used for RAG, but like any other LLM out there, you should be aware that it *may* hallucinate.
- In our fine-tuning data we included quite a bit of creative writing examples, so the model is pretty good at it.
- Coding, Math: In our SFT, DPO fine-tuning data we have put an effort into improving coding and step-by-step math performance, while it is indeed not perfect, no LLM is.
# Examples

What is ittext

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6710ba6af1279fe0dfe33afe/m8o_qX4M5A5bd_qqSBQPN.png)

# What is it
    This is a 1.0 Fijik series with **3 billion** parameters, dense 56 layer transformer LLM based on Qwen2.5, specifically, it was merged using Mergekit to be twice as large as Qwen2.5 1.5B.

After merging, we used a custom dataset mix meant for this model, to improve its performance even more.
- **Step 1 for fine-tuning via unsloth:** SFT on an estimated 20 million tokens. (more or less)
- **Step 2 for the fine-tuning via unsloth:** DPO for 2 epochs for even better instruction following.
After these two steps, we got a powerful model which has less parameters than llama 3.1 8B yet performs just as good if not better, Note that unlike our other recent models, it is not a thinking model, yet it can reason quite well. Our theory behind this model is that a smaller yet deeper model can outperform for it's size.

Alibaba qwen states that Qwen2.5 was pre-trained on up to 18 trillion high-quality tokens. This model supports up to **32768** input tokens and can generate up to **8192** tokens.

# What should Fijik be used for?
Fijik 1.0 3B is by design, meant to be a production-ready, general use, high-performance model, which is also small enough to be run at high token throughputs while minimising performance loss.
- We made some efforts at ensuring the model is safe while keeping it useable. In addition, it is sensitive to system prompts (in a good way, adheres to them well), so it is very customisable. We did not put in our fine-tuning data any information about the identity of the model; rather it just knows that it is a Large Language Model (LLM), but it does not know it is Fijik, unless you specify in the system prompt.
- Due to the large context of the model, It can be used for RAG, but like any other LLM out there, you should be aware that it *may* hallucinate.
- In our fine-tuning data we included quite a bit of creative writing examples, so the model is pretty good at it.
- Coding, Math: In our SFT, DPO fine-tuning data we have put an effort into improving coding and step-by-step math performance, while it is indeed not perfect, no LLM is.
# Examples

What is ittext

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6710ba6af1279fe0dfe33afe/m8o_qX4M5A5bd_qqSBQPN.png)

# What is it
    This is a 1.0 Fijik series with **3 billion** parameters, dense 56 layer transformer LLM based on Qwen2.5, specifically, it was merged using Mergekit to be twice as large as Qwen2.5 1.5B.

After merging, we used a custom dataset mix meant for this model, to improve its performance even more.
- **Step 1 for fine-tuning via unsloth:** SFT on an estimated 20 million tokens. (more or less)
- **Step 2 for the fine-tuning via unsloth:** DPO for 2 epochs for even better instruction following.
After these two steps, we got a powerful model which has less parameters than llama 3.1 8B yet performs just as good if not better, Note that unlike our other recent models, it is not a thinking model, yet it can reason quite well. Our theory behind this model is that a smaller yet deeper model can outperform for it's size.

Alibaba qwen states that Qwen2.5 was pre-trained on up to 18 trillion high-quality tokens. This model supports up to **32768** input tokens and can generate up to **8192** tokens.

# What should Fijik be used for?
Fijik 1.0 3B is by design, meant to be a production-ready, general use, high-performance model, which is also small enough to be run at high token throughputs while minimising performance loss.
- We made some efforts at ensuring the model is safe while keeping it useable. In addition, it is sensitive to system prompts (in a good way, adheres to them well), so it is very customisable. We did not put in our fine-tuning data any information about the identity of the model; rather it just knows that it is a Large Language Model (LLM), but it does not know it is Fijik, unless you specify in the system prompt.
- Due to the large context of the model, It can be used for RAG, but like any other LLM out there, you should be aware that it *may* hallucinate.
- In our fine-tuning data we included quite a bit of creative writing examples, so the model is pretty good at it.
- Coding, Math: In our SFT, DPO fine-tuning data we have put an effort into improving coding and step-by-step math performance, while it is indeed not perfect, no LLM is.
# Examples

What is ittext

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6710ba6af1279fe0dfe33afe/m8o_qX4M5A5bd_qqSBQPN.png)

# What is it
    This is a 1.0 Fijik series with **3 billion** parameters, dense 56 layer transformer LLM based on Qwen2.5, specifically, it was merged using Mergekit to be twice as large as Qwen2.5 1.5B.

After merging, we used a custom dataset mix meant for this model, to improve its performance even more.
- **Step 1 for fine-tuning via unsloth:** SFT on an estimated 20 million tokens. (more or less)
- **Step 2 for the fine-tuning via unsloth:** DPO for 2 epochs for even better instruction following.
After these two steps, we got a powerful model which has less parameters than llama 3.1 8B yet performs just as good if not better, Note that unlike our other recent models, it is not a thinking model, yet it can reason quite well. Our theory behind this model is that a smaller yet deeper model can outperform for it's size.

Alibaba qwen states that Qwen2.5 was pre-trained on up to 18 trillion high-quality tokens. This model supports up to **32768** input tokens and can generate up to **8192** tokens.

# What should Fijik be used for?
Fijik 1.0 3B is by design, meant to be a production-ready, general use, high-performance model, which is also small enough to be run at high token throughputs while minimising performance loss.
- We made some efforts at ensuring the model is safe while keeping it useable. In addition, it is sensitive to system prompts (in a good way, adheres to them well), so it is very customisable. We did not put in our fine-tuning data any information about the identity of the model; rather it just knows that it is a Large Language Model (LLM), but it does not know it is Fijik, unless you specify in the system prompt.
- Due to the large context of the model, It can be used for RAG, but like any other LLM out there, you should be aware that it *may* hallucinate.
- In our fine-tuning data we included quite a bit of creative writing examples, so the model is pretty good at it.
- Coding, Math: In our SFT, DPO fine-tuning data we have put an effort into improving coding and step-by-step math performance, while it is indeed not perfect, no LLM is.
# Examples

What is ittext

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6710ba6af1279fe0dfe33afe/m8o_qX4M5A5bd_qqSBQPN.png)

# What is it
    This is a 1.0 Fijik series with **3 billion** parameters, dense 56 layer transformer LLM based on Qwen2.5, specifically, it was merged using Mergekit to be twice as large as Qwen2.5 1.5B.

After merging, we used a custom dataset mix meant for this model, to improve its performance even more.
- **Step 1 for fine-tuning via unsloth:** SFT on an estimated 20 million tokens. (more or less)
- **Step 2 for the fine-tuning via unsloth:** DPO for 2 epochs for even better instruction following.
After these two steps, we got a powerful model which has less parameters than llama 3.1 8B yet performs just as good if not better, Note that unlike our other recent models, it is not a thinking model, yet it can reason quite well. Our theory behind this model is that a smaller yet deeper model can outperform for it's size.

Alibaba qwen states that Qwen2.5 was pre-trained on up to 18 trillion high-quality tokens. This model supports up to **32768** input tokens and can generate up to **8192** tokens.

# What should Fijik be used for?
Fijik 1.0 3B is by design, meant to be a production-ready, general use, high-performance model, which is also small enough to be run at high token throughputs while minimising performance loss.
- We made some efforts at ensuring the model is safe while keeping it useable. In addition, it is sensitive to system prompts (in a good way, adheres to them well), so it is very customisable. We did not put in our fine-tuning data any information about the identity of the model; rather it just knows that it is a Large Language Model (LLM), but it does not know it is Fijik, unless you specify in the system prompt.
- Due to the large context of the model, It can be used for RAG, but like any other LLM out there, you should be aware that it *may* hallucinate.
- In our fine-tuning data we included quite a bit of creative writing examples, so the model is pretty good at it.
- Coding, Math: In our SFT, DPO fine-tuning data we have put an effort into improving coding and step-by-step math performance, while it is indeed not perfect, no LLM is.
# Examples

What is ittext

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6710ba6af1279fe0dfe33afe/m8o_qX4M5A5bd_qqSBQPN.png)

# What is it
    This is a 1.0 Fijik series with **3 billion** parameters, dense 56 layer transformer LLM based on Qwen2.5, specifically, it was merged using Mergekit to be twice as large as Qwen2.5 1.5B.

After merging, we used a custom dataset mix meant for this model, to improve its performance even more.
- **Step 1 for fine-tuning via unsloth:** SFT on an estimated 20 million tokens. (more or less)
- **Step 2 for the fine-tuning via unsloth:** DPO for 2 epochs for even better instruction following.
After these two steps, we got a powerful model which has less parameters than llama 3.1 8B yet performs just as good if not better, Note that unlike our other recent models, it is not a thinking model, yet it can reason quite well. Our theory behind this model is that a smaller yet deeper model can outperform for it's size.

Alibaba qwen states that Qwen2.5 was pre-trained on up to 18 trillion high-quality tokens. This model supports up to **32768** input tokens and can generate up to **8192** tokens.

# What should Fijik be used for?
Fijik 1.0 3B is by design, meant to be a production-ready, general use, high-performance model, which is also small enough to be run at high token throughputs while minimising performance loss.
- We made some efforts at ensuring the model is safe while keeping it useable. In addition, it is sensitive to system prompts (in a good way, adheres to them well), so it is very customisable. We did not put in our fine-tuning data any information about the identity of the model; rather it just knows that it is a Large Language Model (LLM), but it does not know it is Fijik, unless you specify in the system prompt.
- Due to the large context of the model, It can be used for RAG, but like any other LLM out there, you should be aware that it *may* hallucinate.
- In our fine-tuning data we included quite a bit of creative writing examples, so the model is pretty good at it.
- Coding, Math: In our SFT, DPO fine-tuning data we have put an effort into improving coding and step-by-step math performance, while it is indeed not perfect, no LLM is.
# Examples

What is ittext

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6710ba6af1279fe0dfe33afe/m8o_qX4M5A5bd_qqSBQPN.png)

# What is it
    This is a 1.0 Fijik series with **3 billion** parameters, dense 56 layer transformer LLM based on Qwen2.5, specifically, it was merged using Mergekit to be twice as large as Qwen2.5 1.5B.

After merging, we used a custom dataset mix meant for this model, to improve its performance even more.
- **Step 1 for fine-tuning via unsloth:** SFT on an estimated 20 million tokens. (more or less)
- **Step 2 for the fine-tuning via unsloth:** DPO for 2 epochs for even better instruction following.
After these two steps, we got a powerful model which has less parameters than llama 3.1 8B yet performs just as good if not better, Note that unlike our other recent models, it is not a thinking model, yet it can reason quite well. Our theory behind this model is that a smaller yet deeper model can outperform for it's size.

Alibaba qwen states that Qwen2.5 was pre-trained on up to 18 trillion high-quality tokens. This model supports up to **32768** input tokens and can generate up to **8192** tokens.

# What should Fijik be used for?
Fijik 1.0 3B is by design, meant to be a production-ready, general use, high-performance model, which is also small enough to be run at high token throughputs while minimising performance loss.
- We made some efforts at ensuring the model is safe while keeping it useable. In addition, it is sensitive to system prompts (in a good way, adheres to them well), so it is very customisable. We did not put in our fine-tuning data any information about the identity of the model; rather it just knows that it is a Large Language Model (LLM), but it does not know it is Fijik, unless you specify in the system prompt.
- Due to the large context of the model, It can be used for RAG, but like any other LLM out there, you should be aware that it *may* hallucinate.
- In our fine-tuning data we included quite a bit of creative writing examples, so the model is pretty good at it.
- Coding, Math: In our SFT, DPO fine-tuning data we have put an effort into improving coding and step-by-step math performance, while it is indeed not perfect, no LLM is.
# Examples

What is ittext

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6710ba6af1279fe0dfe33afe/m8o_qX4M5A5bd_qqSBQPN.png)

# What is it
    This is a 1.0 Fijik series with **3 billion** parameters, dense 56 layer transformer LLM based on Qwen2.5, specifically, it was merged using Mergekit to be twice as large as Qwen2.5 1.5B.

After merging, we used a custom dataset mix meant for this model, to improve its performance even more.
- **Step 1 for fine-tuning via unsloth:** SFT on an estimated 20 million tokens. (more or less)
- **Step 2 for the fine-tuning via unsloth:** DPO for 2 epochs for even better instruction following.
After these two steps, we got a powerful model which has less parameters than llama 3.1 8B yet performs just as good if not better, Note that unlike our other recent models, it is not a thinking model, yet it can reason quite well. Our theory behind this model is that a smaller yet deeper model can outperform for it's size.

Alibaba qwen states that Qwen2.5 was pre-trained on up to 18 trillion high-quality tokens. This model supports up to **32768** input tokens and can generate up to **8192** tokens.

# What should Fijik be used for?
Fijik 1.0 3B is by design, meant to be a production-ready, general use, high-performance model, which is also small enough to be run at high token throughputs while minimising performance loss.
- We made some efforts at ensuring the model is safe while keeping it useable. In addition, it is sensitive to system prompts (in a good way, adheres to them well), so it is very customisable. We did not put in our fine-tuning data any information about the identity of the model; rather it just knows that it is a Large Language Model (LLM), but it does not know it is Fijik, unless you specify in the system prompt.
- Due to the large context of the model, It can be used for RAG, but like any other LLM out there, you should be aware that it *may* hallucinate.
- In our fine-tuning data we included quite a bit of creative writing examples, so the model is pretty good at it.
- Coding, Math: In our SFT, DPO fine-tuning data we have put an effort into improving coding and step-by-step math performance, while it is indeed not perfect, no LLM is.
# Examples

What is ittext

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6710ba6af1279fe0dfe33afe/m8o_qX4M5A5bd_qqSBQPN.png)

# What is it
    This is a 1.0 Fijik series with **3 billion** parameters, dense 56 layer transformer LLM based on Qwen2.5, specifically, it was merged using Mergekit to be twice as large as Qwen2.5 1.5B.

After merging, we used a custom dataset mix meant for this model, to improve its performance even more.
- **Step 1 for fine-tuning via unsloth:** SFT on an estimated 20 million tokens. (more or less)
- **Step 2 for the fine-tuning via unsloth:** DPO for 2 epochs for even better instruction following.
After these two steps, we got a powerful model which has less parameters than llama 3.1 8B yet performs just as good if not better, Note that unlike our other recent models, it is not a thinking model, yet it can reason quite well. Our theory behind this model is that a smaller yet deeper model can outperform for it's size.

Alibaba qwen states that Qwen2.5 was pre-trained on up to 18 trillion high-quality tokens. This model supports up to **32768** input tokens and can generate up to **8192** tokens.

# What should Fijik be used for?
Fijik 1.0 3B is by design, meant to be a production-ready, general use, high-performance model, which is also small enough to be run at high token throughputs while minimising performance loss.
- We made some efforts at ensuring the model is safe while keeping it useable. In addition, it is sensitive to system prompts (in a good way, adheres to them well), so it is very customisable. We did not put in our fine-tuning data any information about the identity of the model; rather it just knows that it is a Large Language Model (LLM), but it does not know it is Fijik, unless you specify in the system prompt.
- Due to the large context of the model, It can be used for RAG, but like any other LLM out there, you should be aware that it *may* hallucinate.
- In our fine-tuning data we included quite a bit of creative writing examples, so the model is pretty good at it.
- Coding, Math: In our SFT, DPO fine-tuning data we have put an effort into improving coding and step-by-step math performance, while it is indeed not perfect, no LLM is.
# Examples

What is ittext

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6710ba6af1279fe0dfe33afe/m8o_qX4M5A5bd_qqSBQPN.png)

# What is it
    This is a 1.0 Fijik series with **3 billion** parameters, dense 56 layer transformer LLM based on Qwen2.5, specifically, it was merged using Mergekit to be twice as large as Qwen2.5 1.5B.

After merging, we used a custom dataset mix meant for this model, to improve its performance even more.
- **Step 1 for fine-tuning via unsloth:** SFT on an estimated 20 million tokens. (more or less)
- **Step 2 for the fine-tuning via unsloth:** DPO for 2 epochs for even better instruction following.
After these two steps, we got a powerful model which has less parameters than llama 3.1 8B yet performs just as good if not better, Note that unlike our other recent models, it is not a thinking model, yet it can reason quite well. Our theory behind this model is that a smaller yet deeper model can outperform for it's size.

Alibaba qwen states that Qwen2.5 was pre-trained on up to 18 trillion high-quality tokens. This model supports up to **32768** input tokens and can generate up to **8192** tokens.

# What should Fijik be used for?
Fijik 1.0 3B is by design, meant to be a production-ready, general use, high-performance model, which is also small enough to be run at high token throughputs while minimising performance loss.
- We made some efforts at ensuring the model is safe while keeping it useable. In addition, it is sensitive to system prompts (in a good way, adheres to them well), so it is very customisable. We did not put in our fine-tuning data any information about the identity of the model; rather it just knows that it is a Large Language Model (LLM), but it does not know it is Fijik, unless you specify in the system prompt.
- Due to the large context of the model, It can be used for RAG, but like any other LLM out there, you should be aware that it *may* hallucinate.
- In our fine-tuning data we included quite a bit of creative writing examples, so the model is pretty good at it.
- Coding, Math: In our SFT, DPO fine-tuning data we have put an effort into improving coding and step-by-step math performance, while it is indeed not perfect, no LLM is.
# Examples

What is ittext

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6710ba6af1279fe0dfe33afe/m8o_qX4M5A5bd_qqSBQPN.png)

# What is it
    This is a 1.0 Fijik series with **3 billion** parameters, dense 56 layer transformer LLM based on Qwen2.5, specifically, it was merged using Mergekit to be twice as large as Qwen2.5 1.5B.

After merging, we used a custom dataset mix meant for this model, to improve its performance even more.
- **Step 1 for fine-tuning via unsloth:** SFT on an estimated 20 million tokens. (more or less)
- **Step 2 for the fine-tuning via unsloth:** DPO for 2 epochs for even better instruction following.
After these two steps, we got a powerful model which has less parameters than llama 3.1 8B yet performs just as good if not better, Note that unlike our other recent models, it is not a thinking model, yet it can reason quite well. Our theory behind this model is that a smaller yet deeper model can outperform for it's size.

Alibaba qwen states that Qwen2.5 was pre-trained on up to 18 trillion high-quality tokens. This model supports up to **32768** input tokens and can generate up to **8192** tokens.

# What should Fijik be used for?
Fijik 1.0 3B is by design, meant to be a production-ready, general use, high-performance model, which is also small enough to be run at high token throughputs while minimising performance loss.
- We made some efforts at ensuring the model is safe while keeping it useable. In addition, it is sensitive to system prompts (in a good way, adheres to them well), so it is very customisable. We did not put in our fine-tuning data any information about the identity of the model; rather it just knows that it is a Large Language Model (LLM), but it does not know it is Fijik, unless you specify in the system prompt.
- Due to the large context of the model, It can be used for RAG, but like any other LLM out there, you should be aware that it *may* hallucinate.
- In our fine-tuning data we included quite a bit of creative writing examples, so the model is pretty good at it.
- Coding, Math: In our SFT, DPO fine-tuning data we have put an effort into improving coding and step-by-step math performance, while it is indeed not perfect, no LLM is.
# Examples

What is ittext

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6710ba6af1279fe0dfe33afe/m8o_qX4M5A5bd_qqSBQPN.png)

# What is it
    This is a 1.0 Fijik series with **3 billion** parameters, dense 56 layer transformer LLM based on Qwen2.5, specifically, it was merged using Mergekit to be twice as large as Qwen2.5 1.5B.

After merging, we used a custom dataset mix meant for this model, to improve its performance even more.
- **Step 1 for fine-tuning via unsloth:** SFT on an estimated 20 million tokens. (more or less)
- **Step 2 for the fine-tuning via unsloth:** DPO for 2 epochs for even better instruction following.
After these two steps, we got a powerful model which has less parameters than llama 3.1 8B yet performs just as good if not better, Note that unlike our other recent models, it is not a thinking model, yet it can reason quite well. Our theory behind this model is that a smaller yet deeper model can outperform for it's size.

Alibaba qwen states that Qwen2.5 was pre-trained on up to 18 trillion high-quality tokens. This model supports up to **32768** input tokens and can generate up to **8192** tokens.

# What should Fijik be used for?
Fijik 1.0 3B is by design, meant to be a production-ready, general use, high-performance model, which is also small enough to be run at high token throughputs while minimising performance loss.
- We made some efforts at ensuring the model is safe while keeping it useable. In addition, it is sensitive to system prompts (in a good way, adheres to them well), so it is very customisable. We did not put in our fine-tuning data any information about the identity of the model; rather it just knows that it is a Large Language Model (LLM), but it does not know it is Fijik, unless you specify in the system prompt.
- Due to the large context of the model, It can be used for RAG, but like any other LLM out there, you should be aware that it *may* hallucinate.
- In our fine-tuning data we included quite a bit of creative writing examples, so the model is pretty good at it.
- Coding, Math: In our SFT, DPO fine-tuning data we have put an effort into improving coding and step-by-step math performance, while it is indeed not perfect, no LLM is.
# Examples

What is ittext

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6710ba6af1279fe0dfe33afe/m8o_qX4M5A5bd_qqSBQPN.png)

# What is it
    This is a 1.0 Fijik series with **3 billion** parameters, dense 56 layer transformer LLM based on Qwen2.5, specifically, it was merged using Mergekit to be twice as large as Qwen2.5 1.5B.

After merging, we used a custom dataset mix meant for this model, to improve its performance even more.
- **Step 1 for fine-tuning via unsloth:** SFT on an estimated 20 million tokens. (more or less)
- **Step 2 for the fine-tuning via unsloth:** DPO for 2 epochs for even better instruction following.
After these two steps, we got a powerful model which has less parameters than llama 3.1 8B yet performs just as good if not better, Note that unlike our other recent models, it is not a thinking model, yet it can reason quite well. Our theory behind this model is that a smaller yet deeper model can outperform for it's size.

Alibaba qwen states that Qwen2.5 was pre-trained on up to 18 trillion high-quality tokens. This model supports up to **32768** input tokens and can generate up to **8192** tokens.

# What should Fijik be used for?
Fijik 1.0 3B is by design, meant to be a production-ready, general use, high-performance model, which is also small enough to be run at high token throughputs while minimising performance loss.
- We made some efforts at ensuring the model is safe while keeping it useable. In addition, it is sensitive to system prompts (in a good way, adheres to them well), so it is very customisable. We did not put in our fine-tuning data any information about the identity of the model; rather it just knows that it is a Large Language Model (LLM), but it does not know it is Fijik, unless you specify in the system prompt.
- Due to the large context of the model, It can be used for RAG, but like any other LLM out there, you should be aware that it *may* hallucinate.
- In our fine-tuning data we included quite a bit of creative writing examples, so the model is pretty good at it.
- Coding, Math: In our SFT, DPO fine-tuning data we have put an effort into improving coding and step-by-step math performance, while it is indeed not perfect, no LLM is.
# Examples

What is ittext

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6710ba6af1279fe0dfe33afe/m8o_qX4M5A5bd_qqSBQPN.png)

# What is it
    This is a 1.0 Fijik series with **3 billion** parameters, dense 56 layer transformer LLM based on Qwen2.5, specifically, it was merged using Mergekit to be twice as large as Qwen2.5 1.5B.

After merging, we used a custom dataset mix meant for this model, to improve its performance even more.
- **Step 1 for fine-tuning via unsloth:** SFT on an estimated 20 million tokens. (more or less)
- **Step 2 for the fine-tuning via unsloth:** DPO for 2 epochs for even better instruction following.
After these two steps, we got a powerful model which has less parameters than llama 3.1 8B yet performs just as good if not better, Note that unlike our other recent models, it is not a thinking model, yet it can reason quite well. Our theory behind this model is that a smaller yet deeper model can outperform for it's size.

Alibaba qwen states that Qwen2.5 was pre-trained on up to 18 trillion high-quality tokens. This model supports up to **32768** input tokens and can generate up to **8192** tokens.

# What should Fijik be used for?
Fijik 1.0 3B is by design, meant to be a production-ready, general use, high-performance model, which is also small enough to be run at high token throughputs while minimising performance loss.
- We made some efforts at ensuring the model is safe while keeping it useable. In addition, it is sensitive to system prompts (in a good way, adheres to them well), so it is very customisable. We did not put in our fine-tuning data any information about the identity of the model; rather it just knows that it is a Large Language Model (LLM), but it does not know it is Fijik, unless you specify in the system prompt.
- Due to the large context of the model, It can be used for RAG, but like any other LLM out there, you should be aware that it *may* hallucinate.
- In our fine-tuning data we included quite a bit of creative writing examples, so the model is pretty good at it.
- Coding, Math: In our SFT, DPO fine-tuning data we have put an effort into improving coding and step-by-step math performance, while it is indeed not perfect, no LLM is.
# Examples

What is ittext

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6710ba6af1279fe0dfe33afe/m8o_qX4M5A5bd_qqSBQPN.png)

# What is it
    This is a 1.0 Fijik series with **3 billion** parameters, dense 56 layer transformer LLM based on Qwen2.5, specifically, it was merged using Mergekit to be twice as large as Qwen2.5 1.5B.

After merging, we used a custom dataset mix meant for this model, to improve its performance even more.
- **Step 1 for fine-tuning via unsloth:** SFT on an estimated 20 million tokens. (more or less)
- **Step 2 for the fine-tuning via unsloth:** DPO for 2 epochs for even better instruction following.
After these two steps, we got a powerful model which has less parameters than llama 3.1 8B yet performs just as good if not better, Note that unlike our other recent models, it is not a thinking model, yet it can reason quite well. Our theory behind this model is that a smaller yet deeper model can outperform for it's size.

Alibaba qwen states that Qwen2.5 was pre-trained on up to 18 trillion high-quality tokens. This model supports up to **32768** input tokens and can generate up to **8192** tokens.

# What should Fijik be used for?
Fijik 1.0 3B is by design, meant to be a production-ready, general use, high-performance model, which is also small enough to be run at high token throughputs while minimising performance loss.
- We made some efforts at ensuring the model is safe while keeping it useable. In addition, it is sensitive to system prompts (in a good way, adheres to them well), so it is very customisable. We did not put in our fine-tuning data any information about the identity of the model; rather it just knows that it is a Large Language Model (LLM), but it does not know it is Fijik, unless you specify in the system prompt.
- Due to the large context of the model, It can be used for RAG, but like any other LLM out there, you should be aware that it *may* hallucinate.
- In our fine-tuning data we included quite a bit of creative writing examples, so the model is pretty good at it.
- Coding, Math: In our SFT, DPO fine-tuning data we have put an effort into improving coding and step-by-step math performance, while it is indeed not perfect, no LLM is.
# Examples

What is ittext

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6710ba6af1279fe0dfe33afe/m8o_qX4M5A5bd_qqSBQPN.png)

# What is it
    This is a 1.0 Fijik series with **3 billion** parameters, dense 56 layer transformer LLM based on Qwen2.5, specifically, it was merged using Mergekit to be twice as large as Qwen2.5 1.5B.

After merging, we used a custom dataset mix meant for this model, to improve its performance even more.
- **Step 1 for fine-tuning via unsloth:** SFT on an estimated 20 million tokens. (more or less)
- **Step 2 for the fine-tuning via unsloth:** DPO for 2 epochs for even better instruction following.
After these two steps, we got a powerful model which has less parameters than llama 3.1 8B yet performs just as good if not better, Note that unlike our other recent models, it is not a thinking model, yet it can reason quite well. Our theory behind this model is that a smaller yet deeper model can outperform for it's size.

Alibaba qwen states that Qwen2.5 was pre-trained on up to 18 trillion high-quality tokens. This model supports up to **32768** input tokens and can generate up to **8192** tokens.

# What should Fijik be used for?
Fijik 1.0 3B is by design, meant to be a production-ready, general use, high-performance model, which is also small enough to be run at high token throughputs while minimising performance loss.
- We made some efforts at ensuring the model is safe while keeping it useable. In addition, it is sensitive to system prompts (in a good way, adheres to them well), so it is very customisable. We did not put in our fine-tuning data any information about the identity of the model; rather it just knows that it is a Large Language Model (LLM), but it does not know it is Fijik, unless you specify in the system prompt.
- Due to the large context of the model, It can be used for RAG, but like any other LLM out there, you should be aware that it *may* hallucinate.
- In our fine-tuning data we included quite a bit of creative writing examples, so the model is pretty good at it.
- Coding, Math: In our SFT, DPO fine-tuning data we have put an effort into improving coding and step-by-step math performance, while it is indeed not perfect, no LLM is.
# Examples

What is ittext

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6710ba6af1279fe0dfe33afe/m8o_qX4M5A5bd_qqSBQPN.png)

# What is it
    This is a 1.0 Fijik series with **3 billion** parameters, dense 56 layer transformer LLM based on Qwen2.5, specifically, it was merged using Mergekit to be twice as large as Qwen2.5 1.5B.

After merging, we used a custom dataset mix meant for this model, to improve its performance even more.
- **Step 1 for fine-tuning via unsloth:** SFT on an estimated 20 million tokens. (more or less)
- **Step 2 for the fine-tuning via unsloth:** DPO for 2 epochs for even better instruction following.
After these two steps, we got a powerful model which has less parameters than llama 3.1 8B yet performs just as good if not better, Note that unlike our other recent models, it is not a thinking model, yet it can reason quite well. Our theory behind this model is that a smaller yet deeper model can outperform for it's size.

Alibaba qwen states that Qwen2.5 was pre-trained on up to 18 trillion high-quality tokens. This model supports up to **32768** input tokens and can generate up to **8192** tokens.

# What should Fijik be used for?
Fijik 1.0 3B is by design, meant to be a production-ready, general use, high-performance model, which is also small enough to be run at high token throughputs while minimising performance loss.
- We made some efforts at ensuring the model is safe while keeping it useable. In addition, it is sensitive to system prompts (in a good way, adheres to them well), so it is very customisable. We did not put in our fine-tuning data any information about the identity of the model; rather it just knows that it is a Large Language Model (LLM), but it does not know it is Fijik, unless you specify in the system prompt.
- Due to the large context of the model, It can be used for RAG, but like any other LLM out there, you should be aware that it *may* hallucinate.
- In our fine-tuning data we included quite a bit of creative writing examples, so the model is pretty good at it.
- Coding, Math: In our SFT, DPO fine-tuning data we have put an effort into improving coding and step-by-step math performance, while it is indeed not perfect, no LLM is.
# Examples

What is ittext

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6710ba6af1279fe0dfe33afe/m8o_qX4M5A5bd_qqSBQPN.png)

# What is it
    This is a 1.0 Fijik series with **3 billion** parameters, dense 56 layer transformer LLM based on Qwen2.5, specifically, it was merged using Mergekit to be twice as large as Qwen2.5 1.5B.

After merging, we used a custom dataset mix meant for this model, to improve its performance even more.
- **Step 1 for fine-tuning via unsloth:** SFT on an estimated 20 million tokens. (more or less)
- **Step 2 for the fine-tuning via unsloth:** DPO for 2 epochs for even better instruction following.
After these two steps, we got a powerful model which has less parameters than llama 3.1 8B yet performs just as good if not better, Note that unlike our other recent models, it is not a thinking model, yet it can reason quite well. Our theory behind this model is that a smaller yet deeper model can outperform for it's size.

Alibaba qwen states that Qwen2.5 was pre-trained on up to 18 trillion high-quality tokens. This model supports up to **32768** input tokens and can generate up to **8192** tokens.

# What should Fijik be used for?
Fijik 1.0 3B is by design, meant to be a production-ready, general use, high-performance model, which is also small enough to be run at high token throughputs while minimising performance loss.
- We made some efforts at ensuring the model is safe while keeping it useable. In addition, it is sensitive to system prompts (in a good way, adheres to them well), so it is very customisable. We did not put in our fine-tuning data any information about the identity of the model; rather it just knows that it is a Large Language Model (LLM), but it does not know it is Fijik, unless you specify in the system prompt.
- Due to the large context of the model, It can be used for RAG, but like any other LLM out there, you should be aware that it *may* hallucinate.
- In our fine-tuning data we included quite a bit of creative writing examples, so the model is pretty good at it.
- Coding, Math: In our SFT, DPO fine-tuning data we have put an effort into improving coding and step-by-step math performance, while it is indeed not perfect, no LLM is.
# Examples

What is ittext

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6710ba6af1279fe0dfe33afe/m8o_qX4M5A5bd_qqSBQPN.png)

# What is it
    This is a 1.0 Fijik series with **3 billion** parameters, dense 56 layer transformer LLM based on Qwen2.5, specifically, it was merged using Mergekit to be twice as large as Qwen2.5 1.5B.

After merging, we used a custom dataset mix meant for this model, to improve its performance even more.
- **Step 1 for fine-tuning via unsloth:** SFT on an estimated 20 million tokens. (more or less)
- **Step 2 for the fine-tuning via unsloth:** DPO for 2 epochs for even better instruction following.
After these two steps, we got a powerful model which has less parameters than llama 3.1 8B yet performs just as good if not better, Note that unlike our other recent models, it is not a thinking model, yet it can reason quite well. Our theory behind this model is that a smaller yet deeper model can outperform for it's size.

Alibaba qwen states that Qwen2.5 was pre-trained on up to 18 trillion high-quality tokens. This model supports up to **32768** input tokens and can generate up to **8192** tokens.

# What should Fijik be used for?
Fijik 1.0 3B is by design, meant to be a production-ready, general use, high-performance model, which is also small enough to be run at high token throughputs while minimising performance loss.
- We made some efforts at ensuring the model is safe while keeping it useable. In addition, it is sensitive to system prompts (in a good way, adheres to them well), so it is very customisable. We did not put in our fine-tuning data any information about the identity of the model; rather it just knows that it is a Large Language Model (LLM), but it does not know it is Fijik, unless you specify in the system prompt.
- Due to the large context of the model, It can be used for RAG, but like any other LLM out there, you should be aware that it *may* hallucinate.
- In our fine-tuning data we included quite a bit of creative writing examples, so the model is pretty good at it.
- Coding, Math: In our SFT, DPO fine-tuning data we have put an effort into improving coding and step-by-step math performance, while it is indeed not perfect, no LLM is.
# Examples

What is ittext

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6710ba6af1279fe0dfe33afe/m8o_qX4M5A5bd_qqSBQPN.png)

# What is it
    This is a 1.0 Fijik series with **3 billion** parameters, dense 56 layer transformer LLM based on Qwen2.5, specifically, it was merged using Mergekit to be twice as large as Qwen2.5 1.5B.

After merging, we used a custom dataset mix meant for this model, to improve its performance even more.
- **Step 1 for fine-tuning via unsloth:** SFT on an estimated 20 million tokens. (more or less)
- **Step 2 for the fine-tuning via unsloth:** DPO for 2 epochs for even better instruction following.
After these two steps, we got a powerful model which has less parameters than llama 3.1 8B yet performs just as good if not better, Note that unlike our other recent models, it is not a thinking model, yet it can reason quite well. Our theory behind this model is that a smaller yet deeper model can outperform for it's size.

Alibaba qwen states that Qwen2.5 was pre-trained on up to 18 trillion high-quality tokens. This model supports up to **32768** input tokens and can generate up to **8192** tokens.

# What should Fijik be used for?
Fijik 1.0 3B is by design, meant to be a production-ready, general use, high-performance model, which is also small enough to be run at high token throughputs while minimising performance loss.
- We made some efforts at ensuring the model is safe while keeping it useable. In addition, it is sensitive to system prompts (in a good way, adheres to them well), so it is very customisable. We did not put in our fine-tuning data any information about the identity of the model; rather it just knows that it is a Large Language Model (LLM), but it does not know it is Fijik, unless you specify in the system prompt.
- Due to the large context of the model, It can be used for RAG, but like any other LLM out there, you should be aware that it *may* hallucinate.
- In our fine-tuning data we included quite a bit of creative writing examples, so the model is pretty good at it.
- Coding, Math: In our SFT, DPO fine-tuning data we have put an effort into improving coding and step-by-step math performance, while it is indeed not perfect, no LLM is.
# Examples

What is ittext

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6710ba6af1279fe0dfe33afe/m8o_qX4M5A5bd_qqSBQPN.png)

# What is it
    This is a 1.0 Fijik series with **3 billion** parameters, dense 56 layer transformer LLM based on Qwen2.5, specifically, it was merged using Mergekit to be twice as large as Qwen2.5 1.5B.

After merging, we used a custom dataset mix meant for this model, to improve its performance even more.
- **Step 1 for fine-tuning via unsloth:** SFT on an estimated 20 million tokens. (more or less)
- **Step 2 for the fine-tuning via unsloth:** DPO for 2 epochs for even better instruction following.
After these two steps, we got a powerful model which has less parameters than llama 3.1 8B yet performs just as good if not better, Note that unlike our other recent models, it is not a thinking model, yet it can reason quite well. Our theory behind this model is that a smaller yet deeper model can outperform for it's size.

Alibaba qwen states that Qwen2.5 was pre-trained on up to 18 trillion high-quality tokens. This model supports up to **32768** input tokens and can generate up to **8192** tokens.

# What should Fijik be used for?
Fijik 1.0 3B is by design, meant to be a production-ready, general use, high-performance model, which is also small enough to be run at high token throughputs while minimising performance loss.
- We made some efforts at ensuring the model is safe while keeping it useable. In addition, it is sensitive to system prompts (in a good way, adheres to them well), so it is very customisable. We did not put in our fine-tuning data any information about the identity of the model; rather it just knows that it is a Large Language Model (LLM), but it does not know it is Fijik, unless you specify in the system prompt.
- Due to the large context of the model, It can be used for RAG, but like any other LLM out there, you should be aware that it *may* hallucinate.
- In our fine-tuning data we included quite a bit of creative writing examples, so the model is pretty good at it.
- Coding, Math: In our SFT, DPO fine-tuning data we have put an effort into improving coding and step-by-step math performance, while it is indeed not perfect, no LLM is.
# Examples

What is ittext

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6710ba6af1279fe0dfe33afe/m8o_qX4M5A5bd_qqSBQPN.png)

# What is it
    This is a 1.0 Fijik series with **3 billion** parameters, dense 56 layer transformer LLM based on Qwen2.5, specifically, it was merged using Mergekit to be twice as large as Qwen2.5 1.5B.

After merging, we used a custom dataset mix meant for this model, to improve its performance even more.
- **Step 1 for fine-tuning via unsloth:** SFT on an estimated 20 million tokens. (more or less)
- **Step 2 for the fine-tuning via unsloth:** DPO for 2 epochs for even better instruction following.
After these two steps, we got a powerful model which has less parameters than llama 3.1 8B yet performs just as good if not better, Note that unlike our other recent models, it is not a thinking model, yet it can reason quite well. Our theory behind this model is that a smaller yet deeper model can outperform for it's size.

Alibaba qwen states that Qwen2.5 was pre-trained on up to 18 trillion high-quality tokens. This model supports up to **32768** input tokens and can generate up to **8192** tokens.

# What should Fijik be used for?
Fijik 1.0 3B is by design, meant to be a production-ready, general use, high-performance model, which is also small enough to be run at high token throughputs while minimising performance loss.
- We made some efforts at ensuring the model is safe while keeping it useable. In addition, it is sensitive to system prompts (in a good way, adheres to them well), so it is very customisable. We did not put in our fine-tuning data any information about the identity of the model; rather it just knows that it is a Large Language Model (LLM), but it does not know it is Fijik, unless you specify in the system prompt.
- Due to the large context of the model, It can be used for RAG, but like any other LLM out there, you should be aware that it *may* hallucinate.
- In our fine-tuning data we included quite a bit of creative writing examples, so the model is pretty good at it.
- Coding, Math: In our SFT, DPO fine-tuning data we have put an effort into improving coding and step-by-step math performance, while it is indeed not perfect, no LLM is.
# Examples

What is ittext

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6710ba6af1279fe0dfe33afe/m8o_qX4M5A5bd_qqSBQPN.png)

# What is it
    This is a 1.0 Fijik series with **3 billion** parameters, dense 56 layer transformer LLM based on Qwen2.5, specifically, it was merged using Mergekit to be twice as large as Qwen2.5 1.5B.

After merging, we used a custom dataset mix meant for this model, to improve its performance even more.
- **Step 1 for fine-tuning via unsloth:** SFT on an estimated 20 million tokens. (more or less)
- **Step 2 for the fine-tuning via unsloth:** DPO for 2 epochs for even better instruction following.
After these two steps, we got a powerful model which has less parameters than llama 3.1 8B yet performs just as good if not better, Note that unlike our other recent models, it is not a thinking model, yet it can reason quite well. Our theory behind this model is that a smaller yet deeper model can outperform for it's size.

Alibaba qwen states that Qwen2.5 was pre-trained on up to 18 trillion high-quality tokens. This model supports up to **32768** input tokens and can generate up to **8192** tokens.

# What should Fijik be used for?
Fijik 1.0 3B is by design, meant to be a production-ready, general use, high-performance model, which is also small enough to be run at high token throughputs while minimising performance loss.
- We made some efforts at ensuring the model is safe while keeping it useable. In addition, it is sensitive to system prompts (in a good way, adheres to them well), so it is very customisable. We did not put in our fine-tuning data any information about the identity of the model; rather it just knows that it is a Large Language Model (LLM), but it does not know it is Fijik, unless you specify in the system prompt.
- Due to the large context of the model, It can be used for RAG, but like any other LLM out there, you should be aware that it *may* hallucinate.
- In our fine-tuning data we included quite a bit of creative writing examples, so the model is pretty good at it.
- Coding, Math: In our SFT, DPO fine-tuning data we have put an effort into improving coding and step-by-step math performance, while it is indeed not perfect, no LLM is.
# Examples

What is ittext

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6710ba6af1279fe0dfe33afe/m8o_qX4M5A5bd_qqSBQPN.png)

# What is it
    This is a 1.0 Fijik series with **3 billion** parameters, dense 56 layer transformer LLM based on Qwen2.5, specifically, it was merged using Mergekit to be twice as large as Qwen2.5 1.5B.

After merging, we used a custom dataset mix meant for this model, to improve its performance even more.
- **Step 1 for fine-tuning via unsloth:** SFT on an estimated 20 million tokens. (more or less)
- **Step 2 for the fine-tuning via unsloth:** DPO for 2 epochs for even better instruction following.
After these two steps, we got a powerful model which has less parameters than llama 3.1 8B yet performs just as good if not better, Note that unlike our other recent models, it is not a thinking model, yet it can reason quite well. Our theory behind this model is that a smaller yet deeper model can outperform for it's size.

Alibaba qwen states that Qwen2.5 was pre-trained on up to 18 trillion high-quality tokens. This model supports up to **32768** input tokens and can generate up to **8192** tokens.

# What should Fijik be used for?
Fijik 1.0 3B is by design, meant to be a production-ready, general use, high-performance model, which is also small enough to be run at high token throughputs while minimising performance loss.
- We made some efforts at ensuring the model is safe while keeping it useable. In addition, it is sensitive to system prompts (in a good way, adheres to them well), so it is very customisable. We did not put in our fine-tuning data any information about the identity of the model; rather it just knows that it is a Large Language Model (LLM), but it does not know it is Fijik, unless you specify in the system prompt.
- Due to the large context of the model, It can be used for RAG, but like any other LLM out there, you should be aware that it *may* hallucinate.
- In our fine-tuning data we included quite a bit of creative writing examples, so the model is pretty good at it.
- Coding, Math: In our SFT, DPO fine-tuning data we have put an effort into improving coding and step-by-step math performance, while it is indeed not perfect, no LLM is.
# Examples

What is ittext

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6710ba6af1279fe0dfe33afe/m8o_qX4M5A5bd_qqSBQPN.png)

# What is it
    This is a 1.0 Fijik series with **3 billion** parameters, dense 56 layer transformer LLM based on Qwen2.5, specifically, it was merged using Mergekit to be twice as large as Qwen2.5 1.5B.

After merging, we used a custom dataset mix meant for this model, to improve its performance even more.
- **Step 1 for fine-tuning via unsloth:** SFT on an estimated 20 million tokens. (more or less)
- **Step 2 for the fine-tuning via unsloth:** DPO for 2 epochs for even better instruction following.
After these two steps, we got a powerful model which has less parameters than llama 3.1 8B yet performs just as good if not better, Note that unlike our other recent models, it is not a thinking model, yet it can reason quite well. Our theory behind this model is that a smaller yet deeper model can outperform for it's size.

Alibaba qwen states that Qwen2.5 was pre-trained on up to 18 trillion high-quality tokens. This model supports up to **32768** input tokens and can generate up to **8192** tokens.

# What should Fijik be used for?
Fijik 1.0 3B is by design, meant to be a production-ready, general use, high-performance model, which is also small enough to be run at high token throughputs while minimising performance loss.
- We made some efforts at ensuring the model is safe while keeping it useable. In addition, it is sensitive to system prompts (in a good way, adheres to them well), so it is very customisable. We did not put in our fine-tuning data any information about the identity of the model; rather it just knows that it is a Large Language Model (LLM), but it does not know it is Fijik, unless you specify in the system prompt.
- Due to the large context of the model, It can be used for RAG, but like any other LLM out there, you should be aware that it *may* hallucinate.
- In our fine-tuning data we included quite a bit of creative writing examples, so the model is pretty good at it.
- Coding, Math: In our SFT, DPO fine-tuning data we have put an effort into improving coding and step-by-step math performance, while it is indeed not perfect, no LLM is.
# Examples

What is ittext

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6710ba6af1279fe0dfe33afe/m8o_qX4M5A5bd_qqSBQPN.png)

# What is it
    This is a 1.0 Fijik series with **3 billion** parameters, dense 56 layer transformer LLM based on Qwen2.5, specifically, it was merged using Mergekit to be twice as large as Qwen2.5 1.5B.

After merging, we used a custom dataset mix meant for this model, to improve its performance even more.
- **Step 1 for fine-tuning via unsloth:** SFT on an estimated 20 million tokens. (more or less)
- **Step 2 for the fine-tuning via unsloth:** DPO for 2 epochs for even better instruction following.
After these two steps, we got a powerful model which has less parameters than llama 3.1 8B yet performs just as good if not better, Note that unlike our other recent models, it is not a thinking model, yet it can reason quite well. Our theory behind this model is that a smaller yet deeper model can outperform for it's size.

Alibaba qwen states that Qwen2.5 was pre-trained on up to 18 trillion high-quality tokens. This model supports up to **32768** input tokens and can generate up to **8192** tokens.

# What should Fijik be used for?
Fijik 1.0 3B is by design, meant to be a production-ready, general use, high-performance model, which is also small enough to be run at high token throughputs while minimising performance loss.
- We made some efforts at ensuring the model is safe while keeping it useable. In addition, it is sensitive to system prompts (in a good way, adheres to them well), so it is very customisable. We did not put in our fine-tuning data any information about the identity of the model; rather it just knows that it is a Large Language Model (LLM), but it does not know it is Fijik, unless you specify in the system prompt.
- Due to the large context of the model, It can be used for RAG, but like any other LLM out there, you should be aware that it *may* hallucinate.
- In our fine-tuning data we included quite a bit of creative writing examples, so the model is pretty good at it.
- Coding, Math: In our SFT, DPO fine-tuning data we have put an effort into improving coding and step-by-step math performance, while it is indeed not perfect, no LLM is.
# Examples

What is ittext

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6710ba6af1279fe0dfe33afe/m8o_qX4M5A5bd_qqSBQPN.png)

# What is it
    This is a 1.0 Fijik series with **3 billion** parameters, dense 56 layer transformer LLM based on Qwen2.5, specifically, it was merged using Mergekit to be twice as large as Qwen2.5 1.5B.

After merging, we used a custom dataset mix meant for this model, to improve its performance even more.
- **Step 1 for fine-tuning via unsloth:** SFT on an estimated 20 million tokens. (more or less)
- **Step 2 for the fine-tuning via unsloth:** DPO for 2 epochs for even better instruction following.
After these two steps, we got a powerful model which has less parameters than llama 3.1 8B yet performs just as good if not better, Note that unlike our other recent models, it is not a thinking model, yet it can reason quite well. Our theory behind this model is that a smaller yet deeper model can outperform for it's size.

Alibaba qwen states that Qwen2.5 was pre-trained on up to 18 trillion high-quality tokens. This model supports up to **32768** input tokens and can generate up to **8192** tokens.

# What should Fijik be used for?
Fijik 1.0 3B is by design, meant to be a production-ready, general use, high-performance model, which is also small enough to be run at high token throughputs while minimising performance loss.
- We made some efforts at ensuring the model is safe while keeping it useable. In addition, it is sensitive to system prompts (in a good way, adheres to them well), so it is very customisable. We did not put in our fine-tuning data any information about the identity of the model; rather it just knows that it is a Large Language Model (LLM), but it does not know it is Fijik, unless you specify in the system prompt.
- Due to the large context of the model, It can be used for RAG, but like any other LLM out there, you should be aware that it *may* hallucinate.
- In our fine-tuning data we included quite a bit of creative writing examples, so the model is pretty good at it.
- Coding, Math: In our SFT, DPO fine-tuning data we have put an effort into improving coding and step-by-step math performance, while it is indeed not perfect, no LLM is.
# Examples

Citationstext

{
    title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing}, 
    author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
    year={2024},
    eprint={2406.08464},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Citationstext

{
    title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing}, 
    author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
    year={2024},
    eprint={2406.08464},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Citationstext

{
    title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing}, 
    author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
    year={2024},
    eprint={2406.08464},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Citationstext

{
    title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing}, 
    author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
    year={2024},
    eprint={2406.08464},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Citationstext

{
    title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing}, 
    author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
    year={2024},
    eprint={2406.08464},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Citationstext

{
    title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing}, 
    author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
    year={2024},
    eprint={2406.08464},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Citationstext

{
    title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing}, 
    author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
    year={2024},
    eprint={2406.08464},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Citationstext

{
    title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing}, 
    author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
    year={2024},
    eprint={2406.08464},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Citationstext

{
    title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing}, 
    author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
    year={2024},
    eprint={2406.08464},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Citationstext

{
    title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing}, 
    author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
    year={2024},
    eprint={2406.08464},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Citationstext

{
    title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing}, 
    author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
    year={2024},
    eprint={2406.08464},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Citationstext

{
    title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing}, 
    author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
    year={2024},
    eprint={2406.08464},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Citationstext

{
    title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing}, 
    author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
    year={2024},
    eprint={2406.08464},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Citationstext

{
    title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing}, 
    author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
    year={2024},
    eprint={2406.08464},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Citationstext

{
    title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing}, 
    author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
    year={2024},
    eprint={2406.08464},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Citationstext

{
    title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing}, 
    author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
    year={2024},
    eprint={2406.08464},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Citationstext

{
    title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing}, 
    author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
    year={2024},
    eprint={2406.08464},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Citationstext

{
    title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing}, 
    author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
    year={2024},
    eprint={2406.08464},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Citationstext

{
    title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing}, 
    author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
    year={2024},
    eprint={2406.08464},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Citationstext

{
    title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing}, 
    author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
    year={2024},
    eprint={2406.08464},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Citationstext

{
    title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing}, 
    author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
    year={2024},
    eprint={2406.08464},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Citationstext

{
    title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing}, 
    author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
    year={2024},
    eprint={2406.08464},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Citationstext

{
    title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing}, 
    author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
    year={2024},
    eprint={2406.08464},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Citationstext

{
    title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing}, 
    author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
    year={2024},
    eprint={2406.08464},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Citationstext

{
    title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing}, 
    author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
    year={2024},
    eprint={2406.08464},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Citationstext

{
    title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing}, 
    author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
    year={2024},
    eprint={2406.08464},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Citationstext

{
    title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing}, 
    author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
    year={2024},
    eprint={2406.08464},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Citationstext

{
    title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing}, 
    author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
    year={2024},
    eprint={2406.08464},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Citationstext

{
    title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing}, 
    author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
    year={2024},
    eprint={2406.08464},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Citationstext

{
    title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing}, 
    author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
    year={2024},
    eprint={2406.08464},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Citationstext

{
    title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing}, 
    author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
    year={2024},
    eprint={2406.08464},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Citationstext

{
    title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing}, 
    author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
    year={2024},
    eprint={2406.08464},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Citationstext

{
    title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing}, 
    author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
    year={2024},
    eprint={2406.08464},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Deploy This Model

Production-ready deployment in minutes

Together.ai

Instant API access to this model

Fastest API

Production-ready inference API. Start free, scale to millions.

Try Free API

Replicate

One-click model deployment

Easiest Setup

Run models in the cloud with simple API. No DevOps required.

Deploy Now

Disclosure: We may earn a commission from these partners. This helps keep LLMYourWay free.