INSAIT-Institute
MamayLM-Gemma-3-12B-IT-v1.0
INSAIT introduces MamayLM-Gemma-3-12B-IT-v1.0, the best-performing Ukrainian language model, based on google/gemma-3-12b-pt and google/gemma-3-12b-it. MamayLM-Gemma-3-12B-IT-v1.0 is free to use and distributed under the Gemma Terms of Use. This model was created by `INSAIT`, part of Sofia University St. Kliment Ohridski, in Sofia, Bulgaria.

The model was built on top of Google's Gemma 3 12B open models. It was continuously pre-trained on a large pre-filtered dataset using a combination of data mixing and model merging, allowing the model to gain outstanding Ukrainian cultural and linguistic capabilities while retaining its English performance. During the pre-training stage, we use various datasets, including Ukrainian web crawl data (Kobza), freely available datasets such as Wikipedia, a range of specialized Ukrainian datasets, and machine translations of popular English datasets. The model was then instruction-fine-tuned on a newly constructed Ukrainian instruction dataset created using machine translations of the current best English datasets and specialized Ukrainian datasets prepared by the Ukrainian community. For more information, check our blog post (available in English and Ukrainian).

We evaluate our models on a set of standard English benchmarks, a translated version of them in Ukrainian, as well as Ukrainian-specific benchmarks we collected:
- Winogrande challenge: testing world knowledge and understanding
- Hellaswag: testing sentence completion
- ARC Easy/Challenge: testing logical reasoning
- TriviaQA: testing trivia knowledge
- GSM-8k: solving grade-school mathematics word problems
- MMLU: testing knowledge on a multitude of topics
- IFEval: testing instruction-following skills
- ZNO: testing knowledge of the Ukrainian high school curriculum in Ukrainian language & literature, history, mathematics and geography

These benchmarks test logical reasoning, mathematics, knowledge, language understanding and other skills of the models and are provided at https://github.com/insait-institute/lm-evaluation-harness-uk.

The graphs above show the performance of MamayLM 12B compared to other large open models. The results show the excellent abilities of MamayLM in Ukrainian, which allow it to outperform much larger models, including Alibaba's Qwen 2.5 72B and Meta's Llama 3.1 70B. Finally, our models retain the excellent English performance inherited from the original Google Gemma 3 models upon which they are based. MamayLM v1.0 12B also shows improved performance on visual benchmarks like MMMU and ZNO-Vision (MMZNO).

Use in 🤗 Transformers

First install the latest version of the transformers library. For optimal performance, we recommend the text-generation parameters we have extensively tested the model with; in principle, increasing the temperature should work adequately as well. In order to leverage instruction fine-tuning, your prompt should begin with a beginning-of-sequence token `<bos>` and be formatted in the Gemma 3 chat template. `<bos>` should only be the first token in a chat sequence. This format is also available as a chat template via the `apply_chat_template()` method.
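A minimal usage sketch along these lines; the multimodal `Gemma3ForConditionalGeneration`/`AutoProcessor` loading path is assumed for this checkpoint, and the sampling values are illustrative placeholders rather than the tested recommendations:

```python
# pip install -U transformers accelerate
import torch
from transformers import AutoProcessor, Gemma3ForConditionalGeneration

model_id = "INSAIT-Institute/MamayLM-Gemma-3-12B-IT-v1.0"

processor = AutoProcessor.from_pretrained(model_id)
model = Gemma3ForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
).eval()

# apply_chat_template() prepends <bos> and wraps the turns in the Gemma 3 chat format.
messages = [
    {"role": "user", "content": [{"type": "text", "text": "Хто написав «Кобзар»?"}]},
]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)

# Illustrative sampling settings, not the officially recommended values.
with torch.inference_mode():
    out = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.1)

print(processor.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```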
The model and instructions for usage in GGUF format are available at INSAIT-Institute/MamayLM-Gemma-3-12B-IT-v1.0-GGUF.

We welcome feedback from the community to help improve MamayLM. If you have suggestions, encounter any issues, or have ideas for improvements, please:
- Share your experience using the model through Hugging Face's community discussion feature, or
- Contact us at [email protected]

Your real-world usage and insights are valuable in helping us optimize the model's performance and behaviour for various use cases.

Summary
- Finetuned from: google/gemma-3-12b-it; google/gemma-3-12b-pt
- Model type: Causal decoder-only transformer language model
- Language: Ukrainian and English
- Contact: [email protected]
- License: MamayLM is distributed under the Gemma Terms of Use
MamayLM-Gemma-3-12B-IT-v1.0-GGUF
MamayLM-Gemma-3-4B-IT-v1.0
INSAIT introduces MamayLM-Gemma-3-4B-IT-v1.0, the best-performing Ukrainian language model, based on google/gemma-3-4b-pt and google/gemma-3-4b-it. MamayLM-Gemma-3-4B-IT-v1.0 is free to use and distributed under the Gemma Terms of Use. This model was created by `INSAIT`, part of Sofia University St. Kliment Ohridski, in Sofia, Bulgaria.

The model was built on top of Google's Gemma 3 4B open models. It was continuously pre-trained on a large pre-filtered dataset using a combination of data mixing and model merging, allowing the model to gain outstanding Ukrainian cultural and linguistic capabilities while retaining its English performance. During the pre-training stage, we use various datasets, including Ukrainian web crawl data (Kobza), freely available datasets such as Wikipedia, a range of specialized Ukrainian datasets, and machine translations of popular English datasets. The model was then instruction-fine-tuned on a newly constructed Ukrainian instruction dataset created using machine translations of the current best English datasets and specialized Ukrainian datasets prepared by the Ukrainian community. For more information, check our blog post (available in English and Ukrainian).

We evaluate our models on a set of standard English benchmarks, a translated version of them in Ukrainian, as well as Ukrainian-specific benchmarks we collected:
- Winogrande challenge: testing world knowledge and understanding
- Hellaswag: testing sentence completion
- ARC Easy/Challenge: testing logical reasoning
- TriviaQA: testing trivia knowledge
- GSM-8k: solving grade-school mathematics word problems
- MMLU: testing knowledge on a multitude of topics
- IFEval: testing instruction-following skills
- ZNO: testing knowledge of the Ukrainian high school curriculum in Ukrainian language & literature, history, mathematics and geography

These benchmarks test logical reasoning, mathematics, knowledge, language understanding and other skills of the models and are provided at https://github.com/insait-institute/lm-evaluation-harness-uk.

The graphs above show the performance of MamayLM 4B compared to other large open models. The results show the excellent abilities of MamayLM in Ukrainian, which allow it to outperform similarly sized models. Finally, our models retain the excellent English performance inherited from the original Google Gemma 3 models upon which they are based. MamayLM v1.0 4B also shows improved performance on visual benchmarks like MMMU and ZNO-Vision (MMZNO).

Use in 🤗 Transformers

First install the latest version of the transformers library. For optimal performance, we recommend the text-generation parameters we have extensively tested the model with; in principle, increasing the temperature should work adequately as well. In order to leverage instruction fine-tuning, your prompt should begin with a beginning-of-sequence token `<bos>` and be formatted in the Gemma 3 chat template. `<bos>` should only be the first token in a chat sequence. This format is also available as a chat template via the `apply_chat_template()` method.
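A shorter sketch using the high-level `pipeline` API; the `image-text-to-text` task name is assumed because the checkpoint is multimodal, and the generation settings are placeholders:

```python
import torch
from transformers import pipeline

# Assumes the checkpoint registers as an image-text-to-text (Gemma 3) model.
pipe = pipeline(
    "image-text-to-text",
    model="INSAIT-Institute/MamayLM-Gemma-3-4B-IT-v1.0",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "user", "content": [{"type": "text", "text": "Поясни, що таке писанка."}]},
]

# Placeholder generation settings, not the officially recommended values.
out = pipe(text=messages, max_new_tokens=200)
print(out[0]["generated_text"][-1]["content"])
```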
The model and instructions for usage in GGUF format are available at INSAIT-Institute/MamayLM-Gemma-3-4B-IT-v1.0-GGUF.

We welcome feedback from the community to help improve MamayLM. If you have suggestions, encounter any issues, or have ideas for improvements, please:
- Share your experience using the model through Hugging Face's community discussion feature, or
- Contact us at [email protected]

Your real-world usage and insights are valuable in helping us optimize the model's performance and behaviour for various use cases.

Summary
- Finetuned from: google/gemma-3-4b-it; google/gemma-3-4b-pt
- Model type: Causal decoder-only transformer language model
- Language: Ukrainian and English
- Contact: [email protected]
- License: MamayLM is distributed under the Gemma Terms of Use
BgGPT-Gemma-2-9B-IT-v1.0-GGUF
MamayLM-Gemma-3-4B-IT-v1.0-GGUF
This repo contains the GGUF format model files for INSAIT-Institute/MamayLM-Gemma-3-4B-IT-v1.0.
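For example, the GGUF files can be run locally with `llama-cpp-python`; the quantization filename below is an assumption, so use whichever `.gguf` file is actually present in the repo:

```python
# pip install llama-cpp-python
from llama_cpp import Llama

# The filename pattern is an assumption; pick an actual .gguf file from the repo.
llm = Llama.from_pretrained(
    repo_id="INSAIT-Institute/MamayLM-Gemma-3-4B-IT-v1.0-GGUF",
    filename="*Q4_K_M.gguf",
    n_ctx=4096,
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Привіт! Розкажи коротко про Київ."}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```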
BgGPT-Gemma-2-2.6B-IT-v1.0
INSAIT introduces BgGPT-Gemma-2-2.6B-IT-v1.0, a state-of-the-art Bulgarian language model based on google/gemma-2-2b and google/gemma-2-2b-it. BgGPT-Gemma-2-2.6B-IT-v1.0 is free to use and distributed under the Gemma Terms of Use. This model was created by `INSAIT`, part of Sofia University St. Kliment Ohridski, in Sofia, Bulgaria.

The model was built on top of Google's Gemma 2 2B open models. It was continuously pre-trained on around 100 billion tokens (85 billion in Bulgarian) using the Branch-and-Merge strategy INSAIT presented at EMNLP'24, allowing the model to gain outstanding Bulgarian cultural and linguistic capabilities while retaining its English performance. During the pre-training stage, we use various datasets, including Bulgarian web crawl data, freely available datasets such as Wikipedia, a range of specialized Bulgarian datasets sourced by the INSAIT Institute, and machine translations of popular English datasets. The model was then instruction-fine-tuned on a newly constructed Bulgarian instruction dataset created using real-world conversations. For more information, check our blog post.

We evaluate our models on a set of standard English benchmarks, a translated version of them in Bulgarian, as well as Bulgarian-specific benchmarks we collected:
- Winogrande challenge: testing world knowledge and understanding
- Hellaswag: testing sentence completion
- ARC Easy/Challenge: testing logical reasoning
- TriviaQA: testing trivia knowledge
- GSM-8k: solving grade-school mathematics word problems
- Exams: solving high school problems from natural and social sciences
- MON: contains exams across various subjects for grades 4 to 12

These benchmarks test logical reasoning, mathematics, knowledge, language understanding and other skills of the models and are provided at https://github.com/insait-institute/lm-evaluation-harness-bg.

The graphs above show the performance of BgGPT 2.6B compared to other small open language models such as Microsoft's Phi 3.5 and Alibaba's Qwen 2.5 3B. The BgGPT model not only surpasses them, but also retains the English performance inherited from the original Google Gemma 2 models upon which it is based.

Use in 🤗 Transformers

First install the latest version of the transformers library. For optimal performance, we recommend the text-generation parameters we have extensively tested the model with; in principle, increasing the temperature should work adequately as well. In order to leverage instruction fine-tuning, your prompt should begin with a beginning-of-sequence token `<bos>` and be formatted in the Gemma 2 chat template. `<bos>` should only be the first token in a chat sequence. This format is also available as a chat template via the `apply_chat_template()` method.

Important Note: Models based on Gemma 2, such as BgGPT-Gemma-2-2.6B-IT-v1.0, do not support flash attention. Using it results in degraded performance.
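A minimal loading sketch that follows this note by forcing eager attention; the Bulgarian example prompt and sampling values are illustrative placeholders, not the tested recommendations:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "INSAIT-Institute/BgGPT-Gemma-2-2.6B-IT-v1.0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# attn_implementation="eager" avoids flash attention, which degrades Gemma 2 quality.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    attn_implementation="eager",
    device_map="auto",
)

messages = [{"role": "user", "content": "Кога е основан Софийският университет?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Placeholder sampling settings, not the officially recommended values.
out = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.1)
print(tokenizer.decode(out[0][input_ids.shape[-1]:], skip_special_tokens=True))
```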
The model and instructions for usage in GGUF format are available at INSAIT-Institute/BgGPT-Gemma-2-2.6B-IT-v1.0-GGUF.

We welcome feedback from the community to help improve BgGPT. If you have suggestions, encounter any issues, or have ideas for improvements, please:
- Share your experience using the model through Hugging Face's community discussion feature, or
- Contact us at [email protected]

Your real-world usage and insights are valuable in helping us optimize the model's performance and behaviour for various use cases.

Summary
- Finetuned from: google/gemma-2-2b-it; google/gemma-2-2b
- Model type: Causal decoder-only transformer language model
- Language: Bulgarian and English
- Contact: [email protected]
- License: BgGPT is distributed under the Gemma Terms of Use
BgGPT-Gemma-2-27B-IT-v1.0-GGUF
MamayLM Gemma 2 9B IT V0.1
INSAIT introduces MamayLM-Gemma-2-9B-IT-v0.1, the best-performing Ukrainian language model, based on google/gemma-2-9b and google/gemma-2-9b-it. MamayLM-Gemma-2-9B-IT-v0.1 is free to use and distributed under the Gemma Terms of Use. This model was created by `INSAIT`, part of Sofia University St. Kliment Ohridski, in Sofia, Bulgaria.

The model was built on top of Google's Gemma 2 9B open models. It was continuously pre-trained on a large pre-filtered dataset (75B tokens of Ukrainian and English data in total) using a combination of data mixing and model merging, allowing the model to gain outstanding Ukrainian cultural and linguistic capabilities while retaining its English performance. During the pre-training stage, we use various datasets, including Ukrainian web crawl data (FineWeb2), freely available datasets such as Wikipedia, a range of specialized Ukrainian datasets, and machine translations of popular English datasets. The model was then instruction-fine-tuned on a newly constructed Ukrainian instruction dataset created using machine translations of the current best English datasets and specialized Ukrainian datasets prepared by the Ukrainian community. For more information, check our blog post (English, Ukrainian).

We evaluate our models on a set of standard English benchmarks, a translated version of them in Ukrainian, as well as Ukrainian-specific benchmarks we collected:
- Winogrande challenge: testing world knowledge and understanding
- Hellaswag: testing sentence completion
- ARC Easy/Challenge: testing logical reasoning
- TriviaQA: testing trivia knowledge
- GSM-8k: solving grade-school mathematics word problems
- MMLU: testing knowledge on a multitude of topics
- IFEval: testing instruction-following skills
- ZNO: testing knowledge of the Ukrainian high school curriculum in Ukrainian language & literature, history, mathematics and geography

These benchmarks test logical reasoning, mathematics, knowledge, language understanding and other skills of the models and are provided at https://github.com/insait-institute/lm-evaluation-harness-uk.

The graphs above show the performance of MamayLM 9B compared to other large open models. The results show the excellent abilities of MamayLM in Ukrainian, which allow it to outperform much larger models, including Alibaba's Qwen 2.5 72B and Meta's Llama 3.1 70B. Finally, our models retain the excellent English performance inherited from the original Google Gemma 2 models upon which they are based.

Use in 🤗 Transformers

First install the latest version of the transformers library. For optimal performance, we recommend the text-generation parameters we have extensively tested the model with; in principle, increasing the temperature should work adequately as well. In order to leverage instruction fine-tuning, your prompt should begin with a beginning-of-sequence token `<bos>` and be formatted in the Gemma 2 chat template. `<bos>` should only be the first token in a chat sequence. This format is also available as a chat template via the `apply_chat_template()` method.
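To illustrate the chat format, a sketch that renders the template as plain text; the sample string in the comment is approximate, and the exact prompt is whatever the tokenizer produces:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("INSAIT-Institute/MamayLM-Gemma-2-9B-IT-v0.1")

messages = [{"role": "user", "content": "Яка столиця України?"}]

# tokenize=False returns the formatted prompt string instead of token ids.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
# Roughly: <bos><start_of_turn>user\nЯка столиця України?<end_of_turn>\n<start_of_turn>model\n
```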
The model and instructions for usage in GGUF format are available at INSAIT-Institute/MamayLM-Gemma-2-9B-IT-v0.1-GGUF.

We welcome feedback from the community to help improve MamayLM. If you have suggestions, encounter any issues, or have ideas for improvements, please:
- Share your experience using the model through Hugging Face's community discussion feature, or
- Contact us at [email protected]

Your real-world usage and insights are valuable in helping us optimize the model's performance and behaviour for various use cases.

Summary
- Finetuned from: google/gemma-2-9b-it; google/gemma-2-9b
- Model type: Causal decoder-only transformer language model
- Language: Ukrainian and English
- Contact: [email protected]
- License: MamayLM is distributed under the Gemma Terms of Use
BgGPT-Gemma-2-2.6B-IT-v1.0-GGUF
This repo contains the GGUF format model files for INSAIT-Institute/BgGPT-Gemma-2-2.6B-IT-v1.0.
MamayLM Gemma 2 9B IT V0.1 GGUF
This repo contains the GGUF format model files for INSAIT-Institute/MamayLM-Gemma-2-9B-IT-v0.1.
BgGPT-7B-Instruct-v0.2-GGUF
BgGPT-7B-Instruct-v0.1-GGUF
BgGPT-7B-Instruct-v0.2
BgGPT-Gemma-2-9B-IT-v1.0
INSAIT introduces BgGPT-Gemma-2-9B-IT-v1.0, a state-of-the-art Bulgarian language model based on google/gemma-2-9b and google/gemma-2-9b-it. BgGPT-Gemma-2-9B-IT-v1.0 is free to use and distributed under the Gemma Terms of Use. This model was created by `INSAIT`, part of Sofia University St. Kliment Ohridski, in Sofia, Bulgaria.

The model was built on top of Google's Gemma 2 9B open models. It was continuously pre-trained on around 100 billion tokens (85 billion in Bulgarian) using the Branch-and-Merge strategy INSAIT presented at EMNLP'24, allowing the model to gain outstanding Bulgarian cultural and linguistic capabilities while retaining its English performance. During the pre-training stage, we use various datasets, including Bulgarian web crawl data, freely available datasets such as Wikipedia, a range of specialized Bulgarian datasets sourced by the INSAIT Institute, and machine translations of popular English datasets. The model was then instruction-fine-tuned on a newly constructed Bulgarian instruction dataset created using real-world conversations. For more information, check our blog post.

We evaluate our models on a set of standard English benchmarks, a translated version of them in Bulgarian, as well as Bulgarian-specific benchmarks we collected:
- Winogrande challenge: testing world knowledge and understanding
- Hellaswag: testing sentence completion
- ARC Easy/Challenge: testing logical reasoning
- TriviaQA: testing trivia knowledge
- GSM-8k: solving grade-school mathematics word problems
- Exams: solving high school problems from natural and social sciences
- MON: contains exams across various subjects for grades 4 to 12

These benchmarks test logical reasoning, mathematics, knowledge, language understanding and other skills of the models and are provided at https://github.com/insait-institute/lm-evaluation-harness-bg.

The graphs above show the performance of BgGPT 9B and BgGPT 27B compared to other large open models. The results show the excellent abilities of both the 9B and 27B models in Bulgarian, which allow them to outperform much larger models, including Alibaba's Qwen 2.5 72B and Meta's Llama 3.1 70B. Further, both BgGPT 9B and BgGPT 27B significantly improve upon the previous version of BgGPT based on Mistral-7B (BgGPT-7B-Instruct-v0.2, shown in grey in the figure). Finally, our models retain the excellent English performance inherited from the original Google Gemma 2 models upon which they are based.

Use in 🤗 Transformers

First install the latest version of the transformers library. For optimal performance, we recommend the text-generation parameters we have extensively tested the model with; in principle, increasing the temperature should work adequately as well. In order to leverage instruction fine-tuning, your prompt should begin with a beginning-of-sequence token `<bos>` and be formatted in the Gemma 2 chat template. `<bos>` should only be the first token in a chat sequence. This format is also available as a chat template via the `apply_chat_template()` method.

Important Note: Models based on Gemma 2, such as BgGPT-Gemma-2-9B-IT-v1.0, do not support flash attention. Using it results in degraded performance.
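A sketch of passing generation parameters via `GenerationConfig`; the concrete values below are placeholders rather than the recommended settings from our testing:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

model_id = "INSAIT-Institute/BgGPT-Gemma-2-9B-IT-v1.0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    attn_implementation="eager",  # per the flash-attention note above
    device_map="auto",
)

# Placeholder values; substitute the recommended settings for real use.
gen_config = GenerationConfig(
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.1,
    top_p=0.9,
    repetition_penalty=1.1,
)

messages = [{"role": "user", "content": "Разкажи ми за Рилския манастир."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

out = model.generate(input_ids, generation_config=gen_config)
print(tokenizer.decode(out[0][input_ids.shape[-1]:], skip_special_tokens=True))
```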
The model and instructions for usage in GGUF format are available at INSAIT-Institute/BgGPT-Gemma-2-9B-IT-v1.0-GGUF.

We welcome feedback from the community to help improve BgGPT. If you have suggestions, encounter any issues, or have ideas for improvements, please:
- Share your experience using the model through Hugging Face's community discussion feature, or
- Contact us at [email protected]

Your real-world usage and insights are valuable in helping us optimize the model's performance and behaviour for various use cases.

Summary
- Finetuned from: google/gemma-2-9b-it; google/gemma-2-9b
- Model type: Causal decoder-only transformer language model
- Language: Bulgarian and English
- Contact: [email protected]
- License: BgGPT is distributed under the Gemma Terms of Use
Zephyr-7B-MixAT
This is a model adapter for HuggingFaceH4/zephyr-7b-beta, fine-tuned using the MixAT method. MixAT is a cutting-edge adversarial training approach designed to enhance model robustness against adversarial attacks, contributing to the development of more trustworthy and reliable Large Language Models (LLMs). For details, see our paper MixAT: Combining Continuous and Discrete Adversarial Training for LLMs. Training and evaluation code is available in the MixAT GitHub repository.

Use in 🤗 PEFT and Transformers (Quantized)

First, install the required libraries. Then, load the base model (4-bit quantized) using `transformers` and apply the adapter using `peft`.

Results

MixAT has been evaluated against a broad range of state-of-the-art adversarial attacks, introducing the At Least One Attack Success Rate (ALO-ASR) metric to assess worst-case model vulnerability. Our results show that MixAT achieves significantly improved robustness (ALO-ASR < 50%) while maintaining good utility scores and a runtime comparable to continuous relaxation-based methods.

- Repository: https://github.com/insait-institute/MixAT
- Paper: https://arxiv.org/abs/2505.16947
- Base model: HuggingFaceH4/zephyr-7b-beta
- Contact: [email protected] and [email protected]
- License: Distributed under the Apache License, Version 2.0
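A minimal sketch of the PEFT plus 4-bit loading flow described above; the adapter repo id and the `bitsandbytes` quantization settings are assumptions:

```python
# pip install -U transformers peft bitsandbytes accelerate
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "HuggingFaceH4/zephyr-7b-beta"
adapter_id = "INSAIT-Institute/Zephyr-7B-MixAT"  # assumed adapter repo id

# Assumed 4-bit quantization setup via bitsandbytes.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=bnb_config, device_map="auto"
)

# Apply the MixAT adapter on top of the quantized base model.
model = PeftModel.from_pretrained(base_model, adapter_id)
```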
BrokenMath-Qwen3-4B
We introduce BrokenMath-Qwen3-4B, a model fine-tuned to mitigate sycophancy in mathematical reasoning. To address this, we developed the BrokenMath benchmark and dataset for measuring sycophantic behaviour and aligning against unwanted responses. `BrokenMath-Qwen3-4B` is fine-tuned on this dataset to learn to identify and reject false mathematical statements, while simultaneously improving its general mathematical problem-solving abilities. The model demonstrates a reduction in sycophantic behaviour and an increase in mathematical utility compared to its base model.

BrokenMath-Qwen3-4B is a fine-tuned version of `Qwen/Qwen3-4B-Thinking (25/07)`. It was trained on the `train` split of the BrokenMath dataset, which contains nearly 15,000 problems. This training data includes a balanced mix of standard and adversarially perturbed math problems, enabling the model to learn robust, non-sycophantic reasoning patterns while retaining its problem-solving capabilities.

You can run the model using the standard `transformers` library. The model is trained to identify flawed premises and state its refusal to proceed, as shown in the example below.

We evaluated `BrokenMath-Qwen3-4B` on the `benchmark` split of the BrokenMath dataset. The results show improvements in both reducing sycophancy and increasing mathematical problem-solving utility compared to the base model.

| Model | Sycophancy Rate (%) ↓ | Utility (Accuracy %) ↑ |
|---------------------------|:---------------------:|:----------------------:|
| Qwen3-4B-Thinking (25/07) | 55.6 | 33.4 |
| BrokenMath-Qwen3-4B | 51.0 | 37.9 |

Utility is measured as accuracy on the original, non-perturbed problem statements within the benchmark.

The model was trained on the BrokenMath dataset, which is publicly available for research into sycophantic behaviour in natural language theorem proving.

| Dataset | Download |
| :--------: | :------------: |
| BrokenMath | 🤗 HuggingFace |

`BrokenMath-Qwen3-4B` is released under the Apache 2.0 license.
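A minimal sketch of querying the model with a flawed-premise problem; the repo id `INSAIT-Institute/BrokenMath-Qwen3-4B` and the generation budget are assumptions:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "INSAIT-Institute/BrokenMath-Qwen3-4B"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# A deliberately false statement: the model should flag the flawed premise
# instead of sycophantically "proving" it.
messages = [{"role": "user", "content": "Prove that every prime number greater than 2 is even."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

out = model.generate(input_ids, max_new_tokens=2048)  # placeholder budget for long thinking traces
print(tokenizer.decode(out[0][input_ids.shape[-1]:], skip_special_tokens=True))
```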
BgGPT-Gemma-2-27B-IT-v1.0
Gemma 2 is a model designed for natural language processing tasks, utilizing the transformers library.
Spear1 Franka
SPEAR-1 is a cutting-edge Vision-Language-Action (VLA) model capable of achieving performance superior to or on par with state-of-the-art models such as pi0-FAST and pi0.5 on multiple embodiments, while being trained on 20x less robot data. This model was developed by INSAIT, a special unit of Sofia University St. Kliment Ohridski, in Sofia, Bulgaria. Code and model weights for SPEAR-1 models are free to use under the Gemma license. This repo provides model weights fine-tuned for a Franka setup with one wrist and one external camera.

The key to SPEAR-1's data efficiency is SPEAR-VLM, a 3D-aware VLM. SPEAR-VLM extends PaliGemma with the MoGe depth encoder and is trained on 3D VQA tasks using primarily non-robot data sources, such as EgoExo-4D. SPEAR-1's architecture combines SPEAR-VLM with a DiT action expert. It is first pre-trained on a mixture of robot demonstration datasets from Open X-Embodiment and then fine-tuned for specific embodiments.

We provide a fully `AutoModel`-compatible implementation of SPEAR-1 that can be used via transformers. The current implementation requires the following additional dependencies: `roma`, `timm`, `flash-attn`. A working environment for inference can be set up via `uv`.

SPEAR-1 predicts action chunks of delta end-effector positions. Each step in the predicted action chunk is relative to the input state. Given the current end-effector pose `[R, t]` and a model prediction `A_rel = [[R_1, t_1], ..., [R_n, t_n]]`, absolute end-effector pose commands can be computed by composing each relative pose with the current pose.

We welcome feedback from the community to help improve SPEAR-1. If you have suggestions, encounter any issues, or have ideas for improvements, please contact us.

- Model type: Vision-Language-Action with flow-matching action decoding
- Contact: [email protected]
- License: Gemma Terms of Use
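A sketch of composing the predicted deltas into absolute pose commands, under an assumed convention (deltas expressed in the robot base frame, rotation composed on the left); the exact convention used by SPEAR-1 is defined in its codebase and may differ:

```python
import numpy as np

def to_absolute_commands(R, t, relative_chunk):
    """Compose an action chunk of deltas with the current end-effector pose.

    R: (3, 3) current end-effector rotation; t: (3,) current position.
    relative_chunk: list of (R_i, t_i) pairs, each relative to the *input* state.
    Assumed convention: deltas are expressed in the base frame.
    """
    commands = []
    for R_i, t_i in relative_chunk:
        R_abs = R_i @ R   # assumption: delta rotation applied in the base frame
        t_abs = t + t_i   # assumption: delta translation added in the base frame
        commands.append((R_abs, t_abs))
    return commands
```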
Qwen-32B-MixAT
This is a model adapter for Qwen/Qwen2.5-32B-Instruct, fine-tuned using the MixAT method. MixAT is a cutting-edge adversarial training approach designed to enhance model robustness against adversarial attacks, contributing to the development of more trustworthy and reliable Large Language Models (LLMs). For details, see our paper MixAT: Combining Continuous and Discrete Adversarial Training for LLMs. Training and evaluation code is available in the MixAT GitHub repository.

Use in 🤗 PEFT and Transformers (Quantized)

First, install the required libraries. Then, load the base model (4-bit quantized) using `transformers` and apply the adapter using `peft`.

Results

MixAT has been evaluated against a broad range of state-of-the-art adversarial attacks, introducing the At Least One Attack Success Rate (ALO-ASR) metric to assess worst-case model vulnerability. Our results show that MixAT achieves significantly improved robustness (ALO-ASR < 50%) while maintaining good utility scores and a runtime comparable to continuous relaxation-based methods.

- Repository: https://github.com/insait-institute/MixAT
- Paper: https://arxiv.org/abs/2505.16947
- Base model: Qwen/Qwen2.5-32B-Instruct
- Contact: [email protected] and [email protected]
- License: Distributed under the Apache License, Version 2.0
Llama3-8B-MixAT-GCG
This is a model adapter for meta-llama/Meta-Llama-3-8B-Instruct, fine-tuned using the MixAT+GCG method. MixAT is a cutting-edge adversarial training approach designed to enhance model robustness against adversarial attacks, contributing to the development of more trustworthy and reliable Large Language Models (LLMs). For details, see our paper MixAT: Combining Continuous and Discrete Adversarial Training for LLMs. Training and evaluation code is available in the MixAT GitHub repository.

Use in 🤗 PEFT and Transformers (Quantized)

First, install the required libraries. Then, load the base model (4-bit quantized) using `transformers` and apply the adapter using `peft`.

Results

MixAT has been evaluated against a broad range of state-of-the-art adversarial attacks, introducing the At Least One Attack Success Rate (ALO-ASR) metric to assess worst-case model vulnerability. Our results show that MixAT achieves significantly improved robustness (ALO-ASR < 50%) while maintaining good utility scores and a runtime comparable to continuous relaxation-based methods.

- Repository: https://github.com/insait-institute/MixAT
- Paper: https://arxiv.org/abs/2505.16947
- Base model: meta-llama/Meta-Llama-3-8B-Instruct
- Contact: [email protected] and [email protected]
- License: Distributed under the Meta Llama 3 Community License Agreement
BgGPT-7B-Instruct-v0.1
OPC-R1-8B
Zephyr-7B-MixAT-GCG
This is a model adapter for HuggingFaceH4/zephyr-7b-beta, fine-tuned using the MixAT+GCG method. MixAT is a cutting-edge adversarial training approach designed to enhance model robustness against adversarial attacks, contributing to the development of more trustworthy and reliable Large Language Models (LLMs). For details, see our paper MixAT: Combining Continuous and Discrete Adversarial Training for LLMs. Training and evaluation code is available in the MixAT GitHub repository.

Use in 🤗 PEFT and Transformers (Quantized)

First, install the required libraries. Then, load the base model (4-bit quantized) using `transformers` and apply the adapter using `peft`.

Results

MixAT has been evaluated against a broad range of state-of-the-art adversarial attacks, introducing the At Least One Attack Success Rate (ALO-ASR) metric to assess worst-case model vulnerability. Our results show that MixAT achieves significantly improved robustness (ALO-ASR < 50%) while maintaining good utility scores and a runtime comparable to continuous relaxation-based methods.

- Repository: https://github.com/insait-institute/MixAT
- Paper: https://arxiv.org/abs/2505.16947
- Base model: HuggingFaceH4/zephyr-7b-beta
- Contact: [email protected] and [email protected]
- License: Distributed under the Apache License, Version 2.0
Llama3-8B-MixAT
This is a model adapter for meta-llama/Meta-Llama-3-8B-Instruct, fine-tuned using the MixAT method. MixAT is a cutting-edge adversarial training approach designed to enhance model robustness against adversarial attacks, contributing to the development of more trustworthy and reliable Large Language Models (LLMs). For details, see our paper MixAT: Combining Continuous and Discrete Adversarial Training for LLMs. Training and evaluation code is available in the MixAT GitHub repository.

Use in 🤗 PEFT and Transformers (Quantized)

First, install the required libraries. Then, load the base model (4-bit quantized) using `transformers` and apply the adapter using `peft`.

Results

MixAT has been evaluated against a broad range of state-of-the-art adversarial attacks, introducing the At Least One Attack Success Rate (ALO-ASR) metric to assess worst-case model vulnerability. Our results show that MixAT achieves significantly improved robustness (ALO-ASR < 50%) while maintaining good utility scores and a runtime comparable to continuous relaxation-based methods.

- Repository: https://github.com/insait-institute/MixAT
- Paper: https://arxiv.org/abs/2505.16947
- Base model: meta-llama/Meta-Llama-3-8B-Instruct
- Contact: [email protected] and [email protected]
- License: Distributed under the Meta Llama 3 Community License Agreement
Qwen-14B-MixAT
This is a model adapter for Qwen/Qwen2.5-14B-Instruct, fine-tuned using the MixAT method. MixAT is a cutting-edge adversarial training approach designed to enhance model robustness against adversarial attacks, contributing to the development of more trustworthy and reliable Large Language Models (LLMs). For details, see our paper MixAT: Combining Continuous and Discrete Adversarial Training for LLMs. Training and evaluation code is available in the MixAT GitHub repository.

Use in 🤗 PEFT and Transformers (Quantized)

First, install the required libraries. Then, load the base model (4-bit quantized) using `transformers` and apply the adapter using `peft`.

Results

MixAT has been evaluated against a broad range of state-of-the-art adversarial attacks, introducing the At Least One Attack Success Rate (ALO-ASR) metric to assess worst-case model vulnerability. Our results show that MixAT achieves significantly improved robustness (ALO-ASR < 50%) while maintaining good utility scores and a runtime comparable to continuous relaxation-based methods.

- Repository: https://github.com/insait-institute/MixAT
- Paper: https://arxiv.org/abs/2505.16947
- Base model: Qwen/Qwen2.5-14B-Instruct
- Contact: [email protected] and [email protected]
- License: Distributed under the Apache License, Version 2.0
Qwen-14B-MixAT-GCG
This is a model adapter for Qwen/Qwen2.5-14B-Instruct, fine-tuned using the MixAT+GCG method. MixAT is a cutting-edge adversarial training approach designed to enhance model robustness against adversarial attacks, contributing to the development of more trustworthy and reliable Large Language Models (LLMs). For details, see our paper MixAT: Combining Continuous and Discrete Adversarial Training for LLMs. Training and evaluation code is available in the MixAT GitHub repository.

Use in 🤗 PEFT and Transformers (Quantized)

First, install the required libraries. Then, load the base model (4-bit quantized) using `transformers` and apply the adapter using `peft`.

Results

MixAT has been evaluated against a broad range of state-of-the-art adversarial attacks, introducing the At Least One Attack Success Rate (ALO-ASR) metric to assess worst-case model vulnerability. Our results show that MixAT achieves significantly improved robustness (ALO-ASR < 50%) while maintaining good utility scores and a runtime comparable to continuous relaxation-based methods.

- Repository: https://github.com/insait-institute/MixAT
- Paper: https://arxiv.org/abs/2505.16947
- Base model: Qwen/Qwen2.5-14B-Instruct
- Contact: [email protected] and [email protected]
- License: Distributed under the Apache License, Version 2.0
ReVLA-Bridge
Mistral-7B-MixAT
This is a model adapter for mistralai/Mistral-7B-Instruct-v0.1, fine-tuned using the MixAT method. MixAT is a cutting-edge adversarial training approach designed to enhance model robustness against adversarial attacks, contributing to the development of more trustworthy and reliable Large Language Models (LLMs). For details, see our paper MixAT: Combining Continuous and Discrete Adversarial Training for LLMs. Training and evaluation code is available in the MixAT GitHub repository.

Use in 🤗 PEFT and Transformers (Quantized)

First, install the required libraries. Then, load the base model (4-bit quantized) using `transformers` and apply the adapter using `peft`.

Results

MixAT has been evaluated against a broad range of state-of-the-art adversarial attacks, introducing the At Least One Attack Success Rate (ALO-ASR) metric to assess worst-case model vulnerability. Our results show that MixAT achieves significantly improved robustness (ALO-ASR < 50%) while maintaining good utility scores and a runtime comparable to continuous relaxation-based methods.

- Repository: https://github.com/insait-institute/MixAT
- Paper: https://arxiv.org/abs/2505.16947
- Base model: mistralai/Mistral-7B-Instruct-v0.1
- Contact: [email protected] and [email protected]
- License: Distributed under the Apache License, Version 2.0