ai-forever

114 models • 10 total models in database

Sort by:

FRIDA

--- model-index: - name: FRIDA results: - dataset: config: default name: MTEB CEDRClassification (default) revision: c0ba03d058e3e1b2f3fd20518875a4563dd12db4 split: test type: ai-forever/cedr-classification metrics: - type: accuracy value: 64.60148777895856 - type: f1 value: 70.36630348039266 - type: lrap value: 92.47290116896953 - type: main_score value: 64.60148777895856 task: type: MultilabelClassification - dataset: config: default name: MTEB GeoreviewClassification (default) revision: 3765c

ruT5-base

--- language: - ru tags: - PyTorch - Transformers thumbnail: "https://github.com/sberbank-ai/model-zoo" ---

ruGPT-3.5-13B

Language model for Russian. Model has 13B parameters as you can guess from it's name. This is our biggest model so far and it was used for trainig GigaChat (read more about it in the article). Model was pretrained on a 300Gb of various domains, than additionaly trained on the 100 Gb of code and legal documets. Here is the dataset structure: Training data was deduplicated, the text deduplication includes 64-bit hashing of each text in the corpus for keeping texts with a unique hash. We also filter the documents based on their text compression rate using zlib4. The most strongly and weakly compressing deduplicated texts are discarded. Model was trained using Deepspeed and Megatron libraries, on 300B tokens dataset for 3 epochs, around 45 days on 512 V100. After that model was finetuned 1 epoch with sequence length 2048 around 20 days on 200 GPU A100 on additional data (see above). After the final training perplexity for this model was around 8.8 for Russian. Try different generation strategies to reach better results.

ruBert-base

license:apache-2.0

sbert_large_nlu_ru

BERT large model (uncased) for Sentence Embeddings in Russian language. The model is described in this article For better quality, use mean token embeddings. You can use the model directly from the model repository to compute sentence embeddings: Authors + SberDevices Team. + Aleksandr Abramov: HF profile, Github, Kaggle Competitions Master; + Denis Antykhov: Github; + Ibragim Badertdinov: Github

ruRoberta-large

rugpt3small_based_on_gpt2

rugpt3small\based\on\gpt2 The model architecture design, pretraining, and evaluation are documented in our preprint: A Family of Pretrained Transformer Language Models for Russian. The model was pretrained with sequence length 1024 using transformers by the SberDevices team on 80B tokens around 3 epochs. After that, the model was finetuned with the context size of 2048. Total training time took around one week on 32 GPUs. Authors + NLP core team RnD Telegram channel: + Dmitry Zmitrovich

ru-en-RoSBERTa

The ru-en-RoSBERTa is a general text embedding model for Russian. The model is based on ruRoBERTa and fine-tuned with ~4M pairs of supervised, synthetic and unsupervised data in Russian and English. Tokenizer supports some English tokens from RoBERTa tokenizer. For more model details please refer to our article. The model can be used as is with prefixes. It is recommended to use CLS pooling. The choice of prefix and pooling depends on the task. We use the following basic rules to choose a prefix: - `"searchquery: "` and `"searchdocument: "` prefixes are for answer or relevant paragraph retrieval - `"classification: "` prefix is for symmetric paraphrasing related tasks (STS, NLI, Bitext Mining) - `"clustering: "` prefix is for any tasks that rely on thematic features (topic classification, title-body retrieval) To better tailor the model to your needs, you can fine-tune it with relevant high-quality Russian and English datasets. Below are examples of texts encoding using the Transformers and SentenceTransformers libraries. The model is designed to process texts in Russian, the quality in English is unknown. Maximum input text length is limited to 512 tokens.

sbert_large_mt_nlu_ru

mGPT

license:apache-2.0

rugpt3large_based_on_gpt2

sage-m2m100-1.2B

rugpt3medium_based_on_gpt2

rugpt3medium\based\on\gpt2 The model architecture design, pretraining, and evaluation are documented in our preprint: A Family of Pretrained Transformer Language Models for Russian. The model was pretrained with sequence length 1024 using the Transformers library by the SberDevices team on 80B tokens for 3 epochs. After that, the model was finetuned with the context size of 2048 tokens. Total training time was around 16 days on 64 GPUs. The final perplexity on the test set is `17.4`. Authors + NLP core team RnD Telegram channel: + Dmitry Zmitrovich

kandinsky-videomae-large-camera-motion

ruclip-vit-base-patch32-384

ruBert-large

pollux-judge-7b

pollux-judge-7b is a 7-billion parameter generative language model specifically designed to evaluate the quality of other language models' responses in Russian. The model assesses answer quality given input instruction, specific criteria and rubrics, providing automated LLM performance evaluation for Russian-language tasks. pollux-judge-7b is an integral component of the POLLUX project, a comprehensive initiative dedicated to evaluating the generative capabilities of Large Language Models (LLMs). At the heart of this project lies the POLLUX dataset, which introduces systematic taxonomies for both generative tasks and evaluation criteria, providing quantitative and qualitative assessments of responses from top-tier LLMs. Built upon the t-tech/T-lite-it-1.0 architecture, pollux-judge-7b is a decoder-based 7 billion parameter model trained in a sequence-to-sequence fashion. The model is designed to predict both numerical scores and detailed textual rationales based on the original instruction, the LLM's response, specific evaluation criteria, scoring rubrics, and reference answers when available. While the model is technically capable of processing any type of instruction and criterion when properly formatted, its training has been specifically optimized using the generative tasks and evaluation criteria derived from the taxonomies established within the POLLUX dataset. - Model type: decoder - Language(s) (NLP): Russian - License: MIT - Finetuned from model: t-tech/T-lite-it-1.0 - Repository: POLLUX code base - Paper: ArXiv preprint pollux-judge-7b is specifically designed for assessing text responses against a single, predefined criterion per evaluation run. The model operates optimally when provided with all essential components: the source instruction, the response to be evaluated (typically generated by another LLM), the specific evaluation criterion, and its corresponding scoring rubrics. While the model may technically process multiple criteria simultaneously, such usage falls outside its intended design and may yield unpredictable results. Similarly, the model is not designed to function autonomously in determining appropriate evaluation criteria—it requires explicit criterion specification to perform reliable assessments. For optimal performance and reliable results, users should structure each evaluation session around one criterion at a time, providing all necessary contextual components to enable the model's comprehensive scoring and rationale generation capabilities. MODEL OUTPUT DISCLAIMER AND LIMITATION OF LIABILITY All content, responses, and outputs generated by pollux-judge-7b (the "Model") are produced through automated computational processes based on statistical patterns learned from pre-training data. Such outputs do not constitute statements, opinions, recommendations, or positions of the model developers, publishers, or affiliated entities (collectively, the "Developers"). The Model's outputs do not represent, reflect, or endorse any views, beliefs, policies, or positions held by the Developers. Generated content should not be interpreted as official statements, advice, or guidance from the Developers. While the Developers employed appropriate data curation practices during fine-tuning and avoided the intentional inclusion of inappropriate content, the Model's responses may reflect patterns present in the underlying pre-training datasets, which were sourced from publicly available internet content and other large-scale text corpora. The Developers expressly disclaim responsibility for any content generated by the Model. Users acknowledge that: - Generated outputs are probabilistic and may contain inaccuracies, biases, or inappropriate content - The Developers cannot guarantee the accuracy, completeness, or appropriateness of any Model output - Users assume full responsibility for evaluating and using Model-generated content Users are solely responsible for reviewing, validating, and determining the appropriateness of any Model-generated content before use or distribution. Given the substantial time investment required for manual dataset creation—approximately 24,447 hours for the POLLUX dataset—we opted to employ synthetic data for training purposes, as acquiring a manually composed training set of comparable size was not feasible. Our synthetic data generation process proceeded in several stages. Initially, we generated 78,000 instructions using three state-of-the-art language models: DeepSeekV3, OpenAI GPT-4o, and o3-mini, with each model contributing equally to the instruction pool. These instructions were based on the POLLUX tasks taxonomy and complexity levels to ensure consistency with the original framework. Training data does not include Recommendations, Applied Brainstorming, Literary Text Generation, Questions Generation, Style Transfer, Code Modification, and AI as a Character tasks alongside corresponding Task-specific criteria to enable out-of-domain evaluation of the resulting LM-as-a-Judge model. To maintain data quality, we implemented filtering procedure that removed instructions containing more than 5% non-Russian tokens as well as duplicate entries, ultimately yielding a refined set of 26,000 high-quality instructions. Subsequently, we mapped these synthetic instructions to their corresponding evaluation criteria sets using the same algorithm employed in the original POLLUX dataset. Each criteria set comprised Critical, General, Subjective, and relevant Domain- and Task-specific criteria (for detailed methodology, see Section 2.3 in the preprint). To generate diverse responses, we employed 15 open-source language models from various families, including Llama, Phi, Qwen, Mistral, and Gemma, with each model contributing equally to the answer generation process, for the complete listing of the models see Appendix M.2 in the preprint). For criteria annotation, we utilized DeepSeek-R1, which generated numerical scores based on established criterion rubrics along with corresponding rationales for each evaluation. This systematic approach resulted in 8,000,000 samples, each containing the complete tuple of (instruction, answer, criterion, score, rationale). From this dataset, we performed stratified random sampling across tasks to obtain our final training set of 1,000,000 samples, ensuring balanced representation across different task categories. The model was trained in sequence-to-sequence fashion. Input includes source instruction, LLM's answer, name of criterion, its rubrics and reference answer if present. The output is expected to be numerical score from provided rubrics and textual explanation. - Training regime: bf16 mixed precision; - Epochs: 3; - Optimizer: AdamW; - Learning rate: from 1e-05 to 0 with linear scheduler; - Batch size: 256; - For testing data we employed the POLLUX dataset. Note this provides both in- and out-of-domain evaluation as some of the tasks and criteria are absent in training data. We employed Spearman’s rank correlation with expert judgements and Mean Absolute Error (MAE) metrics alongside the Verdict Confidence to assess the performance of pollux-judge-7b and compare it with those of the reference models. MAE offers a high degree of interpretability, as it is measured on the same scale as the annotation – specifically, in points. On the other hand, Spearman’s rank correlation allows to quantify the degree of monotonic association between the two rankings of models outputs and to demonstrate how consistently the LLM-as-Judge reproduces the relative ordering of output quality as established by human experts. The Verdict Confidence is computed as the maximum empirical probability among the assigned scores. We adopted Verdict Confidence as a measure of the annotator agreement instead of Krippendorff’s alpha and the Dawid-Skene algorithm due to their relatively complex interpretability. For the reference models we took OpenAI GPT-4o, DeepSeek-R1 and M-Prometheus-14B. We report aggregate results averaged over the evaluated models (LM-as-a-Judge predicts the scores that have been assigned to the answers of particular LLM) on the out-of-domain part of POLLUX dataset. For detailed evaluation results see Appendix D in the preprint. | Model | pollux-judge-7b | DeepSeek-R1 | M-Prometheus-14B | GPT-4o (2024-11-20)| | --- | --- | --- | --- | --- | | Claude 3.5 Sonnet (2024-10-22) | 0.660 | 0.739 | -0.006 | 0.759 | | GPT-4o (2024-08-06) | 0.596 | 0.627 | -0.033 | 0.643 | | GigaChat-Max (1.0.26.20) | 0.596 | 0.640 | 0.027 | 0.649 | | Llama-3.1-405B | 0.613 | 0.591 | 0.022 | 0.639 | | T-pro-it-1.0 | 0.571 | 0.573 | -0.044 | 0.616 | | YaGPT-4-Pro (2024-10-23) | 0.616 | 0.635 | 0.099 | 0.671 | |o1 (2024-12-17) | 0.675 | 0.748 | -0.022 | 0.771 | | Avg. | 0.619 | 0.647 | 0.019 | 0.674 | | Model | pollux-judge-7b | DeepSeek-R1 | M-Prometheus-14B | GPT-4o (2024-11-20)| | --- | --- | --- | --- | --- | | Claude 3.5 Sonnet (2024-10-22) | 0.501 | 0.245 | 2.697 | 0.236 | | GPT-4o (2024-08-06) | 0.484 | 0.349 | 2.676 | 0.339 | | GigaChat-Max (1.0.26.20) | 0.477 | 0.350 | 2.468 | 0.342 | | Llama-3.1-405B | 0.517 | 0.448 | 1.912 | 0.405 | | T-pro-it-1.0 | 0.497 | 0.475 | 2.978 | 0.425 | | YaGPT-4-Pro (2024-10-23) | 0.511 | 0.387 | 1.793 | 0.369 | |o1 (2024-12-17) | 0.438 | 0.244 | 2.873 | 0.229 | | Avg. | 0.489 | 0.356 | 2.487 | 0.335 | Verdict Confidence (calculated on the whole test sample): | Model | pollux-judge-7b | DeepSeek-R1 | M-Prometheus-14B | GPT-4o (2024-11-20)| | --- | --- | --- | --- | --- | | Claude 3.5 Sonnet (2024-10-22) | 0.800 | 0.879 | 0.645 | 0.877 | | GPT-4o (2024-08-06) | 0.822 | 0.877 | 0.702 | 0.877 | | GigaChat-Max (1.0.26.20) | 0.824 | 0.878 | 0.715 | 0.879 | | Llama-3.1-405B | 0.777 | 0.836 | 0.684 | 0.837 | | T-pro-it-1.0 | 0.791 | 0.838 | 0.644 | 0.842 | | YaGPT-4-Pro (2024-10-23) | 0.813 | 0.866 | 0.738 | 0.867 | |o1 (2024-12-17) | 0.821 | 0.885 | 0.643 | 0.882 | | Avg. | 0.808 | 0.866 | 0.684 | 0.867 |

FRED-T5-1.7B

license:apache-2.0

T5-large-spell

Kandinsky-5.0-T2V-Lite-sft-10s-Diffusers

Habr | Project Page | Technical Report | Original Github | 🤗 Diffusers This repository provides the 🤗 Diffusers integration for Kandinsky 5.0 T2V Lite - a lightweight video generation model (2B parameters) that ranks #1 among open-source models in its class. - 🔥 2025/09/29: We have open-sourced `Kandinsky 5.0 T2V Lite` a lite (2B parameters) version of `Kandinsky 5.0 Video` text-to-video generation model. - 🚀 Diffusers Integration: Now available with easy-to-use 🤗 Diffusers pipeline! Kandinsky 5.0 T2V Lite is a lightweight video generation model (2B parameters) that ranks #1 among open-source models in its class. It outperforms larger Wan models (5B and 14B) and offers the best understanding of Russian concepts in the open-source ecosystem. We provide 8 model variants, each optimized for different use cases: SFT model — delivers the highest generation quality CFG-distilled — runs 2× faster Diffusion-distilled — enables low-latency generation with minimal quality loss (6× faster) Pretrain model — designed for fine-tuning by researchers and enthusiasts Architecture Latent diffusion pipeline with Flow Matching. Diffusion Transformer (DiT) as the main generative backbone with cross-attention to text embeddings. HunyuanVideo 3D VAE encodes/decodes video into a latent space DiT is the main generative module using cross-attention to condition on text Results Side-by-Side Evaluation The evaluation is based on the expanded prompts from the Movie Gen benchmark. Beta Testing You can apply to participate in the beta testing of the Kandinsky Video Lite via the telegram bot.

license:apache-2.0

Kandinsky-5.0-T2V-Lite-sft-5s-Diffusers

license:apache-2.0

rugpt2large

RuM2M100-418M

ruT5-large

sage-fredt5-distilled-95m

sage-v1.1.0

FRED-T5-large-spell

Pollux Judge 32b

sage-fredt5-large

RuM2M100-1.2B

FRED-T5-large

license:apache-2.0

ruElectra-small

bert-base-NER-reptile-5-datasets

pollux-judge-7b-r

mGPT-1.3B-mongol

ruclip-vit-large-patch14-336

mGPT-13B

MGPT 1.3B Kazakh

Language model for Kazakh. Model has 1.3B parameters as you can guess from it's name. Kazakh belongs to Turkic language family. It's a very vibrant language with approximately 18 million speakers. Here are some facts about it: 1. It is a major language spoken in Kazakhstan. 2. Kazakh has its own version of the Cyrillic script but is transitioning to the Latin script. 3. It has a rich tradition of oral literature, including epic poetry. It's one of the models derived from the base mGPT-XL (1.3B) model (see the list below) which was originally trained on the 61 languages from 25 language families using Wikipedia and C4 corpus. We've found additional data for 23 languages most of which are considered as minor and decided to further tune the base model. Kazakh mGPT 1.3B was trained for another 150000 steps with batchsize=4 and context window of 2048 tokens on 1 A100. Final perplexity for this model on validation is 3.38. - 🇦🇲 mGPT-1.3B Armenian - 🇦🇿 mGPT-1.3B Azerbaijan - 🍯 mGPT-1.3B Bashkir - 🇧🇾 mGPT-1.3B Belorussian - 🇧🇬 mGPT-1.3B Bulgarian - 🌞 mGPT-1.3B Buryat - 🌳 mGPT-1.3B Chuvash - 🇬🇪 mGPT-1.3B Georgian - 🌸 mGPT-1.3B Kalmyk - 🇰🇬 mGPT-1.3B Kirgiz - 🐻 mGPT-1.3B Mari - 🇲🇳 mGPT-1.3B Mongol - 🐆 mGPT-1.3B Ossetian - 🇮🇷 mGPT-1.3B Persian - 🇷🇴 mGPT-1.3B Romanian - 🇹🇯 mGPT-1.3B Tajik - ☕ mGPT-1.3B Tatar - 🇹🇲 mGPT-1.3B Turkmen - 🐎 mGPT-1.3B Tuvan - 🇺🇦 mGPT-1.3B Ukranian - 🇺🇿 mGPT-1.3B Uzbek - 💎 mGPT-1.3B Yakut If you'll found a bug of have additional data to train model on your language — please, give us feedback.

Kandinsky-5.0-T2V-Lite-nocfg-5s-Diffusers

Habr | Project Page | Technical Report | Original Github | 🤗 Diffusers This repository provides the 🤗 Diffusers integration for Kandinsky 5.0 T2V Lite - a lightweight video generation model (2B parameters) that ranks #1 among open-source models in its class. - 🔥 2025/09/29: We have open-sourced `Kandinsky 5.0 T2V Lite` a lite (2B parameters) version of `Kandinsky 5.0 Video` text-to-video generation model. - 🚀 Diffusers Integration: Now available with easy-to-use 🤗 Diffusers pipeline! Kandinsky 5.0 T2V Lite is a lightweight video generation model (2B parameters) that ranks #1 among open-source models in its class. It outperforms larger Wan models (5B and 14B) and offers the best understanding of Russian concepts in the open-source ecosystem. We provide 8 model variants, each optimized for different use cases: SFT model — delivers the highest generation quality CFG-distilled — runs 2× faster Diffusion-distilled — enables low-latency generation with minimal quality loss (6× faster) Pretrain model — designed for fine-tuning by researchers and enthusiasts Architecture Latent diffusion pipeline with Flow Matching. Diffusion Transformer (DiT) as the main generative backbone with cross-attention to text embeddings. HunyuanVideo 3D VAE encodes/decodes video into a latent space DiT is the main generative module using cross-attention to condition on text Results Side-by-Side Evaluation The evaluation is based on the expanded prompts from the Movie Gen benchmark. Beta Testing You can apply to participate in the beta testing of the Kandinsky Video Lite via the telegram bot.

license:apache-2.0

mGPT-1.3B-tatar

ruclip-vit-base-patch32-224

mGPT-1.3B-bashkir

Wan2.1-T2V-14B-NABLA-0.5-STA-11-5-5

license:apache-2.0

mGPT-1.3B-persian

Kandinsky-5.0-T2V-Lite-distilled16steps-5s-Diffusers

Habr | Project Page | Technical Report | Original Github | 🤗 Diffusers This repository provides the 🤗 Diffusers integration for Kandinsky 5.0 T2V Lite - a lightweight video generation model (2B parameters) that ranks #1 among open-source models in its class. - 🔥 2025/09/29: We have open-sourced `Kandinsky 5.0 T2V Lite` a lite (2B parameters) version of `Kandinsky 5.0 Video` text-to-video generation model. - 🚀 Diffusers Integration: Now available with easy-to-use 🤗 Diffusers pipeline! Kandinsky 5.0 T2V Lite is a lightweight video generation model (2B parameters) that ranks #1 among open-source models in its class. It outperforms larger Wan models (5B and 14B) and offers the best understanding of Russian concepts in the open-source ecosystem. We provide 8 model variants, each optimized for different use cases: SFT model — delivers the highest generation quality CFG-distilled — runs 2× faster Diffusion-distilled — enables low-latency generation with minimal quality loss (6× faster) Pretrain model — designed for fine-tuning by researchers and enthusiasts Architecture Latent diffusion pipeline with Flow Matching. Diffusion Transformer (DiT) as the main generative backbone with cross-attention to text embeddings. HunyuanVideo 3D VAE encodes/decodes video into a latent space DiT is the main generative module using cross-attention to condition on text Results Side-by-Side Evaluation The evaluation is based on the expanded prompts from the Movie Gen benchmark. Beta Testing You can apply to participate in the beta testing of the Kandinsky Video Lite via the telegram bot.

license:apache-2.0

Wan2.1-T2V-14B-NABLA-0.7

license:apache-2.0

MGPT 1.3B Uzbek

Language model for Uzbek. Model has 1.3B parameters as you can guess from it's name. Uzbek belongs to Turkic language family. It's a very rhythmic language with approximately 32 million speakers. Here are some facts about it: 1. The official language of Uzbekistan. 2. It transitioned from the Cyrillic script to the Latin script after Uzbekistans independence, but the Cyrillic script is still in use among older generations. 3. Historically, it was influenced by Persian and Arabic due to trade and Islamic scholarly traditions. It's one of the models derived from the base mGPT-XL (1.3B) model (see the list below) which was originally trained on the 61 languages from 25 language families using Wikipedia and C4 corpus. We've found additional data for 23 languages most of which are considered as minor and decided to further tune the base model. Uzbek mGPT 1.3B was trained for another 50000 steps with batchsize=4 and context window of 2048 tokens on 1 A100. Final perplexity for this model on validation is 6.84. - 🇦🇲 mGPT-1.3B Armenian - 🇦🇿 mGPT-1.3B Azerbaijan - 🍯 mGPT-1.3B Bashkir - 🇧🇾 mGPT-1.3B Belorussian - 🇧🇬 mGPT-1.3B Bulgarian - 🌞 mGPT-1.3B Buryat - 🌳 mGPT-1.3B Chuvash - 🇬🇪 mGPT-1.3B Georgian - 🌸 mGPT-1.3B Kalmyk - 🇰🇿 mGPT-1.3B Kazakh - 🇰🇬 mGPT-1.3B Kirgiz - 🐻 mGPT-1.3B Mari - 🇲🇳 mGPT-1.3B Mongol - 🐆 mGPT-1.3B Ossetian - 🇮🇷 mGPT-1.3B Persian - 🇷🇴 mGPT-1.3B Romanian - 🇹🇯 mGPT-1.3B Tajik - ☕ mGPT-1.3B Tatar - 🇹🇲 mGPT-1.3B Turkmen - 🐎 mGPT-1.3B Tuvan - 🇺🇦 mGPT-1.3B Ukranian - 💎 mGPT-1.3B Yakut If you'll found a bug of have additional data to train model on your language — please, give us feedback.

sage-mt5-large

Kandinsky 5.0 T2V Lite Pretrain 5s

Habr | Project Page | Technical Report (soon) | Github Kandinsky 5.0: A family of diffusion models for Video & Image generation In this repository, we provide a family of diffusion models to generate a video or an image ( Coming Soon ) given a textual prompt and distilled model for faster generation. - 🔥 Source: : We have open-sourced `Kandinsky 5.0 T2V Lite` a lite (2B parameters) version of `Kandinsky 5.0 Video` text-to-video generation model. Released checkpoints: `kandinsky5litet2vpretrain5s`, `kandinsky5litet2vpretrain10s`, `kandinsky5litet2vsft5s`, `kandinsky5litet2vsft10s`, `kandinsky5litet2vnocfg5s`, `kandinsky5litet2vnocfg10s`, `kandinsky5litet2vdistilled16steps5s`, `kandinsky5litet2vdistilled16steps10s` contains weight from pretrain, supervised finetuning, cfg distillation and diffusion distillation into 16 steps. 5s checkpoints are capable of generating videos up to 5 seconds long. 10s checkpoints is faster models checkpoints trained with NABLA algorithm and capable to generate videos up to 10 seconds long. Kandinsky 5.0 T2V Lite is a lightweight video generation model (2B parameters) that ranks #1 among open-source models in its class. It outperforms larger Wan models (5B and 14B) and offers the best understanding of Russian concepts in the open-source ecosystem. We provide 8 model variants, each optimized for different use cases: SFT model — delivers the highest generation quality; Diffusion-distilled — enables low-latency generation with minimal quality loss (6× faster); Pretrain model — designed for fine-tuning by researchers and enthusiasts. All models are available in two versions: for generating 5-second and 10-second videos. Diffusion Transformer (DiT) as the main generative backbone with cross-attention to text embeddings. - HunyuanVideo 3D VAE encodes/decodes video into a latent space. - DiT is the main generative module using cross-attention to condition on text. | Model | config | video duration | NFE | Checkpoint | Latency | |-------------------------------------|--------|----------------|-----|------------|----------------| | Kandinsky 5.0 T2V Lite SFT 5s |configs/config5ssft.yaml | 5s | 100 |🤗 HF | 139 s | | Kandinsky 5.0 T2V Lite SFT 10s |configs/config10ssft.yaml| 10s | 100 |🤗 HF | 224 s | | Kandinsky 5.0 T2V Lite pretrain 5s |configs/config5spretrain.yaml | 5s | 100 |🤗 HF | 139 s | | Kandinsky 5.0 T2V Lite pretrain 10s |configs/config10spretrain.yaml | 10s | 100 |🤗 HF | 224 s | | Kandinsky 5.0 T2V Lite no-CFG 5s |configs/config5snocfg.yaml| 5s | 50 |🤗 HF | 77 s | | Kandinsky 5.0 T2V Lite no-CFG 10s |configs/config10snocfg.yaml| 10s | 50 |🤗 HF | 124 s | | Kandinsky 5.0 T2V Lite distill 5s |configs/config5sdistil.yaml| 5s | 16 | 🤗 HF| 35 s | | Kandinsky 5.0 T2V Lite distill 10s |configs/config10sdistil.yaml| 10s | 16 | 🤗 HF| 55 s | | Latency was measured after the second inference run. The first run of the model can be slower due to the compilation process. Inference was measured on an NVIDIA H100 GPU with 80 GB of memory, using CUDA 12.8.1 and PyTorch 2.8. For 5-second models Flash Attention 3 was used. The evaluation is based on the expanded prompts from the Movie Gen benchmark, which are available in the expandedprompt column of the benchmark/moviegenbench.csv file. To improve inference performance on NVidia Hopper GPUs, we recommend installing Flash Attention 3. Please, refer to inferenceexample.ipynb notebook for more usage details. For a faster inference, we also provide the capability to perform inference in a distributed way: Offloading For less memory consumption you can use offloading of the models. Magcache Also we provide Magcache inference for faster generations (now available for sft 5s and sft 10s checkpoints). Beta testing You can apply to participate in the beta testing of the Kandinsky Video Lite via the telegram bot. 📑 Todo List - Kandinsky 5.0 Lite Text-to-Video - [x] Multi-GPU Inference code of the 2B models - [ ] Checkpoints 2B models - [x] pretrain - [x] sft - [ ] rl - [x] cfg distil - [x] distil 16 steps - [ ] autoregressive generation - [x] ComfyUI integration - [ ] Diffusers integration - [x] Caching acceleration support - Kandinsky 5.0 Lite Image-to-Video - [ ] Multi-GPU Inference code of the 2B model - [ ] Checkpoints of the 2B model - [ ] ComfyUI integration - [ ] Diffusers integration - Kandinsky 5.0 Pro Text-to-Video - [ ] Multi-GPU Inference code of the models - [ ] Checkpoints of the model - [ ] ComfyUI integration - [ ] Diffusers integration - Kandinsky 5.0 Pro Image-to-Video - [ ] Multi-GPU Inference code of the model - [ ] Checkpoints of the model - [ ] ComfyUI integration - [ ] Diffusers integration - [ ] Technical report Team Leads: Vladimir Arkhipkin, Vladimir Korviakov, Nikolai Gerasimenko, Denis Parkhomenko Core Contributors: Alexey Letunovskiy, Maria Kovaleva, Ivan Kirillov, Lev Novitskiy, Denis Koposov, Dmitrii Mikhailov, Anna Averchenkova, Andrey Shutkin, Julia Agafonova, Olga Kim, Anastasiia Kargapoltseva, Nikita Kiselev Contributors: Anna Dmitrienko, Anastasia Maltseva, Kirill Chernyshev, Ilia Vasiliev, Viacheslav Vasilev, Vladimir Polovnikov, Yury Kolabushin, Alexander Belykh, Mikhail Mamaev, Anastasia Aliaskina, Tatiana Nikulina, Polina Gavrilova

license:apache-2.0

Kandinsky-5.0-T2V-Lite-nocfg-10s-Diffusers

Habr | Project Page | Technical Report | Original Github | 🤗 Diffusers This repository provides the 🤗 Diffusers integration for Kandinsky 5.0 T2V Lite - a lightweight video generation model (2B parameters) that ranks #1 among open-source models in its class. - 🔥 2025/09/29: We have open-sourced `Kandinsky 5.0 T2V Lite` a lite (2B parameters) version of `Kandinsky 5.0 Video` text-to-video generation model. - 🚀 Diffusers Integration: Now available with easy-to-use 🤗 Diffusers pipeline! Kandinsky 5.0 T2V Lite is a lightweight video generation model (2B parameters) that ranks #1 among open-source models in its class. It outperforms larger Wan models (5B and 14B) and offers the best understanding of Russian concepts in the open-source ecosystem. We provide 8 model variants, each optimized for different use cases: SFT model — delivers the highest generation quality CFG-distilled — runs 2× faster Diffusion-distilled — enables low-latency generation with minimal quality loss (6× faster) Pretrain model — designed for fine-tuning by researchers and enthusiasts Architecture Latent diffusion pipeline with Flow Matching. Diffusion Transformer (DiT) as the main generative backbone with cross-attention to text embeddings. HunyuanVideo 3D VAE encodes/decodes video into a latent space DiT is the main generative module using cross-attention to condition on text Results Side-by-Side Evaluation The evaluation is based on the expanded prompts from the Movie Gen benchmark. Beta Testing You can apply to participate in the beta testing of the Kandinsky Video Lite via the telegram bot.

license:apache-2.0

ruclip-vit-base-patch16-384

ruSciBERT

license:apache-2.0

Kandinsky-5.0-T2V-Lite-pretrain-5s-Diffusers

Habr | Project Page | Technical Report | Original Github | 🤗 Diffusers This repository provides the 🤗 Diffusers integration for Kandinsky 5.0 T2V Lite - a lightweight video generation model (2B parameters) that ranks #1 among open-source models in its class. - 🔥 2025/09/29: We have open-sourced `Kandinsky 5.0 T2V Lite` a lite (2B parameters) version of `Kandinsky 5.0 Video` text-to-video generation model. - 🚀 Diffusers Integration: Now available with easy-to-use 🤗 Diffusers pipeline! Kandinsky 5.0 T2V Lite is a lightweight video generation model (2B parameters) that ranks #1 among open-source models in its class. It outperforms larger Wan models (5B and 14B) and offers the best understanding of Russian concepts in the open-source ecosystem. We provide 8 model variants, each optimized for different use cases: SFT model — delivers the highest generation quality CFG-distilled — runs 2× faster Diffusion-distilled — enables low-latency generation with minimal quality loss (6× faster) Pretrain model — designed for fine-tuning by researchers and enthusiasts Architecture Latent diffusion pipeline with Flow Matching. Diffusion Transformer (DiT) as the main generative backbone with cross-attention to text embeddings. HunyuanVideo 3D VAE encodes/decodes video into a latent space DiT is the main generative module using cross-attention to condition on text Results Side-by-Side Evaluation The evaluation is based on the expanded prompts from the Movie Gen benchmark. Beta Testing You can apply to participate in the beta testing of the Kandinsky Video Lite via the telegram bot.

license:apache-2.0

Wan2.1-T2V-14B-NABLA-0.6-STA-11-3-3

license:apache-2.0

mGPT-1.3B-tajik

Kandinsky 5.0 T2V Lite Sft 10s

Habr | Project Page | Technical Report (soon) | Github Kandinsky 5.0: A family of diffusion models for Video & Image generation In this repository, we provide a family of diffusion models to generate a video or an image ( Coming Soon ) given a textual prompt and distilled model for faster generation.

license:apache-2.0

Kandinsky-5.0-T2V-Lite-pretrain-10s-Diffusers

Habr | Project Page | Technical Report | Original Github | 🤗 Diffusers This repository provides the 🤗 Diffusers integration for Kandinsky 5.0 T2V Lite - a lightweight video generation model (2B parameters) that ranks #1 among open-source models in its class. - 🔥 2025/09/29: We have open-sourced `Kandinsky 5.0 T2V Lite` a lite (2B parameters) version of `Kandinsky 5.0 Video` text-to-video generation model. - 🚀 Diffusers Integration: Now available with easy-to-use 🤗 Diffusers pipeline! Kandinsky 5.0 T2V Lite is a lightweight video generation model (2B parameters) that ranks #1 among open-source models in its class. It outperforms larger Wan models (5B and 14B) and offers the best understanding of Russian concepts in the open-source ecosystem. We provide 8 model variants, each optimized for different use cases: SFT model — delivers the highest generation quality CFG-distilled — runs 2× faster Diffusion-distilled — enables low-latency generation with minimal quality loss (6× faster) Pretrain model — designed for fine-tuning by researchers and enthusiasts Architecture Latent diffusion pipeline with Flow Matching. Diffusion Transformer (DiT) as the main generative backbone with cross-attention to text embeddings. HunyuanVideo 3D VAE encodes/decodes video into a latent space DiT is the main generative module using cross-attention to condition on text Results Side-by-Side Evaluation The evaluation is based on the expanded prompts from the Movie Gen benchmark. Beta Testing You can apply to participate in the beta testing of the Kandinsky Video Lite via the telegram bot.

license:apache-2.0

Kandinsky-5.0-T2V-Lite-distilled16steps-10s

Habr | Project Page | Technical Report (soon) | Github Kandinsky 5.0: A family of diffusion models for Video & Image generation In this repository, we provide a family of diffusion models to generate a video or an image ( Coming Soon ) given a textual prompt and distilled model for faster generation. - 🔥 Source: : We have open-sourced `Kandinsky 5.0 T2V Lite` a lite (2B parameters) version of `Kandinsky 5.0 Video` text-to-video generation model. Released checkpoints: `kandinsky5litet2vpretrain5s`, `kandinsky5litet2vpretrain10s`, `kandinsky5litet2vsft5s`, `kandinsky5litet2vsft10s`, `kandinsky5litet2vnocfg5s`, `kandinsky5litet2vnocfg10s`, `kandinsky5litet2vdistilled16steps5s`, `kandinsky5litet2vdistilled16steps10s` contains weight from pretrain, supervised finetuning, cfg distillation and diffusion distillation into 16 steps. 5s checkpoints are capable of generating videos up to 5 seconds long. 10s checkpoints is faster models checkpoints trained with NABLA algorithm and capable to generate videos up to 10 seconds long. Kandinsky 5.0 T2V Lite is a lightweight video generation model (2B parameters) that ranks #1 among open-source models in its class. It outperforms larger Wan models (5B and 14B) and offers the best understanding of Russian concepts in the open-source ecosystem. We provide 8 model variants, each optimized for different use cases: SFT model — delivers the highest generation quality; Diffusion-distilled — enables low-latency generation with minimal quality loss (6× faster); Pretrain model — designed for fine-tuning by researchers and enthusiasts. All models are available in two versions: for generating 5-second and 10-second videos. Diffusion Transformer (DiT) as the main generative backbone with cross-attention to text embeddings. - HunyuanVideo 3D VAE encodes/decodes video into a latent space. - DiT is the main generative module using cross-attention to condition on text. | Model | config | video duration | NFE | Checkpoint | Latency | |-------------------------------------|--------|----------------|-----|------------|----------------| | Kandinsky 5.0 T2V Lite SFT 5s |configs/config5ssft.yaml | 5s | 100 |🤗 HF | 139 s | | Kandinsky 5.0 T2V Lite SFT 10s |configs/config10ssft.yaml| 10s | 100 |🤗 HF | 224 s | | Kandinsky 5.0 T2V Lite pretrain 5s |configs/config5spretrain.yaml | 5s | 100 |🤗 HF | 139 s | | Kandinsky 5.0 T2V Lite pretrain 10s |configs/config10spretrain.yaml | 10s | 100 |🤗 HF | 224 s | | Kandinsky 5.0 T2V Lite no-CFG 5s |configs/config5snocfg.yaml| 5s | 50 |🤗 HF | 77 s | | Kandinsky 5.0 T2V Lite no-CFG 10s |configs/config10snocfg.yaml| 10s | 50 |🤗 HF | 124 s | | Kandinsky 5.0 T2V Lite distill 5s |configs/config5sdistil.yaml| 5s | 16 | 🤗 HF| 35 s | | Kandinsky 5.0 T2V Lite distill 10s |configs/config10sdistil.yaml| 10s | 16 | 🤗 HF| 55 s | | Latency was measured after the second inference run. The first run of the model can be slower due to the compilation process. Inference was measured on an NVIDIA H100 GPU with 80 GB of memory, using CUDA 12.8.1 and PyTorch 2.8. For 5-second models Flash Attention 3 was used. The evaluation is based on the expanded prompts from the Movie Gen benchmark, which are available in the expandedprompt column of the benchmark/moviegenbench.csv file. To improve inference performance on NVidia Hopper GPUs, we recommend installing Flash Attention 3. Please, refer to inferenceexample.ipynb notebook for more usage details. For a faster inference, we also provide the capability to perform inference in a distributed way: Offloading For less memory consumption you can use offloading of the models. Magcache Also we provide Magcache inference for faster generations (now available for sft 5s and sft 10s checkpoints). Beta testing You can apply to participate in the beta testing of the Kandinsky Video Lite via the telegram bot. 📑 Todo List - Kandinsky 5.0 Lite Text-to-Video - [x] Multi-GPU Inference code of the 2B models - [ ] Checkpoints 2B models - [x] pretrain - [x] sft - [ ] rl - [x] cfg distil - [x] distil 16 steps - [ ] autoregressive generation - [x] ComfyUI integration - [ ] Diffusers integration - [x] Caching acceleration support - Kandinsky 5.0 Lite Image-to-Video - [ ] Multi-GPU Inference code of the 2B model - [ ] Checkpoints of the 2B model - [ ] ComfyUI integration - [ ] Diffusers integration - Kandinsky 5.0 Pro Text-to-Video - [ ] Multi-GPU Inference code of the models - [ ] Checkpoints of the model - [ ] ComfyUI integration - [ ] Diffusers integration - Kandinsky 5.0 Pro Image-to-Video - [ ] Multi-GPU Inference code of the model - [ ] Checkpoints of the model - [ ] ComfyUI integration - [ ] Diffusers integration - [ ] Technical report Team Leads: Vladimir Arkhipkin, Vladimir Korviakov, Nikolai Gerasimenko, Denis Parkhomenko Core Contributors: Alexey Letunovskiy, Maria Kovaleva, Ivan Kirillov, Lev Novitskiy, Denis Koposov, Dmitrii Mikhailov, Anna Averchenkova, Andrey Shutkin, Julia Agafonova, Olga Kim, Anastasiia Kargapoltseva, Nikita Kiselev Contributors: Anna Dmitrienko, Anastasia Maltseva, Kirill Chernyshev, Ilia Vasiliev, Viacheslav Vasilev, Vladimir Polovnikov, Yury Kolabushin, Alexander Belykh, Mikhail Mamaev, Anastasia Aliaskina, Tatiana Nikulina, Polina Gavrilova

license:apache-2.0

Kandinsky-5.0-T2V-Lite-distilled16steps-10s-Diffusers

Habr | Project Page | Technical Report | Original Github | 🤗 Diffusers This repository provides the 🤗 Diffusers integration for Kandinsky 5.0 T2V Lite - a lightweight video generation model (2B parameters) that ranks #1 among open-source models in its class. - 🔥 2025/09/29: We have open-sourced `Kandinsky 5.0 T2V Lite` a lite (2B parameters) version of `Kandinsky 5.0 Video` text-to-video generation model. - 🚀 Diffusers Integration: Now available with easy-to-use 🤗 Diffusers pipeline! Kandinsky 5.0 T2V Lite is a lightweight video generation model (2B parameters) that ranks #1 among open-source models in its class. It outperforms larger Wan models (5B and 14B) and offers the best understanding of Russian concepts in the open-source ecosystem. We provide 8 model variants, each optimized for different use cases: SFT model — delivers the highest generation quality CFG-distilled — runs 2× faster Diffusion-distilled — enables low-latency generation with minimal quality loss (6× faster) Pretrain model — designed for fine-tuning by researchers and enthusiasts Architecture Latent diffusion pipeline with Flow Matching. Diffusion Transformer (DiT) as the main generative backbone with cross-attention to text embeddings. HunyuanVideo 3D VAE encodes/decodes video into a latent space DiT is the main generative module using cross-attention to condition on text Results Side-by-Side Evaluation The evaluation is based on the expanded prompts from the Movie Gen benchmark. Beta Testing You can apply to participate in the beta testing of the Kandinsky Video Lite via the telegram bot.

license:apache-2.0

Kandinsky-5.0-T2V-Lite-sft-5s

Habr | Project Page | Technical Report (soon) | Github Kandinsky 5.0: A family of diffusion models for Video & Image generation In this repository, we provide a family of diffusion models to generate a video or an image ( Coming Soon ) given a textual prompt and distilled model for faster generation. - 🔥 Source: : We have open-sourced `Kandinsky 5.0 T2V Lite` a lite (2B parameters) version of `Kandinsky 5.0 Video` text-to-video generation model. Released checkpoints: `kandinsky5litet2vpretrain5s`, `kandinsky5litet2vpretrain10s`, `kandinsky5litet2vsft5s`, `kandinsky5litet2vsft10s`, `kandinsky5litet2vnocfg5s`, `kandinsky5litet2vnocfg10s`, `kandinsky5litet2vdistilled16steps5s`, `kandinsky5litet2vdistilled16steps10s` contains weight from pretrain, supervised finetuning, cfg distillation and diffusion distillation into 16 steps. 5s checkpoints are capable of generating videos up to 5 seconds long. 10s checkpoints is faster models checkpoints trained with NABLA algorithm and capable to generate videos up to 10 seconds long. Kandinsky 5.0 T2V Lite is a lightweight video generation model (2B parameters) that ranks #1 among open-source models in its class. It outperforms larger Wan models (5B and 14B) and offers the best understanding of Russian concepts in the open-source ecosystem. We provide 8 model variants, each optimized for different use cases: SFT model — delivers the highest generation quality; Diffusion-distilled — enables low-latency generation with minimal quality loss (6× faster); Pretrain model — designed for fine-tuning by researchers and enthusiasts. All models are available in two versions: for generating 5-second and 10-second videos. Diffusion Transformer (DiT) as the main generative backbone with cross-attention to text embeddings. - HunyuanVideo 3D VAE encodes/decodes video into a latent space. - DiT is the main generative module using cross-attention to condition on text. | Model | config | video duration | NFE | Checkpoint | Latency | |-------------------------------------|--------|----------------|-----|------------|----------------| | Kandinsky 5.0 T2V Lite SFT 5s |configs/config5ssft.yaml | 5s | 100 |🤗 HF | 139 s | | Kandinsky 5.0 T2V Lite SFT 10s |configs/config10ssft.yaml| 10s | 100 |🤗 HF | 224 s | | Kandinsky 5.0 T2V Lite pretrain 5s |configs/config5spretrain.yaml | 5s | 100 |🤗 HF | 139 s | | Kandinsky 5.0 T2V Lite pretrain 10s |configs/config10spretrain.yaml | 10s | 100 |🤗 HF | 224 s | | Kandinsky 5.0 T2V Lite no-CFG 5s |configs/config5snocfg.yaml| 5s | 50 |🤗 HF | 77 s | | Kandinsky 5.0 T2V Lite no-CFG 10s |configs/config10snocfg.yaml| 10s | 50 |🤗 HF | 124 s | | Kandinsky 5.0 T2V Lite distill 5s |configs/config5sdistil.yaml| 5s | 16 | 🤗 HF| 35 s | | Kandinsky 5.0 T2V Lite distill 10s |configs/config10sdistil.yaml| 10s | 16 | 🤗 HF| 55 s | | Latency was measured after the second inference run. The first run of the model can be slower due to the compilation process. Inference was measured on an NVIDIA H100 GPU with 80 GB of memory, using CUDA 12.8.1 and PyTorch 2.8. For 5-second models Flash Attention 3 was used. The evaluation is based on the expanded prompts from the Movie Gen benchmark, which are available in the expandedprompt column of the benchmark/moviegenbench.csv file. To improve inference performance on NVidia Hopper GPUs, we recommend installing Flash Attention 3. Please, refer to inferenceexample.ipynb notebook for more usage details. For a faster inference, we also provide the capability to perform inference in a distributed way: Offloading For less memory consumption you can use offloading of the models. Magcache Also we provide Magcache inference for faster generations (now available for sft 5s and sft 10s checkpoints). Beta testing You can apply to participate in the beta testing of the Kandinsky Video Lite via the telegram bot. 📑 Todo List - Kandinsky 5.0 Lite Text-to-Video - [x] Multi-GPU Inference code of the 2B models - [ ] Checkpoints 2B models - [x] pretrain - [x] sft - [ ] rl - [x] cfg distil - [x] distil 16 steps - [ ] autoregressive generation - [x] ComfyUI integration - [ ] Diffusers integration - [x] Caching acceleration support - Kandinsky 5.0 Lite Image-to-Video - [ ] Multi-GPU Inference code of the 2B model - [ ] Checkpoints of the 2B model - [ ] ComfyUI integration - [ ] Diffusers integration - Kandinsky 5.0 Pro Text-to-Video - [ ] Multi-GPU Inference code of the models - [ ] Checkpoints of the model - [ ] ComfyUI integration - [ ] Diffusers integration - Kandinsky 5.0 Pro Image-to-Video - [ ] Multi-GPU Inference code of the model - [ ] Checkpoints of the model - [ ] ComfyUI integration - [ ] Diffusers integration - [ ] Technical report Team Leads: Vladimir Arkhipkin, Vladimir Korviakov, Nikolai Gerasimenko, Denis Parkhomenko Core Contributors: Alexey Letunovskiy, Maria Kovaleva, Ivan Kirillov, Lev Novitskiy, Denis Koposov, Dmitrii Mikhailov, Anna Averchenkova, Andrey Shutkin, Julia Agafonova, Olga Kim, Anastasiia Kargapoltseva, Nikita Kiselev Contributors: Anna Dmitrienko, Anastasia Maltseva, Kirill Chernyshev, Ilia Vasiliev, Viacheslav Vasilev, Vladimir Polovnikov, Yury Kolabushin, Alexander Belykh, Mikhail Mamaev, Anastasia Aliaskina, Tatiana Nikulina, Polina Gavrilova

license:apache-2.0

mGPT-1.3B-ukranian

Kandinsky-5.0-T2V-Lite-pretrain-10s

Habr | Project Page | Technical Report (soon) | Github Kandinsky 5.0: A family of diffusion models for Video & Image generation In this repository, we provide a family of diffusion models to generate a video or an image ( Coming Soon ) given a textual prompt and distilled model for faster generation. - 🔥 Source: : We have open-sourced `Kandinsky 5.0 T2V Lite` a lite (2B parameters) version of `Kandinsky 5.0 Video` text-to-video generation model. Released checkpoints: `kandinsky5litet2vpretrain5s`, `kandinsky5litet2vpretrain10s`, `kandinsky5litet2vsft5s`, `kandinsky5litet2vsft10s`, `kandinsky5litet2vnocfg5s`, `kandinsky5litet2vnocfg10s`, `kandinsky5litet2vdistilled16steps5s`, `kandinsky5litet2vdistilled16steps10s` contains weight from pretrain, supervised finetuning, cfg distillation and diffusion distillation into 16 steps. 5s checkpoints are capable of generating videos up to 5 seconds long. 10s checkpoints is faster models checkpoints trained with NABLA algorithm and capable to generate videos up to 10 seconds long. Kandinsky 5.0 T2V Lite is a lightweight video generation model (2B parameters) that ranks #1 among open-source models in its class. It outperforms larger Wan models (5B and 14B) and offers the best understanding of Russian concepts in the open-source ecosystem. We provide 8 model variants, each optimized for different use cases: SFT model — delivers the highest generation quality; Diffusion-distilled — enables low-latency generation with minimal quality loss (6× faster); Pretrain model — designed for fine-tuning by researchers and enthusiasts. All models are available in two versions: for generating 5-second and 10-second videos. Diffusion Transformer (DiT) as the main generative backbone with cross-attention to text embeddings. - HunyuanVideo 3D VAE encodes/decodes video into a latent space. - DiT is the main generative module using cross-attention to condition on text. | Model | config | video duration | NFE | Checkpoint | Latency | |-------------------------------------|--------|----------------|-----|------------|----------------| | Kandinsky 5.0 T2V Lite SFT 5s |configs/config5ssft.yaml | 5s | 100 |🤗 HF | 139 s | | Kandinsky 5.0 T2V Lite SFT 10s |configs/config10ssft.yaml| 10s | 100 |🤗 HF | 224 s | | Kandinsky 5.0 T2V Lite pretrain 5s |configs/config5spretrain.yaml | 5s | 100 |🤗 HF | 139 s | | Kandinsky 5.0 T2V Lite pretrain 10s |configs/config10spretrain.yaml | 10s | 100 |🤗 HF | 224 s | | Kandinsky 5.0 T2V Lite no-CFG 5s |configs/config5snocfg.yaml| 5s | 50 |🤗 HF | 77 s | | Kandinsky 5.0 T2V Lite no-CFG 10s |configs/config10snocfg.yaml| 10s | 50 |🤗 HF | 124 s | | Kandinsky 5.0 T2V Lite distill 5s |configs/config5sdistil.yaml| 5s | 16 | 🤗 HF| 35 s | | Kandinsky 5.0 T2V Lite distill 10s |configs/config10sdistil.yaml| 10s | 16 | 🤗 HF| 55 s | | Latency was measured after the second inference run. The first run of the model can be slower due to the compilation process. Inference was measured on an NVIDIA H100 GPU with 80 GB of memory, using CUDA 12.8.1 and PyTorch 2.8. For 5-second models Flash Attention 3 was used. The evaluation is based on the expanded prompts from the Movie Gen benchmark, which are available in the expandedprompt column of the benchmark/moviegenbench.csv file. To improve inference performance on NVidia Hopper GPUs, we recommend installing Flash Attention 3. Please, refer to inferenceexample.ipynb notebook for more usage details. For a faster inference, we also provide the capability to perform inference in a distributed way: Offloading For less memory consumption you can use offloading of the models. Magcache Also we provide Magcache inference for faster generations (now available for sft 5s and sft 10s checkpoints). Beta testing You can apply to participate in the beta testing of the Kandinsky Video Lite via the telegram bot. 📑 Todo List - Kandinsky 5.0 Lite Text-to-Video - [x] Multi-GPU Inference code of the 2B models - [ ] Checkpoints 2B models - [x] pretrain - [x] sft - [ ] rl - [x] cfg distil - [x] distil 16 steps - [ ] autoregressive generation - [x] ComfyUI integration - [ ] Diffusers integration - [x] Caching acceleration support - Kandinsky 5.0 Lite Image-to-Video - [ ] Multi-GPU Inference code of the 2B model - [ ] Checkpoints of the 2B model - [ ] ComfyUI integration - [ ] Diffusers integration - Kandinsky 5.0 Pro Text-to-Video - [ ] Multi-GPU Inference code of the models - [ ] Checkpoints of the model - [ ] ComfyUI integration - [ ] Diffusers integration - Kandinsky 5.0 Pro Image-to-Video - [ ] Multi-GPU Inference code of the model - [ ] Checkpoints of the model - [ ] ComfyUI integration - [ ] Diffusers integration - [ ] Technical report Team Leads: Vladimir Arkhipkin, Vladimir Korviakov, Nikolai Gerasimenko, Denis Parkhomenko Core Contributors: Alexey Letunovskiy, Maria Kovaleva, Ivan Kirillov, Lev Novitskiy, Denis Koposov, Dmitrii Mikhailov, Anna Averchenkova, Andrey Shutkin, Julia Agafonova, Olga Kim, Anastasiia Kargapoltseva, Nikita Kiselev Contributors: Anna Dmitrienko, Anastasia Maltseva, Kirill Chernyshev, Ilia Vasiliev, Viacheslav Vasilev, Vladimir Polovnikov, Yury Kolabushin, Alexander Belykh, Mikhail Mamaev, Anastasia Aliaskina, Tatiana Nikulina, Polina Gavrilova

license:apache-2.0

Kandinsky-5.0-T2V-Lite-distilled16steps-5s

Habr | Project Page | Technical Report (soon) | Github Kandinsky 5.0: A family of diffusion models for Video & Image generation In this repository, we provide a family of diffusion models to generate a video or an image ( Coming Soon ) given a textual prompt and distilled model for faster generation. - 🔥 Source: : We have open-sourced `Kandinsky 5.0 T2V Lite` a lite (2B parameters) version of `Kandinsky 5.0 Video` text-to-video generation model. Released checkpoints: `kandinsky5litet2vpretrain5s`, `kandinsky5litet2vpretrain10s`, `kandinsky5litet2vsft5s`, `kandinsky5litet2vsft10s`, `kandinsky5litet2vnocfg5s`, `kandinsky5litet2vnocfg10s`, `kandinsky5litet2vdistilled16steps5s`, `kandinsky5litet2vdistilled16steps10s` contains weight from pretrain, supervised finetuning, cfg distillation and diffusion distillation into 16 steps. 5s checkpoints are capable of generating videos up to 5 seconds long. 10s checkpoints is faster models checkpoints trained with NABLA algorithm and capable to generate videos up to 10 seconds long. Kandinsky 5.0 T2V Lite is a lightweight video generation model (2B parameters) that ranks #1 among open-source models in its class. It outperforms larger Wan models (5B and 14B) and offers the best understanding of Russian concepts in the open-source ecosystem. We provide 8 model variants, each optimized for different use cases: SFT model — delivers the highest generation quality; Diffusion-distilled — enables low-latency generation with minimal quality loss (6× faster); Pretrain model — designed for fine-tuning by researchers and enthusiasts. All models are available in two versions: for generating 5-second and 10-second videos. Diffusion Transformer (DiT) as the main generative backbone with cross-attention to text embeddings. - HunyuanVideo 3D VAE encodes/decodes video into a latent space. - DiT is the main generative module using cross-attention to condition on text. | Model | config | video duration | NFE | Checkpoint | Latency | |-------------------------------------|--------|----------------|-----|------------|----------------| | Kandinsky 5.0 T2V Lite SFT 5s |configs/config5ssft.yaml | 5s | 100 |🤗 HF | 139 s | | Kandinsky 5.0 T2V Lite SFT 10s |configs/config10ssft.yaml| 10s | 100 |🤗 HF | 224 s | | Kandinsky 5.0 T2V Lite pretrain 5s |configs/config5spretrain.yaml | 5s | 100 |🤗 HF | 139 s | | Kandinsky 5.0 T2V Lite pretrain 10s |configs/config10spretrain.yaml | 10s | 100 |🤗 HF | 224 s | | Kandinsky 5.0 T2V Lite no-CFG 5s |configs/config5snocfg.yaml| 5s | 50 |🤗 HF | 77 s | | Kandinsky 5.0 T2V Lite no-CFG 10s |configs/config10snocfg.yaml| 10s | 50 |🤗 HF | 124 s | | Kandinsky 5.0 T2V Lite distill 5s |configs/config5sdistil.yaml| 5s | 16 | 🤗 HF| 35 s | | Kandinsky 5.0 T2V Lite distill 10s |configs/config10sdistil.yaml| 10s | 16 | 🤗 HF| 55 s | | Latency was measured after the second inference run. The first run of the model can be slower due to the compilation process. Inference was measured on an NVIDIA H100 GPU with 80 GB of memory, using CUDA 12.8.1 and PyTorch 2.8. For 5-second models Flash Attention 3 was used. The evaluation is based on the expanded prompts from the Movie Gen benchmark, which are available in the expandedprompt column of the benchmark/moviegenbench.csv file. To improve inference performance on NVidia Hopper GPUs, we recommend installing Flash Attention 3. Please, refer to inferenceexample.ipynb notebook for more usage details. For a faster inference, we also provide the capability to perform inference in a distributed way: Offloading For less memory consumption you can use offloading of the models. Magcache Also we provide Magcache inference for faster generations (now available for sft 5s and sft 10s checkpoints). Beta testing You can apply to participate in the beta testing of the Kandinsky Video Lite via the telegram bot. 📑 Todo List - Kandinsky 5.0 Lite Text-to-Video - [x] Multi-GPU Inference code of the 2B models - [ ] Checkpoints 2B models - [x] pretrain - [x] sft - [ ] rl - [x] cfg distil - [x] distil 16 steps - [ ] autoregressive generation - [x] ComfyUI integration - [ ] Diffusers integration - [x] Caching acceleration support - Kandinsky 5.0 Lite Image-to-Video - [ ] Multi-GPU Inference code of the 2B model - [ ] Checkpoints of the 2B model - [ ] ComfyUI integration - [ ] Diffusers integration - Kandinsky 5.0 Pro Text-to-Video - [ ] Multi-GPU Inference code of the models - [ ] Checkpoints of the model - [ ] ComfyUI integration - [ ] Diffusers integration - Kandinsky 5.0 Pro Image-to-Video - [ ] Multi-GPU Inference code of the model - [ ] Checkpoints of the model - [ ] ComfyUI integration - [ ] Diffusers integration - [ ] Technical report Team Leads: Vladimir Arkhipkin, Vladimir Korviakov, Nikolai Gerasimenko, Denis Parkhomenko Core Contributors: Alexey Letunovskiy, Maria Kovaleva, Ivan Kirillov, Lev Novitskiy, Denis Koposov, Dmitrii Mikhailov, Anna Averchenkova, Andrey Shutkin, Julia Agafonova, Olga Kim, Anastasiia Kargapoltseva, Nikita Kiselev Contributors: Anna Dmitrienko, Anastasia Maltseva, Kirill Chernyshev, Ilia Vasiliev, Viacheslav Vasilev, Vladimir Polovnikov, Yury Kolabushin, Alexander Belykh, Mikhail Mamaev, Anastasia Aliaskina, Tatiana Nikulina, Polina Gavrilova

license:apache-2.0

ruElectra-medium

Kandinsky-5.0-T2V-Lite-nocfg-10s

Habr | Project Page | Technical Report (soon) | Github Kandinsky 5.0: A family of diffusion models for Video & Image generation In this repository, we provide a family of diffusion models to generate a video or an image ( Coming Soon ) given a textual prompt and distilled model for faster generation. - 🔥 Source: : We have open-sourced `Kandinsky 5.0 T2V Lite` a lite (2B parameters) version of `Kandinsky 5.0 Video` text-to-video generation model. Released checkpoints: `kandinsky5litet2vpretrain5s`, `kandinsky5litet2vpretrain10s`, `kandinsky5litet2vsft5s`, `kandinsky5litet2vsft10s`, `kandinsky5litet2vnocfg5s`, `kandinsky5litet2vnocfg10s`, `kandinsky5litet2vdistilled16steps5s`, `kandinsky5litet2vdistilled16steps10s` contains weight from pretrain, supervised finetuning, cfg distillation and diffusion distillation into 16 steps. 5s checkpoints are capable of generating videos up to 5 seconds long. 10s checkpoints is faster models checkpoints trained with NABLA algorithm and capable to generate videos up to 10 seconds long. Kandinsky 5.0 T2V Lite is a lightweight video generation model (2B parameters) that ranks #1 among open-source models in its class. It outperforms larger Wan models (5B and 14B) and offers the best understanding of Russian concepts in the open-source ecosystem. We provide 8 model variants, each optimized for different use cases: SFT model — delivers the highest generation quality; Diffusion-distilled — enables low-latency generation with minimal quality loss (6× faster); Pretrain model — designed for fine-tuning by researchers and enthusiasts. All models are available in two versions: for generating 5-second and 10-second videos. Diffusion Transformer (DiT) as the main generative backbone with cross-attention to text embeddings. - HunyuanVideo 3D VAE encodes/decodes video into a latent space. - DiT is the main generative module using cross-attention to condition on text. | Model | config | video duration | NFE | Checkpoint | Latency | |-------------------------------------|--------|----------------|-----|------------|----------------| | Kandinsky 5.0 T2V Lite SFT 5s |configs/config5ssft.yaml | 5s | 100 |🤗 HF | 139 s | | Kandinsky 5.0 T2V Lite SFT 10s |configs/config10ssft.yaml| 10s | 100 |🤗 HF | 224 s | | Kandinsky 5.0 T2V Lite pretrain 5s |configs/config5spretrain.yaml | 5s | 100 |🤗 HF | 139 s | | Kandinsky 5.0 T2V Lite pretrain 10s |configs/config10spretrain.yaml | 10s | 100 |🤗 HF | 224 s | | Kandinsky 5.0 T2V Lite no-CFG 5s |configs/config5snocfg.yaml| 5s | 50 |🤗 HF | 77 s | | Kandinsky 5.0 T2V Lite no-CFG 10s |configs/config10snocfg.yaml| 10s | 50 |🤗 HF | 124 s | | Kandinsky 5.0 T2V Lite distill 5s |configs/config5sdistil.yaml| 5s | 16 | 🤗 HF| 35 s | | Kandinsky 5.0 T2V Lite distill 10s |configs/config10sdistil.yaml| 10s | 16 | 🤗 HF| 55 s | | Latency was measured after the second inference run. The first run of the model can be slower due to the compilation process. Inference was measured on an NVIDIA H100 GPU with 80 GB of memory, using CUDA 12.8.1 and PyTorch 2.8. For 5-second models Flash Attention 3 was used. The evaluation is based on the expanded prompts from the Movie Gen benchmark, which are available in the expandedprompt column of the benchmark/moviegenbench.csv file. To improve inference performance on NVidia Hopper GPUs, we recommend installing Flash Attention 3. Please, refer to inferenceexample.ipynb notebook for more usage details. For a faster inference, we also provide the capability to perform inference in a distributed way: Offloading For less memory consumption you can use offloading of the models. Magcache Also we provide Magcache inference for faster generations (now available for sft 5s and sft 10s checkpoints). Beta testing You can apply to participate in the beta testing of the Kandinsky Video Lite via the telegram bot. 📑 Todo List - Kandinsky 5.0 Lite Text-to-Video - [x] Multi-GPU Inference code of the 2B models - [ ] Checkpoints 2B models - [x] pretrain - [x] sft - [ ] rl - [x] cfg distil - [x] distil 16 steps - [ ] autoregressive generation - [x] ComfyUI integration - [ ] Diffusers integration - [x] Caching acceleration support - Kandinsky 5.0 Lite Image-to-Video - [ ] Multi-GPU Inference code of the 2B model - [ ] Checkpoints of the 2B model - [ ] ComfyUI integration - [ ] Diffusers integration - Kandinsky 5.0 Pro Text-to-Video - [ ] Multi-GPU Inference code of the models - [ ] Checkpoints of the model - [ ] ComfyUI integration - [ ] Diffusers integration - Kandinsky 5.0 Pro Image-to-Video - [ ] Multi-GPU Inference code of the model - [ ] Checkpoints of the model - [ ] ComfyUI integration - [ ] Diffusers integration - [ ] Technical report Team Leads: Vladimir Arkhipkin, Vladimir Korviakov, Nikolai Gerasimenko, Denis Parkhomenko Core Contributors: Alexey Letunovskiy, Maria Kovaleva, Ivan Kirillov, Lev Novitskiy, Denis Koposov, Dmitrii Mikhailov, Anna Averchenkova, Andrey Shutkin, Julia Agafonova, Olga Kim, Anastasiia Kargapoltseva, Nikita Kiselev Contributors: Anna Dmitrienko, Anastasia Maltseva, Kirill Chernyshev, Ilia Vasiliev, Viacheslav Vasilev, Vladimir Polovnikov, Yury Kolabushin, Alexander Belykh, Mikhail Mamaev, Anastasia Aliaskina, Tatiana Nikulina, Polina Gavrilova

license:apache-2.0

mGPT-1.3B-azerbaijan

mGPT-1.3B-georgian

Kandinsky-5.0-T2V-Lite-nocfg-5s

Habr | Project Page | Technical Report (soon) | Github Kandinsky 5.0: A family of diffusion models for Video & Image generation In this repository, we provide a family of diffusion models to generate a video or an image ( Coming Soon ) given a textual prompt and distilled model for faster generation. - 🔥 Source: : We have open-sourced `Kandinsky 5.0 T2V Lite` a lite (2B parameters) version of `Kandinsky 5.0 Video` text-to-video generation model. Released checkpoints: `kandinsky5litet2vpretrain5s`, `kandinsky5litet2vpretrain10s`, `kandinsky5litet2vsft5s`, `kandinsky5litet2vsft10s`, `kandinsky5litet2vnocfg5s`, `kandinsky5litet2vnocfg10s`, `kandinsky5litet2vdistilled16steps5s`, `kandinsky5litet2vdistilled16steps10s` contains weight from pretrain, supervised finetuning, cfg distillation and diffusion distillation into 16 steps. 5s checkpoints are capable of generating videos up to 5 seconds long. 10s checkpoints is faster models checkpoints trained with NABLA algorithm and capable to generate videos up to 10 seconds long. Kandinsky 5.0 T2V Lite is a lightweight video generation model (2B parameters) that ranks #1 among open-source models in its class. It outperforms larger Wan models (5B and 14B) and offers the best understanding of Russian concepts in the open-source ecosystem. We provide 8 model variants, each optimized for different use cases: SFT model — delivers the highest generation quality; Diffusion-distilled — enables low-latency generation with minimal quality loss (6× faster); Pretrain model — designed for fine-tuning by researchers and enthusiasts. All models are available in two versions: for generating 5-second and 10-second videos. Diffusion Transformer (DiT) as the main generative backbone with cross-attention to text embeddings. - HunyuanVideo 3D VAE encodes/decodes video into a latent space. - DiT is the main generative module using cross-attention to condition on text. | Model | config | video duration | NFE | Checkpoint | Latency | |-------------------------------------|--------|----------------|-----|------------|----------------| | Kandinsky 5.0 T2V Lite SFT 5s |configs/config5ssft.yaml | 5s | 100 |🤗 HF | 139 s | | Kandinsky 5.0 T2V Lite SFT 10s |configs/config10ssft.yaml| 10s | 100 |🤗 HF | 224 s | | Kandinsky 5.0 T2V Lite pretrain 5s |configs/config5spretrain.yaml | 5s | 100 |🤗 HF | 139 s | | Kandinsky 5.0 T2V Lite pretrain 10s |configs/config10spretrain.yaml | 10s | 100 |🤗 HF | 224 s | | Kandinsky 5.0 T2V Lite no-CFG 5s |configs/config5snocfg.yaml| 5s | 50 |🤗 HF | 77 s | | Kandinsky 5.0 T2V Lite no-CFG 10s |configs/config10snocfg.yaml| 10s | 50 |🤗 HF | 124 s | | Kandinsky 5.0 T2V Lite distill 5s |configs/config5sdistil.yaml| 5s | 16 | 🤗 HF| 35 s | | Kandinsky 5.0 T2V Lite distill 10s |configs/config10sdistil.yaml| 10s | 16 | 🤗 HF| 55 s | | Latency was measured after the second inference run. The first run of the model can be slower due to the compilation process. Inference was measured on an NVIDIA H100 GPU with 80 GB of memory, using CUDA 12.8.1 and PyTorch 2.8. For 5-second models Flash Attention 3 was used. The evaluation is based on the expanded prompts from the Movie Gen benchmark, which are available in the expandedprompt column of the benchmark/moviegenbench.csv file. To improve inference performance on NVidia Hopper GPUs, we recommend installing Flash Attention 3. Please, refer to inferenceexample.ipynb notebook for more usage details. For a faster inference, we also provide the capability to perform inference in a distributed way: Offloading For less memory consumption you can use offloading of the models. Magcache Also we provide Magcache inference for faster generations (now available for sft 5s and sft 10s checkpoints). Beta testing You can apply to participate in the beta testing of the Kandinsky Video Lite via the telegram bot. 📑 Todo List - Kandinsky 5.0 Lite Text-to-Video - [x] Multi-GPU Inference code of the 2B models - [ ] Checkpoints 2B models - [x] pretrain - [x] sft - [ ] rl - [x] cfg distil - [x] distil 16 steps - [ ] autoregressive generation - [x] ComfyUI integration - [ ] Diffusers integration - [x] Caching acceleration support - Kandinsky 5.0 Lite Image-to-Video - [ ] Multi-GPU Inference code of the 2B model - [ ] Checkpoints of the 2B model - [ ] ComfyUI integration - [ ] Diffusers integration - Kandinsky 5.0 Pro Text-to-Video - [ ] Multi-GPU Inference code of the models - [ ] Checkpoints of the model - [ ] ComfyUI integration - [ ] Diffusers integration - Kandinsky 5.0 Pro Image-to-Video - [ ] Multi-GPU Inference code of the model - [ ] Checkpoints of the model - [ ] ComfyUI integration - [ ] Diffusers integration - [ ] Technical report Team Leads: Vladimir Arkhipkin, Vladimir Korviakov, Nikolai Gerasimenko, Denis Parkhomenko Core Contributors: Alexey Letunovskiy, Maria Kovaleva, Ivan Kirillov, Lev Novitskiy, Denis Koposov, Dmitrii Mikhailov, Anna Averchenkova, Andrey Shutkin, Julia Agafonova, Olga Kim, Anastasiia Kargapoltseva, Nikita Kiselev Contributors: Anna Dmitrienko, Anastasia Maltseva, Kirill Chernyshev, Ilia Vasiliev, Viacheslav Vasilev, Vladimir Polovnikov, Yury Kolabushin, Alexander Belykh, Mikhail Mamaev, Anastasia Aliaskina, Tatiana Nikulina, Polina Gavrilova

license:apache-2.0

mGPT-1.3B-romanian

mGPT-1.3B-bulgarian

mGPT-1.3B-turkmen

ruclip-vit-large-patch14-224

pollux-judge-32b-r

ruclip-vit-base-patch16-224

mGPT-1.3B-armenian

mGPT-armenian

license:apache-2.0

kandinsky3-diffusers

mGPT-1.3B-chuvash

ruElectra-large

mGPT-1.3B-kirgiz

mGPT-1.3B-belorussian

mGPT-1.3B-tuvan

mGPT-1.3B-ossetian

mGPT-1.3B-mari

kandinsky3_ip_adapter

license:apache-2.0

kandinsky3_controlnet_hed

kandinsky-video-motion-predictor

mGPT-1.3B-yakut

mGPT-1.3B-kalmyk

Real ESRGAN

PyTorch implementation of a Real-ESRGAN model trained on custom dataset. This model shows better results on faces compared to the original version. It is also easier to integrate this model into your projects. Real-ESRGAN is an upgraded ESRGAN trained with pure synthetic data is capable of enhancing details while removing annoying artifacts for common real-world images.

Kandinsky_2.1

license:apache-2.0

Kandinsky_2.0

license:apache-2.0

rudalle-Malevich

Kandinsky3.1

license:apache-2.0

Kandinsky3.0

license:apache-2.0

rudalle-Emojich

rugpt3xl

KandiSuperRes

RUDOLPH-350M

KandinskyVideo

license:apache-2.0

KandinskyVideo_1_1

license:apache-2.0

kandinsky-4-v2a

license:apache-2.0

MoVQGAN

kandinsky-4-t2v-flash

license:apache-2.0

charllama-2.6B

ru-clip

ReadingPipeline-notebooks

paper_persi_chat

scrabblegan-notebooks

tags-generation

Kandinsky-5.0-I2V-Lite-5s

license:apache-2.0

ReadingPipeline-Peter

scrabblegan-peter

mGPT-1.3B-buryat