deepseek-ai

✓ Verified · AI Startup

Chinese AI research lab with competitive open models

83 models

DeepSeek-OCR

Tags: image-text-to-text · multilingual · deepseek · vision-language · ocr · custom_code · License: MIT · Library: transformers

Downloads: 3,581,905 · Likes: 2,643

DeepSeek-R1-Distill-Qwen-32B

License: MIT · Library: transformers

Downloads: 1,715,053 · Likes: 1,467

DeepSeek-R1-Distill-Qwen-1.5B

License: MIT · Library: transformers

Downloads: 1,004,933 · Likes: 1,378

DeepSeek-R1-Distill-Llama-70B

License: MIT · Library: transformers · Tags: llama

Downloads: 806,614 · Likes: 730

DeepSeek-R1-Distill-Llama-8B

License: MIT · Library: transformers · Tags: llama

Downloads: 780,774 · Likes: 822

deepseek-coder-1.3b-instruct

License: other (DeepSeek license, see LICENSE) · Tags: llama

Downloads: 625,841 · Likes: 146

DeepSeek-R1-Distill-Qwen-7B

License: MIT · Library: transformers

Downloads: 569,684 · Likes: 750

DeepSeek-R1-0528

License: MIT · Library: transformers

Downloads: 542,505 · Likes: 2,386

DeepSeek-V3.1

License: MIT · Library: transformers · Base model: deepseek-ai/DeepSeek-V3.1-Base

Downloads: 448,639 · Likes: 801

DeepSeek-R1

License: MIT · Library: transformers

Downloads: 420,855 · Likes: 12,838

DeepSeek-R1-Distill-Qwen-14B

License: MIT · Library: transformers

Downloads: 289,943 · Likes: 575

DeepSeek-Coder-V2-Lite-Instruct

License: other (DeepSeek license, see LICENSE)

Downloads: 278,186 · Likes: 497

DeepSeek-V3-0324

License: MIT · Library: transformers

Downloads: 240,699 · Likes: 3,076

DeepSeek-V3

Downloads: 174,492 · Likes: 3,995

DeepSeek-OCR-2

License: Apache-2.0

Downloads: 143,676 · Likes: 639

DeepSeek-R1-0528-Qwen3-8B

License: MIT

Downloads: 130,091 · Likes: 976

DeepSeek-V2-Lite

Downloads: 108,898 · Likes: 154

DeepSeek-V2-Lite-Chat

Downloads: 106,923 · Likes: 131

deepseek-vl-1.3b-chat

Downloads: 91,738 · Likes: 68

deepseek-llm-7b-chat

[🏠 Homepage] | [🤖 Chat with DeepSeek LLM] | [Discord] | [WeChat (微信)]

1. Introduction: Introducing DeepSeek LLM, an advanced language model comprising 7 billion parameters. It has been trained from scratch on a vast dataset of 2 trillion tokens in both English and Chinese. To foster research, we have made DeepSeek LLM 7B/67B Base and DeepSeek LLM 7B/67B Chat open source for the research community.

2. Model Summary: `deepseek-llm-7b-chat` is a 7B-parameter model initialized from `deepseek-llm-7b-base` and fine-tuned on extra instruction data. Home Page: DeepSeek. Repository: deepseek-ai/deepseek-LLM. Chat with DeepSeek LLM: DeepSeek-LLM.

3. How to Use: Here are some examples of how to use our model. Chat Completion: instead of using the provided `apply_chat_template` function, you can also interact with our model by following the sample template (note that `messages` should be replaced by your input). Note: by default (`add_special_tokens=True`), our tokenizer automatically adds a `bos_token` before the input text. Additionally, since the system prompt is not compatible with this version of our models, we DO NOT RECOMMEND including a system prompt in your input.

4. License: This code repository is licensed under the MIT License. The use of DeepSeek LLM models is subject to the Model License. DeepSeek LLM supports commercial use. If you have any questions, please raise an issue or contact us at [email protected].
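The sample template mentioned above is not reproduced in this excerpt. A minimal sketch, assuming the plain `User:`/`Assistant:` turn format that DeepSeek LLM chat models are documented as using, might look like:

```python
def build_deepseek_chat_prompt(messages):
    """Flatten a list of {"role", "content"} messages into a plain-text
    chat prompt (no system prompt, per the model card's recommendation).
    The exact turn format is an assumption for illustration."""
    parts = []
    for m in messages:
        role = "User" if m["role"] == "user" else "Assistant"
        parts.append(f"{role}: {m['content']}")
    # The model continues generating after the trailing "Assistant:" cue.
    return "\n\n".join(parts) + "\n\nAssistant:"

prompt = build_deepseek_chat_prompt([{"role": "user", "content": "Who are you?"}])
print(prompt)
```

Passing the same `messages` list to the tokenizer's `apply_chat_template` is the supported path; this sketch only shows the shape of the raw prompt.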

Tags: llama

Downloads: 91,628 · Likes: 206

DeepSeek-V3.2-Exp

We are excited to announce the official release of DeepSeek-V3.2-Exp, an experimental version of our model. As an intermediate step toward our next-generation architecture, V3.2-Exp builds upon V3....

License: MIT

Downloads: 83,786 · Likes: 803

DeepSeek-Coder-V2-Base

Downloads: 74,735 · Likes: 80

deepseek-math-7b-instruct

Tags: llama

Downloads: 73,484 · Likes: 140

Janus-Pro-7B

License: MIT

Downloads: 70,598 · Likes: 3,523

deepseek-vl2

Downloads: 63,820 · Likes: 365

deepseek-coder-6.7b-instruct

Tags: llama

Downloads: 60,436 · Likes: 456

DeepSeek-Prover-V2-7B

Tags: llama

Downloads: 58,889 · Likes: 130

deepseek-vl2-tiny

Introducing DeepSeek-VL2, an advanced series of large Mixture-of-Experts (MoE) Vision-Language Models that significantly improves upon its predecessor, DeepSeek-VL. DeepSeek-VL2 demonstrates superior capabilities across various tasks, including but not limited to visual question answering, optical character recognition, document/table/chart understanding, and visual grounding. The model series is composed of three variants, DeepSeek-VL2-Tiny, DeepSeek-VL2-Small and DeepSeek-VL2, with 1.0B, 2.8B and 4.5B activated parameters respectively. DeepSeek-VL2 achieves competitive or state-of-the-art performance with similar or fewer activated parameters compared to existing open-source dense and MoE-based models.

DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding. Zhiyu Wu, Xiaokang Chen, Zizheng Pan, Xingchao Liu, Wen Liu, Damai Dai, Huazuo Gao, Yiyang Ma, Chengyue Wu, Bingxuan Wang, Zhenda Xie, Yu Wu, Kai Hu, Jiawei Wang, Yaofeng Sun, Yukun Li, Yishi Piao, Kang Guan, Aixin Liu, Xin Xie, Yuxiang You, Kai Dong, Xingkai Yu, Haowei Zhang, Liang Zhao, Yisong Wang, Chong Ruan (Equal Contribution, Project Lead, Corresponding author)

DeepSeek-VL2-Tiny is built on DeepSeekMoE-3B (1.0B activated parameters). On the basis of a `Python >= 3.8` environment, install the necessary dependencies from the repository. Notes: 1. We suggest using a moderate sampling temperature. 2. For inputs with a small number of images, we pad the images directly to 384×384 without tiling. 3. The main difference between DeepSeek-VL2-Tiny, DeepSeek-VL2-Small and DeepSeek-VL2 is the base LLM.

This code repository is licensed under the MIT License. The use of DeepSeek-VL2 models is subject to the DeepSeek Model License. The DeepSeek-VL2 series supports commercial use. If you have any questions, please raise an issue or contact us at [email protected].

Downloads: 46,125 · Likes: 228

DeepSeek-V3.1-Terminus

This update maintains the model's original capabilities while addressing issues reported by users, including:
- Language consistency: reducing instances of mixed Chinese-English text and occasional abnormal characters;
- Agent capabilities: further optimizing the performance of the Code Agent and Search Agent.

| Benchmark | DeepSeek-V3.1 | DeepSeek-V3.1-Terminus |
| :--- | :---: | :---: |
| Reasoning Mode w/o Tool Use | | |
| MMLU-Pro | 84.8 | 85.0 |
| GPQA-Diamond | 80.1 | 80.7 |
| Humanity's Last Exam | 15.9 | 21.7 |
| LiveCodeBench | 74.8 | 74.9 |
| Codeforces | 2091 | 2046 |
| Aider-Polyglot | 76.3 | 76.1 |
| Agentic Tool Use | | |
| BrowseComp | 30.0 | 38.5 |
| BrowseComp-zh | 49.2 | 45.0 |
| SimpleQA | 93.4 | 96.8 |
| SWE Verified | 66.0 | 68.4 |
| SWE-bench Multilingual | 54.5 | 57.8 |
| Terminal-bench | 31.3 | 36.7 |

The template and tool-set of the search agent have been updated, as shown in `assets/search_tool_trajectory.html`. The model structure of DeepSeek-V3.1-Terminus is the same as DeepSeek-V3; please visit the DeepSeek-V3 repo for more information about running this model locally. For the model's chat template other than the search agent, please refer to the DeepSeek-V3.1 repo. We also provide an updated inference demo in the `inference` folder to help the community get started with running the model and understanding the details of the model architecture.

NOTE: In the current model checkpoint, the parameters of `self_attn.o_proj` do not conform to the UE8M0 FP8 scale data format. This is a known issue and will be corrected in future model releases.

This repository and the model weights are licensed under the MIT License. If you have any questions, please raise an issue or contact us at [email protected].

License: MIT

Downloads: 44,332 · Likes: 345

DeepSeek-V2-Chat

Downloads: 26,840 · Likes: 459

deepseek-coder-33b-instruct

[🏠 Homepage] | [🤖 Chat with DeepSeek Coder] | [Discord] | [WeChat (微信)]

1. Introduction: DeepSeek Coder is composed of a series of code language models, each trained from scratch on 2T tokens, with a composition of 87% code and 13% natural language in both English and Chinese. We provide various sizes of the code model, ranging from 1B to 33B versions. Each model is pre-trained on a project-level code corpus with a window size of 16K and an extra fill-in-the-blank task, to support project-level code completion and infilling. For coding capabilities, DeepSeek Coder achieves state-of-the-art performance among open-source code models on multiple programming languages and various benchmarks.
- Massive Training Data: trained from scratch on 2T tokens, including 87% code and 13% linguistic data in both English and Chinese.
- Highly Flexible & Scalable: offered in model sizes of 1.3B, 5.7B, 6.7B, and 33B, enabling users to choose the setup most suitable for their requirements.
- Superior Model Performance: state-of-the-art performance among publicly available code models on HumanEval, MultiPL-E, MBPP, DS-1000, and APPS benchmarks.
- Advanced Code Completion Capabilities: a window size of 16K and a fill-in-the-blank task, supporting project-level code completion and infilling tasks.

2. Model Summary: `deepseek-coder-33b-instruct` is a 33B-parameter model initialized from `deepseek-coder-33b-base` and fine-tuned on 2B tokens of instruction data. Home Page: DeepSeek. Repository: deepseek-ai/deepseek-coder. Chat with DeepSeek Coder: DeepSeek-Coder.

3. How to Use: Here are some examples of how to use our model for chat inference.

4. License: This code repository is licensed under the MIT License. The use of DeepSeek Coder models is subject to the Model License. DeepSeek Coder supports commercial use. If you have any questions, please raise an issue or contact us at [email protected].
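The chat inference mentioned above can be illustrated without loading the 33B model. This sketch assumes the Alpaca-style `### Instruction:`/`### Response:` layout commonly associated with DeepSeek-Coder's chat template; treat the exact system line and layout as assumptions, with the tokenizer's `apply_chat_template` being the authoritative source.

```python
# Hypothetical system line, paraphrasing the style of the published template.
SYSTEM = ("You are an AI programming assistant, utilizing the DeepSeek Coder "
          "model, and you only answer questions related to computer science.")

def build_coder_prompt(instruction, system=SYSTEM):
    """Assemble an instruction-tuned prompt; generation continues after
    the trailing '### Response:' marker."""
    return f"{system}\n### Instruction:\n{instruction}\n### Response:\n"

p = build_coder_prompt("Write a quick sort algorithm in Python.")
print(p)
```

In practice you would pass a `messages` list to the tokenizer's chat template rather than hand-assembling this string.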

Tags: llama

Downloads: 20,247 · Likes: 546

deepseek-coder-1.3b-base

Tags: llama

Downloads: 19,427 · Likes: 103

deepseek-llm-7b-base

[🏠 Homepage] | [🤖 Chat with DeepSeek LLM] | [Discord] | [WeChat (微信)]

1. Introduction: Introducing DeepSeek LLM, an advanced language model comprising 7 billion parameters. It has been trained from scratch on a vast dataset of 2 trillion tokens in both English and Chinese. To foster research, we have made DeepSeek LLM 7B/67B Base and DeepSeek LLM 7B/67B Chat open source for the research community.

2. Model Summary: `deepseek-llm-7b-base` is a 7B-parameter model with Multi-Head Attention, trained from scratch on 2 trillion tokens. Home Page: DeepSeek. Repository: deepseek-ai/deepseek-LLM. Chat with DeepSeek LLM: DeepSeek-LLM.

3. How to Use: Here are some examples of how to use our model for text completion.

4. License: This code repository is licensed under the MIT License. The use of DeepSeek LLM models is subject to the Model License. DeepSeek LLM supports commercial use. If you have any questions, please raise an issue or contact us at [email protected].

Tags: llama

Downloads: 15,124 · Likes: 127

Janus-Pro-1B

Janus-Pro is a novel autoregressive framework that unifies multimodal understanding and generation. It addresses the limitations of previous approaches by decoupling visual encoding into separate pathways, while still utilizing a single, unified transformer architecture for processing. The decoupling not only alleviates the conflict between the visual encoder's roles in understanding and generation, but also enhances the framework's flexibility. Janus-Pro surpasses previous unified models and matches or exceeds the performance of task-specific models. The simplicity, high flexibility, and effectiveness of Janus-Pro make it a strong candidate for next-generation unified multimodal models. Janus-Pro is a unified understanding and generation MLLM, which decouples visual encoding for multimodal understanding and generation. Janus-Pro is constructed on DeepSeek-LLM-1.5b-base/DeepSeek-LLM-7b-base. For multimodal understanding, it uses SigLIP-L as the vision encoder, which supports 384 x 384 image input. For image generation, Janus-Pro uses the tokenizer from here with a downsample rate of 16. This code repository is licensed under the MIT License. The use of Janus-Pro models is subject to the DeepSeek Model License. If you have any questions, please raise an issue or contact us at [email protected].
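As a quick sanity check of the numbers above: a 384 x 384 image with a downsample rate of 16 corresponds to a 24 x 24 grid of discrete image tokens. A one-liner makes the arithmetic explicit (the function name is ours, for illustration only):

```python
def image_token_grid(image_size=384, downsample=16):
    """Side length of the token grid and total image tokens for a square
    input, given the generation tokenizer's downsample rate."""
    side = image_size // downsample
    return side, side * side

side, n_tokens = image_token_grid()
print(side, n_tokens)  # 24 tokens per side, 576 image tokens in total
```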

License: MIT

Downloads: 12,426 · Likes: 463

deepseek-vl2-small

Downloads: 12,217 · Likes: 167

Janus-1.3B

0. Update (2024.10.20): We have uploaded the correct `tokenizer_config.json`. The previous file was missing the `pad_token`, which caused poor visual generation results.

Janus is a novel autoregressive framework that unifies multimodal understanding and generation. It addresses the limitations of previous approaches by decoupling visual encoding into separate pathways, while still utilizing a single, unified transformer architecture for processing. The decoupling not only alleviates the conflict between the visual encoder's roles in understanding and generation, but also enhances the framework's flexibility. Janus surpasses previous unified models and matches or exceeds the performance of task-specific models. The simplicity, high flexibility, and effectiveness of Janus make it a strong candidate for next-generation unified multimodal models.

Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation. Janus is a unified understanding and generation MLLM, which decouples visual encoding for multimodal understanding and generation. Janus is constructed based on DeepSeek-LLM-1.3b-base, which was trained on an approximate corpus of 500B text tokens. For multimodal understanding, it uses SigLIP-L as the vision encoder, which supports 384 x 384 image input. For image generation, Janus uses the tokenizer from here with a downsample rate of 16. This code repository is licensed under the MIT License. The use of Janus models is subject to the DeepSeek Model License. If you have any questions, please raise an issue or contact us at [email protected].

License: MIT

Downloads: 11,645 · Likes: 592

deepseek-moe-16b-base

[🏠 Homepage] | [🤖 Chat with DeepSeek LLM] | [Discord] | [WeChat (微信)] 1. Introduction to DeepSeekMoE: see the Introduction for more details. 2. How to Use: Here are some examples of how to use our model for text completion. 3. License: This code repository is licensed under the MIT License. The use of DeepSeekMoE models is subject to the Model License. DeepSeekMoE supports commercial use. If you have any questions, please raise an issue or contact us at [email protected].

Downloads: 11,323 · Likes: 130

DeepSeek-V2

Model Download | Evaluation Results | Model Architecture | API Platform | License | Citation

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

1. Introduction: Today, we're introducing DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token. Compared with DeepSeek 67B, DeepSeek-V2 achieves stronger performance while saving 42.5% of training costs, reducing the KV cache by 93.3%, and boosting the maximum generation throughput to 5.76 times. We pretrained DeepSeek-V2 on a diverse and high-quality corpus comprising 8.1 trillion tokens. This comprehensive pretraining was followed by Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) to fully unleash the model's capabilities. The evaluation results validate the effectiveness of our approach: DeepSeek-V2 achieves remarkable performance on both standard benchmarks and open-ended generation evaluation.

2. Model Download

| Model | Context Length | Download |
| :---: | :---: | :---: |
| DeepSeek-V2 | 128k | 🤗 HuggingFace |
| DeepSeek-V2-Chat (RL) | 128k | 🤗 HuggingFace |

Due to the constraints of HuggingFace, the open-source code currently experiences slower performance than our internal codebase when running on GPUs with HuggingFace. To facilitate the efficient execution of our model, we offer a dedicated vLLM solution that optimizes performance for running our model effectively.

3. Evaluation Results

Base Model, Standard Benchmark:

| Benchmark | Domain | LLaMA3 70B | Mixtral 8x22B | DeepSeek-V1 (Dense-67B) | DeepSeek-V2 (MoE-236B) |
| :---: | :---: | :---: | :---: | :---: | :---: |
| MMLU | English | 78.9 | 77.6 | 71.3 | 78.5 |
| BBH | English | 81.0 | 78.9 | 68.7 | 78.9 |
| C-Eval | Chinese | 67.5 | 58.6 | 66.1 | 81.7 |
| CMMLU | Chinese | 69.3 | 60.0 | 70.8 | 84.0 |
| HumanEval | Code | 48.2 | 53.1 | 45.1 | 48.8 |
| MBPP | Code | 68.6 | 64.2 | 57.4 | 66.6 |
| GSM8K | Math | 83.0 | 80.3 | 63.4 | 79.2 |
| Math | Math | 42.2 | 42.5 | 18.7 | 43.6 |

For more evaluation details, such as few-shot settings and prompts, please check our paper. On the Needle In A Haystack (NIAH) tests, DeepSeek-V2 performs well across all context window lengths up to 128K.

Chat Model, Standard Benchmark:

| Benchmark | Domain | QWen1.5 72B Chat | Mixtral 8x22B | LLaMA3 70B Instruct | DeepSeek-V1 Chat (SFT) | DeepSeek-V2 Chat (SFT) | DeepSeek-V2 Chat (RL) |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| MMLU | English | 76.2 | 77.8 | 80.3 | 71.1 | 78.4 | 77.8 |
| BBH | English | 65.9 | 78.4 | 80.1 | 71.7 | 81.3 | 79.7 |
| C-Eval | Chinese | 82.2 | 60.0 | 67.9 | 65.2 | 80.9 | 78.0 |
| CMMLU | Chinese | 82.9 | 61.0 | 70.7 | 67.8 | 82.4 | 81.6 |
| HumanEval | Code | 68.9 | 75.0 | 76.2 | 73.8 | 76.8 | 81.1 |
| MBPP | Code | 52.2 | 64.4 | 69.8 | 61.4 | 70.4 | 72.0 |
| LiveCodeBench (0901-0401) | Code | 18.8 | 25.0 | 30.5 | 18.3 | 28.7 | 32.5 |
| GSM8K | Math | 81.9 | 87.9 | 93.2 | 84.1 | 90.8 | 92.2 |
| Math | Math | 40.6 | 49.8 | 48.5 | 32.6 | 52.7 | 53.9 |

English Open-Ended Generation Evaluation: We evaluate our model on AlpacaEval 2.0 and MTBench, showing the competitive performance of DeepSeek-V2-Chat-RL on English conversation generation.

Chinese Open-Ended Generation Evaluation, AlignBench (https://arxiv.org/abs/2311.18743):

| Model | Open/Closed Source | Overall | Chinese Reasoning | Chinese Language |
| :---: | :---: | :---: | :---: | :---: |
| gpt-4-1106-preview | Closed | 8.01 | 7.73 | 8.29 |
| DeepSeek-V2 Chat (RL) | Open | 7.91 | 7.45 | 8.35 |
| erniebot-4.0-202404 (文心一言) | Closed | 7.89 | 7.61 | 8.17 |
| DeepSeek-V2 Chat (SFT) | Open | 7.74 | 7.30 | 8.17 |
| gpt-4-0613 | Closed | 7.53 | 7.47 | 7.59 |
| erniebot-4.0-202312 (文心一言) | Closed | 7.36 | 6.84 | 7.88 |
| moonshot-v1-32k-202404 (月之暗面) | Closed | 7.22 | 6.42 | 8.02 |
| Qwen1.5-72B-Chat (通义千问) | Open | 7.19 | 6.45 | 7.93 |
| DeepSeek-67B-Chat | Open | 6.43 | 5.75 | 7.11 |
| Yi-34B-Chat (零一万物) | Open | 6.12 | 4.86 | 7.38 |
| gpt-3.5-turbo-0613 | Closed | 6.08 | 5.35 | 6.71 |

Coding Benchmarks: We evaluate our model on LiveCodeBench (0901-0401), a benchmark designed for live coding challenges. As illustrated, DeepSeek-V2 demonstrates considerable proficiency in LiveCodeBench, achieving a Pass@1 score that surpasses several other sophisticated models. This performance highlights the model's effectiveness in tackling live coding tasks.

4. Model Architecture: DeepSeek-V2 adopts innovative architectures to guarantee economical training and efficient inference. For attention, we design MLA (Multi-head Latent Attention), which utilizes low-rank key-value joint compression to eliminate the bottleneck of inference-time key-value cache, thus supporting efficient inference. For Feed-Forward Networks (FFNs), we adopt the DeepSeekMoE architecture, a high-performance MoE architecture that enables training stronger models at lower costs.

5. Chat Website: You can chat with DeepSeek-V2 on DeepSeek's official website: chat.deepseek.com

6. API Platform: We also provide an OpenAI-compatible API at the DeepSeek Platform: platform.deepseek.com. Sign up to get millions of free tokens, and you can also pay as you go at an unbeatable price.

7. How to run locally: To utilize DeepSeek-V2 in BF16 format for inference, 8 GPUs with 80GB memory each are required. Inference with HuggingFace's Transformers: you can directly employ HuggingFace's Transformers for model inference; the complete chat template can be found within `tokenizer_config.json` located in the HuggingFace model repository. Inference with vLLM (recommended): to utilize vLLM for model inference, please merge this Pull Request into your vLLM codebase: https://github.com/vllm-project/vllm/pull/4650.

8. License: This code repository is licensed under the MIT License. The use of DeepSeek-V2 Base/Chat models is subject to the Model License. The DeepSeek-V2 series (including Base and Chat) supports commercial use.

9. Contact: If you have any questions, please raise an issue or contact us at [email protected].

Downloads: 10,869 · Likes: 326

deepseek-coder-6.7b-base

Tags: llama

Downloads: 10,116 · Likes: 119

deepseek-llm-67b-base

Tags: llama

Downloads: 9,228 · Likes: 127

DeepSeek-V3.1-Base

License: MIT

Downloads: 8,435 · Likes: 1,002

deepseek-coder-7b-instruct-v1.5

Tags: llama

Downloads: 7,395 · Likes: 141

deepseek-vl-7b-chat

Introducing DeepSeek-VL, an open-source Vision-Language (VL) model designed for real-world vision and language understanding applications. DeepSeek-VL possesses general multimodal understanding capabilities and can process logical diagrams, web pages, formula recognition, scientific literature, natural images, and embodied intelligence in complex scenarios.

DeepSeek-VL: Towards Real-World Vision-Language Understanding. Haoyu Lu, Wen Liu, Bo Zhang, Bingxuan Wang, Kai Dong, Bo Liu, Jingxiang Sun, Tongzheng Ren, Zhuoshu Li, Hao Yang, Yaofeng Sun, Chengqi Deng, Hanwei Xu, Zhenda Xie, Chong Ruan (Equal Contribution, Project Lead)

DeepSeek-VL-7b-base uses SigLIP-L and SAM-B as a hybrid vision encoder supporting 1024 x 1024 image input, and is built on DeepSeek-LLM-7b-base, which was trained on an approximate corpus of 2T text tokens. The whole DeepSeek-VL-7b-base model is trained on around 400B vision-language tokens. DeepSeek-VL-7b-chat is an instruction-tuned version based on DeepSeek-VL-7b-base. On the basis of a `Python >= 3.8` environment, install the necessary dependencies from the repository.

This code repository is licensed under the MIT License. The use of DeepSeek-VL Base/Chat models is subject to the DeepSeek Model License. The DeepSeek-VL series (including Base and Chat) supports commercial use. If you have any questions, please raise an issue or contact us at [email protected].

Downloads: 5,393 · Likes: 261

DeepSeek-V3-Base

Downloads: 5,055 · Likes: 1,682

deepseek-moe-16b-chat

[🏠 Homepage] | [🤖 Chat with DeepSeek LLM] | [Discord] | [WeChat (微信)] 1. Introduction to DeepSeekMoE: see the Introduction for more details. 2. How to Use: Here are some examples of how to use our model. Instead of using the provided `apply_chat_template` function, you can also interact with our model by following the sample template (note that `messages` should be replaced by your input). Note: by default (`add_special_tokens=True`), our tokenizer automatically adds a `bos_token` before the input text. Additionally, since the system prompt is not compatible with this version of our models, we DO NOT RECOMMEND including a system prompt in your input. 3. License: This code repository is licensed under the MIT License. The use of DeepSeekMoE models is subject to the Model License. DeepSeekMoE supports commercial use. If you have any questions, please raise an issue or contact us at [email protected].

Downloads: 4,909 · Likes: 149

DeepSeek-R1-Zero

License: MIT

Downloads: 4,821 · Likes: 937

deepseek-coder-7b-base-v1.5

Tags: llama

Downloads: 4,550 · Likes: 46

DeepSeek-Coder-V2-Lite-Base

DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence

1. Introduction: We present DeepSeek-Coder-V2, an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT4-Turbo in code-specific tasks. Specifically, DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2 with an additional 6 trillion tokens. Through this continued pre-training, DeepSeek-Coder-V2 substantially enhances the coding and mathematical reasoning capabilities of DeepSeek-V2, while maintaining comparable performance in general language tasks. Compared to DeepSeek-Coder-33B, DeepSeek-Coder-V2 demonstrates significant advancements in various aspects of code-related tasks, as well as reasoning and general capabilities. Additionally, DeepSeek-Coder-V2 expands its support for programming languages from 86 to 338, while extending the context length from 16K to 128K. In standard benchmark evaluations, DeepSeek-Coder-V2 achieves superior performance compared to closed-source models such as GPT4-Turbo, Claude 3 Opus, and Gemini 1.5 Pro on coding and math benchmarks. The list of supported programming languages can be found here. We release DeepSeek-Coder-V2 with 16B and 236B parameters based on the DeepSeekMoE framework (with activated parameters of only 2.4B and 21B respectively), including base and instruct models, to the public.

2. Model Download

| Model | #Total Params | #Active Params | Context Length | Download |
| :---: | :---: | :---: | :---: | :---: |
| DeepSeek-Coder-V2-Lite-Base | 16B | 2.4B | 128k | 🤗 HuggingFace |
| DeepSeek-Coder-V2-Lite-Instruct | 16B | 2.4B | 128k | 🤗 HuggingFace |
| DeepSeek-Coder-V2-Base | 236B | 21B | 128k | 🤗 HuggingFace |
| DeepSeek-Coder-V2-Instruct | 236B | 21B | 128k | 🤗 HuggingFace |

3. Chat Website: You can chat with DeepSeek-Coder-V2 on DeepSeek's official website: coder.deepseek.com

4. API Platform: We also provide an OpenAI-compatible API at the DeepSeek Platform: platform.deepseek.com, where you can also pay as you go at an unbeatable price.

5. How to run locally: Here we provide some examples of how to use the DeepSeek-Coder-V2-Lite model. If you want to utilize DeepSeek-Coder-V2 in BF16 format for inference, 8 GPUs with 80GB memory each are required. Inference with HuggingFace's Transformers: you can directly employ HuggingFace's Transformers for model inference; the complete chat template can be found within `tokenizer_config.json` located in the HuggingFace model repository. Inference with vLLM (recommended): to utilize vLLM for model inference, please merge this Pull Request into your vLLM codebase: https://github.com/vllm-project/vllm/pull/4650.

6. License: This code repository is licensed under the MIT License. The use of DeepSeek-Coder-V2 Base/Instruct models is subject to the Model License. The DeepSeek-Coder-V2 series (including Base and Instruct) supports commercial use.

7. Contact: If you have any questions, please raise an issue or contact us at [email protected].

Downloads: 2,941 · Likes: 96

DeepSeek-Coder-V2-Instruct

DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence. (This entry carries the same DeepSeek-Coder-V2 model card as DeepSeek-Coder-V2-Lite-Base above.)

Downloads: 2,623 · Likes: 668

DeepSeek-Prover-V1.5-RL

Tags: llama

Downloads: 2,160 · Likes: 64

deepseek-math-7b-base

Tags: llama

Downloads: 2,027 · Likes: 78

deepseek-vl-1.3b-base

Downloads: 1,588 · Likes: 55

deepseek-llm-67b-chat

[🏠 Homepage] | [🤖 Chat with DeepSeek LLM] | [Discord] | [WeChat (微信)]

1. Introduction: Introducing DeepSeek LLM, an advanced language model comprising 67 billion parameters. It has been trained from scratch on a vast dataset of 2 trillion tokens in both English and Chinese. To foster research, we have made DeepSeek LLM 7B/67B Base and DeepSeek LLM 7B/67B Chat open source for the research community.

2. Model Summary: `deepseek-llm-67b-chat` is a 67B-parameter model initialized from `deepseek-llm-67b-base` and fine-tuned on extra instruction data. Home Page: DeepSeek. Repository: deepseek-ai/deepseek-LLM. Chat with DeepSeek LLM: DeepSeek-LLM.

3. How to Use: Here are some examples of how to use our model. Chat Completion: instead of using the provided `apply_chat_template` function, you can also interact with our model by following the sample template (note that `messages` should be replaced by your input). Note: by default (`add_special_tokens=True`), our tokenizer automatically adds a `bos_token` before the input text. Additionally, since the system prompt is not compatible with this version of our models, we DO NOT RECOMMEND including a system prompt in your input.

4. License: This code repository is licensed under the MIT License. The use of DeepSeek LLM models is subject to the Model License. DeepSeek LLM supports commercial use. If you have any questions, please raise an issue or contact us at [email protected].

Tags: llama

Downloads: 1,411 · Likes: 204

deepseek-coder-33b-base

Tags: llama

Downloads: 1,393 · Likes: 73

DeepSeek-V2.5

Downloads: 1,211 · Likes: 731

deepseek-math-7b-rl

Tags: llama

Downloads: 1,024 · Likes: 85

DeepSeek-Coder-V2-Instruct-0724

DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence 1. Introduction We present DeepSeek-Coder-V2, an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT4-Turbo in code-specific tasks. Specifically, DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2 with additional 6 trillion tokens. Through this continued pre-training, DeepSeek-Coder-V2 substantially enhances the coding and mathematical reasoning capabilities of DeepSeek-V2, while maintaining comparable performance in general language tasks. Compared to DeepSeek-Coder-33B, DeepSeek-Coder-V2 demonstrates significant advancements in various aspects of code-related tasks, as well as reasoning and general capabilities. Additionally, DeepSeek-Coder-V2 expands its support for programming languages from 86 to 338, while extending the context length from 16K to 128K. In standard benchmark evaluations, DeepSeek-Coder-V2 achieves superior performance compared to closed-source models such as GPT4-Turbo, Claude 3 Opus, and Gemini 1.5 Pro in coding and math benchmarks. The list of supported programming languages can be found here. We release the DeepSeek-Coder-V2 with 16B and 236B parameters based on the DeepSeekMoE framework, which has actived parameters of only 2.4B and 21B , including base and instruct models, to the public. 
| Model | #Total Params | #Active Params | Context Length | Download |
| :---: | :---: | :---: | :---: | :---: |
| DeepSeek-Coder-V2-Lite-Base | 16B | 2.4B | 128k | 🤗 HuggingFace |
| DeepSeek-Coder-V2-Lite-Instruct | 16B | 2.4B | 128k | 🤗 HuggingFace |
| DeepSeek-Coder-V2-Base | 236B | 21B | 128k | 🤗 HuggingFace |
| DeepSeek-Coder-V2-Instruct | 236B | 21B | 128k | 🤗 HuggingFace |
| DeepSeek-Coder-V2-Instruct-0724 | 236B | 21B | 128k | 🤗 HuggingFace |

You can chat with DeepSeek-Coder-V2 on DeepSeek's official website: coder.deepseek.com

4. API Platform

We also provide an OpenAI-compatible API at the DeepSeek Platform: platform.deepseek.com, where you can pay as you go at an unbeatable price.

5. How to run locally

Here, we provide some examples of how to use the DeepSeek-Coder-V2-Lite model. If you want to utilize DeepSeek-Coder-V2 in BF16 format for inference, 80GB*8 GPUs are required.

Inference with Huggingface's Transformers

You can directly employ Huggingface's Transformers for model inference. The complete chat template can be found in `tokenizer_config.json` located in the Hugging Face model repository.

Inference with vLLM (recommended)

To utilize vLLM for model inference, please merge this Pull Request into your vLLM codebase: https://github.com/vllm-project/vllm/pull/4650.

Function calling allows the model to call external tools to enhance its capabilities. An example tool definition:

```json
{
  "name": "get_current_weather",
  "description": "Get the current weather in a given location",
  "parameters": {
    "type": "object",
    "properties": {
      "location": {
        "type": "string",
        "description": "The city and state, e.g. San Francisco, CA"
      },
      "unit": {
        "type": "string",
        "enum": ["celsius", "fahrenheit"]
      }
    },
    "required": ["location"]
  }
}
```

The model may then emit tool calls with arguments such as `{"location": "Tokyo"}` or `{"location": "Paris"}`. You can use JSON Output Mode to ensure the model generates a valid JSON object.
To activate this mode, a special instruction should be appended to your system prompt. The model will then produce a JSON object such as:

```json
{
  "question": "Which is the highest mountain in the world?",
  "answer": "Mount Everest."
}
```

In FIM (Fill In the Middle) completion, you can provide a prefix and an optional suffix, and the model will complete the content in between.

6. License

This code repository is licensed under the MIT License. The use of DeepSeek-Coder-V2 Base/Instruct models is subject to the Model License. The DeepSeek-Coder-V2 series (including Base and Instruct) supports commercial use.

7. Contact

If you have any questions, please raise an issue or contact us at [email protected].
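The FIM layout described above can be sketched as plain prompt construction. The sentinel token names below are an assumption carried over from the DeepSeek-Coder family (`<|fim▁begin|>`, `<|fim▁hole|>`, `<|fim▁end|>`); check them against the model's `tokenizer_config.json` before use.

```python
# Sketch of building a FIM (Fill In the Middle) prompt.
# Sentinel names are assumed from DeepSeek-Coder's tokenizer; verify before use.

FIM_BEGIN = "<|fim▁begin|>"
FIM_HOLE = "<|fim▁hole|>"
FIM_END = "<|fim▁end|>"

def build_fim_prompt(prefix: str, suffix: str = "") -> str:
    """Wrap a code prefix and optional suffix in FIM sentinel tokens.
    The model is expected to generate the content of the hole."""
    return f"{FIM_BEGIN}{prefix}{FIM_HOLE}{suffix}{FIM_END}"

prompt = build_fim_prompt("def quick_sort(arr):\n    ", "\n    return arr")
print(prompt)
```

The resulting string is what you would pass to the tokenizer as a raw completion prompt (not through the chat template).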

727
113

DeepSeek Prover V2 671B

We introduce DeepSeek-Prover-V2, an open-source large language model designed for formal theorem proving in Lean 4, with initialization data collected through a recursive theorem proving pipeline powered by DeepSeek-V3. The cold-start training procedure begins by prompting DeepSeek-V3 to decompose complex problems into a series of subgoals. The proofs of resolved subgoals are synthesized into a chain-of-thought process, combined with DeepSeek-V3's step-by-step reasoning, to create an initial cold start for reinforcement learning. This process enables us to integrate both informal and formal mathematical reasoning into a unified model.

Synthesize Cold-Start Reasoning Data through Recursive Proof Search

- To construct the cold-start dataset, we develop a simple yet effective pipeline for recursive theorem proving, utilizing DeepSeek-V3 as a unified tool for both subgoal decomposition and formalization. We prompt DeepSeek-V3 to decompose theorems into high-level proof sketches while simultaneously formalizing these proof steps in Lean 4, resulting in a sequence of subgoals.
- We use a smaller 7B model to handle the proof search for each subgoal, thereby reducing the associated computational burden. Once the decomposed steps of a challenging problem are resolved, we pair the complete step-by-step formal proof with the corresponding chain-of-thought from DeepSeek-V3 to create cold-start reasoning data.

Reinforcement Learning with Synthetic Cold-Start Data

- We curate a subset of challenging problems that remain unsolved by the 7B prover model in an end-to-end manner, but for which all decomposed subgoals have been successfully resolved. By composing the proofs of all subgoals, we construct a complete formal proof for the original problem. This proof is then appended to DeepSeek-V3's chain-of-thought, which outlines the corresponding lemma decomposition, thereby producing a cohesive synthesis of informal reasoning and subsequent formalization.
- After fine-tuning the prover model on the synthetic cold-start data, we perform a reinforcement learning stage to further enhance its ability to bridge informal reasoning with formal proof construction. Following the standard training objective for reasoning models, we use binary correct-or-incorrect feedback as the primary form of reward supervision.
- The resulting model, DeepSeek-Prover-V2-671B, achieves state-of-the-art performance in neural theorem proving, reaching an 88.9% pass ratio on the MiniF2F-test and solving 49 out of 658 problems from PutnamBench. The proofs generated by DeepSeek-Prover-V2 for the miniF2F dataset are available for download as a ZIP archive.

3. ProverBench: Formalization of AIME and Textbook Problems

We introduce ProverBench, a benchmark dataset comprising 325 problems. Of these, 15 are formalized from number theory and algebra questions featured in the recent AIME competitions (AIME 24 and 25), offering authentic high-school competition-level challenges. The remaining 310 problems are drawn from curated textbook examples and educational tutorials, contributing a diverse and pedagogically grounded collection of formalized mathematical problems. This benchmark is designed to enable more comprehensive evaluation across both high-school competition problems and undergraduate-level mathematics.

| Area | Count |
| :---: | :---: |
| AIME 24&25 | 15 |
| Number Theory | 40 |
| Elementary Algebra | 30 |
| Linear Algebra | 50 |
| Abstract Algebra | 40 |
| Calculus | 90 |
| Real Analysis | 30 |
| Complex Analysis | 10 |
| Functional Analysis | 10 |
| Probability | 10 |
| Total | 325 |

We release DeepSeek-Prover-V2 in two model sizes: 7B and 671B parameters. DeepSeek-Prover-V2-671B is trained on top of DeepSeek-V3-Base. DeepSeek-Prover-V2-7B is built upon DeepSeek-Prover-V1.5-Base and features an extended context length of up to 32K tokens.
| Model | Download |
| :---: | :---: |
| DeepSeek-Prover-V2-7B | 🤗 HuggingFace |
| DeepSeek-Prover-V2-671B | 🤗 HuggingFace |

| Dataset | Download |
| :---: | :---: |
| DeepSeek-ProverBench | 🤗 HuggingFace |

You can directly use Huggingface's Transformers for model inference. DeepSeek-Prover-V2-671B shares the same architecture as DeepSeek-V3. For detailed information and supported features, please refer to the DeepSeek-V3 documentation on Hugging Face. The following is a basic example of generating a proof for a problem from the miniF2F dataset: [Lean 4 example omitted]

6. License

The use of DeepSeek-Prover-V2 models is subject to the Model License.

If you have any questions, please raise an issue or contact us at [email protected].
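For a sense of what a miniF2F-style input looks like, the sketch below shows a generic Lean 4 theorem statement with a `sorry` placeholder for the prover to fill in. This is an illustrative stand-in, not the snippet from the original card, and the theorem name is invented.

```lean
import Mathlib

-- Illustrative miniF2F-style problem statement (hypothetical example).
-- The prover is asked to replace `sorry` with a complete formal proof.
theorem double_eq_two_mul (a : ℕ) : a + a = 2 * a := by
  sorry
```

A prompt to the prover would typically include such a statement (with `sorry`) plus natural-language instructions asking for the completed proof.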

680
812

DeepSeek V3.2 Exp Base

license:mit
616
40

deepseek-vl-7b-base

446
63

ESFT-vanilla-lite

393
19

DeepSeek-V2.5-1210

DeepSeek-V2.5-1210 is an upgraded version of DeepSeek-V2.5, with improvements across various capabilities:

- Mathematical: Performance on the MATH-500 benchmark has improved from 74.8% to 82.8%.
- Coding: Accuracy on the LiveCodebench (08.01 - 12.01) benchmark has increased from 29.2% to 34.38%.
- Writing and Reasoning: Corresponding improvements have been observed in internal test datasets.

Additionally, the new version of the model has optimized the user experience for file upload and webpage summarization functionalities.

To utilize DeepSeek-V2.5 in BF16 format for inference, 80GB*8 GPUs are required.

Inference with Huggingface's Transformers

You can directly employ Huggingface's Transformers for model inference. The complete chat template can be found in `tokenizer_config.json` located in the Hugging Face model repository. Note: The chat template has been updated compared to the previous DeepSeek-V2-Chat version.

Inference with vLLM (recommended)

To utilize vLLM for model inference, please merge this Pull Request into your vLLM codebase: https://github.com/vllm-project/vllm/pull/4650.

Function calling allows the model to call external tools to enhance its capabilities. An example tool definition:

```json
{
  "name": "get_current_weather",
  "description": "Get the current weather in a given location",
  "parameters": {
    "type": "object",
    "properties": {
      "location": {
        "type": "string",
        "description": "The city and state, e.g. San Francisco, CA"
      },
      "unit": {
        "type": "string",
        "enum": ["celsius", "fahrenheit"]
      }
    },
    "required": ["location"]
  }
}
```

The model may then emit tool calls with arguments such as `{"location": "Tokyo"}` or `{"location": "Paris"}`. You can use JSON Output Mode to ensure the model generates a valid JSON object. To activate this mode, a special instruction should be appended to your system prompt.
The model will then produce a JSON object such as:

```json
{
  "question": "Which is the highest mountain in the world?",
  "answer": "Mount Everest."
}
```

In FIM (Fill In the Middle) completion, you can provide a prefix and an optional suffix, and the model will complete the content in between.

3. License

This code repository is licensed under the MIT License. The use of DeepSeek-V2 Base/Chat models is subject to the Model License. The DeepSeek-V2 series (including Base and Chat) supports commercial use.

5. Contact

If you have any questions, please raise an issue or contact us at [email protected].
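A JSON Output Mode round-trip can be sketched as follows. The exact wording of the special system-prompt instruction is not given on this card, so the hint string below is a hypothetical stand-in, as is the `raw_reply` value; only the `json.loads` validation step is the point.

```python
import json

# Hypothetical stand-in for the special JSON-mode instruction; the real
# wording should be taken from the DeepSeek documentation.
JSON_MODE_HINT = "You must output a valid JSON object."

system_prompt = "You are a helpful assistant. " + JSON_MODE_HINT

# Hypothetical model reply; in JSON Output Mode it should always parse.
raw_reply = (
    '{"question": "Which is the highest mountain in the world?", '
    '"answer": "Mount Everest."}'
)

reply = json.loads(raw_reply)  # raises json.JSONDecodeError if not valid JSON
print(reply["answer"])
```

Validating with `json.loads` on the client side is a cheap safety net even when the mode is active.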

340
256

DeepSeek-V2-Chat-0628

289
177

JanusFlow-1.3B

We present JanusFlow, a powerful framework that unifies image understanding and generation in a single model. JanusFlow introduces a minimalist architecture that integrates autoregressive language models with rectified flow, a state-of-the-art method in generative modeling. Our key finding demonstrates that rectified flow can be trained straightforwardly within the large language model framework, eliminating the need for complex architectural modifications.

JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation

JanusFlow is a unified understanding and generation MLLM that decouples visual encoding for multimodal understanding and generation, and is constructed based on DeepSeek-LLM-1.3b-base. For multimodal understanding, it uses SigLIP-L as the vision encoder, which supports 384 x 384 image input. For image generation, JanusFlow uses rectified flow and SDXL-VAE to generate 384 x 384 images. The provided checkpoint is the EMA checkpoint after pre-training and supervised fine-tuning.

This code repository is licensed under the MIT License. The use of JanusFlow models is subject to the DeepSeek Model License. If you have any questions, please raise an issue or contact us at [email protected].
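The rectified-flow idea JanusFlow builds on can be illustrated with a toy numerical sketch: train a velocity field on straight-line interpolants between noise and data, then generate by Euler integration. This is a generic illustration of the technique, not JanusFlow's actual implementation.

```python
import numpy as np

# Toy rectified-flow sketch: along the straight path
#   x_t = (1 - t) * x0 + t * x1,
# the target velocity dx/dt is constant, v = x1 - x0. A model trained to
# predict v can generate by integrating dx/dt = v from noise x0.

rng = np.random.default_rng(0)
x0 = rng.standard_normal(2)      # noise sample
x1 = np.array([3.0, -1.0])       # "data" sample (illustrative)

t = 0.5
x_t = (1 - t) * x0 + t * x1      # interpolant the model is trained on
v_target = x1 - x0               # constant target velocity on this path

# With the exact velocity, one Euler step of size 1.0 recovers x1,
# because the path is a straight line.
x_gen = x0 + 1.0 * v_target
print(np.allclose(x_gen, x1))
```

The straightness of the paths is what makes rectified flow attractive: few (ideally one) integration steps suffice at sampling time.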

license:mit
284
151

DeepSeek-Prover-V1.5-SFT

llama
274
13

ESFT-token-translation-lite

220
3

DeepSeek-Prover-V1.5-Base

llama
167
18

ESFT-gate-translation-lite

165
4

ESFT-token-law-lite

154
6

ESFT-token-summary-lite

147
4

ESFT-token-code-lite

146
4

ESFT-gate-law-lite

145
4

ESFT-token-intent-lite

144
3

ESFT-gate-code-lite

144
3

ESFT-gate-summary-lite

143
3

ESFT-gate-intent-lite

142
4

ESFT-token-math-lite

141
4

ESFT-gate-math-lite

140
3

DeepSeek-Prover-V1

llama
95
11

deepseek-coder-5.7bmqa-base

llama
88
10

DeepSeek-V3.2

license:mit
0
40

DeepSeek-V3.2-Speciale

license:mit
0
24

DeepSeek-Math-V2

license:apache-2.0
0
10