deepseek-ai
✓ Verified AI startup · Chinese AI research lab with competitive open models
DeepSeek-OCR
Pipeline: image-text-to-text · Language: multilingual · Tags: deepseek, vision-language, ocr, custom_code · License: MIT · Library: transformers
DeepSeek-R1-Distill-Qwen-32B
License: MIT · Library: transformers
DeepSeek-R1-Distill-Qwen-1.5B
License: MIT · Library: transformers
DeepSeek-R1-Distill-Llama-70B
License: MIT · Library: transformers
DeepSeek-R1-Distill-Llama-8B
License: MIT · Library: transformers
deepseek-coder-1.3b-instruct
License: other (deepseek; see LICENSE)
DeepSeek-R1-Distill-Qwen-7B
License: MIT · Library: transformers
DeepSeek-R1-0528
License: MIT · Library: transformers
DeepSeek-V3.1
License: MIT · Library: transformers · Base model: deepseek-ai/DeepSeek-V3.1-Base
DeepSeek-R1
License: MIT · Library: transformers
DeepSeek-R1-Distill-Qwen-14B
License: MIT · Library: transformers
DeepSeek-Coder-V2-Lite-Instruct
License: other (deepseek-license; see LICENSE)
DeepSeek-V3-0324
License: MIT · Library: transformers
DeepSeek-V3
DeepSeek-OCR-2
DeepSeek-R1-0528-Qwen3-8B
DeepSeek-V2-Lite
DeepSeek-V2-Lite-Chat
deepseek-vl-1.3b-chat
deepseek-llm-7b-chat
[🏠Homepage] | [🤖 Chat with DeepSeek LLM] | [Discord] | [Wechat(微信)]

1. Introduction

Introducing DeepSeek LLM, an advanced language model comprising 7 billion parameters. It has been trained from scratch on a vast dataset of 2 trillion tokens in both English and Chinese. In order to foster research, we have made DeepSeek LLM 7B/67B Base and DeepSeek LLM 7B/67B Chat open source for the research community.

2. Model Summary

`deepseek-llm-7b-chat` is a 7B parameter model initialized from `deepseek-llm-7b-base` and fine-tuned on extra instruction data.

- Home Page: DeepSeek
- Repository: deepseek-ai/deepseek-LLM
- Chat With DeepSeek LLM: DeepSeek-LLM

3. How to Use

Here are some examples of how to use our model.

Chat Completion

If you prefer not to use the provided `apply_chat_template` function, you can also interact with our model following the sample template. Note that `messages` should be replaced by your input.

Note: By default (`add_special_tokens=True`), our tokenizer automatically adds a `bos_token` before the input text. Additionally, since the system prompt is not compatible with this version of our models, we DO NOT RECOMMEND including the system prompt in your input.

4. License

This code repository is licensed under the MIT License. The use of DeepSeek LLM models is subject to the Model License. DeepSeek LLM supports commercial use.

If you have any questions, please raise an issue or contact us at [email protected].
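The sample-template approach above can be sketched as follows. This is a minimal illustration only: the canonical template is defined in the model's tokenizer configuration, and the `User:`/`Assistant:` layout used here is an assumption for illustration.

```python
# Illustrative sketch: render a chat prompt manually instead of calling
# tokenizer.apply_chat_template. The "User: ...\n\nAssistant:" layout below
# is an assumed stand-in; the canonical template lives in the model's
# tokenizer configuration.
def render_chat(messages):
    parts = []
    for m in messages:
        if m["role"] == "user":
            parts.append(f"User: {m['content']}\n\n")
        elif m["role"] == "assistant":
            parts.append(f"Assistant: {m['content']}\n\n")
    parts.append("Assistant:")  # generation prompt for the next reply
    return "".join(parts)

prompt = render_chat([{"role": "user", "content": "Who are you?"}])
print(prompt)
```

Note that because the tokenizer already inserts the `bos_token` when `add_special_tokens=True`, you should not prepend it to this string yourself.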
DeepSeek-V3.2-Exp
We are excited to announce the official release of DeepSeek-V3.2-Exp, an experimental version of our model. As an intermediate step toward our next-generation architecture, V3.2-Exp builds upon V3....
DeepSeek-Coder-V2-Base
deepseek-math-7b-instruct
Janus-Pro-7B
deepseek-vl2
deepseek-coder-6.7b-instruct
DeepSeek-Prover-V2-7B
deepseek-vl2-tiny
Introducing DeepSeek-VL2, an advanced series of large Mixture-of-Experts (MoE) Vision-Language Models that significantly improves upon its predecessor, DeepSeek-VL. DeepSeek-VL2 demonstrates superior capabilities across various tasks, including but not limited to visual question answering, optical character recognition, document/table/chart understanding, and visual grounding. Our model series is composed of three variants: DeepSeek-VL2-Tiny, DeepSeek-VL2-Small, and DeepSeek-VL2, with 1.0B, 2.8B, and 4.5B activated parameters respectively. DeepSeek-VL2 achieves competitive or state-of-the-art performance with similar or fewer activated parameters compared to existing open-source dense and MoE-based models.

DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding

Zhiyu Wu, Xiaokang Chen, Zizheng Pan, Xingchao Liu, Wen Liu, Damai Dai, Huazuo Gao, Yiyang Ma, Chengyue Wu, Bingxuan Wang, Zhenda Xie, Yu Wu, Kai Hu, Jiawei Wang, Yaofeng Sun, Yukun Li, Yishi Piao, Kang Guan, Aixin Liu, Xin Xie, Yuxiang You, Kai Dong, Xingkai Yu, Haowei Zhang, Liang Zhao, Yisong Wang, Chong Ruan (Equal Contribution, Project Lead, Corresponding author)

DeepSeek-VL2-Tiny is built on DeepSeekMoE-3B (total activated parameters are 1.0B). In a `Python >= 3.8` environment, install the necessary dependencies by running the following command:

Notifications
1. We suggest using a low temperature when sampling.
2. For inputs of up to 3 images, we directly pad the images to 384×384 as inputs without tiling.
3. The main difference between DeepSeek-VL2-Tiny, DeepSeek-VL2-Small, and DeepSeek-VL2 is the base LLM.

This code repository is licensed under the MIT License. The use of DeepSeek-VL2 models is subject to the DeepSeek Model License. The DeepSeek-VL2 series supports commercial use. If you have any questions, please raise an issue or contact us at [email protected].
DeepSeek-V3.1-Terminus
This update maintains the model's original capabilities while addressing issues reported by users, including:
- Language consistency: reducing instances of mixed Chinese-English text and occasional abnormal characters;
- Agent capabilities: further optimizing the performance of the Code Agent and Search Agent.

| Benchmark | DeepSeek-V3.1 | DeepSeek-V3.1-Terminus |
| :--- | :---: | :---: |
| Reasoning Mode w/o Tool Use | | |
| MMLU-Pro | 84.8 | 85.0 |
| GPQA-Diamond | 80.1 | 80.7 |
| Humanity's Last Exam | 15.9 | 21.7 |
| LiveCodeBench | 74.8 | 74.9 |
| Codeforces | 2091 | 2046 |
| Aider-Polyglot | 76.3 | 76.1 |
| Agentic Tool Use | | |
| BrowseComp | 30.0 | 38.5 |
| BrowseComp-zh | 49.2 | 45.0 |
| SimpleQA | 93.4 | 96.8 |
| SWE Verified | 66.0 | 68.4 |
| SWE-bench Multilingual | 54.5 | 57.8 |
| Terminal-bench | 31.3 | 36.7 |

The template and tool set of the search agent have been updated, as shown in `assets/search_tool_trajectory.html`. The model structure of DeepSeek-V3.1-Terminus is the same as DeepSeek-V3. Please visit the DeepSeek-V3 repo for more information about running this model locally. For the model's chat template other than the search agent, please refer to the DeepSeek-V3.1 repo. Here we also provide updated inference demo code in the `inference` folder to help the community get started with running our model and understand the details of the model architecture.

NOTE: In the current model checkpoint, the parameters of `self_attn.o_proj` do not conform to the UE8M0 FP8 scale data format. This is a known issue and will be corrected in future model releases.

This repository and the model weights are licensed under the MIT License. If you have any questions, please raise an issue or contact us at [email protected].
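For readers unfamiliar with the UE8M0 FP8 scale format mentioned in the note: it is an 8-bit, exponent-only encoding (unsigned, 8 exponent bits, 0 mantissa bits) representing a pure power-of-two scale. A minimal decoder sketch, assuming the OCP Microscaling (MX) E8M0 convention of a 127 bias with 0xFF reserved for NaN — an assumption, since the note above does not spell out the encoding:

```python
def ue8m0_to_float(byte):
    # UE8M0: 8 exponent bits, no sign, no mantissa -> a pure power-of-two
    # scale. A bias of 127 is assumed here, following the OCP Microscaling
    # (MX) E8M0 convention, which also reserves 0xFF for NaN.
    assert 0 <= byte <= 255
    if byte == 0xFF:
        return float("nan")
    return 2.0 ** (byte - 127)

print(ue8m0_to_float(127))  # 1.0  (exponent bias point)
print(ue8m0_to_float(130))  # 8.0  (2 ** 3)
```

Because the scale carries no mantissa, every representable value is an exact power of two, which is what makes it cheap to apply in hardware.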
DeepSeek-V2-Chat
deepseek-coder-33b-instruct
[🏠Homepage] | [🤖 Chat with DeepSeek Coder] | [Discord] | [Wechat(微信)]

1. Introduction

DeepSeek Coder is composed of a series of code language models, each trained from scratch on 2T tokens, with a composition of 87% code and 13% natural language in both English and Chinese. We provide various sizes of the code model, ranging from 1B to 33B versions. Each model is pre-trained on a project-level code corpus with a window size of 16K and an extra fill-in-the-blank task, to support project-level code completion and infilling. For coding capabilities, DeepSeek Coder achieves state-of-the-art performance among open-source code models on multiple programming languages and various benchmarks.

- Massive Training Data: trained from scratch on 2T tokens, including 87% code and 13% linguistic data in both English and Chinese.
- Highly Flexible & Scalable: offered in model sizes of 1.3B, 5.7B, 6.7B, and 33B, enabling users to choose the setup most suitable for their requirements.
- Superior Model Performance: state-of-the-art performance among publicly available code models on the HumanEval, MultiPL-E, MBPP, DS-1000, and APPS benchmarks.
- Advanced Code Completion Capabilities: a window size of 16K and a fill-in-the-blank task, supporting project-level code completion and infilling tasks.

2. Model Summary

`deepseek-coder-33b-instruct` is a 33B parameter model initialized from `deepseek-coder-33b-base` and fine-tuned on 2B tokens of instruction data.

- Home Page: DeepSeek
- Repository: deepseek-ai/deepseek-coder
- Chat With DeepSeek Coder: DeepSeek-Coder

3. How to Use

Here are some examples of how to use our model.

Chat Model Inference

4. License

This code repository is licensed under the MIT License. The use of DeepSeek Coder models is subject to the Model License. DeepSeek Coder supports commercial use.

If you have any questions, please raise an issue or contact us at [email protected].
deepseek-coder-1.3b-base
deepseek-llm-7b-base
[🏠Homepage] | [🤖 Chat with DeepSeek LLM] | [Discord] | [Wechat(微信)]

1. Introduction

Introducing DeepSeek LLM, an advanced language model comprising 7 billion parameters. It has been trained from scratch on a vast dataset of 2 trillion tokens in both English and Chinese. In order to foster research, we have made DeepSeek LLM 7B/67B Base and DeepSeek LLM 7B/67B Chat open source for the research community.

2. Model Summary

`deepseek-llm-7b-base` is a 7B parameter model with Multi-Head Attention trained on 2 trillion tokens from scratch.

- Home Page: DeepSeek
- Repository: deepseek-ai/deepseek-LLM
- Chat With DeepSeek LLM: DeepSeek-LLM

3. How to Use

Here are some examples of how to use our model.

Text Completion

4. License

This code repository is licensed under the MIT License. The use of DeepSeek LLM models is subject to the Model License. DeepSeek LLM supports commercial use.

If you have any questions, please raise an issue or contact us at [email protected].
Janus-Pro-1B
Janus-Pro is a novel autoregressive framework that unifies multimodal understanding and generation. It addresses the limitations of previous approaches by decoupling visual encoding into separate pathways, while still utilizing a single, unified transformer architecture for processing. The decoupling not only alleviates the conflict between the visual encoder's roles in understanding and generation, but also enhances the framework's flexibility. Janus-Pro surpasses previous unified models and matches or exceeds the performance of task-specific models. The simplicity, high flexibility, and effectiveness of Janus-Pro make it a strong candidate for next-generation unified multimodal models.

Janus-Pro is a unified understanding and generation MLLM, which decouples visual encoding for multimodal understanding and generation. Janus-Pro is constructed based on the DeepSeek-LLM-1.5b-base/DeepSeek-LLM-7b-base. For multimodal understanding, it uses SigLIP-L as the vision encoder, which supports 384 x 384 image input. For image generation, Janus-Pro uses the tokenizer from here with a downsample rate of 16.

This code repository is licensed under the MIT License. The use of Janus-Pro models is subject to the DeepSeek Model License.

5. Citation

If you have any questions, please raise an issue or contact us at [email protected].
deepseek-vl2-small
Janus-1.3B
0. Update

2024.10.20: We have uploaded the correct `tokenizer_config.json`. The previous file was missing the `pad_token`, which caused poor visual generation results.

Janus is a novel autoregressive framework that unifies multimodal understanding and generation. It addresses the limitations of previous approaches by decoupling visual encoding into separate pathways, while still utilizing a single, unified transformer architecture for processing. The decoupling not only alleviates the conflict between the visual encoder's roles in understanding and generation, but also enhances the framework's flexibility. Janus surpasses previous unified models and matches or exceeds the performance of task-specific models. The simplicity, high flexibility, and effectiveness of Janus make it a strong candidate for next-generation unified multimodal models.

Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation

Janus is a unified understanding and generation MLLM, which decouples visual encoding for multimodal understanding and generation. Janus is constructed based on the DeepSeek-LLM-1.3b-base, which is trained on an approximate corpus of 500B text tokens. For multimodal understanding, it uses SigLIP-L as the vision encoder, which supports 384 x 384 image input. For image generation, Janus uses the tokenizer from here with a downsample rate of 16.

This code repository is licensed under the MIT License. The use of Janus models is subject to the DeepSeek Model License.

5. Citation

If you have any questions, please raise an issue or contact us at [email protected].
deepseek-moe-16b-base
[🏠Homepage] | [🤖 Chat with DeepSeek LLM] | [Discord] | [Wechat(微信)]

1. Introduction to DeepSeekMoE

See the Introduction for more details.

2. How to Use

Here are some examples of how to use our model.

Text Completion

3. License

This code repository is licensed under the MIT License. The use of DeepSeekMoE models is subject to the Model License. DeepSeekMoE supports commercial use. If you have any questions, please raise an issue or contact us at [email protected].
DeepSeek-V2
Model Download | Evaluation Results | Model Architecture | API Platform | License | Citation

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

1. Introduction

Today, we're introducing DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token. Compared with DeepSeek 67B, DeepSeek-V2 achieves stronger performance while saving 42.5% of training costs, reducing the KV cache by 93.3%, and boosting the maximum generation throughput to 5.76 times. We pretrained DeepSeek-V2 on a diverse and high-quality corpus comprising 8.1 trillion tokens. This comprehensive pretraining was followed by Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) to fully unleash the model's capabilities. The evaluation results validate the effectiveness of our approach: DeepSeek-V2 achieves remarkable performance on both standard benchmarks and open-ended generation evaluation.

2. Model Downloads

| Model | Context Length | Download |
| :------------: | :------------: | :------------: |
| DeepSeek-V2 | 128k | 🤗 HuggingFace |
| DeepSeek-V2-Chat (RL) | 128k | 🤗 HuggingFace |

Due to the constraints of HuggingFace, the open-source code currently runs slower on GPUs than our internal codebase. To facilitate efficient execution of our model, we offer a dedicated vLLM solution that optimizes performance.

3. Evaluation Results

Base Model: Standard Benchmark

| Benchmark | Domain | LLaMA3 70B | Mixtral 8x22B | DeepSeek-V1 (Dense-67B) | DeepSeek-V2 (MoE-236B) |
|:-----------:|:--------:|:------------:|:---------------:|:-------------------------:|:------------------------:|
| MMLU | English | 78.9 | 77.6 | 71.3 | 78.5 |
| BBH | English | 81.0 | 78.9 | 68.7 | 78.9 |
| C-Eval | Chinese | 67.5 | 58.6 | 66.1 | 81.7 |
| CMMLU | Chinese | 69.3 | 60.0 | 70.8 | 84.0 |
| HumanEval | Code | 48.2 | 53.1 | 45.1 | 48.8 |
| MBPP | Code | 68.6 | 64.2 | 57.4 | 66.6 |
| GSM8K | Math | 83.0 | 80.3 | 63.4 | 79.2 |
| Math | Math | 42.2 | 42.5 | 18.7 | 43.6 |

For more evaluation details, such as few-shot settings and prompts, please check our paper.

Evaluation results on the `Needle In A Haystack` (NIAH) tests: DeepSeek-V2 performs well across all context window lengths up to 128K.

Chat Model: Standard Benchmark

| Benchmark | Domain | QWen1.5 72B Chat | Mixtral 8x22B | LLaMA3 70B Instruct | DeepSeek-V1 Chat (SFT) | DeepSeek-V2 Chat (SFT) | DeepSeek-V2 Chat (RL) |
|:-----------:|:----------------:|:------------------:|:---------------:|:---------------------:|:-------------:|:-----------------------:|:----------------------:|
| MMLU | English | 76.2 | 77.8 | 80.3 | 71.1 | 78.4 | 77.8 |
| BBH | English | 65.9 | 78.4 | 80.1 | 71.7 | 81.3 | 79.7 |
| C-Eval | Chinese | 82.2 | 60.0 | 67.9 | 65.2 | 80.9 | 78.0 |
| CMMLU | Chinese | 82.9 | 61.0 | 70.7 | 67.8 | 82.4 | 81.6 |
| HumanEval | Code | 68.9 | 75.0 | 76.2 | 73.8 | 76.8 | 81.1 |
| MBPP | Code | 52.2 | 64.4 | 69.8 | 61.4 | 70.4 | 72.0 |
| LiveCodeBench (0901-0401) | Code | 18.8 | 25.0 | 30.5 | 18.3 | 28.7 | 32.5 |
| GSM8K | Math | 81.9 | 87.9 | 93.2 | 84.1 | 90.8 | 92.2 |
| Math | Math | 40.6 | 49.8 | 48.5 | 32.6 | 52.7 | 53.9 |

English Open-Ended Generation Evaluation

We evaluate our model on AlpacaEval 2.0 and MT-Bench, showing the competitive performance of DeepSeek-V2-Chat-RL on English conversation generation.
Chinese Open-Ended Generation Evaluation

AlignBench (https://arxiv.org/abs/2311.18743)

| Model | Open/Closed Source | Overall | Chinese Reasoning | Chinese Language |
| :---: | :---: | :---: | :---: | :---: |
| gpt-4-1106-preview | closed | 8.01 | 7.73 | 8.29 |
| DeepSeek-V2 Chat (RL) | open | 7.91 | 7.45 | 8.35 |
| erniebot-4.0-202404 (文心一言) | closed | 7.89 | 7.61 | 8.17 |
| DeepSeek-V2 Chat (SFT) | open | 7.74 | 7.30 | 8.17 |
| gpt-4-0613 | closed | 7.53 | 7.47 | 7.59 |
| erniebot-4.0-202312 (文心一言) | closed | 7.36 | 6.84 | 7.88 |
| moonshot-v1-32k-202404 (月之暗面) | closed | 7.22 | 6.42 | 8.02 |
| Qwen1.5-72B-Chat (通义千问) | open | 7.19 | 6.45 | 7.93 |
| DeepSeek-67B-Chat | open | 6.43 | 5.75 | 7.11 |
| Yi-34B-Chat (零一万物) | open | 6.12 | 4.86 | 7.38 |
| gpt-3.5-turbo-0613 | closed | 6.08 | 5.35 | 6.71 |

Coding Benchmarks

We evaluate our model on LiveCodeBench (0901-0401), a benchmark designed for live coding challenges. As illustrated, DeepSeek-V2 demonstrates considerable proficiency on LiveCodeBench, achieving a Pass@1 score that surpasses several other sophisticated models. This performance highlights the model's effectiveness in tackling live coding tasks.

4. Model Architecture

DeepSeek-V2 adopts innovative architectures to guarantee economical training and efficient inference:
- For attention, we design MLA (Multi-head Latent Attention), which utilizes low-rank key-value joint compression to eliminate the bottleneck of the inference-time key-value cache, thus supporting efficient inference.
- For Feed-Forward Networks (FFNs), we adopt the DeepSeekMoE architecture, a high-performance MoE architecture that enables training stronger models at lower cost.

5. Chat Website

You can chat with DeepSeek-V2 on DeepSeek's official website: chat.deepseek.com

6. API Platform

We also provide an OpenAI-compatible API at the DeepSeek Platform: platform.deepseek.com. Sign up to get millions of free tokens, and you can also pay as you go at an unbeatable price.

7. How to run locally

To utilize DeepSeek-V2 in BF16 format for inference, 8×80GB GPUs are required.
Inference with Huggingface's Transformers

You can directly employ Huggingface's Transformers for model inference. The complete chat template can be found in `tokenizer_config.json` located in the HuggingFace model repository.

Inference with vLLM (recommended)

To utilize vLLM for model inference, please merge this Pull Request into your vLLM codebase: https://github.com/vllm-project/vllm/pull/4650.

8. License

This code repository is licensed under the MIT License. The use of DeepSeek-V2 Base/Chat models is subject to the Model License. The DeepSeek-V2 series (including Base and Chat) supports commercial use.

10. Contact

If you have any questions, please raise an issue or contact us at [email protected].
deepseek-coder-6.7b-base
deepseek-llm-67b-base
DeepSeek-V3.1-Base
deepseek-coder-7b-instruct-v1.5
deepseek-vl-7b-chat
Introducing DeepSeek-VL, an open-source Vision-Language (VL) Model designed for real-world vision and language understanding applications. DeepSeek-VL possesses general multimodal understanding capabilities, capable of processing logical diagrams, web pages, formula recognition, scientific literature, natural images, and embodied intelligence in complex scenarios.

DeepSeek-VL: Towards Real-World Vision-Language Understanding

Haoyu Lu, Wen Liu, Bo Zhang, Bingxuan Wang, Kai Dong, Bo Liu, Jingxiang Sun, Tongzheng Ren, Zhuoshu Li, Hao Yang, Yaofeng Sun, Chengqi Deng, Hanwei Xu, Zhenda Xie, Chong Ruan (Equal Contribution, Project Lead)

DeepSeek-VL-7b-base uses SigLIP-L and SAM-B as the hybrid vision encoder supporting 1024 x 1024 image input, and is constructed based on the DeepSeek-LLM-7b-base, which is trained on an approximate corpus of 2T text tokens. The whole DeepSeek-VL-7b-base model is finally trained on around 400B vision-language tokens. DeepSeek-VL-7b-chat is an instructed version based on DeepSeek-VL-7b-base.

In a `Python >= 3.8` environment, install the necessary dependencies by running the following command:

This code repository is licensed under the MIT License. The use of DeepSeek-VL Base/Chat models is subject to the DeepSeek Model License. The DeepSeek-VL series (including Base and Chat) supports commercial use. If you have any questions, please raise an issue or contact us at [email protected].
DeepSeek-V3-Base
deepseek-moe-16b-chat
[🏠Homepage] | [🤖 Chat with DeepSeek LLM] | [Discord] | [Wechat(微信)]

1. Introduction to DeepSeekMoE

See the Introduction for more details.

2. How to Use

Here are some examples of how to use our model.

If you prefer not to use the provided `apply_chat_template` function, you can also interact with our model following the sample template. Note that `messages` should be replaced by your input.

Note: By default (`add_special_tokens=True`), our tokenizer automatically adds a `bos_token` before the input text. Additionally, since the system prompt is not compatible with this version of our models, we DO NOT RECOMMEND including the system prompt in your input.

3. License

This code repository is licensed under the MIT License. The use of DeepSeekMoE models is subject to the Model License. DeepSeekMoE supports commercial use. If you have any questions, please raise an issue or contact us at [email protected].
DeepSeek-R1-Zero
deepseek-coder-7b-base-v1.5
DeepSeek-Coder-V2-Lite-Base
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence

1. Introduction

We present DeepSeek-Coder-V2, an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT4-Turbo in code-specific tasks. Specifically, DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2 with an additional 6 trillion tokens. Through this continued pre-training, DeepSeek-Coder-V2 substantially enhances the coding and mathematical reasoning capabilities of DeepSeek-V2, while maintaining comparable performance in general language tasks. Compared to DeepSeek-Coder-33B, DeepSeek-Coder-V2 demonstrates significant advancements in various aspects of code-related tasks, as well as reasoning and general capabilities. Additionally, DeepSeek-Coder-V2 expands its support for programming languages from 86 to 338, while extending the context length from 16K to 128K. In standard benchmark evaluations, DeepSeek-Coder-V2 achieves superior performance compared to closed-source models such as GPT4-Turbo, Claude 3 Opus, and Gemini 1.5 Pro on coding and math benchmarks. The list of supported programming languages can be found here. We release DeepSeek-Coder-V2 with 16B and 236B parameters based on the DeepSeekMoE framework, with active parameters of only 2.4B and 21B respectively, including base and instruct models, to the public.
2. Model Downloads

| Model | #Total Params | #Active Params | Context Length | Download |
| :-----------------------------: | :---------------: | :----------------: | :----------------: | :----------------------------------------------------------: |
| DeepSeek-Coder-V2-Lite-Base | 16B | 2.4B | 128k | 🤗 HuggingFace |
| DeepSeek-Coder-V2-Lite-Instruct | 16B | 2.4B | 128k | 🤗 HuggingFace |
| DeepSeek-Coder-V2-Base | 236B | 21B | 128k | 🤗 HuggingFace |
| DeepSeek-Coder-V2-Instruct | 236B | 21B | 128k | 🤗 HuggingFace |

3. Chat Website

You can chat with DeepSeek-Coder-V2 on DeepSeek's official website: coder.deepseek.com

4. API Platform

We also provide an OpenAI-compatible API at the DeepSeek Platform: platform.deepseek.com, and you can also pay as you go at an unbeatable price.

5. How to run locally

Here we provide some examples of how to use the DeepSeek-Coder-V2-Lite model. If you want to utilize DeepSeek-Coder-V2 in BF16 format for inference, 8×80GB GPUs are required.

Inference with Huggingface's Transformers

You can directly employ Huggingface's Transformers for model inference. The complete chat template can be found in `tokenizer_config.json` located in the HuggingFace model repository.

Inference with vLLM (recommended)

To utilize vLLM for model inference, please merge this Pull Request into your vLLM codebase: https://github.com/vllm-project/vllm/pull/4650.

6. License

This code repository is licensed under the MIT License. The use of DeepSeek-Coder-V2 Base/Instruct models is subject to the Model License. The DeepSeek-Coder-V2 series (including Base and Instruct) supports commercial use.

7. Contact

If you have any questions, please raise an issue or contact us at [email protected].
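Since the platform exposes an OpenAI-compatible API, a request is an ordinary chat-completions payload. A minimal sketch of building one; the base URL path and the model name `deepseek-chat` are assumptions following the OpenAI-compatible convention, so check platform.deepseek.com for the exact values:

```python
import json

# Assumed base URL and model name, following the OpenAI-compatible
# convention; consult platform.deepseek.com for the actual values.
BASE_URL = "https://api.deepseek.com"

def build_request(messages, model="deepseek-chat", temperature=0.7):
    # Returns the URL and JSON body of a standard chat-completions request.
    return {
        "url": f"{BASE_URL}/chat/completions",
        "body": {
            "model": model,
            "messages": messages,
            "temperature": temperature,
        },
    }

req = build_request([{"role": "user", "content": "Hello"}])
print(json.dumps(req["body"], indent=2))
```

Any OpenAI-compatible client library can then send this body with an `Authorization: Bearer <api key>` header.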
DeepSeek-Coder-V2-Instruct
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence

1. Introduction

We present DeepSeek-Coder-V2, an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT4-Turbo in code-specific tasks. Specifically, DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2 with an additional 6 trillion tokens. Through this continued pre-training, DeepSeek-Coder-V2 substantially enhances the coding and mathematical reasoning capabilities of DeepSeek-V2, while maintaining comparable performance in general language tasks. Compared to DeepSeek-Coder-33B, DeepSeek-Coder-V2 demonstrates significant advancements in various aspects of code-related tasks, as well as reasoning and general capabilities. Additionally, DeepSeek-Coder-V2 expands its support for programming languages from 86 to 338, while extending the context length from 16K to 128K. In standard benchmark evaluations, DeepSeek-Coder-V2 achieves superior performance compared to closed-source models such as GPT4-Turbo, Claude 3 Opus, and Gemini 1.5 Pro on coding and math benchmarks. The list of supported programming languages can be found here. We release DeepSeek-Coder-V2 with 16B and 236B parameters based on the DeepSeekMoE framework, with active parameters of only 2.4B and 21B respectively, including base and instruct models, to the public.
2. Model Downloads

| Model | #Total Params | #Active Params | Context Length | Download |
| :-----------------------------: | :---------------: | :----------------: | :----------------: | :----------------------------------------------------------: |
| DeepSeek-Coder-V2-Lite-Base | 16B | 2.4B | 128k | 🤗 HuggingFace |
| DeepSeek-Coder-V2-Lite-Instruct | 16B | 2.4B | 128k | 🤗 HuggingFace |
| DeepSeek-Coder-V2-Base | 236B | 21B | 128k | 🤗 HuggingFace |
| DeepSeek-Coder-V2-Instruct | 236B | 21B | 128k | 🤗 HuggingFace |

3. Chat Website

You can chat with DeepSeek-Coder-V2 on DeepSeek's official website: coder.deepseek.com

4. API Platform

We also provide an OpenAI-compatible API at the DeepSeek Platform: platform.deepseek.com, and you can also pay as you go at an unbeatable price.

5. How to run locally

Here we provide some examples of how to use the DeepSeek-Coder-V2-Lite model. If you want to utilize DeepSeek-Coder-V2 in BF16 format for inference, 8×80GB GPUs are required.

Inference with Huggingface's Transformers

You can directly employ Huggingface's Transformers for model inference. The complete chat template can be found in `tokenizer_config.json` located in the HuggingFace model repository.

Inference with vLLM (recommended)

To utilize vLLM for model inference, please merge this Pull Request into your vLLM codebase: https://github.com/vllm-project/vllm/pull/4650.

6. License

This code repository is licensed under the MIT License. The use of DeepSeek-Coder-V2 Base/Instruct models is subject to the Model License. The DeepSeek-Coder-V2 series (including Base and Instruct) supports commercial use.

7. Contact

If you have any questions, please raise an issue or contact us at [email protected].
DeepSeek-Prover-V1.5-RL
deepseek-math-7b-base
deepseek-vl-1.3b-base
deepseek-llm-67b-chat
[🏠Homepage] | [🤖 Chat with DeepSeek LLM] | [Discord] | [Wechat(微信)]

1. Introduction

Introducing DeepSeek LLM, an advanced language model comprising 67 billion parameters. It has been trained from scratch on a vast dataset of 2 trillion tokens in both English and Chinese. In order to foster research, we have made DeepSeek LLM 7B/67B Base and DeepSeek LLM 7B/67B Chat open source for the research community.

2. Model Summary

`deepseek-llm-67b-chat` is a 67B parameter model initialized from `deepseek-llm-67b-base` and fine-tuned on extra instruction data.

- Home Page: DeepSeek
- Repository: deepseek-ai/deepseek-LLM
- Chat With DeepSeek LLM: DeepSeek-LLM

3. How to Use

Here are some examples of how to use our model.

Chat Completion

If you prefer not to use the provided `apply_chat_template` function, you can also interact with our model following the sample template. Note that `messages` should be replaced by your input.

Note: By default (`add_special_tokens=True`), our tokenizer automatically adds a `bos_token` before the input text. Additionally, since the system prompt is not compatible with this version of our models, we DO NOT RECOMMEND including the system prompt in your input.

4. License

This code repository is licensed under the MIT License. The use of DeepSeek LLM models is subject to the Model License. DeepSeek LLM supports commercial use.

If you have any questions, please raise an issue or contact us at [email protected].
deepseek-coder-33b-base
DeepSeek-V2.5
deepseek-math-7b-rl
DeepSeek Coder V2 Instruct 0724
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence

1. Introduction

We present DeepSeek-Coder-V2, an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT4-Turbo in code-specific tasks. Specifically, DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2 with an additional 6 trillion tokens. Through this continued pre-training, DeepSeek-Coder-V2 substantially enhances the coding and mathematical reasoning capabilities of DeepSeek-V2, while maintaining comparable performance in general language tasks. Compared to DeepSeek-Coder-33B, DeepSeek-Coder-V2 demonstrates significant advancements in various aspects of code-related tasks, as well as reasoning and general capabilities. Additionally, DeepSeek-Coder-V2 expands its support for programming languages from 86 to 338, while extending the context length from 16K to 128K. In standard benchmark evaluations, DeepSeek-Coder-V2 achieves superior performance compared to closed-source models such as GPT4-Turbo, Claude 3 Opus, and Gemini 1.5 Pro on coding and math benchmarks. The list of supported programming languages can be found here. We release DeepSeek-Coder-V2 with 16B and 236B parameters based on the DeepSeekMoE framework, with active parameters of only 2.4B and 21B respectively, including base and instruct models, to the public.
2. Model Downloads

| Model | #Total Params | #Active Params | Context Length | Download |
| :-----------------------------: | :---------------: | :----------------: | :----------------: | :----------------------------------------------------------: |
| DeepSeek-Coder-V2-Lite-Base | 16B | 2.4B | 128k | 🤗 HuggingFace |
| DeepSeek-Coder-V2-Lite-Instruct | 16B | 2.4B | 128k | 🤗 HuggingFace |
| DeepSeek-Coder-V2-Base | 236B | 21B | 128k | 🤗 HuggingFace |
| DeepSeek-Coder-V2-Instruct | 236B | 21B | 128k | 🤗 HuggingFace |
| DeepSeek-Coder-V2-Instruct-0724 | 236B | 21B | 128k | 🤗 HuggingFace |

3. Chat Website

You can chat with DeepSeek-Coder-V2 on DeepSeek's official website: coder.deepseek.com

4. API Platform

We also provide an OpenAI-compatible API at the DeepSeek Platform: platform.deepseek.com, and you can also pay as you go at an unbeatable price.

5. How to run locally

Here we provide some examples of how to use the DeepSeek-Coder-V2-Lite model. If you want to utilize DeepSeek-Coder-V2 in BF16 format for inference, 8×80GB GPUs are required.

Inference with Huggingface's Transformers

You can directly employ Huggingface's Transformers for model inference. The complete chat template can be found in `tokenizer_config.json` located in the HuggingFace model repository.

Inference with vLLM (recommended)

To utilize vLLM for model inference, please merge this Pull Request into your vLLM codebase: https://github.com/vllm-project/vllm/pull/4650.

Function Calling

Function calling allows the model to call external tools to enhance its capabilities. An example tool definition:

```json
{
  "name": "get_current_weather",
  "description": "Get the current weather in a given location",
  "parameters": {
    "type": "object",
    "properties": {
      "location": {
        "type": "string",
        "description": "The city and state, e.g. San Francisco, CA"
      },
      "unit": {
        "type": "string",
        "enum": ["celsius", "fahrenheit"]
      }
    },
    "required": ["location"]
  }
}
```

The model then emits tool-call arguments such as:

```json
{"location": "Tokyo"}
```

```json
{"location": "Paris"}
```

You can use JSON Output Mode to ensure the model generates a valid JSON object.
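To make the function-calling flow concrete, here is a minimal dispatcher sketch: it parses the JSON argument string the model emits (as in the Tokyo/Paris examples above) and routes it to a local implementation. The weather function here is a dummy stub with made-up return values; only the tool name and argument shape follow the schema shown above.

```python
import json

def get_current_weather(location, unit="celsius"):
    # Dummy stub standing in for a real weather lookup; the temperature
    # value is a placeholder for illustration.
    return {"location": location, "temperature": 22, "unit": unit}

# Map tool names (the schema's "name" field) to local implementations.
TOOLS = {"get_current_weather": get_current_weather}

def dispatch(tool_name, raw_args):
    # The model emits arguments as a JSON object string,
    # e.g. '{"location": "Tokyo"}'; parse and forward as keyword arguments.
    args = json.loads(raw_args)
    return TOOLS[tool_name](**args)

result = dispatch("get_current_weather", '{"location": "Tokyo"}')
print(result)
```

In a full loop, the returned dict would be serialized and sent back to the model as a tool message so it can compose its final answer.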
To activate this mode, a special instruction should be appended to your system prompt. The model will then respond with a valid JSON object, for example:

```json
{
  "question": "Which is the highest mountain in the world?",
  "answer": "Mount Everest."
}
```

In FIM (Fill In the Middle) completion, you can provide a prefix and an optional suffix, and the model will complete the content in between.

6. License

This code repository is licensed under the MIT License. The use of DeepSeek-Coder-V2 Base/Instruct models is subject to the Model License. The DeepSeek-Coder-V2 series (including Base and Instruct) supports commercial use.

7. Contact

If you have any questions, please raise an issue or contact us at [email protected].
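A minimal sketch of assembling a FIM prompt, assuming the DeepSeek-Coder sentinel tokens `<｜fim▁begin｜>`, `<｜fim▁hole｜>`, and `<｜fim▁end｜>` (check the tokenizer config of your checkpoint for the exact tokens):

```python
# FIM prompt layout: prefix, a hole marker where the completion goes, then suffix.
# The sentinel strings below are assumed from DeepSeek-Coder; verify against
# the tokenizer's special tokens before use.
FIM_BEGIN = "<｜fim▁begin｜>"
FIM_HOLE = "<｜fim▁hole｜>"
FIM_END = "<｜fim▁end｜>"

def build_fim_prompt(prefix: str, suffix: str = "") -> str:
    """Assemble a fill-in-the-middle prompt; the model generates the span at the hole."""
    return f"{FIM_BEGIN}{prefix}{FIM_HOLE}{suffix}{FIM_END}"

prompt = build_fim_prompt(
    prefix="def quick_sort(arr):\n    if len(arr) <= 1:\n        return arr\n",
    suffix="\n    return quick_sort(left) + [pivot] + quick_sort(right)\n",
)
```

The resulting string is passed to the tokenizer as a plain completion prompt (no chat template), and the model's output fills the hole between prefix and suffix.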
DeepSeek Prover V2 671B
We introduce DeepSeek-Prover-V2, an open-source large language model designed for formal theorem proving in Lean 4, with initialization data collected through a recursive theorem-proving pipeline powered by DeepSeek-V3. The cold-start training procedure begins by prompting DeepSeek-V3 to decompose complex problems into a series of subgoals. The proofs of resolved subgoals are synthesized into a chain-of-thought process, combined with DeepSeek-V3's step-by-step reasoning, to create an initial cold start for reinforcement learning. This process enables us to integrate both informal and formal mathematical reasoning into a unified model.

Synthesize Cold-Start Reasoning Data through Recursive Proof Search

- To construct the cold-start dataset, we develop a simple yet effective pipeline for recursive theorem proving, utilizing DeepSeek-V3 as a unified tool for both subgoal decomposition and formalization. We prompt DeepSeek-V3 to decompose theorems into high-level proof sketches while simultaneously formalizing these proof steps in Lean 4, resulting in a sequence of subgoals.

- We use a smaller 7B model to handle the proof search for each subgoal, thereby reducing the associated computational burden. Once the decomposed steps of a challenging problem are resolved, we pair the complete step-by-step formal proof with the corresponding chain-of-thought from DeepSeek-V3 to create cold-start reasoning data.

Reinforcement Learning with Synthetic Cold-Start Data

- We curate a subset of challenging problems that remain unsolved by the 7B prover model in an end-to-end manner, but for which all decomposed subgoals have been successfully resolved. By composing the proofs of all subgoals, we construct a complete formal proof for the original problem. This proof is then appended to DeepSeek-V3's chain-of-thought, which outlines the corresponding lemma decomposition, thereby producing a cohesive synthesis of informal reasoning and subsequent formalization.
- After fine-tuning the prover model on the synthetic cold-start data, we perform a reinforcement learning stage to further enhance its ability to bridge informal reasoning with formal proof construction. Following the standard training objective for reasoning models, we use binary correct-or-incorrect feedback as the primary form of reward supervision.

- The resulting model, DeepSeek-Prover-V2-671B, achieves state-of-the-art performance in neural theorem proving, reaching an 88.9% pass ratio on the MiniF2F-test and solving 49 out of 658 problems from PutnamBench. The proofs generated by DeepSeek-Prover-V2 for the miniF2F dataset are available for download as a ZIP archive.

3. ProverBench: Formalization of AIME and Textbook Problems

We introduce ProverBench, a benchmark dataset comprising 325 problems. Of these, 15 are formalized from number theory and algebra questions featured in the recent AIME competitions (AIME 24 and 25), offering authentic high-school competition-level challenges. The remaining 310 problems are drawn from curated textbook examples and educational tutorials, contributing a diverse and pedagogically grounded collection of formalized mathematical problems. This benchmark is designed to enable more comprehensive evaluation across both high-school competition problems and undergraduate-level mathematics.

| Area | Count |
| :---------------------: | :-------: |
| AIME 24&25 | 15 |
| Number Theory | 40 |
| Elementary Algebra | 30 |
| Linear Algebra | 50 |
| Abstract Algebra | 40 |
| Calculus | 90 |
| Real Analysis | 30 |
| Complex Analysis | 10 |
| Functional Analysis | 10 |
| Probability | 10 |
| Total | 325 |

We release DeepSeek-Prover-V2 in two model sizes: 7B and 671B parameters. DeepSeek-Prover-V2-671B is trained on top of DeepSeek-V3-Base. DeepSeek-Prover-V2-7B is built upon DeepSeek-Prover-V1.5-Base and features an extended context length of up to 32K tokens.
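For a sense of what such formalized problems look like, here is a hypothetical competition-style statement in Lean 4 with Mathlib (not an actual ProverBench entry):

```lean
import Mathlib

-- A toy competition-style problem: solve a linear equation over ℝ.
theorem toy_linear (x : ℝ) (h : 2 * x + 3 = 7) : x = 2 := by
  linarith
```

Each benchmark entry pairs such a formal statement (with the proof elided) with its informal source problem; the prover's task is to produce the proof term or tactic script.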
| Model | Download |
| :-----------------------------: | :----------------------------------------------------------: |
| DeepSeek-Prover-V2-7B | 🤗 HuggingFace |
| DeepSeek-Prover-V2-671B | 🤗 HuggingFace |

| Dataset | Download |
| :-----------------------------: | :----------------------------------------------------------: |
| DeepSeek-ProverBench | 🤗 HuggingFace |

You can directly use Huggingface's Transformers for model inference. DeepSeek-Prover-V2-671B shares the same architecture as DeepSeek-V3. For detailed information and supported features, please refer to the DeepSeek-V3 documentation on Hugging Face.

6. License

The use of DeepSeek-Prover-V2 models is subject to the Model License.

7. Contact

If you have any questions, please raise an issue or contact us at [email protected].
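A minimal sketch of assembling a proof-generation request for the model (the instruction wording is an assumption made for illustration; the repository's example shows the canonical prompt format):

```python
# Build a chat-style request asking the model to finish a Lean 4 proof.
# The "Complete the following Lean 4 code" wording is an assumed prompt,
# not necessarily the exact official one.
formal_statement = """import Mathlib

theorem toy (x : ℝ) (h : 2 * x + 3 = 7) : x = 2 := by
  sorry
""".strip()

prompt = (
    "Complete the following Lean 4 code:\n\n"
    f"```lean4\n{formal_statement}\n```"
)

# `messages` can then be fed to tokenizer.apply_chat_template(...) and
# model.generate(...) exactly as with other DeepSeek chat models.
messages = [{"role": "user", "content": prompt}]
```

The `sorry` placeholder marks the proof obligation; the model is expected to return the statement with `sorry` replaced by a complete tactic proof.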
DeepSeek V3.2 Exp Base
deepseek-vl-7b-base
ESFT-vanilla-lite
DeepSeek-V2.5-1210
DeepSeek-V2.5-1210 is an upgraded version of DeepSeek-V2.5, with improvements across various capabilities:

- Mathematical: performance on the MATH-500 benchmark has improved from 74.8% to 82.8%.
- Coding: accuracy on the LiveCodeBench (08.01 - 12.01) benchmark has increased from 29.2% to 34.38%.
- Writing and Reasoning: corresponding improvements have been observed in internal test datasets.

Additionally, the new version of the model has an optimized user experience for file upload and webpage summarization functionalities.

To run DeepSeek-V2.5 in BF16 format for inference, 8 GPUs with 80GB of memory each are required.

Inference with Huggingface's Transformers

You can directly employ Huggingface's Transformers for model inference. The complete chat template can be found in `tokenizer_config.json` in the Huggingface model repository. Note: the chat template has been updated compared to the previous DeepSeek-V2-Chat version.

Inference with vLLM (recommended)

To utilize vLLM for model inference, please merge this Pull Request into your vLLM codebase: https://github.com/vllm-project/vllm/pull/4650.

Function calling allows the model to call external tools to enhance its capabilities. A tool is described with a JSON schema such as:

```json
{
  "name": "get_current_weather",
  "description": "Get the current weather in a given location",
  "parameters": {
    "type": "object",
    "properties": {
      "location": {
        "type": "string",
        "description": "The city and state, e.g. San Francisco, CA"
      },
      "unit": {
        "type": "string",
        "enum": ["celsius", "fahrenheit"]
      }
    },
    "required": ["location"]
  }
}
```

The model then emits tool calls whose arguments are JSON objects, e.g. `{"location": "Tokyo"}` or `{"location": "Paris"}`.

You can use JSON Output Mode to ensure the model generates a valid JSON object. To activate this mode, a special instruction should be appended to your system prompt.
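Even with JSON mode active, it is prudent to validate the returned text client-side; a minimal sketch (the stripping of accidental Markdown fences is a defensive assumption, not documented behavior):

```python
import json

def parse_json_output(text: str) -> dict:
    """Parse model output that is expected to be a single JSON object."""
    cleaned = text.strip()
    # Defensively strip a Markdown code fence if the model wrapped its answer.
    if cleaned.startswith("```"):
        cleaned = cleaned.strip("`")
        if cleaned.startswith("json"):
            cleaned = cleaned[len("json"):]
    obj = json.loads(cleaned)
    if not isinstance(obj, dict):
        raise ValueError("expected a JSON object at the top level")
    return obj

answer = parse_json_output(
    '{"question": "Which is the highest mountain in the world?", '
    '"answer": "Mount Everest."}'
)
```

On a parse failure you can retry the request or fall back to treating the output as free text.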
The model will then respond with a valid JSON object, for example:

```json
{
  "question": "Which is the highest mountain in the world?",
  "answer": "Mount Everest."
}
```

In FIM (Fill In the Middle) completion, you can provide a prefix and an optional suffix, and the model will complete the content in between.

3. License

This code repository is licensed under the MIT License. The use of DeepSeek-V2 Base/Chat models is subject to the Model License. The DeepSeek-V2 series (including Base and Chat) supports commercial use.

5. Contact

If you have any questions, please raise an issue or contact us at [email protected].
DeepSeek-V2-Chat-0628
JanusFlow-1.3B
We present JanusFlow, a powerful framework that unifies image understanding and generation in a single model. JanusFlow introduces a minimalist architecture that integrates autoregressive language models with rectified flow, a state-of-the-art method in generative modeling. Our key finding demonstrates that rectified flow can be trained straightforwardly within the large language model framework, eliminating the need for complex architectural modifications.

JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation

JanusFlow is a unified understanding and generation MLLM that decouples visual encoding for multimodal understanding and generation, and is constructed on top of DeepSeek-LLM-1.3b-base. For multimodal understanding, it uses SigLIP-L as the vision encoder, which supports 384 x 384 image input. For image generation, JanusFlow uses rectified flow and SDXL-VAE to generate 384 x 384 images. The provided checkpoint is the EMA checkpoint after pre-training and supervised fine-tuning.

This code repository is licensed under the MIT License. The use of JanusFlow models is subject to the DeepSeek Model License. If you have any questions, please raise an issue or contact us at [email protected].