IntervitensInc
Wan2.1-I2V-14B-480P-FP16
Wan2.1-T2V-1.3B-FP16
Wan2.1-T2V-14B-FP16
Mistral-Nemo-Base-2407-chatml
Wan2.1-I2V-14B-720P-FP16
DeepSeek-V3.1-Terminus-Channel-int8
ScikitLLM-Model-GGUF-Imatrix
pangu-pro-moe-model
Reuploaded from https://gitcode.com/ascend-tribe/pangu-pro-moe-model

We propose Mixture of Grouped Experts (MoGE), a novel MoE variant that partitions the experts into groups during expert selection and constrains each token to activate an equal number of experts within every group, yielding natural load balancing across devices. On top of the MoGE architecture, we built the Pangu Pro MoE model with 72B total parameters and 16B activated parameters:

- Vocabulary size: 153,376
- Layers: 48
- MoGE configuration: 4 shared experts; 64 routed experts split into 8 groups, with 1 expert activated per group
- Training stages: pre-training and post-training
- Pre-training corpus: 15T tokens

For details, see the technical reports:
- Chinese technical report: Pangu Pro MoE: An Ascend-Native Mixture of Grouped Experts Model
- English technical report: Pangu Pro MoE: Mixture of Grouped Experts for Efficient Sparsity

The Ascend inference acceleration code and the matching MindIE and vLLM-Ascend software releases are now available. Quantized weights will be released soon.

Pangu Pro MoE is licensed under the Pangu Model License Agreement, which is intended to permit use of and foster further development of AI technology. For details, please refer to the `LICENSE` file in the root directory of the model repository.

Due to inherent limitations of the technology underlying Pangu Pro MoE (the "Model"), and because AI-generated content is produced automatically by Pangu, we cannot make any guarantees regarding the following:
1. The Model's output is generated automatically by AI algorithms, and the possibility that some information is flawed, unreasonable, or discomforting cannot be ruled out; generated content does not represent Huawei's attitude or position.
2. We cannot guarantee that the Model is 100% accurate, reliable, fully functional, timely, secure, error-free, uninterrupted, continuously stable, or free of any faults.
3. The Model's output does not constitute any advice or decision, nor does it guarantee the truthfulness, completeness, accuracy, timeliness, legality, functionality, or usefulness of the generated content. Generated content cannot replace professionals in fields such as medicine or law in answering your questions; it is for reference only and does not represent any attitude, position, or view of Huawei. You must make independent judgments based on your actual circumstances, and Huawei assumes no liability.
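The grouped routing idea described above — every token activates the same number of experts in each group — can be sketched as below. This is an illustrative NumPy sketch, not the actual Pangu Pro MoE routing code; the function name and shapes are assumptions, using the stated configuration of 64 routed experts in 8 groups with 1 expert activated per group.

```python
import numpy as np

def moge_route(router_logits, num_groups=8, topk_per_group=1):
    """Grouped expert routing sketch: split the routed experts into equal
    groups and pick the top-k experts *within each group*, so every token
    activates the same number of experts per group. When each device hosts
    one group, this gives natural load balance across devices."""
    num_experts = router_logits.shape[-1]
    group_size = num_experts // num_groups
    # View logits as (..., num_groups, group_size).
    grouped = router_logits.reshape(*router_logits.shape[:-1], num_groups, group_size)
    # Indices of the top-k experts inside each group (local to the group).
    topk_local = np.argsort(grouped, axis=-1)[..., -topk_per_group:]
    # Convert group-local indices back to global expert ids.
    offsets = (np.arange(num_groups) * group_size)[:, None]
    return (topk_local + offsets).reshape(*router_logits.shape[:-1], -1)

logits = np.random.randn(4, 64)   # 4 tokens, 64 routed experts
selected = moge_route(logits)     # shape (4, 8): exactly one expert per group
```

Because the top-k is taken per group rather than globally, no group (and hence no device hosting that group) can be starved or overloaded, regardless of how skewed the router logits are.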
internlm2_5-20b-llamafied
Text-generation model converted with a conversion script and an edited tokenizer to match the behavior of the original. In quick tests it matches the original model's outputs at temperature=0, but this has not been extensively verified.
Qwen3-235B-A22B-Thinking-2507-tt-ckpt
GLM-4.6-Channel-int8
📖 Check out the GLM-4.6 technical blog, the GLM-4.5 technical report, and the Zhipu AI technical documentation.

Compared with GLM-4.5, GLM-4.6 brings several key improvements:

- Longer context window: expanded from 128K to 200K tokens, enabling the model to handle more complex agentic tasks.
- Superior coding performance: higher scores on code benchmarks and better real-world performance in applications such as Claude Code, Cline, Roo Code, and Kilo Code, including improvements in generating visually polished front-end pages.
- Advanced reasoning: a clear improvement in reasoning performance and support for tool use during inference, leading to stronger overall capability.
- More capable agents: stronger performance in tool use and search-based agents, and more effective integration within agent frameworks.
- Refined writing: better alignment with human preferences in style and readability, and more natural performance in role-playing scenarios.

We evaluated GLM-4.6 across eight public benchmarks covering agents, reasoning, and coding. Results show clear gains over GLM-4.5, and GLM-4.6 also holds competitive advantages over leading domestic and international models such as DeepSeek-V3.1-Terminus and Claude Sonnet 4.

Both GLM-4.5 and GLM-4.6 use the same inference method. For general evaluations, we recommend a sampling temperature of 1.0. For code-related evaluation tasks (such as LCB), it is further recommended to set:

- For tool-integrated reasoning, please refer to this doc.
- For the search benchmark, we designed a specific tool-call format in thinking mode to support search agents; please refer to this doc for the detailed template.
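The recommended general-evaluation setup can be expressed as a request payload for an OpenAI-compatible serving endpoint (a minimal sketch; the model identifier, serving stack, and `max_tokens` value are assumptions — only `temperature=1.0` comes from the recommendation above):

```python
# Hypothetical chat request for GLM-4.6 served behind an OpenAI-compatible
# API (e.g. a vLLM server); send with any OpenAI-compatible client.
payload = {
    "model": "zai-org/GLM-4.6",      # assumed repository id
    "messages": [
        {"role": "user", "content": "Write a binary search in Python."},
    ],
    "temperature": 1.0,              # recommended default for general evaluation
    "max_tokens": 4096,              # assumed generation budget
}
```

Code-specific benchmarks (such as LCB) use additional sampling settings per the model card, so this payload should be treated as the general-purpose baseline only.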