taobao-mnn

153 models

Qwen2.5-1.5B-Instruct-MNN
license:apache-2.0
779 downloads · 1 like

Qwen2.5-VL-3B-Instruct-MNN
A 4-bit quantized MNN model exported from Qwen2.5-VL-3B-Instruct using llmexport.
license:apache-2.0
511 downloads · 0 likes
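
Most of the cards in this listing describe the same pipeline: a float checkpoint is exported to MNN and its weights are quantized to 4 or 8 bits. As a rough illustration of what block-wise 4-bit quantization means (a simplified sketch for intuition only; MNN/llmexport's actual scheme, block sizes, and zero-point handling may differ):

```python
# Illustrative block-wise 4-bit quantization: each block of float weights is
# reduced to small signed integers plus one float scale per block.

def quantize_block(weights, bits=4):
    """Map a block of floats to signed ints in [-2^(bits-1), 2^(bits-1)-1] plus a scale."""
    qmax = 2 ** (bits - 1) - 1          # 7 for 4-bit
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / qmax              # one float scale shared by the block
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize_block(q, scale):
    """Recover approximate float weights from the int codes and the block scale."""
    return [v * scale for v in q]

block = [0.12, -0.07, 0.31, -0.25]
codes, scale = quantize_block(block)
approx = dequantize_block(codes, scale)
# Each reconstructed weight is within half a quantization step of the original.
assert all(abs(a - b) <= scale / 2 + 1e-9 for a, b in zip(block, approx))
```

The storage win is what makes on-device inference practical: 4 bits per weight plus a per-block scale, instead of 16 or 32 bits per weight.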

Llama-3.2-1B-Instruct-MNN
license:apache-2.0
365 downloads · 1 like

Qwen3-VL-4B-Thinking-MNN
A 4-bit quantized MNN model exported from Qwen3-VL-4B-Thinking using llmexport.
license:apache-2.0
330 downloads · 1 like

Qwen3-VL-4B-Instruct-MNN
A 4-bit quantized MNN model exported from Qwen3-VL-4B-Instruct using llmexport.
license:apache-2.0
315 downloads · 0 likes

Qwen2.5-Omni-3B-MNN
A 4-bit quantized MNN model exported from Qwen2.5-Omni-3B using llmexport.
license:apache-2.0
272 downloads · 2 likes

Qwen3-4B-Instruct-2507-MNN
A 4-bit quantized MNN model exported from Qwen3-4B-Instruct-2507 using llmexport.
license:apache-2.0
260 downloads · 2 likes

bert-vits2-MNN
license:cc-by-nc-4.0
252 downloads · 1 like

gpt-oss-20b-MNN
A 4-bit quantized MNN model exported from gpt-oss-20b using llmexport.
license:apache-2.0
237 downloads · 1 like

Qwen3-4B-Thinking-2507-MNN
license:apache-2.0
232 downloads · 1 like

Qwen3.5-4B-MNN
license:apache-2.0
227 downloads · 0 likes

Qwen3-VL-8B-Instruct-MNN
A 4-bit quantized MNN model exported from Qwen3-VL-8B-Instruct using llmexport.
license:apache-2.0
204 downloads · 0 likes

gemma-3-1b-it-qat-q4_0-gguf-MNN
license:apache-2.0
199 downloads · 0 likes

DeepSeek-R1-0528-Qwen3-8B-MNN
A 4-bit quantized MNN model exported from DeepSeek-R1-0528-Qwen3-8B using llmexport.
license:apache-2.0
192 downloads · 1 like

Qwen3.5-0.8B-MNN
license:apache-2.0
192 downloads · 0 likes

Qwen3-VL-8B-Thinking-MNN
A 4-bit quantized MNN model exported from Qwen3-VL-8B-Thinking using llmexport.
license:apache-2.0
188 downloads · 1 like

Qwen2.5-Omni-7B-MNN
license:apache-2.0
182 downloads · 1 like

Qwen3-0.6B-MNN
license:apache-2.0
182 downloads · 0 likes

Qwen3.5-2B-MNN
license:apache-2.0
178 downloads · 0 likes

Qwen3-1.7B-MNN
license:apache-2.0
161 downloads · 1 like

SmolVLM-256M-Instruct-MNN
An 8-bit quantized MNN model exported from SmolVLM-256M-Instruct using llmexport.
license:apache-2.0
151 downloads · 0 likes

sherpa-mnn-streaming-zipformer-en-2023-02-21
license:apache-2.0
148 downloads · 0 likes

Qwen3-4B-MNN
license:apache-2.0
144 downloads · 3 likes

Qwen2.5-7B-Instruct-MNN
license:apache-2.0
144 downloads · 0 likes

Hunyuan-0.5B-Instruct-MNN
license:apache-2.0
140 downloads · 1 like

MiniCPM-V-4-MNN
license:apache-2.0
131 downloads · 0 likes

SmolVLM2-500M-Video-Instruct-MNN
An 8-bit quantized MNN model exported from SmolVLM2-500M-Video-Instruct using llmexport.
license:apache-2.0
129 downloads · 0 likes

sherpa-mnn-streaming-zipformer-bilingual-zh-en-2023-02-20
license:apache-2.0
125 downloads · 0 likes

gemma-3-4b-it-q4_0-mnn
A 4-bit quantized MNN model exported from gemma-3-4b-it-q4_0 using llmexport.
license:apache-2.0
118 downloads · 0 likes

DeepSeek-R1-1.5B-Qwen-MNN
license:apache-2.0
111 downloads · 1 like

MiniCPM4-0.5B-MNN
A 4-bit quantized MNN model exported from MiniCPM4-0.5B using llmexport.
license:apache-2.0
109 downloads · 1 like

Qwen3-Coder-30B-A3B-Instruct-MNN
A 4-bit quantized MNN model exported from Qwen3-Coder-30B-A3B-Instruct using llmexport.
license:apache-2.0
100 downloads · 0 likes

Qwen3-VL-30B-A3B-Thinking-MNN
A 4-bit quantized MNN model exported from Qwen3-VL-30B-A3B-Thinking using llmexport.
license:apache-2.0
94 downloads · 1 like

Qwen2.5-0.5B-Instruct-MNN
license:apache-2.0
93 downloads · 3 likes

Hunyuan-1.8B-Instruct-MNN
license:apache-2.0
92 downloads · 1 like

Qwen3-VL-30B-A3B-Instruct-MNN
A 4-bit quantized MNN model exported from Qwen3-VL-30B-A3B-Instruct using llmexport.
license:apache-2.0
89 downloads · 0 likes

SmolLM3-3B-MNN
license:apache-2.0
84 downloads · 0 likes

WebSailor-3B-MNN
A 4-bit quantized MNN model exported from WebSailor-3B using llmexport.
license:apache-2.0
75 downloads · 1 like

Qwen2.5-Coder-7B-Instruct-MNN
license:apache-2.0
74 downloads · 0 likes

DeepSeek-R1-7B-Qwen-MNN
license:apache-2.0
74 downloads · 0 likes

Qwen3-8B-MNN
license:apache-2.0
71 downloads · 0 likes

Llama-3.2-3B-Instruct-MNN
license:apache-2.0
68 downloads · 0 likes

Qwen2.5-Coder-1.5B-Instruct-MNN
license:apache-2.0
68 downloads · 0 likes

SmolVLM2-256M-Video-Instruct-MNN
An 8-bit quantized MNN model exported from SmolVLM2-256M-Video-Instruct using llmexport.
license:apache-2.0
66 downloads · 0 likes

Hunyuan-4B-Instruct-MNN
license:apache-2.0
65 downloads · 1 like

deepseek-vl-7b-chat-MNN
A 4-bit quantized MNN model exported from deepseek-vl-7b-chat using llmexport.
license:apache-2.0
62 downloads · 0 likes

Qwen3-30B-A3B-Thinking-2507-MNN
license:apache-2.0
59 downloads · 0 likes

ERNIE-4.5-0.3B-PT-MNN
A 4-bit quantized MNN model exported from ERNIE-4.5-0.3B-PT using llmexport.
license:apache-2.0
58 downloads · 2 likes

MiniCPM4-8B-MNN
A 4-bit quantized MNN model exported from MiniCPM4-8B using llmexport.
license:apache-2.0
56 downloads · 2 likes

gemma-2-2b-it-MNN
license:apache-2.0
56 downloads · 0 likes

Qwen2-VL-2B-Instruct-MNN
license:apache-2.0
53 downloads · 1 like

Lingshu-7B-MNN
license:apache-2.0
53 downloads · 1 like

MobileLLM-125M-MNN
license:apache-2.0
53 downloads · 0 likes

Qwen3-14B-MNN
license:apache-2.0
50 downloads · 0 likes

SmolVLM2-256M-Video-Instruct-NPU
A 4-bit quantized MNN model exported from SmolVLM2-256M-Video-Instruct using llmexport.
license:apache-2.0
49 downloads · 0 likes

Hunyuan-7B-Instruct-MNN
license:apache-2.0
48 downloads · 1 like

Qwen3-4B-SafeRL-MNN
license:apache-2.0
46 downloads · 0 likes

Qwen3-VL-2B-Instruct-Eagle3
llama
45 downloads · 2 likes

InternVL2_5-1B-MNN
license:apache-2.0
45 downloads · 1 like

Qwen3-Embedding-0.6B-MNN
A 4-bit quantized MNN model exported from Qwen3-Embedding-0.6B using llmexport.
license:apache-2.0
44 downloads · 0 likes

Meta-Llama-3.1-8B-Instruct-MNN
license:apache-2.0
42 downloads · 0 likes

Qwen3-VL-32B-Thinking-MNN
license:apache-2.0
42 downloads · 0 likes

FastVLM-1.5B-Stage3-MNN
An 8-bit quantized MNN model exported from FastVLM-1.5B-Stage3 using llmexport.
license:apache-2.0
41 downloads · 1 like

Qwen2.5-3B-Instruct-MNN
license:apache-2.0
40 downloads · 0 likes

TinyLlama-1.1B-Chat-MNN
license:apache-2.0
40 downloads · 0 likes

SmolVLM-500M-Instruct-MNN
An 8-bit quantized MNN model exported from SmolVLM-500M-Instruct using llmexport.
license:apache-2.0
40 downloads · 0 likes

phi-2-MNN
license:apache-2.0
39 downloads · 1 like

Qwen3Guard-Gen-0.6B-MNN
license:apache-2.0
39 downloads · 0 likes

Qwen3Guard-Stream-8B-MNN
license:apache-2.0
39 downloads · 0 likes

Qwen3-VL-32B-Instruct-MNN
license:apache-2.0
38 downloads · 1 like

MobileLLM-1B-MNN
license:apache-2.0
38 downloads · 0 likes

SmolVLM2-2.2B-Instruct-MNN
license:apache-2.0
38 downloads · 0 likes

FastVLM-0.5B-Stage3-MNN
An 8-bit quantized MNN model exported from FastVLM-0.5B-Stage3 using llmexport.
license:apache-2.0
38 downloads · 0 likes

Meta-Llama-3.1-8B-Instruct-Eagle3-MNN
A 4-bit quantized MNN model exported from Meta-Llama-3.1-8B-Instruct-Eagle3 using llmexport.
base_model:meta-llama/Llama-3.1-8B-Instruct
38 downloads · 0 likes

Qwen3-4B-Instruct-2507-Eagle3

This repository contains an EAGLE-3 style draft model trained specifically to accelerate inference of the `Qwen3-4B-Instruct-2507` large language model. This is not a standalone model: it must be used together with its base model (`Qwen3-4B-Instruct-2507`) in a speculative decoding framework to achieve significant speedups in text generation.

- Base Model: `Qwen3-4B-Instruct-2507`
- Model Architecture: EAGLE-3 (speculative decoding draft model)
- Primary Benefit: accelerates text-generation throughput by 1.5x to 2.5x without compromising the generation quality of the base model.

EAGLE is an advanced speculative decoding method. A small draft model generates a sequence of draft tokens in parallel; the larger, more powerful base model then verifies them in a single forward pass. When the draft is accepted, generation advances multiple steps at once, yielding a substantial speedup. This model serves as the draft model in that process. Its average acceptance length (`acc_length`) on standard benchmarks is approximately 2.08 tokens (with 4 draft tokens), meaning that on average it helps the base model advance over 2 tokens per verification step.

The model was evaluated on a diverse set of benchmarks. `acc_length` (the average number of accepted draft tokens) indicates the efficiency of the acceleration; higher is better.

| Benchmark | `acc_length` (num_draft_tokens=4) |
| :-------- | :-------------------------------: |
| gsm8k | 2.22 |
| humaneval | 2.29 |
| math500 | 2.27 |
| cmmlu | 1.94 |
| ceval | 1.93 |
| mtbench | 1.85 |
| Average | ~2.08 |

These results demonstrate consistent and effective acceleration across tasks including coding, math, and general conversation.

- Training Framework: SpecForge, an open-source framework for speculative decoding research.
- Training Data: the EagleChat dataset, available on Hugging Face and ModelScope.
- Training Duration: 3 epochs on 8x MI308X GPUs, taking 56 hours (448 MI308X GPU-hours).

llama
38 downloads · 0 likes
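
The draft/verify loop behind these EAGLE-3 cards can be sketched in miniature. The toy greedy verifier below (hypothetical integer token IDs and a stand-in "base model"; real EAGLE-style systems verify all draft tokens in one batched forward pass and compare distributions, not exact tokens) shows why the acceptance length measures speedup: every accepted draft token is a decoding step the base model gets for free.

```python
# Toy sketch of the speculative decoding verification step (greedy variant):
# the base model accepts the longest prefix of the draft that matches its own
# next-token choices, then contributes one token of its own.

def verify_draft(base_next_token, context, draft_tokens):
    """Return the tokens committed this step and the acceptance length."""
    committed = []
    for tok in draft_tokens:
        expected = base_next_token(context + committed)
        if tok != expected:
            break
        committed.append(tok)           # draft token accepted
    # The base model always contributes one more token (the first mismatch,
    # or the token after a fully accepted draft), so progress is >= 1 per step.
    committed.append(base_next_token(context + committed))
    return committed, len(committed) - 1

# Hypothetical "base model": next token is (last token + 1) mod 100.
base = lambda ctx: (ctx[-1] + 1) % 100 if ctx else 0
tokens, acc_len = verify_draft(base, [5], [6, 7, 42, 9])
# Drafts 6 and 7 match, 42 does not; the base model then emits 8 itself.
print(tokens, acc_len)  # [6, 7, 8] 2
```

An `acc_length` of ~2 thus means each verification pass of the base model advances generation by roughly three tokens (two accepted drafts plus the base model's own token) instead of one.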

MobileLLM-600M-MNN
license:apache-2.0
37 downloads · 0 likes

Qwen3Guard-Gen-4B-MNN
license:apache-2.0
37 downloads · 0 likes

Qwen3Guard-Stream-4B-MNN
license:apache-2.0
37 downloads · 0 likes

Qwen3Guard-Stream-0.6B-MNN
license:apache-2.0
36 downloads · 0 likes

Qwen3-30B-A3B-Instruct-2507-MNN
license:apache-2.0
35 downloads · 1 like

Hunyuan-MT-7B-MNN
A 4-bit quantized MNN model exported from Hunyuan-MT-7B using llmexport.
license:apache-2.0
34 downloads · 1 like

glm-4-9b-chat-MNN
license:apache-2.0
34 downloads · 0 likes

SmolLM2-135M-Instruct-MNN
license:apache-2.0
34 downloads · 0 likes

DeepSeek-Prover-V2-7B-MNN
license:apache-2.0
34 downloads · 0 likes

Qwen3Guard-Gen-8B-MNN
license:apache-2.0
33 downloads · 0 likes

FastVLM-1.5B-Stage2-MNN
An 8-bit quantized MNN model exported from FastVLM-1.5B-Stage2 using llmexport.
license:apache-2.0
32 downloads · 0 likes

Qwen3-VL-2B-Instruct-MNN
A 4-bit quantized MNN model exported from Qwen3-VL-2B-Instruct using llmexport.
license:apache-2.0
31 downloads · 3 likes

Qwen3.5-27B-MNN
license:apache-2.0
31 downloads · 0 likes

Qwen3-VL-2B-Thinking-MNN
A 4-bit quantized MNN model exported from Qwen3-VL-2B-Thinking using llmexport.
license:apache-2.0
31 downloads · 0 likes

gemma-3-270m-it-MNN
license:apache-2.0
29 downloads · 2 likes

Meta-Llama-3-8B-Instruct-MNN
A 4-bit quantized MNN model exported from Meta-Llama-3-8B-Instruct using llmexport.
license:apache-2.0
29 downloads · 0 likes

deepseek-llm-7b-chat-MNN
license:apache-2.0
28 downloads · 3 likes

Llama-2-7b-chat-ms-MNN
A 4-bit quantized MNN model exported from Llama-2-7b-chat using llmexport.
license:apache-2.0
28 downloads · 0 likes

Qwen3-Embedding-4B-MNN
A 4-bit quantized MNN model exported from Qwen3-Embedding-4B using llmexport.
license:apache-2.0
27 downloads · 0 likes

SmolLM2-360M-Instruct-MNN
license:apache-2.0
25 downloads · 0 likes

gemma-2-9b-it-MNN
license:apache-2.0
25 downloads · 0 likes

gemma-7b-it-MNN
license:apache-2.0
24 downloads · 1 like

chatglm3-6b-MNN
license:apache-2.0
24 downloads · 0 likes

MobileLLM-350M-MNN
license:apache-2.0
24 downloads · 0 likes

TinyLlama-1.1B-Chat-v1.0-MNN
license:apache-2.0
24 downloads · 0 likes

Qwen2-VL-7B-Instruct-MNN
license:apache-2.0
23 downloads · 0 likes

Qwen2.5-Math-1.5B-Instruct-MNN
license:apache-2.0
23 downloads · 0 likes

FastVLM-0.5B-Stage2-MNN
An 8-bit quantized MNN model exported from FastVLM-0.5B-Stage2 using llmexport.
license:apache-2.0
23 downloads · 0 likes

Qwen3-Embedding-8B-MNN
A 4-bit quantized MNN model exported from Qwen3-Embedding-8B using llmexport.
license:apache-2.0
23 downloads · 0 likes

Qwen-VL-Chat-MNN
license:apache-2.0
21 downloads · 1 like

Qwen2.5-VL-7B-Instruct-MNN
license:apache-2.0
21 downloads · 0 likes

Qwen3-32B-MNN
license:apache-2.0
21 downloads · 0 likes

Qwen3-30B-A3B-MNN
A 4-bit quantized MNN model exported from Qwen3-30B-A3B using llmexport.
license:apache-2.0
20 downloads · 2 likes

Qwen2-0.5B-Instruct-MNN
license:apache-2.0
20 downloads · 1 like

Qwen3-4B-Instruct-2507-Eagle3-MNN
A 4-bit quantized MNN model exported from Qwen3-4B-Instruct-2507-Eagle3 using llmexport.
license:apache-2.0
18 downloads · 0 likes

Qwen3-VL-2B-Thinking-Eagle3

This repository contains an EAGLE-3 style draft model trained specifically to accelerate inference of the `Qwen3-VL-2B-Thinking` large language model. This is not a standalone model: it must be used together with its base model (`Qwen3-VL-2B-Thinking`) in a speculative decoding framework to achieve significant speedups in text generation.

- Base Model: `Qwen3-VL-2B-Thinking`
- Model Architecture: EAGLE-3 (speculative decoding draft model)
- Primary Benefit: accelerates text-generation throughput by 1.5x to 2.5x without compromising the generation quality of the base model.

EAGLE is an advanced speculative decoding method. A small draft model generates a sequence of draft tokens in parallel; the larger, more powerful base model then verifies them in a single forward pass. When the draft is accepted, generation advances multiple steps at once, yielding a substantial speedup. This model serves as the draft model in that process. Its average acceptance length (`acc_length`) on standard benchmarks is approximately 1.71 tokens (with 4 draft tokens).

The model was evaluated on a diverse set of benchmarks. `acc_length` (the average number of accepted draft tokens) indicates the efficiency of the acceleration; higher is better.

| Benchmark | `acc_length` (num_draft_tokens=4) | `acc_length` (num_draft_tokens=8) |
| :-------- | :-------------------------------: | :-------------------------------: |
| humaneval | 1.80 | 1.85 |
| gsm8k | 1.77 | 1.80 |
| math500 | 1.75 | 1.81 |
| ceval | 1.70 | 1.74 |
| cmmlu | 1.65 | 1.70 |
| mtbench | 1.61 | 1.65 |
| Average | ~1.71 | ~1.76 |

These results demonstrate consistent and effective acceleration across tasks including coding, math, and general conversation.

- Training Framework: SpecForge, an open-source framework for speculative decoding research.
- Training Data: the EagleChat dataset, available on Hugging Face and ModelScope.
- Training Duration: 2 epochs on 4x H20 GPUs, taking 27 hours (108 H20 GPU-hours).

llama
18 downloads · 0 likes

Qwen-7B-Chat-MNN
license:apache-2.0
17 downloads · 1 like

internlm-chat-7b-MNN
license:apache-2.0
17 downloads · 1 like

Baichuan2-7B-Chat-MNN
license:apache-2.0
17 downloads · 0 likes

Qwen2-7B-Instruct-MNN
license:apache-2.0
17 downloads · 0 likes

MiMo-7B-RL-MNN
license:apache-2.0
17 downloads · 0 likes

Qwen3-Reranker-0.6B-MNN
A 4-bit quantized MNN model exported from Qwen3-Reranker-0.6B using llmexport.
license:apache-2.0
17 downloads · 0 likes

SmolLM2-1.7B-Instruct-MNN
license:apache-2.0
16 downloads · 1 like

Qwen2-1.5B-Instruct-MNN
license:apache-2.0
16 downloads · 0 likes

Qwen2.5-Math-7B-Instruct-MNN
license:apache-2.0
16 downloads · 0 likes

Yi-6B-Chat-MNN
license:apache-2.0
16 downloads · 0 likes

Qwen2-Audio-7B-Instruct-MNN
license:apache-2.0
15 downloads · 3 likes

MiMo-7B-SFT-MNN
license:apache-2.0
14 downloads · 0 likes

SmolVLM-Instruct-MNN
license:apache-2.0
14 downloads · 0 likes

Qwen3-Reranker-4B-MNN
A 4-bit quantized MNN model exported from Qwen3-Reranker-4B using llmexport.
license:apache-2.0
13 downloads · 0 likes

Qwen3-Reranker-8B-MNN
license:apache-2.0
13 downloads · 0 likes

Qwen3-VL-2B-Instruct-Eagle3-MNN
A 4-bit quantized MNN model exported from Qwen3-VL-2B-Instruct-Eagle3 using llmexport.
license:apache-2.0
13 downloads · 0 likes

MiMo-7B-Base-MNN
A 4-bit quantized MNN model exported from MiMo-7B-Base using llmexport.
license:apache-2.0
11 downloads · 0 likes

Qwen3.5-35B-A3B-MNN
license:apache-2.0
10 downloads · 0 likes

MiMo-7B-RL-Zero-MNN
A 4-bit quantized MNN model exported from MiMo-7B-RL-Zero using llmexport.
license:apache-2.0
10 downloads · 0 likes

SmolDocling-256M-preview-MNN
license:apache-2.0
10 downloads · 0 likes

reader-lm-0.5b-MNN
license:apache-2.0
9 downloads · 0 likes

QwQ-32B-MNN
license:apache-2.0
9 downloads · 0 likes

Qwen3.5-9B-MNN
license:apache-2.0
8 downloads · 1 like

OpenELM-3B-Instruct-MNN
license:apache-2.0
8 downloads · 1 like

Qwen1.5-4B-Chat-MNN
license:apache-2.0
8 downloads · 0 likes

Qwen1.5-7B-Chat-MNN
license:apache-2.0
8 downloads · 0 likes

bge-large-zh-MNN
license:apache-2.0
8 downloads · 0 likes

Qwen3-VL-4B-Instruct-Eagle3

This repository contains an EAGLE-3 style draft model trained specifically to accelerate inference of the `Qwen3-VL-4B-Instruct` large language model. This is not a standalone model: it must be used together with its base model (`Qwen3-VL-4B-Instruct`) in a speculative decoding framework to achieve significant speedups in text generation.

- Base Model: `Qwen3-VL-4B-Instruct`
- Model Architecture: EAGLE-3 (speculative decoding draft model)
- Primary Benefit: accelerates text-generation throughput by 1.5x to 2.5x without compromising the generation quality of the base model.

EAGLE is an advanced speculative decoding method. A small draft model generates a sequence of draft tokens in parallel; the larger, more powerful base model then verifies them in a single forward pass. When the draft is accepted, generation advances multiple steps at once, yielding a substantial speedup. This model serves as the draft model in that process. Its average acceptance length (`acc_length`) on standard benchmarks is approximately 1.81 tokens (with 4 draft tokens), meaning that on average it helps the base model advance nearly 2 tokens per verification step.

The model was evaluated on a diverse set of benchmarks. `acc_length` (the average number of accepted draft tokens) indicates the efficiency of the acceleration; higher is better.

| Benchmark | `acc_length` (num_draft_tokens=4) | `acc_length` (num_draft_tokens=8) |
| :-------- | :-------------------------------: | :-------------------------------: |
| humaneval | 2.05 | 2.18 |
| math500 | 2.01 | 2.15 |
| ceval | 1.74 | 1.80 |
| gsm8k | 1.74 | 1.78 |
| cmmlu | 1.72 | 1.77 |
| mtbench | 1.61 | 1.66 |
| Average | ~1.81 | ~1.89 |

These results demonstrate consistent and effective acceleration across tasks including coding, math, and general conversation.

- Training Framework: SpecForge, an open-source framework for speculative decoding research.
- Training Data: the EagleChat dataset, available on Hugging Face and ModelScope.
- Training Duration: 3 epochs on 8x MI308X GPUs, taking 56 hours (448 MI308X GPU-hours).

llama
8 downloads · 0 likes

Qwen-1_8B-Chat-MNN
license:apache-2.0
7 downloads · 0 likes

chatglm2-6b-MNN
license:apache-2.0
7 downloads · 0 likes

reader-lm-1.5b-MNN
license:apache-2.0
7 downloads · 0 likes

OpenELM-270M-Instruct-MNN
license:apache-2.0
7 downloads · 0 likes

OpenELM-1_1B-Instruct-MNN
license:apache-2.0
7 downloads · 0 likes

QwQ-32B-Preview-MNN
license:apache-2.0
7 downloads · 0 likes

Qwen3.5-35B-A3B-Base-MNN
license:apache-2.0
6 downloads · 0 likes

Qwen1.5-0.5B-Chat-MNN
license:apache-2.0
6 downloads · 0 likes

Qwen1.5-1.8B-Chat-MNN
license:apache-2.0
6 downloads · 0 likes

chatglm-6b-MNN
license:apache-2.0
6 downloads · 0 likes

codegeex2-6b-MNN
license:apache-2.0
6 downloads · 0 likes

OpenELM-450M-Instruct-MNN
license:apache-2.0
6 downloads · 0 likes

gte_sentence-embedding_multilingual-base-MNN
license:apache-2.0
5 downloads · 0 likes

stable-diffusion-v1-5-mnn-opencl
license:apache-2.0
0 downloads · 1 like