# DeepSeek-R1-0528-GPTQ-Int4-Int8Mix-Compact

A compact GPTQ Int4/Int8-mix quantization of DeepSeek-R1-0528, by QuantTrio.

- **License:** MIT
- **Type:** Language model
- **Paper:** arXiv:2501.12948 (DeepSeek-R1)
## Code Examples
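A minimal sketch of loading this checkpoint with vLLM's offline Python API. The tensor-parallel size, context length, and sampling settings below are illustrative assumptions, not values from this card; adjust them to your hardware, and apply the gptq_marlin.py patch described below if you are on vllm==0.9.0.

```python
# Hedged sketch: load the quantized checkpoint with vLLM's offline API.
# tensor_parallel_size / max_model_len are assumptions; tune for your GPUs.
from vllm import LLM, SamplingParams

llm = LLM(
    model="QuantTrio/DeepSeek-R1-0528-GPTQ-Int4-Int8Mix-Compact",
    tensor_parallel_size=8,   # number of GPUs to shard across (assumption)
    max_model_len=32768,      # illustrative context length
    trust_remote_code=True,
)

params = SamplingParams(temperature=0.6, max_tokens=512)
outputs = llm.generate(["Explain GPTQ quantization in one paragraph."], params)
print(outputs[0].outputs[0].text)
```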
### 【💡 Patch for gptq_marlin.py💡】

At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the MoE module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:

```text
vllm/model_executor/layers/quantization/gptq_marlin.py
```
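The attached patch itself is not reproduced in this card. As a rough, self-contained illustration of the idea only: resolving a per-layer quantization setting amounts to matching a module's prefix against per-layer overrides in the quantization config before constructing the MoE quant method. Every name in the sketch below (the config layout, the pattern matching, the helper signature) is a hypothetical stand-in, not vLLM code.

```python
# Toy illustration of per-layer quant-config resolution for MoE modules.
# The config layout and helper signature are hypothetical stand-ins; the
# real fix lives in vLLM's gptq_marlin.py and is not reproduced here.
from fnmatch import fnmatch

def get_moe_quant_method(quant_config: dict, prefix: str) -> dict:
    """Return the effective quant settings for the MoE layer at `prefix`."""
    effective = dict(quant_config["default"])          # global GPTQ settings
    for pattern, override in quant_config.get("dynamic", {}).items():
        if fnmatch(prefix, pattern):                   # per-layer override wins
            effective.update(override)
    return effective

quant_config = {
    "default": {"bits": 4, "group_size": 128},                # Int4 base
    "dynamic": {"model.layers.*.mlp.experts*": {"bits": 8}},  # Int8 experts
}
print(get_moe_quant_method(quant_config, "model.layers.3.mlp.experts"))
# -> {'bits': 8, 'group_size': 128}
```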
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:【💡Notes on New VLLM Versions💡】textvllm
</div>
<div style="
background: rgba(255, 0, 200, 0.15);
padding: 16px;
border-radius: 6px;
border: 1px solid rgba(255, 0, 200, 0.3);
margin: 16px 0;
">
### 【💡 Patch for gptq_marlin.py💡】
At present, vllm==0.9.0 lacks support for per-layer quantization configurations for the moe module, which will lead to errors when loading the model.
I have implemented a simple fix by adding the get_moe_quant_method function to the gptq_marlin.py file.
Until the PR is merged, please replace the gptq_marlin.py file in your installation with the attached version, placing it at:text
</div>
### 【Model List】
| FILE SIZE | LAST UPDATED |
|-----------|--------------|
| `414GB`   | `2025-06-01` |
### 【Model Download】
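A minimal download sketch using `huggingface_hub`; the repo id below is assumed from the card title, and the local directory is yours to choose:

```python
from huggingface_hub import snapshot_download

# Repo id assumed from the card title; adjust if the hosting repo differs.
snapshot_download(
    repo_id="QuantTrio/DeepSeek-R1-0528-GPTQ-Int4-Int8Mix-Compact",
    local_dir="./DeepSeek-R1-0528-GPTQ-Int4-Int8Mix-Compact",
)
```

At roughly 414GB (per the table above), plan for a resumable transfer and sufficient free disk space before starting.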