huawei-csl
Kimi-Linear-48B-A3B-Instruct-4bit-SINQ
Qwen3-14B-4bit-SINQ
This repository contains the official 4-bit quantized version of the `Qwen3-14B` model, produced with the SINQ (Sinkhorn-Normalized Quantization) method. SINQ is a novel, fast, high-quality quantization method designed to make any Large Language Model smaller while keeping its accuracy almost intact. To support the project, please put a star ⭐ on the official SINQ GitHub repository.

## Model Details

- Model Name: `Qwen3-14B-4bit-SINQ`
- Base Model: `Qwen/Qwen3-14B`
- Task: Text Generation
- Framework: PyTorch / Transformers
- License: Apache-2.0
- Quantized By: Huawei - Computing Systems Lab
- Quantization Method: SINQ (Sinkhorn-Normalized Quantization)
- Precision: INT4
- Group Size: 64
- Quantization Library: `sinq`

## Prerequisite

Before running the quantization script, make sure the SINQ library is installed. Installation instructions and setup details are available in the official SINQ GitHub repository.

## Usage example

You can load and use the model with our wrapper based on the 🤗 Transformers library. The quantized model was obtained with the SINQ quantization library.

> Reproducibility Note: This model was quantized using the SINQ implementation from commit `14ad847` of the SINQ repository.

If you find SINQ useful in your research or applications, please:

- Put a star ⭐ on the official SINQ GitHub repository.
- Cite our paper.
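The loading and quantization snippets referenced above live in the SINQ repository. As a minimal sketch only: the import paths and names below (`AutoSINQHFModel`, `BaseQuantizeConfig`, `quantize_model`, the `method` argument) are assumptions, not confirmed API, and may differ across SINQ versions — check the official repository before running.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
# Assumed import paths -- verify against the SINQ repository.
from sinq.patch_model import AutoSINQHFModel
from sinq.sinqlinear import BaseQuantizeConfig

base_model = "Qwen/Qwen3-14B"
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Settings from the card above: INT4 weights, group size 64, SINQ method.
quant_cfg = BaseQuantizeConfig(nbits=4, group_size=64, method="sinq")

# Quantize in place (argument names assumed; requires a CUDA device).
AutoSINQHFModel.quantize_model(
    model,
    tokenizer=tokenizer,
    quant_config=quant_cfg,
    compute_dtype=torch.bfloat16,
    device="cuda:0",
)
```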
Qwen3-4B-PreSINQ-GGUF
Qwen3-14B-3bit-ASINQ
This repository contains the official 3-bit quantized version of the `Qwen3-14B` model, produced with A-SINQ, the calibrated variant of the SINQ (Sinkhorn-Normalized Quantization) method. SINQ is a novel, fast, high-quality quantization method designed to make any Large Language Model smaller while keeping its accuracy almost intact. To support the project, please put a star ⭐ on the official SINQ GitHub repository.

## Model Details

- Model Name: `Qwen3-14B-3bit-ASINQ`
- Base Model: `Qwen/Qwen3-14B`
- Task: Text Generation
- Framework: PyTorch / Transformers
- License: Apache-2.0
- Quantized By: Huawei - Computing Systems Lab
- Quantization Method: A-SINQ (calibrated Sinkhorn-Normalized Quantization)
- Precision: INT3
- Group Size: 64
- Quantization Library: `sinq`

## Prerequisite

Before running the quantization script, make sure the SINQ library is installed. Installation instructions and setup details are available in the official SINQ GitHub repository.

## Usage example

You can load and use the model with our wrapper based on the 🤗 Transformers library. The quantized model was obtained with the SINQ quantization library.

> Reproducibility Note: This model was quantized using the SINQ implementation from commit `14ad847` of the SINQ repository.

If you find SINQ useful in your research or applications, please:

- Put a star ⭐ on the official SINQ GitHub repository.
- Cite our paper.
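The quantization steps referenced above live in the SINQ repository. As a minimal sketch only: the names below (`AutoSINQHFModel`, `BaseQuantizeConfig`, `quantize_model`, the `method` value) are assumptions, not confirmed API, and the calibrated A-SINQ variant additionally needs a calibration set not shown here — check the official repository before running.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
# Assumed import paths -- verify against the SINQ repository.
from sinq.patch_model import AutoSINQHFModel
from sinq.sinqlinear import BaseQuantizeConfig

base_model = "Qwen/Qwen3-14B"
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Settings from the card above: INT3 weights, group size 64, calibrated A-SINQ.
quant_cfg = BaseQuantizeConfig(nbits=3, group_size=64, method="asinq")

# Quantize in place (argument names assumed; requires a CUDA device).
AutoSINQHFModel.quantize_model(
    model,
    tokenizer=tokenizer,
    quant_config=quant_cfg,
    compute_dtype=torch.bfloat16,
    device="cuda:0",
)
```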
Qwen3-32B-3bit-ASINQ
This repository contains the official 3-bit quantized version of the `Qwen3-32B` model, produced with A-SINQ, the calibrated variant of the SINQ (Sinkhorn-Normalized Quantization) method. SINQ is a novel, fast, high-quality quantization method designed to make any Large Language Model smaller while keeping its accuracy almost intact. To support the project, please put a star ⭐ on the official SINQ GitHub repository.

## Model Details

- Model Name: `Qwen3-32B-3bit-ASINQ`
- Base Model: `Qwen/Qwen3-32B`
- Task: Text Generation
- Framework: PyTorch / Transformers
- License: Apache-2.0
- Quantized By: Huawei - Computing Systems Lab
- Quantization Method: A-SINQ (calibrated Sinkhorn-Normalized Quantization)
- Precision: INT3
- Group Size: 64
- Quantization Library: `sinq`

## Prerequisite

Before running the quantization script, make sure the SINQ library is installed. Installation instructions and setup details are available in the official SINQ GitHub repository.

## Usage example

You can load and use the model with our wrapper based on the 🤗 Transformers library. The quantized model was obtained with the SINQ quantization library.

> Reproducibility Note: This model was quantized using the SINQ implementation from commit `14ad847` of the SINQ repository.

If you find SINQ useful in your research or applications, please:

- Put a star ⭐ on the official SINQ GitHub repository.
- Cite our paper.
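The quantization steps referenced above live in the SINQ repository. As a minimal sketch only: the names below (`AutoSINQHFModel`, `BaseQuantizeConfig`, `quantize_model`, the `method` value) are assumptions, not confirmed API, and the calibrated A-SINQ variant additionally needs a calibration set not shown here — check the official repository before running.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
# Assumed import paths -- verify against the SINQ repository.
from sinq.patch_model import AutoSINQHFModel
from sinq.sinqlinear import BaseQuantizeConfig

base_model = "Qwen/Qwen3-32B"
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Settings from the card above: INT3 weights, group size 64, calibrated A-SINQ.
quant_cfg = BaseQuantizeConfig(nbits=3, group_size=64, method="asinq")

# Quantize in place (argument names assumed; requires a CUDA device).
AutoSINQHFModel.quantize_model(
    model,
    tokenizer=tokenizer,
    quant_config=quant_cfg,
    compute_dtype=torch.bfloat16,
    device="cuda:0",
)
```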
Kimi-Linear-48B-A3B-Instruct-3bit-SINQ
Qwen3-1.7B-PreSINQ-GGUF
Apertus-8B-2509-4bit-ASINQ
Qwen3-32B-4bit-SINQ
This repository contains the official 4-bit quantized version of the `Qwen3-32B` model, produced with the SINQ (Sinkhorn-Normalized Quantization) method. SINQ is a novel, fast, high-quality quantization method designed to make any Large Language Model smaller while keeping its accuracy almost intact. To support the project, please put a star ⭐ on the official SINQ GitHub repository.

## Model Details

- Model Name: `Qwen3-32B-4bit-SINQ`
- Base Model: `Qwen/Qwen3-32B`
- Task: Text Generation
- Framework: PyTorch / Transformers
- License: Apache-2.0
- Quantized By: Huawei - Computing Systems Lab
- Quantization Method: SINQ (Sinkhorn-Normalized Quantization)
- Precision: INT4
- Group Size: 64
- Quantization Library: `sinq`

## Prerequisite

Before running the quantization script, make sure the SINQ library is installed. Installation instructions and setup details are available in the official SINQ GitHub repository.

## Usage example

You can load and use the model with our wrapper based on the 🤗 Transformers library. The quantized model was obtained with the SINQ quantization library.

> Reproducibility Note: This model was quantized using the SINQ implementation from commit `14ad847` of the SINQ repository.

If you find SINQ useful in your research or applications, please:

- Put a star ⭐ on the official SINQ GitHub repository.
- Cite our paper.
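The loading and quantization snippets referenced above live in the SINQ repository. As a minimal sketch only: the names below (`AutoSINQHFModel`, `BaseQuantizeConfig`, `quantize_model`, the `method` argument) are assumptions, not confirmed API, and may differ across SINQ versions — check the official repository before running.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
# Assumed import paths -- verify against the SINQ repository.
from sinq.patch_model import AutoSINQHFModel
from sinq.sinqlinear import BaseQuantizeConfig

base_model = "Qwen/Qwen3-32B"
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Settings from the card above: INT4 weights, group size 64, SINQ method.
quant_cfg = BaseQuantizeConfig(nbits=4, group_size=64, method="sinq")

# Quantize in place (argument names assumed; requires a CUDA device).
AutoSINQHFModel.quantize_model(
    model,
    tokenizer=tokenizer,
    quant_config=quant_cfg,
    compute_dtype=torch.bfloat16,
    device="cuda:0",
)
```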
Qwen3-0.6B-PreSINQ-GGUF
Qwen3-14B-4bit-ASINQ
This repository contains the official 4-bit quantized version of the `Qwen3-14B` model, produced with A-SINQ, the calibrated variant of the SINQ (Sinkhorn-Normalized Quantization) method. SINQ is a novel, fast, high-quality quantization method designed to make any Large Language Model smaller while keeping its accuracy almost intact. To support the project, please put a star ⭐ on the official SINQ GitHub repository.

## Model Details

- Model Name: `Qwen3-14B-4bit-ASINQ`
- Base Model: `Qwen/Qwen3-14B`
- Task: Text Generation
- Framework: PyTorch / Transformers
- License: Apache-2.0
- Quantized By: Huawei - Computing Systems Lab
- Quantization Method: A-SINQ (calibrated Sinkhorn-Normalized Quantization)
- Precision: INT4
- Group Size: 64
- Quantization Library: `sinq`

## Prerequisite

Before running the quantization script, make sure the SINQ library is installed. Installation instructions and setup details are available in the official SINQ GitHub repository.

## Usage example

You can load and use the model with our wrapper based on the 🤗 Transformers library. The quantized model was obtained with the SINQ quantization library.

> Reproducibility Note: This model was quantized using the SINQ implementation from commit `14ad847` of the SINQ repository.

If you find SINQ useful in your research or applications, please:

- Put a star ⭐ on the official SINQ GitHub repository.
- Cite our paper.
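The quantization steps referenced above live in the SINQ repository. As a minimal sketch only: the names below (`AutoSINQHFModel`, `BaseQuantizeConfig`, `quantize_model`, the `method` value) are assumptions, not confirmed API, and the calibrated A-SINQ variant additionally needs a calibration set not shown here — check the official repository before running.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
# Assumed import paths -- verify against the SINQ repository.
from sinq.patch_model import AutoSINQHFModel
from sinq.sinqlinear import BaseQuantizeConfig

base_model = "Qwen/Qwen3-14B"
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Settings from the card above: INT4 weights, group size 64, calibrated A-SINQ.
quant_cfg = BaseQuantizeConfig(nbits=4, group_size=64, method="asinq")

# Quantize in place (argument names assumed; requires a CUDA device).
AutoSINQHFModel.quantize_model(
    model,
    tokenizer=tokenizer,
    quant_config=quant_cfg,
    compute_dtype=torch.bfloat16,
    device="cuda:0",
)
```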
Qwen3-32B-3bit-SINQ
This repository contains the official 3-bit quantized version of the `Qwen3-32B` model, produced with the SINQ (Sinkhorn-Normalized Quantization) method. SINQ is a novel, fast, high-quality quantization method designed to make any Large Language Model smaller while keeping its accuracy almost intact. To support the project, please put a star ⭐ on the official SINQ GitHub repository.

## Model Details

- Model Name: `Qwen3-32B-3bit-SINQ`
- Base Model: `Qwen/Qwen3-32B`
- Task: Text Generation
- Framework: PyTorch / Transformers
- License: Apache-2.0
- Quantized By: Huawei - Computing Systems Lab
- Quantization Method: SINQ (Sinkhorn-Normalized Quantization)
- Precision: INT3
- Group Size: 64
- Quantization Library: `sinq`

## Prerequisite

Before running the quantization script, make sure the SINQ library is installed. Installation instructions and setup details are available in the official SINQ GitHub repository.

## Usage example

You can load and use the model with our wrapper based on the 🤗 Transformers library. The quantized model was obtained with the SINQ quantization library.

> Reproducibility Note: This model was quantized using the SINQ implementation from commit `14ad847` of the SINQ repository.

If you find SINQ useful in your research or applications, please:

- Put a star ⭐ on the official SINQ GitHub repository.
- Cite our paper.
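The loading and quantization snippets referenced above live in the SINQ repository. As a minimal sketch only: the names below (`AutoSINQHFModel`, `BaseQuantizeConfig`, `quantize_model`, the `method` argument) are assumptions, not confirmed API, and may differ across SINQ versions — check the official repository before running.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
# Assumed import paths -- verify against the SINQ repository.
from sinq.patch_model import AutoSINQHFModel
from sinq.sinqlinear import BaseQuantizeConfig

base_model = "Qwen/Qwen3-32B"
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Settings from the card above: INT3 weights, group size 64, SINQ method.
quant_cfg = BaseQuantizeConfig(nbits=3, group_size=64, method="sinq")

# Quantize in place (argument names assumed; requires a CUDA device).
AutoSINQHFModel.quantize_model(
    model,
    tokenizer=tokenizer,
    quant_config=quant_cfg,
    compute_dtype=torch.bfloat16,
    device="cuda:0",
)
```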
Qwen3-1.7B-4bit-SINQ
This repository contains the official 4-bit quantized version of the `Qwen3-1.7B` model, produced with the SINQ (Sinkhorn-Normalized Quantization) method. SINQ is a novel, fast, high-quality quantization method designed to make any Large Language Model smaller while keeping its accuracy almost intact. To support the project, please put a star ⭐ on the official SINQ GitHub repository.

## Model Details

- Model Name: `Qwen3-1.7B-4bit-SINQ`
- Base Model: `Qwen/Qwen3-1.7B`
- Task: Text Generation
- Framework: PyTorch / Transformers
- License: Apache-2.0
- Quantized By: Huawei - Computing Systems Lab
- Quantization Method: SINQ (Sinkhorn-Normalized Quantization)
- Precision: INT4
- Group Size: 64
- Quantization Library: `sinq`

## Prerequisite

Before running the quantization script, make sure the SINQ library is installed. Installation instructions and setup details are available in the official SINQ GitHub repository.

## Usage example

You can load and use the model with our wrapper based on the 🤗 Transformers library. The quantized model was obtained with the SINQ quantization library.

> Reproducibility Note: This model was quantized using the SINQ implementation from commit `14ad847` of the SINQ repository.

If you find SINQ useful in your research or applications, please:

- Put a star ⭐ on the official SINQ GitHub repository.
- Cite our paper.
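The loading and quantization snippets referenced above live in the SINQ repository. As a minimal sketch only: the names below (`AutoSINQHFModel`, `BaseQuantizeConfig`, `quantize_model`, the `method` argument) are assumptions, not confirmed API, and may differ across SINQ versions — check the official repository before running.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
# Assumed import paths -- verify against the SINQ repository.
from sinq.patch_model import AutoSINQHFModel
from sinq.sinqlinear import BaseQuantizeConfig

base_model = "Qwen/Qwen3-1.7B"
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Settings from the card above: INT4 weights, group size 64, SINQ method.
quant_cfg = BaseQuantizeConfig(nbits=4, group_size=64, method="sinq")

# Quantize in place (argument names assumed; requires a CUDA device).
AutoSINQHFModel.quantize_model(
    model,
    tokenizer=tokenizer,
    quant_config=quant_cfg,
    compute_dtype=torch.bfloat16,
    device="cuda:0",
)
```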
Apertus-8B-2509-4bit-SINQ
Qwen3-1.7B-3bit-SINQ
This repository contains the official 3-bit quantized version of the `Qwen3-1.7B` model, produced with the SINQ (Sinkhorn-Normalized Quantization) method. SINQ is a novel, fast, high-quality quantization method designed to make any Large Language Model smaller while keeping its accuracy almost intact. To support the project, please put a star ⭐ on the official SINQ GitHub repository.

## Model Details

- Model Name: `Qwen3-1.7B-3bit-SINQ`
- Base Model: `Qwen/Qwen3-1.7B`
- Task: Text Generation
- Framework: PyTorch / Transformers
- License: Apache-2.0
- Quantized By: Huawei - Computing Systems Lab
- Quantization Method: SINQ (Sinkhorn-Normalized Quantization)
- Precision: INT3
- Group Size: 64
- Quantization Library: `sinq`

## Prerequisite

Before running the quantization script, make sure the SINQ library is installed. Installation instructions and setup details are available in the official SINQ GitHub repository.

## Usage example

You can load and use the model with our wrapper based on the 🤗 Transformers library. The quantized model was obtained with the SINQ quantization library.

> Reproducibility Note: This model was quantized using the SINQ implementation from commit `14ad847` of the SINQ repository.

If you find SINQ useful in your research or applications, please:

- Put a star ⭐ on the official SINQ GitHub repository.
- Cite our paper.
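The loading and quantization snippets referenced above live in the SINQ repository. As a minimal sketch only: the names below (`AutoSINQHFModel`, `BaseQuantizeConfig`, `quantize_model`, the `method` argument) are assumptions, not confirmed API, and may differ across SINQ versions — check the official repository before running.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
# Assumed import paths -- verify against the SINQ repository.
from sinq.patch_model import AutoSINQHFModel
from sinq.sinqlinear import BaseQuantizeConfig

base_model = "Qwen/Qwen3-1.7B"
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Settings from the card above: INT3 weights, group size 64, SINQ method.
quant_cfg = BaseQuantizeConfig(nbits=3, group_size=64, method="sinq")

# Quantize in place (argument names assumed; requires a CUDA device).
AutoSINQHFModel.quantize_model(
    model,
    tokenizer=tokenizer,
    quant_config=quant_cfg,
    compute_dtype=torch.bfloat16,
    device="cuda:0",
)
```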
Qwen3-14B-3bit-SINQ
This repository contains the official 3-bit quantized version of the `Qwen3-14B` model, produced with the SINQ (Sinkhorn-Normalized Quantization) method. SINQ is a novel, fast, high-quality quantization method designed to make any Large Language Model smaller while keeping its accuracy almost intact. To support the project, please put a star ⭐ on the official SINQ GitHub repository.

## Model Details

- Model Name: `Qwen3-14B-3bit-SINQ`
- Base Model: `Qwen/Qwen3-14B`
- Task: Text Generation
- Framework: PyTorch / Transformers
- License: Apache-2.0
- Quantized By: Huawei - Computing Systems Lab
- Quantization Method: SINQ (Sinkhorn-Normalized Quantization)
- Precision: INT3
- Group Size: 64
- Quantization Library: `sinq`

## Prerequisite

Before running the quantization script, make sure the SINQ library is installed. Installation instructions and setup details are available in the official SINQ GitHub repository.

## Usage example

You can load and use the model with our wrapper based on the 🤗 Transformers library. The quantized model was obtained with the SINQ quantization library.

> Reproducibility Note: This model was quantized using the SINQ implementation from commit `14ad847` of the SINQ repository.

If you find SINQ useful in your research or applications, please:

- Put a star ⭐ on the official SINQ GitHub repository.
- Cite our paper.
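The loading and quantization snippets referenced above live in the SINQ repository. As a minimal sketch only: the names below (`AutoSINQHFModel`, `BaseQuantizeConfig`, `quantize_model`, the `method` argument) are assumptions, not confirmed API, and may differ across SINQ versions — check the official repository before running.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
# Assumed import paths -- verify against the SINQ repository.
from sinq.patch_model import AutoSINQHFModel
from sinq.sinqlinear import BaseQuantizeConfig

base_model = "Qwen/Qwen3-14B"
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Settings from the card above: INT3 weights, group size 64, SINQ method.
quant_cfg = BaseQuantizeConfig(nbits=3, group_size=64, method="sinq")

# Quantize in place (argument names assumed; requires a CUDA device).
AutoSINQHFModel.quantize_model(
    model,
    tokenizer=tokenizer,
    quant_config=quant_cfg,
    compute_dtype=torch.bfloat16,
    device="cuda:0",
)
```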
Qwen3-Next-80B-A3B-Instruct-3bit-SINQ
Qwen3-1.7B-3bit-ASINQ
This repository contains the official 3-bit quantized version of the `Qwen3-1.7B` model, produced with A-SINQ, the calibrated variant of the SINQ (Sinkhorn-Normalized Quantization) method. SINQ is a novel, fast, high-quality quantization method designed to make any Large Language Model smaller while keeping its accuracy almost intact. To support the project, please put a star ⭐ on the official SINQ GitHub repository.

## Model Details

- Model Name: `Qwen3-1.7B-3bit-ASINQ`
- Base Model: `Qwen/Qwen3-1.7B`
- Task: Text Generation
- Framework: PyTorch / Transformers
- License: Apache-2.0
- Quantized By: Huawei - Computing Systems Lab
- Quantization Method: A-SINQ (calibrated Sinkhorn-Normalized Quantization)
- Precision: INT3
- Group Size: 64
- Quantization Library: `sinq`

## Prerequisite

Before running the quantization script, make sure the SINQ library is installed. Installation instructions and setup details are available in the official SINQ GitHub repository.

## Usage example

You can load and use the model with our wrapper based on the 🤗 Transformers library. The quantized model was obtained with the SINQ quantization library.

> Reproducibility Note: This model was quantized using the SINQ implementation from commit `14ad847` of the SINQ repository.

If you find SINQ useful in your research or applications, please:

- Put a star ⭐ on the official SINQ GitHub repository.
- Cite our paper.
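The quantization steps referenced above live in the SINQ repository. As a minimal sketch only: the names below (`AutoSINQHFModel`, `BaseQuantizeConfig`, `quantize_model`, the `method` value) are assumptions, not confirmed API, and the calibrated A-SINQ variant additionally needs a calibration set not shown here — check the official repository before running.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
# Assumed import paths -- verify against the SINQ repository.
from sinq.patch_model import AutoSINQHFModel
from sinq.sinqlinear import BaseQuantizeConfig

base_model = "Qwen/Qwen3-1.7B"
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Settings from the card above: INT3 weights, group size 64, calibrated A-SINQ.
quant_cfg = BaseQuantizeConfig(nbits=3, group_size=64, method="asinq")

# Quantize in place (argument names assumed; requires a CUDA device).
AutoSINQHFModel.quantize_model(
    model,
    tokenizer=tokenizer,
    quant_config=quant_cfg,
    compute_dtype=torch.bfloat16,
    device="cuda:0",
)
```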
Qwen3-1.7B-4bit-ASINQ
This repository contains the official 4-bit quantized version of the `Qwen3-1.7B` model, produced with A-SINQ, the calibrated variant of the SINQ (Sinkhorn-Normalized Quantization) method. SINQ is a novel, fast, high-quality quantization method designed to make any Large Language Model smaller while keeping its accuracy almost intact. To support the project, please put a star ⭐ on the official SINQ GitHub repository.

## Model Details

- Model Name: `Qwen3-1.7B-4bit-ASINQ`
- Base Model: `Qwen/Qwen3-1.7B`
- Task: Text Generation
- Framework: PyTorch / Transformers
- License: Apache-2.0
- Quantized By: Huawei - Computing Systems Lab
- Quantization Method: A-SINQ (calibrated Sinkhorn-Normalized Quantization)
- Precision: INT4
- Group Size: 64
- Quantization Library: `sinq`

## Prerequisite

Before running the quantization script, make sure the SINQ library is installed. Installation instructions and setup details are available in the official SINQ GitHub repository.

## Usage example

You can load and use the model with our wrapper based on the 🤗 Transformers library. The quantized model was obtained with the SINQ quantization library.

> Reproducibility Note: This model was quantized using the SINQ implementation from commit `14ad847` of the SINQ repository.

If you find SINQ useful in your research or applications, please:

- Put a star ⭐ on the official SINQ GitHub repository.
- Cite our paper.
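The quantization steps referenced above live in the SINQ repository. As a minimal sketch only: the names below (`AutoSINQHFModel`, `BaseQuantizeConfig`, `quantize_model`, the `method` value) are assumptions, not confirmed API, and the calibrated A-SINQ variant additionally needs a calibration set not shown here — check the official repository before running.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
# Assumed import paths -- verify against the SINQ repository.
from sinq.patch_model import AutoSINQHFModel
from sinq.sinqlinear import BaseQuantizeConfig

base_model = "Qwen/Qwen3-1.7B"
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Settings from the card above: INT4 weights, group size 64, calibrated A-SINQ.
quant_cfg = BaseQuantizeConfig(nbits=4, group_size=64, method="asinq")

# Quantize in place (argument names assumed; requires a CUDA device).
AutoSINQHFModel.quantize_model(
    model,
    tokenizer=tokenizer,
    quant_config=quant_cfg,
    compute_dtype=torch.bfloat16,
    device="cuda:0",
)
```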
Qwen3-Next-80B-A3B-Instruct-4bit-SINQ
Qwen3-32B-4bit-ASINQ
This repository contains the official 4-bit quantized version of the `Qwen3-32B` model, produced with A-SINQ, the calibrated variant of the SINQ (Sinkhorn-Normalized Quantization) method. SINQ is a novel, fast, high-quality quantization method designed to make any Large Language Model smaller while keeping its accuracy almost intact. To support the project, please put a star ⭐ on the official SINQ GitHub repository.

## Model Details

- Model Name: `Qwen3-32B-4bit-ASINQ`
- Base Model: `Qwen/Qwen3-32B`
- Task: Text Generation
- Framework: PyTorch / Transformers
- License: Apache-2.0
- Quantized By: Huawei - Computing Systems Lab
- Quantization Method: A-SINQ (calibrated Sinkhorn-Normalized Quantization)
- Precision: INT4
- Group Size: 64
- Quantization Library: `sinq`

## Prerequisite

Before running the quantization script, make sure the SINQ library is installed. Installation instructions and setup details are available in the official SINQ GitHub repository.

## Usage example

You can load and use the model with our wrapper based on the 🤗 Transformers library. The quantized model was obtained with the SINQ quantization library.

> Reproducibility Note: This model was quantized using the SINQ implementation from commit `14ad847` of the SINQ repository.

If you find SINQ useful in your research or applications, please:

- Put a star ⭐ on the official SINQ GitHub repository.
- Cite our paper.
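The quantization steps referenced above live in the SINQ repository. As a minimal sketch only: the names below (`AutoSINQHFModel`, `BaseQuantizeConfig`, `quantize_model`, the `method` value) are assumptions, not confirmed API, and the calibrated A-SINQ variant additionally needs a calibration set not shown here — check the official repository before running.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
# Assumed import paths -- verify against the SINQ repository.
from sinq.patch_model import AutoSINQHFModel
from sinq.sinqlinear import BaseQuantizeConfig

base_model = "Qwen/Qwen3-32B"
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Settings from the card above: INT4 weights, group size 64, calibrated A-SINQ.
quant_cfg = BaseQuantizeConfig(nbits=4, group_size=64, method="asinq")

# Quantize in place (argument names assumed; requires a CUDA device).
AutoSINQHFModel.quantize_model(
    model,
    tokenizer=tokenizer,
    quant_config=quant_cfg,
    compute_dtype=torch.bfloat16,
    device="cuda:0",
)
```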