huawei-csl
Kimi-Linear-48B-A3B-Instruct-4bit-SINQ
Qwen3-14B-4bit-SINQ
This repository contains the official 4-bit quantized version of the `Qwen3-14B` model, produced with the SINQ (Sinkhorn-Normalized Quantization) method. SINQ is a novel, fast, high-quality quantization method designed to make any Large Language Model smaller while keeping its accuracy almost intact. To support the project, please put a star ⭐ on the official SINQ GitHub repository.

## Model Details

- Model Name: `Qwen3-14B-4bit-SINQ`
- Base Model: `Qwen/Qwen3-14B`
- Task: Text Generation
- Framework: PyTorch / Transformers
- License: Apache-2.0
- Quantized By: Huawei - Computing Systems Lab
- Quantization Method: SINQ (Sinkhorn-Normalized Quantization)
- Precision: INT4
- Group Size: 64
- Quantization Library: `sinq`

## Prerequisite

Before running the quantization script, make sure the SINQ library is installed. Installation instructions and setup details are available in the official SINQ GitHub repository.

## Usage example

You can load and use the model with our wrapper based on the 🤗 Transformers library. The quantized model was obtained with the SINQ quantization library.

> Reproducibility Note: This model was quantized using the SINQ implementation from commit `14ad847` of the SINQ repository.

If you find SINQ useful in your research or applications, please:

- Put a star ⭐ on the official SINQ GitHub repository.
- Cite our paper.
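The loading and quantization snippets referenced above live in the SINQ repository. As a minimal sketch only: the import paths and names below (`AutoSINQHFModel`, `BaseQuantizeConfig`, `quantize_model`, the `method` argument) are assumptions, not confirmed API, and may differ across SINQ versions — check the official repository before running.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
# Assumed import paths -- verify against the SINQ repository.
from sinq.patch_model import AutoSINQHFModel
from sinq.sinqlinear import BaseQuantizeConfig

base_model = "Qwen/Qwen3-14B"
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Settings from the card above: INT4 weights, group size 64, SINQ method.
quant_cfg = BaseQuantizeConfig(nbits=4, group_size=64, method="sinq")

# Quantize in place (argument names assumed; requires a CUDA device).
AutoSINQHFModel.quantize_model(
    model,
    tokenizer=tokenizer,
    quant_config=quant_cfg,
    compute_dtype=torch.bfloat16,
    device="cuda:0",
)
```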
Qwen3-4B-PreSINQ-GGUF
Qwen3-14B-3bit-ASINQ
This repository contains the official 3-bit quantized version of the `Qwen3-14B` model, produced with A-SINQ, the calibrated variant of the SINQ (Sinkhorn-Normalized Quantization) method. SINQ is a novel, fast, high-quality quantization method designed to make any Large Language Model smaller while keeping its accuracy almost intact. To support the project, please put a star ⭐ on the official SINQ GitHub repository.

## Model Details

- Model Name: `Qwen3-14B-3bit-ASINQ`
- Base Model: `Qwen/Qwen3-14B`
- Task: Text Generation
- Framework: PyTorch / Transformers
- License: Apache-2.0
- Quantized By: Huawei - Computing Systems Lab
- Quantization Method: A-SINQ (calibrated Sinkhorn-Normalized Quantization)
- Precision: INT3
- Group Size: 64
- Quantization Library: `sinq`

## Prerequisite

Before running the quantization script, make sure the SINQ library is installed. Installation instructions and setup details are available in the official SINQ GitHub repository.

## Usage example

You can load and use the model with our wrapper based on the 🤗 Transformers library. The quantized model was obtained with the SINQ quantization library.

> Reproducibility Note: This model was quantized using the SINQ implementation from commit `14ad847` of the SINQ repository.

If you find SINQ useful in your research or applications, please:

- Put a star ⭐ on the official SINQ GitHub repository.
- Cite our paper.
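The quantization steps referenced above live in the SINQ repository. As a minimal sketch only: the names below (`AutoSINQHFModel`, `BaseQuantizeConfig`, `quantize_model`, the `method` value) are assumptions, not confirmed API, and the calibrated A-SINQ variant additionally needs a calibration set not shown here — check the official repository before running.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
# Assumed import paths -- verify against the SINQ repository.
from sinq.patch_model import AutoSINQHFModel
from sinq.sinqlinear import BaseQuantizeConfig

base_model = "Qwen/Qwen3-14B"
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Settings from the card above: INT3 weights, group size 64, calibrated A-SINQ.
quant_cfg = BaseQuantizeConfig(nbits=3, group_size=64, method="asinq")

# Quantize in place (argument names assumed; requires a CUDA device).
AutoSINQHFModel.quantize_model(
    model,
    tokenizer=tokenizer,
    quant_config=quant_cfg,
    compute_dtype=torch.bfloat16,
    device="cuda:0",
)
```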
Qwen3-32B-3bit-ASINQ
This repository contains the official 3-bit quantized version of the `Qwen3-32B` model, produced with A-SINQ, the calibrated variant of the SINQ (Sinkhorn-Normalized Quantization) method. SINQ is a novel, fast, high-quality quantization method designed to make any Large Language Model smaller while keeping its accuracy almost intact. To support the project, please put a star ⭐ on the official SINQ GitHub repository.

## Model Details

- Model Name: `Qwen3-32B-3bit-ASINQ`
- Base Model: `Qwen/Qwen3-32B`
- Task: Text Generation
- Framework: PyTorch / Transformers
- License: Apache-2.0
- Quantized By: Huawei - Computing Systems Lab
- Quantization Method: A-SINQ (calibrated Sinkhorn-Normalized Quantization)
- Precision: INT3
- Group Size: 64
- Quantization Library: `sinq`

## Prerequisite

Before running the quantization script, make sure the SINQ library is installed. Installation instructions and setup details are available in the official SINQ GitHub repository.

## Usage example

You can load and use the model with our wrapper based on the 🤗 Transformers library. The quantized model was obtained with the SINQ quantization library.

> Reproducibility Note: This model was quantized using the SINQ implementation from commit `14ad847` of the SINQ repository.

If you find SINQ useful in your research or applications, please:

- Put a star ⭐ on the official SINQ GitHub repository.
- Cite our paper.
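The quantization steps referenced above live in the SINQ repository. As a minimal sketch only: the names below (`AutoSINQHFModel`, `BaseQuantizeConfig`, `quantize_model`, the `method` value) are assumptions, not confirmed API, and the calibrated A-SINQ variant additionally needs a calibration set not shown here — check the official repository before running.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
# Assumed import paths -- verify against the SINQ repository.
from sinq.patch_model import AutoSINQHFModel
from sinq.sinqlinear import BaseQuantizeConfig

base_model = "Qwen/Qwen3-32B"
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Settings from the card above: INT3 weights, group size 64, calibrated A-SINQ.
quant_cfg = BaseQuantizeConfig(nbits=3, group_size=64, method="asinq")

# Quantize in place (argument names assumed; requires a CUDA device).
AutoSINQHFModel.quantize_model(
    model,
    tokenizer=tokenizer,
    quant_config=quant_cfg,
    compute_dtype=torch.bfloat16,
    device="cuda:0",
)
```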
Kimi-Linear-48B-A3B-Instruct-3bit-SINQ
Qwen3-1.7B-PreSINQ-GGUF
Apertus-8B-2509-4bit-ASINQ
Qwen3-32B-4bit-SINQ
This repository contains the official 4-bit quantized version of the `Qwen3-32B` model, produced with the SINQ (Sinkhorn-Normalized Quantization) method. SINQ is a novel, fast, high-quality quantization method designed to make any Large Language Model smaller while keeping its accuracy almost intact. To support the project, please put a star ⭐ on the official SINQ GitHub repository.

## Model Details

- Model Name: `Qwen3-32B-4bit-SINQ`
- Base Model: `Qwen/Qwen3-32B`
- Task: Text Generation
- Framework: PyTorch / Transformers
- License: Apache-2.0
- Quantized By: Huawei - Computing Systems Lab
- Quantization Method: SINQ (Sinkhorn-Normalized Quantization)
- Precision: INT4
- Group Size: 64
- Quantization Library: `sinq`

## Prerequisite

Before running the quantization script, make sure the SINQ library is installed. Installation instructions and setup details are available in the official SINQ GitHub repository.

## Usage example

You can load and use the model with our wrapper based on the 🤗 Transformers library. The quantized model was obtained with the SINQ quantization library.

> Reproducibility Note: This model was quantized using the SINQ implementation from commit `14ad847` of the SINQ repository.

If you find SINQ useful in your research or applications, please:

- Put a star ⭐ on the official SINQ GitHub repository.
- Cite our paper.
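The loading and quantization snippets referenced above live in the SINQ repository. As a minimal sketch only: the names below (`AutoSINQHFModel`, `BaseQuantizeConfig`, `quantize_model`, the `method` argument) are assumptions, not confirmed API, and may differ across SINQ versions — check the official repository before running.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
# Assumed import paths -- verify against the SINQ repository.
from sinq.patch_model import AutoSINQHFModel
from sinq.sinqlinear import BaseQuantizeConfig

base_model = "Qwen/Qwen3-32B"
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Settings from the card above: INT4 weights, group size 64, SINQ method.
quant_cfg = BaseQuantizeConfig(nbits=4, group_size=64, method="sinq")

# Quantize in place (argument names assumed; requires a CUDA device).
AutoSINQHFModel.quantize_model(
    model,
    tokenizer=tokenizer,
    quant_config=quant_cfg,
    compute_dtype=torch.bfloat16,
    device="cuda:0",
)
```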
Qwen3-0.6B-PreSINQ-GGUF
Qwen3-14B-4bit-ASINQ
This repository contains the official 4-bit quantized version of the `Qwen3-14B` model, produced with A-SINQ, the calibrated variant of the SINQ (Sinkhorn-Normalized Quantization) method. SINQ is a novel, fast, high-quality quantization method designed to make any Large Language Model smaller while keeping its accuracy almost intact. To support the project, please put a star ⭐ on the official SINQ GitHub repository.

## Model Details

- Model Name: `Qwen3-14B-4bit-ASINQ`
- Base Model: `Qwen/Qwen3-14B`
- Task: Text Generation
- Framework: PyTorch / Transformers
- License: Apache-2.0
- Quantized By: Huawei - Computing Systems Lab
- Quantization Method: A-SINQ (calibrated Sinkhorn-Normalized Quantization)
- Precision: INT4
- Group Size: 64
- Quantization Library: `sinq`

## Prerequisite

Before running the quantization script, make sure the SINQ library is installed. Installation instructions and setup details are available in the official SINQ GitHub repository.

## Usage example

You can load and use the model with our wrapper based on the 🤗 Transformers library. The quantized model was obtained with the SINQ quantization library.

> Reproducibility Note: This model was quantized using the SINQ implementation from commit `14ad847` of the SINQ repository.

If you find SINQ useful in your research or applications, please:

- Put a star ⭐ on the official SINQ GitHub repository.
- Cite our paper.
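The quantization steps referenced above live in the SINQ repository. As a minimal sketch only: the names below (`AutoSINQHFModel`, `BaseQuantizeConfig`, `quantize_model`, the `method` value) are assumptions, not confirmed API, and the calibrated A-SINQ variant additionally needs a calibration set not shown here — check the official repository before running.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
# Assumed import paths -- verify against the SINQ repository.
from sinq.patch_model import AutoSINQHFModel
from sinq.sinqlinear import BaseQuantizeConfig

base_model = "Qwen/Qwen3-14B"
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Settings from the card above: INT4 weights, group size 64, calibrated A-SINQ.
quant_cfg = BaseQuantizeConfig(nbits=4, group_size=64, method="asinq")

# Quantize in place (argument names assumed; requires a CUDA device).
AutoSINQHFModel.quantize_model(
    model,
    tokenizer=tokenizer,
    quant_config=quant_cfg,
    compute_dtype=torch.bfloat16,
    device="cuda:0",
)
```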
Qwen3-32B-3bit-SINQ
This repository contains the official 3-bit quantized version of the `Qwen3-32B` model, produced with the SINQ (Sinkhorn-Normalized Quantization) method. SINQ is a novel, fast, high-quality quantization method designed to make any Large Language Model smaller while keeping its accuracy almost intact. To support the project, please put a star ⭐ on the official SINQ GitHub repository.

## Model Details

- Model Name: `Qwen3-32B-3bit-SINQ`
- Base Model: `Qwen/Qwen3-32B`
- Task: Text Generation
- Framework: PyTorch / Transformers
- License: Apache-2.0
- Quantized By: Huawei - Computing Systems Lab
- Quantization Method: SINQ (Sinkhorn-Normalized Quantization)
- Precision: INT3
- Group Size: 64
- Quantization Library: `sinq`

## Prerequisite

Before running the quantization script, make sure the SINQ library is installed. Installation instructions and setup details are available in the official SINQ GitHub repository.

## Usage example

You can load and use the model with our wrapper based on the 🤗 Transformers library. The quantized model was obtained with the SINQ quantization library.

> Reproducibility Note: This model was quantized using the SINQ implementation from commit `14ad847` of the SINQ repository.

If you find SINQ useful in your research or applications, please:

- Put a star ⭐ on the official SINQ GitHub repository.
- Cite our paper.
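The loading and quantization snippets referenced above live in the SINQ repository. As a minimal sketch only: the names below (`AutoSINQHFModel`, `BaseQuantizeConfig`, `quantize_model`, the `method` argument) are assumptions, not confirmed API, and may differ across SINQ versions — check the official repository before running.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
# Assumed import paths -- verify against the SINQ repository.
from sinq.patch_model import AutoSINQHFModel
from sinq.sinqlinear import BaseQuantizeConfig

base_model = "Qwen/Qwen3-32B"
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Settings from the card above: INT3 weights, group size 64, SINQ method.
quant_cfg = BaseQuantizeConfig(nbits=3, group_size=64, method="sinq")

# Quantize in place (argument names assumed; requires a CUDA device).
AutoSINQHFModel.quantize_model(
    model,
    tokenizer=tokenizer,
    quant_config=quant_cfg,
    compute_dtype=torch.bfloat16,
    device="cuda:0",
)
```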
Qwen3-1.7B-4bit-SINQ
This repository contains the official 4-bit quantized version of the `Qwen3-1.7B` model, produced with the SINQ (Sinkhorn-Normalized Quantization) method. SINQ is a novel, fast, high-quality quantization method designed to make any Large Language Model smaller while keeping its accuracy almost intact. To support the project, please put a star ⭐ on the official SINQ GitHub repository.

## Model Details

- Model Name: `Qwen3-1.7B-4bit-SINQ`
- Base Model: `Qwen/Qwen3-1.7B`
- Task: Text Generation
- Framework: PyTorch / Transformers
- License: Apache-2.0
- Quantized By: Huawei - Computing Systems Lab
- Quantization Method: SINQ (Sinkhorn-Normalized Quantization)
- Precision: INT4
- Group Size: 64
- Quantization Library: `sinq`

## Prerequisite

Before running the quantization script, make sure the SINQ library is installed. Installation instructions and setup details are available in the official SINQ GitHub repository.

## Usage example

You can load and use the model with our wrapper based on the 🤗 Transformers library. The quantized model was obtained with the SINQ quantization library.

> Reproducibility Note: This model was quantized using the SINQ implementation from commit `14ad847` of the SINQ repository.

If you find SINQ useful in your research or applications, please:

- Put a star ⭐ on the official SINQ GitHub repository.
- Cite our paper.
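The loading and quantization snippets referenced above live in the SINQ repository. As a minimal sketch only: the names below (`AutoSINQHFModel`, `BaseQuantizeConfig`, `quantize_model`, the `method` argument) are assumptions, not confirmed API, and may differ across SINQ versions — check the official repository before running.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
# Assumed import paths -- verify against the SINQ repository.
from sinq.patch_model import AutoSINQHFModel
from sinq.sinqlinear import BaseQuantizeConfig

base_model = "Qwen/Qwen3-1.7B"
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Settings from the card above: INT4 weights, group size 64, SINQ method.
quant_cfg = BaseQuantizeConfig(nbits=4, group_size=64, method="sinq")

# Quantize in place (argument names assumed; requires a CUDA device).
AutoSINQHFModel.quantize_model(
    model,
    tokenizer=tokenizer,
    quant_config=quant_cfg,
    compute_dtype=torch.bfloat16,
    device="cuda:0",
)
```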
Apertus-8B-2509-4bit-SINQ
Qwen3-1.7B-3bit-SINQ
This repository contains the official 3-bit quantized version of the `Qwen3-1.7B` model, produced with the SINQ (Sinkhorn-Normalized Quantization) method. SINQ is a novel, fast, high-quality quantization method designed to make any Large Language Model smaller while keeping its accuracy almost intact. To support the project, please put a star ⭐ on the official SINQ GitHub repository.

## Model Details

- Model Name: `Qwen3-1.7B-3bit-SINQ`
- Base Model: `Qwen/Qwen3-1.7B`
- Task: Text Generation
- Framework: PyTorch / Transformers
- License: Apache-2.0
- Quantized By: Huawei - Computing Systems Lab
- Quantization Method: SINQ (Sinkhorn-Normalized Quantization)
- Precision: INT3
- Group Size: 64
- Quantization Library: `sinq`

## Prerequisite

Before running the quantization script, make sure the SINQ library is installed. Installation instructions and setup details are available in the official SINQ GitHub repository.

## Usage example

You can load and use the model with our wrapper based on the 🤗 Transformers library. The quantized model was obtained with the SINQ quantization library.

> Reproducibility Note: This model was quantized using the SINQ implementation from commit `14ad847` of the SINQ repository.

If you find SINQ useful in your research or applications, please:

- Put a star ⭐ on the official SINQ GitHub repository.
- Cite our paper.
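The loading and quantization snippets referenced above live in the SINQ repository. As a minimal sketch only: the names below (`AutoSINQHFModel`, `BaseQuantizeConfig`, `quantize_model`, the `method` argument) are assumptions, not confirmed API, and may differ across SINQ versions — check the official repository before running.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
# Assumed import paths -- verify against the SINQ repository.
from sinq.patch_model import AutoSINQHFModel
from sinq.sinqlinear import BaseQuantizeConfig

base_model = "Qwen/Qwen3-1.7B"
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Settings from the card above: INT3 weights, group size 64, SINQ method.
quant_cfg = BaseQuantizeConfig(nbits=3, group_size=64, method="sinq")

# Quantize in place (argument names assumed; requires a CUDA device).
AutoSINQHFModel.quantize_model(
    model,
    tokenizer=tokenizer,
    quant_config=quant_cfg,
    compute_dtype=torch.bfloat16,
    device="cuda:0",
)
```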
Qwen3-14B-3bit-SINQ
This repository contains the official 3-bit quantized version of the `Qwen3-14B` model, produced with the SINQ (Sinkhorn-Normalized Quantization) method. SINQ is a novel, fast, high-quality quantization method designed to make any Large Language Model smaller while keeping its accuracy almost intact. To support the project, please put a star ⭐ on the official SINQ GitHub repository.

## Model Details

- Model Name: `Qwen3-14B-3bit-SINQ`
- Base Model: `Qwen/Qwen3-14B`
- Task: Text Generation
- Framework: PyTorch / Transformers
- License: Apache-2.0
- Quantized By: Huawei - Computing Systems Lab
- Quantization Method: SINQ (Sinkhorn-Normalized Quantization)
- Precision: INT3
- Group Size: 64
- Quantization Library: `sinq`

## Prerequisite

Before running the quantization script, make sure the SINQ library is installed. Installation instructions and setup details are available in the official SINQ GitHub repository.

## Usage example

You can load and use the model with our wrapper based on the 🤗 Transformers library. The quantized model was obtained with the SINQ quantization library.

> Reproducibility Note: This model was quantized using the SINQ implementation from commit `14ad847` of the SINQ repository.

If you find SINQ useful in your research or applications, please:

- Put a star ⭐ on the official SINQ GitHub repository.
- Cite our paper.
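The loading and quantization snippets referenced above live in the SINQ repository. As a minimal sketch only: the names below (`AutoSINQHFModel`, `BaseQuantizeConfig`, `quantize_model`, the `method` argument) are assumptions, not confirmed API, and may differ across SINQ versions — check the official repository before running.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
# Assumed import paths -- verify against the SINQ repository.
from sinq.patch_model import AutoSINQHFModel
from sinq.sinqlinear import BaseQuantizeConfig

base_model = "Qwen/Qwen3-14B"
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Settings from the card above: INT3 weights, group size 64, SINQ method.
quant_cfg = BaseQuantizeConfig(nbits=3, group_size=64, method="sinq")

# Quantize in place (argument names assumed; requires a CUDA device).
AutoSINQHFModel.quantize_model(
    model,
    tokenizer=tokenizer,
    quant_config=quant_cfg,
    compute_dtype=torch.bfloat16,
    device="cuda:0",
)
```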
Qwen3-Next-80B-A3B-Instruct-3bit-SINQ
Qwen3-1.7B-3bit-ASINQ
This repository contains the official 3-bit quantized version of the `Qwen3-1.7B` model, produced with A-SINQ, the calibrated variant of the SINQ (Sinkhorn-Normalized Quantization) method. SINQ is a novel, fast, high-quality quantization method designed to make any Large Language Model smaller while keeping its accuracy almost intact. To support the project, please put a star ⭐ on the official SINQ GitHub repository.

## Model Details

- Model Name: `Qwen3-1.7B-3bit-ASINQ`
- Base Model: `Qwen/Qwen3-1.7B`
- Task: Text Generation
- Framework: PyTorch / Transformers
- License: Apache-2.0
- Quantized By: Huawei - Computing Systems Lab
- Quantization Method: A-SINQ (calibrated Sinkhorn-Normalized Quantization)
- Precision: INT3
- Group Size: 64
- Quantization Library: `sinq`

## Prerequisite

Before running the quantization script, make sure the SINQ library is installed. Installation instructions and setup details are available in the official SINQ GitHub repository.

## Usage example

You can load and use the model with our wrapper based on the 🤗 Transformers library. The quantized model was obtained with the SINQ quantization library.

> Reproducibility Note: This model was quantized using the SINQ implementation from commit `14ad847` of the SINQ repository.

If you find SINQ useful in your research or applications, please:

- Put a star ⭐ on the official SINQ GitHub repository.
- Cite our paper.
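The quantization steps referenced above live in the SINQ repository. As a minimal sketch only: the names below (`AutoSINQHFModel`, `BaseQuantizeConfig`, `quantize_model`, the `method` value) are assumptions, not confirmed API, and the calibrated A-SINQ variant additionally needs a calibration set not shown here — check the official repository before running.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
# Assumed import paths -- verify against the SINQ repository.
from sinq.patch_model import AutoSINQHFModel
from sinq.sinqlinear import BaseQuantizeConfig

base_model = "Qwen/Qwen3-1.7B"
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Settings from the card above: INT3 weights, group size 64, calibrated A-SINQ.
quant_cfg = BaseQuantizeConfig(nbits=3, group_size=64, method="asinq")

# Quantize in place (argument names assumed; requires a CUDA device).
AutoSINQHFModel.quantize_model(
    model,
    tokenizer=tokenizer,
    quant_config=quant_cfg,
    compute_dtype=torch.bfloat16,
    device="cuda:0",
)
```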
Qwen3-1.7B-4bit-ASINQ
This repository contains the official 4-bit quantized version of the `Qwen3-1.7B` model, produced with A-SINQ, the calibrated variant of the SINQ (Sinkhorn-Normalized Quantization) method. SINQ is a novel, fast, high-quality quantization method designed to make any Large Language Model smaller while keeping its accuracy almost intact. To support the project, please put a star ⭐ on the official SINQ GitHub repository.

## Model Details

- Model Name: `Qwen3-1.7B-4bit-ASINQ`
- Base Model: `Qwen/Qwen3-1.7B`
- Task: Text Generation
- Framework: PyTorch / Transformers
- License: Apache-2.0
- Quantized By: Huawei - Computing Systems Lab
- Quantization Method: A-SINQ (calibrated Sinkhorn-Normalized Quantization)
- Precision: INT4
- Group Size: 64
- Quantization Library: `sinq`

## Prerequisite

Before running the quantization script, make sure the SINQ library is installed. Installation instructions and setup details are available in the official SINQ GitHub repository.

## Usage example

You can load and use the model with our wrapper based on the 🤗 Transformers library. The quantized model was obtained with the SINQ quantization library.

> Reproducibility Note: This model was quantized using the SINQ implementation from commit `14ad847` of the SINQ repository.

If you find SINQ useful in your research or applications, please:

- Put a star ⭐ on the official SINQ GitHub repository.
- Cite our paper.
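The quantization steps referenced above live in the SINQ repository. As a minimal sketch only: the names below (`AutoSINQHFModel`, `BaseQuantizeConfig`, `quantize_model`, the `method` value) are assumptions, not confirmed API, and the calibrated A-SINQ variant additionally needs a calibration set not shown here — check the official repository before running.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
# Assumed import paths -- verify against the SINQ repository.
from sinq.patch_model import AutoSINQHFModel
from sinq.sinqlinear import BaseQuantizeConfig

base_model = "Qwen/Qwen3-1.7B"
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Settings from the card above: INT4 weights, group size 64, calibrated A-SINQ.
quant_cfg = BaseQuantizeConfig(nbits=4, group_size=64, method="asinq")

# Quantize in place (argument names assumed; requires a CUDA device).
AutoSINQHFModel.quantize_model(
    model,
    tokenizer=tokenizer,
    quant_config=quant_cfg,
    compute_dtype=torch.bfloat16,
    device="cuda:0",
)
```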
Qwen3-Next-80B-A3B-Instruct-4bit-SINQ
Qwen3-32B-4bit-ASINQ
This repository contains the official 4-bit quantized version of the `Qwen3-32B` model, produced with A-SINQ, the calibrated variant of the SINQ (Sinkhorn-Normalized Quantization) method. SINQ is a novel, fast, high-quality quantization method designed to make any Large Language Model smaller while keeping its accuracy almost intact. To support the project, please put a star ⭐ on the official SINQ GitHub repository.

## Model Details

- Model Name: `Qwen3-32B-4bit-ASINQ`
- Base Model: `Qwen/Qwen3-32B`
- Task: Text Generation
- Framework: PyTorch / Transformers
- License: Apache-2.0
- Quantized By: Huawei - Computing Systems Lab
- Quantization Method: A-SINQ (calibrated Sinkhorn-Normalized Quantization)
- Precision: INT4
- Group Size: 64
- Quantization Library: `sinq`

## Prerequisite

Before running the quantization script, make sure the SINQ library is installed. Installation instructions and setup details are available in the official SINQ GitHub repository.

## Usage example

You can load and use the model with our wrapper based on the 🤗 Transformers library. The quantized model was obtained with the SINQ quantization library.

> Reproducibility Note: This model was quantized using the SINQ implementation from commit `14ad847` of the SINQ repository.

If you find SINQ useful in your research or applications, please:

- Put a star ⭐ on the official SINQ GitHub repository.
- Cite our paper.
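The quantization steps referenced above live in the SINQ repository. As a minimal sketch only: the names below (`AutoSINQHFModel`, `BaseQuantizeConfig`, `quantize_model`, the `method` value) are assumptions, not confirmed API, and the calibrated A-SINQ variant additionally needs a calibration set not shown here — check the official repository before running.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
# Assumed import paths -- verify against the SINQ repository.
from sinq.patch_model import AutoSINQHFModel
from sinq.sinqlinear import BaseQuantizeConfig

base_model = "Qwen/Qwen3-32B"
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Settings from the card above: INT4 weights, group size 64, calibrated A-SINQ.
quant_cfg = BaseQuantizeConfig(nbits=4, group_size=64, method="asinq")

# Quantize in place (argument names assumed; requires a CUDA device).
AutoSINQHFModel.quantize_model(
    model,
    tokenizer=tokenizer,
    quant_config=quant_cfg,
    compute_dtype=torch.bfloat16,
    device="cuda:0",
)
```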