huawei-csl

23 models

Kimi-Linear-48B-A3B-Instruct-4bit-SINQ

license:apache-2.0
26
3

Qwen3-14B-4bit-SINQ

This repository contains the official 4-bit quantized version of the `Qwen3-14B` model, produced with the SINQ (Sinkhorn-Normalized Quantization) method. SINQ is a novel, fast, and high-quality quantization method designed to make any Large Language Model smaller while keeping its accuracy almost intact. To support the project, please put a star ⭐ on the official SINQ GitHub repository.

Model Details
- Model Name: `Qwen3-14B-4bit-SINQ`
- Base Model: `Qwen/Qwen3-14B`
- Task: Text Generation
- Framework: PyTorch / Transformers
- License: Apache-2.0
- Quantized By: Huawei - Computing Systems Lab
- Quantization Method: SINQ (Sinkhorn-Normalized Quantization)
- Precision: INT4
- Group Size: 64
- Quantization Library: `sinq`

Prerequisite
Before running the quantization script, make sure the SINQ library is installed. Installation instructions and setup details are available in the official SINQ GitHub repository.

Usage example
You can load and use the model with our wrapper based on the 🤗 Transformers library. The quantized model was obtained with the SINQ quantization library.

> Reproducibility Note: This model was quantized using the SINQ implementation from commit `14ad847` of the SINQ repository.

If you find SINQ useful in your research or applications, please
- put a star ⭐ on the official SINQ GitHub repository, and
- cite our paper.
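The loading snippet referenced on the card was stripped during extraction. Below is a minimal sketch of what loading might look like, assuming the `sinq` package exposes an `AutoSINQHFModel` wrapper with a `from_quantized` loader; both names are assumptions based on common quantization-wrapper conventions, so consult the official SINQ GitHub repository for the actual API.

```python
# Hypothetical loading sketch for the SINQ-quantized checkpoint.
# `AutoSINQHFModel` and `from_quantized` are assumed names; see the
# official SINQ GitHub repository for the real API.

# Quantization settings as stated on the model card.
QUANT_SETTINGS = {"nbits": 4, "group_size": 64, "method": "sinq"}

REPO_ID = "huawei-csl/Qwen3-14B-4bit-SINQ"  # assumed Hub repository id

def load_quantized(device: str = "cuda:0"):
    """Load tokenizer and SINQ-quantized model (needs a CUDA GPU)."""
    import torch
    from transformers import AutoTokenizer
    from sinq.patch_model import AutoSINQHFModel  # assumed module path

    tokenizer = AutoTokenizer.from_pretrained(REPO_ID)
    model = AutoSINQHFModel.from_quantized(  # assumed loader name
        REPO_ID, device=device, compute_dtype=torch.bfloat16
    )
    return tokenizer, model
```

On a CUDA machine, `tokenizer, model = load_quantized()` would then hand back objects usable with the standard Transformers generation API.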

license:apache-2.0
17
5

Qwen3-4B-PreSINQ-GGUF

license:apache-2.0
17
0

Qwen3-14B-3bit-ASINQ

This repository contains the official 3-bit quantized version of the `Qwen3-14B` model, produced with A-SINQ, the calibrated version of the SINQ (Sinkhorn-Normalized Quantization) method. SINQ is a novel, fast, and high-quality quantization method designed to make any Large Language Model smaller while keeping its accuracy almost intact. To support the project, please put a star ⭐ on the official SINQ GitHub repository.

Model Details
- Model Name: `Qwen3-14B-3bit-ASINQ`
- Base Model: `Qwen/Qwen3-14B`
- Task: Text Generation
- Framework: PyTorch / Transformers
- License: Apache-2.0
- Quantized By: Huawei - Computing Systems Lab
- Quantization Method: A-SINQ (calibrated Sinkhorn-Normalized Quantization)
- Precision: INT3
- Group Size: 64
- Quantization Library: `sinq`

Prerequisite
Before running the quantization script, make sure the SINQ library is installed. Installation instructions and setup details are available in the official SINQ GitHub repository.

Usage example
You can load and use the model with our wrapper based on the 🤗 Transformers library. The quantized model was obtained with the SINQ quantization library.

> Reproducibility Note: This model was quantized using the SINQ implementation from commit `14ad847` of the SINQ repository.

If you find SINQ useful in your research or applications, please
- put a star ⭐ on the official SINQ GitHub repository, and
- cite our paper.
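The card lists the precision (INT3) and group size (64) but not the resulting weight footprint. A back-of-envelope estimate is easy to compute, assuming one FP16 scale and one FP16 zero point per 64-weight group; the actual SINQ metadata layout, with its Sinkhorn-normalized dual scales, may differ, so treat this as a rough upper-level sketch rather than the library's exact numbers.

```python
def estimate_weight_gib(n_params: float, nbits: int,
                        group_size: int = 64, meta_bits: int = 32) -> float:
    """Rough weight-memory estimate for group-quantized weights.

    meta_bits: per-group metadata, assumed here to be one FP16 scale
    plus one FP16 zero point (2 * 16 bits). SINQ's real per-group
    metadata may differ.
    """
    bits_per_weight = nbits + meta_bits / group_size
    return n_params * bits_per_weight / 8 / 2**30

# Qwen3-14B: FP16 baseline vs INT3 with group size 64
fp16_gib = estimate_weight_gib(14e9, 16, meta_bits=0)
int3_gib = estimate_weight_gib(14e9, 3)
print(f"FP16 ~{fp16_gib:.1f} GiB -> INT3 ~{int3_gib:.1f} GiB")
# prints: FP16 ~26.1 GiB -> INT3 ~5.7 GiB
```

The group metadata adds about 0.5 bits per weight at group size 64, which is why 3-bit quantization lands near 3.5 effective bits rather than exactly 3.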

license:apache-2.0
15
5

Qwen3-32B-3bit-ASINQ

This repository contains the official 3-bit quantized version of the `Qwen3-32B` model, produced with A-SINQ, the calibrated version of the SINQ (Sinkhorn-Normalized Quantization) method. SINQ is a novel, fast, and high-quality quantization method designed to make any Large Language Model smaller while keeping its accuracy almost intact. To support the project, please put a star ⭐ on the official SINQ GitHub repository.

Model Details
- Model Name: `Qwen3-32B-3bit-ASINQ`
- Base Model: `Qwen/Qwen3-32B`
- Task: Text Generation
- Framework: PyTorch / Transformers
- License: Apache-2.0
- Quantized By: Huawei - Computing Systems Lab
- Quantization Method: A-SINQ (calibrated Sinkhorn-Normalized Quantization)
- Precision: INT3
- Group Size: 64
- Quantization Library: `sinq`

Prerequisite
Before running the quantization script, make sure the SINQ library is installed. Installation instructions and setup details are available in the official SINQ GitHub repository.

Usage example
You can load and use the model with our wrapper based on the 🤗 Transformers library. The quantized model was obtained with the SINQ quantization library.

> Reproducibility Note: This model was quantized using the SINQ implementation from commit `14ad847` of the SINQ repository.

If you find SINQ useful in your research or applications, please
- put a star ⭐ on the official SINQ GitHub repository, and
- cite our paper.

license:apache-2.0
14
5

Kimi-Linear-48B-A3B-Instruct-3bit-SINQ

license:apache-2.0
14
1

Qwen3-1.7B-PreSINQ-GGUF

license:apache-2.0
12
0

Apertus-8B-2509-4bit-ASINQ

license:apache-2.0
11
2

Qwen3-32B-4bit-SINQ

This repository contains the official 4-bit quantized version of the `Qwen3-32B` model, produced with the SINQ (Sinkhorn-Normalized Quantization) method. SINQ is a novel, fast, and high-quality quantization method designed to make any Large Language Model smaller while keeping its accuracy almost intact. To support the project, please put a star ⭐ on the official SINQ GitHub repository.

Model Details
- Model Name: `Qwen3-32B-4bit-SINQ`
- Base Model: `Qwen/Qwen3-32B`
- Task: Text Generation
- Framework: PyTorch / Transformers
- License: Apache-2.0
- Quantized By: Huawei - Computing Systems Lab
- Quantization Method: SINQ (Sinkhorn-Normalized Quantization)
- Precision: INT4
- Group Size: 64
- Quantization Library: `sinq`

Prerequisite
Before running the quantization script, make sure the SINQ library is installed. Installation instructions and setup details are available in the official SINQ GitHub repository.

Usage example
You can load and use the model with our wrapper based on the 🤗 Transformers library. The quantized model was obtained with the SINQ quantization library.

> Reproducibility Note: This model was quantized using the SINQ implementation from commit `14ad847` of the SINQ repository.

If you find SINQ useful in your research or applications, please
- put a star ⭐ on the official SINQ GitHub repository, and
- cite our paper.
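The quantization steps the card refers to were stripped during extraction. A hedged reproduction sketch follows, assuming the `sinq` library exposes a `BaseQuantizeConfig` and an `AutoSINQHFModel.quantize_model` entry point; these names and signatures are assumptions, and the authoritative script lives in the SINQ repository at commit `14ad847`.

```python
def quantize_qwen3_32b(device: str = "cuda:0"):
    """Quantize the FP16 base model to 4-bit SINQ (needs a large CUDA GPU)."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from sinq.sinqlinear import BaseQuantizeConfig  # assumed module path
    from sinq.patch_model import AutoSINQHFModel    # assumed module path

    base = "Qwen/Qwen3-32B"
    model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.float16)
    tokenizer = AutoTokenizer.from_pretrained(base)

    # Settings from the model card: INT4, group size 64, SINQ method.
    cfg = BaseQuantizeConfig(nbits=4, group_size=64, method="sinq")  # assumed
    AutoSINQHFModel.quantize_model(  # assumed entry point
        model, tokenizer=tokenizer, quant_config=cfg, device=device
    )
    return model
```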

license:apache-2.0
10
7

Qwen3-0.6B-PreSINQ-GGUF

license:apache-2.0
10
0

Qwen3-14B-4bit-ASINQ

This repository contains the official 4-bit quantized version of the `Qwen3-14B` model, produced with A-SINQ, the calibrated version of the SINQ (Sinkhorn-Normalized Quantization) method. SINQ is a novel, fast, and high-quality quantization method designed to make any Large Language Model smaller while keeping its accuracy almost intact. To support the project, please put a star ⭐ on the official SINQ GitHub repository.

Model Details
- Model Name: `Qwen3-14B-4bit-ASINQ`
- Base Model: `Qwen/Qwen3-14B`
- Task: Text Generation
- Framework: PyTorch / Transformers
- License: Apache-2.0
- Quantized By: Huawei - Computing Systems Lab
- Quantization Method: A-SINQ (calibrated Sinkhorn-Normalized Quantization)
- Precision: INT4
- Group Size: 64
- Quantization Library: `sinq`

Prerequisite
Before running the quantization script, make sure the SINQ library is installed. Installation instructions and setup details are available in the official SINQ GitHub repository.

Usage example
You can load and use the model with our wrapper based on the 🤗 Transformers library. The quantized model was obtained with the SINQ quantization library.

> Reproducibility Note: This model was quantized using the SINQ implementation from commit `14ad847` of the SINQ repository.

If you find SINQ useful in your research or applications, please
- put a star ⭐ on the official SINQ GitHub repository, and
- cite our paper.

license:apache-2.0
9
6

Qwen3-32B-3bit-SINQ

This repository contains the official 3-bit quantized version of the `Qwen3-32B` model, produced with the SINQ (Sinkhorn-Normalized Quantization) method. SINQ is a novel, fast, and high-quality quantization method designed to make any Large Language Model smaller while keeping its accuracy almost intact. To support the project, please put a star ⭐ on the official SINQ GitHub repository.

Model Details
- Model Name: `Qwen3-32B-3bit-SINQ`
- Base Model: `Qwen/Qwen3-32B`
- Task: Text Generation
- Framework: PyTorch / Transformers
- License: Apache-2.0
- Quantized By: Huawei - Computing Systems Lab
- Quantization Method: SINQ (Sinkhorn-Normalized Quantization)
- Precision: INT3
- Group Size: 64
- Quantization Library: `sinq`

Prerequisite
Before running the quantization script, make sure the SINQ library is installed. Installation instructions and setup details are available in the official SINQ GitHub repository.

Usage example
You can load and use the model with our wrapper based on the 🤗 Transformers library. The quantized model was obtained with the SINQ quantization library.

> Reproducibility Note: This model was quantized using the SINQ implementation from commit `14ad847` of the SINQ repository.

If you find SINQ useful in your research or applications, please
- put a star ⭐ on the official SINQ GitHub repository, and
- cite our paper.

license:apache-2.0
9
6

Qwen3-1.7B-4bit-SINQ

This repository contains the official 4-bit quantized version of the `Qwen3-1.7B` model, produced with the SINQ (Sinkhorn-Normalized Quantization) method. SINQ is a novel, fast, and high-quality quantization method designed to make any Large Language Model smaller while keeping its accuracy almost intact. To support the project, please put a star ⭐ on the official SINQ GitHub repository.

Model Details
- Model Name: `Qwen3-1.7B-4bit-SINQ`
- Base Model: `Qwen/Qwen3-1.7B`
- Task: Text Generation
- Framework: PyTorch / Transformers
- License: Apache-2.0
- Quantized By: Huawei - Computing Systems Lab
- Quantization Method: SINQ (Sinkhorn-Normalized Quantization)
- Precision: INT4
- Group Size: 64
- Quantization Library: `sinq`

Prerequisite
Before running the quantization script, make sure the SINQ library is installed. Installation instructions and setup details are available in the official SINQ GitHub repository.

Usage example
You can load and use the model with our wrapper based on the 🤗 Transformers library. The quantized model was obtained with the SINQ quantization library.

> Reproducibility Note: This model was quantized using the SINQ implementation from commit `14ad847` of the SINQ repository.

If you find SINQ useful in your research or applications, please
- put a star ⭐ on the official SINQ GitHub repository, and
- cite our paper.
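Once a SINQ-quantized model is loaded through the wrapper, generation should follow the standard Transformers pattern. A sketch, assuming the wrapped `model` behaves like a regular `transformers` causal LM; the chat-template calls below are the stock Transformers API, while the model object itself is assumed:

```python
# Standard Transformers generation flow; `model` and `tokenizer` are
# assumed to come from the SINQ wrapper's loading step.
messages = [{"role": "user", "content": "Give me a short introduction to LLMs."}]

def generate_reply(model, tokenizer, device: str = "cuda:0") -> str:
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(device)
    output_ids = model.generate(inputs, max_new_tokens=256)
    # Strip the prompt tokens before decoding only the new reply.
    new_tokens = output_ids[0][inputs.shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```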

license:apache-2.0
8
5

Apertus-8B-2509-4bit-SINQ

license:apache-2.0
8
2

Qwen3-1.7B-3bit-SINQ

This repository contains the official 3-bit quantized version of the `Qwen3-1.7B` model, produced with the SINQ (Sinkhorn-Normalized Quantization) method. SINQ is a novel, fast, and high-quality quantization method designed to make any Large Language Model smaller while keeping its accuracy almost intact. To support the project, please put a star ⭐ on the official SINQ GitHub repository.

Model Details
- Model Name: `Qwen3-1.7B-3bit-SINQ`
- Base Model: `Qwen/Qwen3-1.7B`
- Task: Text Generation
- Framework: PyTorch / Transformers
- License: Apache-2.0
- Quantized By: Huawei - Computing Systems Lab
- Quantization Method: SINQ (Sinkhorn-Normalized Quantization)
- Precision: INT3
- Group Size: 64
- Quantization Library: `sinq`

Prerequisite
Before running the quantization script, make sure the SINQ library is installed. Installation instructions and setup details are available in the official SINQ GitHub repository.

Usage example
You can load and use the model with our wrapper based on the 🤗 Transformers library. The quantized model was obtained with the SINQ quantization library.

> Reproducibility Note: This model was quantized using the SINQ implementation from commit `14ad847` of the SINQ repository.

If you find SINQ useful in your research or applications, please
- put a star ⭐ on the official SINQ GitHub repository, and
- cite our paper.

license:apache-2.0
7
7

Qwen3-14B-3bit-SINQ

This repository contains the official 3-bit quantized version of the `Qwen3-14B` model, produced with the SINQ (Sinkhorn-Normalized Quantization) method. SINQ is a novel, fast, and high-quality quantization method designed to make any Large Language Model smaller while keeping its accuracy almost intact. To support the project, please put a star ⭐ on the official SINQ GitHub repository.

Model Details
- Model Name: `Qwen3-14B-3bit-SINQ`
- Base Model: `Qwen/Qwen3-14B`
- Task: Text Generation
- Framework: PyTorch / Transformers
- License: Apache-2.0
- Quantized By: Huawei - Computing Systems Lab
- Quantization Method: SINQ (Sinkhorn-Normalized Quantization)
- Precision: INT3
- Group Size: 64
- Quantization Library: `sinq`

Prerequisite
Before running the quantization script, make sure the SINQ library is installed. Installation instructions and setup details are available in the official SINQ GitHub repository.

Usage example
You can load and use the model with our wrapper based on the 🤗 Transformers library. The quantized model was obtained with the SINQ quantization library.

> Reproducibility Note: This model was quantized using the SINQ implementation from commit `14ad847` of the SINQ repository.

If you find SINQ useful in your research or applications, please
- put a star ⭐ on the official SINQ GitHub repository, and
- cite our paper.

license:apache-2.0
7
5

Qwen3-Next-80B-A3B-Instruct-3bit-SINQ

license:apache-2.0
7
2

Qwen3-1.7B-3bit-ASINQ

This repository contains the official 3-bit quantized version of the `Qwen3-1.7B` model, produced with A-SINQ, the calibrated version of the SINQ (Sinkhorn-Normalized Quantization) method. SINQ is a novel, fast, and high-quality quantization method designed to make any Large Language Model smaller while keeping its accuracy almost intact. To support the project, please put a star ⭐ on the official SINQ GitHub repository.

Model Details
- Model Name: `Qwen3-1.7B-3bit-ASINQ`
- Base Model: `Qwen/Qwen3-1.7B`
- Task: Text Generation
- Framework: PyTorch / Transformers
- License: Apache-2.0
- Quantized By: Huawei - Computing Systems Lab
- Quantization Method: A-SINQ (calibrated Sinkhorn-Normalized Quantization)
- Precision: INT3
- Group Size: 64
- Quantization Library: `sinq`

Prerequisite
Before running the quantization script, make sure the SINQ library is installed. Installation instructions and setup details are available in the official SINQ GitHub repository.

Usage example
You can load and use the model with our wrapper based on the 🤗 Transformers library. The quantized model was obtained with the SINQ quantization library.

> Reproducibility Note: This model was quantized using the SINQ implementation from commit `14ad847` of the SINQ repository.

If you find SINQ useful in your research or applications, please
- put a star ⭐ on the official SINQ GitHub repository, and
- cite our paper.

license:apache-2.0
6
7

Qwen3-1.7B-4bit-ASINQ

This repository contains the official 4-bit quantized version of the `Qwen3-1.7B` model, produced with A-SINQ, the calibrated version of the SINQ (Sinkhorn-Normalized Quantization) method. SINQ is a novel, fast, and high-quality quantization method designed to make any Large Language Model smaller while keeping its accuracy almost intact. To support the project, please put a star ⭐ on the official SINQ GitHub repository.

Model Details
- Model Name: `Qwen3-1.7B-4bit-ASINQ`
- Base Model: `Qwen/Qwen3-1.7B`
- Task: Text Generation
- Framework: PyTorch / Transformers
- License: Apache-2.0
- Quantized By: Huawei - Computing Systems Lab
- Quantization Method: A-SINQ (calibrated Sinkhorn-Normalized Quantization)
- Precision: INT4
- Group Size: 64
- Quantization Library: `sinq`

Prerequisite
Before running the quantization script, make sure the SINQ library is installed. Installation instructions and setup details are available in the official SINQ GitHub repository.

Usage example
You can load and use the model with our wrapper based on the 🤗 Transformers library. The quantized model was obtained with the SINQ quantization library.

> Reproducibility Note: This model was quantized using the SINQ implementation from commit `14ad847` of the SINQ repository.

If you find SINQ useful in your research or applications, please
- put a star ⭐ on the official SINQ GitHub repository, and
- cite our paper.

license:apache-2.0
6
5

Qwen3-Next-80B-A3B-Instruct-4bit-SINQ

license:apache-2.0
6
2

Qwen3-32B-4bit-ASINQ

This repository contains the official 4-bit quantized version of the `Qwen3-32B` model, produced with A-SINQ, the calibrated version of the SINQ (Sinkhorn-Normalized Quantization) method. SINQ is a novel, fast, and high-quality quantization method designed to make any Large Language Model smaller while keeping its accuracy almost intact. To support the project, please put a star ⭐ on the official SINQ GitHub repository.

Model Details
- Model Name: `Qwen3-32B-4bit-ASINQ`
- Base Model: `Qwen/Qwen3-32B`
- Task: Text Generation
- Framework: PyTorch / Transformers
- License: Apache-2.0
- Quantized By: Huawei - Computing Systems Lab
- Quantization Method: A-SINQ (calibrated Sinkhorn-Normalized Quantization)
- Precision: INT4
- Group Size: 64
- Quantization Library: `sinq`

Prerequisite
Before running the quantization script, make sure the SINQ library is installed. Installation instructions and setup details are available in the official SINQ GitHub repository.

Usage example
You can load and use the model with our wrapper based on the 🤗 Transformers library. The quantized model was obtained with the SINQ quantization library.

> Reproducibility Note: This model was quantized using the SINQ implementation from commit `14ad847` of the SINQ repository.

If you find SINQ useful in your research or applications, please
- put a star ⭐ on the official SINQ GitHub repository, and
- cite our paper.

license:apache-2.0
5
8

Qwen3-235B-A22B-3bit-SINQ

license:apache-2.0
4
2

Qwen3-8B-PreSINQ-GGUF

license:apache-2.0
0
1