Intel

262 models • 15 total models in database

Sort by:

dpt-hybrid-midas

--- license: apache-2.0 tags: - vision - depth-estimation widget: - src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/tiger.jpg example_title: Tiger - src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/teapot.jpg example_title: Teapot - src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/palace.jpg example_title: Palace model-index: - name: dpt-hybrid-midas results: - task: type: monocular-depth-estimation name: Monocular Depth Estimati

license:apache-2.0

757,265

102

zoedepth-nyu-kitti

--- license: mit tags: - vision pipeline_tag: depth-estimation ---

license:mit

394,663

dpt-large

license:apache-2.0

138,597

197

zoedepth-nyu

ZoeDepth model fine-tuned on the NYU dataset. It was introduced in the paper ZoeDepth: Zero-shot Transfer by Combining Relative and Metric Depth by Shariq et al. and first released in this repository. ZoeDepth extends the DPT framework for metric (also called absolute) depth estimation, obtaining state-of-the-art results. Disclaimer: The team releasing ZoeDepth did not write a model card for this model so this model card has been written by the Hugging Face team. ZoeDepth adapts DPT, a model for relative depth estimation, for so-called metric (also called absolute) depth estimation. This means that the model is able to estimate depth in actual metric values. ZoeDepth architecture. Taken from the original paper. You can use the raw model for tasks like zero-shot monocular depth estimation. See the model hub to look for other versions on a task that interests you. The easiest is to leverage the pipeline API which abstracts away the complexity for the user: For more code examples, we refer to the documentation.

license:mit

47,597

Qwen3.5-122B-A10B-int4-AutoRound

NaNK

—

39,070

neural-chat-7b-v3-3

A large language model (LLM) licensed under Apache 2.0.

NaNK

license:apache-2.0

30,274

Mistral-Small-3.2-24B-Instruct-2506-int4-AutoRound

This model is an int4 model with groupsize 128 and symmetric quantization of mistralai/Mistral-Small-3.2-24B-Instruct-2506 generated by intel/auto-round algorithm. The model can produce factually incorrect output, and should not be relied on to produce factually accurate information. Because of the limitations of the pretrained model and the finetuning datasets, it is possible that this model could generate lewd, biased or otherwise offensive outputs. Therefore, before deploying any applications of the model, developers should perform safety testing. Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. Here are a couple of useful links to learn more about Intel's AI software: The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes. @article{cheng2023optimize, title={Optimize weight rounding via signed gradient descent for the quantization of llms}, author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi}, journal={arXiv preprint arXiv:2309.05516}, year={2023} }

NaNK

—

15,344

tiny-random-llama2

This is a tiny random Llama model derived from "meta-llama/Llama-2-7b-hf". This is useful for functional testing (not quality generation, since its weights are random) on optimum-intel

llama

3,394

GLM-5-int4-mixed-AutoRound

—

3,269

neural-chat-7b-v3-1

A large language model designed for conversational AI applications, licensed under Apache 2.0.

tiny-random-gpt2

This is a tiny random GPT2 model derived from "openai-community/gpt2". This is useful for functional testing (not quality generation, since its weights are random) on optimum-intel

—

3,078

tiny-random-falcon

—

3,074

dpt-swinv2-tiny-256

license:mit

2,478

distilbert-base-uncased-distilled-squad-int8-static-inc

license:apache-2.0

2,214

t5-small-xsum-int8-dynamic-inc

license:apache-2.0

2,201

dynamic_tinybert

license:apache-2.0

1,757

Step-3.5-Flash-int4-mixed-AutoRound

—

1,537

neural-chat-7b-v3-2

A large language model (LLM) licensed under Apache 2.0.

NaNK

license:apache-2.0

1,475

Qwen3-Coder-30B-A3B-Instruct-gguf-q4km-AutoRound

NaNK

—

1,273

Qwen3-Omni-30B-A3B-Instruct-int4-AutoRound

NaNK

license:apache-2.0

1,240

GLM 4.6 REAP 218B A32B FP8 Gguf Q2ks Mixed AutoRound

This model is a mixed gguf q2ks format of cerebras/GLM-4.6-REAP-218B-A32B-FP8 generated by intel/auto-round algorithm. Embedding layer and lm-head layer are fallback to 8 bits and non expert layers are fallback to 4 bits. Please refer to Section `Generate the model` for more details. ~~~bash /llama-cli -hf Intel/GLM-4.6-REAP-218B-A32B-FP8-gguf-q2ks-mixed-AutoRound ~~~ ~~~bash > Write a quick sort algorithm. The user wants a quick sort algorithm. This is a very common request. I need to provide a correct, clear, and well-explained implementation of Quick Sort. I should probably choose a language like Python or JavaScript, as they are common and easy to read. I'll go with Python as it's very clean for algorithms. The user asked for a "quick sort algorithm" which implies they want the code, maybe a brief explanation, and maybe some edge cases. I should make sure to handle the common pitfalls, like in-place vs. not-in-place. Here's a plan: 1. Introduction: Briefly explain what Quick Sort is, its core idea (divide and conquer, pivot), and its time complexity. 2. Code Implementation: I'll write a clean, standard Python function for Quick Sort. I'll choose the in-place version, as it's memory-efficient (O(1) extra space besides the recursion stack). This is a common interview question and a practical approach. I'll use a helper function to keep the main function signature clean (e.g., `quicksort(arr, low, high)`). The implementation should correctly handle the base case (when the sub-array has 1 or 0 elements). The `partition` function is crucial. It will place the pivot in its correct sorted position and return the pivot index. 3. Explanation of the Code: I will walk through the code step-by-step. Explain the `quicksort` function's role (the main recursive function). Explain the `partition` function's logic (how it rearranges elements around the pivot). Explain the recursive calls to the left and right sub-arrays. 4. Example Walk-through: I'll trace the algorithm with a small array (e.g., `[3, 6, 8, 2, 5]`) to make it concrete and easy to understand. This is very helpful for learning. 5. Complexity Analysis: Time: Explain the best, average, and worst-case scenarios. Best/Average: O(n log n) Worst: O(n^2) (when the pivot is always the smallest or largest element, e.g., in a sorted array). Space: Explain the space complexity, which comes from the recursion stack depth (O(log n) on average, O(n) in the worst case). 6. Alternative Implementation (Optional but good): I could also show a more "Pythonic" but less memory-efficient version that uses list comprehensions. It's often more readable for beginners, so it's a good alternative to present. It's not in-place, so I should mention that. Introduction: Quick Sort is a highly efficient sorting algorithm that uses a "divide and conquer" strategy. It's a "in-place" algorithm, meaning it typically modifies the input array directly without using significant extra memory. The core idea is: 1. Divide: Pick an element from the array, called the pivot. 2. Conquer: Reorder the array so that all elements with values less than the pivot come before the pivot, while all elements with values greater than the pivot come after it. After this partitioning, the pivot is in its final sorted position. 3. Recurse: Recursively apply the above steps to the sub-array of elements with smaller values and the sub-array of elements with greater values. This is also a good option to present, as it highlights the trade-off between readability and efficiency. Final Plan for the Answer: 1. Start with a clear, concise introduction to Quick Sort. 2. Present the in-place implementation first, as it's the classic and memory-efficient approach. I'll explain the code step-by-step. 3. Provide a detailed walkthrough of an example (`[3, 6, 1, 2]` or similar) to make the process concrete. 4. Analyze the time and space complexity, covering all cases (best, average, worst). 5. Then, offer the alternative Pythonic version and explain its pros and cons (readability vs. memory usage). 6. Conclude with a comparison or a summary of when to use which version. This structure covers all bases from a practical developer's point of view and a learning perspective. I will make sure the comments in the code are clear and the explanation is easy to follow. I'll use Python for the code examples. Of course! Here is a Quick Sort algorithm implementation in Python, along with a detailed explanation of how it works, its complexity, and an alternative, more "Pythonic" version. Quick Sort is a highly efficient sorting algorithm that uses a "divide and conquer" strategy. It's an "in-place" algorithm, meaning it typically modifies the input array directly without using significant extra memory. The core idea is: 1. Divide: Pick an element from the array, called the pivot. 2. Conquer: Reorder the array so that all elements with values less than the pivot come before it, while all elements with values greater come after. The pivot is now in its final sorted position. 3. Recurse: Recursively apply the same logic to the sub-array of smaller elements and the sub-array of greater elements. This is the standard, memory-efficient version that sorts the array directly. It's a bit more complex to understand than the alternative below, but it's what you'd typically see in interviews and system-level code. | Aspect | Analysis | | -------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | Time Complexity | Best/Average Case: O(n log n). This occurs when the pivot consistently divides the array into two roughly equal halves. | | | Worst Case: O(n²). This occurs when the pivot is always the smallest or largest element (e.g., sorting an already sorted array). The `partition` function above chooses the last element, which can lead to this case. This can be avoided by choosing a random pivot. | | Space Complexity | O(log n) on average for the in-place version due to the recursive call stack. In the worst case, it's O(n). The Pythonic version uses O(n log n) space in the average case because it creates new lists. | | Feature | In-Place Version | Pythonic Version | | ------------------- | ---------------------------------------------------- | ----------------------------------------------------- | | Memory | O(log n) (very efficient) | O(n) (uses more memory) | | Readability | More complex, requires understanding `partition()`. | Very high, follows a simple, direct logic. | | When to Use | Performance-critical applications, large arrays, or when memory is constrained. | Small scripts, teaching, learning, or when code clarity is the absolute priority. | The model can produce factually incorrect output, and should not be relied on to produce factually accurate information. Because of the limitations of the pretrained model and the finetuning datasets, it is possible that this model could generate lewd, biased or otherwise offensive outputs. Therefore, before deploying any applications of the model, developers should perform safety testing. Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. Here are a couple of useful links to learn more about Intel's AI software: The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes. @article{cheng2023optimize, title={Optimize weight rounding via signed gradient descent for the quantization of llms}, author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi}, journal={arXiv preprint arXiv:2309.05516}, year={2023} }

NaNK

—

998

Qwen3-30B-A3B-Instruct-2507-gguf-q2ks-mixed-AutoRound

NaNK

license:apache-2.0

960

Qwen3-30B-A3B-Instruct-2507-int4-AutoRound

NaNK

license:apache-2.0

861

Gpt Oss 20b Gguf Q4ks AutoRound

This model is a gguf q4ks format of openai/gpt-oss-20b generated by intel/auto-round algorithm. ~~~bash llama-cli -hf Intel/gpt-oss-20b-gguf-q4ks-AutoRound ~~~ ~~~bash > Write a quick sort algorithm. analysis We need to write quick sort algorithm. The user wants a quick sort algorithm. Provide code? They didn't specify language. I can provide a general quick sort algorithm, maybe in Python, C++, Java, etc. Provide explanation. Provide typical implementation. I'll respond with explanation and code. assistant final Below is a minimal, self‑contained quick‑sort implementation in a few common languages. Feel free to copy‑paste the snippet that matches your stack, or use it as a template and adapt it to your needs. 1. Choose a pivot – usually the first element (or median of three, etc.). 2. Partition the array into two sub‑arrays: elements ` = pivot` (or `> pivot` if you want to avoid duplicates) 3. Recursively sort the two sub‑arrays. 4. Concatenate the sorted left part, the pivot, and the sorted right part. The algorithm is in‑place (no extra array for the whole list) and runs in average \(O(n \log n)\), worst‑case \(O(n^2)\) if the pivot is poorly chosen. The helper `partition` returns the final index of the pivot. The algorithm is stable only if you change the comparison to ` =` depending on the ordering you need. For very large lists you may want to switch to a non‑recursive implementation to avoid hitting Python’s recursion depth limit. | Problem | Fix | |---------|-----| | Worst‑case O(n²) when the pivot is always the smallest/largest element | Pick the middle element or use median‑of‑three pivot. | | Stack overflow on very deep recursion | Convert recursion to iteration (explicit stack) or switch to an iterative algorithm. | | Unstable sorting | If stability matters, use a stable algorithm (e.g., merge‑sort) or add an index to each element and compare that as a tie‑breaker. | | Large duplicates | Use “Dutch‑Flag” partitioning that groups ` pivot`. | Quick‑sort is a classic divide‑and‑conquer algorithm that works well for average‑case sorting. The snippets above are short, in‑place, and can be dropped into most code bases. Requirement pip install git+https://github.com/intel/auto-round.git@hengguo/ggufupdate0806 The model can produce factually incorrect output, and should not be relied on to produce factually accurate information. Because of the limitations of the pretrained model and the finetuning datasets, it is possible that this model could generate lewd, biased or otherwise offensive outputs. Therefore, before deploying any applications of the model, developers should perform safety testing. Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. Here are a couple of useful links to learn more about Intel's AI software: The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes. @article{cheng2023optimize, title={Optimize weight rounding via signed gradient descent for the quantization of llms}, author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi}, journal={arXiv preprint arXiv:2309.05516}, year={2023} }

NaNK

—

810

dpt-beit-base-384

license:mit

802

dpt-beit-large-512

license:mit

777

Qwen3-235B-A22B-Instruct-2507-gguf-q4km-AutoRound

This model is gguf:q4km model of Qwen/Qwen3-235B-A22B-Instruct-2507 generated by intel/auto-round algorithm. Please refer to Section `Generate the model` for more details. ~~~bash ./llama-cli -hf Intel/Qwen3-235B-A22B-Instruct-2507-gguf-q4km-AutoRound:q4km --conversation ~~~ ~~~python """ > Hi Hello! ٩(◕‿◕｡)۶ How can I assist you today? > Code a flappy bird in python Sure! Below is a simple implementation of Flappy Bird using Python and the Pygame library. This version includes: - A bird that flaps when you press the spacebar or click. - Pipes that move from right to left. - Collision detection. - Score tracking. - Game over when the bird hits a pipe or the ground/ceiling. Screen dimensions SCREENWIDTH = 400 SCREENHEIGHT = 600 screen = pygame.display.setmode((SCREENWIDTH, SCREENHEIGHT)) pygame.display.setcaption("Flappy Bird") Colors WHITE = (255, 255, 255) BLACK = (0, 0, 0) GREEN = (0, 255, 0) BLUE = (0, 0, 255) SKYBLUE = (135, 206, 235) Game settings GRAVITY = 0.5 FLAPSTRENGTH = -10 PIPESPEED = 3 PIPEGAP = 150 PIPEFREQUENCY = 1500 # milliseconds Bird settings birdx = 100 birdy = SCREENHEIGHT // 2 birdvelocity = 0 birdwidth = 40 birdheight = 30 Score score = 0 font = pygame.font.SysFont('Arial', 26) def drawbird(x, y): pygame.draw.rect(screen, BLUE, (x, y, birdwidth, birdheight), borderradius=10) def createpipe(): height = random.randint(100, SCREENHEIGHT - PIPEGAP - 100) toppipe = pygame.Rect(SCREENWIDTH, 0, 60, height) bottompipe = pygame.Rect(SCREENWIDTH, height + PIPEGAP, 60, SCREENHEIGHT) return {"top": toppipe, "bottom": bottompipe, "passed": False} def drawpipes(pipes): for pipe in pipes: pygame.draw.rect(screen, GREEN, pipe["top"]) pygame.draw.rect(screen, GREEN, pipe["bottom"]) def movepipes(pipes): for pipe in pipes: pipe["top"].x -= PIPESPEED pipe["bottom"].x -= PIPESPEED def removeoffscreenpipes(pipes): return [pipe for pipe in pipes if pipe["top"].right > 0] def checkcollision(pipes, birdy): # Check ceiling or floor if birdy SCREENHEIGHT: return True # Check pipes birdrect = pygame.Rect(birdx, birdy, birdwidth, birdheight) for pipe in pipes: if birdrect.colliderect(pipe["top"]) or birdrect.colliderect(pipe["bottom"]): return True return False def drawscore(): scoretext = font.render(f"Score: {score}", True, BLACK) screen.blit(scoretext, (10, 10)) Main game loop while running: currenttime = pygame.time.getticks() screen.fill(SKYBLUE) for event in pygame.event.get(): if event.type == pygame.QUIT: running = False if event.type == pygame.KEYDOWN: if event.key == pygame.KSPACE and gameactive: birdvelocity = FLAPSTRENGTH if event.key == pygame.Kr and not gameactive: # Restart game birdy = SCREENHEIGHT // 2 birdvelocity = 0 pipes = [] score = 0 gameactive = True if event.type == pygame.MOUSEBUTTONDOWN and gameactive: birdvelocity = FLAPSTRENGTH if gameactive: # Bird physics birdvelocity += GRAVITY birdy += birdvelocity # Generate pipes if currenttime - lastpipetime > PIPEFREQUENCY: pipes.append(createpipe()) lastpipetime = currenttime # Move and draw pipes movepipes(pipes) pipes = removeoffscreenpipes(pipes) drawpipes(pipes) # Check for scoring for pipe in pipes: if pipe["top"].right 0.5.1 ~~~python import torch from transformers import AutoModelForCausalLM, AutoTokenizer from autoround import AutoRound model = AutoModelForCausalLM.frompretrained(modelname, devicemap="cpu", torchdtype="auto") tokenizer = AutoTokenizer.frompretrained(modelname) autoround = AutoRound(model, tokenizer, iters=0, layerconfig=layerconfig, nsamples=512) autoround.quantizeandsave("/dataset/Qwen3-235B-A22B-Instruct-2507-q4km", format="gguf:q4km") The model can produce factually incorrect output, and should not be relied on to produce factually accurate information. Because of the limitations of the pretrained model and the finetuning datasets, it is possible that this model could generate lewd, biased or otherwise offensive outputs. Therefore, before deploying any applications of the model, developers should perform safety testing. Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. Here are a couple of useful links to learn more about Intel's AI software: The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes. @article{cheng2023optimize, title={Optimize weight rounding via signed gradient descent for the quantization of llms}, author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi}, journal={arXiv preprint arXiv:2309.05516}, year={2023} }

NaNK

license:apache-2.0

765

Qwen3-30B-A3B-Thinking-2507-gguf-q2ks-mixed-AutoRound

This model is a mixed gguf q2ks format of Qwen/Qwen3-30B-A3B-Thinking-2507 generated by intel/auto-round algorithm. Embedding layer and lm-head layer are fallback to 8 bits and non expert layers are fallback to 4 bits. Please refer to Section `Generate the model` for more details. ~~~bash /llama-cli -hf Intel/Qwen3-30B-A3B-Thinking-2507-gguf-q2ks-mixed-AutoRound --conversation ~~~ We are going to create a Flappy Bird game in Python using the Pygame library. Steps: 1. Install Pygame if not already installed (pip install pygame) 2. Create a window for the game. 3. Define the main game loop. 4. Implement the bird (flappy bird) movement and physics. 5. Create pipes (obstacles) that move from right to left. 6. Handle collisions with pipes and the ground. 7. Add a score system. Note: We'll use simple graphics (rectangles for bird and pipes) for simplicity. - We'll set up the window size. - The bird will have a gravity effect and can jump (by pressing space). - Pipes will be generated at random heights. - We'll keep track of the score (number of pipes passed). - Game over when the bird hits a pipe or the ground. 1. Import necessary modules. 2. Define constants (width, height, gravity, etc.) 3. Create the bird class (or just use a rectangle with position and velocity). 4. Create the pipe class (with two pipes: top and bottom). 5. Main game loop. However, note: We are not going to use images for simplicity (using rectangles). Here's a complete Flappy Bird game implementation in Python using Pygame. This version uses simple rectangles for the bird and pipes, but you can easily replace them with images for a more authentic look. 1. Simple Graphics: Uses circles for the bird and rectangles for pipes (replace with images later) 2. Physics: Gravity and flapping mechanics 3. Score System: Counts pipes passed 4. Game Over: Detects collisions with pipes or ground 5. Restart Option: Press Space to restart after game over 1. Save as `flappybird.py` 2. Install Pygame: `pip install pygame` 3. Run: `python flappybird.py` 2. Adjust Difficulty: - Increase `PIPESPEED` for faster pipes - Adjust `GRAVITY` and `FLAPFORCE` for different physics - The game uses rectangles for simplicity (you can replace with images) - The bird is a circle with radius 20 (positioned at center) - Pipes are generated at random heights with fixed gap - Game over when bird hits top/bottom or pipes This implementation is optimized for simplicity and clarity. If you want to add more features like animations or sound, you can extend the code as shown in the customization tips. (Note: The actual game window will show the bird moving through pipes) Enjoy playing Flappy Bird! The game is designed to be playable with keyboard (spacebar to jump) and will restart automatically after game over. The model can produce factually incorrect output, and should not be relied on to produce factually accurate information. Because of the limitations of the pretrained model and the finetuning datasets, it is possible that this model could generate lewd, biased or otherwise offensive outputs. Therefore, before deploying any applications of the model, developers should perform safety testing. Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. Here are a couple of useful links to learn more about Intel's AI software: The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes. @article{cheng2023optimize, title={Optimize weight rounding via signed gradient descent for the quantization of llms}, author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi}, journal={arXiv preprint arXiv:2309.05516}, year={2023} }

NaNK

license:apache-2.0

750

Qwen3-Next-80B-A3B-Instruct-int4-mixed-AutoRound

This model is a mixed int4 model with groupsize 128 and symmetric quantization of Qwen/Qwen3-Next-80B-A3B-Instruct generated by intel/auto-round via RTN(no algorithm tuning). Non expert layers are fallback to 8 bits. Please refer to Section Generate the model for more details. Please follow the license of the original model. For vllm, this pr is required https://github.com/vllm-project/vllm/pull/24818 | benchmark | n-shot | backend | Intel/Qwen3-Next-80B-A3B-Instruct-int4-mixed-AutoRound | Qwen/Qwen3-Next-80B-A3B-Instruct | | :-------: | :----: | :-----: | :----------------------------------------------------: | :------------------------------: | | gsm8k | 5 | vllm | 0.8393 | 0.8074 | | mmlupro | 5 | vllm | 0.7630 | 0.7621 | The model can produce factually incorrect output, and should not be relied on to produce factually accurate information. Because of the limitations of the pretrained model and the finetuning datasets, it is possible that this model could generate lewd, biased or otherwise offensive outputs. Therefore, before deploying any applications of the model, developers should perform safety testing. Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. Here are a couple of useful links to learn more about Intel's AI software: The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes. @article{cheng2023optimize, title={Optimize weight rounding via signed gradient descent for the quantization of llms}, author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi}, journal={arXiv preprint arXiv:2309.05516}, year={2023} }

NaNK

license:apache-2.0

721

ldm3d-4c

—

697

MiniMax-M2-gguf-q2ks-mixed-AutoRound

—

609

Qwen3-Coder-30B-A3B-Instruct-int4-AutoRound

This model is an int4 model with groupsize 128 and symmetric quantization of Qwen/Qwen3-Coder-30B-A3B-Instruct generated by intel/auto-round algorithm. Please follow the license of the original model. ~~~bash vllm serve Intel/Qwen3-Coder-30B-A3B-Instruct-int4-AutoRound --tensor-parallel-size 4 --max-model-len 65536 ~~~ ~~~python from transformers import AutoModelForCausalLM, AutoTokenizer modelname = "Intel/Qwen3-Coder-30B-A3B-Instruct-int4-AutoRound" load the tokenizer and the model tokenizer = AutoTokenizer.frompretrained(modelname) model = AutoModelForCausalLM.frompretrained( modelname, torchdtype="auto", devicemap="auto" ) prepare the model input prompt = "Write a quick sort algorithm." messages = [ {"role": "user", "content": prompt} ] text = tokenizer.applychattemplate( messages, tokenize=False, addgenerationprompt=True, ) modelinputs = tokenizer([text], returntensors="pt").to(model.device) conduct text completion generatedids = model.generate( modelinputs, maxnewtokens=65536 ) outputids = generatedids[0][len(modelinputs.inputids[0]):].tolist() content = tokenizer.decode(outputids, skipspecialtokens=True) print("content:", content) """ content: Here's a quicksort algorithm implementation in Python: ```python def quicksort(arr): ''' Sorts an array using the quicksort algorithm. Returns: None (sorts in-place) ''' if len(arr) 1: quicksorthelper(0, len(arr) - 1) The algorithm efficiently sorts arrays by repeatedly dividing them into smaller subproblems, making it one of the most widely used sorting algorithms in practice. """ ~~~ ~~~bash auto-round --model Qwen/Qwen3-Coder-30B-A3B-Instruct --outputdir "./tmpautoround" --enabletorchcompile --nsamples 512 --fplayers mlp.gate ~~~ The model can produce factually incorrect output, and should not be relied on to produce factually accurate information. Because of the limitations of the pretrained model and the finetuning datasets, it is possible that this model could generate lewd, biased or otherwise offensive outputs. Therefore, before deploying any applications of the model, developers should perform safety testing. Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. Here are a couple of useful links to learn more about Intel's AI software: The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes. @article{cheng2023optimize, title={Optimize weight rounding via signed gradient descent for the quantization of llms}, author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi}, journal={arXiv preprint arXiv:2309.05516}, year={2023} }

NaNK

license:apache-2.0

582

Qwen3-Next-80B-A3B-Instruct-int4-AutoRound

This model is a int4 model with groupsize 128 and symmetric quantization of Qwen/Qwen3-Next-80B-A3B-Instruct generated by intel/auto-round. Please follow the license of the original model. For vllm, this pr is required https://github.com/vllm-project/vllm/pull/24818 vLLM The following command can be used to create an API endpoint at `http://localhost:8000/v1` with maximum context length 256K tokens. The following command is recommended for MTP with the rest settings the same as above: | benchmark | n-shot | backend | Intel/Qwen3-Next-80B-A3B-Instruct-int4-AutoRound | Qwen/Qwen3-Next-80B-A3B-Instruct | | :-------: | :----: | :-----: | :----------------------------------------------: | :------------------------------: | | gsm8k | 5 | vllm | 0.8643 | 0.8074 | | mmlupro | 5 | vllm | 0.7570 | 0.7621 | The model can produce factually incorrect output, and should not be relied on to produce factually accurate information. Because of the limitations of the pretrained model and the finetuning datasets, it is possible that this model could generate lewd, biased or otherwise offensive outputs. Therefore, before deploying any applications of the model, developers should perform safety testing. Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. Here are a couple of useful links to learn more about Intel's AI software: The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes. @article{cheng2023optimize, title={Optimize weight rounding via signed gradient descent for the quantization of llms}, author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi}, journal={arXiv preprint arXiv:2309.05516}, year={2023} }

NaNK

license:apache-2.0

573

llava-gemma-2b

NaNK

—

545

Qwen3-Next-80B-A3B-Thinking-int4-mixed-AutoRound

This model is a mixed int4 model with groupsize 128 and symmetric quantization of Qwen/Qwen3-Next-80B-A3B-Thinking generated by intel/auto-round via RTN(no algorithm tuning). Non expert layers are fallback to 8 bits. Please refer to Section Generate the model for more details. Please follow the license of the original model. For vllm, this pr is required https://github.com/vllm-project/vllm/pull/24818 | benchmark | n-shot | backend | Intel/Qwen3-Next-80B-A3B-Thinking-int4-mixed-AutoRound | Qwen/Qwen3-Next-80B-A3B-Thinking | | :-------: | :----: | :-----: | :----------------------------------------------------: | :------------------------------: | | gsm8k | 5 | vllm | 0.8446 | 0.8453 | | mmlupro | 5 | vllm | 0.7281 | 0.7271 | The model can produce factually incorrect output, and should not be relied on to produce factually accurate information. Because of the limitations of the pretrained model and the finetuning datasets, it is possible that this model could generate lewd, biased or otherwise offensive outputs. Therefore, before deploying any applications of the model, developers should perform safety testing. Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. Here are a couple of useful links to learn more about Intel's AI software: The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes. @article{cheng2023optimize, title={Optimize weight rounding via signed gradient descent for the quantization of llms}, author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi}, journal={arXiv preprint arXiv:2309.05516}, year={2023} }

NaNK

license:apache-2.0

530

DeepSeek-R1-0528-Qwen3-8B-int4-AutoRound

NaNK

license:mit

525

GLM 4.6 Gguf Q2ks Mixed AutoRound

NaNK

—

454

Qwen3-235B-A22B-Instruct-2507-gguf-q2ks-mixed-AutoRound

This model is a mixed gguf:q2ks of Qwen/Qwen3-235B-A22B-Instruct-2507 generated by intel/auto-round algorithm. Embedding layer and lm-head layer are fallback to 8 bits and non expert layers are fallback to 4 bits. Please refer to Section `Generate the model` for more details. ~~~bash ./llama-cli -hf Intel/Qwen3-235B-A22B-Instruct-2507-gguf-q2ks-mixed-AutoRound:q2ks --conversation ~~~ ~~~python """ > Hi Hello! How can I help you today? 😊 > code a flappy bird in Python Sure! Here's a simple Flappy Bird game implemented in Python using the Pygame library. 🔧 Requirements: Make sure you have Pygame installed: Screen dimensions WIDTH, HEIGHT = 400, 600 screen = pygame.display.setmode((WIDTH, HEIGHT)) pygame.display.setcaption("Flappy Bird") Colors WHITE = (255, 255, 255) BLACK = (0, 0, 0) GREEN = (0, 128, 0) SKYBLUE = (135, 206, 235) Game settings GRAVITY = 0.5 FLAPSTRENGTH = -10 PIPESPEED = 3 PIPEGAP = 150 PIPEFREQUENCY = 1500 # milliseconds Bird class class Bird: def init(self): self.x = 50 self.y = HEIGHT // 2 self.velocity = 0 self.width = 20 self.height = 20 def update(self): self.velocity += GRAVITY self.y += self.velocity if self.y HEIGHT: self.y = HEIGHT def draw(self): pygame.draw.circle(screen, BLACK, (self.x, int(self.y)), self.width // 2) Pipe class class Pipe: def init(self): self.x = WIDTH self.top = random.randint(50, HEIGHT - PIPEGAP - 50) self.bottom = self.top + PIPEGAP self.width = 50 self.passed = False def update(self): self.x -= PIPESPEED return self.x > -self.width def draw(self): pygame.draw.rect(screen, GREEN, (self.x, 0, self.width, self.top)) pygame.draw.rect(screen, GREEN, (self.x, self.bottom, self.width, HEIGHT - self.bottom)) def collide(self, bird): if bird.x + bird.width // 2 > self.x and bird.x - bird.width // 2 self.bottom: return True return False Main game loop def game(): bird = Bird() pipes = [] score = 0 clock = pygame.time.Clock() lastpipe = pygame.time.getticks() while running: screen.fill(SKYBLUE) dt = clock.tick(60) currenttime = pygame.time.getticks() # Event handling for event in pygame.event.get(): if event.type == pygame.QUIT: pygame.quit() sys.exit() if event.type == pygame.KEYDOWN: if event.key == pygame.KSPACE and gameactive: bird.flap() if event.key == pygame.Kr and not gameactive: game() # Generate new pipes if currenttime - lastpipe > PIPEFREQUENCY: pipes.append(Pipe()) lastpipe = currenttime # Update and draw pipes pipes = [p for p in pipes if p.update()] for pipe in pipes: pipe.draw() if pipe.collide(bird): gameactive = False # Score logic if pipe.x + pipe.width = HEIGHT or bird.y 0.5.1 ~~~python import torch from transformers import AutoModelForCausalLM, AutoTokenizer from autoround import AutoRound model = AutoModelForCausalLM.frompretrained(modelname, devicemap="cpu", torchdtype="auto") tokenizer = AutoTokenizer.frompretrained(modelname) layerconfig = {} for n, m in model.namedmodules(): if n == "lmhead" or isinstance(m,torch.nn.Embedding): layerconfig[n] = {"bits": 8} elif isinstance(m, torch.nn.Linear) and (not "expert" in n or "sharedexperts" in n) and n != "lmhead": layerconfig[n] = {"bits": 4} autoround = AutoRound(model, tokenizer, iters=0, layerconfig=layerconfig, nsamples=512) autoround.quantizeandsave("/models/Qwen3-235B-A22B-Instruct-2507-q2ks", format="gguf:q2ks") The model can produce factually incorrect output, and should not be relied on to produce factually accurate information. Because of the limitations of the pretrained model and the finetuning datasets, it is possible that this model could generate lewd, biased or otherwise offensive outputs. Therefore, before deploying any applications of the model, developers should perform safety testing. Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. Here are a couple of useful links to learn more about Intel's AI software: The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes. @article{cheng2023optimize, title={Optimize weight rounding via signed gradient descent for the quantization of llms}, author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi}, journal={arXiv preprint arXiv:2309.05516}, year={2023} }

NaNK

license:apache-2.0

428

Ling Flash 2.0 Gguf Q2ks Mixed AutoRound

This model is a mixed gguf q2ks format of inclusionAI/Ling-flash-2.0 generated by intel/auto-round algorithm. Embedding layer and lm-head layer are fallback to 8 bits and non expert layers are fallback to 4 bits. Please refer to Section `Generate the model` for more details. ~~~bash /llama-cli -hf Intel/Ling-flash-2.0-gguf-q2ks-mixed-AutoRound ~~~ ~~~bash > Write a quick sort algorithm. Sure! Here's a simple and clear implementation of the Quick Sort algorithm in Python: 🧠 How It Works: - Divide and Conquer: Pick a pivot, partition the array into elements less than, equal to, and greater than the pivot. - Recursively apply the same to left and right partitions. - Not in-place in this version (creates new lists). An in-place version is more memory-efficient. ⏱ Time Complexity: - Best/Average Case: O(n log n) - Worst Case: O(n²) — when the pivot is always the smallest or largest element (e.g., on already sorted data), but this is mitigated by choosing a good pivot (like middle, random, or median-of-three). Would you like the in-place version (which is more efficient in memory) as well? The model can produce factually incorrect output, and should not be relied on to produce factually accurate information. Because of the limitations of the pretrained model and the finetuning datasets, it is possible that this model could generate lewd, biased or otherwise offensive outputs. Therefore, before deploying any applications of the model, developers should perform safety testing. Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. Here are a couple of useful links to learn more about Intel's AI software: The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes. @article{cheng2023optimize, title={Optimize weight rounding via signed gradient descent for the quantization of llms}, author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi}, journal={arXiv preprint arXiv:2309.05516}, year={2023} }

NaNK

—

415

Qwen3-Next-80B-A3B-Thinking-int4-AutoRound

This model is a int4 model with groupsize 128 and symmetric quantization of Qwen/Qwen3-Next-80B-A3B-Thinking generated by intel/auto-round. Please follow the license of the original model. For vllm, this pr is required https://github.com/vllm-project/vllm/pull/24818 | benchmark | n-shot | backend | Intel/Qwen3-Next-80B-A3B-Thinking-int4-AutoRound | Qwen/Qwen3-Next-80B-A3B-Thinking | | :-------: | :----: | :-----: | :----------------------------------------------: | :------------------------------: | | gsm8k | 5 | vllm | 0.8446 | 0.8453 | | mmlupro | 5 | vllm | 0.7079 | 0.7271 | The model can produce factually incorrect output, and should not be relied on to produce factually accurate information. Because of the limitations of the pretrained model and the finetuning datasets, it is possible that this model could generate lewd, biased or otherwise offensive outputs. Therefore, before deploying any applications of the model, developers should perform safety testing. Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. Here are a couple of useful links to learn more about Intel's AI software: The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes. @article{cheng2023optimize, title={Optimize weight rounding via signed gradient descent for the quantization of llms}, author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi}, journal={arXiv preprint arXiv:2309.05516}, year={2023} }

NaNK

license:apache-2.0

411

Qwen2.5-0.5B-W4A16-G128-AutoRound-LLMC-TEST-ONLY

NaNK

—

409

polite-guard

license:apache-2.0

380

tiny-random-bert_ipex_model

This is a tiny random bert model. It was uploaded by IPEXModelForQuestionAnswering. This is useful for functional testing (not quality generation, since its weights are random) on optimum-intel

—

342

tiny-random-llama2_ipex_model

This is a tiny random Llama model derived from "meta-llama/Llama-2-7b-hf". It was uploaded by IPEXModelForCausalLM. This is useful for functional testing (not quality generation, since its weights are random) on optimum-intel

llama

337

tiny-random-mistral_ipex_model

—

333

tiny-random-gpt2_ipex_model

This is a tiny random gpt2 model. It was uploaded by IPEXModelForCausalLM. This is useful for functional testing (not quality generation, since its weights are random) on optimum-intel

—

332

tiny-random-falcon_ipex_model

This is a tiny random falcon model. It was uploaded by IPEXModelForCausalLM. This is useful for functional testing (not quality generation, since its weights are random) on optimum-intel

—

332

tiny-random-vit_ipex_model

This is a tiny random vit model derived from "google/vit-base-patch16-224". It was uploaded by IPEXModelForImageClassification. This is useful for functional testing (not quality generation, since its weights are random) on optimum-intel

—

331

Qwen3-Coder-480B-A35B-Instruct-int4-mixed-AutoRound

NaNK

license:apache-2.0

306

gpt-oss-120b-gguf-q4ks-AutoRound

This model is a gguf q4ks format of openai/gpt-oss-120b generated by intel/auto-round algorithm. ~~~bash llama-cli -hf Intel/gpt-oss-120b-gguf-q4ks-AutoRound ~~~ ~~~bash > Write a quick sort algorithm. analysis The user: "Write a quick sort algorithm." Likely they want code in some language. I should provide a quicksort implementation, possibly multiple languages, explain. I can give Python example. Also discuss partition scheme. Provide code. I'll respond with an example in Python, plus explanation. Also could include C++ version. Provide generic. assistant final Below is a complete, ready‑to‑run implementation of the Quick‑Sort algorithm together with a short explanation of how it works. You’ll find versions in Python, C++, and JavaScript so you can pick the one that best fits your project. Quick‑Sort is a divide‑and‑conquer sorting algorithm that works in three steps: 1. Choose a pivot element from the array (any element works, but a good choice reduces the chance of worst‑case O(n²) behavior). 2. Partition the array into two sub‑arrays: Elements ≤ pivot go to the left side. Elements > pivot go to the right side. 3. Recursively apply Quick‑Sort to the left and right sub‑arrays. The algorithm runs in average O(n log n) time and O(log n) extra space (the recursion stack). The in‑place version shown below uses the Lomuto partition scheme, which is easy to understand and implement. cpp #include #include #include // std::rand, std::srand #include // std::time // Lomuto partition int partition(std::vector & a, int low, int high) { int pivot = a[high]; // Choose rightmost element as pivot int i = low - 1; // Index of smaller element for (int j = low; j & a, int low, int high) { if (low & a) { quicksort(a, 0, staticcast (a.size()) - 1); } // ------------------------------------------------- int main() { std::srand(staticcast (std::time(nullptr))); // optional random seed std::vector data = {33, 10, 55, 71, 29, 3, 90, 42}; std::cout } arr - The array to sort. @param {number} low - Starting index (default 0). @param {number} high - Ending index (default arr.length‑1). / function quicksort(arr, low = 0, high = arr.length - 1) { if (low pivot` to avoid O(n²) on many equal elements | | Iterative Quick‑Sort | When recursion depth may overflow (very large arrays) | Use an explicit stack (or tail‑call optimisation) instead of recursion | | Metric | Average case | Worst case | Space (auxiliary) | |--------|--------------|-----------|-------------------| | Time | O(n log n) | O(n²) (e.g., already sorted with naive pivot) | O(log n) recursion depth (in‑place) | | Stability | No (elements with equal keys may be reordered) | – | – | If you need a stable sort, consider Merge‑Sort or TimSort (the algorithm used by Python’s built‑in `list.sort()`). The one‑liner above is a functional (non‑in‑place) version that is easy to understand but uses O(n) extra memory for each recursive call. Pick the version that fits your constraints, and you’ll have a fast, reliable sorting routine ready to go! Requirement pip install git+https://github.com/intel/auto-round.git@hengguo/ggufupdate0806 The model can produce factually incorrect output, and should not be relied on to produce factually accurate information. Because of the limitations of the pretrained model and the finetuning datasets, it is possible that this model could generate lewd, biased or otherwise offensive outputs. Therefore, before deploying any applications of the model, developers should perform safety testing. Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. Here are a couple of useful links to learn more about Intel's AI software: The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes. @article{cheng2023optimize, title={Optimize weight rounding via signed gradient descent for the quantization of llms}, author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi}, journal={arXiv preprint arXiv:2309.05516}, year={2023} }

NaNK

—

286

Qwen3-30B-A3B-Instruct-2507-int4-mixed-AutoRound

NaNK

license:apache-2.0

260

gpt-oss-20b-int4-AutoRound

NaNK

license:apache-2.0

230

dpt-swinv2-large-384

license:mit

209

MiniMax-M2-REAP-172B-A10B-gguf-q2ks-mixed-AutoRound

NaNK

—

185

Qwen3.5-35B-A3B-gguf-q2ks-mixed-AutoRound

NaNK

—

175

Qwen3-VL-30B-A3B-Instruct-int4-AutoRound

NaNK

—

171

deepmath-v1

NaNK

license:apache-2.0

169

gpt-oss-120b-int4-AutoRound

NaNK

—

161

Qwen3 8B GGUF Q2KS AS AutoRound

This example model demonstrates how to use AutoRound’s compatibility to automatically generate a mixed-bit quantization recipe. ~~~python from autoround import AutoRound, AutoScheme modelname = "Qwen/Qwen3-8B" avgbits = 3.0 scheme = AutoScheme(avgbits=avgbits, options=("GGUF:Q2KS", "GGUF:Q4KS"), ignorescalezpbits=True) layerconfig = {"lmhead": "GGUF:Q6K"} ar = AutoRound(model=modelname, scheme=scheme, layerconfig=layerconfig, iters=0) ar.quantizeandsave() ~~~

NaNK

—

155

zoedepth-kitti

license:mit

151

gpt-oss-20b-gguf-q2ks-AutoRound

This model is a gguf q2ks format of openai/gpt-oss-20b generated by intel/auto-round algorithm. ~~~bash llama-cli -hf Intel/gpt-oss-20b-gguf-q2ks-AutoRound ~~~ ~~~bash > Write a quick sort algorithm. analysis The user says: "Write a quick sort algorithm." They didn't specify language. Usually, we can answer with a generic explanation, maybe pseudocode. The user says "Write a quick sort algorithm." We should respond with a quick sort algorithm. Possibly in Python? Or a typical quick sort implementation. The user didn't specify the language. They might want a quicksort implementation. We can answer in Python. We can also explain the algorithm. We should answer concisely but thorough. The user: "Write a quick sort algorithm." So we can just give code. Thus the best solution: Provide quicksort algorithm in Python. Possibly iterative? But we can provide recursive quicksort. We should mention the typical quicksort: pick pivot, partition, sort left, sort right. We'll produce a quicksort function. Provide an implementation. We can show both in-place quicksort and functional quicksort. The question didn't ask for any particular style, so just give a quicksort algorithm. So just provide code and explanation. python def quicksortinplace(a, low=0, high=None): """ Sorts the list `a` in‑place using the quick‑sort algorithm. :param a: list to sort :param low: first index to sort (default 0) :param high: last index to sort (default None, meaning the last element) """ if high is None: high = len(a) - 1 if low = pivot` stay on the right. 3. Recursion: After partitioning, we sort the left sub‑array (`low` … `p-1`) and the right sub‑array (`p+1` … `high`). 4. In‑place: The algorithm sorts the original list without allocating additional memory for the sub‑arrays. - Average‑case: `O(n log n)` – when partitions are roughly balanced. - Worst‑case: `O(n²)` – when partitions are highly unbalanced (e.g., already sorted input with the last element as pivot). This can be mitigated by picking a random pivot or the median‑of‑three pivot. Feel free to adapt the code (e.g., use a random pivot or a different partition scheme) for your specific needs. Generate the model Requirement pip install git+https://github.com/intel/auto-round.git@hengguo/ggufupdate0806 The model can produce factually incorrect output, and should not be relied on to produce factually accurate information. Because of the limitations of the pretrained model and the finetuning datasets, it is possible that this model could generate lewd, biased or otherwise offensive outputs. Therefore, before deploying any applications of the model, developers should perform safety testing. Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. Here are a couple of useful links to learn more about Intel's AI software: The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes. @article{cheng2023optimize, title={Optimize weight rounding via signed gradient descent for the quantization of llms}, author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi}, journal={arXiv preprint arXiv:2309.05516}, year={2023} }

NaNK

—

143

dpt-beit-large-384

license:mit

141

Seed-OSS-36B-Instruct-int4-AutoRound

NaNK

license:apache-2.0

109

dpt-swinv2-base-384

license:mit

109

Qwen3-VL-235B-A22B-Instruct-int4-mixed-AutoRound

NaNK

—

108

Qwen3-VL-235B-A22B-Instruct-int4-AutoRound

This model is an int4 model with groupsize 128 and symmetric quantization of Qwen/Qwen3-VL-235B-A22B-Instruct generated by intel/auto-round. Please follow the license of the original model. INT4 Inference Please import `Qwen3VLMoeForConditionalGeneration` from `modelingqwen3vlmoe.py` in the model folder. Generate the model Step 1: use the script to convert model The model can produce factually incorrect output, and should not be relied on to produce factually accurate information. Because of the limitations of the pretrained model and the finetuning datasets, it is possible that this model could generate lewd, biased or otherwise offensive outputs. Therefore, before deploying any applications of the model, developers should perform safety testing. Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. Here are a couple of useful links to learn more about Intel's AI software: The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes. @article{cheng2023optimize, title={Optimize weight rounding via signed gradient descent for the quantization of llms}, author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi}, journal={arXiv preprint arXiv:2309.05516}, year={2023} }

NaNK

—

102

Qwen3-30B-A3B-Thinking-2507-int4-AutoRound

This model is an int4 model with groupsize 128 and symmetric quantization of Qwen/Qwen3-30B-A3B-Thinking-2507 generated by intel/auto-round algorithm. Please follow the license of the original model. ~~~bash vllm serve Intel/Qwen3-30B-A3B-Thinking-2507-int4-AutoRound --tensor-parallel-size 4 --max-model-len 32768 --enable-expert-parallel ~~~ ~~~python from transformers import AutoModelForCausalLM, AutoTokenizer modelname = "Intel/Qwen3-30B-A3B-Thinking-2507-int4-AutoRound" load the tokenizer and the model tokenizer = AutoTokenizer.frompretrained(modelname) model = AutoModelForCausalLM.frompretrained( modelname, torchdtype="auto", devicemap="auto" ) prepare the model input prompt = "Give me a short introduction to large language model." messages = [ {"role": "user", "content": prompt} ] text = tokenizer.applychattemplate( messages, tokenize=False, addgenerationprompt=True, ) modelinputs = tokenizer([text], returntensors="pt").to(model.device) conduct text completion generatedids = model.generate( modelinputs, maxnewtokens=32768 ) outputids = generatedids[0][len(modelinputs.inputids[0]):].tolist() parsing thinking content try: # rindex finding 151668 ( ) index = len(outputids) - outputids[::-1].index(151668) except ValueError: index = 0 thinkingcontent = tokenizer.decode(outputids[:index], skipspecialtokens=True).strip("\n") content = tokenizer.decode(outputids[index:], skipspecialtokens=True).strip("\n") print("thinking content:", thinkingcontent) # no opening tag print("content:", content) """ ....will update later... """ ~~~ ~~~bash auto-round --model Qwen/Qwen3-30B-A3B-Thinking-2507 --outputdir "./tmpautoround" --enabletorchcompile --nsamples 512 --fplayers mlp.gate ~~~ | benchmark | backend | Intel/Qwen3-30B-A3B-Thinking-2507-int4-AutoRound | Qwen/Qwen3-30B-A3B-Thinking-2507 | | :-------: | :-----: | :----------------------------------------------: | :------------------------------: | | mmlupro | vllm | 0.6956 | 0.7144 | The model can produce factually incorrect output, and should not be relied on to produce factually accurate information. Because of the limitations of the pretrained model and the finetuning datasets, it is possible that this model could generate lewd, biased or otherwise offensive outputs. Therefore, before deploying any applications of the model, developers should perform safety testing. Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. Here are a couple of useful links to learn more about Intel's AI software: The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes. @article{cheng2023optimize, title={Optimize weight rounding via signed gradient descent for the quantization of llms}, author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi}, journal={arXiv preprint arXiv:2309.05516}, year={2023} }

NaNK

license:apache-2.0

Qwen3-235B-A22B-Instruct-2507-int4-mixed-AutoRound

NaNK

license:apache-2.0

tvp-base

—

Qwen3-235B-A22B-AutoRound-Recipe

NaNK

license:apache-2.0

Qwen3-235B-A22B-MXFP4-AutoRound-Recipe

NaNK

license:apache-2.0

roberta-base-mrpc

license:mit

ldm3d

—

Kimi-K2-Instruct-gguf-q2ks-mixed-AutoRound

NaNK

—

Qwen3-235B-A22B-Thinking-2507-gguf-q2ks-mixed-AutoRound

This model is a mixed guff q2km model of Qwen/Qwen3-235B-A22B-Thinking-2507 generated by intel/auto-round. Embedding layer and lm-head layer are fallback to 8 bits and non expert layers are fallback to 4 bits. Please refer to Section `Generate the model` for more details. ~~~bash ./llama-cli -hf Intel/Qwen3-235B-A22B-Thinking-2507-gguf-q2ks-mixed-AutoRound:q2ks --conversation ~~~ 我需要理解这些数字的含义。9.8 可能指的是 9.8，而 9.11 可能指的是 9.11。在十进制中，9.8 表示 9 + 8/10 = 9.8，而 9.11 表示 9 + 11/100 = 9.11。用户可能是在问日期，比如 9月8日 vs 9月11日，但问题中写的是“9.8”和“9.11”，在中文中，点通常表示小数点，而不是日期分隔符。日期通常用斜杠或连字符，如 9/8 或 9-8。在中文上下文中，“9.8” 可能被误解为日期，但标准表示法中，日期通常不会用点分隔。例如，9月8日常写成“9/8”或“9月8日”。另一个可能性：9.8 可能指代某个事件，如 9.8地震，但9.8地震通常指2008年5月12日汶川地震，震级为8.0，但这里写的是9.8，可能不准确。9.11 可能指9/11事件。如果用户意指日期，9月8日 vs 9月11日，那么9月11日在9月8日之后，所以9月11日更大，但日期比较通常不是“大”的意思，而是时间顺序。或许是版本号，如软件版本9.8 vs 9.11，但9.11版本可能不存在，通常版本号是9.10、9.11等，但9.8和9.11，9.11可能比9.8新，但版本号比较不是简单的数值，因为9.11可能表示9.1.1，但这里写的是9.11，可能直接比较。 9.8% = 0.098, 9.11% = 0.0911, 0.098 > 0.0911, 所以9.8% > 9.11%。为什么 9.8 更大？ - 9.8 表示 9.8（即 9 + 8/10 = 9.8）。 - 9.11 表示 9.11（即 9 + 11/100 = 9.11）。 - 将两者转换为相同小数位数以便比较： - 9.8 = 9.80 - 9.11 = 9.11 - 比较小数部分：80/100（0.80） > 11/100（0.11），因此 9.80 > 9.11。 - 计算差值：9.8 - 9.11 = 0.69 > 0，这进一步证明 9.8 更大。其他可能的误解 - 如果是日期比较（如 9月8日 vs 9月11日）：在日期上下文中，“9.8”通常指 9月8日，“9.11”指 9月11日。9月11日比9月8日晚（即时间顺序上更大），但问题中的“大”一般指数值大小而非时间先后。数值比较仍是 9.8 > 9.11。 - 如果是事件代号（如 9.8地震 vs 9.11事件）： - “9.8”可能指2008年5月12日汶川地震（实际震级为8.0，但有时被误称为9.8）。 - “9.11”通常指2001年9月11日美国恐怖袭击事件。但事件本身无法直接比较“大小”，问题更可能是在问数值。如果指事件影响力，9.11事件全球影响更大，但问题表述更倾向于数值比较。结论 9.8 大。如果您的问题有特定背景（如日期、事件或版本号），请提供更多细节，我可以进一步解释！ ~~~python import torch from transformers import AutoModelForCausalLM, AutoTokenizer from autoround import AutoRound model = AutoModelForCausalLM.frompretrained(modelname, devicemap="cpu", torchdtype="auto") tokenizer = AutoTokenizer.frompretrained(modelname) layerconfig = {} for n, m in model.namedmodules(): if n == "lmhead" or isinstance(m, torch.nn.Embedding): layerconfig[n] = {"bits": 8} elif isinstance(m, torch.nn.Linear) and (not "expert" in n or "sharedexperts" in n) and n != "lmhead": layerconfig[n] = {"bits": 4} autoround = AutoRound(model, tokenizer, iters=0, nsamples=512,layerconfig=layerconfig) autoround.quantizeandsave("./Qwen3-235B-A22B-Thinking-2507-q2ks", format="gguf:q2ks") The model can produce factually incorrect output, and should not be relied on to produce factually accurate information. Because of the limitations of the pretrained model and the finetuning datasets, it is possible that this model could generate lewd, biased or otherwise offensive outputs. Therefore, before deploying any applications of the model, developers should perform safety testing. Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. Here are a couple of useful links to learn more about Intel's AI software: The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes. @article{cheng2023optimize, title={Optimize weight rounding via signed gradient descent for the quantization of llms}, author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi}, journal={arXiv preprint arXiv:2309.05516}, year={2023} }

NaNK

license:apache-2.0

MiniMax-M2.5-int4-AutoRound

—

GLM-4.7-int4-mixed-AutoRound

—

Qwen3-235B-A22B-q2ks-mixed-AutoRound

This model is a mixed gguf:q2ks of Qwen/Qwen3-235B-A22B generated by intel/auto-round algorithm. Embedding layer and lm-head layer are fallback to 8 bits and non expert layers are fallback to 4 bits. Please refer to Section `Generate the model` for more details. ~~~bash ./llama-cli -hf Intel/Qwen3-235B-A22B-q2ks-mixed-ar:q2ks --conversation ~~~ Okay, the user said "Hi". That's a greeting. I should respond in a friendly and welcoming manner. Let me make sure to acknowledge their greeting and offer assistance. Maybe something like, "Hello! How can I assist you today?" That's simple and to the point, and opens the door for them to ask questions or request help. I should keep it approachable and positive. Alright, that should do it. Okay, the user wants me to create a Flappy Bird game in Python. Let me think about how to approach this. First, I remember that Flappy Bird is a simple game where the bird jumps through pipes. So, I need to handle movement, collision detection, and scoring. Python has libraries like Pygame which are good for game development. Wait, the user might not be familiar with Pygame. I should include installation instructions. Oh right, they might need to install pygame first. So I should mention using pip install pygame. Now, breaking down the game components. There's the bird, the pipes, and the game loop. Let me outline each part. For the bird, it needs to move up when the user presses a key, and gravity pulls it down. The position and velocity variables will track that. Pipes: They should move from right to left, and when they go off-screen, they reset to the right with a random height. Need to generate pipes periodically. Collision detection: Check if the bird's position overlaps with any pipe. Also, check if the bird hits the ground or goes above the screen. Game loop: Handle events, update positions, check collisions, and redraw everything. Also, keep track of the score when the bird passes a pipe. Wait, maybe I should structure the code step by step. Start by initializing Pygame, setting up the screen, colors, and then create classes or functions for Bird, Pipe, and the game mechanics. But since the user might want a quick example without too much complexity, I can write a simplified version. Let me make sure variables are properly initialized, like bird's velocity, pipe speed, and score. Also, need to handle the game over screen and restart option. Oh right, when there's a collision, display "Game Over" and wait for the user to quit or restart. Testing for possible issues: The pipes might not generate correctly if the height is not random enough. Should generate pipes at intervals and check their positions. Wait, in Pygame, the coordinates start at (0,0) top-left. So bird's y position should be adjusted with gravity. Maybe use rectangles for collision detection. I should write code that creates a list of pipes, updates their positions each frame, and removes them when off-screen. Then add new pipes as needed. Also, the score increases when the bird passes a pipe. Need to track which pipes have been passed already to increment the score. Let me put all together: Initialize Pygame, set up display, define classes or variables, game loop, events, updates, rendering, and collision checks. Hmm, I should make sure the code is well-commented so the user can follow along. Also, mention that they can adjust parameters like gravity, jump strength, pipe speed, etc. Wait, possible bugs: If the bird's rectangle isn't correctly sized, collision detection might not work. Need to ensure that the bird's rect and pipe rects are properly defined. Also, the initial position of the pipes should be off-screen to the right, then move left each frame. When a pipe moves past the left edge, reset its position to the right with new height. Testing the code might be necessary, but since I can't run it here, I'll make sure to structure it logically. Maybe include a main loop that runs until the user quits, handles keyboard inputs, updates bird and pipes, checks collisions. So putting it all together, the code should be functional and simple enough for someone learning Pygame to understand. Alright, I'll draft the code step by step now. Here's a simplified Flappy Bird implementation in Python using Pygame: 2. Save the code to a file (e.g., `flappybird.py`) 3. Run the file using Python Features: - Simple physics (gravity) - Randomly generated pipes - Score tracking - Collision detection - Basic game over screen You can enhance this basic implementation by: 1. Adding images/sprites for the bird and pipes 2. Implementing sound effects 3. Adding start screen and restart functionality 4. Improving the scoring system 5. Adding difficulty progression Let me know if you'd like help expanding any specific feature! ~~~python import torch from transformers import AutoModelForCausalLM, AutoTokenizer from autoround import AutoRound model = AutoModelForCausalLM.frompretrained(modelname, devicemap="cpu", torchdtype="auto") tokenizer = AutoTokenizer.frompretrained(modelname) layerconfig = {} for n, m in model.namedmodules(): if n == "lmhead" or isinstance(m,torch.nn.Embedding): layerconfig[n] = {"bits": 8} elif isinstance(m, torch.nn.Linear) and (not "expert" in n or "sharedexperts" in n) and n != "lmhead": layerconfig[n] = {"bits": 4} autoround = AutoRound(model, tokenizer, iters=0, layerconfig=layerconfig, nsamples=512) autoround.quantizeandsave("/dataset/Qwen3-235B-A22B-q2ks", format="gguf:q2ks") The model can produce factually incorrect output, and should not be relied on to produce factually accurate information. Because of the limitations of the pretrained model and the finetuning datasets, it is possible that this model could generate lewd, biased or otherwise offensive outputs. Therefore, before deploying any applications of the model, developers should perform safety testing. Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. Here are a couple of useful links to learn more about Intel's AI software: The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes. @article{cheng2023optimize, title={Optimize weight rounding via signed gradient descent for the quantization of llms}, author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi}, journal={arXiv preprint arXiv:2309.05516}, year={2023} }

NaNK

license:apache-2.0

Qwen3-Coder-480B-A35B-Instruct-gguf-q2ks-mixed-AutoRound

NaNK

license:apache-2.0

DeepSeek-V3.1-Terminus-gguf-q2ks-mixed-AutoRound

This model is a mixed gguf q2ks format of deepseek-ai/DeepSeek-V3.1-Terminus generated by intel/auto-round algorithm. Embedding layer and lm-head layer are fallback to 8 bits and non expert layers are fallback to 4 bits. Please refer to Section `Generate the model` for more details. ~~~bash ./llama-cli -hf Intel/DeepSeek-V3.1-Terminus-gguf-q2ks-mixed-AutoRound ~~~ ~~~bash > Write a quick sort algorithm. Here's a quick sort algorithm implemented in Python: 1. First version (easy to understand): - Uses list comprehensions to create left, middle, and right partitions - More readable but uses extra memory 2. Second version (in-place): - More memory efficient - Uses the Lomuto partition scheme - Modifies the original array Time Complexity: - Best/Average case: O(n log n) - Worst case: O(n²) - when pivot is always the smallest or largest element Space Complexity: - First version: O(n) - due to creating new lists - Second version: O(log n) - due to recursion stack The algorithm works by selecting a 'pivot' element and partitioning the array around it, then recursively sorting the sub-arrays. ~~~ The model can produce factually incorrect output, and should not be relied on to produce factually accurate information. Because of the limitations of the pretrained model and the finetuning datasets, it is possible that this model could generate lewd, biased or otherwise offensive outputs. Therefore, before deploying any applications of the model, developers should perform safety testing. Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. Here are a couple of useful links to learn more about Intel's AI software: The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes. @article{cheng2023optimize, title={Optimize weight rounding via signed gradient descent for the quantization of llms}, author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi}, journal={arXiv preprint arXiv:2309.05516}, year={2023} }

NaNK

—

Qwen3-235B-A22B-Instruct-2507-int4-AutoRound

This model is an int4 model with groupsize 128 and symmetric quantization of [Qwen/Qwen3-235B-A22B-Instruct-2507](https://huggingface.co/deepseek-ai/DeepSeek-R1-0528-Qwen3-8B) generated by intel/auto-round algorithm. ~~~bash vllm serve Intel/Qwen3-235B-A22B-Thinking-2507-int4-AutoRound --tensor-parallel-size 4 --max-model-len 32768 ~~~ ~~~python from transformers import AutoModelForCausalLM, AutoTokenizer modelname = "Intel/Qwen3-235B-A22B-Instruct-2507-int4-AutoRound" load the tokenizer and the model tokenizer = AutoTokenizer.frompretrained(modelname) model = AutoModelForCausalLM.frompretrained( modelname, torchdtype="auto", devicemap="auto" ) prepare the model input prompt = "Give me a short introduction to large language model." messages = [ {"role": "user", "content": prompt} ] text = tokenizer.applychattemplate( messages, tokenize=False, addgenerationprompt=True, ) modelinputs = tokenizer([text], returntensors="pt").to(model.device) conduct text completion generatedids = model.generate( modelinputs, maxnewtokens=16384 ) outputids = generatedids[0][len(modelinputs.inputids[0]):].tolist() content = tokenizer.decode(outputids, skipspecialtokens=True) """ content: A large language model (LLM) is a type of artificial intelligence trained on vast amounts of text data to understand and generate human-like language. Using deep learning techniques, particularly transformer architectures, LLMs learn patterns, grammar, facts, and reasoning abilities from diverse sources like books, websites, and articles. They can perform a wide range of language tasks, such as answering questions, writing stories, coding, and translating languages. Examples include models like GPT, Llama, and PaLM. Their power comes from their size—often billions of parameters—and extensive training, enabling them to produce coherent and contextually relevant responses. The model can produce factually incorrect output, and should not be relied on to produce factually accurate information. Because of the limitations of the pretrained model and the finetuning datasets, it is possible that this model could generate lewd, biased or otherwise offensive outputs. Therefore, before deploying any applications of the model, developers should perform safety testing. Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. Here are a couple of useful links to learn more about Intel's AI software: The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes. @article{cheng2023optimize, title={Optimize weight rounding via signed gradient descent for the quantization of llms}, author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi}, journal={arXiv preprint arXiv:2309.05516}, year={2023} }

NaNK

license:apache-2.0

bert-base-uncased-squadv1.1-sparse-80-1x4-block-pruneofa

license:apache-2.0

GLM-4.5-gguf-q2ks-mixed-AutoRound

NaNK

—

Qwen3-VL-235B-A22B-Thinking-int4-mixed-AutoRound

This model is a mixed int4 model with groupsize 128 and symmetric quantization of Qwen/Qwen3-VL-235B-A22B-Thinking generated by intel/auto-round via RTN(no algorithm tuning). Non expert layers are fallback to 8 bits. Please refer to Section Generate the model for more details. Please follow the license of the original model. Please import `Qwen3VLMoeForConditionalGeneration` from `modelingqwen3vlmoe.py` in the model folder. Generate the model Step 1: use the script to convert model The model can produce factually incorrect output, and should not be relied on to produce factually accurate information. Because of the limitations of the pretrained model and the finetuning datasets, it is possible that this model could generate lewd, biased or otherwise offensive outputs. Therefore, before deploying any applications of the model, developers should perform safety testing. Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. Here are a couple of useful links to learn more about Intel's AI software: The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes. @article{cheng2023optimize, title={Optimize weight rounding via signed gradient descent for the quantization of llms}, author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi}, journal={arXiv preprint arXiv:2309.05516}, year={2023} }

NaNK

—

DeepSeek-V3.1-int4-mixed-AutoRound

NaNK

—

GLM-4.5V-int4-AutoRound

This model is an int4 model with groupsize 128 and symmetric quantization of zai-org/GLM-4.5V generated by intel/auto-round algorithm. Please follow the license of the original model. The model can produce factually incorrect output, and should not be relied on to produce factually accurate information. Because of the limitations of the pretrained model and the finetuning datasets, it is possible that this model could generate lewd, biased or otherwise offensive outputs. Therefore, before deploying any applications of the model, developers should perform safety testing. Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. Here are a couple of useful links to learn more about Intel's AI software: The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes. @article{cheng2023optimize, title={Optimize weight rounding via signed gradient descent for the quantization of llms}, author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi}, journal={arXiv preprint arXiv:2309.05516}, year={2023} }

NaNK

—

DeepSeek-V3.1-int4-AutoRound

This model is a int4 model with groupsize 128 and symmetric quantization of deepseek-ai/DeepSeek-V3.1 generated by intel/auto-round. Please follow the license of the original model. ~~~bash vllm serve Intel/DeepSeek-V3.1-int4-AutoRound ~~~ INT4 Inference Potential overflow/underflow issues have been observed on CUDA, primarily due to kernel limitations. For better accuracy, we recommend deploying the model on CPU or using our INT4 mixed version Generate the model Mian branch is required if the model is fp8 and the device supports fp8 https://github.com/intel/auto-round | benchmark | backend | Intel/DeepSeek-V3.1-int4-AutoRound | deepseek-ai/DeepSeek-V3.1 | | :-------: | :-----: | :--------------------------------: | :-----------------------: | | mmlupro | vllm | 0.7865 | 0.7965 | The model can produce factually incorrect output, and should not be relied on to produce factually accurate information. Because of the limitations of the pretrained model and the finetuning datasets, it is possible that this model could generate lewd, biased or otherwise offensive outputs. Therefore, before deploying any applications of the model, developers should perform safety testing. Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. Here are a couple of useful links to learn more about Intel's AI software: The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes. @article{cheng2023optimize, title={Optimize weight rounding via signed gradient descent for the quantization of llms}, author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi}, journal={arXiv preprint arXiv:2309.05516}, year={2023} }

NaNK

—

ldm3d-pano

—

fid_flan_t5_base_nq

license:cc-by-sa-3.0

neural-chat-7b-v3

A large language model (LLM) licensed under Apache 2.0.

NaNK

license:apache-2.0

Qwen3-30B-A3B-q2ks-mixed-AutoRound-inc-v0

This model is a mixed gguf:q2ks of Qwen/Qwen3-30B-A3B generated by intel/auto-round algorithm. Embedding layer and non expert layers are fallback to 4 bits. Please refer to Section `Generate the model` for more details. ~~~bash ./llama-cli -hf Intel/Qwen3-30B-A3B-q2ks-mixed-AutoRound-inc-v0 --conversation ~~~ Okay, the user just said "Hi". I need to respond appropriately. Since it's a greeting, I should greet them back and offer assistance. Let me make sure the response is friendly and open-ended. Maybe something like "Hello! How can I assist you today?" That should work. I don't need to add anything else unless they ask for more. Keep it simple and polite. Okay, the user is asking whether 9.8 is bigger than 9.11. Let me think about how to approach this. First, I need to confirm the numbers. 9.8 and 9.11. Wait, are these decimal numbers? So 9.8 is the same as 9.80, right? And 9.11 is 9.11. Comparing them, 9.80 is larger than 9.11 because 0.80 is bigger than 0.11. So 9.8 is bigger. But maybe the user is confused about decimal points. Let me check if there's any other way to interpret these numbers. Like, maybe they're talking about dates? But 9.8 and 9.11 could be September 8th and September 11th. But the question is about which is "big" – maybe the user is asking which date is later? September 11th is later than September 8th, so 9.11 would be bigger in that context. But the original question is about numerical values. Wait, the user wrote "9.8大还是9.11大". "大" means big, so comparing the numbers. If it's about numerical values, 9.8 is larger than 9.11. But if it's about dates, September 11th is later. However, the user didn't specify context. Maybe they are asking about numbers. So I should explain both possibilities. But in math terms, 9.8 is greater than 9.11. Because when comparing decimals, 9.8 is 9.80, which is larger than 9.11. So the answer would be 9.8 is bigger. But I should make sure to clarify that if they meant dates, then 9.11 would be later. However, since the question is in Chinese, maybe the user is asking about numbers. I should respond by stating that 9.8 is larger than 9.11 when comparing numerical values. But also mention if they're referring to dates, then 9.11 (September 11th) is later. But since the original question is about "大", which is about size, it's likely about numbers. So the answer is 9.8 is bigger. Wait, but in Chinese, sometimes people might write 9.8 as 9.8 or 9.11 as 9.11. So the comparison is straightforward. So the answer is 9.8 is larger. But I need to make sure the user understands that. Maybe they are confused about decimal places. Let me explain step by step. First, compare the whole number parts: both are 9. Then compare the decimal parts. 0.8 vs. 0.11. Since 0.8 is 0.80, which is larger than 0.11. So 9.8 is bigger. Alternatively, if they are dates, September 11th is later than September 8th. But the user didn't specify. So I should cover both possibilities. But the user's question is in Chinese, so maybe they are asking about numbers. So I'll explain that 9.8 is larger than 9.11 when comparing numerical values. I should also check if there's any cultural context. For example, in some contexts, numbers might be written differently. But I think the standard decimal comparison applies here. So the final answer is 9.8 is larger than 9.11. But I should present it clearly, maybe with an example. - 9.8 是 9.80（可以理解为 9.8），而 9.11 是 9.11。 - 当比较小数部分时，0.80（即 9.8）大于 0.11（即 9.11）。 - 因此，9.8 > 9.11。如果问题涉及日期（例如 9月8日和 9月11日），则 9月11日（9.11）是更晚的日期，但根据数值大小，9.8 更大。 Okay, let's see. The user is asking a riddle: "There are ten birds on a tree. A gun is used to shoot one, how many are left?" Hmm, this seems like a classic riddle that's not straightforward. Let me think about it. First, the initial number is ten birds. If you shoot one, the obvious answer would be 10 - 1 = 9. But riddles often have a twist. Maybe the answer is different. Let me recall similar riddles. Oh, right! The trick here is that when you shoot a bird, the others might fly away because of the noise. So instead of 9, the answer could be zero because all the remaining birds would be scared and leave. But wait, the user might be expecting that. However, sometimes the answer depends on the context. If the birds are dead, then 9 would be the answer. But if they are alive and the gun shot causes them to fly away, then none would be left. But the riddle is often designed to have the answer as zero. Let me check if that's the case. Another angle: the question is in Chinese, so maybe there's a specific cultural reference. But I think the common answer is zero. However, I should explain both possibilities. The user might be testing if they know the trick. So the answer is either 9 or 0, depending on whether the other birds fly away. But I need to make sure I explain the reasoning. So, the answer is zero because the sound of the gunshot would scare the remaining birds, causing them to fly away. Therefore, no birds are left. But if the question is literal, then 9. But the riddle is designed to trick you into thinking it's 9, but the actual answer is zero. So I should present both possibilities and explain the trick. 答案通常是0，因为问题的意图是考察“逻辑思维”而非数学计算。类似的问题还有：“树上有3只鸟，打死了1只，剩下几只？”答案可能是“0只”（其他鸟飞走了）或“2只”（假设鸟不会飞）。但根据常见设定，答案是 0只。 Okay, the user is asking "how many e in Qwen". Let me check the spelling of Qwen. Q-w-e-n. So the letters are Q, W, E, N. So there is one E. But maybe they're asking about the number of E's in the name "Qwen". Let me confirm. The name Qwen is spelled Q-W-E-N. So there's one E. But maybe the user is confused or there's a typo. Alternatively, if they meant "Qwen" as in the model name, but the model is called Qwen, so the letters are Q, W, E, N. So one E. But maybe they're asking about the number of E's in the word "Qwen" when written in lowercase or uppercase. But regardless, it's just one E. So the answer is 1. But I need to make sure there's no other E's. Let me double-check. Q-W-E-N. Yes, only one E. So the answer is 1. But maybe the user is tricking me? Or maybe they meant something else. But I think the answer is 1. ~~~python import torch from transformers import AutoModelForCausalLM,AutoTokenizer from autoround import AutoRound modelname = "Qwen/Qwen3-30B-A3B" model = AutoModelForCausalLM.frompretrained(modelname, devicemap="cpu", torchdtype="auto") tokenizer = AutoTokenizer.frompretrained(modelname) layerconfig = {} for n,m in model.namedmodules(): if isinstance(m, torch.nn.Embedding): layerconfig[n]={"bits":4} elif isinstance(m,torch.nn.Linear) and not "expert" in n and n != "lmhead": layerconfig[n] = {"bits": 4} autoround = AutoRound(model,tokenizer,iters=0,layerconfig=layerconfig) autoround.quantizeandsave("./tmpautoround",format="gguf:q2ks") ~~~ The model can produce factually incorrect output, and should not be relied on to produce factually accurate information. Because of the limitations of the pretrained model and the finetuning datasets, it is possible that this model could generate lewd, biased or otherwise offensive outputs. Therefore, before deploying any applications of the model, developers should perform safety testing. Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. Here are a couple of useful links to learn more about Intel's AI software: The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes. @article{cheng2023optimize, title={Optimize weight rounding via signed gradient descent for the quantization of llms}, author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi}, journal={arXiv preprint arXiv:2309.05516}, year={2023} }

NaNK

license:apache-2.0

toxic-prompt-roberta

NaNK

license:mit

GLM-4.5-int4-AutoRound

NaNK

—

dynamic-minilmv2-L6-H384-squad1.1-int8-static

license:mit

gpt-oss-20b-int4-g64-AutoRound

NaNK

—

DeepSeek-V3.1-Terminus-int4-mixed-AutoRound

NaNK

—

deberta-v3-base-mrpc

license:mit

NeuroPrompts

—

Qwen3-Coder-480B-A35B-Instruct-int4-AutoRound

This model is a mixed int4 model with groupsize 128 and symmetric quantization of Qwen/Qwen3-Coder-480B-A35B-Instruct generated by intel/auto-round algorithm. `mlp.gate` layers fallback to 16 bits to ensure runing successfully on vLLM. vLLM usage ~~~bash vllm serve Intel/Qwen3-Coder-480B-A35B-Instruct-int4-AutoRound --tensor-parallel-size 4 --max-model-len 65536 ~~~ ~~~python from transformers import AutoModelForCausalLM, AutoTokenizer modelname = "Intel/Qwen3-Coder-480B-A35B-Instruct-int4-AutoRound" load the tokenizer and the model tokenizer = AutoTokenizer.frompretrained(modelname) model = AutoModelForCausalLM.frompretrained( modelname, torchdtype="auto", devicemap="auto" ) prompts = [ "Write a quick sort algorithm.", "Write a flappy bird.", "Write a llm quantization algorithm.", ] texts = [] for prompt in prompts: messages = [ {"role": "user", "content": prompt} ] text = tokenizer.applychattemplate( messages, tokenize=False, addgenerationprompt=True ) texts.append(text) inputs = tokenizer(texts, returntensors="pt", padding=True, truncation=True, paddingside="left").to(model.device) conduct text completion outputs = model.generate( inputs, maxnewtokens=65536, ) generatedids = [ outputids[len(inputids):] for inputids, outputids in zip(inputs["inputids"], outputs) ] decodedoutputs = tokenizer.batchdecode(generatedids, skipspecialtokens=True) for i, prompt in enumerate(prompts): inputid = inputs print(f"Prompt: {prompt}") print(f"Generated: {decodedoutputs[i]}") print("-" 50) Here is the sample command to reproduce the model. 380G The model can produce factually incorrect output, and should not be relied on to produce factually accurate information. Because of the limitations of the pretrained model and the finetuning datasets, it is possible that this model could generate lewd, biased or otherwise offensive outputs. Therefore, before deploying any applications of the model, developers should perform safety testing. Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. Here are a couple of useful links to learn more about Intel's AI software: The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes. @article{cheng2023optimize, title={Optimize weight rounding via signed gradient descent for the quantization of llms}, author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi}, journal={arXiv preprint arXiv:2309.05516}, year={2023} }

NaNK

license:apache-2.0

distilbart-cnn-12-6-int8-dynamic-inc

license:apache-2.0

vit-base-patch16-224-int8-static-inc

license:apache-2.0

Qwen3-235B-A22B-Thinking-2507-int4-AutoRound

This model is an int4 model with groupsize 128 and symmetric quantization of Qwen/Qwen3-235B-A22B-Thinking-2507 generated by intel/auto-round algorithm. Please follow the license of the original model. ~~~bash vllm serve Intel/Qwen3-235B-A22B-Thinking-2507-int4-AutoRound --tensor-parallel-size 4 --max-model-len 32768 ~~~ ~~~python from transformers import AutoModelForCausalLM, AutoTokenizer modelname = "Intel/Qwen3-235B-A22B-Thinking-2507-int4-AutoRound" load the tokenizer and the model tokenizer = AutoTokenizer.frompretrained(modelname) model = AutoModelForCausalLM.frompretrained( modelname, torchdtype="auto", devicemap="auto" ) prepare the model input prompt = "Give me a short introduction to large language model." messages = [ {"role": "user", "content": prompt} ] text = tokenizer.applychattemplate( messages, tokenize=False, addgenerationprompt=True, ) modelinputs = tokenizer([text], returntensors="pt").to(model.device) conduct text completion generatedids = model.generate( modelinputs, maxnewtokens=32768 ) outputids = generatedids[0][len(modelinputs.inputids[0]):].tolist() parsing thinking content try: # rindex finding 151668 ( ) index = len(outputids) - outputids[::-1].index(151668) except ValueError: index = 0 thinkingcontent = tokenizer.decode(outputids[:index], skipspecialtokens=True).strip("\n") content = tokenizer.decode(outputids[index:], skipspecialtokens=True).strip("\n") print("thinking content:", thinkingcontent) # no opening tag print("content:", content) """ thinking content: Okay, the user asked for a short introduction to large language models. Let me start by breaking down what they need. They probably want a concise explanation without too much jargon. Since they specified "short," I should keep it to the essentials. First, I need to define what an LLM is. Mention that it's a type of AI trained on massive text data. Highlight key capabilities like understanding and generating human-like text. Maybe include examples of tasks they can do—writing, translating, answering questions. That makes it relatable. Wait, the user might not know technical terms like "deep learning" or "transformer architecture." Should I explain those? Probably not, since they want it short. But a brief mention of the underlying tech could add context without diving deep. Just say they're based on advanced neural networks. Also, important to note that LLMs predict the next word. That's a core mechanism. But avoid getting too technical. Emphasize the scale—billions of parameters and huge datasets. That's why they're "large." Should I mention limitations? The user didn't ask, but it's part of a proper intro. Maybe a sentence about not understanding like humans and potential inaccuracies. But since they want it short, maybe just hint at it without details. Check if the response covers: definition, how they work (briefly), what they can do, and a note on limitations. Keep it under a few paragraphs. Avoid markdown as per the user's request. Use simple language. Make sure it's clear and not overwhelming. Also, consider the user's possible background. Could be a student, professional, or someone curious about AI. Tailor to a general audience. No need for references or citations. Just straightforward points. Double-check the key points: trained on vast text, predict text, various applications, scale matters, limitations. Yep, that covers it. Keep examples concrete like chatbots, translation. Avoid mentioning specific models unless necessary—maybe just say "like ChatGPT" as an example. Alright, structure: one paragraph defining LLMs, another on capabilities, a sentence on how they work, and a brief note on limitations. Make sure it's all concise. content: A Large Language Model (LLM) is a type of artificial intelligence trained to understand, generate, and interact with human language. Built using deep learning (typically transformer architectures), LLMs learn patterns from massive datasets of text—often spanning books, websites, and other written material. This training allows them to perform tasks like answering questions, writing stories, translating languages, coding, and even holding conversations, all by predicting the most likely next word in a sequence. Key traits: - Scale: They have billions (or trillions) of parameters—internal "knobs" fine-tuned during training—enabling nuanced language handling. - Versatility: One model can handle diverse tasks without task-specific retraining (a trait called few-shot learning). - Limitations: They don’t "understand" meaning like humans; they mimic patterns, which can lead to inaccuracies, biases, or nonsensical outputs. Examples include OpenAI’s GPT series, Google’s Gemini, and Meta’s Llama. While powerful, LLMs are tools—they augment human creativity and efficiency but require careful use to manage errors and ethical risks. ~~~bash auto-round --model Qwen/Qwen3-235B-A22B-Thinking-2507 --outputdir "./tmpautoround" --enabletorchcompile --nsamples 512 --fplayers mlp.gate ~~~ The model can produce factually incorrect output, and should not be relied on to produce factually accurate information. Because of the limitations of the pretrained model and the finetuning datasets, it is possible that this model could generate lewd, biased or otherwise offensive outputs. Therefore, before deploying any applications of the model, developers should perform safety testing. Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. Here are a couple of useful links to learn more about Intel's AI software: The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes. @article{cheng2023optimize, title={Optimize weight rounding via signed gradient descent for the quantization of llms}, author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi}, journal={arXiv preprint arXiv:2309.05516}, year={2023} }

NaNK

license:apache-2.0

bge-small-en-v1.5-rag-int8-static

license:mit

whisper-small-int8-dynamic-inc

license:apache-2.0

distilbert-base-uncased-finetuned-sst-2-english-int8-static-inc

license:apache-2.0

deepseek-coder-1.3b_base_ov_int8

NaNK

llama

phi-2-int4-inc

license:apache-2.0

DeepSeek V3.1 Terminus Int8 AutoRound

This model is a int8 model with groupsize 128 and symmetric quantization of deepseek-ai/DeepSeek-V3.1-Terminus generated by intel/auto-round via RTN(no algorithm tuning). Please refer to Section Generate the model for more details. Please follow the license of the original model. The model can produce factually incorrect output, and should not be relied on to produce factually accurate information. Because of the limitations of the pretrained model and the finetuning datasets, it is possible that this model could generate lewd, biased or otherwise offensive outputs. Therefore, before deploying any applications of the model, developers should perform safety testing. Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. Here are a couple of useful links to learn more about Intel's AI software: The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes. @article{cheng2023optimize, title={Optimize weight rounding via signed gradient descent for the quantization of llms}, author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi}, journal={arXiv preprint arXiv:2309.05516}, year={2023} }

NaNK

license:apache-2.0

bert-base-uncased-mnli-sparse-70-unstructured

—

whisper-tiny-onnx-int4-inc

license:apache-2.0

Qwen3-8B-gguf-q4km-AutoRound-inc-v1

This model is an example gguf:q4km model of Qwen/Qwen3-8B generated by intel/auto-round algorithm. The algorithm may change for this format in the future ~~~bash ./llama-cli -m ./Qwen3-8B-8.2B-Q4KM.gguf --conversation ~~~ 嗯，用户问的是“9.8大还是9.11大”，看起来是想比较这两个数字的大小。首先，我需要确认用户的问题是什么。可能他们是在比较两个小数，或者在某个特定的上下文中，比如日期、时间或者其他数值。不过，通常来说，直接比较9.8和9.11的话，应该先看整数部分，整数部分都是9，所以接下来比较小数部分。小数部分的话，9.8的小数部分是0.8，而9.11的小数部分是0.11。这时候需要比较0.8和0.11的大小。0.8等于0.80，而0.11是0.11，显然0.80比0.11大。所以9.8应该比9.11大。不过，用户可能是在问其他的情况，比如日期，比如9月8日和9月11日，这时候比较的话，9月11日比9月8日更晚，所以9.11在日期上更大。但用户的问题里用了小数点，可能更倾向于数值比较。不过也有可能用户有拼写错误或者误解，比如“9.8”和“9.11”是否指其他东西？比如在某些情况下，比如版本号，或者某种编码，但通常这种情况下，比较方式还是基于数值的大小。或者用户可能是在问哪个数字更大，但需要确认是否有其他上下文。另外，用户可能对小数的比较不太熟悉，所以需要详细解释步骤。比如，先比较整数部分，如果整数部分相同，再比较小数部分。或者用户可能混淆了小数点后的位数，比如9.8其实是9.80，而9.11是9.11，这时候比较的话，9.80更大。或者用户可能误以为小数点后的位数越多，数值就越大，比如9.11比9.8多一位小数，但实际上数值的大小取决于实际的数值大小，而不是位数。也有可能用户输入有误，比如9.8和9.11可能应该是其他数字，或者有其他符号。但根据现有的信息，应该按数值比较来处理。所以正确的答案应该是9.8大于9.11。不过为了确保，可能需要再检查一下，比如将两者转换为相同的位数，比如9.80和9.11，然后比较，确实是9.80更大。或者用分数来表示，比如9.8是98/10，而9.11是911/100，然后通分比较，但这样比较复杂，直接比较小数更简单。另外，用户可能是在做数学题，或者学习小数比较，所以需要详细的步骤来解释，避免他们以后遇到类似的问题时不知道如何处理。所以回答时应该分步骤，先比较整数部分，再比较小数部分，同时指出小数点后的位数不影响大小，而是要看实际的数值。或者指出，当整数部分相同时，小数部分的数值越大，整个数就越大，所以9.8的小数部分0.8大于0.11，因此9.8更大。也有可能用户是想测试我是否知道比较小数的方法，或者是否在某些特殊情况下有不同的比较方式。比如，在某些编程语言中，可能处理小数的方式不同，但通常数学上的比较方式是标准的。所以需要确认用户的问题是否在数学范围内，还是其他领域。总结来说，正确的答案应该是9.8更大，但需要明确解释比较的方法，确保用户理解比较小数的步骤。同时，可能需要提醒用户注意小数点后的位数不影响大小，而是要看实际的数值，比如0.8和0.11的比较。 1. 比较整数部分：两者整数部分均为 9，因此需要进一步比较小数部分。 2. 比较小数部分： - 9.8 的小数部分是 0.8（可视为 0.80）。 - 9.11 的小数部分是 0.11。 - 0.80 > 0.11，因此 9.8 > 9.11。结论：在数值大小上，9.8 比 9.11 大。如果涉及日期（如 9月8日 vs 9月11日），则 9月11日更晚，但需根据具体上下文判断。嗯，用户问的是“树上有十只鸟，用枪打死了一只，还剩下几只”。这个问题看起来像是一个脑筋急转弯，可能有隐藏的陷阱。首先，我需要确认用户的问题意图。通常这种问题的答案不是直接的数学计算，而是需要考虑实际情况。首先，如果树上有十只鸟，用枪打死了其中一只，剩下的鸟会因为惊吓而飞走。所以实际剩下的鸟应该是零只，因为其他九只都飞走了。不过，也有可能用户想考察逻辑思维，比如是否存在特殊情况，比如鸟是否聋了，或者枪声是否被其他因素干扰，但这些可能性比较低。另外，可能用户想测试是否考虑到实际情况，比如鸟的反应。通常情况下，鸟听到枪声会立即飞走，所以剩下的鸟数量是零。但有时候这类问题会有不同的答案，比如“因为枪声太大，所有鸟都吓飞了，所以零只”，或者“因为其他鸟没有被吓到，所以剩下九只”，但这需要更多的上下文信息。还需要考虑是否有其他隐藏的条件，比如题目中的“用枪打死”是否指枪声，还是实际射击。如果是实际射击，可能需要考虑是否所有鸟都被打中，但题目中只提到打死了一只，所以可能其他鸟都飞走了。另外，可能用户想考察是否注意到“用枪打死”这个动作本身会引发鸟的反应，从而导致所有鸟飞走，所以答案是零只。但有时候这类问题可能会有不同的答案，比如“剩下的鸟数量取决于其他因素，比如是否有鸟被吓到，或者是否还有鸟在树上”，但通常的标准答案是零只，因为所有鸟都被吓飞了。总结起来，正确的逻辑是：鸟听到枪声会飞走，所以剩下的鸟数量为零。但需要确认是否有其他可能的解释，比如题目是否在考察其他因素，但通常这类问题的标准答案是零只。可能的陷阱： - 有人可能直接计算“10 - 1 = 9”，但忽略了鸟的反应（听到枪声会飞走）。 - 如果题目隐含“鸟不会飞走”，答案会不同（如9只），但现实中鸟通常会因惊吓飞走。嗯，用户问的是“how many e in deepseek”，也就是“在'deepseek'中有多少个字母e”。我需要仔细分析这个问题。首先，用户可能是在测试我是否能正确识别单词中的字母数量，或者是否有拼写错误。先确认单词拼写是否正确。"Deepseek" 是一个常见的词汇，可能是指某个品牌或项目的名字，比如 DeepSeek 是一个大模型的名字。不过，用户可能只是单纯想知道其中字母e的数量。 D, E, E, P, S, E, E, K。这样排列的话，总共有四个e。但需要确认每个字母的位置是否正确。或者，可能用户输入有误，比如是“Deepseek”（注意大小写），但无论大小写，字母e的数量是相同的。再检查一遍：不过，也有可能用户拼错了单词，比如是“Deepseeks”或者其他形式，但根据问题中的“deepseek”，应该按原词计算。因此，正确的答案是4个e。可能用户是在练习英语字母统计，或者测试我的注意力。需要确保没有遗漏字母，比如是否有隐藏的e或者拼写错误。确认无误后，给出答案即可。分解如下： D → E → E → P → S → E → E → K 其中，e 出现在第2、3、6、7个位置，共 4 个。注意： - 如果用户想统计的是 "DeepSeek"（首字母大写），答案仍为 4 个 e（大小写不影响统计）。 - 若问题涉及其他拼写（如 "deepseeks"），需重新确认。 Here is the sample command to generate the model. auto-round>0.5.1 The model can produce factually incorrect output, and should not be relied on to produce factually accurate information. Because of the limitations of the pretrained model and the finetuning datasets, it is possible that this model could generate lewd, biased or otherwise offensive outputs. Therefore, before deploying any applications of the model, developers should perform safety testing. Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. Here are a couple of useful links to learn more about Intel's AI software: The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes. @article{cheng2023optimize, title={Optimize weight rounding via signed gradient descent for the quantization of llms}, author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi}, journal={arXiv preprint arXiv:2309.05516}, year={2023} }

NaNK

license:apache-2.0

MiroThinker-1.7-mini-int4-AutoRound

—

electra-small-discriminator-mrpc

license:apache-2.0

Bert Base Uncased Sparse 90 Unstructured Pruneofa

Model Details: 90% Sparse BERT-Base (uncased) Prune Once for All This model is a sparse pre-trained model that can be fine-tuned for a wide range of language tasks. The process of weight pruning is forcing some of the weights of the neural network to zero. Setting some of the weights to zero results in sparser matrices. Updating neural network weights does involve matrix multiplication, and if we can keep the matrices sparse while retaining enough important information, we can reduce the overall computational overhead. The term "sparse" in the title of the model indicates a ratio of sparsity in the weights; for more details, you can read Zafrir et al. (2021). Visualization of Prunce Once for All method from Zafrir et al. (2021): | Model Detail | Description | | ----------- | ----------- | | Model Authors - Company | Intel | | Date | September 30, 2021 | | Version | 1 | | Type | NLP - General sparse language model | | Architecture | "The method consists of two steps, teacher preparation and student pruning. The sparse pre-trained model we trained is the model we use for transfer learning while maintaining its sparsity pattern. We call the method Prune Once for All since we show how to fine-tune the sparse pre-trained models for several language tasks while we prune the pre-trained model only once." (Zafrir et al., 2021) | | Paper or Other Resources | Zafrir et al. (2021); GitHub Repo | | License | Apache 2.0 | | Questions or Comments | Community Tab and Intel Developers Discord| | Intended Use | Description | | ----------- | ----------- | | Primary intended uses | This is a general sparse language model; in its current form, it is not ready for downstream prediction tasks, but it can be fine-tuned for several language tasks including (but not limited to) question-answering, genre natural language inference, and sentiment classification. | | Primary intended users | Anyone who needs an efficient general language model for other downstream tasks. | | Out-of-scope uses | The model should not be used to intentionally create hostile or alienating environments for people.| Here is an example of how to import this model in Python: Metrics (Model Performance): | Model | Model Size | SQuADv1.1 (EM/F1) | MNLI-m (Acc) | MNLI-mm (Acc) | QQP (Acc/F1) | QNLI (Acc) | SST-2 (Acc) | |-------------------------------|:----------:|:-----------------:|:------------:|:-------------:|:------------:|:----------:|:-----------:| | 80% Sparse BERT-Base uncased fine-tuned on SQuAD1.1 | - | 81.29/88.47 | - | - | - | - | - | | 85% Sparse BERT-Base uncased | Medium | 81.10/88.42 | 82.71 | 83.67 | 91.15/88.00 | 90.34 | 91.46 | | 90% Sparse BERT-Base uncased | Medium | 79.83/87.25 | 81.45 | 82.43 | 90.93/87.72 | 89.07 | 90.88 | | 90% Sparse BERT-Large uncased | Large | 83.35/90.20 | 83.74 | 84.20 | 91.48/88.43 | 91.39 | 92.95 | | 85% Sparse DistilBERT uncased | Small | 78.10/85.82 | 81.35 | 82.03 | 90.29/86.97 | 88.31 | 90.60 | | 90% Sparse DistilBERT uncased | Small | 76.91/84.82 | 80.68 | 81.47 | 90.05/86.67 | 87.66 | 90.02 | All the results are the mean of two seperate experiments with the same hyper-parameters and different seeds. | Training and Evaluation Data | Description | | ----------- | ----------- | | Datasets | English Wikipedia Dataset (2500M words). | | Motivation | To build an efficient and accurate base model for several downstream language tasks. | | Preprocessing | "We use the English Wikipedia dataset (2500M words) for training the models on the pre-training task. We split the data into train (95%) and validation (5%) sets. Both sets are preprocessed as described in the models’ original papers (Devlin et al., 2019, Sanh et al., 2019). We process the data to use the maximum sequence length allowed by the models, however, we allow shorter sequences at a probability of 0:1." | | Ethical Considerations | Description | | ----------- | ----------- | | Data | The training data come from Wikipedia articles | | Human life | The model is not intended to inform decisions central to human life or flourishing. It is an aggregated set of labelled Wikipedia articles. | | Mitigations | No additional risk mitigation strategies were considered during model development. | | Risks and harms | Significant research has explored bias and fairness issues with language models (see, e.g., Sheng et al., 2021, and Bender et al., 2021). Predictions generated by the model may include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups. Beyond this, the extent of the risks involved by using the model remain unknown.| | Use cases | - | | Caveats and Recommendations | | ----------- | | Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. There are no additional caveats or recommendations for this model. |

license:apache-2.0

whisper-tiny-int8-dynamic-inc

license:apache-2.0

gpt-oss-20b-int4-AutoRound-FP8KV

NaNK

license:apache-2.0

neural-chat-7b-v1-1

NaNK

license:apache-2.0

sd-1.5-lcm-openvino

license:mit

whisper-large-v2-onnx-int4-inc

license:apache-2.0

ldm3d-sr

—

Distilbert Base Uncased Sparse 90 Unstructured Pruneofa

Model Details: 90% Sparse DistilBERT-Base (uncased) Prune Once for All This model is a sparse pre-trained model that can be fine-tuned for a wide range of language tasks. The process of weight pruning is forcing some of the weights of the neural network to zero. Setting some of the weights to zero results in sparser matrices. Updating neural network weights does involve matrix multiplication, and if we can keep the matrices sparse while retaining enough important information, we can reduce the overall computational overhead. The term "sparse" in the title of the model indicates a ratio of sparsity in the weights; for more details, you can read Zafrir et al. (2021). Visualization of Prunce Once for All method from Zafrir et al. (2021): | Model Detail | Description | | ----------- | ----------- | | Model Authors - Company | Intel | | Date | September 30, 2021 | | Version | 1 | | Type | NLP - General sparse language model | | Architecture | "The method consists of two steps, teacher preparation and student pruning. The sparse pre-trained model we trained is the model we use for transfer learning while maintaining its sparsity pattern. We call the method Prune Once for All since we show how to fine-tune the sparse pre-trained models for several language tasks while we prune the pre-trained model only once." (Zafrir et al., 2021) | | Paper or Other Resources | Zafrir et al. (2021); GitHub Repo | | License | Apache 2.0 | | Questions or Comments | Community Tab and Intel Developers Discord| | Intended Use | Description | | ----------- | ----------- | | Primary intended uses | This is a general sparse language model; in its current form, it is not ready for downstream prediction tasks, but it can be fine-tuned for several language tasks including (but not limited to) question-answering, genre natural language inference, and sentiment classification. | | Primary intended users | Anyone who needs an efficient general language model for other downstream tasks. | | Out-of-scope uses | The model should not be used to intentionally create hostile or alienating environments for people.| Here is an example of how to import this model in Python: Metrics (Model Performance): | Model | Model Size | SQuADv1.1 (EM/F1) | MNLI-m (Acc) | MNLI-mm (Acc) | QQP (Acc/F1) | QNLI (Acc) | SST-2 (Acc) | |-------------------------------|:----------:|:-----------------:|:------------:|:-------------:|:------------:|:----------:|:-----------:| | 80% Sparse BERT-Base uncased fine-tuned on SQuAD1.1 | - | 81.29/88.47 | - | - | - | - | - | | 85% Sparse BERT-Base uncased | Medium | 81.10/88.42 | 82.71 | 83.67 | 91.15/88.00 | 90.34 | 91.46 | | 90% Sparse BERT-Base uncased | Medium | 79.83/87.25 | 81.45 | 82.43 | 90.93/87.72 | 89.07 | 90.88 | | 90% Sparse BERT-Large uncased | Large | 83.35/90.20 | 83.74 | 84.20 | 91.48/88.43 | 91.39 | 92.95 | | 85% Sparse DistilBERT uncased | Small | 78.10/85.82 | 81.35 | 82.03 | 90.29/86.97 | 88.31 | 90.60 | | 90% Sparse DistilBERT uncased | Small | 76.91/84.82 | 80.68 | 81.47 | 90.05/86.67 | 87.66 | 90.02 | All the results are the mean of two seperate experiments with the same hyper-parameters and different seeds. | Training and Evaluation Data | Description | | ----------- | ----------- | | Datasets | English Wikipedia Dataset (2500M words). | | Motivation | To build an efficient and accurate base model for several downstream language tasks. | | Preprocessing | "We use the English Wikipedia dataset (2500M words) for training the models on the pre-training task. We split the data into train (95%) and validation (5%) sets. Both sets are preprocessed as described in the models’ original papers (Devlin et al., 2019, Sanh et al., 2019). We process the data to use the maximum sequence length allowed by the models, however, we allow shorter sequences at a probability of 0:1." | | Ethical Considerations | Description | | ----------- | ----------- | | Data | The training data come from Wikipedia articles | | Human life | The model is not intended to inform decisions central to human life or flourishing. It is an aggregated set of labelled Wikipedia articles. | | Mitigations | No additional risk mitigation strategies were considered during model development. | | Risks and harms | Significant research has explored bias and fairness issues with language models (see, e.g., Sheng et al., 2021, and Bender et al., 2021). Predictions generated by the model may include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups. Beyond this, the extent of the risks involved by using the model remain unknown.| | Use cases | - | | Caveats and Recommendations | | ----------- | | Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. There are no additional caveats or recommendations for this model. |

license:apache-2.0

distilgpt2-wikitext2

license:apache-2.0

bert-mini-sst2-distilled-sparse-90-1X4-block

license:mit

xlnet-base-cased-mrpc-int8-static-inc

license:mit

Qwen3-8B-gguf-q2ks-AutoRound-inc-v2

NaNK

license:apache-2.0

distilbert-base-uncased-finetuned-sst-2-english-int8-dynamic-inc

license:apache-2.0

Qwen3-30B-A3B-FP8-DYNAMIC-Test-Only

NaNK

license:apache-2.0

Qwen3-30B-A3B-FP8-DYNAMIC

NaNK

license:apache-2.0

neural-chat-7b-v3-1-int4-inc

NaNK

license:apache-2.0

Qwen3-8B-gguf-q2ks-AutoRound-inc-v1

NaNK

license:apache-2.0

bert-base-uncased-mnli-sparse-70-unstructured-no-classifier

—

whisper-tiny-int8-static-inc

license:apache-2.0

DeepSeek-V3.1-gguf-q2ks-mixed-AutoRound

This model is a mixed gguf q2ks format of deepseek-ai/DeepSeek-V3.1 generated by intel/auto-round algorithm. Embedding layer and lm-head layer are fallback to 8 bits and non expert layers are fallback to 4 bits. Please refer to Section `Generate the model` for more details. ~~~bash ./llama-cli -hf Intel/DeepSeek-V3.1-gguf-q2ks-mixed-AutoRound ~~~ ~~~bash > Write a quick sort algorithm. Here is a quick implementation of the QuickSort algorithm in Python: This implementation uses a recursive approach and selects the middle element as the pivot. It partitions the array into three parts: elements less than the pivot, elements equal to the pivot, and elements greater than the pivot. It then recursively sorts the left and right partitions and combines them with the middle partition. Note: This version is not in-place and uses extra memory for the partitions. For an in-place version that uses less memory, you would need to implement partitioning within the original array. ~~~ The model can produce factually incorrect output, and should not be relied on to produce factually accurate information. Because of the limitations of the pretrained model and the finetuning datasets, it is possible that this model could generate lewd, biased or otherwise offensive outputs. Therefore, before deploying any applications of the model, developers should perform safety testing. Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. Here are a couple of useful links to learn more about Intel's AI software: The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes. @article{cheng2023optimize, title={Optimize weight rounding via signed gradient descent for the quantization of llms}, author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi}, journal={arXiv preprint arXiv:2309.05516}, year={2023} }

NaNK

—

whisper-large-int8-dynamic-inc

license:apache-2.0

whisper-large-v2-int8-static-inc

license:apache-2.0

Qwen3-Coder-480B-A35B-Instruct-gguf-q4km-AutoRound

This model is a gguf q4km format of Qwen/Qwen3-Coder-480B-A35B-Instruct generated by intel/auto-round algorithm. Embedding layer and lm-head layer are fallback to 8 bits . Please refer to Section `Generate the model` for more details. ~~~bash /llama-cli -hf Intel/Qwen3-Coder-480B-A35B-Instruct-gguf-q4km-AutoRound:q4km --conversation > code flappy bird in python Flappy Bird in Python using Pygame Here's a complete implementation of Flappy Bird using Pygame: The model can produce factually incorrect output, and should not be relied on to produce factually accurate information. Because of the limitations of the pretrained model and the finetuning datasets, it is possible that this model could generate lewd, biased or otherwise offensive outputs. Therefore, before deploying any applications of the model, developers should perform safety testing. Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. Here are a couple of useful links to learn more about Intel's AI software: The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes. @article{cheng2023optimize, title={Optimize weight rounding via signed gradient descent for the quantization of llms}, author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi}, journal={arXiv preprint arXiv:2309.05516}, year={2023} }

NaNK

license:apache-2.0

bert-base-uncased-QNLI-int8-inc

license:mit

deberta-v3-base-mrpc-int8-dynamic-inc

license:mit

hebrew-math-tutor-v1

NaNK

license:apache-2.0

fid_t5_large_nq

license:cc-by-sa-3.0

Qwen3.5-397B-A17B-gguf-q2ks-mixed-AutoRound

NaNK

—

whisper-medium-int8-static-inc

license:apache-2.0

albert-base-v2-sst2-int8-static-inc

license:apache-2.0

bert-base-uncased-STS-B-int8-inc

license:mit

ColBERT-NQ

license:cc-by-sa-3.0

whisper-small-onnx-int4-inc

license:apache-2.0

DeepSeek-V3.2-int4-AutoRound

—

gpt-oss-120b-int4-g64-AutoRound

NaNK

—

deberta-v3-base-mrpc-int8-static-inc

license:mit

tiny-random-distilgpt2

—

gpt-j-6B-int8-dynamic-inc

NaNK

license:apache-2.0

whisper-base-onnx-int4-inc

license:apache-2.0

whisper-large-onnx-int4-inc

license:apache-2.0

gpt-oss-120b-int4-AutoRound-FP8KV

NaNK

license:apache-2.0

whisper-medium-onnx-int4-inc

license:apache-2.0

neural-chat-7b-v3-3-int4-inc

NaNK

license:apache-2.0

bert-base-uncased-squad-int8-static-inc

license:apache-2.0

whisper-base-int8-dynamic-inc

license:apache-2.0

Mistral-7B-v0.1-int4-inc-lmhead

NaNK

license:apache-2.0

Bert Base Uncased Sparse 1 2

Pretrained model pruned to 1:2 structured sparsity. The model is a pruned version of the BERT base model. The model can be used for fine-tuning to downstream tasks with sparsity already embeded to the model. To keep the sparsity a mask should be added to each sparse weight blocking the optimizer from updating the zeros. Evaluation Results We get the following results on the tasks development set, all results are mean of 5 different seeded models: | Task | MNLI-m (Acc) | MNLI-mm (Acc) | QQP (Acc/F1) | QNLI (Acc) | SST-2 (Acc) | STS-B (Pears/Spear) | SQuADv1.1 (Acc/F1) | |------|--------------|---------------|--------------|------------|-------------|---------------------|--------------------| | | 83.3 | 83.9 | 90.8/87.6 | 90.4 | 91.3 | 88.8/88.3 | 80.5/88.2 |

—

bert-base-uncased-sparse-70-unstructured

—

bert-base-uncased-mrpc-int8-static-inc

license:apache-2.0

xlm-roberta-base-mrpc-int8-dynamic-inc

license:mit

albert-base-v2-sst2-int8-dynamic-inc

license:apache-2.0

whisper-large-int8-static-inc

license:apache-2.0

whisper-small-int8-static-inc

license:apache-2.0

tvp-base-ANet

—

msmarco_fid

license:cc-by-nc-4.0

llava-gemma-7b

NaNK

—

gpt-j-6B-int8-static-inc

NaNK

license:apache-2.0

bge-large-en-v1.5-rag-int8-static

license:mit

bert-base-uncased-finetuned-swag-int8-static-inc

license:apache-2.0

distilbert-base-uncased-finetuned-conll03-english-int8-static-inc

license:apache-2.0

xlnet-base-cased-mrpc

license:mit

MiniLM-L12-H384-uncased-mrpc

license:mit

roberta-base-squad2-int8-static-inc

license:cc-by-4.0

whisper-base-int8-static-inc

license:apache-2.0

bert-base-uncased-sparse-85-unstructured-pruneofa

license:apache-2.0

roberta-base-mrpc-int8-static-inc

license:mit

bart-large-mrpc

license:apache-2.0

roberta-large-mnli-int8-static-inc

license:mit

distilbert-base-uncased-squadv1.1-sparse-80-1X4-block

license:apache-2.0

gpt-j-6B-pytorch-int8-static-inc

NaNK

license:apache-2.0

bert-large-uncased-rte-int8-dynamic-inc

license:apache-2.0

albert-base-v2-MRPC-int8-inc

license:mit

msmarco_fid_early_exit

license:cc-by-nc-4.0

bge-base-en-v1.5-rag-int8-static

license:mit

phi-2-ov-quantized

license:mit

Qwen2-7B-int4-inc

NaNK

license:apache-2.0

LongCat-Flash-Thinking-2601-int4-mixed-AutoRound

license:mit

bert-large-uncased-squadv1.1-sparse-80-1x4-block-pruneofa

license:apache-2.0

bert-base-uncased-mrpc-int8-qat-inc

license:apache-2.0

whisper-large-v2-int8-dynamic-inc

license:apache-2.0

Qwen2.5-VL-7B-Instruct-int4-mixed-AutoRound

NaNK

license:apache-2.0

DeepSeek-V3.2-Speciale-int4-AutoRound

—

distilbert-base-uncased-sparse-85-unstructured-pruneofa

license:apache-2.0

bart-large-mrpc-int8-dynamic-inc

license:apache-2.0

MiniLM-L12-H384-uncased-mrpc-int8-static-inc

license:mit

distilbert-base-cased-distilled-squad-int8-static-inc

license:apache-2.0

distilbert-base-uncased-sparse-80-1x4-block-pruneofa

—

t5-base-cnn-dm-int8-dynamic-inc

license:apache-2.0

roberta-base-mrpc-int8-dynamic-inc

license:mit

MiniLM-L12-H384-uncased-mrpc-int8-dynamic-inc

license:mit

bert-large-uncased-rte-int8-static-inc

license:apache-2.0

bert-base-cased-finetuned-sst2-int8-inc

license:apache-2.0

whisper-medium-int8-dynamic-inc

license:apache-2.0

eli5_fid_early_exit

license:cc-by-nc-4.0

nq_fid_lfqa

license:cc-by-nc-4.0

falcon-7b-sq-int8-inc

NaNK

license:apache-2.0

bart-large-cnn-int8-dynamic-inc

license:mit

distilbert-base-uncased-MRPC-int8-dynamic-inc

license:mit

bert-large-uncased-sparse-80-1x4-block-pruneofa

license:apache-2.0

camembert-base-mrpc-int8-dynamic-inc

license:mit

bert-base-uncased-mrpc-int8-dynamic-inc

license:apache-2.0

xlm-roberta-base-mrpc

license:mit

electra-small-discriminator-mrpc-int8-static-inc

license:mit

t5-large-finetuned-xsum-cnn-int8-dynamic-inc

license:mit

distilbert-base-uncased-MRPC-int8-static-inc

license:mit

bert-large-uncased-cola-int8-inc

license:apache-2.0

nq_fid_lfqa_early_exit

license:cc-by-nc-4.0

falcon-7b-int4-inc

NaNK

license:apache-2.0

Qwen2-0.5B-Instuct-int4-inc

NaNK

license:apache-2.0

intel-optimized-model-for-embeddings-v1

license:mit

intel-optimized-model-for-embeddings-int8-v1

license:mit

Qwen3-8B-int4-AutoRound

NaNK

license:apache-2.0

neural-embedding-v1

—

whisper.cpp-openvino-models

license:mit

demucs-openvino

license:mit

Phi-3-mini-4k-instruct-int4-inc

license:apache-2.0

gemma-4-26B-A4B-it-int4-mixed-AutoRound

NaNK

—

table-transformer-int8-static-inc

license:mit

musicgen-static-openvino

license:cc-by-nc-4.0

deepfilternet-openvino

license:mit

sd-1.5-controlnet-depth-quantized

—

Qwen2-1.5B-Instuct-int4-inc

NaNK

license:apache-2.0

versatile_audio_super_resolution_openvino

license:mit

gemma-4-31B-it-int4-AutoRound

NaNK

—

gemma-4-26B-A4B-it-int4-AutoRound

NaNK

—

gemma-4-26B-A4B-it-int4-mixed

NaNK

—

MiroThinker-1.7-int4-AutoRound

—

drumsep_mdx23c_jarredou_openvino

Drum Separation (5-stem) MDX23C (trained by @jarredou) OpenVINO Models This repo stores OpenVINO(TM) models in IR format that are used to perform 5-stem Drum Separation. The original pytorch checkpoint / config used to convert / produce the OpenVINO IRs found in this repo were downloaded from here: https://github.com/jarredou/models/releases/tag/DrumSep To better support a range of OpenVINO-supported devices, the `forward` method from original pytorch model was sliced to remove the STFT / iSTFT functions. The `mdx23cfwd.xml/.bin` OpenVINO IR, stored in this repo, is a conversion of the modified pytorch model to OpenVINO format. The OpenVINO IRs are intended to be used with the set of OpenVINO-based AI plugins for Audacity(R), here: https://github.com/intel/openvino-plugins-ai-audacity Intel’s Human Rights Disclaimer: Intel is committed to respecting human rights and avoiding complicity in human rights abuses. See Intel's Global Human Rights Principles. Intel's products and software are intended only to be used in applications that do not cause or contribute to a violation of an internationally recognized human right.

license:cc-by-nc-sa-4.0

Qwen3-235B-A22B-MXFP8-AutoRound-Recipe

NaNK

license:apache-2.0

Qwen3-8B-w2g64-AutoRound-test

NaNK

—

coresearch-retriever-spanbert

license:cc-by-3.0

gpt-j-6b-sparse

NaNK

license:apache-2.0

sd-reference-only

—

gemma-2b-int4-inc

NaNK

license:apache-2.0

Baichuan2-13B-Chat-int4-inc

NaNK

license:apache-2.0

SOLAR-10.7B-Instruct-v1.0-int4-inc

NaNK

license:apache-2.0

opt-1.3b-int4-inc-recipe

NaNK

—

Phi-3-mini-128k-instruct-int4-inc-recipe

license:mit

whisper-small-openvino

NaNK

license:apache-2.0

Qwen2.5-0.5B-Instruct-int4-inc

NaNK

license:apache-2.0

Qwen2.5-1.5B-Instruct-int4-inc

NaNK

license:apache-2.0

whisper-medium-openvino-fp16

NaNK

license:apache-2.0

vocals_mel_band_roformer_kimberleyJSN_openvino

Vocals MelBandRoformer (trained by @KimberleyJSN) OpenVINO Models This repo stores OpenVINO(TM) models in IR format that are used to perform Vocals Extraction / Removal. The OpenVINO IRs (.xml, .bin files) stored here have been converted from @KimberleyJSN's pytorch model checkpoint (MelBandRoformer.ckpt) from here: https://huggingface.co/KimberleyJSN/melbandroformer The config used was downloaded from here: https://raw.githubusercontent.com/ZFTurbo/Music-Source-Separation-Training/main/configs/KimberleyJensen/configvocalsmelbandroformerkj.yaml Both of these files are uploaded to this repo, under `pytorch` folder. The OpenVINO IRs are intended to be used with the set of OpenVINO-based AI plugins for Audacity(R), here: https://github.com/intel/openvino-plugins-ai-audacity To better support a range of OpenVINO-supported devices, the MelBandRoformer model has been sliced / converted to 3 separate OpenVINO IRs: melbandpre.xml/.bin -> Pre-processing operations (such as STFT) which convert input audio waveforms to frequency domain. melbandfwd.xml / .bin -> The majority of the layers / ops in the original model. melbandpost.xml / .bin -> Post-processing operations (such as iSTFT) which convert frequency domain outputs from `melbandfwd` to output waveforms. Intel’s Human Rights Disclaimer: Intel is committed to respecting human rights and avoiding complicity in human rights abuses. See Intel's Global Human Rights Principles. Intel's products and software are intended only to be used in applications that do not cause or contribute to a violation of an internationally recognized human right.

license:gpl-3.0

Intel

dpt-hybrid-midas

zoedepth-nyu-kitti

dpt-large

zoedepth-nyu

Qwen3.5-122B-A10B-int4-AutoRound

neural-chat-7b-v3-3

Mistral-Small-3.2-24B-Instruct-2506-int4-AutoRound

Qwen3.5-397B-A17B-int4-AutoRound

Qwen3.5-35B-A3B-int4-AutoRound

Qwen3-Coder-30B-A3B-Instruct-gguf-q2ks-mixed-AutoRound

dpt-large-ade

bert-base-uncased-mrpc

Qwen2-0.5B-Instruct-int4-sym-AutoRound

tiny-random-llama2

GLM-5-int4-mixed-AutoRound

neural-chat-7b-v3-1

tiny-random-gpt2

tiny-random-falcon

dpt-swinv2-tiny-256

distilbert-base-uncased-distilled-squad-int8-static-inc

t5-small-xsum-int8-dynamic-inc

dynamic_tinybert

Step-3.5-Flash-int4-mixed-AutoRound

neural-chat-7b-v3-2

Qwen3-Coder-30B-A3B-Instruct-gguf-q4km-AutoRound

Qwen3-Omni-30B-A3B-Instruct-int4-AutoRound

GLM 4.6 REAP 218B A32B FP8 Gguf Q2ks Mixed AutoRound

Qwen3-30B-A3B-Instruct-2507-gguf-q2ks-mixed-AutoRound

Qwen3-30B-A3B-Instruct-2507-int4-AutoRound

Gpt Oss 20b Gguf Q4ks AutoRound

dpt-beit-base-384

dpt-beit-large-512

Qwen3-235B-A22B-Instruct-2507-gguf-q4km-AutoRound

Qwen3-30B-A3B-Thinking-2507-gguf-q2ks-mixed-AutoRound

Qwen3-Next-80B-A3B-Instruct-int4-mixed-AutoRound

ldm3d-4c

MiniMax-M2-gguf-q2ks-mixed-AutoRound

Qwen3-Coder-30B-A3B-Instruct-int4-AutoRound

Qwen3-Next-80B-A3B-Instruct-int4-AutoRound

llava-gemma-2b

Qwen3-Next-80B-A3B-Thinking-int4-mixed-AutoRound

DeepSeek-R1-0528-Qwen3-8B-int4-AutoRound

GLM 4.6 Gguf Q2ks Mixed AutoRound

Qwen3-235B-A22B-Instruct-2507-gguf-q2ks-mixed-AutoRound

Ling Flash 2.0 Gguf Q2ks Mixed AutoRound

Qwen3-Next-80B-A3B-Thinking-int4-AutoRound

Qwen2.5-0.5B-W4A16-G128-AutoRound-LLMC-TEST-ONLY

polite-guard

tiny-random-bert_ipex_model

tiny-random-llama2_ipex_model

tiny-random-mistral_ipex_model

tiny-random-gpt2_ipex_model

tiny-random-falcon_ipex_model

tiny-random-vit_ipex_model

Qwen3-Coder-480B-A35B-Instruct-int4-mixed-AutoRound

gpt-oss-120b-gguf-q4ks-AutoRound

Qwen3-30B-A3B-Instruct-2507-int4-mixed-AutoRound

gpt-oss-20b-int4-AutoRound

dpt-swinv2-large-384

MiniMax-M2-REAP-172B-A10B-gguf-q2ks-mixed-AutoRound

Qwen3.5-35B-A3B-gguf-q2ks-mixed-AutoRound

Qwen3-VL-30B-A3B-Instruct-int4-AutoRound

deepmath-v1

gpt-oss-120b-int4-AutoRound

Qwen3 8B GGUF Q2KS AS AutoRound

zoedepth-kitti

gpt-oss-20b-gguf-q2ks-AutoRound

dpt-beit-large-384

Seed-OSS-36B-Instruct-int4-AutoRound

dpt-swinv2-base-384

Qwen3-VL-235B-A22B-Instruct-int4-mixed-AutoRound

Qwen3-VL-235B-A22B-Instruct-int4-AutoRound

Qwen3-30B-A3B-Thinking-2507-int4-AutoRound

Qwen3-235B-A22B-Instruct-2507-int4-mixed-AutoRound

tvp-base

Qwen3-235B-A22B-AutoRound-Recipe

Qwen3-235B-A22B-MXFP4-AutoRound-Recipe

roberta-base-mrpc

ldm3d