OPPOer
Qwen-Image-Edit-2509-Pruning
Qwen-Image-Edit-2509-13B-4steps
Update
- 2025/10/09: We release Qwen-Image-Edit-2509-Pruning-13B-4steps
- 2025/09/29: We release Qwen-Image-Edit-2509-Pruning-14B
- 2025/09/28: We release Qwen-Image-Edit-Pruning-13B-4steps

Introduction
This open-source project is based on Qwen-Image-Edit and applies model pruning: 20 layers are removed while the weights of the remaining 40 layers are retained, yielding a 13.6B-parameter model. The pruned version will continue to be iterated upon, so please stay tuned.

Install the latest versions of diffusers and PyTorch.

If you find our work helpful, please consider citing our paper and leaving a star.
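A minimal loading sketch for the 4-step edit checkpoint, assuming the repository ships a standard diffusers layout so `DiffusionPipeline` can resolve the pipeline class automatically; the repo id `OPPOer/Qwen-Image-Edit-2509-Pruning`, the input image, and the prompt are illustrative assumptions, not taken from this card.

```python
# Hedged sketch: load the pruned edit model through diffusers and run a 4-step edit.
# Repo id, input image and prompt are assumptions; adjust them to your setup.
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import load_image

pipe = DiffusionPipeline.from_pretrained(
    "OPPOer/Qwen-Image-Edit-2509-Pruning",  # assumed repo id
    torch_dtype=torch.bfloat16,
).to("cuda")

image = load_image("input.png")  # the image to be edited
result = pipe(
    image=image,
    prompt="Change the background to a snowy street",
    num_inference_steps=4,  # distilled 4-step sampling
)
result.images[0].save("edited.png")
```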
X2I
AndesVL 4B Thinking
AndesVL is a suite of mobile-optimized Multimodal Large Language Models (MLLMs) with 0.6B to 4B parameters, built upon Qwen3's LLM and various visual encoders. Designed for efficient edge deployment, it achieves first-tier performance on diverse benchmarks, including those for text-rich tasks, reasoning tasks, Visual Question Answering (VQA), multi-image tasks, multilingual tasks, and GUI tasks. Its "1+N" LoRA architecture and QALFT framework facilitate efficient task adaptation and model com...
AndesVL 0 6B Instruct
AndesVL is a suite of mobile-optimized Multimodal Large Language Models (MLLMs) with 0.6B to 4B parameters, built upon Qwen3's LLM and various visual encoders. Designed for efficient edge deployment, it achieves first-tier performance on diverse benchmarks, including those for text-rich tasks, reasoning tasks, Visual Question Answering (VQA), multi-image tasks, multilingual tasks, and GUI tasks. Its "1+N" LoRA architecture and QALFT framework facilitate efficient task adaptation and model com...
AndesVL 4B Instruct
AndesVL is a suite of mobile-optimized Multimodal Large Language Models (MLLMs) with 0.6B to 4B parameters, built upon Qwen3's LLM and various visual encoders. Designed for efficient edge deployment, it achieves first-tier performance on diverse benchmarks, including those for text-rich tasks, reasoning tasks, Visual Question Answering (VQA), multi-image tasks, multilingual tasks, and GUI tasks. Its "1+N" LoRA architecture and QALFT framework facilitate efficient task adaptation and model com...
MultilingualFLUX.1-adapter
Qwen-Image-Pruning
Update
- 2025/09/28: We release Qwen-Image-Edit-Pruning: https://huggingface.co/OPPOer/Qwen-Image-Edit-Pruning
- 2025/09/24: We release Qwen-Image-12B, an open-source pruned variant with 12.7B parameters. Experimental results show that its performance is on par with the prior 13.6B model pruned by removing 20 layers, as validated through both objective benchmarks and human assessment. Continuous optimization efforts are underway.
- 2025/09/22: ComfyUI path: https://huggingface.co/wikeeyang/Qwen-Image-Pruning-for-ComfyUI. Thanks to wikeeyang.

Introduction
This open-source project is based on Qwen-Image and applies model pruning: 20 layers are removed while the weights of the remaining 40 layers are retained, giving a 12B-parameter model. The pruned model shows a slight drop in objective metrics and will continue to be iterated upon. It also supports adapting and loading community models such as LoRA and ControlNet. Please stay tuned. For the relevant inference scripts, please refer to Qwen-Image-Pruning.

Install the latest versions of diffusers and PyTorch.

If you find our work helpful, please consider citing our paper and leaving a star.
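A minimal text-to-image sketch, assuming the pruned checkpoint follows the standard diffusers layout; the repo id and sampling settings below are illustrative assumptions, not taken from this card.

```python
# Hedged sketch: load the pruned Qwen-Image checkpoint with diffusers and generate an image.
# Repo id and sampling settings are assumptions.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "OPPOer/Qwen-Image-Pruning",  # assumed repo id
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe(
    prompt="A poster with the text 'Qwen-Image' rendered in neon lights",
    num_inference_steps=50,
).images[0]
image.save("sample.png")
```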
AndesVL 2B Instruct
Qwen Image 10B
Install the latest versions of diffusers and transformers.

Inference
Download the file transformerqwenimage10B.py from https://github.com/OPPO-Mente-Lab/Qwen-Image-Pruning to your local directory; you can then load the model directly with it, as sketched below.
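A sketch of that loading path, assuming the downloaded file defines a diffusers-style transformer class (named `QwenImageTransformer2DModel` here purely for illustration) and that the weights live under a `transformer` subfolder of the assumed repo id `OPPOer/Qwen-Image-10B`.

```python
# Hedged sketch: import the pruned transformer from the downloaded file and plug it into
# a diffusers pipeline. The class name and repo id are illustrative assumptions.
import torch
from diffusers import DiffusionPipeline
from transformerqwenimage10B import QwenImageTransformer2DModel  # class name assumed

transformer = QwenImageTransformer2DModel.from_pretrained(
    "OPPOer/Qwen-Image-10B",  # assumed repo id
    subfolder="transformer",
    torch_dtype=torch.bfloat16,
)
pipe = DiffusionPipeline.from_pretrained(
    "OPPOer/Qwen-Image-10B",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe(prompt="A cat reading a newspaper", num_inference_steps=50).images[0]
image.save("sample_10b.png")
```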
AndesVL 0 6B Thinking
AndesVL is a suite of mobile-optimized Multimodal Large Language Models (MLLMs) with 0.6B to 4B parameters, built upon Qwen3's LLM and various visual encoders. Designed for efficient edge deployment, it achieves first-tier performance on diverse benchmarks, including those for text-rich tasks, reasoning tasks, Visual Question Answering (VQA), multi-image tasks, multilingual tasks, and GUI tasks. Its "1+N" LoRA architecture and QALFT framework facilitate efficient task adaptation and model com...
AndesVL 1B Instruct
Qwen-Image-13B-8steps
Update
- 2025/09/28: We release Qwen-Image-Edit-Pruning: https://huggingface.co/OPPOer/Qwen-Image-Edit-Pruning
- 2025/09/24: We release Qwen-Image-13B. The model is pruned by removing 20 layers. Continuous optimization efforts are underway.
- 2025/09/22: ComfyUI path: https://huggingface.co/wikeeyang/Qwen-Image-Pruning-for-ComfyUI. Thanks to wikeeyang.

Introduction
This open-source project is based on Qwen-Image and applies model pruning: 20 layers are removed while the weights of the remaining 40 layers are retained, giving a 13B-parameter model. The pruned model shows a slight drop in objective metrics and will continue to be iterated upon. It also supports adapting and loading community models such as LoRA and ControlNet. Please stay tuned. For the relevant inference scripts, please refer to Qwen-Image-Pruning.

Install the latest versions of diffusers and PyTorch.
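The 8-step variant is used like the other pruned checkpoints, only with fewer sampling steps; a hedged sketch follows, with the repo id and prompt as illustrative assumptions (check the repository's scripts for the recommended guidance settings).

```python
# Hedged sketch: same loading path as the other pruned checkpoints, sampled in 8 steps.
# Repo id and prompt are assumptions.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "OPPOer/Qwen-Image-13B-8steps",  # assumed repo id
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe(
    prompt="An ultra-detailed illustration of a mountain village at dusk",
    num_inference_steps=8,  # distilled 8-step sampling
).images[0]
image.save("sample_8steps.png")
```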
AndesVL 1B Thinking
AndesVL is a suite of mobile-optimized Multimodal Large Language Models (MLLMs) with 0.6B to 4B parameters, built upon Qwen3's LLM and various visual encoders. Designed for efficient edge deployment, it achieves first-tier performance on diverse benchmarks, including those for text-rich tasks, reasoning tasks, Visual Question Answering (VQA), multi-image tasks, multilingual tasks, and GUI tasks. Its "1+N" LoRA architecture and QALFT framework facilitate efficient task adaptation and model compression, enabling a 6.7x peak decoding speedup and a 1.8 bits-per-weight compression ratio on mobile chips. Detailed model sizes and components are provided below:

| Model | Total Parameters (B) | Visual Encoder | LLM |
|---|---|---|---|
| AndesVL-0.6B | 0.695 | SigLIP2-Base | Qwen3-0.6B |
| AndesVL-1B | 0.927 | AIMv2-Large | Qwen3-0.6B |
| AndesVL-2B | 2.055 | AIMv2-Large | Qwen3-1.7B |
| AndesVL-4B | 4.360 | AIMv2-Large | Qwen3-4B |

Citation
If you find our work helpful, please consider citing us.

Acknowledgements
We are very grateful for the efforts of the Qwen, AIMv2 and SigLIP 2 projects.
AndesVL 2B Thinking
AndesVL is a suite of mobile-optimized Multimodal Large Language Models (MLLMs) with 0.6B to 4B parameters, built upon Qwen3's LLM and various visual encoders. Designed for efficient edge deployment, it achieves first-tier performance on diverse benchmarks, including those for text-rich tasks, reasoning tasks, Visual Question Answering (VQA), multi-image tasks, multilingual tasks, and GUI tasks. Its "1+N" LoRA architecture and QALFT framework facilitate efficient task adaptation and model compression, enabling a 6.7x peak decoding speedup and a 1.8 bits-per-weight compression ratio on mobile chips. Detailed model sizes and components are provided below:

| Model | Total Parameters (B) | Visual Encoder | LLM |
|---|---|---|---|
| AndesVL-0.6B | 0.695 | SigLIP2-Base | Qwen3-0.6B |
| AndesVL-1B | 0.927 | AIMv2-Large | Qwen3-0.6B |
| AndesVL-2B | 2.055 | AIMv2-Large | Qwen3-1.7B |
| AndesVL-4B | 4.360 | AIMv2-Large | Qwen3-4B |

Citation
If you find our work helpful, please consider citing us.

Acknowledgements
We are very grateful for the efforts of the Qwen, AIMv2 and SigLIP 2 projects.
Qwen-Image-12B-8steps
Update
- 2025/09/28: We release Qwen-Image-Edit-Pruning: https://huggingface.co/OPPOer/Qwen-Image-Edit-Pruning
- 2025/09/24: We release Qwen-Image-12B. Experimental results show that its performance is on par with the prior 13.6B model pruned by removing 20 layers, as validated through both objective benchmarks and human assessment. Continuous optimization efforts are underway.
- 2025/09/22: ComfyUI path: https://huggingface.co/wikeeyang/Qwen-Image-Pruning-for-ComfyUI. Thanks to wikeeyang.

Introduction
This open-source project is based on Qwen-Image and applies model pruning: 20 layers are removed while the weights of the remaining 40 layers are retained, giving a 12B-parameter model. The pruned model shows a slight drop in objective metrics and will continue to be iterated upon. It also supports adapting and loading community models such as LoRA and ControlNet. Please stay tuned. For the relevant inference scripts, please refer to Qwen-Image-Pruning.

Install the latest versions of diffusers and PyTorch. Download the file transformerqwenimagepruning.py from https://github.com/OPPO-Mente-Lab/Qwen-Image-Pruning to your local directory; you can then load the model directly with it, as sketched below.
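A sketch of that loading path under the same assumptions as the other pruned checkpoints: the downloaded file is taken to define a diffusers-style transformer class (named `QwenImageTransformer2DModel` here for illustration), and the repo id `OPPOer/Qwen-Image-12B-8steps` is assumed.

```python
# Hedged sketch: load the pruned transformer from the downloaded transformerqwenimagepruning.py
# and sample with 8 steps. Class name and repo id are illustrative assumptions.
import torch
from diffusers import DiffusionPipeline
from transformerqwenimagepruning import QwenImageTransformer2DModel  # class name assumed

transformer = QwenImageTransformer2DModel.from_pretrained(
    "OPPOer/Qwen-Image-12B-8steps",  # assumed repo id
    subfolder="transformer",
    torch_dtype=torch.bfloat16,
)
pipe = DiffusionPipeline.from_pretrained(
    "OPPOer/Qwen-Image-12B-8steps",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe(prompt="A watercolor fox in a bamboo forest", num_inference_steps=8).images[0]
image.save("sample_12b_8steps.png")
```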
Qwen-Image-Edit-Pruning
Update
- 2025/10/09: We release Qwen-Image-Edit-2509-Pruning-13B-4steps
- 2025/09/29: We release Qwen-Image-Edit-2509-Pruning-14B
- 2025/09/28: We release Qwen-Image-Edit-Pruning-13B-4steps

Introduction
This open-source project is based on Qwen-Image-Edit and applies model pruning: 20 layers are removed while the weights of the remaining 40 layers are retained, yielding a 13.6B-parameter model. The pruned version will continue to be iterated upon, so please stay tuned.

Install the latest versions of diffusers and PyTorch.
TLCMSDXL
X2Edit
For the relevant data construction scripts, model training scripts and inference scripts, please refer to X2Edit. Prepare the environment and install the required libraries.

Inference
We provide inference scripts for editing images at resolutions of 1024 and 512. In addition, you can choose the base model for X2Edit (FLUX.1-Krea, FLUX.1-dev, FLUX.1-schnell, PixelWave, shuttle-3-diffusion) and the LoRA to integrate with the MoE-LoRA (Turbo-Alpha, AntiBlur, Midjourney-Mix2, Super-Realism, Chatgpt-Ghibli). Choose the model you like and download it. For the MoE-LoRA, we will open-source a unified checkpoint that can be used for both 512 and 1024 resolutions. Before executing the script, download Qwen3-8B (used to select the task type for the input instruction), the base model (FLUX.1-Krea, FLUX.1-dev, FLUX.1-schnell, shuttle-3-diffusion), the MLLM and Alignet. All scripts follow analogous command patterns: simply replace the script filename while keeping the parameter configuration consistent.

- device: The device used for inference. default: `cuda`
- pixel: The resolution of the input image; choose from [512, 1024]. default: `1024`
- numexperts: The number of experts in the MoE. default: `12`
- basepath: The path of the base model.
- qwenpath: The path of the model used to select the task type for the input instruction. We use Qwen3-8B here.
- lorapath: The path of the MoE-LoRA in X2Edit.
- extralorapath: The path of an extra LoRA for plug-and-play use. default: `None`

If you find our work helpful, please consider citing our paper and leaving a star.
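A hedged preparation sketch showing how the required checkpoints could be fetched and mapped to the parameters listed above; the MoE-LoRA repo id is an assumption, and the actual entry-point script and flag spellings should be taken from the X2Edit repository.

```python
# Hedged sketch: pre-download the checkpoints the X2Edit inference scripts expect and note
# which parameter each local path would feed. The MoE-LoRA repo id is assumed; consult the
# X2Edit repository for the actual script name and flag spellings.
from huggingface_hub import snapshot_download

qwen_path = snapshot_download("Qwen/Qwen3-8B")                  # -> qwenpath (task-type selection)
base_path = snapshot_download("black-forest-labs/FLUX.1-dev")   # -> basepath (base model)
lora_path = snapshot_download("OPPOer/X2Edit")                  # -> lorapath (MoE-LoRA), assumed repo id

print("qwenpath:", qwen_path)
print("basepath:", base_path)
print("lorapath:", lora_path)
```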
TLCM
TLCMFlux
PEA-Diffusion
MultilingualSD3-adapter
`MultilingualSD3-adapter` is a multilingual adapter tailored for SD3, originating from the ECCV 2024 paper PEA-Diffusion. The open-source code is available at https://github.com/OPPO-Mente-Lab/PEA-Diffusion.

Usage
We used the multilingual encoder umt5-xxl, Mul-OpenCLIP and HunyuanDiT CLIP, and implemented a reverse denoising process for distillation training. To learn more, check out the diffusers documentation.

License
The adapter itself is Apache License 2.0, but it must follow the license of the main model.