qihoo360
fg-clip-large
FG-CLIP: Fine-Grained Visual and Textual Alignment
Chunyu Xie, Bin Wang, Fanjing Kong, Jincheng Li, Dawei Liang, Gengshen Zhang, Dawei Leng†, Yuhui Yin (Equal Contribution, †Corresponding Author)
[Paper](https://arxiv.org/abs/2505.05071) | [ICML 2025](https://icml.cc/Conferences/2025) | [GitHub](https://github.com/360CVGroup/FG-CLIP)

Model Framework
FG-CLIP's training proceeds in two stages: the first stage leverages global-level caption-image pairs to achieve initial fine-grained alignment, while the second stage supplements these with additional region-level captions, including detailed region captions and positive/negative region descriptions, to further refine the alignment.

Citation
If you find FG-CLIP useful for your research and applications, please cite using this BibTeX:

This project utilizes certain datasets and checkpoints that are subject to their respective original licenses. Users must comply with all terms and conditions of these original licenses. The content of this project itself is licensed under the Apache License 2.0.
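As a minimal, illustrative sketch of fine-grained image-text matching with this checkpoint: it assumes `qihoo360/fg-clip-large` loads through `transformers` with `trust_remote_code=True` and exposes CLIP-style `get_image_features` / `get_text_features` methods; the actual interface is defined by the GitHub repository above.

```python
# Hedged sketch: image-text similarity with FG-CLIP via transformers.
# get_image_features / get_text_features are assumed CLIP-style methods,
# not confirmed from the card; check the repository for the exact API.
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModel, AutoTokenizer

model_id = "qihoo360/fg-clip-large"
model = AutoModel.from_pretrained(model_id, trust_remote_code=True).eval()
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
processor = AutoImageProcessor.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("example.jpg").convert("RGB")
captions = ["a red car parked on the street", "a dog running on the beach"]

with torch.no_grad():
    pixel_values = processor(images=image, return_tensors="pt")["pixel_values"]
    text_inputs = tokenizer(captions, padding=True, return_tensors="pt")
    image_emb = model.get_image_features(pixel_values)            # assumed method
    text_emb = model.get_text_features(text_inputs["input_ids"])  # assumed method
    image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
    similarity = image_emb @ text_emb.T  # cosine similarity per caption

print(similarity)
```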
360Zhinao-search
fg-clip2-base
FG-CLIP 2: A Bilingual Fine-grained Vision-language Alignment Model
Code: https://github.com/360CVGroup/FG-CLIP
Project page: https://360cvgroup.github.io/FG-CLIP
FG-CLIP 2 is the foundation model for fine-grained vision-language understanding in both English and Chinese.
fg-clip-base
fg-clip2-large
FG-CLIP 2: A Bilingual Fine-grained Vision-language Alignment Model
Code: https://github.com/360CVGroup/FG-CLIP
Project page: https://360cvgroup.github.io/FG-CLIP
FG-CLIP 2 is the foundation model for fine-grained vision-language understanding in both English and Chinese. Across 29 datasets and 8 diverse tasks, it consistently surpasses recent strong baselines such as SigLIP 2 and MetaCLIP 2, achieving the best reported performance to date in both languages.

FG-CLIP 2: A Bilingual Fine-grained Vision-language Alignment Model
Chunyu Xie, Bin Wang, Fanjing Kong, Jincheng Li, Dawei Liang, Ji Ao, Dawei Leng†, Yuhui Yin (Equal Contribution, †Corresponding Author)
[Paper](https://arxiv.org/abs/2510.10921) | [Model Collection](https://huggingface.co/collections/qihoo360/fg-clip-2-68ecbf9c548623bb78bc7913) | [research.360.cn](https://research.360.cn/sass/index)

FG-CLIP: Fine-Grained Visual and Textual Alignment (code branch: v1.0)
Chunyu Xie, Bin Wang, Fanjing Kong, Jincheng Li, Dawei Liang, Gengshen Zhang, Dawei Leng†, Yuhui Yin (Equal Contribution, †Corresponding Author)
[Paper](https://arxiv.org/abs/2505.05071) | [ICML 2025](https://icml.cc/Conferences/2025) | [Model Collection](https://huggingface.co/collections/qihoo360/fg-clip-681da45d4acfb65c240a6d08) | [FineHARD Dataset](https://huggingface.co/datasets/qihoo360/FineHARD) | [DeepWiki](https://deepwiki.com/360CVGroup/FG-CLIP)

Citation
If you find FG-CLIP 2 useful for your research and applications, please cite using this BibTeX:

This project utilizes certain datasets and checkpoints that are subject to their respective original licenses. Users must comply with all terms and conditions of these original licenses. The content of this project itself is licensed under the Apache License 2.0.
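A hedged sketch of the bilingual usage described above: it assumes an FG-CLIP 2 checkpoint from this collection (for example `qihoo360/fg-clip2-base`) loads via `transformers` with `trust_remote_code=True` and exposes CLIP-style feature methods; the real interface is defined in the GitHub repository.

```python
# Hedged sketch: bilingual (English/Chinese) zero-shot matching with FG-CLIP 2.
# get_image_features / get_text_features are assumed CLIP-style methods; check
# the repository for the checkpoint's actual API.
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModel, AutoTokenizer

model_id = "qihoo360/fg-clip2-base"
model = AutoModel.from_pretrained(model_id, trust_remote_code=True).eval()
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
processor = AutoImageProcessor.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("sofa.jpg").convert("RGB")
labels = [
    "a black cat sleeping on a red sofa",   # English fine-grained caption
    "一只黑猫睡在红色沙发上",                  # the same caption in Chinese
    "a white dog standing on green grass",
]

with torch.no_grad():
    pixel_values = processor(images=image, return_tensors="pt")["pixel_values"]
    text = tokenizer(labels, padding=True, return_tensors="pt")
    img = model.get_image_features(pixel_values)       # assumed method
    txt = model.get_text_features(text["input_ids"])   # assumed method
    img = img / img.norm(dim=-1, keepdim=True)
    txt = txt / txt.norm(dim=-1, keepdim=True)
    probs = (img @ txt.T).softmax(dim=-1)

for label, p in zip(labels, probs[0].tolist()):
    print(f"{p:.3f}  {label}")
```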
360Zhinao-7B-Base
fg-clip2-so400m
FG-CLIP 2: A Bilingual Fine-grained Vision-language Alignment Model
Code: https://github.com/360CVGroup/FG-CLIP
Project page: https://360cvgroup.github.io/FG-CLIP
FG-CLIP 2 is the foundation model for fine-grained vision-language understanding in both English and Chinese. Across 29 datasets and 8 diverse tasks, it consistently surpasses recent strong baselines such as SigLIP 2 and MetaCLIP 2, achieving the best reported performance to date in both languages.

FG-CLIP 2: A Bilingual Fine-grained Vision-language Alignment Model
Chunyu Xie, Bin Wang, Fanjing Kong, Jincheng Li, Dawei Liang, Ji Ao, Dawei Leng†, Yuhui Yin (Equal Contribution, †Corresponding Author)
[Paper](https://arxiv.org/abs/2510.10921) | [Model Collection](https://huggingface.co/collections/qihoo360/fg-clip-2-68ecbf9c548623bb78bc7913) | [research.360.cn](https://research.360.cn/sass/index)

FG-CLIP: Fine-Grained Visual and Textual Alignment (code branch: v1.0)
Chunyu Xie, Bin Wang, Fanjing Kong, Jincheng Li, Dawei Liang, Gengshen Zhang, Dawei Leng†, Yuhui Yin (Equal Contribution, †Corresponding Author)
[Paper](https://arxiv.org/abs/2505.05071) | [ICML 2025](https://icml.cc/Conferences/2025) | [Model Collection](https://huggingface.co/collections/qihoo360/fg-clip-681da45d4acfb65c240a6d08) | [FineHARD Dataset](https://huggingface.co/datasets/qihoo360/FineHARD) | [DeepWiki](https://deepwiki.com/360CVGroup/FG-CLIP)

Citation
If you find FG-CLIP 2 useful for your research and applications, please cite using this BibTeX:

This project utilizes certain datasets and checkpoints that are subject to their respective original licenses. Users must comply with all terms and conditions of these original licenses. The content of this project itself is licensed under the Apache License 2.0.
Light-R1-14B-DS-GGUF
TinyR1-32B-Preview
!!! Important Release !!!
The official version of TinyR1-32B has been released. Please visit TinyR1-32B. We have officially open-sourced the training dataset, as well as the full training and evaluation pipeline. We have uploaded the technical report: Paper Link.

👁️ Introduction
We introduce our first-generation reasoning model, Tiny-R1-32B-Preview, which outperforms the 70B model Deepseek-R1-Distill-Llama-70B and nearly matches the full R1 model in math.

We applied supervised fine-tuning (SFT) to Deepseek-R1-Distill-Qwen-32B across three target domains (Mathematics, Code, and Science) using the 360-LLaMA-Factory training framework to produce three domain-specific models. We used questions from open-source data as seeds, generated responses for mathematics, coding, and science tasks with R1, and trained a specialized model for each domain. Building on this, we leveraged the Mergekit tool from the Arcee team to combine the three models into Tiny-R1-32B-Preview, which demonstrates strong overall performance.

Note: We have fixed a tokenizer config bug that existed before March 3, 2025, at 20:50 Beijing Time (UTC+8). Refer to the Hotfix section below.

Evaluation
| Model | Math (AIME 2024) | Coding (LiveCodeBench) | Science (GPQA-Diamond) |
| --- | --- | --- | --- |
| Deepseek-R1-Distill-Qwen-32B | 72.6 | 57.2 | 62.1 |
| Deepseek-R1-Distill-Llama-70B | 70.0 | 57.5 | 65.2 |
| Deepseek-R1 | 79.8 | 65.9 | 71.5 |
| Tiny-R1-32B-Preview (Ours) | 78.1 | 61.6 | 65.0 |

All scores are reported as pass@1. For AIME 2024, we sample 16 responses, and for GPQA-Diamond, we sample 4 responses, both using average overall accuracy for stable evaluation.

We merged the models trained separately in the three domains into a single model. Below are the comparison results.

| Model | Math (AIME 2024) | Coding (LiveCodeBench) | Science (GPQA-Diamond) |
| --- | --- | --- | --- |
| Math-Model | 73.1 | - | - |
| Code-Model | - | 63.4 | - |
| Science-Model | - | - | 64.5 |
| Merged-Model (Tiny-R1-32B-Preview) | 78.1 | 61.6 | 65.0 |

Training data:
1. Math: 58.3k CoT trajectories from open-r1/OpenR1-Math-220k, default subset
2. Coding: 19k CoT trajectories from open-thoughts/OpenThoughts-114k, coding subset
3. Science: 8.6k CoT trajectories:
   - 2.7k CoT trajectories from simplescaling/data_ablation_full59K, science and health science subset
   - 4.9k CoT trajectories from open-thoughts/OpenThoughts-114k, science subset
   - 1.0k CoT trajectories from simplescaling/s1K-1.1, all

Open Source Plan
We will publish a technical report as soon as possible and open-source our training and evaluation code, selected training data, and evaluation logs. Having benefited immensely from the open-source community, we are committed to giving back in every way we can.

Caveats
TinyR1-32B-Preview is an experimental research model designed to advance AI reasoning capabilities. As a preview release, it has demonstrated higher evaluation scores on some benchmarks but is not intended for general user applications. Key limitations include:
1. Incorrect parameter configurations may result in repetitive output loops, similar to R1. We recommend setting the temperature to 0.6 and top-p to 0.95, in line with R1's configuration (a sampling sketch is given at the end of this card).
2. The model currently omits the `<think>` token, which marks the start of reasoning, and only outputs the `</think>` token to signal the end of the thinking process. This will be addressed soon in the next version.
3. The model may generate overly lengthy reasoning chains for simple queries. We are working on improvements.
4. Benchmark coverage is limited. We encourage users to evaluate the model on additional benchmarks, and we will continue to expand our benchmark results.
5. The model requires enhanced safety measures to ensure reliable and secure performance.

Hotfix (March 3, 2025)
On March 3, 2025, at 20:50 Beijing Time (UTC+8), we updated our tokenizer. Users who downloaded our model before this update are advised to re-download the tokenizer-related configuration files (tokenizer.json, tokenizer_config.json, config.json, and special_tokens_map.json). Our internal testing has verified that this update resolves the following issues reported by users:
1. Output repetition.
2. Degradation in benchmark performance.
3. Generation of token IDs exceeding the vocabulary range.
We appreciate your feedback and encourage you to report any further issues. Additionally, we are actively working on the technical report and consolidating relevant code and data.

360 Team: Lin Sun, Guangxiang Zhao, Xiaoqi Jian, Weihong Lin, Yongfu Zhu, Change Jia, Linglin Zhang, Jinzhu Wu, Sai-er Hu, Xiangzheng Zhang
PKU Team: Yuhan Wu, Zihan Jiang, Wenrui Liu, Junting Zhou, Bin Cui, Tong Yang
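The sampling sketch referenced in the caveats is shown below. It is illustrative only: it assumes the checkpoint is published under the listed org/name (`qihoo360/TinyR1-32B-Preview`) and loads through standard `transformers` causal-LM classes; only the temperature 0.6 / top-p 0.95 settings come from the card itself.

```python
# Hedged sketch: sampling from TinyR1-32B-Preview with the recommended
# temperature 0.6 / top-p 0.95 settings to avoid repetitive output loops.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "qihoo360/TinyR1-32B-Preview"  # org/name as listed; verify on the hub
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "How many positive integers n < 100 are divisible by both 6 and 8?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=4096,   # reasoning traces can be long
    do_sample=True,
    temperature=0.6,       # recommended in the caveats above
    top_p=0.95,
)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Because the model emits long reasoning chains even for simple queries, a generous `max_new_tokens` budget is usually needed.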
Light-R1-14B-DS
360VL-8B
Light-IF-32B
Light-TLLM-7B
TinyR1-32B
360LayoutAnalysis
HiCo_T2I
Light-R1-32B
Light-IF-14B
🤗 Hugging Face    |    📑 Paper Link    |    📑 Blog    |    📑 Github    |    📑 SuperCLUE-CPIF

Evaluation
| Model | SuperCLUE | IFEval | CFBench | IFBench |
| ---- | ---- | ---- | ---- | ---- |
| Qwen3-14B | 0.227 | 0.898 | 0.827 | 0.422 |
| Qwen3-32B | 0.234 | 0.877 | 0.823 | 0.384 |
| Qwen3-235B-A22B | 0.244 | 0.882 | 0.834 | 0.423 |
| Qwen3-235B-A22B-Thinking-2507 | 0.434 | 0.916 | 0.843 | 0.475 |
| DeepSeek-R1-0528 | 0.436 | 0.863 | 0.827 | 0.415 |
| Doubao-seed-1-6-thinking-250615 | 0.362 | 0.832 | 0.82 | 0.477 |
| Doubao-seed-1-6-thinking-250715 | 0.345 | 0.856 | 0.84 | 0.366 |
| ChatGPT-4o-latest | 0.260 | 0.836 | 0.807 | 0.365 |
| Deepseek-v3-250324 | 0.306 | 0.859 | 0.833 | 0.405 |
| Doubao-1.5-pro-32k-250115 | 0.285 | 0.889 | 0.797 | 0.375 |
| Kimi-K2 | 0.227 | 0.921 | 0.820 | 0.395 |
| GLM-4.5 | 0.395 | 0.893 | 0.833 | 0.466 |
| Light-IF-14B (ours) 🤗 | 0.589 | 0.962 | 0.833 | 0.697 |

SuperCLUE-CPIF
In the latest SuperCLUE-CPIF evaluation, Light-IF-14B (reported as 360zhinao3-o1.5) reached domestic SOTA, outperforming ERNIE-X1.1 and DeepSeek-V3.2-Exp-Thinking. SuperCLUE-CPIF (Chinese Precise Instruction Following) is a benchmark designed to assess how well large language models (LLMs) can accurately follow complex, multi-constraint instructions in Chinese.

Introduction
Light-IF-14B is the most powerful 14B instruction-following model we have open-sourced, even outperforming Light-IF-32B. This remarkable performance is largely attributed to our carefully designed curriculum learning strategy: during the SFT stage we increased instruction difficulty, and in the two-stage reinforcement learning phase we introduced even more complex instructions. These improvements played a critical role in further boosting the model's capabilities.

The following illustrates how the model generates content from a given input (a minimal usage sketch is given at the end of this card). Below is an example of the model's reasoning on a Chinese prompt that asks for a National Day travel guide under strict formatting constraints:

Okay, I need to help the user write a National Day travel guide that meets several specific requirements. First, it needs 8 topics, with each topic's subheading shown in 【】. Each topic's content must contain 2 to 5 sentences (not counting the heading). The whole guide must be 20 lines, separated by single line breaks, with the 20 lines excluding the headings, and it must stay under 300 characters.

First, I need to settle on common National Day travel topics, such as recommended attractions, food, transportation, accommodation, and safety tips. Each topic then needs 2-5 sentences of content. Next, I calculate the total line count: with 8 topics, how many lines can each topic take? The total must be 20 lines, so roughly 2-3 lines per topic?

But is each topic heading its own line? For example, 【Topic 1】 as one line, followed by the content lines. So does the total include the heading lines? The user said "20 lines (separated by single line breaks, not including the topics)", which probably means the heading lines are excluded, but I need to confirm: the user likely means that, excluding the topic headings, the content lines total 20, rather than the whole guide being 20 lines with headings counted. This needs careful analysis. Since the user said "the whole guide is 20 lines (separated by single line breaks, not including the topics)", the headings do not count toward the 20 lines, so the content lines under the topics must sum to 20.

Each topic needs 2-5 sentences, i.e., 2-5 content lines. With 8 topics, the per-topic line counts must sum to 20, so I need to distribute them: some topics get 3 lines, others 2? If most topics used 2 lines, 8 × 2 = 16 lines falls short of 20, so some topics need 3 lines. For example, 4 topics with 3 lines and 4 with 2 lines: 4×3 + 4×2 = 12 + 8 = 20 lines. That allocation seems more reasonable.

1. 【Popular Attractions】 2. 【Local Food】 3. 【Transportation Tips】 4. 【Accommodation Advice】 5. 【Family Activities】 6. 【Photo Spots】 7. 【Shopping Deals】 8. 【Safety Reminders】

Topic 1: Popular Attractions. Possible content lines: classic sights such as the Great Wall and the Forbidden City are crowded, so book in advance; going early or off-peak saves time. Topic 2: Local Food. Perhaps: Peking duck and old-Beijing zhajiang noodles are must-tries; the farmhouse dishes at the foot of the Juyongguan Great Wall have their own flavor. That would be 2 lines?

Topic 8 with a single line to reach 20? That won't work; every topic needs at least 2 sentences, i.e., at least 2 lines. The user said each topic has no fewer than 2 and no more than 5 sentences (excluding the heading), so each topic needs 2-5 content lines. Topic 8 must therefore have at least 2 lines, and the earlier allocation may need adjusting: say topics 1-6 with 3 lines each and topics 7-8 with 2 lines each? That gives 6×3 + 2×2 = 18 + 4 = 22, which exceeds 20 and needs adjustment. A distribution of 3, 3, 3, 3, 2, 2, 2, 3 → the sum is 3×5 + 2×3 + 3 = 15 + 6 + 3 = 24? That's not right; I may need to recompute. Suppose 5 topics use 3 lines and 3 topics use 2 lines: 5×3 + 3×2 = 15 + 6 = 21, close to 20 but one line over, so perhaps move one more topic down to 2 lines, leaving 4 topics with 3 lines and 4 with 2.

Next, check that each topic's sentence count fits the 2-5 range (excluding the heading). Is each topic's line count the same as its sentence count? I may need to check that every line is a complete sentence; the user presumably treats each line as one sentence, so the line count equals the sentence count.

Topic 8: 24 characters → 44 + 28 = 72, +43 = 115, +42 = 157, +27 = 184, +27 = 211, +21 = 232, +24 = 256 characters, within the 300-character limit. "Smaller sites like Mutianyu are uncrowded and scenic." → 14 characters (counting word by word gives 3+1+3+2+2 = 11? My counting method may be off, but the total should still stay under 300.) Line 5: "Hutong snack stalls offer authentic flavors." → counting word by word, 2+2+2+1+2+2? No, the sentence is about 11 characters in total? Lines 7-9: the 3 lines of Topic 3, roughly 16 + 12 + 12 = 40 characters ("The subway is the best way to get around; extra trains run over the holiday" → 16 characters? Maybe 14 to be precise?) Either way, the total is roughly 44+30+40+40+30+30+20+20 = 244 characters, well under 300 and within the requirement.

For deployment, you can use `sglang>=0.4.6.post1` or `vllm>=0.8.5` to create an OpenAI-compatible API endpoint.
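The usage sketch referenced above is given here. It is a minimal, illustrative example: it assumes the `qihoo360/Light-IF-14B` checkpoint loads through standard `transformers` causal-LM classes and ships a chat template, and the prompt is an English paraphrase of the travel-guide instruction; it is not the card's official quickstart.

```python
# Hedged sketch: generating with Light-IF-14B through transformers.
# The chat template and sampling defaults come from the released tokenizer/config;
# increase max_new_tokens to leave room for long reasoning traces.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "qihoo360/Light-IF-14B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = (
    "Write a National Day travel guide with 8 topics. Put each topic heading in 【】, "
    "give 2-5 sentences per topic, use exactly 20 lines excluding the headings, "
    "and stay under 300 characters."
)
messages = [{"role": "user", "content": prompt}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=8192)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

For serving, the same checkpoint can be exposed behind an OpenAI-compatible endpoint with `sglang>=0.4.6.post1` or `vllm>=0.8.5`, as noted above.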
TinyR1-Safety-8B
NAMI-T2I
Light-IF-4B
Light-IF-8B
360Zhinao2-7B-Base
Light-R1-7B-DS
BDM1.0
360Zhinao2-7B-Chat-32K-Int4
360Zhinao-7B-Chat-32K
360Zhinao3-7B-Instruct
Light-R1-32B-DS
360Zhinao-7B-Chat-360K-Int4
360Zhinao3-7B
360Zhinao2-7B-Chat-360K
360VL-70B
llama3-8B-360Zhinao-360k-Instruct
360Zhinao-1.8B-Reranking
360Zhinao3-7B-O1.5
360Zhinao-7B-Chat-4K-Int4
360Zhinao2-7B-Chat-4K-Int4
360Zhinao2-7B-Chat-32K
360Zhinao2-7B-Chat-360K-Int4
FancyVideo
Inner-Adaptor-Architecture
RefVTON
RelaCtrl
360Zhinao-7B-Chat-360K
360Zhinao-7B-Chat-32K-Int4
Qihoo-T2X
PlanGen
WISA
This is the official reproduction of WISA, designed to enhance text-to-video models by improving their ability to simulate the real world.
WISA: World Simulator Assistant for Physics-Aware Text-to-Video Generation
Jing Wang, Ao Ma, Ke Cao, Jun Zheng, Zhanjie Zhang, Jiasong Feng, Shanyuan Liu, Yuhang Ma, Bo Cheng, Dawei Leng‡, Yuhui Yin, Xiaodan Liang‡ (Equal Contribution, ‡Corresponding Authors)
360Zhinao-7B-Chat-4K
EVTAR
Person-to-Person Try-On with Additional Unpaired Visual Reference
[Model](https://huggingface.co/qihoo360/RefVTON) | [Paper](https://arxiv.org/abs/2511.00956)

We propose RefVTON, an end-to-end virtual try-on model with additional visual references. It directly fits the target garment onto the person image while incorporating reference images to enhance the model's ability to preserve and accurately depict clothing details.

We provide pretrained backbone networks and LoRA weights for testing and deployment. Please download the `.safetensors` files from [here] and place them in the `checkpoints` directory.
- `512_384_pytorch_lora_weights.safetensors`: 512x384-resolution high-quality virtual try-on model ✅ Available
- `1024_768_pytorch_lora_weights.safetensors`: 1024x768-resolution high-quality virtual try-on model ✅ Available

- [x] [2025.10.11] Release the virtual try-on inference code and LoRA weights.
- [x] [2025.10.13] Release the technical report on arXiv.

- An end-to-end virtual try-on model: it can function either as an inpainting model that places the target clothing into masked areas, or as a direct garment transfer onto the human body.
- Using reference images to enhance try-on performance: to emulate how online shoppers attend to the overall wearing effect rather than the garment itself, our model can take images of a model wearing the target clothing as input, thereby better preserving its material texture and design details.
- Improved performance: our model achieves state-of-the-art performance on public benchmarks and demonstrates strong generalization to in-the-wild inputs.

Currently, we provide a small test set with additional reference images ("a different person wearing the target garment") for trying our model. We plan to release the reference data generation code, along with our proposed full dataset containing model reference images, in the future. Nevertheless, inference can still be performed in a reference-free setting on public benchmarks, including VITON-HD and DressCode.

One key feature of our method is the use of reference data, where an image of a different person wearing the target garment is provided to help the model imagine how the target person would look in that garment. In most online shopping applications, such additional reference images are commonly used by customers to better visualize the clothing. However, publicly available datasets such as VITON-HD and DressCode do not include such reference data, so we generate them ourselves. Please prepare the pretrained weights of the Flux-Kontext model and the Qwen2.5-VL-32B model; you can then generate the additional reference images using the following commands:

We provide pretrained backbone networks and LoRA weights for testing and deployment. Please download the `.safetensors` files from [here] and place them in the `checkpoints` directory.

Here we provide the inference code for RefVTON.
- `pretrained_model_name_or_path`: Path to the downloaded Flux-Kontext model weights.
- `instance_data_dir`: Path to your dataset. For inference on VITON-HD or DressCode, ensure that the word "viton" or "DressCode" appears in the path.
- `output_dir`: Path to the downloaded or trained LoRA weights.
- `cond_scale`: Resize scale of the reference image during training. Defaults to `1.0` for $512\times384$ and `2.0` for $1024\times768$ resolution.
- `use_reference`: Whether to use an additional reference image as input.
- `use_different`: Only applicable for VITON/DressCode inference. Whether to use different cloth-person pairs.
- `use_person`: Only applicable for VITON/DressCode inference. Whether to use the unmasked person image instead of the agnostic masked image as input for the virtual try-on task.

We quantitatively evaluate the quality of virtual try-on results using FID, KID, SSIM, and LPIPS. Here, we provide the evaluation code for the VITON-HD and DressCode datasets (an illustrative metric sketch is given at the end of this card).
- `paired`: If you perform unpaired generation, where different garments are fitted onto the target person, you should enable this flag during evaluation.

This code is mainly built upon the Diffusers, Flux, and CatVTON repositories. Thanks so much for their solid work! If you find this repository useful, please consider citing our paper:
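The evaluation section above lists FID, KID, SSIM, and LPIPS; the sketch below shows one way these could be computed with `torchmetrics`. It is illustrative only and is not the repository's evaluation script: the file paths, image size, and two-image batch are assumptions, and a real run would iterate over the full test split.

```python
# Hedged sketch: FID / KID / SSIM / LPIPS over try-on results with torchmetrics.
# Directory layout and preprocessing are hypothetical, not the repo's pipeline.
import torch
import torchvision.transforms.functional as TF
from PIL import Image
from torchmetrics.image import StructuralSimilarityIndexMeasure
from torchmetrics.image.fid import FrechetInceptionDistance
from torchmetrics.image.kid import KernelInceptionDistance
from torchmetrics.image.lpip import LearnedPerceptualImagePatchSimilarity

def load_batch(paths, size=(1024, 768)):
    """Load images as float tensors in [0, 1] with shape (N, 3, H, W)."""
    height, width = size
    imgs = [
        TF.to_tensor(Image.open(p).convert("RGB").resize((width, height)))
        for p in paths
    ]
    return torch.stack(imgs)

# Hypothetical layout: ground-truth persons in gt/, generated try-ons in out/.
real = load_batch(["gt/000001.jpg", "gt/000002.jpg"])
fake = load_batch(["out/000001.jpg", "out/000002.jpg"])

fid = FrechetInceptionDistance(normalize=True)          # normalize=True: floats in [0, 1]
kid = KernelInceptionDistance(subset_size=2, normalize=True)
ssim = StructuralSimilarityIndexMeasure(data_range=1.0)
lpips = LearnedPerceptualImagePatchSimilarity(net_type="alex", normalize=True)

fid.update(real, real=True)
fid.update(fake, real=False)
kid.update(real, real=True)
kid.update(fake, real=False)

print("FID  :", fid.compute().item())
print("KID  :", kid.compute()[0].item())   # compute() returns (mean, std)
print("SSIM :", ssim(fake, real).item())   # meaningful in the paired setting only
print("LPIPS:", lpips(fake, real).item())  # meaningful in the paired setting only
```

In the unpaired setting, where different garments are fitted onto the target person, there is no ground-truth image of that combination, so only the distribution-level metrics (FID/KID) remain meaningful; this is why the evaluation flag above distinguishes paired from unpaired runs.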