UniLIP-3B
This repository contains the model (3B version) presented in the paper UniLIP: Adapting CLIP for Unified Multimodal Understanding, Generation and Editing. UniLIP proposes a unified, CLIP-based encoder featuring both rich semantics and fine-grained image details. Through a two-stage reconstruction training scheme with self-distillation, we empower CLIP to achieve excellent reconstruction results without compromising its original understanding abilities. Leveraging this powerful unified representation, UniLIP excels across understanding, generation, and editing tasks. For more details, please refer to the original paper and the GitHub repository.
UniLIP-1B
This repository contains the model (1B version) presented in the paper UniLIP: Adapting CLIP for Unified Multimodal Understanding, Generation and Editing. UniLIP proposes a unified, CLIP-based encoder featuring both rich semantics and fine-grained image details. Through a two-stage reconstruction training scheme with self-distillation, we empower CLIP to achieve excellent reconstruction results without compromising its original understanding abilities. Leveraging this powerful unified representation, UniLIP excels across understanding, generation, and editing tasks. For more details, please refer to the original paper and the GitHub repository.
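To make the training objective above concrete, here is a minimal sketch (not the official UniLIP code) of a reconstruction loss combined with a self-distillation term that keeps the adapted encoder close to a frozen copy of the original CLIP encoder. The linear encoder/decoder, shapes, and the loss weight `lam` are toy assumptions for illustration only.

```python
# Toy sketch of a reconstruction + self-distillation objective.
# NOT the official UniLIP implementation; all shapes/weights are assumptions.
import numpy as np

rng = np.random.default_rng(0)

D_PIX, D_FEAT = 12, 4              # toy pixel / feature dimensions
x = rng.normal(size=(8, D_PIX))    # a batch of 8 "images"

W_frozen = rng.normal(size=(D_PIX, D_FEAT))                 # frozen CLIP stand-in
W_enc = W_frozen + 0.01 * rng.normal(size=W_frozen.shape)   # trainable copy
W_dec = rng.normal(size=(D_FEAT, D_PIX))                    # lightweight decoder

def unilip_style_loss(x, W_enc, W_dec, W_frozen, lam=0.5):
    """Pixel reconstruction loss plus a self-distillation penalty."""
    z = x @ W_enc                # features from the adapted encoder
    z_teacher = x @ W_frozen     # features from the frozen original encoder
    x_hat = z @ W_dec            # reconstructed pixels
    recon = np.mean((x_hat - x) ** 2)
    distill = np.mean((z - z_teacher) ** 2)  # preserves understanding ability
    return recon + lam * distill

loss = unilip_style_loss(x, W_enc, W_dec, W_frozen)
```

The distillation term is what lets reconstruction be learned "without compromising understanding": it penalizes the adapted encoder for drifting away from the frozen teacher's features.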
UFO-InternVL2-8B-rec-ft
UFO-InternVL2-8B-res-ft
UFO-InternVL2-8B-reasonseg-ft
GiT
UFO
This repository contains the model presented in the paper UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface. UFO unifies object-level detection, pixel-level segmentation, and image-level vision-language tasks into a single model by transforming all perception targets into the language space. It introduces a novel embedding-retrieval approach that relies solely on the language interface to support segmentation tasks. For more details, please refer to the original paper and the GitHub repository:

- Paper: UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface
- GitHub: https://github.com/nnnth/UFO
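The embedding-retrieval idea can be illustrated with a toy sketch (not the official UFO code): an embedding produced through the language interface for a hypothetical mask token is compared against per-pixel image features, and the most similar pixels are retrieved as the segmentation mask. The names, shapes, and the 0.5 threshold below are illustrative assumptions.

```python
# Toy sketch of segmentation as embedding retrieval.
# NOT the official UFO implementation; shapes and threshold are assumptions.
import numpy as np

rng = np.random.default_rng(1)

H, W, D = 4, 4, 8                          # toy feature-map size and dimension
pixel_feats = rng.normal(size=(H * W, D))  # per-pixel image features
pixel_feats /= np.linalg.norm(pixel_feats, axis=-1, keepdims=True)

# Pretend the language decoder emitted this embedding for a mask token;
# here we correlate it with the first 5 pixels so retrieval finds them.
mask_token = pixel_feats[:5].mean(axis=0)
mask_token /= np.linalg.norm(mask_token)

def retrieve_mask(pixel_feats, token_emb, threshold=0.5):
    """Retrieve a binary mask: cosine similarity to the token, thresholded."""
    sims = pixel_feats @ token_emb  # cosine similarity (inputs are unit-norm)
    return (sims > threshold).reshape(H, W)

mask = retrieve_mask(pixel_feats, mask_token)  # boolean (H, W) mask
```

The appeal of this formulation is that segmentation needs no dedicated mask head: the same open-ended language interface that emits text also emits the token embedding from which the mask is retrieved.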
UFO-InternVL2-8B-instruct
UniLIP
This repository contains the model (autoencoders) presented in the paper UniLIP: Adapting CLIP for Unified Multimodal Understanding, Generation and Editing. UniLIP proposes a unified, CLIP-based encoder featuring both rich semantics and fine-grained image details. Through a two-stage reconstruction training scheme with self-distillation, we empower CLIP to achieve excellent reconstruction results without compromising its original understanding abilities. Leveraging this powerful unified representation, UniLIP excels across understanding, generation, and editing tasks. For more details, please refer to the original paper and the GitHub repository.