kanashi6

9 models

UniLIP-3B

This repository contains the model (3B version) presented in the paper UniLIP: Adapting CLIP for Unified Multimodal Understanding, Generation and Editing. UniLIP proposes a unified, CLIP-based encoder that captures both rich semantics and fine-grained image details. Through a two-stage training scheme with self-distillation for reconstruction, we enable CLIP to achieve excellent reconstruction results without compromising its original understanding abilities. Building on this unified representation, UniLIP performs well across understanding, generation, and editing tasks. For more details, please refer to the original paper and the GitHub repository.

license:apache-2.0
154
2

UniLIP-1B

This repository contains the model (1B version) presented in the paper UniLIP: Adapting CLIP for Unified Multimodal Understanding, Generation and Editing. UniLIP proposes a unified, CLIP-based encoder that captures both rich semantics and fine-grained image details. Through a two-stage training scheme with self-distillation for reconstruction, we enable CLIP to achieve excellent reconstruction results without compromising its original understanding abilities. Building on this unified representation, UniLIP performs well across understanding, generation, and editing tasks. For more details, please refer to the original paper and the GitHub repository.

license:apache-2.0
62
1

UFO-InternVL2-8B-rec-ft

license:apache-2.0
2
1

UFO-InternVL2-8B-res-ft

license:apache-2.0
1
1

UFO-InternVL2-8B-reasonseg-ft

license:apache-2.0
1
1

GiT

license:apache-2.0
0
7

UFO

This repository contains the model presented in the paper UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface. UFO unifies object-level detection, pixel-level segmentation, and image-level vision-language tasks into a single model by transforming all perception targets into the language space. It introduces a novel embedding retrieval approach that relies solely on the language interface to support segmentation tasks. For more details, please refer to the original paper and the GitHub repository:
- Paper: UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface
- GitHub: https://github.com/nnnth/UFO
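
The embedding retrieval idea mentioned in the UFO description can be illustrated with a toy sketch. This is not the repository's code: it assumes that the model emits one embedding for a mask token and that a segmentation mask is obtained by scoring every per-pixel image feature against that embedding with a dot product. The function name `retrieve_mask`, the sigmoid scoring, the threshold, and the toy feature grid are all hypothetical.

```python
import math


def retrieve_mask(mask_embedding, image_features, threshold=0.5):
    """Toy embedding retrieval (hypothetical, not the UFO implementation).

    mask_embedding: list of D floats, assumed to come from a mask token.
    image_features: H x W grid, each cell a list of D floats.
    Returns an H x W grid of 0/1 values.
    """
    mask = []
    for row in image_features:
        mask_row = []
        for feature in row:
            # Dot-product similarity between the token embedding and the pixel feature.
            score = sum(m * f for m, f in zip(mask_embedding, feature))
            # Squash to (0, 1) and threshold to get a binary decision (assumed scheme).
            probability = 1.0 / (1.0 + math.exp(-score))
            mask_row.append(1 if probability > threshold else 0)
        mask.append(mask_row)
    return mask


# Hypothetical 2 x 2 feature grid with 2-dimensional features.
mask_embedding = [1.0, -1.0]
image_features = [
    [[2.0, 0.0], [0.0, 2.0]],
    [[1.0, 1.5], [3.0, 1.0]],
]
mask = retrieve_mask(mask_embedding, image_features)
```

In this sketch, pixels whose features align with the mask embedding are selected, so the mask comes from the same embedding space the language model already produces rather than from a separate mask decoder. For the method as actually implemented, see the GitHub repository linked above.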

license:apache-2.0
0
3

UFO-InternVL2-8B-instruct

license:apache-2.0
0
1

UniLIP

This repository contains the model (autoencoders) presented in the paper UniLIP: Adapting CLIP for Unified Multimodal Understanding, Generation and Editing. UniLIP proposes a unified, CLIP-based encoder that captures both rich semantics and fine-grained image details. Through a two-stage training scheme with self-distillation for reconstruction, we enable CLIP to achieve excellent reconstruction results without compromising its original understanding abilities. Building on this unified representation, UniLIP performs well across understanding, generation, and editing tasks. For more details, please refer to the original paper and the GitHub repository.

license:apache-2.0
0
1