showlab
ShowUI-2B
magvitv2
OmniConsistency
OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data [[Official Code]](https://github.com/showlab/OmniConsistency) [[Paper]](https://huggingface.co/papers/2505.18445) [[Dataset]](https://huggingface.co/datasets/showlab/OmniConsistency)

Environment
We recommend using Python 3.10 and PyTorch with CUDA support to set up the environment.

Model Download
You can download the OmniConsistency model and trained LoRAs directly from Hugging Face, or download them with a Python script (a hedged sketch follows this card).

Usage
Here's a basic example of using OmniConsistency (see the sketch below).

Datasets
Our datasets have been uploaded to Hugging Face and are available for direct use via the datasets library. You can easily load any of the 22 style subsets as shown below.
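For the script-based download mentioned above, the following is a minimal sketch using huggingface_hub; the repo id showlab/OmniConsistency and the local directory are assumptions about the layout, so check the official repo for the exact paths.

```python
# Hedged sketch: fetch the OmniConsistency weights and trained LoRAs from Hugging Face.
# The repo id and local directory below are assumptions, not the repo's official script.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="showlab/OmniConsistency",  # assumed model repo id
    local_dir="./OmniConsistency",      # arbitrary local destination
)
print(f"Weights downloaded to {local_dir}")
```

The official usage relies on the pipeline code shipped in the GitHub repository; purely as an illustration of the general pattern (loading a style/consistency LoRA into a FLUX pipeline via diffusers), here is a sketch in which the base model id, LoRA file name, and prompt are all assumptions rather than the project's actual API.

```python
# Illustrative sketch only: loading a style LoRA into a FLUX pipeline with diffusers.
# OmniConsistency's official inference uses the code in the GitHub repo; the base
# model, LoRA path, and weight_name here are assumptions.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",  # assumed base model
    torch_dtype=torch.bfloat16,
).to("cuda")

# Load a downloaded OmniConsistency/style LoRA (file name is hypothetical).
pipe.load_lora_weights("./OmniConsistency", weight_name="omniconsistency_lora.safetensors")

image = pipe(
    prompt="a portrait rendered in a consistent illustration style",
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("stylized.png")
```

To load one of the 22 style subsets with the datasets library, a minimal sketch is below; the config name "Ghibli" is a hypothetical example, so list the available configurations first to see the real subset names.

```python
# Hedged sketch: load a style subset of the OmniConsistency dataset.
# The subset name "Ghibli" is hypothetical; query the dataset for the real names.
from datasets import get_dataset_config_names, load_dataset

print(get_dataset_config_names("showlab/OmniConsistency"))  # list the 22 style subsets
ds = load_dataset("showlab/OmniConsistency", "Ghibli", split="train")  # config name assumed
print(ds[0])
```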
Show O2 7B
Show-o2: Improved Unified Multimodal Models

Jinheng Xie¹, Zhenheng Yang², Mike Zheng Shou¹
¹ Show Lab, National University of Singapore; ² Bytedance

[Paper](https://arxiv.org/abs/2506.15564) [Code](https://github.com/showlab/Show-o/tree/main/show-o2) [WeChat QA](https://github.com/showlab/Show-o/blob/main/docs/wechatqa3.jpg)

This paper presents improved native unified multimodal models, i.e., Show-o2, that leverage autoregressive modeling and flow matching. Built upon a 3D causal variational autoencoder space, unified visual representations are constructed through a dual path of spatial(-temporal) fusion, enabling scalability across image and video modalities while ensuring effective multimodal understanding and generation. Based on a language model, autoregressive modeling and flow matching are natively applied to the language head and flow head, respectively, to facilitate text token prediction and image/video generation. A two-stage training recipe is designed to effectively learn and scale to larger models. The resulting Show-o2 models demonstrate versatility in handling a wide range of multimodal understanding and generation tasks across diverse modalities, including text, images, and videos. Code and models are released at https://github.com/showlab/Show-o/tree/main/show-o2.

What is new about Show-o2?
- We perform unified learning of multimodal understanding and generation on the text token and 3D causal VAE space, which is scalable to text, image, and video modalities.
- A dual path of spatial(-temporal) fusion is proposed to accommodate the distinct feature dependencies of multimodal understanding and generation.
- We employ specific heads with autoregressive modeling and flow matching for the overall unified learning of multimodal understanding, image/video generation, and mixed-modality generation.

Pre-trained Model Weights
The Show-o2 checkpoints can be found on Hugging Face (a hedged download sketch appears at the end of this card):
- showlab/show-o2-1.5B
- showlab/show-o2-1.5B-HQ
- showlab/show-o2-7B
- showlab/show-o2-1.5B (further unified fine-tuning on video understanding data)
- showlab/show-o2-7B (further unified fine-tuning on video understanding data)

Log in to your wandb account on your machine or server. Download the Wan2.1 3D causal VAE model weight here and put it in the current directory.

Demo for Multimodal Understanding
Run the multimodal understanding demo; you can find the results on wandb.

Demo for Text-to-Image Generation
Run the text-to-image generation demo; you can find the results on wandb.

Citation
To cite the paper and model, please use the BibTeX entry provided in the repository.

Acknowledgments
This work is heavily based on Show-o.
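As a convenience for the checkpoints listed above, here is a minimal sketch of fetching a Show-o2 checkpoint with huggingface_hub; the repo ids come from the card, while the local directory is an arbitrary choice rather than part of the repo's official setup instructions.

```python
# Minimal sketch (not the official Show-o2 setup script): download one of the
# checkpoints listed above with huggingface_hub. Repo id is taken from the card;
# the local directory is an arbitrary choice.
from huggingface_hub import snapshot_download

ckpt_dir = snapshot_download(
    repo_id="showlab/show-o2-7B",  # or showlab/show-o2-1.5B / showlab/show-o2-1.5B-HQ
    local_dir="./show-o2-7B",
)
print(f"Checkpoint downloaded to {ckpt_dir}")

# The Wan2.1 3D causal VAE weights must still be downloaded separately from the
# link in the original README and placed in the current directory.
```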
show-o
show-1-base
show-o2-1.5B
show-1-interpolation
Show O 512x512
show-o2-7B-w-video-und
show-1-sr1
show-1-sr2
show-o2-1.5B-HQ
show-o-w-clip-vit-512x512
show-o-w-clip-vit
show-o2-1.5B-w-video-und