show-o2-1.5B
license:apache-2.0
by showlab
Other
1.5B params
New
107 downloads
Early-stage
Edge AI: Mobile, Laptop, Server
4GB+ RAM
Quick Summary
Show-o2: Improved Unified Multimodal Models. Jinheng Xie (1), Zhenheng Yang (2), Mike Zheng Shou (1). (1) Show Lab, National University of Singa...
Device Compatibility
Mobile: 4-6GB RAM
Laptop: 16GB RAM
Server: GPU
Minimum Recommended: 2GB+ RAM
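As a rough cross-check on the RAM figures above, the weight-only footprint can be estimated as parameter count times bytes per parameter. The sketch below assumes the 1.5B parameter figure listed on this page and common numeric precisions; `estimate_weights_gb` is a hypothetical helper, and activations, caches, and framework overhead are not counted, so real usage will be higher.

```shell
# Weight-only memory estimate: parameters x bytes per parameter.
# Assumption: 1.5e9 parameters (the "1.5B params" figure on this page).
# Activations, caches, and framework overhead are NOT included.
estimate_weights_gb() {
  # $1 = parameter count, $2 = bytes per parameter
  awk -v n="$1" -v b="$2" 'BEGIN { printf "%.1f\n", n * b / 1e9 }'
}

for spec in "fp32 4" "fp16 2" "int8 1"; do
  set -- $spec   # split "name bytes" into $1 and $2
  printf '%s: %s GB\n' "$1" "$(estimate_weights_gb 1.5e9 "$2")"
done
```

The figures are in decimal gigabytes and cover weights only; the page does not say which precision or overhead its own RAM recommendations assume.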
Code Examples
# image-level
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-jane-pham-727419-1571673.jpg question='Describe the image in detail.'
# Chinese prompt, meaning: "Please tell me what is written in the image."
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-fotios-photos-2923436.jpg question='请告诉我图片中写着什么?'
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-taryn-elliott-4144459.jpg question='How many avocados (including the halved) are in this image? Tell me how to make an avocado milkshake in detail.'
# video
python3 inference_mmu_vid.py config=configs/showo2_7b_demo_video_understanding.yaml \
mmu_video_path='./docs/videos/' question="Describe the video." \
num_video_frames_mmu=32
python3 inference_mmu_vid.py config=configs/showo2_1.5b_demo_video_understanding.yaml \
mmu_video_path='./docs/videos/' question="Describe the video." \
num_video_frames_mmu=32
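To ask the same question about every image in a directory, the image-level command can be generated in a loop. The following is a dry-run sketch only: `print_mmu_cmds` is a hypothetical helper that is not part of the project, and it assumes the `key=value` argument style and the `inference_mmu.py` script name shown in the commands above. It prints the commands rather than running them.

```shell
# Dry-run sketch: print one inference_mmu.py command per .jpg in a directory.
# Assumptions: commands are later run from the repository root where
# inference_mmu.py lives; print_mmu_cmds is a hypothetical helper.
print_mmu_cmds() {
  config="$1"; image_dir="$2"; question="$3"
  for image in "$image_dir"/*.jpg; do
    [ -e "$image" ] || continue   # skip the unexpanded glob when no .jpg exists
    printf "python3 inference_mmu.py config=%s mmu_image_path=%s question='%s'\n" \
      "$config" "$image" "$question"
  done
}

print_mmu_cmds configs/showo2_7b_demo_432x432.yaml ./docs/mmu 'Describe the image in detail.'
```

Review the printed commands before running them, for example by piping the output to `sh`. The helper wraps the question in single quotes, so a question that itself contains a single quote would need different quoting.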
# image-level
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-jane-pham-727419-1571673.jpg question='Describe the image in detail.'
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-fotios-photos-2923436.jpg question='请告诉我图片中写着什么?'
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-taryn-elliott-4144459.jpg question='How many avocados (including the halved) are in this image? Tell me how to make an avocado milkshake in detail.'
# video
python3 inference_mmu_vid.py config=configs/showo2_7b_demo_video_understanding.yaml \
mmu_video_path='./docs/videos/' question="Describe the video." \
num_video_frames_mmu=32
python3 inference_mmu_vid.py config=configs/showo2_1.5b_demo_video_understanding.yaml \
mmu_video_path='./docs/videos/' question="Describe the video." \
num_video_frames_mmu=32image-leveltext
# image-level
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-jane-pham-727419-1571673.jpg question='Describe the image in detail.'
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-fotios-photos-2923436.jpg question='请告诉我图片中写着什么?'
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-taryn-elliott-4144459.jpg question='How many avocados (including the halved) are in this image? Tell me how to make an avocado milkshake in detail.'
# video
python3 inference_mmu_vid.py config=configs/showo2_7b_demo_video_understanding.yaml \
mmu_video_path='./docs/videos/' question="Describe the video." \
num_video_frames_mmu=32
python3 inference_mmu_vid.py config=configs/showo2_1.5b_demo_video_understanding.yaml \
mmu_video_path='./docs/videos/' question="Describe the video." \
num_video_frames_mmu=32image-leveltext
# image-level
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-jane-pham-727419-1571673.jpg question='Describe the image in detail.'
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-fotios-photos-2923436.jpg question='请告诉我图片中写着什么?'
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-taryn-elliott-4144459.jpg question='How many avocados (including the halved) are in this image? Tell me how to make an avocado milkshake in detail.'
# video
python3 inference_mmu_vid.py config=configs/showo2_7b_demo_video_understanding.yaml \
mmu_video_path='./docs/videos/' question="Describe the video." \
num_video_frames_mmu=32
python3 inference_mmu_vid.py config=configs/showo2_1.5b_demo_video_understanding.yaml \
mmu_video_path='./docs/videos/' question="Describe the video." \
num_video_frames_mmu=32image-leveltext
# image-level
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-jane-pham-727419-1571673.jpg question='Describe the image in detail.'
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-fotios-photos-2923436.jpg question='请告诉我图片中写着什么?'
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-taryn-elliott-4144459.jpg question='How many avocados (including the halved) are in this image? Tell me how to make an avocado milkshake in detail.'
# video
python3 inference_mmu_vid.py config=configs/showo2_7b_demo_video_understanding.yaml \
mmu_video_path='./docs/videos/' question="Describe the video." \
num_video_frames_mmu=32
python3 inference_mmu_vid.py config=configs/showo2_1.5b_demo_video_understanding.yaml \
mmu_video_path='./docs/videos/' question="Describe the video." \
num_video_frames_mmu=32image-leveltext
# image-level
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-jane-pham-727419-1571673.jpg question='Describe the image in detail.'
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-fotios-photos-2923436.jpg question='请告诉我图片中写着什么?'
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-taryn-elliott-4144459.jpg question='How many avocados (including the halved) are in this image? Tell me how to make an avocado milkshake in detail.'
# video
python3 inference_mmu_vid.py config=configs/showo2_7b_demo_video_understanding.yaml \
mmu_video_path='./docs/videos/' question="Describe the video." \
num_video_frames_mmu=32
python3 inference_mmu_vid.py config=configs/showo2_1.5b_demo_video_understanding.yaml \
mmu_video_path='./docs/videos/' question="Describe the video." \
num_video_frames_mmu=32image-leveltext
# image-level
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-jane-pham-727419-1571673.jpg question='Describe the image in detail.'
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-fotios-photos-2923436.jpg question='请告诉我图片中写着什么?'
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-taryn-elliott-4144459.jpg question='How many avocados (including the halved) are in this image? Tell me how to make an avocado milkshake in detail.'
# video
python3 inference_mmu_vid.py config=configs/showo2_7b_demo_video_understanding.yaml \
mmu_video_path='./docs/videos/' question="Describe the video." \
num_video_frames_mmu=32
python3 inference_mmu_vid.py config=configs/showo2_1.5b_demo_video_understanding.yaml \
mmu_video_path='./docs/videos/' question="Describe the video." \
num_video_frames_mmu=32image-leveltext
# image-level
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-jane-pham-727419-1571673.jpg question='Describe the image in detail.'
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-fotios-photos-2923436.jpg question='请告诉我图片中写着什么?'
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-taryn-elliott-4144459.jpg question='How many avocados (including the halved) are in this image? Tell me how to make an avocado milkshake in detail.'
# video
python3 inference_mmu_vid.py config=configs/showo2_7b_demo_video_understanding.yaml \
mmu_video_path='./docs/videos/' question="Describe the video." \
num_video_frames_mmu=32
python3 inference_mmu_vid.py config=configs/showo2_1.5b_demo_video_understanding.yaml \
mmu_video_path='./docs/videos/' question="Describe the video." \
num_video_frames_mmu=32image-leveltext
# image-level
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-jane-pham-727419-1571673.jpg question='Describe the image in detail.'
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-fotios-photos-2923436.jpg question='请告诉我图片中写着什么?'
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-taryn-elliott-4144459.jpg question='How many avocados (including the halved) are in this image? Tell me how to make an avocado milkshake in detail.'
# video
python3 inference_mmu_vid.py config=configs/showo2_7b_demo_video_understanding.yaml \
mmu_video_path='./docs/videos/' question="Describe the video." \
num_video_frames_mmu=32
python3 inference_mmu_vid.py config=configs/showo2_1.5b_demo_video_understanding.yaml \
mmu_video_path='./docs/videos/' question="Describe the video." \
num_video_frames_mmu=32image-leveltext
# image-level
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-jane-pham-727419-1571673.jpg question='Describe the image in detail.'
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-fotios-photos-2923436.jpg question='请告诉我图片中写着什么?'
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-taryn-elliott-4144459.jpg question='How many avocados (including the halved) are in this image? Tell me how to make an avocado milkshake in detail.'
# video
python3 inference_mmu_vid.py config=configs/showo2_7b_demo_video_understanding.yaml \
mmu_video_path='./docs/videos/' question="Describe the video." \
num_video_frames_mmu=32
python3 inference_mmu_vid.py config=configs/showo2_1.5b_demo_video_understanding.yaml \
mmu_video_path='./docs/videos/' question="Describe the video." \
num_video_frames_mmu=32image-leveltext
# image-level
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-jane-pham-727419-1571673.jpg question='Describe the image in detail.'
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-fotios-photos-2923436.jpg question='请告诉我图片中写着什么?'
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-taryn-elliott-4144459.jpg question='How many avocados (including the halved) are in this image? Tell me how to make an avocado milkshake in detail.'
# video
python3 inference_mmu_vid.py config=configs/showo2_7b_demo_video_understanding.yaml \
mmu_video_path='./docs/videos/' question="Describe the video." \
num_video_frames_mmu=32
python3 inference_mmu_vid.py config=configs/showo2_1.5b_demo_video_understanding.yaml \
mmu_video_path='./docs/videos/' question="Describe the video." \
num_video_frames_mmu=32image-leveltext
# image-level
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-jane-pham-727419-1571673.jpg question='Describe the image in detail.'
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-fotios-photos-2923436.jpg question='请告诉我图片中写着什么?'
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-taryn-elliott-4144459.jpg question='How many avocados (including the halved) are in this image? Tell me how to make an avocado milkshake in detail.'
# video
python3 inference_mmu_vid.py config=configs/showo2_7b_demo_video_understanding.yaml \
mmu_video_path='./docs/videos/' question="Describe the video." \
num_video_frames_mmu=32
python3 inference_mmu_vid.py config=configs/showo2_1.5b_demo_video_understanding.yaml \
mmu_video_path='./docs/videos/' question="Describe the video." \
num_video_frames_mmu=32image-leveltext
# image-level
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-jane-pham-727419-1571673.jpg question='Describe the image in detail.'
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-fotios-photos-2923436.jpg question='请告诉我图片中写着什么?'
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-taryn-elliott-4144459.jpg question='How many avocados (including the halved) are in this image? Tell me how to make an avocado milkshake in detail.'
# video
python3 inference_mmu_vid.py config=configs/showo2_7b_demo_video_understanding.yaml \
mmu_video_path='./docs/videos/' question="Describe the video." \
num_video_frames_mmu=32
python3 inference_mmu_vid.py config=configs/showo2_1.5b_demo_video_understanding.yaml \
mmu_video_path='./docs/videos/' question="Describe the video." \
num_video_frames_mmu=32image-leveltext
# image-level
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-jane-pham-727419-1571673.jpg question='Describe the image in detail.'
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-fotios-photos-2923436.jpg question='请告诉我图片中写着什么?'
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-taryn-elliott-4144459.jpg question='How many avocados (including the halved) are in this image? Tell me how to make an avocado milkshake in detail.'
# video
python3 inference_mmu_vid.py config=configs/showo2_7b_demo_video_understanding.yaml \
mmu_video_path='./docs/videos/' question="Describe the video." \
num_video_frames_mmu=32
python3 inference_mmu_vid.py config=configs/showo2_1.5b_demo_video_understanding.yaml \
mmu_video_path='./docs/videos/' question="Describe the video." \
num_video_frames_mmu=32image-leveltext
# image-level
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-jane-pham-727419-1571673.jpg question='Describe the image in detail.'
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-fotios-photos-2923436.jpg question='请告诉我图片中写着什么?'
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-taryn-elliott-4144459.jpg question='How many avocados (including the halved) are in this image? Tell me how to make an avocado milkshake in detail.'
# video
python3 inference_mmu_vid.py config=configs/showo2_7b_demo_video_understanding.yaml \
mmu_video_path='./docs/videos/' question="Describe the video." \
num_video_frames_mmu=32
python3 inference_mmu_vid.py config=configs/showo2_1.5b_demo_video_understanding.yaml \
mmu_video_path='./docs/videos/' question="Describe the video." \
num_video_frames_mmu=32image-leveltext
# image-level
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-jane-pham-727419-1571673.jpg question='Describe the image in detail.'
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-fotios-photos-2923436.jpg question='请告诉我图片中写着什么?'
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-taryn-elliott-4144459.jpg question='How many avocados (including the halved) are in this image? Tell me how to make an avocado milkshake in detail.'
# video
python3 inference_mmu_vid.py config=configs/showo2_7b_demo_video_understanding.yaml \
mmu_video_path='./docs/videos/' question="Describe the video." \
num_video_frames_mmu=32
python3 inference_mmu_vid.py config=configs/showo2_1.5b_demo_video_understanding.yaml \
mmu_video_path='./docs/videos/' question="Describe the video." \
num_video_frames_mmu=32image-leveltext
# image-level
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-jane-pham-727419-1571673.jpg question='Describe the image in detail.'
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-fotios-photos-2923436.jpg question='请告诉我图片中写着什么?'
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-taryn-elliott-4144459.jpg question='How many avocados (including the halved) are in this image? Tell me how to make an avocado milkshake in detail.'
# video
python3 inference_mmu_vid.py config=configs/showo2_7b_demo_video_understanding.yaml \
mmu_video_path='./docs/videos/' question="Describe the video." \
num_video_frames_mmu=32
python3 inference_mmu_vid.py config=configs/showo2_1.5b_demo_video_understanding.yaml \
mmu_video_path='./docs/videos/' question="Describe the video." \
num_video_frames_mmu=32image-leveltext
# image-level
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-jane-pham-727419-1571673.jpg question='Describe the image in detail.'
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-fotios-photos-2923436.jpg question='请告诉我图片中写着什么?'
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-taryn-elliott-4144459.jpg question='How many avocados (including the halved) are in this image? Tell me how to make an avocado milkshake in detail.'
# video
python3 inference_mmu_vid.py config=configs/showo2_7b_demo_video_understanding.yaml \
mmu_video_path='./docs/videos/' question="Describe the video." \
num_video_frames_mmu=32
python3 inference_mmu_vid.py config=configs/showo2_1.5b_demo_video_understanding.yaml \
mmu_video_path='./docs/videos/' question="Describe the video." \
num_video_frames_mmu=32image-leveltext
# image-level
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-jane-pham-727419-1571673.jpg question='Describe the image in detail.'
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-fotios-photos-2923436.jpg question='请告诉我图片中写着什么?'
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-taryn-elliott-4144459.jpg question='How many avocados (including the halved) are in this image? Tell me how to make an avocado milkshake in detail.'
# video
python3 inference_mmu_vid.py config=configs/showo2_7b_demo_video_understanding.yaml \
mmu_video_path='./docs/videos/' question="Describe the video." \
num_video_frames_mmu=32
python3 inference_mmu_vid.py config=configs/showo2_1.5b_demo_video_understanding.yaml \
mmu_video_path='./docs/videos/' question="Describe the video." \
num_video_frames_mmu=32image-leveltext
# image-level
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-jane-pham-727419-1571673.jpg question='Describe the image in detail.'
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-fotios-photos-2923436.jpg question='请告诉我图片中写着什么?'
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-taryn-elliott-4144459.jpg question='How many avocados (including the halved) are in this image? Tell me how to make an avocado milkshake in detail.'
# video
python3 inference_mmu_vid.py config=configs/showo2_7b_demo_video_understanding.yaml \
mmu_video_path='./docs/videos/' question="Describe the video." \
num_video_frames_mmu=32
python3 inference_mmu_vid.py config=configs/showo2_1.5b_demo_video_understanding.yaml \
mmu_video_path='./docs/videos/' question="Describe the video." \
num_video_frames_mmu=32image-leveltext
# image-level
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-jane-pham-727419-1571673.jpg question='Describe the image in detail.'
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-fotios-photos-2923436.jpg question='请告诉我图片中写着什么?'
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-taryn-elliott-4144459.jpg question='How many avocados (including the halved) are in this image? Tell me how to make an avocado milkshake in detail.'
# video
python3 inference_mmu_vid.py config=configs/showo2_7b_demo_video_understanding.yaml \
mmu_video_path='./docs/videos/' question="Describe the video." \
num_video_frames_mmu=32
python3 inference_mmu_vid.py config=configs/showo2_1.5b_demo_video_understanding.yaml \
mmu_video_path='./docs/videos/' question="Describe the video." \
num_video_frames_mmu=32image-leveltext
# image-level
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-jane-pham-727419-1571673.jpg question='Describe the image in detail.'
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-fotios-photos-2923436.jpg question='请告诉我图片中写着什么?'
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-taryn-elliott-4144459.jpg question='How many avocados (including the halved) are in this image? Tell me how to make an avocado milkshake in detail.'
# video
python3 inference_mmu_vid.py config=configs/showo2_7b_demo_video_understanding.yaml \
mmu_video_path='./docs/videos/' question="Describe the video." \
num_video_frames_mmu=32
python3 inference_mmu_vid.py config=configs/showo2_1.5b_demo_video_understanding.yaml \
mmu_video_path='./docs/videos/' question="Describe the video." \
num_video_frames_mmu=32image-leveltext
# image-level
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-jane-pham-727419-1571673.jpg question='Describe the image in detail.'
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-fotios-photos-2923436.jpg question='请告诉我图片中写着什么?'
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-taryn-elliott-4144459.jpg question='How many avocados (including the halved) are in this image? Tell me how to make an avocado milkshake in detail.'
# video
python3 inference_mmu_vid.py config=configs/showo2_7b_demo_video_understanding.yaml \
mmu_video_path='./docs/videos/' question="Describe the video." \
num_video_frames_mmu=32
python3 inference_mmu_vid.py config=configs/showo2_1.5b_demo_video_understanding.yaml \
mmu_video_path='./docs/videos/' question="Describe the video." \
num_video_frames_mmu=32image-leveltext
# image-level
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-jane-pham-727419-1571673.jpg question='Describe the image in detail.'
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-fotios-photos-2923436.jpg question='请告诉我图片中写着什么?'
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-taryn-elliott-4144459.jpg question='How many avocados (including the halved) are in this image? Tell me how to make an avocado milkshake in detail.'
# video
python3 inference_mmu_vid.py config=configs/showo2_7b_demo_video_understanding.yaml \
mmu_video_path='./docs/videos/' question="Describe the video." \
num_video_frames_mmu=32
python3 inference_mmu_vid.py config=configs/showo2_1.5b_demo_video_understanding.yaml \
mmu_video_path='./docs/videos/' question="Describe the video." \
num_video_frames_mmu=32image-leveltext
# image-level
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-jane-pham-727419-1571673.jpg question='Describe the image in detail.'
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-fotios-photos-2923436.jpg question='请告诉我图片中写着什么?'
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-taryn-elliott-4144459.jpg question='How many avocados (including the halved) are in this image? Tell me how to make an avocado milkshake in detail.'
# video
python3 inference_mmu_vid.py config=configs/showo2_7b_demo_video_understanding.yaml \
mmu_video_path='./docs/videos/' question="Describe the video." \
num_video_frames_mmu=32
python3 inference_mmu_vid.py config=configs/showo2_1.5b_demo_video_understanding.yaml \
mmu_video_path='./docs/videos/' question="Describe the video." \
num_video_frames_mmu=32image-leveltext
# image-level
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-jane-pham-727419-1571673.jpg question='Describe the image in detail.'
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-fotios-photos-2923436.jpg question='请告诉我图片中写着什么?'
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-taryn-elliott-4144459.jpg question='How many avocados (including the halved) are in this image? Tell me how to make an avocado milkshake in detail.'
# video
python3 inference_mmu_vid.py config=configs/showo2_7b_demo_video_understanding.yaml \
mmu_video_path='./docs/videos/' question="Describe the video." \
num_video_frames_mmu=32
python3 inference_mmu_vid.py config=configs/showo2_1.5b_demo_video_understanding.yaml \
mmu_video_path='./docs/videos/' question="Describe the video." \
num_video_frames_mmu=32image-leveltext
# image-level
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-jane-pham-727419-1571673.jpg question='Describe the image in detail.'
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-fotios-photos-2923436.jpg question='请告诉我图片中写着什么?'
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-taryn-elliott-4144459.jpg question='How many avocados (including the halved) are in this image? Tell me how to make an avocado milkshake in detail.'
# video
python3 inference_mmu_vid.py config=configs/showo2_7b_demo_video_understanding.yaml \
mmu_video_path='./docs/videos/' question="Describe the video." \
num_video_frames_mmu=32
python3 inference_mmu_vid.py config=configs/showo2_1.5b_demo_video_understanding.yaml \
mmu_video_path='./docs/videos/' question="Describe the video." \
num_video_frames_mmu=32image-leveltext
# image-level
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-jane-pham-727419-1571673.jpg question='Describe the image in detail.'
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-fotios-photos-2923436.jpg question='请告诉我图片中写着什么?'
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-taryn-elliott-4144459.jpg question='How many avocados (including the halved) are in this image? Tell me how to make an avocado milkshake in detail.'
# video
python3 inference_mmu_vid.py config=configs/showo2_7b_demo_video_understanding.yaml \
mmu_video_path='./docs/videos/' question="Describe the video." \
num_video_frames_mmu=32
python3 inference_mmu_vid.py config=configs/showo2_1.5b_demo_video_understanding.yaml \
mmu_video_path='./docs/videos/' question="Describe the video." \
num_video_frames_mmu=32image-leveltext
# image-level
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-jane-pham-727419-1571673.jpg question='Describe the image in detail.'
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-fotios-photos-2923436.jpg question='请告诉我图片中写着什么?'
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-taryn-elliott-4144459.jpg question='How many avocados (including the halved) are in this image? Tell me how to make an avocado milkshake in detail.'
# video
python3 inference_mmu_vid.py config=configs/showo2_7b_demo_video_understanding.yaml \
mmu_video_path='./docs/videos/' question="Describe the video." \
num_video_frames_mmu=32
python3 inference_mmu_vid.py config=configs/showo2_1.5b_demo_video_understanding.yaml \
mmu_video_path='./docs/videos/' question="Describe the video." \
num_video_frames_mmu=32image-leveltext
# image-level
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-jane-pham-727419-1571673.jpg question='Describe the image in detail.'
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-fotios-photos-2923436.jpg question='请告诉我图片中写着什么?'
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-taryn-elliott-4144459.jpg question='How many avocados (including the halved) are in this image? Tell me how to make an avocado milkshake in detail.'
# video
python3 inference_mmu_vid.py config=configs/showo2_7b_demo_video_understanding.yaml \
mmu_video_path='./docs/videos/' question="Describe the video." \
num_video_frames_mmu=32
python3 inference_mmu_vid.py config=configs/showo2_1.5b_demo_video_understanding.yaml \
mmu_video_path='./docs/videos/' question="Describe the video." \
num_video_frames_mmu=32image-leveltext
# image-level
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-jane-pham-727419-1571673.jpg question='Describe the image in detail.'
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-fotios-photos-2923436.jpg question='请告诉我图片中写着什么?'
python3 inference_mmu.py config=configs/showo2_7b_demo_432x432.yaml \
mmu_image_path=./docs/mmu/pexels-taryn-elliott-4144459.jpg question='How many avocados (including the halved) are in this image? Tell me how to make an avocado milkshake in detail.'
# video
python3 inference_mmu_vid.py config=configs/showo2_7b_demo_video_understanding.yaml \
mmu_video_path='./docs/videos/' question="Describe the video." \
num_video_frames_mmu=32
python3 inference_mmu_vid.py config=configs/showo2_1.5b_demo_video_understanding.yaml \
mmu_video_path='./docs/videos/' question="Describe the video." \
num_video_frames_mmu=32image-leveltext
# text-to-image generation
python3 inference_t2i.py config=configs/showo2_1.5b_demo_1024x1024.yaml \
batch_size=4 guidance_scale=7.5 num_inference_steps=50;
python3 inference_t2i.py config=configs/showo2_1.5b_demo_512x512.yaml \
batch_size=4 guidance_scale=7.5 num_inference_steps=50;
python3 inference_t2i.py config=configs/showo2_1.5b_demo_432x432.yaml \
batch_size=4 guidance_scale=7.5 num_inference_steps=50;
python3 inference_t2i.py config=configs/showo2_7b_demo_432x432.yaml \
batch_size=4 guidance_scale=7.5 num_inference_steps=50;
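The three 1.5B text-to-image commands differ only in the resolution part of the config name, so they can be driven from a loop. The following is a minimal sketch, assuming the config files are named exactly as in the commands above. The function names `t2i_command` and `run_all_t2i` and the `DRY_RUN` variable are illustrative and are not part of the Show-o2 repository.

```shell
#!/usr/bin/env bash
# Illustrative helper (not part of the Show-o2 repository).
# Config names and parameter values are copied from the commands above.

# Build the text-to-image command for one resolution suffix, e.g. 512x512.
t2i_command() {
  printf 'python3 inference_t2i.py config=configs/showo2_1.5b_demo_%s.yaml batch_size=4 guidance_scale=7.5 num_inference_steps=50' "$1"
}

# Print (default) or run the command for each 1.5B resolution.
run_all_t2i() {
  local res cmd
  for res in 1024x1024 512x512 432x432; do
    cmd="$(t2i_command "$res")"
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "$cmd"        # dry run: only show what would be executed
    else
      $cmd || return 1   # stop at the first failing run
    fi
  done
}
```

Calling `run_all_t2i` prints the three commands without running anything. Setting `DRY_RUN=0` before the call executes them in order from the repository directory that contains `inference_t2i.py`.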
python3 inference_t2i.py config=configs/showo2_1.5b_demo_1024x1024.yaml \
batch_size=4 guidance_scale=7.5 num_inference_steps=50;
python3 inference_t2i.py config=configs/showo2_1.5b_demo_512x512.yaml \
batch_size=4 guidance_scale=7.5 num_inference_steps=50;
python3 inference_t2i.py config=configs/showo2_1.5b_demo_432x432.yaml \
batch_size=4 guidance_scale=7.5 num_inference_steps=50;
python3 inference_t2i.py config=configs/showo2_7b_demo_432x432.yaml \
batch_size=4 guidance_scale=7.5 num_inference_steps=50;text
python3 inference_t2i.py config=configs/showo2_1.5b_demo_1024x1024.yaml \
batch_size=4 guidance_scale=7.5 num_inference_steps=50;
python3 inference_t2i.py config=configs/showo2_1.5b_demo_512x512.yaml \
batch_size=4 guidance_scale=7.5 num_inference_steps=50;
python3 inference_t2i.py config=configs/showo2_1.5b_demo_432x432.yaml \
batch_size=4 guidance_scale=7.5 num_inference_steps=50;
python3 inference_t2i.py config=configs/showo2_7b_demo_432x432.yaml \
batch_size=4 guidance_scale=7.5 num_inference_steps=50;text
python3 inference_t2i.py config=configs/showo2_1.5b_demo_1024x1024.yaml \
batch_size=4 guidance_scale=7.5 num_inference_steps=50;
python3 inference_t2i.py config=configs/showo2_1.5b_demo_512x512.yaml \
batch_size=4 guidance_scale=7.5 num_inference_steps=50;
python3 inference_t2i.py config=configs/showo2_1.5b_demo_432x432.yaml \
batch_size=4 guidance_scale=7.5 num_inference_steps=50;
python3 inference_t2i.py config=configs/showo2_7b_demo_432x432.yaml \
batch_size=4 guidance_scale=7.5 num_inference_steps=50;text
python3 inference_t2i.py config=configs/showo2_1.5b_demo_1024x1024.yaml \
batch_size=4 guidance_scale=7.5 num_inference_steps=50;
python3 inference_t2i.py config=configs/showo2_1.5b_demo_512x512.yaml \
batch_size=4 guidance_scale=7.5 num_inference_steps=50;
python3 inference_t2i.py config=configs/showo2_1.5b_demo_432x432.yaml \
batch_size=4 guidance_scale=7.5 num_inference_steps=50;
python3 inference_t2i.py config=configs/showo2_7b_demo_432x432.yaml \
batch_size=4 guidance_scale=7.5 num_inference_steps=50;text
python3 inference_t2i.py config=configs/showo2_1.5b_demo_1024x1024.yaml \
batch_size=4 guidance_scale=7.5 num_inference_steps=50;
python3 inference_t2i.py config=configs/showo2_1.5b_demo_512x512.yaml \
batch_size=4 guidance_scale=7.5 num_inference_steps=50;
python3 inference_t2i.py config=configs/showo2_1.5b_demo_432x432.yaml \
batch_size=4 guidance_scale=7.5 num_inference_steps=50;
python3 inference_t2i.py config=configs/showo2_7b_demo_432x432.yaml \
batch_size=4 guidance_scale=7.5 num_inference_steps=50;text
python3 inference_t2i.py config=configs/showo2_1.5b_demo_1024x1024.yaml \
batch_size=4 guidance_scale=7.5 num_inference_steps=50;
python3 inference_t2i.py config=configs/showo2_1.5b_demo_512x512.yaml \
batch_size=4 guidance_scale=7.5 num_inference_steps=50;
python3 inference_t2i.py config=configs/showo2_1.5b_demo_432x432.yaml \
batch_size=4 guidance_scale=7.5 num_inference_steps=50;
python3 inference_t2i.py config=configs/showo2_7b_demo_432x432.yaml \
batch_size=4 guidance_scale=7.5 num_inference_steps=50;text
python3 inference_t2i.py config=configs/showo2_1.5b_demo_1024x1024.yaml \
batch_size=4 guidance_scale=7.5 num_inference_steps=50;
python3 inference_t2i.py config=configs/showo2_1.5b_demo_512x512.yaml \
batch_size=4 guidance_scale=7.5 num_inference_steps=50;
python3 inference_t2i.py config=configs/showo2_1.5b_demo_432x432.yaml \
batch_size=4 guidance_scale=7.5 num_inference_steps=50;
python3 inference_t2i.py config=configs/showo2_7b_demo_432x432.yaml \
batch_size=4 guidance_scale=7.5 num_inference_steps=50;text
python3 inference_t2i.py config=configs/showo2_1.5b_demo_1024x1024.yaml \
batch_size=4 guidance_scale=7.5 num_inference_steps=50;
python3 inference_t2i.py config=configs/showo2_1.5b_demo_512x512.yaml \
batch_size=4 guidance_scale=7.5 num_inference_steps=50;
python3 inference_t2i.py config=configs/showo2_1.5b_demo_432x432.yaml \
batch_size=4 guidance_scale=7.5 num_inference_steps=50;
python3 inference_t2i.py config=configs/showo2_7b_demo_432x432.yaml \
batch_size=4 guidance_scale=7.5 num_inference_steps=50;Citationtext
@article{xie2025showo2,
title={Show-o2: Improved Native Unified Multimodal Models},
author={Xie, Jinheng and Yang, Zhenheng and Shou, Mike Zheng},
journal={arXiv preprint},
year={2025}
}