OFA-Sys

29 models • 1 total model in database

chinese-clip-vit-base-patch16

Introduction

This is the base version of Chinese CLIP, with ViT-B/16 as the image encoder and RoBERTa-wwm-base as the text encoder. Chinese CLIP is a simple implementation of CLIP trained on a large-scale dataset of around 200 million Chinese image-text pairs. For more details, please refer to our technical report https://arxiv.org/abs/2211.01335 and our official GitHub repo https://github.com/OFA-Sys/Chinese-CLIP (Welcome to star! 🔥🔥)

Use with the official API

We provide a simple code snippet to show how to use the API of Chinese-CLIP to compute the image and text embeddings and their similarities. If you need more than the API, see our GitHub repo https://github.com/OFA-Sys/Chinese-CLIP for details about training and inference.

Results

Metric   R@1    R@5    R@10   R@1    R@5    R@10   R@1    R@5    R@10   R@1    R@5    R@10
Wukong   51.7   78.9   86.3   77.4   94.5   97.0   76.1   94.8   97.5   92.7   99.1   99.6
R2D2     60.9   86.8   92.7   84.4   96.7   98.4   77.6   96.7   98.9   95.6   99.8   100.0
CN-CLIP  71.2   91.4   95.5   83.8   96.9   98.6   81.6   97.5   98.8   95.3   99.7   100.0

Metric   R@1    R@5    R@10   R@1    R@5    R@10   R@1    R@5    R@10   R@1    R@5    R@10
Wukong   53.4   80.2   90.1   74.0   94.4   98.1   55.2   81.0   90.6   73.3   94.0   98.0
R2D2     56.4   85.0   93.1   79.1   96.5   98.9   63.3   89.3   95.7   79.3   97.1   98.7
CN-CLIP  69.2   89.9   96.1   81.5   96.9   99.1   63.0   86.6   92.9   83.5   97.3   99.2

Task     CIFAR10   CIFAR100  DTD       EuroSAT   FER       FGVC      KITTI     MNIST     PC        VOC
GIT      88.5      61.1      42.9      43.4      41.4      6.7       22.1      68.9      50.0      80.2
ALIGN    94.9      76.8      66.1      52.1      50.8      25.0      41.2      74.0      55.2      83.0
CLIP     94.9      77.0      56.0      63.0      48.3      33.3      11.5      79.0      62.3      84.0
CN-CLIP  96.0      79.7      51.2      52.0      55.1      26.2      49.9      79.4      63.5      84.9

Citation

If you find Chinese CLIP helpful, feel free to cite our paper. Thanks for your support!
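To illustrate the image-text similarities mentioned under "Use with the official API" and the R@1/R@5/R@10 columns in the retrieval tables, here is a minimal sketch in plain Python. It is not the Chinese-CLIP API: the function names (`l2_normalize`, `similarity_matrix`, `recall_at_k`) and the tiny 3-dimensional embeddings are made up for illustration. It assumes that similarity is the cosine of L2-normalized embeddings, as is usual for CLIP-style models, and that Recall@K is the percentage of queries whose correct item appears among the K highest-scoring candidates.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def similarity_matrix(text_embs, image_embs):
    """Return sims[i][j], the cosine similarity between text i and image j."""
    texts = [l2_normalize(t) for t in text_embs]
    images = [l2_normalize(v) for v in image_embs]
    return [[sum(a * b for a, b in zip(t, v)) for v in images] for t in texts]


def recall_at_k(sims, gold, k):
    """Percentage of queries whose gold index is among the k top-scoring candidates."""
    hits = 0
    for row, target in zip(sims, gold):
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if target in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(sims)


# Hypothetical toy embeddings; real ones would come from the text and image encoders.
text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
image_embs = [[0.9, 0.1, 0.0], [0.8, 0.3, 0.0], [0.0, 0.5, 0.8]]
gold = [0, 1, 2]  # text i is assumed to describe image i

sims = similarity_matrix(text_embs, image_embs)
r1 = recall_at_k(sims, gold, 1)
r2 = recall_at_k(sims, gold, 2)
print(f"R@1 = {r1:.1f}, R@2 = {r2:.1f}")
```

In this toy data the second text scores higher against the third image than against its own, so it counts as a miss at K=1 but a hit at K=2. The same effect is why R@5 and R@10 in the tables are always at least as high as R@1.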

OFA-Sys

chinese-clip-vit-base-patch16

chinese-clip-vit-large-patch14-336px

chinese-clip-vit-large-patch14

chinese-clip-vit-huge-patch14

small-stable-diffusion-v0

InsTagger

ofa-tiny

ofa-large

ofa-base

gsm8k-rft-llama7b2-u13b

ofa-large-caption

ofa-base-caption-fairseq-version

ofa-huge-vqa

MuggleMath_13B

gsm8k-rft-llama13b-u13b

gsm8k-rft-llama13b2-u13b

ProLLaMA-7B

ofa-huge

ofa-medium

OccuLLaMA-7B

TagLM-13b-v2.0

ofa-base-vqa-fairseq-version

chinese-clip-rn50

ofa-base-snlive-fairseq-version

expertllama-7b-delta

ONE-PEACE

ofa-base-refcoco-fairseq-version

gsm8k-rft-llama7b-u13b

TagLM-13b-v1.0