LLMYourWay

dandelin

8 models

vilt-b32-mlm

Vision-and-Language Transformer (ViLT) model, pre-trained only, on GCC+SBU+COCO+VG (200k steps). It was introduced in the paper ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision by Kim et al. and first released in this repository. Note: this model only includes the language modeling head. Disclaimer: the team releasing ViLT did not write a model card for this model, so this model card has been written by the Hugging Face team. You can use the raw model for masked language modeling given an image and a piece of text containing [MASK] tokens.

License: — • Downloads: 3,736,607 • Likes: 12
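The masked-language-modeling usage described above can be sketched with the Hugging Face transformers API. This is a minimal sketch, assuming the model id is `dandelin/vilt-b32-mlm` (inferred from the listing); the example image URL is a standard COCO sample, and both the image and the weights require network access to fetch.

```python
import requests
from PIL import Image
from transformers import ViltProcessor, ViltForMaskedLM

# Standard COCO validation image used in many ViLT examples (assumed reachable).
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")
text = "a bunch of [MASK] laying on a [MASK]."

processor = ViltProcessor.from_pretrained("dandelin/vilt-b32-mlm")
model = ViltForMaskedLM.from_pretrained("dandelin/vilt-b32-mlm")

# The processor tokenizes the text and prepares image patches in one call.
encoding = processor(image, text, return_tensors="pt")
outputs = model(**encoding)
logits = outputs.logits  # shape: (batch, text_seq_len, vocab_size)

# Find the [MASK] positions in the text and take the highest-scoring token at each.
input_ids = encoding.input_ids[0]
mask_positions = (input_ids == processor.tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
predicted_ids = logits[0, mask_positions].argmax(dim=-1)
predicted_tokens = processor.tokenizer.convert_ids_to_tokens(predicted_ids.tolist())
print(predicted_tokens)
```

The two predicted tokens fill the two [MASK] slots in reading order; swapping in other image/text pairs works the same way as long as the text uses the tokenizer's mask token.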

vilt-b32-finetuned-coco

License: apache-2.0 • Downloads: 270,295 • Likes: 1

vilt-b32-finetuned-vqa

License: apache-2.0 • Downloads: 68,462 • Likes: 415

vilt-b32-finetuned-nlvr2

License: apache-2.0 • Downloads: 473 • Likes: 2

vilt-b32-mlm-itm

License: apache-2.0 • Downloads: 93 • Likes: 3

vilt-b32-finetuned-flickr30k

License: apache-2.0 • Downloads: 10 • Likes: 3

hype-sampler-abl

License: — • Downloads: 0 • Likes: 2

hype-sampler

License: — • Downloads: 0 • Likes: 1
© 2026 LLMYourWay. All rights reserved.
Data updated every 4 hours