tue-mps/coco_panoptic_eomt_large_640
EoMT (Encoder-only Mask Transformer) is a Vision Transformer (ViT) architecture designed for high-quality and efficient image segmentation. It was introduced in the CVPR 2025 highlight paper "Your ViT is Secretly an Image Segmentation Model" by Tommie Kerssies, Niccolò Cavagnero, Alexander Hermans, Narges Norouzi, Giuseppe Averta, Bastian Leibe, Gijs Dubbelman, and Daan de Geus.

> Key Insight: Given sufficient scale and pretraining, a plain ViT with only a few additional parameters can perform segmentation, without the need for task-specific decoders or pixel fusion modules. The same backbone supports semantic, instance, and panoptic segmentation with different post-processing. 🤗

The original implementation can be found in this repository. The HuggingFace model page is available at this link.

Here is how to use this model for Panoptic Segmentation:

Citation

If you find our work useful, please consider citing us as: