tue-mps

25 models

coco_panoptic_eomt_large_640

EoMT (Encoder-only Mask Transformer) is a Vision Transformer (ViT) architecture designed for high-quality, efficient image segmentation. It was introduced in the CVPR 2025 highlight paper "Your ViT is Secretly an Image Segmentation Model" by Tommie Kerssies, Niccolò Cavagnero, Alexander Hermans, Narges Norouzi, Giuseppe Averta, Bastian Leibe, Gijs Dubbelman, and Daan de Geus.

Key insight: given sufficient scale and pretraining, a plain ViT with only a few additional parameters can perform segmentation without task-specific decoders or pixel fusion modules. The same model backbone supports semantic, instance, and panoptic segmentation; only the post-processing differs. This checkpoint is intended for panoptic segmentation.
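To make the "different post-processing" idea concrete, here is a minimal sketch of how per-query outputs can be turned into a semantic map. It assumes the Mask2Former-style convention common to mask transformers (one class distribution and one mask per query, with a trailing "no object" class); this is an assumption for illustration, not code taken from the EoMT repository. The function name `semantic_from_queries` and the toy logits are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def semantic_from_queries(class_logits, mask_logits):
    """Combine per-query predictions into a per-pixel class map.

    class_logits: Q x (C + 1) nested list; the last column is assumed
                  to be the "no object" class and is dropped.
    mask_logits:  Q x H x W nested list of mask logits.
    Returns an H x W nested list of class indices.
    """
    class_probs = [softmax(row)[:-1] for row in class_logits]
    num_queries = len(class_probs)
    num_classes = len(class_probs[0])
    height, width = len(mask_logits[0]), len(mask_logits[0][0])

    segmentation = []
    for y in range(height):
        row = []
        for x in range(width):
            # Score of class c at this pixel: sum over queries of
            # P(class c | query) * P(pixel belongs to query's mask).
            scores = [
                sum(
                    class_probs[q][c] * sigmoid(mask_logits[q][y][x])
                    for q in range(num_queries)
                )
                for c in range(num_classes)
            ]
            row.append(max(range(num_classes), key=scores.__getitem__))
        segmentation.append(row)
    return segmentation


# Toy input: two queries, two real classes plus "no object", 2 x 2 image.
class_logits = [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0]]
mask_logits = [
    [[4.0, -4.0], [4.0, -4.0]],  # query 0 covers the left column
    [[-4.0, 4.0], [-4.0, 4.0]],  # query 1 covers the right column
]
seg = semantic_from_queries(class_logits, mask_logits)
print(seg)  # → [[0, 1], [0, 1]]
```

Instance and panoptic outputs would start from the same two tensors but keep queries separate (one segment per confident query) instead of summing them per class, which is why a single backbone can serve all three tasks.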

license:mit
16,851
11

coco_instance_eomt_large_1280

license:mit
8,454
0

ade20k_panoptic_eomt_large_1280

license:mit
1,868
0

ade20k_semantic_eomt_large_512

license:mit
1,111
3

coco_instance_eomt_large_640

license:mit
828
7

coco_panoptic_eomt_base_640_2x

license:mit
658
0

cityscapes_semantic_eomt_large_1024

license:mit
433
1

videomt-dinov2-small-ytvis2019

409
0

coco_panoptic_eomt_small_640_2x

license:mit
251
0

ade20k_panoptic_eomt_giant_1280

license:mit
171
0

eomt-dinov3-coco-panoptic-small-640

license:mit
110
0

coco_panoptic_eomt_7b_640

license:mit
86
0

eomt-dinov3-coco-panoptic-large-640

license:mit
73
0

eomt-dinov3-coco-instance-large-1280

license:mit
44
0

eomt-dinov3-coco-instance-large-640

license:mit
42
0

eomt-dinov3-ade-semantic-large-512

license:mit
38
0

ade20k_panoptic_eomt_giant_640

license:mit
29
0

ade20k_panoptic_eomt_large_640

license:mit
27
1

eomt-dinov3-coco-panoptic-large-1280

license:mit
24
0

eomt-dinov3-coco-panoptic-base-640

license:mit
23
0

coco_panoptic_eomt_large_1280

license:mit
10
0

coco_panoptic_eomt_giant_1280

license:mit
9
3

coco_panoptic_eomt_giant_640

license:mit
8
0

simple-tad

license:cc-by-nc-4.0
0
2

coco_instance_eomt_large_640_dinov3

EoMT (Encoder-only Mask Transformer) is a Vision Transformer (ViT) architecture designed for high-quality, efficient image segmentation. It was introduced in the CVPR 2025 highlight paper "Your ViT is Secretly an Image Segmentation Model" by Tommie Kerssies, Niccolò Cavagnero, Alexander Hermans, Narges Norouzi, Giuseppe Averta, Bastian Leibe, Gijs Dubbelman, and Daan de Geus.

Key insight: given sufficient scale and pretraining, a plain ViT with only a few additional parameters can perform segmentation without task-specific decoders or pixel fusion modules. The same model backbone supports semantic, instance, and panoptic segmentation; only the post-processing differs.

license:mit
0
1