merve

64 models

license-plate-detr-dinov3

367
1

rtdetr_v2_r50vd-mobile-ui-design

license:apache-2.0
116
1

sam2-hiera-base-plus

license:apache-2.0
88
0

sam2-hiera-large

license:apache-2.0
80
2

yolos-small-license-plates

license:apache-2.0
79
0

flux-lego-lora-dreambooth

60
13

SmolVLM2-2.2B-DocVQA

57
3

sam2-hiera-small

license:apache-2.0
49
2

my_awesome_food_model

license:apache-2.0
36
0

beans-vit-224

license:apache-2.0
33
1

lego-sdxl-dora

30
3

lego-sdxl-dora-3

27
2

kosmos-2.5-ft

Kosmos-2.5 fine-tuned on grounded OCR (OCR with bounding boxes); find the script here: (GH, HF)

license:apache-2.0
21
2

lego_LoRA

18
4

lego-lora

16
2

sam2-hiera-tiny

license:apache-2.0
16
0

chatgpt-prompt-generator-v12

license:apache-2.0
15
72

PaddleOCR-VL-1.5-hf

license:apache-2.0
15
0

PaddleOCR-VL-hf

license:apache-2.0
15
0

Isaac-0.1

license:cc-by-nc-4.0
13
2

lego-dreambooth-sdxl

13
1

chatgpt-prompts-bart-long

license:apache-2.0
12
57

emoji-dreambooth-trained-xl

9
6

vit-mobilenet-beans-224

8
1

gemma-7b-8bit

7
1

vq-vae

6
2

trained-flux-lora-lego

5
1

resnet-mobilenet-beans-5

5
0

paligemma_vqav2

4
14

sam-finetuned

license:apache-2.0
4
0

Mistral-7B-Instruct-v0.2

license:apache-2.0
4
0

SmolVLM2-500M-Video-Instruct-video-feedback

license:apache-2.0
3
0

paligemma2-3b-vqav2

2
6

blip2-flan-t5-xxl

license:mit
2
1

peft-copy-test

2
0

detr-resnet-50-onnx

2
0

VeCLIP-b16-100m

2
0

SmolVLM2-500M-Video-Instruct-videofeedback

license:apache-2.0
2
0

chatgpt-prompts-bart

license:apache-2.0
1
5

pokemon-classifier

license:apache-2.0
1
2

dreambooth_bioshock

1
0

orb_diffusiondb_controlnet

1
0

turkish-rte

license:mit
1
0

musicgen-small

license:cc-by-nc-4.0
1
0

gemma-7b-it-8bit

1
0

VeCLIP-b16-3m

1
0

colpali_ufo

This model is a fine-tuned version of vidore/colpali-v1.2-hf on an unknown dataset.

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 100
- num_epochs: 1

Framework versions:
- PEFT 0.11.1
- Transformers 4.48.0.dev0
- Pytorch 2.5.1+cu121
- Datasets 2.21.0
- Tokenizers 0.21.0
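The effective batch size above follows from train_batch_size × gradient_accumulation_steps (4 × 4 = 16), and the linear scheduler ramps the learning rate up over the first 100 steps before decaying it to zero. A minimal pure-Python sketch of that schedule (the helper name and total-step count are illustrative, not taken from the training run):

```python
def linear_lr(step, total_steps, warmup_steps=100, base_lr=5e-05):
    """Linear warmup for `warmup_steps`, then linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))

# Effective batch size: per-device batch * gradient accumulation steps
effective_batch = 4 * 4  # matches total_train_batch_size: 16
```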

1
0

smol-vision

Smol Vision 🐣 — recipes for shrinking, optimizing, and customizing cutting-edge vision and multimodal AI models. The original GH repository was migrated to Hugging Face since notebooks there aren't rendered 🥲

Latest examples 👇🏻
- Fine-tune ColPali for Multimodal RAG
- Fine-tune Gemma-3n for all modalities (audio-text-image)
- Any-to-Any (Video) RAG with OmniEmbed and Qwen

Note: the script and notebook have been updated to fix a few issues related to QLoRA!

| Category | Notebook | Description |
|---|---|---|
| Quantization/ONNX | Faster and Smaller Zero-shot Object Detection with Optimum | Quantize the state-of-the-art zero-shot object detection model OWLv2 using Optimum ONNXRuntime tools. |
| VLM Fine-tuning | Fine-tune PaliGemma | Fine-tune the state-of-the-art vision-language model PaliGemma using transformers. |
| Intro to Optimum/ORT | Optimizing DETR with 🤗 Optimum | A soft introduction to exporting vision models to ONNX and quantizing them. |
| Model Shrinking | Knowledge Distillation for Computer Vision | Knowledge distillation for image classification. |
| Quantization | Fit in vision models using Quanto | Fit vision models onto smaller hardware using quanto. |
| Speed-up | Faster foundation models with torch.compile | Improving latency for foundation models using `torch.compile`. |
| VLM Fine-tuning | Fine-tune Florence-2 | Fine-tune Florence-2 on the DocVQA dataset. |
| VLM Fine-tuning | QLoRA/Fine-tune IDEFICS3 or SmolVLM on VQAv2 | QLoRA/full fine-tune IDEFICS3 or SmolVLM on the VQAv2 dataset. |
| VLM Fine-tuning (Script) | QLoRA Fine-tune IDEFICS3 on VQAv2 | QLoRA/full fine-tune IDEFICS3 or SmolVLM on the VQAv2 dataset. |
| Multimodal RAG | Multimodal RAG using ColPali and Qwen2-VL | Retrieve documents and build a RAG pipeline without hefty document processing, using ColPali through Byaldi and generating with Qwen2-VL. |
| Multimodal Retriever Fine-tuning | Fine-tune ColPali for Multimodal RAG | Apply contrastive fine-tuning on ColPali to customize it for your own multimodal document RAG use case. |
| VLM Fine-tuning | Fine-tune Gemma-3n for all modalities (audio-text-image) | Fine-tune Gemma-3n to handle any modality: audio, text, and image. |
| Multimodal RAG | Any-to-Any (Video) RAG with OmniEmbed and Qwen | Retrieval and generation across modalities (including video) using OmniEmbed and Qwen. |
| Speed-up/Memory Optimization | Vision-language model serving using TGI (SOON) | Explore speed-ups and memory improvements for vision-language model serving with text-generation-inference. |
| Quantization/Optimum/ORT | All levels of quantization and graph optimizations for image segmentation using Optimum (SOON) | End-to-end model optimization using Optimum. |
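The knowledge-distillation recipe above trains a small student to match a larger teacher's softened output distribution. A minimal, framework-free sketch of the standard soft-target objective (the temperature value and function names are illustrative, not taken from the notebook):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions,
    scaled by T^2 as in standard soft-target distillation."""
    p = softmax([z / temperature for z in teacher_logits])
    q = softmax([z / temperature for z in student_logits])
    return temperature ** 2 * sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

The loss is zero when student and teacher agree and grows as their softened distributions diverge; in practice it is mixed with the ordinary cross-entropy on the hard labels.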

0
177

yolov9

0
43

idefics3llama-vqav2

license:apache-2.0
0
8

gemma-3n-finevideo

This model is a fine-tuned version of google/gemma-3n-E2B-it. It has been trained using TRL.

Framework versions:
- TRL: 0.19.1
- Transformers: 4.53.2
- Pytorch: 2.6.0+cu124
- Datasets: 4.0.0
- Tokenizers: 0.21.2

0
7

EfficientSAM

license:apache-2.0
0
6

anime-faces-generator

license:apache-2.0
0
2

breast_cancernb8gjv4n-diagnosis-classification

license:apache-2.0
0
2

xgboost-example

0
2

ner-replica

license:apache-2.0
0
2

blip2-opt-6.7b

license:mit
0
2

multilabel-v1-replica

license:apache-2.0
0
1

distilbert-base-uncased-finetuned-cola

license:apache-2.0
0
1

turkish-rte-2

license:mit
0
1

siglip-faiss-wikiart

license:apache-2.0
0
1

hiera-tiny-224-in1k

license:cc-by-nc-4.0
0
1

VeCLIP-b16-200m

0
1

flux-dreambooth-lora

0
1