Ertugrul

4 models • 1 total models in database

Sort by:

Qwen2-VL-7B-Captioner-Relaxed

Qwen2-VL-7B-Captioner-Relaxed is an instruction-tuned version of Qwen2-VL-7B-Instruct, an advanced multimodal large language model. This fine-tuned version is based on a hand-curated dataset for text-to-image models, providing significantly more detailed descriptions of given images. Enhanced Detail: Generates more comprehensive and nuanced image descriptions. Relaxed Constraints: Offers less restrictive image descriptions compared to the base model. Natural Language Output: Describes different subjects in the image while specifying their locations using natural language. Optimized for Image Generation: Produces captions in formats compatible with state-of-the-art text-to-image generation models. Note: This fine-tuned model is optimized for creating text-to-image datasets. As a result, performance on other tasks (e.g., ~10% decrease on mmmuval) may be lower compared to the original model. If you encounter errors such as `KeyError: 'qwen2vl'` or `ImportError: cannot import name 'Qwen2VLForConditionalGeneration' from 'transformers'`, try installing the latest version of the transformers library from source: `pip install git+https://github.com/huggingface/transformers` If you prefer no coding option, there's simple gui that allows you to caption selected images. You can find more about it here: - Google AI/ML Developer Programs team supported this work by providing Google Cloud Credit For more detailed options, refer to the Qwen2-VL-7B-Instruct documentation.

NaNK

license:apache-2.0

652

Ertugrul

Qwen2-VL-7B-Captioner-Relaxed

Qwen2.5-VL-7B-Captioner-Relaxed

Pixtral-12B-Captioner-Relaxed

deprem_bert_128k_v2