gme-Qwen2-VL-2B-Instruct
by Alibaba-NLP · Embedding Model · 2.0B params · 2 languages · license: apache-2.0 · 64.4K downloads
Edge AI: Mobile · Laptop · Server (5GB+ RAM)
Quick Summary
We are excited to present the `GME-Qwen2-VL` series of unified multimodal embedding models, built on the advanced Qwen2-VL multimodal large language models (MLLMs).
Device Compatibility
- Mobile: 4-6GB RAM
- Laptop: 16GB RAM
- Server: GPU
- Minimum recommended: 2GB+ RAM
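As a rough sanity check on the RAM figures above, the float16 weight footprint of a 2.0B-parameter model can be estimated directly. This is a back-of-envelope sketch only: actual peak usage is higher, since it adds activations, image-patch buffers, and framework overhead.

```python
# Rough weight-memory estimate: parameter count x bytes per parameter.
# Real peak usage is higher (activations, vision buffers, framework overhead).
def weight_memory_gib(n_params: float, bytes_per_param: int = 2) -> float:
    return n_params * bytes_per_param / 1024**3

print(f"float16: {weight_memory_gib(2.0e9):.2f} GiB")  # ~3.73 GiB
print(f"int8:    {weight_memory_gib(2.0e9, 1):.2f} GiB")
```

A ~3.7 GiB float16 weight load is consistent with the 4-6GB mobile figure and the 5GB+ edge recommendation above.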
Code Examples
Usage (Python, transformers)
from transformers import AutoModel
from transformers.utils.versions import require_version

require_version(
    "transformers<4.52.0",
    "The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)

t2i_prompt = 'Find an image that matches the given text.'
texts = [
    "The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
    "Alibaba office.",
]
images = [
    'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
    'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]

gme = AutoModel.from_pretrained(
    "Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
    torch_dtype="float16", device_map='cuda', trust_remote_code=True
)

# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]

# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]

# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers
from transformers import AutoModel
from transformers.utils.versions import require_version
require_version(
"transformers<4.52.0",
"The remote code has some issues with transformers>=4.52.0, please downgrade: pip install transformers==4.51.3"
)
t2i_prompt = 'Find an image that matches the given text.'
texts = [
"The Tesla Cybertruck is a battery electric pickup truck built by Tesla, Inc. since 2023.",
"Alibaba office.",
]
images = [
'https://upload.wikimedia.org/wikipedia/commons/e/e9/Tesla_Cybertruck_damaged_window.jpg',
'https://upload.wikimedia.org/wikipedia/commons/e/e0/TaobaoCity_Alibaba_Xixi_Park.jpg',
]
gme = AutoModel.from_pretrained(
"Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
torch_dtype="float16", device_map='cuda', trust_remote_code=True
)
# Single-modal embedding
e_text = gme.get_text_embeddings(texts=texts)
e_image = gme.get_image_embeddings(images=images)
print('Single-modal', (e_text @ e_image.T).tolist())
## Single-modal [[0.359619140625, 0.0655517578125], [0.04180908203125, 0.374755859375]]
# How to set embedding instruction
e_query = gme.get_text_embeddings(texts=texts, instruction=t2i_prompt)
# If is_query=False, we always use the default instruction.
e_corpus = gme.get_image_embeddings(images=images, is_query=False)
print('Single-modal with instruction', (e_query @ e_corpus.T).tolist())
## Single-modal with instruction [[0.429931640625, 0.11505126953125], [0.049835205078125, 0.409423828125]]
# Fused-modal embedding
e_fused = gme.get_fused_embeddings(texts=texts, images=images)
print('Fused-modal', (e_fused @ e_fused.T).tolist())
## Fused-modal [[1.0, 0.05511474609375], [0.05511474609375, 1.0]]Usagepythontransformers