# Qwen3-VL-2B-Instruct-HoneyBee
We train Qwen/Qwen3-VL-2B-Instruct for 1 epoch on Meta's HoneyBee dataset. Specifically, we use LLaMA-Factory for training and use this processed version of the data. We evaluate both the base model and this fine-tuned model on 10 benchmarks using the publicly released evaluation scripts. In particular, we also provide our adaptation of their main evaluation script to Qwen3-VL models using vLLM here. The sampling parameters for both models were: `max_tokens=2048, temperature=0.7, top_p=0.8, top_k=20, repetition_penalty=1.0, presence_penalty=1.5`.

| Model | Average | Mathverse (vision only) | Mathvista | Mathvision (testmini) | We-Math | Math500 | GPQA-D | Logicvista | Dynamath | Hallusionbench | MMMU-Pro (Vision) |
|-------------------------------------|----------|--------------------------|------------|------------------------|----------|----------|--------|-------------|-----------|------------------|-------------------|
| qwen3-2b-vl-instruct | 43.2 | 36.3 | 61.6 | 27.3 | 59.8 | 61.8 | 7.6 | 27.9 | 49.5 | 65.0 | 34.8 |
| qwen3-2b-vl-instruct (1 epoch HoneyBee) | 49.4 | 43.9 | 63.0 | 28.0 | 66.2 | 63.4 | 35.9 | 44.0 | 50.5 | 66.1 | 32.7 |
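For reference, the sampling parameters above can be sketched as a vLLM `SamplingParams` configuration. This is a minimal illustration, not the exact evaluation script: the model name and prompt handling are placeholders, and only the keyword arguments listed in the text are assumed.

```python
# Sampling configuration used for both models, as stated above.
SAMPLING_KWARGS = dict(
    max_tokens=2048,
    temperature=0.7,
    top_p=0.8,
    top_k=20,
    repetition_penalty=1.0,
    presence_penalty=1.5,
)

# Hypothetical usage with vLLM installed (not executed here);
# SamplingParams accepts these keyword arguments directly:
#
# from vllm import LLM, SamplingParams
# llm = LLM(model="Qwen/Qwen3-VL-2B-Instruct")
# outputs = llm.generate(prompts, SamplingParams(**SAMPLING_KWARGS))
```

Note that `repetition_penalty=1.0` is the neutral value (no repetition penalty), so the effective deviation from defaults is the presence penalty of 1.5.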