Omni-Reward: Towards Generalist Omni-Modal Reward Modeling with Free-Form Preferences
Paper | Code | Benchmark | Dataset | Model | Homepage
OmniRewardModel is our pretrained discriminative reward model, designed to handle omni-modal inputs (e.g., text, image, video) and free-form human preferences.
It is built upon the open-source base model MiniCPM-o-2.6, with an additional value head appended to produce scalar reward scores.
The model supports fine-grained scoring across various tasks and modalities, and can be seamlessly loaded via Hugging Face Hub.
To reproduce the training process described in our paper, please set up the environment as described below. Our training code is built upon the LLaMA-Factory framework.
Install PyTorch, choosing the build that matches your CUDA version. We recommend `torch==2.2.0` for best compatibility.
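For example, a minimal install sketch (the CUDA 12.1 wheel index is one option; substitute the index URL that matches your driver):

```shell
# CUDA 12.1 build (assumes an NVIDIA driver that supports CUDA 12.1)
pip install torch==2.2.0 --index-url https://download.pytorch.org/whl/cu121

# CPU-only alternative
# pip install torch==2.2.0 --index-url https://download.pytorch.org/whl/cpu
```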
Download all required training and evaluation datasets from OmniRewardData and OmniRewardBench:
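One way to fetch the data is via `huggingface-cli`; the repo IDs below follow the same namespace as our model repo and the local paths are illustrative, so adjust them to your setup:

```shell
# Download the training data (repo ID assumed from the project namespace)
huggingface-cli download jinzhuoran/OmniRewardData \
    --repo-type dataset --local-dir data/OmniRewardData

# Download the evaluation benchmark
huggingface-cli download jinzhuoran/OmniRewardBench \
    --repo-type dataset --local-dir data/OmniRewardBench
```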
To reproduce the training results described in our paper, please navigate to the OmniReward-Factory directory and run the following scripts:
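A typical invocation looks like the sketch below; the exact script names are placeholders, so check the `scripts/` directory in OmniReward-Factory for the actual filenames:

```shell
cd OmniReward-Factory

# Launch training (script name is illustrative; see scripts/ for the real one)
bash scripts/train.sh
```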
You can also directly use our pretrained Omni-Reward for evaluation without retraining.
š https://huggingface.co/jinzhuoran/OmniRewardModel
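A minimal loading sketch via the `transformers` library is shown below; `trust_remote_code=True` is assumed to be required because the value head lives in custom model code, and the dtype/device choices are illustrative:

```python
import torch
from transformers import AutoModel, AutoTokenizer

REPO_ID = "jinzhuoran/OmniRewardModel"

# Load the reward model with its custom value head (custom code assumed
# to be shipped in the repo, hence trust_remote_code=True).
model = AutoModel.from_pretrained(
    REPO_ID,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(REPO_ID, trust_remote_code=True)
model.eval()
```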
- `--evaldataset`: Specifies the evaluation dataset (e.g., `omnit2t`, `omnit2i`, `omnit2v`, etc.).
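Putting it together, an evaluation run might look like the following; the script name `eval.py` and the dataset path are hypothetical placeholders, only the `--evaldataset` flag and its values come from the options above:

```shell
# Evaluate the pretrained reward model on the text-to-image subset
# (eval.py and the data path are illustrative placeholders)
python eval.py \
    --evaldataset omnit2i \
    --model jinzhuoran/OmniRewardModel
```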