# EditScore

EditScore is a series of state-of-the-art open-source reward models (7B–72B) designed to evaluate and enhance instruction-guided image editing.

## ✨ Highlights

- **State-of-the-Art Performance:** Effectively matches the performance of leading proprietary VLMs. With a self-ensembling strategy, our largest model surpasses even GPT-5 on our comprehensive benchmark, EditReward-Bench.
- **A Reliable Evaluation Standard:** We introduce EditReward-Bench, the first public benchmark specifically designed for evaluating reward models in image editing, featuring 13 subtasks, 11 state-of-the-art editing models (including proprietary models), and expert human annotations.
- **Simple and Easy to Use:** Get an accurate quality score for your image edits with just a few lines of code.
- **Versatile Applications:** Ready to use as a best-in-class reranker to improve editing outputs, or as a high-fidelity reward signal for stable and effective Reinforcement Learning (RL) fine-tuning.

## 🔥 News

- **2025-09-30:** We release OmniGen2-EditScore7B, unlocking online RL for image editing via a high-fidelity EditScore reward. LoRA weights are available on Hugging Face and ModelScope.
- **2025-09-30:** We are excited to release EditScore and EditReward-Bench! Model weights and the benchmark dataset are now publicly available on Hugging Face (Models Collection and Benchmark Dataset) and on ModelScope (Models Collection and Benchmark Dataset).

## 📖 Introduction

While Reinforcement Learning (RL) holds immense potential for instruction-guided image editing, its progress has been severely hindered by the absence of a high-fidelity, efficient reward signal. To overcome this barrier, we provide a systematic, two-part solution:

- **A Rigorous Evaluation Standard:** We first introduce EditReward-Bench, a new public benchmark for the direct and reliable evaluation of reward models. It features 13 diverse subtasks and expert human annotations, establishing a gold standard for measuring reward signal quality.
- **A Powerful & Versatile Tool:** Guided by our benchmark, we developed the EditScore model series. Through meticulous data curation and an effective self-ensembling strategy, EditScore sets a new state of the art for open-source reward models, even surpassing the accuracy of leading proprietary VLMs.

We demonstrate the practical utility of EditScore through two key applications:

- **As a State-of-the-Art Reranker:** Use EditScore to perform Best-of-N selection and instantly improve the output quality of diverse editing models.
- **As a High-Fidelity Reward for RL:** Use EditScore as a robust reward signal to fine-tune models via RL, enabling stable training and unlocking significant performance gains where general-purpose VLMs fail.

This repository releases both the EditScore models and the EditReward-Bench dataset to facilitate future research in reward modeling, policy optimization, and AI-driven model improvement.

*EditScore as a superior reward signal for image editing.*

## 📌 TODO

We are actively working on improving EditScore and expanding its capabilities. Here's what's next:

- [ ] Release RL training code applying EditScore to OmniGen2.
- [ ] Provide Best-of-N inference scripts for OmniGen2, Flux-dev-Kontext, and Qwen-Image-Edit.

## 🧪 Usage Example

Using EditScore is straightforward. The model will be automatically downloaded from the Hugging Face Hub on its first run.

## 📊 Benchmark Your Image-Editing Reward Model

We provide an evaluation script to benchmark reward models on EditReward-Bench. To evaluate your own custom reward model, simply create a scorer class with a similar interface and update the script.

## ❤️ Citing Us

If you find this repository or our work useful, please consider giving a star ⭐ and a citation, which would be greatly appreciated:
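The benchmark section above asks for a scorer class "with a similar interface". As a rough illustration of what such an interface and its Best-of-N reranking use might look like, here is a minimal sketch. Every name in it (`RewardScorer`, `score`, `best_of_n`, the dummy scorer) is a hypothetical assumption for this example, not the repository's actual API; a real scorer would load EditScore weights and score the edited images themselves.

```python
from abc import ABC, abstractmethod
from typing import List

class RewardScorer(ABC):
    """Hypothetical scorer interface: higher score = better edit."""

    @abstractmethod
    def score(self, input_image: str, output_images: List[str],
              instruction: str) -> List[float]:
        """Return one quality score per candidate edited image."""

class DummyScorer(RewardScorer):
    """Stand-in scorer for illustration: ranks candidates by
    filename length instead of a learned reward model."""

    def score(self, input_image, output_images, instruction):
        return [float(len(path)) for path in output_images]

def best_of_n(scorer: RewardScorer, input_image: str,
              candidates: List[str], instruction: str) -> str:
    """Best-of-N selection: keep the candidate with the highest reward."""
    scores = scorer.score(input_image, candidates, instruction)
    return candidates[max(range(len(scores)), key=scores.__getitem__)]

scorer = DummyScorer()
best = best_of_n(scorer, "cat.png",
                 ["edit_a.png", "edit_long_b.png"],
                 "make the cat wear a hat")
print(best)  # dummy metric picks the longest filename: edit_long_b.png
```

Swapping `DummyScorer` for a class that wraps an actual EditScore checkpoint would plug the same `best_of_n` loop into a real reranking pipeline.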