This is the model checkpoint for the ACL 2025 paper "Aligning Large Language Models with Implicit Preferences from User-Generated Content" (https://arxiv.org/abs/2506.04463).
The model is trained from Mistral-7B-Instruct-v0.2 with DPO, using preference data harvested from user-generated content.
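Since the checkpoint is trained from Mistral-7B-Instruct-v0.2, it expects the standard Mistral instruction format. Below is a minimal usage sketch; the repo id placeholder and the `build_prompt` helper are illustrative assumptions, not part of the released code:

```python
def build_prompt(user_message: str) -> str:
    # Mistral-Instruct chat format: the user turn is wrapped in [INST] tags.
    # (Assumption: this checkpoint keeps the base model's chat template.)
    return f"<s>[INST] {user_message} [/INST]"

prompt = build_prompt("What are the benefits of code review?")

# To generate with the checkpoint (requires `transformers` and the weights):
# from transformers import AutoModelForCausalLM, AutoTokenizer
# tok = AutoTokenizer.from_pretrained("<this-repo-id>")      # hypothetical id
# model = AutoModelForCausalLM.from_pretrained("<this-repo-id>")
# inputs = tok(prompt, return_tensors="pt")
# out = model.generate(**inputs, max_new_tokens=256)
# print(tok.decode(out[0], skip_special_tokens=True))
```

Alternatively, `tokenizer.apply_chat_template` can build the prompt automatically if the tokenizer ships with a chat template.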
If you find this model helpful for your research, please cite the following paper: