Search-R3: Reasoning-Reinforced Representation for Search (paper)
Welcome to Search-R3, our approach to making LLMs better at search tasks by teaching them to embed after they think.
The key idea: when a language model reasons step by step and then creates an embedding conditioned on that reasoning, it gets much better at understanding what you're really searching for.
Unlike BERT-based embedding models, which represent text with a fixed [CLS] token, this model creates embeddings by generating a special embedding token during chat conversations.
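Concretely, one plausible way to read such an embedding off the model is to take the hidden state at the position where the special embedding token was generated. This is a sketch of that idea; the token id and tensor shapes below are assumptions, not the paper's exact interface:

```python
# Sketch: pull the hidden state at the position where the model generated its
# special embedding token. Token id and tensor shapes are assumptions.
import numpy as np

def extract_embedding(hidden_states: np.ndarray,
                      token_ids: np.ndarray,
                      embed_token_id: int) -> np.ndarray:
    """hidden_states: (seq_len, dim); token_ids: (seq_len,).
    Returns the hidden state at the last occurrence of the embedding token."""
    positions = np.flatnonzero(token_ids == embed_token_id)
    if positions.size == 0:
        raise ValueError("model did not generate the embedding token")
    return hidden_states[positions[-1]]
```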
The model is, at its core, an auto-regressive language model. The easiest way to test it out is with a simple chat - just ask "Who are you?" and see how it responds.
Note: we use Qwen-2.5-Instruct as the base model and keep its chat template unchanged.
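A minimal chat script with Hugging Face `transformers` might look like this. The checkpoint id `"Search-R3"` is a placeholder, not the real repository name; since the Qwen-2.5 chat template is unchanged, the standard `apply_chat_template` flow applies:

```python
# Minimal sketch: chatting with the model via Hugging Face transformers.
# The model_id default is a placeholder -- substitute the released checkpoint.

def build_chat(question: str) -> list:
    """Wrap a user question in the message format used by apply_chat_template."""
    return [{"role": "user", "content": question}]

def chat_demo(model_id: str = "Search-R3") -> None:
    """Load the model and ask it "Who are you?" (downloads weights on first call)."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    # The base model's standard Qwen-2.5 chat template is used unchanged.
    inputs = tokenizer.apply_chat_template(
        build_chat("Who are you?"), add_generation_prompt=True, return_tensors="pt"
    )
    outputs = model.generate(inputs, max_new_tokens=128)
    print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```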
Here's a more complete example.py showing how to use Search-R3 to find relevant documents:
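Here is a sketch of what such a script could look like. The `embed` callable (a function mapping text to a vector, e.g. via the hidden state at the model's embedding token) is an assumed interface rather than the project's exact API; the ranking itself is plain cosine similarity:

```python
# example.py (sketch): embed a query and candidate documents, then rank the
# documents by cosine similarity. The model-facing `embed` callable is an
# assumption about the API; the ranking math is standard.
import numpy as np

def rank_by_cosine(query_vec: np.ndarray, doc_vecs: np.ndarray) -> np.ndarray:
    """Return document indices sorted from most to least similar to the query."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    return np.argsort(-(d @ q))

def search(embed, query: str, documents: list) -> list:
    """`embed` maps a string to a 1-D numpy vector (supplied by the caller,
    e.g. from the model's special embedding token). Returns ranked documents."""
    doc_vecs = np.stack([embed(doc) for doc in documents])
    order = rank_by_cosine(embed(query), doc_vecs)
    return [documents[i] for i in order]
```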
And that's it! Search-R3 creates powerful embeddings that capture the essence of content better than traditional embedding models - because it thinks through the meaning first, then creates the embedding. This makes it especially good at handling complex questions and nuanced content.
Our model is currently in a preview phase and intended for academic purposes only; it is not yet stable or reliable enough for industrial or commercial applications. There are a few things to keep in mind when using Search-R3:
+ Model size: we built this on a 1.5B parameter instruction model and applied RL directly without distillation. Because of its relatively small size, the model can sometimes struggle to follow instructions consistently. If you try different prompts than the ones we provide, the model might not generate embedding tokens reliably.
+ Sequence length: we didn't train the model on very long sequences. Performance can degrade when you feed it longer texts - so if you're benchmarking against us, that's one way you might be able to beat our scores!
If you use Search-R3 in your research or applications, please cite our paper: