GAIR
DeepResearcher-7b
DeepResearcher is the first comprehensive framework for end-to-end training of LLM-based deep research agents through scaling reinforcement learning (RL) in real-world environments with authentic web search interactions. Our qualitative analysis reveals emergent cognitive behaviors from end-to-end RL training, including the ability to formulate plans, cross-validate information from multiple sources, engage in self-reflection to redirect research, and maintain honesty when unable to find definitive answers.

- License: Apache 2.0
- Model type: Reinforcement learning-based LLM (Large Language Model)
- Language(s): English
- Finetuned from model: Qwen2.5-7B-Instruct
- Repository: DeepResearcher GitHub
- Paper: DeepResearcher: Scaling Deep Research via Reinforcement Learning in Real-world Environments

To get started, visit the DeepResearcher repository on GitHub, where the model's code and setup instructions are provided.

The model was trained on open-domain question-answering datasets, including:

- NaturalQuestions (NQ)
- TriviaQA (TQ)
- HotpotQA
- 2Wiki MultiHopQA

DeepResearcher was trained using reinforcement learning (RL) with the Group Relative Policy Optimization (GRPO) algorithm. It was tested in both in-domain (NQ, TQ, HotpotQA) and out-of-domain (Musique, Bamboogle, PopQA) settings.

The model was evaluated on several datasets:

- NQ (Natural Questions)
- TQ (TriviaQA)
- HotpotQA
- 2Wiki
- Musique
- Bamboogle
- PopQA

DeepResearcher outperforms all baseline models, achieving substantial improvements in task completion across these datasets, particularly in out-of-domain scenarios.
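The core idea of GRPO mentioned above can be sketched as follows. This is a minimal illustration, not DeepResearcher's actual training code: for each question, a group of rollouts is sampled, and each rollout's reward is normalized against the group's mean and standard deviation, which removes the need for a learned value critic. The helper name below is hypothetical.

```python
# Minimal sketch of the group-relative advantage at the heart of GRPO
# (Group Relative Policy Optimization). Illustrative only.
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-6):
    """Normalize a group of rollout rewards to zero mean, unit variance."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the rollout group
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: four rollouts for one question, rewarded 1.0 if the final
# answer is correct and 0.0 otherwise.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Rollouts that beat the group average receive a positive advantage and are reinforced; below-average rollouts receive a negative advantage, so the policy update needs no separate critic network.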
ReasonEval-7B
ReasonEval-34B
LIMI-Air
daVinci-MagiHuman
LIMO-v2
daVinci-Dev-32B-MT
daVinci-Dev-72B
LIMO
rst-temporal-reasoning-11b
LIMI
LiveTalk-1.3B-V0.1
daVinci-Dev-72B-MT
Abel-7B-002
OpenSWE-72B
daVinci-Agency
rst-intent-detection-11b
SR Scientist 30B
SR-Scientist: Scientific Equation Discovery With Agentic AI. This is the RL-trained checkpoint from the paper 'SR-Scientist: Scientific Equation Discovery With Agentic AI', which uses Qwen3-Coder-30B-A3B-Instruct as its backbone. For usage, please refer to the code. Please cite the paper if this repo or the paper is helpful to you.
Anole-7b-v0.1
confucius-confidence-verb
twgi-critique-anole-7b
We introduce Thinking with Generated Images, which enables a single LMM (Large Multimodal Model) to spontaneously generate and reason with intermediate visual thoughts via a native long-multimodal thought process. This model supports vision generation with self-critique. Please refer to our GitHub repo for more information!
rst-information-extraction-11b
daVinci-origin-3B
ToRL-1.5B
ToRL-7B
Anole-7b
Anole: An Open, Autoregressive, and Native Multimodal Model for Interleaved Image-Text Generation

Anole is the first open-source, autoregressive, and natively trained large multimodal model capable of interleaved image-text generation (without using stable diffusion). While it builds upon the strengths of Chameleon, Anole excels at the complex task of generating coherent sequences of alternating text and images. Through an innovative fine-tuning process using a carefully curated dataset of approximately 6,000 images, Anole achieves remarkable image generation and understanding capabilities with minimal additional training. This efficient approach, combined with its open-source nature, positions Anole as a catalyst for accelerated research and development in multimodal AI. Preliminary tests demonstrate Anole's exceptional ability to follow nuanced instructions, producing high-quality images and interleaved text-image content that closely aligns with user prompts.

The major functionalities of Anole are listed below, where bold represents capabilities newly added on the basis of Chameleon:

- **Text-to-Image Generation**
- **Interleaved Text-Image Generation**
- Text Generation
- MultiModal Understanding

Please refer to our github repo and paper for examples generated by Anole!