# R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning

[📚 Arxiv Paper] [🤗 Hugging Face] [🤖️ ModelScope] [💻 Code]

In this repo, we present R-4B, a multimodal large language model designed for general-purpose auto-thinking: it autonomously switches between step-by-step thinking and direct response generation based on task complexity. This capability enables R-4B to deliver high-quality responses while significantly improving inference efficiency and reducing computational cost.

The development of R-4B follows a two-stage training paradigm:

1. **Bi-mode Annealing**, which establishes both thinking and non-thinking capabilities for VQA; and
2. **Bi-mode Policy Optimization (BPO)**, which enables the model to adaptively switch between thinking and non-thinking modes based on input demands.

## Key Features

- 🧠 **Think Smart, Act Fast: Adaptive & Controllable Thinking!** Our model provides three-mode control over the response process.
  - **Auto-thinking mode:** auto-thinking works across general topics, from simple Q&A to complex scientific analysis, and saves time and computation by thinking only when it matters.
  - **Manual control:** explicitly command the model to use its `thinking` or `non-thinking` capability, so you can choose the right mode for every job.
- 🏆 **Strong Performance, Open for Everyone!** Our model is fully open-source and achieves state-of-the-art performance among models of comparable size.

## News

- **[2025.08.20]** 🚀 vLLM support is here! R-4B is now fully compatible with vLLM for high-performance inference.
- **[2025.08.18]** 🏆 Top rank achieved! We are thrilled to announce that R-4B now ranks #1 among all open-source models on the OpenCompass Multi-modal Reasoning Leaderboard!
- **[2025.08.11]** 🥇 Rank #1! R-4B ranks first among models under 20B parameters on the OpenCompass Multi-modal Academic Leaderboard!
- **[2025.08.05]** 🎉 R-4B is released! Our model is now publicly available; you can download it from Hugging Face.
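The three-mode control described above can be sketched in code. The snippet below is a minimal, hypothetical example, assuming the checkpoint is published as `YannQi/R-4B` on Hugging Face and that the processor's chat template accepts a `thinking_mode` argument (`auto`/`long`/`short`); the exact Transformers API surface may differ from this sketch.

```python
# Sketch of three-mode control for R-4B. The checkpoint id and the
# `thinking_mode` chat-template argument follow this README; treat the
# rest of the API usage as an assumption, not the canonical recipe.

# Human-readable mode name -> `thinking_mode` switch value.
MODES = {"auto-thinking": "auto", "thinking": "long", "non-thinking": "short"}


def thinking_mode_for(mode: str = "auto-thinking") -> str:
    """Map a mode name to the value passed to the chat template."""
    if mode not in MODES:
        raise ValueError(f"unknown mode {mode!r}; choose one of {sorted(MODES)}")
    return MODES[mode]


def run_inference(image_path: str, question: str, mode: str = "auto-thinking") -> str:
    """Run one image+text query in the requested thinking mode (sketch)."""
    # Heavy imports are kept local so the mode mapping above stays usable
    # without torch/transformers installed.
    import torch
    from PIL import Image
    from transformers import AutoModel, AutoProcessor

    model = AutoModel.from_pretrained(
        "YannQi/R-4B", torch_dtype=torch.float32, trust_remote_code=True
    )
    processor = AutoProcessor.from_pretrained("YannQi/R-4B", trust_remote_code=True)

    messages = [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_path},
                {"type": "text", "text": question},
            ],
        }
    ]
    # thinking_mode: "auto" (auto-thinking), "long" (thinking), "short" (non-thinking)
    text = processor.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
        thinking_mode=thinking_mode_for(mode),
    )
    inputs = processor(images=Image.open(image_path), text=text, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=1024)
    # Decode only the newly generated tokens, not the prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return processor.decode(new_tokens, skip_special_tokens=True)
```

For example, `run_inference("demo.jpg", "What is in this image?", mode="non-thinking")` would request a direct answer without a visible reasoning trace, while `mode="thinking"` forces step-by-step reasoning.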
## Quickstart with 🤗 Transformers

Below, we provide simple examples showing how to use R-4B with 🤗 Transformers.

> [!NOTE]
> Users can dynamically control the model's response by selecting one of three modes (`auto-thinking`, `thinking`, or `non-thinking`) with `thinking_mode`: `thinking_mode=auto` selects `auto-thinking` mode, `thinking_mode=long` selects `thinking` mode, and `thinking_mode=short` selects `non-thinking` mode. The default is `auto-thinking`.

## Deployment with vLLM

We recommend vLLM for fast R-4B deployment and inference. R-4B currently requires the newest vLLM, so please install it from source.

> [!TIP]
> The `thinking_mode` switch is also available in APIs created by vLLM. The default is `auto-thinking`.

## Evaluation

1. R-4B establishes itself with powerful, state-of-the-art perceptual abilities that are competitive with larger models.
2. On evaluation sets that require complex logical reasoning and mathematical problem-solving, such as WeMath, MathVerse, and LogicVista, R-4B displays a strong performance curve. This highlights its advanced adaptive thinking capacity for logical deduction and solving complex quantitative problems.

## Acknowledgements

R-4B is developed based on the codebases of the following projects: LLaVA-Next, SigLIP2, Qwen3, Qwen2.5-VL, VLMEvalKit. We sincerely thank these projects for their outstanding work.
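As a companion to the vLLM deployment notes above, here is a minimal client sketch for querying a served R-4B model through vLLM's OpenAI-compatible `/v1/chat/completions` endpoint. The endpoint URL and model name are placeholders, and passing `thinking_mode` via `chat_template_kwargs` is an assumption based on vLLM's generic chat-template-kwargs passthrough; adjust to match your actual deployment.

```python
# Sketch: calling an R-4B model served by vLLM's OpenAI-compatible API.
# The base URL, model name, and the `chat_template_kwargs` passthrough for
# `thinking_mode` are assumptions for illustration, not verified specifics.
import json
import urllib.request


def build_request(question: str, thinking_mode: str = "auto") -> dict:
    """Build the JSON body for a /v1/chat/completions request."""
    if thinking_mode not in {"auto", "long", "short"}:
        raise ValueError("thinking_mode must be 'auto', 'long', or 'short'")
    return {
        "model": "YannQi/R-4B",  # placeholder model name
        "messages": [{"role": "user", "content": question}],
        # Assumed route for the thinking_mode switch in served APIs.
        "chat_template_kwargs": {"thinking_mode": thinking_mode},
    }


def ask(
    question: str,
    thinking_mode: str = "auto",
    base_url: str = "http://localhost:8000/v1",  # placeholder endpoint
) -> str:
    """Send one chat request and return the assistant's reply text."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_request(question, thinking_mode)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

For instance, `ask("Summarize this in one line.", thinking_mode="short")` would request a direct, non-thinking reply from the served model.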