Ricky06662

6 models

Seg-Zero-7B

This model is based on the paper Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement. Seg-Zero introduces a decoupled architecture consisting of a reasoning model and a segmentation model: the reasoning model interprets user intentions, generates explicit reasoning chains, and produces positional prompts, which the segmentation model then uses to generate pixel-level masks. It is trained via reinforcement learning with GRPO, without explicit reasoning data, leading to robust zero-shot generalization and emergent test-time reasoning. During inference, the thinking process is printed to the command line and the mask is saved in the `inference_scripts` folder. You can also provide your own `image_path` and text.
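The hand-off between the two models amounts to parsing the reasoning model's text output into a positional prompt for the segmentation model. A minimal sketch of that step is below; the `<think>`/`<answer>` tag layout and the JSON key names (`bbox`, `points`) are assumptions for illustration, not a format confirmed by this card:

```python
import json
import re

def extract_positional_prompts(reasoning_output: str):
    """Pull the positional prompt (bounding box + click points) out of the
    reasoning model's raw text so it can be fed to the segmentation model.
    The <answer>-wrapped JSON layout is an assumed format."""
    match = re.search(r"<answer>(.*?)</answer>", reasoning_output, re.DOTALL)
    if match is None:
        raise ValueError("no <answer> block found in reasoning output")
    data = json.loads(match.group(1))
    return data["bbox"], data["points"]

raw = (
    "<think>The user wants the cup nearest the camera.</think>"
    '<answer>{"bbox": [120, 80, 260, 210], "points": [[190, 145]]}</answer>'
)
bbox, points = extract_positional_prompts(raw)
print(bbox)    # [120, 80, 260, 210]
print(points)  # [[190, 145]]
```

The box and points would then be passed to the segmentation model as prompts to produce the pixel-level mask.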


VisionReasoner-7B

license:apache-2.0

TaskRouter-1.5B

This repository contains the model described in the paper VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning. Code: https://github.com/dvlab-research/VisionReasoner


Seg-Zero-7B-Best-on-ReasonSegTest


Visurf-7B-Best-on-gRefCOCO

license:apache-2.0

Visurf-7B-NoThink-Best-on-gRefCOCO

ViSurf: Visual Supervised-and-Reinforcement Fine-Tuning for Large Vision-and-Language Models

This repository contains the `Visurf-7B-NoThink-Best-on-gRefCOCO` model, presented in the paper ViSurf: Visual Supervised-and-Reinforcement Fine-Tuning for Large Vision-and-Language Models.

Abstract: Typical post-training paradigms for Large Vision-and-Language Models (LVLMs) include Supervised Fine-Tuning (SFT) and Reinforcement Learning with Verifiable Rewards (RLVR). SFT leverages external guidance to inject new knowledge, whereas RLVR uses internal reinforcement to enhance reasoning capabilities and overall performance. However, our analysis reveals that SFT often leads to sub-optimal performance, while RLVR struggles with tasks that exceed the model's internal knowledge base. To address these limitations, we propose ViSurf (Visual Supervised-and-Reinforcement Fine-Tuning), a unified post-training paradigm that integrates the strengths of both SFT and RLVR within a single stage. We analyze the derivation of the SFT and RLVR objectives to establish the ViSurf objective, providing a unified perspective on the two paradigms. The core of ViSurf is injecting ground-truth labels into the RLVR rollouts, thereby providing simultaneous external supervision and internal reinforcement. Furthermore, we introduce three novel reward control strategies to stabilize and optimize the training process. Extensive experiments across several diverse benchmarks demonstrate the effectiveness of ViSurf, which outperforms individual SFT, individual RLVR, and the two-stage SFT $\rightarrow$ RLVR pipeline. In-depth analysis corroborates these findings, validating the derivation and design principles of ViSurf.

For more details, including the code and training procedures, please refer to the official GitHub repository.
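The core mechanism (injecting the ground-truth label as one rollout in the RLVR group) can be illustrated with a toy sketch. The function names and the GRPO-style group-normalized advantage below are illustrative assumptions, not the paper's actual implementation or its reward control strategies:

```python
from statistics import mean, pstdev

def group_advantages(rewards):
    """GRPO-style group-normalized advantages: (r - mean) / std."""
    mu, sigma = mean(rewards), pstdev(rewards)
    if sigma == 0:
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]

def visurf_group(sampled_rewards, gt_reward=1.0):
    """Append the ground-truth rollout's (verifiable, maximal) reward to the
    sampled rollouts' rewards: the GT sample supplies external supervision
    (as in SFT) while the sampled rollouts supply internal reinforcement
    (as in RLVR), all within a single group."""
    return group_advantages(sampled_rewards + [gt_reward])

# Four sampled rollouts all fail the verifier; with plain RLVR the group
# advantage would be zero everywhere, but the injected GT rollout still
# yields a positive advantage, so the learning signal never vanishes.
advs = visurf_group([0.0, 0.0, 0.0, 0.0])
print(advs)
```

This illustrates why the combination helps on tasks beyond the model's internal knowledge: even when no sampled rollout earns reward, the ground-truth rollout keeps the gradient informative.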
First, clone the repository and create a conda environment as described in the GitHub README. You can then load and use the model with the `transformers` library; the `tokenizer_config.json` indicates the use of `Qwen2_5_VLProcessor`, so `AutoProcessor` is also needed. If you find our work helpful or inspiring, please feel free to cite it.
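A loading sketch is below; the repo id and the chat-message layout follow the usual Qwen2.5-VL pattern in `transformers` and are assumptions to check against the GitHub README rather than code from this card:

```python
def build_messages(image_path: str, text: str):
    """Chat-style message list in the Qwen2.5-VL format (assumed layout)."""
    return [{
        "role": "user",
        "content": [
            {"type": "image", "image": image_path},
            {"type": "text", "text": text},
        ],
    }]

def load_model(model_id: str):
    """Load the processor and model; downloads the weights, so it is
    defined here but not called."""
    from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
    processor = AutoProcessor.from_pretrained(model_id)
    model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )
    return processor, model

# Repo id assumed from the model name on this page:
# processor, model = load_model("Ricky06662/Visurf-7B-NoThink-Best-on-gRefCOCO")
# messages = build_messages("example.jpg", "Segment the leftmost person.")
```

From there, rendering the messages with `processor.apply_chat_template(...)`, running `processor(...)` on the text and image, and calling `model.generate(...)` follows the standard Qwen2.5-VL usage in the `transformers` documentation.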

license:apache-2.0