# FlashResearch-4B-Thinking
A 4B-parameter Qwen model distilled from Tongyi DeepResearch 30B A3B, optimized for web-scale “deep research” tasks and for inference with Alibaba-NLP/DeepResearch.
- **Base:** Qwen 4B (dense)
- **Teacher:** Tongyi DeepResearch 30B A3B (MoE)
- **Method:** SFT distillation on 33k curated deep-research examples
- **Dataset:** `flashresearch/FlashResearch-DS-33k`
- **Primary use:** Fast, low-cost DeepResearch agent runs (browsing, multi-step reasoning, source-grounded answers)
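To inspect the distillation data, it can be loaded with the `datasets` library. A minimal sketch, assuming the dataset is hosted on the Hub under the id above and exposes a `train` split (the field layout shown in the comments is an assumption, not documented here):

```python
# Sketch: peek at the 33k distillation dataset via Hugging Face `datasets`.
from datasets import load_dataset

# Assumes a `train` split exists under this Hub id.
ds = load_dataset("flashresearch/FlashResearch-DS-33k", split="train")

print(ds)      # ~33k curated deep-research examples
print(ds[0])   # one SFT example (e.g. prompt / reasoning trajectory / answer)
```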
## Inference with Alibaba-NLP/DeepResearch (Recommended)
This model is intended to be used directly with the DeepResearch repo.
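For a quick standalone check outside the DeepResearch pipeline, the model can be loaded with plain Transformers. This is a minimal sketch; the Hub id `flashresearch/FlashResearch-4B-Thinking` is an assumption based on the card title, so substitute the actual repo path:

```python
# Minimal standalone sketch: load the model with Transformers and generate.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "flashresearch/FlashResearch-4B-Thinking"  # hypothetical Hub id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # FP16/BF16 fits on a single 12-16 GB GPU
    device_map="auto",
)

messages = [{"role": "user", "content": "Summarize recent work on MoE distillation."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

For full agent runs (browsing, tool calls, source-grounded answers), point the DeepResearch repo's model configuration at this checkpoint instead.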
A single 12–16 GB GPU is enough for the 4B model in FP16; FP8 or INT4 quantization fits in even less VRAM. If you quantize, the summary model can run locally as well.
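One way to quantize is a 4-bit (NF4) load via `bitsandbytes`, which brings the memory footprint well under 12 GB. A sketch under the same hypothetical Hub id as above:

```python
# Sketch: 4-bit NF4 quantized load with bitsandbytes to reduce VRAM usage.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",           # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "flashresearch/FlashResearch-4B-Thinking",  # hypothetical Hub id
    quantization_config=bnb_config,
    device_map="auto",
)
```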
## Acknowledgements

- Qwen team for the base 4B architecture
- Alibaba-NLP for DeepResearch
- CheapResearch contributors for the 33k dataset
## Changelog

- **v1.0.0** (2025-10-04): First public release (33k distillation, DeepResearch-ready)