# Step-3.5-Flash-GGUF-Q8_0

by stepfun-ai · license: apache-2.0
## Code Examples

### 5.2 Setup

```bash
pip install --upgrade "openai>=1.0"
```

### 6.1 vLLM

Install vLLM:

```bash
# via Docker
docker pull vllm/vllm-openai:nightly

# or via pip (nightly wheels)
pip install -U vllm --pre \
  --index-url https://pypi.org/simple \
  --extra-index-url https://wheels.vllm.ai/nightly
```
You can also refer to the [Step-3.5-Flash](https://github.com/vllm-project/recipes/blob/main/StepFun/Step-3.5-Flash.md) recipe.
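A typical launch command can be sketched as follows. The model path and parallelism flag are illustrative assumptions; the Step-3.5-Flash recipe linked above has the exact values for your hardware:

```bash
# Illustrative only: check the Step-3.5-Flash recipe for the exact flags.
vllm serve stepfun-ai/Step-3.5-Flash \
  --tensor-parallel-size 8 \
  --trust-remote-code
```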
### 6.2 SGLang

1. Install SGLang.
2. Launch the server.
   - For the bf16 model.
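The steps above can be sketched as follows; the model path and flags are assumptions, so consult the SGLang documentation for values matched to your hardware:

```bash
# 1. Install SGLang
pip install "sglang[all]"

# 2. Launch the server (bf16; flags are illustrative)
python -m sglang.launch_server \
  --model-path stepfun-ai/Step-3.5-Flash \
  --host 0.0.0.0 --port 30000
```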
## 7. Using Step 3.5 Flash on Agent Platforms
### 7.1 Claude Code & Codex
It's straightforward to add Step 3.5 Flash to the list of models in most coding environments. See below for instructions on configuring Claude Code and Codex to use Step 3.5 Flash.
#### 7.1.1 Prerequisites
Sign up at StepFun.ai or OpenRouter and grab an API key, as mentioned in the Quick Start.
#### 7.1.2 Environment setup
Claude Code and Codex rely on Node.js. We recommend Node.js v20 or later. You can install Node via nvm.
**Mac/Linux**: install nvm, then use it to install Node.

> Users in China can configure an npm registry mirror to speed up installs.
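A minimal nvm-based setup on Mac/Linux might look like this. The nvm release tag below is an assumption; check the nvm releases page for the latest version:

```bash
# Install nvm (pin to a release tag of your choice)
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.1/install.sh | bash

# Load nvm into the current shell
export NVM_DIR="$HOME/.nvm"
[ -s "$NVM_DIR/nvm.sh" ] && \. "$NVM_DIR/nvm.sh"

# Install and use Node 20+
nvm install 20
nvm use 20
node --version
```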
**Windows**:
Download the installer (`nvm-setup.exe`) from [https://github.com/coreybutler/nvm-windows/releases](https://github.com/coreybutler/nvm-windows/releases) and follow the instructions to install nvm. Run `nvm version` in a terminal to verify the installation.
#### 7.1.3 Use Step 3.5 Flash on Claude Code
1. Install Claude Code via npm.
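Assuming a global install from npm (package name as published by Anthropic):

```bash
npm install -g @anthropic-ai/claude-code
claude --version
```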
2. Configure Claude Code.
To accommodate diverse workflows in Claude Code, we support both **Anthropic-style** and **OpenAI-style** APIs.
**Option A: Anthropic API style**:
> If you intend to use the **OpenRouter** API, refer to the OpenRouter integration guide.
Step 1: Edit Claude settings. Update `~/.claude/settings.json`.
> You only need to modify the fields shown below; leave the rest of the file unchanged.
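One way to point Claude Code at an Anthropic-style endpoint is the `env` block in `~/.claude/settings.json`. The base URL and model name below are placeholders, not confirmed values; substitute the endpoint and model ID from your StepFun.ai account:

```json
{
  "env": {
    "ANTHROPIC_BASE_URL": "https://api.stepfun.com",
    "ANTHROPIC_AUTH_TOKEN": "YOUR_STEPFUN_API_KEY",
    "ANTHROPIC_MODEL": "step-3.5-flash"
  }
}
```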
**Option B: OpenAI API style**
> Note: OpenAI API style here refers to the `/chat/completions` format.
> We recommend using `claude-code-router`. For details, see [https://github.com/musistudio/claude-code-router](https://github.com/musistudio/claude-code-router).
After Claude Code is installed, install `claude-code-router` via npm.
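Assuming a global install from npm (package name as given in the claude-code-router repository):

```bash
npm install -g @musistudio/claude-code-router
```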
Add the following configuration to `~/.claude-code-router/config.json`, then start Claude.
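A sketch of a provider entry, with field names following the claude-code-router README; the endpoint URL and model name are placeholder assumptions:

```json
{
  "Providers": [
    {
      "name": "stepfun",
      "api_base_url": "https://api.stepfun.com/v1/chat/completions",
      "api_key": "YOUR_STEPFUN_API_KEY",
      "models": ["step-3.5-flash"]
    }
  ],
  "Router": {
    "default": "stepfun,step-3.5-flash"
  }
}
```

Then launch with `ccr code` (command name per the claude-code-router README).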
#### 7.1.4 Use Step 3.5 Flash on Codex
1. Install Codex.
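Assuming a global install from npm (package name as published by OpenAI):

```bash
npm install -g @openai/codex
codex --version
```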
2. Configure Codex
Add the following settings to `~/.codex/config.toml`, keeping the rest of the settings as they are.
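A sketch of the relevant entries, following Codex's `model_providers` schema; the base URL, model name, and environment variable are placeholder assumptions:

```toml
model = "step-3.5-flash"
model_provider = "stepfun"

[model_providers.stepfun]
name = "StepFun"
base_url = "https://api.stepfun.com/v1"
env_key = "STEPFUN_API_KEY"
wire_api = "chat"
```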
#### 7.1.5 Use Step 3.5 Flash on Step-DeepResearch (DeepResearch)
1. Use the reference environment setup below and configure `MODEL_NAME` to `Step-3.5-Flash`. [https://github.com/stepfun-ai/StepDeepResearch?tab=readme-ov-file#1-environment-setup](https://github.com/stepfun-ai/StepDeepResearch?tab=readme-ov-file#1-environment-setup)
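Following the linked setup, the model selection reduces to a single environment variable (variable name taken from the StepDeepResearch README):

```bash
# Select Step 3.5 Flash for Step-DeepResearch
export MODEL_NAME="Step-3.5-Flash"
echo "$MODEL_NAME"
```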
## 8. Known Issues and Future Directions
1. **Token Efficiency**. Step 3.5 Flash achieves frontier-level agentic intelligence but currently relies on longer generation trajectories than Gemini 3.0 Pro to reach comparable quality.
2. **Efficient Universal Mastery**. We aim to unify generalist versatility with deep domain expertise. To achieve this efficiently, we are advancing variants of on-policy distillation, allowing the model to internalize expert behaviors with higher sample efficiency.
3. **RL for More Agentic Tasks**. While Step 3.5 Flash demonstrates competitive performance on academic agentic benchmarks, the next frontier of agentic AI necessitates the application of RL to intricate, expert-level tasks found in professional work, engineering, and research.
4. **Operational Scope and Constraints**. Step 3.5 Flash is tailored for coding and work-centric tasks, but may experience reduced stability during distribution shifts. This typically occurs in highly specialized domains or long-horizon, multi-turn dialogues, where the model may exhibit repetitive reasoning, mixed-language outputs, or inconsistencies in time and identity awareness.
## 9. Co-Developing the Future
We view our roadmap as a living document, evolving continuously based on real-world usage and developer feedback.
As we work to shape the future of AGI by expanding broad model capabilities, we want to ensure we are solving the right problems. We invite you to be part of this continuous feedback loop—your insights directly influence our priorities.
- **Join the Conversation**: Our Discord community is the primary hub for brainstorming future architectures, proposing capabilities, and getting early access updates 🚀
- **Report Friction**: Encountering limitations? You can open an issue on GitHub or flag it directly in our Discord support channels.
## 📜 Citation
If you find this project useful in your research, please cite our technical report.