# Step-3.5-Flash-GGUF-Q8_0

by stepfun-ai · license: apache-2.0
## Code Examples

### 5.2 Setup

```bash
pip install --upgrade "openai>=1.0"
```

### 6.1 vLLM

Install vLLM:

```bash
# via Docker
docker pull vllm/vllm-openai:nightly

# or via pip (nightly wheels)
pip install -U vllm --pre \
  --index-url https://pypi.org/simple \
  --extra-index-url https://wheels.vllm.ai/nightly
```
You can also refer to the [Step-3.5-Flash](https://github.com/vllm-project/recipes/blob/main/StepFun/Step-3.5-Flash.md) recipe.
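A typical launch command can be sketched as follows. The model path and parallelism flag are illustrative assumptions; the Step-3.5-Flash recipe linked above has the exact values for your hardware:

```bash
# Illustrative only: check the Step-3.5-Flash recipe for the exact flags.
vllm serve stepfun-ai/Step-3.5-Flash \
  --tensor-parallel-size 8 \
  --trust-remote-code
```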
### 6.2 SGLang

1. Install SGLang.
2. Launch the server.
   - For the bf16 model.
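The steps above can be sketched as follows; the model path and flags are assumptions, so consult the SGLang documentation for values matched to your hardware:

```bash
# 1. Install SGLang
pip install "sglang[all]"

# 2. Launch the server (bf16; flags are illustrative)
python -m sglang.launch_server \
  --model-path stepfun-ai/Step-3.5-Flash \
  --host 0.0.0.0 --port 30000
```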
## 7. Using Step 3.5 Flash on Agent Platforms
### 7.1 Claude Code & Codex
It's straightforward to add Step 3.5 Flash to the list of models in most coding environments. See below for instructions on configuring Claude Code and Codex to use Step 3.5 Flash.
#### 7.1.1 Prerequisites
Sign up at StepFun.ai or OpenRouter and grab an API key, as mentioned in the Quick Start.
#### 7.1.2 Environment setup
Claude Code and Codex rely on Node.js. We recommend Node.js v20 or later. You can install Node via nvm.
**Mac/Linux**: install nvm, then use it to install Node.

> Users in China can configure an npm registry mirror to speed up installs.
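A minimal nvm-based setup on Mac/Linux might look like this. The nvm release tag below is an assumption; check the nvm releases page for the latest version:

```bash
# Install nvm (pin to a release tag of your choice)
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.1/install.sh | bash

# Load nvm into the current shell
export NVM_DIR="$HOME/.nvm"
[ -s "$NVM_DIR/nvm.sh" ] && \. "$NVM_DIR/nvm.sh"

# Install and use Node 20+
nvm install 20
nvm use 20
node --version
```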
**Windows**:
Download the installer (`nvm-setup.exe`) from [https://github.com/coreybutler/nvm-windows/releases](https://github.com/coreybutler/nvm-windows/releases) and follow the instructions to install nvm. Run `nvm version` in a terminal to verify the installation.
#### 7.1.3 Use Step 3.5 Flash on Claude Code
1. Install Claude Code via npm.
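Assuming a global install from npm (package name as published by Anthropic):

```bash
npm install -g @anthropic-ai/claude-code
claude --version
```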
2. Configure Claude Code.
To accommodate diverse workflows in Claude Code, we support both **Anthropic-style** and **OpenAI-style** APIs.
**Option A: Anthropic API style**:
> If you intend to use the **OpenRouter** API, refer to the OpenRouter integration guide.
Step 1: Edit Claude settings. Update `~/.claude/settings.json`.
> You only need to modify the fields shown below; leave the rest of the file unchanged.
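One way to point Claude Code at an Anthropic-style endpoint is the `env` block in `~/.claude/settings.json`. The base URL and model name below are placeholders, not confirmed values; substitute the endpoint and model ID from your StepFun.ai account:

```json
{
  "env": {
    "ANTHROPIC_BASE_URL": "https://api.stepfun.com",
    "ANTHROPIC_AUTH_TOKEN": "YOUR_STEPFUN_API_KEY",
    "ANTHROPIC_MODEL": "step-3.5-flash"
  }
}
```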
**Option B: OpenAI API style**
> Note: OpenAI API style here refers to the `/chat/completions` format.
> We recommend using `claude-code-router`. For details, see [https://github.com/musistudio/claude-code-router](https://github.com/musistudio/claude-code-router).
After Claude Code is installed, install `claude-code-router` via npm.
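Assuming a global install from npm (package name as given in the claude-code-router repository):

```bash
npm install -g @musistudio/claude-code-router
```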
Add the following configuration to `~/.claude-code-router/config.json`, then start Claude.
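A sketch of a provider entry, with field names following the claude-code-router README; the endpoint URL and model name are placeholder assumptions:

```json
{
  "Providers": [
    {
      "name": "stepfun",
      "api_base_url": "https://api.stepfun.com/v1/chat/completions",
      "api_key": "YOUR_STEPFUN_API_KEY",
      "models": ["step-3.5-flash"]
    }
  ],
  "Router": {
    "default": "stepfun,step-3.5-flash"
  }
}
```

Then launch with `ccr code` (command name per the claude-code-router README).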
#### 7.1.4 Use Step 3.5 Flash on Codex
1. Install Codex.
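Assuming a global install from npm (package name as published by OpenAI):

```bash
npm install -g @openai/codex
codex --version
```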
2. Configure Codex
Add the following settings to `~/.codex/config.toml`, keeping the rest of the settings as they are.
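A sketch of the relevant entries, following Codex's `model_providers` schema; the base URL, model name, and environment variable are placeholder assumptions:

```toml
model = "step-3.5-flash"
model_provider = "stepfun"

[model_providers.stepfun]
name = "StepFun"
base_url = "https://api.stepfun.com/v1"
env_key = "STEPFUN_API_KEY"
wire_api = "chat"
```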
#### 7.1.5 Use Step 3.5 Flash on Step-DeepResearch (DeepResearch)
1. Use the reference environment setup below and configure `MODEL_NAME` to `Step-3.5-Flash`. [https://github.com/stepfun-ai/StepDeepResearch?tab=readme-ov-file#1-environment-setup](https://github.com/stepfun-ai/StepDeepResearch?tab=readme-ov-file#1-environment-setup)
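Following the linked setup, the model selection reduces to a single environment variable (variable name taken from the StepDeepResearch README):

```bash
# Select Step 3.5 Flash for Step-DeepResearch
export MODEL_NAME="Step-3.5-Flash"
echo "$MODEL_NAME"
```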
## 8. Known Issues and Future Directions
1. **Token Efficiency**. Step 3.5 Flash achieves frontier-level agentic intelligence but currently relies on longer generation trajectories than Gemini 3.0 Pro to reach comparable quality.
2. **Efficient Universal Mastery**. We aim to unify generalist versatility with deep domain expertise. To achieve this efficiently, we are advancing variants of on-policy distillation, allowing the model to internalize expert behaviors with higher sample efficiency.
3. **RL for More Agentic Tasks**. While Step 3.5 Flash demonstrates competitive performance on academic agentic benchmarks, the next frontier of agentic AI necessitates the application of RL to intricate, expert-level tasks found in professional work, engineering, and research.
4. **Operational Scope and Constraints**. Step 3.5 Flash is tailored for coding and work-centric tasks, but may experience reduced stability during distribution shifts. This typically occurs in highly specialized domains or long-horizon, multi-turn dialogues, where the model may exhibit repetitive reasoning, mixed-language outputs, or inconsistencies in time and identity awareness.
## 9. Co-Developing the Future
We view our roadmap as a living document, evolving continuously based on real-world usage and developer feedback.
As we work to shape the future of AGI by expanding broad model capabilities, we want to ensure we are solving the right problems. We invite you to be part of this continuous feedback loop—your insights directly influence our priorities.
- **Join the Conversation**: Our Discord community is the primary hub for brainstorming future architectures, proposing capabilities, and getting early access updates 🚀
- **Report Friction**: Encountering limitations? You can open an issue on GitHub or flag it directly in our Discord support channels.
## 📜 Citation
If you find this project useful in your research, please cite our technical report.