ByteDance-Seed

53 models • 8 total models in database

Sort by:

UI-TARS-1.5-7B

--- license: apache-2.0 language: - en pipeline_tag: image-text-to-text tags: - multimodal - gui library_name: transformers ---

Seed-OSS-36B-Instruct

You can get to know us better through the following channels👇 > [!NOTE] > This model card is dedicated to the `Seed-OSS-36B-Base-Instruct` model. News - [2025/08/20]🔥We release `Seed-OSS-36B-Base...

academic-ds-9B

This is a 9B model whose architecture is deepseek-v3, trained from scratch using 350B+ tokens from fully open-source, English-only datasets. It is designed for development and debugging purposes within the open-source community.

NaNK

license:apache-2.0

20,594

Seed-X-PPO-7B

SeedVR2-7B

SeedVR2: One-Step Video Restoration via Diffusion Adversarial Post-Training > Jianyi Wang, Shanchuan Lin, Zhijie Lin, Yuxi Ren, Meng Wei, Zongsheng Yue, Shangchen Zhou, Hao Chen, Yang Zhao, Ceyuan Yang, Xuefeng Xiao, Chen Change Loy, Lu Jiang > > Recent advances in diffusion-based video restoration (VR) demonstrate significant improvement in visual quality, yet yield a prohibitive computational cost during inference. While several distillation-based approaches have exhibited the potential of one-step image restoration, extending existing approaches to VR remains challenging and underexplored, due to the limited generation ability and poor temporal consistency, particularly when dealing with high-resolution video in real-world settings. In this work, we propose a one-step diffusion-based VR model, termed as SeedVR2, which performs adversarial VR training against real data. To handle the challenging high-resolution VR within a single step, we introduce several enhancements to both model architecture and training procedures. Specifically, an adaptive window attention mechanism is proposed, where the window size is dynamically adjusted to fit the output resolutions, avoiding window inconsistency observed under high-resolution VR using window attention with a predefined window size. To stabilize and improve the adversarial post-training towards VR, we further verify the effectiveness of a series of losses, including a proposed feature matching loss without significantly sacrificing training efficiency. Extensive experiments show that SeedVR2 can achieve comparable or even better performance compared with existing VR approaches in a single step. 📮 Notice Limitations: These are the prototype models and the performance may not be perfectly align with the paper. Our methods are sometimes not robust to heavy degradations and very large motions, and shares some failure cases with existing methods, e.g., fail to fully remove the degradation or simply generate unpleasing details. Moreover, due to the strong generation ability, Our methods tend to overly generate details on inputs with very light degradations, e.g., 720p AIGC videos, leading to oversharpened results occasionally. 📜 License SeedVR and SeedVR2 are licensed under the Apache 2.0.

NaNK

license:apache-2.0

2,840

Seed-OSS-36B-Base

You can get to know us better through the following channels👇 > [!NOTE] > This model card is dedicated to the `Seed-OSS-36B-Base` model. News - [2025/08/20]🔥We release `Seed-OSS-36B-Base` (both with and without synthetic data versions) and `Seed-OSS-36B-Instruct`. Introduction Seed-OSS is a series of open-source large language models developed by ByteDance's Seed Team, designed for powerful long-context, reasoning, agent and general capabilities, and versatile developer-friendly features. Although trained with only 12T tokens, Seed-OSS achieves excellent performance on several popular open benchmarks. We release this series of models to the open-source community under the Apache-2.0 license. > [!NOTE] > Seed-OSS is primarily optimized for international (i18n) use cases. Key Features - Flexible Control of Thinking Budget: Allowing users to flexibly adjust the reasoning length as needed. This capability of dynamically controlling the reasoning length enhances inference efficiency in practical application scenarios. - Enhanced Reasoning Capability: Specifically optimized for reasoning tasks while maintaining balanced and excellent general capabilities. - Agentic Intelligence: Performs exceptionally well in agentic tasks such as tool-using and issue resolving. - Research-Friendly: Given that the inclusion of synthetic instruction data in pre-training may affect the post-training research, we released pre-trained models both with and without instruction data, providing the research community with more diverse options. - Native Long Context: Trained with up-to-512K long context natively. Seed-OSS adopts the popular causal language model architecture with RoPE, GQA attention, RMSNorm and SwiGLU activation. | | | |:---:|:---:| | | Seed-OSS-36B | | Parameters | 36B | | Attention | GQA | | Activation Function | SwiGLU | | Number of Layers | 64 | | Number of QKV Heads | 80 / 8 / 8 | | Head Size | 128 | | Hidden Size | 5120 | | Vocabulary Size | 155K | | Context Length | 512K | | RoPE Base Frequency | 1e7 | Incorporating synthetic instruction data into pretraining leads to improved performance on most benchmarks. We adopt the version augmented with synthetic instruction data (i.e., w/ syn.) as `Seed-OSS-36B-Base`. We also release `Seed-OSS-36B-Base-woSyn` trained without such data (i.e., w/o syn.), offering the community a high-performance foundation model unaffected by synthetic instruction data. Benchmark Seed1.6-Base Qwen3-30B-A3B-Base-2507 Qwen2.5-32B-Base Seed-OSS-36B-Base ( w/ syn. ) Seed-OSS-36B-Base-woSyn ( w/o syn. ) - "" indicates that the results in this column are presented in the format of "reproducedresults (reportedresultsifany)". Benchmark Seed1.6-Thinking-0715 OAI-OSS-20B Qwen3-30B-A3B-Thinking-2507 Qwen3-32B Gemma3-27B Seed-OSS-36B-Instruct GPQA-D 80.7 72.2 (71.5) 71.4 (73.4) 66.7 (68.4) 42.4 71.4 LiveCodeBench v6 (02/2025-05/2025) 66.8 63.8 60.3 (66) 53.4 - 67.4 SWE-Bench Verified (OpenHands) 41.8 (60.7) 31 23.4 - 56 SWE-Bench Verified (AgentLess 410) 48.4 - 33.5 39.7 - 47 - Bold denotes open-source SOTA. Underlined indicates the second place in the open-source model. - "" indicates that the results in this column are presented in the format of "reproducedresults (reportedresultsifany)". Some results have been omitted due to the failure of the evaluation run. - The results of Gemma3-27B are sourced directly from its technical report. - The results of ArcAGI-V2 were measured on the official evaluation set, which was not involved in the training process. - Generation configs for Seed-OSS-36B-Instruct: temperature=1.1, topp=0.95. Specifically, for Taubench, temperature=1, topp=0.7. > [!NOTE] > We recommend sampling with `temperature=1.1` and `topp=0.95`. Users can flexibly specify the model's thinking budget. The figure below shows the performance curves across different tasks as the thinking budget varies. For simpler tasks (such as IFEval), the model's chain of thought (CoT) is shorter, and the score exhibits fluctuations as the thinking budget increases. For more challenging tasks (such as AIME and LiveCodeBench), the model's CoT is longer, and the score improves with an increase in the thinking budget. Here is an example with a thinking budget set to 512: during the reasoning process, the model periodically triggers self-reflection to estimate the consumed and remaining budget, and delivers the final response once the budget is exhausted or the reasoning concludes. If no thinking budget is set (default mode), Seed-OSS will initiate thinking with unlimited length. If a thinking budget is specified, users are advised to prioritize values that are integer multiples of 512 (e.g., 512, 1K, 2K, 4K, 8K, or 16K), as the model has been extensively trained on these intervals. Models are instructed to output a direct response when the thinking budget is 0, and we recommend setting any budget below 512 to this value. Download Seed-OSS checkpoint to `./Seed-OSS-36B-Instruct` Transformers The `generate.py` script provides a simple interface for model inference with configurable options. Key Parameters | Parameter | Description | |-----------|-------------| | `--modelpath` | Path to the pretrained model directory (required) | | `--prompts` | Input prompts (default: sample cooking/code questions) | | `--maxnewtokens` | Maximum tokens to generate (default: 4096) | | `--attnimplementation` | Attention mechanism: `flashattention2` (default) or `eager` | | `--loadin4bit/8bit` | Enable 4-bit/8-bit quantization (reduces memory usage) | | `--thinkingbudget` | Thinking budget in tokens (default: -1 for unlimited budget) | - First install vLLM with Seed-OSS support version: License This project is licensed under Apache-2.0. See the LICENSE flie for details. Founded in 2023, ByteDance Seed Team is dedicated to crafting the industry's most advanced AI foundation models. The team aspires to become a world-class research team and make significant contributions to the advancement of science and society.

NaNK

license:apache-2.0

2,834

Seed-Coder-8B-Base

Introduction We are thrilled to introduce Seed-Coder, a powerful, transparent, and parameter-efficient family of open-source code models at the 8B scale, featuring base, instruct, and reasoning variants. Seed-Coder contributes to promote the evolution of open code models through the following highlights. - Model-centric: Seed-Coder predominantly leverages LLMs instead of hand-crafted rules for code data filtering, minimizing manual effort in pretraining data construction. - Transparent: We openly share detailed insights into our model-centric data pipeline, including methods for curating GitHub data, commits data, and code-related web data. - Powerful: Seed-Coder achieves state-of-the-art performance among open-source models of comparable size across a diverse range of coding tasks. This repo contains the Seed-Coder-8B-Base model, with the following features: - Type: Causal language models - Training Stage: Pretraining - Data Source: GitHub data, code-related web data - Training Tokens: 6 trillion - Supports: Code completion, code infilling (Fill-in-the-Middle) - Context Length: 32,768 Model Downloads | Model Name | Length | Download | Notes | |---------------------------------------------------------|--------|------------------------------------|-----------------------| | 👉 Seed-Coder-8B-Base | 32K | 🤗 Model | Pretrained on our model-centric code data. | | Seed-Coder-8B-Instruct | 32K | 🤗 Model | Instruction-tuned for alignment with user intent. | | Seed-Coder-8B-Reasoning | 64K | 🤗 Model | RL trained to boost reasoning capabilities. | | Seed-Coder-8B-Reasoning-bf16 | 64K | 🤗 Model | RL trained to boost reasoning capabilities. | Requirements You will need to install the latest versions of `transformers` and `accelerate`: Here is a simple example demonstrating how to load the model and perform code generation using the Hugging Face `pipeline` API: Seed-Coder-8B-Base natively supports Fill-in-the-Middle (FIM) tasks, where the model is given a prefix and a suffix and asked to predict the missing middle content. This allows for code infilling scenarios such as completing a function body or inserting missing logic between two pieces of code. Seed-Coder-8B-Base has been evaluated on code generation, code completion, and code reasoning benchmarks, achieving state-of-the-art performance among ~8B open-source models. | | DeepSeek-Coder-6.7B-Base | OpenCoder-8B-Base | Qwen2.5-Coder-7B | Seed-Coder-8B-Base | |------------|:------------------------:|:-----------------:|:----------------:|:------------------:| | HumanEval | 47.6 | 66.5 | 72.0 | 77.4 | | MBPP | 70.2 | 79.9 | 79.4 | 82.0 | | MultiPL-E | 44.7 | 61.0 | 58.8 | 67.6 | | cruxeval-O | 41.0 | 43.9 | 56.0 | 54.8 | For detailed benchmark performance, please refer to our 📑 Technical Report. This project is licensed under the MIT License. See the LICENSE file for details. If you find Seed-Coder helpful, please consider citing our work:

NaNK

llama

2,350

Seed-X-PPO-7B-GPTQ-Int8

Introduction We are excited to introduce Seed-X, a powerful series of open-source multilingual translation language models, including an instruction model, a reinforcement learning model, and a reward model. It pushes the boundaries of translation capabilities within 7 billion parameters. We develop Seed-X as an accessible, off-the-shelf tool to support the community in advancing translation research and applications: Exceptional translation capabilities: Seed-X exhibits state-of-the-art translation capabilities, on par with or outperforming ultra-large models like Gemini-2.5, Claude-3.5, and GPT-4, as validated by human evaluations and automatic metrics. Deployment and inference-friendly: With a compact 7B parameter count and mistral architecture, Seed-X offers outstanding translation performance in a lightweight and efficient package, ideal for deployment and inference. Broad domain coverage: Seed-X excels on a highly challenging translation test set spanning diverse domains, including the internet, science and technology, office dialogues, e-commerce, biomedicine, finance, law, literature, and entertainment. This repo contains the Seed-X-PPO-7B-GPTQ-Int8 model, with the following features: Type: Causal language models Training Stage: Pretraining & Post-training Support: Multilingual translation among 28 languages Quantization: GPTQ 8-bit （We recommend using Seed-X-PPO model, as its translation performance is superior to Seed-X-Instruct.） | Languages | Abbr. | Languages | Abbr. | Languages | Abbr. | Languages | Abbr. | | ----------- | ----------- |-----------|-----------|-----------|-----------| -----------|-----------| |Arabic | ar |French | fr | Malay | ms | Russian | ru | |Czech | cs |Croatian | hr | Norwegian Bokmal | nb | Swedish | sv | |Danish | da |Hungarian | hu | Dutch | nl | Thai | th | |German | de |Indonesian | id | Norwegian | no | Turkish | tr | |English | en |Italian | it | Polish | pl | Ukrainian | uk | |Spanish | es |Japanese | ja | Portuguese | pt | Vietnamese | vi | |Finnish | fi |Korean | ko | Romanian | ro | Chinese | zh | Model Downloads | Model Name | Description | Download | | ----------- | ----------- |----------- | Seed-X-Instruct | Instruction-tuned for alignment with user intent. |🤗 Model| | Seed-X-PPO | RL trained to boost translation capabilities. | 🤗 Model| | 👉 Seed-X-PPO-GPTQ-Int8 | Quantization: GPTQ 8-bit. | 🤗 Model| | Seed-X-PPO-AWQ-Int4 | Quantization: AWQ 4-bit. | 🤗 Model| |Seed-X-RM | Reward model to evaluate the quality of translation.| 🤗 Model| 📮 Notice The language tags at the end of the prompt is necessary, which are used in PPO training. For example, when the target language is German, \ needs to be added. You can refer to the above table for language abbreviations. This model is specialized in multilingual translation, which is unexpected to support other tasks. We don't have any chat template, thus you don't have to perform . Please avoid prompting the model in a multi-round conversation format. We recommend against using unofficial quantized versions for local deployment. We will soon release an official quantized model and develop a demo on Hugging Face Space. Here is a simple example demonstrating how to load the model and perform translation using Evaluation We evaluated Seed-X on a diverse set of translation benchmarks, including FLORES-200, WMT-25, and a publicly released challenge set accompanied by human evaluations. For detailed benchmark results and analysis, please refer to our Technical Report. License This project is licensed under OpenMDW. See the LICENSE file for details. Citation If you find Seed-X useful for your research and applications, feel free to give us a star ⭐ or cite us using:

NaNK

—

1,623

UI-TARS-7B-DPO

UI-TARS-2B-SFT

NaNK

license:apache-2.0

1,281

SeedVR2-3B

NaNK

license:apache-2.0

867

BAGEL-7B-MoT

🥯 BAGEL • Unified Model for Multimodal Understanding and Generation > We present BAGEL, an open‑source multimodal foundation model with 7B active parameters (14B total) trained on large‑scale interleaved multimodal data. BAGEL outperforms the current top‑tier open‑source VLMs like Qwen2.5-VL and InternVL-2.5 on standard multimodal understanding leaderboards, and delivers text‑to‑image quality that is competitive with strong specialist generators such as SD3. Moreover, BAGEL demonstrates supe...

BFS-Prover-V1-7B

NaNK

license:apache-2.0

494

Stable-DiffCoder-8B-Instruct

UI-TARS-72B-DPO

UI-TARS-72B-DPO UI-TARS-2B-SFT  |  UI-TARS-7B-SFT  |  UI-TARS-7B-DPO(Recommended)  |  UI-TARS-72B-SFT  |  UI-TARS-72B-DPO(Recommended) Introduction UI-TARS is a next-generation native GUI agent model designed to interact seamlessly with graphical user interfaces (GUIs) using human-like perception, reasoning, and action capabilities. Unlike traditional modular frameworks, UI-TARS integrates all key components—perception, reasoning, grounding, and memory—within a single vision-language model (VLM), enabling end-to-end task automation without predefined workflows or manual rules. This repository contains the model for the paper UI-TARS: Pioneering Automated GUI Interaction with Native Agents. Performance Perception Capabilty Evaluation | Model | VisualWebBench | WebSRC | SQAshort | |---------------------------|---------------|---------|----------| | Qwen2-VL-7B | 73.3 | 81.8 | 84.9 | | Qwen-VL-Max | 74.1 | 91.1 | 78.6 | | Gemini-1.5-Pro | 75.4 | 88.9 | 82.2 | | UIX-Qwen2-7B | 75.9 | 82.9 | 78.8 | | Claude-3.5-Sonnet | 78.2 | 90.4 | 83.1 | | GPT-4o | 78.5 | 87.7 | 82.3 | | UI-TARS-2B | 72.9 | 89.2 | 86.4 | | UI-TARS-7B | 79.7 | 93.6 | 87.7 | | UI-TARS-72B | 82.8 | 89.3 | 88.6 | | Agent Model | Dev-Text | Dev-Icon | Dev-Avg | Creative-Text | Creative-Icon | Creative-Avg | CAD-Text | CAD-Icon | CAD-Avg | Scientific-Text | Scientific-Icon | Scientific-Avg | Office-Text | Office-Icon | Office-Avg | OS-Text | OS-Icon | OS-Avg | Avg-Text | Avg-Icon | Avg | |--------------------------|----------|----------|----------|--------------|--------------|--------------|---------|---------|---------|---------------|---------------|---------------|------------|------------|------------|--------|--------|--------|---------|---------|------| | QwenVL-7B | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.7 | 0.0 | 0.4 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.1 | 0.0 | 0.1 | | GPT-4o | 1.3 | 0.0 | 0.7 | 1.0 | 0.0 | 0.6 | 2.0 | 0.0 | 1.5 | 2.1 | 0.0 | 1.2 | 1.1 | 0.0 | 0.9 | 0.0 | 0.0 | 0.0 | 1.3 | 0.0 | 0.8 | | SeeClick | 0.6 | 0.0 | 0.3 | 1.0 | 0.0 | 0.6 | 2.5 | 0.0 | 1.9 | 3.5 | 0.0 | 2.0 | 1.1 | 0.0 | 0.9 | 2.8 | 0.0 | 1.5 | 1.8 | 0.0 | 1.1 | | Qwen2-VL-7B | 2.6 | 0.0 | 1.3 | 1.5 | 0.0 | 0.9 | 0.5 | 0.0 | 0.4 | 6.3 | 0.0 | 3.5 | 3.4 | 1.9 | 3.0 | 0.9 | 0.0 | 0.5 | 2.5 | 0.2 | 1.6 | | OS-Atlas-4B | 7.1 | 0.0 | 3.7 | 3.0 | 1.4 | 2.3 | 2.0 | 0.0 | 1.5 | 9.0 | 5.5 | 7.5 | 5.1 | 3.8 | 4.8 | 5.6 | 0.0 | 3.1 | 5.0 | 1.7 | 3.7 | | ShowUI-2B | 16.9 | 1.4 | 9.4 | 9.1 | 0.0 | 5.3 | 2.5 | 0.0 | 1.9 | 13.2 | 7.3 | 10.6 | 15.3 | 7.5 | 13.5 | 10.3 | 2.2 | 6.6 | 10.8 | 2.6 | 7.7 | | CogAgent-18B | 14.9 | 0.7 | 8.0 | 9.6 | 0.0 | 5.6 | 7.1 | 3.1 | 6.1 | 22.2 | 1.8 | 13.4 | 13.0 | 0.0 | 10.0 | 5.6 | 0.0 | 3.1 | 12.0 | 0.8 | 7.7 | | Aria-UI | 16.2 | 0.0 | 8.4 | 23.7 | 2.1 | 14.7 | 7.6 | 1.6 | 6.1 | 27.1 | 6.4 | 18.1 | 20.3 | 1.9 | 16.1 | 4.7 | 0.0 | 2.6 | 17.1 | 2.0 | 11.3 | | UGround-7B | 26.6 | 2.1 | 14.7 | 27.3 | 2.8 | 17.0 | 14.2 | 1.6 | 11.1 | 31.9 | 2.7 | 19.3 | 31.6 | 11.3 | 27.0 | 17.8 | 0.0 | 9.7 | 25.0 | 2.8 | 16.5 | | Claude Computer Use | 22.0 | 3.9 | 12.6 | 25.9 | 3.4 | 16.8 | 14.5 | 3.7 | 11.9 | 33.9 | 15.8 | 25.8 | 30.1 | 16.3 | 26.9 | 11.0 | 4.5 | 8.1 | 23.4 | 7.1 | 17.1 | | OS-Atlas-7B | 33.1 | 1.4 | 17.7 | 28.8 | 2.8 | 17.9 | 12.2 | 4.7 | 10.3 | 37.5 | 7.3 | 24.4 | 33.9 | 5.7 | 27.4 | 27.1 | 4.5 | 16.8 | 28.1 | 4.0 | 18.9 | | UGround-V1-7B | - | - | 35.5 | - | - | 27.8 | - | - | 13.5 | - | - | 38.8 | - | - | 48.8 | - | - | 26.1 | - | - | 31.1 | | UI-TARS-2B | 47.4 | 4.1 | 26.4 | 42.9 | 6.3 | 27.6 | 17.8 | 4.7 | 14.6 | 56.9 | 17.3 | 39.8 | 50.3 | 17.0 | 42.6 | 21.5 | 5.6 | 14.3 | 39.6 | 8.4 | 27.7 | | UI-TARS-7B | 58.4 | 12.4 | 36.1 | 50.0 | 9.1 | 32.8 | 20.8| 9.4 | 18.0| 63.9 | 31.8 | 50.0 | 63.3 | 20.8 | 53.5 | 30.8 | 16.9| 24.5 | 47.8 | 16.2 | 35.7 | | UI-TARS-72B | 63.0 | 17.3 | 40.8 | 57.1 | 15.4 | 39.6 | 18.8 | 12.5| 17.2 | 64.6 | 20.9 | 45.7 | 63.3 | 26.4 | 54.8 | 42.1| 15.7 | 30.1| 50.9| 17.5| 38.1 | | Method | Mobile-Text | Mobile-Icon/Widget | Desktop-Text | Desktop-Icon/Widget | Web-Text | Web-Icon/Widget | Avg | |--------|-------------|-------------|-------------|-------------|-------------|---------|---------| | Agent Framework | | | | | | | | | GPT-4 (SeeClick) | 76.6 | 55.5 | 68.0 | 28.6 | 40.9 | 23.3 | 48.8 | | GPT-4 (OmniParser) | 93.9 | 57.0 | 91.3 | 63.6 | 81.3 | 51.0 | 73.0 | | GPT-4 (UGround-7B) | 90.1 | 70.3 | 87.1 | 55.7 | 85.7 | 64.6 | 75.6 | | GPT-4o (SeeClick) | 81.0 | 59.8 | 69.6 | 33.6 | 43.9 | 26.2 | 52.3 | | GPT-4o (UGround-7B) | 93.4 | 76.9 | 92.8 | 67.9 | 88.7 | 68.9 | 81.4 | | Agent Model | | | | | | | | | GPT-4 | 22.6 | 24.5 | 20.2 | 11.8 | 9.2 | 8.8 | 16.2 | | GPT-4o | 20.2 | 24.9 | 21.1 | 23.6 | 12.2 | 7.8 | 18.3 | | CogAgent | 67.0 | 24.0 | 74.2 | 20.0 | 70.4 | 28.6 | 47.4 | | SeeClick | 78.0 | 52.0 | 72.2 | 30.0 | 55.7 | 32.5 | 53.4 | | Qwen2-VL | 75.5 | 60.7 | 76.3 | 54.3 | 35.2 | 25.7 | 55.3 | | UGround-7B | 82.8 | 60.3 | 82.5 | 63.6 | 80.4 | 70.4 | 73.3 | | Aguvis-G-7B | 88.3 | 78.2 | 88.1 | 70.7 | 85.7 | 74.8 | 81.8 | | OS-Atlas-7B | 93.0 | 72.9 | 91.8 | 62.9 | 90.9 | 74.3 | 82.5 | | Claude Computer Use | - | - | - | - | - | - | 83.0 | | Gemini 2.0 (Project Mariner) | - | - | - | - | - | - | 84.0 | | Aguvis-7B | 95.6 | 77.7 | 93.8 | 67.1 | 88.3 | 75.2 | 84.4 | | Aguvis-72B | 94.5 | 85.2 | 95.4 | 77.9 | 91.3 | 85.9 | 89.2 | | Our Model | | | | | | | | | UI-TARS-2B | 93.0 | 75.5 | 90.7 | 68.6 | 84.3 | 74.8 | 82.3 | | UI-TARS-7B | 94.5 | 85.2 | 95.9 | 85.7 | 90.0 | 83.5 | 89.5 | | UI-TARS-72B | 94.9 | 82.5 | 89.7 | 88.6 | 88.7 | 85.0 | 88.4 | | Method | Mobile-Text | Mobile-Icon/Widget | Desktop-Text | Desktop-Icon/Widget | Web-Text | Web-Icon/Widget | Avg | |--------|-------------|-------------|-------------|-------------|-------------|---------|---------| | Agent Framework | | | | | | | | | GPT-4o (SeeClick) | 85.2 | 58.8 | 79.9 | 37.1 | 72.7 | 30.1 | 63.6 | | GPT-4o (OS-Atlas-4B) | 95.5 | 75.8 | 79.4 | 49.3 | 90.2 | 66.5 | 79.1 | | GPT-4o (OS-Atlas-7B) | 96.2 | 83.4 | 89.7 | 69.3 | 94.0 | 79.8 | 87.1 | | Agent Model | | | | | | | | | SeeClick | 78.4 | 50.7 | 70.1 | 29.3 | 55.2 | 32.5 | 55.1 | | OS-Atlas-4B | 87.2 | 59.7 | 72.7 | 46.4 | 85.9 | 63.1 | 71.9 | | OS-Atlas-7B | 95.2 | 75.8 | 90.7 | 63.6 | 90.6 | 77.3 | 84.1 | | Our Model | | | | | | | | | UI-TARS-2B | 95.2 | 79.1 | 90.7 | 68.6 | 87.2 | 78.3 | 84.7 | | UI-TARS-7B | 96.9 | 89.1 | 95.4 | 85.0 | 93.6 | 85.2 | 91.6 | | UI-TARS-72B | 94.8 | 86.3 | 91.2 | 87.9 | 91.5 | 87.7 | 90.3 | Offline Agent Capability Evaluation - Multimodal Mind2Web | Method | Cross-Task Ele.Acc | Cross-Task Op.F1 | Cross-Task Step SR | Cross-Website Ele.Acc | Cross-Website Op.F1 | Cross-Website Step SR | Cross-Domain Ele.Acc | Cross-Domain Op.F1 | Cross-Domain Step SR | |--------|----------------------|-------------------|--------------------|----------------------|--------------------|-------------------|--------------------|-------------------|-------------------| | Agent Framework | | | | | | | | | | | GPT-4o (SeeClick) | 32.1 | - | - | 33.1 | - | - | 33.5 | - | - | | GPT-4o (UGround) | 47.7 | - | - | 46.0 | - | - | 46.6 | - | - | | GPT-4o (Aria-UI) | 57.6 | - | - | 57.7 | - | - | 61.4 | - | - | | GPT-4V (OmniParser) | 42.4 | 87.6 | 39.4 | 41.0 | 84.8 | 36.5 | 45.5 | 85.7 | 42.0 | | Agent Model | | | | | | | | | | | GPT-4o | 5.7 | 77.2 | 4.3 | 5.7 | 79.0 | 3.9 | 5.5 | 86.4 | 4.5 | | GPT-4 (SOM) | 29.6 | - | 20.3 | 20.1 | - | 13.9 | 27.0 | - | 23.7 | | GPT-3.5 (Text-only) | 19.4 | 59.2 | 16.8 | 14.9 | 56.5 | 14.1 | 25.2 | 57.9 | 24.1 | | GPT-4 (Text-only) | 40.8 | 63.1 | 32.3 | 30.2 | 61.0 | 27.0 | 35.4 | 61.9 | 29.7 | | Claude | 62.7 | 84.7 | 53.5 | 59.5 | 79.6 | 47.7 | 64.5 | 85.4 | 56.4 | | Aguvis-7B | 64.2 | 89.8 | 60.4 | 60.7 | 88.1 | 54.6 | 60.4 | 89.2 | 56.6 | | CogAgent | - | - | 62.3 | - | - | 54.0 | - | - | 59.4 | | Aguvis-72B | 69.5 | 90.8 | 64.0 | 62.6 | 88.6 | 56.5 | 63.5 | 88.5 | 58.2 | | Our Model | | | | | | | | | | | UI-TARS-2B | 62.3 | 90.0 | 56.3 | 58.5 | 87.2 | 50.8 | 58.8 | 89.6 | 52.3 | | UI-TARS-7B | 73.1 | 92.2 | 67.1 | 68.2 | 90.9 | 61.7 | 66.6 | 90.9 | 60.5 | | UI-TARS-72B | 74.7 | 92.5 | 68.6 | 72.4 | 91.2 | 63.5 | 68.9 | 91.8 | 62.1 | | Agent Models | AndroidControl-Low Type | AndroidControl-Low Grounding | AndroidControl-Low SR | AndroidControl-High Type | AndroidControl-High Grounding | AndroidControl-High SR | GUIOdyssey Type | GUIOdyssey Grounding | GUIOdyssey SR | |---------------------|----------------------|----------------------|----------------|----------------------|----------------------|----------------|----------------|----------------|----------------| | Claude | 74.3 | 0.0 | 19.4 | 63.7 | 0.0 | 12.5 | 60.9 | 0.0 | 3.1 | | GPT-4o | 74.3 | 0.0 | 19.4 | 66.3 | 0.0 | 20.8 | 34.3 | 0.0 | 3.3 | | SeeClick | 93.0 | 73.4 | 75.0 | 82.9 | 62.9 | 59.1 | 71.0 | 52.4 | 53.9 | | InternVL-2-4B | 90.9 | 84.1 | 80.1 | 84.1 | 72.7 | 66.7 | 82.1 | 55.5 | 51.5 | | Qwen2-VL-7B | 91.9 | 86.5 | 82.6 | 83.8 | 77.7 | 69.7 | 83.5 | 65.9 | 60.2 | | Aria-UI | -- | 87.7 | 67.3 | -- | 43.2 | 10.2 | -- | 86.8 | 36.5 | | OS-Atlas-4B | 91.9 | 83.8 | 80.6 | 84.7 | 73.8 | 67.5 | 83.5 | 61.4 | 56.4 | | OS-Atlas-7B | 93.6 | 88.0 | 85.2 | 85.2 | 78.5 | 71.2 | 84.5 | 67.8 | 62.0 | | Aguvis-7B | -- | -- | 80.5 | -- | -- | 61.5 | -- | -- | -- | | Aguvis-72B | -- | -- | 84.4 | -- | -- | 66.4 | -- | -- | -- | | UI-TARS-2B | 98.1 | 87.3 | 89.3 | 81.2 | 78.4 | 68.9 | 93.9 | 86.8 | 83.4 | | UI-TARS-7B | 98.0 | 89.3 | 90.8 | 83.7 | 80.5 | 72.5 | 94.6 | 90.1 | 87.0 | | UI-TARS-72B | 98.1 | 89.9 | 91.3 | 85.2 | 81.5 | 74.7 | 95.4 | 91.4 | 88.6 | | Method | OSWorld (Online) | AndroidWorld (Online) | |--------|-------------------|------------------| | Agent Framework | | | | GPT-4o (UGround) | - | 32.8 | | GPT-4o (Aria-UI) | 15.2 | 44.8 | | GPT-4o (Aguvis-7B) | 14.8 | 37.1 | | GPT-4o (Aguvis-72B) | 17.0 | - | | GPT-4o (OS-Atlas-7B) | 14.6 | - | | Agent Model | | | | GPT-4o | 5.0 | 34.5 (SoM) | | Gemini-Pro-1.5 | 5.4 | 22.8 (SoM) | | Aguvis-72B | 10.3 | 26.1 | | Claude Computer-Use | 14.9 (15 steps) | 27.9 | | Claude Computer-Use | 22.0 (50 steps) | - | | Our Model | | | | UI-TARS-7B-SFT | 17.7 (15 steps) | 33.0 | | UI-TARS-7B-DPO | 18.7 (15 steps) | - | | UI-TARS-72B-SFT | 18.8 (15 steps) | 46.6 | | UI-TARS-72B-DPO | 22.7 (15 steps) | - | | UI-TARS-72B-DPO | 24.6 (50 steps) | - | Citation If you find our paper and model useful in your research, feel free to give us a cite.

Stable-DiffCoder-8B-Base

NaNK

llama

423

BFS-Prover-V2-7B

BFS-Prover-V2: Scaling up Multi-Turn Off-Policy RL and Multi-Agent Tree Search for LLM Step-Provers We introduce BFS-Prover-V2, the state-of-the-art open-source step-level theorem proving system for Lean4, designed to address the dual challenges of scaling both training and inference in neural theorem proving. BFS-Prover-V2 introduces novel solutions to overcome these limitations through: 1. Training-time scaling: A novel multi-stage expert iteration framework with adaptive tactic-level data filtering and periodic retraining to surmount the performance plateaus that typically curtail long-term post training 2. Inference-time scaling: A planner-enhanced multi-agent tree search system for hierarchical reasoning that scales performance at inference time BFS-Prover-V2 achieves 95.08\% and 41.4\% on the miniF2F and ProofNet test sets respectively, setting a new state-of-the-art for step-level provers. This repo contains the BFS-Prover-V2-7B model, with the following features: - Base Model: Qwen2.5-Math-7B - Training Approach: Multi-stage expert iteration with best-first tree search - Training Data Sources: - Mathlib (via LeanDojo) - Lean-Github repositories - Autoformalized NuminaMath datasets - Goedel-Pset | Model | miniF2F-test | miniF2F-valid | ProofNet-test | |:------|:------------:|:-------------:|:-------------:| | 👉 BFS-Prover-V2-7B | 82.4% | - | - | | BFS-Prover-V2-32B | 86.1% | 85.5% | 41.4% | | BFS-Prover-V2-32B w/ Planner | 95.08% | 95.5% | - | Usage - The model expects input in the format `"{state}:::"` where {state} is a Lean4 tactic state. - `:::` serves as a special indicator to signal the model to generate a tactic for the given state. - The model will echo back the input state followed by the generated tactic. This project is licensed under the Apache License 2.0. For questions and feedback about the tactic generator model, please contact: - Ran Xin ([email protected]) - Zeyu Zheng ([email protected])

NaNK

license:apache-2.0

394

Seed-OSS-36B-Base-woSyn

You can get to know us better through the following channels👇 > [!NOTE] > This model card is dedicated to the `Seed-OSS-36B-Base-woSyn` model. News - [2025/08/20]🔥We release `Seed-OSS-36B-Base` (both with and without synthetic data versions) and `Seed-OSS-36B-Instruct`. Introduction Seed-OSS is a series of open-source large language models developed by ByteDance's Seed Team, designed for powerful long-context, reasoning, agent and general capabilities, and versatile developer-friendly features. Although trained with only 12T tokens, Seed-OSS achieves excellent performance on several popular open benchmarks. We release this series of models to the open-source community under the Apache-2.0 license. > [!NOTE] > Seed-OSS is primarily optimized for international (i18n) use cases. Key Features - Flexible Control of Thinking Budget: Allowing users to flexibly adjust the reasoning length as needed. This capability of dynamically controlling the reasoning length enhances inference efficiency in practical application scenarios. - Enhanced Reasoning Capability: Specifically optimized for reasoning tasks while maintaining balanced and excellent general capabilities. - Agentic Intelligence: Performs exceptionally well in agentic tasks such as tool-using and issue resolving. - Research-Friendly: Given that the inclusion of synthetic instruction data in pre-training may affect the post-training research, we released pre-trained models both with and without instruction data, providing the research community with more diverse options. - Native Long Context: Trained with up-to-512K long context natively. Seed-OSS adopts the popular causal language model architecture with RoPE, GQA attention, RMSNorm and SwiGLU activation. | | | |:---:|:---:| | | Seed-OSS-36B | | Parameters | 36B | | Attention | GQA | | Activation Function | SwiGLU | | Number of Layers | 64 | | Number of QKV Heads | 80 / 8 / 8 | | Head Size | 128 | | Hidden Size | 5120 | | Vocabulary Size | 155K | | Context Length | 512K | | RoPE Base Frequency | 1e7 | Incorporating synthetic instruction data into pretraining leads to improved performance on most benchmarks. We adopt the version augmented with synthetic instruction data (i.e., w/ syn.) as `Seed-OSS-36B-Base`. We also release `Seed-OSS-36B-Base-woSyn` trained without such data (i.e., w/o syn.), offering the community a high-performance foundation model unaffected by synthetic instruction data. Benchmark Seed1.6-Base Qwen3-30B-A3B-Base-2507 Qwen2.5-32B-Base Seed-OSS-36B-Base ( w/ syn. ) Seed-OSS-36B-Base-woSyn ( w/o syn. ) - "" indicates that the results in this column are presented in the format of "reproducedresults (reportedresultsifany)". Benchmark Seed1.6-Thinking-0715 OAI-OSS-20B Qwen3-30B-A3B-Thinking-2507 Qwen3-32B Gemma3-27B Seed-OSS-36B-Instruct GPQA-D 80.7 72.2 (71.5) 71.4 (73.4) 66.7 (68.4) 42.4 71.4 LiveCodeBench v6 (02/2025-05/2025) 66.8 63.8 60.3 (66) 53.4 - 67.4 SWE-Bench Verified (OpenHands) 41.8 (60.7) 31 23.4 - 56 SWE-Bench Verified (AgentLess 410) 48.4 - 33.5 39.7 - 47 - Bold denotes open-source SOTA. Underlined indicates the second place in the open-source model. - "" indicates that the results in this column are presented in the format of "reproducedresults (reportedresultsifany)". Some results have been omitted due to the failure of the evaluation run. - The results of Gemma3-27B are sourced directly from its technical report. - The results of ArcAGI-V2 were measured on the official evaluation set, which was not involved in the training process. - Generation configs for Seed-OSS-36B-Instruct: temperature=1.1, topp=0.95. Specifically, for Taubench, temperature=1, topp=0.7. > [!NOTE] > We recommend sampling with `temperature=1.1` and `topp=0.95`. Users can flexibly specify the model's thinking budget. The figure below shows the performance curves across different tasks as the thinking budget varies. For simpler tasks (such as IFEval), the model's chain of thought (CoT) is shorter, and the score exhibits fluctuations as the thinking budget increases. For more challenging tasks (such as AIME and LiveCodeBench), the model's CoT is longer, and the score improves with an increase in the thinking budget. Here is an example with a thinking budget set to 512: during the reasoning process, the model periodically triggers self-reflection to estimate the consumed and remaining budget, and delivers the final response once the budget is exhausted or the reasoning concludes. If no thinking budget is set (default mode), Seed-OSS will initiate thinking with unlimited length. If a thinking budget is specified, users are advised to prioritize values that are integer multiples of 512 (e.g., 512, 1K, 2K, 4K, 8K, or 16K), as the model has been extensively trained on these intervals. Models are instructed to output a direct response when the thinking budget is 0, and we recommend setting any budget below 512 to this value. Download Seed-OSS checkpoint to `./Seed-OSS-36B-Instruct` Transformers The `generate.py` script provides a simple interface for model inference with configurable options. Key Parameters | Parameter | Description | |-----------|-------------| | `--modelpath` | Path to the pretrained model directory (required) | | `--prompts` | Input prompts (default: sample cooking/code questions) | | `--maxnewtokens` | Maximum tokens to generate (default: 4096) | | `--attnimplementation` | Attention mechanism: `flashattention2` (default) or `eager` | | `--loadin4bit/8bit` | Enable 4-bit/8-bit quantization (reduces memory usage) | | `--thinkingbudget` | Thinking budget in tokens (default: -1 for unlimited budget) | - First install vLLM with Seed-OSS support version: License This project is licensed under Apache-2.0. See the LICENSE flie for details. Founded in 2023, ByteDance Seed Team is dedicated to crafting the industry's most advanced AI foundation models. The team aspires to become a world-class research team and make significant contributions to the advancement of science and society.

NaNK

license:apache-2.0

364

BFS-Prover-V2-32B

BFS-Prover-V2: Scaling up Multi-Turn Off-Policy RL and Multi-Agent Tree Search for LLM Step-Provers We introduce BFS-Prover-V2, the state-of-the-art open-source step-level theorem proving system for Lean4, designed to address the dual challenges of scaling both training and inference in neural theorem proving. BFS-Prover-V2 introduces novel solutions to overcome these limitations through: 1. Training-time scaling: A novel multi-stage expert iteration framework with adaptive tactic-level data filtering and periodic retraining to surmount the performance plateaus that typically curtail long-term post training 2. Inference-time scaling: A planner-enhanced multi-agent tree search system for hierarchical reasoning that scales performance at inference time BFS-Prover-V2 achieves 95.08\% and 41.4\% on the miniF2F and ProofNet test sets respectively, setting a new state-of-the-art for step-level provers. This repo contains the BFS-Prover-V2-32B model, with the following features: - Base Model: Qwen2.5-32B - Training Approach: Multi-stage expert iteration with best-first tree search - Training Data Sources: - Mathlib (via LeanDojo) - Lean-Github repositories - Autoformalized NuminaMath datasets - Goedel-Pset | Model | miniF2F-test | miniF2F-valid | ProofNet-test | |:------|:------------:|:-------------:|:-------------:| | BFS-Prover-V2-7B | 82.4% | - | - | | 👉 BFS-Prover-V2-32B | 86.1% | 85.5% | 41.4% | | 👉 BFS-Prover-V2-32B w/ Planner | 95.08% | 95.5% | - | Usage - The model expects input in the format `"{state}:::"` where {state} is a Lean4 tactic state. - `:::` serves as a special indicator to signal the model to generate a tactic for the given state. - The model will echo back the input state followed by the generated tactic. This project is licensed under the Apache License 2.0. For questions and feedback about the tactic generator model, please contact: - Ran Xin ([email protected]) - Zeyu Zheng ([email protected])

NaNK

license:apache-2.0

355

Tar-7B

NaNK

license:apache-2.0

336

AHN-Mamba2-for-Qwen-2.5-Instruct-3B

NaNK

—

269

M3-Agent-Control

license:apache-2.0

231

Seed-Coder-8B-Reasoning

Seed-X-RM-7B

NaNK

—

204

M3-Agent-Memorization

license:apache-2.0

203

Seed-X-PPO-7B-AWQ-Int4

Introduction We are excited to introduce Seed-X, a powerful series of open-source multilingual translation language models, including an instruction model, a reinforcement learning model, and a reward model. It pushes the boundaries of translation capabilities within 7 billion parameters. We develop Seed-X as an accessible, off-the-shelf tool to support the community in advancing translation research and applications: Exceptional translation capabilities: Seed-X exhibits state-of-the-art translation capabilities, on par with or outperforming ultra-large models like Gemini-2.5, Claude-3.5, and GPT-4, as validated by human evaluations and automatic metrics. Deployment and inference-friendly: With a compact 7B parameter count and mistral architecture, Seed-X offers outstanding translation performance in a lightweight and efficient package, ideal for deployment and inference. Broad domain coverage: Seed-X excels on a highly challenging translation test set spanning diverse domains, including the internet, science and technology, office dialogues, e-commerce, biomedicine, finance, law, literature, and entertainment. This repo contains the Seed-X-PPO-7B-AWQ-Int4 model, with the following features: Type: Causal language models Training Stage: Pretraining & Post-training Support: Multilingual translation among 28 languages Quantization: GPTQ 4-bit （We recommend using Seed-X-PPO model, as its translation performance is superior to Seed-X-Instruct.） | Languages | Abbr. | Languages | Abbr. | Languages | Abbr. | Languages | Abbr. | | ----------- | ----------- |-----------|-----------|-----------|-----------| -----------|-----------| |Arabic | ar |French | fr | Malay | ms | Russian | ru | |Czech | cs |Croatian | hr | Norwegian Bokmal | nb | Swedish | sv | |Danish | da |Hungarian | hu | Dutch | nl | Thai | th | |German | de |Indonesian | id | Norwegian | no | Turkish | tr | |English | en |Italian | it | Polish | pl | Ukrainian | uk | |Spanish | es |Japanese | ja | Portuguese | pt | Vietnamese | vi | |Finnish | fi |Korean | ko | Romanian | ro | Chinese | zh | Model Downloads | Model Name | Description | Download | | ----------- | ----------- |----------- | Seed-X-Instruct | Instruction-tuned for alignment with user intent. |🤗 Model| | Seed-X-PPO | RL trained to boost translation capabilities. | 🤗 Model| | Seed-X-PPO-GPTQ-Int8 | Quantization: GPTQ 8-bit. | 🤗 Model| | 👉 Seed-X-PPO-AWQ-Int4 | Quantization: AWQ 4-bit. | 🤗 Model| |Seed-X-RM | Reward model to evaluate the quality of translation.| 🤗 Model| 📮 Notice The language tags at the end of the prompt is necessary, which are used in PPO training. For example, when the target language is German, \ needs to be added. You can refer to the above table for language abbreviations. This model is specialized in multilingual translation, which is unexpected to support other tasks. We don't have any chat template, thus you don't have to perform . Please avoid prompting the model in a multi-round conversation format. We recommend against using unofficial quantized versions for local deployment. We will soon release an official quantized model and develop a demo on Hugging Face Space. Here is a simple example demonstrating how to load the model and perform translation using Evaluation We evaluated Seed-X on a diverse set of translation benchmarks, including FLORES-200, WMT-25, and a publicly released challenge set accompanied by human evaluations. For detailed benchmark results and analysis, please refer to our Technical Report. License This project is licensed under OpenMDW. See the LICENSE file for details. Citation If you find Seed-X useful for your research and applications, feel free to give us a star ⭐ or cite us using:

NaNK

—

195

SAIL-7B

NaNK

license:apache-2.0

178

cudaLLM-8B

CudaLLM: A Language Model for High-Performance CUDA Kernel Generation Model Description cudaLLM-8B is a language model for generating high-performance and syntactically correct CUDA kernels. It is based on the Qwen3-8B model and has undergone a two-stage training process to master the complexities of parallel programming for GPUs. Performance on KernelBench: | | Bo1 | Bo2 | Bo4 | Bo8 | Bo16 | |---------|-------|-----|-----|-----|------| | Level-1 | 79.75 | 83 | 84 | 86 | 87 | | Level-2 | 67.30 | 70 | 71 | 72 | 73 | | Level-3 | 20.83 | 26 | 30 | 34 | 36 | Training Procedure The model was trained using the verl library. The model was trained and evaluated on: - SFT Dataset: A high-quality dataset of CUDA problem-solution pairs (sftcudallmr1.parquet), originally generated by DeepSeek R1, DeepSeel Coder-7B, and Qwen2-32B. - RL Dataset: A refined dataset (rlcudallm0424.parquet) used to provide performance-based rewards during the RL stage. - Evaluation Dataset: The model's performance was benchmarked against the KernelBench dataset. Intended Use and Limitations Intended Use The primary use of CudaLLM is to assist developers in writing and optimizing high-performance CUDA kernels. It can be used for: - Accelerating scientific computing and machine learning workloads. - As a co-pilot or productivity tool for HPC and CUDA developers. - Research into AI-driven code generation and optimization. Limitations and Bias - Correctness is Not Guaranteed: While trained to produce correct code, the model's output should always be rigorously tested and verified before deployment in production systems. - Security Risks: The generated code is not guaranteed to be secure. Never run model-generated code from an untrusted source without careful inspection. - Performance Variability: Kernel performance can vary significantly depending on the target GPU architecture, input data sizes, and compiler version. The generated code may require further manual tuning. - Specialized Domain: This model is highly specialized for CUDA code generation. Its performance on general-purpose programming tasks or natural language conversation will be limited.

NaNK

license:apache-2.0

168

AHN-GDN-for-Qwen-2.5-Instruct-3B

NaNK

—

167

AHN-DN-for-Qwen-2.5-Instruct-3B

NaNK

—

160

AHN-GDN-for-Qwen-2.5-Instruct-14B

NaNK

license:apache-2.0

152

AHN-GDN-for-Qwen-2.5-Instruct-7B

NaNK

license:apache-2.0

152

AHN-DN-for-Qwen-2.5-Instruct-7B

NaNK

license:apache-2.0

147

AHN-Mamba2-for-Qwen-2.5-Instruct-7B

NaNK

license:apache-2.0

146

AHN-DN-for-Qwen-2.5-Instruct-14B

NaNK

license:apache-2.0

143

SeedVR-3B

NaNK

license:apache-2.0

134

AHN Mamba2 For Qwen 2.5 Instruct 14B

AHN: Artificial Hippocampus Networks for Efficient Long-Context Modeling > Artificial Hippocampus Networks (AHNs) transform lossless memory into fixed-size compressed representations for long-context modeling. Lossless memory (e.g., attention’s key-value (KV) cache) stores exact input information but grows with sequence length, making it inefficient for long sequences. In contrast, compressed memory (e.g., RNNs’ hidden state) maintains a constant size and offers fixed computational costs per input token, but this comes at the cost of information loss. To harness the benefits of both memory types, AHNs continually convert lossless memory outside the sliding attention window into compressed form. AHNs can be instantiated with any RNN-like architectures. The model then integrates both memory types to make predictions across long contexts. This repository hosts the model weights for AHN. For installation, usage instructions, and further documentation, please visit our GitHub repository. (a) Illustration of the model augmented with Artificial Hippocampus Networks (AHNs). In this example, the sliding window length is 3. When the input sequence length is less than or equal to the window length, the model operates identically to a standard Transformer. For longer sequences, AHNs continually compress the token outside the window into a compact memory representation. The model then utilizes both the lossless information within window, and the compressed memory to generate the next token. (b) Self-distillation training framework of AHNs based on an open-weight LLM. During training, the base LLM's weights are frozen, and only the AHNs' parameters are trained. Model Zoo | base model | AHN module | #params | checkpoint (AHN only) | |:---:|:---:| :---:|:---:| | Qwen2.5-3B-Instruct | Mamba2 | 11.9M | 🤗model | | Qwen2.5-3B-Instruct | DeltaNet | 11.8M | 🤗model | | Qwen2.5-3B-Instruct | GatedDeltaNet | 13.0M | 🤗model | | Qwen2.5-7B-Instruct | Mamba2 | 18.6M | 🤗model | | Qwen2.5-7B-Instruct | DeltaNet | 18.5M | 🤗model | | Qwen2.5-7B-Instruct | GatedDeltaNet | 21.3M | 🤗model | | Qwen2.5-14B-Instruct | Mamba2 | 51.4M | 🤗model | | Qwen2.5-14B-Instruct | DeltaNet | 51.1M | 🤗model | | Qwen2.5-14B-Instruct | GatedDeltaNet | 61.0M | 🤗model | - Yunhao Fang: [email protected] - Weihao Yu (corresponding author): [email protected]

NaNK

license:apache-2.0

131

Seed-Coder-8B-Reasoning-bf16

NaNK

llama

127

UI-TARS-72B-SFT

NaNK

license:apache-2.0

125

Tar-1.5B

NaNK

license:apache-2.0

122

SeedVR-7B

NaNK

license:apache-2.0

VINCIE-3B

NaNK

license:apache-2.0

cryofm-v1

license:apache-2.0

cryofm-v2

license:apache-2.0

Tar-TA-Tok

NaNK

license:apache-2.0

bamboo_mixer

license:cc-by-4.0

BM-Model

[](https://arxiv.org/abs/2506.03107) [](https://boese0601.github.io/bytemorph/) [](https://huggingface.co/datasets/ByteDance-Seed/BM-Bench) [](https://huggingface.co/datasets/ByteDance-Seed/BM-6M-Demo) [](https://huggingface.co/datasets/ByteDance-Seed/BM-6M) [](https://huggingface.co/spaces/Boese0601/ByteMorpher-Demo) [](https://huggingface.co/ByteDance-Seed/BM-Model) [](https://github.com/ByteDance-Seed/BM-code)

NaNK

—

ConfRover-interp-20M-v1.0

license:apache-2.0

byteff2

license:apache-2.0

VINCIE-7B

NaNK

license:apache-2.0

ByteDance-Seed

UI-TARS-1.5-7B

Seed-OSS-36B-Instruct

academic-ds-9B

Seed-X-PPO-7B

SeedVR2-7B

Seed-OSS-36B-Base

Seed-Coder-8B-Base

Seed-Coder-8B-Instruct

Seed-X-Instruct-7B

UI-TARS-7B-SFT

Seed-X-PPO-7B-GPTQ-Int8

UI-TARS-7B-DPO

UI-TARS-2B-SFT

SeedVR2-3B

BAGEL-7B-MoT

BFS-Prover-V1-7B

Stable-DiffCoder-8B-Instruct

UI-TARS-72B-DPO

Stable-DiffCoder-8B-Base

BFS-Prover-V2-7B

Seed-OSS-36B-Base-woSyn

BFS-Prover-V2-32B

Tar-7B

AHN-Mamba2-for-Qwen-2.5-Instruct-3B

M3-Agent-Control

Seed-Coder-8B-Reasoning

Seed-X-RM-7B

M3-Agent-Memorization

Seed-X-PPO-7B-AWQ-Int4

SAIL-7B

cudaLLM-8B

AHN-GDN-for-Qwen-2.5-Instruct-3B

AHN-DN-for-Qwen-2.5-Instruct-3B

AHN-GDN-for-Qwen-2.5-Instruct-14B

AHN-GDN-for-Qwen-2.5-Instruct-7B

AHN-DN-for-Qwen-2.5-Instruct-7B

AHN-Mamba2-for-Qwen-2.5-Instruct-7B

AHN-DN-for-Qwen-2.5-Instruct-14B

SeedVR-3B

AHN Mamba2 For Qwen 2.5 Instruct 14B

Seed-Coder-8B-Reasoning-bf16

UI-TARS-72B-SFT

Tar-1.5B

SeedVR-7B

VINCIE-3B

cryofm-v1

cryofm-v2

Tar-TA-Tok

bamboo_mixer

BM-Model

ConfRover-interp-20M-v1.0

byteff2

VINCIE-7B