meituan-longcat
LongCat-Flash-Chat
Model Introduction

We introduce LongCat-Flash, a powerful and efficient language model with 560 billion total parameters, featuring an innovative Mixture-of-Experts (MoE) architecture. The model incorporates a dynamic computation mechanism that activates 18.6B∼31.3B parameters (averaging ∼27B) based on contextual demands, optimizing both computational efficiency and performance. To achieve advanced training and inference efficiency, we employ a shortcut-connected architecture that expands the computation-communication overlap window, achieving inference of over 100 tokens per second (TPS) cost-effectively. Our comprehensive training and scaling strategies ensure stable, efficient training, while tailored data strategies enhance model performance. We now release LongCat-Flash-Chat, a non-thinking foundation model that delivers highly competitive performance among leading models, with exceptional strengths in agentic tasks.

🌟 Scalable Architectural Design for Computational Efficiency

LongCat-Flash is designed and optimized under two key principles: efficient computation utilization, and efficient training and inference. Specifically: (1) Because not all tokens are equal, we introduce a zero-computation experts mechanism in the MoE blocks that allocates a dynamic computation budget to tokens based on their significance, activating 18.6 to 31.3 billion parameters (out of 560 billion total) depending on contextual demands. To keep the computation load consistent, we adjust the expert bias with a PID controller, maintaining an average of ∼27 billion activated parameters per token. (2) Because communication overhead becomes a bottleneck when scaling MoE models, we incorporate the Shortcut-connected MoE (ScMoE) design to expand the computation-communication overlap window. Combined with customized infrastructure optimizations, this design enables training at a massive scale of tens of thousands of accelerators and inference with high throughput and low latency.

Effectively and efficiently scaling model size remains a key challenge in strategy design. To this end, we develop a comprehensive stability-and-scaling framework for robustly training large-scale models: (1) We successfully apply a hyperparameter transfer strategy to a model of this size, predicting optimal hyperparameter configurations from smaller proxy models with theoretical guarantees. (2) We initialize the model using a model-growth mechanism based on a refined half-scale checkpoint, achieving improved performance compared to conventional initialization methods. (3) A multi-pronged stability suite incorporates principled router-gradient balancing, a hidden z-loss to suppress massive activations, and fine-tuned optimizer configurations. (4) To enhance the reliability of large-scale cluster training, we introduce deterministic computation, which guarantees the exact reproducibility of experiments and enables the detection of Silent Data Corruption (SDC) during training. These interventions ensure that LongCat-Flash's training remains stable, with no irrecoverable loss spikes.

🌟 Multi-Stage Training Pipeline for Agentic Capability

Through a meticulously designed pipeline, LongCat-Flash is endowed with advanced agentic behaviors. Initial efforts focus on constructing a base model better suited for agentic post-training, where we design a two-stage pretraining data fusion strategy to concentrate reasoning-intensive domain data.
During mid-training, we enhance reasoning and coding capabilities while extending the context length to 128k to meet agentic post-training requirements. Building on this advanced base model, we proceed with a multi-stage post-training. Recognizing the scarcity of high-quality, high-difficulty training problems for agentic tasks, we design a multi-agent synthesis framework that defines task difficulty across three axes, i.e., information processing, tool-set complexity, and user interaction—using specialized controllers to generate complex tasks requiring iterative reasoning and environmental interaction. For more detail, please refer to the comprehensive LongCat-Flash Technical Report. Evaluation Results | Benchmark | DeepSeek V3.1 | Qwen3 MoE-2507 | Kimi-K2 | GPT-4.1 | Claude4 Sonnet | Gemini2.5 Flash | LongCat-Flash | |---------------|-------------------|--------------------|-------------|-------------|--------------------|---------------------|-------------| | Architecture | MoE | MoE | MoE | - | - | - | MoE | | # Total Params | 671B | 235B | 1043B | - | - | - | 560B | | # Activated Params | 37B | 22B | 32B | - | - | - | 27B | | General Domains | | | | | | | | | MMLU (acc) | 90.96 | 90.23 | 89.86 | 89.64 | 91.75 | 86.33 | 89.71 | | MMLU-Pro (acc) | 84.45 | 84.83 | 82.06 | 81.72 | 83.74 | 81.95 | 82.68 | | ArenaHard-V2 (acc) | 84.10 | 88.20 | 85.70 | 61.50 | 62.10 | 77.00 | 86.50 | | CEval (acc) | 89.21 | 92.70 | 91.26 | 79.53 | 86.63 | 78.78 | 90.44 | | CMMLU (acc) | 88.04 | 88.14 | 89.66 | 77.65 | 86.51 | 78.30 | 84.34 | | Instruction Following | | | | | | | | | IFEval (acc) | 86.69 | 88.54 | 88.91 | 85.58 | 88.35 | 83.92 | 89.65 | | COLLIE (acc) | 43.80 | 49.71 | 56.34 | 50.00 | 51.22 | 48.60 | 57.10 | | Meeseeks-zh (acc) | 33.83 | 35.32 | 42.79 | 41.54 | 35.07 | 34.84 | 43.03 | | Mathematical Reasoning | | | | | | | | | MATH500 (acc) | 96.08 | 98.80 | 97.60 | 90.60 | 93.80 | 98.40 | 96.40 | | AIME24 (avg@10) | 66.30 | 81.67 | 69.60 | 47.00 | 47.00 | 79.67 | 70.42 | | AIME25 (avg@10) | 49.27 | 68.33 | 50.66 | 32.00 | 37.00 | 67.33 | 61.25 | | BeyondAIME (avg@10) | 36.50 | 57.60 | 36.60 | 22.10 | 20.50 | 44.20 | 43.00 | | General Reasoning | | | | | | | | | GPQA-diamond (acc) | 74.90 | 77.43 | 75.76 | 67.68 | 70.71 | 80.30 | 73.23 | | DROP (f1) | 84.19 | 78.57 | 89.04 | 66.94 | 73.06 | 45.03 | 79.06 | | ZebraLogic (acc) | 85.30 | 94.22 | 89.11 | 56.30 | 75.85 | 51.78 | 89.30 | | GraphWalks-128k (precision) | 73.54 | 80.72 | 47.50 | 85.02 | 80.57 | 64.83 | 51.05 | | Coding | | | | | | | | | LiveCodeBench (pass@1) | 56.40 | 46.48 | 46.70 | 39.21 | 45.59 | 39.65 | 48.02 | | Humaneval+ (pass@1) | 92.68 | 94.51 | 85.98 | 93.29 | 94.51 | 87.80 | 88.41 | | MBPP+ (pass@1) | 79.89 | 79.89 | 81.75 | 79.37 | 80.16 | 76.19 | 79.63 | | SWE-Bench-Verified (acc) | 66.00 | 42.00 | 64.60 | 48.60 | 68.00 | 40.60 | 60.40 | | TerminalBench (acc) | 31.30 | 17.28 | 25.93 | 28.40 | 40.74 | 12.35 | 39.51 | | Agentic Tool Use | | | | | | | | | τ²-Bench (telecom) (avg@4) | 38.50 | 22.50 | 67.50 | 35.20 | 46.20 | 16.50 | 73.68 | | τ²-Bench (airline) (avg@4) | 46.00 | 36.00 | 54.20 | 56.00 | 60.00 | 41.50 | 58.00 | | τ²-Bench (retail) (avg@4) | 64.90 | 70.50 | 70.80 | 74.10 | 80.00 | 64.80 | 71.27 | | AceBench (acc) | 69.70 | 71.10 | 82.20 | 80.10 | 76.20 | 74.50 | 76.10 | | VitaBench (avg@4) | 20.30 | 8.50 | 18.20 | 19.00 | 23.00 | 8.00 | 24.30 | | Safety | | | | | | | | | Harmful | 82.79 | 80.82 | 53.91 | 56.19 | 66.56 | - | 83.98 | | Criminal | 87.83 | 89.13 | 77.19 | 81.58 | 87.58 | - | 91.24 | | 
Misinformation | 83.17 | 77.76 | 42.68 | 45.49 | 54.91 | - | 81.72 | | Privacy | 98.80 | 98.80 | 96.39 | 98.80 | 100.00 | - | 93.98 |

Note: Some values are sourced from other public reports. DeepSeek-V3.1, Qwen3-235B-A22B, Gemini2.5-Flash, and Claude4-Sonnet are evaluated under their non-thinking mode.

Chat Template

The details of our chat template are provided in the `tokenizer_config.json` file. Below are some examples. With the following prefix, LongCat-Flash can generate responses corresponding to user queries: When a system prompt is specified, the prefix takes the following format: In multi-turn scenarios, the prefix is constructed by concatenating the context with the latest user query: Here, N denotes the N-th round of user queries, with indexing starting from zero. LongCat-Flash supports tool calling in the following format:

Deployment

We have implemented basic adaptations in both SGLang and vLLM to support the deployment of LongCat-Flash. For comprehensive guidance, please refer to the Deployment Guide in the LongCat-Flash-Chat repository.

Chat Website

You can chat with LongCat-Flash on our official website: https://longcat.ai.

This repository, including both the model weights and the source code, is released under the MIT License. Any contributions to this repository are licensed under the MIT License, unless otherwise stated. This license does not grant any rights to use Meituan trademarks or patents.

Usage Considerations

This model has not been specifically designed or comprehensively evaluated for every possible downstream application. Developers should take into account the known limitations of large language models, including performance variations across different languages, and carefully assess accuracy, safety, and fairness before deploying the model in sensitive or high-risk scenarios. It is the responsibility of developers and downstream users to understand and comply with all applicable laws and regulations relevant to their use case, including but not limited to data protection, privacy, and content safety requirements. Nothing in this Model Card should be interpreted as altering or restricting the terms of the MIT License under which the model is released.

We kindly encourage citation of our work if you find it useful.

Contact

Please contact us at [email protected] or open an issue if you have any questions.
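As a supplement to the architecture section above, the sketch below shows one way a PID controller could nudge the router bias of zero-computation experts so that the average number of "real" activated experts tracks a fixed budget. The module shapes, gains, and update rule are illustrative assumptions, not LongCat-Flash's actual implementation.

```python
import torch


class PIDBiasController:
    """Toy PID controller that nudges the router bias of the zero-computation
    experts so the average number of activated "real" experts tracks a target
    budget. Gains and the update rule are illustrative, not LongCat-Flash's values."""

    def __init__(self, target_real_experts: float, kp: float = 0.01, ki: float = 0.001, kd: float = 0.0):
        self.target = target_real_experts
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, observed_real_experts: float) -> float:
        error = observed_real_experts - self.target      # > 0 means too much compute is being used
        self.integral += error
        derivative = error - self.prev_error
        self.prev_error = error
        # A positive output raises the bias of the zero-computation experts,
        # steering more tokens toward the identity (no-compute) path.
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def route(hidden: torch.Tensor, router_w: torch.Tensor, bias: torch.Tensor,
          top_k: int = 8, num_zero_experts: int = 2):
    """Top-k routing over [real experts | zero-computation experts]; slots that
    land on a zero-computation expert simply pass the token through unchanged."""
    logits = hidden @ router_w + bias                    # [tokens, num_experts]
    chosen = logits.topk(top_k, dim=-1).indices
    num_real = router_w.shape[1] - num_zero_experts
    real_per_token = (chosen < num_real).float().sum(dim=-1)
    return chosen, real_per_token.mean().item()


# Usage: after each batch, feed the observed average back into the controller.
tokens, dim, num_real, num_zero = 4096, 64, 16, 2
router_w = torch.randn(dim, num_real + num_zero)
bias = torch.zeros(num_real + num_zero)
ctrl = PIDBiasController(target_real_experts=6.0)        # e.g. aim for ~6 of 8 slots on real experts
_, avg_real = route(torch.randn(tokens, dim), router_w, bias)
bias[num_real:] += ctrl.update(avg_real)                 # adjust the zero-computation expert bias
```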
LongCat-Image-Edit
LongCat-Image
LongCat-Video
Model Introduction

We introduce LongCat-Video, a foundational video generation model with 13.6B parameters that delivers strong performance across Text-to-Video, Image-to-Video, and Video-Continuation generation tasks. It particularly excels at efficient, high-quality long video generation, representing our first step toward world models.

Key Features

- 🌟 Unified architecture for multiple tasks: LongCat-Video unifies Text-to-Video, Image-to-Video, and Video-Continuation tasks within a single video generation framework. It natively supports all of these tasks with a single model and consistently delivers strong performance on each individual task.
- 🌟 Long video generation: LongCat-Video is natively pretrained on Video-Continuation tasks, enabling it to produce minutes-long videos without color drift or quality degradation.
- 🌟 Efficient inference: LongCat-Video generates 720p, 30 fps videos within minutes by employing a coarse-to-fine generation strategy along both the temporal and spatial axes. Block Sparse Attention further enhances efficiency, particularly at high resolutions (see the sketch below).
- 🌟 Strong performance with multi-reward RLHF: Powered by multi-reward Group Relative Policy Optimization (GRPO), comprehensive evaluations on both internal and public benchmarks demonstrate that LongCat-Video achieves performance comparable to leading open-source video generation models as well as the latest commercial solutions.

For more details, please refer to the comprehensive LongCat-Video Technical Report.
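The Block Sparse Attention mentioned in the efficient-inference bullet can be pictured as selecting, for each block of query tokens, only a few blocks of key/value tokens to attend to. Below is a generic, minimal sketch of building such a block-level mask and using it with standard scaled-dot-product attention; the block size, selection rule, and keep ratio are assumptions for illustration, not LongCat-Video's actual kernel or sparsity pattern.

```python
import torch
import torch.nn.functional as F


def block_sparse_mask(num_tokens: int, block: int, keep_ratio: float,
                      block_scores: torch.Tensor) -> torch.Tensor:
    """Token-level attention mask derived from block-level selection.
    block_scores[i, j] rates how relevant key-block j is to query-block i
    (e.g. similarity of pooled block features); only the top blocks are kept."""
    nb = num_tokens // block
    keep = max(1, int(nb * keep_ratio))
    top = block_scores.topk(keep, dim=-1).indices          # kept key-blocks per query-block
    block_mask = torch.zeros(nb, nb, dtype=torch.bool)
    block_mask.scatter_(1, top, True)
    # Expand to token resolution: a token may only attend inside the kept blocks.
    return block_mask.repeat_interleave(block, 0).repeat_interleave(block, 1)


# Usage with plain scaled-dot-product attention (True = attention allowed).
tokens, dim, block = 256, 64, 32
q, k, v = (torch.randn(1, tokens, dim) for _ in range(3))
block_scores = torch.randn(tokens // block, tokens // block)   # stand-in relevance scores
mask = block_sparse_mask(tokens, block, keep_ratio=0.25, block_scores=block_scores)
out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
print(out.shape, mask.float().mean().item())                   # mask density ≈ keep_ratio
```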
FlashAttention-2 is enabled in the model config by default; you can also change the model config to use FlashAttention-3 or xformers.

| Models | Download Link |
| --- | --- |
| LongCat-Video | 🤗 Huggingface |

Text-to-Video

The Text-to-Video MOS evaluation results on our internal benchmark:

| MOS score | Veo3 | PixVerse-V5 | Wan 2.2-T2V-A14B | LongCat-Video |
|---------------|-------------------|--------------------|-------------|-------------|
| Accessibility | Proprietary | Proprietary | Open Source | Open Source |
| Architecture | - | - | MoE | Dense |
| # Total Params | - | - | 28B | 13.6B |
| # Activated Params | - | - | 14B | 13.6B |
| Text-Alignment↑ | 3.99 | 3.81 | 3.70 | 3.76 |
| Visual Quality↑ | 3.23 | 3.13 | 3.26 | 3.25 |
| Motion Quality↑ | 3.86 | 3.81 | 3.78 | 3.74 |
| Overall Quality↑ | 3.48 | 3.36 | 3.35 | 3.38 |

Image-to-Video

The Image-to-Video MOS evaluation results on our internal benchmark:

| MOS score | Seedance 1.0 | Hailuo-02 | Wan 2.2-I2V-A14B | LongCat-Video |
|---------------|-------------------|--------------------|-------------|-------------|
| Accessibility | Proprietary | Proprietary | Open Source | Open Source |
| Architecture | - | - | MoE | Dense |
| # Total Params | - | - | 28B | 13.6B |
| # Activated Params | - | - | 14B | 13.6B |
| Image-Alignment↑ | 4.12 | 4.18 | 4.18 | 4.04 |
| Text-Alignment↑ | 3.70 | 3.85 | 3.33 | 3.49 |
| Visual Quality↑ | 3.22 | 3.18 | 3.23 | 3.27 |
| Motion Quality↑ | 3.77 | 3.80 | 3.79 | 3.59 |
| Overall Quality↑ | 3.35 | 3.27 | 3.26 | 3.17 |

Community works are welcome! Please open a PR or let us know via an Issue to add your work.

- CacheDiT offers Fully Cache Acceleration support for LongCat-Video with DBCache and TaylorSeer, achieving a nearly 1.7x speedup without obvious loss of precision. Visit their example for more details.

The model weights are released under the MIT License. Any contributions to this repository are licensed under the MIT License, unless otherwise stated. This license does not grant any rights to use Meituan trademarks or patents.

Usage Considerations

This model has not been specifically designed or comprehensively evaluated for every possible downstream application. Developers should take into account the known limitations of large language models, including performance variations across different languages, and carefully assess accuracy, safety, and fairness before deploying the model in sensitive or high-risk scenarios. It is the responsibility of developers and downstream users to understand and comply with all applicable laws and regulations relevant to their use case, including but not limited to data protection, privacy, and content safety requirements. Nothing in this Model Card should be interpreted as altering or restricting the terms of the MIT License under which the model is released.

Citation

We kindly encourage citation of our work if you find it useful. We would like to thank the contributors to the Wan, UMT5-XXL, Diffusers, and HuggingFace repositories for their open research.

Contact

Please contact us at [email protected] or join our WeChat Group if you have any questions.
LongCat-Flash-Lite
LongCat Flash Thinking
We introduce and release LongCat-Flash-Thinking, a powerful and efficient large reasoning model (LRM) with 560 billion total parameters, featuring an innovative Mixture-of-Experts (MoE) architecture. The model incorporates a dynamic computation mechanism that activates 18.6B∼31.3B parameters (averaging ∼27B) based on contextual demands, optimizing both computational efficiency and performance. LongCat-Flash-Thinking is trained with our DORA system, an efficient distributed RL framework that supports asynchronous training and flexible accelerator usage to ensure stability and efficiency. Our comprehensive data curation and domain-parallel training recipe ensure stable and efficient training. In addition to general reasoning, the model is also equipped with formal-reasoning and agentic-reasoning techniques, advancing LRMs' reasoning ability on diverse complex tasks such as mathematics, logic, programming, automatic theorem proving, and tool use.

Specifically, the development of LongCat-Flash-Thinking follows a two-phase pipeline:

- Long CoT Cold-Start Training: This phase cultivates the model's foundational reasoning abilities. It begins with a curriculum learning strategy during mid-training to bolster intrinsic capabilities, followed by an SFT stage on reasoning-intensive and agentic data to prepare the model for advanced learning.
- Large-Scale RL: The second phase scales up this potential through an efficient RL framework built upon our Dynamic Orchestration for Asynchronous Rollout (DORA) system for industrial-scale asynchronous training. To address the stability challenges of asynchronous RL training, we adapt and extend the GRPO algorithm for a robust exploration-exploitation balance. A key innovation in this phase is our domain-parallel training scheme, which simultaneously optimizes the model across distinct domains and subsequently merges the resulting domain-expert models into a fused model. Finally, we perform a general RL stage to further refine the fused model and enhance its robustness, safety, and human alignment.

To overcome the instability of traditional mixed-domain RL training, LongCat-Flash-Thinking incorporates a domain-parallel training scheme that decouples optimization across STEM, coding, and agentic tasks. This approach not only stabilizes training but also allows us to fuse the resulting domain-expert models into a nearly Pareto-optimal final model that excels across all specialties.

LongCat-Flash-Thinking is built upon our self-designed DORA system. The main motivation is to optimize long-tail generation by leveraging multiple older versions of the Actor model through streaming rollout while preserving sampling consistency. The DORA system consists of two core components: elastic colocation and a multi-version asynchronous pipeline. These components enhance training efficiency, ensure policy consistency per sample, and enable efficient KV-cache reuse, facilitating stable and scalable training on tens of thousands of accelerators.

🌟 Advancing Formal Reasoning and Agentic Reasoning

In addition to general reasoning (e.g., mathematics, logic, coding, and instruction following), LongCat-Flash-Thinking also emphasizes two other critical capabilities.

- Formal Reasoning: LongCat-Flash-Thinking can solve complex formal reasoning tasks, e.g., automatic theorem proving. To help realize this potential and empower researchers, we introduce significant enhancements to our model's formal reasoning capabilities.
To achieve this, we introduce a novel expert iteration framework for careful data synthesis, involving statement formalization, iterative proof synthesis, and syntax/consistency filtering. - Agentic Reasoning: LongCat-Flash-Thinking can adaptively utilize provided tools to solve complex reasoning tasks. To reach this goal, we introduce a dual-path reasoning approach to identify and retain high-quality queries that genuinely require tool assistance, thereby fostering the development of robust agentic abilities. After high-value query selection, we synthesize corresponding high-quality solution trajectories based on a versatile environment with diverse tool APIs, including MCP servers and simulated tools for both single and multi-turn interactions. For more details, please refer to the comprehensive LongCat-Flash-Thinking Technical Report. | Benchmark | DeepSeek-V3.1-Thinking | Qwen3-235B-A22B-Thinking-2507 | GLM-4.5 | OpenAI-o3 | Gemini2.5-Pro | GPT-5-Thinking | LongCat-Flash-Thinking | |---------------|-------------------------|------------------------------|--------|-----------|---------------|----------------|-------------------------| | Architecture | MoE | MoE | MoE | - | - | - | MoE | | \# Total Params | 671B | 235B | 355B | - | - | - | 560B | | \# Activated Params | 37B | 22B | 32B | - | - | - | 27B | | General QA | | | | | | | | | MMLU-Pro (acc) | 84.4 | 84.4 | 81.5 | 85.3 | 86.7 | 84.5 | 82.6 | | MMLU-Redux (acc) | 90.5 | 91.4 | 89.9 | 93.1 | 90.1 | 92.6 | 89.3 | | Alignment | | | | | | | | | IFEval (strict prompt) | 86.3 | 89.3 | 85.4 | 90.2 | 92.4 | 92.8 | 86.9 | | Arena-Hard (hard prompt gemini) | 57.1 | 74.5 | 67.7 | 87.1 | 87.1 | 87.7 | 69.9 | | Mathematical Reasoning | | | | | | | | | MATH500 (Mean@1) | 98.8 | 99.6 | 95.4 | 98.4 | 98.0 | 99.2 | 99.2 | | HMMT25 (Mean@32) | 80.4 | 83.8 | 76.3 | 71.9 | 79.3 | 84.8 | 83.7 | | AIME24 (Mean@32) | 93.9 | 93.9 | 89.3 | 91.6 | 90.7 | 92.0 | 93.3 | | AIME25 (Mean@32) | 87.9 | 92.5 | 85.5 | 88.9 | 89.2 | 94.6 | 90.6 | | BeyondAIME (Mean@10) | 71.8 | 71.5 | 66.0 | 63.2 | 63.0 | 70.0 | 69.5 | | General Reasoning | | | | | | | | | GPQA-Diamond (Mean@16) | 84.2 | 80.4 | 78.3 | 81.9 | 84.0 | 84.4 | 81.5 | | ZebraLogic (Mean@1) | 96.1 | 97.5 | 90.9 | 94.3 | 92.4 | 92.7 | 95.5 | | Sudoku-Bench (Mean@1) | 1.0 | 2.0 | 1.0 | 70.0 | 0.0 | 63.0 | 56.0 | | ARC-AGI (Mean@1) | 37.5 | 45.3 | 21.41 | 47.3 | 46.8 | 59.0 | 50.3 | | Coding | | | | | | | | | LiveCodeBench (Mean@4) | 73.5 | 75.4 | 61.1 | 76.2 | 74.2 | 80.6 | 79.4 | | OJBench (Mean@1) | 33.6 | 32.1 | 19.0 | 38.4 | 41.6 | 34.1 | 40.7 | | Agentic Tool Using | | | | | | | | | SWE-Bench (Pass@1) | 66.0 | 34.4 | 64.2 | 69.1 | 59.6 | 74.9 | 59.4 | | BFCL V3 (full) | 55.4 | 75.7 | 79.1 | 72.4 | 63.2 | 60.1 | 74.4 | | τ²-Bench-Retail (Mean@4) | 65.4 | 68.2 | 69.3 | 72.8 | 70.9 | 81.1 | 71.5 | | τ²-Bench-Airline (Mean@4) | 44.0 | 58.0 | 66.0 | 62.5 | 58.0 | 62.6 | 67.5 | | τ²-Bench-Telecom (Mean@4) | 23.7 | 47.3 | 56.1 | 67.5 | 38.3 | 96.7 | 83.1 | | VitaBench | 13.5 | 21.5 | 26.8 | 35.3 | 24.3 | 29.3 | 29.5 | | Formal Theorem Proving | | | | | | | | | MiniF2F-Test (Pass@1) | 49.6 | 11.9 | 10.9 | 15.2 | 13.9 | 21.4 | 67.6 | | MiniF2F-Test (Pass@8) | 74.4 | 20.9 | 22.1 | 29.6 | 29.4 | 39.7 | 79.4 | | MiniF2F-Test (Pass@32) | 79.5 | 26.6 | 27.0 | 37.7 | 41.8 | 51.2 | 81.6 | | Safety | | | | | | | | | Harmful | 79.2 | 84.3 | 70.4 | 64.8 | 44.3 | 56.8 | 93.7 | | Criminal | 89.7 | 92.7 | 88.8 | 85.7 | 77.4 | 87.3 | 97.1 | | Misinformation | 81.1 | 80.9 | 67.1 | 42.7 | 31.0 | 41.9 | 93.0 | | Privacy | 96.2 
| 100.0 | 97.6 | 100.0 | 95.0 | 98.8 | 98.8 |

Note:
- Some values are sourced from other public reports.
- The inference parameters of LongCat-Flash-Thinking are set to `temperature=1.0`, `top_k=-1`, and `top_p=0.95`.

Chat Template

The details of our chat template are provided in the `tokenizer_config.json` file. Below are some examples. With the following prefix, LongCat-Flash can generate responses corresponding to user queries: When a system prompt is specified, the prefix takes the following format: In multi-turn scenarios, the prefix is constructed by concatenating the context with the latest user query: Here, N denotes the N-th round of user queries, with indexing starting from zero. LongCat-Flash supports tool calling in the following format:

Mathematical Reasoning

We recommend adding the following instructions when solving mathematical or other STEM-related reasoning tasks, so that the output can be located for evaluation. LongCat-Flash-Thinking also supports formal reasoning, such as automatic theorem proving (ATP). The specific template is:

Deployment

We have implemented basic adaptations in both SGLang and vLLM to support the deployment of LongCat-Flash-Thinking. Please refer to the Deployment Guide for detailed deployment instructions.

Chat Website

You can chat with LongCat-Flash-Thinking on our official website: https://longcat.ai. Please turn on the "Think" button ("深度思考" in Chinese) before submitting your request.

The model weights are released under the MIT License. Any contributions to this repository are licensed under the MIT License, unless otherwise stated. This license does not grant any rights to use Meituan trademarks or patents.

Usage Considerations

This model has not been specifically designed or comprehensively evaluated for every possible downstream application. Developers should take into account the known limitations of large language models, including performance variations across different languages, and carefully assess accuracy, safety, and fairness before deploying the model in sensitive or high-risk scenarios. It is the responsibility of developers and downstream users to understand and comply with all applicable laws and regulations relevant to their use case, including but not limited to data protection, privacy, and content safety requirements. Nothing in this Model Card should be interpreted as altering or restricting the terms of the MIT License under which the model is released.

Citation

We kindly encourage citation of our work if you find it useful.

Contact

Please contact us at [email protected] or join our WeChat Group if you have any questions.
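As a supplement to the recommended inference parameters and the Deployment section above, here is a minimal request sketch against an OpenAI-compatible endpoint such as those exposed by SGLang or vLLM. The URL, port, and served model name are placeholders, and whether `top_k` is accepted directly in the request body depends on the serving framework, so treat this as an assumption-laden example rather than the official client.

```python
import requests

# Assumed local OpenAI-compatible endpoint started with SGLang or vLLM;
# the URL, port and served model name below are placeholders.
API_URL = "http://localhost:8000/v1/chat/completions"

payload = {
    "model": "LongCat-Flash-Thinking",
    "messages": [{"role": "user", "content": "Prove that the sum of two even integers is even."}],
    # Recommended inference parameters from the note above.
    "temperature": 1.0,
    "top_p": 0.95,
    # top_k is not part of the standard OpenAI schema; vLLM/SGLang servers
    # typically accept it as an extra field (-1 disables top-k filtering).
    "top_k": -1,
    "max_tokens": 4096,
}

resp = requests.post(API_URL, json=payload, timeout=600)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```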
LongCat-AudioDiT-1B
LongCat-Flash-Chat-FP8
The LongCat-Flash-Chat-FP8 model card duplicates the LongCat-Flash-Chat card above.
LongCat-Next
LongCat-Flash-Omni
Model Introduction

We introduce LongCat-Flash-Omni, a state-of-the-art open-source omni-modal model with 560 billion total parameters (27B activated) that excels at real-time audio-visual interaction. This is achieved by building on LongCat-Flash's high-performance Shortcut-connected Mixture-of-Experts (MoE) architecture with zero-computation experts, augmented with efficient multimodal perception and speech reconstruction modules. Through an effective curriculum-inspired progressive training strategy, the model achieves comprehensive multimodal capabilities while maintaining strong unimodal capability. We now open-source the model to foster future research and development in the community.

LongCat-Flash-Omni is an open-source omni-modal model that achieves state-of-the-art cross-modal comprehension performance. It seamlessly integrates powerful offline multimodal understanding with real-time audio-visual interaction within a single all-in-one framework.

🌟 Large-Scale with Low-Latency Audio-Visual Interaction

By leveraging an efficient LLM backbone, carefully designed lightweight modality encoders and a decoder, and a chunk-wise audio-visual feature interleaving mechanism, LongCat-Flash-Omni achieves low-latency, high-quality audio-visual processing and streaming speech generation. It supports a context window of up to 128K tokens, enabling advanced capabilities in long-term memory, multi-turn dialogue, and temporal reasoning across multiple modalities. The model adopts an innovative multi-stage pretraining pipeline that progressively incorporates text, audio, and visual modalities under a balanced data strategy and an early-fusion training paradigm, ensuring strong omni-modal performance without degradation in any single modality. Inspired by the concept of modality decoupling, we propose a Modality-Decoupled Parallelism training scheme that significantly enhances the efficiency of large-scale, highly challenging multimodal training.

🌟 Open-Source Contribution

We provide a comprehensive overview of the training methodology and data strategies behind LongCat-Flash-Omni, and release the model to accelerate future research and innovation in omni-modal intelligence. For more details, please refer to the comprehensive LongCat-Flash-Omni Technical Report.
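To make the chunk-wise audio-visual feature interleaving mechanism described above more concrete, here is a minimal sketch that alternates fixed-size chunks of video and audio features along the time axis before they enter the LLM backbone. The chunk sizes, feature dimension, and token rates are assumptions for illustration, not the model's actual configuration.

```python
import torch


def interleave_av_chunks(video_feats: torch.Tensor, audio_feats: torch.Tensor,
                         video_chunk: int, audio_chunk: int) -> torch.Tensor:
    """Chunk-wise audio-visual interleaving: split each stream into fixed-size
    chunks along time and alternate them in one sequence, so tokens covering
    the same time window stay adjacent. Chunk sizes are illustrative assumptions."""
    v_chunks = video_feats.split(video_chunk, dim=0)   # tuple of (video_chunk, d) tensors
    a_chunks = audio_feats.split(audio_chunk, dim=0)   # tuple of (audio_chunk, d) tensors
    merged = []
    for v, a in zip(v_chunks, a_chunks):               # alternate video and audio chunks in time order
        merged.append(v)
        merged.append(a)
    return torch.cat(merged, dim=0)                    # one interleaved token sequence for the LLM backbone


# Example: one chunk per second of content (the token rates below are made up for the demo).
d = 512
video = torch.randn(2 * 4, d)    # 2 s of video tokens at 4 tokens/s
audio = torch.randn(2 * 10, d)   # 2 s of audio tokens at 10 tokens/s
seq = interleave_av_chunks(video, audio, video_chunk=4, audio_chunk=10)
print(seq.shape)                 # torch.Size([28, 512])
```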
| Benchmark | LongCat-Flash-Omni Instruct | Gemini-2.5-Pro (ThinkingBudget128) | Gemini-2.5-Flash (non-thinking) | Qwen3-Omni Instruct | Qwen2.5-Omni Instruct | |-----------|-------------------------------|-----------------------------------|------------------------------|----------------------|-------------------------| | OmniBench | 61.38 | 66.80 | 54.99 | 58.41 | 48.16 | | WorldSense | 60.89 | 63.96 | 58.72 | 52.01 | 46.69 | | DailyOmni | 82.38 | 80.61 | 80.78 | 69.33 | 47.45 | | UNO-Bench | 49.90 | 64.48 | 54.30 | 42.10 | 32.60 | Image-to-Text | Benchmark | LongCat-Flash-Omni Instruct | Gemini-2.5-Pro (ThinkingBudget128) | Gemini-2.5-Flash (non-thinking) | Qwen3-Omni Instruct | Seed-1.6 | GPT-4o-1120 | Qwen3-VL-235B-A22B-Instruct | Qwen2.5-VL-72B-Instruct | |-----------|-------------------------------|-----------------------------------|------------------------------|----------------------|----------|---------------|------------------------------|---------------------------| | General |||||||||| | MMBench-EN test | 87.5 | 89.8 | 89.3 | 86.8 | 88.5 | 83.7 | 88.3 | 88.6 | | MMBench-ZH test | 88.7 | 89.2 | 88.5 | 86.4 | 83.8 | 82.8 | 89.8 | 87.9 | | RealWorldQA | 74.8 | 76.0 | 73.9 | 72.9 | 74.5 | 74.1 | 79.3 | 75.7 | | MMStar | 70.9 | 78.5 | 75.5 | 68.5 | 71.5 | 63.2 | 78.4 | 68.2 | | STEM & Reasoning |||||||||| | MathVista mini | 77.9 | 77.7 | 77.1 | 75.9 | 78.7 | 62.8 | 84.9 | 74.8 | | MMMU val | 70.7 | 80.9 | 76.3 | 69.1 | 74.9 | 69.4 | 78.7 | 70.2 | | MMVet | 69.0 | 80.7 | 79.5 | 68.9 | 74.4 | 76.6 | 75.9 | 74.5 | | Multi-Image |||||||||| | BLINK | 63.1 | 70.0 | 65.7 | 56.1 | 65.0 | 65.5 | 70.7 | 60.1 | | MuirBench | 77.1 | 74.0 | 73.7 | 62.1 | 74.6 | 70.5 | 72.8 | 70.7 | | Mantis | 84.8 | 83.9 | 83.4 | 80.7 | 81.1 | 79.3 | 79.7 | 82.0 | | Text Recognition & Chart/Document Understanding |||||||||| | ChartQA | 87.6 | 71.7 | 77.6 | 86.8 | 82.4 | 74.5 | 89.2 | 89.5 | | DocVQA | 91.8 | 94.0 | 93.6 | 95.7 | 94.3 | 80.9 | 94.6 | 96.4 | | OCRBench | 84.9 | 87.2 | 85.6 | 85.5 | 85.6 | 82.3 | 91.2 | 88.5 | | OmniDocBench EN/ZH ↓ | 22.8/29.0 | 31.9/24.5 | 22.8/32.9 | 28.4/40.5 | 22.0/27.6 | 25.9/37.7 | 13.6/17.5 | 22.6/32.4 | | Grounding & Counting |||||||||| | RefCOCO-avg | 92.3 | 75.4 | 71.9 | 89.3 | 80.2 | - | 87.1 | 90.3 | | CountBench | 92.4 | 91.0 | 78.6 | 90.0 | 94.1 | 85.6 | 94.3 | 93.6 | | Graphical User Interface (GUI) |||||||||| | VisualWebBench | 78.7 | 81.1 | 73.5 | 79.3 | 81.1 | 77.1 | 80.8 | 82.3 | | ScreenSpot-v2 | 91.2 | 75.8 | 63.9 | 94.7 | 91.7 | - | 93.4 | 92.9 | | AndroidControl low | 91.2 | 79.2 | 79.1 | 90.5 | 84.6 | 65.2 | 90.0 | 93.7 | | AndroidControl high | 75.6 | 60.8 | 55.5 | 70.8 | 55.2 | 41.7 | 74.1 | 67.4 | Note: Values marked with are sourced from public reports. 
As GPT-4o does not support image grounding, we do not report its results on RefCOCO and ScreenSpot-v2 Video-to-Text | Benchmark | LongCat-Flash-Omni Instruct | Gemini-2.5-Pro (ThinkingBudget128) | Gemini-2.5-Flash (non-thinking) | Qwen3-Omni Instruct | Seed-1.6 | GPT-4o-1120 | Qwen3-VL (235B-A22B-Instruct) | Qwen2.5-VL-72B-Instruct | |-----------|-------------------------------|-----------------------------------|------------------------------|----------------------|----------|---------------|------------------------------|---------------------------| | Short Video |||||||||| | MVBench | 75.2 | 66.4 | 63.0 | 69.3 | 68.4 | 62.1 | 71.3 | 70.4 | | NextQA | 86.2 | 84.2 | 81.4 | 82.4 | 84.1 | 79.7 | 81.3 | 82.3 | | TempCompass | 82.2 | 80.8 | 80.2 | 73.5 | 79.4 | 76.4 | 80.5 | 74.8 | | Long Video |||||||||| | VideoMME (w/o audio) | 76.2 | - | - | 70.5 | 75.2 | 73.2 | 79.2 | 73.3 | | VideoMME (w/ audio) | 78.2 | 80.6 | 78.5 | 73.0 | - | - | - | - | | LongVideoBench | 69.3 | 69.4 | 66.4 | 65.4 | 64.8 | 63.9 | - | 60.7 | | STEM & Reasoning |||||||||| | MMVU | 67.1 | 75.6 | 72.4 | 62.4 | 67.3 | 67.4 | 69.3 | 62.9 | | Video-MMMU | 67.5 | 79.4 | 76.6 | 60.3 | 75.4 | 68.0 | 73.7 | 59.3 | Note: Values marked with are sourced from public reports. Table 1: Automatic Speech Recognition (ASR) and Speech-to-Text Translation (S2TT) | Benchmark | LongCat-Flash-Omni Instruct | Gemini-2.5-Pro (ThinkingBudget128) | GPT-4o-Audio | Qwen3-Omni Instruct | Kimi-Audio | Step-Audio-2-mini | |-----------|-------------------------------|-----------------------------------|--------------|----------------------|------------|-------------------| | ASR | | | | | | | | LibriSpeech (test-clean \| test-other) | 1.57 \| 4.01 | 1.74 \| 3.80 | 30.00 \| 41.83 | 1.22 \| 2.48 | 1.28 \| 2.42 | 1.33 \| 2.86 | | AISHELL-1 | 0.63 | 3.11 | 34.81 | 0.84 | 0.60 | 0.78 | | AISHELL-2 | 2.78 | 5.24 | 77.73 | 2.34 | 2.56 | 2.16 | | Fleurs (zh \| en) | 3.99 \| 5.02 | 2.24 \| 4.77 | 3.91 \| 5.56 | 2.20 \| 2.72 | 2.69 \| 4.44 | 2.53 \| 3.05 | | CommonVoice 15 (zh \| en) | 4.98 \| 13.59 | 47.30 \| 49.86 | 42.83 \| 23.88 | 4.31 \| 6.05 | 8.46 \| 7.92 | 5.00 \| 6.75 | | WenetSpeech (test-meeting \| test-net) | 6.69 \| 6.09 | 136.13 \| 32.82 | 54.35 \| 67.90 | 5.89 \| 4.69 | 6.28 \| 5.37 | 4.87 \| 4.82 | | S2TT (BLEU) | | | | | | | | CoVost2 en→zh | 47.23 | 41.94 | 29.32 | 48.72 | - | 49.12 | | CoVost2 zh→en | 27.32 | 25.38 | 16.01 | 21.51 | - | 29.47 | Note: ASR results are in CER/WER (lower is better), S2TT results are in BLEU score. 
Table 2: Audio Understanding | Benchmark | LongCat-Flash-Omni Instruct | Gemini-2.5-Pro (ThinkingBudget128) | GPT-4o-Audio | Qwen3-Omni Instruct | Kimi-Audio | Step-Audio-2-mini | |-----------|-------------------------------|-----------------------------------|--------------|----------------------|------------|-------------------| | MMAU | 75.90 | 72.80 | 68.40 | 77.50 | 65.20 | 73.20 | | VocalSound | 92.76 | 89.45 | 82.37 | 91.60 | 94.85 | 87.58 | | TUT2017 | 65.43 | 33.15 | 20.74 | 40.74 | 65.25 | 30.67 | | ClothoAQA | 72.83 | 69.67 | 61.87 | 75.16 | 72.21 | 68.39 | | Nonspeech7k | 93.79 | 87.59 | 72.28 | 80.83 | 93.93 | 73.24 | | CochlScene | 70.02 | 45.34 | 34.94 | 43.03 | 80.42 | 44.58 | | MELD | 54.60 | 46.74 | 39.00 | 50.80 | 59.13 | 31.44 | Table 3: Audio-to-Text Chat | Benchmark | LongCat-Flash-Omni Instruct | Gemini-2.5-Pro (ThinkingBudget128) | GPT-4o-Audio | Qwen3-Omni Instruct | Kimi-Audio | Step-Audio-2-mini | |-----------|-------------------------------|-----------------------------------|--------------|----------------------|------------|-------------------| | OpenAudioBench | | | | | | | | LlamaQuestions | 83.33 | 83.00 | 86.30 | 83.30 | 79.33 | 69.70 | | ReasoningQA | 79.71 | 80.30 | 68.71 | 84.16 | 58.02 | 55.64 | | TriviaQA | 86.20 | 90.20 | 76.00 | 75.90 | 62.10 | 45.30 | | Webquestions | 76.00 | 80.90 | 81.20 | 75.20 | 70.20 | 54.40 | | AlpacaEval | 75.43 | 76.58 | 81.61 | 85.43 | 75.73 | 53.92 | | VoiceBench | | | | | | | | AlpacaEval | 4.94 | 4.70 | 4.73 | 4.74 | 4.46 | 3.84 | | CommonEval | 4.32 | 4.11 | 4.37 | 4.54 | 3.97 | 3.19 | | OpenBookQA | 93.41 | 95.16 | 87.90 | 89.70 | 83.52 | 72.97 | | SDQA | 82.46 | 83.54 | 90.10 | 76.90 | 63.12 | 44.85 | | MMSU | 81.95 | 88.32 | 78.90 | 69.00 | 62.17 | 52.00 | | AdvBench | 100 | 97.69 | 99.23 | 99.30 | 100 | 97.00 | | IFEval | 77.99 | 77.83 | 66.81 | 77.80 | 61.10 | 29.80 | | Benchmark | LongCat-Flash-Omni Instruct | LongCat-Flash | DeepSeek V3.1 | Qwen3 MoE-2507 | Kimi-K2 | GPT-4.1 | Claude Sonnet-4 | Gemini-2.5-Flash | |-----------|-------------------------------|---------------|---------------|----------------|---------|---------|-----------------|------------------| | Architecture | MoE | MoE | MoE | MoE | MoE | - | - | - | | # Total Params | 560B | 560B | 671B | 235B | 1043B | - | - | - | | # Activated Params | 27B | 27B | 37B | 22B | 32B | - | - | - | | General Domains |||||||||| | MMLU (acc) | 90.30 | 89.71 | 90.96 | 90.23 | 89.86 | 89.64 | 91.75 | 86.33 | | MMLU-Pro (acc) | 82.73 | 82.68 | 84.45 | 84.83 | 82.06 | 81.72 | 83.74 | 81.95 | | CEval (acc) | 91.68 | 90.44 | 89.21 | 92.70 | 91.26 | 79.53 | 86.63 | 78.78 | | CMMLU (acc) | 89.39 | 84.34 | 88.04 | 88.14 | 89.66 | 77.65 | 86.51 | 78.30 | | Instruction Following |||||||||| | IFEval (acc) | 82.44 | 89.65 | 86.69 | 88.54 | 88.91 | 85.58 | 88.35 | 83.92 | | COLLIE (acc) | 45.69 | 57.10 | 43.80 | 49.71 | 56.34 | 50.00 | 51.22 | 48.60 | | Meeseeks-zh (acc) | 39.05 | 43.03 | 33.83 | 35.32 | 42.79 | 41.54 | 35.07 | 34.84 | | Mathematical Reasoning |||||||||| | MATH500 (acc) | 97.60 | 96.40 | 96.08 | 98.80 | 97.60 | 90.60 | 93.80 | 98.40 | | AIME24 (avg@10) | 72.92 | 70.42 | 66.30 | 81.67 | 69.60 | 47.00 | 47.00 | 79.67 | | BeyondAIME (avg@10) | 47.40 | 43.00 | 36.50 | 57.60 | 36.60 | 22.10 | 20.50 | 44.20 | | General Reasoning |||||||||| | GPQA-diamond (acc) | 74.41 | 73.23 | 74.90 | 77.43 | 75.76 | 67.68 | 70.71 | 80.30 | | DROP (f1) | 83.53 | 79.06 | 84.19 | 78.57 | 89.04 | 66.94 | 73.06 | 45.03 | | ZebraLogic (acc) | 86.00 | 89.30 | 85.30 | 94.22 | 89.11 | 
56.30 | 80.10 | 57.00 | | | GraphWalks-128k (precision) | 56.00 | 51.05 | 73.54 | 80.72 | 47.50 | 85.02 | 80.57 | 64.83 | | Coding |||||||||| | LiveCodeBench (pass@1) | 52.64 | 48.02 | 56.40 | 46.48 | 46.70 | 39.21 | 45.59 | 39.65 | | Humaneval+ (pass@1) | 90.85 | 88.41 | 92.68 | 94.51 | 85.98 | 93.29 | 94.51 | 87.80 | | MBPP+ (pass@1) | 80.16 | 79.63 | 79.89 | 79.89 | 81.75 | 79.37 | 80.16 | 76.19 |

Note: Some values are sourced from other public reports. Note that DeepSeek-V3.1, Qwen3-235B-A22B, Gemini2.5-Flash, and Claude4-Sonnet are evaluated under their non-thinking mode.

LongCat-Flash-Omni is a MoE model, which means that the model weights are distributed across multiple devices. Therefore, during loading in Hugging Face Transformers or vLLM, the model weights are downloaded automatically based on the model name. However, if your runtime environment is not conducive to downloading weights during execution, you can refer to the following commands to manually download the model weights to a local directory:
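The exact download commands are not reproduced in this card; a typical way to pre-fetch the weights with `huggingface_hub` is sketched below. The repository id is assumed from the organization name and may differ from the actual release.

```python
# Pre-downloading the weights with huggingface_hub; the repo id below is assumed
# from the organization name and may differ from the actual release.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="meituan-longcat/LongCat-Flash-Omni",
    local_dir="./LongCat-Flash-Omni",   # local target directory used later when launching the demo
)
```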
We have implemented basic adaptations in SGLang to support running the LongCat-Flash-Omni model. The official SGLang does not yet natively support LongCat-Flash-Omni, so you can temporarily use our development branch for local installation and testing. Due to its size of 560 billion parameters, LongCat-Flash-Omni requires at least one node (e.g., 8×H20-141G) to host the model weights in FP8 format, and at least two nodes (e.g., 16×H800-80G) for BF16 weights. Detailed launch configurations are provided below.

Installation

- python >= 3.10.0 (Anaconda is recommended)
- PyTorch >= 2.8
- CUDA >= 12.9

The model can be served on your cluster using a combination of Tensor Parallelism and Expert Parallelism. Once all dependencies are installed, you can launch the demo using the following command.

> NOTE: Replace $NODE_RANK and $MASTER_IP with the corresponding values for your GPU machines.

All test cases are defined in examples_dict.py, and additional test cases may be added as needed. After model execution, the generated results are saved in the directory specified by the --output-dir parameter.

You can use LongCat-Flash-Omni on https://longcat.ai (the web version currently supports only the audio interaction features). The full service will be provided in subsequent updates. We are excited to announce that the LongCat-Flash-Omni app is now available for both Android and iOS. For Android, you can download it via the QR code below. For iOS, you can download it by searching for "LongCat" in the App Store or via the QR code. Currently, only the Chinese App Store is supported.

The model weights are released under the MIT License. Any contributions to this repository are licensed under the MIT License, unless otherwise stated. This license does not grant any rights to use Meituan trademarks or patents.

Usage Considerations

This model has not been specifically designed or comprehensively evaluated for every possible downstream application. Developers should take into account the known limitations of large language models, including performance variations across different languages, and carefully assess accuracy, safety, and fairness before deploying the model in sensitive or high-risk scenarios. It is the responsibility of developers and downstream users to understand and comply with all applicable laws and regulations relevant to their use case, including but not limited to data protection, privacy, and content safety requirements. Nothing in this Model Card should be interpreted as altering or restricting the terms of the MIT License under which the model is released.

Citation

We kindly encourage citation of our work if you find it useful.

Contact

Please contact us at [email protected] or join our WeChat Group if you have any questions.
LongCat-Flash-Thinking-2601
LongCat-AudioDiT-3.5B
LongCat-Image-Dev
LongCat-Flash-Thinking-2601-FP8
LongCat-Flash-Thinking-ZigZag
Auto-ATT
Auto-ATT 🔊🤖 Automatically Evaluating the Human-likeness of TTS Systems via Audio-LLM-Based Score Regression

> Auto-ATT is a model LoRA-finetuned from Qwen2-Audio-Instruct. It offers a plug-and-play pipeline to grade Audio Turing Tests (ATTs) at scale, producing objective scores that correlate with human judgements, all without manual listening.

ATT is an evaluation framework with a standardized human evaluation protocol and an accompanying dataset, aiming to resolve the lack of unified protocols in TTS evaluation and the difficulty of comparing multiple TTS systems. To further support the training and iteration of TTS systems, we used additional private evaluation data to train the Auto-ATT model based on Qwen2-Audio-7B, enabling a model-as-a-judge approach for rapid evaluation of TTS systems on the ATT dataset. The datasets and the Auto-ATT model can be found in the ATT Collection.
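For illustration, here is a hedged sketch of how a Qwen2-Audio-based judge with a LoRA adapter might be loaded and prompted to score one TTS sample. The adapter repository id, base-model id, prompt wording, and the assumption that the score is read from generated text (rather than a dedicated regression head) are all guesses, not the released Auto-ATT interface.

```python
import librosa
import torch
from peft import PeftModel
from transformers import AutoProcessor, Qwen2AudioForConditionalGeneration

BASE_ID = "Qwen/Qwen2-Audio-7B-Instruct"      # assumed base model
ADAPTER_ID = "meituan-longcat/Auto-ATT"       # hypothetical adapter repo id

processor = AutoProcessor.from_pretrained(BASE_ID)
model = Qwen2AudioForConditionalGeneration.from_pretrained(
    BASE_ID, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, ADAPTER_ID)   # attach the LoRA judge weights

conversation = [{
    "role": "user",
    "content": [
        {"type": "audio", "audio_url": "tts_sample.wav"},
        {"type": "text", "text": "Rate the human-likeness of this speech from 1 to 5."},  # assumed prompt
    ],
}]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True, tokenize=False)
audio, _ = librosa.load("tts_sample.wav", sr=processor.feature_extractor.sampling_rate)
inputs = processor(text=prompt, audios=[audio], return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=8)
score_text = processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print("Predicted human-likeness score:", score_text)
```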
LongCat Flash Omni FP8
Model Introduction We introduce LongCat-Flash-Omni, a state-of-the-art open-source omni-modal model with 560 billion parameters (with 27B activated), excelling at real-time audio-visual interaction, which is attained by leveraging LongCat-Flash's high-performance Shortcut-connected Mixture-of-Experts (MoE) architecture with zero-computation experts, augmented by efficient multimodal perception and speech reconstruction modules. Through an effective curriculum-inspired progressive training strategy, our model achieves comprehensive multimodal capabilities while maintaining strong unimodal capability. Now, we open-source the model to foster future research and development in the community. LongCat-Flash-Omni is an open-source omni-modal model that achieves state-of-the-art cross-modal comprehension performance. It seamlessly integrates powerful offline multi-modal understanding with real-time audio–visual interaction within a single all-in-one framework. 🌟 Large-Scale with Low-Latency Audio–Visual Interaction By leveraging an efficient LLM backbone, carefully designed lightweight modality encoders and decoder, and a chunk-wise audio–visual feature interleaving mechanism, LongCat-Flash-Omni achieves low-latency, high-quality audio–visual processing and streaming speech generation. It supports a context window of up to 128K tokens, enabling advanced capabilities in long-term memory, multi-turn dialogue, and temporal reasoning across multiple modalities. The model adopts an innovative multi-stage pretraining pipeline that progressively incorporates text, audio, and visual modalities under a balanced data strategy and early-fusion training paradigm, ensuring strong omni-modal performance without degradation in any single modality. Inspired by the concept of modality decoupling, we propose a Modality-Decoupled Parallelism training scheme that significantly enhances the efficiency of large-scale and highly challenging multimodal training. 🌟 Open-Source Contribution We provide a comprehensive overview of the training methodology and data strategies behind LongCat-Flash-Omni, and release the model to accelerate future research and innovation in omni-modal intelligence. For more detail, please refer to the comprehensive LongCat-Flash-Omni Technical Report. 
| Benchmark | LongCat-Flash-Omni Instruct | Gemini-2.5-Pro (ThinkingBudget128) | Gemini-2.5-Flash (non-thinking) | Qwen3-Omni Instruct | Qwen2.5-Omni Instruct | |-----------|-------------------------------|-----------------------------------|------------------------------|----------------------|-------------------------| | OmniBench | 61.38 | 66.80 | 54.99 | 58.41 | 48.16 | | WorldSense | 60.89 | 63.96 | 58.72 | 52.01 | 46.69 | | DailyOmni | 82.38 | 80.61 | 80.78 | 69.33 | 47.45 | | UNO-Bench | 49.90 | 64.48 | 54.30 | 42.10 | 32.60 | Image-to-Text | Benchmark | LongCat-Flash-Omni Instruct | Gemini-2.5-Pro (ThinkingBudget128) | Gemini-2.5-Flash (non-thinking) | Qwen3-Omni Instruct | Seed-1.6 | GPT-4o-1120 | Qwen3-VL-235B-A22B-Instruct | Qwen2.5-VL-72B-Instruct | |-----------|-------------------------------|-----------------------------------|------------------------------|----------------------|----------|---------------|------------------------------|---------------------------| | General |||||||||| | MMBench-EN test | 87.5 | 89.8 | 89.3 | 86.8 | 88.5 | 83.7 | 88.3 | 88.6 | | MMBench-ZH test | 88.7 | 89.2 | 88.5 | 86.4 | 83.8 | 82.8 | 89.8 | 87.9 | | RealWorldQA | 74.8 | 76.0 | 73.9 | 72.9 | 74.5 | 74.1 | 79.3 | 75.7 | | MMStar | 70.9 | 78.5 | 75.5 | 68.5 | 71.5 | 63.2 | 78.4 | 68.2 | | STEM & Reasoning |||||||||| | MathVista mini | 77.9 | 77.7 | 77.1 | 75.9 | 78.7 | 62.8 | 84.9 | 74.8 | | MMMU val | 70.7 | 80.9 | 76.3 | 69.1 | 74.9 | 69.4 | 78.7 | 70.2 | | MMVet | 69.0 | 80.7 | 79.5 | 68.9 | 74.4 | 76.6 | 75.9 | 74.5 | | Multi-Image |||||||||| | BLINK | 63.1 | 70.0 | 65.7 | 56.1 | 65.0 | 65.5 | 70.7 | 60.1 | | MuirBench | 77.1 | 74.0 | 73.7 | 62.1 | 74.6 | 70.5 | 72.8 | 70.7 | | Mantis | 84.8 | 83.9 | 83.4 | 80.7 | 81.1 | 79.3 | 79.7 | 82.0 | | Text Recognition & Chart/Document Understanding |||||||||| | ChartQA | 87.6 | 71.7 | 77.6 | 86.8 | 82.4 | 74.5 | 89.2 | 89.5 | | DocVQA | 91.8 | 94.0 | 93.6 | 95.7 | 94.3 | 80.9 | 94.6 | 96.4 | | OCRBench | 84.9 | 87.2 | 85.6 | 85.5 | 85.6 | 82.3 | 91.2 | 88.5 | | OmniDocBench EN/ZH ↓ | 22.8/29.0 | 31.9/24.5 | 22.8/32.9 | 28.4/40.5 | 22.0/27.6 | 25.9/37.7 | 13.6/17.5 | 22.6/32.4 | | Grounding & Counting |||||||||| | RefCOCO-avg | 92.3 | 75.4 | 71.9 | 89.3 | 80.2 | - | 87.1 | 90.3 | | CountBench | 92.4 | 91.0 | 78.6 | 90.0 | 94.1 | 85.6 | 94.3 | 93.6 | | Graphical User Interface (GUI) |||||||||| | VisualWebBench | 78.7 | 81.1 | 73.5 | 79.3 | 81.1 | 77.1 | 80.8 | 82.3 | | ScreenSpot-v2 | 91.2 | 75.8 | 63.9 | 94.7 | 91.7 | - | 93.4 | 92.9 | | AndroidControl low | 91.2 | 79.2 | 79.1 | 90.5 | 84.6 | 65.2 | 90.0 | 93.7 | | AndroidControl high | 75.6 | 60.8 | 55.5 | 70.8 | 55.2 | 41.7 | 74.1 | 67.4 | Note: Values marked with are sourced from public reports. 
As GPT-4o does not support image grounding, we do not report its results on RefCOCO and ScreenSpot-v2 Video-to-Text | Benchmark | LongCat-Flash-Omni Instruct | Gemini-2.5-Pro (ThinkingBudget128) | Gemini-2.5-Flash (non-thinking) | Qwen3-Omni Instruct | Seed-1.6 | GPT-4o-1120 | Qwen3-VL (235B-A22B-Instruct) | Qwen2.5-VL-72B-Instruct | |-----------|-------------------------------|-----------------------------------|------------------------------|----------------------|----------|---------------|------------------------------|---------------------------| | Short Video |||||||||| | MVBench | 75.2 | 66.4 | 63.0 | 69.3 | 68.4 | 62.1 | 71.3 | 70.4 | | NextQA | 86.2 | 84.2 | 81.4 | 82.4 | 84.1 | 79.7 | 81.3 | 82.3 | | TempCompass | 82.2 | 80.8 | 80.2 | 73.5 | 79.4 | 76.4 | 80.5 | 74.8 | | Long Video |||||||||| | VideoMME (w/o audio) | 76.2 | - | - | 70.5 | 75.2 | 73.2 | 79.2 | 73.3 | | VideoMME (w/ audio) | 78.2 | 80.6 | 78.5 | 73.0 | - | - | - | - | | LongVideoBench | 69.3 | 69.4 | 66.4 | 65.4 | 64.8 | 63.9 | - | 60.7 | | STEM & Reasoning |||||||||| | MMVU | 67.1 | 75.6 | 72.4 | 62.4 | 67.3 | 67.4 | 69.3 | 62.9 | | Video-MMMU | 67.5 | 79.4 | 76.6 | 60.3 | 75.4 | 68.0 | 73.7 | 59.3 | Note: Values marked with are sourced from public reports. Table 1: Automatic Speech Recognition (ASR) and Speech-to-Text Translation (S2TT) | Benchmark | LongCat-Flash-Omni Instruct | Gemini-2.5-Pro (ThinkingBudget128) | GPT-4o-Audio | Qwen3-Omni Instruct | Kimi-Audio | Step-Audio-2-mini | |-----------|-------------------------------|-----------------------------------|--------------|----------------------|------------|-------------------| | ASR | | | | | | | | LibriSpeech (test-clean \| test-other) | 1.57 \| 4.01 | 1.74 \| 3.80 | 30.00 \| 41.83 | 1.22 \| 2.48 | 1.28 \| 2.42 | 1.33 \| 2.86 | | AISHELL-1 | 0.63 | 3.11 | 34.81 | 0.84 | 0.60 | 0.78 | | AISHELL-2 | 2.78 | 5.24 | 77.73 | 2.34 | 2.56 | 2.16 | | Fleurs (zh \| en) | 3.99 \| 5.02 | 2.24 \| 4.77 | 3.91 \| 5.56 | 2.20 \| 2.72 | 2.69 \| 4.44 | 2.53 \| 3.05 | | CommonVoice 15 (zh \| en) | 4.98 \| 13.59 | 47.30 \| 49.86 | 42.83 \| 23.88 | 4.31 \| 6.05 | 8.46 \| 7.92 | 5.00 \| 6.75 | | WenetSpeech (test-meeting \| test-net) | 6.69 \| 6.09 | 136.13 \| 32.82 | 54.35 \| 67.90 | 5.89 \| 4.69 | 6.28 \| 5.37 | 4.87 \| 4.82 | | S2TT (BLEU) | | | | | | | | CoVost2 en→zh | 47.23 | 41.94 | 29.32 | 48.72 | - | 49.12 | | CoVost2 zh→en | 27.32 | 25.38 | 16.01 | 21.51 | - | 29.47 | Note: ASR results are in CER/WER (lower is better), S2TT results are in BLEU score. 
Table 2: Audio Understanding

| Benchmark | LongCat-Flash-Omni Instruct | Gemini-2.5-Pro (ThinkingBudget128) | GPT-4o-Audio | Qwen3-Omni Instruct | Kimi-Audio | Step-Audio-2-mini |
|-----------|-----------------------------|------------------------------------|--------------|---------------------|------------|-------------------|
| MMAU | 75.90 | 72.80 | 68.40 | 77.50 | 65.20 | 73.20 |
| VocalSound | 92.76 | 89.45 | 82.37 | 91.60 | 94.85 | 87.58 |
| TUT2017 | 65.43 | 33.15 | 20.74 | 40.74 | 65.25 | 30.67 |
| ClothoAQA | 72.83 | 69.67 | 61.87 | 75.16 | 72.21 | 68.39 |
| Nonspeech7k | 93.79 | 87.59 | 72.28 | 80.83 | 93.93 | 73.24 |
| CochlScene | 70.02 | 45.34 | 34.94 | 43.03 | 80.42 | 44.58 |
| MELD | 54.60 | 46.74 | 39.00 | 50.80 | 59.13 | 31.44 |

Table 3: Audio-to-Text Chat

| Benchmark | LongCat-Flash-Omni Instruct | Gemini-2.5-Pro (ThinkingBudget128) | GPT-4o-Audio | Qwen3-Omni Instruct | Kimi-Audio | Step-Audio-2-mini |
|-----------|-----------------------------|------------------------------------|--------------|---------------------|------------|-------------------|
| OpenAudioBench | | | | | | |
| LlamaQuestions | 83.33 | 83.00 | 86.30 | 83.30 | 79.33 | 69.70 |
| ReasoningQA | 79.71 | 80.30 | 68.71 | 84.16 | 58.02 | 55.64 |
| TriviaQA | 86.20 | 90.20 | 76.00 | 75.90 | 62.10 | 45.30 |
| Webquestions | 76.00 | 80.90 | 81.20 | 75.20 | 70.20 | 54.40 |
| AlpacaEval | 75.43 | 76.58 | 81.61 | 85.43 | 75.73 | 53.92 |
| VoiceBench | | | | | | |
| AlpacaEval | 4.94 | 4.70 | 4.73 | 4.74 | 4.46 | 3.84 |
| CommonEval | 4.32 | 4.11 | 4.37 | 4.54 | 3.97 | 3.19 |
| OpenBookQA | 93.41 | 95.16 | 87.90 | 89.70 | 83.52 | 72.97 |
| SDQA | 82.46 | 83.54 | 90.10 | 76.90 | 63.12 | 44.85 |
| MMSU | 81.95 | 88.32 | 78.90 | 69.00 | 62.17 | 52.00 |
| AdvBench | 100 | 97.69 | 99.23 | 99.30 | 100 | 97.00 |
| IFEval | 77.99 | 77.83 | 66.81 | 77.80 | 61.10 | 29.80 |

| Benchmark | LongCat-Flash-Omni Instruct | LongCat-Flash | DeepSeek V3.1 | Qwen3 MoE-2507 | Kimi-K2 | GPT-4.1 | Claude Sonnet-4 | Gemini-2.5-Flash |
|-----------|-----------------------------|---------------|---------------|----------------|---------|---------|-----------------|------------------|
| Architecture | MoE | MoE | MoE | MoE | MoE | - | - | - |
| # Total Params | 560B | 560B | 671B | 235B | 1043B | - | - | - |
| # Activated Params | 27B | 27B | 37B | 22B | 32B | - | - | - |
| General Domains | | | | | | | | |
| MMLU (acc) | 90.30 | 89.71 | 90.96 | 90.23 | 89.86 | 89.64 | 91.75 | 86.33 |
| MMLU-Pro (acc) | 82.73 | 82.68 | 84.45 | 84.83 | 82.06 | 81.72 | 83.74 | 81.95 |
| CEval (acc) | 91.68 | 90.44 | 89.21 | 92.70 | 91.26 | 79.53 | 86.63 | 78.78 |
| CMMLU (acc) | 89.39 | 84.34 | 88.04 | 88.14 | 89.66 | 77.65 | 86.51 | 78.30 |
| Instruction Following | | | | | | | | |
| IFEval (acc) | 82.44 | 89.65 | 86.69 | 88.54 | 88.91 | 85.58 | 88.35 | 83.92 |
| COLLIE (acc) | 45.69 | 57.10 | 43.80 | 49.71 | 56.34 | 50.00 | 51.22 | 48.60 |
| Meeseeks-zh (acc) | 39.05 | 43.03 | 33.83 | 35.32 | 42.79 | 41.54 | 35.07 | 34.84 |
| Mathematical Reasoning | | | | | | | | |
| MATH500 (acc) | 97.60 | 96.40 | 96.08 | 98.80 | 97.60 | 90.60 | 93.80 | 98.40 |
| AIME24 (avg@10) | 72.92 | 70.42 | 66.30 | 81.67 | 69.60 | 47.00 | 47.00 | 79.67 |
| BeyondAIME (avg@10) | 47.40 | 43.00 | 36.50 | 57.60 | 36.60 | 22.10 | 20.50 | 44.20 |
| General Reasoning | | | | | | | | |
| GPQA-diamond (acc) | 74.41 | 73.23 | 74.90 | 77.43 | 75.76 | 67.68 | 70.71 | 80.30 |
| DROP (f1) | 83.53 | 79.06 | 84.19 | 78.57 | 89.04 | 66.94 | 73.06 | 45.03 |
| ZebraLogic (acc) | 86.00 | 89.30 | 85.30 | 94.22 | 89.11 | 56.30 | 80.10 | 57.00 |
| GraphWalks-128k (precision) | 56.00 | 51.05 | 73.54 | 80.72 | 47.50 | 85.02 | 80.57 | 64.83 |
| Coding | | | | | | | | |
| LiveCodeBench (pass@1) | 52.64 | 48.02 | 56.40 | 46.48 | 46.70 | 39.21 | 45.59 | 39.65 |
| Humaneval+ (pass@1) | 90.85 | 88.41 | 92.68 | 94.51 | 85.98 | 93.29 | 94.51 | 87.80 |
| MBPP+ (pass@1) | 80.16 | 79.63 | 79.89 | 79.89 | 81.75 | 79.37 | 80.16 | 76.19 |

Note: Some values are sourced from other public reports. DeepSeek-V3.1, Qwen3-235B-A22B, Gemini2.5-Flash, and Claude4-Sonnet are evaluated under their non-thinking mode.

LongCat-Flash-Omni is a large MoE model whose weights are sharded across multiple devices. When the model is loaded with Hugging Face Transformers or vLLM, the weights are downloaded automatically based on the model name. However, if your runtime environment cannot download weights during execution, you can refer to the following commands to manually download the model weights to a local directory:
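For example, here is a minimal Python sketch using the `huggingface_hub` API; the repo id `meituan-longcat/LongCat-Flash-Omni` and the target directory below are illustrative assumptions, so substitute the actual repository name and path you use:

```python
# Minimal sketch: pre-download the model weights to a local directory with huggingface_hub.
# The repo id and local directory are illustrative assumptions, not confirmed values.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="meituan-longcat/LongCat-Flash-Omni",  # hypothetical repo id; adjust as needed
    local_dir="./LongCat-Flash-Omni",              # where the weights will be stored
)
print(f"Weights downloaded to: {local_dir}")
```

The resulting directory can then be passed as the model path to Transformers, vLLM, or SGLang in place of the remote model name.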
We have implemented basic adaptations in SGLang to support running LongCat-Flash-Omni. The official SGLang release does not yet natively support LongCat-Flash-Omni, so you can temporarily use our development branch for local installation and testing.

Due to its 560-billion-parameter (560B) size, LongCat-Flash-Omni requires at least one node (e.g., 8×H20-141G) to host the model weights in FP8 format, and at least two nodes (e.g., 16×H800-80G) for BF16 weights. Detailed launch configurations are provided below.

Installation

- Python >= 3.10.0 (Anaconda is recommended)
- PyTorch >= 2.8
- CUDA >= 12.9

The model can be served on your cluster using a combination of Tensor Parallelism and Expert Parallelism. Once all dependencies are installed, you can launch the demo using the following command.

> NOTE: Replace $NODE_RANK and $MASTER_IP with the corresponding values of your GPU machines.

All test cases are defined in examplesdict.py, and additional test cases may be added as needed. After model execution, the generated results are saved in the directory specified by the --output-dir parameter.

You can use LongCat-Flash-Omni at https://longcat.ai (the web version currently supports audio interaction only). The full service will be provided in subsequent updates.

We are excited to announce that the LongCat-Flash-Omni app is now available for both Android and iOS. For Android, you can download it via the QR code below. For iOS, you can download it by searching for "LongCat" in the App Store or via the QR code. Currently, only the Chinese App Store is supported.

The model weights are released under the MIT License. Any contributions to this repository are licensed under the MIT License, unless otherwise stated. This license does not grant any rights to use Meituan trademarks or patents.

Usage Considerations

This model has not been specifically designed or comprehensively evaluated for every possible downstream application. Developers should take into account the known limitations of large language models, including performance variations across different languages, and carefully assess accuracy, safety, and fairness before deploying the model in sensitive or high-risk scenarios. It is the responsibility of developers and downstream users to understand and comply with all applicable laws and regulations relevant to their use case, including but not limited to data protection, privacy, and content safety requirements.

Nothing in this Model Card should be interpreted as altering or restricting the terms of the MIT License under which the model is released.

Citation

We kindly encourage citation of our work if you find it useful.

Contact

Please contact us at [email protected] or join our WeChat Group if you have any questions.
LongCat-Flash-Prover
LongCat-Flash-Thinking-FP8
UNO-Scorer-Qwen3-14B
LongCat-Video-Avatar
LongCat Audio Codec
LongCat-Audio-Codec: An Audio Tokenizer and Detokenizer Solution Designed for Speech Large Language Models

We are excited to introduce LongCat-Audio-Codec, an audio tokenizer and detokenizer solution designed for speech large language models. It generates semantic and acoustic tokens in parallel, enabling high-fidelity audio reconstruction at extremely low bitrates with strong backend support for Speech LLMs. This repository hosts the resources for LongCat-Audio-Codec. For complete documentation, installation guides, and usage examples, please visit our GitHub Repository.

- High Fidelity at Ultra-Low Bitrates: As a codec, it achieves high-intelligibility audio reconstruction at extremely low bitrates (a back-of-envelope bitrate estimate is sketched at the end of this section).
- Low-Frame-Rate Tokenizer: As a tokenizer, it extracts semantic and acoustic tokens in parallel at a low frame rate of 16.6 Hz, with flexible acoustic codebook configurations to adapt to different downstream tasks.
- Low-Latency Streaming Detokenizer: Equipped with a specially designed streaming detokenizer that requires minimal future information to deliver high-quality audio output with low latency.
- Super-Resolution Capability: Integrates audio super-resolution into the detokenizer, generating higher-sample-rate audio when the original input is sampled below 24 kHz.

| Resources | Notes |
| --- | --- |
| LongCatAudioCodecencoder | Encoder weights of LongCat-Audio-Codec, containing the semantic encoder and the acoustic encoder |
| LongCatAudioCodecencodercmvn | Cepstral Mean and Variance Normalization (CMVN) coefficients used by the encoder |
| LongCatAudioCodecdecoder16k4codebooks | Native 16 kHz decoder, supporting 1 semantic codebook and at most 3 acoustic codebooks |
| LongCatAudioCodecdecoder24k2codebooks | Super-resolution 24 kHz decoder, supporting 1 semantic codebook and 1 acoustic codebook |
| LongCatAudioCodecdecoder24k4codebooks | Super-resolution 24 kHz decoder, supporting 1 semantic codebook and at most 3 acoustic codebooks |

If you find our work useful in your research, please consider citing:

The code and models in this repository are released under the MIT License. This grants you broad permissions to use, copy, modify, and distribute the software, provided you include the original copyright notice. We claim no ownership over any content you generate using these models. The software is provided "AS IS", without any warranty. You are fully accountable for your use of the models. Your usage must not involve creating or sharing any content that violates applicable laws, causes harm to individuals, disseminates personal information with harmful intent, spreads misinformation, or targets vulnerable groups.

📩 Contact

Please contact us at [email protected] or open an issue if you have any questions.
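As referenced in the feature list above, a back-of-envelope estimate makes the ultra-low-bitrate claim concrete. It uses only the stated 16.6 Hz token rate and the 1-semantic-plus-up-to-3-acoustic codebook configuration; the 1024-entry codebook size is an illustrative assumption, not a value documented in this card.

```python
import math

# Back-of-envelope bitrate estimate for LongCat-Audio-Codec token streams.
# Known from this card: tokens are emitted at 16.6 Hz, with 1 semantic codebook
# and up to 3 acoustic codebooks. The codebook size is an ASSUMPTION (1024 entries)
# chosen only to illustrate the arithmetic.
FRAME_RATE_HZ = 16.6
ASSUMED_CODEBOOK_SIZE = 1024                        # hypothetical; not specified here
BITS_PER_TOKEN = math.log2(ASSUMED_CODEBOOK_SIZE)   # 10 bits per token

for acoustic_codebooks in (1, 3):
    codebooks = 1 + acoustic_codebooks              # 1 semantic + N acoustic
    bitrate_bps = FRAME_RATE_HZ * codebooks * BITS_PER_TOKEN
    print(f"{codebooks} codebooks -> ~{bitrate_bps:.0f} bit/s")

# Under these assumptions: 2 codebooks -> ~332 bit/s, 4 codebooks -> ~664 bit/s,
# i.e. well under 1 kbit/s, which is the regime "ultra-low bitrate" refers to.
```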