meituan-longcat
LongCat-Flash-Chat
Model Introduction

We introduce LongCat-Flash, a powerful and efficient language model with 560 billion total parameters, featuring an innovative Mixture-of-Experts (MoE) architecture. The model incorporates a dynamic computation mechanism that activates 18.6B∼31.3B parameters (averaging ∼27B) based on contextual demands, optimizing both computational efficiency and performance. To achieve advanced training and inference efficiency, we employ a shortcut-connected architecture that expands the computation-communication overlap window, achieving inference of over 100 tokens per second (TPS) cost-effectively. Our comprehensive training and scaling strategies ensure stable, efficient training, while tailored data strategies enhance model performance. We now release LongCat-Flash-Chat, a non-thinking foundation model that delivers highly competitive performance among leading models, with exceptional strengths in agentic tasks.

🌟 Scalable Architectural Design for Computational Efficiency

LongCat-Flash is designed and optimized under two key principles: efficient computation utilization, and efficient training and inference. Specifically: (1) Because not all tokens are equal, we introduce a zero-computation experts mechanism in the MoE blocks that allocates a dynamic computation budget to tokens based on their significance, activating 18.6 to 31.3 billion parameters (out of 560 billion total) depending on contextual demands. To keep the computation load consistent, we adjust the expert bias with a PID controller, maintaining an average of ∼27 billion activated parameters per token. (2) Because communication overhead becomes a bottleneck when scaling MoE models, we incorporate the Shortcut-connected MoE (ScMoE) design to expand the computation-communication overlap window. Combined with customized infrastructure optimizations, this design enables training at a massive scale of tens of thousands of accelerators and inference with high throughput and low latency.

Effectively and efficiently scaling model size remains a key challenge in strategy design. To this end, we develop a comprehensive stability-and-scaling framework for robustly training large-scale models: (1) We successfully apply a hyperparameter transfer strategy to a model of this size, predicting optimal hyperparameter configurations from smaller proxy models with theoretical guarantees. (2) We initialize the model using a model-growth mechanism based on a refined half-scale checkpoint, achieving improved performance compared to conventional initialization methods. (3) A multi-pronged stability suite incorporates principled router-gradient balancing, a hidden z-loss to suppress massive activations, and fine-tuned optimizer configurations. (4) To enhance the reliability of large-scale cluster training, we introduce deterministic computation, which guarantees the exact reproducibility of experiments and enables the detection of Silent Data Corruption (SDC) during training. These interventions ensure that LongCat-Flash's training remains stable, with no irrecoverable loss spikes.

🌟 Multi-Stage Training Pipeline for Agentic Capability

Through a meticulously designed pipeline, LongCat-Flash is endowed with advanced agentic behaviors. Initial efforts focus on constructing a base model better suited for agentic post-training, where we design a two-stage pretraining data fusion strategy to concentrate reasoning-intensive domain data.
During mid-training, we enhance reasoning and coding capabilities while extending the context length to 128k to meet agentic post-training requirements. Building on this advanced base model, we proceed with a multi-stage post-training. Recognizing the scarcity of high-quality, high-difficulty training problems for agentic tasks, we design a multi-agent synthesis framework that defines task difficulty across three axes, i.e., information processing, tool-set complexity, and user interaction—using specialized controllers to generate complex tasks requiring iterative reasoning and environmental interaction. For more detail, please refer to the comprehensive LongCat-Flash Technical Report. Evaluation Results | Benchmark | DeepSeek V3.1 | Qwen3 MoE-2507 | Kimi-K2 | GPT-4.1 | Claude4 Sonnet | Gemini2.5 Flash | LongCat-Flash | |---------------|-------------------|--------------------|-------------|-------------|--------------------|---------------------|-------------| | Architecture | MoE | MoE | MoE | - | - | - | MoE | | # Total Params | 671B | 235B | 1043B | - | - | - | 560B | | # Activated Params | 37B | 22B | 32B | - | - | - | 27B | | General Domains | | | | | | | | | MMLU (acc) | 90.96 | 90.23 | 89.86 | 89.64 | 91.75 | 86.33 | 89.71 | | MMLU-Pro (acc) | 84.45 | 84.83 | 82.06 | 81.72 | 83.74 | 81.95 | 82.68 | | ArenaHard-V2 (acc) | 84.10 | 88.20 | 85.70 | 61.50 | 62.10 | 77.00 | 86.50 | | CEval (acc) | 89.21 | 92.70 | 91.26 | 79.53 | 86.63 | 78.78 | 90.44 | | CMMLU (acc) | 88.04 | 88.14 | 89.66 | 77.65 | 86.51 | 78.30 | 84.34 | | Instruction Following | | | | | | | | | IFEval (acc) | 86.69 | 88.54 | 88.91 | 85.58 | 88.35 | 83.92 | 89.65 | | COLLIE (acc) | 43.80 | 49.71 | 56.34 | 50.00 | 51.22 | 48.60 | 57.10 | | Meeseeks-zh (acc) | 33.83 | 35.32 | 42.79 | 41.54 | 35.07 | 34.84 | 43.03 | | Mathematical Reasoning | | | | | | | | | MATH500 (acc) | 96.08 | 98.80 | 97.60 | 90.60 | 93.80 | 98.40 | 96.40 | | AIME24 (avg@10) | 66.30 | 81.67 | 69.60 | 47.00 | 47.00 | 79.67 | 70.42 | | AIME25 (avg@10) | 49.27 | 68.33 | 50.66 | 32.00 | 37.00 | 67.33 | 61.25 | | BeyondAIME (avg@10) | 36.50 | 57.60 | 36.60 | 22.10 | 20.50 | 44.20 | 43.00 | | General Reasoning | | | | | | | | | GPQA-diamond (acc) | 74.90 | 77.43 | 75.76 | 67.68 | 70.71 | 80.30 | 73.23 | | DROP (f1) | 84.19 | 78.57 | 89.04 | 66.94 | 73.06 | 45.03 | 79.06 | | ZebraLogic (acc) | 85.30 | 94.22 | 89.11 | 56.30 | 75.85 | 51.78 | 89.30 | | GraphWalks-128k (precision) | 73.54 | 80.72 | 47.50 | 85.02 | 80.57 | 64.83 | 51.05 | | Coding | | | | | | | | | LiveCodeBench (pass@1) | 56.40 | 46.48 | 46.70 | 39.21 | 45.59 | 39.65 | 48.02 | | Humaneval+ (pass@1) | 92.68 | 94.51 | 85.98 | 93.29 | 94.51 | 87.80 | 88.41 | | MBPP+ (pass@1) | 79.89 | 79.89 | 81.75 | 79.37 | 80.16 | 76.19 | 79.63 | | SWE-Bench-Verified (acc) | 66.00 | 42.00 | 64.60 | 48.60 | 68.00 | 40.60 | 60.40 | | TerminalBench (acc) | 31.30 | 17.28 | 25.93 | 28.40 | 40.74 | 12.35 | 39.51 | | Agentic Tool Use | | | | | | | | | τ²-Bench (telecom) (avg@4) | 38.50 | 22.50 | 67.50 | 35.20 | 46.20 | 16.50 | 73.68 | | τ²-Bench (airline) (avg@4) | 46.00 | 36.00 | 54.20 | 56.00 | 60.00 | 41.50 | 58.00 | | τ²-Bench (retail) (avg@4) | 64.90 | 70.50 | 70.80 | 74.10 | 80.00 | 64.80 | 71.27 | | AceBench (acc) | 69.70 | 71.10 | 82.20 | 80.10 | 76.20 | 74.50 | 76.10 | | VitaBench (avg@4) | 20.30 | 8.50 | 18.20 | 19.00 | 23.00 | 8.00 | 24.30 | | Safety | | | | | | | | | Harmful | 82.79 | 80.82 | 53.91 | 56.19 | 66.56 | - | 83.98 | | Criminal | 87.83 | 89.13 | 77.19 | 81.58 | 87.58 | - | 91.24 | | 
Misinformation | 83.17 | 77.76 | 42.68 | 45.49 | 54.91 | - | 81.72 | | Privacy | 98.80 | 98.80 | 96.39 | 98.80 | 100.00 | - | 93.98 |

Note: Some values are sourced from other public reports. DeepSeek-V3.1, Qwen3-235B-A22B, Gemini2.5-Flash, and Claude4-Sonnet are evaluated under their non-thinking mode.

Chat Template

The details of our chat template are provided in the `tokenizer_config.json` file. Below are some examples. With the following prefix, LongCat-Flash can generate responses corresponding to user queries: When a system prompt is specified, the prefix takes the following format: In multi-turn scenarios, the prefix is constructed by concatenating the context with the latest user query: Here, N denotes the N-th round of user queries, with indexing starting from zero. LongCat-Flash supports tool calling in the following format:

Deployment

We have implemented basic adaptations in both SGLang and vLLM to support the deployment of LongCat-Flash. For comprehensive guidance, please refer to the Deployment Guide in the LongCat-Flash-Chat repository.

Chat Website

You can chat with LongCat-Flash on our official website: https://longcat.ai.

This repository, including both the model weights and the source code, is released under the MIT License. Any contributions to this repository are licensed under the MIT License, unless otherwise stated. This license does not grant any rights to use Meituan trademarks or patents.

Usage Considerations

This model has not been specifically designed or comprehensively evaluated for every possible downstream application. Developers should take into account the known limitations of large language models, including performance variations across different languages, and carefully assess accuracy, safety, and fairness before deploying the model in sensitive or high-risk scenarios. It is the responsibility of developers and downstream users to understand and comply with all applicable laws and regulations relevant to their use case, including but not limited to data protection, privacy, and content safety requirements. Nothing in this Model Card should be interpreted as altering or restricting the terms of the MIT License under which the model is released.

We kindly encourage citation of our work if you find it useful.

Contact

Please contact us at [email protected] or open an issue if you have any questions.
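As a supplement to the architecture section above, the sketch below shows one way a PID controller could nudge the router bias of zero-computation experts so that the average number of "real" activated experts tracks a fixed budget. The module shapes, gains, and update rule are illustrative assumptions, not LongCat-Flash's actual implementation.

```python
import torch


class PIDBiasController:
    """Toy PID controller that nudges the router bias of the zero-computation
    experts so the average number of activated "real" experts tracks a target
    budget. Gains and the update rule are illustrative, not LongCat-Flash's values."""

    def __init__(self, target_real_experts: float, kp: float = 0.01, ki: float = 0.001, kd: float = 0.0):
        self.target = target_real_experts
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, observed_real_experts: float) -> float:
        error = observed_real_experts - self.target      # > 0 means too much compute is being used
        self.integral += error
        derivative = error - self.prev_error
        self.prev_error = error
        # A positive output raises the bias of the zero-computation experts,
        # steering more tokens toward the identity (no-compute) path.
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def route(hidden: torch.Tensor, router_w: torch.Tensor, bias: torch.Tensor,
          top_k: int = 8, num_zero_experts: int = 2):
    """Top-k routing over [real experts | zero-computation experts]; slots that
    land on a zero-computation expert simply pass the token through unchanged."""
    logits = hidden @ router_w + bias                    # [tokens, num_experts]
    chosen = logits.topk(top_k, dim=-1).indices
    num_real = router_w.shape[1] - num_zero_experts
    real_per_token = (chosen < num_real).float().sum(dim=-1)
    return chosen, real_per_token.mean().item()


# Usage: after each batch, feed the observed average back into the controller.
tokens, dim, num_real, num_zero = 4096, 64, 16, 2
router_w = torch.randn(dim, num_real + num_zero)
bias = torch.zeros(num_real + num_zero)
ctrl = PIDBiasController(target_real_experts=6.0)        # e.g. aim for ~6 of 8 slots on real experts
_, avg_real = route(torch.randn(tokens, dim), router_w, bias)
bias[num_real:] += ctrl.update(avg_real)                 # adjust the zero-computation expert bias
```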
LongCat-Image-Edit
LongCat-Image
LongCat-Video
Model Introduction

We introduce LongCat-Video, a foundational video generation model with 13.6B parameters that delivers strong performance across Text-to-Video, Image-to-Video, and Video-Continuation generation tasks. It particularly excels at efficient, high-quality long video generation, representing our first step toward world models.

Key Features

- 🌟 Unified architecture for multiple tasks: LongCat-Video unifies Text-to-Video, Image-to-Video, and Video-Continuation tasks within a single video generation framework. It natively supports all of these tasks with a single model and consistently delivers strong performance on each individual task.
- 🌟 Long video generation: LongCat-Video is natively pretrained on Video-Continuation tasks, enabling it to produce minutes-long videos without color drift or quality degradation.
- 🌟 Efficient inference: LongCat-Video generates 720p, 30 fps videos within minutes by employing a coarse-to-fine generation strategy along both the temporal and spatial axes. Block Sparse Attention further enhances efficiency, particularly at high resolutions (see the sketch below).
- 🌟 Strong performance with multi-reward RLHF: Powered by multi-reward Group Relative Policy Optimization (GRPO), comprehensive evaluations on both internal and public benchmarks demonstrate that LongCat-Video achieves performance comparable to leading open-source video generation models as well as the latest commercial solutions.

For more details, please refer to the comprehensive LongCat-Video Technical Report.
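The Block Sparse Attention mentioned in the efficient-inference bullet can be pictured as selecting, for each block of query tokens, only a few blocks of key/value tokens to attend to. Below is a generic, minimal sketch of building such a block-level mask and using it with standard scaled-dot-product attention; the block size, selection rule, and keep ratio are assumptions for illustration, not LongCat-Video's actual kernel or sparsity pattern.

```python
import torch
import torch.nn.functional as F


def block_sparse_mask(num_tokens: int, block: int, keep_ratio: float,
                      block_scores: torch.Tensor) -> torch.Tensor:
    """Token-level attention mask derived from block-level selection.
    block_scores[i, j] rates how relevant key-block j is to query-block i
    (e.g. similarity of pooled block features); only the top blocks are kept."""
    nb = num_tokens // block
    keep = max(1, int(nb * keep_ratio))
    top = block_scores.topk(keep, dim=-1).indices          # kept key-blocks per query-block
    block_mask = torch.zeros(nb, nb, dtype=torch.bool)
    block_mask.scatter_(1, top, True)
    # Expand to token resolution: a token may only attend inside the kept blocks.
    return block_mask.repeat_interleave(block, 0).repeat_interleave(block, 1)


# Usage with plain scaled-dot-product attention (True = attention allowed).
tokens, dim, block = 256, 64, 32
q, k, v = (torch.randn(1, tokens, dim) for _ in range(3))
block_scores = torch.randn(tokens // block, tokens // block)   # stand-in relevance scores
mask = block_sparse_mask(tokens, block, keep_ratio=0.25, block_scores=block_scores)
out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
print(out.shape, mask.float().mean().item())                   # mask density ≈ keep_ratio
```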
FlashAttention-2 is enabled in the model config by default; you can also change the model config to use FlashAttention-3 or xformers.

| Models | Download Link |
| --- | --- |
| LongCat-Video | 🤗 Huggingface |

Text-to-Video

The Text-to-Video MOS evaluation results on our internal benchmark:

| MOS score | Veo3 | PixVerse-V5 | Wan 2.2-T2V-A14B | LongCat-Video |
|---------------|-------------------|--------------------|-------------|-------------|
| Accessibility | Proprietary | Proprietary | Open Source | Open Source |
| Architecture | - | - | MoE | Dense |
| # Total Params | - | - | 28B | 13.6B |
| # Activated Params | - | - | 14B | 13.6B |
| Text-Alignment↑ | 3.99 | 3.81 | 3.70 | 3.76 |
| Visual Quality↑ | 3.23 | 3.13 | 3.26 | 3.25 |
| Motion Quality↑ | 3.86 | 3.81 | 3.78 | 3.74 |
| Overall Quality↑ | 3.48 | 3.36 | 3.35 | 3.38 |

Image-to-Video

The Image-to-Video MOS evaluation results on our internal benchmark:

| MOS score | Seedance 1.0 | Hailuo-02 | Wan 2.2-I2V-A14B | LongCat-Video |
|---------------|-------------------|--------------------|-------------|-------------|
| Accessibility | Proprietary | Proprietary | Open Source | Open Source |
| Architecture | - | - | MoE | Dense |
| # Total Params | - | - | 28B | 13.6B |
| # Activated Params | - | - | 14B | 13.6B |
| Image-Alignment↑ | 4.12 | 4.18 | 4.18 | 4.04 |
| Text-Alignment↑ | 3.70 | 3.85 | 3.33 | 3.49 |
| Visual Quality↑ | 3.22 | 3.18 | 3.23 | 3.27 |
| Motion Quality↑ | 3.77 | 3.80 | 3.79 | 3.59 |
| Overall Quality↑ | 3.35 | 3.27 | 3.26 | 3.17 |

Community works are welcome! Please open a PR or let us know via an Issue to add your work.

- CacheDiT offers Fully Cache Acceleration support for LongCat-Video with DBCache and TaylorSeer, achieving a nearly 1.7x speedup without obvious loss of precision. Visit their example for more details.

The model weights are released under the MIT License. Any contributions to this repository are licensed under the MIT License, unless otherwise stated. This license does not grant any rights to use Meituan trademarks or patents.

Usage Considerations

This model has not been specifically designed or comprehensively evaluated for every possible downstream application. Developers should take into account the known limitations of large language models, including performance variations across different languages, and carefully assess accuracy, safety, and fairness before deploying the model in sensitive or high-risk scenarios. It is the responsibility of developers and downstream users to understand and comply with all applicable laws and regulations relevant to their use case, including but not limited to data protection, privacy, and content safety requirements. Nothing in this Model Card should be interpreted as altering or restricting the terms of the MIT License under which the model is released.

Citation

We kindly encourage citation of our work if you find it useful. We would like to thank the contributors to the Wan, UMT5-XXL, Diffusers, and HuggingFace repositories for their open research.

Contact

Please contact us at [email protected] or join our WeChat Group if you have any questions.
LongCat-Flash-Lite
LongCat Flash Thinking
We introduce and release LongCat-Flash-Thinking, a powerful and efficient large reasoning model (LRM) with 560 billion total parameters, featuring an innovative Mixture-of-Experts (MoE) architecture. The model incorporates a dynamic computation mechanism that activates 18.6B∼31.3B parameters (averaging ∼27B) based on contextual demands, optimizing both computational efficiency and performance. LongCat-Flash-Thinking is trained with our DORA system, an efficient distributed RL framework that supports asynchronous training and flexible accelerator usage to ensure stability and efficiency. Our comprehensive data curation and domain-parallel training recipe ensure stable and efficient training. In addition to general reasoning, the model is also equipped with formal-reasoning and agentic-reasoning techniques, advancing LRMs' reasoning ability on diverse complex tasks such as mathematics, logic, programming, automatic theorem proving, and tool use.

Specifically, the development of LongCat-Flash-Thinking follows a two-phase pipeline:

- Long CoT Cold-Start Training: This phase cultivates the model's foundational reasoning abilities. It begins with a curriculum learning strategy during mid-training to bolster intrinsic capabilities, followed by an SFT stage on reasoning-intensive and agentic data to prepare the model for advanced learning.
- Large-Scale RL: The second phase scales up this potential through an efficient RL framework built upon our Dynamic Orchestration for Asynchronous Rollout (DORA) system for industrial-scale asynchronous training. To address the stability challenges of asynchronous RL training, we adapt and extend the GRPO algorithm for a robust exploration-exploitation balance. A key innovation in this phase is our domain-parallel training scheme, which simultaneously optimizes the model across distinct domains and subsequently merges the resulting domain-expert models into a fused model. Finally, we perform a general RL stage to further refine the fused model and enhance its robustness, safety, and human alignment.

To overcome the instability of traditional mixed-domain RL training, LongCat-Flash-Thinking incorporates a domain-parallel training scheme that decouples optimization across STEM, coding, and agentic tasks. This approach not only stabilizes training but also allows us to fuse the resulting domain-expert models into a nearly Pareto-optimal final model that excels across all specialties.

LongCat-Flash-Thinking is built upon our self-designed DORA system. The main motivation is to optimize long-tail generation by leveraging multiple older versions of the Actor model through streaming rollout while preserving sampling consistency. The DORA system consists of two core components: elastic colocation and a multi-version asynchronous pipeline. These components enhance training efficiency, ensure policy consistency per sample, and enable efficient KV-cache reuse, facilitating stable and scalable training on tens of thousands of accelerators.

🌟 Advancing Formal Reasoning and Agentic Reasoning

In addition to general reasoning (e.g., mathematics, logic, coding, and instruction following), LongCat-Flash-Thinking also emphasizes two other critical capabilities.

- Formal Reasoning: LongCat-Flash-Thinking can solve complex formal reasoning tasks, e.g., automatic theorem proving. To help realize this potential and empower researchers, we introduce significant enhancements to our model's formal reasoning capabilities.
To achieve this, we introduce a novel expert iteration framework for careful data synthesis, involving statement formalization, iterative proof synthesis, and syntax/consistency filtering. - Agentic Reasoning: LongCat-Flash-Thinking can adaptively utilize provided tools to solve complex reasoning tasks. To reach this goal, we introduce a dual-path reasoning approach to identify and retain high-quality queries that genuinely require tool assistance, thereby fostering the development of robust agentic abilities. After high-value query selection, we synthesize corresponding high-quality solution trajectories based on a versatile environment with diverse tool APIs, including MCP servers and simulated tools for both single and multi-turn interactions. For more details, please refer to the comprehensive LongCat-Flash-Thinking Technical Report. | Benchmark | DeepSeek-V3.1-Thinking | Qwen3-235B-A22B-Thinking-2507 | GLM-4.5 | OpenAI-o3 | Gemini2.5-Pro | GPT-5-Thinking | LongCat-Flash-Thinking | |---------------|-------------------------|------------------------------|--------|-----------|---------------|----------------|-------------------------| | Architecture | MoE | MoE | MoE | - | - | - | MoE | | \# Total Params | 671B | 235B | 355B | - | - | - | 560B | | \# Activated Params | 37B | 22B | 32B | - | - | - | 27B | | General QA | | | | | | | | | MMLU-Pro (acc) | 84.4 | 84.4 | 81.5 | 85.3 | 86.7 | 84.5 | 82.6 | | MMLU-Redux (acc) | 90.5 | 91.4 | 89.9 | 93.1 | 90.1 | 92.6 | 89.3 | | Alignment | | | | | | | | | IFEval (strict prompt) | 86.3 | 89.3 | 85.4 | 90.2 | 92.4 | 92.8 | 86.9 | | Arena-Hard (hard prompt gemini) | 57.1 | 74.5 | 67.7 | 87.1 | 87.1 | 87.7 | 69.9 | | Mathematical Reasoning | | | | | | | | | MATH500 (Mean@1) | 98.8 | 99.6 | 95.4 | 98.4 | 98.0 | 99.2 | 99.2 | | HMMT25 (Mean@32) | 80.4 | 83.8 | 76.3 | 71.9 | 79.3 | 84.8 | 83.7 | | AIME24 (Mean@32) | 93.9 | 93.9 | 89.3 | 91.6 | 90.7 | 92.0 | 93.3 | | AIME25 (Mean@32) | 87.9 | 92.5 | 85.5 | 88.9 | 89.2 | 94.6 | 90.6 | | BeyondAIME (Mean@10) | 71.8 | 71.5 | 66.0 | 63.2 | 63.0 | 70.0 | 69.5 | | General Reasoning | | | | | | | | | GPQA-Diamond (Mean@16) | 84.2 | 80.4 | 78.3 | 81.9 | 84.0 | 84.4 | 81.5 | | ZebraLogic (Mean@1) | 96.1 | 97.5 | 90.9 | 94.3 | 92.4 | 92.7 | 95.5 | | Sudoku-Bench (Mean@1) | 1.0 | 2.0 | 1.0 | 70.0 | 0.0 | 63.0 | 56.0 | | ARC-AGI (Mean@1) | 37.5 | 45.3 | 21.41 | 47.3 | 46.8 | 59.0 | 50.3 | | Coding | | | | | | | | | LiveCodeBench (Mean@4) | 73.5 | 75.4 | 61.1 | 76.2 | 74.2 | 80.6 | 79.4 | | OJBench (Mean@1) | 33.6 | 32.1 | 19.0 | 38.4 | 41.6 | 34.1 | 40.7 | | Agentic Tool Using | | | | | | | | | SWE-Bench (Pass@1) | 66.0 | 34.4 | 64.2 | 69.1 | 59.6 | 74.9 | 59.4 | | BFCL V3 (full) | 55.4 | 75.7 | 79.1 | 72.4 | 63.2 | 60.1 | 74.4 | | τ²-Bench-Retail (Mean@4) | 65.4 | 68.2 | 69.3 | 72.8 | 70.9 | 81.1 | 71.5 | | τ²-Bench-Airline (Mean@4) | 44.0 | 58.0 | 66.0 | 62.5 | 58.0 | 62.6 | 67.5 | | τ²-Bench-Telecom (Mean@4) | 23.7 | 47.3 | 56.1 | 67.5 | 38.3 | 96.7 | 83.1 | | VitaBench | 13.5 | 21.5 | 26.8 | 35.3 | 24.3 | 29.3 | 29.5 | | Formal Theorem Proving | | | | | | | | | MiniF2F-Test (Pass@1) | 49.6 | 11.9 | 10.9 | 15.2 | 13.9 | 21.4 | 67.6 | | MiniF2F-Test (Pass@8) | 74.4 | 20.9 | 22.1 | 29.6 | 29.4 | 39.7 | 79.4 | | MiniF2F-Test (Pass@32) | 79.5 | 26.6 | 27.0 | 37.7 | 41.8 | 51.2 | 81.6 | | Safety | | | | | | | | | Harmful | 79.2 | 84.3 | 70.4 | 64.8 | 44.3 | 56.8 | 93.7 | | Criminal | 89.7 | 92.7 | 88.8 | 85.7 | 77.4 | 87.3 | 97.1 | | Misinformation | 81.1 | 80.9 | 67.1 | 42.7 | 31.0 | 41.9 | 93.0 | | Privacy | 96.2 
| 100.0 | 97.6 | 100.0 | 95.0 | 98.8 | 98.8 |

Note:
- Some values are sourced from other public reports.
- The inference parameters of LongCat-Flash-Thinking are set to `temperature=1.0`, `top_k=-1`, and `top_p=0.95`.

Chat Template

The details of our chat template are provided in the `tokenizer_config.json` file. Below are some examples. With the following prefix, LongCat-Flash can generate responses corresponding to user queries: When a system prompt is specified, the prefix takes the following format: In multi-turn scenarios, the prefix is constructed by concatenating the context with the latest user query: Here, N denotes the N-th round of user queries, with indexing starting from zero. LongCat-Flash supports tool calling in the following format:

Mathematical Reasoning

We recommend adding the following instructions when solving mathematical or other STEM-related reasoning tasks, so that the output can be located for evaluation. LongCat-Flash-Thinking also supports formal reasoning, such as automatic theorem proving (ATP). The specific template is:

Deployment

We have implemented basic adaptations in both SGLang and vLLM to support the deployment of LongCat-Flash-Thinking. Please refer to the Deployment Guide for detailed deployment instructions.

Chat Website

You can chat with LongCat-Flash-Thinking on our official website: https://longcat.ai. Please turn on the "Think" button ("深度思考" in Chinese) before submitting your request.

The model weights are released under the MIT License. Any contributions to this repository are licensed under the MIT License, unless otherwise stated. This license does not grant any rights to use Meituan trademarks or patents.

Usage Considerations

This model has not been specifically designed or comprehensively evaluated for every possible downstream application. Developers should take into account the known limitations of large language models, including performance variations across different languages, and carefully assess accuracy, safety, and fairness before deploying the model in sensitive or high-risk scenarios. It is the responsibility of developers and downstream users to understand and comply with all applicable laws and regulations relevant to their use case, including but not limited to data protection, privacy, and content safety requirements. Nothing in this Model Card should be interpreted as altering or restricting the terms of the MIT License under which the model is released.

Citation

We kindly encourage citation of our work if you find it useful.

Contact

Please contact us at [email protected] or join our WeChat Group if you have any questions.
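As a supplement to the recommended inference parameters and the Deployment section above, here is a minimal request sketch against an OpenAI-compatible endpoint such as those exposed by SGLang or vLLM. The URL, port, and served model name are placeholders, and whether `top_k` is accepted directly in the request body depends on the serving framework, so treat this as an assumption-laden example rather than the official client.

```python
import requests

# Assumed local OpenAI-compatible endpoint started with SGLang or vLLM;
# the URL, port and served model name below are placeholders.
API_URL = "http://localhost:8000/v1/chat/completions"

payload = {
    "model": "LongCat-Flash-Thinking",
    "messages": [{"role": "user", "content": "Prove that the sum of two even integers is even."}],
    # Recommended inference parameters from the note above.
    "temperature": 1.0,
    "top_p": 0.95,
    # top_k is not part of the standard OpenAI schema; vLLM/SGLang servers
    # typically accept it as an extra field (-1 disables top-k filtering).
    "top_k": -1,
    "max_tokens": 4096,
}

resp = requests.post(API_URL, json=payload, timeout=600)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```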
LongCat-AudioDiT-1B
LongCat-Flash-Chat-FP8
The LongCat-Flash-Chat-FP8 model card duplicates the LongCat-Flash-Chat card above.
LongCat-Next
LongCat-Flash-Omni
Model Introduction

We introduce LongCat-Flash-Omni, a state-of-the-art open-source omni-modal model with 560 billion total parameters (27B activated) that excels at real-time audio-visual interaction. This is achieved by building on LongCat-Flash's high-performance Shortcut-connected Mixture-of-Experts (MoE) architecture with zero-computation experts, augmented with efficient multimodal perception and speech reconstruction modules. Through an effective curriculum-inspired progressive training strategy, the model achieves comprehensive multimodal capabilities while maintaining strong unimodal capability. We now open-source the model to foster future research and development in the community.

LongCat-Flash-Omni is an open-source omni-modal model that achieves state-of-the-art cross-modal comprehension performance. It seamlessly integrates powerful offline multimodal understanding with real-time audio-visual interaction within a single all-in-one framework.

🌟 Large-Scale with Low-Latency Audio-Visual Interaction

By leveraging an efficient LLM backbone, carefully designed lightweight modality encoders and a decoder, and a chunk-wise audio-visual feature interleaving mechanism, LongCat-Flash-Omni achieves low-latency, high-quality audio-visual processing and streaming speech generation. It supports a context window of up to 128K tokens, enabling advanced capabilities in long-term memory, multi-turn dialogue, and temporal reasoning across multiple modalities. The model adopts an innovative multi-stage pretraining pipeline that progressively incorporates text, audio, and visual modalities under a balanced data strategy and an early-fusion training paradigm, ensuring strong omni-modal performance without degradation in any single modality. Inspired by the concept of modality decoupling, we propose a Modality-Decoupled Parallelism training scheme that significantly enhances the efficiency of large-scale, highly challenging multimodal training.

🌟 Open-Source Contribution

We provide a comprehensive overview of the training methodology and data strategies behind LongCat-Flash-Omni, and release the model to accelerate future research and innovation in omni-modal intelligence. For more details, please refer to the comprehensive LongCat-Flash-Omni Technical Report.
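To make the chunk-wise audio-visual feature interleaving mechanism described above more concrete, here is a minimal sketch that alternates fixed-size chunks of video and audio features along the time axis before they enter the LLM backbone. The chunk sizes, feature dimension, and token rates are assumptions for illustration, not the model's actual configuration.

```python
import torch


def interleave_av_chunks(video_feats: torch.Tensor, audio_feats: torch.Tensor,
                         video_chunk: int, audio_chunk: int) -> torch.Tensor:
    """Chunk-wise audio-visual interleaving: split each stream into fixed-size
    chunks along time and alternate them in one sequence, so tokens covering
    the same time window stay adjacent. Chunk sizes are illustrative assumptions."""
    v_chunks = video_feats.split(video_chunk, dim=0)   # tuple of (video_chunk, d) tensors
    a_chunks = audio_feats.split(audio_chunk, dim=0)   # tuple of (audio_chunk, d) tensors
    merged = []
    for v, a in zip(v_chunks, a_chunks):               # alternate video and audio chunks in time order
        merged.append(v)
        merged.append(a)
    return torch.cat(merged, dim=0)                    # one interleaved token sequence for the LLM backbone


# Example: one chunk per second of content (the token rates below are made up for the demo).
d = 512
video = torch.randn(2 * 4, d)    # 2 s of video tokens at 4 tokens/s
audio = torch.randn(2 * 10, d)   # 2 s of audio tokens at 10 tokens/s
seq = interleave_av_chunks(video, audio, video_chunk=4, audio_chunk=10)
print(seq.shape)                 # torch.Size([28, 512])
```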
| Benchmark | LongCat-Flash-Omni Instruct | Gemini-2.5-Pro (ThinkingBudget128) | Gemini-2.5-Flash (non-thinking) | Qwen3-Omni Instruct | Qwen2.5-Omni Instruct | |-----------|-------------------------------|-----------------------------------|------------------------------|----------------------|-------------------------| | OmniBench | 61.38 | 66.80 | 54.99 | 58.41 | 48.16 | | WorldSense | 60.89 | 63.96 | 58.72 | 52.01 | 46.69 | | DailyOmni | 82.38 | 80.61 | 80.78 | 69.33 | 47.45 | | UNO-Bench | 49.90 | 64.48 | 54.30 | 42.10 | 32.60 | Image-to-Text | Benchmark | LongCat-Flash-Omni Instruct | Gemini-2.5-Pro (ThinkingBudget128) | Gemini-2.5-Flash (non-thinking) | Qwen3-Omni Instruct | Seed-1.6 | GPT-4o-1120 | Qwen3-VL-235B-A22B-Instruct | Qwen2.5-VL-72B-Instruct | |-----------|-------------------------------|-----------------------------------|------------------------------|----------------------|----------|---------------|------------------------------|---------------------------| | General |||||||||| | MMBench-EN test | 87.5 | 89.8 | 89.3 | 86.8 | 88.5 | 83.7 | 88.3 | 88.6 | | MMBench-ZH test | 88.7 | 89.2 | 88.5 | 86.4 | 83.8 | 82.8 | 89.8 | 87.9 | | RealWorldQA | 74.8 | 76.0 | 73.9 | 72.9 | 74.5 | 74.1 | 79.3 | 75.7 | | MMStar | 70.9 | 78.5 | 75.5 | 68.5 | 71.5 | 63.2 | 78.4 | 68.2 | | STEM & Reasoning |||||||||| | MathVista mini | 77.9 | 77.7 | 77.1 | 75.9 | 78.7 | 62.8 | 84.9 | 74.8 | | MMMU val | 70.7 | 80.9 | 76.3 | 69.1 | 74.9 | 69.4 | 78.7 | 70.2 | | MMVet | 69.0 | 80.7 | 79.5 | 68.9 | 74.4 | 76.6 | 75.9 | 74.5 | | Multi-Image |||||||||| | BLINK | 63.1 | 70.0 | 65.7 | 56.1 | 65.0 | 65.5 | 70.7 | 60.1 | | MuirBench | 77.1 | 74.0 | 73.7 | 62.1 | 74.6 | 70.5 | 72.8 | 70.7 | | Mantis | 84.8 | 83.9 | 83.4 | 80.7 | 81.1 | 79.3 | 79.7 | 82.0 | | Text Recognition & Chart/Document Understanding |||||||||| | ChartQA | 87.6 | 71.7 | 77.6 | 86.8 | 82.4 | 74.5 | 89.2 | 89.5 | | DocVQA | 91.8 | 94.0 | 93.6 | 95.7 | 94.3 | 80.9 | 94.6 | 96.4 | | OCRBench | 84.9 | 87.2 | 85.6 | 85.5 | 85.6 | 82.3 | 91.2 | 88.5 | | OmniDocBench EN/ZH ↓ | 22.8/29.0 | 31.9/24.5 | 22.8/32.9 | 28.4/40.5 | 22.0/27.6 | 25.9/37.7 | 13.6/17.5 | 22.6/32.4 | | Grounding & Counting |||||||||| | RefCOCO-avg | 92.3 | 75.4 | 71.9 | 89.3 | 80.2 | - | 87.1 | 90.3 | | CountBench | 92.4 | 91.0 | 78.6 | 90.0 | 94.1 | 85.6 | 94.3 | 93.6 | | Graphical User Interface (GUI) |||||||||| | VisualWebBench | 78.7 | 81.1 | 73.5 | 79.3 | 81.1 | 77.1 | 80.8 | 82.3 | | ScreenSpot-v2 | 91.2 | 75.8 | 63.9 | 94.7 | 91.7 | - | 93.4 | 92.9 | | AndroidControl low | 91.2 | 79.2 | 79.1 | 90.5 | 84.6 | 65.2 | 90.0 | 93.7 | | AndroidControl high | 75.6 | 60.8 | 55.5 | 70.8 | 55.2 | 41.7 | 74.1 | 67.4 | Note: Values marked with are sourced from public reports. 
As GPT-4o does not support image grounding, we do not report its results on RefCOCO and ScreenSpot-v2 Video-to-Text | Benchmark | LongCat-Flash-Omni Instruct | Gemini-2.5-Pro (ThinkingBudget128) | Gemini-2.5-Flash (non-thinking) | Qwen3-Omni Instruct | Seed-1.6 | GPT-4o-1120 | Qwen3-VL (235B-A22B-Instruct) | Qwen2.5-VL-72B-Instruct | |-----------|-------------------------------|-----------------------------------|------------------------------|----------------------|----------|---------------|------------------------------|---------------------------| | Short Video |||||||||| | MVBench | 75.2 | 66.4 | 63.0 | 69.3 | 68.4 | 62.1 | 71.3 | 70.4 | | NextQA | 86.2 | 84.2 | 81.4 | 82.4 | 84.1 | 79.7 | 81.3 | 82.3 | | TempCompass | 82.2 | 80.8 | 80.2 | 73.5 | 79.4 | 76.4 | 80.5 | 74.8 | | Long Video |||||||||| | VideoMME (w/o audio) | 76.2 | - | - | 70.5 | 75.2 | 73.2 | 79.2 | 73.3 | | VideoMME (w/ audio) | 78.2 | 80.6 | 78.5 | 73.0 | - | - | - | - | | LongVideoBench | 69.3 | 69.4 | 66.4 | 65.4 | 64.8 | 63.9 | - | 60.7 | | STEM & Reasoning |||||||||| | MMVU | 67.1 | 75.6 | 72.4 | 62.4 | 67.3 | 67.4 | 69.3 | 62.9 | | Video-MMMU | 67.5 | 79.4 | 76.6 | 60.3 | 75.4 | 68.0 | 73.7 | 59.3 | Note: Values marked with are sourced from public reports. Table 1: Automatic Speech Recognition (ASR) and Speech-to-Text Translation (S2TT) | Benchmark | LongCat-Flash-Omni Instruct | Gemini-2.5-Pro (ThinkingBudget128) | GPT-4o-Audio | Qwen3-Omni Instruct | Kimi-Audio | Step-Audio-2-mini | |-----------|-------------------------------|-----------------------------------|--------------|----------------------|------------|-------------------| | ASR | | | | | | | | LibriSpeech (test-clean \| test-other) | 1.57 \| 4.01 | 1.74 \| 3.80 | 30.00 \| 41.83 | 1.22 \| 2.48 | 1.28 \| 2.42 | 1.33 \| 2.86 | | AISHELL-1 | 0.63 | 3.11 | 34.81 | 0.84 | 0.60 | 0.78 | | AISHELL-2 | 2.78 | 5.24 | 77.73 | 2.34 | 2.56 | 2.16 | | Fleurs (zh \| en) | 3.99 \| 5.02 | 2.24 \| 4.77 | 3.91 \| 5.56 | 2.20 \| 2.72 | 2.69 \| 4.44 | 2.53 \| 3.05 | | CommonVoice 15 (zh \| en) | 4.98 \| 13.59 | 47.30 \| 49.86 | 42.83 \| 23.88 | 4.31 \| 6.05 | 8.46 \| 7.92 | 5.00 \| 6.75 | | WenetSpeech (test-meeting \| test-net) | 6.69 \| 6.09 | 136.13 \| 32.82 | 54.35 \| 67.90 | 5.89 \| 4.69 | 6.28 \| 5.37 | 4.87 \| 4.82 | | S2TT (BLEU) | | | | | | | | CoVost2 en→zh | 47.23 | 41.94 | 29.32 | 48.72 | - | 49.12 | | CoVost2 zh→en | 27.32 | 25.38 | 16.01 | 21.51 | - | 29.47 | Note: ASR results are in CER/WER (lower is better), S2TT results are in BLEU score. 
Table 2: Audio Understanding | Benchmark | LongCat-Flash-Omni Instruct | Gemini-2.5-Pro (ThinkingBudget128) | GPT-4o-Audio | Qwen3-Omni Instruct | Kimi-Audio | Step-Audio-2-mini | |-----------|-------------------------------|-----------------------------------|--------------|----------------------|------------|-------------------| | MMAU | 75.90 | 72.80 | 68.40 | 77.50 | 65.20 | 73.20 | | VocalSound | 92.76 | 89.45 | 82.37 | 91.60 | 94.85 | 87.58 | | TUT2017 | 65.43 | 33.15 | 20.74 | 40.74 | 65.25 | 30.67 | | ClothoAQA | 72.83 | 69.67 | 61.87 | 75.16 | 72.21 | 68.39 | | Nonspeech7k | 93.79 | 87.59 | 72.28 | 80.83 | 93.93 | 73.24 | | CochlScene | 70.02 | 45.34 | 34.94 | 43.03 | 80.42 | 44.58 | | MELD | 54.60 | 46.74 | 39.00 | 50.80 | 59.13 | 31.44 | Table 3: Audio-to-Text Chat | Benchmark | LongCat-Flash-Omni Instruct | Gemini-2.5-Pro (ThinkingBudget128) | GPT-4o-Audio | Qwen3-Omni Instruct | Kimi-Audio | Step-Audio-2-mini | |-----------|-------------------------------|-----------------------------------|--------------|----------------------|------------|-------------------| | OpenAudioBench | | | | | | | | LlamaQuestions | 83.33 | 83.00 | 86.30 | 83.30 | 79.33 | 69.70 | | ReasoningQA | 79.71 | 80.30 | 68.71 | 84.16 | 58.02 | 55.64 | | TriviaQA | 86.20 | 90.20 | 76.00 | 75.90 | 62.10 | 45.30 | | Webquestions | 76.00 | 80.90 | 81.20 | 75.20 | 70.20 | 54.40 | | AlpacaEval | 75.43 | 76.58 | 81.61 | 85.43 | 75.73 | 53.92 | | VoiceBench | | | | | | | | AlpacaEval | 4.94 | 4.70 | 4.73 | 4.74 | 4.46 | 3.84 | | CommonEval | 4.32 | 4.11 | 4.37 | 4.54 | 3.97 | 3.19 | | OpenBookQA | 93.41 | 95.16 | 87.90 | 89.70 | 83.52 | 72.97 | | SDQA | 82.46 | 83.54 | 90.10 | 76.90 | 63.12 | 44.85 | | MMSU | 81.95 | 88.32 | 78.90 | 69.00 | 62.17 | 52.00 | | AdvBench | 100 | 97.69 | 99.23 | 99.30 | 100 | 97.00 | | IFEval | 77.99 | 77.83 | 66.81 | 77.80 | 61.10 | 29.80 | | Benchmark | LongCat-Flash-Omni Instruct | LongCat-Flash | DeepSeek V3.1 | Qwen3 MoE-2507 | Kimi-K2 | GPT-4.1 | Claude Sonnet-4 | Gemini-2.5-Flash | |-----------|-------------------------------|---------------|---------------|----------------|---------|---------|-----------------|------------------| | Architecture | MoE | MoE | MoE | MoE | MoE | - | - | - | | # Total Params | 560B | 560B | 671B | 235B | 1043B | - | - | - | | # Activated Params | 27B | 27B | 37B | 22B | 32B | - | - | - | | General Domains |||||||||| | MMLU (acc) | 90.30 | 89.71 | 90.96 | 90.23 | 89.86 | 89.64 | 91.75 | 86.33 | | MMLU-Pro (acc) | 82.73 | 82.68 | 84.45 | 84.83 | 82.06 | 81.72 | 83.74 | 81.95 | | CEval (acc) | 91.68 | 90.44 | 89.21 | 92.70 | 91.26 | 79.53 | 86.63 | 78.78 | | CMMLU (acc) | 89.39 | 84.34 | 88.04 | 88.14 | 89.66 | 77.65 | 86.51 | 78.30 | | Instruction Following |||||||||| | IFEval (acc) | 82.44 | 89.65 | 86.69 | 88.54 | 88.91 | 85.58 | 88.35 | 83.92 | | COLLIE (acc) | 45.69 | 57.10 | 43.80 | 49.71 | 56.34 | 50.00 | 51.22 | 48.60 | | Meeseeks-zh (acc) | 39.05 | 43.03 | 33.83 | 35.32 | 42.79 | 41.54 | 35.07 | 34.84 | | Mathematical Reasoning |||||||||| | MATH500 (acc) | 97.60 | 96.40 | 96.08 | 98.80 | 97.60 | 90.60 | 93.80 | 98.40 | | AIME24 (avg@10) | 72.92 | 70.42 | 66.30 | 81.67 | 69.60 | 47.00 | 47.00 | 79.67 | | BeyondAIME (avg@10) | 47.40 | 43.00 | 36.50 | 57.60 | 36.60 | 22.10 | 20.50 | 44.20 | | General Reasoning |||||||||| | GPQA-diamond (acc) | 74.41 | 73.23 | 74.90 | 77.43 | 75.76 | 67.68 | 70.71 | 80.30 | | DROP (f1) | 83.53 | 79.06 | 84.19 | 78.57 | 89.04 | 66.94 | 73.06 | 45.03 | | ZebraLogic (acc) | 86.00 | 89.30 | 85.30 | 94.22 | 89.11 | 
56.30 | 80.10 | 57.00 | | | GraphWalks-128k (precision) | 56.00 | 51.05 | 73.54 | 80.72 | 47.50 | 85.02 | 80.57 | 64.83 | | Coding |||||||||| | LiveCodeBench (pass@1) | 52.64 | 48.02 | 56.40 | 46.48 | 46.70 | 39.21 | 45.59 | 39.65 | | Humaneval+ (pass@1) | 90.85 | 88.41 | 92.68 | 94.51 | 85.98 | 93.29 | 94.51 | 87.80 | | MBPP+ (pass@1) | 80.16 | 79.63 | 79.89 | 79.89 | 81.75 | 79.37 | 80.16 | 76.19 |

Note: Some values are sourced from other public reports. Note that DeepSeek-V3.1, Qwen3-235B-A22B, Gemini2.5-Flash, and Claude4-Sonnet are evaluated under their non-thinking mode.

LongCat-Flash-Omni is a MoE model, which means that the model weights are distributed across multiple devices. Therefore, during loading in Hugging Face Transformers or vLLM, the model weights are downloaded automatically based on the model name. However, if your runtime environment is not conducive to downloading weights during execution, you can refer to the following commands to manually download the model weights to a local directory:
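The exact download commands are not reproduced in this card; a typical way to pre-fetch the weights with `huggingface_hub` is sketched below. The repository id is assumed from the organization name and may differ from the actual release.

```python
# Pre-downloading the weights with huggingface_hub; the repo id below is assumed
# from the organization name and may differ from the actual release.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="meituan-longcat/LongCat-Flash-Omni",
    local_dir="./LongCat-Flash-Omni",   # local target directory used later when launching the demo
)
```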
We have implemented basic adaptations in SGLang to support running the LongCat-Flash-Omni model. The official SGLang does not yet natively support LongCat-Flash-Omni, so you can temporarily use our development branch for local installation and testing. Due to its size of 560 billion parameters, LongCat-Flash-Omni requires at least one node (e.g., 8×H20-141G) to host the model weights in FP8 format, and at least two nodes (e.g., 16×H800-80G) for BF16 weights. Detailed launch configurations are provided below.

Installation

- python >= 3.10.0 (Anaconda is recommended)
- PyTorch >= 2.8
- CUDA >= 12.9

The model can be served on your cluster using a combination of Tensor Parallelism and Expert Parallelism. Once all dependencies are installed, you can launch the demo using the following command.

> NOTE: Replace $NODE_RANK and $MASTER_IP with the corresponding values for your GPU machines.

All test cases are defined in examples_dict.py, and additional test cases may be added as needed. After model execution, the generated results are saved in the directory specified by the --output-dir parameter.

You can use LongCat-Flash-Omni on https://longcat.ai (the web version currently supports only the audio interaction features). The full service will be provided in subsequent updates. We are excited to announce that the LongCat-Flash-Omni app is now available for both Android and iOS. For Android, you can download it via the QR code below. For iOS, you can download it by searching for "LongCat" in the App Store or via the QR code. Currently, only the Chinese App Store is supported.

The model weights are released under the MIT License. Any contributions to this repository are licensed under the MIT License, unless otherwise stated. This license does not grant any rights to use Meituan trademarks or patents.

Usage Considerations

This model has not been specifically designed or comprehensively evaluated for every possible downstream application. Developers should take into account the known limitations of large language models, including performance variations across different languages, and carefully assess accuracy, safety, and fairness before deploying the model in sensitive or high-risk scenarios. It is the responsibility of developers and downstream users to understand and comply with all applicable laws and regulations relevant to their use case, including but not limited to data protection, privacy, and content safety requirements. Nothing in this Model Card should be interpreted as altering or restricting the terms of the MIT License under which the model is released.

Citation

We kindly encourage citation of our work if you find it useful.

Contact

Please contact us at [email protected] or join our WeChat Group if you have any questions.
LongCat-Flash-Thinking-2601
LongCat-AudioDiT-3.5B
LongCat-Image-Dev
LongCat-Flash-Thinking-2601-FP8
LongCat-Flash-Thinking-ZigZag
Auto-ATT
Auto-ATT 🔊🤖 Automatically Evaluating the Human-likeness of TTS Systems via Audio-LLM-Based Score Regression

> Auto-ATT is a model LoRA-finetuned from Qwen2-Audio-Instruct. It offers a plug-and-play pipeline to grade Audio Turing Tests (ATTs) at scale, producing objective scores that correlate with human judgements, all without manual listening.

ATT is an evaluation framework with a standardized human evaluation protocol and an accompanying dataset, aiming to resolve the lack of unified protocols in TTS evaluation and the difficulty of comparing multiple TTS systems. To further support the training and iteration of TTS systems, we used additional private evaluation data to train the Auto-ATT model based on Qwen2-Audio-7B, enabling a model-as-a-judge approach for rapid evaluation of TTS systems on the ATT dataset. The datasets and the Auto-ATT model can be found in the ATT Collection.
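For illustration, here is a hedged sketch of how a Qwen2-Audio-based judge with a LoRA adapter might be loaded and prompted to score one TTS sample. The adapter repository id, base-model id, prompt wording, and the assumption that the score is read from generated text (rather than a dedicated regression head) are all guesses, not the released Auto-ATT interface.

```python
import librosa
import torch
from peft import PeftModel
from transformers import AutoProcessor, Qwen2AudioForConditionalGeneration

BASE_ID = "Qwen/Qwen2-Audio-7B-Instruct"      # assumed base model
ADAPTER_ID = "meituan-longcat/Auto-ATT"       # hypothetical adapter repo id

processor = AutoProcessor.from_pretrained(BASE_ID)
model = Qwen2AudioForConditionalGeneration.from_pretrained(
    BASE_ID, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, ADAPTER_ID)   # attach the LoRA judge weights

conversation = [{
    "role": "user",
    "content": [
        {"type": "audio", "audio_url": "tts_sample.wav"},
        {"type": "text", "text": "Rate the human-likeness of this speech from 1 to 5."},  # assumed prompt
    ],
}]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True, tokenize=False)
audio, _ = librosa.load("tts_sample.wav", sr=processor.feature_extractor.sampling_rate)
inputs = processor(text=prompt, audios=[audio], return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=8)
score_text = processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print("Predicted human-likeness score:", score_text)
```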
LongCat Flash Omni FP8
Model Introduction We introduce LongCat-Flash-Omni, a state-of-the-art open-source omni-modal model with 560 billion parameters (with 27B activated), excelling at real-time audio-visual interaction, which is attained by leveraging LongCat-Flash's high-performance Shortcut-connected Mixture-of-Experts (MoE) architecture with zero-computation experts, augmented by efficient multimodal perception and speech reconstruction modules. Through an effective curriculum-inspired progressive training strategy, our model achieves comprehensive multimodal capabilities while maintaining strong unimodal capability. Now, we open-source the model to foster future research and development in the community. LongCat-Flash-Omni is an open-source omni-modal model that achieves state-of-the-art cross-modal comprehension performance. It seamlessly integrates powerful offline multi-modal understanding with real-time audio–visual interaction within a single all-in-one framework. 🌟 Large-Scale with Low-Latency Audio–Visual Interaction By leveraging an efficient LLM backbone, carefully designed lightweight modality encoders and decoder, and a chunk-wise audio–visual feature interleaving mechanism, LongCat-Flash-Omni achieves low-latency, high-quality audio–visual processing and streaming speech generation. It supports a context window of up to 128K tokens, enabling advanced capabilities in long-term memory, multi-turn dialogue, and temporal reasoning across multiple modalities. The model adopts an innovative multi-stage pretraining pipeline that progressively incorporates text, audio, and visual modalities under a balanced data strategy and early-fusion training paradigm, ensuring strong omni-modal performance without degradation in any single modality. Inspired by the concept of modality decoupling, we propose a Modality-Decoupled Parallelism training scheme that significantly enhances the efficiency of large-scale and highly challenging multimodal training. 🌟 Open-Source Contribution We provide a comprehensive overview of the training methodology and data strategies behind LongCat-Flash-Omni, and release the model to accelerate future research and innovation in omni-modal intelligence. For more detail, please refer to the comprehensive LongCat-Flash-Omni Technical Report. 
| Benchmark | LongCat-Flash-Omni Instruct | Gemini-2.5-Pro (ThinkingBudget128) | Gemini-2.5-Flash (non-thinking) | Qwen3-Omni Instruct | Qwen2.5-Omni Instruct | |-----------|-------------------------------|-----------------------------------|------------------------------|----------------------|-------------------------| | OmniBench | 61.38 | 66.80 | 54.99 | 58.41 | 48.16 | | WorldSense | 60.89 | 63.96 | 58.72 | 52.01 | 46.69 | | DailyOmni | 82.38 | 80.61 | 80.78 | 69.33 | 47.45 | | UNO-Bench | 49.90 | 64.48 | 54.30 | 42.10 | 32.60 | Image-to-Text | Benchmark | LongCat-Flash-Omni Instruct | Gemini-2.5-Pro (ThinkingBudget128) | Gemini-2.5-Flash (non-thinking) | Qwen3-Omni Instruct | Seed-1.6 | GPT-4o-1120 | Qwen3-VL-235B-A22B-Instruct | Qwen2.5-VL-72B-Instruct | |-----------|-------------------------------|-----------------------------------|------------------------------|----------------------|----------|---------------|------------------------------|---------------------------| | General |||||||||| | MMBench-EN test | 87.5 | 89.8 | 89.3 | 86.8 | 88.5 | 83.7 | 88.3 | 88.6 | | MMBench-ZH test | 88.7 | 89.2 | 88.5 | 86.4 | 83.8 | 82.8 | 89.8 | 87.9 | | RealWorldQA | 74.8 | 76.0 | 73.9 | 72.9 | 74.5 | 74.1 | 79.3 | 75.7 | | MMStar | 70.9 | 78.5 | 75.5 | 68.5 | 71.5 | 63.2 | 78.4 | 68.2 | | STEM & Reasoning |||||||||| | MathVista mini | 77.9 | 77.7 | 77.1 | 75.9 | 78.7 | 62.8 | 84.9 | 74.8 | | MMMU val | 70.7 | 80.9 | 76.3 | 69.1 | 74.9 | 69.4 | 78.7 | 70.2 | | MMVet | 69.0 | 80.7 | 79.5 | 68.9 | 74.4 | 76.6 | 75.9 | 74.5 | | Multi-Image |||||||||| | BLINK | 63.1 | 70.0 | 65.7 | 56.1 | 65.0 | 65.5 | 70.7 | 60.1 | | MuirBench | 77.1 | 74.0 | 73.7 | 62.1 | 74.6 | 70.5 | 72.8 | 70.7 | | Mantis | 84.8 | 83.9 | 83.4 | 80.7 | 81.1 | 79.3 | 79.7 | 82.0 | | Text Recognition & Chart/Document Understanding |||||||||| | ChartQA | 87.6 | 71.7 | 77.6 | 86.8 | 82.4 | 74.5 | 89.2 | 89.5 | | DocVQA | 91.8 | 94.0 | 93.6 | 95.7 | 94.3 | 80.9 | 94.6 | 96.4 | | OCRBench | 84.9 | 87.2 | 85.6 | 85.5 | 85.6 | 82.3 | 91.2 | 88.5 | | OmniDocBench EN/ZH ↓ | 22.8/29.0 | 31.9/24.5 | 22.8/32.9 | 28.4/40.5 | 22.0/27.6 | 25.9/37.7 | 13.6/17.5 | 22.6/32.4 | | Grounding & Counting |||||||||| | RefCOCO-avg | 92.3 | 75.4 | 71.9 | 89.3 | 80.2 | - | 87.1 | 90.3 | | CountBench | 92.4 | 91.0 | 78.6 | 90.0 | 94.1 | 85.6 | 94.3 | 93.6 | | Graphical User Interface (GUI) |||||||||| | VisualWebBench | 78.7 | 81.1 | 73.5 | 79.3 | 81.1 | 77.1 | 80.8 | 82.3 | | ScreenSpot-v2 | 91.2 | 75.8 | 63.9 | 94.7 | 91.7 | - | 93.4 | 92.9 | | AndroidControl low | 91.2 | 79.2 | 79.1 | 90.5 | 84.6 | 65.2 | 90.0 | 93.7 | | AndroidControl high | 75.6 | 60.8 | 55.5 | 70.8 | 55.2 | 41.7 | 74.1 | 67.4 | Note: Values marked with are sourced from public reports. 
As GPT-4o does not support image grounding, we do not report its results on RefCOCO and ScreenSpot-v2 Video-to-Text | Benchmark | LongCat-Flash-Omni Instruct | Gemini-2.5-Pro (ThinkingBudget128) | Gemini-2.5-Flash (non-thinking) | Qwen3-Omni Instruct | Seed-1.6 | GPT-4o-1120 | Qwen3-VL (235B-A22B-Instruct) | Qwen2.5-VL-72B-Instruct | |-----------|-------------------------------|-----------------------------------|------------------------------|----------------------|----------|---------------|------------------------------|---------------------------| | Short Video |||||||||| | MVBench | 75.2 | 66.4 | 63.0 | 69.3 | 68.4 | 62.1 | 71.3 | 70.4 | | NextQA | 86.2 | 84.2 | 81.4 | 82.4 | 84.1 | 79.7 | 81.3 | 82.3 | | TempCompass | 82.2 | 80.8 | 80.2 | 73.5 | 79.4 | 76.4 | 80.5 | 74.8 | | Long Video |||||||||| | VideoMME (w/o audio) | 76.2 | - | - | 70.5 | 75.2 | 73.2 | 79.2 | 73.3 | | VideoMME (w/ audio) | 78.2 | 80.6 | 78.5 | 73.0 | - | - | - | - | | LongVideoBench | 69.3 | 69.4 | 66.4 | 65.4 | 64.8 | 63.9 | - | 60.7 | | STEM & Reasoning |||||||||| | MMVU | 67.1 | 75.6 | 72.4 | 62.4 | 67.3 | 67.4 | 69.3 | 62.9 | | Video-MMMU | 67.5 | 79.4 | 76.6 | 60.3 | 75.4 | 68.0 | 73.7 | 59.3 | Note: Values marked with are sourced from public reports. Table 1: Automatic Speech Recognition (ASR) and Speech-to-Text Translation (S2TT) | Benchmark | LongCat-Flash-Omni Instruct | Gemini-2.5-Pro (ThinkingBudget128) | GPT-4o-Audio | Qwen3-Omni Instruct | Kimi-Audio | Step-Audio-2-mini | |-----------|-------------------------------|-----------------------------------|--------------|----------------------|------------|-------------------| | ASR | | | | | | | | LibriSpeech (test-clean \| test-other) | 1.57 \| 4.01 | 1.74 \| 3.80 | 30.00 \| 41.83 | 1.22 \| 2.48 | 1.28 \| 2.42 | 1.33 \| 2.86 | | AISHELL-1 | 0.63 | 3.11 | 34.81 | 0.84 | 0.60 | 0.78 | | AISHELL-2 | 2.78 | 5.24 | 77.73 | 2.34 | 2.56 | 2.16 | | Fleurs (zh \| en) | 3.99 \| 5.02 | 2.24 \| 4.77 | 3.91 \| 5.56 | 2.20 \| 2.72 | 2.69 \| 4.44 | 2.53 \| 3.05 | | CommonVoice 15 (zh \| en) | 4.98 \| 13.59 | 47.30 \| 49.86 | 42.83 \| 23.88 | 4.31 \| 6.05 | 8.46 \| 7.92 | 5.00 \| 6.75 | | WenetSpeech (test-meeting \| test-net) | 6.69 \| 6.09 | 136.13 \| 32.82 | 54.35 \| 67.90 | 5.89 \| 4.69 | 6.28 \| 5.37 | 4.87 \| 4.82 | | S2TT (BLEU) | | | | | | | | CoVost2 en→zh | 47.23 | 41.94 | 29.32 | 48.72 | - | 49.12 | | CoVost2 zh→en | 27.32 | 25.38 | 16.01 | 21.51 | - | 29.47 | Note: ASR results are in CER/WER (lower is better), S2TT results are in BLEU score. 
Table 2: Audio Understanding

| Benchmark | LongCat-Flash-Omni Instruct | Gemini-2.5-Pro (ThinkingBudget128) | GPT-4o-Audio | Qwen3-Omni Instruct | Kimi-Audio | Step-Audio-2-mini |
|-----------|-----------------------------|------------------------------------|--------------|---------------------|------------|-------------------|
| MMAU | 75.90 | 72.80 | 68.40 | 77.50 | 65.20 | 73.20 |
| VocalSound | 92.76 | 89.45 | 82.37 | 91.60 | 94.85 | 87.58 |
| TUT2017 | 65.43 | 33.15 | 20.74 | 40.74 | 65.25 | 30.67 |
| ClothoAQA | 72.83 | 69.67 | 61.87 | 75.16 | 72.21 | 68.39 |
| Nonspeech7k | 93.79 | 87.59 | 72.28 | 80.83 | 93.93 | 73.24 |
| CochlScene | 70.02 | 45.34 | 34.94 | 43.03 | 80.42 | 44.58 |
| MELD | 54.60 | 46.74 | 39.00 | 50.80 | 59.13 | 31.44 |

Table 3: Audio-to-Text Chat

| Benchmark | LongCat-Flash-Omni Instruct | Gemini-2.5-Pro (ThinkingBudget128) | GPT-4o-Audio | Qwen3-Omni Instruct | Kimi-Audio | Step-Audio-2-mini |
|-----------|-----------------------------|------------------------------------|--------------|---------------------|------------|-------------------|
| OpenAudioBench | | | | | | |
| LlamaQuestions | 83.33 | 83.00 | 86.30 | 83.30 | 79.33 | 69.70 |
| ReasoningQA | 79.71 | 80.30 | 68.71 | 84.16 | 58.02 | 55.64 |
| TriviaQA | 86.20 | 90.20 | 76.00 | 75.90 | 62.10 | 45.30 |
| Webquestions | 76.00 | 80.90 | 81.20 | 75.20 | 70.20 | 54.40 |
| AlpacaEval | 75.43 | 76.58 | 81.61 | 85.43 | 75.73 | 53.92 |
| VoiceBench | | | | | | |
| AlpacaEval | 4.94 | 4.70 | 4.73 | 4.74 | 4.46 | 3.84 |
| CommonEval | 4.32 | 4.11 | 4.37 | 4.54 | 3.97 | 3.19 |
| OpenBookQA | 93.41 | 95.16 | 87.90 | 89.70 | 83.52 | 72.97 |
| SDQA | 82.46 | 83.54 | 90.10 | 76.90 | 63.12 | 44.85 |
| MMSU | 81.95 | 88.32 | 78.90 | 69.00 | 62.17 | 52.00 |
| AdvBench | 100 | 97.69 | 99.23 | 99.30 | 100 | 97.00 |
| IFEval | 77.99 | 77.83 | 66.81 | 77.80 | 61.10 | 29.80 |

| Benchmark | LongCat-Flash-Omni Instruct | LongCat-Flash | DeepSeek V3.1 | Qwen3 MoE-2507 | Kimi-K2 | GPT-4.1 | Claude Sonnet-4 | Gemini-2.5-Flash |
|-----------|-----------------------------|---------------|---------------|----------------|---------|---------|-----------------|------------------|
| Architecture | MoE | MoE | MoE | MoE | MoE | - | - | - |
| # Total Params | 560B | 560B | 671B | 235B | 1043B | - | - | - |
| # Activated Params | 27B | 27B | 37B | 22B | 32B | - | - | - |
| General Domains | | | | | | | | |
| MMLU (acc) | 90.30 | 89.71 | 90.96 | 90.23 | 89.86 | 89.64 | 91.75 | 86.33 |
| MMLU-Pro (acc) | 82.73 | 82.68 | 84.45 | 84.83 | 82.06 | 81.72 | 83.74 | 81.95 |
| CEval (acc) | 91.68 | 90.44 | 89.21 | 92.70 | 91.26 | 79.53 | 86.63 | 78.78 |
| CMMLU (acc) | 89.39 | 84.34 | 88.04 | 88.14 | 89.66 | 77.65 | 86.51 | 78.30 |
| Instruction Following | | | | | | | | |
| IFEval (acc) | 82.44 | 89.65 | 86.69 | 88.54 | 88.91 | 85.58 | 88.35 | 83.92 |
| COLLIE (acc) | 45.69 | 57.10 | 43.80 | 49.71 | 56.34 | 50.00 | 51.22 | 48.60 |
| Meeseeks-zh (acc) | 39.05 | 43.03 | 33.83 | 35.32 | 42.79 | 41.54 | 35.07 | 34.84 |
| Mathematical Reasoning | | | | | | | | |
| MATH500 (acc) | 97.60 | 96.40 | 96.08 | 98.80 | 97.60 | 90.60 | 93.80 | 98.40 |
| AIME24 (avg@10) | 72.92 | 70.42 | 66.30 | 81.67 | 69.60 | 47.00 | 47.00 | 79.67 |
| BeyondAIME (avg@10) | 47.40 | 43.00 | 36.50 | 57.60 | 36.60 | 22.10 | 20.50 | 44.20 |
| General Reasoning | | | | | | | | |
| GPQA-diamond (acc) | 74.41 | 73.23 | 74.90 | 77.43 | 75.76 | 67.68 | 70.71 | 80.30 |
| DROP (f1) | 83.53 | 79.06 | 84.19 | 78.57 | 89.04 | 66.94 | 73.06 | 45.03 |
| ZebraLogic (acc) | 86.00 | 89.30 | 85.30 | 94.22 | 89.11 | 56.30 | 80.10 | 57.00 |
| GraphWalks-128k (precision) | 56.00 | 51.05 | 73.54 | 80.72 | 47.50 | 85.02 | 80.57 | 64.83 |
| Coding | | | | | | | | |
| LiveCodeBench (pass@1) | 52.64 | 48.02 | 56.40 | 46.48 | 46.70 | 39.21 | 45.59 | 39.65 |
| Humaneval+ (pass@1) | 90.85 | 88.41 | 92.68 | 94.51 | 85.98 | 93.29 | 94.51 | 87.80 |
| MBPP+ (pass@1) | 80.16 | 79.63 | 79.89 | 79.89 | 81.75 | 79.37 | 80.16 | 76.19 |

Note: Some values are sourced from other public reports. DeepSeek-V3.1, Qwen3-235B-A22B, Gemini2.5-Flash, and Claude4-Sonnet are evaluated under their non-thinking mode.

LongCat-Flash-Omni is a large MoE model whose weights are sharded across multiple devices. When the model is loaded with Hugging Face Transformers or vLLM, the weights are downloaded automatically based on the model name. However, if your runtime environment cannot download weights during execution, you can refer to the following commands to manually download the model weights to a local directory:
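For example, here is a minimal Python sketch using the `huggingface_hub` API; the repo id `meituan-longcat/LongCat-Flash-Omni` and the target directory below are illustrative assumptions, so substitute the actual repository name and path you use:

```python
# Minimal sketch: pre-download the model weights to a local directory with huggingface_hub.
# The repo id and local directory are illustrative assumptions, not confirmed values.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="meituan-longcat/LongCat-Flash-Omni",  # hypothetical repo id; adjust as needed
    local_dir="./LongCat-Flash-Omni",              # where the weights will be stored
)
print(f"Weights downloaded to: {local_dir}")
```

The resulting directory can then be passed as the model path to Transformers, vLLM, or SGLang in place of the remote model name.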
We have implemented basic adaptations in SGLang to support running LongCat-Flash-Omni. The official SGLang release does not yet natively support LongCat-Flash-Omni, so you can temporarily use our development branch for local installation and testing.

Due to its 560-billion-parameter (560B) size, LongCat-Flash-Omni requires at least one node (e.g., 8×H20-141G) to host the model weights in FP8 format, and at least two nodes (e.g., 16×H800-80G) for BF16 weights. Detailed launch configurations are provided below.

Installation

- Python >= 3.10.0 (Anaconda is recommended)
- PyTorch >= 2.8
- CUDA >= 12.9

The model can be served on your cluster using a combination of Tensor Parallelism and Expert Parallelism. Once all dependencies are installed, you can launch the demo using the following command.

> NOTE: Replace $NODE_RANK and $MASTER_IP with the corresponding values of your GPU machines.

All test cases are defined in examplesdict.py, and additional test cases may be added as needed. After model execution, the generated results are saved in the directory specified by the --output-dir parameter.

You can use LongCat-Flash-Omni at https://longcat.ai (the web version currently supports audio interaction only). The full service will be provided in subsequent updates.

We are excited to announce that the LongCat-Flash-Omni app is now available for both Android and iOS. For Android, you can download it via the QR code below. For iOS, you can download it by searching for "LongCat" in the App Store or via the QR code. Currently, only the Chinese App Store is supported.

The model weights are released under the MIT License. Any contributions to this repository are licensed under the MIT License, unless otherwise stated. This license does not grant any rights to use Meituan trademarks or patents.

Usage Considerations

This model has not been specifically designed or comprehensively evaluated for every possible downstream application. Developers should take into account the known limitations of large language models, including performance variations across different languages, and carefully assess accuracy, safety, and fairness before deploying the model in sensitive or high-risk scenarios. It is the responsibility of developers and downstream users to understand and comply with all applicable laws and regulations relevant to their use case, including but not limited to data protection, privacy, and content safety requirements.

Nothing in this Model Card should be interpreted as altering or restricting the terms of the MIT License under which the model is released.

Citation

We kindly encourage citation of our work if you find it useful.

Contact

Please contact us at [email protected] or join our WeChat Group if you have any questions.
LongCat-Flash-Prover
LongCat-Flash-Thinking-FP8
UNO-Scorer-Qwen3-14B
LongCat-Video-Avatar
LongCat Audio Codec
LongCat-Audio-Codec: An Audio Tokenizer and Detokenizer Solution Designed for Speech Large Language Models

We are excited to introduce LongCat-Audio-Codec, an audio tokenizer and detokenizer solution designed for speech large language models. It generates semantic and acoustic tokens in parallel, enabling high-fidelity audio reconstruction at extremely low bitrates with strong backend support for Speech LLMs. This repository hosts the resources for LongCat-Audio-Codec. For complete documentation, installation guides, and usage examples, please visit our GitHub Repository.

- High Fidelity at Ultra-Low Bitrates: As a codec, it achieves high-intelligibility audio reconstruction at extremely low bitrates (a back-of-envelope bitrate estimate is sketched at the end of this section).
- Low-Frame-Rate Tokenizer: As a tokenizer, it extracts semantic and acoustic tokens in parallel at a low frame rate of 16.6 Hz, with flexible acoustic codebook configurations to adapt to different downstream tasks.
- Low-Latency Streaming Detokenizer: Equipped with a specially designed streaming detokenizer that requires minimal future information to deliver high-quality audio output with low latency.
- Super-Resolution Capability: Integrates audio super-resolution into the detokenizer, generating higher-sample-rate audio when the original input is sampled below 24 kHz.

| Resources | Notes |
| --- | --- |
| LongCatAudioCodecencoder | Encoder weights of LongCat-Audio-Codec, containing the semantic encoder and the acoustic encoder |
| LongCatAudioCodecencodercmvn | Cepstral Mean and Variance Normalization (CMVN) coefficients used by the encoder |
| LongCatAudioCodecdecoder16k4codebooks | Native 16 kHz decoder, supporting 1 semantic codebook and at most 3 acoustic codebooks |
| LongCatAudioCodecdecoder24k2codebooks | Super-resolution 24 kHz decoder, supporting 1 semantic codebook and 1 acoustic codebook |
| LongCatAudioCodecdecoder24k4codebooks | Super-resolution 24 kHz decoder, supporting 1 semantic codebook and at most 3 acoustic codebooks |

If you find our work useful in your research, please consider citing:

The code and models in this repository are released under the MIT License. This grants you broad permissions to use, copy, modify, and distribute the software, provided you include the original copyright notice. We claim no ownership over any content you generate using these models. The software is provided "AS IS", without any warranty. You are fully accountable for your use of the models. Your usage must not involve creating or sharing any content that violates applicable laws, causes harm to individuals, disseminates personal information with harmful intent, spreads misinformation, or targets vulnerable groups.

📩 Contact

Please contact us at [email protected] or open an issue if you have any questions.
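As referenced in the feature list above, a back-of-envelope estimate makes the ultra-low-bitrate claim concrete. It uses only the stated 16.6 Hz token rate and the 1-semantic-plus-up-to-3-acoustic codebook configuration; the 1024-entry codebook size is an illustrative assumption, not a value documented in this card.

```python
import math

# Back-of-envelope bitrate estimate for LongCat-Audio-Codec token streams.
# Known from this card: tokens are emitted at 16.6 Hz, with 1 semantic codebook
# and up to 3 acoustic codebooks. The codebook size is an ASSUMPTION (1024 entries)
# chosen only to illustrate the arithmetic.
FRAME_RATE_HZ = 16.6
ASSUMED_CODEBOOK_SIZE = 1024                        # hypothetical; not specified here
BITS_PER_TOKEN = math.log2(ASSUMED_CODEBOOK_SIZE)   # 10 bits per token

for acoustic_codebooks in (1, 3):
    codebooks = 1 + acoustic_codebooks              # 1 semantic + N acoustic
    bitrate_bps = FRAME_RATE_HZ * codebooks * BITS_PER_TOKEN
    print(f"{codebooks} codebooks -> ~{bitrate_bps:.0f} bit/s")

# Under these assumptions: 2 codebooks -> ~332 bit/s, 4 codebooks -> ~664 bit/s,
# i.e. well under 1 kbit/s, which is the regime "ultra-low bitrate" refers to.
```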