LongCat-Flash-Thinking-ZigZag
License: MIT
By: meituan-longcat
Type: Language Model
Downloads: 33
Status: Early-stage
Edge AI: Mobile · Laptop · Server
Quick Summary
Reasoning-focused ("thinking") language model from meituan-longcat, distributed with FlashMLA and streaming sparse attention kernel interfaces (see the code examples below).
Code Examples
```python
from flash_mla_interface import flash_mla_varlen_func, flash_mla_with_kvcache
from streaming_sparse_attn_interface import streaming_sparse_attn_varlen_func, streaming_sparse_attn_with_kvcache

# Variable-length (packed) attention over a batch of sequences.
full_attn_out = flash_mla_varlen_func(
    q,              # [nnz_q, num_heads_q, head_dim_qk]
    k,              # [nnz_k, num_heads_k, head_dim_qk]
    v,              # [nnz_k, num_heads_v, head_dim_vo]
    cu_seqlens_q,
    cu_seqlens_k,
    max_seqlen_q,
    max_seqlen_k,
    softmax_scale,
    causal=True,
)

stream_attn_out = streaming_sparse_attn_varlen_func(
    q,
    k,
    v,
    cu_seqlens_q,
    cu_seqlens_k,
    max_seqlen_q,
    max_seqlen_k,
    softmax_scale,
    causal=True,
)
```
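In the varlen calls above, `q`, `k`, and `v` are packed across the batch (`nnz_q` / `nnz_k` total tokens), and `cu_seqlens_q` / `cu_seqlens_k` are cumulative offsets delimiting each sequence, following the usual flash-attention varlen convention. A minimal sketch of building those offsets (plain Python; the helper name is illustrative, not part of this repo's API):

```python
# cu_seqlens[i] is the start offset of sequence i in the packed tensor;
# cu_seqlens[-1] equals the total number of tokens (nnz).
def build_cu_seqlens(seqlens):
    cu = [0]
    for n in seqlens:
        cu.append(cu[-1] + n)
    return cu

seqlens_q = [3, 5, 2]                       # per-sequence token counts
cu_seqlens_q = build_cu_seqlens(seqlens_q)  # [0, 3, 8, 10]
max_seqlen_q = max(seqlens_q)               # 5
```

In a real call these would typically be int32 tensors on the GPU rather than Python lists.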
```python
# Decode-time attention against a paged (blocked) KV cache.
full_attn_out = flash_mla_with_kvcache(
    q,              # [batch_size, seqlen_q, num_heads_q, head_dim_nope + head_dim_rope]
    blocked_k,      # [num_pages, page_size, num_heads_k, head_dim_nope + head_dim_rope]
    cache_seqlens,
    block_table,
    head_dim_nope,
    softmax_scale,
    causal=True,
)

stream_attn_out = streaming_sparse_attn_with_kvcache(
    q,
    blocked_k,
    cache_seqlens,
    block_table,
    head_dim_nope,
    softmax_scale,
    causal=True,
)
```
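The kvcache variants read keys from a paged cache: `blocked_k` is a pool of fixed-size pages, and `block_table` maps each sequence's logical page index to a physical page in that pool. A hypothetical sketch of the addressing scheme (names and page size are illustrative, not this repo's internals):

```python
page_size = 4
block_table = [[2, 0, 5]]  # batch of 1: logical pages 0, 1, 2 -> physical pages 2, 0, 5

def locate(batch_idx, token_pos):
    """Map a token's logical position to (physical_page, offset_in_page)."""
    logical_page, offset = divmod(token_pos, page_size)
    return block_table[batch_idx][logical_page], offset

locate(0, 6)  # token 6 -> logical page 1, offset 2 -> physical page 0
```

Paging lets sequences grow without reallocating a contiguous cache; `cache_seqlens` tells the kernel how many cached tokens each sequence actually has.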