Alibaba-DAMO-Academy

26 models

RynnBrain-Nav-8B

license: apache-2.0 · 1,979 downloads · 13 likes

RynnBrain-8B

license: apache-2.0 · 897 downloads · 12 likes

RynnBrain-2B

license: apache-2.0 · 824 downloads · 25 likes

RynnBrain-30B-A3B

license: apache-2.0 · 414 downloads · 18 likes

RynnBrain-Plan-8B

license: apache-2.0 · 392 downloads · 7 likes

RynnBrain-Plan-30B-A3B

license: apache-2.0 · 92 downloads · 5 likes

PixelRefer-7B

base model: DAMO-NLP-SG/VideoLLaMA3-2B-Image · 63 downloads · 1 like

LumosX

license: apache-2.0 · 60 downloads · 18 likes

RynnEC-2B

RynnEC: Bringing MLLMs into Embodied World

If our project helps you, please give us a star ⭐ on GitHub to support us. 🙏🙏

📰 News

- [2025.08.17] 🤗 The RynnEC-7B model checkpoint has been released on Hugging Face.
- [2025.08.08] 🔥🔥 Released our RynnEC-2B model, RynnEC-Bench, and the training code.

🌟 Introduction

RynnEC is a video multimodal large language model (MLLM) designed specifically for embodied cognition tasks.

Architecture

RynnEC handles a variety of input types, including images, videos, visual prompts, and task instructions. Visual inputs are processed by a vision encoder with an any-resolution strategy, while visual prompts are handled by a region encoder that extracts fine-grained features. Textual inputs are converted into a unified token stream through tokenization. For video segmentation tasks, a mask decoder transforms the output segmentation embeddings into binary masks.

| Model | Base Model | HF Link |
| --- | --- | --- |
| RynnEC-2B | Qwen2.5-1.5B-Instruct | Alibaba-DAMO-Academy/RynnEC-2B |
| RynnEC-7B | Qwen2.5-7B-Instruct | Alibaba-DAMO-Academy/RynnEC-7B |

Benchmark comparison across object cognition and spatial cognition: with a highly efficient 2B-parameter architecture, RynnEC-2B achieves state-of-the-art (SOTA) performance on complex spatial cognition tasks. If you find RynnEC useful for your research and applications, please cite using this BibTeX:

license: apache-2.0 · 59 downloads · 10 likes
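The architecture described above routes several input modalities into a single token stream for the LLM backbone, with a separate mask decoder for segmentation. A minimal sketch of that data flow, using entirely hypothetical names (the real implementation lives in the project's GitHub repository):

```python
# Illustrative sketch of RynnEC's multi-input pipeline. All classes and
# functions here are stand-ins, not the project's real API.
from dataclasses import dataclass
from typing import List

@dataclass
class Token:
    source: str   # "visual", "region", or "text"
    value: float  # stand-in for an embedding vector

def vision_encoder(frames: List[List[float]]) -> List[Token]:
    # Any-resolution strategy: each frame is pooled regardless of its size.
    return [Token("visual", sum(f) / len(f)) for f in frames]

def region_encoder(region_feats: List[float]) -> List[Token]:
    # Fine-grained features for a visual prompt (e.g. a boxed object).
    return [Token("region", v) for v in region_feats]

def tokenize(text: str) -> List[Token]:
    # Stand-in tokenizer: one token per word.
    return [Token("text", float(len(w))) for w in text.split()]

def mask_decoder(seg_embeddings: List[float], threshold: float = 0.5) -> List[int]:
    # Turns output segmentation embeddings into a binary mask.
    return [1 if e > threshold else 0 for e in seg_embeddings]

def build_token_stream(frames, region_feats, instruction) -> List[Token]:
    # Unified stream consumed by the LLM backbone (Qwen2.5 in RynnEC).
    return vision_encoder(frames) + region_encoder(region_feats) + tokenize(instruction)

stream = build_token_stream(
    frames=[[0.2, 0.4], [0.6]],
    region_feats=[0.9, 0.1],
    instruction="where is the red cup",
)
mask = mask_decoder([0.8, 0.3, 0.7])  # -> [1, 0, 1]
```

The point of the sketch is only the routing: every modality ends up as tokens in one ordered stream, while segmentation output takes a separate decode path.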

RynnBrain-CoP-8B

license: apache-2.0 · 59 downloads · 6 likes

PixelRefer-2B

base model: DAMO-NLP-SG/VideoLLaMA3-2B-Image · 45 downloads · 1 like

Lumos-1

license: apache-2.0 · 33 downloads · 10 likes

PixelRefer-Lite-2B

base model: DAMO-NLP-SG/VideoLLaMA3-2B-Image · 17 downloads · 1 like

PixelRefer-Lite-7B

base model: DAMO-NLP-SG/VideoLLaMA3-2B-Image · 17 downloads · 1 like

RynnVLA-001-7B-Trajectory

license: apache-2.0 · 12 downloads · 4 likes

RynnVLA-001-7B-Base

GitHub repo: https://github.com/alibaba-damo-academy/RynnVLA-001

🔥 We release RynnVLA-001-7B-Base (Stage 1: Ego-Centric Video Generative Pretraining), pretrained on large-scale ego-centric manipulation videos. RynnVLA-001 is a VLA (vision-language-action) model built on a pretrained video generation model. The key insight is to implicitly transfer manipulation skills learned from human demonstrations in ego-centric videos to the manipulation of robot arms.

license: apache-2.0 · 8 downloads · 8 likes
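The two-stage idea above (pretrain a video generator on ego-centric footage, then reuse its representation to drive a robot arm) can be sketched abstractly. Everything here is hypothetical, chosen only to show the structure, not RynnVLA-001's actual implementation:

```python
# Toy two-stage sketch of the RynnVLA-001 idea (all names illustrative).
from typing import List

class VideoGenBackbone:
    """Stage 1: pretrained to predict future frames from ego-centric history."""
    def encode(self, frames: List[float]) -> float:
        # Stand-in latent: mean of observed frame features.
        return sum(frames) / len(frames)

    def predict_next(self, frames: List[float]) -> float:
        # Naive next-frame prediction: extrapolate the last step.
        return frames[-1] + (frames[-1] - frames[-2])

class ActionHead:
    """Stage 2 add-on: maps the video latent to a robot-arm action."""
    def __init__(self, gain: float = 2.0):
        self.gain = gain

    def act(self, latent: float) -> float:
        return self.gain * latent

backbone = VideoGenBackbone()
head = ActionHead()

frames = [0.1, 0.2, 0.3]                    # ego-centric manipulation frames
latent = backbone.encode(frames)            # ~0.2
action = head.act(latent)                   # ~0.4
next_frame = backbone.predict_next(frames)  # ~0.4
```

The design point: the expensive part (the backbone) is trained once on human videos, and only the small action head needs robot data, which is the implicit skill transfer the card describes.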

RynnEC-7B

(RynnEC-7B shares the model card shown above under RynnEC-2B: "RynnEC: Bringing MLLMs into Embodied World". Base model: Qwen2.5-7B-Instruct.)

license: apache-2.0 · 6 downloads · 30 likes

RynnEC-7B-Stage3

license: apache-2.0 · 4 downloads · 0 likes

RynnEC-2B-stage3

license: apache-2.0 · 2 downloads · 0 likes

RynnEC-2B-Stage2

license: apache-2.0 · 1 download · 0 likes

UniLumos

license: apache-2.0 · 0 downloads · 10 likes

WorldVLA

license: apache-2.0 · 0 downloads · 10 likes

Photon-S1

license: apache-2.0 · 0 downloads · 2 likes

Photon-S2

license: apache-2.0 · 0 downloads · 2 likes

OmniCT-7B

license: apache-2.0 · 0 downloads · 1 like

OmniCT-3B

license: apache-2.0 · 0 downloads · 1 like