lucasjin

14 models

Namo 500M V1

🤗 Namo-500M-V1 | 🐝 Community

> You: I don't have GPUs to run VLMs.
> Namo R1: Hold my beer... let's do this on CPU.

Namo R1 🔥🔥 surpasses SmolVLM and Moondream2 at the same size! And we keep evolving; more advanced models are in training!

## Introduction

We are excited to open-source Namo, an extremely small yet mighty MLLM. While numerous MLLMs exist, few offer true extensibility or fully open-source their training data, model architectures, and training schedulers, all of which are critical for reproducible AI research. The AI community has largely overlooked the potential of compact MLLMs, despite their demonstrated efficiency advantages. Our analysis reveals significant untapped potential in sub-billion-parameter models, particularly for edge deployment and specialized applications. To address this gap, we are releasing Namo R1, a foundational 500M-parameter model trained from scratch using innovative architectural choices:

1. CPU friendly: even on CPUs, Namo R1 runs very fast;
2. Omni-modal scalability: native support for future expansion into audio (ASR/TTS) and cross-modal fusion;
3. Training transparency: full disclosure of data curation processes and dynamic curriculum scheduling techniques.

## News

- `2025.02.21`: more to come...!
- `2025.02.21`: 🔥🔥 The first version is ready and open: fire up MLLM power that runs on CPU!
- `2025.02.17`: Namo R1 training started.

## Results

Results may keep updating as new models finish training.

| Model                | MMB-EN-T | MMB-CN-T | Size |
| -------------------- | -------- | -------- | ---- |
| Namo-500M            | 68.8     | 48.7     | 500M |
| Namo-700M            | training | training | 700M |
| Namo-500M-R1         | training | training | 500M |
| Namo-700M-R1         | training | training | 700M |
| SmolVLM-500M         | 53.8     | 35.4     | 500M |
| SmolVLM-Instruct-DPO | 67.5     | 49.8     | 2.3B |
| Moondream1           | 62.3     | 19.8     | 1.9B |
| Moondream2           | 70.0     | 28.7     | 1.9B |

⚠️ Testing has so far covered only a limited number of benchmarks; more metrics will be reported in the near future. Even so, we have observed significant improvements over other small models.

## Usage

For CLI multi-turn chat in the terminal, run `python demo.py`. (A Namo CLI that runs directly in your terminal will be available later.) See also the CPU-inference sketch at the end of this card.

## Features

In contrast to open-source VLMs such as Qwen2.5-3B and MiniCPM, the Namo series offers the following features, which let anyone train their own VLM from scratch:

- Extremely small: our first series has only 500 million parameters, yet it is powerful on various tasks.
- OCR capability: with just a 500M model, you can perform multilingual OCR, covering not only Chinese and English but also Japanese and other languages.
- Dynamic resolution: we support native dynamic-resolution input, making the model robust to images of any aspect ratio.
- Fully open source: we open-source all model code, including training steps and scripts!
- R1 support: yes, we now support R1 post-training.

Above all, we are also ready to help if you want to train your own MLLM from scratch for any task!

## Roadmap

We are still actively training new models. Here are a few things on the way:

- a speech model;
- a vision model with stronger vision encoders, such as SigLIP 2;
- TTS ability;
- slightly larger models, up to 7B.

## FAQ

1. Got an error when using DeepSpeed: `AssertionError: no_sync context manager is incompatible with gradient partitioning logic of ZeRO stage 2`? Please upgrade transformers to 4.48+ and use the latest DeepSpeed.

## License

All rights reserved by the Namo authors; code released under the MIT License.
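The usage note above points to `demo.py` for chat. As a supplement, here is a minimal CPU-inference sketch. It is a sketch under stated assumptions, not the repo's confirmed API: the hub id, the transformers Auto-class loading path with `trust_remote_code=True`, and the prompt format are all assumptions; `demo.py` in the repository remains the authoritative entry point.

```python
# Hedged sketch: CPU inference with Namo-500M-V1. The hub id, the
# Auto-class loading path, and the prompt format are assumptions, not
# the model card's confirmed API -- demo.py in the repo is authoritative.
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "lucasjin/Namo-500M-V1"  # assumed Hugging Face hub id

# trust_remote_code pulls in any custom Namo modeling/processing code.
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float32,  # plain fp32 keeps CPU execution simple
    trust_remote_code=True,
).eval()

image = Image.open("example.jpg")
prompt = "Describe this image."
inputs = processor(images=image, text=prompt, return_tensors="pt")

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```

At 500M parameters, the fp32 weights are roughly 2 GB, which is why a model of this size is plausible for laptop-class CPU inference, in line with the card's central claim.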

license:mit

punc_ct-transformer_zh-cn-common-vocab272727-pytorch

license:apache-2.0

Qwen3-VL-2B-Stage2-Unofficial


aimv2-large-patch14-224


Namo-500M-V2


LLava-Qwen-1_8B-Base


llava-qwen2-5-7b-chat-f2-glu-vit-lora


llava-qwen2-5-7b-chat-f3-glu-vit-lora


aimv2-large-patch14-native


XCodec2


chinese_ocr_llava


laion-gpt4v-dataset-images


Florence-2-Huge

license:apache-2.0

drawmodels
