Paranioar
## NEO1_0-2B-SFT

### From Pixels to Words -- Towards Native Vision-Language Primitives at Scale

Two lingering clouds cast shadows over the widespread exploration and promotion of native VLMs:

- What fundamental constraints set native VLMs apart from modular ones, and to what extent can these barriers be overcome?
- How can research on native VLMs be made more accessible and democratized, thereby accelerating progress in the field?

We construct native VLMs from first principles, whose primitives should:

- effectively align pixel and word representations within a shared semantic space;
- seamlessly integrate the strengths of separate vision and language modules;
- inherently embody various cross-modal properties that support unified vision-language encoding, aligning, and reasoning.

Highlights:

- With only 390M image-text examples, NEO develops strong visual perception from scratch inside a dense and monolithic model via elaborate primitives.
- NEO serves as a cornerstone for scalable and powerful native VLMs, paired with reusable components that foster a cost-effective and extensible ecosystem.
- Number of layers: 40 (12 for the Pre-Buffer & 28 for the Post-LLM)

We release the 2B weights of NEO1.0 from the Pre-Training (PT), Mid-Training (MT), and Supervised Fine-Tuning (SFT) stages.

| Model name | Weight |
| ---------- | ------ |
| NEO-2B-PT  | 🤗 NEO-2B-PT HF link |
| NEO-2B-MT  | 🤗 NEO-2B-MT HF link |
| NEO-2B-SFT | 🤗 NEO-2B-SFT HF link |

### Citation

If NEO is helpful for your research, please consider a star ⭐ and citation:
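The Pre-Buffer / Post-LLM layer split described above can be captured in a small configuration sketch. The class and field names below are illustrative assumptions for exposition, not the official NEO implementation; only the layer counts come from the card.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NeoLayout:
    """Depth budget of a NEO-style monolithic VLM (illustrative sketch)."""
    pre_buffer_layers: int  # early layers fusing pixel and word inputs
    post_llm_layers: int    # remaining language-model layers

    @property
    def total_layers(self) -> int:
        return self.pre_buffer_layers + self.post_llm_layers


# Layer counts as stated for the 2B release: 40 = 12 (Pre-Buffer) + 28 (Post-LLM).
NEO_2B = NeoLayout(pre_buffer_layers=12, post_llm_layers=28)
```

A config object like this only records the split; the card's point is that both parts live inside one dense model rather than a separate vision encoder bolted onto an LLM.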
## NEO1_0-9B-PT

### From Pixels to Words -- Towards Native Vision-Language Primitives at Scale

The 9B release follows the same motivation and first-principles design as the 2B release above; the variant-specific details are:

- Number of layers: 42 (6 for the Pre-Buffer & 36 for the Post-LLM)

We release the 9B weights of NEO1.0 from the Pre-Training (PT), Mid-Training (MT), and Supervised Fine-Tuning (SFT) stages.

| Model name | Weight |
| ---------- | ------ |
| NEO-9B-PT  | 🤗 NEO-9B-PT HF link |
| NEO-9B-MT  | 🤗 NEO-9B-MT HF link |
| NEO-9B-SFT | 🤗 NEO-9B-SFT HF link |

### Citation

If NEO is helpful for your research, please consider a star ⭐ and citation:
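For comparison across the two released sizes, here is a quick sketch of the fraction of depth each variant devotes to the Pre-Buffer, using only the layer counts stated in the 2B and 9B cards (the dictionary keys are informal labels, not repo ids):

```python
# (Pre-Buffer, Post-LLM) layer splits as stated in the model cards.
splits = {
    "NEO-2B": (12, 28),  # 40 layers total
    "NEO-9B": (6, 36),   # 42 layers total
}

for name, (pre, post) in splits.items():
    total = pre + post
    print(f"{name}: {total} layers, Pre-Buffer share {pre / total:.0%}")
```

Under these numbers, the 9B variant spends a smaller fraction of its depth on the Pre-Buffer (about 14%) than the 2B variant does (30%).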