Paranioar

6 models

NEO1_0-2B-SFT

From Pixels to Words -- Towards Native Vision-Language Primitives at Scale

Two lingering clouds cast shadows over the widespread exploration and promotion of native VLMs:

- What fundamental constraints set native VLMs apart from modular ones, and to what extent can these barriers be overcome?
- How can research in native VLMs be made more accessible and democratized, thereby accelerating progress in the field?

We construct native VLMs from first principles, whose primitives should:

- effectively align pixel and word representations within a shared semantic space;
- seamlessly integrate the strengths of separate vision and language modules;
- inherently embody various cross-modal properties that support unified vision-language encoding, aligning, and reasoning.

With only 390M image-text examples, NEO develops strong visual perception from scratch inside a dense, monolithic model via these elaborate primitives. NEO serves as a cornerstone for scalable and powerful native VLMs, paired with reusable components that foster a cost-effective and extensible ecosystem.

Number of layers: 40 (12 for the Pre-Buffer, 28 for the Post-LLM)

We release the 2B weights of NEO1.0 in Pre-Training (PT), Mid-Training (MT), and Supervised Fine-Tuning (SFT) stages:

| Model name | Weight |
| ---------- | ---------------------- |
| NEO-2B-PT  | šŸ¤— NEO-2B-PT HF link  |
| NEO-2B-MT  | šŸ¤— NEO-2B-MT HF link  |
| NEO-2B-SFT | šŸ¤— NEO-2B-SFT HF link |

āœ’ļø Citation: if NEO is helpful for your research, please consider leaving a star ⭐ and a citation šŸ“.

license:apache-2.0
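Assuming these checkpoints are hosted under the Paranioar organization on the Hugging Face Hub and follow the standard transformers remote-code convention, loading one of the released stages might look like the sketch below. The repo ids, the `AutoModel`/`AutoProcessor` entry points, and the `trust_remote_code` flag are all assumptions, not confirmed by the model card:

```python
# Hypothetical loading sketch -- repo ids and loader classes are assumptions,
# not confirmed by the model card.

ORG = "Paranioar"  # organization name shown on this page
NEO_2B = {"PT": "NEO1_0-2B-PT", "MT": "NEO1_0-2B-MT", "SFT": "NEO1_0-2B-SFT"}

def neo_repo_id(stage: str) -> str:
    """Map a training stage (PT / MT / SFT) to its assumed Hub repo id."""
    return f"{ORG}/{NEO_2B[stage]}"

def load_neo(stage: str = "SFT"):
    """Download and instantiate a checkpoint; custom (native-VLM) architectures
    on the Hub typically require trust_remote_code=True."""
    from transformers import AutoModel, AutoProcessor  # needs `pip install transformers`
    repo = neo_repo_id(stage)
    model = AutoModel.from_pretrained(repo, trust_remote_code=True)
    processor = AutoProcessor.from_pretrained(repo, trust_remote_code=True)
    return model, processor
```

Substituting the 9B names (e.g. `NEO1_0-9B-SFT`) in the stage table would select those checkpoints instead.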

NEO1_0-2B-PT

(Model card identical to the NEO1_0-2B-SFT entry above.)

license:apache-2.0

NEO1_0-9B-SFT

From Pixels to Words -- Towards Native Vision-Language Primitives at Scale

Two lingering clouds cast shadows over the widespread exploration and promotion of native VLMs:

- What fundamental constraints set native VLMs apart from modular ones, and to what extent can these barriers be overcome?
- How can research in native VLMs be made more accessible and democratized, thereby accelerating progress in the field?

We construct native VLMs from first principles, whose primitives should:

- effectively align pixel and word representations within a shared semantic space;
- seamlessly integrate the strengths of separate vision and language modules;
- inherently embody various cross-modal properties that support unified vision-language encoding, aligning, and reasoning.

With only 390M image-text examples, NEO develops strong visual perception from scratch inside a dense, monolithic model via these elaborate primitives. NEO serves as a cornerstone for scalable and powerful native VLMs, paired with reusable components that foster a cost-effective and extensible ecosystem.

Number of layers: 42 (6 for the Pre-Buffer, 36 for the Post-LLM)

We release the 9B weights of NEO1.0 in Pre-Training (PT), Mid-Training (MT), and Supervised Fine-Tuning (SFT) stages:

| Model name | Weight |
| ---------- | ---------------------- |
| NEO-9B-PT  | šŸ¤— NEO-9B-PT HF link  |
| NEO-9B-MT  | šŸ¤— NEO-9B-MT HF link  |
| NEO-9B-SFT | šŸ¤— NEO-9B-SFT HF link |

āœ’ļø Citation: if NEO is helpful for your research, please consider leaving a star ⭐ and a citation šŸ“.

license:apache-2.0

NEO1_0-9B-PT

(Model card identical to the NEO1_0-9B-SFT entry above.)

license:apache-2.0

NEO1_0-9B-MT

(Model card identical to the NEO1_0-9B-SFT entry above.)

license:apache-2.0

NEO1_0-2B-MT

(Model card identical to the NEO1_0-2B-SFT entry above.)

license:apache-2.0