rednote-hilab

10 models

dots.ocr

---
license: mit
library_name: dots_ocr
pipeline_tag: image-text-to-text
tags:
  - image-to-text
  - ocr
  - document-parse
  - layout
  - table
  - formula
  - transformers
  - custom_code
language:
  - en
  - zh
  - multilingual
---

License: MIT · Downloads: 1,068,427 · Likes: 1,118

dots.llm1.inst

🤗 Hugging Face | 📑 Paper | 🖥️ Demo | 💬 WeChat (微信) | 📕 rednote

Visit our Hugging Face (click the links above), search for checkpoints whose names start with `dots.llm1`, or visit the dots1 collection, and you will find all you need. Enjoy!

- 2025.06.06: We released the `dots.llm1` series. Check our report for more details!

The `dots.llm1` model is a large-scale MoE model that activates 14B parameters out of a total of 142B, delivering performance on par with state-of-the-art models. Leveraging our meticulously crafted and efficient data processing pipeline, `dots.llm1` achieves performance comparable to Qwen2.5-72B after being pretrained on a high-quality corpus with no synthetic data. To foster further research, we open-source intermediate training checkpoints spanning the entire training process, providing valuable insights into the learning dynamics of large language models.

This repo contains the base and instruction-tuned `dots.llm1` models, which have the following features:

- Type: an MoE model with 14B activated and 142B total parameters, trained on a high-quality corpus.
- Training Stages: pretraining and SFT.
- Architecture: multi-head attention with QK-Norm in the attention layers; fine-grained MoE using the top 6 of 128 routed experts, plus 2 shared experts.
- Number of Layers: 62
- Number of Attention Heads: 32
- Supported Languages: English, Chinese
- Context Length: 32,768 tokens
- License: MIT
- Enhanced Data Processing: we propose a scalable, fine-grained, three-stage data processing framework designed to generate large-scale, high-quality, and diverse data for pretraining.
- No Synthetic Data during Pretraining: only high-quality, non-synthetic tokens were used in base-model pretraining.
- Performance and Cost Efficiency: `dots.llm1` is an open-source model that activates only 14B parameters at inference, delivering both comprehensive capabilities and high computational efficiency.
- Infrastructure: we introduce an innovative MoE all-to-all communication and computation overlapping recipe, based on interleaved 1F1B pipeline scheduling and an efficient grouped GEMM implementation, to boost computational efficiency.
- Open Accessibility to Model Dynamics: intermediate model checkpoints are released spanning the entire training process, facilitating future research into the learning dynamics of large language models.

| Model | #Total Params | #Activated Params | Context Length | Download Link |
| :------------: | :------------: | :------------: | :------------: | :------------: |
| dots.llm1.base | 142B | 14B | 32K | 🤗 Hugging Face |
| dots.llm1.inst | 142B | 14B | 32K | 🤗 Hugging Face |

The Docker images are available on Docker Hub and are based on the official images. Once a container is up, you can verify that the model is running successfully as shown below. We are working to merge support into Transformers (PR #38143).

vLLM is a high-throughput, memory-efficient inference and serving engine for LLMs. Official support for this model is covered in PR #18254. An OpenAI-compatible API will be available at `http://localhost:8000/v1`.

SGLang is a fast serving framework for large language models and vision-language models, and can launch a server with an OpenAI-compatible API. Official support for this model is covered in PR #6471. An OpenAI-compatible API will be available at `http://localhost:8000/v1`.

Detailed evaluation results are reported in this 📑 report. If you find `dots.llm1` useful or want to use it in your projects, please kindly cite our paper.
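Once a vLLM or SGLang server is running, the OpenAI-compatible endpoint can be queried with plain HTTP. The sketch below is illustrative only, assuming the default `http://localhost:8000/v1` address mentioned above and the `rednote-hilab/dots.llm1.inst` model name; it uses only the Python standard library, so no OpenAI SDK is required.

```python
# Minimal client sketch for the OpenAI-compatible endpoint exposed by vLLM or
# SGLang (default: http://localhost:8000/v1). Model name and prompt are
# illustrative placeholders.
import json
import urllib.request

def build_chat_request(model, messages, temperature=0.7, max_tokens=512):
    """Assemble the JSON body for a /chat/completions call."""
    return {
        "model": model,
        "messages": messages,
        "temperature": temperature,
        "max_tokens": max_tokens,
    }

def chat(base_url, body):
    """POST the request and return the assistant's reply text."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# Example (requires a running server):
# body = build_chat_request(
#     "rednote-hilab/dots.llm1.inst",
#     [{"role": "user", "content": "What is a Mixture-of-Experts model?"}],
# )
# print(chat("http://localhost:8000/v1", body))
```

The same request shape works against either serving backend, since both expose the standard `/v1/chat/completions` route.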

License: MIT · Downloads: 6,280 · Likes: 174

dots.vlm1.inst

🤗 Hugging Face | 📄 Blog | 🔗 GitHub | 🖥️ Demo | 💬 WeChat (微信) | 📕 rednote

Visit our Hugging Face (click the links above) or check out our live demo to try dots.vlm1. Enjoy!

We are excited to introduce dots.vlm1, the first vision-language model in the dots model family. Built upon a 1.2-billion-parameter vision encoder and the DeepSeek V3 large language model (LLM), dots.vlm1 demonstrates strong multimodal understanding and reasoning capabilities.

Model Highlights:

- NaViT Vision Encoder: trained entirely from scratch rather than fine-tuned from an existing vision backbone. It natively supports dynamic resolution and incorporates pure visual supervision in addition to traditional text supervision, raising the upper bound of perceptual capacity. Beyond image-captioning datasets, a large amount of structured image data was introduced during pretraining to improve the model's perceptual capabilities, particularly for tasks such as OCR.
- Multimodal Training Data: in addition to conventional approaches, dots.vlm1 leverages a wide range of synthetic-data strategies to cover diverse image types (e.g., tables, charts, documents, graphics) and descriptions (e.g., alt text, dense captions, grounding annotations). Furthermore, a strong multimodal model was used to rewrite web-page data with interleaved text and images, significantly improving the quality of the training corpus.

Through large-scale pretraining and carefully tuned post-training, dots.vlm1 achieves near state-of-the-art performance in both visual perception and reasoning, setting a new performance ceiling for open-source vision-language models, while still maintaining competitive capabilities in pure-text tasks. Special thanks to the DeepSeek team for the excellent DeepSeek V3 model.
| | | Qwen2.5VL-72B | Gemini2.5 Pro | Seed-VL1.5 thinking | dots.vlm1 |
|------|--------|----------------|--------------------|--------------------------|-----------|
| STEM/Reasoning | MMMU | 69.3 | 84.22 | 79.89 | 80.11 |
| | MMMUpro | 51.91 | 76.5 | 68.9 | 70.11 |
| | MathVision | 39.4 | 72.34 | 68.77 | 69.64 |
| | MathVista | 74.6 | 83.5 | 86.1 | 85.0 |
| | ZeroBench | 2 | 5 | 2 | 4 |
| | ZeroBench-sub | 20 | 30.24 | 25.75 | 26.65 |
| | VisuLogic | 25.6 | 29.8 | 35.9 | 32.2 |
| General Visual | MMbench-CN | 88.2 | 89 | 89.78 | 88.24 |
| | MMbench-EN | 89.2 | 89.55 | 89.47 | 89.32 |
| | MMStar | 71.13 | 78.73 | 78.33 | 76.67 |
| | RealWorldQA | 75.9 | 78.43 | 78.69 | 79.08 |
| | Vibe(GPT4o) | 60.13 | 76.39 | 68.59 | 69.24 |
| | m3gia(cn) | 88.24 | 89.54 | 91.2 | 90.85 |
| | SimpleVQAds | 52.19 | 57.09 | 61.34 | 55.8 |
| | MMVP | 66 | 67.33 | 73.33 | 72 |
| | HallusionBench | 56.5 | 63.07 | 63.49 | 64.83 |
| | CVBench | 84.15 | 85.36 | 89.68 | 85.65 |
| | Blink | 61.7 | 71.86 | 72.38 | 66.33 |
| OCR/Doc/Chart | charxiv(dq) | 88.2 | 90.3 | 89.6 | 92.1 |
| | charxiv(rq) | 48.5 | 68.3 | 63.4 | 64.4 |
| | OCRReasoning | 38.02 | 70.81 | 63.42 | 66.23 |
| | DOCVQA | 96.23 | 95.42 | 93.65 | 96.52 |
| | ChartQA | 86.1 | 86.16 | 86.88 | 87.68 |
| | OCRBenchV1 | 87.1 | 86.6 | 86.7 | 82.3 |
| | AI2D | 88.3 | 91.03 | 89.05 | 88.37 |
| Grounding/Counting | RefCOCO | 90.3 | 74.6 | 91.3 | 90.45 |
| | CountBench | 92.4 | 91.79 | 89 | 91.99 |
| Multi Image | muir | 69.38 | 70.5 | 79.77 | 78.58 |
| | mantis | 79.26 | 84.33 | 82.3 | 86.18 |

| | | Deepseek-R1-0528 | Qwen3-235B-A22B | Qwen3-235B-A22B-think-2507 | dots.vlm1 |
|------|--------|------|------|------|------|
| Text | LiveCodeBench | 73.3 | 70.7 | 78.4 | 72.94 |
| | AIME 2025 | 87.5 | 82.6 | 92.3 | 85.83 |
| | GPQA | 81 | 70.7 | 81.1 | 72.78 |

Our model supports distributed deployment across multiple machines.
Here's how to set up a 2-node cluster.

Prerequisites:

- Model: `rednote-hilab/dots.vlm1.inst`
- Node 1 IP: `10.0.0.1` (master node)
- Node 2 IP: `10.0.0.2` (worker node)

Key parameters:

- `--tp 16`: tensor parallelism across 16 GPUs per node
- `--nnodes 2`: total number of nodes in the cluster
- `--node-rank`: node identifier (0 for the master, 1+ for workers)
- `--context-length 65536`: maximum context length
- `--quantization fp8`: use FP8 quantization for efficiency
- `--chat-template dots-vlm`: use the custom chat template for the dots.vlm model

Once the servers are launched, you can access the model through the OpenAI-compatible API:
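As a sketch of talking to the launched cluster: the servers speak the standard OpenAI chat format, in which an image is passed as an `image_url` content part alongside the text. The helper below only assembles that message structure; the image URL, question, and master-node address are placeholders, not values from this deployment.

```python
# Sketch of composing a multimodal request for the dots.vlm1 server's
# OpenAI-compatible API. The image URL and question are placeholders; the
# message layout follows the standard OpenAI vision chat format.
import json
import urllib.request

def build_vlm_request(model, question, image_url, max_tokens=1024):
    """Build a chat/completions body with one image part and one text part."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image_url}},
                    {"type": "text", "text": question},
                ],
            }
        ],
    }

def post(base_url, body):
    """Send the request to an OpenAI-compatible endpoint and return the JSON."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Example (requires the 2-node deployment above; URL is a placeholder):
# body = build_vlm_request(
#     "rednote-hilab/dots.vlm1.inst",
#     "Describe the chart in this image.",
#     "https://example.com/chart.png",
# )
# print(post("http://10.0.0.1:8000/v1", body))
```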

License: MIT · Downloads: 3,267 · Likes: 81

dots.mocr

License: MIT · Downloads: 148 · Likes: 22

dots.llm1.base

License: MIT · Downloads: 115 · Likes: 64

dots.llm1.inst-FP8-dynamic

🤗 Hugging Face | 📑 Paper | 🖥️ Demo | 💬 WeChat (微信) | 📕 rednote

Visit our Hugging Face (click the links above), search for checkpoints whose names start with `dots.llm1`, or visit the dots1 collection, and you will find all you need. Enjoy!

- 2025.06.06: We released the `dots.llm1` series. Check our report for more details!

The `dots.llm1` model is a large-scale MoE model that activates 14B parameters out of a total of 142B, delivering performance on par with state-of-the-art models. Leveraging our meticulously crafted and efficient data processing pipeline, `dots.llm1` achieves performance comparable to Qwen2.5-72B after being pretrained on a high-quality corpus with no synthetic data. To foster further research, we open-source intermediate training checkpoints spanning the entire training process, providing valuable insights into the learning dynamics of large language models.

This repo contains the base and instruction-tuned `dots.llm1` models, which have the following features:

- Type: an MoE model with 14B activated and 142B total parameters, trained on a high-quality corpus.
- Training Stages: pretraining and SFT.
- Architecture: multi-head attention with QK-Norm in the attention layers; fine-grained MoE using the top 6 of 128 routed experts, plus 2 shared experts.
- Number of Layers: 62
- Number of Attention Heads: 32
- Supported Languages: English, Chinese
- Context Length: 32,768 tokens
- License: MIT
- Enhanced Data Processing: we propose a scalable, fine-grained, three-stage data processing framework designed to generate large-scale, high-quality, and diverse data for pretraining.
- No Synthetic Data during Pretraining: only high-quality, non-synthetic tokens were used in base-model pretraining.
- Performance and Cost Efficiency: `dots.llm1` is an open-source model that activates only 14B parameters at inference, delivering both comprehensive capabilities and high computational efficiency.
- Infrastructure: we introduce an innovative MoE all-to-all communication and computation overlapping recipe, based on interleaved 1F1B pipeline scheduling and an efficient grouped GEMM implementation, to boost computational efficiency.
- Open Accessibility to Model Dynamics: intermediate model checkpoints are released spanning the entire training process, facilitating future research into the learning dynamics of large language models.

We release the quantized `dots.llm1.inst.FP8-dynamic` model, which retains approximately 98% of the original performance after quantization. For convenience, we recommend running vLLM inference using our Docker image `rednotehilab/dots1:vllm-openai-v0.9.1`, which is available on Docker Hub. We are working to merge support into Transformers (PR #38143). If you find `dots.llm1` useful or want to use it in your projects, please kindly cite our paper.
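A quick, hedged way to confirm the container is serving the quantized checkpoint is to hit the endpoint's `/v1/models` route, which OpenAI-compatible servers such as vLLM expose. The helper below assumes the standard response shape (`{"data": [{"id": ...}, ...]}`) and the default local address.

```python
# Sanity-check sketch: list the model IDs served by an OpenAI-compatible
# endpoint (e.g. the vLLM container above on http://localhost:8000/v1).
import json
import urllib.request

def extract_model_ids(payload):
    """Pull model IDs out of a standard /v1/models response body."""
    return [entry["id"] for entry in payload.get("data", [])]

def list_models(base_url="http://localhost:8000/v1"):
    """Query the server and return the served model IDs."""
    with urllib.request.urlopen(f"{base_url}/models") as resp:
        return extract_model_ids(json.load(resp))

# Example (requires a running server):
# print(list_models())  # expect the FP8 checkpoint's ID in the list
```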

License: MIT · Downloads: 64 · Likes: 5

Dots.Ocr.Base

dots.ocr: Multilingual Document Layout Parsing in a Single Vision-Language Model

dots.ocr is a powerful, multilingual document parser that unifies layout detection and content recognition within a single vision-language model while maintaining good reading order. Despite its compact 1.7B-parameter LLM foundation, it achieves state-of-the-art (SOTA) performance.

1. Powerful Performance: dots.ocr achieves SOTA results for text, tables, and reading order on OmniDocBench, while delivering formula recognition comparable to much larger models like Doubao-1.5 and Gemini2.5-Pro.
2. Multilingual Support: dots.ocr demonstrates robust parsing capabilities for low-resource languages, achieving decisive advantages in both layout detection and content recognition on our in-house multilingual document benchmark.
3. Unified and Simple Architecture: by leveraging a single vision-language model, dots.ocr offers a significantly more streamlined architecture than conventional methods that rely on complex, multi-model pipelines. Switching between tasks is accomplished simply by altering the input prompt, proving that a VLM can achieve detection results competitive with traditional detection models like DocLayout-YOLO.
4. Efficient and Fast Performance: built upon a compact 1.7B LLM, dots.ocr provides faster inference than many other high-performing models based on larger foundations.

Performance Comparison: dots.ocr vs. Competing Models

> Notes:
> - The EN and ZH metrics are end-to-end evaluation results on OmniDocBench; the Multilingual metric is the end-to-end result on dots.ocr-bench.

News

🚀 We release dots.ocr, a multilingual document parsing model based on a 1.7B LLM, with SOTA performance.

The end-to-end evaluation results of different tasks:
| Model Type | Methods | Overall Edit↓ (EN/ZH) | Text Edit↓ (EN/ZH) | Formula Edit↓ (EN/ZH) | Table TEDS↑ (EN/ZH) | Table Edit↓ (EN/ZH) | Read Order Edit↓ (EN/ZH) |
|---|---|---|---|---|---|---|---|
| Pipeline Tools | MinerU | 0.150 / 0.357 | 0.061 / 0.215 | 0.278 / 0.577 | 78.6 / 62.1 | 0.180 / 0.344 | 0.079 / 0.292 |
| | Marker | 0.336 / 0.556 | 0.080 / 0.315 | 0.530 / 0.883 | 67.6 / 49.2 | 0.619 / 0.685 | 0.114 / 0.340 |
| | Mathpix | 0.191 / 0.365 | 0.105 / 0.384 | 0.306 / 0.454 | 77.0 / 67.1 | 0.243 / 0.320 | 0.108 / 0.304 |
| | Docling | 0.589 / 0.909 | 0.416 / 0.987 | 0.999 / 1 | 61.3 / 25.0 | 0.627 / 0.810 | 0.313 / 0.837 |
| | Pix2Text | 0.320 / 0.528 | 0.138 / 0.356 | 0.276 / 0.611 | 73.6 / 66.2 | 0.584 / 0.645 | 0.281 / 0.499 |
| | Unstructured | 0.586 / 0.716 | 0.198 / 0.481 | 0.999 / 1 | 0 / 0.06 | 1 / 0.998 | 0.145 / 0.387 |
| | OpenParse | 0.646 / 0.814 | 0.681 / 0.974 | 0.996 / 1 | 64.8 / 27.5 | 0.284 / 0.639 | 0.595 / 0.641 |
| | PPStruct-V3 | 0.145 / 0.206 | 0.058 / 0.088 | 0.295 / 0.535 | - / - | 0.159 / 0.109 | 0.069 / 0.091 |
| Expert VLMs | GOT-OCR | 0.287 / 0.411 | 0.189 / 0.315 | 0.360 / 0.528 | 53.2 / 47.2 | 0.459 / 0.520 | 0.141 / 0.280 |
| | Nougat | 0.452 / 0.973 | 0.365 / 0.998 | 0.488 / 0.941 | 39.9 / 0 | 0.572 / 1.000 | 0.382 / 0.954 |
| | Mistral OCR | 0.268 / 0.439 | 0.072 / 0.325 | 0.318 / 0.495 | 75.8 / 63.6 | 0.600 / 0.650 | 0.083 / 0.284 |
| | OLMOCR-sglang | 0.326 / 0.469 | 0.097 / 0.293 | 0.455 / 0.655 | 68.1 / 61.3 | 0.608 / 0.652 | 0.145 / 0.277 |
| | SmolDocling-256M | 0.493 / 0.816 | 0.262 / 0.838 | 0.753 / 0.997 | 44.9 / 16.5 | 0.729 / 0.907 | 0.227 / 0.522 |
| | Dolphin | 0.206 / 0.306 | 0.107 / 0.197 | 0.447 / 0.580 | 77.3 / 67.2 | 0.180 / 0.285 | 0.091 / 0.162 |
| | MinerU 2 | 0.139 / 0.240 | 0.047 / 0.109 | 0.297 / 0.536 | 82.5 / 79.0 | 0.141 / 0.195 | 0.069 / 0.118 |
| | OCRFlux | 0.195 / 0.281 | 0.064 / 0.183 | 0.379 / 0.613 | 71.6 / 81.3 | 0.253 / 0.139 | 0.086 / 0.187 |
| | MonkeyOCR-pro-3B | 0.138 / 0.206 | 0.067 / 0.107 | 0.246 / 0.421 | 81.5 / 87.5 | 0.139 / 0.111 | 0.100 / 0.185 |
| General VLMs | GPT4o | 0.233 / 0.399 | 0.144 / 0.409 | 0.425 / 0.606 | 72.0 / 62.9 | 0.234 / 0.329 | 0.128 / 0.251 |
| | Qwen2-VL-72B | 0.252 / 0.327 | 0.096 / 0.218 | 0.404 / 0.487 | 76.8 / 76.4 | 0.387 / 0.408 | 0.119 / 0.193 |
| | Qwen2.5-VL-72B | 0.214 / 0.261 | 0.092 / 0.18 | 0.315 / 0.434 | 82.9 / 83.9 | 0.341 / 0.262 | 0.106 / 0.168 |
| | Gemini2.5-Pro | 0.148 / 0.212 | 0.055 / 0.168 | 0.356 / 0.439 | 85.8 / 86.4 | 0.13 / 0.119 | 0.049 / 0.121 |
| | doubao-1-5-thinking-vision-pro-250428 | 0.140 / 0.162 | 0.043 / 0.085 | 0.295 / 0.384 | 83.3 / 89.3 | 0.165 / 0.085 | 0.058 / 0.094 |
| Expert VLMs | dots.ocr | 0.125 / 0.160 | 0.032 / 0.066 | 0.329 / 0.416 | 88.6 / 89.0 | 0.099 / 0.092 | 0.040 / 0.067 |

The end-to-end text recognition performance across 9 PDF page types:

| Model Type | Models | Book | Slides | Financial Report | Textbook | Exam Paper | Magazine | Academic Papers | Notes | Newspaper | Overall |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Pipeline Tools | MinerU | 0.055 | 0.124 | 0.033 | 0.102 | 0.159 | 0.072 | 0.025 | 0.984 | 0.171 | 0.206 |
| | Marker | 0.074 | 0.340 | 0.089 | 0.319 | 0.452 | 0.153 | 0.059 | 0.651 | 0.192 | 0.274 |
| | Mathpix | 0.131 | 0.220 | 0.202 | 0.216 | 0.278 | 0.147 | 0.091 | 0.634 | 0.690 | 0.300 |
| Expert VLMs | GOT-OCR | 0.111 | 0.222 | 0.067 | 0.132 | 0.204 | 0.198 | 0.179 | 0.388 | 0.771 | 0.267 |
| | Nougat | 0.734 | 0.958 | 1.000 | 0.820 | 0.930 | 0.830 | 0.214 | 0.991 | 0.871 | 0.806 |
| | Dolphin | 0.091 | 0.131 | 0.057 | 0.146 | 0.231 | 0.121 | 0.074 | 0.363 | 0.307 | 0.177 |
| | OCRFlux | 0.068 | 0.125 | 0.092 | 0.102 | 0.119 | 0.083 | 0.047 | 0.223 | 0.536 | 0.149 |
| | MonkeyOCR-pro-3B | 0.084 | 0.129 | 0.060 | 0.090 | 0.107 | 0.073 | 0.050 | 0.171 | 0.107 | 0.100 |
| General VLMs | GPT4o | 0.157 | 0.163 | 0.348 | 0.187 | 0.281 | 0.173 | 0.146 | 0.607 | 0.751 | 0.316 |
| | Qwen2.5-VL-7B | 0.148 | 0.053 | 0.111 | 0.137 | 0.189 | 0.117 | 0.134 | 0.204 | 0.706 | 0.205 |
| | InternVL3-8B | 0.163 | 0.056 | 0.107 | 0.109 | 0.129 | 0.100 | 0.159 | 0.150 | 0.681 | 0.188 |
| | doubao-1-5-thinking-vision-pro-250428 | 0.048 | 0.048 | 0.024 | 0.062 | 0.085 | 0.051 | 0.039 | 0.096 | 0.181 | 0.073 |
| Expert VLMs | dots.ocr | 0.031 | 0.047 | 0.011 | 0.082 | 0.079 | 0.028 | 0.029 | 0.109 | 0.056 | 0.055 |

> Notes:
> - The metrics are from MonkeyOCR, OmniDocBench, and our own internal evaluations.
> - We delete the Page-header and Page-footer cells in the result markdown.
> - We use the `tikz_preprocess` pipeline to upsample the images to 200 DPI.

dots.ocr-bench is an in-house benchmark containing 1,493 PDF images across 100 languages. The end-to-end evaluation results of different tasks:

| Methods | Overall Edit↓ | Text Edit↓ | Formula Edit↓ | Table TEDS↑ | Table Edit↓ | Read Order Edit↓ |
|---|---|---|---|---|---|---|
| doubao-1-5-thinking-vision-pro-250428 | 0.291 | 0.226 | 0.440 | 71.2 | 0.260 | 0.238 |

> Notes:
> - We use the same metric calculation pipeline as OmniDocBench.
> - We delete the Page-header and Page-footer cells in the result markdown.
| Method | Overall | Text | Formula | Table | Picture | Overall | Text | Formula | Table | Picture |
|---|---|---|---|---|---|---|---|---|---|---|
| DocLayout-YOLO-DocStructBench | 0.733 | 0.694 | 0.480 | 0.803 | 0.619 | 0.806 | 0.779 | 0.620 | 0.858 | 0.678 |
| dots.ocr (parse all) | 0.831 | 0.801 | 0.654 | 0.838 | 0.748 | 0.922 | 0.909 | 0.770 | 0.888 | 0.831 |
| dots.ocr (detection only) | 0.845 | 0.816 | 0.716 | 0.875 | 0.765 | 0.930 | 0.917 | 0.832 | 0.918 | 0.843 |

> Notes:
> - `prompt_layout_all_en` for parse all, `prompt_layout_only_en` for detection only; please refer to the prompts.

| Model | ArXiv | Old Scans Math | Tables | Old Scans | Headers and Footers | Multi column | Long Tiny Text | Base | Overall |
|---|---|---|---|---|---|---|---|---|---|
| GOT OCR | 52.7 | 52.0 | 0.2 | 22.1 | 93.6 | 42.0 | 29.9 | 94.0 | 48.3 ± 1.1 |
| Marker | 76.0 | 57.9 | 57.6 | 27.8 | 84.9 | 72.9 | 84.6 | 99.1 | 70.1 ± 1.1 |
| MinerU | 75.4 | 47.4 | 60.9 | 17.3 | 96.6 | 59.0 | 39.1 | 96.6 | 61.5 ± 1.1 |
| Mistral OCR | 77.2 | 67.5 | 60.6 | 29.3 | 93.6 | 71.3 | 77.1 | 99.4 | 72.0 ± 1.1 |
| Nanonets OCR | 67.0 | 68.6 | 77.7 | 39.5 | 40.7 | 69.9 | 53.4 | 99.3 | 64.5 ± 1.1 |
| GPT-4o (No Anchor) | 51.5 | 75.5 | 69.1 | 40.9 | 94.2 | 68.9 | 54.1 | 96.7 | 68.9 ± 1.1 |
| GPT-4o (Anchored) | 53.5 | 74.5 | 70.0 | 40.7 | 93.8 | 69.3 | 60.6 | 96.8 | 69.9 ± 1.1 |
| Gemini Flash 2 (No Anchor) | 32.1 | 56.3 | 61.4 | 27.8 | 48.0 | 58.7 | 84.4 | 94.0 | 57.8 ± 1.1 |
| Gemini Flash 2 (Anchored) | 54.5 | 56.1 | 72.1 | 34.2 | 64.7 | 61.5 | 71.5 | 95.6 | 63.8 ± 1.2 |
| Qwen 2 VL (No Anchor) | 19.7 | 31.7 | 24.2 | 17.1 | 88.9 | 8.3 | 6.8 | 55.5 | 31.5 ± 0.9 |
| Qwen 2.5 VL (No Anchor) | 63.1 | 65.7 | 67.3 | 38.6 | 73.6 | 68.3 | 49.1 | 98.3 | 65.5 ± 1.2 |
| olmOCR v0.1.75 (No Anchor) | 71.5 | 71.4 | 71.4 | 42.8 | 94.1 | 77.7 | 71.0 | 97.8 | 74.7 ± 1.1 |
| olmOCR v0.1.75 (Anchored) | 74.9 | 71.2 | 71.0 | 42.2 | 94.5 | 78.3 | 73.3 | 98.3 | 75.5 ± 1.0 |
| MonkeyOCR-pro-3B | 83.8 | 68.8 | 74.6 | 36.1 | 91.2 | 76.6 | 80.1 | 95.3 | 75.8 ± 1.0 |
| dots.ocr | 82.1 | 64.2 | 88.3 | 40.9 | 94.1 | 82.4 | 81.2 | 99.5 | 79.1 ± 1.0 |

> Note:
> - The metrics are from MonkeyOCR, olmOCR, and our own internal evaluations.
> - We delete the Page-header and Page-footer cells in the result markdown.

If you have trouble with the installation, try our Docker image for an easier setup, and follow these steps:

Download Model Weights

> 💡 Note: Please use a directory name without periods (e.g., `DotsOCR` instead of `dots.ocr`) for the model save path.
This is a temporary workaround pending our integration with Transformers.

2. Deployment: vLLM inference

We highly recommend using vLLM for deployment and inference. All of our evaluation results are based on vLLM version 0.9.1. The Docker image is based on the official vLLM image; you can also follow the Dockerfile to build the deployment environment yourself.

3. Document Parse

Based on the vLLM server, you can parse an image or a PDF file using the following commands, which produce:

1. Structured Layout Data (`demo_image1.json`): a JSON file containing the detected layout elements, including their bounding boxes, categories, and extracted text.
2. Processed Markdown File (`demo_image1.md`): a Markdown file generated from the concatenated text of all detected cells. An additional version, `demo_image1_nohf.md`, is also provided, which excludes page headers and footers for compatibility with benchmarks like OmniDocBench and olmOCR-bench.
3. Layout Visualization (`demo_image1.jpg`): the original image with the detected layout bounding boxes drawn on it.

4. Demo

You can run the demo with the following command, or try it directly at the live demo.

Acknowledgments

We would like to thank Qwen2.5-VL, aimv2, MonkeyOCR, OmniDocBench, and PyMuPDF for providing code and models. We also thank DocLayNet, M6Doc, CDLA, and D4LA for providing valuable datasets.

Limitations:

- Complex Document Elements:
  - Table & Formula: dots.ocr is not yet perfect for high-complexity table and formula extraction.
  - Picture: pictures in documents are currently not parsed.
- Parsing Failures: the model may fail to parse under certain conditions:
  - When the character-to-pixel ratio is excessively high. Try enlarging the image or increasing the PDF parsing DPI (a setting of 200 is recommended). Note, however, that the model performs optimally on images with a resolution under 11,289,600 pixels.
  - Continuous special characters, such as ellipses (`...`) and underscores (`_`), may cause the prediction output to repeat endlessly. In such scenarios, consider using alternative prompts like `prompt_layout_only_en`, `prompt_ocr`, or `prompt_grounding_ocr` (details here).
- Performance Bottleneck: despite its 1.7B-parameter LLM foundation, dots.ocr is not yet optimized for high-throughput processing of large PDF volumes.

We are committed to achieving more accurate table and formula parsing, as well as enhancing the model's OCR capabilities for broader generalization, all while aiming for a more powerful, more efficient model. Furthermore, we are actively considering the development of a more general-purpose perception model based on vision-language models (VLMs), which would integrate general detection, image captioning, and OCR tasks into a unified framework. Parsing the content of pictures in documents is also a key priority for our future work.

We believe that collaboration is key to tackling these exciting challenges. If you are passionate about advancing the frontiers of document intelligence and are interested in contributing to these future endeavors, we would love to hear from you via email at: [[email protected]].
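The structured JSON output described above (a list of detected cells with bounding boxes, categories, and extracted text) lends itself to simple post-processing. The sketch below, assuming `category` and `text` field names based on that description (they are not confirmed by this page), concatenates cell text into markdown and can drop page headers and footers the way the no-header/footer markdown variant does.

```python
# Post-processing sketch for dots.ocr's JSON layout output. The cell field
# names ("category", "text") are assumptions based on the output description.
def cells_to_markdown(cells, drop_headers_footers=False):
    """Concatenate cell text in the given reading order, optionally skipping
    Page-header and Page-footer cells (as the no-header/footer variant does)."""
    skip = {"Page-header", "Page-footer"} if drop_headers_footers else set()
    parts = []
    for cell in cells:
        if cell.get("category") in skip:
            continue
        text = (cell.get("text") or "").strip()
        if text:
            parts.append(text)
    return "\n\n".join(parts)

# Example with hypothetical cells:
cells = [
    {"category": "Page-header", "text": "Chapter 1"},
    {"category": "Text", "text": "Hello world."},
    {"category": "Page-footer", "text": "1"},
]
# cells_to_markdown(cells, drop_headers_footers=True) -> "Hello world."
```

This mirrors the header/footer stripping applied when comparing against benchmarks such as OmniDocBench and olmOCR-bench.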

License: MIT · Downloads: 25 · Likes: 3

dots.mocr-svg

License: MIT · Downloads: 20 · Likes: 9

dots.ocr-1.5

License: MIT · Downloads: 0 · Likes: 4

dots.ocr-1.5-svg

License: MIT · Downloads: 0 · Likes: 1