datalab-to
chandra
Chandra is an OCR model that outputs markdown, HTML, and JSON. It is highly accurate at extracting text from images and PDFs while preserving layout information.

You can try Chandra in the free playground or through the hosted API.

- Convert documents to markdown, HTML, or JSON with detailed layout information
- Good handwriting support
- Reconstructs forms accurately, including checkboxes
- Good support for tables, math, and complex layouts
- Extracts images and diagrams, with captions and structured data
- Support for 40+ languages

We used the olmocr benchmark, which seems to be the most reliable current OCR benchmark in our testing.

| Model | ArXiv | Old Scans Math | Tables | Old Scans | Headers and Footers | Multi column | Long tiny text | Base | Overall | Source |
|:----------|:--------:|:--------------:|:--------:|:---------:|:-------------------:|:------------:|:--------------:|:----:|:--------------:|:------:|
| Datalab Chandra v0.1.0 | 82.2 | 80.3 | 88.0 | 50.4 | 90.8 | 81.2 | 92.3 | 99.9 | 83.1 ± 0.9 | Own benchmarks |
| Datalab Marker v1.10.0 | 83.8 | 69.7 | 74.8 | 32.3 | 86.6 | 79.4 | 85.7 | 99.6 | 76.5 ± 1.0 | Own benchmarks |
| Mistral OCR API | 77.2 | 67.5 | 60.6 | 29.3 | 93.6 | 71.3 | 77.1 | 99.4 | 72.0 ± 1.1 | olmocr repo |
| Deepseek OCR | 75.2 | 72.3 | 79.7 | 33.3 | 96.1 | 66.7 | 80.1 | 99.7 | 75.4 ± 1.0 | Own benchmarks |
| GPT-4o (Anchored) | 53.5 | 74.5 | 70.0 | 40.7 | 93.8 | 69.3 | 60.6 | 96.8 | 69.9 ± 1.1 | olmocr repo |
| Gemini Flash 2 (Anchored) | 54.5 | 56.1 | 72.1 | 34.2 | 64.7 | 61.5 | 71.5 | 95.6 | 63.8 ± 1.2 | olmocr repo |
| Qwen 3 VL | 70.2 | 75.1 | 45.6 | 37.5 | 89.1 | 62.1 | 43.0 | 94.3 | 64.6 ± 1.1 | Own benchmarks |
| olmOCR v0.3.0 | 78.6 | 79.9 | 72.9 | 43.9 | 95.1 | 77.3 | 81.2 | 98.9 | 78.5 ± 1.1 | olmocr repo |
| dots.ocr | 82.1 | 64.2 | 88.3 | 40.9 | 94.1 | 82.4 | 81.2 | 99.5 | 79.1 ± 1.0 | dots.ocr repo |

| Type | Name | Link |
|------|------|------|
| Tables | Water Damage Form | View |
| Tables | 10K Filing | View |
| Forms | Handwritten Form | View |
| Forms | Lease Agreement | View |
| Handwriting | Doctor Note | View |
| Handwriting | Math Homework | View |
| Books | Geography Textbook | View |
| Books | Exercise Problems | View |
| Math | Attention Diagram | View |
| Math | Worksheet | View |
| Math | EGA Page | View |
| Newspapers | New York Times | View |
| Newspapers | LA Times | View |
| Other | Transcript | View |
| Other | Flowchart | View |

- Hugging Face Transformers
- vLLM
- olmocr
- Qwen 3 VL
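
The Overall column in the benchmark table appears to be the unweighted mean of the eight category scores (ArXiv through Base). This is an observation from the published numbers, not something the text states, and the ± interval is not reproduced here. A minimal sketch that checks the observation, with the helper name `overall_from_categories` chosen purely for illustration:

```python
from statistics import mean

# Category scores copied from the benchmark table, in column order:
# ArXiv, Old Scans Math, Tables, Old Scans, Headers and Footers,
# Multi column, Long tiny text, Base.
scores = {
    "Datalab Chandra v0.1.0": [82.2, 80.3, 88.0, 50.4, 90.8, 81.2, 92.3, 99.9],
    "Datalab Marker v1.10.0": [83.8, 69.7, 74.8, 32.3, 86.6, 79.4, 85.7, 99.6],
    "Mistral OCR API": [77.2, 67.5, 60.6, 29.3, 93.6, 71.3, 77.1, 99.4],
    "Deepseek OCR": [75.2, 72.3, 79.7, 33.3, 96.1, 66.7, 80.1, 99.7],
    "GPT-4o (Anchored)": [53.5, 74.5, 70.0, 40.7, 93.8, 69.3, 60.6, 96.8],
    "Gemini Flash 2 (Anchored)": [54.5, 56.1, 72.1, 34.2, 64.7, 61.5, 71.5, 95.6],
    "Qwen 3 VL": [70.2, 75.1, 45.6, 37.5, 89.1, 62.1, 43.0, 94.3],
    "olmOCR v0.3.0": [78.6, 79.9, 72.9, 43.9, 95.1, 77.3, 81.2, 98.9],
    "dots.ocr": [82.1, 64.2, 88.3, 40.9, 94.1, 82.4, 81.2, 99.5],
}

# Overall scores as reported in the table (without the ± interval).
reported = {
    "Datalab Chandra v0.1.0": 83.1,
    "Datalab Marker v1.10.0": 76.5,
    "Mistral OCR API": 72.0,
    "Deepseek OCR": 75.4,
    "GPT-4o (Anchored)": 69.9,
    "Gemini Flash 2 (Anchored)": 63.8,
    "Qwen 3 VL": 64.6,
    "olmOCR v0.3.0": 78.5,
    "dots.ocr": 79.1,
}


def overall_from_categories(category_scores):
    """Assumed aggregation: plain average over the eight categories."""
    return mean(category_scores)


# Rank models by the recomputed overall score and compare with the table.
for model in sorted(scores, key=lambda m: overall_from_categories(scores[m]), reverse=True):
    computed = overall_from_categories(scores[model])
    # Reported values carry one decimal, so allow for rounding.
    matches = abs(computed - reported[model]) <= 0.05
    print(f"{model:28s} computed={computed:.2f} reported={reported[model]:.1f} match={matches}")
```

Because each category carries equal weight under this reading, a weak category such as Old Scans pulls the overall score down as much as a strong one such as Base lifts it.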