datalab-to

10 models • 1 total models in database

chandra

Chandra is an OCR model that outputs markdown, HTML, and JSON. It is highly accurate at extracting text from images and PDFs, while preserving layout information. You can try Chandra in the free playground or through the hosted API.

- Convert documents to markdown, HTML, or JSON with detailed layout information
- Good handwriting support
- Reconstructs forms accurately, including checkboxes
- Good support for tables, math, and complex layouts
- Extracts images and diagrams, with captions and structured data
- Support for 40+ languages

We used the olmocr benchmark, which seems to be the most reliable current OCR benchmark in our testing.

| Model | ArXiv | Old Scans Math | Tables | Old Scans | Headers and Footers | Multi column | Long tiny text | Base | Overall | Source |
|:----------|:--------:|:--------------:|:--------:|:---------:|:-------------------:|:------------:|:--------------:|:----:|:--------------:|:------:|
| Datalab Chandra v0.1.0 | 82.2 | 80.3 | 88.0 | 50.4 | 90.8 | 81.2 | 92.3 | 99.9 | 83.1 ± 0.9 | Own benchmarks |
| Datalab Marker v1.10.0 | 83.8 | 69.7 | 74.8 | 32.3 | 86.6 | 79.4 | 85.7 | 99.6 | 76.5 ± 1.0 | Own benchmarks |
| Mistral OCR API | 77.2 | 67.5 | 60.6 | 29.3 | 93.6 | 71.3 | 77.1 | 99.4 | 72.0 ± 1.1 | olmocr repo |
| Deepseek OCR | 75.2 | 72.3 | 79.7 | 33.3 | 96.1 | 66.7 | 80.1 | 99.7 | 75.4 ± 1.0 | Own benchmarks |
| GPT-4o (Anchored) | 53.5 | 74.5 | 70.0 | 40.7 | 93.8 | 69.3 | 60.6 | 96.8 | 69.9 ± 1.1 | olmocr repo |
| Gemini Flash 2 (Anchored) | 54.5 | 56.1 | 72.1 | 34.2 | 64.7 | 61.5 | 71.5 | 95.6 | 63.8 ± 1.2 | olmocr repo |
| Qwen 3 VL | 70.2 | 75.1 | 45.6 | 37.5 | 89.1 | 62.1 | 43.0 | 94.3 | 64.6 ± 1.1 | Own benchmarks |
| olmOCR v0.3.0 | 78.6 | 79.9 | 72.9 | 43.9 | 95.1 | 77.3 | 81.2 | 98.9 | 78.5 ± 1.1 | olmocr repo |
| dots.ocr | 82.1 | 64.2 | 88.3 | 40.9 | 94.1 | 82.4 | 81.2 | 99.5 | 79.1 ± 1.0 | dots.ocr repo |

| Type | Name | Link |
|------|------|------|
| Tables | Water Damage Form | View |
| Tables | 10K Filing | View |
| Forms | Handwritten Form | View |
| Forms | Lease Agreement | View |
| Handwriting | Doctor Note | View |
| Handwriting | Math Homework | View |
| Books | Geography Textbook | View |
| Books | Exercise Problems | View |
| Math | Attention Diagram | View |
| Math | Worksheet | View |
| Math | EGA Page | View |
| Newspapers | New York Times | View |
| Newspapers | LA Times | View |
| Other | Transcript | View |
| Other | Flowchart | View |

- Hugging Face Transformers
- vLLM
- olmocr
- Qwen 3 VL

160,410
508

surya_layout

license:cc-by-nc-sa-4.0
4,068
5

ocr_error_detection

license:cc-by-nc-sa-4.0
1,962
0

surya_layout0

license:cc-by-nc-sa-4.0
486
2

surya_tablerec

license:cc-by-nc-sa-4.0
90
3

texify

license:cc-by-nc-sa-4.0
80
2

line_detector0

license:cc-by-nc-sa-4.0
8
0

inline_math_det0

license:cc-by-nc-sa-4.0
7
0

chandra-ocr-2

6
8

surya-alpha

license:cc-by-nc-sa-4.0
3
3