monkt
Paddleocr Onnx
Multilingual OCR models from PaddleOCR, converted to ONNX format for production deployment.

Use as a complete pipeline: Integrate with monkt.com for end-to-end document processing.

- Source: PaddlePaddle PP-OCRv5 Collection
- Format: ONNX (optimized for inference)
- License: Apache 2.0

16 models covering 48+ languages:

- 11 PP-OCRv5 models (latest, highest accuracy)
- 5 PP-OCRv3 models (legacy, additional language support)

PP-OCRv5 recognition models:

| Language Group | Path | Languages | Accuracy | Size |
|----------------|------|-----------|----------|------|
| English | `languages/english/` | English | 85.25% | 7.5 MB |
| Latin | `languages/latin/` | French, German, Spanish, Italian, Portuguese, + 27 more | 84.7% | 7.5 MB |
| East Slavic | `languages/eslav/` | Russian, Bulgarian, Ukrainian, Belarusian | 81.6% | 7.5 MB |
| Korean | `languages/korean/` | Korean | 88.0% | 13 MB |
| Chinese/Japanese | `languages/chinese/` | Chinese, Japanese | - | 81 MB |
| Thai | `languages/thai/` | Thai | 82.68% | 7.5 MB |
| Greek | `languages/greek/` | Greek | 89.28% | 7.4 MB |

PP-OCRv3 recognition models:

| Language Group | Path | Languages | Version | Size |
|----------------|------|-----------|---------|------|
| Devanagari | `languages/hindi/` | Hindi, Marathi, Nepali, Sanskrit | v3 | 8.6 MB |
| Arabic | `languages/arabic/` | Arabic, Urdu, Persian/Farsi | v3 | 8.6 MB |
| Tamil | `languages/tamil/` | Tamil | v3 | 8.6 MB |
| Telugu | `languages/telugu/` | Telugu | v3 | 8.6 MB |

Detection models:

| Model | Path | Version | Size |
|-------|------|---------|------|
| PP-OCRv5 Detection | `detection/v5/det.onnx` | v5 | 84 MB |
| PP-OCRv3 Detection | `detection/v3/det.onnx` | v3 | 2.3 MB |

Note: Use v5 detection with v5 recognition models. Use v3 detection with v3 recognition models.
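The version-pairing rule in the note above can be sketched as a small lookup. This is only an illustration: the directory names and versions are copied from the tables above, while `RECOGNITION_VERSION` and `detection_path_for` are hypothetical names and not part of this repository.

```python
# Recognition directories and their PP-OCR versions, copied from the tables above.
# The dictionary and helper names are illustrative, not part of the repository.
RECOGNITION_VERSION = {
    "languages/english/": "v5",
    "languages/latin/": "v5",
    "languages/eslav/": "v5",
    "languages/korean/": "v5",
    "languages/chinese/": "v5",
    "languages/thai/": "v5",
    "languages/greek/": "v5",
    "languages/hindi/": "v3",
    "languages/arabic/": "v3",
    "languages/tamil/": "v3",
    "languages/telugu/": "v3",
}


def detection_path_for(recognition_dir: str) -> str:
    """Return the detection model whose version matches a recognition directory."""
    try:
        version = RECOGNITION_VERSION[recognition_dir]
    except KeyError:
        raise ValueError(f"unknown recognition directory: {recognition_dir!r}") from None
    return f"detection/{version}/det.onnx"


if __name__ == "__main__":
    print(detection_path_for("languages/korean/"))
    print(detection_path_for("languages/tamil/"))
```

Resolving the detector from the recognition directory, rather than choosing the two independently, makes it harder to mix v5 and v3 models by accident.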
Preprocessing models:

| Model | Path | Purpose | Accuracy | Size |
|-------|------|---------|----------|------|
| Document Orientation | `preprocessing/doc-orientation/` | Corrects rotated documents (0°, 90°, 180°, 270°) | 99.06% | 6.5 MB |
| Text Line Orientation | `preprocessing/textline-orientation/` | Corrects upside-down text (0°, 180°) | 98.85% | 6.5 MB |
| Document Unwarping | `preprocessing/doc-unwarping/` | Fixes curved/warped documents | - | 30 MB |

Latin Script (32 languages): English, French, German, Spanish, Italian, Portuguese, Dutch, Polish, Czech, Slovak, Croatian, Bosnian, Serbian, Slovenian, Danish, Norwegian, Swedish, Icelandic, Estonian, Lithuanian, Hungarian, Albanian, Welsh, Irish, Turkish, Indonesian, Malay, Afrikaans, Swahili, Tagalog, Uzbek, Latin

Cyrillic: Russian, Bulgarian, Ukrainian, Belarusian

East Asian: Chinese (Simplified, Traditional), Japanese (Hiragana, Katakana, Kanji), Korean

South Asian: Hindi, Marathi, Nepali, Sanskrit, Tamil, Telugu

Optional preprocessing for rotated/distorted documents

Preprocessing models improve accuracy on rotated or distorted documents.

When to use preprocessing:

- Document Orientation (`doc-orientation/`): Scanned documents with unknown rotation (0°/90°/180°/270°)
- Text Line Orientation (`textline-orientation/`): Upside-down text lines (0°/180°)
- Document Unwarping (`doc-unwarping/`): Curved pages, warped documents, camera photos

Performance impact: +10-30% accuracy on distorted images, minimal speed overhead.
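Once the Document Orientation model has predicted one of its four angles, the page still has to be rotated back before detection. The sketch below shows only that correction step, on a plain nested-list image. It assumes the predicted angle is the clockwise rotation already present in the scan; that convention is not stated in this README and should be checked against the model's actual labels. The function names are illustrative.

```python
from typing import List, Sequence

Image = List[List[int]]  # row-major grid of pixel values (illustrative type)


def rotate_ccw_90(image: Sequence[Sequence[int]]) -> Image:
    """Rotate a row-major grid 90 degrees counterclockwise."""
    return [list(row) for row in zip(*image)][::-1]


def undo_rotation(image: Sequence[Sequence[int]], predicted_angle: int) -> Image:
    """Undo a clockwise rotation of 0, 90, 180, or 270 degrees.

    Assumption: predicted_angle is how far the page was rotated clockwise,
    so the fix is the same number of counterclockwise quarter turns.
    """
    if predicted_angle not in (0, 90, 180, 270):
        raise ValueError(f"unsupported angle: {predicted_angle}")
    result = [list(row) for row in image]
    for _ in range(predicted_angle // 90):
        result = rotate_ccw_90(result)
    return result


if __name__ == "__main__":
    upright = [[1, 2], [3, 4]]
    scanned = [[3, 1], [4, 2]]  # `upright` rotated 90 degrees clockwise
    print(undo_rotation(scanned, 90) == upright)
```

With real images the same quarter-turn logic is usually done by an array library; the nested-list version is used here only to keep the example dependency-free.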
| Document Language | Model Path |
|-------------------|------------|
| English | `languages/english/` |
| French, German, Spanish, Italian, Portuguese | `languages/latin/` |
| Russian, Bulgarian, Ukrainian, Belarusian | `languages/eslav/` |
| Korean | `languages/korean/` |
| Chinese, Japanese | `languages/chinese/` |
| Thai | `languages/thai/` |
| Greek | `languages/greek/` |
| Hindi, Marathi, Nepali, Sanskrit | `languages/hindi/` + `detection/v3/` |
| Arabic, Urdu, Persian/Farsi | `languages/arabic/` + `detection/v3/` |
| Tamil | `languages/tamil/` + `detection/v3/` |
| Telugu | `languages/telugu/` + `detection/v3/` |

- Framework: PaddleOCR → ONNX
- ONNX Opset: 11
- Precision: FP32
- Input Format: RGB images (dynamic size)
- Inference: CPU/GPU via onnxruntime

Detection Model

- Input: `(batch, 3, height, width)` - dynamic
- Output: Text bounding boxes

Recognition Model

- Input: `(batch, 3, 32, width)` - height fixed at 32px
- Output: CTC logits → decoded with dictionary

| Model | Accuracy | Dataset |
|-------|----------|---------|
| Greek | 89.28% | 2,799 images |
| Korean | 88.0% | 5,007 images |
| English | 85.25% | 6,530 images |
| Latin | 84.7% | 3,111 images |
| Thai | 82.68% | 4,261 images |
| East Slavic | 81.6% | 7,031 images |

Q: Which version should I use?
A: Use PP-OCRv5 models for best accuracy. Use PP-OCRv3 only for South Asian languages not available in v5.

Q: Can I mix v5 and v3 models?
A: No. Use `detection/v5/det.onnx` with v5 recognition models, and `detection/v3/det.onnx` with v3 recognition models.

Q: GPU acceleration?
A: Install `onnxruntime-gpu` instead of `onnxruntime` for 10x faster inference.

Q: Commercial use?
A: Yes. Apache 2.0 license allows commercial use.

- Original Models: PaddlePaddle Team
- Conversion: paddle2onnx
- Source: PP-OCRv5 Collection

- PaddleOCR GitHub
- PaddleOCR Documentation
- ONNX Runtime
- monkt.com - Document processing pipeline
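The recognition output described under Recognition Model above is a sequence of CTC logits that still has to be turned into text. A minimal sketch of greedy CTC decoding follows: take the best class at each time step, collapse consecutive repeats, then drop blanks. It assumes the blank label sits at index 0 and the dictionary characters follow from index 1, which is a common layout but is not confirmed by this README. The function name and the toy two-character dictionary are illustrative.

```python
from typing import List, Sequence

BLANK = 0  # assumption: index 0 is the CTC blank label


def ctc_greedy_decode(logits: Sequence[Sequence[float]], characters: Sequence[str]) -> str:
    """Greedy CTC decoding for one text line.

    logits: one row of class scores per time step.
    characters: dictionary entries; class index k maps to characters[k - 1].
    """
    decoded: List[str] = []
    previous = BLANK
    for step in logits:
        # Pick the highest-scoring class at this time step.
        best = max(range(len(step)), key=lambda i: step[i])
        # Emit only on a change of label, and never for the blank.
        if best != BLANK and best != previous:
            decoded.append(characters[best - 1])
        previous = best
    return "".join(decoded)


if __name__ == "__main__":
    toy_dictionary = ["h", "i"]
    toy_logits = [
        [0.9, 0.05, 0.05],  # blank
        [0.1, 0.8, 0.1],    # h
        [0.1, 0.7, 0.2],    # h again, collapsed
        [0.8, 0.1, 0.1],    # blank
        [0.1, 0.2, 0.7],    # i
    ]
    print(ctc_greedy_decode(toy_logits, toy_dictionary))
```

A blank between two identical labels keeps both characters, which is how doubled letters survive the collapse step.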
doctype
A high-performance MobileNetV3-based document classifier that categorizes document images into 7 distinct types. Optimized for production deployment with ONNX format.

This model classifies document images into the following categories:

| Category | Description |
|----------|-------------|
| chart | Charts, graphs, and data visualizations |
| diagram | Flowcharts, diagrams, and technical drawings |
| documenthandwritten | Handwritten documents and notes |
| documentprinted | Printed text documents |
| map | Maps and geographic visualizations |
| photo | Photographs and natural images |
| screenshot | Screenshots and screen captures |

- Architecture: MobileNetV3-Large (transfer learning + fine-tuning)
- Input Size: 320×320 pixels
- Parameters: ~5.4M (lightweight and efficient)
- Inference Time: ~10-30ms on CPU (depending on hardware)
- Dataset Size: 21,000 images (17,500 train / 2,100 val / 1,400 test)
- Training Strategy:
  - Phase 1: Transfer learning with frozen base (40 epochs)
  - Phase 2: Fine-tuning entire model (20 epochs)
- Data Augmentation: Rotation, shifts, zoom, brightness variation
- Optimizer: Adam (lr=0.001 → 1e-5 for fine-tuning)

If you use this model in your research or project, please cite.