PaddlePaddle

97 models

UVDoc

license: apache-2.0 • library_name: PaddleOCR • language: en, zh • pipeline_tag: image-to-text • tags: OCR, PaddlePaddle, PaddleOCR, doc_img_unwarping

license:apache-2.0
638,288
6

PP-LCNet_x1_0_doc_ori

license: apache-2.0 • library_name: PaddleOCR • language: en, zh • pipeline_tag: image-to-text • tags: OCR, PaddlePaddle, PaddleOCR, doc_img_orientation_classification

license:apache-2.0
515,156
2

PP-OCRv5_server_det

license: apache-2.0 • library_name: PaddleOCR • language: en, zh • pipeline_tag: image-to-text • tags: OCR, PaddlePaddle, PaddleOCR, textline_detection

license:apache-2.0
455,109
46

en_PP-OCRv5_mobile_rec

license: apache-2.0 • library_name: PaddleOCR • language: en • pipeline_tag: image-to-text • tags: OCR, PaddlePaddle, PaddleOCR, textline_recognition

license:apache-2.0
275,194
1

PP-DocLayoutV3_safetensors

license:apache-2.0
247,787
20

PP-LCNet_x1_0_textline_ori

license: apache-2.0 • library_name: PaddleOCR • language: en, zh • pipeline_tag: image-to-text • tags: OCR, PaddlePaddle, PaddleOCR, textline_orientation_classification

license:apache-2.0
204,543
0

PP-OCRv5_server_rec

PP-OCRv5_server_rec is one of the PP-OCRv5_rec series, the latest generation of text line recognition models developed by the PaddleOCR team. It aims to efficiently and accurately support the recognition of four major languages (Simplified Chinese, Traditional Chinese, English, and Japanese) as well as complex text scenarios such as handwriting, vertical text, pinyin, and rare characters, all with a single model. The key accuracy metrics are as follows:

| Handwritten Chinese | Handwritten English | Printed Chinese | Printed English | Traditional Chinese | Ancient Text | Japanese | General Scenario | Pinyin | Rotation | Distortion | Artistic Text | Average |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0.5807 | 0.5806 | 0.9013 | 0.8679 | 0.7472 | 0.6039 | 0.7372 | 0.5946 | 0.8384 | 0.7435 | 0.9314 | 0.6397 | 0.8401 |

Note: if any character (including punctuation) in a line is incorrect, the entire line is marked as wrong, so this metric reflects strict, practical accuracy. Please refer to the PaddlePaddle official website for installation details, install PaddlePaddle using pip, and then install the latest version of the PaddleOCR inference package from PyPI. You can quickly experience the functionality with a single command, or integrate the inference of the text recognition module into your project; before running the integration code, download the sample image to your local machine. For usage commands and parameter descriptions, please refer to the documentation.

The ability of a single model is limited, but a pipeline composed of several models provides more capacity to solve difficult problems in real-world scenarios. The general OCR pipeline solves text recognition tasks by extracting text information from images and outputting it in string format. There are 5 modules in the pipeline: the Document Image Orientation Classification Module (optional), the Text Image Unwarping Module (optional), the Text Line Orientation Classification Module (optional), the Text Detection Module, and the Text Recognition Module. Run a single command to quickly experience the OCR pipeline; if `save_path` is specified, the visualization results will be saved under `save_path`. The command-line method is for quick experience, while project integration needs only a few lines of code. The default recognition model used in the pipeline is `PP-OCRv5_server_rec`, and you can also use a local model file via the `text_recognition_model_dir` argument. Layout analysis is a technique used to extract structured information from document images. PP-StructureV3 includes the following six modules: the Layout Detection Module, the General OCR Sub-pipeline, the Document Image Preprocessing Sub-pipeline (optional), the Table Recognition Sub-pipeline (optional), the Seal Recognition Sub-pipeline (optional), and the Formula Recognition Sub-pipeline (optional). Run a single command to quickly experience the PP-StructureV3 pipeline; if `save_path` is specified, the visualization results will be saved under `save_path`. Just a few lines of code run the pipeline as well; the default recognition model is again `PP-OCRv5_server_rec`, and a local model file can be supplied via `text_recognition_model_dir`. For usage commands and parameter descriptions, please refer to the documentation.
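The inference snippet referenced in this card is not included in the listing. Below is a minimal sketch of single-module inference, assuming the PaddleOCR 3.x Python API (`TextRecognition` class, installed via `pip install paddlepaddle paddleocr`); the image path is a placeholder.

```python
# Minimal sketch of text-recognition module inference with
# PP-OCRv5_server_rec. The import is guarded so the snippet degrades
# gracefully when PaddleOCR is not installed.
try:
    from paddleocr import TextRecognition
except ImportError:
    TextRecognition = None

def recognize_lines(image_path: str):
    """Run PP-OCRv5_server_rec on one image; returns None without PaddleOCR."""
    if TextRecognition is None:
        return None
    model = TextRecognition(model_name="PP-OCRv5_server_rec")
    results = model.predict(input=image_path, batch_size=1)
    for res in results:
        res.print()                          # recognized text and score
        res.save_to_img(save_path="./output/")
        res.save_to_json(save_path="./output/res.json")
    return results

# usage: recognize_lines("general_ocr_rec_001.png")
```

The `save_to_*` helpers mirror the result-saving pattern of PaddleOCR 3.x module objects; treat the exact signatures as assumptions to verify against the official documentation.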

license:apache-2.0
90,464
16

PP-OCRv5_mobile_det

PP-OCRv5_mobile_det is one of the PP-OCRv5_det series, the latest generation of text detection models developed by the PaddleOCR team. It aims to efficiently and accurately support the detection of text in diverse scenarios (including handwriting, vertical, rotated, and curved text) across multiple languages such as Simplified Chinese, Traditional Chinese, English, and Japanese. Key features include robust handling of complex layouts, varying text sizes, and challenging backgrounds, making it suitable for practical applications like document analysis, license plate recognition, and scene text detection. The key accuracy metrics are as follows:

| Handwritten Chinese | Handwritten English | Printed Chinese | Printed English | Traditional Chinese | Ancient Text | Japanese | General Scenario | Pinyin | Rotation | Distortion | Artistic Text | Average |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0.744 | 0.777 | 0.905 | 0.910 | 0.823 | 0.581 | 0.727 | 0.721 | 0.575 | 0.647 | 0.827 | 0.525 | 0.770 |

Please refer to the PaddlePaddle official website for installation details, install PaddlePaddle using pip, and then install the latest version of the PaddleOCR inference package from PyPI. You can quickly experience the functionality with a single command, or integrate the inference of the text detection module into your project; before running the integration code, download the sample image to your local machine. For usage commands and parameter descriptions, please refer to the documentation.

The ability of a single model is limited, but a pipeline composed of several models provides more capacity to solve difficult problems in real-world scenarios. The general OCR pipeline solves text recognition tasks by extracting text information from images and outputting it in string format. There are 5 modules in the pipeline: the Document Image Orientation Classification Module (optional), the Text Image Unwarping Module (optional), the Text Line Orientation Classification Module (optional), the Text Detection Module, and the Text Recognition Module. Run a single command to quickly experience the OCR pipeline; if `save_path` is specified, the visualization results will be saved under `save_path`. The command-line method is for quick experience, while project integration needs only a few lines of code. The default detection model used in the pipeline is `PP-OCRv5_server_det`, so you need to specify `PP-OCRv5_mobile_det` via the `text_detection_model_name` argument; you can also use a local model file via `text_detection_model_dir`. Layout analysis is a technique used to extract structured information from document images. PP-StructureV3 includes the following six modules: the Layout Detection Module, the General OCR Sub-pipeline, the Document Image Preprocessing Sub-pipeline (optional), the Table Recognition Sub-pipeline (optional), the Seal Recognition Sub-pipeline (optional), and the Formula Recognition Sub-pipeline (optional). Run a single command to quickly experience the PP-StructureV3 pipeline; results are printed to the terminal, and if `save_path` is specified, they are saved under `save_path`. Just a few lines of code run the pipeline as well, with the same detection-model override (`text_detection_model_name`, or `text_detection_model_dir` for a local file). For usage commands and parameter descriptions, please refer to the documentation.
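To make the model-name override concrete: a hedged sketch of running the general OCR pipeline with `PP-OCRv5_mobile_det` substituted for the default server detector, assuming the PaddleOCR 3.x `PaddleOCR` class and its `text_detection_model_name` parameter; the image path is a placeholder.

```python
# Sketch: general OCR pipeline with the mobile detector swapped in.
# Guarded import so the snippet degrades gracefully without PaddleOCR.
try:
    from paddleocr import PaddleOCR
except ImportError:
    PaddleOCR = None

def run_ocr(image_path: str):
    if PaddleOCR is None:
        return None
    ocr = PaddleOCR(
        text_detection_model_name="PP-OCRv5_mobile_det",
        # text_detection_model_dir="./models/PP-OCRv5_mobile_det",  # local copy
        use_doc_orientation_classify=False,   # optional modules disabled
        use_doc_unwarping=False,
        use_textline_orientation=False,
    )
    return ocr.predict(image_path)

# usage: run_ocr("sample_doc.png")
```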

license:apache-2.0
52,192
16

latin_PP-OCRv5_mobile_rec

latin_PP-OCRv5_mobile_rec is one of the PP-OCRv5_rec series, the latest generation of text line recognition models developed by the PaddleOCR team. It aims to efficiently and accurately support the recognition of Latin-script languages. The key accuracy metrics are as follows:

| Model | Latin-script language dataset accuracy (%) |
| --- | --- |
| latin_PP-OCRv5_mobile_rec | 84.7 |

Note: if any character (including punctuation) in a line is incorrect, the entire line is marked as wrong, so this metric reflects strict, practical accuracy. Please refer to the PaddlePaddle official website for installation details, install PaddlePaddle using pip, and then install the latest version of the PaddleOCR inference package from PyPI. You can quickly experience the functionality with a single command, or integrate the inference of the text recognition module into your project; before running the integration code, download the sample image to your local machine. The ability of a single model is limited, but a pipeline composed of several models provides more capacity to solve difficult problems in real-world scenarios. The PP-OCRv5 pipeline solves text recognition tasks by extracting text information from images and outputting it in string format. There are 5 modules in the pipeline: the Document Image Orientation Classification Module (optional), the Text Image Unwarping Module (optional), the Text Line Orientation Classification Module (optional), the Text Detection Module, and the Text Recognition Module. Run a single command to quickly experience the OCR pipeline; if `save_path` is specified, the visualization results will be saved under `save_path`. The command-line method is for quick experience, while project integration needs only a few lines of code. The default recognition model used in the pipeline is `PP-OCRv5_server_rec`, so specify `latin_PP-OCRv5_mobile_rec` via the `text_recognition_model_name` argument; you can also use a local model file via `text_recognition_model_dir`. For usage commands and parameter descriptions, please refer to the documentation.

license:apache-2.0
46,357
1

PaddleOCR-VL-1.5

license:apache-2.0
22,322
474

PP-DocLayoutV2

PP-DocLayoutV2 is a dedicated lightweight model for layout analysis, focusing specifically on element detection, classification, and reading order prediction. PP-DocLayoutV2 is composed of two sequentially connected networks. The first is an RT-DETR-based detection model that performs layout element detection and classification. The detected bounding boxes and class labels are then passed to a subsequent pointer network, which is responsible for ordering these layout elements. > For Windows users, please use WSL or a Docker container. For more usage details and parameter explanations, see the documentation. If you find PaddleOCR-VL helpful, feel free to give us a star and citation.

license:apache-2.0
17,313
21

PP-DocLayout_plus-L

A higher-precision layout region detection model trained with RT-DETR-L on a self-built dataset covering Chinese and English papers, PPTs, multi-layout magazines, contracts, books, exams, ancient books, and research reports. The model covers 20 common layout categories: document title, paragraph title, text, page number, abstract, table of contents, references, footnotes, header, footer, algorithm, formula, formula number, image, table, seal, figure/table title, chart, sidebar text, and lists of references. The key metrics are as follows:

| Model | mAP(0.5) (%) |
| --- | --- |
| PP-DocLayout_plus-L | 83.2 |

Note: the evaluation set for the above metric is a self-built layout region detection dataset containing 1,000 document images, including Chinese and English papers, magazines, newspapers, research reports, PPTs, exam papers, and textbooks. Please refer to the PaddlePaddle official website for installation details, install PaddlePaddle using pip, and then install the latest version of the PaddleOCR inference package from PyPI. You can quickly experience the functionality with a single command, or integrate the inference of the layout detection module into your project; before running the integration code, download the sample image to your local machine. The ability of a single model is limited, but a pipeline composed of several models provides more capacity to solve difficult problems in real-world scenarios. Layout analysis is a technique used to extract structured information from document images. PP-StructureV3 includes the following six modules: the Layout Detection Module, the General OCR Sub-pipeline, the Document Image Preprocessing Sub-pipeline (optional), the Table Recognition Sub-pipeline (optional), the Seal Recognition Sub-pipeline (optional), and the Formula Recognition Sub-pipeline (optional). You can quickly experience the PP-StructureV3 pipeline with a single command, or run it with just a few lines of code. The default layout model used in the pipeline is `PP-DocLayout_plus-L`. For usage commands and parameter descriptions, please refer to the documentation.
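A hedged sketch of the few-lines integration described above, assuming PaddleOCR 3.x exposes the PP-StructureV3 pipeline as a `PPStructureV3` class with a `layout_detection_model_name` parameter; the input path is a placeholder.

```python
# Sketch: PP-StructureV3 with PP-DocLayout_plus-L selected explicitly.
# Guarded import so the snippet degrades gracefully without PaddleOCR.
try:
    from paddleocr import PPStructureV3
except ImportError:
    PPStructureV3 = None

def parse_document(image_path: str, save_dir: str = "./output/"):
    if PPStructureV3 is None:
        return None
    pipeline = PPStructureV3(layout_detection_model_name="PP-DocLayout_plus-L")
    results = pipeline.predict(image_path)
    for res in results:
        res.save_to_json(save_path=save_dir)      # structured layout + text
        res.save_to_markdown(save_path=save_dir)  # markdown rendering
    return results

# usage: parse_document("report_page.png")
```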

license:apache-2.0
10,080
12

RT-DETR-L_wired_table_cell_det

license:apache-2.0
8,827
1

PP-OCRv5_mobile_rec

license:apache-2.0
8,505
7

SLANet_plus

license:apache-2.0
8,458
0

SLANeXt_wired

license:apache-2.0
8,423
0

PP-DocBlockLayout

license:apache-2.0
8,379
1

PP-LCNet_x1_0_table_cls

license:apache-2.0
8,357
1

PaddleOCR-VL

PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model

🔥 Official Website: Baidu AI Studio | 📝 arXiv: Technical Report

PaddleOCR-VL is a SOTA and resource-efficient model tailored for document parsing. Its core component is PaddleOCR-VL-0.9B, a compact yet powerful vision-language model (VLM) that integrates a NaViT-style dynamic-resolution visual encoder with the ERNIE-4.5-0.3B language model to enable accurate element recognition. The model efficiently supports 109 languages and excels at recognizing complex elements (e.g., text, tables, formulas, and charts) while maintaining minimal resource consumption. In comprehensive evaluations on widely used public benchmarks and in-house benchmarks, PaddleOCR-VL achieves SOTA performance in both page-level document parsing and element-level recognition. It significantly outperforms existing solutions, is strongly competitive with top-tier VLMs, and delivers fast inference, making it highly suitable for practical deployment in real-world scenarios. 1. Compact yet Powerful VLM Architecture: a novel vision-language model designed specifically for resource-efficient inference, achieving outstanding performance in element recognition. Integrating a NaViT-style dynamic high-resolution visual encoder with the lightweight ERNIE-4.5-0.3B language model significantly enhances the model's recognition capability and decoding efficiency.
This integration maintains high accuracy while reducing computational demands, making it well suited for efficient, practical document processing. 2. SOTA Performance on Document Parsing: PaddleOCR-VL achieves state-of-the-art performance in both page-level document parsing and element-level recognition. It significantly outperforms existing pipeline-based solutions and exhibits strong competitiveness against leading vision-language models (VLMs) in document parsing. It also excels at recognizing complex document elements such as text, tables, formulas, and charts, including challenging content like handwritten text and historical documents, which makes it highly versatile across a wide range of document types and scenarios. 3. Multilingual Support: PaddleOCR-VL supports 109 languages, covering major global languages (including but not limited to Chinese, English, Japanese, Latin-script languages, and Korean) as well as languages with different scripts and structures, such as Russian (Cyrillic script), Arabic, Hindi (Devanagari script), and Thai. This broad language coverage substantially enhances the system's applicability to multilingual and globalized document processing. 🚀 Enabled `flash-attn` in the `transformers` library for faster PaddleOCR-VL-0.9B inference. 🌟 PaddleOCR-VL-0.9B is now officially supported on `vLLM`. 🤗 The core PaddleOCR-VL-0.9B module can be called via the `transformers` library. 🚀 We release PaddleOCR-VL, a multilingual document parsing system built on a 0.9B ultra-compact vision-language model, with SOTA performance. > Please ensure that you install PaddlePaddle framework version 3.2.1 or above, along with the special version of safetensors. For macOS users, please use Docker to set up the environment.
Accelerate VLM inference via optimized inference servers: the vLLM inference service can be started using one of two methods; for more usage details and parameter explanations, see the documentation. Currently, inference with the PaddleOCR-VL-0.9B model is supported via the `transformers` library, which can recognize text, formula, table, and chart elements; full document parsing with `transformers` is planned. A simple script is provided for element-level inference with `transformers`. > [!NOTE] > We currently recommend the official method for inference, as it is faster and supports page-level document parsing; the `transformers` example only supports element-level recognition. Use flash-attn to boost performance and reduce memory usage. PaddleOCR-VL achieves SOTA performance for overall, text, formula, table, and reading-order metrics on OmniDocBench v1.5, and for almost all of these metrics on OmniDocBench v1.0. > Notes: > - The metrics are from MinerU, OmniDocBench, and our own internal evaluations. PaddleOCR-VL demonstrates robust and versatile capability in handling diverse document types, establishing it as the leading method in the OmniDocBench-OCR-block evaluation. The in-house OCR benchmark provides an evaluation of performance across multiple languages and text types; our model demonstrates outstanding accuracy, with the lowest edit distances across all evaluated scripts. Our self-built table evaluation set contains diverse table images (Chinese, English, and mixed Chinese-English) with varied characteristics: full, partial, or no borders; book/manual formats; lists; academic papers; merged cells; as well as low-quality and watermarked images. PaddleOCR-VL achieves remarkable performance across all categories.
The in-house formula evaluation set contains simple prints, complex prints, camera scans, and handwritten formulas; PaddleOCR-VL demonstrates the best performance in every category. The chart evaluation set is broadly divided into 11 chart categories: bar-line hybrid, pie, 100% stacked bar, area, bar, bubble, histogram, line, scatterplot, stacked area, and stacked bar. PaddleOCR-VL not only outperforms expert OCR VLMs but also surpasses some 72B-level multimodal language models. We would like to thank ERNIE, Keye, MinerU, and OmniDocBench for providing valuable code, model weights, and benchmarks. We also appreciate everyone's contributions to this open-source project! If you find PaddleOCR-VL helpful, feel free to give us a star and a citation.
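As a concrete starting point for the `transformers` route mentioned above, here is a hedged sketch of loading PaddleOCR-VL-0.9B; the repository id, processor class, and `trust_remote_code` usage are assumptions to check against the model card, and nothing is downloaded until `load_model()` is called.

```python
# Sketch: loading PaddleOCR-VL for element-level recognition via the
# transformers library. Guarded import; repo id is an assumption.
try:
    from transformers import AutoModelForCausalLM, AutoProcessor
except ImportError:
    AutoModelForCausalLM = AutoProcessor = None

MODEL_ID = "PaddlePaddle/PaddleOCR-VL"  # assumed Hugging Face repo id

def load_model():
    """Return (model, processor), or None if transformers is unavailable."""
    if AutoModelForCausalLM is None:
        return None
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, trust_remote_code=True)
    processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
    return model, processor
```

For production use, the officially supported vLLM server is the recommended path; this snippet only illustrates the element-level `transformers` workflow.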

license:apache-2.0
8,337
1,577

RT-DETR-L_wireless_table_cell_det

license:apache-2.0
8,297
1

PP-FormulaNet_plus-L

license:apache-2.0
6,935
1

PP-Chart2Table

license:apache-2.0
6,143
2

PP-LCNet_x0_25_textline_ori

license:apache-2.0
3,915
0

korean_PP-OCRv5_mobile_rec

license:apache-2.0
3,856
11

PP-OCRv3_mobile_det

license:apache-2.0
3,753
0

eslav_PP-OCRv5_mobile_rec

license:apache-2.0
2,159
0

PP-OCRv4_mobile_det

license:apache-2.0
2,132
0

PP-OCRv4_mobile_rec

license:apache-2.0
1,737
1

en_PP-OCRv4_mobile_rec

license:apache-2.0
1,667
0

PP-OCRv4_server_det

license:apache-2.0
1,596
1

en_PP-OCRv3_mobile_rec

en_PP-OCRv3_mobile_rec is a text line recognition model in the PP-OCRv3_rec series, developed by the PaddleOCR team. It is an English-specific model trained based on PP-OCRv3_mobile_rec, and it supports English text recognition. The key accuracy metrics are as follows:

| Model | Recognition Avg Accuracy (%) | Model Storage Size (M) | Introduction |
| --- | --- | --- | --- |
| en_PP-OCRv3_mobile_rec | 70.69 | 7.8 | An ultra-lightweight English recognition model trained based on the PP-OCRv3 recognition model, supporting English and numeric character recognition. |

Note: if any character (including punctuation) in a line is incorrect, the entire line is marked as wrong, so this metric reflects strict, practical accuracy. Please refer to the PaddlePaddle official website for installation details, install PaddlePaddle using pip, and then install the latest version of the PaddleOCR inference package from PyPI. You can quickly experience the functionality with a single command, or integrate the inference of the text recognition module into your project; before running the integration code, download the sample image to your local machine. The ability of a single model is limited, but a pipeline composed of several models provides more capacity to solve difficult problems in real-world scenarios. The general OCR pipeline solves text recognition tasks by extracting text information from images and outputting it in string format. There are 5 modules in the pipeline: the Document Image Orientation Classification Module (optional), the Text Image Unwarping Module (optional), the Text Line Orientation Classification Module (optional), the Text Detection Module, and the Text Recognition Module. Run a single command to quickly experience the OCR pipeline; if `save_path` is specified, the visualization results will be saved under `save_path`. The command-line method is for quick experience, while project integration needs only a few lines of code. The default recognition model used in the pipeline is `PP-OCRv5_server_rec`, so you need to specify `en_PP-OCRv3_mobile_rec` via the `text_recognition_model_name` argument; you can also use a local model file via `text_recognition_model_dir`. For usage commands and parameter descriptions, please refer to the documentation.
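The recognition-model override described above can be sketched as follows, assuming the PaddleOCR 3.x `PaddleOCR` class with its `text_recognition_model_name` and `text_recognition_model_dir` parameters; the image path is a placeholder.

```python
# Sketch: OCR pipeline with the English PP-OCRv3 recognizer selected
# instead of the default PP-OCRv5_server_rec. Guarded import.
try:
    from paddleocr import PaddleOCR
except ImportError:
    PaddleOCR = None

def run_english_ocr(image_path: str):
    if PaddleOCR is None:
        return None
    ocr = PaddleOCR(
        text_recognition_model_name="en_PP-OCRv3_mobile_rec",
        # or point at a local copy of the weights instead:
        # text_recognition_model_dir="./models/en_PP-OCRv3_mobile_rec",
    )
    return ocr.predict(image_path)

# usage: run_english_ocr("english_receipt.png")
```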

license:apache-2.0
1,568
0

PP-OCRv4_server_seal_det

license:apache-2.0
1,545
1

PP-OCRv4_server_rec_doc

license:apache-2.0
1,362
1

PP-DocLayoutV2_safetensors

license:apache-2.0
1,309
2

PP-DocLayout-L

license:apache-2.0
1,291
4

arabic_PP-OCRv3_mobile_rec

arabic_PP-OCRv3_mobile_rec is a text line recognition model in the PP-OCRv3_rec series, developed by the PaddleOCR team. It is an Arabic-alphabet-specific model trained based on PP-OCRv3_mobile_rec, and it supports Arabic-alphabet recognition. The key accuracy metrics are as follows:

| Model | Recognition Avg Accuracy (%) | Model Storage Size (M) | Introduction |
| --- | --- | --- | --- |
| arabic_PP-OCRv3_mobile_rec | 73.55 | 7.8 | An ultra-lightweight Arabic-alphabet recognition model trained based on the PP-OCRv3 recognition model, supporting Arabic-alphabet and numeric character recognition. |

Note: if any character (including punctuation) in a line is incorrect, the entire line is marked as wrong, so this metric reflects strict, practical accuracy. Please refer to the PaddlePaddle official website for installation details, install PaddlePaddle using pip, and then install the latest version of the PaddleOCR inference package from PyPI. You can quickly experience the functionality with a single command, or integrate the inference of the text recognition module into your project; before running the integration code, download the sample image to your local machine. The ability of a single model is limited, but a pipeline composed of several models provides more capacity to solve difficult problems in real-world scenarios. The general OCR pipeline solves text recognition tasks by extracting text information from images and outputting it in string format. There are 5 modules in the pipeline: the Document Image Orientation Classification Module (optional), the Text Image Unwarping Module (optional), the Text Line Orientation Classification Module (optional), the Text Detection Module, and the Text Recognition Module. Run a single command to quickly experience the OCR pipeline. The command-line method is for quick experience, while project integration needs only a few lines of code. The default recognition model used in the pipeline is `PP-OCRv5_server_rec`, so you need to specify `arabic_PP-OCRv3_mobile_rec` via the `text_recognition_model_name` argument; you can also use a local model file via `text_recognition_model_dir`. For usage commands and parameter descriptions, please refer to the documentation.

license:apache-2.0
1,213
1

th_PP-OCRv5_mobile_rec

license:apache-2.0
1,008
1

RT-DETR-H_layout_3cls

license:apache-2.0
940
0

PP-Chart2Table_safetensors

license:apache-2.0
914
0

arabic_PP-OCRv5_mobile_rec

license:apache-2.0
833
0

SLANeXt_wireless

Table structure recognition is an important component of table recognition systems, converting non-editable table images into editable formats (such as HTML). Its goal is to identify the positions of rows, columns, and cells in tables, and its performance directly affects the accuracy and efficiency of the entire table recognition system. The module usually outputs HTML code for the table area, which is then passed as input to the table recognition pipeline for further processing. Key metrics include accuracy (%), GPU and CPU inference time (ms) in normal and high-performance modes, and model storage size (M). Note: the accuracy of SLANeXt_wireless comes from joint testing with SLANeXt_wired. Please refer to the PaddlePaddle official website for installation details, install PaddlePaddle using pip, and then install the latest version of the PaddleOCR inference package from PyPI. You can quickly experience the functionality with a single command, or integrate the inference of the table structure recognition module into your project; before running the integration code, download the sample image to your local machine. The ability of a single model is limited, but a pipeline composed of several models provides more capacity to solve difficult problems in real-world scenarios. The general table recognition V2 pipeline solves table recognition tasks by extracting information from images and outputting it in HTML or Excel format. There are 8 modules in the pipeline: the Table Classification Module, the Table Structure Recognition Module, the Table Cell Detection Module, the Text Detection Module, the Text Recognition Module, the Layout Region Detection Module (optional), the Document Image Orientation Classification Module (optional), and the Text Image Unwarping Module (optional). Run a single command to quickly experience the general table recognition V2 pipeline; if `save_path` is specified, the visualization results will be saved under `save_path`. The command-line method is for quick experience, while project integration needs only a few lines of code. Layout analysis is a technique used to extract structured information from document images. PP-StructureV3 includes the following six modules: the Layout Detection Module, the General OCR Sub-pipeline, the Document Image Preprocessing Sub-pipeline (optional), the Table Recognition Sub-pipeline (optional), the Seal Recognition Sub-pipeline (optional), and the Formula Recognition Sub-pipeline (optional). Run a single command to quickly experience the PP-StructureV3 pipeline; results are printed to the terminal, and if `save_path` is specified, they are saved under `save_path`. Just a few lines of code run the pipeline as well. For usage commands and parameter descriptions, please refer to the documentation.

license:apache-2.0
801
0

PP-OCRv3_mobile_rec

license:apache-2.0
756
0

PP-OCRv4_server_rec

license:apache-2.0
651
1

PP-OCRv5_server_rec_safetensors

license:apache-2.0
635
2

PP-OCRv5_mobile_rec_safetensors

license:apache-2.0
612
2

SLANet

Table structure recognition is an important component of table recognition systems, capable of converting non-editable table images into editable table formats (such as HTML). The goal of table structure recognition is to identify the positions of rows, columns, and cells in tables. The performance of this module directly affects the accuracy and efficiency of the entire table recognition system. The table structure recognition module usually outputs HTML code for the table area, which is then passed as input to the tabl recognition pipeline for further processing. Model Accuracy (%) GPU Inference Time (ms) [Normal Mode / High Performance Mode] CPU Inference Time (ms) [Normal Mode / High Performance Mode] Model Storage Size (M) Please refer to the following commands to install PaddlePaddle using pip: For details about PaddlePaddle installation, please refer to the PaddlePaddle official website. Install the latest version of the PaddleOCR inference package from PyPI: You can quickly experience the functionality with a single command: You can also integrate the model inference of the table classification module into your project. Before running the following code, please download the sample image to your local machine. For details about usage command and descriptions of parameters, please refer to the Document. The ability of a single model is limited. But the pipeline consists of several models can provide more capacity to resolve difficult problems in real-world scenarios. The general table recognition V2 pipeline is used to solve table recognition tasks by extracting information from images and outputting it in HTML or Excel format. 
There are 8 modules in the pipeline:

- Table Classification Module
- Table Structure Recognition Module
- Table Cell Detection Module
- Text Detection Module
- Text Recognition Module
- Layout Region Detection Module (Optional)
- Document Image Orientation Classification Module (Optional)
- Text Image Unwarping Module (Optional)

Run a single command to quickly experience the general table recognition V2 pipeline. If `save_path` is specified, the visualization results will be saved under `save_path`. The command-line method is for a quick experience; for project integration, only a few lines of code are needed. To use the SLANet model for table recognition, just change the model name and use the end-to-end prediction mode. For details about the usage commands and parameter descriptions, please refer to the documentation.

Layout analysis is a technique used to extract structured information from document images. PP-StructureV3 includes the following six modules:

- Layout Detection Module
- General OCR Pipeline
- Document Image Preprocessing Pipeline (Optional)
- Table Recognition Pipeline (Optional)
- Seal Recognition Pipeline (Optional)
- Formula Recognition Pipeline (Optional)

Run a single command to quickly experience the PP-StructureV3 pipeline. Results are printed to the terminal; if `save_path` is specified, the results will be saved under `save_path`. Just a few lines of code are enough to run pipeline inference, taking the PP-StructureV3 pipeline as an example. The default models used in the pipeline are `SLANeXt_wired` and `SLANeXt_wireless`, so you need to explicitly select `SLANet` via the corresponding model-name argument. For details about the usage commands and parameter descriptions, please refer to the documentation.
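The model-name override described above can be sketched like this. It is a minimal sketch assuming `paddleocr` 3.x; the `TableStructureRecognition` module class and its `model_name` argument follow the PaddleOCR 3.x module API, but treat the exact names as assumptions and verify them against the documentation for your version.

```python
# Sketch: selecting SLANet instead of the default SLANeXt models for the
# table structure recognition module (assumes paddleocr>=3.0).

SLANET_MODEL = "SLANet"

def predict_table_structure(image_path: str):
    """Run the table structure recognition module with the SLANet model."""
    from paddleocr import TableStructureRecognition  # PaddleOCR 3.x module API

    module = TableStructureRecognition(model_name=SLANET_MODEL)
    outputs = module.predict(image_path)
    for out in outputs:
        out.print()  # HTML structure tokens for the table region
    return outputs
```

The same idea applies at the pipeline level: passing `SLANet` through the pipeline's table-structure model-name argument swaps it in for the default `SLANeXt_wired` / `SLANeXt_wireless` pair.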

license:apache-2.0
555
0

PP-OCRv5_server_det_safetensors

license:apache-2.0
519
3

PP-OCRv5_mobile_det_safetensors

license:apache-2.0
513
2

SLANeXt_wired_safetensors

license:apache-2.0
468
0

latin_PP-OCRv3_mobile_rec

license:apache-2.0
450
0

devanagari_PP-OCRv3_mobile_rec

license:apache-2.0
425
0

PP-LCNet_x1_0_doc_ori_safetensors

license:apache-2.0
412
0

cyrillic_PP-OCRv3_mobile_rec

license:apache-2.0
391
0

PP-FormulaNet_plus-M

license:apache-2.0
372
0

japan_PP-OCRv3_mobile_rec

license:apache-2.0
356
0

UVDoc_safetensors

license:apache-2.0
333
0

PP-OCRv4_mobile_seal_det

license:apache-2.0
302
0

devanagari_PP-OCRv5_mobile_rec

license:apache-2.0
276
0

PP-DocBee2-3B

license:apache-2.0
175
0

PP-DocLayout-M

license:apache-2.0
167
0

RT-DETR-H_layout_17cls

license:apache-2.0
165
2

PP-DocLayout-S

license:apache-2.0
160
0

cyrillic_PP-OCRv5_mobile_rec

license:apache-2.0
155
0

el_PP-OCRv5_mobile_rec

license:apache-2.0
147
0

PP-OCRv3_server_det

license:apache-2.0
141
0

PicoDet_layout_1x_table

license:apache-2.0
107
0

PP-DocLayoutV3

license:apache-2.0
98
8

ta_PP-OCRv5_mobile_rec

license:apache-2.0
91
0

korean_PP-OCRv3_mobile_rec

license:apache-2.0
87
0

PicoDet_layout_1x

license:apache-2.0
81
0

te_PP-OCRv5_mobile_rec

license:apache-2.0
77
0

ch_SVTRv2_rec

license:apache-2.0
74
0

te_PP-OCRv3_mobile_rec

license:apache-2.0
63
0

RT-DETR-L_wired_table_cell_det_safetensors

license:apache-2.0
55
0

PicoDet-S_layout_3cls

license:apache-2.0
55
0

ka_PP-OCRv3_mobile_rec

ka_PP-OCRv3_mobile_rec is a text line recognition model within the PP-OCRv3_rec series, developed by the PaddleOCR team. It is a Kannada-specific model trained based on PP-OCRv3_mobile_rec, supporting Kannada and numeric character recognition. The key accuracy metrics are as follows:

| Model | Recognition Avg Accuracy (%) | Model Storage Size (M) | Introduction |
| --- | --- | --- | --- |
| ka_PP-OCRv3_mobile_rec | 96.96 | 8.0 M | An ultra-lightweight Kannada recognition model trained based on the PP-OCRv3 recognition model, supporting Kannada and numeric character recognition. |

Note: If any character (including punctuation) in a line was incorrect, the entire line was marked as wrong. This ensures higher accuracy in practical applications.

Please refer to the following commands to install PaddlePaddle using pip. For details about PaddlePaddle installation, please refer to the PaddlePaddle official website. Install the latest version of the PaddleOCR inference package from PyPI. You can quickly experience the functionality with a single command, or integrate the model inference of the text recognition module into your project. Before running the code, please download the sample image to your local machine. For details about the usage commands and parameter descriptions, please refer to the documentation.

The ability of a single model is limited, but a pipeline consisting of several models provides more capacity to resolve difficult problems in real-world scenarios. The general OCR pipeline is used to solve text recognition tasks by extracting text information from images and outputting it in string format. There are 5 modules in the pipeline:

- Document Image Orientation Classification Module (Optional)
- Text Image Unwarping Module (Optional)
- Text Line Orientation Classification Module (Optional)
- Text Detection Module
- Text Recognition Module

Run a single command to quickly experience the OCR pipeline. The command-line method is for a quick experience.
For project integration, only a few lines of code are needed. The default model used in the pipeline is `PP-OCRv5_server_rec`, so you need to specify `ka_PP-OCRv3_mobile_rec` via the `text_recognition_model_name` argument. You can also use local model files via the `text_recognition_model_dir` argument. For details about the usage commands and parameter descriptions, please refer to the documentation.
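The override above can be sketched as follows, assuming `paddleocr` 3.x is installed. The `text_recognition_model_name` / `text_recognition_model_dir` arguments follow the PaddleOCR 3.x `PaddleOCR` constructor; the optional-module flags are shown disabled for a minimal setup.

```python
# Sketch: building a general OCR pipeline that uses the Kannada recognizer
# instead of the default PP-OCRv5_server_rec (assumes paddleocr>=3.0).

KA_REC_MODEL = "ka_PP-OCRv3_mobile_rec"

def build_kannada_ocr(model_dir=None):
    """Create an OCR pipeline whose recognition stage is the Kannada model."""
    from paddleocr import PaddleOCR  # PaddleOCR 3.x pipeline class

    return PaddleOCR(
        text_recognition_model_name=KA_REC_MODEL,
        text_recognition_model_dir=model_dir,  # optional: local model files
        use_doc_orientation_classify=False,    # optional modules disabled
        use_doc_unwarping=False,
        use_textline_orientation=False,
    )
```

A typical call would then be `build_kannada_ocr().predict("sample.png")`, which runs detection plus Kannada text line recognition end to end.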

license:apache-2.0
54
0

ta_PP-OCRv3_mobile_rec

license:apache-2.0
53
0

SLANeXt_wireless_safetensors

license:apache-2.0
50
0

UniMERNet

license:apache-2.0
46
0

PP-FormulaNet_plus-S

license:apache-2.0
43
0

PicoDet-L_layout_3cls

license:apache-2.0
42
0

PicoDet-L_layout_17cls

license:apache-2.0
40
0

PP-DocBee-2B

license:apache-2.0
36
1

PP-FormulaNet-S

license:apache-2.0
34
1

PP-DocLayout_plus-L_safetensors

license:apache-2.0
34
0

RT-DETR-L_wireless_table_cell_det_safetensors

license:apache-2.0
31
0

PicoDet-S_layout_17cls

license:apache-2.0
31
0

chinese_cht_PP-OCRv3_mobile_rec

license:apache-2.0
30
0

PP-DocBlockLayout_safetensors

license:apache-2.0
29
0

ch_RepSVTR_rec

license:apache-2.0
29
0

PP-LCNet_x1_0_textline_ori_safetensors

license:apache-2.0
26
0

PP-FormulaNet-L

license:apache-2.0
26
0

PP-LCNet_x0_25_textline_ori_safetensors

license:apache-2.0
25
0

PP-LCNet_x1_0_table_cls_safetensors

license:apache-2.0
24
0

LaTeX_OCR_rec

license:apache-2.0
16
0

PP-DocBee-7B

license:apache-2.0
11
1

PaddleOCR-VL-1.5-GGUF

license:apache-2.0
7
1