PaddlePaddle
UVDoc
---
license: apache-2.0
library_name: PaddleOCR
language:
- en
- zh
pipeline_tag: image-to-text
tags:
- OCR
- PaddlePaddle
- PaddleOCR
- doc_img_unwarping
---
PP-LCNet_x1_0_doc_ori
---
license: apache-2.0
library_name: PaddleOCR
language:
- en
- zh
pipeline_tag: image-to-text
tags:
- OCR
- PaddlePaddle
- PaddleOCR
- doc_img_orientation_classification
---
PP-OCRv5_server_det
---
license: apache-2.0
library_name: PaddleOCR
language:
- en
- zh
pipeline_tag: image-to-text
tags:
- OCR
- PaddlePaddle
- PaddleOCR
- textline_detection
---
en_PP-OCRv5_mobile_rec
---
license: apache-2.0
library_name: PaddleOCR
language:
- en
pipeline_tag: image-to-text
tags:
- OCR
- PaddlePaddle
- PaddleOCR
- textline_recognition
---
PP-DocLayoutV3_safetensors
PP-LCNet_x1_0_textline_ori
---
license: apache-2.0
library_name: PaddleOCR
language:
- en
- zh
pipeline_tag: image-to-text
tags:
- OCR
- PaddlePaddle
- PaddleOCR
- textline_orientation_classification
---
PP-OCRv5_server_rec
PP-OCRv5_server_rec is one of the PP-OCRv5_rec series, the latest generation of text line recognition models developed by the PaddleOCR team. Using a single model, it aims to efficiently and accurately support the recognition of four major languages (Simplified Chinese, Traditional Chinese, English, and Japanese) as well as complex text scenarios such as handwriting, vertical text, pinyin, and rare characters. The key accuracy metrics are as follows:

| Handwritten Chinese | Handwritten English | Printed Chinese | Printed English | Traditional Chinese | Ancient Text | Japanese | General Scenario | Pinyin | Rotation | Distortion | Artistic Text | Average |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0.5807 | 0.5806 | 0.9013 | 0.8679 | 0.7472 | 0.6039 | 0.7372 | 0.5946 | 0.8384 | 0.7435 | 0.9314 | 0.6397 | 0.8401 |

Note: If any character (including punctuation) in a line was incorrect, the entire line was marked as wrong. This ensures higher accuracy in practical applications.

Please refer to the following commands to install PaddlePaddle using pip; for details about PaddlePaddle installation, please refer to the PaddlePaddle official website. Then install the latest version of the PaddleOCR inference package from PyPI. You can quickly experience the functionality with a single command, or integrate the model inference of the text recognition module into your project. Before running the following code, please download the sample image to your local machine. For details about usage commands and parameter descriptions, please refer to the Document.

The ability of a single model is limited, but a pipeline consisting of several models provides more capacity to resolve difficult problems in real-world scenarios. The general OCR pipeline is used to solve text recognition tasks by extracting text information from images and outputting it in string format.
There are 5 modules in the pipeline:

- Document Image Orientation Classification Module (Optional)
- Text Image Unwarping Module (Optional)
- Text Line Orientation Classification Module (Optional)
- Text Detection Module
- Text Recognition Module

Run a single command to quickly experience the OCR pipeline. If `save_path` is specified, the visualization results will be saved under `save_path`. The command-line method is for quick experience; for project integration, only a few lines of code are needed. The default recognition model used in the pipeline is `PP-OCRv5_server_rec`, and you can also use a local model file via the `text_recognition_model_dir` argument. For details about usage commands and parameter descriptions, please refer to the Document.

Layout analysis is a technique used to extract structured information from document images. PP-StructureV3 includes the following six modules:

- Layout Detection Module
- General OCR Sub-pipeline
- Document Image Preprocessing Sub-pipeline (Optional)
- Table Recognition Sub-pipeline (Optional)
- Seal Recognition Sub-pipeline (Optional)
- Formula Recognition Sub-pipeline (Optional)

Run a single command to quickly experience the PP-StructureV3 pipeline. If `save_path` is specified, the visualization results will be saved under `save_path`. Just a few lines of code are enough to experience pipeline inference. Taking the PP-StructureV3 pipeline as an example, the default recognition model is `PP-OCRv5_server_rec`, and you can also use a local model file via the `text_recognition_model_dir` argument. For details about usage commands and parameter descriptions, please refer to the Document.
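The project-integration path described above can be sketched as follows, assuming the PaddleOCR 3.x Python API; the input filename `sample.png` is a placeholder for a locally downloaded sample image:

```python
# General OCR pipeline using this card's model as the recognizer.
# Sketch based on the PaddleOCR 3.x API; "sample.png" is a placeholder path.
from paddleocr import PaddleOCR

pipeline = PaddleOCR(
    text_recognition_model_name="PP-OCRv5_server_rec",  # the pipeline default
    use_doc_orientation_classify=False,  # the three optional modules can be disabled
    use_doc_unwarping=False,
    use_textline_orientation=False,
)
for res in pipeline.predict("sample.png"):
    res.print()                 # recognized text lines with scores and boxes
    res.save_to_img("output")   # visualization images under ./output
    res.save_to_json("output")  # structured results as JSON
```

This assumes PaddlePaddle and the `paddleocr` package have been installed per the installation paragraphs above; for module-level inference of this model alone, the same package's `TextRecognition` class can be constructed with `model_name="PP-OCRv5_server_rec"` in the same way.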
PP-OCRv5_mobile_det
PP-OCRv5_mobile_det is one of the PP-OCRv5_det series, the latest generation of text detection models developed by the PaddleOCR team. It aims to efficiently and accurately support the detection of text in diverse scenarios (including handwriting, vertical, rotated, and curved text) across multiple languages such as Simplified Chinese, Traditional Chinese, English, and Japanese. Key features include robust handling of complex layouts, varying text sizes, and challenging backgrounds, making it suitable for practical applications like document analysis, license plate recognition, and scene text detection. The key accuracy metrics are as follows:

| Handwritten Chinese | Handwritten English | Printed Chinese | Printed English | Traditional Chinese | Ancient Text | Japanese | General Scenario | Pinyin | Rotation | Distortion | Artistic Text | Average |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0.744 | 0.777 | 0.905 | 0.910 | 0.823 | 0.581 | 0.727 | 0.721 | 0.575 | 0.647 | 0.827 | 0.525 | 0.770 |

Please refer to the following commands to install PaddlePaddle using pip; for details about PaddlePaddle installation, please refer to the PaddlePaddle official website. Then install the latest version of the PaddleOCR inference package from PyPI. You can quickly experience the functionality with a single command, or integrate the model inference of the text detection module into your project. Before running the following code, please download the sample image to your local machine. For details about usage commands and parameter descriptions, please refer to the Document.

The ability of a single model is limited, but a pipeline consisting of several models provides more capacity to resolve difficult problems in real-world scenarios. The general OCR pipeline is used to solve text recognition tasks by extracting text information from images and outputting it in string format.
There are 5 modules in the pipeline:

- Document Image Orientation Classification Module (Optional)
- Text Image Unwarping Module (Optional)
- Text Line Orientation Classification Module (Optional)
- Text Detection Module
- Text Recognition Module

Run a single command to quickly experience the OCR pipeline. If `save_path` is specified, the visualization results will be saved under `save_path`. The command-line method is for quick experience; for project integration, only a few lines of code are needed. The default detection model used in the pipeline is `PP-OCRv5_server_det`, so you need to specify `PP-OCRv5_mobile_det` via the `text_detection_model_name` argument; you can also use a local model file via the `text_detection_model_dir` argument. For details about usage commands and parameter descriptions, please refer to the Document.

Layout analysis is a technique used to extract structured information from document images. PP-StructureV3 includes the following six modules:

- Layout Detection Module
- General OCR Pipeline
- Document Image Preprocessing Pipeline (Optional)
- Table Recognition Pipeline (Optional)
- Seal Recognition Pipeline (Optional)
- Formula Recognition Pipeline (Optional)

Run a single command to quickly experience the PP-StructureV3 pipeline. Results will be printed to the terminal; if `save_path` is specified, the results will be saved under `save_path`. Just a few lines of code are enough to experience pipeline inference. Taking the PP-StructureV3 pipeline as an example: the default detection model is `PP-OCRv5_server_det`, so you need to specify `PP-OCRv5_mobile_det` via the `text_detection_model_name` argument; you can also use a local model file via the `text_detection_model_dir` argument. For details about usage commands and parameter descriptions, please refer to the Document.
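The override of the default detection model described above can be sketched in Python, assuming the PaddleOCR 3.x API; `sample.png` is a placeholder image path:

```python
# OCR pipeline with the detection model switched to PP-OCRv5_mobile_det.
# Sketch based on the PaddleOCR 3.x API; "sample.png" is a placeholder path.
from paddleocr import PaddleOCR

pipeline = PaddleOCR(text_detection_model_name="PP-OCRv5_mobile_det")
for res in pipeline.predict("sample.png"):
    res.print()                # detected boxes and recognized text
    res.save_to_img("output")  # visualization under ./output
```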
latin_PP-OCRv5_mobile_rec
latin_PP-OCRv5_mobile_rec is one of the PP-OCRv5_rec series, the latest generation of text line recognition models developed by the PaddleOCR team. It aims to efficiently and accurately support the recognition of Latin-script languages. The key accuracy metrics are as follows:

| Model | Latin-script language dataset accuracy (%) |
| --- | --- |
| latin_PP-OCRv5_mobile_rec | 84.7 |

Note: If any character (including punctuation) in a line was incorrect, the entire line was marked as wrong. This ensures higher accuracy in practical applications.

Please refer to the following commands to install PaddlePaddle using pip; for details about PaddlePaddle installation, please refer to the PaddlePaddle official website. Then install the latest version of the PaddleOCR inference package from PyPI. You can quickly experience the functionality with a single command, or integrate the model inference of the text recognition module into your project. Before running the following code, please download the sample image to your local machine. For details about usage commands and parameter descriptions, please refer to the Document.

The ability of a single model is limited, but a pipeline consisting of several models provides more capacity to resolve difficult problems in real-world scenarios. The PP-OCRv5 pipeline is used to solve text recognition tasks by extracting text information from images and outputting it in string format. There are 5 modules in the pipeline:

- Document Image Orientation Classification Module (Optional)
- Text Image Unwarping Module (Optional)
- Text Line Orientation Classification Module (Optional)
- Text Detection Module
- Text Recognition Module

Run a single command to quickly experience the OCR pipeline. If `save_path` is specified, the visualization results will be saved under `save_path`. The command-line method is for quick experience.
For project integration, only a few lines of code are needed. The default recognition model used in the pipeline is `PP-OCRv5_server_rec`, so you need to specify `latin_PP-OCRv5_mobile_rec` via the `text_recognition_model_name` argument; you can also use a local model file via the `text_recognition_model_dir` argument. For details about usage commands and parameter descriptions, please refer to the Document.
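The pipeline usage above can also be run from the command line, roughly as follows (a sketch assuming the PaddleOCR 3.x CLI, whose flags mirror the Python keyword arguments; the image path is a placeholder):

```shell
# Run the general OCR pipeline with this card's recognizer from the CLI.
paddleocr ocr -i ./sample.png \
    --text_recognition_model_name latin_PP-OCRv5_mobile_rec \
    --save_path ./output
```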
PaddleOCR-VL-1.5
PP-DocLayoutV2
PP-DocLayoutV2 is a dedicated lightweight model for layout analysis, focusing specifically on element detection, classification, and reading order prediction. It is composed of two sequentially connected networks: the first is an RT-DETR-based detection model that performs layout element detection and classification; the detected bounding boxes and class labels are then passed to a subsequent pointer network, which orders these layout elements.

> For Windows users, please use WSL or a Docker container.

For more usage details and parameter explanations, see the documentation. If you find PaddleOCR-VL helpful, feel free to give us a star and citation.
PP-DocLayout_plus-L
A higher-precision layout area localization model, trained with RT-DETR-L on a self-built dataset containing Chinese and English papers, PPTs, multi-layout magazines, contracts, books, exam papers, ancient books, and research reports. The layout detection model covers 20 common categories: document title, paragraph title, text, page number, abstract, table of contents, references, footnotes, header, footer, algorithm, formula, formula number, image, table, seal, figure/table title, chart, sidebar text, and lists of references. The key metrics are as follows:

| Model | mAP(0.5) (%) |
| --- | --- |
| PP-DocLayout_plus-L | 83.2 |

Note: the evaluation set for the above metric is a self-built layout region detection dataset containing 1,000 document images, including Chinese and English papers, magazines, newspapers, research reports, PPTs, exam papers, and textbooks.

Please refer to the following commands to install PaddlePaddle using pip; for details about PaddlePaddle installation, please refer to the PaddlePaddle official website. Then install the latest version of the PaddleOCR inference package from PyPI. You can quickly experience the functionality with a single command, or integrate the model inference of the layout detection module into your project. Before running the following code, please download the sample image to your local machine. For details about usage commands and parameter descriptions, please refer to the Document.

The ability of a single model is limited, but a pipeline consisting of several models provides more capacity to resolve difficult problems in real-world scenarios. Layout analysis is a technique used to extract structured information from document images.
PP-StructureV3 includes the following six modules:

- Layout Detection Module
- General OCR Sub-pipeline
- Document Image Preprocessing Sub-pipeline (Optional)
- Table Recognition Sub-pipeline (Optional)
- Seal Recognition Sub-pipeline (Optional)
- Formula Recognition Sub-pipeline (Optional)

You can quickly experience the PP-StructureV3 pipeline with a single command, or experience pipeline inference with just a few lines of code. Taking the PP-StructureV3 pipeline as an example: the default layout detection model used in the pipeline is `PP-DocLayout_plus-L`. For details about usage commands and parameter descriptions, please refer to the Document.
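Module-level inference can be sketched as follows, assuming the `LayoutDetection` class from the PaddleOCR 3.x API; `layout.jpg` is a placeholder for a locally downloaded sample image:

```python
# Layout detection with PP-DocLayout_plus-L at the module level.
# Sketch based on the PaddleOCR 3.x API; "layout.jpg" is a placeholder path.
from paddleocr import LayoutDetection

model = LayoutDetection(model_name="PP-DocLayout_plus-L")
for res in model.predict("layout.jpg", batch_size=1):
    res.print()                             # boxes, class labels, and scores
    res.save_to_img(save_path="./output/")
    res.save_to_json(save_path="./output/res.json")
```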
RT-DETR-L_wired_table_cell_det
PP-OCRv5_mobile_rec
SLANet_plus
SLANeXt_wired
PP-DocBlockLayout
PP-LCNet_x1_0_table_cls
PaddleOCR-VL
PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model

🔥 Official Website: Baidu AI Studio | 📝 arXiv: Technical Report

PaddleOCR-VL is a SOTA and resource-efficient model tailored for document parsing. Its core component is PaddleOCR-VL-0.9B, a compact yet powerful vision-language model (VLM) that integrates a NaViT-style dynamic-resolution visual encoder with the ERNIE-4.5-0.3B language model to enable accurate element recognition. This innovative model efficiently supports 109 languages and excels in recognizing complex elements (e.g., text, tables, formulas, and charts) while maintaining minimal resource consumption. Through comprehensive evaluations on widely used public benchmarks and in-house benchmarks, PaddleOCR-VL achieves SOTA performance in both page-level document parsing and element-level recognition. It significantly outperforms existing solutions, exhibits strong competitiveness against top-tier VLMs, and delivers fast inference speeds. These strengths make it highly suitable for practical deployment in real-world scenarios.

1. Compact yet Powerful VLM Architecture: We present a novel vision-language model specifically designed for resource-efficient inference, achieving outstanding performance in element recognition. By integrating a NaViT-style dynamic high-resolution visual encoder with the lightweight ERNIE-4.5-0.3B language model, we significantly enhance the model's recognition capabilities and decoding efficiency.
This integration maintains high accuracy while reducing computational demands, making it well suited for efficient and practical document processing applications.

2. SOTA Performance on Document Parsing: PaddleOCR-VL achieves state-of-the-art performance in both page-level document parsing and element-level recognition. It significantly outperforms existing pipeline-based solutions and exhibits strong competitiveness against leading vision-language models (VLMs) in document parsing. Moreover, it excels in recognizing complex document elements such as text, tables, formulas, and charts, including challenging content types like handwritten text and historical documents. This makes it highly versatile and suitable for a wide range of document types and scenarios.

3. Multilingual Support: PaddleOCR-VL supports 109 languages, covering major global languages (including but not limited to Chinese, English, Japanese, Latin, and Korean) as well as languages with different scripts and structures, such as Russian (Cyrillic script), Arabic, Hindi (Devanagari script), and Thai. This broad language coverage substantially enhances the applicability of our system to multilingual and globalized document processing scenarios.

🚀 Enabled `flash-attn` in the `transformers` library to achieve faster inference with PaddleOCR-VL-0.9B. 🌟 PaddleOCR-VL-0.9B is now officially supported on `vLLM`. 🤗 The core module PaddleOCR-VL-0.9B of PaddleOCR-VL can be called via the `transformers` library. 🚀 We released PaddleOCR-VL, a model that boosts multilingual document parsing via a 0.9B ultra-compact vision-language model with SOTA performance.

> Please ensure that you install PaddlePaddle framework version 3.2.1 or above, along with the special version of safetensors. For macOS users, please use Docker to set up the environment.
Accelerate VLM inference via optimized inference servers: you can start the vLLM inference service using one of two methods. For more usage details and parameter explanations, see the documentation.

Currently, we support inference using the PaddleOCR-VL-0.9B model with the `transformers` library, which can recognize text, formula, table, and chart elements. In the future, we plan to support full document parsing inference with `transformers`. Below is a simple script we provide to support inference using the PaddleOCR-VL-0.9B model with `transformers`.

> [!NOTE]
> We currently recommend using the official method for inference, as it is faster and supports page-level document parsing. The example code below only supports element-level recognition.

👉 Click to expand: use flash-attn to boost performance and reduce memory usage.

PaddleOCR-VL achieves SOTA performance for overall, text, formula, table, and reading-order metrics on OmniDocBench v1.5, and for almost all of these metrics on OmniDocBench v1.0.

> Notes:
> - The metrics are from MinerU, OmniDocBench, and our own internal evaluations.

PaddleOCR-VL demonstrates robust and versatile capability in handling diverse document types, establishing it as the leading method in the OmniDocBench-OCR-block performance evaluation. In-house-OCR provides an evaluation of performance across multiple languages and text types; our model demonstrates outstanding accuracy, with the lowest edit distances in all evaluated scripts. Our self-built table evaluation set contains diverse types of table images, such as Chinese, English, and mixed Chinese-English tables, and tables with various characteristics like full, partial, or no borders, book/manual formats, lists, academic papers, and merged cells, as well as low-quality and watermarked images. PaddleOCR-VL achieves remarkable performance across all categories.
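The recommended page-level parsing path can be sketched as follows, assuming the `PaddleOCRVL` pipeline class from the PaddleOCR 3.x package; `document.png` is a placeholder document image path:

```python
# Page-level document parsing with the PaddleOCR-VL pipeline.
# Sketch based on the PaddleOCR 3.x API; "document.png" is a placeholder path.
from paddleocr import PaddleOCRVL

pipeline = PaddleOCRVL()
for res in pipeline.predict("document.png"):
    res.print()                               # parsed elements and reading order
    res.save_to_json(save_path="output")      # structured parsing results
    res.save_to_markdown(save_path="output")  # reconstructed Markdown
```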
The In-house-Formula evaluation set contains simple prints, complex prints, camera scans, and handwritten formulas; PaddleOCR-VL demonstrates the best performance in every category. The chart evaluation set is broadly categorized into 11 chart categories, including bar-line hybrid, pie, 100% stacked bar, area, bar, bubble, histogram, line, scatterplot, stacked area, and stacked bar. PaddleOCR-VL not only outperforms expert OCR VLMs but also surpasses some 72B-level multimodal language models. We would like to thank ERNIE, Keye, MinerU, and OmniDocBench for providing valuable code, model weights, and benchmarks. We also appreciate everyone's contributions to this open-source project! If you find PaddleOCR-VL helpful, feel free to give us a star and citation.
RT-DETR-L_wireless_table_cell_det
PP-FormulaNet_plus-L
PP-Chart2Table
PP-LCNet_x0_25_textline_ori
korean_PP-OCRv5_mobile_rec
PP-OCRv3_mobile_det
eslav_PP-OCRv5_mobile_rec
PP-OCRv4_mobile_det
PP-OCRv4_mobile_rec
en_PP-OCRv4_mobile_rec
PP-OCRv4_server_det
en_PP-OCRv3_mobile_rec
en_PP-OCRv3_mobile_rec is a text line recognition model within the PP-OCRv3_rec series, developed by the PaddleOCR team. It is an English-specific model trained based on PP-OCRv3_mobile_rec, and it supports English recognition. The key accuracy metrics are as follows:

| Model | Recognition Avg Accuracy (%) | Model Storage Size (M) | Introduction |
| --- | --- | --- | --- |
| en_PP-OCRv3_mobile_rec | 70.69 | 7.8 | An ultra-lightweight English recognition model trained based on the PP-OCRv3 recognition model, supporting English and numeric character recognition. |

Note: If any character (including punctuation) in a line was incorrect, the entire line was marked as wrong. This ensures higher accuracy in practical applications.

Please refer to the following commands to install PaddlePaddle using pip; for details about PaddlePaddle installation, please refer to the PaddlePaddle official website. Then install the latest version of the PaddleOCR inference package from PyPI. You can quickly experience the functionality with a single command, or integrate the model inference of the text recognition module into your project. Before running the following code, please download the sample image to your local machine. For details about usage commands and parameter descriptions, please refer to the Document.

The ability of a single model is limited, but a pipeline consisting of several models provides more capacity to resolve difficult problems in real-world scenarios. The general OCR pipeline is used to solve text recognition tasks by extracting text information from images and outputting it in string format. There are 5 modules in the pipeline:

- Document Image Orientation Classification Module (Optional)
- Text Image Unwarping Module (Optional)
- Text Line Orientation Classification Module (Optional)
- Text Detection Module
- Text Recognition Module

Run a single command to quickly experience the OCR pipeline. If `save_path` is specified, the visualization results will be saved under `save_path`.
The command-line method is for quick experience; for project integration, only a few lines of code are needed. The default recognition model used in the pipeline is `PP-OCRv5_server_rec`, so you need to specify `en_PP-OCRv3_mobile_rec` via the `text_recognition_model_name` argument; you can also use a local model file via the `text_recognition_model_dir` argument. For details about usage commands and parameter descriptions, please refer to the Document.
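The quick-experience path corresponds to a command along these lines (a sketch assuming the PaddleOCR 3.x CLI, whose flags mirror the Python keyword arguments; the image path is a placeholder):

```shell
# Run the general OCR pipeline with the English recognizer from the CLI.
paddleocr ocr -i ./sample.png --text_recognition_model_name en_PP-OCRv3_mobile_rec
```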
PP-OCRv4_server_seal_det
PP-OCRv4_server_rec_doc
PP-DocLayoutV2_safetensors
PP-DocLayout-L
arabic_PP-OCRv3_mobile_rec
arabic_PP-OCRv3_mobile_rec is a text line recognition model within the PP-OCRv3_rec series, developed by the PaddleOCR team. It is an Arabic-alphabet-specific model trained based on PP-OCRv3_mobile_rec, and it supports Arabic alphabet recognition. The key accuracy metrics are as follows:

| Model | Recognition Avg Accuracy (%) | Model Storage Size (M) | Introduction |
| --- | --- | --- | --- |
| arabic_PP-OCRv3_mobile_rec | 73.55 | 7.8 | An ultra-lightweight Arabic alphabet recognition model trained based on the PP-OCRv3 recognition model, supporting Arabic alphabet and numeric character recognition. |

Note: If any character (including punctuation) in a line was incorrect, the entire line was marked as wrong. This ensures higher accuracy in practical applications.

Please refer to the following commands to install PaddlePaddle using pip; for details about PaddlePaddle installation, please refer to the PaddlePaddle official website. Then install the latest version of the PaddleOCR inference package from PyPI. You can quickly experience the functionality with a single command, or integrate the model inference of the text recognition module into your project. Before running the following code, please download the sample image to your local machine. For details about usage commands and parameter descriptions, please refer to the Document.

The ability of a single model is limited, but a pipeline consisting of several models provides more capacity to resolve difficult problems in real-world scenarios. The general OCR pipeline is used to solve text recognition tasks by extracting text information from images and outputting it in string format.
There are 5 modules in the pipeline:

- Document Image Orientation Classification Module (Optional)
- Text Image Unwarping Module (Optional)
- Text Line Orientation Classification Module (Optional)
- Text Detection Module
- Text Recognition Module

Run a single command to quickly experience the OCR pipeline. The command-line method is for quick experience; for project integration, only a few lines of code are needed. The default recognition model used in the pipeline is `PP-OCRv5_server_rec`, so you need to specify `arabic_PP-OCRv3_mobile_rec` via the `text_recognition_model_name` argument; you can also use a local model file via the `text_recognition_model_dir` argument. For details about usage commands and parameter descriptions, please refer to the Document.
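The model-name override described above can be sketched in Python, assuming the PaddleOCR 3.x API; `sample.png` is a placeholder image path:

```python
# OCR pipeline with the recognizer switched to the Arabic-alphabet model.
# Sketch based on the PaddleOCR 3.x API; "sample.png" is a placeholder path.
from paddleocr import PaddleOCR

pipeline = PaddleOCR(text_recognition_model_name="arabic_PP-OCRv3_mobile_rec")
for res in pipeline.predict("sample.png"):
    res.print()  # recognized Arabic text lines with scores
```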
th_PP-OCRv5_mobile_rec
RT-DETR-H_layout_3cls
PP-Chart2Table_safetensors
arabic_PP-OCRv5_mobile_rec
SLANeXt_wireless
Table structure recognition is an important component of table recognition systems, capable of converting non-editable table images into editable table formats (such as HTML). The goal of table structure recognition is to identify the positions of rows, columns, and cells in tables. The performance of this module directly affects the accuracy and efficiency of the entire table recognition system. The table structure recognition module usually outputs HTML code for the table area, which is then passed as input to the table recognition pipeline for further processing.

| Model | Accuracy (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) |
| --- | --- | --- | --- | --- |

Note: the accuracy of SLANeXt_wireless comes from the results of joint testing with SLANeXt_wired.

Please refer to the following commands to install PaddlePaddle using pip; for details about PaddlePaddle installation, please refer to the PaddlePaddle official website. Then install the latest version of the PaddleOCR inference package from PyPI. You can quickly experience the functionality with a single command, or integrate the model inference of the table structure recognition module into your project. Before running the following code, please download the sample image to your local machine. For details about usage commands and parameter descriptions, please refer to the Document.

The ability of a single model is limited, but a pipeline consisting of several models provides more capacity to resolve difficult problems in real-world scenarios. The general table recognition V2 pipeline is used to solve table recognition tasks by extracting information from images and outputting it in HTML or Excel format.
There are 8 modules in the pipeline:

- Table Classification Module
- Table Structure Recognition Module
- Table Cell Detection Module
- Text Detection Module
- Text Recognition Module
- Layout Region Detection Module (Optional)
- Document Image Orientation Classification Module (Optional)
- Text Image Unwarping Module (Optional)

Run a single command to quickly experience the general table recognition V2 pipeline. If `save_path` is specified, the visualization results will be saved under `save_path`. The command-line method is for quick experience; for project integration, only a few lines of code are needed. For details about usage commands and parameter descriptions, please refer to the Document.

Layout analysis is a technique used to extract structured information from document images. PP-StructureV3 includes the following six modules:

- Layout Detection Module
- General OCR Pipeline
- Document Image Preprocessing Pipeline (Optional)
- Table Recognition Pipeline (Optional)
- Seal Recognition Pipeline (Optional)
- Formula Recognition Pipeline (Optional)

Run a single command to quickly experience the PP-StructureV3 pipeline. Results will be printed to the terminal; if `save_path` is specified, the results will be saved under `save_path`. Just a few lines of code are enough to experience pipeline inference, taking the PP-StructureV3 pipeline as an example. For details about usage commands and parameter descriptions, please refer to the Document.
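Module-level inference for this card's model can be sketched as follows, assuming the `TableStructureRecognition` class from the PaddleOCR 3.x API; `table.jpg` is a placeholder for a locally downloaded sample image:

```python
# Table structure recognition: predicts HTML structure tokens for a table image.
# Sketch based on the PaddleOCR 3.x API; "table.jpg" is a placeholder path.
from paddleocr import TableStructureRecognition

model = TableStructureRecognition(model_name="SLANeXt_wireless")
for res in model.predict("table.jpg", batch_size=1):
    res.print()  # predicted structure tokens and cell boxes
    res.save_to_json(save_path="./output/res.json")
```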
PP-OCRv3_mobile_rec
PP-OCRv4_server_rec
PP-OCRv5_server_rec_safetensors
PP-OCRv5_mobile_rec_safetensors
SLANet
Table structure recognition is an important component of table recognition systems, capable of converting non-editable table images into editable table formats (such as HTML). The goal of table structure recognition is to identify the positions of rows, columns, and cells in tables. The performance of this module directly affects the accuracy and efficiency of the entire table recognition system. The table structure recognition module usually outputs HTML code for the table area, which is then passed as input to the table recognition pipeline for further processing.

| Model | Accuracy (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) |
| --- | --- | --- | --- | --- |

Please refer to the following commands to install PaddlePaddle using pip; for details about PaddlePaddle installation, please refer to the PaddlePaddle official website. Then install the latest version of the PaddleOCR inference package from PyPI. You can quickly experience the functionality with a single command, or integrate the model inference of the table structure recognition module into your project. Before running the following code, please download the sample image to your local machine. For details about usage commands and parameter descriptions, please refer to the Document.

The ability of a single model is limited, but a pipeline consisting of several models provides more capacity to resolve difficult problems in real-world scenarios. The general table recognition V2 pipeline is used to solve table recognition tasks by extracting information from images and outputting it in HTML or Excel format.
There are 8 modules in the pipeline:

- Table Classification Module
- Table Structure Recognition Module
- Table Cell Detection Module
- Text Detection Module
- Text Recognition Module
- Layout Region Detection Module (Optional)
- Document Image Orientation Classification Module (Optional)
- Text Image Unwarping Module (Optional)

Run a single command to quickly experience the general table recognition V2 pipeline. If `save_path` is specified, the visualization results will be saved under `save_path`. The command-line method is for quick experience; for project integration, only a few lines of code are needed. Then, if you want to use the SLANet model for table recognition, just change the model name and use the end-to-end prediction mode. For details about usage commands and parameter descriptions, please refer to the Document.

Layout analysis is a technique used to extract structured information from document images. PP-StructureV3 includes the following six modules:

- Layout Detection Module
- General OCR Pipeline
- Document Image Preprocessing Pipeline (Optional)
- Table Recognition Pipeline (Optional)
- Seal Recognition Pipeline (Optional)
- Formula Recognition Pipeline (Optional)

Run a single command to quickly experience the PP-StructureV3 pipeline. Results will be printed to the terminal; if `save_path` is specified, the results will be saved under `save_path`. Just a few lines of code are enough to experience pipeline inference. Taking the PP-StructureV3 pipeline as an example: the default table structure models used in the pipeline are `SLANeXt_wired` and `SLANeXt_wireless`, so you need to specify `SLANet` via the corresponding argument. For details about usage commands and parameter descriptions, please refer to the Document.
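The PP-StructureV3 integration can be sketched as follows, assuming the PaddleOCR 3.x API; `document.png` is a placeholder path, and the exact names of the table-model override arguments should be checked against the Document:

```python
# PP-StructureV3 pipeline for layout analysis and table recognition.
# Sketch based on the PaddleOCR 3.x API; "document.png" is a placeholder path.
from paddleocr import PPStructureV3

pipeline = PPStructureV3()
for res in pipeline.predict("document.png"):
    res.print()                                  # layout, OCR, and table results
    res.save_to_json(save_path="./output")
    res.save_to_markdown(save_path="./output")   # reconstructed Markdown
```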
PP-OCRv5_server_det_safetensors
PP-OCRv5_mobile_det_safetensors
SLANeXt_wired_safetensors
latin_PP-OCRv3_mobile_rec
devanagari_PP-OCRv3_mobile_rec
PP-LCNet_x1_0_doc_ori_safetensors
cyrillic_PP-OCRv3_mobile_rec
PP-FormulaNet_plus-M
japan_PP-OCRv3_mobile_rec
UVDoc_safetensors
PP-OCRv4_mobile_seal_det
devanagari_PP-OCRv5_mobile_rec
PP-DocBee2-3B
PP-DocLayout-M
RT-DETR-H_layout_17cls
PP-DocLayout-S
cyrillic_PP-OCRv5_mobile_rec
el_PP-OCRv5_mobile_rec
PP-OCRv3_server_det
PicoDet_layout_1x_table
PP-DocLayoutV3
ta_PP-OCRv5_mobile_rec
korean_PP-OCRv3_mobile_rec
PicoDet_layout_1x
te_PP-OCRv5_mobile_rec
ch_SVTRv2_rec
te_PP-OCRv3_mobile_rec
RT-DETR-L_wired_table_cell_det_safetensors
PicoDet-S_layout_3cls
ka_PP-OCRv3_mobile_rec
ka_PP-OCRv3_mobile_rec is a text line recognition model within the PP-OCRv3_rec series, developed by the PaddleOCR team. It is a Kannada-specific model trained based on PP-OCRv3_mobile_rec, and it supports Kannada recognition. The key accuracy metrics are as follows:

| Model | Recognition Avg Accuracy (%) | Model Storage Size (M) | Introduction |
| --- | --- | --- | --- |
| ka_PP-OCRv3_mobile_rec | 96.96 | 8.0 | An ultra-lightweight Kannada recognition model trained based on the PP-OCRv3 recognition model, supporting Kannada and numeric character recognition. |

Note: If any character (including punctuation) in a line was incorrect, the entire line was marked as wrong. This ensures higher accuracy in practical applications.

Please refer to the following commands to install PaddlePaddle using pip; for details about PaddlePaddle installation, please refer to the PaddlePaddle official website. Then install the latest version of the PaddleOCR inference package from PyPI. You can quickly experience the functionality with a single command, or integrate the model inference of the text recognition module into your project. Before running the following code, please download the sample image to your local machine. For details about usage commands and parameter descriptions, please refer to the Document.

The ability of a single model is limited, but a pipeline consisting of several models provides more capacity to resolve difficult problems in real-world scenarios. The general OCR pipeline is used to solve text recognition tasks by extracting text information from images and outputting it in string format. There are 5 modules in the pipeline:

- Document Image Orientation Classification Module (Optional)
- Text Image Unwarping Module (Optional)
- Text Line Orientation Classification Module (Optional)
- Text Detection Module
- Text Recognition Module

Run a single command to quickly experience the OCR pipeline. The command-line method is for quick experience.
For project integration, only a few lines of code are needed. The default recognition model used in the pipeline is `PP-OCRv5_server_rec`, so you need to specify `ka_PP-OCRv3_mobile_rec` via the `text_recognition_model_name` argument; you can also use a local model file via the `text_recognition_model_dir` argument. For details about usage commands and parameter descriptions, please refer to the Document.
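The same override is available from the command line, roughly as follows (a sketch assuming the PaddleOCR 3.x CLI, whose flags mirror the Python keyword arguments; the image path is a placeholder):

```shell
# Run the general OCR pipeline with the Kannada recognizer from the CLI.
paddleocr ocr -i ./sample.png --text_recognition_model_name ka_PP-OCRv3_mobile_rec
```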