opencv

26 models

facial_expression_recognition

face_detection_yunet

YuNet is a light-weight, fast and accurate face detection model, which achieves 0.834 (AP_easy), 0.824 (AP_medium), 0.708 (AP_hard) on the WIDER Face validation set.

Notes:
- Model source: here.
- This model can detect faces between roughly 10x10 and 300x300 pixels due to the training scheme.
- For details on training this model, please visit https://github.com/ShiqiYu/libfacedetection.train.
- This ONNX model has a fixed input shape, but OpenCV DNN infers on the exact shape of the input image. See https://github.com/opencv/opencv_zoo/issues/44 for more information.
- `face_detection_yunet_2023mar_int8bq.onnx` is the block-quantized version in int8 precision, generated using block_quantize.py with `block_size=64`.
- Paper source: YuNet: A tiny millisecond-level face detector.

| Models      | Easy AP | Medium AP | Hard AP |
| ----------- | ------- | --------- | ------- |
| YuNet       | 0.8844  | 0.8656    | 0.7503  |
| YuNet block | 0.8845  | 0.8652    | 0.7504  |
| YuNet quant | 0.8810  | 0.8629    | 0.7503  |

\*: 'quant' stands for 'quantized'. \*\*: 'block' stands for 'block-wise quantized'.

Install the latest OpenCV and CMake >= 3.24.0 to get started.

All files in this directory are licensed under the MIT License.

- https://github.com/ShiqiYu/libfacedetection
- https://github.com/ShiqiYu/libfacedetection.train

If you use `YuNet` in your work, please use the following BibTeX entries:
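Detectors like YuNet produce overlapping candidate boxes that are filtered with score thresholding and non-maximum suppression (NMS). A minimal pure-Python sketch of that post-processing step is shown below; the box format `(x, y, w, h)` and the threshold defaults are illustrative, not the model's exact output contract.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x, y, w, h)."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, score_thr=0.6, nms_thr=0.3):
    """Return indices of boxes kept after greedy NMS:
    highest-scoring boxes first, suppressing heavy overlaps."""
    order = sorted((i for i, s in enumerate(scores) if s >= score_thr),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < nms_thr for j in keep):
            keep.append(i)
    return keep
```

For example, two nearly identical boxes collapse to one while a distant box survives: `nms([(10,10,50,50), (12,12,50,50), (100,100,40,40)], [0.9, 0.8, 0.7])` returns `[0, 2]`.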

handpose_estimation_mediapipe

This model estimates 21 hand keypoints per hand detected by the palm detector. (The image below is referenced from MediaPipe Hands Keypoints.)

This model is converted from TFLite to ONNX using the following tools:
- TFLite model to ONNX: https://github.com/onnx/tensorflow-onnx
- simplified by onnx-simplifier

Notes:
- The int8-quantized model may produce invalid results due to a significant drop in accuracy.
- Visit https://github.com/google/mediapipe/blob/master/docs/solutions/models.md#hands for models of larger scale.
- `handpose_estimation_mediapipe_2023feb_int8bq.onnx` is the block-quantized version in int8 precision, generated using block_quantize.py with `block_size=64`.

All files in this directory are licensed under the Apache 2.0 License.

- MediaPipe Handpose: https://developers.google.com/mediapipe/solutions/vision/hand_landmarker
- MediaPipe hands model and model card: https://github.com/google/mediapipe/blob/master/docs/solutions/models.md#hands
- Handpose TFJS: https://github.com/tensorflow/tfjs-models/tree/master/handpose
- Int8 model quantized with the RGB evaluation set of FreiHAND: https://lmb.informatik.uni-freiburg.de/resources/datasets/FreihandDataset.en.html
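Keypoints predicted on the cropped, resized palm region must be mapped back to the original image. A minimal sketch of that coordinate transform, assuming a square crop and a hypothetical model input side of 224 pixels (check the actual model's input shape before relying on this value):

```python
def landmarks_to_image(landmarks, crop_x, crop_y, crop_size, input_size=224):
    """Map landmarks predicted in model-input pixel coordinates on a
    square crop back to original-image coordinates."""
    scale = crop_size / input_size
    return [(crop_x + x * scale, crop_y + y * scale) for x, y in landmarks]
```

For instance, with a 448-pixel crop starting at (10, 20), the corner landmark (224, 224) maps to (458, 468).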

inpainting_lama

LaMa is a lightweight yet powerful image inpainting model.

Requirements: install the latest OpenCV >= 5.0.0 and CMake >= 3.22.1 to get started.

All files in this directory are licensed under the Apache License.

object_tracking_vittrack

optical_flow_estimation_raft

RAFT was originally created by Zachary Teed and Jia Deng of Princeton University. The source code for the model is in their repository on GitHub, and the original research paper is published on arXiv. The model was converted to ONNX by PINTO0309 in his model zoo. The ONNX model has several variations depending on the training dataset and input dimensions. The model used in this demo is trained on the Sintel dataset with an input size of 360 $\times$ 480.

Note:
- `optical_flow_estimation_raft_2023aug_int8bq.onnx` is the block-quantized version in int8 precision, generated using block_quantize.py with `block_size=64`.

The model demo runs on camera input, video input, or takes two images to compute optical flow across frames. While running on video, you can press q at any time to stop. The save and vis arguments of the shell command are only valid when using video or two images as input.

To run a different variation of the model, such as one trained on a different dataset or with a different input size, refer to RAFT ONNX in the PINTO Model Zoo to download your chosen model. If your chosen model has an input shape other than 360 $\times$ 480, change the input shape in raft.py line 15 accordingly. Then pass the model path to the --model argument of the shell command, as in the following example commands:

Example outputs: the visualization argument displays both image inputs as well as the result.

The original RAFT model is under the BSD-3-Clause license. The conversion of the RAFT model to ONNX by PINTO0309 is under the MIT License. Some of the code in demo.py and raft.py is adapted from ibaiGorordo's repository under the BSD-3-Clause license.

- https://arxiv.org/abs/2003.12039
- https://github.com/princeton-vl/RAFT
- https://github.com/ibaiGorordo/ONNX-RAFT-Optical-Flow-Estimation/tree/main
- https://github.com/PINTO0309/PINTO_model_zoo/tree/main/252_RAFT
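Optical-flow models like RAFT are commonly evaluated with average end-point error (EPE): the mean Euclidean distance between predicted and ground-truth displacement vectors. This metric is not part of the demo above; the sketch below shows it over flow fields flattened to lists of (u, v) vectors.

```python
import math

def average_epe(flow_pred, flow_gt):
    """Average end-point error between two flow fields given as
    equal-length lists of (u, v) displacement vectors, one per pixel."""
    errs = [math.hypot(u1 - u2, v1 - v2)
            for (u1, v1), (u2, v2) in zip(flow_pred, flow_gt)]
    return sum(errs) / len(errs)
```

For example, a prediction of (3, 4) against a ground truth of (0, 0) contributes an error of 5 pixels for that pixel.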

edge_detection_dexined

face_image_quality_assessment_ediffiqa

eDifFIQA(T) is a light-weight version of the models presented in the paper eDifFIQA: Towards Efficient Face Image Quality Assessment based on Denoising Diffusion Probabilistic Models. It achieves state-of-the-art results in the field of face image quality assessment.

- The original implementation can be found here.
- The included model combines a pretrained MobileFaceNet backbone with a quality regression head trained using the procedure presented in the original paper.
- The model predicts quality scores of aligned face samples, where a higher predicted score corresponds to a higher quality of the input sample.
- The figure below shows the quality distribution on two distinct datasets: LFW [[1]](#1) and XQLFW [[2]](#2). The LFW dataset contains images of relatively high quality, whereas the XQLFW dataset contains images of variable quality. There is a clear difference between the two distributions: high-quality images from LFW receive quality scores above 0.5, while the mixed images from XQLFW receive much lower scores on average.

[1] B. Huang, M. Ramesh, T. Berg, and E. Learned-Miller, "Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments," University of Massachusetts, Amherst, Tech. Rep. 07-49, October 2007.

[2] M. Knoche, S. Hormann, and G. Rigoll, "Cross-Quality LFW: A Database for Analyzing Cross-Resolution Image Face Recognition in Unconstrained Environments," in Proceedings of the IEEE International Conference on Automatic Face and Gesture Recognition (FG), 2021, pp. 1–5.

NOTE: The provided demo uses ../face_detection_yunet for face detection in order to properly align the face samples, while the original implementation uses a RetinaFace (ResNet50) model, which might cause some differences between the results of the two implementations.

The demo outputs the quality of the sample via the terminal (print) and as an image in results.jpg.
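Since a higher score means higher quality, a downstream pipeline might gate enrollment samples on a score cutoff. A trivial sketch follows; the 0.5 cutoff echoes the LFW/XQLFW separation noted above and is illustrative, not an official operating threshold.

```python
def filter_by_quality(samples, scores, cutoff=0.5):
    """Keep only samples whose predicted quality score reaches the cutoff."""
    return [s for s, q in zip(samples, scores) if q >= cutoff]
```

For example, with scores [0.7, 0.4, 0.5], only the first and third samples pass a 0.5 cutoff.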
All files in this directory are licensed under CC-BY-4.0.

object_detection_yolox

person_detection_mediapipe

This model detects upper-body and full-body keypoints of a person. It is downloaded from https://github.com/PINTO0309/PINTO_model_zoo/blob/main/053_BlazePose/20_densify_pose_detection/download.sh or converted from TFLite to ONNX using the following tools:
- TFLite model to ONNX with the MediaPipe custom `densify` op: https://github.com/PINTO0309/tflite2tensorflow
- simplified by onnx-simplifier

SSD anchors are generated from GenMediaPipePalmDectionSSDAnchors.

Note:
- `person_detection_mediapipe_2023mar_int8bq.onnx` is the block-quantized version in int8 precision, generated using block_quantize.py with `block_size=64`.

Install the latest OpenCV and CMake >= 3.24.0 to get started.

All files in this directory are licensed under the Apache 2.0 License.

Reference:
- MediaPipe Pose: https://developers.google.com/mediapipe/solutions/vision/pose_landmarker
- MediaPipe pose model and model card: https://github.com/google/mediapipe/blob/master/docs/solutions/models.md#pose
- BlazePose TFJS: https://github.com/tensorflow/tfjs-models/tree/master/pose-detection/src/blazepose_tfjs

person_reid_youtureid

This model is provided by Tencent Youtu Lab [[Credits]](https://github.com/opencv/opencv/blob/394e640909d5d8edf9c1f578f8216d513373698c/samples/dnn/person_reid.py#L6-L11).

Note:
- Model source: https://github.com/ReID-Team/ReID_extra_testdata
- `person_reid_youtu_2021nov_int8bq.onnx` is the block-quantized version in int8 precision, generated using block_quantize.py with `block_size=64`.

All files in this directory are licensed under the Apache 2.0 License.

- OpenCV DNN sample: https://github.com/opencv/opencv/blob/4.x/samples/dnn/person_reid.py
- Model source: https://github.com/ReID-Team/ReID_extra_testdata
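A re-identification model maps each person crop to a feature vector; crops are then matched by comparing those vectors, typically with cosine similarity. A pure-Python sketch of the matching step, independent of the model itself:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def best_match(query, gallery):
    """Index of the gallery feature most similar to the query feature."""
    return max(range(len(gallery)),
               key=lambda i: cosine_similarity(query, gallery[i]))
```

For example, querying [1, 0] against a gallery of [0, 1], [1, 0.1], and [-1, 0] returns index 1, the nearly parallel vector.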

pose_estimation_mediapipe

This model estimates 33 pose keypoints and a person segmentation mask per person detected by the person detector. (The image below is referenced from MediaPipe Pose Keypoints.)

This model is converted from TFLite to ONNX using the following tools:
- TFLite model to ONNX: https://github.com/onnx/tensorflow-onnx
- simplified by onnx-simplifier

Note:
- Visit https://github.com/google/mediapipe/blob/master/docs/solutions/models.md#pose for models of larger scale.
- `pose_estimation_mediapipe_2023mar_int8bq.onnx` is the block-quantized version in int8 precision, generated using block_quantize.py with `block_size=64`.

Install the latest OpenCV and CMake >= 3.24.0 to get started.

All files in this directory are licensed under the Apache 2.0 License.

Reference:
- MediaPipe Pose: https://developers.google.com/mediapipe/solutions/vision/pose_landmarker
- MediaPipe pose model and model card: https://github.com/google/mediapipe/blob/master/docs/solutions/models.md#pose
- BlazePose TFJS: https://github.com/tensorflow/tfjs-models/tree/master/pose-detection/src/blazepose_tfjs

text_detection_ppocr

PP-OCRv3: More Attempts for the Improvement of Ultra Lightweight OCR System.

Notes:
- The int8-quantized model may produce unstable results due to some loss of accuracy.
- Original Paddle models source (English): here.
- Original Paddle models source (Chinese): here.
- `IC15` in the filename means the model is trained on the IC15 dataset and can detect English text instances only.
- `TD500` in the filename means the model is trained on the TD500 dataset and can detect both English and Chinese instances.
- Visit https://docs.opencv.org/master/d4/d43/tutorial_dnn_text_spotting.html for more information.
- `text_detection_xx_ppocrv3_2023may_int8bq.onnx` is the block-quantized version in int8 precision, generated using block_quantize.py with `block_size=64`.

Install the latest OpenCV and CMake >= 3.24.0 to get started.

All files in this directory are licensed under the Apache 2.0 License.

- https://arxiv.org/abs/2206.03001
- https://github.com/PaddlePaddle/PaddleOCR
- https://docs.opencv.org/master/d4/d43/tutorial_dnn_text_spotting.html
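DB-style text detectors such as the PP-OCR detection models generally expect input dimensions that are multiples of 32, so images are resized with rounded-up sides before inference. The multiple-of-32 constraint is typical for this architecture rather than stated in this README; a sketch of the rounding:

```python
def round_to_multiple(w, h, base=32):
    """Round width and height up to the nearest multiple of `base`,
    keeping at least one block in each dimension."""
    rw = max(base, (w + base - 1) // base * base)
    rh = max(base, (h + base - 1) // base * base)
    return rw, rh
```

For example, a 641x480 image is padded/resized to 672x480 before being fed to the network.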

opencv_zoo

deblurring_nafnet

face_recognition_sface

human_segmentation_pphumanseg

image_classification_mobilenet

image_classification_ppresnet

image_segmentation_efficientsam

EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything.

Notes:
- The current implementation of the EfficientSAM demo uses the EfficientSAM-Ti model, which is tailored for scenarios requiring higher speed and a lightweight footprint.
- image_segmentation_efficientsam_ti_2024may.onnx (supports single-point inference only)
  - MD5: 117d6a6cac60039a20b399cc133c2a60
  - SHA-256: e3957d2cd1422855f350aa7b044f47f5b3eafada64b5904ed330b696229e2943
- image_segmentation_efficientsam_ti_2025april.onnx
  - MD5: f23cecbb344547c960c933ff454536a3
  - SHA-256: 4eb496e0a7259d435b49b66faf1754aa45a5c382a34558ddda9a8c6fe5915d77
- image_segmentation_efficientsam_ti_2025april_int8.onnx
  - MD5: a1164f44b0495b82e9807c7256e95a50
  - SHA-256: 5ecc8d59a2802c32246e68553e1cf8ce74cf74ba707b84f206eb9181ff774b4e

In the displayed image, click to select foreground points, drag a box to select a region, and long-press to select background points on the object you wish to segment. After pressing Enter, the segmentation result is shown in a new window. Press Backspace to clear all prompts.

Here are some sample results observed using the model:

All files in this directory are licensed under the Apache 2.0 License.

- https://arxiv.org/abs/2312.00863
- https://github.com/yformer/EfficientSAM
- https://github.com/facebookresearch/segment-anything
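SAM-style models consume the clicked prompts as parallel arrays of point coordinates and per-point labels. A sketch of assembling that input, assuming the common label convention of 1 for foreground and 0 for background (verify against the actual model's prompt encoding):

```python
def build_point_prompt(fg_points, bg_points):
    """Assemble prompt coordinates and labels for a SAM-style model.
    Returns parallel lists: [(x, y), ...] and [1 or 0, ...]."""
    coords = list(fg_points) + list(bg_points)
    labels = [1] * len(fg_points) + [0] * len(bg_points)
    return coords, labels
```

For example, one foreground click at (5, 5) plus one background click at (1, 1) yields coordinates [(5, 5), (1, 1)] with labels [1, 0].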

license_plate_detection_yunet

This model is contributed by Dong Xu (徐栋) from watrix.ai (银河水滴). Please note that the model is trained on Chinese license plates, so detection results for other license plates may be limited.

Note:
- `license_plate_detection_lpd_yunet_2023mar_int8bq.onnx` is the block-quantized version in int8 precision, generated using block_quantize.py with `block_size=64`.

All files in this directory are licensed under the Apache 2.0 License.

- https://github.com/ShiqiYu/libfacedetection.train

object_detection_nanodet

palm_detection_mediapipe

This model detects palm bounding boxes and palm landmarks, and is converted from TFLite to ONNX using the following tools:
- TFLite model to ONNX: https://github.com/onnx/tensorflow-onnx
- simplified by onnx-simplifier

SSD anchors are generated from GenMediaPipePalmDectionSSDAnchors.

Note:
- Visit https://github.com/google/mediapipe/blob/master/docs/solutions/models.md#hands for models of larger scale.
- `palm_detection_mediapipe_2023feb_int8bq.onnx` is the block-quantized version in int8 precision, generated using block_quantize.py with `block_size=64`.

Install the latest OpenCV (with opencv_contrib) and CMake >= 3.24.0 to get started.

All files in this directory are licensed under the Apache 2.0 License.

- MediaPipe Handpose: https://developers.google.com/mediapipe/solutions/vision/hand_landmarker
- MediaPipe hands model and model card: https://github.com/google/mediapipe/blob/master/docs/solutions/models.md#hands
- Handpose TFJS: https://github.com/tensorflow/tfjs-models/tree/master/handpose
- Int8 model quantized with the RGB evaluation set of FreiHAND: https://lmb.informatik.uni-freiburg.de/resources/datasets/FreihandDataset.en.html
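The SSD anchors mentioned above are, at their core, a regular grid of reference points tiled over the input image, against which the network regresses box offsets. A simplified sketch of generating such a grid; the stride and input-size values are illustrative, not the palm detector's exact anchor configuration:

```python
def make_anchor_centers(input_size, stride):
    """Anchor center coordinates (normalized to [0, 1]) for a square
    feature map with input_size / stride cells per side."""
    cells = input_size // stride
    return [((x + 0.5) / cells, (y + 0.5) / cells)
            for y in range(cells) for x in range(cells)]
```

For example, an 8-pixel input with stride 4 produces a 2x2 grid of centers at 0.25 and 0.75 in each axis.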

qrcode_wechatqrcode

WeChatQRCode detects and parses QR codes, contributed by the WeChat Computer Vision Team (WeChatCV). Visit opencv/opencv_contrib/modules/wechat_qrcode for more details.

- Model source: opencv/opencv_3rdparty:wechat_qrcode_20210119
- The APIs `cv::wechat_qrcode::WeChatQRCode` (C++) and `cv.wechat_qrcode_WeChatQRCode` (Python) are both designed to run on the default backend (OpenCV) and target (CPU) only. Therefore, benchmark results for this model are only available on CPU devices until the APIs support setting backends and targets.

Install the latest OpenCV (with opencv_contrib) and CMake >= 3.24.0 to get started.

All files in this directory are licensed under the Apache 2.0 License.

- https://github.com/opencv/opencv_contrib/tree/master/modules/wechat_qrcode
- https://github.com/opencv/opencv_3rdparty/tree/wechat_qrcode_20210119

text_recognition_crnn

opencv_contribution

Welcome to the OpenCV Model Zoo, a zoo of models tuned for OpenCV DNN! OpenCV Zoo is licensed under the Apache 2.0 license. Please refer to the licenses of the individual models.
