deepghs
idolsankaku-eva02-large-tagger-v1
animefull-latest
idolsankaku-swinv2-tagger-v1
Trained using https://github.com/SmilingWolf/JAX-CV. TPUs used for training kindly provided by the TRC program.

**Dataset**

Trained on a human-annotated dataset of real-world photos.

**Validation results**

`v1.0: P=R: threshold = 0.3094, F1 = 0.6161`

**What's new**

Model v1.0 / Dataset v1: first version of the dataset; tags updated on 2024-08-31.

- `timm` compatible! Load it up and give it a spin using the canonical one-liner (see the sketch below).
- The ONNX model is compatible with code developed for the v3 series of WD tagger models.
- The batch dimension of the ONNX model is no longer fixed to 1, so you can go crazy with batch inference.
- Switched to Macro-F1 to measure model performance, since it gives me a better gauge of overall training progress.

**Runtime deps**

The ONNX model requires `onnxruntime >= 1.17.0`.

**Inference code examples**

- For timm: https://github.com/neggles/wdv3-timm
- For ONNX: https://huggingface.co/spaces/SmilingWolf/wd-tagger
- For JAX: https://github.com/SmilingWolf/wdv3-jax

**Final words**

Subject to change and updates. Downstream users are encouraged to use tagged releases rather than relying on the head of the repo.

**Thanks**

Thanks to the whole DeepGHS team for gathering the data and for encouraging me to push the models much further than they had any reason to reach, much less succeed.
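A minimal sketch of the `timm` route mentioned above. The repo id is inferred from this listing (`deepghs/idolsankaku-swinv2-tagger-v1`) and the tag-list file name follows the WD v3 convention; both are assumptions and should be checked against the actual repository contents.

```python
import timm
import torch
from PIL import Image

# The canonical timm one-liner: pull the model straight from the Hugging Face Hub.
# Repo id assumed from this listing; swap in the EVA02 variant if that is what you need.
model = timm.create_model('hf-hub:deepghs/idolsankaku-swinv2-tagger-v1', pretrained=True).eval()

# Build the preprocessing pipeline that matches the model's pretrained config.
cfg = timm.data.resolve_data_config({}, model=model)
transform = timm.data.create_transform(**cfg)

with torch.inference_mode():
    x = transform(Image.open('photo.jpg').convert('RGB')).unsqueeze(0)
    probs = model(x).sigmoid()[0]

# Apply the v1.0 P=R threshold from the validation results above.
hits = (probs >= 0.3094).nonzero().flatten().tolist()
print(hits)  # indices into the repo's tag list (e.g. a selected_tags.csv, as in WD-tagger repos)
```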
animefull-latest-ckpt
whisper-medium-fleurs-lang-id-onnx
whisper-base-ft-common-language-id-onnx
whisper-small-ft-common-language-id-onnx
pixai-tagger-v0.9-onnx
This is the ONNX-exported version of PixAI's tagger pixai-labs/pixai-tagger-v0.9.

- Model Type: Multilabel image classification / feature backbone
- Model Stats:
  - Params: 317.9M
  - FLOPs / MACs: 620.9G / 310.1G
  - Image size: 448 x 448
  - Tags Count: 13461
    - General (#0) Tags Count: 9741
    - Character (#4) Tags Count: 3720

| Category | Name      | Count | Threshold |
|---------:|:----------|------:|----------:|
|        0 | general   |  9741 |      0.30 |
|        4 | character |  3720 |      0.85 |

We provide a sample image for the code samples; you can find it here. You can use the function `get_pixai_tags` to tag your image (see the sketch below). For more details on this function, see the documentation of `imgutils.tagging.pixai`.
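A minimal usage sketch with `dghs-imgutils`, assuming the tagger is exposed as `get_pixai_tags` (the underscore-restored name above) and follows the same call pattern as the other `imgutils.tagging` helpers; the exact import path and return structure should be verified against the `imgutils.tagging.pixai` documentation.

```python
# pip install dghs-imgutils onnxruntime
# Sketch only: import path and return layout are assumptions based on the card text above.
from imgutils.tagging import get_pixai_tags

# Tag a local image; per-category thresholds default to the values in the table above.
result = get_pixai_tags('sample.jpg')
print(result)  # tag-to-confidence mapping(s) for general and character tags
```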
anime_face_detection
anime_classification
wd14_tagger_with_embeddings
paddleocr
Manga109 Yolo
| Model | Type | FLOPS | Params | F1 Score | Threshold | precision(B) | recall(B) | mAP50(B) | mAP50-95(B) | F1 Plot | Confusion | Labels |
|:------------------:|:----:|:-----:|:------:|:--------:|:---------:|:------------:|:---------:|:--------:|:-----------:|:-------:|:---------:|:-------------------------------:|
| v2023.12.07lyv11 | yolo | 87.3G | 25.3M | 0.92 | 0.373 | 0.93135 | 0.90668 | 0.95178 | 0.76254 | plot | confusion | `body`, `face`, `frame`, `text` |
| v2023.12.07myv11 | yolo | 68.2G | 20.1M | 0.92 | 0.385 | 0.93086 | 0.90271 | 0.94808 | 0.75655 | plot | confusion | `body`, `face`, `frame`, `text` |
| v2023.12.07syv11 | yolo | 21.6G | 9.43M | 0.9 | 0.383 | 0.92385 | 0.88507 | 0.93763 | 0.72903 | plot | confusion | `body`, `face`, `frame`, `text` |
| v2023.12.07nyv11 | yolo | 6.44G | 2.59M | 0.88 | 0.361 | 0.90721 | 0.85409 | 0.91619 | 0.69051 | plot | confusion | `body`, `face`, `frame`, `text` |
| v2023.12.07x | yolo | 258G | 68.2M | 0.92 | 0.355 | 0.93006 | 0.91027 | 0.95026 | 0.76595 | plot | confusion | `body`, `face`, `frame`, `text` |
| v2023.12.07l | yolo | 165G | 43.6M | 0.92 | 0.387 | 0.93106 | 0.9027 | 0.94846 | 0.7599 | plot | confusion | `body`, `face`, `frame`, `text` |
| v2023.12.07m | yolo | 79.1G | 25.9M | 0.91 | 0.376 | 0.93026 | 0.89896 | 0.94642 | 0.7526 | plot | confusion | `body`, `face`, `frame`, `text` |
| v2023.12.07s | yolo | 28.7G | 11.1M | 0.9 | 0.379 | 0.92128 | 0.88426 | 0.93551 | 0.72422 | plot | confusion | `body`, `face`, `frame`, `text` |
| v2023.12.07n | yolo | 8.2G | 3.01M | 0.88 | 0.366 | 0.90141 | 0.85407 | 0.91384 | 0.68565 | plot | confusion | `body`, `face`, `frame`, `text` |
| v2021.12.30s | yolo | 28.7G | 11.1M | 0.9 | 0.374 | 0.92144 | 0.88396 | 0.93627 | 0.72851 | plot | confusion | `body`, `face`, `frame`, `text` |
| v2021.12.30n | yolo | 8.2G | 3.01M | 0.88 | 0.355 | 0.90894 | 0.85875 | 0.91804 | 0.69184 | plot | confusion | `body`, `face`, `frame`, `text` |
| v2021.12.30xyv11 | yolo | 195G | 56.9M | 0.92 | 0.366 | 0.93145 | 0.91217 | 0.95416 | 0.76876 | plot | confusion | `body`, `face`, `frame`, `text` |
| v2021.12.30lyv11 | yolo | 87.3G | 25.3M | 0.92 | 0.383 | 0.93265 | 0.90913 | 0.9527 | 0.7648 | plot | confusion | `body`, `face`, `frame`, `text` |
| v2021.12.30myv11 | yolo | 68.2G | 20.1M | 0.92 | 0.372 | 0.93163 | 0.90535 | 0.95104 | 0.76011 | plot | confusion | `body`, `face`, `frame`, `text` |
| v2021.12.30nyv11 | yolo | 6.44G | 2.59M | 0.88 | 0.356 | 0.9086 | 0.85731 | 0.91858 | 0.69587 | plot | confusion | `body`, `face`, `frame`, `text` |
| v2021.12.30syv11 | yolo | 21.6G | 9.43M | 0.9 | 0.368 | 0.9232 | 0.88748 | 0.93949 | 0.73393 | plot | confusion | `body`, `face`, `frame`, `text` |
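A hedged sketch of running one of the checkpoints above with Ultralytics. The repo id and file path below are illustrative only (the repo layout is not described here) and should be checked against the repository's file listing; the confidence value is the F1-optimal threshold from the table for that model.

```python
from huggingface_hub import hf_hub_download
from ultralytics import YOLO

# Illustrative repo id and file path; confirm the actual names in the repository file browser.
weights = hf_hub_download(repo_id='deepghs/manga109_yolo',
                          filename='v2023.12.07lyv11/model.pt')

model = YOLO(weights)
# Detects the four classes listed above: body, face, frame, text.
results = model.predict('manga_page.jpg', conf=0.373)
for box in results[0].boxes:
    print(model.names[int(box.cls)], float(box.conf), box.xyxy.tolist())
```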
ccip
anime_censor_detection
anime_aesthetic
yolo-face
imgutils-models
anime_person_detection
text_detection
silero-vad-onnx
siglip_beta
WARNING: Do not consider anything in this repo production ready.

- `siglipswinv2base2025022218h56m54s`: text encoder trained on top of a frozen SmilingWolf/wd-swinv2-tagger-v3 image encoder, so pretty much SigLiT-style. Compatible with existing DeepGHS indexes/embeddings.
- `siglipswinv2base2025050222h02m36s`: based on `siglipswinv2base2025022218h56m54s`, with the image encoder unfrozen. So SigLIP with a warm start, I guess.
- `siglipeva02base2025050221h53m54s`: a test with a different architecture, trained from scratch using SigLIP.

See deepghs/searchimagebyimageortext for example usage. The checkpoints in this repo have been structured for compatibility with the dghs-imgutils package, so they can be run locally.
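For context on the SigLIP/SigLiT training mentioned above, here is a minimal sketch of SigLIP-style retrieval scoring on precomputed embeddings. It is not code from this repo: the learned `logit_scale` and `logit_bias` parameters (normally shipped with the checkpoint) are assumed inputs, and the values in the toy usage are made up.

```python
import numpy as np

def siglip_scores(text_emb: np.ndarray, image_embs: np.ndarray,
                  logit_scale: float, logit_bias: float) -> np.ndarray:
    """SigLIP scores each pair independently: a scaled, biased dot product through a sigmoid.

    text_emb:   (D,)   L2-normalized text embedding
    image_embs: (N, D) L2-normalized image embeddings (e.g. from an existing DeepGHS index)
    """
    logits = image_embs @ text_emb * logit_scale + logit_bias
    return 1.0 / (1.0 + np.exp(-logits))  # per-image match probabilities in [0, 1]

# Toy usage with random unit vectors, just to show the shapes involved.
rng = np.random.default_rng(0)
t = rng.normal(size=16); t /= np.linalg.norm(t)
imgs = rng.normal(size=(4, 16)); imgs /= np.linalg.norm(imgs, axis=1, keepdims=True)
print(siglip_scores(t, imgs, logit_scale=100.0, logit_bias=-10.0))
```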
ccip_onnx
anime_real_cls
ml-danbooru-onnx
anime_head_detection
AnimeText_yolo
Insightface
We evaluated all these models on several face-recognition benchmark datasets:

- CFPW (500 ids / 7K images / 7K pairs) [1]
- LFW (5749 ids / 13233 images / 6K pairs) [2]
- CALFW (5749 ids / 13233 images / 6K pairs) [3]
- CPLFW (5749 ids / 13233 images / 6K pairs) [4]

Below are the complete results and recommended thresholds.

- Det: success rate of face detection and landmark localization.
- Rec-F1: maximum F1 score achieved in face recognition.
- Rec-Thresh: optimal threshold determined by the maximum F1 score.

| Model | Eval ALL (Det / Rec-F1 / Rec-Thresh) | Eval CALFW (Det / Rec-F1 / Rec-Thresh) | Eval CFPW (Det / Rec-F1 / Rec-Thresh) | Eval CPLFW (Det / Rec-F1 / Rec-Thresh) | Eval LFW (Det / Rec-F1 / Rec-Thresh) |
|:----------|:---------------------------|:---------------------------|:---------------------------|:---------------------------|:---------------------------|
| buffalo_l | 99.88% / 98.34% / 0.2203 | 100.00% / 95.75% / 0.2273 | 99.99% / 99.66% / 0.1866 | 99.48% / 96.41% / 0.2207 | 100.00% / 99.85% / 0.2469 |
| buffalo_s | 99.49% / 96.87% / 0.1994 | 99.99% / 94.45% / 0.2124 | 99.65% / 98.64% / 0.1845 | 98.04% / 92.61% / 0.2019 | 100.00% / 99.68% / 0.2314 |

[1] Sengupta, Soumyadip; Chen, Jun-Cheng; Castillo, Carlos; Patel, Vishal M.; Chellappa, Rama; Jacobs, David W. Frontal to Profile Face Verification in the Wild. WACV, 2016.
[2] Huang, Gary B.; Ramesh, Manu; Berg, Tamara; Learned-Miller, Erik. Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments. 2007.
[3] Zheng, Tianyue; Deng, Weihong; Hu, Jiani. Cross-Age LFW: A Database for Studying Cross-Age Face Recognition in Unconstrained Environments. arXiv:1708.08197, 2017.
[4] Zheng, Tianyue; Deng, Weihong. Cross-Pose LFW: A Database for Studying Cross-Pose Face Recognition in Unconstrained Environments. 2018.
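A hedged sketch of applying the recommended thresholds with the `insightface` Python package, assuming the standard `FaceAnalysis` pipeline and that the `buffalo_l` row above corresponds to the stock `buffalo_l` model pack:

```python
import cv2
import numpy as np
from insightface.app import FaceAnalysis

# Detection + alignment + embedding with the buffalo_l pack.
app = FaceAnalysis(name='buffalo_l')
app.prepare(ctx_id=-1, det_size=(640, 640))  # ctx_id=-1 runs on CPU

def embed(path: str) -> np.ndarray:
    faces = app.get(cv2.imread(path))
    return faces[0].normed_embedding  # L2-normalized embedding of the first detected face

# Cosine similarity compared against the "Eval ALL" recommended threshold for buffalo_l.
sim = float(np.dot(embed('a.jpg'), embed('b.jpg')))
print('same person' if sim >= 0.2203 else 'different person', round(sim, 4))
```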