
Emotion Recognition

Detect emotions from facial expressions or voice - happiness, sadness, anger, surprise. Great for customer service, mental health apps, and gaming.

20 Models Found

Common Applications

Customer service quality monitoring
Mental health & wellness apps
Gaming & entertainment
Video conferencing insights
Education & training feedback

Top Models

20 models • Sorted by downloads
#1

Model Card: Fine-Tuned Vision Transformer (ViT) for NSFW Image Classification

The Fine-Tuned Vision Transformer (ViT) is a transformer encoder model, similar to BERT, adapted for image classification. It builds on "google/vit-base-patch16-224-in21k", which is pre-trained in a supervised manner on the ImageNet-21k dataset with images resized to 224x224 pixels, making it suitable for a wide range of image recognition tasks.

Fine-tuning used a batch size of 16, balancing computational efficiency with the model's ability to learn from a diverse set of images, and a learning rate of 5e-5, chosen to trade off rapid convergence against steady optimization. Training was performed on a proprietary dataset of 80,000 highly varied images curated into two classes, "normal" and "nsfw", so the model could learn the nuanced visual patterns needed to distinguish safe from explicit content. The goal of this training process was a robust, accurate model for NSFW image classification that can contribute to content safety and moderation.

Intended Uses & Limitations

Intended Uses - NSFW Image Classification: The primary intended use of this model is classifying NSFW (Not Safe for Work) images. It has been fine-tuned for this purpose, making it suitable for filtering explicit or inappropriate content in various applications.

How to use

Here is how to use this model to classify an image into one of two classes (normal, nsfw). Evaluation results:

- eval_loss: 0.07463177293539047
- eval_accuracy: 0.980375
- eval_runtime: 304.9846
- eval_samples_per_second: 52.462
- eval_steps_per_second: 3.279

Note: It's essential to use this model responsibly and ethically, adhering to content guidelines and applicable regulations, particularly in applications involving potentially sensitive content. For more details on model fine-tuning and usage, please refer to the model's documentation and the model hub.

- Hugging Face Model Hub
- Vision Transformer (ViT) Paper
- ImageNet-21k Dataset

Disclaimer: The model's performance may be influenced by the quality and representativeness of the data it was fine-tuned on. Users are encouraged to assess the model's suitability for their specific applications and datasets.
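The usage snippet referenced above was stripped during extraction. A minimal sketch with the `transformers` image-classification pipeline follows; the model id is a placeholder (the card does not give the repo id), and the `label_decision` thresholding helper is a hypothetical addition, not part of the original card.

```python
from typing import Dict, List


def label_decision(results: List[Dict], threshold: float = 0.5) -> str:
    """Return the top label from image-classification pipeline output,
    or "uncertain" when the top score is below the threshold."""
    top = max(results, key=lambda r: r["score"])
    return top["label"] if top["score"] >= threshold else "uncertain"


def classify_image(path: str) -> str:
    # Placeholder repo id -- substitute the actual fine-tuned checkpoint.
    from transformers import pipeline  # pip install transformers pillow torch

    clf = pipeline("image-classification", model="<nsfw-vit-checkpoint>")
    return label_decision(clf(path))
```

With the reported eval_accuracy of about 0.98, the default 0.5 threshold is usually decisive; raising it trades recall for precision in moderation pipelines.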

70.7M downloads
890 likes
PYTORCH

Detects a person's age group from an image, with about 59% accuracy. See https://www.kaggle.com/code/dima806/age-group-image-classification-vit for details.

26.2M downloads
52 likes
OTHER

A MobileNet-v3 image classification model. Trained on ImageNet-1k in `timm` using the recipe template described below.

Recipe details:
- A LAMB-optimizer-based recipe similar to ResNet Strikes Back `A2`, but 50% longer, with EMA weight averaging and no CutMix
- Step (exponential decay w/ staircase) LR schedule with warmup

Model Details
- Model Type: Image classification / feature backbone
- Model Stats:
  - Params (M): 2.5
  - GMACs: 0.1
  - Activations (M): 1.4
  - Image size: 224 x 224
- Papers:
  - Searching for MobileNetV3: https://arxiv.org/abs/1905.02244
- Dataset: ImageNet-1k
- Original: https://github.com/huggingface/pytorch-image-models

Model Comparison

Explore the dataset and runtime metrics of this model in timm model results.
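Loading a `timm` classification checkpoint follows the same pattern regardless of architecture. The sketch below assumes `pip install timm`; the model name must be supplied by the caller, since the card above does not show its repo id. The `topk` decoding helper is an illustrative addition.

```python
from typing import List, Sequence, Tuple


def topk(probs: Sequence[float], labels: Sequence[str], k: int = 5) -> List[Tuple[str, float]]:
    """Return the k highest-probability (label, probability) pairs."""
    return sorted(zip(labels, probs), key=lambda p: p[1], reverse=True)[:k]


def load_backbone(model_name: str):
    # Pass the checkpoint's timm model name; weights download on first use.
    import timm  # pip install timm

    return timm.create_model(model_name, pretrained=True)
```

`timm.create_model(..., pretrained=True)` is the standard entry point; the returned module can be used as a classifier or, with `features_only=True`, as a feature backbone.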

23.4M downloads
41 likes
PYTORCH
#4

XTTS-v2

by coqui

ⓍTTS is a voice generation model that lets you clone voices into different languages using just a quick 6-second audio clip. There is no need for training data that spans countless hours. This is the same or a similar model to the one that powers Coqui Studio and the Coqui API.

Features
- Supports 17 languages.
- Voice cloning with just a 6-second audio clip.
- Emotion and style transfer by cloning.
- Cross-language voice cloning.
- Multi-lingual speech generation.
- 24 kHz sampling rate.

Updates over XTTS-v1
- 2 new languages: Hungarian and Korean.
- Architectural improvements for speaker conditioning.
- Enables the use of multiple speaker references and interpolation between speakers.
- Stability improvements.
- Better prosody and audio quality across the board.

Languages

XTTS-v2 supports 17 languages: English (en), Spanish (es), French (fr), German (de), Italian (it), Portuguese (pt), Polish (pl), Turkish (tr), Russian (ru), Dutch (nl), Czech (cs), Arabic (ar), Chinese (zh-cn), Japanese (ja), Hungarian (hu), Korean (ko), and Hindi (hi). Stay tuned as we continue to add support for more languages. If you have any language requests, feel free to reach out!

Code

The code base supports inference and fine-tuning.

Demo Spaces
- XTTS Space: see how the model performs on supported languages, and try it with your own reference or microphone input.
- XTTS Voice Chat with Mistral or Zephyr: experience streaming voice chat with Mistral 7B Instruct or Zephyr 7B Beta.

Links
- 🐸💬 CoquiTTS: coqui/TTS on GitHub
- 💼 Documentation: ReadTheDocs
- 👩‍💻 Questions: GitHub Discussions
- 🗯 Community: Discord

License

This model is licensed under the Coqui Public Model License. There's a lot that goes into a license for generative models, and you can read more of the origin story of CPML here.

Contact

Come and join our 🐸Community. We're active on Discord and Twitter. You can also mail us at [email protected].
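The cloning workflow described above can be sketched with the `TTS` Python package (assuming `pip install TTS`; the reference-clip and output paths are placeholders). The language-check helper is an illustrative addition built from the 17 codes listed in the card.

```python
# The 17 supported languages listed above, by ISO code.
SUPPORTED_LANGUAGES = {
    "en", "es", "fr", "de", "it", "pt", "pl", "tr", "ru",
    "nl", "cs", "ar", "zh-cn", "ja", "hu", "ko", "hi",
}


def check_language(lang: str) -> str:
    """Fail early on a language code XTTS-v2 does not support."""
    if lang not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {lang!r}")
    return lang


def clone_voice(text: str, speaker_wav: str, lang: str = "en",
                out_path: str = "output.wav") -> None:
    # Downloads the xtts_v2 checkpoint on first use.
    from TTS.api import TTS

    tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")
    tts.tts_to_file(text=text, speaker_wav=speaker_wav,
                    language=check_language(lang), file_path=out_path)
```

Because cloning needs only a ~6-second reference, `speaker_wav` can be any short, clean recording of the target voice.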

5.1M downloads
3.2K likes
OTHER

This model features:
- ReLU activations
- single-layer 7x7 convolution with pooling
- 1x1 convolution shortcut downsample

Trained on ImageNet-1k in `timm` using the recipe template described below. Recipe details:
- ResNet Strikes Back `A1` recipe
- LAMB optimizer with BCE loss
- Cosine LR schedule with warmup

Model Details
- Model Type: Image classification / feature backbone
- Model Stats:
  - Params (M): 25.6
  - GMACs: 4.1
  - Activations (M): 11.1
  - Image size: train = 224 x 224, test = 288 x 288
- Papers:
  - ResNet strikes back: An improved training procedure in timm: https://arxiv.org/abs/2110.00476
  - Deep Residual Learning for Image Recognition: https://arxiv.org/abs/1512.03385
- Original: https://github.com/huggingface/pytorch-image-models

Model Comparison

Explore the dataset and runtime metrics of this model in timm model results. Selected rows of the comparison table (the full table, spanning ResNet variants from resnet10t through seresnextaa101d, is in timm model results):

|model|img_size|top1|top5|param_count|gmacs|macts|img/sec|
|---|---|---|---|---|---|---|---|
|seresnextaa101d_32x8d.sw_in12k_ft_in1k_288|320|86.72|98.17|93.6|35.2|69.7|451|
|seresnextaa101d_32x8d.sw_in12k_ft_in1k_288|288|86.51|98.08|93.6|28.5|56.4|560|
|seresnextaa101d_32x8d.sw_in12k_ft_in1k|288|86.49|98.03|93.6|28.5|56.4|557|
|seresnextaa101d_32x8d.sw_in12k_ft_in1k|224|85.96|97.82|93.6|17.2|34.2|923|
|resnext101_32x32d.fb_wsl_ig1b_ft_in1k|224|85.11|97.44|468.5|87.3|91.1|254|
|resnetrs420.tf_in1k|416|85.0|97.12|191.9|108.4|213.8|134|
|ecaresnet269d.ra2_in1k|352|84.96|97.22|102.1|50.2|101.2|291|
|resnetrs350.tf_in1k|384|84.71|96.99|164.0|77.6|154.7|183|
|resnet50.a1_in1k|288|81.22|95.11|25.6|6.8|18.4|2089|
|resnet50.a1_in1k|224|80.38|94.6|25.6|4.1|11.1|3461|
|resnet18.a3_in1k|160|65.66|86.26|11.7|0.9|1.3|18229|

3.6M downloads
39 likes
PYTORCH

Vision Transformer (ViT) model pre-trained on ImageNet-21k (14 million images, 21,843 classes) at resolution 224x224, and fine-tuned on ImageNet 2012 (1 million images, 1,000 classes) at the same resolution. It was introduced in the paper An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale by Dosovitskiy et al. and first released in this repository. The weights were converted from the timm repository by Ross Wightman, who had already converted them from JAX to PyTorch; credits go to him.

Disclaimer: The team releasing ViT did not write a model card for this model, so this model card has been written by the Hugging Face team.

The Vision Transformer (ViT) is a transformer encoder model (BERT-like) pretrained on a large collection of images in a supervised fashion, namely ImageNet-21k, at a resolution of 224x224 pixels. The model was then fine-tuned on ImageNet (also referred to as ILSVRC2012), a dataset comprising 1 million images and 1,000 classes, also at resolution 224x224.

Images are presented to the model as a sequence of fixed-size patches (resolution 16x16), which are linearly embedded. A [CLS] token is prepended to the sequence for use in classification tasks, and absolute position embeddings are added before the sequence is fed to the layers of the Transformer encoder.

Through pre-training, the model learns an inner representation of images that can then be used to extract features useful for downstream tasks: if you have a dataset of labeled images, for instance, you can train a standard classifier by placing a linear layer on top of the pre-trained encoder. One typically places the linear layer on top of the [CLS] token, as the last hidden state of this token can be seen as a representation of the entire image.

You can use the raw model for image classification. See the model hub to look for fine-tuned versions on a task that interests you.
Here is how to use this model to classify an image from the COCO 2017 dataset into one of the 1,000 ImageNet classes. For more code examples, we refer to the documentation.

The ViT model was pretrained on ImageNet-21k, a dataset consisting of 14 million images and 21k classes, and fine-tuned on ImageNet, a dataset consisting of 1 million images and 1k classes. The exact details of image preprocessing during training/validation can be found here: images are resized/rescaled to the same resolution (224x224) and normalized across the RGB channels with mean (0.5, 0.5, 0.5) and standard deviation (0.5, 0.5, 0.5).

The model was trained on TPUv3 hardware (8 cores). All model variants are trained with a batch size of 4096 and a learning-rate warmup of 10k steps. For ImageNet, the authors found it beneficial to additionally apply gradient clipping at global norm 1. Training resolution is 224.

For evaluation results on several image classification benchmarks, we refer to tables 2 and 5 of the original paper. Note that for fine-tuning, the best results are obtained with a higher resolution (384x384), and increasing the model size yields better performance.
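The patch arithmetic above is easy to check: a 224x224 input split into 16x16 patches gives 196 patch embeddings, plus the prepended [CLS] token, for a sequence of 197. The usage sketch below assumes this card corresponds to `google/vit-base-patch16-224` on the Hub (the listing does not show the repo id).

```python
def vit_sequence_length(image_size: int = 224, patch_size: int = 16) -> int:
    """Number of fixed-size patches plus the prepended [CLS] token."""
    assert image_size % patch_size == 0
    return (image_size // patch_size) ** 2 + 1


def classify(image) -> str:
    # Assumed repo id; substitute the actual checkpoint if it differs.
    from transformers import ViTForImageClassification, ViTImageProcessor

    processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224")
    model = ViTForImageClassification.from_pretrained("google/vit-base-patch16-224")
    inputs = processor(images=image, return_tensors="pt")
    logits = model(**inputs).logits
    return model.config.id2label[logits.argmax(-1).item()]
```

At the 384x384 fine-tuning resolution mentioned above, the same formula gives a 577-token sequence, which is why position embeddings are interpolated when changing resolution.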

3.0M downloads
890 likes
PYTORCH

Tags: image-classification, timm, transformers. License: apache-2.0. Library: timm.

2.9M downloads
12 likes
license:apache-2.0
OTHER

License: cc-by-nc-4.0. Library: timm. Tags: image-classification, timm, transformers. Dataset: imagenet-1k.

2.7M downloads
3 likes
license:cc-by-nc-4.0
OTHER
#9

emotion_text_classifier

by michellejieli

Language: English. Tags: distilroberta, sentiment, emotion, twitter, reddit.
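A minimal usage sketch with the `transformers` text-classification pipeline. The repo id `michellejieli/emotion_text_classifier` is inferred from the title and author above, and the `top_emotion` helper is an illustrative addition.

```python
from typing import Dict, List


def top_emotion(results: List[Dict]) -> str:
    """Pick the highest-scoring emotion label from pipeline output."""
    return max(results, key=lambda r: r["score"])["label"]


def classify_emotion(text: str) -> str:
    # Inferred repo id -- verify against the actual model page.
    from transformers import pipeline

    clf = pipeline("text-classification",
                   model="michellejieli/emotion_text_classifier",
                   top_k=None)  # return scores for every emotion class
    return top_emotion(clf(text)[0])
```

Passing `top_k=None` keeps the full score distribution, which is useful when you want confidence values rather than just the argmax label.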

1.5M downloads
138 likes
OTHER
#10

License: other. Tags: vision, image-classification. Dataset: imagenet-1k. Example inputs: Tiger, Teapot, Palace (from the mishig/sample_images dataset).

1.4M downloads
82 likes
OTHER
#11

License: mit. Base model: timm/eva02_base_patch14_448.mim_in22k_ft_in22k_in1k. Pipeline: image-classification. Tags: pytorch, transformers.

1.2M downloads
47 likes
license:mit
OTHER
#12

gender-classification

by rizvandwiki

Tags: image-classification, pytorch, huggingpics. Metrics: accuracy.

1.1M downloads
53 likes
OTHER

Tags: image-classification, timm, transformers. Library: timm. License: apache-2.0. Dataset: imagenet-1k.

1.1M downloads
4 likes
license:apache-2.0
OTHER

License: apache-2.0. Tags: image-classification, vision. Datasets: imagenet, imagenet-21k.

946.5K downloads
78 likes
license:apache-2.0
OTHER
#15

chatterbox

by ResembleAI

License: mit. Pipeline: text-to-speech. Library: chatterbox. Tags: text-to-speech, speech, speech-generation, voice-cloning, multilingual-tts. Languages: ar, da, de, el, en, es, fi, fr, he, hi, it, ja, ko, ms, nl, no, pl, pt, ru, sv, sw, tr, zh.

798.4K downloads
1.3K likes
license:mit
OTHER

Tags: image-classification, timm, transformers. Library: timm. License: apache-2.0. Datasets: imagenet-1k, imagenet-21k.

783.0K downloads
4 likes
license:apache-2.0
OTHER

License: apache-2.0. Library: timm. Tags: image-classification, timm, transformers.

780.0K downloads
1 like
license:apache-2.0
OTHER

Language: English. Tags: audio-classification, speechbrain, Emotion, Recognition, wav2vec2, pytorch. License: apache-2.0. Dataset: iemocap. Metric: Accuracy. Hosted inference: disabled.
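SpeechBrain emotion models trained on IEMOCAP typically predict four classes. The sketch below uses SpeechBrain's `foreign_class` loading interface; the repo id, interface filename, and class name are assumptions based on common SpeechBrain conventions, and the label-expansion helper is an illustrative addition.

```python
# Four emotion classes commonly used with IEMOCAP.
IEMOCAP_LABELS = {"neu": "neutral", "ang": "angry", "hap": "happy", "sad": "sad"}


def expand_label(short: str) -> str:
    """Map an abbreviated IEMOCAP label to its full name."""
    return IEMOCAP_LABELS.get(short, short)


def classify_audio(wav_path: str) -> str:
    # Assumed repo id and interface file; check the model page before use.
    from speechbrain.inference.interfaces import foreign_class  # pip install speechbrain

    clf = foreign_class(
        source="speechbrain/emotion-recognition-wav2vec2-IEMOCAP",
        pymodule_file="custom_interface.py",
        classname="CustomEncoderWav2vec2Classifier",
    )
    _, _, _, text_lab = clf.classify_file(wav_path)
    return expand_label(text_lab[0])
```

`classify_file` returns the posterior, score, index, and text label; only the text label is needed for a simple emotion readout.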

682.4K downloads
163 likes
license:apache-2.0
OTHER

Language: English. Tags: distilroberta, sentiment, emotion, twitter, reddit.

614.8K downloads
465 likes
OTHER

Tags: image-classification, timm, transformers. Library: timm. License: apache-2.0. Datasets: imagenet-1k, imagenet-21k.

504.6K downloads
3 likes
license:apache-2.0
OTHER