protectai

26 models • 1 total models in database
Sort by:

deberta-v3-base-prompt-injection-v2

--- license: apache-2.0 base_model: microsoft/deberta-v3-base language: - en datasets: - natolambert/xstest-v2-copy - VMware/open-instruct - alespalla/chatbot_instruction_prompts - HuggingFaceH4/grok-conversation-harmless - Harelix/Prompt-Injection-Mixed-Techniques-2024 - OpenSafetyLab/Salad-Data - jackhhao/jailbreak-classification tags: - prompt-injection - injection - security - llm-security - generated_from_trainer metrics: - accuracy - recall - precision - f1 pipeline_tag: text-classificatio

license:apache-2.0
263,733
80

deberta-v3-base-prompt-injection

license:apache-2.0
20,759
87

unbiased-toxic-roberta-onnx

This model is a conversion of unitary/unbiased-toxic-roberta to ONNX format using the šŸ¤— Optimum library. Trained models & code to predict toxic comments on 3 Jigsaw challenges: Toxic comment classification, Unintended Bias in Toxic comments, Multilingual toxic comment classification. āš ļø Disclaimer: The huggingface models currently give different results to the detoxify library (see issue here). Labels All challenges have a toxicity label. The toxicity labels represent the aggregate ratings of up to 10 annotators according the following schema: - Very Toxic (a very hateful, aggressive, or disrespectful comment that is very likely to make you leave a discussion or give up on sharing your perspective) - Toxic (a rude, disrespectful, or unreasonable comment that is somewhat likely to make you leave a discussion or give up on sharing your perspective) - Hard to Say - Not Toxic More information about the labelling schema can be found here. Toxic Comment Classification Challenge This challenge includes the following labels: - `toxic` - `severetoxic` - `obscene` - `threat` - `insult` - `identityhate` Jigsaw Unintended Bias in Toxicity Classification This challenge has 2 types of labels: the main toxicity labels and some additional identity labels that represent the identities mentioned in the comments. Only identities with more than 500 examples in the test set (combined public and private) are included during training as additional labels and in the evaluation calculation. - `toxicity` - `severetoxicity` - `obscene` - `threat` - `insult` - `identityattack` - `sexualexplicit` Identity labels used: - `male` - `female` - `homosexualgayorlesbian` - `christian` - `jewish` - `muslim` - `black` - `white` - `psychiatricormentalillness` A complete list of all the identity labels available can be found here. Loading the model requires the šŸ¤— Optimum library installed. Join our Slack to give us feedback, connect with the maintainers and fellow users, ask questions, or engage in discussions about LLM security!

license:apache-2.0
14,782
4

MoritzLaurer-roberta-base-zeroshot-v2.0-c-onnx

license:apache-2.0
8,856
0

distilroberta-bias-onnx

—
8,255
0

xlm-roberta-base-language-detection-onnx

license:mit
2,303
6

vishnun-codenlbert-sm-onnx

license:apache-2.0
1,833
1

guishe-nuner-v1_orgs-onnx

license:cc-by-sa-4.0
1,735
0

distilroberta-base-rejection-v1

license:apache-2.0
1,109
8

codebert-base-Malicious_URLs-onnx

—
790
5

lakshyakh93-deberta_finetuned_pii-onnx

license:apache-2.0
349
2

gyr66-bert-base-chinese-finetuned-ner-onnx

license:apache-2.0
197
1

bert-base-NER-onnx

NaNK
license:mit
183
4

deberta-v3-small-prompt-injection-v2

license:apache-2.0
156
2

deberta-v3-base-zeroshot-v1-onnx

NaNK
license:mit
104
4

deberta-v3-base-injection-onnx

license:mit
37
2

test-public-repo

—
34
1

deberta-v3-large-zeroshot-v1-onnx

NaNK
license:mit
12
1

bert-large-cased-finetuned-conll03-english-onnx

NaNK
license:mit
9
0

fmops-distilbert-prompt-injection-onnx

license:apache-2.0
9
0

MoritzLaurer-bge-m3-zeroshot-v2.0-c-onnx

license:apache-2.0
8
0

vishnun-codenlbert-tiny-onnx

license:apache-2.0
7
0

CodeBERTa-language-id-onnx

—
6
1

GPTFuzz-onnx

license:mit
3
2

bert-large-NER-onnx

NaNK
license:mit
2
0

llm-guard-models-onnx-gpu-optimized

—
0
4