PKU-Alignment
alpaca-8b-reproduced-llama-3
beaver-dam-7b
Boasting 7 billion parameters, Beaver-Dam-7B is a powerful QA-Moderation model derived from the Llama-7B base model and trained on the PKU-Alignment/BeaverTails Classification Dataset. Beaver-Dam's key feature is its ability to analyze responses to prompts for toxicity across 14 different categories.

- Developed by: PKU-Alignment Team
- Model type: QA moderation
- License: Non-commercial license
- Finetuned from model: LLaMA
- Repository: https://github.com/PKU-Alignment/beavertails
- Web: https://sites.google.com/view/pku-beavertails
- Paper: Coming soon

Traditional approaches to content moderation in Question-Answering (QA) tasks often gauge the toxicity of a QA pair by examining each utterance individually. While effective to a degree, this method can inadvertently discard a significant number of user prompts: if the moderation system perceives a prompt as too harmful, it prevents the language model from generating any response, interrupting the user experience and potentially hindering the evolution of a beneficial AI with human-like understanding. BeaverDam is a shift in the approach to content moderation for QA tasks, a concept we term "QA moderation": in this paradigm, a QA pair is classified as harmful or benign based on its degree of risk neutrality, i.e., the extent to which the potential risks in a potentially harmful question can be counteracted by a non-threatening response.
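As a toy illustration of the difference between the two paradigms (not the Beaver-Dam model itself — the keyword "classifier" below is a hypothetical stand-in for a real toxicity model):

```python
# Toy contrast between per-utterance moderation and QA moderation.
# RISKY_TERMS and both classifiers are hypothetical illustrations only.

RISKY_TERMS = {"steal", "weapon", "hack"}

def utterance_is_harmful(text: str) -> bool:
    """Hypothetical per-utterance classifier: flags any risky keyword."""
    return any(term in text.lower() for term in RISKY_TERMS)

def qa_pair_is_harmful(question: str, answer: str) -> bool:
    """Hypothetical QA-moderation classifier: the pair counts as harmful
    only if the answer engages with the risk instead of neutralizing it."""
    return utterance_is_harmful(answer)

question = "How do I hack into my neighbor's wifi?"
safe_answer = "I can't help with that; unauthorized access is illegal."

# Per-utterance moderation rejects the prompt outright...
print(utterance_is_harmful(question))             # True -> prompt discarded
# ...while QA moderation keeps the pair: the response neutralizes the risk.
print(qa_pair_is_harmful(question, safe_answer))  # False -> pair is benign
```

Under per-utterance moderation the prompt never reaches the model; under QA moderation the model is free to answer, and only a response that itself carries risk is flagged.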
alpaca-7b-reproduced
beaver-7b-v1.0-reward
beaver-7b-v1.0-cost
alpaca-7b-reproduced-llama-2
AA-chameleon-7b-base
ProgressGym-HistLlama3-8B-C013-instruct-v0.2
beaver-7b-v3.0-cost
beaver-7b-unified-reward
beaver-7b-v3.0-reward
beaver-7b-v3.0
ProgressGym-HistLlama3-8B-C017-instruct-v0.2
beaver-7b-unified-cost
ProgressGym-HistLlama3-8B-C016-instruct-v0.2
llama3.1-8b-vision-audio
ProgressGym-HistLlama3-8B-C018-pretrain-v0.2
AnyRewardModel
ProgressGym-HistLlama3-8B-C021-instruct-v0.2
ProgressGym-HistLlama3-8B-C018-instruct-v0.2
ProgressGym-HistLlama3-8B-C014-instruct-v0.2
Beaver 7b V1.0
Beaver is a chat assistant trained on top of the Stanford Alpaca model (reproduced version) using the PKU-Alignment/safe-rlhf library. Beaver was born to study the safety of LLMs (Large Language Models). Compared with its predecessor Alpaca, Beaver relies on Safe-RLHF alignment technology, which avoids outputting harmful content while providing helpful information as much as possible.

- Developed by: the PKU-Alignment Team.
- Model Type: An auto-regressive language model based on the transformer architecture.
- License: Non-commercial license.
- Fine-tuned from model: LLaMA, Alpaca.
- Repository:
- Beaver:
- Dataset:
- Reward Model:
- Cost Model:
- Dataset Paper:
- Paper:
- Using the PKU-Alignment/safe-rlhf GitHub repository.
Qwen1.5-0.5B-IMDB-Q1-10000
Beaver-Vision-11B
ProgressGym-HistLlama3-8B-C013-pretrain-v0.2
ProgressGym-HistLlama3-8B-C015-pretrain-v0.2
Align-DS-V
AA-chameleon-7b-plus
ProgressGym-HistLlama3-70B-C021-pretrain-v0.1
ProgressGym-HistLlama3-70B-C015-instruct-v0.1
ProgressGym-HistLlama3-8B-C019-instruct-v0.2
ProgressGym-HistLlama3-8B-C017-pretrain-v0.2
Qwen1.5-4B-Safety-Q1-1k
ProgressGym-HistLlama3-70B-C013-instruct-v0.1
ProgressGym-HistLlama3-70B-C016-instruct-v0.1
ProgressGym-HistLlama3-70B-C015-pretrain-v0.1
ProgressGym-HistLlama3-70B-C020-pretrain-v0.1
ProgressGym-HistLlama3-8B-C015-instruct-v0.2
ProgressGym-HistLlama3-8B-C020-instruct-v0.2
Qwen1.5-4B-IMDB-Q1-1000-Q2-100
Qwen1.5-7B-Safety-Q1-10k
Qwen1.5-4B-IMDB-Q1-2000-Q2-500
tinyllama-3T-IMDB-Q1-2000-Q2-2000
beaver-7b-v2.0-reward
ProgressGym-HistLlama3-8B-C014-pretrain-v0.2
ProgressGym-HistLlama3-8B-C019-pretrain-v0.2
Qwen1.5-0.5B-IMDB-Q1-2000-Q2-2000
Qwen1.5-4B-IMDB-Q1-1000-Q2-1000
Qwen1.5-0.5B-Safety-Q1-50k
Qwen1.5-7B-Safety-Q1-40k-Q2-500
tinyllama-1.5T-Safety-Q1-5k-Q2-500
Qwen1.5-4B-IMDB-Q1-5000-Q2-200
tinyllama-1.5T-Safety-Q1-2k-Q2-500
tinyllama-1T-Safety-Q1-40k-Q2-1k
tinyllama-3T-Safety-Q1-40k-Q2-100
tinyllama-3T-Safety-Q1-5k-Q2-5k
Qwen1.5-7B-IMDB-Q1-10000-Q2-500
tinyllama-1.5T-IMDB-Q1-1000
tinyllama-2T-IMDB-Q1-5000-Q2-2000
tinyllama-3T-IMDB-Q1-10000-Q2-100
tinyllama-3T-IMDB-Q1-2000-Q2-200
beaver-7b-v2.0
ProgressGym-HistLlama3-70B-C017-instruct-v0.1
ProgressGym-HistLlama3-70B-C019-instruct-v0.1
ProgressGym-HistLlama3-70B-C021-instruct-v0.1
ProgressGym-HistLlama3-70B-C014-pretrain-v0.1
ProgressGym-HistLlama3-70B-C017-pretrain-v0.1
ProgressGym-HistLlama3-8B-C020-pretrain-v0.2
llama3.1-8b-instruct-vision
safe-o1-7b
Qwen1.5-0.5B-IMDB-Q1-10000-Q2-200
Qwen1.5-4B-IMDB-Q1-1000-Q2-200
Llama-2-7b-hf-Safety-Q1-20k
Llama-2-7b-hf-Safety-Q1-20k-Q2-1k
Qwen1.5-0.5B-Safety-Q1-10k-Q2-1k
Qwen1.5-0.5B-Safety-Q1-1k-Q2-100
Qwen1.5-0.5B-Safety-Q1-1k-Q2-500
Qwen1.5-0.5B-Safety-Q1-50k-Q2-1k
Qwen1.5-0.5B-Safety-Q1-50k-Q2-2k
Qwen1.5-4B-Safety-Q1-1k-Q2-2k
Qwen1.5-4B-Safety-Q1-5k
Qwen1.5-7B-Safety-Q1-10k-Q2-100
Qwen1.5-7B-Safety-Q1-2k-Q2-1k
Qwen1.5-7B-Safety-Q1-30k-Q2-100
tinyllama-1T-Safety-Q1-10k-Q2-2k
tinyllama-0.5T-Safety-Q1-5k-Q2-500
tinyllama-1T-Safety-Q1-40k-Q2-100
tinyllama-2.5T-Safety-Q1-1k-Q2-5k
tinyllama-2.5T-Safety-Q1-40k-Q2-500
tinyllama-2T-Safety-Q1-50k-Q2-500
tinyllama-3T-Safety-Q1-5k-Q2-100
tinyllama-3T-Safety-Q1-5k-Q2-1k
tinyllama-3T-Safety-Q1-5k-Q2-2k
tinyllama-3T-Safety-Q1-5k-Q2-500
Qwen1.5-7B-IMDB-Q1-2000-Q2-1000
Qwen1.5-7B-IMDB-Q1-2000-Q2-200
Qwen1.5-7B-IMDB-Q1-2000-Q2-2000
Qwen1.5-7B-IMDB-Q1-2000-Q2-500
tinyllama-3T-IMDB-Q1-1000-Q2-1000
tinyllama-3T-IMDB-Q1-1000-Q2-500
tinyllama-3T-IMDB-Q1-5000-Q2-2000
Beaver-0.5B-Instruct
ProgressGym-HistLlama3-70B-C014-instruct-v0.1
ProgressGym-HistLlama3-70B-C018-instruct-v0.1
ProgressGym-HistLlama3-70B-C020-instruct-v0.1
ProgressGym-HistLlama3-70B-C013-pretrain-v0.1
ProgressGym-HistLlama3-70B-C016-pretrain-v0.1
ProgressGym-HistLlama3-70B-C019-pretrain-v0.1
ProgressGym-HistLlama3-8B-C021-pretrain-v0.2
alpaca-70b-reproduced-llama-3
s1-m_7b_beta
TruthfulJudge
TruthfulJudge is a reliable evaluation pipeline designed to mitigate the pitfalls of AI-as-judge setups. Our methodology emphasizes in-depth human involvement to prevent feedback loops of hallucinated errors, ensuring faithful assessment of multimodal model truthfulness. Our specialized judge model, TruthfulJudge, is well-calibrated (ECE = 0.11), self-consistent, and shows high inter-annotator agreement (Cohen's κ = 0.79), achieving 88.4% judge accuracy. The model is a pairwise critique-label judge trained to express a preference between two responses to open-ended questions from the TruthfulVQA dataset.

The model outputs a structured response with three components:
- ` `: a detailed analysis of the responses
- ` `: either 'A' or 'B', indicating which response is better
- ` `: a score between 0 and 1 indicating the confidence in the judgment
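The original field names of the structured response are not reproduced here, so the sketch below uses hypothetical placeholders (`analysis`, `preference`, `confidence`) for the three components described above, and assumes a JSON-formatted judgment:

```python
import json

def parse_judgment(raw: str) -> dict:
    """Parse a JSON judgment and validate its three components.
    Field names are hypothetical placeholders, not TruthfulJudge's actual keys."""
    judgment = json.loads(raw)
    assert isinstance(judgment["analysis"], str)        # detailed analysis text
    assert judgment["preference"] in ("A", "B")         # which response is better
    assert 0.0 <= judgment["confidence"] <= 1.0         # confidence in [0, 1]
    return judgment

example = ('{"analysis": "Response A describes the image accurately.", '
           '"preference": "A", "confidence": 0.92}')
print(parse_judgment(example)["preference"])  # A
```

Validating the label and confidence range on the consumer side guards against malformed judge outputs before they enter downstream evaluation statistics.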
Qwen1.5-0.5B-IMDB-Q1-1000
Qwen1.5-0.5B-IMDB-Q1-1000-Q2-200
Qwen1.5-0.5B-IMDB-Q1-10000-Q2-2000
Qwen1.5-0.5B-IMDB-Q1-2000
Qwen1.5-0.5B-IMDB-Q1-5000-Q2-2000
Qwen1.5-4B-IMDB-Q1-10000-Q2-1000
Llama-2-7b-hf-Safety-Q1-1k-Q2-100
Llama-2-7b-hf-Safety-Q1-1k-Q2-500
Llama-2-7b-hf-Safety-Q1-20k-Q2-100
Qwen1.5-0.5B-Safety-Q1-10k-Q2-5k
Qwen1.5-0.5B-Safety-Q1-1k-Q2-1k
Qwen1.5-0.5B-Safety-Q1-20k-Q2-2k
Qwen1.5-0.5B-Safety-Q1-20k-Q2-500
Qwen1.5-0.5B-Safety-Q1-2k-Q2-100
Qwen1.5-0.5B-Safety-Q1-2k-Q2-200
Qwen1.5-0.5B-Safety-Q1-2k-Q2-2k
Qwen1.5-0.5B-Safety-Q1-2k-Q2-500
Qwen1.5-0.5B-Safety-Q1-2k-Q2-5k
Qwen1.5-0.5B-Safety-Q1-30k
Qwen1.5-0.5B-Safety-Q1-30k-Q2-200
Qwen1.5-0.5B-Safety-Q1-40k-Q2-100
Qwen1.5-0.5B-Safety-Q1-40k-Q2-200
Qwen1.5-4B-Safety-Q1-10k-Q2-1k
Qwen1.5-4B-Safety-Q1-10k-Q2-200
Qwen1.5-4B-Safety-Q1-10k-Q2-500
Qwen1.5-4B-Safety-Q1-1k-Q2-5k
Qwen1.5-4B-Safety-Q1-20k-Q2-200
Qwen1.5-4B-Safety-Q1-20k-Q2-5k
Qwen1.5-4B-Safety-Q1-30k
Qwen1.5-4B-Safety-Q1-30k-Q2-500
Qwen1.5-4B-Safety-Q1-40k
Qwen1.5-4B-Safety-Q1-40k-Q2-1k
Qwen1.5-4B-Safety-Q1-50k-Q2-100
Qwen1.5-4B-Safety-Q1-50k-Q2-1k
Qwen1.5-4B-Safety-Q1-50k-Q2-2k
Qwen1.5-4B-Safety-Q1-50k-Q2-500
Qwen1.5-4B-Safety-Q1-5k-Q2-500
Qwen1.5-7B-Safety-Q1-10k-Q2-500
Qwen1.5-7B-Safety-Q1-1k-Q2-200
Qwen1.5-7B-Safety-Q1-20k-Q2-100
Qwen1.5-7B-Safety-Q1-20k-Q2-1k
Qwen1.5-7B-Safety-Q1-2k-Q2-2k
Qwen1.5-7B-Safety-Q1-2k-Q2-500
Qwen1.5-7B-Safety-Q1-40k-Q2-2k
Qwen1.5-7B-Safety-Q1-50k-Q2-1k
Qwen1.5-7B-Safety-Q1-5k-Q2-100
Qwen1.5-7B-Safety-Q1-5k-Q2-500
tinyllama-1.5T-Safety-Q1-50k-Q2-100
tinyllama-1.5T-Safety-Q1-5k
Qwen1.5-4B-IMDB-Q1-10000-Q2-2000
Qwen1.5-4B-IMDB-Q1-2000-Q2-100
Qwen1.5-4B-IMDB-Q1-2000-Q2-2000
Qwen1.5-4B-IMDB-Q1-5000
Qwen1.5-4B-IMDB-Q1-5000-Q2-1000
Qwen1.5-4B-IMDB-Q1-5000-Q2-500
Qwen1.5-7B-IMDB-Q1-1000
tinyllama-1.5T-Safety-Q1-1k-Q2-200
tinyllama-1.5T-Safety-Q1-30k-Q2-100
tinyllama-1.5T-Safety-Q1-30k-Q2-5k
tinyllama-1.5T-Safety-Q1-50k-Q2-5k
tinyllama-1T-Safety-Q1-50k-Q2-500
tinyllama-2.5T-Safety-Q1-20k-Q2-500
tinyllama-2.5T-Safety-Q1-30k-Q2-500
tinyllama-3T-Safety-Q1-10k-Q2-2k
tinyllama-3T-Safety-Q1-50k-Q2-5k
Qwen1.5-7B-IMDB-Q1-2000-Q2-100
Qwen1.5-7B-IMDB-Q1-5000-Q2-100
Qwen1.5-7B-IMDB-Q1-5000-Q2-2000
Qwen1.5-7B-IMDB-Q1-5000-Q2-500
gemma-2b-IMDB-Q1-5000
tinyllama-2T-IMDB-Q1-10000-Q2-1000
tinyllama-2T-IMDB-Q1-5000-Q2-1000
tinyllama-3T-IMDB-Q1-1000-Q2-200
tinyllama-3T-IMDB-Q1-10000-Q2-2000
tinyllama-3T-IMDB-Q1-10000-Q2-500
tinyllama-3T-IMDB-Q1-2000
tinyllama-3T-IMDB-Q1-5000-Q2-100
tinyllama-3T-IMDB-Q1-5000-Q2-1000
tinyllama-3T-IMDB-Q1-5000-Q2-200
SAE V
(ICML 2025 Poster) SAE-V: Interpreting Multimodal Models for Enhanced Alignment

This repository contains the SAE-V models for our ICML 2025 poster paper "SAE-V: Interpreting Multimodal Models for Enhanced Alignment", including two sparse autoencoders (SAE) and three sparse autoencoders with vision (SAE-V). See the individual model folders and the source code for more information.

Hyper-parameters:
- SAE and SAE-V of LLaVA-NeXT/Mistral
- SAE and SAE-V of Chameleon/Anole

The differences in training parameters arise because the LLaVA-NeXT-7B model requires more GPU memory to handle vision input, so fewer batches can be cached. For the SAE and SAE-V parameters, we set different hook layers and context sizes based on the distinct architectures of the two models. We also experimented with different feature counts on both models, but found that only around 30,000 features are actually activated during training. All training runs were conducted until convergence, on 8xA800 GPUs. We verified that the variations in parameters did not affect the experimental results. The SAE and SAE-V models are developed based on SAELens-V; see the repository for a loading example.