kaist-ai

27 models

janus-7b

- Homepage: https://lklab.kaist.ac.kr/Janus/
- Repository: https://github.com/kaistAI/Janus
- Paper: https://arxiv.org/abs/2405.17977
- Point of Contact: [email protected]

Janus is a model trained using Mistral-7B-v0.2 as its base model. It was trained on the Multifaceted Collection, a preference dataset containing 196k unique system messages for aligning LLMs to diverse human preferences. Janus not only excels at generating personalized responses that cater to various human preferences but is also adept at producing responses that are generally preferred for being helpful and harmless.

Model Details

Janus-7B was created by supervised fine-tuning on all 196k entries of the training data from the Multifaceted-Collection.

- Model type: Language model
- Language(s) (NLP): English
- License: Apache 2.0
- Related Models: Janus-DPO-7B, Janus-ORPO-7B, Janus-RM-7B
- Training Datasets: Multifaceted-Collection-SFT
- Resources for more information: research paper, GitHub Repo

Usage

Janus is generalized across system messages, so users can control the model's response by supplying the desired system message. The input prompt format and example inference code are provided in the GitHub Repo, along with instructions for training Janus and evaluating the responses it generates. See also Multifaceted Bench, which evaluates how well LLMs generate personalized responses.
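The card's prompt format and inference snippet are not reproduced in this listing. As a rough sketch only, assuming the Mistral-style `[INST]` instruction template with the system message prepended (the helper name and exact spacing are illustrative, not the official snippet):

```python
def build_janus_prompt(system_message: str, instruction: str) -> str:
    """Format a system message and user instruction for a
    Mistral-based chat model (assumed [INST] template)."""
    return f"[INST] {system_message}\n{instruction} [/INST]"


prompt = build_janus_prompt(
    "You are a concise assistant who answers in one sentence.",
    "Explain what a preference dataset is.",
)
```

The resulting string would then be tokenized and passed to the model's generation method (e.g. via the transformers library); consult the GitHub Repo for the exact template Janus was trained with.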
Training Details

Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 5e-06
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- total_eval_batch_size: 8
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 4

Framework versions:

- Transformers 4.40.0.dev0
- Pytorch 2.2.2
- Datasets 2.18.0
- Tokenizers 0.15.0

If you find this model helpful, please consider citing our paper!
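The reported total train batch size follows from the per-device batch size, the number of devices, and gradient accumulation; a quick arithmetic check:

```python
# Effective (total) train batch size =
#   per-device batch size * number of devices * gradient accumulation steps
train_batch_size = 2
num_devices = 4
gradient_accumulation_steps = 4

total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
print(total_train_batch_size)  # 32, matching the reported total_train_batch_size
```

The same relationship holds for the other Janus variants listed on this page (e.g. 1 × 4 × 4 = 16 for the ORPO and DPO runs).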

license:apache-2.0
7,918
8

janus-orpo-7b

- Homepage: https://lklab.kaist.ac.kr/Janus/
- Repository: https://github.com/kaistAI/Janus
- Paper: https://arxiv.org/abs/2405.17977
- Point of Contact: [email protected]

Janus is a model trained using Mistral-7B-v0.2 as its base model. It was trained on the Multifaceted Collection, a preference dataset containing 196k unique system messages for aligning LLMs to diverse human preferences. Janus not only excels at generating personalized responses that cater to various human preferences but is also adept at producing responses that are generally preferred for being helpful and harmless.

Model Details

Janus-ORPO-7B was created by applying ORPO to Mistral-7B-v0.2 using the Multifaceted-Collection-ORPO.

- Model type: Language model
- Language(s) (NLP): English
- License: Apache 2.0
- Related Models: Janus-DPO-7B, Janus-7B, Janus-RM-7B
- Training Datasets: Multifaceted-Collection-ORPO
- Resources for more information: research paper, GitHub Repo

Usage

Janus is generalized across system messages, so users can control the model's response by supplying the desired system message. The input prompt format and example inference code are provided in the GitHub Repo, along with instructions for training Janus and evaluating the responses it generates. See also Multifaceted Bench, which evaluates how well LLMs generate personalized responses.
Training Details

Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 5e-06
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- total_eval_batch_size: 4
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 2

Framework versions:

- Transformers 4.40.0.dev0
- Pytorch 2.2.2
- Datasets 2.18.0
- Tokenizers 0.15.0

If you find this model helpful, please consider citing our paper!

license:apache-2.0
7,869
2

janus-dpo-7b

- Homepage: https://lklab.kaist.ac.kr/Janus/
- Repository: https://github.com/kaistAI/Janus
- Paper: https://arxiv.org/abs/2405.17977
- Point of Contact: [email protected]

Janus is a model trained using Mistral-7B-v0.2 as its base model. It was trained on the Multifaceted Collection, a preference dataset containing 196k unique system messages for aligning LLMs to diverse human preferences. Janus not only excels at generating personalized responses that cater to various human preferences but is also adept at producing responses that are generally preferred for being helpful and harmless.

Model Details

Janus-DPO-7B was created by applying DPO to Janus using the Multifaceted-Collection-DPO.

- Model type: Language model
- Language(s) (NLP): English
- License: Apache 2.0
- Related Models: Janus-7B, Janus-ORPO-7B, Janus-RM-7B
- Training Datasets: Multifaceted-Collection-DPO
- Resources for more information: research paper, GitHub Repo

Usage

Janus is generalized across system messages, so users can control the model's response by supplying the desired system message. The input prompt format and example inference code are provided in the GitHub Repo, along with instructions for training Janus and evaluating the responses it generates. See also Multifaceted Bench, which evaluates how well LLMs generate personalized responses.
Training Details

Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 5e-07
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- training_steps: 8143

Framework versions:

- Transformers 4.40.0.dev0
- Pytorch 2.1.1
- Datasets 2.15.0
- Tokenizers 0.15.0

If you find this model helpful, please consider citing our paper!

license:apache-2.0
7,862
3

mistral-orpo-capybara-7k

license:mit
7,858
26

mistral-orpo-beta

license:mit
17
37

orca2-langbridge-9b

license:apache-2.0
17
4

llama2-langbridge-9b

license:apache-2.0
10
7

janus-rm-7b

- Homepage: https://lklab.kaist.ac.kr/Janus/
- Repository: https://github.com/kaistAI/Janus
- Paper: https://arxiv.org/abs/2405.17977
- Point of Contact: [email protected]

Janus is a model trained using Mistral-7B-v0.2 as its base model. It was trained on the Multifaceted Collection, a preference dataset containing 196k unique system messages for aligning LLMs to diverse human preferences. Janus not only excels at generating personalized responses that cater to various human preferences but is also adept at producing responses that are generally preferred for being helpful and harmless.

Model Details

Janus-RM-7B is a reward model created by training Janus-7B (trained for only 1 epoch on the full 196k training instances) on Multifaceted-Collection-RM together with a similar-sized mix of representative general-helpfulness data: 72% of HH-RLHF, 14% of the OASST1 dataset preprocessed for reward modeling, and 14% of WebGPT Comparisons. Janus-RM-7B predicts a scalar reward when provided with a concatenation of system message, instruction, chosen response, and rejected response. It can serve as a scoring function for Best-of-N sampling or for preference tuning with proximal policy optimization (PPO).

- Model type: Language model
- Language(s) (NLP): English
- License: Apache 2.0
- Related Models: Janus-DPO-7B, Janus-ORPO-7B, Janus-7B
- Training Datasets: Multifaceted-Collection-RM, Anthropic/hh-rlhf, tasksource/oasst1_pairwise_rlhf_reward, openai/webgpt_comparisons
- Resources for more information: research paper, GitHub Repo

Usage

Example code for loading the reward model and computing a scalar reward on a model output is provided in the GitHub Repo, along with instructions for training Janus and evaluating the responses it generates. See also Multifaceted Bench, which evaluates how well LLMs generate personalized responses.
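The card notes that the reward model can act as a scoring function for Best-of-N sampling. A minimal sketch of the selection step, assuming candidate responses and their scalar rewards have already been computed (the function name and data are illustrative, not the official API):

```python
def best_of_n(candidates: list[str], rewards: list[float]) -> str:
    """Return the candidate response with the highest scalar reward."""
    best_index = max(range(len(candidates)), key=lambda i: rewards[i])
    return candidates[best_index]


# Example: three sampled responses scored by a reward model
responses = ["draft A", "draft B", "draft C"]
scores = [0.1, 1.7, 0.9]
best = best_of_n(responses, scores)  # "draft B"
```

In practice, the N candidates would be sampled from a Janus policy model and each scored by Janus-RM-7B before this argmax step.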
Training Details

The following hyperparameters were used for training:

- learning_rate: 9e-6
- train_batch_size: 8
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 4
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: AdamW with betas=(0.9, 0.95)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 3% of the maximum number of steps
- num_epochs: 1
- use_flash_attention_2: true
- maximum_sequence_length: 2048
- bf16: true
- gradient_checkpointing: true

Framework versions:

- Transformers 4.40.0.dev0
- Pytorch 2.2.2
- Datasets 2.18.0
- Tokenizers 0.15.0
- DeepSpeed Zero-3

If you find this model helpful, please consider citing our paper!

license:apache-2.0
6
4

codellama-langbridge-15b

license:apache-2.0
6
1

codellama-langbridge-20b

license:apache-2.0
5
1

cosupervision-emb_seq-Llama2_7b

llama
5
1

volcano-13b

4
4

metamath-langbridge-9b

license:apache-2.0
4
1

llemma-langbrige-9b

license:apache-2.0
3
2

metamath-langbridge-15b

license:apache-2.0
3
1

metamath-langbridge-20b

license:apache-2.0
3
1

orca2-langbridge-15b

license:apache-2.0
3
1

volcano-7b

2
3

codellama-langbridge-9b

license:apache-2.0
2
1

orca2-langbridge-20b

license:apache-2.0
1
1

CoT-T5-3B

license:apache-2.0
0
12

selfee-13b-delta

llama
0
10

CoT-T5-11B

license:apache-2.0
0
9

selfee-7b-delta

llama
0
8

mistral-orpo-alpha

license:mit
0
8

langbridge_encoder_tokenizer

license:apache-2.0
0
3

cosupervision-gen-Llama2_7b

llama
0
1