ContextualAI

71 models • 2 total models in database

Sort by:

Contextual KTO Mistral PairRM

This repo contains the model and tokenizer checkpoints for: - model family mistralai/Mistral-7B-Instruct-v0.2 - optimized with the loss KTO - aligned using the snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset - via 3 iterations of KTO on one epoch of each training partition, each previous iteration's model serving as the reference for the subsequent. [03/06/2024]: We are #2 on the (verified) Alpaca Eval 2.0 Leaderboard scoring 33.23! To prompt this model, ensure that the format is consistent with that of TuluV2. For example, a prompt should be formatted as follows, where ` ` corresponds to the human's role and ` ` corresponds to the LLM's role. The human should speak first: Note that a beginning-of-sequence (BOS) token is automatically added at tokenization time and does not have to be added by you. No end-of-sequence (EOS) token is added to the prompt. You may also use our tokenizer's `applychattemplate` if doing inference with `chatml` set or evaluating generations through non-local clients. For more info on KTO refer to our code repository or blog for more details on the methodology. If you found this work useful, feel free to cite our work:

ContextualAI

archangel_sft-kto_llama13b

ctxl-rerank-v2-instruct-multilingual-6b

tiny-random-MistralForCausalLM

ctxl-rerank-v2-instruct-multilingual-2b

ctxl-rerank-v2-instruct-multilingual-1b

Llama-200M

ctxl-rerank-v2-instruct-multilingual-2b-nvfp4

ctxl-rerank-v2-instruct-multilingual-1b-nvfp4

LMUnit-llama3.1-70b

Contextual KTO Mistral PairRM

ctx-bird-reward-250121

ctxl-rerank-v2-instruct-multilingual-6b-nvfp4

LMUnit-qwen2.5-72b

archangel_dpo_pythia1-4b

archangel_sft_llama7b

archangel_dpo_pythia6-9b

archangel_kto_llama7b

archangel_sft_pythia6-9b

archangel_kto_pythia2-8b

archangel_ppo_llama13b

archangel_ppo_pythia12-0b

archangel_kto_llama13b

archangel_sft-dpo_llama30b

archangel_sft-slic_llama7b

archangel_sft-kto_llama30b

archangel_sft_pythia1-4b

archangel_slic_pythia6-9b

archangel_ppo_llama30b

archangel_csft_llama7b

archangel_slic_llama7b

archangel_dpo_pythia2-8b

archangel_dpo_pythia12-0b

archangel_ppo_llama7b

archangel_sft-kto_pythia12-0b

archangel_sft-csft_pythia1-4b

archangel_sft-csft_pythia2-8b

archangel_sft-csft_pythia6-9b

archangel_sft-csft_llama13b

archangel_sft-slic_llama13b

archangel_kto_llama30b

archangel_sft_pythia12-0b

archangel_slic_pythia1-4b

archangel_slic_pythia2-8b

archangel_slic_pythia12-0b

archangel_kto_pythia6-9b

archangel_sft-ppo_pythia1-4b

archangel_sft-ppo_pythia2-8b

archangel_sft-ppo_llama13b

archangel_sft-slic_pythia2-8b

archangel_csft_pythia12-0b

archangel_sft-csft_llama30b

archangel_csft_llama30b

archangel_sft_pythia2-8b

archangel_sft_llama13b

archangel_slic_llama30b

archangel_dpo_llama7b

archangel_dpo_llama30b

archangel_ppo_pythia6-9b

archangel_sft-dpo_pythia6-9b

archangel_sft-kto_pythia2-8b

archangel_sft-kto_llama7b

archangel_sft-ppo_pythia12-0b

archangel_sft-ppo_llama7b

archangel_csft_pythia1-4b

archangel_sft-csft_pythia12-0b

archangel_sft-slic_pythia12-0b

archangel_sft-csft_llama7b

zephyr_sft_dpo

Llama-3.1-8b-Instruct

archangel_csft_pythia2-8b

zephyr_sft_kto