# Karasu-Mixtral-8x22B-v0.1
This is a finetune of the newly released mistral-community/Mixtral-8x22B-v0.1 base model. As the base model has not been explicitly trained to chat, we trained this model on a multilingual chat dataset so that the LLM community can use it for conversations. The accuracy of the model is surprisingly high, and it has a decently fast inference speed (roughly 40 tokens/s single-batch in our tests), so we believe it will be useful to the community.

## How to use

We have tested (and thus recommend) running this model on vLLM, serving it from the vLLM OpenAI-compatible server; that is how we ran it on a 4 x A100 (80GB) machine. You can then call the model from Python after installing the openai package. We will be uploading a 4-bit AWQ model soon to make it easier to run this model on other machines (watch this space!).

## Initial testing

From qualitative testing, the model seems pretty smart, especially in English, and has very good recall of facts. It can still get confused by some logical questions, but it has also passed many of the logical questions I have thrown at it that other open-source LLMs often fail. Notes on a few of our test prompts:

- FAIL! The banana would still be in the kitchen, as I put the plate on the banana.
- Sort of a failure; I'd prefer it to say "the center of the Earth".
- The idea is a bit original, but the casting is 2/3rds Dune actresses.

## Training data

We trained this model on conversations between human users and GPT-4:

- 6,206 conversations from the openchat/openchat_sharegpt4_dataset dataset (link)
- 3,011 conversations that we created. We wanted to increase the representation of non-English prompts in our training dataset, so we sampled initial prompts from lmsys/lmsys-chat-1m, stratifying based on language. We then prompted gpt-4-0125 with these prompts and used the results as training data. We plan to release more information on this second dataset soon, as we are using it in another dataset.
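To illustrate the vLLM-based usage described above, here is a minimal sketch using only the Python standard library (the openai client works equivalently against the same server). The endpoint, port, and generation parameters here are assumptions to adapt to your own deployment:

```python
import json
import urllib.request

# Assumed local vLLM OpenAI-compatible endpoint; adjust host/port to your server.
API_URL = "http://localhost:8000/v1/chat/completions"
MODEL = "lightblue/Karasu-Mixtral-8x22B-v0.1"

def build_request(prompt: str, temperature: float = 0.7) -> dict:
    # OpenAI-style chat-completions payload understood by vLLM's server.
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }

def chat(prompt: str) -> str:
    # POST the payload and pull the assistant message out of the response.
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_request(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# chat("What is the capital of Japan?")  # requires the vLLM server to be running
```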
The complete data used to train this model can be found at lightblue/gpt4_conversations_multilingual.

## Training configuration

We trained this model using Axolotl's 4-bit QLoRA configuration for roughly 100 minutes in an A100 (80GB) x 4 environment on the Azure cloud (Standard_NC96ads_A100_v4). We used DeepSpeed ZeRO-2 to train effectively over the 4 GPUs.
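The language-stratified sampling of initial prompts described above can be sketched as follows; the record fields and sample sizes here are illustrative, not the actual pipeline:

```python
import random
from collections import defaultdict

def stratified_sample(records, per_language, seed=0):
    """Sample up to `per_language` prompts from each language bucket.

    `records` is an iterable of dicts with "language" and "prompt" keys,
    a simplification of the fields in lmsys/lmsys-chat-1m.
    """
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for rec in records:
        buckets[rec["language"]].append(rec["prompt"])
    sample = []
    for lang, prompts in sorted(buckets.items()):
        # Cap at the bucket size so small languages are not over-requested.
        k = min(per_language, len(prompts))
        sample.extend(rng.sample(prompts, k))
    return sample

records = [
    {"language": "English", "prompt": "Hi"},
    {"language": "English", "prompt": "Tell me a joke"},
    {"language": "Japanese", "prompt": "こんにちは"},
    {"language": "German", "prompt": "Hallo"},
]
print(stratified_sample(records, per_language=1))  # one prompt per language
```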