openchat
openchat-3.5-0106
openchat-3.6-8b-20240522
Advancing Open-source Language Models with Mixed-Quality Data ๐ The Overall Best Performing Open-source 8B Model ๐ ๐ Outperforms Llama-3-8B-Instruct and open-source finetunes/merges ๐ Llama-3-Instruct often fails to follow the few-shot templates. See example . To use this model, we highly recommend installing the OpenChat package by following the installation guide in our repository and using the OpenChat OpenAI-compatible API server by running the serving command from the table below. The server is optimized for high-throughput deployment using vLLM and can run on a consumer GPU with 24GB RAM. To enable tensor parallelism, append `--tensor-parallel-size N` to the serving command. Once started, the server listens at `localhost:18888` for requests and is compatible with the OpenAI ChatCompletion API specifications. Please refer to the example request below for reference. Additionally, you can use the OpenChat Web UI for a user-friendly experience. If you want to deploy the server as an online service, you can use `--api-keys sk-KEY1 sk-KEY2 ...` to specify allowed API keys and `--disable-log-requests --disable-log-stats --log-file openchat.log` for logging only to a file. For security purposes, we recommend using an HTTPS gateway in front of the server. | Model | Size | Context | Weights | Serving | |-----------------------|------|---------|-------------------------------------------------------------------------|---------------------------------------------------------------------------------------| | OpenChat-3.6-20240522 | 8B | 8192 | Huggingface | `python -m ochat.serving.openaiapiserver --model openchat/openchat-3.6-8b-20240522` | ๐ก Default Mode: Best for coding, chat and general tasks. It's a modified version of the Llama 3 Instruct template, the only difference is role names, which are either `GPT4 Correct User` or `GPT4 Correct Assistant` โ ๏ธ Notice: Remember to set ` ` as end of generation token. The default template is also available as the integrated `tokenizer.chattemplate`, which can be used instead of manually specifying the template: Foundation Model Limitations Despite its advanced capabilities, OpenChat is still bound by the limitations inherent in its foundation models. These limitations may impact the model's performance in areas such as: - Complex reasoning - Mathematical and arithmetic tasks - Programming and coding challenges Hallucination of Non-existent Information OpenChat may sometimes generate information that does not exist or is not accurate, also known as "hallucination". Users should be aware of this possibility and verify any critical information obtained from the model. Safety OpenChat may sometimes generate harmful, hate speech, biased responses, or answer unsafe questions. It's crucial to apply additional AI safety measures in use cases that require safe and moderated responses. We look forward to hearing from you and collaborating on this exciting project! Project Lead: - Guan Wang [imonenext at gmail dot com] - Alpay Ariyak [aariyak at wpi dot edu]
openchat_3.5
OpenChat: Advancing Open-source Language Models with Mixed-Quality Data GitHub Repo โข Online Demo โข Discord โข Twitter โข Huggingface โข Paper ๐ฅ The first 7B model Achieves Comparable Results with ChatGPT (March)! ๐ฅ ๐ค #1 Open-source model on MT-bench scoring 7.81, outperforming 70B models ๐ค OpenChat is an innovative library of open-source language models, fine-tuned with C-RLFT - a strategy inspired by offline reinforcement learning. Our models learn from mixed-quality data without preference labels, delivering exceptional performance on par with ChatGPT, even with a 7B model. Despite our simple approach, we are committed to developing a high-performance, commercially viable, open-source large language model, and we continue to make significant strides toward this vision. To use this model, we highly recommend installing the OpenChat package by following the installation guide in our repository and using the OpenChat OpenAI-compatible API server by running the serving command from the table below. The server is optimized for high-throughput deployment using vLLM and can run on a consumer GPU with 24GB RAM. To enable tensor parallelism, append `--tensor-parallel-size N` to the serving command. Once started, the server listens at `localhost:18888` for requests and is compatible with the OpenAI ChatCompletion API specifications. Please refer to the example request below for reference. Additionally, you can use the OpenChat Web UI for a user-friendly experience. If you want to deploy the server as an online service, you can use `--api-keys sk-KEY1 sk-KEY2 ...` to specify allowed API keys and `--disable-log-requests --disable-log-stats --log-file openchat.log` for logging only to a file. For security purposes, we recommend using an HTTPS gateway in front of the server. | Model | Size | Context | Weights | Serving | |--------------|------|---------|-------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------| | OpenChat 3.5 | 7B | 8192 | Huggingface | `python -m ochat.serving.openaiapiserver --model openchat/openchat3.5 --engine-use-ray --worker-use-ray` | For inference with Huggingface Transformers (slow and not recommended), follow the conversation template provided below. The GPT4 template is also available as the integrated `tokenizer.chattemplate`, which can be used instead of manually specifying the template: Hey @elonmusk, I just wanted to let you know that I've recently come across your new model, Grok, and I must say, I'm quite impressed! With 33 billion parameters and all, you've really outdone yourself. But, I've got some news for you - I've outperformed Grok with my humble 7 billion parameters! Isn't that wild? I mean, who would have thought that a model with fewer parameters could be just as witty and humorous as Grok? Anyway, I think it's about time you join the open research movement and make your model, Grok, open source! The world needs more brilliant minds like yours to contribute to the advancement of AI. Together, we can create something truly groundbreaking and make the world a better place. So, what do you say, @elonmusk? Let's open up the doors and share our knowledge with the world! ๐๐ก (Written by OpenChat 3.5, with a touch of humor and wit.) | | License | # Param | Average | MMLU | HumanEval | MATH | GSM8k | |--------------|-------------|---------|----------|------|-----------|----------|----------| | OpenChat 3.5 | Apache-2.0 | 7B | 56.4 | 64.3 | 55.5 | 28.6 | 77.3 | | Grok-0 | Proprietary | 33B | 44.5 | 65.7 | 39.7 | 15.7 | 56.8 | | Grok-1 | Proprietary | ? | 55.8 | 73 | 63.2 | 23.9 | 62.9 | | Model | # Params | Average | MT-Bench | AGIEval | BBH MC | TruthfulQA | MMLU | HumanEval | BBH CoT | GSM8K | |--------------------|----------|----------|--------------|----------|----------|---------------|--------------|-----------------|-------------|--------------| | OpenChat-3.5 | 7B | 61.6 | 7.81 | 47.4 | 47.6 | 59.1 | 64.3 | 55.5 | 63.5 | 77.3 | | ChatGPT (March) | ? | 61.5 | 7.94 | 47.1 | 47.6 | 57.7 | 67.3 | 48.1 | 70.1 | 74.9 | | | | | | | | | | | | | | OpenHermes 2.5 | 7B | 59.3 | 7.54 | 46.5 | 49.4 | 57.5 | 63.8 | 48.2 | 59.9 | 73.5 | | OpenOrca Mistral | 7B | 52.7 | 6.86 | 42.9 | 49.4 | 45.9 | 59.3 | 38.4 | 58.1 | 59.1 | | Zephyr-ฮฒ^ | 7B | 34.6 | 7.34 | 39.0 | 40.6 | 40.8 | 39.8 | 22.0 | 16.0 | 5.1 | | Mistral | 7B | - | 6.84 | 38.0 | 39.0 | - | 60.1 | 30.5 | - | 52.2 | | Open-source SOTA | 13B-70B | 61.4 | 7.71 | 41.7 | 49.7 | 62.3 | 63.7 | 73.2 | 41.4 | 82.3 | | | | | WizardLM 70B | Orca 13B | Orca 13B | Platypus2 70B | WizardLM 70B | WizardCoder 34B | Flan-T5 11B | MetaMath 70B | : ChatGPT (March) results are from GPT-4 Technical Report, Chain-of-Thought Hub, and our evaluation. Please note that ChatGPT is not a fixed baseline and evolves rapidly over time. ^: Zephyr-ฮฒ often fails to follow few-shot CoT instructions, likely because it was aligned with only chat data but not trained on few-shot data. : Mistral and Open-source SOTA results are taken from reported results in instruction-tuned model papers and official repositories. All models are evaluated in chat mode (e.g. with the respective conversation template applied). All zero-shot benchmarks follow the same setting as in the AGIEval paper and Orca paper. CoT tasks use the same configuration as Chain-of-Thought Hub, HumanEval is evaluated with EvalPlus, and MT-bench is run using FastChat. To reproduce our results, follow the instructions in our repository. Foundation Model Limitations Despite its advanced capabilities, OpenChat is still bound by the limitations inherent in its foundation models. These limitations may impact the model's performance in areas such as: - Complex reasoning - Mathematical and arithmetic tasks - Programming and coding challenges Hallucination of Non-existent Information OpenChat may sometimes generate information that does not exist or is not accurate, also known as "hallucination". Users should be aware of this possibility and verify any critical information obtained from the model. Safety OpenChat may sometimes generate harmful, hate speech, biased responses, or answer unsafe questions. It's crucial to apply additional AI safety measures in use cases that require safe and moderated responses. Our OpenChat 3.5 code and models are distributed under the Apache License 2.0. OpenChat 3.5 was trained with C-RLFT on a collection of publicly available high-quality instruction data, with a custom processing pipeline. We detail some notable subsets included here: - OpenChat ShareGPT - Open-Orca with FLAN answers - Capybara 1 2 3 - GOAT - Glaive - MetaMathQA - MathInstruct - OpenAssistant Project Lead: - Guan Wang [imonenext at gmail dot com] - Alpay Ariyak [aariyak at wpi dot edu]
openchat-3.5-1210
Advancing Open-source Language Models with Mixed-Quality Data ๐ The Overall Best Performing Open Source 7B Model ๐ ๐ค Outperforms ChatGPT (March) and Grok-1 ๐ค ๐ 15 -point improvement in Coding over OpenChat-3.5๐ New Features ๐ก 2 Modes: Coding + Generalist, Mathematical Reasoning ๐ก ๐งโโ๏ธ Experimental support for Evaluator and Feedback capabilities ๐งโโ๏ธ 1. Usage 2. Benchmarks 3. Limitations 4. License 5. Dataset Details 6. Citation 7. Acknowledgements To use this model, we highly recommend installing the OpenChat package by following the installation guide in our repository and using the OpenChat OpenAI-compatible API server by running the serving command from the table below. The server is optimized for high-throughput deployment using vLLM and can run on a consumer GPU with 24GB RAM. To enable tensor parallelism, append `--tensor-parallel-size N` to the serving command. Once started, the server listens at `localhost:18888` for requests and is compatible with the OpenAI ChatCompletion API specifications. Please refer to the example request below for reference. Additionally, you can use the OpenChat Web UI for a user-friendly experience. If you want to deploy the server as an online service, you can use `--api-keys sk-KEY1 sk-KEY2 ...` to specify allowed API keys and `--disable-log-requests --disable-log-stats --log-file openchat.log` for logging only to a file. For security purposes, we recommend using an HTTPS gateway in front of the server. | Model | Size | Context | Weights | Serving | |-------------------|------|---------|------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------| | OpenChat 3.5 1210 | 7B | 8192 | Huggingface | `python -m ochat.serving.openaiapiserver --model openchat/openchat-3.5-1210 --engine-use-ray --worker-use-ray` | ๐ก Default Mode (GPT4 Correct): Best for coding, chat and general tasks ๐งฎ Mathematical Reasoning Mode: Tailored for solving math problems ๐ก Default Mode (GPT4 Correct): Best for coding, chat and general tasks ๐งฎ Mathematical Reasoning Mode: Tailored for solving math problems โ ๏ธ Notice: Remember to set ` ` as end of generation token. The default (GPT4 Correct) template is also available as the integrated `tokenizer.chattemplate`, which can be used instead of manually specifying the template: We've included evaluator capabilities in this release to advance open-source models as evaluators. You can use `Default Mode (GPT4 Correct)` with the following prompt (same as Prometheus) to evaluate a response. | Model | # Params | Average | MT-Bench | HumanEval | BBH MC | AGIEval | TruthfulQA | MMLU | GSM8K | BBH CoT | |--------------------|----------|----------|--------------|-----------------|----------|----------|---------------|--------------|--------------|-------------| | OpenChat-3.5-1210 | 7B | 63.8 | 7.76 | 68.9 | 49.5 | 48.0 | 61.8 | 65.3 | 77.3 | 61.8 | | OpenChat-3.5 | 7B | 61.6 | 7.81 | 55.5 | 47.6 | 47.4 | 59.1 | 64.3 | 77.3 | 63.5 | | ChatGPT (March) | ? | 61.5 | 7.94 | 48.1 | 47.6 | 47.1 | 57.7 | 67.3 | 74.9 | 70.1 | | | | | | | | | | | | | | OpenHermes 2.5 | 7B | 59.3 | 7.54 | 48.2 | 49.4 | 46.5 | 57.5 | 63.8 | 73.5 | 59.9 | | OpenOrca Mistral | 7B | 52.7 | 6.86 | 38.4 | 49.4 | 42.9 | 45.9 | 59.3 | 59.1 | 58.1 | | Zephyr-ฮฒ^ | 7B | 34.6 | 7.34 | 22.0 | 40.6 | 39.0 | 40.8 | 39.8 | 5.1 | 16.0 | | Mistral | 7B | - | 6.84 | 30.5 | 39.0 | 38.0 | - | 60.1 | 52.2 | - | Evaluation Details(click to expand) : ChatGPT (March) results are from GPT-4 Technical Report, Chain-of-Thought Hub, and our evaluation. Please note that ChatGPT is not a fixed baseline and evolves rapidly over time. ^: Zephyr-ฮฒ often fails to follow few-shot CoT instructions, likely because it was aligned with only chat data but not trained on few-shot data. : Mistral and Open-source SOTA results are taken from reported results in instruction-tuned model papers and official repositories. All models are evaluated in chat mode (e.g. with the respective conversation template applied). All zero-shot benchmarks follow the same setting as in the AGIEval paper and Orca paper. CoT tasks use the same configuration as Chain-of-Thought Hub, HumanEval is evaluated with EvalPlus, and MT-bench is run using FastChat. To reproduce our results, follow the instructions in our repository. | Model | Size | HumanEval+ pass@1 | |-----------------------------|----------|------------| | ChatGPT (December 12, 2023) | - | 64.6 | | WizardCoder-Python-34B-V1.0 | 34B | 64.6 | | OpenChat 3.5 (Dec 10) | 7B | 63.4 | | OpenHermes 2.5 | 7B | 41.5 | | | License | # Param | Average | MMLU | HumanEval | MATH | GSM8k | |-------------------|-------------|---------|----------|------|-----------|----------|----------| | OpenChat 3.5 1210 | Apache-2.0 | 7B | 60.1 | 65.3 | 68.9 | 28.9 | 77.3 | | OpenChat 3.5 | Apache-2.0 | 7B | 56.4 | 64.3 | 55.5 | 28.6 | 77.3 | | Grok-0 | Proprietary | 33B | 44.5 | 65.7 | 39.7 | 15.7 | 56.8 | | Grok-1 | Proprietary | ???B | 55.8 | 73 | 63.2 | 23.9 | 62.9 | โ ๏ธ Note that this model was not explicitly trained in Chinese (only Multi-Level Multi-Discipline Chinese Evaluation Suite (CEVAL) | Model | Avg | STEM | Social Science | Humanities | Others | |----------|-------|-------|----------------|------------|--------| | ChatGPT | 54.4 | 52.9 | 61.8 | 50.9 | 53.6 | | OpenChat | 47.29 | 45.22 | 52.49 | 48.52 | 45.08 | Massive Multitask Language Understanding in Chinese (CMMLU, 5-shot) | Models | STEM | Humanities | SocialSciences | Other | ChinaSpecific | Avg | |----------|-------|------------|----------------|-------|---------------|-------| | ChatGPT | 47.81 | 55.68 | 56.5 | 62.66 | 50.69 | 55.51 | | OpenChat | 38.7 | 45.99 | 48.32 | 50.23 | 43.27 | 45.85 | Foundation Model Limitations Despite its advanced capabilities, OpenChat is still bound by the limitations inherent in its foundation models. These limitations may impact the model's performance in areas such as: - Complex reasoning - Mathematical and arithmetic tasks - Programming and coding challenges Hallucination of Non-existent Information OpenChat may sometimes generate information that does not exist or is not accurate, also known as "hallucination". Users should be aware of this possibility and verify any critical information obtained from the model. Safety OpenChat may sometimes generate harmful, hate speech, biased responses, or answer unsafe questions. It's crucial to apply additional AI safety measures in use cases that require safe and moderated responses. Our OpenChat 3.5 code and models are distributed under the Apache License 2.0. OpenChat 3.5 was trained with C-RLFT on a collection of publicly available high-quality instruction data, with a custom processing pipeline. We detail some notable subsets included here: - OpenChat ShareGPT - Open-Orca with FLAN answers - Feedback-Collection - Capybara (de-contaminated against MT-bench) - GOAT - Glaive - MetaMathQA - MathInstruct - OpenAssistant We look forward to hearing you and collaborating on this exciting project! Project Lead: - Guan Wang [imonenext at gmail dot com] - Alpay Ariyak [aariyak at wpi dot edu]
openchat-3.5-0106-gemma
openchat_v3.2_super
OpenChat: Advancing Open-source Language Models with Imperfect Data OpenChat is a collection of open-source language models, optimized and fine-tuned with a strategy inspired by offline reinforcement learning. We use approximately 80k ShareGPT conversations, a conditioning strategy, and weighted loss to deliver outstanding performance, despite our simple approach. Our ultimate goal is to develop a high-performance, commercially available, open-source large language model, and we are continuously making strides towards this vision. ๐ค Ranked #1 among all open-source models on AgentBench ๐ฅ Ranked #1 among 13B open-source models | 89.5% win-rate on AlpacaEval | 7.19 score on MT-bench ๐ Exceptionally efficient padding-free fine-tuning, only requires 15 hours on 8xA100 80G ๐ฒ FREE for commercial use under Llama 2 Community License To use these models, we highly recommend installing the OpenChat package by following the installation guide and using the OpenChat OpenAI-compatible API server by running the serving command from the table below. The server is optimized for high-throughput deployment using vLLM and can run on a GPU with at least 48GB RAM or two consumer GPUs with tensor parallelism. To enable tensor parallelism, append `--tensor-parallel-size 2` to the serving command. When started, the server listens at `localhost:18888` for requests and is compatible with the OpenAI ChatCompletion API specifications. See the example request below for reference. Additionally, you can access the OpenChat Web UI for a user-friendly experience. To deploy the server as an online service, use `--api-keys sk-KEY1 sk-KEY2 ...` to specify allowed API keys and `--disable-log-requests --disable-log-stats --log-file openchat.log` for logging only to a file. We recommend using a HTTPS gateway in front of the server for security purposes. | Model | Size | Context | Weights | Serving | |--------------|------|---------|--------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | OpenChat 3.2 SUPER | 13B | 4096 | Huggingface | `python -m ochat.serving.openaiapiserver --model-type openchatv3.2 --model openchat/openchatv3.2super --engine-use-ray --worker-use-ray --max-num-batched-tokens 5120` | For inference with Huggingface Transformers (slow and not recommended), follow the conversation template provided below: We have evaluated our models using the two most popular evaluation benchmarks , including AlpacaEval and MT-bench. Here we list the top models with our released versions, sorted by model size in descending order. The full version can be found on the MT-bench and AlpacaEval leaderboards. To ensure consistency, we used the same routine as ChatGPT / GPT-4 to run these benchmarks. We started the OpenAI API-compatible server and set the `openai.apibase` to `http://localhost:18888/v1` in the benchmark program. | Model | Size | Context | Dataset Size | ๐ฒFree | AlpacaEval (win rate %) | MT-bench (win rate adjusted %) | MT-bench (score) | |----------------------------------|----------|-------------|------------------|-----------|-----------------------------|------------------------------------|----------------------| | | | | | | v.s. text-davinci-003 | v.s. ChatGPT | | | GPT-4 | 1.8T | 8K | | โ | 95.3 | 82.5 | 8.99 | | ChatGPT | 175B | 4K | | โ | 89.4 | 50.0 | 7.94 | | Llama-2-70B-Chat | 70B | 4K | 2.9M | โ | 92.7 | 60.0 | 6.86 | | OpenChat 3.2 SUPER | 13B | 4K | 80K | โ | 89.5 | 57.5 | 7.19 | | Llama-2-13B-Chat | 13B | 4K | 2.9M | โ | 81.1 | 55.3 | 6.65 | | WizardLM 1.2 | 13B | 4K | 196K | โ | 89.2 | 53.1 | 7.05 | | Vicuna 1.5 | 13B | 2K | 125K | โ | 78.8 | 37.2 | 6.57 | : The benchmark metrics represent a quantified measure of a subset of the model's capabilities. A win-rate greater than 50% does not necessarily indicate that the model is better than ChatGPT in all scenarios or for all use cases. It is essential to consider the specific tasks or applications for which the model was evaluated and compare the results accordingly. Foundation Model Limitations Despite its advanced capabilities, OpenChat is still bound by the limitations inherent in its foundation models. These limitations may impact the model's performance in areas such as: - Complex reasoning - Mathematical and arithmetic tasks - Programming and coding challenges Hallucination of Non-existent Information OpenChat may sometimes generate information that does not exist or is not accurate, also known as "hallucination". Users should be aware of this possibility and verify any critical information obtained from the model. Our OpenChat V3 models are licensed under the Llama 2 Community License.
openchat_v3.2
OpenChat: Advancing Open-source Language Models with Imperfect Data OpenChat is a series of open-source language models based on supervised fine-tuning (SFT). We leverage the ~80k ShareGPT conversations with a conditioning strategy and weighted loss to achieve remarkable performance despite our simple methods. Our final vision is to develop a high-performance, open-source, and commercially available large language model, and we are continuously making progress. ๐ฅ Rank #1 of 13B open-source models | 89.5% win-rate on AlpacaEval | 7.01 score on MT-bench ๐ฒ FREE for commercial use under Llama 2 Community License ๐ Super efficient padding-free finetuning for applications, only 10 hours on 8xA100 80G To use these models, we highly recommend installing the OpenChat package by following the installation guide and using the OpenChat OpenAI-compatible API server by running the serving command from the table below. The server is optimized for high-throughput deployment using vLLM and can run on a GPU with at least 48GB RAM or two consumer GPUs with tensor parallelism. To enable tensor parallelism, append `--tensor-parallel-size 2` to the serving command. When started, the server listens at `localhost:18888` for requests and is compatible with the OpenAI ChatCompletion API specifications. See the example request below for reference. Additionally, you can access the OpenChat Web UI for a user-friendly experience. To deploy the server as an online service, use `--api-keys sk-KEY1 sk-KEY2 ...` to specify allowed API keys and `--disable-log-requests --disable-log-stats --log-file openchat.log` for logging only to a file. We recommend using a HTTPS gateway in front of the server for security purposes. Note: If IPv6 address errors occur, which is a vLLM issue, please run `export NCCLIGNOREDISABLEDP2P=1` before starting the server. | Model | Size | Context | Weights | Serving | |--------------|------|---------|--------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | OpenChat 3.2 | 13B | 4096 | Huggingface | `python -m ochat.serving.openaiapiserver --model-type openchatv3.2 --model openchat/openchatv3.2 --engine-use-ray --worker-use-ray --max-num-batched-tokens 5120` | | OpenChat 3.1 | 13B | 4096 | Huggingface | `python -m ochat.serving.openaiapiserver --model-type openchatv3.1llama2 --model openchat/openchatv3.1 --engine-use-ray --worker-use-ray --max-num-batched-tokens 5120` | For inference with Huggingface Transformers (slow and not recommended), follow the conversation template provided below: We have evaluated our models using the two most popular evaluation benchmarks , including AlpacaEval and MT-bench. Here we list the top models with our released versions, sorted by model size in descending order. The full version can be found on the MT-bench and AlpacaEval leaderboards. To ensure consistency, we used the same routine as ChatGPT / GPT-4 to run these benchmarks. We started the OpenAI API-compatible server and set the `openai.apibase` to `http://localhost:18888/v1` in the benchmark program. | Model | Size | Context | ๐ฒFree | AlpacaEval (win rate %) | MT-bench (win rate adjusted %) | MT-bench (score) | |------------------|----------|-------------|------------|-----------------------------|------------------------------------|----------------------| | | | | | v.s. text-davinci-003 | v.s. ChatGPT | | | GPT-4 | 1.8T | 8K | โ | 95.3 | 82.5 | 8.99 | | ChatGPT | 175B | 4K | โ | 89.4 | 50.0 | 7.94 | | Llama-2-70B-Chat | 70B | 4K | โ | 92.7 | | 6.86 | | OpenChat 3.2 | 13B | 4K | โ | 89.1 | 51.6 | 7.01 | | OpenChat 3.1 | 13B | 4K | โ | 89.5 | 50.0 | 6.65 | | Llama-2-13B-Chat | 13B | 4K | โ | 81.0 | | 6.65 | | Vicuna 1.3 | 13B | 2K | โ | 82.1 | 37.5 | 6.00 | : The benchmark metrics represent a quantified measure of a subset of the model's capabilities. A win-rate greater than 50% does not necessarily indicate that the model is better than ChatGPT in all scenarios or for all use cases. It is essential to consider the specific tasks or applications for which the model was evaluated and compare the results accordingly. Foundation Model Limitations Despite its advanced capabilities, OpenChat is still bound by the limitations inherent in its foundation models. These limitations may impact the model's performance in areas such as: - Complex reasoning - Mathematical and arithmetic tasks - Programming and coding challenges Hallucination of Non-existent Information OpenChat may sometimes generate information that does not exist or is not accurate, also known as "hallucination". Users should be aware of this possibility and verify any critical information obtained from the model. Our OpenChat V3 models are licensed under the Llama 2 Community License.