Nexusflow
Starling-LM-7B-beta
Athene-70B
Athene-RM-8B
Athene-V2-Chat
# Athene-V2-Chat-72B: Rivaling GPT-4o across Benchmarks

Nexusflow HF - Nexusflow Discord - Athene-V2 Blogpost

We introduce Athene-V2-Chat-72B, an open-weights LLM on par with GPT-4o across benchmarks. It is currently the best open model according to Chatbot Arena, where it beats GPT-4o-0513 (the best GPT-4o model on Arena) in the hard and math categories, and is on par with GPT-4o-0513 in coding, instruction following, longer queries, and multi-turn conversation. It is trained through RLHF with Qwen-2.5-72B-Instruct as the base model. Athene-V2-Chat-72B excels in chat, math, and coding. Its sister model, Athene-V2-Agent-72B, surpasses GPT-4o in complex function calling and agentic applications.

- Developed by: The Nexusflow Team
- Model type: Chat Model
- Finetuned from model: Qwen-2.5-72B-Instruct
- License: Nexusflow Research License
- Blog: https://nexusflow.ai/blogs/athene-v2

## Usage

Athene-V2-Chat uses the same chat template as Qwen2.5-72B-Instruct. Below is a simple usage example with the Transformers library. Note that adding a system prompt that encourages the model to think step by step can further improve its results on difficult math queries and problems like counting the `r`s in "strawberry". For fairness, we do not include such a system prompt during chat evaluation.

## Acknowledgment

We would like to thank the LMSYS Organization for their support in testing the model. We would also like to thank the Qwen Team and the open-source community for their efforts in providing the datasets and base models.
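The Transformers usage described above might look like the following sketch. The model ID matches the Hugging Face card; the exact wording of the step-by-step system prompt is our own illustrative assumption, not an official recommendation.

```python
# Sketch: chatting with Athene-V2-Chat via the standard Qwen2.5-style
# chat-template workflow in Transformers.

MODEL_ID = "Nexusflow/Athene-V2-Chat"


def build_messages(user_query: str, step_by_step: bool = False) -> list:
    """Assemble a chat in the Qwen2.5 message format.

    The optional system prompt mirrors the card's note that encouraging
    step-by-step thinking can help on hard math/counting queries; its
    exact wording here is an assumption.
    """
    messages = []
    if step_by_step:
        messages.append({
            "role": "system",
            "content": "Think step by step before giving your final answer.",
        })
    messages.append({"role": "user", "content": user_query})
    return messages


def generate_reply(user_query: str, step_by_step: bool = True) -> str:
    """Run one chat turn on a GPU machine (downloads a 72B model)."""
    # Heavy imports deferred so build_messages stays usable everywhere.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    messages = build_messages(user_query, step_by_step=step_by_step)
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=512)
    return tokenizer.decode(output[0][inputs.shape[-1]:],
                            skip_special_tokens=True)
```

For example, `generate_reply("How many r's are in 'strawberry'?")` would run the counting query with the step-by-step system prompt enabled.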
NexusRaven V2 13B
# NexusRaven-13B: Surpassing GPT-4 for Zero-shot Function Calling

Nexusflow HF - Nexusflow Discord - NexusRaven-V2 blog post - Prompting Notebook CoLab - Leaderboard - Real-World Demo - NexusRaven-V2-13B Github

## Introducing NexusRaven-V2-13B

NexusRaven is an open-source, commercially viable function-calling LLM that surpasses the state of the art in function-calling capabilities.

- **Versatile Function Calling Capability**: NexusRaven-V2 can generate single function calls, nested calls, and parallel calls in many challenging cases.
- **Fully Explainable**: NexusRaven-V2 can generate very detailed explanations for the function calls it generates. This behavior can be turned off to save tokens during inference.
- **Performance Highlights**: NexusRaven-V2 surpasses GPT-4 by 7% in function-calling success rate on human-generated use cases involving nested and composite functions.
- **Generalization to the Unseen**: NexusRaven-V2 has never been trained on the functions used in evaluation.
- **Commercially Permissive**: The training of NexusRaven-V2 does not involve any data generated by proprietary LLMs such as GPT-4. You have full control of the model when deployed in commercial applications.

Please check out the following links!

- Prompting Notebook CoLab
- Evaluation Leaderboard
- NexusRaven-V2 Real-World Demo

These Python functions can do anything (including sending GET/POST requests to external APIs!). The two requirements are the Python function signature and an appropriate docstring to generate the function call. NexusRaven-V2 also does best on functions with arguments, so please only provide functions that require arguments to Raven. NexusRaven-V2 is capable of generating deeply nested function calls, parallel function calls, and simple single calls. It can also justify the function calls it generates. If you would like to generate the call only, please set a stop criterion of `"<bot_end>"`.
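The signature-plus-docstring prompting described above might be sketched as follows. The template markers used here (`Function:`, `User Query:`, `<human_end>`) and the stop token `<bot_end>` are assumptions that should be confirmed against the How-To-Prompt.ipynb notebook, and `get_weather_data` is purely illustrative.

```python
# Sketch: building a NexusRaven-V2 prompt from Python signatures + docstrings.
# Prompt markers and stop token are assumptions; confirm in How-To-Prompt.ipynb.
import inspect


def raven_prompt(functions: list, user_query: str) -> str:
    """Render each function's signature and docstring, then the user query."""
    blocks = []
    for fn in functions:
        sig = "def " + fn.__name__ + str(inspect.signature(fn)) + ":"
        doc = inspect.getdoc(fn) or ""
        blocks.append("Function:\n" + sig + '\n"""\n' + doc + '\n"""\n')
    return "\n".join(blocks) + "\nUser Query: " + user_query + "<human_end>"


def get_weather_data(coordinates: tuple) -> str:
    """Fetch current weather for the given (latitude, longitude) pair."""
    ...


def generate_call(user_query: str) -> str:
    """Generate a function call on a GPU machine (downloads the 13B model)."""
    # Deferred import so the prompt helper stays usable without a GPU.
    from transformers import pipeline

    pipe = pipeline("text-generation", model="Nexusflow/NexusRaven-V2-13B",
                    device_map="auto")
    prompt = raven_prompt([get_weather_data], user_query)
    # The card recommends greedy decoding: sampling off, near-zero temperature.
    out = pipe(prompt, max_new_tokens=512, do_sample=False,
               return_full_text=False)
    return out[0]["generated_text"]
```

A stopping criterion on `<bot_end>` can additionally be configured to truncate generation once the call (and optional explanation) is complete.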
Otherwise, please allow NexusRaven-V2 to run until its stop token (i.e. `<bot_end>`). Please refer to our notebook, How-To-Prompt.ipynb, for more advanced tutorials on using NexusRaven-V2!

1. When giving docstrings to Raven, please provide well-indented, detailed, and well-written docstrings, as this can help accuracy.
2. Raven does better when all functions provided to it have arguments, either required or optional, as this can help accuracy.
3. We strongly recommend setting sampling to False when prompting NexusRaven-V2.
4. We strongly recommend a very low temperature (~0.001).
5. We strongly recommend following the prompting style below.

When handling irrelevant user queries, users have noticed that specifying a "no-op" function works best. Please ensure you provide an argument to this function, as Raven works best on functions with arguments.

For parallel calls, because the model is targeted for industry use, you can "enable" parallel calls by adding an instruction above the User Query that "allows" the model to use parallel calls; otherwise, the model will focus primarily on nested and single calls.

## Quickstart

You can run the model on a GPU using the following code. If you would like to prevent the generation of the explanation of the function call (for example, to save on inference tokens), please set a stopping criterion of `<bot_end>`. Please follow this prompting template to maximize the performance of Raven V2.

If you currently have a workflow built around OpenAI's function calling and you want to try NexusRaven-V2, we have a package that helps you drop NexusRaven-V2 in. We've also included a small demo for using Raven with LangChain!

For a deeper dive into the results, please see our Github README.

## Limitations
1. The model works best when connected to a retriever when there are a multitude of functions, as a large number of functions will saturate the model's context window.
2. The model can be prone to generating incorrect calls. Please ensure proper guardrails are in place to capture errant behavior.
3. The explanations generated by NexusRaven-V2 might be incorrect. Please ensure proper guardrails are present to capture errant behavior.

## License

This model was trained on commercially viable data and is licensed under the Nexusflow community license.

## References

We thank the CodeLlama team for their amazing models!

## Contact

Please join our Discord Channel to reach out for any issues and comments!
Athene-V2-Agent
# Athene-V2-Agent: Surpassing GPT-4o for Tool Use and Agentic Use Cases

Nexusflow HF - Nexusflow Discord - Athene-V2 Blogpost

## Introducing Athene-V2-Agent

Athene-V2-Agent is an open-source agent LLM that surpasses the state of the art in function-calling and agentic capabilities.

- **Versatile Agent Capability**: Athene-V2-Agent is an agent model capable of operating in environments with deeply nested tool dependencies. It can reason about and plan trajectories in which many tool calls are necessary to answer a single query.
- **Performance Highlights**: Athene-V2-Agent surpasses GPT-4o by 18% in function-calling success rate on single-FC tasks, and by 17% in agentic success rate.
- **Generalization to the Unseen**: Athene-V2-Agent has never been trained on the functions or agentic settings used in evaluation.

- Developed by: The Nexusflow Team
- Model type: Agent Model
- Finetuned from model: Qwen-2.5-72B-Instruct
- License: Nexusflow Research License
- Blog: https://nexusflow.ai/blogs/athene-v2

Athene-V2-Agent is usable in any OpenAI API-compatible environment using our vLLM docker image, making it a simple "drop-in" replacement in any agentic or tool-use setting. You can submit any OpenAI-compatible tool-use request to the model by hitting the vLLM endpoint, and Athene-V2-Agent will issue tool calls that you can execute and return results for.

WARNING: Athene-V2-Agent uses a CUSTOM prompting style that is baked into the custom docker image, as the executable calls are extracted from the model's generated planning. For best performance, please ensure you use the docker image above for Athene-V2-Agent, including when benchmarking the model. Using the HuggingFace tokenizer's chat template will yield suboptimal results for agent use cases. Please reach out to us on Discord if you run into any issues!

## Examples

An example Weather agent can be found here: Link.
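An OpenAI-compatible tool-use request against the vLLM endpoint might look like the sketch below. The base URL, served model name, and tool schemas (including a "no-op" tool for unanswerable queries, a pattern the card discusses) are all illustrative assumptions to adapt to your deployment.

```python
# Sketch: OpenAI-compatible tool-use request to a local vLLM endpoint serving
# Athene-V2-Agent. Endpoint URL, model name, and tool schemas are assumptions.

def build_tools() -> list:
    """One real tool plus a 'no-op' tool for queries no tool can answer."""
    return [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "city": {"type": "string", "description": "City name."},
                    },
                    "required": ["city"],
                },
            },
        },
        {
            "type": "function",
            "function": {
                "name": "no_op",
                "description": "Call this when no other tool can answer.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "user_query": {
                            "type": "string",
                            "description": "The user's query, verbatim.",
                        },
                    },
                    "required": ["user_query"],
                },
            },
        },
    ]


def request_tool_calls(query: str):
    """Send one chat turn and return any tool calls the model issued."""
    # Deferred import so build_tools stays importable without the client.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
    response = client.chat.completions.create(
        model="Nexusflow/Athene-V2-Agent",
        messages=[{"role": "user", "content": query}],
        tools=build_tools(),
    )
    return response.choices[0].message.tool_calls
```

You would then execute each returned tool call locally and send the results back as `tool`-role messages for the next turn.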
The Weather-agent example includes handling queries that are and are not answerable with the current tools. An example extraction and RAG agent can be found here: Link. This example includes handling RAG-based queries with a Wikipedia tool.

1. When giving docstrings to Athene-V2-Agent, please provide well-indented, detailed, and well-written docstrings, as this can help accuracy.
2. We strongly recommend using the docker image to interact with Athene-V2-Agent.
3. We strongly recommend setting sampling to False when prompting Athene-V2-Agent.
4. We strongly recommend a zero temperature.
5. Athene-V2-Agent is designed to work within systems, so it is tuned to be very controllable via the instructions specified in the tools, including for broad behaviors (like rejecting queries, or chatting).

## Handling Irrelevant Queries

The Athene-V2-Agent model is strongly tuned to have its behavior be controllable with tools, to make it easy to integrate into systems. Therefore, the model will not reject out-of-domain queries by default; it will try its best to issue the most relevant call. However, when you expect irrelevant user queries and want the model to reject them, you can use a no-op function. Please see the example Link here for a demo of this.

Because the Athene-V2-Agent model is strongly tuned to be controllable, we wanted to ensure that it does not chat unless explicitly instructed to do so. You can achieve this by adding a `chat` tool and allowing its use in the system prompt (feel free to experiment to make Athene-V2-Agent behave the way you want it to!). Please see the example Link here for a demo of this.

## Contact

Please join our Discord Channel to reach out for any issues and comments!