beyoru
Qwen3-4B-I-1209
## Model Overview
- Base Model: Qwen3-4B-Instruct-2507
- Training Method: Reinforcement Learning (GRPO) with multiple reward functions

This model (`Qwen3-4B-I-1209`) is fine-tuned for tool use and function-call generation.

## Reward Functions
The model was trained with multi-signal rewards:
1. Rule-based Reward: checks the correctness of the function-call name and arguments, with partial credit for matching subsets of arguments (see the sketch at the end of this card).
2. Self-Certainty Reward: encourages confident predictions.
3. Tool-Call Reward: validates structural correctness.

## Training Configuration
- Optimizer: AdamW
- Learning Rate: 5e-6 with cosine decay (`min_lr_rate=0.1`)
- Scheduler: `cosine_with_min_lr`
- Generations per Prompt: 4

## ACEBench
| Model | Overall Accuracy |
|---------------------------------|------------------|
| Qwen3-4B-I-1209 | 0.7233 |
| Qwen3-4B-Instruct-2507 (base) | 0.635 |
| Salesforce/Llama-xLAM-2-8b-fc-r | 0.5792 |

## Contribute
Contributions to this model, and feedback on its performance and quality, are very welcome.

## Citation
If you use this model in your research or application, please cite:
```bibtex
@misc{qwen3-4b-i-1209,
  title = {Qwen3-4B-I-1209: Fine-tuned Qwen3-4B-Instruct with GRPO for Tool-Use and Function Calling},
  author = {Beyoru},
  year = {2025},
  howpublished = {\url{https://huggingface.co/beyoru/Qwen3-4B-I-1509}}
}
```
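The exact reward implementations for Qwen3-4B-I-1209 are not published. As a rough illustration, here is a minimal Python sketch of the rule-based reward described above, assuming the model emits a JSON function call with `name` and `arguments` fields and the reference call is a dict of the same shape (all names and the 0.5/0.5 split are illustrative):

```python
import json

def rule_based_reward(predicted: str, reference: dict) -> float:
    """Score a generated function call against a reference call.

    Full credit requires the correct function name; arguments then earn
    partial credit for each key/value pair matching the reference.
    """
    try:
        call = json.loads(predicted)
    except json.JSONDecodeError:
        return 0.0  # unparseable output earns nothing
    if call.get("name") != reference["name"]:
        return 0.0
    ref_args = reference.get("arguments", {})
    if not ref_args:
        return 1.0  # correct name, no arguments expected
    matched = sum(
        1 for k, v in ref_args.items()
        if call.get("arguments", {}).get(k) == v
    )
    # half credit for the right function, half scaled by matched arguments
    return 0.5 + 0.5 * matched / len(ref_args)
```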
Luna-Ethos
Belle-VLM
Luna-SRSA-Uncensored
EvolLLM
Omni-PLAnh-exp
Qwen3-0.9B-A0.6B-Coder
MinCoder-1.5B-Expert
WinterCode-2610
Tama
QwenSQL-2E
MinCoder-4B-Expert
This model is a fine-tuned Qwen model trained with a custom reinforcement-learning (RL) framework that rewards it for producing solutions that pass automated test cases, similar to how programming tasks are evaluated on LeetCode. Instead of relying on labeled ground-truth answers, the model learns through test-case-based rewards, promoting generalization and reasoning ability in algorithmic problem solving.
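The training harness is not published; as a minimal sketch, a test-case reward could execute each candidate solution as a standalone Python script against (stdin, expected_stdout) pairs (the function name and 5-second timeout are assumptions):

```python
import os
import subprocess
import tempfile

def test_case_reward(solution: str, test_cases: list[tuple[str, str]]) -> float:
    """Fraction of test cases a generated solution passes.

    Each test case is (stdin, expected_stdout); the candidate program is
    run as a standalone script with a timeout, LeetCode-style.
    """
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(solution)
        path = f.name
    passed = 0
    try:
        for stdin, expected in test_cases:
            try:
                result = subprocess.run(
                    ["python", path], input=stdin,
                    capture_output=True, text=True, timeout=5,
                )
            except subprocess.TimeoutExpired:
                continue  # a timeout counts as a failed test
            if result.stdout.strip() == expected.strip():
                passed += 1
    finally:
        os.unlink(path)
    return passed / len(test_cases) if test_cases else 0.0
```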
Luna-7B-A4B
Evol-Aes-Hybrid-4B
Asita-8B
Spark-TTS-0.5B-with-KafkaSpark
Qwen3-CoderSmall
MinCoder-4B-Exp
Luna
Luna is a conversational AI model designed for immersive roleplay (RP) and natural chatting. It is fine-tuned to respond in a more engaging, character-driven style than standard instruction-tuned models.

Notes:
- Optimized for roleplay-style conversations
- Flexible: can be used for creative writing, storytelling, or character interactions
- For best performance, describe your character in the system prompt (see the usage sketch below)

Fixes:
- 04/09: now using the old chat template
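The card doesn't include usage code; here is a minimal sketch with 🤗 Transformers, assuming the model loads as a standard causal LM (the repo id `beyoru/Luna` and the character prompt are illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "beyoru/Luna"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [
    # Describe your character in the system prompt for best results.
    {"role": "system", "content": "You are Luna, a witty, warm-hearted "
     "space-station engineer. Stay in character and reply in first person."},
    {"role": "user", "content": "Luna, the reactor is acting up again."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```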
Lime
Qwen3-MaCoTo
Lunaa
EvolLLM-Linh
firefly_sesame
Browser-mini
Luna-Fusion-RP
Luna is a conversational AI model designed for immersive roleplay (RP) and natural chatting. It is fine-tuned to respond in a more engaging, character-driven style than standard instruction-tuned models. This is a merged version of all my RP models, combining them with an evolutionary merging method on the base model together with a creative-writing dataset. It was inspired by how well model merging improved performance in my tests.

Notes:
- Optimized for roleplay-style conversations
- Flexible: can be used for creative writing, storytelling, or character interactions
- For best performance, describe your character in the system prompt.
- Three models were merged (a rough merge sketch follows below).
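The merge recipe is not published. As a rough illustration, here is a minimal weight-averaging sketch in Python; the checkpoint ids are placeholders, and the evolutionary step that searches over per-model weights against a fitness metric is omitted:

```python
import torch
from transformers import AutoModelForCausalLM

# Hypothetical checkpoint ids; the actual three RP models are not listed.
checkpoints = ["rp-model-a", "rp-model-b", "rp-model-c"]
weights = [1 / 3, 1 / 3, 1 / 3]  # evolutionary search would tune these

models = [AutoModelForCausalLM.from_pretrained(c, torch_dtype=torch.float32)
          for c in checkpoints]
state_dicts = [m.state_dict() for m in models]

# Overwrite the first model's parameters with the weighted average.
merged = models[0]
with torch.no_grad():
    for name, param in merged.named_parameters():
        param.copy_(sum(w * sd[name] for w, sd in zip(weights, state_dicts)))
merged.save_pretrained("Luna-Fusion-RP")
```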
Cery-rc-M
Tama-JP-beta
Tama is a conversational AI model designed for immersive NSFW roleplay (RP) and natural chatting. Tama-beta is continually pretrained on the Qwen3-4B model.

Notes:
- Optimized for roleplay-style conversations
- Warning: this model can generate NSFW content
- Describe your character in the system prompt.
Cery-rc-High
Qwen3-4B-I-1509
## Model Overview
- Base Model: Qwen3-4B-Instruct-2507
- Training Method: Reinforcement Learning (GRPO) with multiple reward functions

This model (`Qwen3-4B-I-1509`) is fine-tuned for tool use and function-call generation (a hedged usage sketch appears at the end of this card).

## Reward Functions
The model was trained with multi-signal rewards:
1. Rule-based Reward: checks the correctness of the function-call name and arguments, with partial credit for matching subsets of arguments.
2. Self-Certainty Reward: encourages confident predictions.
3. Tool-Call Reward: validates structural correctness.

## Training Configuration
- Optimizer: AdamW
- Learning Rate: 5e-6 with cosine decay (`min_lr_rate=0.1`)
- Scheduler: `cosine_with_min_lr`
- Generations per Prompt: 4

## Evaluation
Important note: why are these scores lower than in the technical report? Hardware limits forced a reduced max-token budget during evaluation, applied to both models; I use the same configuration for every model I evaluate, whether larger than or the same size as this one.

| Model | Airline | Retail |
|-----------------|---------|--------|
| Qwen3-4B-I-1509 | 0.2800 | 0.2783 |
| Base Model | 0.3000 | 0.2261 |

### ACEBench
| Model | Overall Accuracy |
|---------------------------------|------------------|
| Qwen3-4B-I-1509 | 0.677 |
| Qwen3-4B-Instruct-2507 (base) | 0.635 |
| Salesforce/Llama-xLAM-2-8b-fc-r | 0.5792 |

## Contribute
Contributions to this model, and feedback on its performance and quality, are very welcome.

## Citation
If you use this model in your research or application, please cite:
```bibtex
@misc{qwen3-4b-i-1509,
  title = {Qwen3-4B-I-1509: Fine-tuned Qwen3-4B-Instruct with GRPO for Tool-Use and Function Calling},
  author = {Beyoru},
  year = {2025},
  howpublished = {\url{https://huggingface.co/beyoru/Qwen3-4B-I-1509}}
}
```
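As a minimal usage sketch for the function-calling behavior, assuming the standard Qwen3 chat template accepts OpenAI-style tool schemas via the `tools` argument of `apply_chat_template` (the `get_weather` tool is illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "beyoru/Qwen3-4B-I-1509"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# One illustrative tool schema in OpenAI function-calling format.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]
messages = [{"role": "user", "content": "What's the weather in Hanoi?"}]
inputs = tokenizer.apply_chat_template(
    messages, tools=tools, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(inputs, max_new_tokens=256)
# The model should emit a structured call to get_weather here.
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```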
ThinkAgain1.5
Model detail
- Natural and smarter reasoning
- No system-prompt training
- LoRA training with rank 16 and alpha 16 (a config sketch follows below)
- Tool-calling support
- Quantized versions of this model may not achieve the best performance
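The card does not name the base checkpoint or adapter targets; here is a minimal PEFT sketch with the stated rank and alpha, where the base model id and `target_modules` are assumptions:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Matches the stated hyperparameters: LoRA rank 16, alpha 16.
config = LoraConfig(
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed targets
    task_type="CAUSAL_LM",
)
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-4B")  # placeholder base
model = get_peft_model(base, config)
model.print_trainable_parameters()
```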
kafka_sesame
Cery-rc
ThinkAgain1.6-S2
Model detail
- No system-prompt training
- LoRA training with rank 64 and alpha 128
- Tool-calling support
MCQ-o1-1
Notes:
- For small datasets with narrow content where the model already performs well in our domain, and we don't want it to forget that knowledge => just focus on o (see the adapter sketch below).
- Fine-tuned LoRA with rank = 1 and alpha = 1, 1 epoch, linear schedule (optim)
- DoRA improvement
- Increasing the rank can help the model produce more robust structure.
- Try more efficient fine-tuning
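A minimal PEFT sketch of the adapter described above; `use_dora=True` is how PEFT exposes the DoRA variant, and targeting only the attention output projection is an assumption based on "focus on o":

```python
from peft import LoraConfig

# Rank-1, alpha-1 adapter as stated on the card; use_dora=True enables
# the DoRA improvement mentioned above.
config = LoraConfig(
    r=1,
    lora_alpha=1,
    use_dora=True,
    target_modules=["o_proj"],  # assumption: "o" read as the output projection
    task_type="CAUSAL_LM",
)
```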