beyoru
Qwen3-4B-I-1209
## Model Overview
- Base Model: Qwen3-4B-Instruct-2507
- Training Method: Reinforcement Learning (GRPO) with multiple reward functions

This model (`Qwen3-4B-I-1209`) is fine-tuned for tool use and function-call generation.

## Reward Functions
The model was trained with multi-signal rewards:
1. Rule-based Reward: checks the correctness of the function-call name and arguments, with partial credit for matching subsets of arguments (see the sketch at the end of this card).
2. Self-Certainty Reward: encourages confident predictions.
3. Tool-Call Reward: validates structural correctness.

## Training Configuration
- Optimizer: AdamW
- Learning Rate: 5e-6 with cosine decay (`min_lr_rate=0.1`)
- Scheduler: `cosine_with_min_lr`
- Generations per Prompt: 4

## ACEBench
| Model | Overall Accuracy |
|---------------------------------|------------------|
| Qwen3-4B-I-1209 | 0.7233 |
| Qwen3-4B-Instruct-2507 (base) | 0.635 |
| Salesforce/Llama-xLAM-2-8b-fc-r | 0.5792 |

## Contribute
Contributions to this model, and feedback on its performance and quality, are very welcome.

## Citation
If you use this model in your research or application, please cite:
```bibtex
@misc{qwen3-4b-i-1209,
  title = {Qwen3-4B-I-1209: Fine-tuned Qwen3-4B-Instruct with GRPO for Tool-Use and Function Calling},
  author = {Beyoru},
  year = {2025},
  howpublished = {\url{https://huggingface.co/beyoru/Qwen3-4B-I-1509}}
}
```
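The exact reward implementations for Qwen3-4B-I-1209 are not published. As a rough illustration, here is a minimal Python sketch of the rule-based reward described above, assuming the model emits a JSON function call with `name` and `arguments` fields and the reference call is a dict of the same shape (all names and the 0.5/0.5 split are illustrative):

```python
import json

def rule_based_reward(predicted: str, reference: dict) -> float:
    """Score a generated function call against a reference call.

    Full credit requires the correct function name; arguments then earn
    partial credit for each key/value pair matching the reference.
    """
    try:
        call = json.loads(predicted)
    except json.JSONDecodeError:
        return 0.0  # unparseable output earns nothing
    if call.get("name") != reference["name"]:
        return 0.0
    ref_args = reference.get("arguments", {})
    if not ref_args:
        return 1.0  # correct name, no arguments expected
    matched = sum(
        1 for k, v in ref_args.items()
        if call.get("arguments", {}).get(k) == v
    )
    # half credit for the right function, half scaled by matched arguments
    return 0.5 + 0.5 * matched / len(ref_args)
```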
Luna-Ethos
Belle-VLM
Luna-SRSA-Uncensored
EvolLLM
Omni-PLAnh-exp
Qwen3-0.9B-A0.6B-Coder
MinCoder-1.5B-Expert
WinterCode-2610
Tama
QwenSQL-2E
MinCoder-4B-Expert
This model is a fine-tuned Qwen model trained with a custom reinforcement-learning (RL) framework that rewards it for producing solutions that pass automated test cases, similar to how programming tasks are evaluated on LeetCode. Instead of relying on labeled ground-truth answers, the model learns through test-case-based rewards, promoting generalization and reasoning ability in algorithmic problem solving.
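The training harness is not published; as a minimal sketch, a test-case reward could execute each candidate solution as a standalone Python script against (stdin, expected_stdout) pairs (the function name and 5-second timeout are assumptions):

```python
import os
import subprocess
import tempfile

def test_case_reward(solution: str, test_cases: list[tuple[str, str]]) -> float:
    """Fraction of test cases a generated solution passes.

    Each test case is (stdin, expected_stdout); the candidate program is
    run as a standalone script with a timeout, LeetCode-style.
    """
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(solution)
        path = f.name
    passed = 0
    try:
        for stdin, expected in test_cases:
            try:
                result = subprocess.run(
                    ["python", path], input=stdin,
                    capture_output=True, text=True, timeout=5,
                )
            except subprocess.TimeoutExpired:
                continue  # a timeout counts as a failed test
            if result.stdout.strip() == expected.strip():
                passed += 1
    finally:
        os.unlink(path)
    return passed / len(test_cases) if test_cases else 0.0
```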
Luna-7B-A4B
Evol-Aes-Hybrid-4B
Asita-8B
Spark-TTS-0.5B-with-KafkaSpark
Qwen3-CoderSmall
MinCoder-4B-Exp
Luna
Luna is a conversational AI model designed for immersive roleplay (RP) and natural chatting. It is fine-tuned to respond in a more engaging, character-driven style than standard instruction-tuned models.

Notes:
- Optimized for roleplay-style conversations
- Flexible: can be used for creative writing, storytelling, or character interactions
- For best performance, describe your character in the system prompt (see the usage sketch below)

Fixes:
- 04/09: now using the old chat template
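The card doesn't include usage code; here is a minimal sketch with 🤗 Transformers, assuming the model loads as a standard causal LM (the repo id `beyoru/Luna` and the character prompt are illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "beyoru/Luna"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [
    # Describe your character in the system prompt for best results.
    {"role": "system", "content": "You are Luna, a witty, warm-hearted "
     "space-station engineer. Stay in character and reply in first person."},
    {"role": "user", "content": "Luna, the reactor is acting up again."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```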
Lime
Qwen3-MaCoTo
Lunaa
EvolLLM-Linh
firefly_sesame
Browser-mini
Luna-Fusion-RP
Luna is a conversational AI model designed for immersive roleplay (RP) and natural chatting. It is fine-tuned to respond in a more engaging, character-driven style than standard instruction-tuned models. This is a merged version of all my RP models, combining them with an evolutionary merging method on the base model together with a creative-writing dataset. It was inspired by how well model merging improved performance in my tests.

Notes:
- Optimized for roleplay-style conversations
- Flexible: can be used for creative writing, storytelling, or character interactions
- For best performance, describe your character in the system prompt.
- Three models were merged (a rough merge sketch follows below).
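The merge recipe is not published. As a rough illustration, here is a minimal weight-averaging sketch in Python; the checkpoint ids are placeholders, and the evolutionary step that searches over per-model weights against a fitness metric is omitted:

```python
import torch
from transformers import AutoModelForCausalLM

# Hypothetical checkpoint ids; the actual three RP models are not listed.
checkpoints = ["rp-model-a", "rp-model-b", "rp-model-c"]
weights = [1 / 3, 1 / 3, 1 / 3]  # evolutionary search would tune these

models = [AutoModelForCausalLM.from_pretrained(c, torch_dtype=torch.float32)
          for c in checkpoints]
state_dicts = [m.state_dict() for m in models]

# Overwrite the first model's parameters with the weighted average.
merged = models[0]
with torch.no_grad():
    for name, param in merged.named_parameters():
        param.copy_(sum(w * sd[name] for w, sd in zip(weights, state_dicts)))
merged.save_pretrained("Luna-Fusion-RP")
```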
Cery-rc-M
Tama-JP-beta
Tama is a conversational AI model designed for immersive NSFW roleplay (RP) and natural chatting. Tama-beta is continually pretrained on the Qwen3-4B model.

Notes:
- Optimized for roleplay-style conversations
- Warning: this model can generate NSFW content
- Describe your character in the system prompt.
Cery-rc-High
Qwen3-4B-I-1509
## Model Overview
- Base Model: Qwen3-4B-Instruct-2507
- Training Method: Reinforcement Learning (GRPO) with multiple reward functions

This model (`Qwen3-4B-I-1509`) is fine-tuned for tool use and function-call generation (a hedged usage sketch appears at the end of this card).

## Reward Functions
The model was trained with multi-signal rewards:
1. Rule-based Reward: checks the correctness of the function-call name and arguments, with partial credit for matching subsets of arguments.
2. Self-Certainty Reward: encourages confident predictions.
3. Tool-Call Reward: validates structural correctness.

## Training Configuration
- Optimizer: AdamW
- Learning Rate: 5e-6 with cosine decay (`min_lr_rate=0.1`)
- Scheduler: `cosine_with_min_lr`
- Generations per Prompt: 4

## Evaluation
Important note: why are these scores lower than in the technical report? Hardware limits forced a reduced max-token budget during evaluation, applied to both models; I use the same configuration for every model I evaluate, whether larger than or the same size as this one.

| Model | Airline | Retail |
|-----------------|---------|--------|
| Qwen3-4B-I-1509 | 0.2800 | 0.2783 |
| Base Model | 0.3000 | 0.2261 |

### ACEBench
| Model | Overall Accuracy |
|---------------------------------|------------------|
| Qwen3-4B-I-1509 | 0.677 |
| Qwen3-4B-Instruct-2507 (base) | 0.635 |
| Salesforce/Llama-xLAM-2-8b-fc-r | 0.5792 |

## Contribute
Contributions to this model, and feedback on its performance and quality, are very welcome.

## Citation
If you use this model in your research or application, please cite:
```bibtex
@misc{qwen3-4b-i-1509,
  title = {Qwen3-4B-I-1509: Fine-tuned Qwen3-4B-Instruct with GRPO for Tool-Use and Function Calling},
  author = {Beyoru},
  year = {2025},
  howpublished = {\url{https://huggingface.co/beyoru/Qwen3-4B-I-1509}}
}
```
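As a minimal usage sketch for the function-calling behavior, assuming the standard Qwen3 chat template accepts OpenAI-style tool schemas via the `tools` argument of `apply_chat_template` (the `get_weather` tool is illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "beyoru/Qwen3-4B-I-1509"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# One illustrative tool schema in OpenAI function-calling format.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]
messages = [{"role": "user", "content": "What's the weather in Hanoi?"}]
inputs = tokenizer.apply_chat_template(
    messages, tools=tools, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(inputs, max_new_tokens=256)
# The model should emit a structured call to get_weather here.
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```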
ThinkAgain1.5
Model detail
- Natural and smarter reasoning
- No system-prompt training
- LoRA training with rank 16 and alpha 16 (a config sketch follows below)
- Tool-calling support
- Quantized versions of this model may not achieve the best performance
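The card does not name the base checkpoint or adapter targets; here is a minimal PEFT sketch with the stated rank and alpha, where the base model id and `target_modules` are assumptions:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Matches the stated hyperparameters: LoRA rank 16, alpha 16.
config = LoraConfig(
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed targets
    task_type="CAUSAL_LM",
)
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-4B")  # placeholder base
model = get_peft_model(base, config)
model.print_trainable_parameters()
```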
kafka_sesame
Cery-rc
ThinkAgain1.6-S2
Model detail
- No system-prompt training
- LoRA training with rank 64 and alpha 128
- Tool-calling support
MCQ-o1-1
Notes:
- For small datasets with narrow content where the model already performs well in our domain, and we don't want it to forget that knowledge => just focus on o (see the adapter sketch below).
- Fine-tuned LoRA with rank = 1 and alpha = 1, 1 epoch, linear schedule (optim)
- DoRA improvement
- Increasing the rank can help the model produce more robust structure.
- Try more efficient fine-tuning
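A minimal PEFT sketch of the adapter described above; `use_dora=True` is how PEFT exposes the DoRA variant, and targeting only the attention output projection is an assumption based on "focus on o":

```python
from peft import LoraConfig

# Rank-1, alpha-1 adapter as stated on the card; use_dora=True enables
# the DoRA improvement mentioned above.
config = LoraConfig(
    r=1,
    lora_alpha=1,
    use_dora=True,
    target_modules=["o_proj"],  # assumption: "o" read as the output projection
    task_type="CAUSAL_LM",
)
```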