beyoru

38 models • 2 total models in database

Qwen3-4B-I-1209

🧾 Model Overview
- 🏗️ Base Model: Qwen3-4B-Instruct-2507
- 🎯 Training Method: Reinforcement Learning (GRPO) with multiple reward functions

This model (`Qwen3-4B-I-1209`) is fine-tuned for 🔧 tool use and 📞 function-call generation.

🏆 Reward Functions

The model was trained with multi-signal rewards:
1. 📏 Rule-based Reward: ✔️ checks correctness of the function-call name and arguments. ➕ Partial credit for matching subsets of arguments.
2. 🔒 Self-Certainty Reward: ⚡ encourages confident predictions.
3. 🔧 Tool-Call Reward: ✅ validates structural correctness.

⚙️ Training Configuration
- ⚡ Optimizer: AdamW
- 📉 Learning Rate: 5e-6 with cosine decay (`min_lr_rate=0.1`)
- ⏳ Scheduler: `cosine_with_min_lr`
- 🔄 Generations per Prompt: 4

ACEBench

| Model                           | Overall Accuracy |
|---------------------------------|------------------|
| Qwen3-4B-I-1209                 | 0.7233           |
| Qwen3-4B-Instruct-2507 (base)   | 0.635            |
| Salesforce/Llama-xLAM-2-8b-fc-r | 0.5792           |

Contribute: Contributions to this model and feedback on its performance and quality are welcome.

📖 Citation

If you use this model in your research or application, please cite:

```bibtex
@misc{qwen3-4b-i-1209,
  title = {Qwen3-4B-I-1209: Fine-tuned Qwen3-4B-Instruct with GRPO for Tool-Use and Function Calling},
  author = {Beyoru},
  year = {2025},
  howpublished = {\url{https://huggingface.co/beyoru/Qwen3-4B-I-1509}}
}
```
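The rule-based reward above (exact name match, partial credit for matching argument subsets) can be sketched as a small scoring function. This is a hypothetical illustration under assumed field names (`name`, `arguments`) and an assumed 50/50 split between name and argument credit; it is not the model's actual training code:

```python
def rule_based_reward(predicted: dict, expected: dict) -> float:
    """Score a predicted function call against the expected one.

    Full credit requires the correct function name; argument credit is
    proportional to the subset of expected arguments reproduced exactly.
    (Hypothetical sketch; the real reward weighting is not published.)
    """
    if predicted.get("name") != expected.get("name"):
        return 0.0  # wrong tool called: no credit at all
    exp_args = expected.get("arguments", {})
    if not exp_args:
        return 1.0  # no arguments to check
    pred_args = predicted.get("arguments", {})
    matched = sum(1 for k, v in exp_args.items() if pred_args.get(k) == v)
    # assumed split: 0.5 for the correct name, 0.5 scaled by matched args
    return 0.5 + 0.5 * matched / len(exp_args)
```

With this shape, a call that names the right tool but gets one of two arguments wrong still earns 0.75, which is the "partial credit for matching subsets" behavior the card describes.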

license:apache-2.0
3,154
0

Luna-Ethos

license:apache-2.0
262
0

Belle-VLM

license:apache-2.0
143
0

Luna-SRSA-Uncensored

license:mit
100
1

EvolLLM

—
69
2

Omni-PLAnh-exp

—
66
0

Qwen3-0.9B-A0.6B-Coder

license:apache-2.0
66
0

MinCoder-1.5B-Expert

license:apache-2.0
42
0

WinterCode-2610

—
35
0

Tama

—
34
10

QwenSQL-2E

—
33
0

MinCoder-4B-Expert

This model is a Qwen model fine-tuned using a custom reinforcement learning (RL) framework that rewards the model for producing solutions that pass automated test cases, similar to how programming tasks are evaluated on LeetCode. Instead of relying on labeled ground-truth answers, the model learns through test-case-based rewards, promoting generalization and reasoning ability in algorithmic problem solving.

license:apache-2.0
31
1

Luna-7B-A4B

license:mit
26
0

Evol-Aes-Hybrid-4B

license:apache-2.0
19
0

Asita-8B

llama
18
0

Spark-TTS-0.5B-with-KafkaSpark

—
15
2

Qwen3-CoderSmall

—
14
2

MinCoder-4B-Exp

license:apache-2.0
14
1

Luna

Luna is a conversational AI model designed for immersive roleplay (RP) and natural chatting. It is fine-tuned to respond in a more engaging, character-driven style than standard instruction-tuned models.

Notes:
- Optimized for roleplay-style conversations
- Flexible: can be used for creative writing, storytelling, or character interactions
- For best performance, describe your character in the system prompt.

Fix:
- Using the old chat template (04/09)

license:mit
12
12

Lime

—
12
4

Qwen3-MaCoTo

—
11
0

Lunaa

license:mit
9
6

EvolLLM-Linh

—
6
3

firefly_sesame

license:apache-2.0
5
1

Browser-mini

—
5
0

Luna-Fusion-RP

Luna is a conversational AI model designed for immersive roleplay (RP) and natural chatting. It is fine-tuned to respond in a more engaging, character-driven style than standard instruction-tuned models. This is a merge of all the RP models, built from the base model with an evolutionary merging method and the CREATIVE WRITING dataset. It was inspired by the success of model merging at improving performance in my tests.

Notes:
- Optimized for roleplay-style conversations
- Flexible: can be used for creative writing, storytelling, or character interactions
- For best performance, describe your character in the system prompt.
- Three models were merged.

license:mit
4
4

Cery-rc-M

license:apache-2.0
4
0

Tama-JP-beta

Tama is a conversational AI model designed for immersive NSFW roleplay (RP) and natural chatting. Tama-beta is continually pretrained from the Qwen3-4B model.

Notes:
- Optimized for roleplay-style conversations
- Warning: this model can generate NSFW content
- Describe your character in the system prompt.

license:mit
3
2

Cery-rc-High

license:apache-2.0
3
0

Qwen3-4B-I-1509

🧾 Model Overview
- 🏗️ Base Model: Qwen3-4B-Instruct-2507
- 🎯 Training Method: Reinforcement Learning (GRPO) with multiple reward functions

This model (`Qwen3-4B-I-1509`) is fine-tuned for 🔧 tool use and 📞 function-call generation.

🏆 Reward Functions

The model was trained with multi-signal rewards:
1. 📏 Rule-based Reward: ✔️ checks correctness of the function-call name and arguments. ➕ Partial credit for matching subsets of arguments.
2. 🔒 Self-Certainty Reward: ⚡ encourages confident predictions.
3. 🔧 Tool-Call Reward: ✅ validates structural correctness.

⚙️ Training Configuration
- ⚡ Optimizer: AdamW
- 📉 Learning Rate: 5e-6 with cosine decay (`min_lr_rate=0.1`)
- ⏳ Scheduler: `cosine_with_min_lr`
- 🔄 Generations per Prompt: 4

Important notes:
- Why are the scores lower than in the technical report? Hardware limits required reducing the max-token budget during evaluation for both models. I use the same evaluation configuration for every model I review, whether larger or the same size.

| 🧠 Model          | ✈️ Airline | 🛍️ Retail |
|-------------------|------------|------------|
| Qwen3-4B-I-1509   | 0.2800     | 0.2783     |
| Base Model        | 0.3000     | 0.2261     |

ACEBench

| Model                           | Overall Accuracy |
|---------------------------------|------------------|
| Qwen3-4B-I-1509                 | 0.677            |
| Qwen3-4B-Instruct-2507 (base)   | 0.635            |
| Salesforce/Llama-xLAM-2-8b-fc-r | 0.5792           |

Contribute: Contributions to this model and feedback on its performance and quality are welcome.

📖 Citation

If you use this model in your research or application, please cite:

```bibtex
@misc{qwen3-4b-i-1509,
  title = {Qwen3-4B-I-1509: Fine-tuned Qwen3-4B-Instruct with GRPO for Tool-Use and Function Calling},
  author = {Beyoru},
  year = {2025},
  howpublished = {\url{https://huggingface.co/beyoru/Qwen3-4B-I-1509}}
}
```
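The self-certainty reward is only named in the card, not specified. One plausible reading, which GRPO-style pipelines sometimes use, is rewarding peaked (low-entropy) per-token output distributions. A hypothetical sketch under that assumption (the function name and input format are mine, not the model's training code):

```python
import math

def self_certainty_reward(token_dists):
    """Average negative entropy of the model's per-token distributions.

    Input: one probability list per generated token. Peaked (confident)
    distributions score higher than flat ones. Hypothetical sketch of a
    "self-certainty" signal; the actual reward used is not published.
    """
    def neg_entropy(probs):
        # negative Shannon entropy: 0 for a one-hot, very negative for flat
        return sum(p * math.log(p) for p in probs if p > 0)
    return sum(neg_entropy(d) for d in token_dists) / len(token_dists)
```

A confident prediction like `[0.97, 0.01, 0.01, 0.01]` scores strictly higher than a uniform `[0.25, 0.25, 0.25, 0.25]`, so maximizing this term pushes the policy toward decisive tool-call tokens.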

license:apache-2.0
2
2

ThinkAgain1.5

Model details:
- More natural, smarter reasoning
- No system-prompt training
- LoRA training with rank 16 and alpha 16
- Tool-calling support
- Quantized versions of this model may not achieve the best performance

license:apache-2.0
2
2

kafka_sesame

license:apache-2.0
2
0

Cery-rc

license:apache-2.0
2
0

ThinkAgain1.6-S2

Model details:
- No system-prompt training
- LoRA training with rank 64 and alpha 128
- Tool-calling support

—
1
2

MCQ-o1-1

Notes:
- For small datasets with narrow content, where the model already performs well on our domain and we don't want it to forget that knowledge => just need to focus on o.
- Fine-tuned with LoRA: rank = 1, alpha = 1, 1 epoch, linear schedule (optim)
- DoRA improvement
- Increasing the rank can help the model produce more robust structure.
- Try more efficient fine-tuning

license:apache-2.0
1
1

Spark-TTS-0.5B-with-FireflySpark

license:cc-by-nc-sa-4.0
1
0

BronCode-Thinker

license:apache-2.0
0
2

Luna-SRSA-Uncensored-gguf

license:mit
0
1