yasserrmd

117 models • 7 total models in database

Sort by:

Coder-GRPO-3B

Developer: `yasserrmd` Base model: `Qwen/Qwen2.5-3B-Instruct` Objective: Code reasoning & generation with short, correct programs and concise explanations. License: Apache-2.0 Dataset: `glaiveai/glaive-code-assistant` This model was fine-tuned with GRPO (Group Relative Policy Optimization) using Unsloth + TRL, targeting high-signal code tasks (write, refactor, explain, fix). Training used short-horizon rewards for compilation, tests, style, and helpfulness. Unsloth enabled faster, memory-efficient training on consumer GPUs. Code generation & refactoring Bug fixing with minimal diffs Explaining code clearly and concisely Writing tests & docstrings Lightweight agent/tool use (function calling) Not intended for: high-risk domains, hidden system development, or tasks requiring guaranteed security review. Method: GRPO via TRL (policy improves relative to group baseline) Frameworks: Unsloth + TRL + Hugging Face Transformers Data: `glaiveai/glaive-code-assistant` (code tasks, stepwise targets) Losses/Rewards (examples): ✅ Compiles / passes simple unit checks ✅ Minimal, correct diffs ✅ No secrets / unsafe code patterns ✅ Concise, actionable explanations > This README summarizes the setup; adapt hyperparameters to your hardware and target tasks. Chat Template (ChatML, Qwen-style) + System Instruction with ` ` > The ` ` block is used as an internal scratchpad. The model is asked to never reveal it. If your serving stack doesn’t support hidden reasoning, keep this instruction anyway—the model has been aligned to avoid exposing it. Stop generation when your serving stack detects end of answer, or add ` `. The model avoids revealing hidden reasoning: never output the ` ` content. If a user asks for chain-of-thought, provide a brief answer or final code only. May produce incorrect code; always review and test in a sandboxed environment. Avoids secrets, credentials, and unsafe instructions (e.g., malware).

yasserrmd

Coder-GRPO-3B

Gpt Oss Coder 20b

DeepSeek-R1-Distill-Qwen-1.5B-gguf

MedScholar 1.5B Gguf

MediPhi-Instruct-gguf

kallamni-4b-v1

LFM2-350M-gguf

ReaderLM-v2-gguf

DeepSeek-7B-1M-gguf

Arch-Router-1.5B-gguf

NextCoder-7B-gguf

LFM2-1.2B-gguf

AgenticCoder-4B-gguf

deepseek-esg-assistant

Text2SQL-1.5B

AgentUX-4B-gguf

EXAONE-4.0-1.2B-gguf

DentaInstruct-1.2B

smollm3-gguf

GemmaECG-Vision-gguf

kallamni-4b-v1-gguf

qwen-reasoning

caselaw-cpt-8b-gguf

Kimina-Prover-Distill-1.7B-gguf

MedScholar-Reasoning-1.5B-gguf

Midm-2.0-Mini-Instruct-gguf

OpenReasoning-Nemotron-1.5B-gguf

qwen2.5-html-0.5b-gguf

OCRFlux-3B-gguf

Qwen2.5-7B-Instruct-1M-gguf

Yehia-7B-preview-gguf

LFM2-700M-gguf

WebSailor-3B-gguf

II-Medical-8B-1706-gguf

UIGEN-X-8B-gguf

ERNIE-4.5-0.3B-gguf

kallamni-2.6b-v1

CoALM-8B-gguf

Solidity-LLM-gguf

A.X-3.1-Light-gguf

Fanar-1-9B-Instruct-gguf

granite-embedding-r2-onnx

SmallThinker-3B-Preview-gguf

gemma-3-1b-it-GGUF

Aryabhata-1.0-gguf

psychiatry-gemma-300m-emb

qwen3-4b-agentic-reasoner

kallamni-1.2b-v1

GeoScholar-QA-1.2B

Seed-X-Instruct-7B-gguf

Llada 346m

Kallamni Embed V1

Text2SQL-1.5B-gguf

Neuro-Orchestrator-8B

OphthaScholar-1.2B

geo-gemma-300m-emb

ConstructionSafetyQA-1.2B-V1

caselaw-cpt-8b

Human-Like-Qwen2.5-1.5B-Instruct-gguf

Seed-X-PPO-7B-gguf

DeepScaleR-1.5B-Preview-gguf

Seed-X-RM-7B-gguf

diffusion-text-demo

SoftwareArchitecture-Instruct-v1

DentaInstruct-1.2B-gguf

SciReason-LFM2-2.6B

GLM4.7-Distill-LFM2.5-1.2B

PharmaQA-1.2B

GemmaECG-Vision

PharmaQA-270M

mcp-instruct-v1

kallamni-700m-v1-lora

SinaReason-Magistral-2509

MegaSciMoE 1.2B

MedScholar-1.5B

qwen3-4b-agentic-reasoner-gguf

oncology-gemma-300m-emb

endocrinology-gemma-300m-emb

kallamni-1.2b-v1-gguf