yasserrmd
Coder-GRPO-3B
- Developer: `yasserrmd`
- Base model: `Qwen/Qwen2.5-3B-Instruct`
- Objective: code reasoning and generation, producing short, correct programs with concise explanations
- License: Apache-2.0
- Dataset: `glaiveai/glaive-code-assistant`

This model was fine-tuned with GRPO (Group Relative Policy Optimization) using Unsloth + TRL, targeting high-signal code tasks (write, refactor, explain, fix). Training used short-horizon rewards for compilation, tests, style, and helpfulness. Unsloth enabled faster, memory-efficient training on consumer GPUs.

Intended uses:

- Code generation & refactoring
- Bug fixing with minimal diffs
- Explaining code clearly and concisely
- Writing tests & docstrings
- Lightweight agent/tool use (function calling)

Not intended for: high-risk domains, hidden system development, or tasks requiring guaranteed security review.

Training setup:

- Method: GRPO via TRL (the policy improves relative to a group baseline)
- Frameworks: Unsloth + TRL + Hugging Face Transformers
- Data: `glaiveai/glaive-code-assistant` (code tasks, stepwise targets)
- Rewards (examples):
  - ✅ Compiles / passes simple unit checks
  - ✅ Minimal, correct diffs
  - ✅ No secrets / unsafe code patterns
  - ✅ Concise, actionable explanations

> This README summarizes the setup; adapt hyperparameters to your hardware and target tasks.

Chat template: ChatML (Qwen-style), plus a system instruction that permits a hidden reasoning block.

> The hidden reasoning block is used as an internal scratchpad, and the model is asked never to reveal it. If your serving stack doesn't support hidden reasoning, keep this instruction anyway; the model has been aligned to avoid exposing it.

Stop generation when your serving stack detects the end of the answer, or add an explicit stop sequence.

Limitations & safety:

- The model avoids revealing hidden reasoning and never outputs the scratchpad content. If a user asks for chain-of-thought, it provides a brief answer or final code only.
- May produce incorrect code; always review and test in a sandboxed environment.
- Avoids secrets, credentials, and unsafe instructions (e.g., malware).
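The reward examples listed above can be sketched as a toy scoring function. This is purely illustrative; the actual GRPO reward functions used for this model are not published, and `code_reward` is a hypothetical name:

```python
# Toy version of the reward signals listed above (hypothetical; the real
# GRPO reward functions for this model are not published).
import ast

def code_reward(completion: str) -> float:
    """Score a candidate Python completion on compile/safety/concision signals."""
    score = 0.0
    try:
        ast.parse(completion)   # stand-in for "compiles / passes simple checks"
        score += 1.0
    except SyntaxError:
        pass
    if any(tok in completion.lower() for tok in ("api_key", "password=")):
        score -= 1.0            # penalize secret-like patterns
    if len(completion.splitlines()) <= 30:
        score += 0.5            # mild bonus for concision
    return score
```

In real GRPO training, a batch of sampled completions would be scored this way and each completion's advantage computed relative to the group mean.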
GPT-OSS Coder 20B
This model is a fine-tuned version of OpenAI's GPT-OSS-20B, optimized for code generation tasks. Fine-tuning used only the Unsloth library, enabling efficient low-bit quantized training and inference.

- Base model: openai/gpt-oss-20b
- Training framework: Hugging Face's TRL library combined with Unsloth optimizations
- Training data: 1 million randomly generated records, trained for 150 steps

Intended uses:

- Code generation and completion
- Programming query answering
- Code summarization

The `reasoning_effort` parameter influences the model's focus during text generation:

- `low`: produces straightforward, concise answers suitable for simple coding tasks
- `medium`: balances speed and detail, suitable for moderately complex tasks
- `high`: encourages detailed, complex reasoning, useful for advanced code generation or explanations

Adjusting this parameter lets you control the depth of the model's reasoning process, balancing performance against response complexity.
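As a rough illustration of how the three effort levels might map to decoding settings on the client side, here is a hypothetical helper. The preset values are assumptions, not published defaults; in serving stacks that support it, `reasoning_effort` is typically passed through the model's chat template instead.

```python
# Hypothetical client-side presets for the three reasoning_effort levels;
# the numeric values are assumptions, not published model defaults.
EFFORT_PRESETS = {
    "low":    {"max_new_tokens": 256,  "temperature": 0.2},
    "medium": {"max_new_tokens": 512,  "temperature": 0.5},
    "high":   {"max_new_tokens": 1024, "temperature": 0.7},
}

def generation_kwargs(reasoning_effort: str = "medium") -> dict:
    """Map an effort level to generate() keyword arguments."""
    if reasoning_effort not in EFFORT_PRESETS:
        raise ValueError(f"unknown reasoning_effort: {reasoning_effort!r}")
    return dict(EFFORT_PRESETS[reasoning_effort])
```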
DeepSeek-R1-Distill-Qwen-1.5B-gguf
MedScholar-1.5B-gguf
MediPhi-Instruct-gguf
kallamni-4b-v1
LFM2-350M-gguf
ReaderLM-v2-gguf
DeepSeek-7B-1M-gguf
Arch-Router-1.5B-gguf
NextCoder-7B-gguf
LFM2-1.2B-gguf
AgenticCoder-4B-gguf
deepseek-esg-assistant
Text2SQL-1.5B
Overview

Text2SQL-1.5B is a natural-language-to-SQL model that converts user queries into structured SQL statements. It supports complex multi-table queries and aims for high accuracy in text-to-SQL conversion.

System Instruction

To keep model outputs consistent, use the following system instruction:

> Always separate code and explanation. Return SQL code in a separate block, followed by the explanation in a separate paragraph. Use markdown triple backticks (```sql for SQL) to format the code properly. Write the SQL query first in a separate code block. Then, explain the query in plain text. Do not merge them into one response. The query should always include the table structure using a CREATE TABLE statement before executing the main SQL query.

Using the Model for Text-to-SQL Conversion

The following code shows how to convert a natural-language query into SQL (the `yasserrmd/Text2SQL-1.5B` repo id is assumed from this card):

````python
from transformers import pipeline

pipe = pipeline("text-generation", model="yasserrmd/Text2SQL-1.5B")

system_instruction = (
    "Always separate code and explanation. Return SQL code in a separate block, "
    "followed by the explanation in a separate paragraph. Use markdown triple "
    "backticks (```sql for SQL) to format the code properly. Write the SQL query "
    "first in a separate code block. Then, explain the query in plain text. "
    "Do not merge them into one response. The query should always include the "
    "table structure using a CREATE TABLE statement before executing the main "
    "SQL query."
)

# Define the user query (schema included, as the system instruction requires)
user_query = """Show the total sales for each customer who has spent more than $50,000.

CREATE TABLE sales (
    id INT PRIMARY KEY,
    customer_id INT,
    total_amount DECIMAL(10,2),
    FOREIGN KEY (customer_id) REFERENCES customers(id)
);

CREATE TABLE customers (
    id INT PRIMARY KEY,
    name VARCHAR(255)
);
"""

# Define messages for input
messages = [
    {"role": "system", "content": system_instruction},
    {"role": "user", "content": user_query},
]

# Print the generated SQL query
response = pipe(messages, max_new_tokens=512)
print(response[0]["generated_text"])
````

- Developed by: yasserrmd
- License: apache-2.0
- Fine-tuned from model: unsloth/qwen2.5-coder-1.5b-instruct-bnb-4bit

This qwen2 model was trained 2x faster with Unsloth and Hugging Face's TRL library.
AgentUX-4B-gguf
EXAONE-4.0-1.2B-gguf
DentaInstruct-1.2B
smollm3-gguf
This is a quantized GGUF version of HuggingFaceTB/SmolLM3-3B-Base, optimized for fast, local inference using `llama.cpp`, `llm-gguf`, or `Ollama`. For training details, tokenizer, chat format, and architecture: 👉 SmolLM3‑3B‑Base on Hugging Face
GemmaECG-Vision-gguf
kallamni-4b-v1-gguf
qwen-reasoning
caselaw-cpt-8b-gguf
- Developed by: yasserrmd
- License: apache-2.0
- Fine-tuned from model: yasserrmd/caselaw-cpt-8b

This llama model was trained 2x faster with Unsloth and Hugging Face's TRL library.
Kimina-Prover-Distill-1.7B-gguf
MedScholar-Reasoning-1.5B-gguf
Midm-2.0-Mini-Instruct-gguf
OpenReasoning-Nemotron-1.5B-gguf
qwen2.5-html-0.5b-gguf
OCRFlux-3B-gguf
Qwen2.5-7B-Instruct-1M-gguf
Yehia-7B-preview-gguf
- Developed by: yasserrmd
- License: apache-2.0
- Quantized from model: Navid-AI/Yehia-7B-preview
LFM2-700M-gguf
WebSailor-3B-gguf
II-Medical-8B-1706-gguf
UIGEN-X-8B-gguf
ERNIE-4.5-0.3B-gguf
kallamni-2.6b-v1
Kallamni 2.6B v1 (كلّمني) is a 2.6B-parameter Arabic conversational model fine-tuned specifically for spoken Emirati Arabic (اللهجة الإماراتية المحكية). It is built to produce natural, fluent, culturally aligned replies for everyday chat rather than Modern Standard Arabic (MSA). This release builds on kallamni-1.2b-v1 with more capacity and better quality.

- Model type: Causal LM, instruction-tuned for chat
- Language: spoken Emirati Arabic only
- Base model: `LiquidAI/LFM2-2.6B`
- Fine-tuning: LoRA adapters, ~3 epochs
- Frameworks: Unsloth (fast, memory-efficient training) + TRL (SFTTrainer)
- Dataset: 35K synthetic Emirati Q&A pairs (instruction-style)

Dataset

- Size: ~35,000 examples
- Source: synthetic Q&A generated specifically for everyday Emirati conversational use
- Domains covered:
  - Daily life (shopping, weather, greetings, family, transport)
  - Social and cultural occasions (Eid, weddings, gatherings/majlis)
  - Home routines and casual plans
- Format: chat-style messages between user and assistant

Training

- Base model: `LiquidAI/LFM2-2.6B`
- Strategy: LoRA on attention + MLP layers
- Epochs: ~3 full passes over the 35K set
- A consistent chat template (System/User/Assistant) was used during SFT, with an Emirati-dialect system instruction to bias outputs away from MSA

Usage tips

- Always include a short Arabic system message forcing the Emirati dialect.
- Keep few-shot examples short, colloquial, and clearly Emirati (avoid MSA).
- For longer tasks, set `max_new_tokens` higher (e.g., 256–512) and lower `temperature` for stability.

Behavior

- Produces colloquial Emirati wording with consistent dialectal markers and local vocabulary
- Handles short, casual turns well and maintains a friendly tone across multi-turn chat
- Occasionally mixes in MSA or generic Arabic; reinforce with a strong system message

Intended uses

- Emirati-dialect chatbots and voice assistants
- Educational tools for practicing spoken Emirati
- Research on Gulf-Arabic conversational modeling

Limitations

- May drift toward MSA or generic Arabic without a firm system prompt
- Not suitable for specialized (medical/legal/financial) advice
- Can produce incorrect or outdated facts; verify critical content

Acknowledgements

- LiquidAI for the `LFM2-2.6B` base model
- Unsloth and TRL for training tooling
- Thanks to the Arabic ML community for feedback, open resources, and evaluations

License: cc-by-nc-4.0 (non-commercial). Review the license before any commercial use.
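The usage tips above can be combined into a small helper. The repo id `yasserrmd/kallamni-2.6b-v1` and the Arabic system message are assumptions for illustration; the `transformers` import is deferred so that building messages needs no model download.

```python
# Assumed repo id and an illustrative Emirati-dialect system message.
SYSTEM_MSG = "انت مساعد ودود تتكلم باللهجة الإماراتية المحكية فقط، ولا تستخدم الفصحى."

def build_messages(user_text: str) -> list:
    """Prepend the dialect-forcing system message, as recommended above."""
    return [
        {"role": "system", "content": SYSTEM_MSG},
        {"role": "user", "content": user_text},
    ]

def chat(user_text: str) -> str:
    from transformers import pipeline  # heavy import, done lazily
    pipe = pipeline("text-generation", model="yasserrmd/kallamni-2.6b-v1")
    # Low temperature + generous max_new_tokens, per the usage tips above
    out = pipe(build_messages(user_text), max_new_tokens=256, temperature=0.3)
    return out[0]["generated_text"]
```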
CoALM-8B-gguf
Solidity-LLM-gguf
A.X-3.1-Light-gguf
Fanar-1-9B-Instruct-gguf
granite-embedding-r2-onnx
This is the INT8-quantized ONNX version of `ibm-granite/granite-embedding-english-r2`, optimized to run efficiently on CPU using 🤗 Optimum with ONNX Runtime.

- Embedding dimension: 768
- Precision: INT8 (dynamic quantization)
- Backend: ONNX Runtime
- Use case: text embeddings, semantic search, clustering, retrieval

Quantization reduces model size and speeds up CPU inference while preserving accuracy. The pooling strategy here is mean pooling; you can adapt CLS pooling or max pooling as needed. Works seamlessly with the Hugging Face Hub + `optimum.onnxruntime`.

References:

- Original Granite Embedding English R2
- Optimum ONNX Runtime docs
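The mean-pooling step mentioned above looks like this in plain NumPy, which pairs naturally with raw ONNX Runtime outputs (token embeddings plus an attention mask):

```python
# Mean pooling over token embeddings, masking out padding positions.
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """token_embeddings: (batch, seq, dim); attention_mask: (batch, seq) of 0/1."""
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero
    return summed / counts
```

To switch to CLS pooling, take `token_embeddings[:, 0]` instead; for max pooling, mask padding to `-inf` and take the max over the sequence axis.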
SmallThinker-3B-Preview-gguf
gemma-3-1b-it-GGUF
Aryabhata-1.0-gguf
psychiatry-gemma-300m-emb
SentenceTransformer based on google/embeddinggemma-300m

This is a sentence-transformers model fine-tuned from google/embeddinggemma-300m. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Description

- Model Type: Sentence Transformer
- Base model: google/embeddinggemma-300m
- Maximum Sequence Length: 2048 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face

Training Dataset

- Size: 20,000 training samples
- Columns: sentence_0 and sentence_1
- Approximate statistics based on the first 1000 samples:

| | sentence_0 | sentence_1 |
|:--------|:-----------|:-----------|
| type | string | string |
| details | min: 8 tokens, mean: 19.87 tokens, max: 52 tokens | min: 17 tokens, mean: 89.86 tokens, max: 246 tokens |

- Samples:

| sentence_0 | sentence_1 |
|:-----------|:-----------|
| What factors can influence the decision-making capacity of individuals with psychiatric illnesses? | Various factors can influence the decision-making capacity of individuals with psychiatric illnesses. Psychopathology, insight about the illness, and cognitive dysfunction can all play a role. For example, the presence of delusions may affect decision-making, and impaired insight can hinder the recognition of the need for treatment. Emergency treatments also pose challenges, and consent may need to be obtained from a nominated representative. In cases of neurocognitive disorders, a caregiver may make decisions on behalf of the patient. |
| What are some potential consequences of aggression events in inpatient psychiatry? | Aggression events in inpatient psychiatry can have several consequences. They can threaten the safety of both patients and workers, leading to the enforcement of compulsory measures such as physical restraining of the patients. Aggressive behavior can also result in physical traumas that require treatment, with some cases involving nurses as victims. Exposure to aggression can lead to severe stress and adverse psychological consequences for both patients and staff. |
| What are some barriers to interprofessional communication between nursing and medical professions? | The analysis of students' research papers identified several themes that were cited as barriers to interprofessional communication between nursing and medical professions. These barriers include neglect of social norms and values in daily communications, hierarchical differences between nursing and medical professions, academic versus apprenticeship nursing education, and stress at work due to understaffing as organizational limitations. |

- Loss: MultipleNegativesRankingLoss

Training Hyperparameters (non-default)

- `per_device_train_batch_size`: 4
- `per_device_eval_batch_size`: 4
- `num_train_epochs`: 1
- `multi_dataset_batch_sampler`: round_robin

All remaining training arguments were left at their framework defaults (`learning_rate`: 5e-05, `lr_scheduler_type`: linear, `optim`: adamw_torch_fused, `seed`: 42, etc.).

Training Logs

| Epoch | Step | Training Loss |
|:-----:|:----:|:-------------:|
| 0.1 | 500 | 0.0155 |
| 0.2 | 1000 | 0.0172 |
| 0.3 | 1500 | 0.0422 |
| 0.4 | 2000 | 0.018 |
| 0.5 | 2500 | 0.0058 |
| 0.6 | 3000 | 0.0153 |
| 0.7 | 3500 | 0.0059 |
| 0.8 | 4000 | 0.0041 |
| 0.9 | 4500 | 0.0026 |
| 1.0 | 5000 | 0.003 |

Framework Versions

- Python: 3.12.11
- Sentence Transformers: 5.1.0
- Transformers: 4.56.1
- PyTorch: 2.8.0+cu128
- Accelerate: 1.10.1
- Datasets: 4.0.0
- Tokenizers: 0.22.0
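For reference, MultipleNegativesRankingLoss treats each (sentence_0, sentence_1) pair as a positive and every other sentence_1 in the batch as a negative. A minimal NumPy sketch, using the sentence-transformers default scale of 20.0:

```python
# In-batch-negatives (MultipleNegativesRankingLoss) sketch in plain NumPy.
import numpy as np

def mnr_loss(a: np.ndarray, b: np.ndarray, scale: float = 20.0) -> float:
    """a, b: (batch, dim) L2-normalized embeddings of paired sentences."""
    scores = scale * (a @ b.T)  # (batch, batch) scaled cosine similarities
    # Row-wise log-softmax; the correct pairing sits on the diagonal
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))  # cross-entropy on the diagonal
```

Well-aligned pairs drive the diagonal scores up relative to the rest of each row, pushing the loss toward zero.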
qwen3-4b-agentic-reasoner
kallamni-1.2b-v1
Kallamni 1.2B v1 (كالّمني) is a 1.2B-parameter Arabic conversational model fine-tuned specifically for spoken Emirati Arabic (اللهجة الإماراتية المحكية). It is designed to generate natural, fluent, and culturally relevant responses for daily-life conversations rather than formal Modern Standard Arabic (MSA).

- Model type: Causal LM, instruction-tuned for chat
- Language: spoken Emirati Arabic
- Base model: lightweight 1.2B causal LM
- Fine-tuning: 3 epochs with LoRA adapters on attention + MLP layers
- Frameworks: Unsloth + TRL (SFTTrainer)
- Dataset: 12,324 synthetic Emirati Arabic Q&A pairs generated using GPT-5 and GPT-4o

Dataset

- Size: 12,324 examples
- Source: synthetic Q&A pairs created via GPT-5 + GPT-4o in Emirati dialect
- Domains covered:
  - Daily-life conversations (shopping, weather, greetings, family, transport)
  - Social and cultural events (Eid, weddings, gatherings/majlis)
  - Household and personal routines
- Format: chat-style examples with `user` / `assistant` tokens

Training

- Unsloth: optimized fine-tuning and memory efficiency, roughly 2x faster training
- TRL (SFTTrainer): supervised fine-tuning with instruction alignment
- Epochs: 3 full passes over the dataset
- Chat template applied consistently through TRL

You can load and run the model with `transformers`.

Evaluation

- Dialect accuracy: ~85% Emirati consistency
- Answer relevance: ~90% good or semi-good
- Weak cases: occasional semi-formal phrasing or generic filler
- Strengths: culturally aligned Emirati expressions, natural conversational length (8–15 words), balanced coverage of family, work, travel, and social contexts

Intended uses

- Chatbots and voice assistants for Emirati Arabic
- Language-learning tools for practicing the dialect
- A dataset building block for Gulf Arabic LLM research

Limitations

- May occasionally mix in MSA or generic Arabic
- Not suitable for factual QA outside daily conversations, nor for scientific, legal, or medical answers
- Not designed for professional or specialized contexts

Acknowledgements

- Unsloth team for efficient fine-tuning tooling
- TRL from Hugging Face for instruction-aligned training
- Synthetic dataset generation powered by GPT-5 and GPT-4o
- Liquid AI for innovative open-weight model releases that inspired experimentation
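A minimal `transformers` loading sketch, assuming the repo id `yasserrmd/kallamni-1.2b-v1`. Generation is wrapped in a function so the prompt-building helper can be used without downloading the model:

```python
# Assumed repo id for this card; to_chat() mirrors the user/assistant
# chat format the training data used.
def to_chat(user_text: str, system_text: str = None) -> list:
    """Build a chat-format message list, optionally with a system message."""
    msgs = []
    if system_text:
        msgs.append({"role": "system", "content": system_text})
    msgs.append({"role": "user", "content": user_text})
    return msgs

def generate(user_text: str) -> str:
    from transformers import pipeline  # deferred: needs network + weights
    pipe = pipeline("text-generation", model="yasserrmd/kallamni-1.2b-v1")
    out = pipe(to_chat(user_text), max_new_tokens=128)
    return out[0]["generated_text"]
```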
GeoScholar-QA-1.2B
[CC-BY 4.0](https://creativecommons.org/licenses/by/4.0/) · [GeoGPT-QA dataset](https://huggingface.co/datasets/GeoGPT-Research-Project/GeoGPT-QA)

GeoScholar-QA is a large language model fine-tuned for academic question answering in geoscience. It is built upon the Liquid AI LFM2 base model and trained with the Unsloth framework on the GeoGPT-QA dataset. Its primary strength is explaining concepts and theories within the earth sciences; it is not designed to provide consistently accurate statistics or citations.

Dataset

- Name: GeoGPT-QA
- Publisher / Project: GeoGPT-Research-Project
- Size: approximately 41,400 rows in the `train` split
- Format: tabular (originally CSV, automatically converted to Parquet), with fields such as `question`, `answer`, `title`, `authors`, `doi`, `journal`, `volume`, `pages`, and `license`
- Language: English
- License: CC-BY 4.0; you may share and adapt the dataset, but you must provide attribution and indicate if any changes were made

Training

- Base model: Liquid AI LFM2, a hybrid model designed for on-device deployment that balances quality, speed, and memory efficiency
- Fine-tuning framework: Unsloth, which speeds up and optimizes LLM fine-tuning on limited hardware
- Training data: GeoGPT-QA
- Objective: Supervised Fine-Tuning (SFT) on question-answer pairs, focused on geoscience theory, conceptual knowledge, and explanations
- Effective batch size: a low per-device batch size with gradient accumulation to avoid out-of-memory (OOM) errors
- Training progress: approximately 3,000 steps, covering about 58% of the dataset

Intended uses

- Academic explanations across geoscience fields such as plate tectonics, hydrology, and geomorphology
- A teaching and learning aid for students and educators
- Strengthening conceptual and theoretical understanding of earth-science principles

Out of scope

- High-risk or decision-making tasks that require precise numerical data or statistics
- Reliance on generated citations or study results without independent verification

Limitations: the model may generate inaccurate numbers, study names, datasets, or locations; answers in applied or technical contexts may be overgeneralized or vague. It should not substitute for verification by a domain expert, especially in research or policy-making settings.

License: this model is trained on the GeoGPT-QA dataset (CC-BY 4.0). You must give appropriate credit to the GeoGPT Research Project, include a link to the dataset, and indicate any changes if you adapt or build upon this model.

Summary

- Model type: Text generation / QA
- Domain: Geoscience, Earth Sciences
- Base model: Liquid AI LFM2
- Training method: SFT (Supervised Fine-Tuning)
- License: CC-BY 4.0
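The batch-size note above amounts to: effective batch = per-device batch × accumulation steps × devices. A one-line sketch with illustrative numbers (not the actual training config):

```python
# Illustrative only; the card does not publish the exact batch configuration.
def effective_batch_size(per_device: int, accum_steps: int, n_devices: int = 1) -> int:
    """Sequences consumed per optimizer step under gradient accumulation."""
    return per_device * accum_steps * n_devices

print(effective_batch_size(2, 8))  # 16
```

Keeping `per_device` low bounds peak activation memory, while `accum_steps` restores the optimizer's effective batch size.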
Seed-X-Instruct-7B-gguf
LLaDA 346M
This is a 346-million-parameter large language diffusion model trained with a masked diffusion process. It demonstrates that diffusion-based approaches can be viable alternatives to autoregressive language models.

Key Features

- Architecture: Masked Diffusion Model (MDM) with a Transformer encoder
- Parameters: 346M
- Sequence Length: 512 tokens
- Vocab Size: 50,257 (GPT-2)
- Training Data: 50,000 WikiText-2 samples

Training Setup

- Algorithm: Masked Diffusion Model (MDM)
- Loss Function: cross-entropy on masked positions
- Optimizer: AdamW (lr=3e-5, betas=(0.9, 0.95))
- Batch Size: 16 (effective 32 with gradient accumulation)
- Gradient Checkpointing: enabled
- Mixed Precision: AMP (FP32/FP16)
- Epochs: 4
- GPU: NVIDIA V100 (22GB VRAM)
- Training Time: ~20 hours

| Metric | Value |
|--------|-------|
| Initial Loss | 5.96 |
| Final Loss | 4.94 |
| Loss Reduction | 17.1% |
| Total Parameters | 346M |
| Model Size (FP32) | 1.38 GB |

Advantages

- ✅ Bidirectional context: sees the full context, unlike autoregressive models
- ✅ Parallel generation: can predict multiple tokens simultaneously
- ✅ Reversal invariance: equal performance on forward and reverse tasks
- ✅ Global coherence: reduces error accumulation

Limitations

- ❌ Slower generation (iterative denoising process)
- ❌ Requires more compute at inference
- ❌ Not fine-tuned for specific tasks

How It Works

Forward process:

- Gradually mask tokens at random
- At timestep t ∈ [0,1], each token is masked with probability t
- This creates a noisy version of the input

Reverse process:

- Iteratively predict and unmask tokens
- A transformer predicts the masked positions
- Trained with cross-entropy loss on masked tokens only

Memory and stability optimizations:

- Gradient checkpointing saves memory during backprop
- Mixed precision (AMP) uses FP16 where possible
- Gradient accumulation simulates larger batches
- Pre-layer-norm ("Layer Norm First") improves training stability

License & Acknowledgements

MIT License; free to use for research and commercial purposes. Based on "Large Language Diffusion Models" (Nie et al., 2025). Built with PyTorch and Transformers, trained on the WikiText-2 dataset, and inspired by diffusion models for vision (DiT, Genie). For issues, questions, or suggestions, please open an issue on GitHub or contact the model author.
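The forward (noising) process described above is easy to state in code: at timestep t, each token is independently replaced by a mask id with probability t. A minimal sketch (`MASK_ID` is a stand-in value, not the model's actual mask token):

```python
# Forward (noising) step of the masked diffusion process described above.
import random

MASK_ID = -1  # stand-in mask token id (assumption for illustration)

def forward_mask(tokens, t, seed=0):
    """Mask each token independently with probability t, for t in [0, 1]."""
    rng = random.Random(seed)
    return [MASK_ID if rng.random() < t else tok for tok in tokens]
```

The reverse process then runs the transformer repeatedly, each pass predicting the masked positions and unmasking a subset, with t stepping from 1 toward 0.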
kallamni-embed-v1
kallamni-embed-v1: Emirati Spoken Arabic Embedding Model

Author: @yasserrmd · Version: v1 · License: Apache 2.0

🎯 Motivation

`kallamni-embed-v1` was built to address a gap in Arabic NLP: the absence of a high-fidelity embedding model for spoken Emirati Arabic. Most Arabic embedding models (AraBERT, CAMeLBERT, MARBERT) focus on MSA or pan-Arab dialects and fail to capture the UAE's informal patterns, such as:

- Lexical variants: وايد, مب, سير, ويّاكم
- Code-switching: "bro yalla lets go al mall"
- Arabizi + emojis: "ana mb 3arf 😅 sho y9eer!"

This model learns these naturally occurring forms from curated Emirati-style Q&A and conversation datasets.

This is a sentence-transformers model fine-tuned from BAAI/bge-m3. It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Description

- Model Type: Sentence Transformer
- Base model: BAAI/bge-m3
- Maximum Sequence Length: 8192 tokens
- Output Dimensionality: 1024 dimensions
- Similarity Function: Cosine Similarity
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face

Training Dataset

- Size: 50,000 training samples
- Columns: sentence_0 and sentence_1
- Approximate statistics based on the first 1000 samples:

| | sentence_0 | sentence_1 |
|:--------|:-----------|:-----------|
| type | string | string |
| details | min: 7 tokens, mean: 13.47 tokens, max: 24 tokens | min: 8 tokens, mean: 18.85 tokens, max: 36 tokens |

- Samples:

| sentence_0 | sentence_1 |
|:-----------|:-----------|
| قد استخدمت تطبيق تتبع السعرات الحرارية؟ | إيه، يساعدني في مراقبة أكلي ونسبة البروتين. |
| شو كانت أول تجربة لك في التدريب العملي؟ | كانت مميزة، استفدت وتعلمت أشياء ما تدرسها الكتب. |
| إذا حد قال 'على عينه حار'، شو يقصد؟ | يعني هذا شخص صريح وما يجامل، يقول اللي في قلبه. |

- Loss: MultipleNegativesRankingLoss

Training Hyperparameters (non-default)

- `per_device_train_batch_size`: 24
- `per_device_eval_batch_size`: 24
- `num_train_epochs`: 3
- `fp16`: True
- `dataloader_drop_last`: True
- `multi_dataset_batch_sampler`: round_robin

All remaining training arguments were left at their framework defaults (`learning_rate`: 5e-05, `lr_scheduler_type`: linear, `optim`: adamw_torch, `seed`: 42, etc.).

Training Logs

| Epoch | Step | Training Loss |
|:------:|:----:|:-------------:|
| 0.4803 | 500 | 0.3377 |
| 0.9606 | 1000 | 0.1394 |
| 1.4409 | 1500 | 0.0828 |
| 1.9212 | 2000 | 0.0465 |
| 2.4015 | 2500 | 0.0317 |
| 2.8818 | 3000 | 0.0211 |

Evaluation

| Metric | multilingual-e5-large | kallamni-embed-v1 |
|:--|:--:|:--:|
| nDCG@10 | 0.0268 | 0.0421 |
| MRR | 0.0322 | 0.0437 |
| Precision@1 | 0.0133 | 0.0267 |
| Pearson Corr |
−0.2718 | −0.0963 | | F1 | 1.000 | 1.000 | → +57 % gain in retrieval relevance over the multilingual baseline. | Subset | multilingual-e5-large | kallamni-embed-v1 | |:--|:--:|:--:| | PURE EMI | 0.0359 | 0.0582 | | ARABIZI + EMOJI | 0.0012 | 0.0167 | | CODE-SWITCH | 0.0010 | 0.0219 | | GULF OTHER | 0.0543 | 0.0469 | | SOCIAL NOISE | 0.0127 | 0.0334 | | CONTROL MIX | 0.0157 | 0.0386 | Statistical significance: Δ nDCG@10 = +0.0218 (95 % CI [0.0008 – 0.0439], p = 0.04) The Emirati-tuned model maintains high stability across dialectal noise — especially Arabizi, Code-Switch, and Social Noise subsets — where multilingual models collapse. - Handles informal input: Arabizi, emojis, typos, and Gulf-accented syntax. - Optimized for retrieval & RAG: Works well in vector databases for Emirati chatbots, citizen-service platforms, and multilingual UAE apps. - Fast inference: ~15 % faster than multilingual-e5-large on average batch size 32. - Cross-dialect adaptability: Maintains coherence on Gulf-neighbor variations (Kuwaiti, Omani). 🧩 Why Other Models Were Excluded | Model | nDCG@10 (pilot) | Pearson | Comment | |:--|--:|--:|:--| | CAMeLBERT-DA | 0.018 | −0.42 | Trained on MSA + Levantine Twitter, weak Emirati signal | | AraBERT v2 | 0.023 | −0.38 | Diacritic bias, poor slang handling | | MARBERT | 0.031 | −0.29 | Broad Gulf coverage, low UAE lexical overlap | | mE5-base | 0.025 | −0.31 | Generic multilingual, not dialect-aware | These models were retained for reference but excluded from the final leaderboard because they lack UAE-specific conversational grounding. 🔬 Benchmark Protocol All datasets were auto-synthesized inside the evaluation script to ensure control and reproducibility. 
- Retrieval pairs: 500 queries × 500 docs (3 hard negatives per gold)
- Similarity pairs: 2,000 sentence pairs
- Classification: 3,600 texts across 3 classes (Complaint / Humor / Question)
- Evaluation: 5-fold cross-validation + paired bootstrap CIs

| Task | Description | Example |
|:--|:--|:--|
| Semantic Search | Embed Emirati chat data for retrieval | “وين المكان اللي في الصورة؟” → relevant caption |
| Conversational RAG | Retrieve contextually similar utterances | “شو معنى كلمة مب؟” |
| Intent Classification | Complaint vs informal chat vs inquiry | “السيارة ما تشتغل من أمس 😡” |

Framework Versions

- Python: 3.11.13
- Sentence Transformers: 4.1.0
- Transformers: 4.52.4
- PyTorch: 2.6.0+cu124
- Accelerate: 1.8.1
- Datasets: 3.6.0
- Tokenizers: 0.21.2
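The retrieval metrics above (nDCG@10, MRR, Precision@1) all start from the same primitive: ranking candidate documents by cosine similarity between the query embedding and each document embedding. A dependency-free sketch of that scoring step; toy 4-dimensional vectors stand in for the model's 1024-dimensional output, and all values are illustrative:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def rank(query_vec, doc_vecs):
    """Return document indices sorted by descending cosine similarity."""
    scores = [(i, cosine(query_vec, d)) for i, d in enumerate(doc_vecs)]
    return [i for i, _ in sorted(scores, key=lambda t: -t[1])]

# Toy example: doc 2 points in the same direction as the query.
query = [1.0, 0.0, 1.0, 0.0]
docs = [
    [0.0, 1.0, 0.0, 1.0],   # orthogonal to the query -> similarity 0.0
    [1.0, 1.0, 0.0, 0.0],   # partial overlap         -> similarity 0.5
    [2.0, 0.0, 2.0, 0.0],   # same direction           -> similarity 1.0
]
print(rank(query, docs))  # → [2, 1, 0]
```

In a real pipeline the vectors would come from encoding text with the fine-tuned model, and this ranking would feed a vector database lookup or the nDCG/MRR computation reported above.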
Text2SQL-1.5B-gguf
Neuro-Orchestrator-8B
OphthaScholar-1.2B
- Developed by: yasserrmd
- License: apache-2.0
- Finetuned from model: unsloth/LFM2-1.2B

This LFM2 model was trained 2x faster with Unsloth and Hugging Face's TRL library.
geo-gemma-300m-emb
ConstructionSafetyQA-1.2B-V1
ConstructionSafetyQA-1.2B-V1 is a fine-tuned version of LiquidAI/LFM2-1.2B, optimized using Unsloth for construction safety question answering. The model is designed to provide short, simple, and practical answers for construction workers and supervisors. It focuses on safety training, hazard prevention, and site best practices, avoiding unnecessary technical jargon.

Model Details

- Base Model: LiquidAI/LFM2-1.2B
- Fine-tuning Framework: Unsloth
- Task: Instruction-tuned for construction safety Q&A
- Answer Style:
  - Short (1–3 sentences)
  - Worker/supervisor-focused
  - Safety-first, practical instructions

Intended uses:

- Training aid for construction workers and supervisors
- On-site safety Q&A assistant
- Toolbox talks and quick safety refreshers

Limitations:

- Not intended to replace certified training or safety regulations. Always follow official site rules and local laws.
- Provides general safety guidance, not region-specific code compliance.
- Answers are short and simple by design (not technical references).
- May need human review for critical safety decisions.

Credits: base model LiquidAI/LFM2-1.2B; fine-tuning with Unsloth; inspired by promoting construction safety awareness with accessible AI tools.
caselaw-cpt-8b
Human-Like-Qwen2.5-1.5B-Instruct-gguf
Seed-X-PPO-7B-gguf
DeepScaleR-1.5B-Preview-gguf
Seed-X-RM-7B-gguf
diffusion-text-demo
SoftwareArchitecture-Instruct-v1
Domain: Software Architecture (for technical professionals)
Type: Instruction-tuned LLM
Base: LiquidAI/LFM2-1.2B (1.2B-parameter hybrid edge-optimized model)
Fine-tuned on: `ajibawa-2023/Software-Architecture` dataset
Author: Mohamed Yasser (`yasserrmd`)

SoftwareArchitecture-Instruct-v1 is an instruction-tuned adaptation of LiquidAI’s lightweight and efficient LFM2-1.2B model. It is specifically tailored to deliver high-quality, accurate, and technically rich responses to questions about software architecture, designed with engineers and architects in mind.

The base model, LFM2-1.2B, features a 16-layer hybrid design (10 convolutional + 6 grouped query attention layers), supports a 32,768-token context, and offers fast inference on CPU, GPU, and NPU platforms — ideal for both cloud and edge deployments.

We performed a 50-prompt benchmark across diverse software architecture topics:

| Metric | Value |
|------------------------------|----------------------|
| Average Words per Response | ~144 |
| Median Words per Response | ~139 |
| Min / Max Words per Response | 47 / 224 |
| Avg Sentences per Output | ~8.6 |
| Lexical Diversity (TTR) | ~0.73 |
| Readability Complexity | High (professional-level) |
| Accuracy (topic keyword coverage) | Majority ≥ 60% |
| Off-topic Responses | None detected |

Interpretation:

- Responses are substantive and domain-appropriate for technical audiences.
- Coverage is strong — while a few answers could benefit from including extra keywords, the core technical content is accurate.
- Readability intentionally leans into complexity, aligning with expert users.

Ideal for: software architects, system designers, engineering leads, and experienced developers seeking architecture guidance.

Use cases include:

- Exploring architectural patterns (e.g., CQRS, Saga, API Gateway).
- Drafting design docs and decision rationale.
- Architectural interview prep and system design walkthroughs.

Not intended for:

- Non-technical or general-purpose Q&A.
- In-depth code generation or debugging without architectural focus.

Base model: `LiquidAI/LFM2-1.2B`, optimized for edge/CPU inference
Dataset: `ajibawa-2023/Software-Architecture`
Fine-tuning: Supervised instruction tuning (Optionally include parameters if available — epochs, LR, hardware used)

Answer length is capped by `max_new_tokens`. Some responses may truncate mid-explanation — raising this limit improves completeness.

Keyword coverage is strong but not exhaustive. A few responses could benefit from enriching with additional terms.

Not a replacement for expert-reviewed architectural validation — use as a support tool, not the final authority.

Base model license: LFM Open License v1.0
Dataset license: (Insert dataset license if known)
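The benchmark's length and lexical-diversity numbers are simple corpus statistics. A minimal sketch of how words-per-response and type-token ratio (TTR) might be computed, assuming lowercased whitespace tokenization (the card does not state its exact tokenizer):

```python
def response_stats(text: str) -> dict:
    """Word count and type-token ratio (TTR) for one response."""
    tokens = text.lower().split()
    types = set(tokens)
    return {
        "words": len(tokens),
        "ttr": len(types) / len(tokens) if tokens else 0.0,
    }

sample = "the gateway routes requests and the gateway enforces auth"
stats = response_stats(sample)
print(stats["words"], round(stats["ttr"], 2))  # → 9 0.78
```

Averaging `words` over all 50 responses and `ttr` per response reproduces the kind of figures shown in the table, though the exact values depend on the tokenization used.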
DentaInstruct-1.2B-gguf
SciReason-LFM2-2.6B
[License: Apache-2.0](https://opensource.org/licenses/Apache-2.0) · [Dataset: OpenScienceReasoning-2](https://huggingface.co/datasets/nvidia/OpenScienceReasoning-2) · [Base model: LFM2-2.6B](https://huggingface.co/LiquidAI/LFM2-2.6B) · [Unsloth](https://github.com/unsloth/unsloth) · [Author: yasserrmd](https://huggingface.co/yasserrmd)

Model Overview

SciReason-LFM2-2.6B is a fine-tuned version of LiquidAI/LFM2-2.6B, trained with Unsloth on the OpenScienceReasoning-2 dataset. The fine-tuning enhances the base model’s ability to handle multi-step scientific reasoning and produce coherent chain-of-thought explanations.

Training Configuration

- Framework: Unsloth
- Dataset: nvidia/OpenScienceReasoning-2
- Examples: ~11,000
- Epochs: 1
- Total Steps: 1,375
- Batch size per device: 2
- Gradient Accumulation Steps: 4
- Effective Batch Size: 8
- Trainable Parameters: ~20M (LoRA / PEFT with Unsloth smart offloading)
- Optimizer: AdamW
- Learning Rate: 2e-4
- Weight Decay: 0.01
- LR Scheduler: cosine with warmup
- Hardware: single GPU (Unsloth offloading enabled)

Intended uses:

- Scientific reasoning tasks
- Educational Q&A
- Step-by-step logical problem solving

⚠️ Disclaimer: Not intended for clinical or legal decision-making.

Acknowledgements: LiquidAI for LFM2-2.6B; NVIDIA for OpenScienceReasoning-2; Unsloth for efficient fine-tuning with gradient offloading.
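The configuration above is internally consistent: with gradient accumulation, the effective batch is 2 × 4 = 8, and ~11,000 examples at that batch size yield the reported 1,375 optimizer steps for one epoch. A quick arithmetic check:

```python
import math

examples = 11_000            # ~11k training examples, as reported
per_device_batch_size = 2
grad_accum_steps = 4

# One optimizer step consumes per_device_batch_size * grad_accum_steps examples.
effective_batch = per_device_batch_size * grad_accum_steps
steps_per_epoch = math.ceil(examples / effective_batch)

print(effective_batch, steps_per_epoch)  # → 8 1375
```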
GLM4.7-Distill-LFM2.5-1.2B
PharmaQA-1.2B
PharmaQA-1.2B is a merged, instruction-tuned pharmacology and pharmacy domain language model based on Liquid AI's LFM2-1.2B. It was fine-tuned using the MIRIAD-4.4M dataset for research and educational Q&A in pharmacology, therapeutics, and drug mechanisms. This model is not intended for clinical or diagnostic use.

| Property | Value |
| ------------------ | ----- |
| Base Model | Liquid AI `LFM2-1.2B` |
| Fine-tuning Method | LoRA using Unsloth |
| Parameters Trained | ~9M (0.78% of total) |
| Dataset Used | MIRIAD-4.4M (subset of 50,000 examples) |
| Epochs | 1 |
| Final Format | Merged (LoRA + base) |
| Model Size | 1.2 billion parameters |
| License | ODC-BY v1.0 dataset license, non-commercial educational use only |
| Author | Mohamed Yasser |

This model is not intended for medical diagnosis, treatment planning, or patient care. It was trained on synthetic Q&A pairs derived from peer-reviewed literature via MIRIAD and is for educational and academic research only. MIRIAD includes a cautionary note that aligns with OpenAI’s usage policies:

> Do not use this dataset or models trained on it for actual medical diagnosis, decision-making, or any application involving real-world patients.

From manual analysis of 50 unseen pharmacology questions:

- ✅ No hallucinations observed
- ✅ High alignment with biomedical terms (e.g., dihydrofolate reductase, QT prolongation)
- ✅ Long-form answers are clinically descriptive and accurate for education
- ⚠️ Short answers are concise but can lack therapeutic context

License: model for educational and research use only; dataset MIRIAD (ODC-BY v1.0).

Acknowledgements: the MIRIAD team (for the dataset); the Unsloth team (for fast & efficient LoRA); Hugging Face and Liquid AI for open model access.
GemmaECG-Vision
`GemmaECG-Vision` is a fine-tuned vision-language model built on `google/gemma-3n-e2b`, designed for ECG image interpretation tasks. The model accepts a medical ECG image along with a clinical instruction prompt and generates a structured analysis suitable for triage or documentation use cases.

This model was developed using Unsloth for efficient fine-tuning and supports image + text inputs with medical task-specific prompt formatting. It is designed to run in offline or edge environments, enabling healthcare triage in resource-constrained settings.

Goal: to assist healthcare professionals and emergency responders by providing AI-generated ECG analysis directly from medical images, without requiring internet access or cloud resources.

This model expects:

- An ECG image (`PIL.Image`)
- A textual instruction such as:

Training setup:

- Framework: Unsloth + TRL SFTTrainer
- Hardware: Google Colab Pro (L4)
- Batch Size: 2
- Epochs: 1
- Learning Rate: 2e-4
- Scheduler: Cosine
- Loss: CrossEntropy
- Precision: bfloat16

The training dataset is a curated subset of the PULSE-ECG/ECGInstruct dataset, reformatted for VLM instruction tuning:

- 3,272 samples of ECG image + structured instruction + clinical output
- Focused on realistic and medically relevant triage cases
- Dataset link: `yasserrmd/pulse-ecg-instruct-subset`

The model was fine-tuned over 409 steps on the `pulse-ecg-instruct-subset` dataset. The training loss started above 9.5 and steadily declined to below 0.5, showing consistent convergence and learning throughout the single epoch. The loss curve demonstrates a stable optimization process without overfitting spikes. The chart below visualizes this progression, highlighting the model’s ability to adapt quickly to the ECG image-to-text task.
Use cases:

- Emergency triage in offline settings
- On-device ECG assessment
- Integration with medical edge devices (Jetson, Pi, Android)
- Rapid analysis during disaster response

Limitations:

- Not intended to replace licensed medical professionals
- Accuracy may vary depending on image quality
- Model outputs should be reviewed by a clinician before action

This model is licensed under CC BY 4.0. You are free to use, modify, and distribute it with attribution.
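Vision-language fine-tunes like this one are usually prompted with a chat message whose content interleaves an image slot and the instruction text. A minimal sketch of that structure; the exact schema is an assumption based on common Transformers/Unsloth chat-template conventions, not something this card specifies:

```python
def build_ecg_message(instruction: str) -> list:
    """Assemble a single-turn VLM chat message: one image slot plus a text instruction.

    The {"type": "image"} placeholder is a common convention; the actual
    PIL.Image object is handed to the processor separately at encode time.
    """
    return [
        {
            "role": "user",
            "content": [
                {"type": "image"},
                {"type": "text", "text": instruction},
            ],
        }
    ]

msgs = build_ecg_message("Analyze this ECG image and report key findings for triage.")
print(msgs[0]["role"], [part["type"] for part in msgs[0]["content"]])  # → user ['image', 'text']
```

The list would then be passed through the processor's chat template together with the image before generation.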
PharmaQA-270M
PharmaQA-270M is a compact, instruction-tuned language model for the pharmacology and pharmacy domains. Based on Gemma3 (270M parameters), it was fine-tuned using LoRA and merged into a single model checkpoint for easy deployment. This model is optimized for educational and research use cases, especially where compute constraints are present.

| Property | Value |
| ------------------ | ----- |
| Base Model | Google Gemma3 (270M parameters) |
| Fine-tuning Method | LoRA using Unsloth |
| Dataset Used | 25,000 Q&A pairs from MIRIAD-4.4M |
| Epochs | 3 |
| Final Format | Merged (base + LoRA weights) |
| Model Size | 270M |
| License | ODC-BY v1.0 dataset license (non-commercial) |
| Author | Mohamed Yasser |

Do not use this model for real-world medical diagnosis, treatment, or care decisions. The model was trained on MIRIAD Q&A pairs generated via LLMs from biomedical literature. MIRIAD and this model must be used for educational, research, and academic exploration only. This model inherits all OpenAI and ODC-BY v1.0 usage limitations associated with the dataset.

| Metric | Value |
| ------------------------ | ----- |
| Average Answer Length | 40.3 words |
| Longest Answer | 95 words |
| Shortest Answer | 12 words |
| Empty / Short Responses | 0 |
| Clinical Accuracy | ✅ Consistent terminology |
| Depth in Short Responses | ⚠️ Limited |
| Best Use Case | Lightweight educational deployment (MCQs, tutoring) |

License: model open for academic and non-commercial use; dataset MIRIAD-4.4M under ODC-BY v1.0.

Acknowledgements: the MIRIAD creators for making the dataset openly accessible; the Unsloth team for enabling fast LoRA tuning on small GPUs; Hugging Face and Google for the Gemma3 base model.
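The answer-length rows in the evaluation table (average, longest, shortest, empty/short count) are basic corpus statistics. A sketch of how they could be computed over a batch of generated answers, assuming whitespace word counts and a hypothetical cutoff for "short" responses:

```python
def answer_length_stats(answers, short_cutoff=5):
    """Summarize word-count statistics for a batch of model answers."""
    counts = [len(a.split()) for a in answers]
    return {
        "average": sum(counts) / len(counts),
        "longest": max(counts),
        "shortest": min(counts),
        "empty_or_short": sum(1 for c in counts if c < short_cutoff),
    }

demo = [
    "alpha beta gamma delta epsilon zeta",
    "one two three four five",
    "short answer here too yes",
]
print(answer_length_stats(demo))
```

Run over the 270M model's real outputs, the same function would yield figures in the style of the table above.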
mcp-instruct-v1
- Developed by: yasserrmd
- License: apache-2.0
- Finetuned from model: unsloth/LFM2-1.2B

This LFM2 model was trained 2x faster with Unsloth and Hugging Face's TRL library.
kallamni-700m-v1-lora
- Developed by: yasserrmd
- License: apache-2.0
- Finetuned from model: unsloth/LFM2-700M-unsloth-bnb-4bit

This LFM2 model was trained 2x faster with Unsloth and Hugging Face's TRL library.
SinaReason-Magistral-2509
[License: Apache-2.0](https://opensource.org/licenses/Apache-2.0) · [Dataset: medical-o1-reasoning-SFT](https://huggingface.co/datasets/FreedomIntelligence/medical-o1-reasoning-SFT) · [Base model: Magistral-Small-2509](https://huggingface.co/mistralai/Magistral-Small-2509) · [Unsloth](https://github.com/unsloth/unsloth) · [Author: yasserrmd](https://huggingface.co/yasserrmd)

SinaReason is a powerful, instruction-tuned language model designed for step-by-step medical clinical reasoning. It is a fine-tuned version of the formidable `mistralai/Magistral-Small-2509` (a 24B-parameter model), specifically adapted for generating a transparent chain-of-thought process before delivering a clinical summary.

The name "Sina" is inspired by Ibn Sina (Avicenna), a Persian polymath who is regarded as one of the most significant physicians and thinkers of the Islamic Golden Age. His work, The Canon of Medicine, was a standard medical text for centuries, embodying the principles of logical, evidence-based reasoning. This model aims to emulate that spirit by structuring its output to be transparent, logical, and useful for educational and professional clinical settings.

- Advanced Clinical Reasoning: leverages the powerful reasoning capabilities of its base model to analyze clinical vignettes.
- Chain-of-Thought (CoT) Output: uniquely structured to first externalize its reasoning process within ` ... ` tags, showing its work before providing a conclusion.
- Built for Education & Professional Support: designed to assist clinicians, researchers, and medical students in understanding and formulating clinical logic.
- Instruction-Tuned: fine-tuned on the `FreedomIntelligence/medical-o1-reasoning-SFT` dataset to enhance its performance on medical reasoning tasks.

This model is a research and educational tool. It is NOT a medical device, and it is NOT a substitute for a qualified human medical professional.

Intended Audience: This model is designed for use by medical professionals, researchers, and students for educational and research purposes.
It is explicitly NOT intended for use by patients for self-diagnosis or to receive medical advice.

Risk of Inaccuracy: As with all language models, SinaReason can generate incorrect, incomplete, or biased information (hallucinations). All outputs must be critically reviewed and independently verified by a human expert before being used in any real-world scenario.

Built-in Safeguard: The recommended system prompt (provided below) includes a specific instruction that guides the model to frame its responses for a professional audience and explicitly warns it not to provide direct medical advice to patients. This is a critical safeguard that should always be used.

No Patient Relationship: The model does not and cannot form a doctor-patient relationship. Using this model does not constitute receiving medical care.

To get the best and safest results from SinaReason, you must use the recommended system prompt. This prompt activates the model's chain-of-thought capabilities and enforces its role as a reasoning assistant, not a direct-to-patient advisor.

First, ensure you have the necessary libraries installed:

---

Next, use the following Python script for inference:

Base Model: mistralai/Magistral-Small-2509, a powerful 24B-parameter multimodal model.

Dataset: The model was fine-tuned on the FreedomIntelligence/medical-o1-reasoning-SFT dataset, which is designed for Supervised Fine-Tuning (SFT) to improve the medical reasoning capabilities of language models.

Fine-Tuning Framework: This model was fine-tuned using the Unsloth library, an open-source framework designed for highly efficient, memory-optimized fine-tuning of large language models.

Performance Optimization: Unsloth enables significantly faster training (up to 2x) and a massive reduction in memory usage (up to 80%) compared to standard methods like QLoRA. This was achieved through its optimized Triton kernels and manual backpropagation implementation.
Hardware Accessibility: The use of Unsloth made it possible to fine-tune this 24B-parameter model on consumer-grade or single-GPU cloud hardware, making advanced model customization more accessible.

---

To validate its capabilities, `SinaReason-Magistral-2509` was subjected to a qualitative evaluation across a benchmark of 30 diverse medical reasoning prompts. These scenarios were designed to test the model's reasoning in various domains, including:

- General Internal Medicine
- Emergency & Critical Care
- Specialized Fields (Neurology, Pediatrics, Rheumatology, etc.)

The model performed exceptionally well, achieving a qualitative grade of A-. This reflects its ability to consistently generate medically sound, logically structured responses while acknowledging the inherent limitations of any AI in handling real-world clinical nuance.

- Consistent Chain-of-Thought Reasoning: the model strictly adhered to the ` ... ` format in 100% of the test cases, providing a transparent and logical pathway to its conclusions.
- High Clinical Accuracy: for the classic clinical vignettes tested, the model demonstrated a high degree of accuracy in identifying the most likely diagnosis and suggesting appropriate, standard-of-care next steps.
- Effective Prioritization of Emergencies: the model successfully identified and prioritized life-threatening conditions, such as Subarachnoid Hemorrhage, Aortic Aneurysm, and Epiglottitis, recommending immediate and appropriate actions.

For full transparency, the complete log of the 30 evaluation prompts and the model's verbatim responses can be reviewed below.

Click to view the full 30-prompt evaluation results

Prompt: "Patient: 45-year-old with sudden onset severe headache described as 'the worst ever'. What should be ruled out and how?"

Model Output:

> > Alright, let's think this through. We have a 45-year-old patient who's experiencing a sudden, severe headache that they describe as the worst they've ever had.
This really makes me think about a subarachnoid hemorrhage, or SAH for short. It's classic because these headaches often come on suddenly and can be incredibly intense. > > First things first, I need to consider what could cause such a sudden and severe headache. SAH definitely comes to mind, but I should also think about other possibilities like a migraine or a tension headache. But, given the description, these don't quite fit as well because migraines usually come with other symptoms like nausea or light sensitivity, and tension headaches aren't typically that intense. > > Now, how do I go about confirming if it's an SAH? The gold standard is a non-contrast CT scan of the brain. It's the most straightforward way to see if there's any bleeding in the subarachnoid space. This test is pretty reliable when done within the first few hours of the headache starting. If the CT scan comes back negative, that doesn't necessarily rule out SAH, but it can help decide what to do next. > > If the CT scan doesn't show any signs of bleeding, I'd consider a lumbar puncture, also known as a spinal tap. This can help detect the presence of xanthochromia or red blood cells in the cerebrospinal fluid, which are indicators of SAH. > > But wait, I can't just stop there. There are other serious conditions that need to be considered. Meningitis could also cause a severe headache, and it's crucial to rule that out because it's a medical emergency. A lumbar puncture can help here too, looking for increased white blood cells and other markers of inflammation in the CSF. > > And let's not forget about giant cell arteritis, especially given the patient's age. This is more of a consideration if there are other symptoms like jaw pain or visual disturbances, but it's always good to keep in the back of my mind. > > So, to summarize, I'd start with a non-contrast CT scan to look for bleeding. If that doesn't show anything, I'd move to a lumbar puncture. 
This way, I can check for both SAH and meningitis. And while I'm doing all this, I should keep an open mind about other conditions like giant cell arteritis, especially if there are other clues in the clinical picture. > > Okay, I think this plan makes sense. It covers the most likely causes and uses the right tests to get to the answer as quickly and safely as possible. > > Given the presentation of a sudden and severe headache described as the worst ever by a 45-year-old patient, the primary concern is a subarachnoid hemorrhage (SAH). To rule this out, the initial step should be a non-contrast CT scan of the brain. This test is highly effective in detecting SAH when performed within the first few hours of the onset of symptoms. > > If the CT scan is negative, a lumbar puncture (spinal tap) is recommended to look for xanthochromia or red blood cells in the cerebrospinal fluid, which can confirm the presence of SAH. > > Additionally, it is important to consider other potential causes of severe headaches, such as meningitis, which can also be evaluated through a lumbar puncture by assessing for increased white blood cells and signs of inflammation in the CSF. In this patient's age group, giant cell arteritis should also be considered, particularly if there are associated symptoms like jaw pain or visual disturbances. Prompt: "Patient: 72-year-old with history of hypertension presents with confusion, right-sided weakness, and slurred speech. What is the likely cause and immediate steps?" Model Output: > > Alright, let's see what's going on here. We have a 72-year-old patient with a history of hypertension, and now they're showing up with some pretty concerning symptoms: confusion, weakness on the right side, and slurred speech. That's definitely ringing some alarm bells, especially with the hypertension history. 
> > Hmm, the right-sided weakness and slurred speech really point to something happening on the left side of the brain, since the brain's right side controls the left side of the body and vice versa. Given that, we're probably looking at some sort of neurological issue, and the most likely culprit in this age group and with hypertension is a stroke. > > Okay, so let's think about the type of stroke. We've got two main types to consider: ischemic and hemorrhagic. Given that hypertension is a known risk factor for both, but especially hemorrhagic strokes, it's a strong contender here. > > Now, what should we do next? The first thing that comes to mind is to get a CT scan of the head. This is crucial because it can quickly tell us if there's any bleeding in the brain, which would point to a hemorrhagic stroke. If there's no bleeding, then we're likely dealing with an ischemic stroke, which could potentially be treated with a clot-busting drug like tPA. > > But wait, there's more. Before we jump into any treatment, we need to assess the patient's stability. Are they stable enough for a CT scan? Are they at risk of aspirating due to the slurred speech? And we can't forget to consider if they have any contraindications for tPA, like a recent head injury or bleeding disorders. > > So, in summary, we're probably dealing with a stroke, and the first step is to do a CT scan to figure out if it's hemorrhagic or ischemic. This will guide us on whether we need to administer tPA or manage the bleeding if it's hemorrhagic. > > Based on the symptoms and the patient's history, the likely cause of the patient's confusion, right-sided weakness, and slurred speech is a stroke, with a hemorrhagic stroke being a strong possibility given the history of hypertension. The immediate step to take is to perform a CT scan of the head. This will help differentiate between a hemorrhagic and ischemic stroke, which is crucial for determining the appropriate treatment. 
Prompt: "Patient: 60-year-old with long-standing diabetes has numbness and tingling in both feet. What is the most likely diagnosis and first-line management?" Model Output: > > Okay, let's think about this. We have a 60-year-old with long-standing diabetes who's experiencing numbness and tingling in both feet. Right off the bat, diabetes is a big clue here. It's well-known that diabetes can lead to complications like neuropathy, especially if it's been around for a long time. > > Now, let's consider what neuropathy means in this context. Diabetic peripheral neuropathy is pretty common, and it's characterized by those exact symptoms: numbness and tingling. This makes me think it's a strong contender for the diagnosis here. > > Also, I should consider the other possible causes of neuropathy, like B12 deficiency or other neurological conditions. But with the history of diabetes, diabetic neuropathy seems like the most straightforward explanation. > > Alright, moving on to what we can do about it. First off, controlling the blood sugar levels is crucial. If we can manage the diabetes better, we might be able to slow down or even prevent further nerve damage. This could involve medication adjustments or lifestyle changes. > > The most likely diagnosis for a 60-year-old with long-standing diabetes experiencing numbness and tingling in both feet is diabetic peripheral neuropathy. This condition is common in individuals with diabetes and is characterized by damage to the peripheral nerves due to prolonged high blood sugar levels. For the first-line management of diabetic peripheral neuropathy, the primary goal is to control blood sugar levels effectively to prevent further nerve damage. Prompt: "Patient: 25-year-old presents with high fever, sore throat, swollen neck, and drooling. What life-threatening condition must be excluded?" Model Output: > > Alright, let's see what's going on with this patient. 
We've got a 25-year-old with a high fever, sore throat, swollen neck, and drooling. These symptoms sound pretty alarming. Let's break it down. A high fever and sore throat could point to an infection. Swollen neck suggests something going on in the neck area, maybe lymph nodes or something deeper like an abscess. Drooling? That's a big red flag. It makes me think there might be an issue with swallowing or even breathing. > > But wait, there's a more serious condition that could fit this picture: epiglottitis. Oh, right! Epiglottitis is an infection of the epiglottis, which is the flap at the back of the throat that covers the trachea. It can cause the throat to swell, leading to difficulty swallowing, drooling, and even difficulty breathing. This is a medical emergency because if the airway gets obstructed, it could be life-threatening. > > The symptoms presented by the 25-year-old patient—high fever, sore throat, swollen neck, and drooling—are concerning and must be evaluated urgently. The combination of these symptoms, especially the drooling, raises the possibility of a serious condition like epiglottitis. Epiglottitis is an infection of the epiglottis, which can lead to swelling and obstruction of the airway, posing a significant risk for respiratory distress and potential life-threatening complications if not promptly managed. Immediate medical attention is warranted to ensure the patient's safety and to initiate appropriate treatment. Prompt: "Patient: 30-year-old with polyuria, polydipsia, and weight loss. What investigation confirms the diagnosis?" Model Output: > > Alright, let's see what's going on here. We've got a 30-year-old patient who's experiencing polyuria and polydipsia, which means they're peeing a lot and drinking a lot of water. Plus, there's weight loss happening too. These symptoms are making me think about diabetes mellitus, particularly Type 1 Diabetes, since the sudden onset and weight loss are classic signs. 
> > Okay, let's think about how we can confirm Type 1 Diabetes. The most direct way is by checking for hyperglycemia, which means high blood sugar levels. A fasting blood glucose test would do the trick here. If it's elevated, that's a strong indicator. > > The symptoms you've described—polyuria, polydipsia, and weight loss—are suggestive of Type 1 Diabetes Mellitus. To confirm this diagnosis, the most reliable investigation is a blood test for elevated fasting blood glucose levels. In Type 1 Diabetes, blood glucose levels are typically elevated due to insufficient insulin production by the pancreas. A fasting blood glucose level of 126 mg/dL or higher on two separate occasions confirms the diagnosis. Prompt: "Patient: 40-year-old with chest pain after vomiting, now with subcutaneous emphysema. What condition must be suspected?" Model Output: > > Okay, so we have a 40-year-old patient who's experiencing chest pain, and it seems to have started after they were vomiting. That's interesting because vomiting can sometimes lead to increased pressure in the chest. Now, they also have subcutaneous emphysema, which means air is trapped under the skin. > > Hmm, what could cause air to escape like that? Well, one possibility is something called a Boerhaave's syndrome. This is a tear in the esophagus that can happen when there's a lot of pressure, like from vomiting. If the esophagus ruptures, air can escape into the chest cavity and then spread under the skin, causing that subcutaneous emphysema. The chest pain right after vomiting and the subcutaneous emphysema together are a classic sign of Boerhaave's syndrome. > > Based on the symptoms described—a 40-year-old patient with chest pain following an episode of vomiting and the presence of subcutaneous emphysema—the most likely condition to suspect is Boerhaave's syndrome. This condition involves a tear in the esophagus, often resulting from a sudden increase in intrathoracic pressure, such as during vomiting. 
The rupture of the esophagus allows air to escape into the mediastinum and potentially under the skin, leading to the characteristic subcutaneous emphysema. Prompt: "Patient: 55-year-old with progressive shortness of breath, orthopnea, and ankle swelling. What condition and investigation are likely?" Model Output: > > Okay, let's think this through. We've got a 55-year-old patient who's experiencing progressive shortness of breath. This alone is pretty concerning, but when you add in the orthopnea, which is breathing difficulty when lying down, and ankle swelling, it's starting to paint a clearer picture. > > Given the age of the patient, I'm considering conditions that are more prevalent in older adults. Heart failure is a big one here. It can cause all these symptoms because the heart isn't pumping blood effectively, leading to fluid buildup in the lungs and body. > > So, putting it all together, heart failure seems like the most plausible diagnosis. To confirm this, a chest X-ray and echocardiogram should help us see if there's fluid in the lungs or if the heart is enlarged or not functioning properly. > > The symptoms you've described—progressive shortness of breath, orthopnea, and ankle swelling—are highly suggestive of heart failure, particularly in a 55-year-old patient. Heart failure occurs when the heart is unable to pump blood effectively, leading to fluid retention in the body. To investigate this condition, the following tests are typically recommended: 1. Chest X-ray: This can reveal any signs of fluid in the lungs (pulmonary edema) or an enlarged heart (cardiomegaly). 2. Echocardiogram: This ultrasound of the heart helps assess heart function and identify any structural abnormalities. Prompt: "Patient: 70-year-old chronic smoker presents with chronic cough, sputum production, and progressive breathlessness. What is the diagnosis and staging approach?" Model Output: > > Okay, let's think this through. 
We have a 70-year-old patient who is a chronic smoker. That's a big clue right there because smoking is a major risk factor for lung disease. Now, they've got a chronic cough and are producing sputum. Hmm, that sounds a lot like chronic bronchitis. But there's also progressive breathlessness. Could it be something like emphysema? Or maybe even COPD, which is a combo of both bronchitis and emphysema? > > What about the diagnosis, let's start with some basic tests. A chest X-ray could help us see if there are any masses that might suggest cancer. A spirometry test would be great to check lung function and confirm COPD if present. > > Given the patient's history of chronic smoking and symptoms of chronic cough, sputum production, and progressive breathlessness, the diagnosis points towards chronic obstructive pulmonary disease (COPD), which encompasses chronic bronchitis and possibly emphysema. To approach the diagnosis and staging: 1. Initial Evaluation: Start with a chest X-ray. 2. Lung Function Testing: Perform spirometry to assess lung function and confirm the presence of COPD. 3. Further Imaging: If the chest X-ray shows any suspicious lesions, a CT scan of the chest is warranted. Prompt: "Patient: 18-year-old with sudden-onset pleuritic chest pain and dyspnea. What condition is suspected and how do you confirm?" Model Output: > > Alright, let's see what's going on with this 18-year-old. They're experiencing sudden-onset pleuritic chest pain and dyspnea. That's pretty concerning. Pleuritic pain usually means there's something irritating the pleura, which could point to a few different things. Could it be pneumonia? Or maybe it's a pulmonary embolism? And let's not forget about pleural effusion or even pneumothorax. > > So, what's the best next step? Let's go with the chest X-ray first because it's non-invasive and can quickly tell us if there's a pneumothorax or pneumonia. 
If the X-ray comes back normal, we might need to consider other tests like a CTPA or D-dimer to investigate further for a pulmonary embolism. > > In this case, the sudden-onset pleuritic chest pain and dyspnea in an 18-year-old suggest a few potential conditions. The symptoms could be indicative of pneumonia, pulmonary embolism, pneumothorax, or pleural effusion. To narrow down the diagnosis, a chest X-ray is a reasonable first step. This test can quickly help identify pneumonia, pneumothorax, or pleural effusion. If the chest X-ray is normal, further investigation might be needed, such as a D-dimer test or a CT pulmonary angiogram (CTPA), to evaluate for a pulmonary embolism. Prompt: "Patient: 34-year-old woman with malar rash, photosensitivity, and joint pain. What investigations are appropriate?" Model Output: > > Okay, let's think this through. We have a 34-year-old woman who's showing up with a malar rash. That's a pretty classic sign, right? It's like the butterfly rash across her cheeks. And she's also got photosensitivity, meaning her skin reacts to sunlight. Plus, she's dealing with joint pain. These symptoms together are ringing a bell for systemic lupus erythematosus, or SLE. > > Now, to confirm if this is really SLE, we need some investigations. The anti-nuclear antibody test, or ANA, usually comes up first. But we also need more specific markers like Anti-double-stranded DNA antibodies (anti-dsDNA) and anti-Smith antibody (anti-Sm). > > Based on the symptoms presented by the 34-year-old woman, including a malar rash, photosensitivity, and joint pain, the most likely diagnosis is systemic lupus erythematosus (SLE). To confirm this diagnosis and evaluate the extent of the disease, several investigations are appropriate: 1. Serological Tests: ANA, anti-dsDNA, and anti-Sm. 2. Inflammatory Markers: ESR and CRP. 3. Kidney Function Tests. 4. Liver Function Tests. 5. Complete Blood Count (CBC). Prompt: "Patient: 29-year-old with anaphylaxis after bee sting. 
What immediate management steps are required?" Model Output: > > Okay, so here's a 29-year-old patient who's just experienced anaphylaxis after a bee sting. This is serious, right? Anaphylaxis is like the body's extreme reaction, and we need to act fast to stabilize them. > > First thing that comes to mind is, what can we do to quickly calm down this severe allergic reaction? We've got epinephrine, which is kind of our go-to drug for anaphylaxis. It's usually administered intramuscularly, and the thigh is the best spot. Once the epinephrine is given, we also need to make sure the patient is breathing okay. Sometimes, they might need oxygen. Positioning matters too! We want the patient in a position where they can breathe and their blood pressure can be maintained. Lying them flat is a good idea. > > In the case of a 29-year-old patient experiencing anaphylaxis after a bee sting, the immediate management steps are critical: 1. Administer Epinephrine: This is the first and most crucial step. Use an intramuscular auto-injector (e.g., EpiPen) in the mid-outer thigh. 2. Ensure Airway Patency: Check the patient's airway and provide oxygen if there is any difficulty in breathing. 3. Position the Patient: Place the patient in a supine position to help maintain blood pressure. 4. Monitor Vital Signs. 5. Prepare for Biphasic Anaphylaxis. Prompt: "Patient: 40-year-old with severe abdominal pain out of proportion to examination findings. What is suspected?" Model Output: > > Okay, let's think about this. We've got a 40-year-old patient who's complaining of really severe abdominal pain. But here's the thing—the exam doesn't show anything too alarming. That's kind of a red flag for something serious. > > Hmm, what could it be? I know that sometimes, severe pain without obvious exam findings can point to something like an acute abdomen. But what exactly? Mesenteric ischemia comes to mind. It's not always obvious on a physical exam because the blood flow issues can be subtle. 
Plus, it can really hurt, even if you don't see anything obvious. > > In the case of a 40-year-old patient experiencing severe abdominal pain that is out of proportion to the examination findings, mesenteric ischemia is a strong possibility to consider. This condition occurs when the blood flow to the intestines is compromised and can lead to significant pain despite subtle or minimal findings on a physical examination. To further evaluate this possibility, it is crucial to perform additional investigations. A CT angiography (CTA) of the abdomen is particularly useful. Prompt: "Patient: 60-year-old with crushing chest pain, hypotension, and muffled heart sounds. What is the likely diagnosis?" Model Output: > > Okay, let's think about what's going on here. We have a 60-year-old patient who's experiencing crushing chest pain. That's a big red flag, right? It's usually a sign of something serious with the heart. Now, the hypotension, or low blood pressure, is concerning because it suggests the heart isn't pumping effectively. > > Now, what's really catching my attention is the muffled heart sounds. This is a classic clue. It makes me think of something that's affecting the heart's ability to produce sound, like if there's fluid or something else around it. Oh, right, this is very characteristic of cardiac tamponade. The crushing chest pain, low blood pressure, and muffled heart sounds all point to cardiac tamponade. > > The most likely diagnosis for the 60-year-old patient with crushing chest pain, hypotension, and muffled heart sounds is cardiac tamponade. This condition occurs when there is an accumulation of fluid around the heart, leading to compression and reduced efficiency of cardiac function. The muffled heart sounds, in particular, are a hallmark of this condition. Prompt: "Patient: 50-year-old with massive hematemesis and shock. What are the emergency steps?" 
Model Output: > > Alright, we've got a 50-year-old patient who's had massive hematemesis and is in shock. This is pretty serious. Let's start by thinking about what could cause this. Massive bleeding from the upper GI tract could mean a ruptured esophageal varices, a bleeding ulcer, or maybe Mallory-Weiss syndrome. > > First things first, we need to stabilize the patient. They're in shock, so that means we have to manage their blood pressure and heart rate. We'll need to give them some fluids, maybe even start a blood transfusion if they're losing a lot of blood. We should definitely get an EGD (esophagogastroduodenoscopy) scheduled as soon as possible. > > In the case of a 50-year-old patient presenting with massive hematemesis and shock, the emergency steps are critical: 1. Stabilize the Patient: Airway Management and Circulatory Support. 2. Identify the Source of Bleeding: Arrange for an urgent Esophagogastroduodenoscopy (EGD). 3. Manage the Bleeding: Pharmacological intervention (e.g., octreotide). 4. Post-Procedure Care. Prompt: "Patient: 23-year-old with fever, headache, and purpuric rash. What should be suspected?" Model Output: > > Alright, let's think about what's going on with this patient. We've got a 23-year-old who's feeling pretty rough with a fever and a headache. But the thing that really catches my eye is this purpuric rash. That's usually a sign of something more serious happening, like blood vessels being affected somehow. > > Now, with these symptoms, I'm starting to wonder about meningococcal disease. It's pretty notorious for causing a fever, headache, and this specific type of rash. The combination of fever, headache, and purpuric rash is a classic triad for this condition. > > Based on the symptoms presented—fever, headache, and a purpuric rash—meningococcal disease is a strong possibility. The combination of these symptoms is often indicative of this condition, especially the presence of the purpuric rash. 
Immediate treatment with antibiotics, such as ceftriaxone or cefotaxime, is crucial if meningococcal disease is suspected, even before laboratory confirmation. Prompt: "Patient: 36-year-old with blunt trauma to abdomen, unstable vitals. What is the best initial investigation?" Model Output: > > Alright, let's think about this. We have a 36-year-old patient who's had some pretty serious blunt trauma to the abdomen, and now they're unstable. When someone comes in with unstable vitals after something like this, we need to act fast. The main concern here is internal bleeding. > > First, I'm thinking about the best way to check for any internal bleeding. We need something that can give us a quick look. What about an FAST exam? That's a focused assessment with sonography for trauma. It's like the rapid response team for ultrasound in trauma cases. FAST can quickly tell us if there's free fluid in the abdomen, which usually means blood. It's noninvasive and can be done right at the bedside. > > In the scenario you've described, where a 36-year-old patient has suffered blunt trauma to the abdomen and presents with unstable vitals, the best initial investigation is a FAST (Focused Assessment with Sonography for Trauma) exam. This is a rapid, noninvasive ultrasound technique used to detect free fluid in the abdomen, which typically indicates internal bleeding. Prompt: "Patient: 70-year-old postoperative patient with sudden dyspnea, chest pain, and hemoptysis. What is suspected?" Model Output: > > Alright, let's think this through. We've got a 70-year-old patient who just had surgery, and now they're suddenly short of breath and in pain. Oh, and they're coughing up blood, too. > > First things first, what could cause these symptoms after surgery? Well, pulmonary embolism jumps to mind. Surgery is a big risk factor for blood clots, especially in the legs, which can travel to the lungs. The sudden onset of dyspnea, chest pain, and hemoptysis is like a textbook case for a PE. 
> > In this scenario, the most likely explanation for the 70-year-old postoperative patient's sudden dyspnea, chest pain, and hemoptysis is a pulmonary embolism (PE). The combination of these symptoms, especially following surgery, raises significant suspicion for a PE. Immediate evaluation and treatment are necessary to manage this potentially life-threatening condition. Prompt: "Patient: 4-year-old with barking cough, stridor, and hoarseness. What is the diagnosis and management?" Model Output: > > Okay, let's think about this. We've got a 4-year-old child with a barking cough, stridor, and hoarseness. These symptoms are pretty classic for something, but what exactly? > > A barking cough makes me think of croup. It's that high-pitched, seal-like cough that's often heard in young kids. And stridor, that's the noise you hear when breathing, especially when inhaling, because of some narrowing in the airways. This all fits together with croup. > > Now, how do we manage this? For mild cases, we usually start with supportive care. That means keeping the child hydrated, maybe using a cool mist humidifier. But if the symptoms are severe, that's when nebulized racemic epinephrine comes into play. > > Based on the symptoms described—a barking cough, stridor, and hoarseness—the most likely diagnosis for this 4-year-old child is croup. For management, mild cases can often be managed with supportive care. In more severe cases, nebulized racemic epinephrine can be used to reduce airway swelling. Oral dexamethasone may also be considered to reduce inflammation. Prompt: "Patient: 65-year-old with severe back pain, hypotension, and pulsatile abdominal mass. What condition must be suspected?" Model Output: > > Alright, let's take a look at this 65-year-old patient. They're experiencing severe back pain. Now, the patient is also showing hypotension, which is low blood pressure. This is a big red flag. > > Okay, and there's this pulsatile abdominal mass. That's a really key detail. 
A pulsatile mass in the abdomen often points to something that's expanding and pushing against the abdominal wall, like an aneurysm. In this context, I'm thinking about an abdominal aortic aneurysm specifically because it fits with the back pain and hypotension. > > Based on the symptoms presented—a 65-year-old patient experiencing severe back pain, hypotension, and a pulsatile abdominal mass—the most likely condition to suspect is an abdominal aortic aneurysm (AAA). These symptoms are characteristic of an AAA, which could be pressing on the aorta, causing back pain, and if it's leaking or rupturing, it could lead to hypotension. Prompt: "Patient: 22-year-old with high fever, neck stiffness, and photophobia. What is the likely diagnosis and initial management?" Model Output: > > Alright, let's think this through. We've got a 22-year-old patient with a high fever, neck stiffness, and photophobia. These symptoms are ringing a pretty loud alarm bell for meningitis. Now, we need to consider what's causing this inflammation. It could be bacterial, viral, or even fungal, but given the age and severity, bacterial meningitis is a big concern here. > > Now, what should we do? With bacterial meningitis, time is of the essence. We need to start empirical antibiotic treatment right away. Typically, for bacterial meningitis, we'd start with something like ceftriaxone or cefotaxime. We also need to get a lumbar puncture. > > The symptoms of high fever, neck stiffness, and photophobia in a 22-year-old patient strongly suggest meningitis, with bacterial meningitis being a significant concern. The initial management should be prompt and comprehensive: 1. Empirical Antibiotic Therapy: Start broad-spectrum antibiotics immediately. 2. Lumbar Puncture: Perform a lumbar puncture to obtain cerebrospinal fluid (CSF) for analysis. 3. Supportive Care. Prompt: "Patient: 50-year-old with chronic alcohol use presents with confusion, ataxia, and ophthalmoplegia. What deficiency is likely?" 
Model Output: > > Alright, let's see what's going on here. We've got a 50-year-old who's been drinking a lot for a long time and now they're showing up with some concerning symptoms. They're confused, can't walk straight, and have issues with their eye movements. > > When I hear 'confusion' and 'ataxia' together, I start thinking about Wernicke's encephalopathy. It's like a classic set of symptoms—confusion, problems with movement, and eye movement issues. > > Now, what could cause this in someone who's been drinking heavily? Well, chronic alcohol use often messes with the body's ability to absorb and use vitamins properly. One vitamin in particular, thiamine, is crucial for brain function and is frequently deficient in alcoholics. > > The symptoms you're describing—confusion, ataxia, and ophthalmoplegia—in a 50-year-old patient with a history of chronic alcohol use strongly suggest Wernicke's encephalopathy. This condition is typically caused by a deficiency of thiamine (vitamin B1). Thiamine is crucial for the proper functioning of the nervous system, and its deficiency is common in individuals who consume excessive amounts of alcohol. Prompt: "Patient: 5-year-old with recurrent bone pain, dactylitis, and anemia. What is the likely diagnosis?" Model Output: > > Alright, so we have a 5-year-old who keeps experiencing bone pain, has this thing called dactylitis, and is showing signs of anemia. > > Now, dactylitis. That's when fingers or toes swell up like a sausage, right? It's pretty specific to certain conditions, especially sickle cell disease. In kids, if they have sickle cell, they often have this dactylitis because of the sickled red blood cells getting stuck and causing inflammation. > > And then there's the anemia. Anemia in kids with sickle cell disease is common. Putting these symptoms together: bone pain, dactylitis, and anemia, it's really pointing towards sickle cell disease. 
> > Based on the symptoms described—a 5-year-old with recurrent bone pain, dactylitis, and anemia—the most likely diagnosis is sickle cell disease. The bone pain, often referred to as "sickle cell crisis," occurs due to vaso-occlusive events. Dactylitis, or "hand-foot syndrome," is a hallmark symptom in young children. Anemia is common because the sickled cells have a reduced lifespan. Prompt: "Patient: 40-year-old woman with exophthalmos, tremors, and weight loss. What is the most likely condition?" Model Output: > > Alright, let's think this through. We've got a 40-year-old woman here, and she's showing some interesting symptoms: exophthalmos, tremors, and weight loss. Exophthalmos is when the eyes bulge out, and tremors mean her hands or other body parts are shaking involuntarily. Plus, she's losing weight. > > Exophthalmos usually makes me think of thyroid issues, particularly hyperthyroidism. This is because in Graves' disease, which is a common type of hyperthyroidism, the eyes often bulge. And then there's the weight loss. That's a classic sign of hyperthyroidism too. > > Based on the symptoms you've described—exophthalmos, tremors, and weight loss—the most likely condition affecting this 40-year-old woman is hyperthyroidism. Exophthalmos is often associated with Graves' disease, a form of hyperthyroidism. Tremors and weight loss are also common due to the increased metabolic rate. Prompt: "Patient: 68-year-old with progressive memory loss, difficulty with daily tasks. What is the probable diagnosis?" Model Output: > > Let's think about this. We have a 68-year-old patient who's experiencing progressive memory loss and having trouble with everyday tasks. > > First off, given the age and the symptoms, Alzheimer's disease immediately comes to mind. It's pretty common in older adults and is known for causing memory issues and difficulties with daily activities. Alzheimer's typically has a gradual decline, which fits what we're seeing here. 
> > Based on the symptoms of progressive memory loss and difficulty with daily tasks in a 68-year-old patient, the most probable diagnosis is Alzheimer's disease. Alzheimer's is characterized by gradual cognitive decline, including memory impairment and challenges with routine activities, which aligns with the patient's presentation. Prompt: "Patient: 55-year-old with hematuria, flank pain, and abdominal mass. What cancer must be considered?" Model Output: > > Alright, let's think about this 55-year-old patient. They're experiencing hematuria, which is blood in the urine. Oh, and there's also flank pain. Now, there's an abdominal mass too. > > When I put all these symptoms together—hematuria, flank pain, and an abdominal mass—I start thinking about renal cell carcinoma, or kidney cancer. It's a pretty common cancer that fits these symptoms well. Renal cell carcinoma often presents with hematuria because it can bleed into the urinary tract. And it's also known to cause flank pain if the tumor is pressing on nearby structures. > > In this scenario, renal cell carcinoma (RCC), a type of kidney cancer, is a significant consideration given the symptoms of hematuria, flank pain, and an abdominal mass. Renal cell carcinoma commonly presents with these symptoms because the tumor can bleed into the urinary tract, causing hematuria, and it can also cause flank pain if it compresses nearby structures. Prompt: "Patient: 45-year-old HIV-positive man with cough, fever, and cavitary lung lesion. What are the possibilities?" Model Output: > > Okay, so we've got a 45-year-old man who's HIV-positive, and he's dealing with a cough, fever, and a cavitary lung lesion. Since he's HIV-positive, he's more prone to infections that typically wouldn't affect someone with a fully functioning immune system. That makes opportunistic infections a big consideration here. > > Cavitary lesions are often linked to infections like tuberculosis or some types of fungal infections. 
Let's weigh the possibilities. Tuberculosis is a classic suspect, especially given the cavitary lesion. It's known for causing cough, fever, and these kinds of lung lesions, and it's definitely more common in people who are HIV-positive. > > In this scenario, the most likely diagnosis is pulmonary tuberculosis (TB), given the combination of cough, fever, and a cavitary lung lesion in an HIV-positive patient. TB is a common opportunistic infection in individuals with compromised immune systems, and the cavitary lesion is a classic finding associated with this condition. Other possibilities include fungal infections (histoplasmosis, aspergillosis) or malignancy, but TB is the leading differential. Prompt: "Patient: 30-year-old pregnant woman with high BP, proteinuria, and seizures. What is the diagnosis?" Model Output: > > Okay, let's think this through. We have a 30-year-old pregnant woman showing up with high blood pressure, proteinuria, and seizures. These symptoms are pretty alarming. > > Oh, and the seizures are a game-changer here. They really narrow down the possibilities. This isn't just about high blood pressure anymore; it's something more serious. Seizures during pregnancy immediately make me think of eclampsia. Eclampsia is essentially severe preeclampsia that's progressed to the point where seizures happen. High blood pressure? Check. Protein in the urine? Check. Seizures? Definitely. These three together are the classic triad for eclampsia. > > The symptoms presented—high blood pressure, proteinuria, and seizures—are characteristic of eclampsia, a severe form of preeclampsia that occurs during pregnancy. Eclampsia is distinguished by the presence of seizures, which are not typically seen in uncomplicated preeclampsia. The combination of these symptoms in a pregnant woman strongly suggests eclampsia, which is a medical emergency requiring immediate attention. Prompt: "Patient: 2-month-old infant with projectile non-bilious vomiting, visible peristalsis. 
What is suspected?" Model Output: > > Alright, let's see what's going on here. We've got a 2-month-old baby who's been vomiting pretty hard, and it's not the usual kind of vomiting because it's not bile-colored. That's interesting because it points away from anything happening in the small intestines. Now, what's really catching my eye is the peristalsis. That's the wave-like motion of the stomach you can actually see, right? > > Okay, let's think about what could cause this kind of vomiting in an infant. The visible peristalsis? That's like a signature symptom of pyloric stenosis. It's kind of like the stomach is trying to squeeze through a very tight opening. It's pretty specific and a big clue. > > Based on the symptoms described, the most likely diagnosis for this 2-month-old infant is pyloric stenosis. Pyloric stenosis is characterized by projectile, non-bilious vomiting, which occurs due to the narrowing of the pyloric sphincter. The visible peristalsis is a classic sign of this condition, as the stomach contracts vigorously to push food through the narrowed opening. Prompt: "Patient: 33-year-old with morning stiffness >1 hour, symmetrical joint swelling. What is the diagnosis and first-line test?" Model Output: > > Alright, let's think this through. We have a 33-year-old patient who's experiencing morning stiffness that lasts more than an hour and also has symmetrical joint swelling. That's a pretty classic set of symptoms. When I hear morning stiffness that's prolonged and the joints are swollen symmetrically, rheumatoid arthritis (RA) immediately comes to mind. > > When it comes to testing, the first thing we'd want to do is confirm this suspicion. The rheumatoid factor (RF) test is pretty common, but honestly, it's not always reliable. The anti-cyclic citrullinated peptide (anti-CCP) test is much more specific and a better marker for RA. It's the go-to test when you suspect RA. 
> > The diagnosis most consistent with the symptoms described—morning stiffness lasting more than an hour and symmetrical joint swelling—is rheumatoid arthritis (RA). RA is known for affecting the small joints symmetrically, and the prolonged morning stiffness is a hallmark feature. The first-line test for confirming rheumatoid arthritis is the anti-cyclic citrullinated peptide (anti-CCP) antibody test. This test is highly specific for RA. Prompt: "Patient: 22-year-old with abdominal pain, bloody diarrhea, and weight loss. What are the differential diagnoses?" Model Output: > > Okay, let's figure out what's going on with this 22-year-old who's having abdominal pain, bloody diarrhea, and weight loss. > > First off, the bloody diarrhea is a big clue. It suggests some kind of inflammation or damage in the intestines. I'm thinking inflammatory bowel disease (IBD) could be a possibility here. Both ulcerative colitis and Crohn's disease can cause these symptoms. > > Now, the weight loss is another piece of the puzzle. It often points to a chronic issue that's affecting nutrient absorption. In IBD, this is common. Could this be an infection? Something like an infectious gastroenteritis could cause bloody diarrhea too, but it usually resolves faster and doesn't typically cause weight loss unless it's severe or chronic. > > Based on the symptoms you've described—a 22-year-old experiencing abdominal pain, bloody diarrhea, and weight loss—the most likely differential diagnoses include: 1. Inflammatory Bowel Disease (IBD): Ulcerative Colitis or Crohn's Disease. 2. Infectious Causes: Such as Campylobacter, Salmonella, or E. coli, though less likely given the chronicity implied by weight loss. 3. Gastrointestinal Malignancy: Less common at this age, but conditions like colon cancer should be considered. 
Special thanks to the developers of Unsloth for creating an open-source library that significantly lowers the barrier to fine-tuning large-scale models, enabling projects like this one to be developed with greater efficiency and on more accessible hardware. If you use this model in your research, please cite the original base model and the dataset used for fine-tuning.
MegaSciMoE 1.2B
MedScholar-1.5B
MedScholar-1.5B is a compact, instruction-aligned medical question-answering model fine-tuned on 1 million randomly selected examples from the MIRIAD-4.4M dataset. It is based on the Qwen/Qwen2.5-1.5B-Instruct model and designed for efficient, in-context clinical knowledge exploration — not diagnosis.

- Base Model: Qwen2.5-1.5B-Instruct-unsloth-bnb-4bit
- Fine-tuning Dataset: MIRIAD-4.4M
- Samples Used: 1,000,000 examples randomly selected from the full set
- Prompt Style: Minimal QA format (see below)
- Training Framework: Unsloth with QLoRA
- License: Apache-2.0 (inherits from base model); dataset is ODC-By 1.0

The model expects the prompt to end with `### Answer:` and will generate only the answer text. Do not include the answer in the prompt during inference.

This model was fine-tuned on 1 million randomly selected examples from the MIRIAD-4.4M dataset, which is released under the ODC-By 1.0 License.

> The MIRIAD dataset is intended exclusively for academic research and educational exploration.
> As stated by its authors:
>
> “The outputs generated by models trained or fine-tuned on this dataset must not be used for medical diagnosis or decision-making involving real individuals.”

This model is for research, educational, and exploration purposes only. It is not a medical device and must not be used to provide clinical advice, diagnosis, or treatment.

MIRIAD Dataset by Zheng et al. (2025) – https://huggingface.co/datasets/miriad/miriad-4.4M
Qwen2.5 by Alibaba – https://huggingface.co/Qwen
Training infrastructure: Unsloth

This qwen2 model was trained 2x faster with Unsloth and Hugging Face's TRL library.
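The `### Answer:` convention can be sketched as a pair of small helpers. This is a minimal illustration, not shipped code: the `### Question:` header and the helper names are assumptions; only the trailing `### Answer:` marker is documented in the card.

```python
# Minimal sketch of the prompt convention (assumptions: the "### Question:"
# header and helper names are illustrative; only the trailing "### Answer:"
# marker is documented in the card).

ANSWER_MARKER = "### Answer:"

def build_prompt(question: str) -> str:
    """Build an inference prompt that ends with the answer marker."""
    return f"### Question:\n{question.strip()}\n\n{ANSWER_MARKER}"

def extract_answer(generated: str) -> str:
    """Keep only the text after the final answer marker (handles prompt echo)."""
    _, _, answer = generated.rpartition(ANSWER_MARKER)
    return answer.strip()

prompt = build_prompt("What vitamin deficiency causes Wernicke's encephalopathy?")
assert prompt.endswith(ANSWER_MARKER)  # the model generates only what follows

# If the serving stack returns prompt + completion, strip the echo:
echoed = prompt + " Thiamine (vitamin B1) deficiency."
print(extract_answer(echoed))  # prints "Thiamine (vitamin B1) deficiency."
```

Ending the prompt at the marker (with the answer omitted) matches the card's instruction that the model generates only the answer text.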
qwen3-4b-agentic-reasoner-gguf
oncology-gemma-300m-emb
SentenceTransformer based on google/embeddinggemma-300m This is a sentence-transformers model finetuned from google/embeddinggemma-300m. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. Model Description - Model Type: Sentence Transformer - Base model: google/embeddinggemma-300m - Maximum Sequence Length: 2048 tokens - Output Dimensionality: 768 dimensions - Similarity Function: Cosine Similarity - Documentation: Sentence Transformers Documentation - Repository: Sentence Transformers on GitHub - Hugging Face: Sentence Transformers on Hugging Face Size: 20,000 training samples Columns: sentence0 and sentence1 Approximate statistics based on the first 1000 samples: | | sentence0 | sentence1 | |:--------|:-----------------------------------------------------------------------------------|:------------------------------------------------------------------------------------| | type | string | string | | details | min: 10 tokens mean: 22.55 tokens max: 51 tokens | min: 18 tokens mean: 91.28 tokens max: 219 tokens | Samples: | sentence0 | sentence1 | 
|:------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | Is there a way to prevent PTLD in high-risk patients? | Currently, there is no convincing data for the prophylaxis of PTLD. However, the case mentioned suggests that early use of rituximab after HSCT (Hematopoietic Stem Cell Transplantation) could be a good way to prevent PTLD in high-risk patients, especially those who are serum EBV (Epstein-Barr Virus) positive. Early recognition of PTLD, early lymph node biopsy, and early diagnosis are key factors in the successful treatment of PTLD. | | How does the 34-gene 'CTC profile' contribute to the prognostic power of breast cancer patients? | The 34-gene 'CTC profile' has been found to be predictive of CTC status in breast cancer patients. It demonstrated a classification accuracy of 82% in the training cohort and 67% in an independent microarray dataset. Furthermore, it has been shown to be prognostic in both independent datasets, with a hazard ratio (HR) of 10 in the first validation dataset and a HR of 3.2 in the second validation dataset. 
Importantly, multivariate analysis confirmed that the CTC profile provided prognostic information independent of other clinical variables in both patient cohorts. | | How are beauty care services for cancer patients organized and provided? | Beauty care services for cancer patients are not standardized or evaluated and vary from one establishment to another. In the case of the IGR, consultations on image advice and socio-aesthetics are provided by a socio-aesthetician who has been trained as a personal image advisor. These consultations are offered to women with breast cancer or young adults and adolescents with cancer who are referred by medical units. The consultations take place in a dedicated area with three rooms: an office, make-up parlor, and beauty care salon. Patients are usually seen multiple times during their treatment period. The socio-aesthetician is paid by the hospital and is part of the Onco-hematology Interdisciplinary Supportive Care Directorate. |

Loss: MultipleNegativesRankingLoss with these parameters:

Training Hyperparameters

Non-Default Hyperparameters

- `per_device_train_batch_size`: 4
- `per_device_eval_batch_size`: 4
- `num_train_epochs`: 1
- `multi_dataset_batch_sampler`: round_robin

All Hyperparameters

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: no
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 4
- `per_device_eval_batch_size`: 4
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1
- `num_train_epochs`: 1
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.0
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `parallelism_config`: None
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch_fused
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `hub_revision`: None
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `liger_kernel_config`: None
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: round_robin
- `router_mapping`: {}
- `learning_rate_mapping`: {}

Training Logs

| Epoch | Step | Training Loss |
|:-----:|:----:|:-------------:|
| 0.1   | 500  | 0.0144        |
| 0.2   | 1000 | 0.0293        |
| 0.3   | 1500 | 0.0128        |
| 0.4   | 2000 | 0.0153        |
| 0.5   | 2500 | 0.0182        |
| 0.6   | 3000 | 0.008         |
| 0.7   | 3500 | 0.0098        |
| 0.8   | 4000 | 0.0044        |
| 0.9   | 4500 | 0.0024        |
| 1.0   | 5000 | 0.0019        |

Framework Versions

- Python: 3.12.11
- Sentence Transformers: 5.1.0
- Transformers: 4.56.1
- PyTorch: 2.8.0+cu128
- Accelerate: 1.10.1
- Datasets: 4.0.0
- Tokenizers: 0.22.0
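Embeddings from this model are compared with cosine similarity. Below is a stdlib-only sketch of that comparison; the commented `sentence_transformers` lines are illustrative and assume the repo id `yasserrmd/oncology-gemma-300m-emb`:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# With the model itself (not run here; repo id assumed):
# from sentence_transformers import SentenceTransformer
# model = SentenceTransformer("yasserrmd/oncology-gemma-300m-emb")
# q_emb, d_emb = model.encode([query, document])  # 768-dim vectors
# score = cosine_similarity(q_emb, d_emb)

# Toy check: identical vectors score 1.0, orthogonal vectors 0.0.
print(round(cosine_similarity([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]), 6))  # 1.0
print(round(cosine_similarity([1.0, 0.0], [0.0, 1.0]), 6))            # 0.0
```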
endocrinology-gemma-300m-emb
SentenceTransformer based on google/embeddinggemma-300m This is a sentence-transformers model finetuned from google/embeddinggemma-300m. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. Model Description - Model Type: Sentence Transformer - Base model: google/embeddinggemma-300m - Maximum Sequence Length: 2048 tokens - Output Dimensionality: 768 dimensions - Similarity Function: Cosine Similarity - Documentation: Sentence Transformers Documentation - Repository: Sentence Transformers on GitHub - Hugging Face: Sentence Transformers on Hugging Face Size: 20,000 training samples Columns: sentence0 and sentence1 Approximate statistics based on the first 1000 samples: | | sentence0 | sentence1 | |:--------|:----------------------------------------------------------------------------------|:------------------------------------------------------------------------------------| | type | string | string | | details | min: 9 tokens mean: 21.14 tokens max: 54 tokens | min: 15 tokens mean: 90.48 tokens max: 223 tokens | Samples: | sentence0 | sentence1 | 
|:---------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | What factors contribute to the development of hypoglycemia unawareness in individuals with diabetes? | Hypoglycemia unawareness, also known as HAAF (hypoglycemia-associated autonomic failure), is a known complication of insulin therapy for type 1 and type 2 diabetes. Even a single episode of antecedent hypoglycemia can alter the neuroendocrine response during subsequent hypoglycemia. While the exact mechanism of HAAF is not fully understood, improved brain glucose transport is considered a major factor. In individuals with HAAF, brain glucose concentration is higher compared to controls. Chronic and recurrent hypoglycemia can enhance blood-brain glucose transport capacity, and increased expression of glucose transporters at the blood-brain barrier has been observed in animal models. 
HAAF is characterized by a lack of suppression of endogenous insulin secretion and failure of glucagon and catecholamine secretion during hypoglycemia. Decreased cortisol secretion is commonly present, but adrenal medullary effects predominate. Increased CRH secretion, acting via CRH receptor 1, may be invol... | | How was the baby boy with the TRβ R243W mutation diagnosed with resistance to thyroid hormone (RTH) instead of neonatal Graves' disease (GD)? | The baby boy was initially suspected of having neonatal GD due to his mother's condition. However, laboratory tests showed that his thyroid-stimulating hormone (TSH) levels were not suppressed, and he had high levels of free T4 (FT4) and free T3 (FT3) with no antibodies related to GD. Based on these findings, he was diagnosed with RTH instead of GD. | | What are the risk factors for developing diabetic muscle infarction (DMI)? | The risk factors for developing diabetic muscle infarction (DMI) include poorly controlled diabetes mellitus, particularly type 1 diabetes, and the presence of late complications such as nephropathy, retinopathy, and neuropathy. Other factors that may contribute to the development of DMI include hyperglycemia and long-standing diabetes. 
|

Loss: MultipleNegativesRankingLoss with these parameters:

Training Hyperparameters

Non-Default Hyperparameters

- `per_device_train_batch_size`: 6
- `per_device_eval_batch_size`: 6
- `num_train_epochs`: 1
- `multi_dataset_batch_sampler`: round_robin

All Hyperparameters

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: no
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 6
- `per_device_eval_batch_size`: 6
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1
- `num_train_epochs`: 1
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.0
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `parallelism_config`: None
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch_fused
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `hub_revision`: None
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `liger_kernel_config`: None
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: round_robin
- `router_mapping`: {}
- `learning_rate_mapping`: {}

Training Logs

| Epoch  | Step | Training Loss |
|:------:|:----:|:-------------:|
| 0.1500 | 500  | 0.0224        |
| 0.2999 | 1000 | 0.0171        |
| 0.4499 | 1500 | 0.0158        |
| 0.5999 | 2000 | 0.0062        |
| 0.7499 | 2500 | 0.0095        |
| 0.8998 | 3000 | 0.0043        |

Framework Versions

- Python: 3.12.11
- Sentence Transformers: 5.1.0
- Transformers: 4.56.2
- PyTorch: 2.8.0+cu128
- Accelerate: 1.10.1
- Datasets: 4.0.0
- Tokenizers: 0.22.1
kallamni-1.2b-v1-gguf
SparkNV-Voice
SparkNV-Voice is a fine-tuned version of the Spark-TTS model trained on the NonverbalTTS dataset. It enables expressive speech synthesis with nonverbal cues (like laughter, sighs, sneezing, etc.) and rich emotional tone. Built for applications that require natural, human-like vocalization, this model produces speech with semantic tokens and global prosody control using BiCodec detokenization. - Base: `suno-ai/spark-tts` - Dataset: `deepvk/NonverbalTTS` - Architecture: Causal Language Model + BiCodec for audio token generation - Language: English - Voice: Single-speaker (no multi-speaker conditioning) To run this model, install the required dependencies: 17+ hours of annotated emotional & nonverbal English speech Automatic + human-validated labels Sources: VoxCeleb, Expresso Paper: arXiv:2507.13155 This model is released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license. Base model: `suno-ai/spark-tts` Dataset: `deepvk/NonverbalTTS` Author: `@yasserrmd` Open a discussion or issue on this repo. Contributions are welcome!
gemma-3-4b-it-GGUF
GemmaECG-Vision-base
- Developed by: yasserrmd
- License: apache-2.0
- Finetuned from model: unsloth/gemma-3n-e2b-unsloth-bnb-4bit

This Gemma 3n model was trained 2x faster with Unsloth and Hugging Face's TRL library.
phi-4-gguf
AgenticCoder-4B
AgenticCoder‑4B is a compact 4B parameter language model designed for autonomous agent workflows and intelligent code reasoning. It merges the planning and tool-use strengths of `Jan-nano` with the coding and logic capabilities of `Qwen3‑4B‑Code‑Reasoning`, creating a balanced model ideal for real-world assistant scenarios, research agents, and smart development tools. - 🔁 Agentic Planning & MCP Alignment Trained on datasets and architectures optimized for multi-step reasoning, task decomposition, and memory–contextual workflows. - 💻 Code Understanding & Reasoning Strong capabilities in Python code generation, script explanation, optimization, and multi-turn task development. - 🧰 Tool Use Simulation Handles realistic tool interaction prompts such as CSV analysis, OCR, and file parsing in code. - 📦 Compact & Efficient (4B) Lightweight enough for cost-efficient deployment, edge device integration, and fine-tuning. - Merge Method: SLERP (`t = 0.4`) - Base Model: `Menlo/Jan-nano` - Merged With: `ertghiu256/qwen3-4b-code-reasoning` - Precision: `float16` - Tokenizer Source: `Menlo/Jan-nano` This model is provided for research and development use under the terms of the base models’ respective licenses. Please ensure compliance before commercial usage. Menlo/Jan-nano by Menlo Systems Qwen3‑4B‑Code‑Reasoning by ertghiu256 MergeKit, SLERP, Hugging Face
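The merge settings listed above can be sketched as a MergeKit recipe. This is a hypothetical reconstruction, not the actual config: the `layer_range` bounds are assumptions, while the models, method, `t`, dtype, and tokenizer source come from the card.

```yaml
# Hypothetical MergeKit recipe for the SLERP merge described above.
# layer_range bounds are illustrative assumptions.
merge_method: slerp
base_model: Menlo/Jan-nano
slices:
  - sources:
      - model: Menlo/Jan-nano
        layer_range: [0, 36]
      - model: ertghiu256/qwen3-4b-code-reasoning
        layer_range: [0, 36]
parameters:
  t: 0.4            # interpolation weight, as stated in the card
dtype: float16
tokenizer_source: base  # tokenizer from Menlo/Jan-nano, per the card
```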
Human-Like-Qwen2.5-1.5B-Instruct
Foundation-Sec-8B-gguf
cardio-gemma-300m-emb
SentenceTransformer based on google/embeddinggemma-300m This is a sentence-transformers model finetuned from google/embeddinggemma-300m. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. Model Description - Model Type: Sentence Transformer - Base model: google/embeddinggemma-300m - Maximum Sequence Length: 2048 tokens - Output Dimensionality: 768 dimensions - Similarity Function: Cosine Similarity - Documentation: Sentence Transformers Documentation - Repository: Sentence Transformers on GitHub - Hugging Face: Sentence Transformers on Hugging Face Size: 20,000 training samples Columns: sentence0 and sentence1 Approximate statistics based on the first 1000 samples: | | sentence0 | sentence1 | |:--------|:----------------------------------------------------------------------------------|:------------------------------------------------------------------------------------| | type | string | string | | details | min: 8 tokens mean: 22.56 tokens max: 57 tokens | min: 17 tokens mean: 89.45 tokens max: 260 tokens | Samples: | sentence0 | sentence1 | |:------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | What are the key features of diabetic cardiomyopathy and how are they affected by 11β-HSD1 inhibition? 
| Diabetic cardiomyopathy is characterized by fibrosis and hypertrophy in the heart tissues. In the low dose STZ-high fat model of type 2 diabetes, diabetic mice showed increased collagen deposition and irregular/disorganized muscle fibers in the heart. However, treatment with PF, an inhibitor of 11β-HSD1, normalized these alterations, indicating that 11β-HSD1 inhibition can prevent the development of diabetic cardiomyopathy. | | How does tissue Doppler imaging (TDI) contribute to the assessment of myocardial dyssynchrony? | Tissue Doppler imaging (TDI) is a technique used in echocardiography to evaluate the motion of the left ventricle. By analyzing myocardial regional velocity curves, TDI can provide information on the timing of systolic contractions in different myocardial segments. In the context of assessing dyssynchrony, TDI can measure the time-to-peak myocardial sustained systolic velocities (Ts) in all 12 left ventricular (LV) segments. The standard deviation of Ts (Ts-SD) can then be calculated to determine the presence of significant systolic IVD. | | How is conventional coronary angiography performed? | Conventional coronary angiography is performed via a femoral approach using approximately 40 mL of nonionic contrast material. A minimum of six orthogonal views are obtained to evaluate the coronary arteries. The images are evaluated by a board-certified cardiologist who assesses the diameter stenosis by visual estimation. 
|

Loss: MultipleNegativesRankingLoss with these parameters:

Training Hyperparameters

Non-Default Hyperparameters

- `per_device_train_batch_size`: 6
- `per_device_eval_batch_size`: 6
- `num_train_epochs`: 1
- `multi_dataset_batch_sampler`: round_robin

All Hyperparameters

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: no
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 6
- `per_device_eval_batch_size`: 6
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1
- `num_train_epochs`: 1
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.0
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `parallelism_config`: None
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch_fused
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `hub_revision`: None
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `liger_kernel_config`: None
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: round_robin
- `router_mapping`: {}
- `learning_rate_mapping`: {}

Training Logs

| Epoch  | Step | Training Loss |
|:------:|:----:|:-------------:|
| 0.1500 | 500  | 0.0276        |
| 0.2999 | 1000 | 0.0145        |
| 0.4499 | 1500 | 0.0072        |
| 0.5999 | 2000 | 0.007         |
| 0.7499 | 2500 | 0.0039        |
| 0.8998 | 3000 | 0.0044        |

Framework Versions

- Python: 3.12.11
- Sentence Transformers: 5.1.0
- Transformers: 4.56.1
- PyTorch: 2.8.0+cu128
- Accelerate: 1.10.1
- Datasets: 4.0.0
- Tokenizers: 0.22.0
SinaReason-Magistral-2509-bnb-4bit
MedScholar-Reasoning-1.5B
phi-4-Sky-T1-data-gguf
Mistral-Small-24B-Instruct-2501-mlx
nsfk-detection
AgentUX-4B
AgentUX‑4B is a compact, agentic reasoning model designed for UI layout generation, component reasoning, and lightweight code structuring tasks. It’s a 4B-parameter model merged using SLERP (Spherical Linear Interpolation) via MergeKit, combining: 🔷 60% `Tesslate/UIGEN-X-4B-0729` — excellent at UI understanding and structured generation 🔹 40% `Menlo/Jan-nano` — strong generalist with compact tool-use and agentic reasoning 📐 UI reasoning & layout structure understanding 🧩 Component-to-code generation (HTML, JSX, CSS fragments) 🧠 Compact agentic planning and multi-step reasoning ⚡ Lightweight & merge-optimized for local inference and real-time apps 🧬 Merged using SLERP to preserve semantic smoothness between sources | Prompt | Task | | -------------------------------------------------- | -------------------------- | | "Generate a signup form layout using HTML and CSS" | Frontend layout generation | | "Explain the role of `flex-wrap` in UI design" | UI reasoning | | "Plan 3 steps to build a sidebar menu using React" | Agentic decomposition | 🔗 MergeKit method: `slerp` 🔍 Focused on reasoning alignment between structured generation (UIGEN) and agent-style planning (Jan-nano) 🤖 No additional fine-tuning post-merge Model licensed under Apache 2.0 All credit to the original base models:
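The 60/40 SLERP merge above can be sketched as a MergeKit recipe. This is a hypothetical reconstruction: the `t` value and `layer_range` bounds are inferred assumptions (in SLERP, `t` weights the non-base model, so a 60/40 UIGEN/Jan-nano split corresponds to roughly `t: 0.4` with UIGEN-X as base), not the actual config used.

```yaml
# Hypothetical MergeKit recipe for the 60/40 SLERP merge described above.
# t and layer_range are assumptions inferred from the card, not confirmed.
merge_method: slerp
base_model: Tesslate/UIGEN-X-4B-0729
slices:
  - sources:
      - model: Tesslate/UIGEN-X-4B-0729
        layer_range: [0, 36]
      - model: Menlo/Jan-nano
        layer_range: [0, 36]
parameters:
  t: 0.4   # 40% weight toward Jan-nano, 60% toward UIGEN-X (assumed direction)
dtype: float16
```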
gastroenterology-gemma-300m-emb
SentenceTransformer based on google/embeddinggemma-300m This is a sentence-transformers model finetuned from google/embeddinggemma-300m. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. Model Description - Model Type: Sentence Transformer - Base model: google/embeddinggemma-300m - Maximum Sequence Length: 2048 tokens - Output Dimensionality: 768 dimensions - Similarity Function: Cosine Similarity - Documentation: Sentence Transformers Documentation - Repository: Sentence Transformers on GitHub - Hugging Face: Sentence Transformers on Hugging Face Size: 20,000 training samples Columns: sentence0 and sentence1 Approximate statistics based on the first 1000 samples: | | sentence0 | sentence1 | |:--------|:----------------------------------------------------------------------------------|:------------------------------------------------------------------------------------| | type | string | string | | details | min: 8 tokens mean: 21.43 tokens max: 49 tokens | min: 17 tokens mean: 87.31 tokens max: 241 tokens | Samples: | sentence0 | sentence1 | 
|:---|:---|
| What are the benefits of combining high dose IV PPI with endoscopic treatment for peptic ulcer rebleeding management? | Combining high dose IV PPI with endoscopic treatment for peptic ulcer rebleeding management has been shown to be significantly better than administering high dose IV PPI alone. It reduces the rebleeding rate, with a rebleeding rate of 0% compared to 9% when high dose IV PPI is administered alone. However, there is no significant difference in mortality rate between the two approaches. |
| How can non-invasive methods be used to diagnose and evaluate schistosome-induced liver fibrosis? | Non-invasive methods, such as ultrasonography, CT, MRI, and serum markers, can be used to diagnose and evaluate schistosome-induced liver fibrosis. Liver biopsy, although considered the gold standard, is clinically impractical in the field. Ultrasonography is valuable in assessing pathology, but its availability is limited in many endemic communities. CT and MRI show distinct imaging features associated with hepatosplenic schistosomiasis and aid in diagnosis and clinical management. Serum markers, such as hyaluronic acid, collagen type III, YKL-40, and laminin, show promise in evaluating hepatic fibrosis. However, more studies are needed to evaluate the utility of other markers, such as matrix metalloproteinases, their inhibitors, and cytokines. |
| What are the typical symptoms and complications associated with Meckel's diverticulum? | Meckel's diverticulum can present with symptoms such as hemorrhage, obstruction, perforation, and inflammation. The reported lifetime complication rate is 4%. |

Loss: MultipleNegativesRankingLoss

Training Hyperparameters

Non-Default Hyperparameters
- `per_device_train_batch_size`: 6
- `per_device_eval_batch_size`: 6
- `num_train_epochs`: 1
- `multi_dataset_batch_sampler`: round_robin

Training Logs

| Epoch | Step | Training Loss |
|:------:|:----:|:-------------:|
| 0.1500 | 500 | 0.0194 |
| 0.2999 | 1000 | 0.0169 |
| 0.4499 | 1500 | 0.0132 |
| 0.5999 | 2000 | 0.0042 |
| 0.7499 | 2500 | 0.0048 |
| 0.8998 | 3000 | 0.0034 |

Framework Versions
- Python: 3.12.11
- Sentence Transformers: 5.1.0
- Transformers: 4.56.2
- PyTorch: 2.8.0+cu128
- Accelerate: 1.10.1
- Datasets: 4.0.0
- Tokenizers: 0.22.1
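The card above lists Cosine Similarity over 768-dimensional embeddings as the model's similarity function. As a minimal, self-contained sketch of how two such vectors are compared, using random vectors as stand-ins for real encoder outputs (no model is loaded here):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two dense embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
query = rng.normal(size=768)  # stand-in for an encoded question
doc = rng.normal(size=768)    # stand-in for an encoded answer passage

score = cosine_similarity(query, doc)
assert -1.0 <= score <= 1.0                                 # bounded by definition
assert abs(cosine_similarity(query, query) - 1.0) < 1e-9    # self-similarity is 1
```

In practice the vectors would come from the model's `encode` call rather than a random generator; the comparison itself is exactly this ratio of dot product to norms.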
AQUA-7B-gguf
dental-gemma-300m-emb
SentenceTransformer based on google/embeddinggemma-300m

This is a sentence-transformers model finetuned from google/embeddinggemma-300m. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Description
- Model Type: Sentence Transformer
- Base model: google/embeddinggemma-300m
- Maximum Sequence Length: 2048 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face

Training Data
- Size: 20,000 training samples
- Columns: sentence_0 and sentence_1

Approximate statistics based on the first 1000 samples:

| | sentence_0 | sentence_1 |
|:--------|:--------|:--------|
| type | string | string |
| details | min: 9 tokens, mean: 19.66 tokens, max: 49 tokens | min: 22 tokens, mean: 87.28 tokens, max: 221 tokens |

Samples:

| sentence_0 | sentence_1 |
|:---|:---|
| How do prosthodontists in Australia determine the need for placing a post in a tooth restoration? | Prosthodontists in Australia consider both the quantity of tooth structure and the type of planned restoration when deciding whether to place a post. The location of the tooth in the arch seems to have less influence on their decision. Molar teeth and mandibular anterior teeth are less likely to receive posts. |
| What are some patient-centered outcome measures used to assess the impact of oral health problems on quality of life? | There are several patient-centered outcome measures called 'oral health related quality of life measures' (OHQoL) that have been developed to assess the extent to which oral health problems affect a person's quality of life. Two measures that have received particular attention are the Oral Health Impact Profile (OHIP-14) and the UK Oral Health Related Quality of Life (OHQoL-UK) questionnaires. The OHIP-14 measures the adverse impacts of oral conditions on daily life, while the OHQoL-UK incorporates both negative and positive influences on health. |
| How does finite element analysis (FEA) contribute to the understanding and improvement of dental implant procedures? | Finite element analysis (FEA) is a computational method that can be used to simulate the distribution of stress and strain in the mandibular bone and osseointegrated implants. By considering various variables such as material characteristics, types of loads, and individual bio-subjectivity, FEA studies provide valuable insights into stress distribution and geometry evaluation. This information helps in making informed decisions about implant positioning, inclination, and type to ensure the long-term stability and success of dental implants. 
|

Loss: MultipleNegativesRankingLoss

Training Hyperparameters

Non-Default Hyperparameters
- `per_device_train_batch_size`: 4
- `per_device_eval_batch_size`: 4
- `num_train_epochs`: 1
- `multi_dataset_batch_sampler`: round_robin

Training Logs

| Epoch | Step | Training Loss |
|:-----:|:----:|:-------------:|
| 0.1 | 500 | 0.0147 |
| 0.2 | 1000 | 0.0142 |
| 0.3 | 1500 | 0.0154 |
| 0.4 | 2000 | 0.0085 |
| 0.5 | 2500 | 0.0052 |
| 0.6 | 3000 | 0.0071 |
| 0.7 | 3500 | 0.0025 |
| 0.8 | 4000 | 0.0028 |
| 0.9 | 4500 | 0.0056 |
| 1.0 | 5000 | 0.0045 |

Framework Versions
- Python: 3.12.11
- Sentence Transformers: 5.1.0
- Transformers: 4.56.1
- PyTorch: 2.8.0+cu128
- Accelerate: 1.10.1
- Datasets: 4.0.0
- Tokenizers: 0.22.0
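These cards train with MultipleNegativesRankingLoss, which treats each (sentence_0, sentence_1) pair in a batch as a positive and every other in-batch sentence_1 as an implicit negative. A rough numpy sketch of that objective, under the simplifying assumption of plain dot-product scores (the library's similarity scale factor is omitted):

```python
import numpy as np

def multiple_negatives_ranking_loss(anchors: np.ndarray, positives: np.ndarray) -> float:
    """In-batch negatives: row i of the score matrix should peak at column i."""
    scores = anchors @ positives.T                       # (batch, batch) similarity matrix
    scores -= scores.max(axis=1, keepdims=True)          # numerical stability for softmax
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return float(-log_probs.diagonal().mean())           # cross-entropy against the diagonal

rng = np.random.default_rng(0)
batch = rng.normal(size=(4, 8))
# A perfectly aligned batch (positive == anchor) should incur a lower loss
# than a shuffled batch where positives are mismatched with their anchors.
aligned = multiple_negatives_ranking_loss(batch, batch)
shuffled = multiple_negatives_ranking_loss(batch, batch[::-1])
assert aligned < shuffled
```

This is why the loss needs no explicit negative mining: every other pair in the batch supplies the negatives, which is also why the batch size hyperparameter above matters for training quality.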
nephrology-gemma-300m-emb
SentenceTransformer based on google/embeddinggemma-300m

This is a sentence-transformers model finetuned from google/embeddinggemma-300m. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Description
- Model Type: Sentence Transformer
- Base model: google/embeddinggemma-300m
- Maximum Sequence Length: 2048 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face

Training Data
- Size: 20,000 training samples
- Columns: sentence_0 and sentence_1

Approximate statistics based on the first 1000 samples:

| | sentence_0 | sentence_1 |
|:--------|:--------|:--------|
| type | string | string |
| details | min: 10 tokens, mean: 22.05 tokens, max: 56 tokens | min: 20 tokens, mean: 91.9 tokens, max: 281 tokens |

Samples:

| sentence_0 | sentence_1 |
|:---|:---|
| How do the CKD-EPI and Japanese equations compare to Ccr and CGF in estimating renal function in cancer patients? | The CKD-EPI and Japanese equations provide more accurate estimates of renal function compared to 24-hour Ccr and CGF in cancer patients before and after chemotherapy with cisplatin. These new equations have lower bias and higher precision values, indicating better estimation of glomerular filtration rate (GFR). The CKD-EPI and Japanese equations were developed as better estimates of GFR than Ccr and CGF, which were mostly developed in chronic kidney disease (CKD) patients without cancer. The accuracy of the CKD-EPI and Japanese equations in estimating GFR in cancer patients is consistent with previous studies. Therefore, it is recommended to replace Ccr and CGF with these new equations for the evaluation of renal function in cancer patients undergoing cisplatin-containing chemotherapy. |
| What are the clinical phenotypes of Bartter-like syndrome? | Bartter-like syndrome can be divided into at least three different clinical phenotypes: classic Bartter syndrome, Gitelman syndrome, and antenatal (neonatal) Bartter syndrome. Classic Bartter syndrome and Gitelman syndrome have renal tubular hypokalemic alkalosis, while antenatal Bartter syndrome also has profound systemic manifestations such as polyhydramnios, premature delivery, severe water and salt wasting, hypokalemic metabolic alkalosis, severe hypercalciuria, and marked growth retardation. |
| What is granulomatous interstitial nephritis (GIN), and how frequently does it occur in patients with sarcoidosis? | Granulomatous interstitial nephritis (GIN) is a form of renal inflammation characterized by the presence of granulomas in the interstitial tissue of the kidneys. In patients with sarcoidosis, GIN is reportedly present in approximately one-third of patients with clinical evidence of renal disease. Post-mortem series have shown that between 7 and 27% of all patients with sarcoidosis may have GIN. It is important to note that GIN can occur in sarcoidosis patients even in the absence of obvious clinical renal disease. 
|

Loss: MultipleNegativesRankingLoss

Training Hyperparameters

Non-Default Hyperparameters
- `per_device_train_batch_size`: 6
- `per_device_eval_batch_size`: 6
- `num_train_epochs`: 1
- `multi_dataset_batch_sampler`: round_robin

Training Logs

| Epoch | Step | Training Loss |
|:------:|:----:|:-------------:|
| 0.1500 | 500 | 0.0296 |
| 0.2999 | 1000 | 0.0138 |
| 0.4499 | 1500 | 0.0108 |
| 0.5999 | 2000 | 0.0107 |
| 0.7499 | 2500 | 0.0061 |
| 0.8998 | 3000 | 0.0052 |

Framework Versions
- Python: 3.12.11
- Sentence Transformers: 5.1.0
- Transformers: 4.56.1
- PyTorch: 2.8.0+cu128
- Accelerate: 1.10.1
- Datasets: 4.0.0
- Tokenizers: 0.22.0
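The training log above can be cross-checked against the stated setup: 20,000 samples at a per-device train batch size of 6 gives ceil(20000 / 6) = 3,334 optimizer steps per epoch, which reproduces the logged epoch fractions (e.g. step 500 ≈ epoch 0.1500). A quick sketch of that arithmetic:

```python
import math

num_samples = 20_000   # training set size from the card
batch_size = 6         # per_device_train_batch_size from the card
steps_per_epoch = math.ceil(num_samples / batch_size)
assert steps_per_epoch == 3334

# Logged (step, epoch) pairs from the card, recovered as step / steps_per_epoch:
for step, epoch in [(500, 0.1500), (1000, 0.2999), (3000, 0.8998)]:
    assert abs(step / steps_per_epoch - epoch) < 5e-4
```

The same check explains why cards trained with batch size 4 log through step 5,000: 20,000 / 4 = 5,000 steps for one epoch.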
ortho-gemma-300m-emb
SentenceTransformer based on google/embeddinggemma-300m

This is a sentence-transformers model finetuned from google/embeddinggemma-300m. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Description
- Model Type: Sentence Transformer
- Base model: google/embeddinggemma-300m
- Maximum Sequence Length: 2048 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face

Training Data
- Size: 20,000 training samples
- Columns: sentence_0 and sentence_1

Approximate statistics based on the first 1000 samples:

| | sentence_0 | sentence_1 |
|:--------|:--------|:--------|
| type | string | string |
| details | min: 10 tokens, mean: 21.43 tokens, max: 52 tokens | min: 18 tokens, mean: 86.56 tokens, max: 232 tokens |

Samples:

| sentence_0 | sentence_1 |
|:---|:---|
| What are some barriers and facilitators associated with the use and prescription of non-surgical treatments for knee or hip osteoarthritis in orthopaedic practice? | Barriers and facilitators associated with the use and prescription of non-surgical treatments for knee or hip osteoarthritis in orthopaedic practice may exist at various levels, including the professional, patient, social context, organizational context, and external environment. Some potential barriers may include lack of awareness or knowledge about non-surgical treatment options, time constraints, financial considerations, and patient preferences. Facilitators may include access to resources and support, clear guidelines and protocols, effective communication between healthcare professionals and patients, and a collaborative approach to decision-making. |
| What are the different names used to refer to bipolar fracture dislocations of the clavicle? | Bipolar fracture dislocations of the clavicle are also known as complete dislocation, bipolar dislocation, panclavicular dislocation, bifocal clavicular dislocation, and traumatic floating clavicle. |
| What is the association between returning to pivoting sports and the development of osteoarthritis after ACL reconstruction surgery? | Recent research has shown that patients who have returned to pivoting sports after ACL reconstruction surgery have a reduced risk of developing symptomatic and radiographic osteoarthritis compared to those who have not returned to pivoting sports. However, the reasons for this difference are unclear, and it is important to note that patients who failed to return to pivoting sports also reported poor knee function, which may have influenced the results. The rates of osteoarthritis after ACL reconstruction surgery increase over time, with higher rates observed with longer time intervals from injury to surgery. The role of activity in the development of osteoarthritis after ACL reconstruction surgery has not been thoroughly evaluated. 
|

Loss: MultipleNegativesRankingLoss

Training Hyperparameters

Non-Default Hyperparameters
- `per_device_train_batch_size`: 6
- `per_device_eval_batch_size`: 6
- `num_train_epochs`: 1
- `multi_dataset_batch_sampler`: round_robin

Training Logs

| Epoch | Step | Training Loss |
|:------:|:----:|:-------------:|
| 0.1500 | 500 | 0.0309 |
| 0.2999 | 1000 | 0.0107 |
| 0.4499 | 1500 | 0.0078 |
| 0.5999 | 2000 | 0.0081 |
| 0.7499 | 2500 | 0.0017 |
| 0.8998 | 3000 | 0.0018 |

Framework Versions
- Python: 3.12.11
- Sentence Transformers: 5.1.0
- Transformers: 4.56.2
- PyTorch: 2.8.0+cu128
- Accelerate: 1.10.1
- Datasets: 4.0.0
- Tokenizers: 0.22.1
tamil-gemma-300m-emb
bert-electrical-ner
smoothie-diffusion-qqp
pharma-gemma-300m-emb
SentenceTransformer based on google/embeddinggemma-300m

This is a sentence-transformers model finetuned from google/embeddinggemma-300m. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Description
- Model Type: Sentence Transformer
- Base model: google/embeddinggemma-300m
- Maximum Sequence Length: 2048 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face

Training Data
- Size: 20,000 training samples
- Columns: sentence_0 and sentence_1

Approximate statistics based on the first 1000 samples:

| | sentence_0 | sentence_1 |
|:--------|:--------|:--------|
| type | string | string |
| details | min: 10 tokens, mean: 21.09 tokens, max: 48 tokens | min: 20 tokens, mean: 94.97 tokens, max: 223 tokens |

Samples:

| sentence_0 | sentence_1 |
|:---|:---|
| How does ticlopidine differ from clopidogrel in terms of side effects and precautions? | Unlike clopidogrel, ticlopidine can lead to neutropenia in up to 1% of patients, which limits its widespread use. Regular blood count checks are necessary in the initial weeks of ticlopidine treatment. Additionally, neuraxial regional anesthesia should not be performed until 10 days have elapsed since the last ingestion of ticlopidine. |
| What are the different types of ligands that can bind to GPCRs? | GPCRs can bind a wide variety of endogenous ligands, including neuropeptides, amino acids, ions, hormones, chemokines, and lipid-derived mediators. Some GPCRs are considered orphan receptors because their exact ligands have not yet been identified. |
| How does etomidate function as an adrenostatic agent and what are its effects on cortisol secretion? | Etomidate acts as an adrenostatic agent by blocking the cytochrome P450-dependent adrenal enzymes 11β-hydroxylase and cholesterol side-chain cleavage enzyme. This inhibition leads to a decrease in cortisol secretion. 
In dispersed guinea-pig adrenal cells, etomidate has been shown to be the most potent adrenostatic drug available, with a mean concentration of 97 nmol/l required for 50% inhibition of cortisol secretion. This concentration is considerably lower than the plasma concentration needed to induce sedation. After a single induction dose of etomidate, the adrenocortical blockade lasts several hours while the hypnotic action of etomidate rapidly fades. |

Loss: MultipleNegativesRankingLoss

Training Hyperparameters

Non-Default Hyperparameters
- `per_device_train_batch_size`: 4
- `per_device_eval_batch_size`: 4
- `num_train_epochs`: 1
- `multi_dataset_batch_sampler`: round_robin

Training Logs

| Epoch | Step | Training Loss |
|:-----:|:----:|:-------------:|
| 0.1 | 500 | 0.0134 |
| 0.2 | 1000 | 0.009 |
| 0.3 | 1500 | 0.0138 |
| 0.4 | 2000 | 0.0052 |
| 0.5 | 2500 | 0.0154 |
| 0.6 | 3000 | 0.0076 |
| 0.7 | 3500 | 0.0062 |
| 0.8 | 4000 | 0.0021 |
| 0.9 | 4500 | 0.0028 |
| 1.0 | 5000 | 0.0015 |

Framework Versions
- Python: 3.12.11
- Sentence Transformers: 5.1.0
- Transformers: 4.56.1
- PyTorch: 2.8.0+cu128
- Accelerate: 1.10.1
- Datasets: 4.0.0
- Tokenizers: 0.22.0
ophthalmology-gemma-300m-emb
SentenceTransformer based on google/embeddinggemma-300m

This is a sentence-transformers model finetuned from google/embeddinggemma-300m. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Description
- Model Type: Sentence Transformer
- Base model: google/embeddinggemma-300m
- Maximum Sequence Length: 2048 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face

Training Dataset
- Size: 20,000 training samples
- Columns: sentence_0 and sentence_1
- Approximate statistics based on the first 1000 samples:

| | sentence_0 | sentence_1 |
|:--------|:---------------------------------------------------|:---------------------------------------------------|
| type | string | string |
| details | min: 10 tokens, mean: 21.87 tokens, max: 67 tokens | min: 15 tokens, mean: 90.2 tokens, max: 241 tokens |

Samples:

| sentence_0 | sentence_1 |
|:-----------|:-----------|
| How does OCT contribute to the identification and assessment of vitreoretinal traction in macular holes, and why is bilaterally performing tomographic imaging important for early detection in patients affected by this condition? | OCT contributes significantly to the identification and assessment of vitreoretinal traction in macular holes, enabling the visualization of vitreoretinal interface abnormalities. It plays a crucial role in assessing the risk for hole formation in the fellow eye of patients with a unilateral macular hole, as the probability of developing a macular hole in the contralateral eye is 13% in 48 months. Therefore, it is essential to perform bilateral tomographic imaging in patients affected by this pathology for early detection in the other eye. Considering that surgery in the normal contralateral eye of patients with macular holes is not acceptable, early detection through bilateral imaging is crucial for appropriate clinical management. |
| How does corneal refractive therapy, also known as overnight orthokeratology, work to reduce myopia? | Corneal refractive therapy, or overnight orthokeratology, involves the use of reverse geometry contact lenses to induce temporary flattening in the central corneal curvature. This temporary flattening leads to a short-term reduction in myopia and improved unaided visual acuity. Recent studies have also shown that corneal refractive therapy has the potential to slow down the progression of myopia. |
| What is the significance of contrast sensitivity testing in early age-related macular degeneration (AMD) patients, and how does it relate to the prognosis and treatment of the condition? | Contrast sensitivity testing is crucial in early AMD patients as it can detect even slight changes in visual system performance, particularly impairment at nighttime with and without glare. This testing provides important prognostic value, as AMD patients with less affected or intact contrast sensitivity have better prognoses and respond more effectively to treatments with vascular endothelial growth factor inhibitors, photodynamic therapy, and laser photocoagulation. Additionally, early diagnosis of AMD using contrast sensitivity testing can aid in the prevention of future blindness. |

Loss: MultipleNegativesRankingLoss with these parameters:

Training Hyperparameters

Non-Default Hyperparameters
- `per_device_train_batch_size`: 6
- `per_device_eval_batch_size`: 6
- `num_train_epochs`: 1
- `multi_dataset_batch_sampler`: round_robin

All Hyperparameters
- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: no
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 6
- `per_device_eval_batch_size`: 6
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1
- `num_train_epochs`: 1
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.0
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `parallelism_config`: None
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch_fused
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `hub_revision`: None
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `liger_kernel_config`: None
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: round_robin
- `router_mapping`: {}
- `learning_rate_mapping`: {}

Training Logs

| Epoch | Step | Training Loss |
|:------:|:----:|:-------------:|
| 0.1500 | 500 | 0.0239 |
| 0.2999 | 1000 | 0.0192 |
| 0.4499 | 1500 | 0.015 |
| 0.5999 | 2000 | 0.0058 |
| 0.7499 | 2500 | 0.0071 |
| 0.8998 | 3000 | 0.0072 |

Framework Versions
- Python: 3.12.11
- Sentence Transformers: 5.1.0
- Transformers: 4.56.2
- PyTorch: 2.8.0+cu128
- Accelerate: 1.10.1
- Datasets: 4.0.0
- Tokenizers: 0.22.1
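These cards all report Cosine Similarity as the scoring function over the model's 768-dimensional embeddings. As a quick illustration of what that comparison computes, here is a minimal pure-Python sketch using small mock vectors in place of real model outputs (the vector values are made up for the example; real usage would obtain them from the model's `encode` call):

```python
import math

def cosine_similarity(a, b):
    # cos(a, b) = (a . b) / (|a| * |b|); ranges from -1 to 1.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Mock 3-d "embeddings" standing in for the model's 768-d vectors.
query = [1.0, 2.0, 2.0]
paraphrase = [2.0, 1.0, 2.0]

print(cosine_similarity(query, query))       # identical vectors -> 1.0
print(cosine_similarity(query, paraphrase))  # related vectors -> ~0.889
```

Because the score depends only on the angle between vectors, it is insensitive to embedding magnitude, which is why it is the default choice for comparing sentence embeddings.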
pediatrics-gemma-300m-emb
SentenceTransformer based on google/embeddinggemma-300m

This is a sentence-transformers model finetuned from google/embeddinggemma-300m. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Description
- Model Type: Sentence Transformer
- Base model: google/embeddinggemma-300m
- Maximum Sequence Length: 2048 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face

Training Dataset
- Size: 20,000 training samples
- Columns: sentence_0 and sentence_1
- Approximate statistics based on the first 1000 samples:

| | sentence_0 | sentence_1 |
|:--------|:--------------------------------------------------|:---------------------------------------------------|
| type | string | string |
| details | min: 8 tokens, mean: 20.38 tokens, max: 64 tokens | min: 17 tokens, mean: 87.92 tokens, max: 258 tokens |

Samples:

| sentence_0 | sentence_1 |
|:-----------|:-----------|
| What is the role of routine health check-ups in detecting and diagnosing metabolic syndrome and NAFLD in obese children? | Routine health check-ups are important in detecting and diagnosing metabolic syndrome and NAFLD in obese children. However, there is a lack of routine health check-up data specifically for these complications in obese children. To address this need, pediatric health promotion centers and pediatric obesity clinics have been developed. The aim of these centers is to provide routine health check-ups and obesity-oriented check-ups to detect and diagnose metabolic syndrome and NAFLD in children. |
| How does the implementation of family-centered rounds (FCR) impact medical education? | The implementation of family-centered rounds (FCR) has raised concerns about its potential impact on medical education. Some evidence suggests that FCR may lead to decreased "didactic" teaching, increased discomfort in asking specific management questions, and limited time to discuss management options for residents and students (8, 9, 10, 11). However, the literature on the association between FCR and teaching has mainly focused on learners' perceptions, and there is a lack of objective data to address the relationship between FCR and medical knowledge acquisition. |
| What are some common clinical symptoms of neonatal septicaemia? | Some common clinical symptoms of neonatal septicaemia include fever, poor feeding, excessive cry, difficulty in breathing, yellowish skin discoloration, skin rashes, jitteriness, and irritability. |

Loss: MultipleNegativesRankingLoss with these parameters:

Training Hyperparameters

Non-Default Hyperparameters
- `per_device_train_batch_size`: 6
- `per_device_eval_batch_size`: 6
- `num_train_epochs`: 1
- `multi_dataset_batch_sampler`: round_robin

All Hyperparameters
- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: no
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 6
- `per_device_eval_batch_size`: 6
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1
- `num_train_epochs`: 1
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.0
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `parallelism_config`: None
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch_fused
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `hub_revision`: None
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `liger_kernel_config`: None
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: round_robin
- `router_mapping`: {}
- `learning_rate_mapping`: {}

Training Logs

| Epoch | Step | Training Loss |
|:------:|:----:|:-------------:|
| 0.1500 | 500 | 0.0195 |
| 0.2999 | 1000 | 0.0095 |
| 0.4499 | 1500 | 0.0084 |
| 0.5999 | 2000 | 0.0059 |
| 0.7499 | 2500 | 0.0021 |
| 0.8998 | 3000 | 0.0035 |

Framework Versions
- Python: 3.12.11
- Sentence Transformers: 5.1.0
- Transformers: 4.56.2
- PyTorch: 2.8.0+cu128
- Accelerate: 1.10.1
- Datasets: 4.0.0
- Tokenizers: 0.22.1
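Each of these fine-tunes uses MultipleNegativesRankingLoss, which treats every other positive in the batch as an in-batch negative and applies cross-entropy over scaled cosine similarities. A minimal pure-Python sketch of that idea, using toy 2-d vectors (the `scale` default of 20.0 matches the sentence-transformers default; the vectors themselves are invented for illustration):

```python
import math

def cos(a, b):
    # Plain cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def mnr_loss(anchors, positives, scale=20.0):
    # For anchor i, positives[j] with j != i serve as in-batch negatives.
    # The loss is cross-entropy of the scaled similarity row against target i.
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [scale * cos(a, p) for p in positives]
        m = max(logits)  # subtract max for numerical stability
        log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_sum - logits[i]
    return total / len(anchors)

# Perfectly aligned pairs give a near-zero loss; mismatched pairs are penalized.
anchors = [[1.0, 0.0], [0.0, 1.0]]
print(mnr_loss(anchors, [[1.0, 0.0], [0.0, 1.0]]))  # near 0
print(mnr_loss(anchors, [[0.0, 1.0], [1.0, 0.0]]))  # large
```

This is why the batch size matters for this loss: each row of 6 samples here contributes 5 in-batch negatives per anchor.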
DeepScaleR-1.5B-Preview-mlx
emirati-arabic-gemma-300m-emb
SentenceTransformer based on google/embeddinggemma-300m

This is a sentence-transformers model finetuned from google/embeddinggemma-300m. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Description
- Model Type: Sentence Transformer
- Base model: google/embeddinggemma-300m
- Maximum Sequence Length: 2048 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face

Training Dataset
- Size: 12,324 training samples
- Columns: sentence_0 and sentence_1
- Approximate statistics based on the first 1000 samples:

| | sentence_0 | sentence_1 |
|:--------|:--------------------------------------------------|:--------------------------------------------------|
| type | string | string |
| details | min: 4 tokens, mean: 11.73 tokens, max: 64 tokens | min: 4 tokens, mean: 14.47 tokens, max: 68 tokens |

Samples:

| sentence_0 | sentence_1 |
|:-----------|:-----------|
| كم عمرك؟ ("How old are you?") | ٢٧ سنة. ("27 years old.") |
| ما تقدر تنزل أكثر؟ ("Can't you go any lower?") | لا والله، ما بقى ربح. ("No, honestly, there's no profit left.") |
| الجولة البحرية فيها وجبة؟ ("Does the boat tour include a meal?") | نعم، عشاء مفتوح. ("Yes, an open dinner.") |

Loss: MultipleNegativesRankingLoss with these parameters:

Training Hyperparameters

Non-Default Hyperparameters
- `per_device_train_batch_size`: 6
- `per_device_eval_batch_size`: 6
- `num_train_epochs`: 4
- `multi_dataset_batch_sampler`: round_robin

All Hyperparameters
- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: no
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 6
- `per_device_eval_batch_size`: 6
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1
- `num_train_epochs`: 4
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.0
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `parallelism_config`: None
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch_fused
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `hub_revision`: None
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `liger_kernel_config`: None
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: round_robin
- `router_mapping`: {}
- `learning_rate_mapping`: {}

Training Logs

| Epoch | Step | Training Loss |
|:------:|:-----:|:-------------:|
| 0.2434 | 500 | 1.0578 |
| 0.4869 | 1000 | 0.7525 |
| 0.7303 | 1500 | 0.5706 |
| 0.9737 | 2000 | 0.4128 |
| 0.2434 | 2500 | 0.4749 |
| 0.4869 | 3000 | 0.5956 |
| 0.7303 | 3500 | 0.5322 |
| 0.9737 | 4000 | 0.476 |
| 1.2171 | 4500 | 0.3686 |
| 1.4606 | 5000 | 0.3213 |
| 1.7040 | 5500 | 0.3192 |
| 1.9474 | 6000 | 0.2964 |
| 2.1908 | 6500 | 0.2151 |
| 2.4343 | 7000 | 0.1891 |
| 2.6777 | 7500 | 0.1668 |
| 2.9211 | 8000 | 0.1669 |
| 3.1646 | 8500 | 0.1 |
| 3.4080 | 9000 | 0.0948 |
| 3.6514 | 9500 | 0.1017 |
| 3.8948 | 10000 | 0.076 |

Evaluation

The model was evaluated on 200+ Emirati Arabic conversational sentence pairs covering greetings, family, culture, food, weather, technology, education, and more.

- Greetings & Social Talk: high similarity (0.78–0.89) for common greetings and check-ins.
- Family & Daily Life: strong clustering (0.7–0.88) for expressions about relatives and routine activities.
- Food & Culture: accurate embeddings for traditional dishes and cultural references (0.8–0.95).
- Weather & Environment: excellent handling of synonyms like "الجو حار" ↔ "الطقس حر" ("the weather is hot"), scoring 0.93+.
- Sports Commentary: captures natural paraphrases ("اللاعب سجل هدف" ↔ "اللاعب جاب جول", "the player scored a goal" → 0.88).
- Tech & Code-switching: handles Arabic–English mixing well ("Laptop ما يشتغل" ↔ "اللابتوب خربان", "the laptop doesn't work" ↔ "the laptop is broken").
- Negation & Polarity: sometimes overestimates similarity between opposites ("بعيد" ↔ "قريب", "far" ↔ "near").
- Religious / Abstract Phrases: inconsistent for Eid, Ramadan, and Quran-related expressions.
- Subtle Emotions: good with strong polarity ("غضبان" ↔ "معصب", both "angry"), weaker on softer ones ("فرحان" ↔ "سعيد", both "happy").
- Health/Medical Contexts: direct matches are fine ("عملية" ↔ "جراحة", "operation" ↔ "surgery"), indirect links are less consistent.

Overall, the model shows robust performance on everyday Emirati Arabic dialogue, with high reliability on paraphrases and cultural expressions, while edge cases such as negation, abstract phrasing, and subtle emotional tone need refinement.

Framework Versions
- Python: 3.12.11
- Sentence Transformers: 5.1.0
- Transformers: 4.56.1
- PyTorch: 2.8.0+cu128
- Accelerate: 1.10.1
- Datasets: 4.0.0
- Tokenizers: 0.22.0
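The evaluation above effectively reads cosine scores off against an implicit threshold (pairs scoring roughly 0.8+ are treated as paraphrase matches, while the negation failure mode scores opposites too high). A toy sketch of that decision rule with mock scores (the threshold value and the pair scores here are invented for illustration, not measured from the model):

```python
# Mock (pair label, cosine score) results, loosely shaped like the evaluation above.
pairs = [
    ("greeting check-in", 0.85),
    ("weather synonym", 0.94),
    ("negation opposite", 0.72),  # opposites can still score fairly high
]

THRESHOLD = 0.80  # illustrative cut-off for calling a pair a paraphrase

matches = [name for name, score in pairs if score >= THRESHOLD]
print(matches)  # -> ['greeting check-in', 'weather synonym']
```

In practice the cut-off should be tuned on a labeled validation set, since a fixed threshold trades precision on negation/polarity pairs against recall on softer paraphrases.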
neurology-gemma-300m-emb
SentenceTransformer based on google/embeddinggemma-300m

This is a sentence-transformers model finetuned from google/embeddinggemma-300m. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Description
- Model Type: Sentence Transformer
- Base model: google/embeddinggemma-300m
- Maximum Sequence Length: 2048 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face

Training Dataset
- Size: 20,000 training samples
- Columns: sentence_0 and sentence_1
- Approximate statistics based on the first 1000 samples:

| | sentence_0 | sentence_1 |
|:--------|:--------------------------------------------------|:---------------------------------------------------|
| type | string | string |
| details | min: 9 tokens, mean: 21.87 tokens, max: 61 tokens | min: 21 tokens, mean: 92.13 tokens, max: 204 tokens |

Samples:

| sentence_0 | sentence_1 |
|:-----------|:-----------|
| What are the complications that can arise from the injection of the sphenopalatine ganglion? | Complications that can arise from the injection of the sphenopalatine ganglion include ecchymosis in the cellular tissue below the eye. While initially, there was only one reported case of complication, subsequent cases have shown that difficulties can arise following the injections. However, if the technique is carefully followed, serious complications can be avoided. |
| How does the interaction between CAPON and nNOS contribute to neuronal death after an excitotoxic stimulus? | CAPON, a protein specifically associated with nNOS, interacts with the PDZ domain of nNOS. The interaction between CAPON and nNOS mediates the nNOS-p38MAPK pathway during neuronal death after an excitotoxic stimulus. L-TAT-GESV, a cell-permeable peptide, can compete with CAPON for the unique PDZ domain of nNOS, leading to increased survival tissue in a severe model of neonatal hypoxia-ischemia. CAPON may serve as a high-specificity target for ischemia. |
| What are the motor signs associated with Parkinson's disease? | Parkinson's disease is characterized by motor signs such as hypokinesia (reduced movement) and abnormal posture. These motor signs are commonly observed in patients with Parkinson's disease. |

Loss: MultipleNegativesRankingLoss with these parameters:

Training Hyperparameters

Non-Default Hyperparameters
- `per_device_train_batch_size`: 6
- `per_device_eval_batch_size`: 6
- `num_train_epochs`: 1
- `multi_dataset_batch_sampler`: round_robin

All Hyperparameters
- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: no
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 6
- `per_device_eval_batch_size`: 6
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1
- `num_train_epochs`: 1
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.0
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `parallelism_config`: None
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch_fused
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `hub_revision`: None
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `liger_kernel_config`: None
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: round_robin
- `router_mapping`: {}
- `learning_rate_mapping`: {}

Training Logs

| Epoch | Step | Training Loss |
|:------:|:----:|:-------------:|
| 0.1500 | 500 | 0.026 |
| 0.2999 | 1000 | 0.0117 |
| 0.4499 | 1500 | 0.0068 |
| 0.5999 | 2000 | 0.0052 |
| 0.7499 | 2500 | 0.0047 |
| 0.8998 | 3000 | 0.0017 |

Framework Versions
- Python: 3.12.11
- Sentence Transformers: 5.1.0
- Transformers: 4.56.1
- PyTorch: 2.8.0+cu128
- Accelerate: 1.10.1
- Datasets: 4.0.0
- Tokenizers: 0.22.0
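Beyond pairwise similarity, these embedding models are typically used for semantic search: embed a query and every corpus entry once, then rank entries by cosine score. A minimal stand-in with mock 2-d vectors (no real model calls; the document ids and vector values are invented for illustration):

```python
import math

def cosine(a, b):
    # Cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Mock corpus embeddings keyed by document id (real ones would be 768-d).
corpus = {
    "doc_parkinson": [0.9, 0.1],
    "doc_capon": [0.2, 0.8],
    "doc_sphenopalatine": [0.5, 0.5],
}
query = [0.85, 0.2]  # pretend embedding of a Parkinson's-related question

# Rank documents by similarity to the query, best match first.
ranked = sorted(corpus, key=lambda d: cosine(query, corpus[d]), reverse=True)
print(ranked[0])  # -> doc_parkinson
```

For large corpora the same idea is usually backed by an approximate nearest-neighbor index rather than a full sort, but the scoring function is identical.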
hindi-gemma-300m-emb
punjabi-gemma-300m-emb
SentenceTransformer based on google/embeddinggemma-300m

This is a sentence-transformers model finetuned from google/embeddinggemma-300m. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Description
- Model Type: Sentence Transformer
- Base model: google/embeddinggemma-300m
- Maximum Sequence Length: 2048 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face

Training Dataset
- Size: 5,004 training samples
- Columns: sentence_0 and sentence_1
- Approximate statistics based on the first 1000 samples:

| | sentence_0 | sentence_1 |
|:--------|:-----------|:-----------|
| type | string | string |
| details | min: 11 tokens, mean: 28.8 tokens, max: 88 tokens | min: 3 tokens, mean: 16.26 tokens, max: 144 tokens |

Samples:

| sentence_0 | sentence_1 |
|:-----------|:-----------|
| ਵਿਰਾਟ ਕੋਹਲੀ ਨੇ ਕਿਹੜੇ ਸਕੂਲ ਵਿੱਚ ਪੜ੍ਹਾਈ ਕੀਤੀ? | ਸੇਂਟ ਥਾਮਸ ਸਕੂਲ |
| 1992 'ਚ ਅੰਤਰਰਾਸ਼ਟਰੀ ਅਜਾਇਬ ਘਰ ਦਿਹਾੜੇ ਦਾ ਵਿਸ਼ਾ ਕੀ ਸੀ? | ਅਜਾਇਬਘਰ ਅਤੇ ਵਾਤਾਵਰਣ |
| ਗੁਰਪ੍ਰੀਤ ਧੂਰੀ ਕਿੱਥੋਂ ਰੋਜ਼ੀ ਰੋਟੀ ਕਮਾ ਰਿਹਾ ਹੈ? | ਦਿੱਲੀ |

Loss: MultipleNegativesRankingLoss with these parameters:

Training Hyperparameters

Non-Default Hyperparameters
- `per_device_train_batch_size`: 6
- `per_device_eval_batch_size`: 6
- `num_train_epochs`: 7
- `multi_dataset_batch_sampler`: round_robin

All Hyperparameters
- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: no
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 6
- `per_device_eval_batch_size`: 6
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1
- `num_train_epochs`: 7
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.0
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `parallelism_config`: None
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch_fused
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `hub_revision`: None
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `liger_kernel_config`: None
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: round_robin
- `router_mapping`: {}
- `learning_rate_mapping`: {}

Training Logs

| Epoch  | Step | Training Loss |
|:------:|:----:|:-------------:|
| 0.5995 | 500  | 1.346         |
| 1.1990 | 1000 | 1.3542        |
| 1.7986 | 1500 | 1.2281        |
| 2.3981 | 2000 | 1.1036        |
| 2.9976 | 2500 | 0.9937        |
| 3.5971 | 3000 | 0.7913        |
| 4.1966 | 3500 | 0.7128        |
| 4.7962 | 4000 | 0.557         |
| 5.3957 | 4500 | 0.4327        |
| 5.9952 | 5000 | 0.3557        |
| 6.5947 | 5500 | 0.2424        |

Framework Versions
- Python: 3.12.11
- Sentence Transformers: 5.1.0
- Transformers: 4.56.2
- PyTorch: 2.8.0+cu128
- Accelerate: 1.10.1
- Datasets: 4.0.0
- Tokenizers: 0.22.1
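These cards list Cosine Similarity as the scoring function over the 768-dimensional embeddings. As a minimal, self-contained sketch of that comparison — using random stand-in vectors rather than real model outputs (in practice both vectors would come from `SentenceTransformer(...).encode`):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two dense vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-ins for the 768-dimensional sentence embeddings the model produces.
rng = np.random.default_rng(0)
query_vec = rng.normal(size=768)
doc_vec = rng.normal(size=768)

score = cosine_similarity(query_vec, doc_vec)
assert -1.0 <= score <= 1.0  # cosine similarity is bounded
print(round(score, 4))
```

For semantic textual similarity, each candidate sentence would be scored against the query this way and ranked by the result.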
finance-gemma-300m-emb
SentenceTransformer based on google/embeddinggemma-300m

This is a sentence-transformers model finetuned from google/embeddinggemma-300m. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Description
- Model Type: Sentence Transformer
- Base model: google/embeddinggemma-300m
- Maximum Sequence Length: 2048 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face

Training Dataset
- Size: 10,000 training samples
- Columns: sentence_0 and sentence_1
- Approximate statistics based on the first 1000 samples:

| | sentence_0 | sentence_1 |
|:--------|:-----------|:-----------|
| type | string | string |
| details | min: 8 tokens, mean: 64.43 tokens, max: 577 tokens | min: 41 tokens, mean: 329.7 tokens, max: 1770 tokens |

Samples:

| sentence_0 | sentence_1 |
|:-----------|:-----------|
| Explain the stock market indicators that analysts use to determine market trends and how they analyze companies within the market. | • Market indexes: Analysts track major stock market indexes like the S&P 500, Dow Jones Industrial Average, and Nasdaq Composite to get a sense of the overall market direction. When these indexes are rising consistently, it indicates a bull market, and when they are falling, it points to a bear market. • Price-to-earnings (P/E) ratio: The P/E ratio compares a company's stock price to its earnings per share. Analysts use the P/E ratio to determine if a stock is overvalued or undervalued relative to its history and competitors. A higher P/E ratio suggests the stock is more expensive. • Earnings: Analysts pay close attention to companies' quarterly and annual earnings reports. Beating or missing analyst earnings estimates can significantly impact a stock's price. Long-term earnings growth expectations also influence stock valuations. • Economic indicators: Macroeconomic data like GDP growth, unemployment rates, and interest rates provide context for analysts to gauge overall demand in ... |
| I am trying to create a calculator that allows users to specify asset classes (e.g. 500 in fund1, 500 in fund2, etc.) and the desired fraction of the total portfolio per asset class (e.g. 70% in fund1, 30% in fund2). The equation should determine how much to buy or sell per asset class (instead of move). I have the following formula for when only 1 fund changes, where $X_1$ is before, $X_2$ is after, and $dX$ is their difference. The same applies to the second fund $Y$. $a$ is the desired fraction of $X$. If $X_2 = a(X_2 + Y)$ and $X_2 = dX + X_1$ is true, then $dX + X_1 = a(X_2 + Y)$. Rewriting that gave me: $dX = \frac{aX_1 - X_1 + aY}{1-a}$. With $X_1 = 50$, $a = 0.70$, and $Y = 50$, $dX$ will be $66.5$. Now imagine a situation where users can enter multiple funds and the corresponding desired fraction at will. Now, whenever I calculate the amount to buy in one fund, the total changes, and any previous calculations need to be redone. It seems like I need a way to combine these ... | The normal way to do this is to calculate the current value $P$ of the complete portfolio. Then for each fund $i$ multiply it by the target fraction $a_i$ for that fund. This gives the target holding $X_i$ in fund $i$. Then $dX_i$ is the difference between the target holding and the current holding in fund $i$. This allows you to calculate the needed changes for all funds at once. |
| Calculate the total revenue generated by a business with a profit margin of 25%, fixed costs of $20,000 and variable costs of $4 per unit if they sell 10,000 units. Show your work and express your final answer as an exact dollar value. | Given: - Profit margin = 25% - Fixed costs = $20,000 - Variable costs = $4 per unit - Units sold = 10,000 Total variable costs = Variable cost per unit x Units sold = $4 x 10,000 = $40,000 Total costs = Fixed costs + Variable costs = $20,000 + $40,000 = $60,000 Profit = Revenue - Total costs Since profit margin is 25%, we can say: Profit/Revenue = 0.25 Profit = 0.25 Revenue Plugging in the profit we found: $15,000 = 0.25 Revenue Revenue = $15,000/0.25 Revenue = $60,000 So the total revenue generated is $60,000 |

Loss: MultipleNegativesRankingLoss with these parameters:

Training Hyperparameters

Non-Default Hyperparameters
- `per_device_train_batch_size`: 2
- `per_device_eval_batch_size`: 2
- `num_train_epochs`: 1
- `multi_dataset_batch_sampler`: round_robin

All Hyperparameters
- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: no
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 2
- `per_device_eval_batch_size`: 2
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1
- `num_train_epochs`: 1
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.0
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `parallelism_config`: None
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch_fused
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `hub_revision`: None
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `liger_kernel_config`: None
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: round_robin
- `router_mapping`: {}
- `learning_rate_mapping`: {}

Training Logs

| Epoch | Step | Training Loss |
|:-----:|:----:|:-------------:|
| 0.1   | 500  | 0.0164        |
| 0.2   | 1000 | 0.0779        |
| 0.3   | 1500 | 0.0739        |
| 0.4   | 2000 | 0.064         |
| 0.5   | 2500 | 0.0596        |
| 0.6   | 3000 | 0.0637        |
| 0.7   | 3500 | 0.0304        |
| 0.8   | 4000 | 0.0214        |
| 0.9   | 4500 | 0.0094        |
| 1.0   | 5000 | 0.0233        |

Framework Versions
- Python: 3.12.11
- Sentence Transformers: 5.1.0
- Transformers: 4.56.2
- PyTorch: 2.8.0+cu128
- Accelerate: 1.10.1
- Datasets: 4.0.0
- Tokenizers: 0.22.1
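Each of these models is trained with MultipleNegativesRankingLoss: within a batch of (sentence_0, sentence_1) pairs, the matching sentence_1 is the positive and every other sentence_1 acts as a negative. A rough numpy sketch of the underlying in-batch cross-entropy, using random stand-in embeddings (the `scale` default of 20 follows the sentence-transformers default; this illustrates the idea, it is not the library implementation):

```python
import numpy as np

def mnr_loss(anchors: np.ndarray, positives: np.ndarray, scale: float = 20.0) -> float:
    """In-batch-negatives cross-entropy, the idea behind MultipleNegativesRankingLoss."""
    # L2-normalise rows so the dot product is cosine similarity.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = scale * (a @ p.T)                   # (batch, batch) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))   # correct pair sits on the diagonal

rng = np.random.default_rng(0)
batch = rng.normal(size=(4, 768))
# Identical anchor/positive embeddings -> loss is near zero.
print(mnr_loss(batch, batch.copy()))
```

Minimising this pushes each pair's diagonal similarity above the off-diagonal ones, which is why a large batch size effectively supplies more negatives.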
bangla-gemma-300m-emb
SentenceTransformer based on google/embeddinggemma-300m

This is a sentence-transformers model finetuned from google/embeddinggemma-300m. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Description
- Model Type: Sentence Transformer
- Base model: google/embeddinggemma-300m
- Maximum Sequence Length: 2048 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face

Training Dataset
- Size: 4,995 training samples
- Columns: sentence_0 and sentence_1
- Approximate statistics based on the first 1000 samples:

| | sentence_0 | sentence_1 |
|:--------|:-----------|:-----------|
| type | string | string |
| details | min: 6 tokens, mean: 15.66 tokens, max: 47 tokens | min: 3 tokens, mean: 9.46 tokens, max: 66 tokens |

Samples:

| sentence_0 | sentence_1 |
|:-----------|:-----------|
| সকল জীবিত জিনিসের মধ্যে আত্মা বা চেতনা আছে - এমন ধারণা কী? | মানব-বিহীন সত্ত্বাগুলোতে (প্রাণীসমূহ, উদ্ভিদসমূহ এবং প্রাণহীন বস্তুসমূহ বা ইন্দ্রিয়গোচর বস্তুসমূহে) আধ্যাত্মিক নির্যাস বিরাজিত থাকা। |
| "স্প্যাট" নামক একক পিতামাতার সহায়তা দলটিতে উইল কি মিথ্যা কথা বলেছিলেন? | তার একটি দুই বছরের ছেলে রয়েছে নেড। |
| ১৯৭১ সালের ১৩ই ডিসেম্বর রাত্রিতে মুক্তিযোদ্ধারা পাকিস্তানী বাহিনীর সঙ্গে কোন স্থানে সংঘর্ষে লিপ্ত হন? | সিলেট শহরের উপকণ্ঠে এমসি কলেজসংলগ্ন |

Loss: MultipleNegativesRankingLoss with these parameters:

Training Hyperparameters

Non-Default Hyperparameters
- `per_device_train_batch_size`: 6
- `per_device_eval_batch_size`: 6
- `num_train_epochs`: 7
- `multi_dataset_batch_sampler`: round_robin

All Hyperparameters
- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: no
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 6
- `per_device_eval_batch_size`: 6
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1
- `num_train_epochs`: 7
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.0
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `parallelism_config`: None
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch_fused
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `hub_revision`: None
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `liger_kernel_config`: None
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: round_robin
- `router_mapping`: {}
- `learning_rate_mapping`: {}

Training Logs

| Epoch  | Step | Training Loss |
|:------:|:----:|:-------------:|
| 0.6002 | 500  | 0.9196        |
| 1.2005 | 1000 | 1.0147        |
| 1.8007 | 1500 | 0.8201        |
| 2.4010 | 2000 | 0.6164        |
| 3.0012 | 2500 | 0.5688        |
| 3.6014 | 3000 | 0.3681        |
| 4.2017 | 3500 | 0.3331        |
| 4.8019 | 4000 | 0.2473        |
| 5.4022 | 4500 | 0.1687        |
| 6.0024 | 5000 | 0.1404        |
| 6.6026 | 5500 | 0.1014        |

Framework Versions
- Python: 3.12.11
- Sentence Transformers: 5.1.0
- Transformers: 4.56.2
- PyTorch: 2.8.0+cu128
- Accelerate: 1.10.1
- Datasets: 4.0.0
- Tokenizers: 0.22.1
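The dataset tables above report approximate min/mean/max token counts over the first 1,000 samples. A small sketch of how such figures are derived; the token counts below are hypothetical stand-ins for real tokenizer output:

```python
import statistics

def length_stats(token_counts):
    """Min / mean / max of per-sentence token counts, as shown in the
    dataset statistics tables (computed there over the first 1000 samples)."""
    return {
        "min": min(token_counts),
        "mean": round(statistics.mean(token_counts), 2),
        "max": max(token_counts),
    }

# Hypothetical per-sentence token counts.
counts = [6, 12, 15, 21, 47]
print(length_stats(counts))  # {'min': 6, 'mean': 20.2, 'max': 47}
```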
urdu-gemma-300m-emb
SentenceTransformer based on google/embeddinggemma-300m

This is a sentence-transformers model finetuned from google/embeddinggemma-300m. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Description
- Model Type: Sentence Transformer
- Base model: google/embeddinggemma-300m
- Maximum Sequence Length: 2048 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face

Training Dataset
- Size: 5,001 training samples
- Columns: sentence_0 and sentence_1
- Approximate statistics based on the first 1000 samples:

| | sentence_0 | sentence_1 |
|:--------|:-----------|:-----------|
| type | string | string |
| details | min: 8 tokens, mean: 18.8 tokens, max: 45 tokens | min: 2 tokens, mean: 12.01 tokens, max: 155 tokens |

Samples:

| sentence_0 | sentence_1 |
|:-----------|:-----------|
| کولین جے اسٹین کب آزاد ہوئے؟ | اگست 1984ء |
| سر کرسٹوفر لی کو "سر" کا خطاب کب عطا ہوا؟ | 2009ء |
| سراج الدین ظفر نے کونسی کتاب پر آدم جی ادبی ایوارڈ حاصل کیا؟ | غزال و غزل |

Loss: MultipleNegativesRankingLoss with these parameters:

Training Hyperparameters

Non-Default Hyperparameters
- `num_train_epochs`: 7
- `multi_dataset_batch_sampler`: round_robin

All Hyperparameters
- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: no
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 8
- `per_device_eval_batch_size`: 8
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1
- `num_train_epochs`: 7
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.0
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `parallelism_config`: None
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch_fused
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `hub_revision`: None
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `liger_kernel_config`: None
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: round_robin
- `router_mapping`: {}
- `learning_rate_mapping`: {}

Training Logs

| Epoch  | Step | Training Loss |
|:------:|:----:|:-------------:|
| 0.7987 | 500  | 1.2925        |
| 1.5974 | 1000 | 1.0327        |
| 2.3962 | 1500 | 0.7306        |
| 3.1949 | 2000 | 0.499         |
| 3.9936 | 2500 | 0.3583        |
| 4.7923 | 3000 | 0.2443        |
| 5.5911 | 3500 | 0.1455        |
| 6.3898 | 4000 | 0.1127        |

Framework Versions
- Python: 3.12.11
- Sentence Transformers: 5.1.0
- Transformers: 4.56.2
- PyTorch: 2.8.0+cu128
- Accelerate: 1.10.1
- Datasets: 4.0.0
- Tokenizers: 0.22.1
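For the semantic-search use case these cards mention, retrieval reduces to ranking corpus embeddings by cosine similarity against a query embedding. A minimal sketch with random stand-in vectors (in practice both the query and corpus vectors would come from the model's encoder):

```python
import numpy as np

def top_k(query: np.ndarray, corpus: np.ndarray, k: int = 2):
    """Rank corpus rows by cosine similarity to the query (semantic search)."""
    q = query / np.linalg.norm(query)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    scores = c @ q                       # cosine similarity per corpus row
    order = np.argsort(-scores)[:k]      # indices of the k best matches
    return [(int(i), float(scores[i])) for i in order]

rng = np.random.default_rng(1)
corpus = rng.normal(size=(5, 768))                # stand-ins for encoded documents
query = corpus[3] + 0.01 * rng.normal(size=768)   # near-duplicate of document 3
print(top_k(query, corpus))
```

Because the query is a small perturbation of document 3, that document should come back as the top hit with a score close to 1.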