Smilyai-labs

43 models • 1 total models in database

Sort by:

Sam-1-large-it-0002

Sam large v2 IT has custom layers, resulting in the HF "unsafe" scan. We confirm it is safe to use. reach out in the community tab

license:apache-2.0

293

Sam-X-1.5

Model Stats - Parameters: 348,357,632 (~348.4M) - Architecture: 24L × 1024d × 16H - Final Perplexity: 4.81 - Final Accuracy: 80.34%

—

Sam-1

Introducing Sam-1, our newest smartest sam model Note the sam reason A1 model will be sunsetted soon and be replaced with Sam-1.2 This model replaces the sam reason S and V series after they got sunsetted on 23 Aug 2025. Uses a gemma3 archietecture. Great small model.

license:mit

Sam-2.0

📌 Model Overview Sam‑2.0 is a minimal, modular, decoder‑only Transformer architecture designed for chat‑style reasoning tasks. It emphasizes reproducibility, ablation‑friendly design, and clean benchmarking across input modalities. - Architecture: Decoder‑only Transformer with RMSNorm, SwiGLU feed‑forward, and causal masking - Training Objective: Causal language modeling (CLM) with role‑based label masking - Checkpoint: `sam2-epoch35.safetensors` - Final Train Loss: 1.04 - Validation Loss: Not tracked in this run - Training Duration: ~6272 s over 35 epochs - Framework: PyTorch + Hugging Face Transformers (custom model class) 🧱 Model Architecture | Component | Description | |-------------------|-----------------------------------------------------------------------------| | Backbone | Decoder‑only Transformer stack | | Normalization | RMSNorm | | Attention | Multi‑head self‑attention (causal) | | Feed‑Forward | SwiGLU activation with dropout | | Positional Bias | Learned absolute positions (no RoPE in this minimal variant) | | Head | Tied‑embedding LM head | | Checkpoint Format | `safetensors` with metadata for reproducibility | 🧪 Training Details - Dataset: pfb30/multiwozv22 - Batch Size: 8 - Optimizer: AdamW - Learning Rate: 2 × 10⁻⁴ (constant in this run) - Loss Function: Cross‑entropy over assistant tokens only - Hardware: Kaggle GPU runtime - Logging: Step‑wise loss tracking, no validation during training 📊 Evaluation | Metric | Value | Notes | |------------------|-------------|---------------------------------------| | Final Train Loss | 1.04 | Achieved at Epoch 35/35 | | Validation Loss | — | Not tracked in this run | | Inference Speed | Fast | Lightweight architecture | | Generalisation | TBD | To be compared against Sam‑2.5 | 🔧 Intended Use - Research: Benchmarking modular architectures and ablation studies - Education: Reasoning scaffolds and logic quizzes - Deployment: Lightweight agents for chat and dialogue modeling 🚫 Limitations - No validation tracking — generalisation must be inferred via external harnesses - Trained on MultiWOZ v2.2 only — may not generalize to other domains without fine‑tuning - Minimal architecture — no RoPE/MQA in this variant 📁 Files - `sam2-epoch35.safetensors` — final checkpoint - `config.json` — architecture and training config - `tokenizer.json` — tokenizer with special tokens - `README.md` — training logs and setup instructions 🧩 How to Load ```python from transformers import AutoTokenizer import torch from sam2 import Sam2, Sam2Config # your custom model class tok = AutoTokenizer.frompretrained("Smilyai-labs/Sam-2.0") cfg = Sam2Config(json.load(open("config.json"))) model = Sam2(cfg) state = torch.load("sam2-epoch35.safetensors", maplocation="cpu") model.loadstatedict(state) model.eval() prompt = " Hello! \n " ids = tok.encode(prompt, returntensors="pt") with torch.nograd(): for in range(50): logits = model(ids) nextid = torch.argmax(logits[:, -1, :], dim=-1, keepdim=True) ids = torch.cat([ids, nextid], dim=1) if nextid.item() == tok.eostokenid: break

license:mit

Sam-large-v1-speacil

llama

Sam-2.5-4

Sam-2.5-4 is the Pro continuation of the Sam-2.5 architecture series, designed for modular, multi-domain reasoning across math, dialogue, code, and open-domain tasks. It builds directly on Sam-2.5-3, continuing training for four additional epochs to deepen convergence, reduce domain bias, and improve generalization. This model is optimized for transparency, ablation-readiness, and deployment on both high-resource and low-resource devices (including Raspberry Pi). | Version | Description | |---------------|------------------------------------------------------------------------------| | Sam-2.5-2 | GSM8K-heavy fine-tune; overfit to math; lacked domain balance | | Sam-2.5-3 | Emergency patch; retrained from scratch on 4 datasets; balanced capabilities | | Sam-2.5-4 | Pro version; continued training for 4 epochs; refined convergence and fluency| - Transformer-based, modular design - Registry-driven domain tagging and ablation toggles - Shape-adaptive loss functions with domain-aware diagnostics - Quantization-ready for Pi deployment - Verbose logging for batch-level feedback and anomaly tracing - Memory-safe serialization via `safetensors` | Dataset | Domain Focus | |------------------------|----------------------------------| | GSM8K | Mathematical reasoning | | MultiWOZ | Multi-turn dialogue & task flow | | Alpaca-Code-Cleaned| Code generation & logic | | UltraChat-200k | Open-domain conversation | - Datasets were concatenated, shuffled, and tagged for domain awareness - Replay and mixing strategies used to balance underrepresented domains - Training spanned 9 total epochs (5 in -3, 4 in -4) | Metric | Value (Epoch 8–9) | |-------------------------|----------------------------------| | Validation Loss | ↓ 2.95 (avg across domains) | | Max Domain Loss | < 3.4 (no domain exceeded) | | Math Bias | Resolved (loss spikes absorbed) | | Dialogue Coherence | Improved (MultiWOZ eval) | | Code Determinism | Increased (Alpaca eval) | | Open-Domain Fluency | Fewer hallucinations, better grounding | - Loss spikes in early epochs traced to GSM8K; resolved by epoch 6 - Batch-level diagnostics printed per domain and token type - Attention stability improved on long-context prompts - Token transitions cleaner across dialogue and code tasks - Validation curve shows smooth convergence post-epoch 5 - Compatible with Raspberry Pi (quantized + safetensors) - Supports CLI-based training diagnostics (loss, ETA, memory) - Registry hooks enable domain-specific ablation and extension - Ideal for benchmarking on GSM8K, MultiWOZ, UltraChat, and custom blends - Research on modular Transformer architectures - Benchmarking across reasoning, dialogue, and code domains - Deployment on constrained hardware (e.g. Pi, ARM) - Community-driven extension and ablation testing - Still sensitive to prompt phrasing in edge cases - Long-context performance may degrade beyond 2k tokens - Requires domain tags for optimal generalization - Not trained on multimodal inputs (text-only) Thanks to the open-source community, dataset curators, and contributors who helped shape Sam-2.5-4. This release reflects our shared commitment to transparent, inspectable, and extensible AI.

license:mit

Sam-3.0-1

sam-3.0-1 internal testing version. Open to public testing.

license:mit

Sam-2.5-2

license:mit

Sam-2.5-PRO-SOLVER

license:mit

Sam-2.5

📌 Model Overview Sam‑2.5 is a minimal, modular, decoder‑only Transformer architecture designed for chat‑style reasoning tasks. It emphasizes reproducibility, ablation‑friendly design, and clean benchmarking across input modalities. - Architecture: Decoder‑only Transformer with RMSNorm, SwiGLU feed‑forward, and causal masking - Training Objective: Causal language modeling (CLM) with role‑based label masking - Checkpoint: `sam2-epoch35.safetensors` - Final Train Loss: 1.04 - Validation Loss: Not tracked in this run - Training Duration: ~6272 s over 35 epochs - Framework: PyTorch + Hugging Face Transformers (custom model class) 🧱 Model Architecture | Component | Description | |-------------------|-----------------------------------------------------------------------------| | Backbone | Decoder‑only Transformer stack | | Normalization | RMSNorm | | Attention | Multi‑head self‑attention (causal) | | Feed‑Forward | SwiGLU activation with dropout | | Positional Bias | Learned absolute positions (no RoPE in this minimal variant) | | Head | Tied‑embedding LM head | | Checkpoint Format | `safetensors` with metadata for reproducibility | 🧪 Training Details - Dataset: pfb30/multiwozv22 - Batch Size: 8 - Optimizer: AdamW - Learning Rate: 2 × 10⁻⁴ (constant in this run) - Loss Function: Cross‑entropy over assistant tokens only - Hardware: Kaggle GPU runtime - Logging: Step‑wise loss tracking, no validation during training 📊 Evaluation | Metric | Value | Notes | |------------------|-------------|---------------------------------------| | Final Train Loss | 1.04 | Achieved at Epoch 35/35 | | Validation Loss | 1.9554 | | Inference Speed | Fast | Lightweight architecture | | Generalisation | TBD | To be compared against Sam‑2.5 | 🔧 Intended Use - Research: Benchmarking modular architectures and ablation studies - Education: Reasoning scaffolds and logic quizzes - Deployment: Lightweight agents for chat and dialogue modeling 🚫 Limitations - No validation tracking — generalisation must be inferred via external harnesses - Trained on MultiWOZ v2.2 only — may not generalize to other domains without fine‑tuning - Minimal architecture — no RoPE/MQA in this variant 📁 Files - `sam2-epoch35.safetensors` — final checkpoint - `config.json` — architecture and training config - `tokenizer.json` — tokenizer with special tokens - `README.md` — training logs and setup instructions 🧩 How to Load ```python from transformers import AutoTokenizer import torch from sam2 import Sam2, Sam2Config # your custom model class tok = AutoTokenizer.frompretrained("Smilyai-labs/Sam-2.0") cfg = Sam2Config(json.load(open("config.json"))) model = Sam2(cfg) state = torch.load("sam2-epoch35.safetensors", maplocation="cpu") model.loadstatedict(state) model.eval() prompt = " Hello! \n " ids = tok.encode(prompt, returntensors="pt") with torch.nograd(): for in range(50): logits = model(ids) nextid = torch.argmax(logits[:, -1, :], dim=-1, keepdim=True) ids = torch.cat([ids, nextid], dim=1) if nextid.item() == tok.eostokenid: break

license:mit

Sam-Z-1-safetensor

license:mit

Sam-reason-A3

SAM REASON A3 BIGGER, SMARTER, FASTER Trained on a custom dataset. SAM reason A3 is the best of em all so far!!! Our nickname for it is: The ROAST King Its because it is sarcastic and rude if prompted due to training bias. It is not visible unless prompted so if using in apps add safety filters.

—