Mostafa8Mehrabi
qwen3-50m-fp16
deepseek-v3-mini
qwen3-30m-fp16
qwen3-71M-c4-final
qwen3-30m-tinystories-final
🚀 Qwen3-30M TinyStories Pretrained (FP16) - Notebook Version

Qwen3-30M pretrained on the TinyStories dataset using FP16 precision in a notebook environment.

- Final Training Loss: 1.5244
- Final Validation Loss: 1.5602
- Training Samples: -1 (i.e., the full dataset)
- Epochs: 3
- Precision: FP16
- Dataset: TinyStories (child-friendly stories)

Training checkpoints (also in FP16) are available at: Mostafa8Mehrabi/qwen3-30m-tinystories-checkpoints

The TinyStories dataset contains simple, child-friendly stories, which makes the model well suited for:
- Story generation
- Child-safe content creation
- Educational applications
- Creative writing assistance

This model was trained in a notebook environment with the following configuration:
- Batch Size: 128
- Learning Rate: 5e-05
- Max Length: 512
- Number of Processes: 8
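A minimal usage sketch, assuming the checkpoint loads through the standard 🤗 Transformers causal-LM API (the model ID is taken from the card above; the sampling settings are illustrative, not from the card):

```python
# Minimal usage sketch: assumes the checkpoint works with the standard
# AutoModelForCausalLM / AutoTokenizer API. Sampling settings are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Mostafa8Mehrabi/qwen3-30m-tinystories-final"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

prompt = "Once upon a time, a little fox"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```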
llama-1b-pruned-3blocks-taylor-Insomnia-ChatBot-SFT-CoT-v1-merged
deepseek-v3-mini-wikitext103-lora-merged
deepseek_v3_mini_50m
A compact version of DeepSeek-V3 Mini with exactly 58,283,136 parameters (reduced from ~181M).

- Parameters: 58,283,136
- Hidden Size: 448
- Layers: 6
- Attention Heads: 8
- Intermediate Size: 1200
- KV LoRA Rank: 96
- Memory (FP16): ~111.2 MB
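The FP16 memory figure follows directly from the parameter count at 2 bytes per parameter; a quick arithmetic check:

```python
# Sanity check of the card's ~111.2 MB FP16 footprint: 2 bytes per parameter.
params = 58_283_136
fp16_bytes = params * 2
print(f"{fp16_bytes / 1024**2:.1f} MiB")  # -> 111.2 MiB
```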
custom-57m-language-model
A custom 57.55M-parameter causal language model with a modern transformer architecture.

Architecture:
- Parameters: 57,553,632 (57.55M)
- Layers: 12-layer Transformer
- Hidden Size: 432
- Attention Heads: 8
- Head Dimension: 54
- Intermediate Size: 1,728
- Vocabulary Size: 50,257 (GPT-2 tokenizer)
- Max Sequence Length: 1,024
- Positional Embeddings: RoPE (Rotary Position Embedding, θ=10000.0)
- Activation: SwiGLU (Swish-Gated Linear Unit) in the feed-forward networks
- Normalization: RMSNorm (Root Mean Square Layer Normalization, ε=1e-06)
- Tied Embeddings: input and output embeddings share weights
- Dropout: 0.1

Training:
- Dummy Phase: 2 epochs, 1,000 samples, LR=0.0005
- C4 Phase: 3 epochs, 1,000 samples, LR=0.0003
- Optimizer: AdamW (weight decay=0.1)
- Scheduler: Cosine Annealing
- Gradient Clipping: 1.0

Generation defaults:
- Temperature: 0.8
- Top-K: 50
- Top-P: 0.9
- Repetition Penalty: 1.1
- Max New Tokens: 100

Data:
- Primary: C4 (Colossal Clean Crawled Corpus)
- Warm-up: synthetic dummy data for initial training

This model was trained as an educational demonstration of a transformer architecture implementation with modern techniques like RoPE embeddings and SwiGLU activations; a sketch of the two highlighted blocks follows below.
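Since the card highlights SwiGLU and RMSNorm, here is a minimal PyTorch sketch of those two blocks. It is illustrative only, not the repository's actual implementation; the dimensions (hidden=432, intermediate=1,728, ε=1e-6) are taken from the card:

```python
# Illustrative sketch of the SwiGLU feed-forward block and RMSNorm named in
# the card (not the repository's actual code). Sizes match the card.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        # Scale by the inverse root-mean-square instead of mean/variance stats.
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return self.weight * x * rms

class SwiGLU(nn.Module):
    def __init__(self, dim: int = 432, hidden: int = 1728):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        # Swish-gated linear unit: silu(gate(x)) * up(x), projected back down.
        return self.down(F.silu(self.gate(x)) * self.up(x))

x = torch.randn(1, 16, 432)
y = SwiGLU()(RMSNorm(432)(x))
print(y.shape)  # torch.Size([1, 16, 432])
```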
qwen3-50m-fp32
qwen3-50m-c4-final-test-version
qwen3-50m-c4-final_test_H200
llama-1b-3blocks-BI-pruned-KD-bookcorpus-improved_epoch_10
llama-1b-3blocks-BI-pruned-10-epochs-KD-bookcorpus-SFT-CoT-merged
This is the model card of a 🤗 transformers model that has been pushed on the Hub. It was automatically generated, and the descriptive fields (developed by, funded by, shared by, model type, language(s), license, finetuned-from model, repository, paper, demo) are all still marked [More Information Needed]. Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model; more information is needed for further recommendations. Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019), but hardware type, hours used, cloud provider, compute region, and carbon emitted have not yet been reported.
llama-3.2-1b-Insomnia-ChatBot-merged
The core purpose of this model is to provide a chatbot experience built on the 1-billion-parameter Llama model, using the 🤗 transformers library.
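A minimal chat sketch, assuming the merged checkpoint loads through the standard text-generation pipeline (the model ID is taken from the listing above; prompt and generation length are illustrative):

```python
# Minimal chat sketch; assumes the merged checkpoint works with the standard
# 🤗 Transformers text-generation pipeline (model ID from the listing above).
from transformers import pipeline

chat = pipeline(
    "text-generation",
    model="Mostafa8Mehrabi/llama-3.2-1b-Insomnia-ChatBot-merged",
)
reply = chat("I have trouble falling asleep at night. What can I do?",
             max_new_tokens=128)
print(reply[0]["generated_text"])
```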
llama-3.2-1b-Insomnia-ChatBot-SFT-CoT-merged
llama-3.2-3b-Insomnia-ChatBot-SFT-CoT-merged
llama-3.2-1b-Insomnia-ChatBot-R1CoT-GRPO-450-steps-merged
llama-1b-3blocks-taylor-plus-pruned-10-epochs-KD-ptb
llama-1b-3blocks-taylor-plus-pruned-10-epochs-KD-ptb-SFT-CoT-merged
llama-1b-3blocks-BI-pruned-10-epochs-KD-ptb
llama-1b-3blocks-BI-pruned-10-epochs-KD-ptb-SFT-CoT-merged
llama-1b-pruned-3blocks-bi-Insomnia-ChatBot-SFT-CoT-merged
llama-1b-pruned-3blocks-ppl-therapy-calibration
llama-1b-3blocks-PPL-pruned-10-epochs-KD-ptb
llama-1b-3blocks-PPL-pruned-10-epochs-KD-ptb-SFT-CoT-merged
qwen3-50m-insomnia-therapist
Fine-tuned version of Qwen3-50M specialized for insomnia therapy conversations with Chain-of-Thought reasoning.

- Base Model: Mostafa8Mehrabi/qwen3-50m-fp16
- Fine-tuned on: Mostafa8Mehrabi/insomnia-dataset-with-cot
- Precision: BF16
- Model Size: ~50M parameters
- Specialization: insomnia therapy with Chain-of-Thought reasoning

Training results:
- Final Training Loss: 1.1862
- Final Validation Loss: 1.2026
- Training Epochs: 3
- Batch Size: 4
- Learning Rate: 2e-05
- Max Length: 1024
- Precision Used: BF16

Conversation format:
- Input: prompt segments
- Output: Chain-of-Thought reasoning followed by the therapeutic response

Key features:
- Chain-of-Thought Reasoning: the model provides transparent reasoning before generating responses
- Therapeutic Approach: follows evidence-based therapy principles
- Validation-Education-Recommendation-Check: structured therapeutic format
- Optimized Training: trained with BF16 precision for efficiency
- Specialized Training: fine-tuned specifically on insomnia therapy conversations

Technical details:
- Architecture: Qwen3 (Transformer-based)
- Parameters: ~50M
- Training Precision: BF16
- Context Length: 1024 tokens
- Training Framework: PyTorch + Transformers
- Optimization: AdamW with warmup

Hardware requirements:
- Minimum: 2 GB GPU VRAM
- Recommended: 4 GB+ GPU VRAM
- CPU: compatible with CPU inference (slower)

Trained on curated insomnia therapy conversations with Chain-of-Thought annotations from the Mostafa8Mehrabi/insomnia-dataset-with-cot dataset.

Limitations:
- This model is for educational and research purposes
- Not a replacement for professional medical advice
- Always consult healthcare professionals for serious sleep disorders
- Model outputs should be reviewed by qualified therapists

If you use this model in your research, please cite this repository.
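A minimal inference sketch, assuming the checkpoint loads through the standard 🤗 Transformers causal-LM API. The card's exact prompt tags were not preserved in the source, so a plain-text prompt is used here purely for illustration:

```python
# Minimal inference sketch for the insomnia-therapist model. Assumes the
# standard causal-LM API; the card's exact prompt tags were not preserved,
# so a plain-text prompt is used here for illustration only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Mostafa8Mehrabi/qwen3-50m-insomnia-therapist"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

prompt = "I wake up at 3 a.m. every night and can't get back to sleep."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)  # context length is 1024
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```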