Mostafa8Mehrabi
qwen3-50m-fp16
deepseek-v3-mini
qwen3-30m-fp16
qwen3-71M-c4-final
qwen3-30m-tinystories-final
🚀 Qwen3-30M TinyStories Pretrained (FP16) - Notebook Version

Qwen3-30M pretrained on the TinyStories dataset using FP16 precision in a notebook environment.

- Final Training Loss: 1.5244
- Final Validation Loss: 1.5602
- Training Samples: -1 (i.e., the full dataset)
- Epochs: 3
- Precision: FP16
- Dataset: TinyStories (child-friendly stories)

Training checkpoints (also in FP16) are available at: Mostafa8Mehrabi/qwen3-30m-tinystories-checkpoints

The TinyStories dataset contains simple, child-friendly stories, which makes the model well suited for:
- Story generation
- Child-safe content creation
- Educational applications
- Creative writing assistance

This model was trained in a notebook environment with the following configuration:
- Batch Size: 128
- Learning Rate: 5e-05
- Max Length: 512
- Number of Processes: 8
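A minimal usage sketch, assuming the checkpoint loads through the standard 🤗 Transformers causal-LM API (the model ID is taken from the card above; the sampling settings are illustrative, not from the card):

```python
# Minimal usage sketch: assumes the checkpoint works with the standard
# AutoModelForCausalLM / AutoTokenizer API. Sampling settings are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Mostafa8Mehrabi/qwen3-30m-tinystories-final"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

prompt = "Once upon a time, a little fox"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```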
llama-1b-pruned-3blocks-taylor-Insomnia-ChatBot-SFT-CoT-v1-merged
deepseek-v3-mini-wikitext103-lora-merged
deepseek_v3_mini_50m
A compact version of DeepSeek-V3 Mini with exactly 58,283,136 parameters (reduced from ~181M).

- Parameters: 58,283,136
- Hidden Size: 448
- Layers: 6
- Attention Heads: 8
- Intermediate Size: 1200
- KV LoRA Rank: 96
- Memory (FP16): ~111.2 MB
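The FP16 memory figure follows directly from the parameter count at 2 bytes per parameter; a quick arithmetic check:

```python
# Sanity check of the card's ~111.2 MB FP16 footprint: 2 bytes per parameter.
params = 58_283_136
fp16_bytes = params * 2
print(f"{fp16_bytes / 1024**2:.1f} MiB")  # -> 111.2 MiB
```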
custom-57m-language-model
A custom 57.55M-parameter causal language model with a modern transformer architecture.

Architecture:
- Parameters: 57,553,632 (57.55M)
- Layers: 12-layer Transformer
- Hidden Size: 432
- Attention Heads: 8
- Head Dimension: 54
- Intermediate Size: 1,728
- Vocabulary Size: 50,257 (GPT-2 tokenizer)
- Max Sequence Length: 1,024
- Positional Embeddings: RoPE (Rotary Position Embedding, θ=10000.0)
- Activation: SwiGLU (Swish-Gated Linear Unit) in the feed-forward networks
- Normalization: RMSNorm (Root Mean Square Layer Normalization, ε=1e-06)
- Tied Embeddings: input and output embeddings share weights
- Dropout: 0.1

Training:
- Dummy Phase: 2 epochs, 1,000 samples, LR=0.0005
- C4 Phase: 3 epochs, 1,000 samples, LR=0.0003
- Optimizer: AdamW (weight decay=0.1)
- Scheduler: Cosine Annealing
- Gradient Clipping: 1.0

Generation defaults:
- Temperature: 0.8
- Top-K: 50
- Top-P: 0.9
- Repetition Penalty: 1.1
- Max New Tokens: 100

Data:
- Primary: C4 (Colossal Clean Crawled Corpus)
- Warm-up: synthetic dummy data for initial training

This model was trained as an educational demonstration of a transformer architecture implementation with modern techniques like RoPE embeddings and SwiGLU activations; a sketch of the two highlighted blocks follows below.
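Since the card highlights SwiGLU and RMSNorm, here is a minimal PyTorch sketch of those two blocks. It is illustrative only, not the repository's actual implementation; the dimensions (hidden=432, intermediate=1,728, ε=1e-6) are taken from the card:

```python
# Illustrative sketch of the SwiGLU feed-forward block and RMSNorm named in
# the card (not the repository's actual code). Sizes match the card.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        # Scale by the inverse root-mean-square instead of mean/variance stats.
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return self.weight * x * rms

class SwiGLU(nn.Module):
    def __init__(self, dim: int = 432, hidden: int = 1728):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        # Swish-gated linear unit: silu(gate(x)) * up(x), projected back down.
        return self.down(F.silu(self.gate(x)) * self.up(x))

x = torch.randn(1, 16, 432)
y = SwiGLU()(RMSNorm(432)(x))
print(y.shape)  # torch.Size([1, 16, 432])
```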
qwen3-50m-fp32
qwen3-50m-c4-final-test-version
qwen3-50m-c4-final_test_H200
llama-1b-3blocks-BI-pruned-KD-bookcorpus-improved_epoch_10
llama-1b-3blocks-BI-pruned-10-epochs-KD-bookcorpus-SFT-CoT-merged
This is the model card of a 🤗 transformers model that has been pushed on the Hub. It was automatically generated, and the descriptive fields (developed by, funded by, shared by, model type, language(s), license, finetuned-from model, repository, paper, demo) are all still marked [More Information Needed]. Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model; more information is needed for further recommendations. Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019), but hardware type, hours used, cloud provider, compute region, and carbon emitted have not yet been reported.
llama-3.2-1b-Insomnia-ChatBot-merged
The core purpose of this model is to provide a chatbot experience built on the 1-billion-parameter Llama model, using the 🤗 transformers library.
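A minimal chat sketch, assuming the merged checkpoint loads through the standard text-generation pipeline (the model ID is taken from the listing above; prompt and generation length are illustrative):

```python
# Minimal chat sketch; assumes the merged checkpoint works with the standard
# 🤗 Transformers text-generation pipeline (model ID from the listing above).
from transformers import pipeline

chat = pipeline(
    "text-generation",
    model="Mostafa8Mehrabi/llama-3.2-1b-Insomnia-ChatBot-merged",
)
reply = chat("I have trouble falling asleep at night. What can I do?",
             max_new_tokens=128)
print(reply[0]["generated_text"])
```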
llama-3.2-1b-Insomnia-ChatBot-SFT-CoT-merged
llama-3.2-3b-Insomnia-ChatBot-SFT-CoT-merged
llama-3.2-1b-Insomnia-ChatBot-R1CoT-GRPO-450-steps-merged
llama-1b-3blocks-taylor-plus-pruned-10-epochs-KD-ptb
llama-1b-3blocks-taylor-plus-pruned-10-epochs-KD-ptb-SFT-CoT-merged
llama-1b-3blocks-BI-pruned-10-epochs-KD-ptb
llama-1b-3blocks-BI-pruned-10-epochs-KD-ptb-SFT-CoT-merged
llama-1b-pruned-3blocks-bi-Insomnia-ChatBot-SFT-CoT-merged
llama-1b-pruned-3blocks-ppl-therapy-calibration
llama-1b-3blocks-PPL-pruned-10-epochs-KD-ptb
llama-1b-3blocks-PPL-pruned-10-epochs-KD-ptb-SFT-CoT-merged
qwen3-50m-insomnia-therapist
Fine-tuned version of Qwen3-50M specialized for insomnia therapy conversations with Chain-of-Thought reasoning.

- Base Model: Mostafa8Mehrabi/qwen3-50m-fp16
- Fine-tuned on: Mostafa8Mehrabi/insomnia-dataset-with-cot
- Precision: BF16
- Model Size: ~50M parameters
- Specialization: insomnia therapy with Chain-of-Thought reasoning

Training results:
- Final Training Loss: 1.1862
- Final Validation Loss: 1.2026
- Training Epochs: 3
- Batch Size: 4
- Learning Rate: 2e-05
- Max Length: 1024
- Precision Used: BF16

Conversation format:
- Input: prompt segments
- Output: Chain-of-Thought reasoning followed by the therapeutic response

Key features:
- Chain-of-Thought Reasoning: the model provides transparent reasoning before generating responses
- Therapeutic Approach: follows evidence-based therapy principles
- Validation-Education-Recommendation-Check: structured therapeutic format
- Optimized Training: trained with BF16 precision for efficiency
- Specialized Training: fine-tuned specifically on insomnia therapy conversations

Technical details:
- Architecture: Qwen3 (Transformer-based)
- Parameters: ~50M
- Training Precision: BF16
- Context Length: 1024 tokens
- Training Framework: PyTorch + Transformers
- Optimization: AdamW with warmup

Hardware requirements:
- Minimum: 2 GB GPU VRAM
- Recommended: 4 GB+ GPU VRAM
- CPU: compatible with CPU inference (slower)

Trained on curated insomnia therapy conversations with Chain-of-Thought annotations from the Mostafa8Mehrabi/insomnia-dataset-with-cot dataset.

Limitations:
- This model is for educational and research purposes
- Not a replacement for professional medical advice
- Always consult healthcare professionals for serious sleep disorders
- Model outputs should be reviewed by qualified therapists

If you use this model in your research, please cite this repository.
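A minimal inference sketch, assuming the checkpoint loads through the standard 🤗 Transformers causal-LM API. The card's exact prompt tags were not preserved in the source, so a plain-text prompt is used here purely for illustration:

```python
# Minimal inference sketch for the insomnia-therapist model. Assumes the
# standard causal-LM API; the card's exact prompt tags were not preserved,
# so a plain-text prompt is used here for illustration only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Mostafa8Mehrabi/qwen3-50m-insomnia-therapist"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

prompt = "I wake up at 3 a.m. every night and can't get back to sleep."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)  # context length is 1024
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```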