stablelm-4e1t-2b-v0.1

Name: stablelm-4e1t-2b-v0.1
Author: pszemraj
Parameters: 2.0B
License: cc-by-sa-4.0
Type: Language Model
Category: OTHER
Status: New, early-stage
Downloads: 15


Edge AI: Mobile, Laptop, Server (5GB+ RAM)

Quick Summary

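A 2B-parameter causal language model published by pszemraj. According to the training config under Code Examples, it was trained from pszemraj/stablelm-3b-4e1t-prune10 on the BEE-spoke-data/KI-smorgasbord_fw-small dataset.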

Device Compatibility

Mobile: 4-6GB RAM
Laptop: 16GB RAM
Server: GPU
Minimum recommended: 2GB+ RAM
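The RAM figures above can be sanity-checked with a rough weight-memory estimate. The sketch below is only back-of-the-envelope arithmetic: it takes the 2.0B parameter count from this page and assumes typical bytes-per-parameter values for common storage formats. It counts weights only and ignores activations, KV cache, and runtime overhead, so real usage will be higher.

```python
# Rough weight-memory estimate; parameter count taken from the "2.0B" figure on this page.
PARAMS = 2.0e9

# Assumed bytes per parameter for common storage formats (weights only).
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Return weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


estimates = {name: weight_memory_gb(PARAMS, b) for name, b in BYTES_PER_PARAM.items()}

for name, gb in estimates.items():
    print(f"{name}: ~{gb:.1f} GB for weights")
```

Under these assumptions, bf16 weights alone take about 4 GB, which is in line with the 5GB+ RAM figure listed on this page. A 2GB+ minimum would likely require quantized weights, although the page does not say how that number was derived.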

Training Data Analysis

🟡 Average (5.3/10)

Training datasets reported for stablelm-4e1t-2b-v0.1, with a quality assessment of each.

Specialized For

code, general, science, multilingual

Training Datasets (2)

The Pile: 🟢 8/10

code, general, science, multilingual

Key Strengths

• Deliberate Diversity: Explicitly curated to include diverse content types (academia, code, Q&A, book...
• Documented Quality: Each component dataset is thoroughly documented with rationale for inclusion, en...
• Epoch Weighting: Component datasets receive different training epochs based on perceived quality, al...

Common Crawl: 🔴 2.5/10

general, science

Key Strengths

• Scale and Accessibility: At 9.5+ petabytes, Common Crawl provides unprecedented scale for training d...
• Diversity: The dataset captures billions of web pages across multiple domains and content types, ena...
• Comprehensive Coverage: Despite limitations, Common Crawl attempts to represent the broader web acro...

Considerations

• Biased Coverage: The crawling process prioritizes frequently linked domains, making content from dig...
• Large-Scale Problematic Content: Contains significant amounts of hate speech, pornography, violent c...


Code Examples

config (YAML)

base_model: pszemraj/stablelm-3b-4e1t-prune10
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

strict: false
seed: 80085

# dataset
datasets:
    - path: BEE-spoke-data/KI-smorgasbord_fw-small
      type: completion # format from earlier
      field: text # Optional[str] default: text, field to use for completion data
val_set_size: 0.015

sequence_len: 4096
sample_packing: true
pad_to_sequence_len: false
train_on_inputs: false
group_by_length: false

# WANDB
wandb_project: llama3-pruning
wandb_entity: pszemraj
wandb_watch: gradients
wandb_name: stablelm-4e1t-2b-v0.1
hub_model_id: pszemraj/stablelm-4e1t-2b-v0.1
hub_strategy: every_save

gradient_accumulation_steps: 16
micro_batch_size: 1
num_epochs: 2
optimizer: adamw_torch_fused # paged_adamw_32bit
weight_decay: 0.05
lr_scheduler: cosine
learning_rate: 5e-5
warmup_ratio: 0.1

load_in_8bit: false
load_in_4bit: false
bf16: true
tf32: true

flash_attention: true
torch_compile: true # requires >= torch 2.0, may sometimes cause problems
torch_compile_backend: inductor # Optional[str]
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false

# hyperparams for freq of evals, saving, etc
evals_per_epoch: 5
saves_per_epoch: 3
save_safetensors: true
save_total_limit: 1
output_dir: ./output-axolotl/output-model-2b
logging_steps: 8

deepspeed:

special_tokens:
  pad_token: <|end_of_text|>
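
To make the batch and learning-rate settings in the config more concrete, here is a minimal sketch of the arithmetic they imply. The constants are copied from the config. `num_devices` and `total_steps` are hypothetical parameters, since the config states neither the GPU count nor the dataset size. The schedule assumes the common shape of linear warmup followed by cosine decay to zero; the scheduler the training framework actually applies may differ in detail.

```python
import math

# Values copied from the config above.
MICRO_BATCH_SIZE = 1
GRAD_ACCUM_STEPS = 16
SEQUENCE_LEN = 4096
LEARNING_RATE = 5e-5
WARMUP_RATIO = 0.1


def effective_batch(num_devices: int = 1) -> int:
    """Sequences per optimizer step (num_devices is a hypothetical GPU count)."""
    return MICRO_BATCH_SIZE * GRAD_ACCUM_STEPS * num_devices


def max_tokens_per_step(num_devices: int = 1) -> int:
    """Upper bound on tokens per optimizer step; packed samples fill at most sequence_len."""
    return effective_batch(num_devices) * SEQUENCE_LEN


def lr_at(step: int, total_steps: int) -> float:
    """Assumed schedule: linear warmup, then cosine decay to zero."""
    warmup_steps = math.ceil(WARMUP_RATIO * total_steps)
    if step < warmup_steps:
        return LEARNING_RATE * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("sequences per optimizer step:", effective_batch())
    print("max tokens per optimizer step:", max_tokens_per_step())
    for s in (0, 50, 100, 550, 1000):  # total_steps=1000 is illustrative only
        print(s, f"{lr_at(s, 1000):.2e}")
```

On a single device this works out to 16 packed sequences, or at most 65,536 tokens, per optimizer step. The learning rate climbs to 5e-5 over the first 10% of steps and then decays.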
