aicrowd-05

by silverjan
Language Model
6 downloads
Quick Summary

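A language model published by silverjan. The configuration listed under Code Examples appears to describe an Axolotl supervised fine-tuning (SFT) run that starts from a Qwen3 checkpoint and trains on the reciprocate/lichess-fmt-v2-jan05-1.2m-jan05 chat dataset, with loss computed on assistant turns only.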

Code Examples

base_model: /workspace/ckpts/Qwen3-1.7B-uci
# base_model: /workspace/ckpts/Qwen3-4B-Base-uci
output_dir: /workspace/ckpts/Qwen3-1.7B-jan05

plugins:
  - axolotl.integrations.liger.LigerPlugin
  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
liger_rope: true
liger_rms_norm: true
liger_glu_activation: true
liger_layer_norm: true
liger_fused_linear_cross_entropy: true

flash_attention: true
attn_implementation: kernels-community/vllm-flash-attn3

# chat_template_jinja: chat_template_jan05.jinja
chat_template_jinja: qwen25_no_sys.jinja
dataset_num_proc: 32
datasets:
  - path: reciprocate/lichess-fmt-v2-jan05-1.2m-jan05
    type: chat_template
    split: train
    field_messages: messages
    roles_to_train:  # compute loss only on these roles
      - assistant
dataset_prepared_path: /workspace/data/prepared/gupi_1b_1m
val_set_size: 0.01
evals_per_epoch: 8

sequence_len: 4096
micro_batch_size: 6

sample_packing_bin_size: 1000
sample_packing_group_size: 1000000
sample_packing: true
eval_sample_packing: true
pad_to_sequence_len: true
group_by_length: false

num_epochs: 2
gradient_accumulation_steps: 1
learning_rate: 1e-4
cosine_min_lr_ratio: 0.1  # decay to 10% of the peak learning rate
lr_scheduler: cosine
warmup_ratio: 0.01
max_grad_norm: 1.0
weight_decay: 0.0
optimizer: adamw_torch_fused
bf16: true
tf32: true
gradient_checkpointing: false
gradient_checkpointing_kwargs:
  use_reentrant: false
ddp_find_unused_parameters: false

seed: 0
resume_from_checkpoint:
saves_per_epoch: 1
logging_steps: 1
wandb_project: gupi
wandb_entity: reciprocate
wandb_name: sft
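
The batch and schedule settings above imply a token budget per optimizer step and a learning-rate floor. The following is a minimal Python sketch of that arithmetic, not part of the original config. The constants are copied from the config; the function names and the num_devices parameter are hypothetical, since the config does not state how many GPUs were used. Because sample_packing and pad_to_sequence_len are enabled, every sequence is treated as exactly sequence_len tokens, so the result is an upper bound on real (non-padding) tokens.

```python
# Values copied from the training config above.
SEQUENCE_LEN = 4096
MICRO_BATCH_SIZE = 6
GRADIENT_ACCUMULATION_STEPS = 1
LEARNING_RATE = 1e-4
COSINE_MIN_LR_RATIO = 0.1


def tokens_per_optimizer_step(num_devices: int) -> int:
    """Upper bound on tokens consumed per optimizer step.

    num_devices is an assumption: the config does not state the GPU count.
    """
    if num_devices < 1:
        raise ValueError("num_devices must be at least 1")
    # Sequences processed per optimizer step across all devices.
    sequences = MICRO_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS * num_devices
    return sequences * SEQUENCE_LEN


def min_learning_rate() -> float:
    """Learning rate at the end of the cosine decay."""
    return LEARNING_RATE * COSINE_MIN_LR_RATIO


if __name__ == "__main__":
    # Device counts here are illustrative only.
    for n in (1, 8):
        print(f"{n} device(s): {tokens_per_optimizer_step(n)} tokens/step")
    print(f"final learning rate: {min_learning_rate():.1e}")
```

On a single device this works out to 6 x 4096 = 24576 tokens per step; the total scales linearly with the number of devices.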
