aicrowd-05
6
—
by
silverjan
Language Model
OTHER
New
6 downloads
Early-stage
Edge AI:
Mobile
Laptop
Server
Unknown
Mobile
Laptop
Server
Quick Summary
AI model with specialized capabilities.
Code Examples
base_model: /workspace/ckpts/Qwen3-4B-Base-uciyamlvllm
base_model: /workspace/ckpts/Qwen3-1.7B-uci
# base_model: /workspace/ckpts/Qwen3-4B-Base-uci
output_dir: /workspace/ckpts/Qwen3-1.7B-jan05
plugins:
- axolotl.integrations.liger.LigerPlugin
- axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
liger_rope: true
liger_rms_norm: true
liger_glu_activation: true
liger_layer_norm: true
liger_fused_linear_cross_entropy: true
flash_attention: true
attn_implementation: kernels-community/vllm-flash-attn3
# chat_template_jinja: chat_template_jan05.jinja
chat_template_jinja: qwen25_no_sys.jinja
dataset_num_proc: 32
datasets:
- path: reciprocate/lichess-fmt-v2-jan05-1.2m-jan05
type: chat_template
split: train
field_messages: messages
roles_to_train:
- assistant
dataset_prepared_path: /workspace/data/prepared/gupi_1b_1m
val_set_size: 0.01
evals_per_epoch: 8
sequence_len: 4096
micro_batch_size: 6
sample_packing_bin_size: 1000
sample_packing_group_size: 1000000
sample_packing: true
eval_sample_packing: true
pad_to_sequence_len: true
group_by_length: false
num_epochs: 2
gradient_accumulation_steps: 1
learning_rate: 1e-4
cosine_min_lr_ratio: 0.1
lr_scheduler: cosine
warmup_ratio: 0.01
max_grad_norm: 1.0
weight_decay: 0.0
optimizer: adamw_torch_fused
bf16: true
tf32: true
gradient_checkpointing: false
gradient_checkpointing_kwargs:
use_reentrant: false
ddp_find_unused_parameters: false
seed: 0
resume_from_checkpoint:
saves_per_epoch: 1
logging_steps: 1
wandb_project: gupi
wandb_entity: reciprocate
wandb_name: sftDeploy This Model
Production-ready deployment in minutes
Together.ai
Instant API access to this model
Production-ready inference API. Start free, scale to millions.
Try Free APIReplicate
One-click model deployment
Run models in the cloud with simple API. No DevOps required.
Deploy NowDisclosure: We may earn a commission from these partners. This helps keep LLMYourWay free.