Qwen2.5-1.5B-Instruct-ultrafeedback_binarized-reward

by chaosc
Language Model
OTHER
1.5B params
3 downloads
Edge AI: Mobile, Laptop, Server (4GB+ RAM)
Quick Summary

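A reward model fine-tuned from Qwen2.5-1.5B-Instruct on the trl-lib/ultrafeedback_binarized preference dataset with TRL's RewardTrainer. Given a text, it returns a scalar score.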

Device Compatibility

Mobile: 4-6GB RAM
Laptop: 16GB RAM
Server: GPU
Minimum Recommended: 2GB+ RAM
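
To put the RAM figures above in context, the weight footprint can be estimated from the parameter count. This is a rough sketch only: it uses the nominal 1.5B figure from this listing, assumes standard bytes-per-parameter values, and ignores activations and runtime overhead. The `weight_memory_gb` helper is hypothetical and not part of any library.

```python
# Rough weight-memory estimate; ignores activations and runtime overhead.
PARAMS = 1.5e9  # nominal parameter count from the listing (assumption: exact count may differ)

# Assumed storage cost per parameter for common formats.
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params: float, dtype: str) -> float:
    """Return the approximate weight size in decimal gigabytes."""
    return params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_memory_gb(PARAMS, dtype):.2f} GB")
```

Under these assumptions the bf16 weights alone take about 3 GB, so running in less memory than that would presumably require a quantized format.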

Code Examples

Model Card for output

Training script (Python):
import os
from trl import RewardTrainer, RewardConfig
from datasets import load_dataset

# Log this run under the "hh" project in Weights & Biases.
os.environ["WANDB_PROJECT"] = "hh"

training_args = RewardConfig(
    output_dir="output/",
    report_to="wandb",
    run_name="Qwen2.5-1.5B-Instruct-ultrafeedback_binarized-reward",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=4,  # 16 x 4 = 64 samples per device per optimizer step
    learning_rate=1e-5,
    warmup_ratio=0.1,  # warm up the learning rate over the first 10% of steps
    center_rewards_coefficient=1e-2,  # penalty that keeps rewards centered near zero
    bf16=True,  # train in bfloat16 mixed precision
)

trainer = RewardTrainer(
    model="model/Qwen/Qwen2.5-1.5B-Instruct",  # local path to the base model
    args=training_args,
    train_dataset=load_dataset("trl-lib/ultrafeedback_binarized", split="train"),
)
trainer.train()
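
For readers unfamiliar with the objective behind this script, here is a minimal sketch of a pairwise reward loss with a centering term. It assumes RewardTrainer optimizes the usual Bradley-Terry style objective and that center_rewards_coefficient weights a squared-sum penalty on each pair of rewards. The TRL documentation is the authority on the exact form. The `pairwise_reward_loss` function is illustrative, not a TRL API.

```python
import math


def pairwise_reward_loss(chosen, rejected, center_coef=None):
    """Mean pairwise loss over (chosen, rejected) rewards, plus an optional centering penalty."""
    n = len(chosen)
    # -log(sigmoid(r_c - r_r)) is the same as log(1 + exp(-(r_c - r_r))).
    loss = sum(math.log1p(math.exp(-(c - r))) for c, r in zip(chosen, rejected)) / n
    if center_coef is not None:
        # Assumed form of the centering term: mean of (r_c + r_r)^2, scaled by the coefficient.
        loss += center_coef * sum((c + r) ** 2 for c, r in zip(chosen, rejected)) / n
    return loss


# The loss shrinks as the chosen response is scored further above the rejected one.
print(pairwise_reward_loss([2.0], [-1.0]))
print(pairwise_reward_loss([-1.0], [2.0]))
print(pairwise_reward_loss([2.0], [-1.0], center_coef=1e-2))
```

The first term depends only on the difference between the two rewards, so without the penalty the model's scores could drift by an arbitrary constant. The centering term discourages that drift.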
Quick start (Python, transformers)
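from transformers import pipeline

text = "The capital of France is Paris."
# "None" appears to be an unfilled placeholder; replace it with this model's
# Hub ID or a local checkpoint path. device="cuda" requires a GPU.
rewarder = pipeline("text-classification", model="None", device="cuda")
output = rewarder(text)[0]
print(output["score"])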
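
A common use of such a score is choosing between candidate responses. The sketch below shows one way to do that. `pick_preferred` and `toy_score` are hypothetical helpers, and `toy_score` is a stand-in so the example runs without downloading a model. With the pipeline above, the scorer would be something like `lambda t: rewarder(t)[0]["score"]`.

```python
from typing import Callable, Sequence


def pick_preferred(candidates: Sequence[str], score_fn: Callable[[str], float]) -> str:
    """Return the candidate with the highest score under score_fn."""
    if not candidates:
        raise ValueError("candidates must not be empty")
    return max(candidates, key=score_fn)


# Hypothetical stand-in scorer (word count), used only so this sketch is runnable offline.
def toy_score(text: str) -> float:
    return float(len(text.split()))


best = pick_preferred(["Paris.", "The capital of France is Paris."], toy_score)
print(best)
```

Because the training data consists of prompt and response pairs, each candidate passed to the real model would likely need to include the prompt, formatted the same way as during training.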
