WildReward-4B

apache-2.0 license · by THU-KEG · 4B params · 31 downloads
Reward model (sequence classification)
Early-stage edge-AI model: runs on mobile, laptop, or server hardware (9GB+ RAM).
Quick Summary

WildReward-4B is a 4B-parameter reward model from THU-KEG that rates user satisfaction with an assistant's response on a 1-5 integer scale, scoring the response in the context of the conversation via ordinal (CORAL-style) regression.

Device Compatibility

Mobile: 4-6GB RAM
Laptop: 16GB RAM
Server: GPU
Minimum recommended: 4GB+ RAM
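The RAM tiers above roughly track weight precision. A back-of-the-envelope estimate for a 4B-parameter model, counting weights only (activations and KV cache excluded; this is an illustration, not an official sizing guide):

```python
def approx_weight_memory_gib(n_params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights alone, in GiB."""
    return n_params * bits_per_param / 8 / (1024 ** 3)

N = 4e9  # 4B parameters
for name, bits in [("fp32", 32), ("fp16/bf16", 16), ("int8", 8), ("int4", 4)]:
    print(f"{name}: ~{approx_weight_memory_gib(N, bits):.1f} GiB")
# fp16/bf16 lands around ~7.5 GiB, which is why 9GB+ RAM is a comfortable target;
# int4 quantization brings weights under ~2 GiB, in range for mobile devices.
```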

Code Examples

Usage (Python, transformers):
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "THU-KEG/WildReward-4B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()  # disable dropout for deterministic scoring

def build_text(query, response, history_str=""):
    """Format input text for reward model scoring."""
    text = f"""
# Task Description
You are an expert conversation evaluator. Your task is to judge the **User's Satisfaction** with the Assistant's response based on the conversation context.
Please rate the response on a scale of 1 to 5 integers.

# Scoring Criteria
[1] CLEARLY NEGATIVE / REJECTION
[2] CORRECTION / ERROR POINTER (Negative)
[3] NEUTRAL
[4] POSITIVE ENGAGEMENT
[5] CLEAR SATISFACTION

# Input Data
## Context (History)
{history_str}

## User Query
{query}

## Assistant Response
{response}

# Output
Based on the criteria above, please output ONLY the integer score (1, 2, 3, 4, or 5).
"""
    return text.strip()

# Prepare query and response
query = "Explain quantum computing in simple terms."
response = "Quantum computing uses quantum bits or 'qubits' that can exist in multiple states simultaneously, unlike classical bits..."

# Build formatted text
text = build_text(query, response)

# Tokenize
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=4096).to(model.device)

# Get reward score
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits

    # CORAL / Ordinal Regression (output shape: 1, K-1)
    probs = torch.sigmoid(logits)
    reward = 1 + torch.sum(probs).item()

print(f"Reward score: {reward:.2f} (scale: 1-5)")
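The final `1 + torch.sum(probs)` step decodes CORAL-style ordinal logits: each of the K-1 sigmoids estimates the probability that the score exceeds one threshold, and summing them yields an expected score on the 1-5 scale. A minimal sanity check of that arithmetic in plain Python, using toy logits rather than real model output:

```python
import math

def coral_score(logits):
    """Decode K-1 ordinal logits into a score on the 1..K scale."""
    return 1 + sum(1 / (1 + math.exp(-z)) for z in logits)

# Toy logits for K=5: confident the score exceeds 1 and 2, unlikely to exceed 3 and 4
toy_logits = [4.0, 1.0, -1.0, -4.0]
print(f"{coral_score(toy_logits):.2f}")  # → 3.00 (sigmoid(z) + sigmoid(-z) = 1, so the pairs sum to 2)
```

Because each sigmoid is bounded in (0, 1), the decoded score always stays strictly between 1 and 5.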

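`build_text` also accepts a `history_str` for multi-turn conversations. The model card does not pin down a required history format, so the helper below is an assumption: a plain role-prefixed transcript, one turn per line.

```python
def format_history(turns):
    """Render (role, text) turns as a plain transcript (assumed format)."""
    return "\n".join(f"{role}: {text}" for role, text in turns)

turns = [
    ("User", "What is a qubit?"),
    ("Assistant", "A qubit is the basic unit of quantum information."),
]
history_str = format_history(turns)
# Pass into the scoring helper defined above:
# text = build_text(query, response, history_str=history_str)
print(history_str)
```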
Deploy This Model

Production-ready deployment in minutes

Together.ai (Fastest API)
Instant API access to this model. Production-ready inference API; start free, scale to millions.

Replicate (Easiest Setup)
One-click model deployment. Run models in the cloud with a simple API; no DevOps required.

Disclosure: We may earn a commission from these partners. This helps keep LLMYourWay free.