llm-jp-4-8b-thinking

by llm-jp · Llama architecture · 8B params · Language Model · License: Other
~4K downloads · Runs on mobile, laptop, and server hardware (see Device Compatibility below)
Quick Summary

llm-jp-4-8b-thinking is an 8B-parameter, Llama-architecture language model from LLM-jp with built-in reasoning ("thinking") support: its chat template accepts a configurable reasoning_effort setting, and its output separates the model's thinking trace from the final answer.

Device Compatibility

Mobile: 4-6GB RAM
Laptop: 16GB RAM
Server: GPU
Minimum recommended: 8GB+ RAM
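
An 8B model in bfloat16 needs roughly 16GB of memory, so the 4-6GB mobile/laptop tier implies 4-bit quantization. A minimal sketch of loading the model quantized via transformers' bitsandbytes integration (assumes a CUDA device and the bitsandbytes package installed; the specific settings are illustrative, not from the model card):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "llm-jp/llm-jp-4-8b-thinking"

# 4-bit NF4 weights cut the ~16GB bf16 footprint to roughly 5-6GB.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quant_config,
    device_map="auto",
    trust_remote_code=True,
)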

Code Examples

Usage (Python, transformers)
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "llm-jp/llm-jp-4-8b-thinking"

tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    # trust_remote_code is required to load custom tokenizer and reasoning parser.
    trust_remote_code=True,
)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    dtype=torch.bfloat16,  # older transformers releases name this torch_dtype
    device_map="auto",
    trust_remote_code=True,
)
model.eval()

messages = [
    # "What is natural language processing?"
    {"role": "user", "content": "自然言語処理とは何か"},
]

prompt: str = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    reasoning_effort="medium",  # {"low", "medium", "high"}
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output_tensor = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
    )

# Keep only the newly generated tokens, dropping the prompt.
generated_ids: list[int] = output_tensor[0][inputs["input_ids"].shape[1]:].tolist()
response = tokenizer.decode(generated_ids)
# parse_response is provided by the model's custom tokenizer (loaded via
# trust_remote_code) and splits the raw output into role, thinking, and content.
parsed = tokenizer.parse_response(response)

print("\n--- Parsed Response ---")
print("Role:", parsed.get("role"))
print("Thinking:", parsed.get("thinking"))
print("Content:", parsed.get("content"))

Deploy This Model

Production-ready deployment in minutes

Together.ai — instant API access to this model. Production-ready inference API; start free and scale to millions of requests.

Replicate — one-click model deployment. Run models in the cloud through a simple API; no DevOps required.
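
As a sketch of the API route: Together.ai exposes an OpenAI-compatible endpoint, so a call would look like the following. The base URL is Together's documented endpoint, but the model ID here is hypothetical; confirm llm-jp-4-8b-thinking is actually listed in the provider's catalog before relying on it.

from openai import OpenAI

client = OpenAI(
    base_url="https://api.together.xyz/v1",  # Together's OpenAI-compatible endpoint
    api_key="YOUR_TOGETHER_API_KEY",
)

response = client.chat.completions.create(
    model="llm-jp/llm-jp-4-8b-thinking",  # hypothetical ID; check the provider catalog
    messages=[{"role": "user", "content": "What is natural language processing?"}],
)
print(response.choices[0].message.content)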

Disclosure: We may earn a commission from these partners. This helps keep LLMYourWay free.