llm-jp-4-8b-thinking
by llm-jp
Language Model · llama architecture · 8B params · License: OTHER
~4K downloads · Early-stage
Quick Summary
An 8B-parameter "thinking" (reasoning) language model from llm-jp, built on a llama-style architecture. It emits intermediate reasoning alongside its final answer, supports a configurable reasoning_effort setting (low/medium/high), and ships a custom tokenizer with a built-in response parser, both loaded via trust_remote_code.
Device Compatibility
Mobile: 4-6GB RAM
Laptop: 16GB RAM
Server: GPU
Minimum recommended: 8GB+ RAM; full-precision bf16 inference needs roughly 18GB+.
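These figures follow from simple parameter-count arithmetic. A minimal sketch (the bytes-per-parameter values are standard for each weight format; the ~20% overhead allowance for activations and KV cache is an assumption, not a figure from this page):

# Rough memory-footprint estimate for an 8B-parameter model.
PARAMS = 8e9
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}
OVERHEAD = 1.2  # assumed ~20% allowance for activations and KV cache

for fmt, bytes_per in BYTES_PER_PARAM.items():
    weights_gb = PARAMS * bytes_per / 1024**3
    print(f"{fmt}: ~{weights_gb:.1f}GB weights, ~{weights_gb * OVERHEAD:.1f}GB in practice")

bf16 lands near 18GB in practice, matching the full-precision figure above, while 4-bit weights (~4.5GB total) explain the mobile range.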
Code Examples
Usage (Python, transformers)
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "llm-jp/llm-jp-4-8b-thinking"
tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    # trust_remote_code is required to load the custom tokenizer and reasoning parser.
    trust_remote_code=True,
)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
model.eval()

messages = [
    {"role": "user", "content": "自然言語処理とは何か"},  # "What is natural language processing?"
]
prompt: str = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    reasoning_effort="medium",  # one of {"low", "medium", "high"}
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output_tensor = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
    )

# Strip the prompt tokens, then decode only the newly generated ones.
generated_ids: list[int] = output_tensor[0][inputs["input_ids"].shape[1]:].tolist()
response = tokenizer.decode(generated_ids)

# The custom tokenizer's parser splits the raw output into role, reasoning, and answer.
parsed = tokenizer.parse_response(response)
print("\n--- Parsed Response ---")
print("Role:", parsed.get("role"))
print("Thinking:", parsed.get("thinking"))
print("Content:", parsed.get("content"))
Deploy This Model
Production-ready deployment in minutes.

Together.ai: instant API access to this model through a production-ready inference API; free tier available.
Replicate: one-click deployment; run the model in the cloud through a simple API, no DevOps required.
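Hosted providers such as Together.ai expose OpenAI-compatible chat endpoints, so calling a deployed model takes only a few lines. A sketch assuming this model is actually listed in Together.ai's catalog under the id shown (the model id and the placeholder key are assumptions; verify against the provider's model list):

from openai import OpenAI

client = OpenAI(
    base_url="https://api.together.xyz/v1",  # Together.ai's OpenAI-compatible endpoint
    api_key="YOUR_TOGETHER_API_KEY",         # hypothetical placeholder
)
resp = client.chat.completions.create(
    model="llm-jp/llm-jp-4-8b-thinking",  # assumed catalog id; confirm before use
    messages=[{"role": "user", "content": "What is natural language processing?"}],
    max_tokens=256,
)
print(resp.choices[0].message.content)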