RPBizkit-v5-12B-Lorablated

by RicardoEstep
Language Model · OTHER · 12B params · New · 143 downloads · Early-stage
Quick Summary

A 12B-parameter language model created by merging the nbeerbower/Mistral-Nemo-12B-abliterated-LORA adapter into a fine-tuned base model, using hybrid scaling that applies the adapter strongly to attention layers and lightly to MLP layers (see the code example below).

Device Compatibility

Mobile: 4-6GB RAM
Laptop: 16GB RAM
Server: GPU
Minimum Recommended: 12GB+ RAM
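For context, here is a rough weight-memory estimate for a 12B-parameter model under common precisions. This is a rule-of-thumb sketch only: the bytes-per-parameter figures for quantized formats are approximate, and KV cache plus runtime overhead add several more GB on top of the weights.

```python
PARAMS = 12e9  # 12B parameters

# Approximate bytes per parameter for common formats (assumed, not exact).
formats = {"bf16": 2.0, "int8": 1.0, "4-bit": 0.5}

for name, bpp in formats.items():
    gb = PARAMS * bpp / 1024**3
    print(f"{name}: ~{gb:.1f} GB of weights")
```

This lines up roughly with the tiers above: a 4-bit quantization fits the 4-6GB mobile range, int8 matches the 12GB+ recommendation, and full bf16 weights call for laptop- or server-class memory.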

Code Examples

LoRA used: nbeerbower/Mistral-Nemo-12B-abliterated-LORA. The following Python script (transformers + peft) merges it into the base model:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# -------- Configuration --------
base_model_path = "./output"
tokenizer_path = "./output"
lora_path = "nbeerbower/Mistral-Nemo-12B-abliterated-LORA"
output_path = "./RPBizkit-v5-12B-Lorablated"

# Hybrid scaling (recommended starting values)
ATTENTION_SCALE = 0.7   # Strong (but not complete) overwrite on attention.
MLP_SCALE = 0.3         # Light influence on MLP for stability.

# --------------------------

print("Loading base model...")
model = AutoModelForCausalLM.from_pretrained(
    base_model_path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True
)

# --- Fix Embeddings ---
expected_vocab_size = 131072  # Mistral-Nemo (Tekken) vocabulary size
current_vocab_size = model.get_input_embeddings().weight.shape[0]
if current_vocab_size != expected_vocab_size:
    print(f"Resizing embeddings from {current_vocab_size} to {expected_vocab_size}...")
    model.resize_token_embeddings(expected_vocab_size)

# --- Apply LoRA ---
print("Applying LoRA...")
model = PeftModel.from_pretrained(
    model,
    lora_path,
    adapter_name="default",
    is_trainable=False
)

# --- HYBRID SCALING ---
print("Applying hybrid scaling...")
adapter_name = "default"
for name, module in model.named_modules():
    # LoRA layers expose a per-adapter `scaling` dict (normally lora_alpha / r);
    # overwriting it replaces that native value with our hybrid weights.
    if hasattr(module, "scaling"):
        # Strong behavioral overwrite on attention
        if any(x in name for x in ["q_proj", "k_proj", "v_proj", "o_proj"]):
            module.scaling = {adapter_name: ATTENTION_SCALE}
        # Light influence on MLP
        elif any(x in name for x in ["up_proj", "down_proj", "gate_proj"]):
            module.scaling = {adapter_name: MLP_SCALE}

# --- Merging the LoRA ---
print("Merging LoRA into base weights...")
model = model.merge_and_unload(progressbar=True)

# --- Adding Tokenizer ---
print("Loading tokenizer...")
tokenizer = AutoTokenizer.from_pretrained(tokenizer_path, trust_remote_code=True)

# --- Save Final Model ---
print("Saving final hybrid-merged model...")
model.save_pretrained(output_path, safe_serialization=True)
tokenizer.save_pretrained(output_path)

print("Hybrid merge complete!")
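The hybrid-scaling step works because a LoRA layer computes base(x) + scaling · B(A(x)), so overwriting `scaling` linearly attenuates the adapter's contribution before the merge bakes it into the weights. A scalar toy sketch (all numbers made up) shows the interpolation:

```python
# Toy 1-D illustration of LoRA scaling: y = w*x + scale * (b * (a * x)).
# `scale` plays the role of the `scaling` value the script above overwrites.
w, a, b, x = 1.5, 0.2, 0.4, 3.0

def forward(scale):
    return w * x + scale * (b * (a * x))

base = forward(0.0)  # adapter disabled
full = forward(1.0)  # adapter at full strength
half = forward(0.5)  # hybrid-style attenuation

# The output interpolates linearly between base and fully-adapted.
assert abs(half - (base + full) / 2) < 1e-9
```

This is why 0.7 on attention and 0.3 on MLP give a "strong but not complete" versus "light" application of the adapter, at the cost of discarding the adapter's trained lora_alpha / r scaling.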
