starcoder2-15b-quantized.w8a16
by RedHatAI
Language Model · 15B params · 12 downloads · Early-stage
Quick Summary
An INT8 weight-quantized (W8A16) build of bigcode/starcoder2-15b, a 15B-parameter language model specialized for code generation.
Device Compatibility
- Mobile: 4-6GB RAM
- Laptop: 16GB RAM
- Server: GPU

Minimum Recommended: 14GB+ RAM
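As a rough sanity check on these figures, the weight footprint can be estimated from the parameter count and the quantization scheme. The sketch below is a back-of-envelope estimate only; runtime overhead such as the KV cache and activations is not included.

```python
# Back-of-envelope weight-memory estimate for a 15B-parameter model.
# Figures are illustrative assumptions, not measured requirements.
PARAMS = 15e9           # parameter count
INT8_BYTES = 1          # W8A16 stores weights as 8-bit integers
FP16_BYTES = 2          # unquantized FP16 baseline

w8a16_gb = PARAMS * INT8_BYTES / 1e9
fp16_gb = PARAMS * FP16_BYTES / 1e9

print(f"W8A16 weights: ~{w8a16_gb:.0f} GB (FP16 baseline: ~{fp16_gb:.0f} GB)")
```

The ~15 GB INT8 weight estimate is roughly in line with the 14GB+ minimum above, and about half the FP16 baseline.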
Training Data Analysis
🔵 Good (7.0/10)
Quality assessment of the training datasets used by starcoder2-15b-quantized.w8a16.
Specialized For: code
Training Datasets (1)
The Stack · 🔵 7/10 · code
Key Strengths
- Legal Clarity: permissive licenses eliminate licensing concerns
- Comprehensive: 358 programming languages provide broad coverage
- Well-Documented: transparent preprocessing and filtering
Code Examples
Deployment

```python
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer

model_id = "neuralmagic/starcoder2-15b-quantized.w8a16"
number_gpus = 1

sampling_params = SamplingParams(temperature=0.2, top_p=0.95, max_tokens=256)
tokenizer = AutoTokenizer.from_pretrained(model_id)
prompts = ["def print_hello_world():"]

llm = LLM(model=model_id, tensor_parallel_size=number_gpus)
outputs = llm.generate(prompts, sampling_params)
generated_text = outputs[0].outputs[0].text
print(generated_text)
```
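The sampling parameters above (temperature=0.2, top_p=0.95) keep code generation close to greedy decoding. Temperature divides the logits before the softmax, so low values sharpen the distribution toward the top token. A minimal sketch of the effect (illustrative only, not vLLM's actual implementation):

```python
# Illustrative sketch of temperature scaling in softmax sampling.
import math

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/temperature, then apply a numerically stable softmax."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
cold = softmax_with_temperature(logits, 0.2)   # near-greedy, top token dominates
warm = softmax_with_temperature(logits, 1.0)   # flatter distribution
print(round(cold[0], 3), round(warm[0], 3))
```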
Creation

```python
from transformers import AutoTokenizer
from datasets import Dataset
from llmcompressor.transformers import SparseAutoModelForCausalLM, oneshot
from llmcompressor.modifiers.quantization import GPTQModifier
import random

model_id = "bigcode/starcoder2-15b"
num_samples = 256
max_seq_len = 8192

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Random-token calibration data; GPTQ only needs activation statistics.
max_token_id = len(tokenizer.get_vocab()) - 1
input_ids = [[random.randint(0, max_token_id) for _ in range(max_seq_len)] for _ in range(num_samples)]
attention_mask = num_samples * [max_seq_len * [1]]
ds = Dataset.from_dict({"input_ids": input_ids, "attention_mask": attention_mask})

# Quantize all Linear layers to 8-bit weights, keeping lm_head in full precision.
recipe = GPTQModifier(
    targets="Linear",
    scheme="W8A16",
    ignore=["lm_head"],
    dampening_frac=0.01,
)

model = SparseAutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    trust_remote_code=True,
)

oneshot(
    model=model,
    dataset=ds,
    recipe=recipe,
    max_seq_length=max_seq_len,
    num_calibration_samples=num_samples,
)

model.save_pretrained("starcoder2-15b-quantized.w8a16")
```
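The W8A16 scheme in the recipe stores each weight as an 8-bit integer plus a floating-point scale, while activations stay in 16-bit. The sketch below shows a symmetric INT8 round-trip to illustrate what is stored; it is illustrative only, since the real GPTQ pass additionally minimizes per-layer reconstruction error.

```python
# Illustrative symmetric INT8 weight quantization round-trip.
# GPTQ's error-minimization step is deliberately omitted here.

def quantize_int8(weights):
    """Map float weights to int8 values plus a single symmetric scale."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.42, -1.27, 0.003, 0.9]
q, scale = quantize_int8(weights)
recovered = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, recovered))
print(q, round(max_err, 4))
```

The round-trip error is bounded by half the scale, which is why 8-bit weight quantization typically costs little accuracy relative to FP16.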
Evaluation

```shell
python codegen/generate.py \
    --model neuralmagic/starcoder2-15b-quantized.w8a16 \
    --bs 8 \
    --temperature 0.2 \
    --n_samples 50 \
    --dataset humaneval \
    --root "."
python3 evalplus/sanitize.py humaneval/neuralmagic--starcoder2-15b-quantized.w8a16_vllm_temp_0.2
evalplus.evaluate --dataset humaneval --samples humaneval/neuralmagic--starcoder2-15b-quantized.w8a16_vllm_temp_0.2-sanitized
```
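The commands above draw 50 samples per HumanEval task and score them with EvalPlus; the headline metric is pass@k, commonly computed with the unbiased estimator from the Codex paper. A quick sketch (the example numbers are hypothetical):

```python
# Unbiased pass@k estimator: probability that at least one of k draws
# from n samples (c of them correct) passes the tests.
from math import comb

def pass_at_k(n, c, k):
    """n = samples generated, c = samples that pass, k = attempt budget."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical task: 50 samples generated, 30 pass -> pass@1
print(round(pass_at_k(50, 30, 1), 2))
```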