Meta-Llama-3.1-70B-Instruct-quantized.w4a16

by RedHatAI
Language Model · llama · 70B params · 131K context (long context) · 8 languages · License: OTHER
564.6K downloads · Good · Production-ready
Edge AI: Mobile / Laptop / Server · 147GB+ RAM
Quick Summary

---
tags:
  - int4
  - vllm
language:
  - en
  - de
  - fr
  - it
  - pt
  - hi
  - es
  - th
pipeline_tag: text-generation
license: llama3.
---

Device Compatibility

  • Mobile: 4-6GB RAM
  • Laptop: 16GB RAM
  • Server: GPU
Minimum recommended: 62GB+ RAM
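These figures line up with back-of-the-envelope arithmetic for a 70B-parameter model with 4-bit weights (w4a16). A minimal sketch, assuming raw weight bytes plus runtime headroom; the headroom multipliers are illustrative assumptions, not measurements:

# Rough memory footprint of a w4a16-quantized 70B model.
params = 70e9          # parameter count from the model card
weight_bits = 4        # w4a16: 4-bit weights, 16-bit activations
GiB = 1024**3

weights_gib = params * weight_bits / 8 / GiB
print(f"Quantized weights alone: ~{weights_gib:.0f} GiB")  # ~33 GiB

# KV cache, activations, and runtime buffers add a sizable margin;
# 1.5-2x headroom over raw weights lands near the 62GB+ recommendation.
for headroom in (1.5, 2.0):
    print(f"With {headroom}x headroom: ~{weights_gib * headroom:.0f} GiB")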

Training Data Analysis

🟡 Average (4.8/10)

We researched the training datasets used by Meta-Llama-3.1-70B-Instruct-quantized.w4a16 and assessed the quality of each.

Specialized For

general
science
multilingual
reasoning

Training Datasets (4)

Common Crawl · 🔴 2.5/10 · general, science
Key Strengths
  • Scale and Accessibility: At 9.5+ petabytes, Common Crawl provides unprecedented scale for training data
  • Diversity: The dataset captures billions of web pages across multiple domains and content types, enabling broad coverage
  • Comprehensive Coverage: Despite limitations, Common Crawl attempts to represent the broader web across regions and languages
Considerations
  • Biased Coverage: The crawling process prioritizes frequently linked domains, leaving content from digitally marginalized communities underrepresented
  • Large-Scale Problematic Content: Contains significant amounts of hate speech, pornography, and violent content
C4 · 🔵 6/10 · general, multilingual
Key Strengths
  • Scale and Accessibility: 750GB of publicly available, filtered text
  • Systematic Filtering: Documented heuristics enable reproducibility
  • Language Diversity: Despite being English-only, the corpus captures diverse writing styles
Considerations
  • English-Only: Limits multilingual applications
  • Filtering Limitations: Offensive content and low-quality text remain despite filtering
Wikipedia · 🟡 5/10 · science, multilingual
Key Strengths
  • High-Quality Content: Wikipedia articles are subject to community review, fact-checking, and citation requirements
  • Multilingual Coverage: Available in 300+ languages, enabling training of models that understand and generate text in many languages
  • Structured Knowledge: Articles follow consistent formatting with clear sections, allowing models to learn document structure
Considerations
  • Language Inequality: Low-resource language editions have significantly lower quality, fewer articles, and less editorial oversight
  • Biased Coverage: Reflects biases in contributor demographics; topics related to Western culture and interests are overrepresented
arXiv · 🟡 5.5/10 · science, reasoning
Key Strengths
  • Scientific Authority: Scholarly preprints from an established research repository
  • Domain-Specific: Specialized vocabulary and concepts
  • Mathematical Content: Includes complex equations and notation
Considerations
  • Specialized: Primarily technical and mathematical content
  • English-Heavy: Predominantly English-language papers
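
For what it's worth, the 🟡 Average (4.8/10) headline above matches a simple unweighted mean of the four per-dataset scores; that this is how the site aggregates is our assumption:

# Check the headline score against an unweighted mean (assumed aggregation).
scores = {"Common Crawl": 2.5, "C4": 6.0, "Wikipedia": 5.0, "arXiv": 5.5}
mean = sum(scores.values()) / len(scores)
print(f"Unweighted mean: {mean:.2f}/10")  # 4.75, displayed as 4.8/10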


Code Examples

Deployment (Python, vLLM)
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer

model_id = "neuralmagic/Meta-Llama-3.1-70B-Instruct-quantized.w4a16"
number_gpus = 1
max_model_len = 8192

sampling_params = SamplingParams(temperature=0.6, top_p=0.9, max_tokens=256)

tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]

# Render the chat into a single prompt string via the model's chat template.
prompts = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)

# Load the quantized model in vLLM; tensor_parallel_size shards it across GPUs.
llm = LLM(model=model_id, tensor_parallel_size=number_gpus, max_model_len=max_model_len)

outputs = llm.generate(prompts, sampling_params)

generated_text = outputs[0].outputs[0].text
print(generated_text)
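
For serving rather than batch inference, vLLM also exposes an OpenAI-compatible HTTP API (started with, for example, vllm serve neuralmagic/Meta-Llama-3.1-70B-Instruct-quantized.w4a16). A minimal client sketch, assuming such a server is already listening on localhost:8000; the port and placeholder API key are illustrative defaults, not requirements of this model card:

# Query a running vLLM OpenAI-compatible server (assumed at localhost:8000).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="neuralmagic/Meta-Llama-3.1-70B-Instruct-quantized.w4a16",
    messages=[
        {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
        {"role": "user", "content": "Who are you?"},
    ],
    temperature=0.6,
    top_p=0.9,
    max_tokens=256,
)
print(response.choices[0].message.content)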
Model Creation (Python, llm-compressor)
from transformers import AutoTokenizer
from datasets import load_dataset  # used below to fetch the calibration set
from llmcompressor.transformers import SparseAutoModelForCausalLM, oneshot
from llmcompressor.modifiers.quantization import GPTQModifier

model_id = "meta-llama/Meta-Llama-3.1-70B-Instruct"

num_samples = 512
max_seq_len = 8192

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Wrap each calibration example in an instruction-style prompt template.
preprocess_fn = lambda example: {"text": "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n{text}".format_map(example)}

dataset_name = "neuralmagic/LLM_compression_calibration"
dataset = load_dataset(dataset_name, split="train")
ds = dataset.shuffle().select(range(num_samples))
ds = ds.map(preprocess_fn)

# GPTQ recipe: 4-bit weights / 16-bit activations on all Linear layers, skipping lm_head.
recipe = GPTQModifier(
  targets="Linear",
  scheme="W4A16",
  ignore=["lm_head"],
  dampening_frac=0.01,
)

model = SparseAutoModelForCausalLM.from_pretrained(
  model_id,
  device_map="auto",
  trust_remote_code=True,
)

# One-shot post-training quantization over the calibration set.
oneshot(
  model=model,
  dataset=ds,
  recipe=recipe,
  max_seq_length=max_seq_len,
  num_calibration_samples=num_samples,
)
model.save_pretrained("Meta-Llama-3.1-70B-Instruct-quantized.w4a16")
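
After the one-shot run, a quick smoke test confirms the saved checkpoint loads and generates. A minimal sketch reusing the vLLM pattern from the deployment example, pointed at the local output directory; the prompt and sampling settings are arbitrary:

# Smoke-test the freshly saved quantized checkpoint with vLLM.
from vllm import LLM, SamplingParams

llm = LLM(model="Meta-Llama-3.1-70B-Instruct-quantized.w4a16")  # local output dir
params = SamplingParams(temperature=0.0, max_tokens=32)
out = llm.generate("The capital of France is", params)
print(out[0].outputs[0].text)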
Evaluation Reproduction: MMLU (lm-evaluation-harness, vLLM)
lm_eval \
  --model vllm \
  --model_args pretrained="neuralmagic/Meta-Llama-3.1-70B-Instruct-quantized.w4a16",dtype=auto,max_model_len=3850,max_gen_toks=10,tensor_parallel_size=1 \
  --tasks mmlu_llama_3.1_instruct \
  --fewshot_as_multiturn \
  --apply_chat_template \
  --num_fewshot 5 \
  --batch_size auto
HumanEval and HumanEval+ Generation (evalplus)
python3 codegen/generate.py \
  --model neuralmagic/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 \
  --bs 16 \
  --temperature 0.2 \
  --n_samples 50 \
  --root "." \
  --dataset humaneval
Sanitization (evalplus)
python3 evalplus/sanitize.py \
  humaneval/neuralmagic--Meta-Llama-3.1-70B-Instruct-quantized.w4a16_vllm_temp_0.2
Evaluation (evalplus)
evalplus.evaluate \
  --dataset humaneval \
  --samples humaneval/neuralmagic--Meta-Llama-3.1-70B-Instruct-quantized.w4a16_vllm_temp_0.2-sanitized

Deploy This Model

Production-ready deployment in minutes

Together.ai (Fastest API): Instant API access to this model. Production-ready inference API. Start free, scale to millions.

Replicate (Easiest Setup): One-click model deployment. Run models in the cloud with a simple API. No DevOps required.

Disclosure: We may earn a commission from these partners. This helps keep LLMYourWay free.