aws-neuron
Mistral-7B-Instruct-v0.3-seqlen-2048-bs-1-cores-2
Mistral-neuron
Llama-2-7b-chat-hf-seqlen-2048-bs-1
Mistral-7B-Instruct-v0.2-seqlen-2048-bs-1-cores-2
CodeLlama-7b-hf-neuron-8xlarge
Mistral 7B Instruct V0.1 Neuron 1x2048 2 Cores
Neuronx model for mistralai/Mistral-7B-Instruct-v0.1

This repository contains AWS Inferentia2 and `neuronx` compatible checkpoints for mistralai/Mistral-7B-Instruct-v0.1. You can find detailed information about the base model on its Model Card. This model has been exported to the `neuron` format using the specific `input_shapes` and `compiler` parameters detailed in the paragraphs below. Please refer to the 🤗 `optimum-neuron` documentation for an explanation of these parameters.

This repository contains tags specific to versions of `neuronx`. When using it with 🤗 `optimum-neuron`, use the repo revision matching the version of `neuronx` you are using to load the right serialized checkpoints.
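Loading a pre-compiled checkpoint like this with `optimum-neuron` might look roughly as follows. This is a sketch, not the card's official snippet: the repo id and revision are illustrative placeholders, and running it requires an AWS Inferentia2 (inf2) instance with `optimum-neuron` installed.

```python
# Sketch: load a pre-compiled Neuron checkpoint with optimum-neuron.
# Requires an AWS Inferentia2 (inf2) instance; repo id and revision
# below are illustrative placeholders, not guaranteed values.
from optimum.neuron import NeuronModelForCausalLM
from transformers import AutoTokenizer

repo_id = "aws-neuron/Mistral-neuron"  # placeholder repo id

# Pin the revision to the tag matching your installed neuronx SDK,
# as the card recommends, so the right serialized artifacts load.
model = NeuronModelForCausalLM.from_pretrained(repo_id, revision="main")
tokenizer = AutoTokenizer.from_pretrained(repo_id)

inputs = tokenizer("Hello, my name is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because the checkpoint is already compiled for fixed `input_shapes`, no on-device compilation happens at load time as long as the revision matches the SDK version.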
CodeLlama-7b-hf-neuron-24xlarge
SOLAR-10.7B-v1.0-neuron
Llama-2-7b-chat-hf-seqlen-2048-bs-2
bge-base-en-v1-5-seqlen-384-bs-1
gpt2-seqlen-1024-bs-16
zephyr-7b-beta-neuron
gpt2-neuronx-bs4-seqlen1024
all-MiniLM-L6-v2-neuron
NeuronWorkshop2025
This is a fine-tuned version of TinyLlama/TinyLlama-1.1B-Chat-v1.0, trained on the b-mc2/sql-create-context dataset as part of an AWS workshop: https://github.com/aws-neuron/neuron-workshops

The process and code used to fine-tune it: https://github.com/aws-neuron/neuron-workshops/blob/main/labs/FineTuning/HuggingFaceExample/01finetuning/Finetune-TinyLlama-1.1B.ipynb

Code to deploy it on SageMaker: https://github.com/aws-neuron/neuron-workshops/blob/main/labs/FineTuning/HuggingFaceExample/02inference/Inference-TinyLlama-1.1B.ipynb
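The linked notebooks cover the full workflow; as a rough sketch only, a SageMaker deployment of a model like this to a Neuron instance generally follows the pattern below. The `image_uri`, environment variables, and instance type are placeholders, not the workshop's exact values; see the inference notebook above for the real configuration.

```python
# Sketch: deploy a Hugging Face model to a SageMaker endpoint on Inferentia2.
# All values below (image_uri, env vars, instance type) are illustrative
# placeholders; the workshop's inference notebook has the exact settings.
import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()  # IAM role with SageMaker permissions

model = HuggingFaceModel(
    role=role,
    image_uri="<neuronx-serving-container-uri-for-your-region>",
    env={
        "HF_MODEL_ID": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
        "HF_NUM_CORES": "2",
        "MAX_INPUT_LENGTH": "1024",
        "MAX_TOTAL_TOKENS": "2048",
    },
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.inf2.xlarge",  # Inferentia2 instance family
)

print(predictor.predict({"inputs": "CREATE TABLE users (id INT, name TEXT)"}))
```

The endpoint keeps running (and billing) until deleted, so clean up with `predictor.delete_endpoint()` when finished.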
zephyr-7b-seqlen-2048-bs-4-cores-2
Mistral-7B-Instruct-v0.1-neuron-1x2048-24-cores
snowflake-arctic-embed-l
Llama-2-7b-chat-hf-seqlen-2048-bs-4
Mistral-7B-Instruct-v0.1-neuron-4x2048-24-cores
optimum-neuron-cache
This repository contains cached Neuron compilation artifacts for the most popular models on the Hugging Face Hub. The transparent caching mechanism included in `optimum-neuron` and `NeuronX TGI` makes it easier to export and deploy cached models to Neuron platforms such as Trainium and Inferentia.

To deploy any cached model directly to SageMaker:

- go to the model page,
- select "Deploy" in the top right corner,
- select "AWS SageMaker" in the drop-down,
- select the "AWS Inferentia & Trainium" tab,
- copy the code snippet.

You can then paste the code snippet into your deployment script or notebook, following the instructions in the comment.

To export a model to Neuron and save it locally, please follow the instructions in the `optimum-neuron` documentation. For a list of the cached models and configurations, please refer to the inference cache configuration files. Alternatively, you can use the `optimum-cli neuron cache lookup` command to look up a specific model and see its cached configurations.
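A cache lookup from the command line might look like the sketch below. It assumes `optimum-neuron` is installed (which provides the `optimum-cli neuron` subcommands); the model id is just an example, and the exact output format depends on your `optimum-neuron` version.

```shell
# Query the public cache for compiled configurations of a given Hub model.
# The model id is an example; substitute any model id you want to check.
optimum-cli neuron cache lookup meta-llama/Llama-2-7b-chat-hf
```

If the lookup reports a cached configuration matching your batch size, sequence length, and core count, exporting with those same parameters reuses the cached artifacts instead of recompiling from scratch.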