aws-neuron

24 models

Mistral-7B-Instruct-v0.3-seqlen-2048-bs-1-cores-2

license:apache-2.0
23,793
0

Mistral-neuron

license:apache-2.0
22
1

Llama-2-7b-chat-hf-seqlen-2048-bs-1

llama
21
0

Mistral-7B-Instruct-v0.2-seqlen-2048-bs-1-cores-2

license:apache-2.0
18
0

CodeLlama-7b-hf-neuron-8xlarge

llama
14
0

Mistral 7B Instruct V0.1 Neuron 1x2048 2 Cores

Neuronx model for mistralai/Mistral-7B-Instruct-v0.1. This repository contains AWS Inferentia2 and `neuronx`-compatible checkpoints for mistralai/Mistral-7B-Instruct-v0.1; you can find detailed information about the base model on its Model Card. The model has been exported to the `neuron` format using the specific `input_shapes` and compiler parameters detailed in the paragraphs below. Please refer to the 🤗 `optimum-neuron` documentation for an explanation of these parameters. This repository contains tags specific to versions of `neuronx`: when loading with 🤗 `optimum-neuron`, use the repo revision matching the version of `neuronx` you are running, so that the right serialized checkpoints are loaded.

11
2
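Loading a version-matched revision as described above can be sketched as follows. This is a hypothetical example, not the card's official snippet: the repository id and revision value are illustrative, and since `optimum-neuron` only runs on AWS Neuron instances (e.g. inf2), the heavy import and load are deferred into a helper.

```python
# Illustrative repository id for this card's model (assumption, not verified).
repo_id = "aws-neuron/Mistral-7B-Instruct-v0.1-neuron-1x2048-2-cores"

# Pick the repo revision (branch or tag) matching your installed neuronx
# version; "main" is just a placeholder here.
revision = "main"


def load_neuron_model(repo_id: str, revision: str):
    """Load pre-compiled Neuron checkpoints at a specific repo revision."""
    # Deferred import: optimum-neuron is only available on Neuron instances.
    from optimum.neuron import NeuronModelForCausalLM

    return NeuronModelForCausalLM.from_pretrained(repo_id, revision=revision)
```

On an Inferentia2 instance, `load_neuron_model(repo_id, revision)` returns a model whose `generate()` method can be used as with any 🤗 Transformers causal-LM.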

CodeLlama-7b-hf-neuron-24xlarge

llama
7
0

SOLAR-10.7B-v1.0-neuron

llama
5
0

Llama-2-7b-chat-hf-seqlen-2048-bs-2

llama
4
1

bge-base-en-v1-5-seqlen-384-bs-1

license:mit
3
1

gpt2-seqlen-1024-bs-16

3
0

zephyr-7b-beta-neuron

license:mit
3
0

gpt2-neuronx-bs4-seqlen1024

3
0

all-MiniLM-L6-v2-neuron

license:apache-2.0
3
0

NeuronWorkshop2025

This is a fine-tuned version of TinyLlama/TinyLlama-1.1B-Chat-v1.0 using the b-mc2/sql-create-context dataset. It is part of an AWS workshop: https://github.com/aws-neuron/neuron-workshops

- Fine-tuning process and code: https://github.com/aws-neuron/neuron-workshops/blob/main/labs/FineTuning/HuggingFaceExample/01finetuning/Finetune-TinyLlama-1.1B.ipynb
- SageMaker deployment code: https://github.com/aws-neuron/neuron-workshops/blob/main/labs/FineTuning/HuggingFaceExample/02inference/Inference-TinyLlama-1.1B.ipynb

llama
3
0

zephyr-7b-seqlen-2048-bs-4-cores-2

license:mit
2
2

Mistral-7B-Instruct-v0.1-neuron-1x2048-24-cores

2
0

snowflake-arctic-embed-l

license:apache-2.0
2
0

Llama-2-7b-chat-hf-seqlen-2048-bs-4

llama
1
0

Mistral-7B-Instruct-v0.1-neuron-4x2048-24-cores

1
0

optimum-neuron-cache

This repository contains cached Neuron compilation artifacts for the most popular models on the Hugging Face Hub. The transparent caching mechanism included in `optimum-neuron` and `NeuronX TGI` makes it easier to export and deploy cached models to Neuron platforms such as Trainium and Inferentia.

To deploy any cached model directly to SageMaker:

- go to the model page,
- select "Deploy" in the top right corner,
- select "AWS SageMaker" in the drop-down,
- select the "AWS Inferentia & Trainium" tab,
- copy the code snippet.

You can now paste the code snippet into your deployment script or notebook, following the instructions in the comment. To export a model to Neuron and save it locally, please follow the instructions in the `optimum-neuron` documentation. For a list of the cached models and configurations, please refer to the inference cache configuration files. Alternatively, you can use the `optimum-cli neuron cache lookup` command to look up a specific model and see its cached configurations.

license:apache-2.0
0
31
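The local-export path that hits this cache can be sketched as below. This is a hedged sketch under assumptions: the model id and input shapes are illustrative, and because `optimum-neuron` compiles only on Trainium/Inferentia instances, the export call is deferred into a helper. When the (model, input shapes, compiler flags) combination matches a cached configuration, artifacts are fetched from the cache instead of being recompiled.

```python
# Illustrative export configuration (an assumption, not a cached entry
# verified against the inference cache configuration files).
export_kwargs = {"batch_size": 1, "sequence_length": 1024, "num_cores": 2}


def export_to_neuron(model_id: str = "gpt2", output_dir: str = "gpt2-neuron"):
    """Export a Hub model to the Neuron format and save it locally."""
    # Deferred import: only available on Neuron (Trainium/Inferentia) instances.
    from optimum.neuron import NeuronModelForCausalLM

    # export=True triggers compilation; a cache hit skips the compiler.
    model = NeuronModelForCausalLM.from_pretrained(
        model_id, export=True, **export_kwargs
    )
    model.save_pretrained(output_dir)
    return model
```

To check beforehand whether a configuration is cached, `optimum-cli neuron cache lookup gpt2` (for example) lists the cached configurations for that model.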

stable-diffusion-xl-base-1-0-1024x1024

0
3

mixtral-instruct-seqlen-4096-bs-4-optimum-0-0-23

0
1

SDXL-Lightning-4steps-neuronx

0
1