aws-neuron
Mistral-7B-Instruct-v0.3-seqlen-2048-bs-1-cores-2
Mistral-neuron
Llama-2-7b-chat-hf-seqlen-2048-bs-1
Mistral-7B-Instruct-v0.2-seqlen-2048-bs-1-cores-2
CodeLlama-7b-hf-neuron-8xlarge
Mistral 7B Instruct V0.1 Neuron 1x2048 2 Cores
Neuronx model for mistralai/Mistral-7B-Instruct-v0.1

This repository contains AWS Inferentia2 and `neuronx` compatible checkpoints for mistralai/Mistral-7B-Instruct-v0.1. You can find detailed information about the base model on its Model Card. This model has been exported to the `neuron` format using the specific `input_shapes` and `compiler` parameters detailed in the paragraphs below. Please refer to the 🤗 `optimum-neuron` documentation for an explanation of these parameters.

This repository contains tags specific to versions of `neuronx`. When using it with 🤗 `optimum-neuron`, use the repo revision matching the version of `neuronx` you are using to load the right serialized checkpoints.
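Loading a pre-compiled checkpoint like this with `optimum-neuron` might look roughly as follows. This is a sketch, not the card's official snippet: the repo id and revision are illustrative placeholders, and running it requires an AWS Inferentia2 (inf2) instance with `optimum-neuron` installed.

```python
# Sketch: load a pre-compiled Neuron checkpoint with optimum-neuron.
# Requires an AWS Inferentia2 (inf2) instance; repo id and revision
# below are illustrative placeholders, not guaranteed values.
from optimum.neuron import NeuronModelForCausalLM
from transformers import AutoTokenizer

repo_id = "aws-neuron/Mistral-neuron"  # placeholder repo id

# Pin the revision to the tag matching your installed neuronx SDK,
# as the card recommends, so the right serialized artifacts load.
model = NeuronModelForCausalLM.from_pretrained(repo_id, revision="main")
tokenizer = AutoTokenizer.from_pretrained(repo_id)

inputs = tokenizer("Hello, my name is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because the checkpoint is already compiled for fixed `input_shapes`, no on-device compilation happens at load time as long as the revision matches the SDK version.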
CodeLlama-7b-hf-neuron-24xlarge
SOLAR-10.7B-v1.0-neuron
Llama-2-7b-chat-hf-seqlen-2048-bs-2
bge-base-en-v1-5-seqlen-384-bs-1
gpt2-seqlen-1024-bs-16
zephyr-7b-beta-neuron
gpt2-neuronx-bs4-seqlen1024
all-MiniLM-L6-v2-neuron
NeuronWorkshop2025
This is a fine-tuned version of TinyLlama/TinyLlama-1.1B-Chat-v1.0, trained on the b-mc2/sql-create-context dataset as part of an AWS workshop: https://github.com/aws-neuron/neuron-workshops

The process and code used to fine-tune it: https://github.com/aws-neuron/neuron-workshops/blob/main/labs/FineTuning/HuggingFaceExample/01finetuning/Finetune-TinyLlama-1.1B.ipynb

Code to deploy it on SageMaker: https://github.com/aws-neuron/neuron-workshops/blob/main/labs/FineTuning/HuggingFaceExample/02inference/Inference-TinyLlama-1.1B.ipynb
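The linked notebooks cover the full workflow; as a rough sketch only, a SageMaker deployment of a model like this to a Neuron instance generally follows the pattern below. The `image_uri`, environment variables, and instance type are placeholders, not the workshop's exact values; see the inference notebook above for the real configuration.

```python
# Sketch: deploy a Hugging Face model to a SageMaker endpoint on Inferentia2.
# All values below (image_uri, env vars, instance type) are illustrative
# placeholders; the workshop's inference notebook has the exact settings.
import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()  # IAM role with SageMaker permissions

model = HuggingFaceModel(
    role=role,
    image_uri="<neuronx-serving-container-uri-for-your-region>",
    env={
        "HF_MODEL_ID": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
        "HF_NUM_CORES": "2",
        "MAX_INPUT_LENGTH": "1024",
        "MAX_TOTAL_TOKENS": "2048",
    },
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.inf2.xlarge",  # Inferentia2 instance family
)

print(predictor.predict({"inputs": "CREATE TABLE users (id INT, name TEXT)"}))
```

The endpoint keeps running (and billing) until deleted, so clean up with `predictor.delete_endpoint()` when finished.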
zephyr-7b-seqlen-2048-bs-4-cores-2
Mistral-7B-Instruct-v0.1-neuron-1x2048-24-cores
snowflake-arctic-embed-l
Llama-2-7b-chat-hf-seqlen-2048-bs-4
Mistral-7B-Instruct-v0.1-neuron-4x2048-24-cores
optimum-neuron-cache
This repository contains cached Neuron compilation artifacts for the most popular models on the Hugging Face Hub. The transparent caching mechanism included in `optimum-neuron` and `NeuronX TGI` makes it easier to export and deploy cached models to Neuron platforms such as Trainium and Inferentia.

To deploy any cached model directly to SageMaker:

- go to the model page,
- select "Deploy" in the top right corner,
- select "AWS SageMaker" in the drop-down,
- select the "AWS Inferentia & Trainium" tab,
- copy the code snippet.

You can then paste the code snippet into your deployment script or notebook, following the instructions in the comment.

To export a model to Neuron and save it locally, please follow the instructions in the `optimum-neuron` documentation. For a list of the cached models and configurations, please refer to the inference cache configuration files. Alternatively, you can use the `optimum-cli neuron cache lookup` command to look up a specific model and see its cached configurations.
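A cache lookup from the command line might look like the sketch below. It assumes `optimum-neuron` is installed (which provides the `optimum-cli neuron` subcommands); the model id is just an example, and the exact output format depends on your `optimum-neuron` version.

```shell
# Query the public cache for compiled configurations of a given Hub model.
# The model id is an example; substitute any model id you want to check.
optimum-cli neuron cache lookup meta-llama/Llama-2-7b-chat-hf
```

If the lookup reports a cached configuration matching your batch size, sequence length, and core count, exporting with those same parameters reuses the cached artifacts instead of recompiling from scratch.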