TheStageAI
Elastic-stable-diffusion-3.5-large
Elastic model: Fastest self-serving models. Stable Diffusion 3.5 Large.

Elastic models are produced by TheStage AI ANNA (Automated Neural Networks Accelerator). ANNA lets you control model size, latency, and quality by moving a single slider. For each model, ANNA produces a series of optimized models:

- XL: Mathematically equivalent neural network, optimized with our DNN compiler.
- S: The fastest model, with accuracy degradation of less than 2%.

Goals of Elastic models:

- Provide the fastest models and service for self-hosting.
- Provide flexibility in cost vs. quality selection for inference.
- Provide clear quality and latency benchmarks.
- Provide the interface of the HF libraries `transformers` and `diffusers` with a single line of code.
- Provide models supported on a wide range of hardware, which are pre-compiled and require no JIT.

> Note that the specific quality degradation can vary from model to model. For instance, an S model can show as little as 0.5% degradation.

Currently, our demo model supports resolutions from 512x512 to 1024x1024 and batch sizes 1-4. This will be updated in the near future.

To run inference with our models, you just need to replace the `diffusers` import with `elasticmodels.diffusers`.

System requirements:

- GPUs: H100, B200
- CPU: AMD, Intel
- Python: 3.10-3.12

To work with our models, run the setup commands in your terminal. Then go to app.thestage.ai, log in, generate an API token from your profile page, and set up the token.

Benchmarking is one of the most important procedures during model acceleration. We aim to provide clear performance metrics for models using our algorithms. For quality evaluation, we used PSNR and SSIM, both computed against the outputs of the original model.
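As a rough illustration of the PSNR metric mentioned above, the comparison between an original-model output and an accelerated-model output can be sketched as follows. This is a minimal pure-Python sketch, assuming 8-bit pixel values flattened into a list. The function name `psnr` and the toy pixel values are illustrative and are not taken from TheStage AI's benchmarking code.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], candidate: Sequence[float],
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    if len(reference) != len(candidate):
        raise ValueError("images must have the same number of pixels")
    if not reference:
        raise ValueError("images must not be empty")
    # Mean squared error over all pixel values.
    mse = sum((r - c) ** 2 for r, c in zip(reference, candidate)) / len(reference)
    if mse == 0:
        # Identical images: PSNR is unbounded, hence "inf" for the original model.
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy 2x2 grayscale "images", flattened row by row (illustrative values only).
original_pixels = [52, 55, 61, 59]
accelerated_pixels = [54, 55, 60, 58]

print(psnr(original_pixels, original_pixels))              # identical inputs
print(round(psnr(original_pixels, accelerated_pixels), 2))  # small pixel differences
```

Higher PSNR means the accelerated output is closer to the original output. A real evaluation would typically use an array library and average the metric over many generated images.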
| Metric/Model | S | XL | Original |
|---------------|---|----|----------|
| PSNR | 20.78 | 29.13 | inf |
| SSIM | 0.81 | 0.95 | 1.0 |

Time in seconds to generate one 1024x1024 image:

| GPU/Model | S | XL | Original |
|-----------|-----|----|----------|
| H100 | 3.10 | 3.80 | 6.55 |
| B200 | 1.76 | 2.27 | 4.81 |

Subscribe for updates: TheStageAI X

Contact email: [email protected]
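The latency table above can be turned into speedup factors relative to the original model. The sketch below only restates the table's numbers. The helper name `speedups` and the dictionary layout are illustrative choices, not part of any TheStage AI tooling.

```python
# Seconds per 1024x1024 image, copied from the latency table.
latency_s = {
    "H100": {"S": 3.10, "XL": 3.80, "Original": 6.55},
    "B200": {"S": 1.76, "XL": 2.27, "Original": 4.81},
}


def speedups(latencies: dict[str, float], baseline: str = "Original") -> dict[str, float]:
    """Return baseline_time / variant_time for every non-baseline variant."""
    base = latencies[baseline]
    return {name: base / t for name, t in latencies.items() if name != baseline}


for gpu, row in latency_s.items():
    for variant, factor in speedups(row).items():
        print(f"{gpu} {variant}: {factor:.2f}x faster than Original")
```

On these numbers, the S model is roughly 2.1x faster than the original on H100 and roughly 2.7x faster on B200. The XL model is roughly 1.7x and 2.1x faster, respectively.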