Gapeleon

20 models

Voxtral-Mini-3B-2507

Voxtral Mini is an enhancement of Ministral 3B, incorporating state-of-the-art audio input capabilities while retaining best-in-class text performance. It excels at speech transcription, translation, and audio understanding. Learn more about Voxtral in the blog post and research paper.

Voxtral builds upon Ministral-3B with powerful audio understanding capabilities:

- Dedicated transcription mode: Voxtral can operate in a pure speech-transcription mode to maximize performance. By default, it automatically predicts the source audio language and transcribes the text accordingly.
- Long-form context: with a 32k-token context length, Voxtral handles audio up to 30 minutes for transcription, or 40 minutes for understanding.
- Built-in Q&A and summarization: supports asking questions directly through audio, and can analyze audio and generate structured summaries without the need for separate ASR and language models.
- Natively multilingual: automatic language detection and state-of-the-art performance in the world's most widely used languages (English, Spanish, French, Portuguese, Hindi, German, Dutch, Italian).
- Function calling straight from voice: enables direct triggering of backend functions, workflows, or API calls based on spoken user intents.
- Highly capable at text: retains the text understanding capabilities of its language-model backbone, Ministral-3B.

The original card reports average word error rate (WER) over the FLEURS, Mozilla Common Voice, and Multilingual LibriSpeech benchmarks.

The model can be used with the following frameworks: `vllm` (recommended) and `Transformers` 🤗. Usage notes:

- Use `temperature=0.2` and `top_p=0.95` for chat completion (e.g. audio understanding), and `temperature=0.0` for transcription.
- Multiple audios per message and multiple user turns with audio are supported.
- System prompts are not yet supported.

Make sure to install `vllm >= 0.10.0` (we recommend using `uv`); doing so should automatically install `mistral_common >= 1.8.1`. You can test that your vLLM setup works as expected by cloning the vLLM repo. We recommend using Voxtral-Mini-3B-2507 in a server/client setting: first serve the model, then query the server with a simple Python snippet, as in the sketch below. Note: running Voxtral-Mini-3B-2507 on GPU requires ~9.5 GB of GPU RAM in bf16 or fp16. For both audio chat and transcription, make sure that your client has `mistral-common` installed with audio dependencies. Starting with `transformers >= 4.54.0`, you can also run Voxtral natively; make sure to have `mistral_common >= 1.8.1` installed with audio dependencies.
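As a concrete illustration of the server/client flow described above, here is a minimal sketch. It assumes a locally served vLLM instance and a placeholder audio URL; the serve flags follow the upstream Voxtral instructions, and the `audio_url` content part is the shape vLLM accepts for audio-capable models.

```python
# Minimal sketch: serve the model first, e.g.
#   vllm serve mistralai/Voxtral-Mini-3B-2507 \
#       --tokenizer_mode mistral --config_format mistral --load_format mistral
# then chat with it through the OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="mistralai/Voxtral-Mini-3B-2507",
    temperature=0.2,  # recommended for audio understanding
    top_p=0.95,
    messages=[{
        "role": "user",
        "content": [
            # Placeholder URL: point this at a real audio file.
            {"type": "audio_url", "audio_url": {"url": "https://example.com/sample.mp3"}},
            {"type": "text", "text": "Summarize this audio in two sentences."},
        ],
    }],
)
print(response.choices[0].message.content)
```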

license:apache-2.0
57
0

Slim Orpheus 3b JAPANESE Ft Q8_0 GGUF

Gapeleon/slim-orpheus-3b-JAPANESE-ft-Q8_0-GGUF: this model was converted to GGUF format from `Gapeleon/slim-orpheus-3b-JAPANESE-ft` using llama.cpp via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.

Use with llama.cpp: install llama.cpp through brew (works on Mac and Linux), or use the checkpoint directly through the usage steps listed in the llama.cpp repo: clone llama.cpp, move into the folder, and build it with the `LLAMA_CURL=1` flag along with other hardware-specific flags (e.g. `LLAMA_CUDA=1` for Nvidia GPUs on Linux).
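If you prefer Python over the CLI route described above, the `llama-cpp-python` bindings (a swap-in for the llama.cpp CLI, not what the card itself documents) can fetch a GGUF file straight from the Hub. A minimal sketch; the `filename` glob is an assumption about the file naming in this repo:

```python
# Sketch using llama-cpp-python instead of the llama.cpp CLI.
# pip install llama-cpp-python huggingface_hub
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="Gapeleon/slim-orpheus-3b-JAPANESE-ft-Q8_0-GGUF",
    filename="*Q8_0.gguf",  # assumed file-name pattern in the repo
    n_ctx=4096,
)

# Note: this is a TTS checkpoint, so the raw completion is audio-token text;
# see the original model card for how to turn generated tokens into audio.
out = llm("こんにちは、", max_tokens=64)
print(out["choices"][0]["text"])
```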

llama-cpp
38
1

Orpheus-3B-pt

llama
29
0

Llasa-1B-Q4_K_M-GGUF

Gapeleon/Llasa-1B-Q4_K_M-GGUF: this model was converted to GGUF format from `HKUSTAudio/Llasa-1B` using llama.cpp via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.

Use with llama.cpp: install llama.cpp through brew (works on Mac and Linux), or use the checkpoint directly through the usage steps listed in the llama.cpp repo: clone llama.cpp, move into the folder, and build it with the `LLAMA_CURL=1` flag along with other hardware-specific flags (e.g. `LLAMA_CUDA=1` for Nvidia GPUs on Linux).

llama-cpp
26
0

Satyr-V0.1-4B-HF-int4_awq-ov

license:apache-2.0
17
1

Satyr-V0.1-4B-exl3-8.0bpw

exllamav3
14
0

bytedance_BAGEL-7B-MoT-INT8

license:apache-2.0
10
24

kaniTTS_Elise

A quick test run: training nineninesix/kani-tts-450m-0.1-pt on MrDragonFox/Elise.

Sample 1: "Hey there, my name is Elise, and I'm a text to speech model. Do I sound like a person?"

Sample 2: "Got it. $300,000. I can definitely help you get a very good price for your property by selecting a realtor."

license:cc-by-4.0
10
1

orpheus-maya-Q5_K_M-GGUF

Gapeleon/orpheus-maya-Q5_K_M-GGUF: this model was converted to GGUF format from `taresh18/orpheus-maya` using llama.cpp via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.

Use with llama.cpp: install llama.cpp through brew (works on Mac and Linux), or use the checkpoint directly through the usage steps listed in the llama.cpp repo: clone llama.cpp, move into the folder, and build it with the `LLAMA_CURL=1` flag along with other hardware-specific flags (e.g. `LLAMA_CUDA=1` for Nvidia GPUs on Linux).

llama-cpp
10
0

DeepSeek-R1-0528-CODER-DRAFT-0.6B-v1.0-Q4_K_M-GGUF

Gapeleon/DeepSeek-R1-0528-CODER-DRAFT-0.6B-v1.0-Q4_K_M-GGUF: this model was converted to GGUF format from `jukofyork/DeepSeek-R1-0528-CODER-DRAFT-0.6B-v1.0` using llama.cpp via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.

Use with llama.cpp: install llama.cpp through brew (works on Mac and Linux), or use the checkpoint directly through the usage steps listed in the llama.cpp repo: clone llama.cpp, move into the folder, and build it with the `LLAMA_CURL=1` flag along with other hardware-specific flags (e.g. `LLAMA_CUDA=1` for Nvidia GPUs on Linux).

llama-cpp
7
0

Orpheus-4b-base

llama
6
0

hubertsiuzdak_snac_24khz

Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate. 👉 This model was primarily trained on speech data, and its recommended use case is speech synthesis. See below for other pretrained models. 🔗 GitHub repository: https://github.com/hubertsiuzdak/snac/

SNAC encodes audio into hierarchical tokens, similarly to SoundStream, EnCodec, and DAC. However, SNAC introduces a simple change: coarse tokens are sampled less frequently, covering a broader time span. This model compresses 24 kHz audio into discrete codes at a 0.98 kbps bitrate. It uses 3 RVQ levels with token rates of 12, 23, and 47 Hz. Currently, all models support only a single audio channel (mono).

| Model | Bitrate | Sample Rate | Params | Recommended use case |
|---|---|---|---|---|
| hubertsiuzdak/snac_24khz (this model) | 0.98 kbps | 24 kHz | 19.8 M | 🗣️ Speech |
| hubertsiuzdak/snac_32khz | 1.9 kbps | 32 kHz | 54.5 M | 🎸 Music / Sound Effects |
| hubertsiuzdak/snac_44khz | 2.6 kbps | 44 kHz | 54.5 M | 🎸 Music / Sound Effects |

Audio can be encoded (and decoded) with SNAC in Python, either as separate encode/decode calls or as a single reconstruction call; see the sketch after this card. ⚠️ Note that `codes` is a list of token sequences of variable lengths, each corresponding to a different temporal resolution. Module definitions are adapted from the Descript Audio Codec.
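The Python snippet referenced in the card did not survive extraction; the sketch below follows the API shown in the SNAC GitHub README, with a random tensor standing in for real audio (shape `(batch, channels=1, samples)`).

```python
# Sketch of SNAC encode/decode, following the upstream README.
import torch
from snac import SNAC

model = SNAC.from_pretrained("hubertsiuzdak/snac_24khz").eval().cuda()
audio = torch.randn(1, 1, 32000).cuda()  # placeholder: (B, 1, T), mono 24 kHz

with torch.inference_mode():
    codes = model.encode(audio)      # list of code tensors, one per RVQ level
    audio_hat = model.decode(codes)  # waveform reconstructed from the codes

# Encode and reconstruct in a single call:
with torch.inference_mode():
    audio_hat, codes = model(audio)
```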

license:mit
5
0

Slim Orpheus 3b JAPANESE Ft

Pruned the original weights down from 28 to 16 layers (a 43% reduction) to speed up inference and reduce memory requirements. Trained in Japanese on 14 voices. The original card includes sample outputs for each voice with quality indicators: ⭐⭐⭐ good quality, ⭐⭐ okay quality, ⭐ poor quality, ⚠️ unstable.

Limitations:

- Japanese only: this model was trained specifically for Japanese and cannot speak English or other languages.
- No emote support: not trained on the emote/emotional-cue tags that were available in the original model.
- Reduced parameter count: while offering faster inference, the reduction from 28 to 16 layers may impact some of the nuanced capabilities of the original Orpheus model.
- Voice quality varies: as noted in the voice-quality ratings, some voices perform better than others.

Orpheus TTS is a state-of-the-art, Llama-based Speech-LLM designed for high-quality, empathetic text-to-speech generation. This model has been finetuned to deliver human-level speech synthesis, achieving exceptional clarity, expressiveness, and real-time streaming performance.

- Human-like speech: natural intonation, emotion, and rhythm, superior to SOTA closed-source models.
- Low latency: ~200 ms streaming latency for realtime applications, reducible to ~100 ms with input streaming.
- GitHub repo: https://github.com/canopyai/Orpheus-TTS

Check out the Orpheus Colab or the GitHub repo for how to run easy inference on the finetuned models; a rough sketch follows this card.

Model misuse: do not use these models for impersonation without consent, misinformation or deception (including fake news or fraudulent calls), or any illegal or harmful activity. By using this model, you agree to follow all applicable laws and ethical guidelines. The authors disclaim responsibility for any misuse.
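For orientation only, here is a heavily hedged sketch of driving this checkpoint with plain Transformers. The voice name and `voice: text` prompt format are assumptions, and the generated tokens still need SNAC decoding; the authoritative recipe is the Orpheus-TTS GitHub repo linked above.

```python
# Rough sketch, not the official pipeline: generation yields SNAC audio
# tokens that must be decoded to a waveform (see hubertsiuzdak/snac_24khz
# and the Orpheus-TTS repo for the exact token-to-audio mapping).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Gapeleon/slim-orpheus-3b-JAPANESE-ft"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "matsu: こんにちは、今日はいい天気ですね。"  # hypothetical voice/prompt format
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
generated = model.generate(
    **inputs, max_new_tokens=1024, do_sample=True, temperature=0.6
)
# `generated` contains audio codebook tokens; decode them with SNAC per the
# reference implementation to obtain the waveform.
```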

llama
3
1

Voxtral-Small-24B-2507

Voxtral Small is an enhancement of Mistral Small 3, incorporating state-of-the-art audio input capabilities while retaining best-in-class text performance. It excels at speech transcription, translation, and audio understanding. Learn more about Voxtral in the blog post and research paper.

Voxtral builds upon Mistral Small 3 with powerful audio understanding capabilities:

- Dedicated transcription mode: Voxtral can operate in a pure speech-transcription mode to maximize performance. By default, it automatically predicts the source audio language and transcribes the text accordingly.
- Long-form context: with a 32k-token context length, Voxtral handles audio up to 30 minutes for transcription, or 40 minutes for understanding.
- Built-in Q&A and summarization: supports asking questions directly through audio, and can analyze audio and generate structured summaries without the need for separate ASR and language models.
- Natively multilingual: automatic language detection and state-of-the-art performance in the world's most widely used languages (English, Spanish, French, Portuguese, Hindi, German, Dutch, Italian).
- Function calling straight from voice: enables direct triggering of backend functions, workflows, or API calls based on spoken user intents.
- Highly capable at text: retains the text understanding capabilities of its language-model backbone, Mistral Small 3.1.

The original card reports average word error rate (WER) over the FLEURS, Mozilla Common Voice, and Multilingual LibriSpeech benchmarks.

The model can be used with the following frameworks: `vllm` (recommended) and `Transformers` 🤗. Usage notes:

- Use `temperature=0.2` and `top_p=0.95` for chat completion (e.g. audio understanding), and `temperature=0.0` for transcription.
- Multiple audios per message and multiple user turns with audio are supported.
- Function calling is supported (experimental).
- System prompts are not yet supported.

Make sure to install `vllm >= 0.10.0` (we recommend using `uv`); doing so should automatically install `mistral_common >= 1.8.1`. You can test that your vLLM setup works as expected by cloning the vLLM repo. We recommend using Voxtral-Small-24B-2507 in a server/client setting: first serve the model, then query the server with a simple Python snippet. Note: running Voxtral-Small-24B-2507 on GPU requires ~55 GB of GPU RAM in bf16 or fp16. For audio chat, transcription, and experimental function calling alike, make sure that your client has `mistral-common` installed with audio dependencies. An example chat answer from the card (translated from French): "The most inspiring speaker is the president. He is more inspiring because he talks about his personal experiences and his optimism for the country's future. He differs from the other speaker in that he does not talk about the weather, but rather about his interactions with people and his role as president." Starting with `transformers >= 4.54.0`, you can also run Voxtral natively; make sure to have `mistral_common >= 1.8.1` installed with audio dependencies. A transcription sketch follows below.
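To round out the transcription side, here is a minimal sketch against the same vLLM server. It assumes vLLM exposes the OpenAI-compatible `/v1/audio/transcriptions` route for this model, and the audio file path is a placeholder.

```python
# Sketch: dedicated transcription mode via the OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

with open("sample.mp3", "rb") as f:  # placeholder audio file
    transcription = client.audio.transcriptions.create(
        model="mistralai/Voxtral-Small-24B-2507",
        file=f,
        temperature=0.0,  # recommended for transcription
    )
print(transcription.text)
```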

license:apache-2.0
3
0

DeepCoder-14B-Preview-int4-awq-ov

license:mit
2
0

llasa-3b

llama
1
0

Mistral-Small-3.1-24B-Instruct-2503-int4-awq-ov

1
0

slim-orpheus-3b-JAPANESE-ft-Q4_K_M-GGUF

llama-cpp
1
0

Orpheus-1B-pt

llama
1
0

mOrpheus_3B-1Base_early_preview-v1-25000-int4-awq-ov

llama
0
1