Gapeleon

20 models

Voxtral-Mini-3B-2507

Voxtral Mini is an enhancement of Ministral 3B, incorporating state-of-the-art audio input capabilities while retaining best-in-class text performance. It excels at speech transcription, translation, and audio understanding. Learn more about Voxtral in the blog post and research paper.

Voxtral builds upon Ministral-3B with powerful audio understanding capabilities:

- Dedicated transcription mode: Voxtral can operate in a pure speech-transcription mode to maximize performance. By default, it automatically predicts the source audio language and transcribes the text accordingly.
- Long-form context: with a 32k-token context length, Voxtral handles audio up to 30 minutes for transcription, or 40 minutes for understanding.
- Built-in Q&A and summarization: supports asking questions directly through audio, and can analyze audio and generate structured summaries without the need for separate ASR and language models.
- Natively multilingual: automatic language detection and state-of-the-art performance in the world's most widely used languages (English, Spanish, French, Portuguese, Hindi, German, Dutch, Italian).
- Function calling straight from voice: enables direct triggering of backend functions, workflows, or API calls based on spoken user intents.
- Highly capable at text: retains the text understanding capabilities of its language-model backbone, Ministral-3B.

The original card reports average word error rate (WER) over the FLEURS, Mozilla Common Voice, and Multilingual LibriSpeech benchmarks.

The model can be used with the following frameworks: `vllm` (recommended) and `Transformers` 🤗. Usage notes:

- Use `temperature=0.2` and `top_p=0.95` for chat completion (e.g. audio understanding), and `temperature=0.0` for transcription.
- Multiple audios per message and multiple user turns with audio are supported.
- System prompts are not yet supported.

Make sure to install `vllm >= 0.10.0` (we recommend using `uv`); doing so should automatically install `mistral_common >= 1.8.1`. You can test that your vLLM setup works as expected by cloning the vLLM repo. We recommend using Voxtral-Mini-3B-2507 in a server/client setting: first serve the model, then query the server with a simple Python snippet, as in the sketch below. Note: running Voxtral-Mini-3B-2507 on GPU requires ~9.5 GB of GPU RAM in bf16 or fp16. For both audio chat and transcription, make sure that your client has `mistral-common` installed with audio dependencies. Starting with `transformers >= 4.54.0`, you can also run Voxtral natively; make sure to have `mistral_common >= 1.8.1` installed with audio dependencies.
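As a concrete illustration of the server/client flow described above, here is a minimal sketch. It assumes a locally served vLLM instance and a placeholder audio URL; the serve flags follow the upstream Voxtral instructions, and the `audio_url` content part is the shape vLLM accepts for audio-capable models.

```python
# Minimal sketch: serve the model first, e.g.
#   vllm serve mistralai/Voxtral-Mini-3B-2507 \
#       --tokenizer_mode mistral --config_format mistral --load_format mistral
# then chat with it through the OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="mistralai/Voxtral-Mini-3B-2507",
    temperature=0.2,  # recommended for audio understanding
    top_p=0.95,
    messages=[{
        "role": "user",
        "content": [
            # Placeholder URL: point this at a real audio file.
            {"type": "audio_url", "audio_url": {"url": "https://example.com/sample.mp3"}},
            {"type": "text", "text": "Summarize this audio in two sentences."},
        ],
    }],
)
print(response.choices[0].message.content)
```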

license:apache-2.0
57
0

Slim Orpheus 3b JAPANESE Ft Q8_0 GGUF

Gapeleon/slim-orpheus-3b-JAPANESE-ft-Q8_0-GGUF: this model was converted to GGUF format from `Gapeleon/slim-orpheus-3b-JAPANESE-ft` using llama.cpp via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.

Use with llama.cpp: install llama.cpp through brew (works on Mac and Linux), or use the checkpoint directly through the usage steps listed in the llama.cpp repo: clone llama.cpp, move into the folder, and build it with the `LLAMA_CURL=1` flag along with other hardware-specific flags (e.g. `LLAMA_CUDA=1` for Nvidia GPUs on Linux).
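If you prefer Python over the CLI route described above, the `llama-cpp-python` bindings (a swap-in for the llama.cpp CLI, not what the card itself documents) can fetch a GGUF file straight from the Hub. A minimal sketch; the `filename` glob is an assumption about the file naming in this repo:

```python
# Sketch using llama-cpp-python instead of the llama.cpp CLI.
# pip install llama-cpp-python huggingface_hub
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="Gapeleon/slim-orpheus-3b-JAPANESE-ft-Q8_0-GGUF",
    filename="*Q8_0.gguf",  # assumed file-name pattern in the repo
    n_ctx=4096,
)

# Note: this is a TTS checkpoint, so the raw completion is audio-token text;
# see the original model card for how to turn generated tokens into audio.
out = llm("こんにちは、", max_tokens=64)
print(out["choices"][0]["text"])
```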

llama-cpp
38
1

Orpheus-3B-pt

llama
29
0

Llasa-1B-Q4_K_M-GGUF

Gapeleon/Llasa-1B-Q4_K_M-GGUF: this model was converted to GGUF format from `HKUSTAudio/Llasa-1B` using llama.cpp via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.

Use with llama.cpp: install llama.cpp through brew (works on Mac and Linux), or use the checkpoint directly through the usage steps listed in the llama.cpp repo: clone llama.cpp, move into the folder, and build it with the `LLAMA_CURL=1` flag along with other hardware-specific flags (e.g. `LLAMA_CUDA=1` for Nvidia GPUs on Linux).

llama-cpp
26
0

Satyr-V0.1-4B-HF-int4_awq-ov

license:apache-2.0
17
1

Satyr-V0.1-4B-exl3-8.0bpw

exllamav3
14
0

bytedance_BAGEL-7B-MoT-INT8

license:apache-2.0
10
24

kaniTTS_Elise

A quick test run: training nineninesix/kani-tts-450m-0.1-pt on MrDragonFox/Elise.

Sample 1: "Hey there, my name is Elise, and I'm a text to speech model. Do I sound like a person?"

Sample 2: "Got it. $300,000. I can definitely help you get a very good price for your property by selecting a realtor."

license:cc-by-4.0
10
1

orpheus-maya-Q5_K_M-GGUF

Gapeleon/orpheus-maya-Q5_K_M-GGUF: this model was converted to GGUF format from `taresh18/orpheus-maya` using llama.cpp via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.

Use with llama.cpp: install llama.cpp through brew (works on Mac and Linux), or use the checkpoint directly through the usage steps listed in the llama.cpp repo: clone llama.cpp, move into the folder, and build it with the `LLAMA_CURL=1` flag along with other hardware-specific flags (e.g. `LLAMA_CUDA=1` for Nvidia GPUs on Linux).

llama-cpp
10
0

DeepSeek-R1-0528-CODER-DRAFT-0.6B-v1.0-Q4_K_M-GGUF

Gapeleon/DeepSeek-R1-0528-CODER-DRAFT-0.6B-v1.0-Q4_K_M-GGUF: this model was converted to GGUF format from `jukofyork/DeepSeek-R1-0528-CODER-DRAFT-0.6B-v1.0` using llama.cpp via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.

Use with llama.cpp: install llama.cpp through brew (works on Mac and Linux), or use the checkpoint directly through the usage steps listed in the llama.cpp repo: clone llama.cpp, move into the folder, and build it with the `LLAMA_CURL=1` flag along with other hardware-specific flags (e.g. `LLAMA_CUDA=1` for Nvidia GPUs on Linux).

llama-cpp
7
0

Orpheus-4b-base

llama
6
0

hubertsiuzdak_snac_24khz

Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate. 👉 This model was primarily trained on speech data, and its recommended use case is speech synthesis. See below for other pretrained models. 🔗 GitHub repository: https://github.com/hubertsiuzdak/snac/

SNAC encodes audio into hierarchical tokens, similarly to SoundStream, EnCodec, and DAC. However, SNAC introduces a simple change: coarse tokens are sampled less frequently, covering a broader time span. This model compresses 24 kHz audio into discrete codes at a 0.98 kbps bitrate. It uses 3 RVQ levels with token rates of 12, 23, and 47 Hz. Currently, all models support only a single audio channel (mono).

| Model | Bitrate | Sample Rate | Params | Recommended use case |
|---|---|---|---|---|
| hubertsiuzdak/snac_24khz (this model) | 0.98 kbps | 24 kHz | 19.8 M | 🗣️ Speech |
| hubertsiuzdak/snac_32khz | 1.9 kbps | 32 kHz | 54.5 M | 🎸 Music / Sound Effects |
| hubertsiuzdak/snac_44khz | 2.6 kbps | 44 kHz | 54.5 M | 🎸 Music / Sound Effects |

Audio can be encoded (and decoded) with SNAC in Python, either as separate encode/decode calls or as a single reconstruction call; see the sketch after this card. ⚠️ Note that `codes` is a list of token sequences of variable lengths, each corresponding to a different temporal resolution. Module definitions are adapted from the Descript Audio Codec.
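The Python snippet referenced in the card did not survive extraction; the sketch below follows the API shown in the SNAC GitHub README, with a random tensor standing in for real audio (shape `(batch, channels=1, samples)`).

```python
# Sketch of SNAC encode/decode, following the upstream README.
import torch
from snac import SNAC

model = SNAC.from_pretrained("hubertsiuzdak/snac_24khz").eval().cuda()
audio = torch.randn(1, 1, 32000).cuda()  # placeholder: (B, 1, T), mono 24 kHz

with torch.inference_mode():
    codes = model.encode(audio)      # list of code tensors, one per RVQ level
    audio_hat = model.decode(codes)  # waveform reconstructed from the codes

# Encode and reconstruct in a single call:
with torch.inference_mode():
    audio_hat, codes = model(audio)
```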

license:mit
5
0

Slim Orpheus 3b JAPANESE Ft

Pruned the original weights down from 28 to 16 layers (a 43% reduction) to speed up inference and reduce memory requirements. Trained in Japanese on 14 voices. The original card includes sample outputs for each voice with quality indicators: ⭐⭐⭐ good quality, ⭐⭐ okay quality, ⭐ poor quality, ⚠️ unstable.

Limitations:

- Japanese only: this model was trained specifically for Japanese and cannot speak English or other languages.
- No emote support: not trained on the emote/emotional-cue tags that were available in the original model.
- Reduced parameter count: while offering faster inference, the reduction from 28 to 16 layers may impact some of the nuanced capabilities of the original Orpheus model.
- Voice quality varies: as noted in the voice-quality ratings, some voices perform better than others.

Orpheus TTS is a state-of-the-art, Llama-based Speech-LLM designed for high-quality, empathetic text-to-speech generation. This model has been finetuned to deliver human-level speech synthesis, achieving exceptional clarity, expressiveness, and real-time streaming performance.

- Human-like speech: natural intonation, emotion, and rhythm, superior to SOTA closed-source models.
- Low latency: ~200 ms streaming latency for realtime applications, reducible to ~100 ms with input streaming.
- GitHub repo: https://github.com/canopyai/Orpheus-TTS

Check out the Orpheus Colab or the GitHub repo for how to run easy inference on the finetuned models; a rough sketch follows this card.

Model misuse: do not use these models for impersonation without consent, misinformation or deception (including fake news or fraudulent calls), or any illegal or harmful activity. By using this model, you agree to follow all applicable laws and ethical guidelines. The authors disclaim responsibility for any misuse.
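For orientation only, here is a heavily hedged sketch of driving this checkpoint with plain Transformers. The voice name and `voice: text` prompt format are assumptions, and the generated tokens still need SNAC decoding; the authoritative recipe is the Orpheus-TTS GitHub repo linked above.

```python
# Rough sketch, not the official pipeline: generation yields SNAC audio
# tokens that must be decoded to a waveform (see hubertsiuzdak/snac_24khz
# and the Orpheus-TTS repo for the exact token-to-audio mapping).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Gapeleon/slim-orpheus-3b-JAPANESE-ft"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "matsu: こんにちは、今日はいい天気ですね。"  # hypothetical voice/prompt format
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
generated = model.generate(
    **inputs, max_new_tokens=1024, do_sample=True, temperature=0.6
)
# `generated` contains audio codebook tokens; decode them with SNAC per the
# reference implementation to obtain the waveform.
```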

llama
3
1

Voxtral-Small-24B-2507

Voxtral Small is an enhancement of Mistral Small 3, incorporating state-of-the-art audio input capabilities while retaining best-in-class text performance. It excels at speech transcription, translation, and audio understanding. Learn more about Voxtral in the blog post and research paper.

Voxtral builds upon Mistral Small 3 with powerful audio understanding capabilities:

- Dedicated transcription mode: Voxtral can operate in a pure speech-transcription mode to maximize performance. By default, it automatically predicts the source audio language and transcribes the text accordingly.
- Long-form context: with a 32k-token context length, Voxtral handles audio up to 30 minutes for transcription, or 40 minutes for understanding.
- Built-in Q&A and summarization: supports asking questions directly through audio, and can analyze audio and generate structured summaries without the need for separate ASR and language models.
- Natively multilingual: automatic language detection and state-of-the-art performance in the world's most widely used languages (English, Spanish, French, Portuguese, Hindi, German, Dutch, Italian).
- Function calling straight from voice: enables direct triggering of backend functions, workflows, or API calls based on spoken user intents.
- Highly capable at text: retains the text understanding capabilities of its language-model backbone, Mistral Small 3.1.

The original card reports average word error rate (WER) over the FLEURS, Mozilla Common Voice, and Multilingual LibriSpeech benchmarks.

The model can be used with the following frameworks: `vllm` (recommended) and `Transformers` 🤗. Usage notes:

- Use `temperature=0.2` and `top_p=0.95` for chat completion (e.g. audio understanding), and `temperature=0.0` for transcription.
- Multiple audios per message and multiple user turns with audio are supported.
- Function calling is supported (experimental).
- System prompts are not yet supported.

Make sure to install `vllm >= 0.10.0` (we recommend using `uv`); doing so should automatically install `mistral_common >= 1.8.1`. You can test that your vLLM setup works as expected by cloning the vLLM repo. We recommend using Voxtral-Small-24B-2507 in a server/client setting: first serve the model, then query the server with a simple Python snippet. Note: running Voxtral-Small-24B-2507 on GPU requires ~55 GB of GPU RAM in bf16 or fp16. For audio chat, transcription, and experimental function calling alike, make sure that your client has `mistral-common` installed with audio dependencies. An example chat answer from the card (translated from French): "The most inspiring speaker is the president. He is more inspiring because he talks about his personal experiences and his optimism for the country's future. He differs from the other speaker in that he does not talk about the weather, but rather about his interactions with people and his role as president." Starting with `transformers >= 4.54.0`, you can also run Voxtral natively; make sure to have `mistral_common >= 1.8.1` installed with audio dependencies. A transcription sketch follows below.
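To round out the transcription side, here is a minimal sketch against the same vLLM server. It assumes vLLM exposes the OpenAI-compatible `/v1/audio/transcriptions` route for this model, and the audio file path is a placeholder.

```python
# Sketch: dedicated transcription mode via the OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

with open("sample.mp3", "rb") as f:  # placeholder audio file
    transcription = client.audio.transcriptions.create(
        model="mistralai/Voxtral-Small-24B-2507",
        file=f,
        temperature=0.0,  # recommended for transcription
    )
print(transcription.text)
```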

license:apache-2.0
3
0

DeepCoder-14B-Preview-int4-awq-ov

license:mit
2
0

llasa-3b

llama
1
0

Mistral-Small-3.1-24B-Instruct-2503-int4-awq-ov

1
0

slim-orpheus-3b-JAPANESE-ft-Q4_K_M-GGUF

llama-cpp
1
0

Orpheus-1B-pt

llama
1
0

mOrpheus_3B-1Base_early_preview-v1-25000-int4-awq-ov

llama
0
1