lex-au

21 models

Orpheus-3b-FT-Q4_K_M.gguf

This is a quantised version of canopylabs/orpheus-3b-0.1-ft. Orpheus is a high-performance Text-to-Speech model fine-tuned for natural, emotional speech synthesis. This repository hosts the 4-bit quantised version of the 3B parameter model, optimised for efficiency while maintaining high-quality output.

Orpheus-3b-FT-Q4_K_M is a 3 billion parameter Text-to-Speech model that converts text input into natural-sounding speech, with support for multiple voices and emotional expressions. The model has been quantised to 4-bit (Q4_K_M) format for efficient inference, making it accessible on consumer hardware.

Key features:
- 8 distinct voice options with different characteristics
- Support for emotion tags such as laughter and sighs
- Optimised for CUDA acceleration on RTX GPUs
- Produces high-quality 24kHz mono audio
- Fine-tuned for conversational naturalness

This model is designed to be used with an LLM inference server that connects to the Orpheus-FastAPI frontend, which provides both a web UI and OpenAI-compatible API endpoints. The quantised model can be loaded into any of these inference servers:
- GPUStack - GPU-optimised LLM inference server (my pick); supports LAN/WAN tensor-split parallelisation
- LM Studio - load the GGUF model and start the local server
- llama.cpp server - run with the appropriate model parameters
- Any other OpenAI-API-compatible server

Setup:
1. Download this quantised model from lex-au's Orpheus-FastAPI collection.
2. Load the model in your preferred inference server and start the server.
3. Configure the FastAPI server to connect to your inference server by setting the `ORPHEUS_API_URL` environment variable.
4. Follow the complete installation and setup instructions in the repository README.
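The setup steps above can be sketched from the client side. In this minimal sketch, the payload field names, default URL, and port are illustrative assumptions, not values documented in the repository README:

```python
import os

def build_speech_request(text: str, voice: str = "tara") -> dict:
    """Assemble an OpenAI-style text-to-speech payload (field names are illustrative)."""
    return {
        "model": "Orpheus-3b-FT-Q4_K_M",
        "input": text,
        "voice": voice,
        "response_format": "wav",  # the model produces 24kHz mono WAV audio
    }

# The FastAPI frontend locates the inference server via the ORPHEUS_API_URL
# environment variable; the fallback URL here is a placeholder, not the
# project's documented default.
api_url = os.environ.get("ORPHEUS_API_URL", "http://127.0.0.1:5005/v1/completions")
payload = build_speech_request("Hello from Orpheus!", voice="leo")
```

From here, a client would POST `payload` to the frontend; consult the repository README for the actual endpoint paths.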
Listen to the model in action with different voices and emotions. The model supports 8 different voices:
- `tara`: Female, conversational, clear
- `leah`: Female, warm, gentle
- `jess`: Female, energetic, youthful
- `leo`: Male, authoritative, deep
- `dan`: Male, friendly, casual
- `mia`: Female, professional, articulate
- `zac`: Male, enthusiastic, dynamic
- `zoe`: Female, calm, soothing

You can add expressiveness to speech by inserting tags:
- `<laugh>`, `<chuckle>`: for laughter sounds
- `<sigh>`: for sighing sounds
- `<cough>`, `<sniffle>`: for subtle interruptions
- `<groan>`, `<yawn>`, `<gasp>`: for additional emotional expression

Technical specifications:
- Architecture: specialised token-to-audio sequence model
- Parameters: ~3 billion
- Quantisation: 4-bit (GGUF Q4_K_M format)
- Audio sample rate: 24kHz
- Input: text with optional voice selection and emotion tags
- Output: high-quality WAV audio
- Language: English
- Hardware requirements: CUDA-compatible GPU (recommended: RTX series)
- Integration method: external LLM inference server + Orpheus-FastAPI frontend

Limitations:
- Currently supports English text only
- Best performance achieved on CUDA-compatible GPUs
- Generation speed depends on GPU capability

This model is available under the Apache License 2.0. The original Orpheus model was created by Canopy Labs. This repository contains a quantised version optimised for use with the Orpheus-FastAPI server. If you use this quantised model in your research or applications, please cite:
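When assembling prompts programmatically, a small helper guards against typoed voice names or emotion tags. The tag vocabulary below follows the original Orpheus release documentation and should be treated as illustrative, not as an official API:

```python
# Voice and emotion-tag vocabularies for the English fine-tune; illustrative only.
VOICES = {"tara", "leah", "jess", "leo", "dan", "mia", "zac", "zoe"}
EMOTION_TAGS = {"laugh", "chuckle", "sigh", "cough", "sniffle", "groan", "yawn", "gasp"}

def tag(text: str, emotion: str) -> str:
    """Append an inline emotion tag such as <sigh> to a chunk of text."""
    if emotion not in EMOTION_TAGS:
        raise ValueError(f"unknown emotion tag: {emotion}")
    return f"{text} <{emotion}>"

line = tag("Well, that took forever", "sigh")  # "Well, that took forever <sigh>"
```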

license:apache-2.0 • 1,150 downloads • 6 likes

Orpheus-3b-FT-Q8_0.gguf

This is a quantised version of canopylabs/orpheus-3b-0.1-ft. Orpheus is a high-performance Text-to-Speech model fine-tuned for natural, emotional speech synthesis. This repository hosts the 8-bit quantised version of the 3B parameter model, optimised for efficiency while maintaining high-quality output.

Orpheus-3b-FT-Q8_0 is a 3 billion parameter Text-to-Speech model that converts text input into natural-sounding speech, with support for multiple voices and emotional expressions. The model has been quantised to 8-bit (Q8_0) format for efficient inference, making it accessible on consumer hardware.

Key features:
- 8 distinct voice options with different characteristics
- Support for emotion tags such as laughter and sighs
- Optimised for CUDA acceleration on RTX GPUs
- Produces high-quality 24kHz mono audio
- Fine-tuned for conversational naturalness

This model is designed to be used with an LLM inference server that connects to the Orpheus-FastAPI frontend, which provides both a web UI and OpenAI-compatible API endpoints. The quantised model can be loaded into any of these inference servers:
- GPUStack - GPU-optimised LLM inference server (my pick); supports LAN/WAN tensor-split parallelisation
- LM Studio - load the GGUF model and start the local server
- llama.cpp server - run with the appropriate model parameters
- Any other OpenAI-API-compatible server

Setup:
1. Download this quantised model from lex-au's Orpheus-FastAPI collection.
2. Load the model in your preferred inference server and start the server.
3. Configure the FastAPI server to connect to your inference server by setting the `ORPHEUS_API_URL` environment variable.
4. Follow the complete installation and setup instructions in the repository README.
Listen to the model in action with different voices and emotions. The model supports 8 different voices:
- `tara`: Female, conversational, clear
- `leah`: Female, warm, gentle
- `jess`: Female, energetic, youthful
- `leo`: Male, authoritative, deep
- `dan`: Male, friendly, casual
- `mia`: Female, professional, articulate
- `zac`: Male, enthusiastic, dynamic
- `zoe`: Female, calm, soothing

You can add expressiveness to speech by inserting tags:
- `<laugh>`, `<chuckle>`: for laughter sounds
- `<sigh>`: for sighing sounds
- `<cough>`, `<sniffle>`: for subtle interruptions
- `<groan>`, `<yawn>`, `<gasp>`: for additional emotional expression

Technical specifications:
- Architecture: specialised token-to-audio sequence model
- Parameters: ~3 billion
- Quantisation: 8-bit (GGUF Q8_0 format)
- Audio sample rate: 24kHz
- Input: text with optional voice selection and emotion tags
- Output: high-quality WAV audio
- Language: English
- Hardware requirements: CUDA-compatible GPU (recommended: RTX series)
- Integration method: external LLM inference server + Orpheus-FastAPI frontend

Limitations:
- Currently supports English text only
- Best performance achieved on CUDA-compatible GPUs
- Generation speed depends on GPU capability

This model is available under the Apache License 2.0. The original Orpheus model was created by Canopy Labs. This repository contains a quantised version optimised for use with the Orpheus-FastAPI server. If you use this quantised model in your research or applications, please cite:
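The practical difference between the Q8_0 and Q4_K_M builds of this model is file size versus fidelity: Q8_0 stores roughly 8.5 bits per weight (8-bit values plus per-block scales), while Q4_K_M averages closer to 4.85. A back-of-the-envelope size estimate, assuming those typical llama.cpp bit-widths and ignoring GGUF metadata overhead:

```python
def estimate_gguf_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough model-file size: parameters x bits per weight, ignoring metadata."""
    return n_params * bits_per_weight / 8 / 1e9

q8_size = estimate_gguf_size_gb(3e9, 8.5)   # ~3.2 GB for the Q8_0 build
q4_size = estimate_gguf_size_gb(3e9, 4.85)  # ~1.8 GB for the Q4_K_M build
```

The bits-per-weight figures are approximations; actual GGUF files also embed tokenizer and metadata blocks, so real sizes run slightly larger.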

license:apache-2.0 • 1,136 downloads • 28 likes

Orpheus-3b-German-FT-Q8_0.gguf

This is a quantised version of canopylabs/3b-de-ft-researchrelease. Orpheus is a high-performance Text-to-Speech model fine-tuned for natural, emotional speech synthesis. This repository hosts the 8-bit quantised version of the 3B parameter model, optimised for efficiency while maintaining high-quality output.

Orpheus-3b-German-FT-Q8_0 is a 3 billion parameter Text-to-Speech model that converts text input into natural-sounding speech, with support for multiple voices and emotional expressions. The model has been quantised to 8-bit (Q8_0) format for efficient inference, making it accessible on consumer hardware.

Key features:
- 3 distinct voice options with different characteristics
- Support for emotion tags such as laughter and sighs
- Optimised for CUDA acceleration on RTX GPUs
- Produces high-quality 24kHz mono audio
- Fine-tuned for conversational naturalness

This model is designed to be used with an LLM inference server that connects to the Orpheus-FastAPI frontend, which provides both a web UI and OpenAI-compatible API endpoints. The quantised model can be loaded into any of these inference servers:
- GPUStack - GPU-optimised LLM inference server (my pick); supports LAN/WAN tensor-split parallelisation
- LM Studio - load the GGUF model and start the local server
- llama.cpp server - run with the appropriate model parameters
- Any other OpenAI-API-compatible server

Setup:
1. Download this quantised model from lex-au's Orpheus-FastAPI collection.
2. Load the model in your preferred inference server and start the server.
3. Configure the FastAPI server to connect to your inference server by setting the `ORPHEUS_API_URL` environment variable.
4. Follow the complete installation and setup instructions in the repository README.
The model supports 3 different voices:
- `Jana`: Female, German, clear
- `Thomas`: Male, German, authoritative
- `Max`: Male, German, energetic

You can add expressiveness to speech by inserting tags:
- `<laugh>`, `<chuckle>`: for laughter sounds
- `<sigh>`: for sighing sounds
- `<cough>`, `<sniffle>`: for subtle interruptions
- `<groan>`, `<yawn>`, `<gasp>`: for additional emotional expression

Technical specifications:
- Architecture: specialised token-to-audio sequence model
- Parameters: ~3 billion
- Quantisation: 8-bit (GGUF Q8_0 format)
- Audio sample rate: 24kHz
- Input: text with optional voice selection and emotion tags
- Output: high-quality WAV audio
- Language: German
- Hardware requirements: CUDA-compatible GPU (recommended: RTX series)
- Integration method: external LLM inference server + Orpheus-FastAPI frontend

Limitations:
- Best performance achieved on CUDA-compatible GPUs
- Generation speed depends on GPU capability

This model is available under the Apache License 2.0. The original Orpheus model was created by Canopy Labs. This repository contains a quantised version optimised for use with the Orpheus-FastAPI server. If you use this quantised model in your research or applications, please cite:

license:apache-2.0 • 295 downloads • 9 likes

Orpheus-3b-FT-Q2_K.gguf

license:apache-2.0 • 166 downloads • 6 likes

Orpheus-3b-Italian_Spanish-FT-Q8_0.gguf

license:apache-2.0 • 127 downloads • 2 likes

Orpheus-3b-French-FT-Q8_0.gguf

This is a quantised version of canopylabs/3b-fr-ft-researchrelease. Orpheus is a high-performance Text-to-Speech model fine-tuned for natural, emotional speech synthesis. This repository hosts the 8-bit quantised version of the 3B parameter model, optimised for efficiency while maintaining high-quality output.

Orpheus-3b-French-FT-Q8_0 is a 3 billion parameter Text-to-Speech model that converts text input into natural-sounding speech, with support for multiple voices and emotional expressions. The model has been quantised to 8-bit (Q8_0) format for efficient inference, making it accessible on consumer hardware.

Key features:
- 3 distinct voice options with different characteristics
- Support for emotion tags such as laughter and sighs
- Optimised for CUDA acceleration on RTX GPUs
- Produces high-quality 24kHz mono audio
- Fine-tuned for conversational naturalness

This model is designed to be used with an LLM inference server that connects to the Orpheus-FastAPI frontend, which provides both a web UI and OpenAI-compatible API endpoints. The quantised model can be loaded into any of these inference servers:
- GPUStack - GPU-optimised LLM inference server (my pick); supports LAN/WAN tensor-split parallelisation
- LM Studio - load the GGUF model and start the local server
- llama.cpp server - run with the appropriate model parameters
- Any other OpenAI-API-compatible server

Setup:
1. Download this quantised model from lex-au's Orpheus-FastAPI collection.
2. Load the model in your preferred inference server and start the server.
3. Configure the FastAPI server to connect to your inference server by setting the `ORPHEUS_API_URL` environment variable.
4. Follow the complete installation and setup instructions in the repository README.
The model supports 3 different voices:
- `Pierre`: Male, French, sophisticated
- `Amelie`: Female, French, elegant
- `Marie`: Female, French, spirited

You can add expressiveness to speech by inserting tags:
- `<laugh>`, `<chuckle>`: for laughter sounds
- `<sigh>`: for sighing sounds
- `<cough>`, `<sniffle>`: for subtle interruptions
- `<groan>`, `<yawn>`, `<gasp>`: for additional emotional expression

Technical specifications:
- Architecture: specialised token-to-audio sequence model
- Parameters: ~3 billion
- Quantisation: 8-bit (GGUF Q8_0 format)
- Audio sample rate: 24kHz
- Input: text with optional voice selection and emotion tags
- Output: high-quality WAV audio
- Language: French
- Hardware requirements: CUDA-compatible GPU (recommended: RTX series)
- Integration method: external LLM inference server + Orpheus-FastAPI frontend

Limitations:
- Best performance achieved on CUDA-compatible GPUs
- Generation speed depends on GPU capability

This model is available under the Apache License 2.0. The original Orpheus model was created by Canopy Labs. This repository contains a quantised version optimised for use with the Orpheus-FastAPI server. If you use this quantised model in your research or applications, please cite:

license:apache-2.0 • 107 downloads • 3 likes

Orpheus-3b-Korean-FT-Q8_0.gguf

license:apache-2.0 • 106 downloads • 2 likes

Orpheus-3b-Chinese-FT-Q8_0.gguf

This is a quantised version of canopylabs/3b-zh-ft-researchrelease. Orpheus is a high-performance Text-to-Speech model fine-tuned for natural, emotional speech synthesis. This repository hosts the 8-bit quantised version of the 3B parameter model, optimised for efficiency while maintaining high-quality output.

Orpheus-3b-Chinese-FT-Q8_0 is a 3 billion parameter Text-to-Speech model that converts text input into natural-sounding speech, with support for multiple voices and emotional expressions. The model has been quantised to 8-bit (Q8_0) format for efficient inference, making it accessible on consumer hardware.

Key features:
- 2 distinct voice options with different characteristics
- Support for emotion tags such as laughter and sighs
- Optimised for CUDA acceleration on RTX GPUs
- Produces high-quality 24kHz mono audio
- Fine-tuned for conversational naturalness

This model is designed to be used with an LLM inference server that connects to the Orpheus-FastAPI frontend, which provides both a web UI and OpenAI-compatible API endpoints. The quantised model can be loaded into any of these inference servers:
- GPUStack - GPU-optimised LLM inference server (my pick); supports LAN/WAN tensor-split parallelisation
- LM Studio - load the GGUF model and start the local server
- llama.cpp server - run with the appropriate model parameters
- Any other OpenAI-API-compatible server

Setup:
1. Download this quantised model from lex-au's Orpheus-FastAPI collection.
2. Load the model in your preferred inference server and start the server.
3. Configure the FastAPI server to connect to your inference server by setting the `ORPHEUS_API_URL` environment variable.
4. Follow the complete installation and setup instructions in the repository README.
The model supports 2 different voices:
- `长乐`: Female, Mandarin, gentle
- `白芷`: Female, Mandarin, clear

You can add expressiveness to speech by inserting tags:
- `<laugh>`, `<chuckle>`: for laughter sounds
- `<sigh>`: for sighing sounds
- `<cough>`, `<sniffle>`: for subtle interruptions
- `<groan>`, `<yawn>`, `<gasp>`: for additional emotional expression

Technical specifications:
- Architecture: specialised token-to-audio sequence model
- Parameters: ~3 billion
- Quantisation: 8-bit (GGUF Q8_0 format)
- Audio sample rate: 24kHz
- Input: text with optional voice selection and emotion tags
- Output: high-quality WAV audio
- Language: Mandarin
- Hardware requirements: CUDA-compatible GPU (recommended: RTX series)
- Integration method: external LLM inference server + Orpheus-FastAPI frontend

Limitations:
- Best performance achieved on CUDA-compatible GPUs
- Generation speed depends on GPU capability

This model is available under the Apache License 2.0. The original Orpheus model was created by Canopy Labs. This repository contains a quantised version optimised for use with the Orpheus-FastAPI server. If you use this quantised model in your research or applications, please cite:

license:apache-2.0 • 74 downloads • 3 likes

Google.Gemma-3-4b-it-GGUF

72 downloads • 1 like

Vocalis-Q4_K_M.gguf

base_model:meta-llama/Meta-Llama-3-8B-Instruct • 48 downloads • 3 likes

Orpheus-3b-Hindi-FT-Q8_0.gguf

license:apache-2.0 • 43 downloads • 2 likes

Orpheus-3b-Kaya-Q2_K.gguf

llama • 37 downloads • 0 likes

Orpheus-3b-Kaya-Q8_0.gguf

This is a fine-tuned version of the pretrained model canopylabs/orpheus-3b-0.1-pretrained, trained on a custom voice dataset and quantised to GGUF Q8_0 format for fast, efficient inference.

- Model type: Text-to-Speech (TTS)
- Architecture: token-to-audio language model
- Parameters: ~3 billion
- Quantisation: 8-bit GGUF (Q8_0)
- Sampling rate: 24kHz mono
- Training epochs: 1
- Training dataset: lex-au/Orpheus-3b-Kaya
- Language: English

This model is designed for use with Orpheus-FastAPI, an OpenAI-compatible inference server for text-to-speech generation.

Compatible inference servers; you can load this model into:
- GPUStack
- LM Studio
- llama.cpp
- Any other GGUF-compatible OpenAI-style server

License: Apache License 2.0, free for research and commercial use.

Credits:
- Original model by: Canopy Labs
- Fine-tuned, quantised, and API-wrapped by: lex-au via Unsloth and Hugging Face's TRL library
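Since the card specifies 24kHz mono output, the raw data rate of generated WAV audio is easy to derive, which helps when budgeting storage or streaming bandwidth. The 16-bit sample width below is an assumption; the card states only the sample rate and channel count:

```python
def wav_bytes_per_second(sample_rate: int = 24000, bits_per_sample: int = 16, channels: int = 1) -> int:
    """Raw PCM data rate in bytes/second for uncompressed WAV audio."""
    return sample_rate * bits_per_sample // 8 * channels

rate = wav_bytes_per_second()   # 48,000 bytes/s at 24kHz 16-bit mono
minute_mb = rate * 60 / 1e6     # about 2.9 MB per minute of generated speech
```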

llama • 30 downloads • 0 likes

Orpheus-3b-Kaya-FP16.gguf

llama • 28 downloads • 0 likes

Google.Gemma-3-27b-pt-GGUF

23 downloads • 0 likes

Google.Gemma-3-1b-it-GGUF

23 downloads • 0 likes

Orpheus-3b-Kaya-Q4_K_M.gguf

llama • 21 downloads • 0 likes

Orpheus-3b-Kaya-Q6_K.gguf

llama • 17 downloads • 0 likes

shuttle-3.5-Q8_0-GGUF

license:apache-2.0 • 16 downloads • 0 likes

Vocalis-FP16.gguf

base_model:meta-llama/Meta-Llama-3-8B-Instruct • 13 downloads • 0 likes

Vocalis-Q8_0.gguf

🧠 Model Card: LLaMA 3 8B Instruct – Conversational Roleplay Enhanced

Model name: `lex-au/Vocalis-Q8_0.gguf`
Base model: Meta LLaMA 3 8B Instruct
Fine-tuned by: Lex (lex-au)
Model type: causal decoder transformer
License: non-commercial (see Meta LLaMA 3 license)

`Vocalis-Q8_0.gguf` is a fine-tuned and quantised variant of Meta's LLaMA 3 8B Instruct model, purpose-built for immersive roleplaying and low-latency conversational AI in the Vocalis web app. The model tracks spatial and temporal context across turns, enabling natural, persistent narrative flow and highly dynamic character interactions. Designed for low-resource environments, it balances responsiveness with rich generative quality, making it ideal for real-time applications.

- Purpose: enable vivid storytelling and contextually grounded interactions for roleplay-based use cases
- Focus areas:
  - Spatial awareness (tracking objects, locations, movement)
  - Temporal sequencing (event continuity, time-of-day, memory within a thread)
  - Expressive language generation with adaptable tone
- Fine-tuning strategy: LoRA-based adaptation on multi-turn dialogues, fantasy/sci-fi roleplay transcripts, and narrative world-state tracking prompts
- Deployment goal: minimal latency and a low memory footprint, optimised for running inside Vocalis

Use cases:
- Text-based RPG narration
- Interactive fiction engines
- AI-driven NPCs for browser games
- Persistent-world simulation bots
- Web-deployed immersive chat agents

Strengths:
- Maintains spatial and narrative continuity across turns
- Generates richly descriptive, tone-sensitive prose
- Adapts easily to character tone and dynamic environments
- Ideal for first-person, second-person, or multi-speaker scenarios

Limitations:
- Context window constraints may cause memory loss over extended turns
- Can over-describe or hallucinate detail in ambiguous settings
- Not suited to task-based or factual Q&A applications
- No long-term memory or external knowledge integration

Example:

User: You enter the crumbling temple. Where's the light coming from?

Assistant: As you step into the ancient temple, a soft, ethereal glow emanates from within its walls. The light seems to seep through the weathered stone, casting an otherworldly illumination on the dusty artifacts and worn carvings that line the passageway. Upon closer inspection, you notice that the light is coming from a series of intricate, glowing symbols etched into the ceiling above. These mystical runes pulse with a gentle, blue-green luminescence, their soft radiance casting an enchanting atmosphere throughout the temple's interior. The air inside the temple feels heavy with age and mystery, yet the soothing light from the celestial symbols imbues the space with a sense of peace and ancient wisdom. As you venture deeper into the labyrinthine structure, the glow seems to grow stronger, guiding you towards secrets that have lain dormant for centuries within the sacred halls.

Technical details:
- Format: GGUF
- Quantisation: Q8_0
- Optimised for: real-time inference (Vocalis-compatible)
- Recommended frontends: OpenWebUI, KoboldCPP, LM Studio, or custom web clients (e.g., Flask/Vite apps)

> 🧪 This model was fine-tuned specifically for the Vocalis webapp project, a low-latency, voice-enabled AI assistant platform.

The base model used for fine-tuning is Meta's LLaMA 3 8B Instruct. For licensing, refer to Meta's license terms. This repository contains a quantised version optimised for low-latency use in the Vocalis real-time webapp. If you use this model in your research or application, please cite:
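Because Vocalis targets OpenAI-style frontends, multi-turn roleplay context is carried in the standard chat-messages array; the model's spatial and temporal tracking depends on earlier turns staying in that array. The helper below sketches how a client might accumulate turns. The message shape is the generic OpenAI chat format, not a Vocalis-specific API:

```python
def build_roleplay_messages(system_prompt: str, turns: list) -> list:
    """Fold (user, assistant) turn pairs into an OpenAI-style chat message list."""
    messages = [{"role": "system", "content": system_prompt}]
    for user_text, assistant_text in turns:
        messages.append({"role": "user", "content": user_text})
        if assistant_text is not None:  # the latest turn has no reply yet
            messages.append({"role": "assistant", "content": assistant_text})
    return messages

history = build_roleplay_messages(
    "You are the narrator of a fantasy adventure; track locations and time of day.",
    [("You enter the crumbling temple. Where's the light coming from?", None)],
)
```

Keeping the full `history` in each request is what lets the model maintain narrative continuity until the context window fills, which is the memory-loss limitation noted above.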

base_model:meta-llama/Meta-Llama-3-8B-Instruct • 8 downloads • 0 likes