fishaudio

10 models • 2 total models in database

s1-mini

license:cc-by-nc-sa-4.0
4,857
566

openaudio-s1-mini

license:cc-by-nc-sa-4.0
4,857
565

fish-speech-1.5

Fish Speech V1.5 is a leading text-to-speech (TTS) model trained on more than 1 million hours of audio data in multiple languages.
Supported languages:
- English (en) >300k hours
- Chinese (zh) >300k hours
- Japanese (ja) >100k hours
- German (de) ~20k hours
- French (fr) ~20k hours
- Spanish (es) ~20k hours
- Korean (ko) ~20k hours
- Arabic (ar) ~20k hours
- Russian (ru) ~20k hours
- Dutch (nl) <10k hours
- Italian (it) <10k hours
- Polish (pl) <10k hours
- Portuguese (pt) <10k hours
Please refer to Fish Speech Github for more info. Demo available at Fish Audio.
If you found this repository useful, please consider citing this work:
This model is licensed under the CC-BY-NC-SA-4.0 license.

license:cc-by-nc-sa-4.0
1,668
639

s2-pro

746
228

fish-speech-1.4

license:cc-by-nc-sa-4.0
180
453

fish-speech-1.2

license:cc-by-nc-sa-4.0
89
207

fish-agent-v0.1-3b

Fish Agent V0.1 3B is a groundbreaking Voice-to-Voice model capable of capturing and generating environmental audio information with unprecedented accuracy. What sets it apart is its semantic-token-free architecture, which eliminates the need for traditional semantic encoders/decoders like Whisper and CosyVoice. It is also a state-of-the-art text-to-speech (TTS) model, trained on 700,000 hours of multilingual audio content.
This model was produced by continued pretraining of Qwen-2.5-3B-Instruct on 200B voice and text tokens.
Supported Languages
The model supports the following languages with their respective training data sizes:
- English (en): ~300,000 hours
- Chinese (zh): ~300,000 hours
- German (de): ~20,000 hours
- Japanese (ja): ~20,000 hours
- French (fr): ~20,000 hours
- Spanish (es): ~20,000 hours
- Korean (ko): ~20,000 hours
- Arabic (ar): ~20,000 hours
For detailed information and implementation guidelines, please visit our Fish Speech GitHub repository.
Citation
If you find this repository helpful in your work, please consider citing:
License
This model and its associated code are released under the CC-BY-NC-SA-4.0 license, allowing for non-commercial use with appropriate attribution.

license:cc-by-nc-sa-4.0
42
266

fish-speech-1.2-sft

license:cc-by-nc-sa-4.0
6
16

fish-speech-1

license:cc-by-nc-sa-4.0
0
83

speech-lm-v1

license:cc-by-nc-sa-4.0
0
31