Respair

16 models

Whisper_Large_v2_Encoder_Block

1,202
0

RyuseiNet

license:cc-by-4.0
48
8

Avadec_12hz_44khz

license:cc-by-4.0
20
7

Japanese_Phoneme_to_Grapheme_LLM

20
2

NeMo_Canary

14
0

Higgs_Codec_Extended

[GitHub](https://github.com/Respaired/HiggsCodecExtended) This is an ongoing project: a modified version of the Higgs-Boson audio tokenizer that you can fully train. All scripts have been tested. A few notes, however:
- This is not backward compatible with the original checkpoint. (I think you can tweak it to be, but you have to adhere to the Boson community license if you do.)
- I highly recommend pretraining the model without the mel and adversarial setup first; it saves a significant amount of compute and time and speeds up your convergence. Raise the batch size as much as you can before the adversarial phase.
- The semantic teacher I am using has good multilingual support; if you want the original setup, you can change it in the config.
- The loss weights and hyperparameters may not be ideal; feel free to play around with different values.

I will train a checkpoint on a large enough dataset one of these days, after figuring out a few things first, but the setup is solid.
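The two-phase schedule described above (reconstruction-only pretraining, then switching on the mel and adversarial terms) can be sketched as a simple step-based switch over loss weights. This is a minimal illustration, not the repository's actual config: the term names, the phase boundary, and all weight values are placeholder assumptions.

```python
# Illustrative two-phase loss schedule for codec training.
# Phase 1: reconstruction + semantic distillation only (cheap; large batches).
# Phase 2: mel and adversarial losses are switched on.
# All names and values here are placeholders, not the repo's real settings.

def loss_weights(step: int, adversarial_start: int = 200_000) -> dict:
    """Return per-term loss weights for the current training step."""
    weights = {
        "reconstruction": 1.0,
        "semantic_distill": 1.0,  # distillation from the semantic teacher
        "mel": 0.0,               # off during phase 1
        "adversarial": 0.0,       # off during phase 1
    }
    if step >= adversarial_start:  # enter the expensive second phase
        weights["mel"] = 15.0
        weights["adversarial"] = 1.0
    return weights
```

Keeping the adversarial weight at exactly zero in phase 1 means the discriminator never needs to run, which is where the compute saving comes from.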

license:mit
12
2

Hibiki_ASR_Phonemizer_v0.2

license:apache-2.0
3
8

jordand_whisper_d_v1_a

This is the model card of a 🤗 transformers model that has been pushed on the Hub. The model card was automatically generated; every field (developers, license, model type, training details, carbon emissions, etc.) is still "[More Information Needed]".

2
0

deberta-v3-large-finetuned-style

license:mit
1
0

Tsukasa Speech

license:cc-by-nc-4.0
0
80

RiFornet_Vocoder

DDP is very unstable; please use the single-GPU training script. If you still want to use DDP, I suggest uncommenting the gradient-clipping lines; that should help a lot.

This vocoder is a combination of HiFTNet and RingFormer. It supports Ring Attention, Conformer, Neural Source Filtering, etc. This repository is experimental; expect some bugs and some hardcoded params. The default setting is 44.1 kHz with 128 mel bins, but I have provided the necessary script for the 24 kHz version in the LibriTTS checkpoint's folder. Huge thanks to Johnathan Duering for his help; I mostly implemented this based on his STTS2 fork.

There are three checkpoints so far in this repository:
- RiFornet 24 kHz: trained for roughly ~117K steps on LibriTTS (360 + 100) and 40 hours of other English datasets.
- RiFornet 44.1 kHz: trained for roughly ~280K steps on a large (more than 1,100 hours) private multilingual dataset covering Arabic, Persian, Japanese, English, and Russian, plus singing voice in Chinese and Japanese and Quranic recitation in Arabic.
- HiFTNet 44.1 kHz: trained for ~100K steps on a dataset similar to RiFornet 44.1 kHz, but slightly smaller and with no singing voice.

Requirements:
1. Python >= 3.10
2. Clone this repository.

For F0 model training, please refer to yl4579/PitchExtractor. This repo includes an F0 model pre-trained on a mixture of multilingual data for the previously mentioned configuration. To quote the HiFTNet author: "Still, you may want to train your own F0 model for the best performance, particularly for noisy or non-speech data, as we found that F0 estimation accuracy is essential for the vocoder performance."

Inference: please refer to the notebook inference.ipynb for details.
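The gradient clipping suggested above for DDP runs is the standard PyTorch call placed between `backward()` and `optimizer.step()`. This is a minimal generic sketch, not the repository's actual training loop; the model, optimizer, and `max_norm` value are illustrative.

```python
import torch
import torch.nn as nn

# Minimal training step showing where gradient clipping goes.
# Model, optimizer, and max_norm are placeholders for illustration.
model = nn.Linear(128, 128)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

x = torch.randn(8, 128)
loss = model(x).pow(2).mean()
loss.backward()

# Clip the global gradient norm before the optimizer step;
# this is the kind of line the author suggests uncommenting for DDP.
total_norm = nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

optimizer.step()
optimizer.zero_grad()
```

`clip_grad_norm_` rescales all gradients in place so their combined norm does not exceed `max_norm`, which tames the exploding updates that make multi-GPU adversarial training unstable.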

license:mit
0
4

Avadec_12hz

license:cc-by-4.0
0
2

Text_Aligners

0
2

StyleTTS_HifiGAN_24khz

license:mit
0
2

XCodec2_24khz

0
2

Test_QwJP

0
1