jonatasgrosman

288 models • 3 total models in database


wav2vec2-large-xlsr-53-russian

Fine-tuned XLSR-53 large model for speech recognition in Russian

Fine-tuned facebook/wav2vec2-large-xlsr-53 on Russian using the train and validation splits of Common Voice 6.1 and CSS10. When using this model, make sure that your speech input is sampled at 16kHz.

This model has been fine-tuned thanks to the GPU credits generously given by the OVHcloud :)

The script used for training can be found here: https://github.com/jonatasgrosman/wav2vec2-sprint

The model can be used directly (without a language model) as follows...

| Reference | Prediction |
| ------------- | ------------- |
| ОН РАБОТАТЬ, А ЕЕ НЕ УДЕРЖАТЬ НИКАК — БЕГАЕТ ЗА КЛЁШЕМ КАЖДОГО БУЛЬВАРНИКА. | ОН РАБОТАТЬ А ЕЕ НЕ УДЕРЖАТ НИКАК БЕГАЕТ ЗА КЛЕШОМ КАЖДОГО БУЛЬБАРНИКА |
| ЕСЛИ НЕ БУДЕТ ВОЗРАЖЕНИЙ, Я БУДУ СЧИТАТЬ, ЧТО АССАМБЛЕЯ СОГЛАСНА С ЭТИМ ПРЕДЛОЖЕНИЕМ. | ЕСЛИ НЕ БУДЕТ ВОЗРАЖЕНИЙ Я БУДУ СЧИТАТЬ ЧТО АССАМБЛЕЯ СОГЛАСНА С ЭТИМ ПРЕДЛОЖЕНИЕМ |
| ПАЛЕСТИНЦАМ НЕОБХОДИМО СНАЧАЛА УСТАНОВИТЬ МИР С ИЗРАИЛЕМ, А ЗАТЕМ ДОБИВАТЬСЯ ПРИЗНАНИЯ ГОСУДАРСТВЕННОСТИ. | ПАЛЕСТИНЦАМ НЕОБХОДИМО СНАЧАЛА УСТАНОВИТЬ С НИ МИР ФЕЗРЕЛЕМ А ЗАТЕМ ДОБИВАТЬСЯ ПРИЗНАНИЯ ГОСУДАРСТВЕНСКИ |
| У МЕНЯ БЫЛО ТАКОЕ ЧУВСТВО, ЧТО ЧТО-ТО ТАКОЕ ОЧЕНЬ ВАЖНОЕ Я ПРИБАВЛЯЮ. | У МЕНЯ БЫЛО ТАКОЕ ЧУВСТВО ЧТО ЧТО-ТО ТАКОЕ ОЧЕНЬ ВАЖНОЕ Я ПРЕДБАВЛЯЕТ |
| ТОЛЬКО ВРЯД ЛИ ПОЙМЕТ. | ТОЛЬКО ВРЯД ЛИ ПОЙМЕТ |
| ВРОНСКИЙ, СЛУШАЯ ОДНИМ УХОМ, ПЕРЕВОДИЛ БИНОКЛЬ С БЕНУАРА НА БЕЛЬ-ЭТАЖ И ОГЛЯДЫВАЛ ЛОЖИ. | ЗЛАЗКИ СЛУШАЮ ОТ ОДНИМ УХАМ ТЫ ВОТИ В ВИНОКОТ СПИЛА НА ПЕРЕТАЧ И ОКЛЯДЫВАЛ БОСУ |
| К СОЖАЛЕНИЮ, СИТУАЦИЯ ПРОДОЛЖАЕТ УХУДШАТЬСЯ. | К СОЖАЛЕНИЮ СИТУАЦИИ ПРОДОЛЖАЕТ УХУЖАТЬСЯ |
| ВСЁ ЖАЛОВАНИЕ УХОДИЛО НА ДОМАШНИЕ РАСХОДЫ И НА УПЛАТУ МЕЛКИХ НЕПЕРЕВОДИВШИХСЯ ДОЛГОВ. | ВСЕ ЖАЛОВАНИЕ УХОДИЛО НА ДОМАШНИЕ РАСХОДЫ И НА УПЛАТУ МЕЛКИХ НЕ ПЕРЕВОДИВШИХСЯ ДОЛГОВ |
| ТЕПЕРЬ ДЕЛО, КОНЕЧНО, ЗА ТЕМ, ЧТОБЫ ПРЕВРАТИТЬ СЛОВА В ДЕЛА. | ТЕПЕРЬ ДЕЛАЮ КОНЕЧНО ЗАТЕМ ЧТОБЫ ПРЕВРАТИТЬ СЛОВА В ДЕЛА |
| ДЕВЯТЬ | ЛЕВЕТЬ |

1. To evaluate on `mozilla-foundation/common_voice_6_0` with split `test`
2. To evaluate on `speech-recognition-community-v2/dev_data`

Citation

If you want to cite this model you can use this:

—

6,123,704
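The Reference/Prediction table in the Russian model card above can be scored with word error rate (WER) and character error rate (CER), the two metrics reported for the other models on this page. The following is a minimal sketch, not the author's evaluation script: the `normalize` rule (uppercase, strip punctuation except hyphens, collapse whitespace) is an assumption chosen because the predictions in the table carry no punctuation, and the helper names are hypothetical.

```python
import re


def normalize(text: str) -> str:
    """Assumed normalization: uppercase, drop punctuation except hyphens."""
    text = re.sub(r"[^\w\s-]", " ", text.upper())
    return " ".join(text.split())


def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, prediction: str) -> float:
    """Word-level edits divided by the number of reference words."""
    ref = normalize(reference).split()
    hyp = normalize(prediction).split()
    return edit_distance(ref, hyp) / len(ref)


def cer(reference: str, prediction: str) -> float:
    """Character-level edits divided by the number of reference characters."""
    ref = normalize(reference)
    hyp = normalize(prediction)
    return edit_distance(ref, hyp) / len(ref)


# One row taken from the table in the model card.
reference = "К СОЖАЛЕНИЮ, СИТУАЦИЯ ПРОДОЛЖАЕТ УХУДШАТЬСЯ."
prediction = "К СОЖАЛЕНИЮ СИТУАЦИИ ПРОДОЛЖАЕТ УХУЖАТЬСЯ"

print(f"WER: {wer(reference, prediction):.2f}")
print(f"CER: {cer(reference, prediction):.2f}")
```

Two of the five reference words are wrong in this row, so the WER is 0.40, while the CER is much lower because each wrong word differs by only one or two characters. A different normalization rule would give different scores, so numbers produced this way are not directly comparable to published ones.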

wav2vec2-large-xlsr-53-japanese

---
language: ja
datasets:
- common_voice
metrics:
- wer
- cer
tags:
- audio
- automatic-speech-recognition
- speech
- xlsr-fine-tuning-week
license: apache-2.0
model-index:
- name: XLSR Wav2Vec2 Japanese by Jonatas Grosman
  results:
  - task:
      name: Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice ja
      type: common_voice
      args: ja
    metrics:
    - name: Test WER
      type: wer
      value: 81.80
    - name: Test CER
      type: cer
      value: 20.16
---
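The Japanese metadata above lists a test WER of 81.80 next to a CER of 20.16. A gap of that size is typical when WER is computed by splitting on whitespace, because Japanese text is normally written without spaces and a whole sentence then counts as a single "word". The sketch below illustrates the effect; the sentence pair is made up for illustration and is not taken from Common Voice, and the helper names are hypothetical. Whether the published figure was computed this way is an assumption.

```python
def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,
                            curr[j - 1] + 1,
                            prev[j - 1] + (r != h)))
        prev = curr
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens) -> float:
    """Edits divided by the number of reference tokens."""
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


# Illustrative pair: the prediction differs from the reference in one character.
reference = "今日は天気がいいです"
prediction = "今日は天気がいいてす"

# Whitespace tokenization yields one token per sentence, so one wrong
# character makes the entire "word" wrong.
word_level = error_rate(reference.split(), prediction.split())

# Character tokenization counts only the single substituted character.
char_level = error_rate(list(reference), list(prediction))

print(f"WER={word_level:.2f} CER={char_level:.2f}")  # WER=1.00 CER=0.10
```

For languages without whitespace word boundaries, CER is usually the more informative of the two numbers.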

—

5,115,488

wav2vec2-large-xlsr-53-portuguese

Fine-tuned XLSR-53 large model for speech recognition in Portuguese

Fine-tuned facebook/wav2vec2-large-xlsr-53 on Portuguese using the train and validation splits of Common Voice 6.1. When using this model, make sure that your speech input is sampled at 16kHz.

This model has been fine-tuned thanks to the GPU credits generously given by the OVHcloud :)

The script used for training can be found here: https://github.com/jonatasgrosman/wav2vec2-sprint

The model can be used directly (without a language model) as follows...

| Reference | Prediction |
| ------------- | ------------- |
| NEM O RADAR NEM OS OUTROS INSTRUMENTOS DETECTARAM O BOMBARDEIRO STEALTH. | NEMHUM VADAN OS OLTWES INSTRUMENTOS DE TTÉÃN UM BOMBERDEIRO OSTER |
| PEDIR DINHEIRO EMPRESTADO ÀS PESSOAS DA ALDEIA | E DIR ENGINHEIRO EMPRESTAR AS PESSOAS DA ALDEIA |
| OITO | OITO |
| TRANCÁ-LOS | TRANCAUVOS |
| REALIZAR UMA INVESTIGAÇÃO PARA RESOLVER O PROBLEMA | REALIZAR UMA INVESTIGAÇÃO PARA RESOLVER O PROBLEMA |
| O YOUTUBE AINDA É A MELHOR PLATAFORMA DE VÍDEOS. | YOUTUBE AINDA É A MELHOR PLATAFOMA DE VÍDEOS |
| MENINA E MENINO BEIJANDO NAS SOMBRAS | MENINA E MENINO BEIJANDO NAS SOMBRAS |
| EU SOU O SENHOR | EU SOU O SENHOR |
| DUAS MULHERES QUE SENTAM-SE PARA BAIXO LENDO JORNAIS. | DUAS MIERES QUE SENTAM-SE PARA BAICLANE JODNÓI |
| EU ORIGINALMENTE ESPERAVA | EU ORIGINALMENTE ESPERAVA |

1. To evaluate on `mozilla-foundation/common_voice_6_0` with split `test`
2. To evaluate on `speech-recognition-community-v2/dev_data`

Citation

If you want to cite this model you can use this:

—

4,733,690
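The model cards above ask for speech input sampled at 16kHz. As a rough illustration of what that requirement means, the sketch below resamples a list of audio samples to 16,000 Hz with plain linear interpolation. The function name `resample_linear` is hypothetical, and the method is deliberately simple: it applies no anti-aliasing filter, so for real audio a library resampler (for example `librosa.load(path, sr=16_000)`, which resamples while loading) is the safer choice.

```python
def resample_linear(samples, src_rate: int, dst_rate: int = 16_000):
    """Resample a mono signal by linear interpolation (no anti-aliasing)."""
    if src_rate == dst_rate or not samples:
        return list(samples)

    # Number of output samples that covers the same duration.
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate  # distance between output samples, in input samples
    last = len(samples) - 1

    out = []
    for i in range(n_out):
        pos = min(i * step, last)   # fractional position in the input
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out


# One second of silence recorded at 48 kHz becomes one second at 16 kHz.
one_second_48k = [0.0] * 48_000
one_second_16k = resample_linear(one_second_48k, src_rate=48_000)
print(len(one_second_16k))  # 16000
```

The duration of the signal is unchanged; only the number of samples per second differs.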

wav2vec2-large-xlsr-53-arabic

---
language: ar
datasets:
- common_voice
- arabic_speech_corpus
metrics:
- wer
- cer
tags:
- audio
- automatic-speech-recognition
- speech
- xlsr-fine-tuning-week
license: apache-2.0
model-index:
- name: XLSR Wav2Vec2 Arabic by Jonatas Grosman
  results:
  - task:
      name: Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice ar
      type: common_voice
      args: ar
    metrics:
    - name: Test WER
      type: wer
      value: 39.59
    - name: Test CER
      type: cer
      value: 18.18
---

—

3,382,354

wav2vec2-large-xlsr-53-chinese-zh-cn

---
language: zh
datasets:
- common_voice
metrics:
- wer
- cer
tags:
- audio
- automatic-speech-recognition
- speech
- xlsr-fine-tuning-week
license: apache-2.0
model-index:
- name: XLSR Wav2Vec2 Chinese (zh-CN) by Jonatas Grosman
  results:
  - task:
      name: Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice zh-CN
      type: common_voice
      args: zh-CN
    metrics:
    - name: Test WER
      type: wer
      value: 82.37
    - name: Test CER
      type: cer
      value: 19.03
---

—

3,076,355

121

wav2vec2-large-xlsr-53-dutch

---
language: nl
license: apache-2.0
datasets:
- common_voice
- mozilla-foundation/common_voice_6_0
metrics:
- wer
- cer
tags:
- audio
- automatic-speech-recognition
- hf-asr-leaderboard
- mozilla-foundation/common_voice_6_0
- nl
- robust-speech-event
- speech
- xlsr-fine-tuning-week
model-index:
- name: XLSR Wav2Vec2 Dutch by Jonatas Grosman
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice nl
      type: common_voice
      args: nl
    metrics:

jonatasgrosman

wav2vec2-large-xlsr-53-russian

wav2vec2-large-xlsr-53-japanese

wav2vec2-large-xlsr-53-portuguese

wav2vec2-large-xlsr-53-arabic

wav2vec2-large-xlsr-53-chinese-zh-cn

wav2vec2-large-xlsr-53-dutch

wav2vec2-large-xlsr-53-persian

wav2vec2-large-xlsr-53-polish

wav2vec2-large-xlsr-53-greek

wav2vec2-large-xlsr-53-hungarian

wav2vec2-large-xlsr-53-english

wav2vec2-xls-r-1b-portuguese

wav2vec2-large-xlsr-53-finnish

wav2vec2-large-xlsr-53-spanish

wav2vec2-large-xlsr-53-german

wav2vec2-large-xlsr-53-french

wav2vec2-xls-r-1b-english

wav2vec2-xls-r-1b-russian

wav2vec2-xls-r-1b-spanish

wav2vec2-large-xlsr-53-italian

wav2vec2-xls-r-1b-german

exp_w2v2t_en_wavlm_s990

wav2vec2-xls-r-1b-french

whisper-large-zh-cv11

whisper-large-pt-cv11

wav2vec2-large-english

wav2vec2-large-fr-voxpopuli-french

wav2vec2-xls-r-1b-italian

exp_w2v2t_ru_unispeech_s42

wav2vec2-xls-r-1b-polish

wav2vec2-xls-r-1b-dutch

exp_w2v2t_pt_hubert_s807

exp_w2v2t_es_wavlm_s115

whisper-large-fr-cv11

exp_w2v2t_uk_vp-sv_s428

exp_w2v2t_pt_vp-it_s996

bartuque-bart-base-pretrained-r-2

bartuque-bart-base-pretrained-rm-2

paraphrase

exp_w2v2t_en_wavlm_s461

exp_w2v2t_it_no-pretraining_s615

exp_w2v2t_it_unispeech-ml_s784

exp_w2v2t_fa_vp-100k_s88

exp_w2v2t_es_wavlm_s26

exp_w2v2t_ru_wavlm_s363

bartuque-bart-base-pretrained-mm-2

bartuque-bart-base-random-r-2

exp_w2v2t_th_wavlm_s847

exp_w2v2t_th_unispeech-ml_s256

exp_w2v2t_it_wavlm_s895

exp_w2v2t_fa_hubert_s889

exp_w2v2t_fa_wavlm_s527

exp_w2v2t_de_unispeech-sat_s75

exp_w2v2t_de_vp-it_s962

exp_w2v2t_ar_hubert_s947

exp_w2v2t_ar_unispeech-ml_s365

exp_w2v2t_es_no-pretraining_s953

whisper-small-pt-cv11-v7

whisper-large-es-cv11

exp_w2v2t_en_unispeech-ml_s103

exp_w2v2t_en_unispeech-sat_s459

exp_w2v2t_th_unispeech_s328

exp_w2v2t_th_hubert_s533

exp_w2v2t_th_vp-sv_s635

exp_w2v2t_th_unispeech-ml_s640

exp_w2v2t_th_vp-es_s26

exp_w2v2t_th_vp-es_s51

exp_w2v2t_th_vp-it_s259

exp_w2v2t_ja_unispeech_s947

exp_w2v2t_it_unispeech_s156

exp_w2v2t_it_vp-nl_s27

exp_w2v2t_it_vp-nl_s335

exp_w2v2t_it_unispeech-sat_s500

exp_w2v2t_fr_unispeech-sat_s115

exp_w2v2t_sv-se_unispeech_s149

exp_w2v2t_sv-se_vp-it_s817

exp_w2v2t_fa_wav2vec2_s168

exp_w2v2t_fa_wavlm_s545

exp_w2v2t_uk_vp-es_s211