jonatasgrosman

288 models • 3 total models in database

wav2vec2-large-xlsr-53-russian

Fine-tuned XLSR-53 large model for speech recognition in Russian

Fine-tuned facebook/wav2vec2-large-xlsr-53 on Russian using the train and validation splits of Common Voice 6.1 and CSS10. When using this model, make sure that your speech input is sampled at 16 kHz.

This model has been fine-tuned thanks to the GPU credits generously given by OVHcloud :)

The script used for training can be found here: https://github.com/jonatasgrosman/wav2vec2-sprint

The model can be used directly (without a language model) as follows...

| Reference | Prediction |
| ------------- | ------------- |
| ОН РАБОТАТЬ, А ЕЕ НЕ УДЕРЖАТЬ НИКАК — БЕГАЕТ ЗА КЛЁШЕМ КАЖДОГО БУЛЬВАРНИКА. | ОН РАБОТАТЬ А ЕЕ НЕ УДЕРЖАТ НИКАК БЕГАЕТ ЗА КЛЕШОМ КАЖДОГО БУЛЬБАРНИКА |
| ЕСЛИ НЕ БУДЕТ ВОЗРАЖЕНИЙ, Я БУДУ СЧИТАТЬ, ЧТО АССАМБЛЕЯ СОГЛАСНА С ЭТИМ ПРЕДЛОЖЕНИЕМ. | ЕСЛИ НЕ БУДЕТ ВОЗРАЖЕНИЙ Я БУДУ СЧИТАТЬ ЧТО АССАМБЛЕЯ СОГЛАСНА С ЭТИМ ПРЕДЛОЖЕНИЕМ |
| ПАЛЕСТИНЦАМ НЕОБХОДИМО СНАЧАЛА УСТАНОВИТЬ МИР С ИЗРАИЛЕМ, А ЗАТЕМ ДОБИВАТЬСЯ ПРИЗНАНИЯ ГОСУДАРСТВЕННОСТИ. | ПАЛЕСТИНЦАМ НЕОБХОДИМО СНАЧАЛА УСТАНОВИТЬ С НИ МИР ФЕЗРЕЛЕМ А ЗАТЕМ ДОБИВАТЬСЯ ПРИЗНАНИЯ ГОСУДАРСТВЕНСКИ |
| У МЕНЯ БЫЛО ТАКОЕ ЧУВСТВО, ЧТО ЧТО-ТО ТАКОЕ ОЧЕНЬ ВАЖНОЕ Я ПРИБАВЛЯЮ. | У МЕНЯ БЫЛО ТАКОЕ ЧУВСТВО ЧТО ЧТО-ТО ТАКОЕ ОЧЕНЬ ВАЖНОЕ Я ПРЕДБАВЛЯЕТ |
| ТОЛЬКО ВРЯД ЛИ ПОЙМЕТ. | ТОЛЬКО ВРЯД ЛИ ПОЙМЕТ |
| ВРОНСКИЙ, СЛУШАЯ ОДНИМ УХОМ, ПЕРЕВОДИЛ БИНОКЛЬ С БЕНУАРА НА БЕЛЬ-ЭТАЖ И ОГЛЯДЫВАЛ ЛОЖИ. | ЗЛАЗКИ СЛУШАЮ ОТ ОДНИМ УХАМ ТЫ ВОТИ В ВИНОКОТ СПИЛА НА ПЕРЕТАЧ И ОКЛЯДЫВАЛ БОСУ |
| К СОЖАЛЕНИЮ, СИТУАЦИЯ ПРОДОЛЖАЕТ УХУДШАТЬСЯ. | К СОЖАЛЕНИЮ СИТУАЦИИ ПРОДОЛЖАЕТ УХУЖАТЬСЯ |
| ВСЁ ЖАЛОВАНИЕ УХОДИЛО НА ДОМАШНИЕ РАСХОДЫ И НА УПЛАТУ МЕЛКИХ НЕПЕРЕВОДИВШИХСЯ ДОЛГОВ. | ВСЕ ЖАЛОВАНИЕ УХОДИЛО НА ДОМАШНИЕ РАСХОДЫ И НА УПЛАТУ МЕЛКИХ НЕ ПЕРЕВОДИВШИХСЯ ДОЛГОВ |
| ТЕПЕРЬ ДЕЛО, КОНЕЧНО, ЗА ТЕМ, ЧТОБЫ ПРЕВРАТИТЬ СЛОВА В ДЕЛА. | ТЕПЕРЬ ДЕЛАЮ КОНЕЧНО ЗАТЕМ ЧТОБЫ ПРЕВРАТИТЬ СЛОВА В ДЕЛА |
| ДЕВЯТЬ | ЛЕВЕТЬ |

1. To evaluate on `mozilla-foundation/common_voice_6_0` with split `test`
2. To evaluate on `speech-recognition-community-v2/dev_data`

Citation

If you want to cite this model you can use this:

6,123,704
62
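
The Russian model card above asks for speech input sampled at 16kHz. As a minimal sketch of how that requirement could be checked before transcription, the snippet below reads the rate from a WAV header using only Python's standard `wave` module. The helper names `wav_sample_rate` and `needs_resampling` are illustrative assumptions, not part of the model card or of any library, and the resampling step itself is left out.

```python
import wave

TARGET_RATE = 16_000  # sampling rate (Hz) that the model card asks for


def wav_sample_rate(path):
    """Return the sampling rate (Hz) stored in a WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def needs_resampling(path, target_rate=TARGET_RATE):
    """Return True when the file's rate differs from the rate the model expects."""
    return wav_sample_rate(path) != target_rate
```

A file for which `needs_resampling` returns `True` would have to be converted to 16 kHz with an audio tool of your choice before being passed to the model.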

wav2vec2-large-xlsr-53-japanese

---
language: ja
datasets:
- common_voice
metrics:
- wer
- cer
tags:
- audio
- automatic-speech-recognition
- speech
- xlsr-fine-tuning-week
license: apache-2.0
model-index:
- name: XLSR Wav2Vec2 Japanese by Jonatas Grosman
  results:
  - task:
      name: Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice ja
      type: common_voice
      args: ja
    metrics:
    - name: Test WER
      type: wer
      value: 81.80
    - name: Test CER
      type: cer
      value: 20.16
---

5,115,488
43

wav2vec2-large-xlsr-53-portuguese

Fine-tuned XLSR-53 large model for speech recognition in Portuguese

Fine-tuned facebook/wav2vec2-large-xlsr-53 on Portuguese using the train and validation splits of Common Voice 6.1. When using this model, make sure that your speech input is sampled at 16 kHz.

This model has been fine-tuned thanks to the GPU credits generously given by OVHcloud :)

The script used for training can be found here: https://github.com/jonatasgrosman/wav2vec2-sprint

The model can be used directly (without a language model) as follows...

| Reference | Prediction |
| ------------- | ------------- |
| NEM O RADAR NEM OS OUTROS INSTRUMENTOS DETECTARAM O BOMBARDEIRO STEALTH. | NEMHUM VADAN OS OLTWES INSTRUMENTOS DE TTÉÃN UM BOMBERDEIRO OSTER |
| PEDIR DINHEIRO EMPRESTADO ÀS PESSOAS DA ALDEIA | E DIR ENGINHEIRO EMPRESTAR AS PESSOAS DA ALDEIA |
| OITO | OITO |
| TRANCÁ-LOS | TRANCAUVOS |
| REALIZAR UMA INVESTIGAÇÃO PARA RESOLVER O PROBLEMA | REALIZAR UMA INVESTIGAÇÃO PARA RESOLVER O PROBLEMA |
| O YOUTUBE AINDA É A MELHOR PLATAFORMA DE VÍDEOS. | YOUTUBE AINDA É A MELHOR PLATAFOMA DE VÍDEOS |
| MENINA E MENINO BEIJANDO NAS SOMBRAS | MENINA E MENINO BEIJANDO NAS SOMBRAS |
| EU SOU O SENHOR | EU SOU O SENHOR |
| DUAS MULHERES QUE SENTAM-SE PARA BAIXO LENDO JORNAIS. | DUAS MIERES QUE SENTAM-SE PARA BAICLANE JODNÓI |
| EU ORIGINALMENTE ESPERAVA | EU ORIGINALMENTE ESPERAVA |

1. To evaluate on `mozilla-foundation/common_voice_6_0` with split `test`
2. To evaluate on `speech-recognition-community-v2/dev_data`

Citation

If you want to cite this model you can use this:

4,733,690
35

wav2vec2-large-xlsr-53-arabic

---
language: ar
datasets:
- common_voice
- arabic_speech_corpus
metrics:
- wer
- cer
tags:
- audio
- automatic-speech-recognition
- speech
- xlsr-fine-tuning-week
license: apache-2.0
model-index:
- name: XLSR Wav2Vec2 Arabic by Jonatas Grosman
  results:
  - task:
      name: Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice ar
      type: common_voice
      args: ar
    metrics:
    - name: Test WER
      type: wer
      value: 39.59
    - name: Test CER
      type: cer
      value: 18.18
---

3,382,354
40

wav2vec2-large-xlsr-53-chinese-zh-cn

---
language: zh
datasets:
- common_voice
metrics:
- wer
- cer
tags:
- audio
- automatic-speech-recognition
- speech
- xlsr-fine-tuning-week
license: apache-2.0
model-index:
- name: XLSR Wav2Vec2 Chinese (zh-CN) by Jonatas Grosman
  results:
  - task:
      name: Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice zh-CN
      type: common_voice
      args: zh-CN
    metrics:
    - name: Test WER
      type: wer
      value: 82.37
    - name: Test CER
      type: cer
      value: 19.03
---

3,076,355
121

wav2vec2-large-xlsr-53-dutch

---
language: nl
license: apache-2.0
datasets:
- common_voice
- mozilla-foundation/common_voice_6_0
metrics:
- wer
- cer
tags:
- audio
- automatic-speech-recognition
- hf-asr-leaderboard
- mozilla-foundation/common_voice_6_0
- nl
- robust-speech-event
- speech
- xlsr-fine-tuning-week
model-index:
- name: XLSR Wav2Vec2 Dutch by Jonatas Grosman
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice nl
      type: common_voice
      args: nl
    metrics:

license:apache-2.0
2,934,943
13

wav2vec2-large-xlsr-53-persian

---
language: fa
datasets:
- common_voice
metrics:
- wer
- cer
tags:
- audio
- automatic-speech-recognition
- speech
- xlsr-fine-tuning-week
license: apache-2.0
model-index:
- name: XLSR Wav2Vec2 Persian by Jonatas Grosman
  results:
  - task:
      name: Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice fa
      type: common_voice
      args: fa
    metrics:
    - name: Test WER
      type: wer
      value: 30.12
    - name: Test CER
      type: cer
      value: 7.37
---

license:apache-2.0
2,511,662
23

wav2vec2-large-xlsr-53-polish

---
language: pl
license: apache-2.0
datasets:
- common_voice
- mozilla-foundation/common_voice_6_0
metrics:
- wer
- cer
tags:
- audio
- automatic-speech-recognition
- hf-asr-leaderboard
- mozilla-foundation/common_voice_6_0
- pl
- robust-speech-event
- speech
- xlsr-fine-tuning-week
model-index:
- name: XLSR Wav2Vec2 Polish by Jonatas Grosman
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice pl
      type: common_voice
      args: pl
    metrics:

license:apache-2.0
1,890,741
11

wav2vec2-large-xlsr-53-greek

---
language: el
datasets:
- common_voice
metrics:
- wer
- cer
tags:
- audio
- automatic-speech-recognition
- speech
- xlsr-fine-tuning-week
license: apache-2.0
model-index:
- name: XLSR Wav2Vec2 Greek by Jonatas Grosman
  results:
  - task:
      name: Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice el
      type: common_voice
      args: el
    metrics:
    - name: Test WER
      type: wer
      value: 11.62
    - name: Test CER
      type: cer
      value: 3.36
---

license:apache-2.0
496,967
3

wav2vec2-large-xlsr-53-hungarian

---
language: hu
datasets:
- common_voice
metrics:
- wer
- cer
tags:
- audio
- automatic-speech-recognition
- speech
- xlsr-fine-tuning-week
license: apache-2.0
model-index:
- name: XLSR Wav2Vec2 Hungarian by Jonatas Grosman
  results:
  - task:
      name: Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice hu
      type: common_voice
      args: hu
    metrics:
    - name: Test WER
      type: wer
      value: 31.40
    - name: Test CER
      type: cer
      value: 6.20
---

license:apache-2.0
396,231
10

wav2vec2-large-xlsr-53-english

---
language: en
datasets:
- common_voice
- mozilla-foundation/common_voice_6_0
metrics:
- wer
- cer
tags:
- audio
- automatic-speech-recognition
- en
- hf-asr-leaderboard
- mozilla-foundation/common_voice_6_0
- robust-speech-event
- speech
- xlsr-fine-tuning-week
license: apache-2.0
model-index:
- name: XLSR Wav2Vec2 English by Jonatas Grosman
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice en
      type: common_voice
      args: en
    metrics

license:apache-2.0
183,882
475

wav2vec2-xls-r-1b-portuguese

license:apache-2.0
112,905
14

wav2vec2-large-xlsr-53-finnish

Fine-tuned XLSR-53 large model for speech recognition in Finnish

Fine-tuned facebook/wav2vec2-large-xlsr-53 on Finnish using the train and validation splits of Common Voice 6.1 and CSS10. When using this model, make sure that your speech input is sampled at 16 kHz.

This model has been fine-tuned thanks to the GPU credits generously given by OVHcloud :)

The script used for training can be found here: https://github.com/jonatasgrosman/wav2vec2-sprint

The model can be used directly (without a language model) as follows...

| Reference | Prediction |
| ------------- | ------------- |
| MYSTEERIMIES OLI OPPINUT MORAALINSA TARUISTA, ELOKUVISTA JA PELEISTÄ. | MYSTEERIMIES OLI OPPINUT MORALINSA TARUISTA ELOKUVISTA JA PELEISTÄ |
| ÄÄNESTIN MIETINNÖN PUOLESTA! | ÄÄNESTIN MIETINNÖN PUOLESTA |
| VAIN TUNTIA AIKAISEMMIN OLIMME MIEHENI KANSSA TUNTENEET SUURINTA ILOA. | PAIN TUNTIA AIKAISEMMIN OLIN MIEHENI KANSSA TUNTENEET SUURINTA ILAA |
| ENSIMMÄISELLE MIEHELLE SAI KOLME LASTA. | ENSIMMÄISELLE MIEHELLE SAI KOLME LASTA |
| ÄÄNESTIN MIETINNÖN PUOLESTA, SILLÄ POHJIMMILTAAN SIINÄ VASTUSTETAAN TÄTÄ SUUNTAUSTA. | ÄÄNESTIN MIETINNÖN PUOLESTA SILLÄ POHJIMMILTAAN SIINÄ VASTOTTETAAN TÄTÄ SUUNTAUSTA |
| TÄHDENLENTOJENKO VARALTA MINÄ SEN OLISIN TÄNNE KUSKANNUT? | TÄHDEN LENTOJENKO VARALTA MINÄ SEN OLISIN TÄNNE KUSKANNUT |
| SIITÄ SE TULEE. | SIITA SE TULEE |
| NIIN, KUULUU KIROUS, JA KAUHEA KARJAISU. | NIIN KUULUU KIROUS JA KAUHEA KARJAISU |
| ARKIT KUN OVAT NÄES ELEMENTTIRAKENTEISIA. | ARKIT KUN OVAT MÄISS' ELÄMÄTTEROKENTEISIÄ |
| JÄIN ALUKSEN SISÄÄN, MUTTA KUULIN OVEN LÄPI, ETTÄ ULKOPUOLELLA ALKOI TAPAHTUA. | JAKALOKSEHÄN SISÄL MUTTA KUULIN OVENLAPI ETTÄ ULKA KUOLLALLA ALKOI TAPAHTUA |

The model can be evaluated as follows on the Finnish test data of Common Voice.

In the table below I report the Word Error Rate (WER) and the Character Error Rate (CER) of the model. I ran the evaluation script described above on other models as well (on 2021-04-21). Note that the table below may show different results from those already reported; this may be due to specifics of the other evaluation scripts used.

| Model | WER | CER |
| ------------- | ------------- | ------------- |
| aapot/wav2vec2-large-xlsr-53-finnish | 32.51% | 5.34% |
| Tommi/wav2vec2-large-xlsr-53-finnish | 35.22% | 5.81% |
| vasilis/wav2vec2-large-xlsr-53-finnish | 38.24% | 6.49% |
| jonatasgrosman/wav2vec2-large-xlsr-53-finnish | 41.60% | 8.23% |
| birgermoell/wav2vec2-large-xlsr-finnish | 53.51% | 9.18% |

Citation

If you want to cite this model you can use this:

license:apache-2.0
84,253
1
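
The WER and CER figures reported in the Finnish model card above are edit-distance ratios. The sketch below is one plain-Python way to compute them for a single sentence pair, assuming punctuation has already been stripped from the reference. It is not the evaluation script the author used, and the function names are hypothetical. The sentence pair comes from the card's reference/prediction table.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    previous = list(range(len(hyp) + 1))
    for i, ref_item in enumerate(ref, start=1):
        current = [i]
        for j, hyp_item in enumerate(hyp, start=1):
            cost = 0 if ref_item == hyp_item else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def wer(reference, prediction):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, prediction.split()) / len(ref_words)


def cer(reference, prediction):
    """Character error rate: character-level edits divided by reference length."""
    return edit_distance(reference, prediction) / len(reference)


# Assumes a non-empty reference; an empty one would divide by zero.
reference = "SIITÄ SE TULEE"
prediction = "SIITA SE TULEE"
print(f"WER: {wer(reference, prediction):.2%}")  # one wrong word out of three
print(f"CER: {cer(reference, prediction):.2%}")  # one wrong character out of fourteen
```

Corpus-level scores are usually obtained by summing edits and reference lengths over all sentences before dividing, so averaging per-sentence values from this sketch would not necessarily reproduce the table.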

wav2vec2-large-xlsr-53-spanish

Fine-tuned XLSR-53 large model for speech recognition in Spanish

Fine-tuned facebook/wav2vec2-large-xlsr-53 on Spanish using the train and validation splits of Common Voice 6.1. When using this model, make sure that your speech input is sampled at 16 kHz.

This model has been fine-tuned thanks to the GPU credits generously given by OVHcloud :)

The script used for training can be found here: https://github.com/jonatasgrosman/wav2vec2-sprint

The model can be used directly (without a language model) as follows...

| Reference | Prediction |
| ------------- | ------------- |
| HABITA EN AGUAS POCO PROFUNDAS Y ROCOSAS. | HABITAN AGUAS POCO PROFUNDAS Y ROCOSAS |
| OPERA PRINCIPALMENTE VUELOS DE CABOTAJE Y REGIONALES DE CARGA. | OPERA PRINCIPALMENTE VUELO DE CARBOTAJES Y REGIONALES DE CARGAN |
| PARA VISITAR CONTACTAR PRIMERO CON LA DIRECCIÓN. | PARA VISITAR CONTACTAR PRIMERO CON LA DIRECCIÓN |
| TRES | TRES |
| REALIZÓ LOS ESTUDIOS PRIMARIOS EN FRANCIA, PARA CONTINUAR LUEGO EN ESPAÑA. | REALIZÓ LOS ESTUDIOS PRIMARIOS EN FRANCIA PARA CONTINUAR LUEGO EN ESPAÑA |
| EN LOS AÑOS QUE SIGUIERON, ESTE TRABAJO ESPARTA PRODUJO DOCENAS DE BUENOS JUGADORES. | EN LOS AÑOS QUE SIGUIERON ESTE TRABAJO ESPARTA PRODUJO DOCENA DE BUENOS JUGADORES |
| SE ESTÁ TRATANDO DE RECUPERAR SU CULTIVO EN LAS ISLAS CANARIAS. | SE ESTÓ TRATANDO DE RECUPERAR SU CULTIVO EN LAS ISLAS CANARIAS |
| SÍ | SÍ |
| FUE "SACADA" DE LA SERIE EN EL EPISODIO "LEAD", EN QUE ALEXANDRA CABOT REGRESÓ. | FUE SACADA DE LA SERIE EN EL EPISODIO LEED EN QUE ALEXANDRA KAOT REGRESÓ |
| SE UBICAN ESPECÍFICAMENTE EN EL VALLE DE MOKA, EN LA PROVINCIA DE BIOKO SUR. | SE UBICAN ESPECÍFICAMENTE EN EL VALLE DE MOCA EN LA PROVINCIA DE PÍOCOSUR |

1. To evaluate on `mozilla-foundation/common_voice_6_0` with split `test`
2. To evaluate on `speech-recognition-community-v2/dev_data`

Citation

If you want to cite this model you can use this:

license:apache-2.0
28,720
30

wav2vec2-large-xlsr-53-german

Fine-tuned XLSR-53 large model for speech recognition in German

Fine-tuned facebook/wav2vec2-large-xlsr-53 on German using the train and validation splits of Common Voice 6.1. When using this model, make sure that your speech input is sampled at 16 kHz.

This model has been fine-tuned thanks to the GPU credits generously given by OVHcloud :)

The script used for training can be found here: https://github.com/jonatasgrosman/wav2vec2-sprint

The model can be used directly (without a language model) as follows...

| Reference | Prediction |
| ------------- | ------------- |
| ZIEHT EUCH BITTE DRAUSSEN DIE SCHUHE AUS. | ZIEHT EUCH BITTE DRAUSSEN DIE SCHUHE AUS |
| ES KOMMT ZUM SHOWDOWN IN GSTAAD. | ES KOMMT ZUG STUNDEDAUTENESTERKT |
| IHRE FOTOSTRECKEN ERSCHIENEN IN MODEMAGAZINEN WIE DER VOGUE, HARPER’S BAZAAR UND MARIE CLAIRE. | IHRE FOTELSTRECKEN ERSCHIENEN MIT MODEMAGAZINEN WIE DER VALG AT DAS BASIN MA RIQUAIR |
| FELIPE HAT EINE AUCH FÜR MONARCHEN UNGEWÖHNLICH LANGE TITELLISTE. | FELIPPE HAT EINE AUCH FÜR MONACHEN UNGEWÖHNLICH LANGE TITELLISTE |
| ER WURDE ZU EHREN DES REICHSKANZLERS OTTO VON BISMARCK ERRICHTET. | ER WURDE ZU EHREN DES REICHSKANZLERS OTTO VON BISMARCK ERRICHTET M |
| WAS SOLLS, ICH BIN BEREIT. | WAS SOLL'S ICH BIN BEREIT |
| DAS INTERNET BESTEHT AUS VIELEN COMPUTERN, DIE MITEINANDER VERBUNDEN SIND. | DAS INTERNET BESTEHT AUS VIELEN COMPUTERN DIE MITEINANDER VERBUNDEN SIND |
| DER URANUS IST DER SIEBENTE PLANET IN UNSEREM SONNENSYSTEM. | DER URANUS IST DER SIEBENTE PLANET IN UNSEREM SONNENSYSTEM |
| DIE WAGEN ERHIELTEN EIN EINHEITLICHES ERSCHEINUNGSBILD IN WEISS MIT ROTEM FENSTERBAND. | DIE WAGEN ERHIELTEN EIN EINHEITLICHES ERSCHEINUNGSBILD IN WEISS MIT ROTEM FENSTERBAND |
| SIE WAR DIE COUSINE VON CARL MARIA VON WEBER. | SIE WAR DIE COUSINE VON KARL-MARIA VON WEBER |

1. To evaluate on `mozilla-foundation/common_voice_6_0` with split `test`
2. To evaluate on `speech-recognition-community-v2/dev_data`

Citation

If you want to cite this model you can use this:

license:apache-2.0
18,226
7

wav2vec2-large-xlsr-53-french

license:apache-2.0
11,632
12

wav2vec2-xls-r-1b-english

Fine-tuned XLS-R 1B model for speech recognition in English

Fine-tuned facebook/wav2vec2-xls-r-1b on English using the train and validation splits of Common Voice 8.0, Multilingual LibriSpeech, TED-LIUMv3, and Voxpopuli. When using this model, make sure that your speech input is sampled at 16 kHz.

This model has been fine-tuned with the HuggingSound tool, thanks to the GPU credits generously given by OVHcloud :)

1. To evaluate on `mozilla-foundation/common_voice_8_0` with split `test`
2. To evaluate on `speech-recognition-community-v2/dev_data`

Citation

If you want to cite this model you can use this:

license:apache-2.0
2,706
9

wav2vec2-xls-r-1b-russian

Fine-tuned XLS-R 1B model for speech recognition in Russian

Fine-tuned facebook/wav2vec2-xls-r-1b on Russian using the train and validation splits of Common Voice 8.0, Golos, and Multilingual TEDx. When using this model, make sure that your speech input is sampled at 16 kHz.

This model has been fine-tuned with the HuggingSound tool, thanks to the GPU credits generously given by OVHcloud :)

1. To evaluate on `mozilla-foundation/common_voice_8_0` with split `test`
2. To evaluate on `speech-recognition-community-v2/dev_data`

Citation

If you want to cite this model you can use this:

license:apache-2.0
2,354
16

wav2vec2-xls-r-1b-spanish

Fine-tuned XLS-R 1B model for speech recognition in Spanish

Fine-tuned facebook/wav2vec2-xls-r-1b on Spanish using the train and validation splits of Common Voice 8.0, MediaSpeech, Multilingual TEDx, Multilingual LibriSpeech, and Voxpopuli. When using this model, make sure that your speech input is sampled at 16 kHz.

This model has been fine-tuned with the HuggingSound tool, thanks to the GPU credits generously given by OVHcloud :)

1. To evaluate on `mozilla-foundation/common_voice_8_0` with split `test`
2. To evaluate on `speech-recognition-community-v2/dev_data`

Citation

If you want to cite this model you can use this:

license:apache-2.0
1,703
6

wav2vec2-large-xlsr-53-italian

license:apache-2.0
1,227
14

wav2vec2-xls-r-1b-german

license:apache-2.0
417
3

exp_w2v2t_en_wavlm_s990

license:apache-2.0
130
0

wav2vec2-xls-r-1b-french

license:apache-2.0
110
8

whisper-large-zh-cv11

This model is a fine-tuned version of openai/whisper-large-v2 on Chinese (Mandarin) using the train and validation splits of Common Voice 11. Not all validation split data were used during training; I extracted 1k samples from the validation split to be used for evaluation during fine-tuning.

I've evaluated the model on the test split of two datasets: Common Voice 11 (the same dataset used for the fine-tuning) and Fleurs (a dataset not seen during the fine-tuning). As Whisper can transcribe casing and punctuation, I've evaluated the model in 2 different scenarios, one using the raw text and the other using the normalized text (lowercase + removal of punctuation).

Additionally, for the Fleurs dataset, I've evaluated the model in a scenario where there are no transcriptions of numerical values. The way these values are written in Fleurs differs from how they are written in the dataset used for fine-tuning (Common Voice), so this difference is expected to affect the model's performance on this type of transcription in Fleurs.

Common Voice 11

| | CER | WER |
| --- | --- | --- |
| jonatasgrosman/whisper-large-zh-cv11 | 9.31 | 55.94 |
| jonatasgrosman/whisper-large-zh-cv11 + text normalization | 9.55 | 55.02 |
| openai/whisper-large-v2 | 33.33 | 101.80 |
| openai/whisper-large-v2 + text normalization | 29.90 | 95.91 |

Fleurs

| | CER | WER |
| --- | --- | --- |
| jonatasgrosman/whisper-large-zh-cv11 | 15.00 | 93.45 |
| jonatasgrosman/whisper-large-zh-cv11 + text normalization | 11.76 | 70.63 |
| jonatasgrosman/whisper-large-zh-cv11 + keep only non-numeric samples | 10.95 | 87.91 |
| jonatasgrosman/whisper-large-zh-cv11 + text normalization + keep only non-numeric samples | 7.83 | 62.12 |
| openai/whisper-large-v2 | 23.49 | 101.28 |
| openai/whisper-large-v2 + text normalization | 17.58 | 83.22 |
| openai/whisper-large-v2 + keep only non-numeric samples | 21.03 | 101.95 |
| openai/whisper-large-v2 + text normalization + keep only non-numeric samples | 15.22 | 79.28 |
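
The Whisper card above describes two preprocessing choices: text normalization (lowercase + removal of punctuation) and keeping only non-numeric samples. The following is a rough sketch of what such steps might look like in plain Python. The author's actual normalization and filtering rules are not given in the card, so both functions, their names, and the "contains a digit" criterion are assumptions made for illustration.

```python
import unicodedata


def normalize_text(text):
    """Lowercase the text and drop characters in a Unicode punctuation category."""
    kept = [
        ch for ch in text.lower()
        if not unicodedata.category(ch).startswith("P")
    ]
    # Collapse any runs of whitespace left behind by removed punctuation.
    return " ".join("".join(kept).split())


def is_non_numeric(text):
    """Assumed filter: a sample counts as non-numeric if it contains no digits."""
    return not any(ch.isdigit() for ch in text)


samples = ["Hello, World!", "我有3个苹果。"]
non_numeric = [normalize_text(s) for s in samples if is_non_numeric(s)]
print(non_numeric)
```

Filtering on digits only catches numbers written as numerals; values spelled out in words or Chinese numerals would pass through, which may or may not match what the card's "non-numeric" scenario did.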

license:apache-2.0
101
78

whisper-large-pt-cv11

license:apache-2.0
95
13

wav2vec2-large-english

license:apache-2.0
75
5

wav2vec2-large-fr-voxpopuli-french

license:apache-2.0
43
3

wav2vec2-xls-r-1b-italian

license:apache-2.0
38
1

exp_w2v2t_ru_unispeech_s42

license:apache-2.0
25
0

wav2vec2-xls-r-1b-polish

license:apache-2.0
22
1

wav2vec2-xls-r-1b-dutch

license:apache-2.0
12
2

exp_w2v2t_pt_hubert_s807

license:apache-2.0
11
0

exp_w2v2t_es_wavlm_s115

license:apache-2.0
6
0

whisper-large-fr-cv11

license:apache-2.0
5
1

exp_w2v2t_uk_vp-sv_s428

license:apache-2.0
5
0

exp_w2v2t_pt_vp-it_s996

license:apache-2.0
5
0

bartuque-bart-base-pretrained-r-2

4
0

bartuque-bart-base-pretrained-rm-2

4
0

paraphrase

4
0

exp_w2v2t_en_wavlm_s461

license:apache-2.0
4
0

exp_w2v2t_it_no-pretraining_s615

license:apache-2.0
4
0

exp_w2v2t_it_unispeech-ml_s784

license:apache-2.0
4
0

exp_w2v2t_fa_vp-100k_s88

license:apache-2.0
4
0

exp_w2v2t_es_wavlm_s26

license:apache-2.0
4
0

exp_w2v2t_ru_wavlm_s363

license:apache-2.0
3
1

bartuque-bart-base-pretrained-mm-2

3
0

bartuque-bart-base-random-r-2

3
0

exp_w2v2t_th_wavlm_s847

license:apache-2.0
3
0

exp_w2v2t_th_unispeech-ml_s256

license:apache-2.0
3
0

exp_w2v2t_it_wavlm_s895

license:apache-2.0
3
0

exp_w2v2t_fa_hubert_s889

license:apache-2.0
3
0

exp_w2v2t_fa_wavlm_s527

license:apache-2.0
3
0

exp_w2v2t_de_unispeech-sat_s75

license:apache-2.0
3
0

exp_w2v2t_de_vp-it_s962

license:apache-2.0
3
0

exp_w2v2t_ar_hubert_s947

license:apache-2.0
3
0

exp_w2v2t_ar_unispeech-ml_s365

license:apache-2.0
3
0

exp_w2v2t_es_no-pretraining_s953

license:apache-2.0
3
0

whisper-small-pt-cv11-v7

license:apache-2.0
2
1

whisper-large-es-cv11

license:apache-2.0
2
1

exp_w2v2t_en_unispeech-ml_s103

license:apache-2.0
2
0

exp_w2v2t_en_unispeech-sat_s459

license:apache-2.0
2
0

exp_w2v2t_th_unispeech_s328

license:apache-2.0
2
0

exp_w2v2t_th_hubert_s533

license:apache-2.0
2
0

exp_w2v2t_th_vp-sv_s635

license:apache-2.0
2
0

exp_w2v2t_th_unispeech-ml_s640

license:apache-2.0
2
0

exp_w2v2t_th_vp-es_s26

license:apache-2.0
2
0

exp_w2v2t_th_vp-es_s51

license:apache-2.0
2
0

exp_w2v2t_th_vp-it_s259

license:apache-2.0
2
0

exp_w2v2t_ja_unispeech_s947

license:apache-2.0
2
0

exp_w2v2t_it_unispeech_s156

license:apache-2.0
2
0

exp_w2v2t_it_vp-nl_s27

license:apache-2.0
2
0

exp_w2v2t_it_vp-nl_s335

license:apache-2.0
2
0

exp_w2v2t_it_unispeech-sat_s500

license:apache-2.0
2
0

exp_w2v2t_fr_unispeech-sat_s115

license:apache-2.0
2
0

exp_w2v2t_sv-se_unispeech_s149

license:apache-2.0
2
0

exp_w2v2t_sv-se_vp-it_s817

license:apache-2.0
2
0

exp_w2v2t_fa_wav2vec2_s168

license:apache-2.0
2
0

exp_w2v2t_fa_wavlm_s545

license:apache-2.0
2
0

exp_w2v2t_uk_vp-es_s211

license:apache-2.0
2
0

exp_w2v2t_pl_r-wav2vec2_s72

license:apache-2.0
2
0

exp_w2v2t_nl_unispeech-sat_s715

license:apache-2.0
2
0

exp_w2v2t_es_r-wav2vec2_s870

license:apache-2.0
2
0

exp_w2v2t_pt_wavlm_s691

license:apache-2.0
2
0

exp_w2v2t_pt_wavlm_s118

license:apache-2.0
2
0

exp_w2v2t_pt_xls-r_s689

license:apache-2.0
2
0

exp_w2v2t_pt_vp-it_s738

license:apache-2.0
2
0

exp_w2v2r_en_vp-100k_gender_male-0_female-10_s980

license:apache-2.0
2
0

exp_w2v2r_de_xls-r_accent_germany-8_austria-2_s941

license:apache-2.0
2
0

exp_w2v2r_de_xls-r_gender_male-8_female-2_s949

license:apache-2.0
2
0

exp_w2v2r_es_xls-r_gender_male-10_female-0_s840

license:apache-2.0
2
0

exp_w2v2r_en_vp-100k_age_teens-10_sixties-0_s368

license:apache-2.0
2
0

whisper-small-pt-cv11-v6

license:apache-2.0
2
0

exp_w2v2t_ja_wavlm_s729

license:apache-2.0
1
2

exp_w2v2t_ja_wavlm_s35

license:apache-2.0
1
1

exp_w2v2r_de_xls-r_accent_germany-0_austria-10_s350

license:apache-2.0
1
1

exp_w2v2t_en_unispeech_s870

license:apache-2.0
1
0

exp_w2v2t_en_unispeech_s227

license:apache-2.0
1
0

exp_w2v2t_en_unispeech_s809

license:apache-2.0
1
0

exp_w2v2t_en_unispeech-ml_s377

license:apache-2.0
1
0

exp_w2v2t_en_vp-es_s952

license:apache-2.0
1
0

exp_w2v2t_en_unispeech-sat_s456

license:apache-2.0
1
0

exp_w2v2t_en_vp-it_s859

license:apache-2.0
1
0

exp_w2v2t_th_vp-100k_s630

license:apache-2.0
1
0

exp_w2v2t_th_unispeech_s624

license:apache-2.0
1
0

exp_w2v2t_th_unispeech_s131

license:apache-2.0
1
0

exp_w2v2t_th_hubert_s817

license:apache-2.0
1
0

exp_w2v2t_th_vp-sv_s946

license:apache-2.0
1
0

exp_w2v2t_th_no-pretraining_s950

license:apache-2.0
1
0

exp_w2v2t_th_no-pretraining_s414

license:apache-2.0
1
0

exp_w2v2t_th_no-pretraining_s156

license:apache-2.0
1
0

exp_w2v2t_th_wavlm_s108

license:apache-2.0
1
0

exp_w2v2t_th_wavlm_s904

license:apache-2.0
1
0

exp_w2v2t_th_unispeech-ml_s351

license:apache-2.0
1
0

exp_w2v2t_th_vp-fr_s761

license:apache-2.0
1
0

exp_w2v2t_th_unispeech-sat_s658

license:apache-2.0
1
0

exp_w2v2t_th_unispeech-sat_s515

license:apache-2.0
1
0

exp_w2v2t_th_vp-it_s334

license:apache-2.0
1
0

exp_w2v2t_ja_wav2vec2_s834

license:apache-2.0
1
0

exp_w2v2t_ja_unispeech_s569

license:apache-2.0
1
0

exp_w2v2t_ja_unispeech_s253

license:apache-2.0
1
0

exp_w2v2t_ja_unispeech-ml_s295

license:apache-2.0
1
0

exp_w2v2t_ja_unispeech-sat_s884

license:apache-2.0
1
0

exp_w2v2t_ja_unispeech-sat_s946

license:apache-2.0
1
0

exp_w2v2t_ja_unispeech-sat_s635

license:apache-2.0
1
0

exp_w2v2t_it_unispeech_s714

license:apache-2.0
1
0

exp_w2v2t_it_unispeech_s626

license:apache-2.0
1
0

exp_w2v2t_it_hubert_s21

license:apache-2.0
1
0

exp_w2v2t_it_hubert_s474

license:apache-2.0
1
0

exp_w2v2t_it_vp-sv_s791

license:apache-2.0
1
0

exp_w2v2t_it_vp-sv_s1

license:apache-2.0
1
0

exp_w2v2t_it_no-pretraining_s842

license:apache-2.0
1
0

exp_w2v2t_it_no-pretraining_s764

license:apache-2.0
1
0

exp_w2v2t_it_wavlm_s662

license:apache-2.0
1
0

exp_w2v2t_it_unispeech-ml_s213

license:apache-2.0
1
0

exp_w2v2t_it_unispeech-ml_s246

license:apache-2.0
1
0

exp_w2v2t_it_vp-fr_s821

license:apache-2.0
1
0

exp_w2v2t_it_vp-fr_s557

license:apache-2.0
1
0

exp_w2v2t_it_vp-es_s33

license:apache-2.0
1
0

exp_w2v2t_it_vp-nl_s222

license:apache-2.0
1
0

exp_w2v2t_it_unispeech-sat_s692

license:apache-2.0
1
0

exp_w2v2t_it_vp-it_s324

license:apache-2.0
1
0

exp_w2v2t_it_vp-it_s965

license:apache-2.0
1
0

exp_w2v2t_fr_vp-100k_s688

license:apache-2.0
1
0

exp_w2v2t_fr_unispeech_s833

license:apache-2.0
1
0

exp_w2v2t_fr_unispeech_s42

license:apache-2.0
1
0

exp_w2v2t_fr_wavlm_s766

license:apache-2.0
1
0

exp_w2v2t_fr_wavlm_s208

license:apache-2.0
1
0

exp_w2v2t_fr_unispeech-ml_s51

license:apache-2.0
1
0

exp_w2v2t_fr_unispeech-ml_s159

license:apache-2.0
1
0

exp_w2v2t_fr_unispeech-ml_s614

license:apache-2.0
1
0

exp_w2v2t_fr_unispeech-sat_s26

license:apache-2.0
1
0

exp_w2v2t_fr_vp-it_s924

license:apache-2.0
1
0

exp_w2v2t_sv-se_wav2vec2_s451

license:apache-2.0
1
0

exp_w2v2t_sv-se_wav2vec2_s732

license:apache-2.0
1
0

exp_w2v2t_sv-se_vp-100k_s904

license:apache-2.0
1
0

exp_w2v2t_sv-se_xlsr-53_s328

license:apache-2.0
1
0

exp_w2v2t_sv-se_unispeech_s449

license:apache-2.0
1
0

exp_w2v2t_sv-se_wavlm_s132

license:apache-2.0
1
0

exp_w2v2t_sv-se_wavlm_s42

license:apache-2.0
1
0

exp_w2v2t_sv-se_wavlm_s607

license:apache-2.0
1
0

exp_w2v2t_sv-se_unispeech-ml_s35

license:apache-2.0
1
0

exp_w2v2t_sv-se_unispeech-ml_s729

license:apache-2.0
1
0

exp_w2v2t_sv-se_unispeech-ml_s664

license:apache-2.0
1
0

exp_w2v2t_sv-se_unispeech-sat_s515

license:apache-2.0
1
0

exp_w2v2t_sv-se_unispeech-sat_s772

license:apache-2.0
1
0

exp_w2v2t_sv-se_unispeech-sat_s658

license:apache-2.0
1
0

exp_w2v2t_sv-se_xls-r_s610

license:apache-2.0
1
0

exp_w2v2t_sv-se_xls-r_s946

license:apache-2.0
1
0

exp_w2v2t_sv-se_r-wav2vec2_s418

license:apache-2.0
1
0

exp_w2v2t_sv-se_vp-it_s533

license:apache-2.0
1
0

exp_w2v2t_fa_unispeech_s211

license:apache-2.0
1
0

exp_w2v2t_fa_unispeech_s108

license:apache-2.0
1
0

exp_w2v2t_fa_unispeech_s364

license:apache-2.0
1
0

exp_w2v2t_fa_unispeech-ml_s195

license:apache-2.0
1
0

exp_w2v2t_fa_unispeech-ml_s408

license:apache-2.0
1
0

exp_w2v2t_fa_unispeech-ml_s998

license:apache-2.0
1
0

exp_w2v2t_fa_unispeech-sat_s803

license:apache-2.0
1
0

exp_w2v2t_fa_unispeech-sat_s3

license:apache-2.0
1
0

exp_w2v2t_fa_unispeech-sat_s95

license:apache-2.0
1
0

exp_w2v2t_fa_xls-r_s44

license:apache-2.0
1
0

exp_w2v2t_zh-cn_xlsr-53_s533

license:apache-2.0
1
0

exp_w2v2t_zh-cn_wavlm_s677

license:apache-2.0
1
0

exp_w2v2t_zh-cn_wavlm_s368

license:apache-2.0
1
0

exp_w2v2t_zh-cn_unispeech-ml_s515

license:apache-2.0
1
0

exp_w2v2t_zh-cn_unispeech-ml_s772

license:apache-2.0
1
0

exp_w2v2t_zh-cn_unispeech-ml_s658

license:apache-2.0
1
0

exp_w2v2t_zh-cn_unispeech-sat_s762

license:apache-2.0
1
0

exp_w2v2t_zh-cn_unispeech-sat_s840

license:apache-2.0
1
0

exp_w2v2t_zh-cn_xls-r_s108

license:apache-2.0
1
0

exp_w2v2t_zh-cn_r-wav2vec2_s237

license:apache-2.0
1
0

exp_w2v2t_id_unispeech_s791

license:apache-2.0
1
0

exp_w2v2t_id_unispeech_s1

license:apache-2.0
1
0

exp_w2v2t_id_unispeech_s149

license:apache-2.0
1
0

exp_w2v2t_id_wavlm_s557

license:apache-2.0
1
0

exp_w2v2t_id_wavlm_s821

license:apache-2.0
1
0

exp_w2v2t_id_unispeech-ml_s418

license:apache-2.0
1
0

exp_w2v2t_id_vp-es_s425

license:apache-2.0
1
0

exp_w2v2t_id_unispeech-sat_s477

license:apache-2.0
1
0

exp_w2v2t_de_wavlm_s295

license:apache-2.0
1
0

exp_w2v2t_de_wavlm_s101

license:apache-2.0
1
0

exp_w2v2t_de_unispeech-ml_s952

license:apache-2.0
1
0

exp_w2v2t_de_unispeech-sat_s968

license:apache-2.0
1
0

exp_w2v2t_de_unispeech-sat_s480

license:apache-2.0
1
0

exp_w2v2t_uk_unispeech_s558

license:apache-2.0
1
0

exp_w2v2t_uk_unispeech_s607

license:apache-2.0
1
0

exp_w2v2t_uk_wavlm_s722

license:apache-2.0
1
0

exp_w2v2t_uk_wavlm_s21

license:apache-2.0
1
0

exp_w2v2t_uk_unispeech-ml_s417

license:apache-2.0
1
0

exp_w2v2t_uk_unispeech-ml_s156

license:apache-2.0
1
0

exp_w2v2t_uk_unispeech-ml_s226

license:apache-2.0
1
0

exp_w2v2t_uk_unispeech-sat_s222

license:apache-2.0
1
0

exp_w2v2t_uk_unispeech-sat_s27

license:apache-2.0
1
0

exp_w2v2t_uk_unispeech-sat_s335

license:apache-2.0
1
0

exp_w2v2t_ar_unispeech_s574

license:apache-2.0
1
0

exp_w2v2t_ar_unispeech_s474

license:apache-2.0
1
0

exp_w2v2t_ar_vp-sv_s953

license:apache-2.0
1
0

exp_w2v2t_ar_wavlm_s95

license:apache-2.0
1
0

exp_w2v2t_ar_wavlm_s3

license:apache-2.0
1
0

exp_w2v2t_ar_unispeech-sat_s504

license:apache-2.0
1
0

exp_w2v2t_ar_unispeech-sat_s75

license:apache-2.0
1
0

exp_w2v2t_pl_unispeech_s622

license:apache-2.0
1
0

exp_w2v2t_pl_wavlm_s250

license:apache-2.0
1
0

exp_w2v2t_pl_wavlm_s515

license:apache-2.0
1
0

exp_w2v2t_pl_wavlm_s859

license:apache-2.0
1
0

exp_w2v2t_pl_unispeech-ml_s463

license:apache-2.0
1
0

exp_w2v2t_pl_unispeech-ml_s362

license:apache-2.0
1
0

exp_w2v2t_pl_unispeech-ml_s240

license:apache-2.0
1
0

exp_w2v2t_pl_unispeech-sat_s695

license:apache-2.0
1
0

exp_w2v2t_pl_unispeech-sat_s961

license:apache-2.0
1
0

exp_w2v2t_pl_vp-it_s474

license:apache-2.0
1
0

exp_w2v2t_et_unispeech_s605

license:apache-2.0
1
0

exp_w2v2t_et_unispeech_s86

license:apache-2.0
1
0

exp_w2v2t_et_wavlm_s753

license:apache-2.0
1
0

exp_w2v2t_et_wavlm_s887

license:apache-2.0
1
0

exp_w2v2t_et_unispeech-ml_s779

license:apache-2.0
1
0

exp_w2v2t_et_unispeech-sat_s364

license:apache-2.0
1
0

exp_w2v2t_et_unispeech-sat_s108

license:apache-2.0
1
0

exp_w2v2t_et_unispeech-sat_s211

license:apache-2.0
1
0

exp_w2v2t_nl_unispeech_s853

license:apache-2.0
1
0

exp_w2v2t_nl_unispeech_s493

license:apache-2.0
1
0

exp_w2v2t_nl_unispeech_s683

license:apache-2.0
1
0

exp_w2v2t_nl_wavlm_s784

license:apache-2.0
1
0

exp_w2v2t_nl_unispeech-ml_s23

license:apache-2.0
1
0

exp_w2v2t_nl_unispeech-sat_s81

license:apache-2.0
1
0

exp_w2v2t_nl_unispeech-sat_s775

license:apache-2.0
1
0

exp_w2v2t_ru_unispeech_s132

license:apache-2.0
1
0

exp_w2v2t_ru_unispeech_s607

license:apache-2.0
1
0

exp_w2v2t_ru_wavlm_s331

license:apache-2.0
1
0

exp_w2v2t_ru_unispeech-ml_s947

license:apache-2.0
1
0

exp_w2v2t_ru_unispeech-ml_s569

license:apache-2.0
1
0

exp_w2v2t_ru_unispeech-ml_s253

license:apache-2.0
1
0

exp_w2v2t_ru_unispeech-sat_s423

license:apache-2.0
1
0

exp_w2v2t_ru_unispeech-sat_s418

license:apache-2.0
1
0

exp_w2v2t_ru_unispeech-sat_s160

license:apache-2.0
1
0

exp_w2v2t_es_unispeech_s767

license:apache-2.0
1
0

exp_w2v2t_es_unispeech_s461

license:apache-2.0
1
0

exp_w2v2t_es_wavlm_s655

license:apache-2.0
1
0

exp_w2v2t_es_unispeech-ml_s186

license:apache-2.0
1
0

exp_w2v2t_es_vp-fr_s281

license:apache-2.0
1
0

exp_w2v2t_es_vp-it_s438

license:apache-2.0
1
0

exp_w2v2t_pt_unispeech_s952

license:apache-2.0
1
0

exp_w2v2t_pt_unispeech-ml_s808

license:apache-2.0
1
0

exp_w2v2t_pt_unispeech-sat_s377

license:apache-2.0
1
0

exp_w2v2t_pt_unispeech-sat_s103

license:apache-2.0
1
0

exp_w2v2t_pt_xls-r_s657

license:apache-2.0
1
0

exp_w2v2t_pt_vp-it_s529

license:apache-2.0
1
0

exp_w2v2r_es_vp-100k_gender_male-5_female-5_s966

license:apache-2.0
1
0

exp_w2v2r_es_vp-100k_gender_male-8_female-2_s417

license:apache-2.0
1
0

exp_w2v2r_de_xls-r_gender_male-2_female-8_s755

license:apache-2.0
1
0

exp_w2v2r_en_xls-r_accent_us-10_england-0_s253

license:apache-2.0
1
0

exp_w2v2r_en_xls-r_gender_male-2_female-8_s201

license:apache-2.0
1
0

exp_w2v2r_es_xls-r_accent_surpeninsular-8_nortepeninsular-2_s187

license:apache-2.0
1
0

exp_w2v2r_es_xls-r_gender_male-5_female-5_s932

license:apache-2.0
1
0

exp_w2v2r_es_xls-r_gender_male-0_female-10_s961

license:apache-2.0
1
0

exp_w2v2r_es_xls-r_gender_male-2_female-8_s786

license:apache-2.0
1
0

exp_w2v2r_es_xls-r_gender_male-8_female-2_s235

license:apache-2.0
1
0

exp_w2v2r_fr_xls-r_accent_france-0_belgium-10_s513

license:apache-2.0
1
0

exp_w2v2r_fr_xls-r_gender_male-0_female-10_s412

license:apache-2.0
1
0

exp_w2v2r_fr_xls-r_gender_male-2_female-8_s295

license:apache-2.0
1
0

exp_w2v2r_fr_xls-r_gender_male-8_female-2_s659

license:apache-2.0
1
0

exp_w2v2r_fr_vp-100k_age_teens-10_sixties-0_s451

license:apache-2.0
1
0

exp_w2v2r_fr_vp-100k_age_teens-10_sixties-0_s818

license:apache-2.0
1
0

exp_w2v2t_it_hubert_s722

license:apache-2.0
0
2

exp_w2v2t_ru_hubert_s818

license:apache-2.0
0
2

exp_w2v2t_ja_wavlm_s664

license:apache-2.0
0
1

exp_w2v2t_zh-cn_wavlm_s596

license:apache-2.0
0
1

exp_w2v2t_ru_wav2vec2_s108

license:apache-2.0
0
1

exp_w2v2t_ru_vp-100k_s69

license:apache-2.0
0
1