jonatasgrosman
wav2vec2-large-xlsr-53-russian
Fine-tuned XLSR-53 large model for speech recognition in Russian

Fine-tuned facebook/wav2vec2-large-xlsr-53 on Russian using the train and validation splits of Common Voice 6.1 and CSS10. When using this model, make sure that your speech input is sampled at 16 kHz.

This model was fine-tuned thanks to the GPU credits generously provided by OVHcloud :)

The script used for training can be found here: https://github.com/jonatasgrosman/wav2vec2-sprint

The model can be used directly (without a language model) as follows...

| Reference | Prediction |
| ------------- | ------------- |
| ОН РАБОТАТЬ, А ЕЕ НЕ УДЕРЖАТЬ НИКАК — БЕГАЕТ ЗА КЛЁШЕМ КАЖДОГО БУЛЬВАРНИКА. | ОН РАБОТАТЬ А ЕЕ НЕ УДЕРЖАТ НИКАК БЕГАЕТ ЗА КЛЕШОМ КАЖДОГО БУЛЬБАРНИКА |
| ЕСЛИ НЕ БУДЕТ ВОЗРАЖЕНИЙ, Я БУДУ СЧИТАТЬ, ЧТО АССАМБЛЕЯ СОГЛАСНА С ЭТИМ ПРЕДЛОЖЕНИЕМ. | ЕСЛИ НЕ БУДЕТ ВОЗРАЖЕНИЙ Я БУДУ СЧИТАТЬ ЧТО АССАМБЛЕЯ СОГЛАСНА С ЭТИМ ПРЕДЛОЖЕНИЕМ |
| ПАЛЕСТИНЦАМ НЕОБХОДИМО СНАЧАЛА УСТАНОВИТЬ МИР С ИЗРАИЛЕМ, А ЗАТЕМ ДОБИВАТЬСЯ ПРИЗНАНИЯ ГОСУДАРСТВЕННОСТИ. | ПАЛЕСТИНЦАМ НЕОБХОДИМО СНАЧАЛА УСТАНОВИТЬ С НИ МИР ФЕЗРЕЛЕМ А ЗАТЕМ ДОБИВАТЬСЯ ПРИЗНАНИЯ ГОСУДАРСТВЕНСКИ |
| У МЕНЯ БЫЛО ТАКОЕ ЧУВСТВО, ЧТО ЧТО-ТО ТАКОЕ ОЧЕНЬ ВАЖНОЕ Я ПРИБАВЛЯЮ. | У МЕНЯ БЫЛО ТАКОЕ ЧУВСТВО ЧТО ЧТО-ТО ТАКОЕ ОЧЕНЬ ВАЖНОЕ Я ПРЕДБАВЛЯЕТ |
| ТОЛЬКО ВРЯД ЛИ ПОЙМЕТ. | ТОЛЬКО ВРЯД ЛИ ПОЙМЕТ |
| ВРОНСКИЙ, СЛУШАЯ ОДНИМ УХОМ, ПЕРЕВОДИЛ БИНОКЛЬ С БЕНУАРА НА БЕЛЬ-ЭТАЖ И ОГЛЯДЫВАЛ ЛОЖИ. | ЗЛАЗКИ СЛУШАЮ ОТ ОДНИМ УХАМ ТЫ ВОТИ В ВИНОКОТ СПИЛА НА ПЕРЕТАЧ И ОКЛЯДЫВАЛ БОСУ |
| К СОЖАЛЕНИЮ, СИТУАЦИЯ ПРОДОЛЖАЕТ УХУДШАТЬСЯ. | К СОЖАЛЕНИЮ СИТУАЦИИ ПРОДОЛЖАЕТ УХУЖАТЬСЯ |
| ВСЁ ЖАЛОВАНИЕ УХОДИЛО НА ДОМАШНИЕ РАСХОДЫ И НА УПЛАТУ МЕЛКИХ НЕПЕРЕВОДИВШИХСЯ ДОЛГОВ. | ВСЕ ЖАЛОВАНИЕ УХОДИЛО НА ДОМАШНИЕ РАСХОДЫ И НА УПЛАТУ МЕЛКИХ НЕ ПЕРЕВОДИВШИХСЯ ДОЛГОВ |
| ТЕПЕРЬ ДЕЛО, КОНЕЧНО, ЗА ТЕМ, ЧТОБЫ ПРЕВРАТИТЬ СЛОВА В ДЕЛА. | ТЕПЕРЬ ДЕЛАЮ КОНЕЧНО ЗАТЕМ ЧТОБЫ ПРЕВРАТИТЬ СЛОВА В ДЕЛА |
| ДЕВЯТЬ | ЛЕВЕТЬ |

1. To evaluate on `mozilla-foundation/common_voice_6_0` with split `test`
2. To evaluate on `speech-recognition-community-v2/dev_data`

Citation

If you want to cite this model you can use this:
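Going back to the direct-usage note in this card: the snippet below is a minimal sketch of greedy transcription (no language model) with the `transformers` library, assuming `torch`, `transformers`, and `librosa` are installed. The `transcribe` helper and the file path in the usage note are hypothetical names introduced here; this is not the author's elided snippet.

```python
MODEL_ID = "jonatasgrosman/wav2vec2-large-xlsr-53-russian"
SAMPLE_RATE = 16_000  # the card requires 16 kHz input


def transcribe(paths, model_id=MODEL_ID):
    """Greedy CTC transcription of audio files (hypothetical helper)."""
    # Heavy dependencies are imported lazily so this module imports cheaply.
    import librosa
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(model_id)
    model = Wav2Vec2ForCTC.from_pretrained(model_id)
    model.eval()

    # librosa resamples to the requested rate while loading.
    speech = [librosa.load(path, sr=SAMPLE_RATE)[0] for path in paths]
    inputs = processor(speech, sampling_rate=SAMPLE_RATE,
                       return_tensors="pt", padding=True)

    with torch.no_grad():
        logits = model(inputs.input_values,
                       attention_mask=inputs.attention_mask).logits

    # Most likely token per frame; batch_decode collapses repeats and blanks.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)
```

Calling `transcribe(["/path/to/file.wav"])` with a real recording in place of the placeholder path should return a list with one transcription string per file.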
wav2vec2-large-xlsr-53-japanese
---
language: ja
datasets:
- common_voice
metrics:
- wer
- cer
tags:
- audio
- automatic-speech-recognition
- speech
- xlsr-fine-tuning-week
license: apache-2.0
model-index:
- name: XLSR Wav2Vec2 Japanese by Jonatas Grosman
  results:
  - task:
      name: Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice ja
      type: common_voice
      args: ja
    metrics:
    - name: Test WER
      type: wer
      value: 81.80
    - name: Test CER
      type: cer
      value: 20.16
---
wav2vec2-large-xlsr-53-portuguese
Fine-tuned XLSR-53 large model for speech recognition in Portuguese

Fine-tuned facebook/wav2vec2-large-xlsr-53 on Portuguese using the train and validation splits of Common Voice 6.1. When using this model, make sure that your speech input is sampled at 16 kHz.

This model was fine-tuned thanks to the GPU credits generously provided by OVHcloud :)

The script used for training can be found here: https://github.com/jonatasgrosman/wav2vec2-sprint

The model can be used directly (without a language model) as follows...

| Reference | Prediction |
| ------------- | ------------- |
| NEM O RADAR NEM OS OUTROS INSTRUMENTOS DETECTARAM O BOMBARDEIRO STEALTH. | NEMHUM VADAN OS OLTWES INSTRUMENTOS DE TTÉÃN UM BOMBERDEIRO OSTER |
| PEDIR DINHEIRO EMPRESTADO ÀS PESSOAS DA ALDEIA | E DIR ENGINHEIRO EMPRESTAR AS PESSOAS DA ALDEIA |
| OITO | OITO |
| TRANCÁ-LOS | TRANCAUVOS |
| REALIZAR UMA INVESTIGAÇÃO PARA RESOLVER O PROBLEMA | REALIZAR UMA INVESTIGAÇÃO PARA RESOLVER O PROBLEMA |
| O YOUTUBE AINDA É A MELHOR PLATAFORMA DE VÍDEOS. | YOUTUBE AINDA É A MELHOR PLATAFOMA DE VÍDEOS |
| MENINA E MENINO BEIJANDO NAS SOMBRAS | MENINA E MENINO BEIJANDO NAS SOMBRAS |
| EU SOU O SENHOR | EU SOU O SENHOR |
| DUAS MULHERES QUE SENTAM-SE PARA BAIXO LENDO JORNAIS. | DUAS MIERES QUE SENTAM-SE PARA BAICLANE JODNÓI |
| EU ORIGINALMENTE ESPERAVA | EU ORIGINALMENTE ESPERAVA |

1. To evaluate on `mozilla-foundation/common_voice_6_0` with split `test`
2. To evaluate on `speech-recognition-community-v2/dev_data`

Citation

If you want to cite this model you can use this:
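On the 16 kHz requirement stated in this card: one way to catch mismatched input early is to read the sample rate from the WAV header before running inference. The sketch below uses only Python's standard `wave` module; the helper names are hypothetical, and `write_silence` exists only to create small fixture files for trying it out.

```python
import os
import tempfile
import wave

TARGET_RATE = 16_000  # sample rate required by the card


def wav_sample_rate(path):
    """Return the sample rate stored in a WAV file header."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate()


def needs_resampling(path, target_rate=TARGET_RATE):
    """True when the file must be resampled before inference."""
    return wav_sample_rate(path) != target_rate


def write_silence(path, rate, n_frames=160):
    """Write a short mono 16-bit silent WAV (fixture for the demo only)."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * n_frames)


with tempfile.TemporaryDirectory() as tmp:
    telephone = os.path.join(tmp, "telephone.wav")
    write_silence(telephone, 8_000)
    print(needs_resampling(telephone))  # True: 8 kHz audio must be resampled
```

Files that fail the check can be resampled while loading, for example with `librosa.load(path, sr=16_000)`.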
wav2vec2-large-xlsr-53-arabic
---
language: ar
datasets:
- common_voice
- arabic_speech_corpus
metrics:
- wer
- cer
tags:
- audio
- automatic-speech-recognition
- speech
- xlsr-fine-tuning-week
license: apache-2.0
model-index:
- name: XLSR Wav2Vec2 Arabic by Jonatas Grosman
  results:
  - task:
      name: Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice ar
      type: common_voice
      args: ar
    metrics:
    - name: Test WER
      type: wer
      value: 39.59
    - name: Test CER
      type: cer
      value: 18.18
---
wav2vec2-large-xlsr-53-chinese-zh-cn
---
language: zh
datasets:
- common_voice
metrics:
- wer
- cer
tags:
- audio
- automatic-speech-recognition
- speech
- xlsr-fine-tuning-week
license: apache-2.0
model-index:
- name: XLSR Wav2Vec2 Chinese (zh-CN) by Jonatas Grosman
  results:
  - task:
      name: Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice zh-CN
      type: common_voice
      args: zh-CN
    metrics:
    - name: Test WER
      type: wer
      value: 82.37
    - name: Test CER
      type: cer
      value: 19.03
---
wav2vec2-large-xlsr-53-dutch
---
language: nl
license: apache-2.0
datasets:
- common_voice
- mozilla-foundation/common_voice_6_0
metrics:
- wer
- cer
tags:
- audio
- automatic-speech-recognition
- hf-asr-leaderboard
- mozilla-foundation/common_voice_6_0
- nl
- robust-speech-event
- speech
- xlsr-fine-tuning-week
model-index:
- name: XLSR Wav2Vec2 Dutch by Jonatas Grosman
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice nl
      type: common_voice
      args: nl
    metrics:
wav2vec2-large-xlsr-53-persian
---
language: fa
datasets:
- common_voice
metrics:
- wer
- cer
tags:
- audio
- automatic-speech-recognition
- speech
- xlsr-fine-tuning-week
license: apache-2.0
model-index:
- name: XLSR Wav2Vec2 Persian by Jonatas Grosman
  results:
  - task:
      name: Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice fa
      type: common_voice
      args: fa
    metrics:
    - name: Test WER
      type: wer
      value: 30.12
    - name: Test CER
      type: cer
      value: 7.37
---
wav2vec2-large-xlsr-53-polish
---
language: pl
license: apache-2.0
datasets:
- common_voice
- mozilla-foundation/common_voice_6_0
metrics:
- wer
- cer
tags:
- audio
- automatic-speech-recognition
- hf-asr-leaderboard
- mozilla-foundation/common_voice_6_0
- pl
- robust-speech-event
- speech
- xlsr-fine-tuning-week
model-index:
- name: XLSR Wav2Vec2 Polish by Jonatas Grosman
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice pl
      type: common_voice
      args: pl
    metrics:
wav2vec2-large-xlsr-53-greek
---
language: el
datasets:
- common_voice
metrics:
- wer
- cer
tags:
- audio
- automatic-speech-recognition
- speech
- xlsr-fine-tuning-week
license: apache-2.0
model-index:
- name: XLSR Wav2Vec2 Greek by Jonatas Grosman
  results:
  - task:
      name: Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice el
      type: common_voice
      args: el
    metrics:
    - name: Test WER
      type: wer
      value: 11.62
    - name: Test CER
      type: cer
      value: 3.36
---
wav2vec2-large-xlsr-53-hungarian
---
language: hu
datasets:
- common_voice
metrics:
- wer
- cer
tags:
- audio
- automatic-speech-recognition
- speech
- xlsr-fine-tuning-week
license: apache-2.0
model-index:
- name: XLSR Wav2Vec2 Hungarian by Jonatas Grosman
  results:
  - task:
      name: Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice hu
      type: common_voice
      args: hu
    metrics:
    - name: Test WER
      type: wer
      value: 31.40
    - name: Test CER
      type: cer
      value: 6.20
---
wav2vec2-large-xlsr-53-english
---
language: en
datasets:
- common_voice
- mozilla-foundation/common_voice_6_0
metrics:
- wer
- cer
tags:
- audio
- automatic-speech-recognition
- en
- hf-asr-leaderboard
- mozilla-foundation/common_voice_6_0
- robust-speech-event
- speech
- xlsr-fine-tuning-week
license: apache-2.0
model-index:
- name: XLSR Wav2Vec2 English by Jonatas Grosman
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice en
      type: common_voice
      args: en
    metrics
wav2vec2-xls-r-1b-portuguese
wav2vec2-large-xlsr-53-finnish
Fine-tuned XLSR-53 large model for speech recognition in Finnish

Fine-tuned facebook/wav2vec2-large-xlsr-53 on Finnish using the train and validation splits of Common Voice 6.1 and CSS10. When using this model, make sure that your speech input is sampled at 16 kHz.

This model was fine-tuned thanks to the GPU credits generously provided by OVHcloud :)

The script used for training can be found here: https://github.com/jonatasgrosman/wav2vec2-sprint

The model can be used directly (without a language model) as follows...

| Reference | Prediction |
| ------------- | ------------- |
| MYSTEERIMIES OLI OPPINUT MORAALINSA TARUISTA, ELOKUVISTA JA PELEISTÄ. | MYSTEERIMIES OLI OPPINUT MORALINSA TARUISTA ELOKUVISTA JA PELEISTÄ |
| ÄÄNESTIN MIETINNÖN PUOLESTA! | ÄÄNESTIN MIETINNÖN PUOLESTA |
| VAIN TUNTIA AIKAISEMMIN OLIMME MIEHENI KANSSA TUNTENEET SUURINTA ILOA. | PAIN TUNTIA AIKAISEMMIN OLIN MIEHENI KANSSA TUNTENEET SUURINTA ILAA |
| ENSIMMÄISELLE MIEHELLE SAI KOLME LASTA. | ENSIMMÄISELLE MIEHELLE SAI KOLME LASTA |
| ÄÄNESTIN MIETINNÖN PUOLESTA, SILLÄ POHJIMMILTAAN SIINÄ VASTUSTETAAN TÄTÄ SUUNTAUSTA. | ÄÄNESTIN MIETINNÖN PUOLESTA SILLÄ POHJIMMILTAAN SIINÄ VASTOTTETAAN TÄTÄ SUUNTAUSTA |
| TÄHDENLENTOJENKO VARALTA MINÄ SEN OLISIN TÄNNE KUSKANNUT? | TÄHDEN LENTOJENKO VARALTA MINÄ SEN OLISIN TÄNNE KUSKANNUT |
| SIITÄ SE TULEE. | SIITA SE TULEE |
| NIIN, KUULUU KIROUS, JA KAUHEA KARJAISU. | NIIN KUULUU KIROUS JA KAUHEA KARJAISU |
| ARKIT KUN OVAT NÄES ELEMENTTIRAKENTEISIA. | ARKIT KUN OVAT MÄISS' ELÄMÄTTEROKENTEISIÄ |
| JÄIN ALUKSEN SISÄÄN, MUTTA KUULIN OVEN LÄPI, ETTÄ ULKOPUOLELLA ALKOI TAPAHTUA. | JAKALOKSEHÄN SISÄL MUTTA KUULIN OVENLAPI ETTÄ ULKA KUOLLALLA ALKOI TAPAHTUA |

The model can be evaluated as follows on the Finnish test data of Common Voice.

In the table below I report the Word Error Rate (WER) and the Character Error Rate (CER) of the model. I ran the evaluation script described above on other models as well (on 2021-04-21). Note that the table below may show results that differ from those already reported; this may be due to specifics of the other evaluation scripts used.

| Model | WER | CER |
| ------------- | ------------- | ------------- |
| aapot/wav2vec2-large-xlsr-53-finnish | 32.51% | 5.34% |
| Tommi/wav2vec2-large-xlsr-53-finnish | 35.22% | 5.81% |
| vasilis/wav2vec2-large-xlsr-53-finnish | 38.24% | 6.49% |
| jonatasgrosman/wav2vec2-large-xlsr-53-finnish | 41.60% | 8.23% |
| birgermoell/wav2vec2-large-xlsr-finnish | 53.51% | 9.18% |

Citation

If you want to cite this model you can use this:
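To make the two metrics reported in this card concrete, here is a minimal sketch of sentence-level WER and CER based on Levenshtein distance. It is not the evaluation script the card refers to. As assumptions for this illustration, the reference's trailing period is dropped and CER counts spaces as characters. The example pair is the "SIITÄ SE TULEE" row from the sample table.

```python
import unicodedata


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (row-by-row DP)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def _nfc(text):
    """Compose characters so that 'Ä' counts as a single symbol."""
    return unicodedata.normalize("NFC", text)


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = _nfc(reference).split()
    return edit_distance(ref_words, _nfc(hypothesis).split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edits over reference length."""
    ref_chars = _nfc(reference)
    return edit_distance(ref_chars, _nfc(hypothesis)) / len(ref_chars)


reference = "SIITÄ SE TULEE"   # trailing period dropped for this illustration
hypothesis = "SIITA SE TULEE"
print(f"WER = {wer(reference, hypothesis):.2%}")  # 1 of 3 words wrong: 33.33%
print(f"CER = {cer(reference, hypothesis):.2%}")  # 1 of 14 characters wrong: 7.14%
```

A single wrong letter costs a whole word in WER but only one character in CER, which is one reason the WER column in the model table is so much higher than the CER column.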
wav2vec2-large-xlsr-53-spanish
Fine-tuned XLSR-53 large model for speech recognition in Spanish

Fine-tuned facebook/wav2vec2-large-xlsr-53 on Spanish using the train and validation splits of Common Voice 6.1. When using this model, make sure that your speech input is sampled at 16 kHz.

This model was fine-tuned thanks to the GPU credits generously provided by OVHcloud :)

The script used for training can be found here: https://github.com/jonatasgrosman/wav2vec2-sprint

The model can be used directly (without a language model) as follows...

| Reference | Prediction |
| ------------- | ------------- |
| HABITA EN AGUAS POCO PROFUNDAS Y ROCOSAS. | HABITAN AGUAS POCO PROFUNDAS Y ROCOSAS |
| OPERA PRINCIPALMENTE VUELOS DE CABOTAJE Y REGIONALES DE CARGA. | OPERA PRINCIPALMENTE VUELO DE CARBOTAJES Y REGIONALES DE CARGAN |
| PARA VISITAR CONTACTAR PRIMERO CON LA DIRECCIÓN. | PARA VISITAR CONTACTAR PRIMERO CON LA DIRECCIÓN |
| TRES | TRES |
| REALIZÓ LOS ESTUDIOS PRIMARIOS EN FRANCIA, PARA CONTINUAR LUEGO EN ESPAÑA. | REALIZÓ LOS ESTUDIOS PRIMARIOS EN FRANCIA PARA CONTINUAR LUEGO EN ESPAÑA |
| EN LOS AÑOS QUE SIGUIERON, ESTE TRABAJO ESPARTA PRODUJO DOCENAS DE BUENOS JUGADORES. | EN LOS AÑOS QUE SIGUIERON ESTE TRABAJO ESPARTA PRODUJO DOCENA DE BUENOS JUGADORES |
| SE ESTÁ TRATANDO DE RECUPERAR SU CULTIVO EN LAS ISLAS CANARIAS. | SE ESTÓ TRATANDO DE RECUPERAR SU CULTIVO EN LAS ISLAS CANARIAS |
| SÍ | SÍ |
| FUE "SACADA" DE LA SERIE EN EL EPISODIO "LEAD", EN QUE ALEXANDRA CABOT REGRESÓ. | FUE SACADA DE LA SERIE EN EL EPISODIO LEED EN QUE ALEXANDRA KAOT REGRESÓ |
| SE UBICAN ESPECÍFICAMENTE EN EL VALLE DE MOKA, EN LA PROVINCIA DE BIOKO SUR. | SE UBICAN ESPECÍFICAMENTE EN EL VALLE DE MOCA EN LA PROVINCIA DE PÍOCOSUR |

1. To evaluate on `mozilla-foundation/common_voice_6_0` with split `test`
2. To evaluate on `speech-recognition-community-v2/dev_data`

Citation

If you want to cite this model you can use this:
wav2vec2-large-xlsr-53-german
Fine-tuned XLSR-53 large model for speech recognition in German

Fine-tuned facebook/wav2vec2-large-xlsr-53 on German using the train and validation splits of Common Voice 6.1. When using this model, make sure that your speech input is sampled at 16 kHz.

This model was fine-tuned thanks to the GPU credits generously provided by OVHcloud :)

The script used for training can be found here: https://github.com/jonatasgrosman/wav2vec2-sprint

The model can be used directly (without a language model) as follows...

| Reference | Prediction |
| ------------- | ------------- |
| ZIEHT EUCH BITTE DRAUSSEN DIE SCHUHE AUS. | ZIEHT EUCH BITTE DRAUSSEN DIE SCHUHE AUS |
| ES KOMMT ZUM SHOWDOWN IN GSTAAD. | ES KOMMT ZUG STUNDEDAUTENESTERKT |
| IHRE FOTOSTRECKEN ERSCHIENEN IN MODEMAGAZINEN WIE DER VOGUE, HARPER’S BAZAAR UND MARIE CLAIRE. | IHRE FOTELSTRECKEN ERSCHIENEN MIT MODEMAGAZINEN WIE DER VALG AT DAS BASIN MA RIQUAIR |
| FELIPE HAT EINE AUCH FÜR MONARCHEN UNGEWÖHNLICH LANGE TITELLISTE. | FELIPPE HAT EINE AUCH FÜR MONACHEN UNGEWÖHNLICH LANGE TITELLISTE |
| ER WURDE ZU EHREN DES REICHSKANZLERS OTTO VON BISMARCK ERRICHTET. | ER WURDE ZU EHREN DES REICHSKANZLERS OTTO VON BISMARCK ERRICHTET M |
| WAS SOLLS, ICH BIN BEREIT. | WAS SOLL'S ICH BIN BEREIT |
| DAS INTERNET BESTEHT AUS VIELEN COMPUTERN, DIE MITEINANDER VERBUNDEN SIND. | DAS INTERNET BESTEHT AUS VIELEN COMPUTERN DIE MITEINANDER VERBUNDEN SIND |
| DER URANUS IST DER SIEBENTE PLANET IN UNSEREM SONNENSYSTEM. | DER URANUS IST DER SIEBENTE PLANET IN UNSEREM SONNENSYSTEM |
| DIE WAGEN ERHIELTEN EIN EINHEITLICHES ERSCHEINUNGSBILD IN WEISS MIT ROTEM FENSTERBAND. | DIE WAGEN ERHIELTEN EIN EINHEITLICHES ERSCHEINUNGSBILD IN WEISS MIT ROTEM FENSTERBAND |
| SIE WAR DIE COUSINE VON CARL MARIA VON WEBER. | SIE WAR DIE COUSINE VON KARL-MARIA VON WEBER |

1. To evaluate on `mozilla-foundation/common_voice_6_0` with split `test`
2. To evaluate on `speech-recognition-community-v2/dev_data`

Citation

If you want to cite this model you can use this:
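On what "used directly (without a language model)" means in practice: wav2vec 2.0 models fine-tuned this way emit one token per audio frame and are decoded with CTC. The toy sketch below shows greedy CTC collapsing on hand-written frame tokens. The `_` blank and `|` word-delimiter symbols and the frame sequences are illustrative assumptions, not output from the model.

```python
BLANK = "_"  # illustrative symbol for the CTC blank token
SPACE = "|"  # illustrative word-delimiter symbol


def ctc_greedy_decode(frame_tokens, blank=BLANK, space=SPACE):
    """Collapse per-frame argmax tokens into text.

    1. Merge runs of identical consecutive tokens.
    2. Drop blank tokens.
    3. Turn the word delimiter into a space.
    """
    collapsed = []
    previous = None
    for token in frame_tokens:
        if token != previous:
            collapsed.append(token)
        previous = token
    text = "".join(token for token in collapsed if token != blank)
    return text.replace(space, " ").strip()


# A blank between the two L frames keeps the double letter in "HALLO".
print(ctc_greedy_decode(list("HH_AL_LOO")))      # HALLO
print(ctc_greedy_decode(list("II_CH||B_II_N")))  # ICH BIN
```

Because each frame is decided independently, nothing pushes the output toward real words, which fits non-words such as `STUNDEDAUTENESTERKT` in the prediction column.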
wav2vec2-large-xlsr-53-french
wav2vec2-xls-r-1b-english
Fine-tuned XLS-R 1B model for speech recognition in English

Fine-tuned facebook/wav2vec2-xls-r-1b on English using the train and validation splits of Common Voice 8.0, Multilingual LibriSpeech, TED-LIUMv3, and Voxpopuli. When using this model, make sure that your speech input is sampled at 16 kHz.

This model was fine-tuned with the HuggingSound tool, thanks to the GPU credits generously provided by OVHcloud :)

1. To evaluate on `mozilla-foundation/common_voice_8_0` with split `test`
2. To evaluate on `speech-recognition-community-v2/dev_data`

Citation

If you want to cite this model you can use this:
wav2vec2-xls-r-1b-russian
Fine-tuned XLS-R 1B model for speech recognition in Russian

Fine-tuned facebook/wav2vec2-xls-r-1b on Russian using the train and validation splits of Common Voice 8.0, Golos, and Multilingual TEDx. When using this model, make sure that your speech input is sampled at 16 kHz.

This model was fine-tuned with the HuggingSound tool, thanks to the GPU credits generously provided by OVHcloud :)

1. To evaluate on `mozilla-foundation/common_voice_8_0` with split `test`
2. To evaluate on `speech-recognition-community-v2/dev_data`

Citation

If you want to cite this model you can use this:
wav2vec2-xls-r-1b-spanish
Fine-tuned XLS-R 1B model for speech recognition in Spanish

Fine-tuned facebook/wav2vec2-xls-r-1b on Spanish using the train and validation splits of Common Voice 8.0, MediaSpeech, Multilingual TEDx, Multilingual LibriSpeech, and Voxpopuli. When using this model, make sure that your speech input is sampled at 16 kHz.

This model was fine-tuned with the HuggingSound tool, thanks to the GPU credits generously provided by OVHcloud :)

1. To evaluate on `mozilla-foundation/common_voice_8_0` with split `test`
2. To evaluate on `speech-recognition-community-v2/dev_data`

Citation

If you want to cite this model you can use this:
wav2vec2-large-xlsr-53-italian
wav2vec2-xls-r-1b-german
exp_w2v2t_en_wavlm_s990
wav2vec2-xls-r-1b-french
whisper-large-zh-cv11
This model is a fine-tuned version of openai/whisper-large-v2 on Chinese (Mandarin), trained on the train and validation splits of Common Voice 11. Not all of the validation split was used for training: I held out 1k samples from it for evaluation during fine-tuning.

I evaluated the model on the test splits of two datasets: Common Voice 11 (the same dataset used for fine-tuning) and Fleurs (not seen during fine-tuning). Because Whisper can transcribe casing and punctuation, I ran the evaluation in two scenarios, one on the raw text and one on normalized text (lowercased, with punctuation removed).

For Fleurs, I also evaluated a scenario that excludes samples whose transcriptions contain numerical values. Fleurs writes these values differently from Common Voice, the dataset used for fine-tuning, so this mismatch is expected to hurt the model's performance on such samples in Fleurs.
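The normalization and numeric-filter scenarios described above can be sketched as follows. This is an illustration under stated assumptions, not the author's evaluation code: punctuation is taken to mean any Unicode character whose category starts with `P`, and a sample counts as numeric when its reference contains a digit character. Both criteria and all function names are hypothetical.

```python
import unicodedata


def normalize_text(text):
    """Lowercase and strip punctuation (Unicode categories starting with 'P')."""
    kept = (ch for ch in text.lower()
            if not unicodedata.category(ch).startswith("P"))
    # Collapse any whitespace runs left behind by removed punctuation.
    return " ".join("".join(kept).split())


def has_digits(text):
    """Assumed numeric test: the text contains at least one digit character."""
    return any(ch.isdigit() for ch in text)


def keep_non_numeric(pairs):
    """Keep (reference, prediction) pairs whose reference has no digits."""
    return [(ref, hyp) for ref, hyp in pairs if not has_digits(ref)]


print(normalize_text("Hello, World!"))  # hello world
print(normalize_text("你好，世界。"))     # 你好世界
```

Applying `normalize_text` to both reference and prediction before scoring corresponds to the "text normalization" rows, and `keep_non_numeric` to the "keep only non-numeric samples" rows.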
Common Voice 11

| | CER | WER |
| --- | --- | --- |
| jonatasgrosman/whisper-large-zh-cv11 | 9.31 | 55.94 |
| jonatasgrosman/whisper-large-zh-cv11 + text normalization | 9.55 | 55.02 |
| openai/whisper-large-v2 | 33.33 | 101.80 |
| openai/whisper-large-v2 + text normalization | 29.90 | 95.91 |

Fleurs

| | CER | WER |
| --- | --- | --- |
| jonatasgrosman/whisper-large-zh-cv11 | 15.00 | 93.45 |
| jonatasgrosman/whisper-large-zh-cv11 + text normalization | 11.76 | 70.63 |
| jonatasgrosman/whisper-large-zh-cv11 + keep only non-numeric samples | 10.95 | 87.91 |
| jonatasgrosman/whisper-large-zh-cv11 + text normalization + keep only non-numeric samples | 7.83 | 62.12 |
| openai/whisper-large-v2 | 23.49 | 101.28 |
| openai/whisper-large-v2 + text normalization | 17.58 | 83.22 |
| openai/whisper-large-v2 + keep only non-numeric samples | 21.03 | 101.95 |
| openai/whisper-large-v2 + text normalization + keep only non-numeric samples | 15.22 | 79.28 |
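One way to read these tables is as a relative error reduction of the fine-tuned model over the openai/whisper-large-v2 baseline. The sketch below does that arithmetic for the CER values in the first table; the `relative_reduction` helper is a hypothetical name introduced here.

```python
def relative_reduction(baseline, finetuned):
    """Relative error reduction in percent, rounded to one decimal."""
    return round((baseline - finetuned) / baseline * 100, 1)


# CER values from the first table: baseline vs. fine-tuned model.
print(relative_reduction(33.33, 9.31))  # raw text: 72.1
print(relative_reduction(29.90, 9.55))  # normalized text: 68.1
```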