elgeish
# Wav2Vec2-Large-XLSR-53 Arabic
Fine-tuned facebook/wav2vec2-large-xlsr-53 on Arabic using the `train` splits of Common Voice and the Arabic Speech Corpus. When using this model, make sure that your speech input is sampled at 16 kHz.

## Usage

The model can be used directly (without a language model). It can also be evaluated on the Arabic test data of Common Voice. For more details, see Fine-Tuning with Arabic Speech Corpus.

This model represents Arabic in Buckwalter transliteration, a format that uses only ASCII characters, some of which are non-alphabetic (e.g., `">"` maps to `"أ"`). The lang-trans package is used to convert (transliterate) the Arabic abjad.

## Fine-Tuning Script

This script was used to first fine-tune facebook/wav2vec2-large-xlsr-53 on the `train` split of the Arabic Speech Corpus dataset, with the `test` split used for model selection; the resulting model at that point was saved as elgeish/wav2vec2-large-xlsr-53-levantine-arabic.

Training then resumed on the `train` split of the Common Voice dataset, with the `validation` split used for model selection, and was stopped to meet the deadline of Fine-Tune-XLSR Week: this model is the checkpoint at 100k steps, with a validation WER of 23.39%. Validation WER was still trending down, indicating that further training (resuming the decaying learning rate at 7e-6) could improve the model.

## Future Work

One area to explore is using `attention_mask` in model input, which is recommended here. Another is data augmentation using the datasets used to train the models listed here.