# alvanlii

4 models

## whisper-small-cantonese

This model is a fine-tuned version of openai/whisper-small for Cantonese. It achieves a CER of 7.93 (without punctuation) and 9.72 (with punctuation) on Common Voice 16.0.

### Training and evaluation data

For training, the following datasets were used:

- CantoMap: Winterstein, Grégoire, Tang, Carmen and Lai, Regine (2020) "CantoMap: a Hong Kong Cantonese MapTask Corpus", in Proceedings of The 12th Language Resources and Evaluation Conference, Marseille: European Language Resources Association, p. 2899–2906.
- Cantonese-ASR: Yu, Tiezheng, Frieske, Rita, Xu, Peng, Cahyawijaya, Samuel, Yiu, Cheuk Tung, Lovenia, Holy, Dai, Wenliang, Barezi, Elham, Chen, Qifeng, Ma, Xiaojuan, Shi, Bertram, Fung, Pascale (2022) "Automatic Speech Recognition Datasets in Cantonese: A Survey and New Dataset", 2022. Link: https://arxiv.org/pdf/2201.02419.pdf

| Name | # of Hours |
|--|--|
| Common Voice 16.0 zh-HK Train | 138 |
| Common Voice 16.0 yue Train | 85 |
| Common Voice 17.0 yue Train | 178 |
| Cantonese-ASR | 72 |
| CantoMap | 23 |
| Pseudo-Labelled YouTube Data | 438 |

For evaluation, the Common Voice 16.0 yue Test set is used.

### Results

- CER (lower is better): 0.0972, down from 0.1073 and 0.1581 in previous versions
- CER (punctuation removed): 0.0793
- GPU inference with fast attention (sketched below): 0.055 s/sample
  - Note: all GPU evaluations were done on an RTX 3090 GPU
- GPU inference: 0.308 s/sample
- CPU inference: 2.57 s/sample
- GPU VRAM: ~1.5 GB

### Model Speedup

Just add `attn_implementation="sdpa"` for Flash Attention. Using Flash Attention reduced the time taken per sample from 0.308 s to 0.055 s (see the inference sketch after the hyperparameters below).

### Speculative Decoding

You can use a bigger model and then `alvanlii/whisper-small-cantonese` as an assistant to speed up inference with essentially no loss in accuracy. On its own, the original `simonl0909/whisper-large-v2-cantonese` model runs at 0.714 s/sample with a CER of 7.65. Using speculative decoding with `alvanlii/whisper-small-cantonese`, it runs at 0.137 s/sample with a CER of 7.67, which is much faster (a sketch follows below).

### Whisper.cpp

A GGML bin file for whisper.cpp was uploaded in June 2024. You can download the bin file here and try it out here.

### Whisper CT2

For use in WhisperX or FasterWhisper, a CT2 (CTranslate2) file is needed. The converted model is available here.

### Training Hyperparameters

- learning_rate: 5e-5
- train_batch_size: 25 (on one RTX 3090 GPU)
- eval_batch_size: 8
- gradient_accumulation_steps: 4
- total_train_batch_size: 25 × 4 = 100
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-8
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- training_steps: 15000
- augmentation: none
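To make the SDPA speedup above concrete, here is a minimal inference sketch using Hugging Face transformers (4.36+ for `attn_implementation`). The model id comes from this card; the CUDA device and the 16 kHz mono file `audio.wav` are placeholder assumptions.

```python
# Minimal sketch: load the card's model with SDPA attention and transcribe
# one file. "cuda" and "audio.wav" are assumptions, not part of the card.
import soundfile as sf
import torch
from transformers import WhisperForConditionalGeneration, WhisperProcessor

model_id = "alvanlii/whisper-small-cantonese"
model = WhisperForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    attn_implementation="sdpa",  # the one-line change described above
).to("cuda")
processor = WhisperProcessor.from_pretrained(model_id)

audio, sr = sf.read("audio.wav")  # expected to be 16 kHz mono
features = processor(audio, sampling_rate=sr, return_tensors="pt").input_features
features = features.to("cuda", dtype=torch.float16)

pred_ids = model.generate(features)
print(processor.batch_decode(pred_ids, skip_special_tokens=True)[0])
```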
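The speculative-decoding setup described above can be reproduced with transformers' `assistant_model` hook. A hedged sketch, assuming a CUDA GPU and a placeholder `audio.wav`; the two model ids are the ones named in this card.

```python
# Sketch of speculative decoding: the small Cantonese model drafts tokens
# and the large model only verifies them, which is where the speedup comes from.
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

device = "cuda:0" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if torch.cuda.is_available() else torch.float32

# Main model: the larger Cantonese Whisper named in this card.
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    "simonl0909/whisper-large-v2-cantonese", torch_dtype=dtype
).to(device)
processor = AutoProcessor.from_pretrained("simonl0909/whisper-large-v2-cantonese")

# Draft/assistant model: this card's small model.
assistant = AutoModelForSpeechSeq2Seq.from_pretrained(
    "alvanlii/whisper-small-cantonese", torch_dtype=dtype
).to(device)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    torch_dtype=dtype,
    device=device,
    generate_kwargs={"assistant_model": assistant},
)
print(pipe("audio.wav")["text"])  # "audio.wav" is a placeholder
```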
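For the CT2 conversion, a typical consumer is the faster-whisper package. A sketch under the assumption that you have downloaded the converted weights to a local directory; the path below is a placeholder, since the card's download link is not reproduced here.

```python
# Hedged sketch using faster-whisper with the CT2 conversion of this model.
# "path/to/whisper-small-cantonese-ct2" and "audio.wav" are placeholders.
from faster_whisper import WhisperModel

model = WhisperModel("path/to/whisper-small-cantonese-ct2", device="cuda")
segments, info = model.transcribe("audio.wav")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```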
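The hyperparameters listed above map naturally onto transformers' `Seq2SeqTrainingArguments`. The sketch below is a hedged reconstruction, not the author's actual training script; `output_dir` and `fp16` are assumptions not stated in the card.

```python
# Hedged reconstruction of the listed hyperparameters. Adam betas and
# epsilon are left at their defaults, which match the values in the card.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-cantonese",  # placeholder
    learning_rate=5e-5,
    per_device_train_batch_size=25,          # one RTX 3090
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,           # effective batch = 25 * 4 = 100
    lr_scheduler_type="linear",
    warmup_steps=500,
    max_steps=15000,
    fp16=True,                               # assumption
)
```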

License: apache-2.0

## wav2vec2-BERT-cantonese

License: apache-2.0

## distil-whisper-small-cantonese

License: apache-2.0

## canto-llasa-1b

Tags: llama