# BELLE-2
# Belle-whisper-large-v3-turbo-zh
## Welcome

If you find this model helpful, please like it and star us on https://github.com/LianjiaTech/BELLE and https://github.com/shuaijiang/Whisper-Finetune.

Belle-whisper-large-v3-turbo-zh is a fine-tuned version of whisper-large-v3-turbo with enhanced Chinese speech recognition capabilities. It demonstrates a 24-64% relative improvement over whisper-large-v3-turbo on Chinese ASR benchmarks, including AISHELL-1, AISHELL-2, WenetSpeech, and HKUST.

As with Belle-whisper-large-v3-zh-punct, the punctuation marks come from the punc_ct-transformer_cn-en-common-vocab471067-large model and were added to the training datasets.

## Fine-tuning

| Model | (Re)Sample Rate | Train Datasets | Fine-tuning (full or PEFT) |
|:----------------:|:-------:|:----------------------------------------:|:-----------:|
| Belle-whisper-large-v3-turbo-zh | 16 kHz | AISHELL-1, AISHELL-2, WenetSpeech, HKUST | full fine-tuning |

If you want to fine-tune the model on your own datasets, please refer to the GitHub repo https://github.com/shuaijiang/Whisper-Finetune.

## CER (%) ↓

| Model | Language Tag | AISHELL-1 test (↓) | AISHELL-2 test (↓) | WenetSpeech test_net (↓) | WenetSpeech test_meeting (↓) | HKUST dev (↓) |
|:----------------:|:-------:|:-----------:|:-----------:|:--------:|:-----------:|:-------:|
| whisper-large-v3 | Chinese | 8.085 | 5.475 | 11.72 | 20.15 | 28.597 |
| whisper-large-v3-turbo | Chinese | 8.639 | 6.014 | 13.507 | 20.313 | 37.324 |
| Belle-whisper-large-v3-turbo-zh | Chinese | 3.070 | 4.114 | 10.230 | 13.357 | 18.944 |

It is worth noting that, compared to whisper-large-v3 and whisper-large-v3-turbo, Belle-whisper-large-v3-turbo-zh improves significantly on every benchmark.

Please cite our paper and GitHub repo when using our code, data, or model.
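The 24-64% range quoted above can be reproduced directly from the CER table. A quick sketch in plain Python (the dictionary keys are my abbreviations of the benchmark column names):

```python
# CER (%) of the baseline and the fine-tuned model, from the table above.
turbo = {"aishell1": 8.639, "aishell2": 6.014, "wenet_net": 13.507,
         "wenet_meeting": 20.313, "hkust": 37.324}
belle = {"aishell1": 3.070, "aishell2": 4.114, "wenet_net": 10.230,
         "wenet_meeting": 13.357, "hkust": 18.944}

# Relative improvement = (baseline - fine-tuned) / baseline, in percent.
improvement = {k: 100 * (turbo[k] - belle[k]) / turbo[k] for k in turbo}
for name, pct in sorted(improvement.items(), key=lambda kv: kv[1]):
    print(f"{name}: {pct:.1f}%")
# The spread runs from about 24% (WenetSpeech test_net) to about 64% (AISHELL-1 test).
```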
# Belle-whisper-large-v3-zh
## Welcome

If you find this model helpful, please like it and star us on https://github.com/LianjiaTech/BELLE and https://github.com/shuaijiang/Whisper-Finetune.

Belle-whisper-large-v3-zh is a fine-tuned version of whisper-large-v3 with enhanced Chinese speech recognition capabilities. It demonstrates a 24-65% relative improvement over whisper-large-v3 on Chinese ASR benchmarks, including AISHELL-1, AISHELL-2, WenetSpeech, and HKUST.

## Fine-tuning

| Model | (Re)Sample Rate | Train Datasets | Fine-tuning (full or PEFT) |
|:----------------:|:-------:|:----------------------------------------:|:-----------:|
| Belle-whisper-large-v3-zh | 16 kHz | AISHELL-1, AISHELL-2, WenetSpeech, HKUST | full fine-tuning |

If you want to fine-tune the model on your own datasets, please refer to the GitHub repo https://github.com/shuaijiang/Whisper-Finetune.

## CER (%) ↓

| Model | Language Tag | AISHELL-1 test (↓) | AISHELL-2 test (↓) | WenetSpeech test_net (↓) | WenetSpeech test_meeting (↓) | HKUST dev (↓) |
|:----------------:|:-------:|:-----------:|:-----------:|:--------:|:-----------:|:-------:|
| whisper-large-v3 | Chinese | 8.085 | 5.475 | 11.72 | 20.15 | 28.597 |
| Belle-whisper-large-v2-zh | Chinese | 2.549 | 3.746 | 8.503 | 14.598 | 16.289 |
| Belle-whisper-large-v3-zh | Chinese | 2.781 | 3.786 | 8.865 | 11.246 | 16.440 |

It is worth noting that, compared to Belle-whisper-large-v2-zh, Belle-whisper-large-v3-zh improves significantly in complex acoustic scenes (such as WenetSpeech test_meeting).

Please cite our paper and GitHub repo when using our code, data, or model.
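A minimal inference sketch using the Hugging Face `transformers` pipeline (this usage pattern is an assumption on my part; see the Whisper-Finetune repo for the authors' supported workflow, and note that `audio.wav` is a hypothetical 16 kHz recording):

```python
def build_transcriber(model_id: str = "BELLE-2/Belle-whisper-large-v3-zh"):
    """Build an ASR pipeline that forces Chinese transcription."""
    from transformers import pipeline  # deferred: requires `transformers` and `torch`

    transcriber = pipeline("automatic-speech-recognition", model=model_id)
    # Force the decoder to transcribe Chinese instead of auto-detecting the language.
    transcriber.model.config.forced_decoder_ids = (
        transcriber.tokenizer.get_decoder_prompt_ids(language="zh", task="transcribe")
    )
    return transcriber

# Usage (downloads the model from the Hugging Face Hub):
#   transcription = build_transcriber()("audio.wav")  # hypothetical 16 kHz audio file
#   print(transcription["text"])
```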
# Belle-whisper-large-v3-zh-punct
## Welcome

If you find this model helpful, please like it and star us on https://github.com/LianjiaTech/BELLE and https://github.com/shuaijiang/Whisper-Finetune.

Belle-whisper-large-v3-zh-punct is a fine-tuned version of Belle-whisper-large-v3-zh that adds Chinese punctuation capabilities while maintaining comparable recognition performance: it performs on par with Belle-whisper-large-v3-zh on Chinese ASR benchmarks, including AISHELL-1, AISHELL-2, WenetSpeech, and HKUST. The punctuation marks come from the punc_ct-transformer_cn-en-common-vocab471067-large model and were added to the training datasets.

## Fine-tuning

| Model | (Re)Sample Rate | Train Datasets | Fine-tuning (full or PEFT) |
|:----------------:|:-------:|:----------------------------------------:|:-----------:|
| Belle-whisper-large-v3-zh-punct | 16 kHz | AISHELL-1, AISHELL-2, WenetSpeech, HKUST | LoRA fine-tuning |

To incorporate punctuation marks without compromising recognition performance, LoRA fine-tuning was employed. If you want to fine-tune the model on your own datasets, please refer to the GitHub repo https://github.com/shuaijiang/Whisper-Finetune.

## CER (%) ↓

| Model | Language Tag | AISHELL-1 test (↓) | AISHELL-2 test (↓) | WenetSpeech test_net (↓) | WenetSpeech test_meeting (↓) | HKUST dev (↓) |
|:----------------:|:-------:|:-----------:|:-----------:|:--------:|:-----------:|:-------:|
| whisper-large-v3 | Chinese | 8.085 | 5.475 | 11.72 | 20.15 | 28.597 |
| Belle-whisper-large-v3-zh | Chinese | 2.781 | 3.786 | 8.865 | 11.246 | 16.440 |
| Belle-whisper-large-v3-zh-punct | Chinese | 2.945 | 3.808 | 8.998 | 10.973 | 17.196 |

It is worth noting that, compared to Belle-whisper-large-v3-zh, Belle-whisper-large-v3-zh-punct even improves slightly in complex acoustic scenes (such as WenetSpeech test_meeting). The punctuation marks in its output are removed before computing the CER.

Please cite our paper and GitHub repo when using our code, data, or model.
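The scoring convention above (punctuation is stripped before computing CER) can be sketched as a small routine. This is a minimal illustration of the metric, not the authors' evaluation script:

```python
import unicodedata

def strip_punct(text: str) -> str:
    """Remove punctuation (ASCII and CJK) before scoring."""
    return "".join(ch for ch in text
                   if not unicodedata.category(ch).startswith("P"))

def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance / reference length, punctuation removed."""
    ref, hyp = strip_punct(ref), strip_punct(hyp)
    # Standard dynamic-programming Levenshtein distance over characters.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution
        prev = cur
    return prev[-1] / len(ref)

print(cer("今天天气很好。", "今天天气真好"))  # 1 substitution over 6 characters, ~0.167
```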
# Belle-whisper-large-v2-zh
## Welcome

If you find this model helpful, please like it and star us on https://github.com/LianjiaTech/BELLE and https://github.com/shuaijiang/Whisper-Finetune.

Belle-whisper-large-v2-zh is a fine-tuned version of whisper-large-v2 with enhanced Chinese speech recognition capabilities. It demonstrates a 30-70% relative improvement over whisper-large-v2 on Chinese ASR benchmarks, including AISHELL-1, AISHELL-2, WenetSpeech, and HKUST.

## Fine-tuning

| Model | (Re)Sample Rate | Train Datasets | Fine-tuning (full or PEFT) |
|:----------------:|:-------:|:----------------------------------------:|:-----------:|
| Belle-whisper-large-v2-zh | 16 kHz | AISHELL-1, AISHELL-2, WenetSpeech, HKUST | full fine-tuning |

If you want to fine-tune the model on your own datasets, please refer to the GitHub repo https://github.com/shuaijiang/Whisper-Finetune.

## CER (%) ↓

| Model | Language Tag | AISHELL-1 test (↓) | AISHELL-2 test (↓) | WenetSpeech test_net (↓) | WenetSpeech test_meeting (↓) | HKUST dev (↓) |
|:----------------:|:-------:|:-----------:|:-----------:|:--------:|:-----------:|:-------:|
| whisper-large-v2 | Chinese | 8.818 | 6.183 | 12.343 | 26.413 | 31.917 |
| Belle-whisper-large-v2-zh | Chinese | 2.549 | 3.746 | 8.503 | 14.598 | 16.289 |

Please cite our paper and GitHub repo when using our code, data, or model.
# Belle-distilwhisper-large-v2-zh
## Welcome

If you find this model helpful, please like it and star us on https://github.com/LianjiaTech/BELLE and https://github.com/shuaijiang/Whisper-Finetune.

Belle-distilwhisper-large-v2-zh is a fine-tuned version of distilwhisper-large-v2 with enhanced Chinese speech recognition capabilities. Like distilwhisper-large-v2, it is 5.8 times faster than whisper-large-v2 and has 51% fewer parameters. Despite the smaller size, it achieves a relative improvement of -3% to 35% over whisper-large-v2 on Chinese ASR benchmarks. Note that the original distilwhisper-large-v2 cannot transcribe Chinese (it only outputs English).

## Fine-tuning

| Model | (Re)Sample Rate | Train Datasets | Fine-tuning (full or PEFT) |
|:----------------:|:-------:|:----------------------------------------:|:-----------:|
| Belle-distilwhisper-large-v2-zh | 16 kHz | AISHELL-1, AISHELL-2, WenetSpeech, HKUST | full fine-tuning |

If you want to fine-tune the model on your own datasets, please refer to the GitHub repo https://github.com/shuaijiang/Whisper-Finetune.

## CER (%) ↓

| Model | Parameters (M) | Language Tag | AISHELL-1 test (↓) | AISHELL-2 test (↓) | WenetSpeech test_net (↓) | WenetSpeech test_meeting (↓) | HKUST dev (↓) |
|:----------------:|:-------:|:-------:|:-----------:|:-----------:|:--------:|:-----------:|:-------:|
| whisper-large-v2 | 1550 | Chinese | 8.818 | 6.183 | 12.343 | 26.413 | 31.917 |
| distilwhisper-large-v2 | 756 | Chinese | - | - | - | - | - |
| Belle-distilwhisper-large-v2-zh | 756 | Chinese | 5.958 | 6.477 | 12.786 | 17.039 | 20.771 |

Please cite our paper and GitHub repo when using our code, data, or model.
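The 51% parameter reduction quoted above follows directly from the parameter counts in the table:

```python
# Parameter counts in millions, taken from the CER table above.
whisper_large_v2_params = 1550
belle_distil_params = 756

# Relative reduction = (baseline - distilled) / baseline, in percent.
reduction = 100 * (whisper_large_v2_params - belle_distil_params) / whisper_large_v2_params
print(f"{reduction:.0f}% fewer parameters")  # → "51% fewer parameters"
```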