Oriserve


Whisper-Hindi2Hinglish-Prime

> A better version of this model is available: Oriserve/Whisper-Hindi2Hinglish-Apex

- GitHub: github link
- Speech-To-Text Arena: Speech-To-Text Arena

## Table of Contents

- Key Features
- Training
  - Data
  - Finetuning
- Usage
- Performance Overview
  - Qualitative Performance Overview
  - Quantitative Performance Overview
- Miscellaneous

## Key Features

1. **Hinglish as a language**: Adds the ability to transcribe audio into spoken Hinglish, reducing the chance of grammatical errors.
2. **Whisper architecture**: Based on the Whisper architecture, making the model easy to use with the `transformers` package.
3. **Better noise handling**: The model is resistant to noise and does not return transcriptions for audio that contains only noise.
4. **Hallucination mitigation**: Minimizes transcription hallucinations to enhance accuracy.
5. **Performance increase**: ~39% average performance increase over the pretrained model across benchmarking datasets.

## Training

### Data

- **Duration**: A total of ~550 hours of noisy Indian-accented Hindi audio was used to finetune the model.
- **Collection**: Because no ASR-ready Hinglish datasets were available, a specially curated proprietary dataset was used.
- **Labelling**: The data was labelled using a SOTA model, and the transcriptions were then improved by human review.
- **Quality**: Emphasis was placed on collecting noisy data, since the model's intended use case is Indian environments where background noise is abundant.
- **Processing**: It was ensured that the audios are all chunked into chunks of length …

## Performance Overview

### Qualitative Performance Overview

| Whisper Large V3 | Whisper-Hindi2Hinglish-Prime |
|------------------|------------------------------|
| maynata pura, canta maynata | Mehnat to poora karte hain. |
| Where did they come from? | Haan vahi ek aapko bataaya na. |
| A Pantral Logan. | Aap pandrah log hain. |
| Thank you, Sanchez. | Kitne saal ki? |
| Rangers, I can tell you. | Lander cycle chaahie. |
| Uh-huh. They can't. | Haan haan, dekhe hain. |
### Quantitative Performance Overview

Note:
- The WER scores below are for the Hinglish text generated by our model and by the original Whisper model.
- To check our model's real-world performance against other SOTA models, please head to our Speech-To-Text Arena space.

| Dataset | Whisper Large V3 (WER) | Whisper-Hindi2Hinglish-Prime (WER) |
|---------|------------------------|------------------------------------|
| Common-Voice | 61.9432 | 32.4314 |
| FLEURS | 50.8425 | 28.6806 |
| Indic-Voices | 82.5621 | 60.8224 |

## Usage

### Using Transformers

- To run the model, first install the Transformers library.
- The model can then be used with the `pipeline` class to transcribe audios of arbitrary length.

### Flash-Attention 2

Flash-Attention 2 can be used to speed up transcription. If your GPU supports Flash-Attention, first install Flash Attention; once installed, you can load the model with Flash-Attention enabled.

- Convert the Hugging Face checkpoint to a PyTorch model.

## Miscellaneous

This model is part of a family of transformer-based ASR models trained by Oriserve. To compare this model against other models from the same family, or against other SOTA models, please head to our Speech-To-Text Arena. To learn more about our other models, or for other queries regarding AI voice agents, you can reach us at [email protected]
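The WER (word error rate) figures in the table above are the word-level edit distance between hypothesis and reference, divided by the number of reference words. A minimal sketch of the metric (not the exact evaluation script used to produce these numbers; text normalization is omitted):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level Levenshtein distance / reference word count.

    Assumes a non-empty, whitespace-tokenized reference.
    """
    ref, hyp = reference.split(), hypothesis.split()
    # One-row dynamic-programming table for edit distance over words.
    d = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        prev, d[0] = d[0], i
        for j in range(1, len(hyp) + 1):
            cur = d[j]
            if ref[i - 1] == hyp[j - 1]:
                d[j] = prev            # match: no edit
            else:
                # substitution, deletion, or insertion
                d[j] = 1 + min(prev, d[j], d[j - 1])
            prev = cur
    return d[-1] / len(ref)
```

For example, `wer("the cat sat", "the cat")` is one deletion over three reference words, i.e. 1/3. Note that WER can exceed 1.0 when the hypothesis is much longer than the reference.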

license:apache-2.0

Whisper-Hindi2Hinglish-Swift

license:apache-2.0

Whisper-Hindi2Hinglish-Apex

license:apache-2.0