H5N1AIDS


Transcribe_and_Translate_Subtitles

A powerful, privacy-first tool for transcribing and translating video subtitles.

## Setup

1. **Download models**: Get the models you need from HuggingFace. Download only the ones you want, and keep the folder paths as currently defined; there is no need to download everything.
2. **Download the script**: Place `run.py` in your `Transcribe_and_Translate_Subtitles` folder.
3. **Add media**: Place your audio/video files in `Transcribe_and_Translate_Subtitles/Media/`.
4. **Run**: Execute `python run.py` from inside the `Transcribe_and_Translate_Subtitles` folder, then open the web interface.
5. **Results**: Find your processed subtitles in:

## 🔇 Noise Reduction Models

- DFSMN
- GTCRN
- ZipEnhancer
- Mel-Band-Roformer
- MossFormerGANSE16K
- MossFormer2SE48K

## 🎤 Voice Activity Detection (VAD)

- Faster-Whisper-Silero
- Official-Silero-v6
- HumAware
- NVIDIA-NeMo-VAD-v2.0
- TEN-VAD
- Pyannote-Segmentation-3.0
  - Note: You need to accept Pyannote's terms of use and download Pyannote's `pytorch_model.bin` file, then place it in the `VAD/pyannote_segmentation` folder.
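The Pyannote step above can be scripted with the `huggingface-cli` tool. A minimal sketch, assuming the checkpoint is hosted in the `pyannote/segmentation-3.0` repository (an assumption based on the model name above) and that you have already accepted its terms and logged in:

```shell
# Assumption: the gated checkpoint lives at pyannote/segmentation-3.0.
# Accept the terms on its HuggingFace page first, then authenticate:
#   huggingface-cli login
# Fetch only pytorch_model.bin into the folder the app expects:
huggingface-cli download pyannote/segmentation-3.0 pytorch_model.bin \
  --local-dir VAD/pyannote_segmentation
```

Because the repository is gated, the download fails without a logged-in account that has accepted the terms.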
## 🗣️ Speech Recognition (ASR)

Multilingual models:

- SenseVoice-Small-Multilingual
- Dolphin-Small-Asian (Asian languages)
- Paraformer-Large-Chinese (Chinese)
- Paraformer-Large-English (English)
- FireRedASR-AED-L (Chinese)
- Official-Whisper-Large-v3-Multilingual
- Official-Whisper-Large-v3-Turbo-Multilingual
- CrisperWhisper-Multilingual

Region fine-tuned Whisper models:

- Arabic
- Basque
- Cantonese-Yue
- Chinese
- Chinese-Hakka (Taiwanese Hakka)
- Chinese-Minnan (Taiwanese Minnan)
- Chinese-Taiwan
- Danish
- English-Indian
- English-v3.5
- French
- German-Swiss
- German
- Greek
- Italian
- Japanese-Anime
- Japanese
- Korean
- Malaysian
- Persian
- Polish
- Portuguese
- Russian
- Serbian
- Spanish
- Thai
- Turkish
- Urdu
- Vietnamese

## 🤖 Translation Models (LLM)

- Qwen-3-4B-Instruct-2507-Abliterated
- Qwen-3-8B-Abliterated
- Hunyuan-MT-7B-Abliterated
- Seed-X-PRO-7B

## Supported Hardware

Apple CoreML · AMD ROCm · Intel OpenVINO · NVIDIA CUDA · Windows DirectML

## Benchmarks

Test conditions: Ubuntu 24.04, Intel i3-12300, 7602-second video

| OS | Backend | Denoiser | VAD | ASR | LLM | Real-Time Factor |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| Ubuntu-24.04 | CPU i3-12300 | - | Silero | SenseVoiceSmall | - | 0.08 |
| Ubuntu-24.04 | CPU i3-12300 | GTCRN | Silero | SenseVoiceSmall | Qwen2.5-7B-Instruct | 0.50 |
| Ubuntu-24.04 | CPU i3-12300 | GTCRN | FSMN | SenseVoiceSmall | - | 0.054 |
| Ubuntu-24.04 | CPU i3-12300 | ZipEnhancer | FSMN | SenseVoiceSmall | - | 0.39 |
| Ubuntu-24.04 | CPU i3-12300 | GTCRN | Silero | Whisper-Large-V3 | - | 0.20 |
| Ubuntu-24.04 | CPU i3-12300 | GTCRN | FSMN | Whisper-Large-V3-Turbo | - | 0.148 |

## Common Issues

- **Silero VAD error**: Simply restart the application on first run.
- **libc++ error (Linux)**:
- **Apple Silicon**: Avoid installing `onnxruntime-openvino`, as it will cause errors.

## Updates

### 🆕 2025/9/19 - Major Release

- ✅ Added ASR: 28 region fine-tuned Whisper models
- ✅ Added denoiser: MossFormer2SE48K
- ✅ Added LLM models:
  - Qwen3-4B-Instruct-2507-abliterated
  - Qwen3-8B-abliterated-v2
  - Hunyuan-MT-7B-abliterated
  - Seed-X-PRO-7B
- ✅ Performance improvements:
  - Applied beam search and repeat penalty for Whisper-like ASR models
  - Applied ONNX Runtime IOBinding for maximum speed-up (10%+ faster than a plain `ort_session.run()`)
  - Support for 20-second audio segments per single inference run
  - Improved multi-threading performance
- ✅ Hardware support expansion:
  - AMD ROCm execution provider
  - AMD MIGraphX execution provider
  - NVIDIA TensorRTX execution provider
  - (The environment must be configured first, or these will not work.)
- ✅ Accuracy improvements: SenseVoice, Paraformer, FireRedASR, Dolphin, ZipEnhancer, MossFormerGANSE16K, NVIDIA-NeMo-VAD
- ✅ Speed improvements:
  - MelBandRoformer (speed boost by converting to a mono channel)
- ❌ Removed models: FSMN-VAD, Qwen3-4B-Official, Qwen3-8B-Official, Gemma3-4B-it, Gemma3-12B-it, InternLM3, Phi-4-Instruct

### 2025/7/5 - Noise Reduction Enhancement

- ✅ Added noise reduction model: MossFormerGANSE16K

### 2025/6/11 - VAD Models Expansion

- ✅ Added VAD models: HumAware-VAD, NVIDIA-NeMo-VAD, TEN-VAD

### 2025/6/3 - Asian Language Support

- ✅ Added the Dolphin ASR model to support Asian languages

### 2025/5/13 - GPU Acceleration

- ✅ Added Float16/32 ASR models to support CUDA/DirectML GPU usage
- ✅ GPU performance: these models can achieve >99% GPU operator deployment

### 2025/5/9 - Major Feature Release

- ✅ Flexibility improvements:
  - Added an option to run without VAD (Voice Activity Detection)
- ✅ Added models:
  - Noise reduction: MelBandRoformer
  - ASR: CrisperWhisper
  - ASR: Whisper-Large-v3.5-Distil (English fine-tuned)
  - ASR: FireRedASR-AED-L (Chinese + dialects support)
  - Three Japanese anime fine-tuned Whisper models
- ✅ Performance optimizations:
  - Removed the IPEX-LLM framework to improve overall performance
  - Dropped LLM quantization options; standardized on the Q4F32 format
  - Improved Whisper-series inference speed by over 10%
- ✅ Accuracy improvements:
  - Improved FSMN-VAD accuracy
  - Improved Paraformer recognition accuracy
  - Improved SenseVoice recognition accuracy
- ✅ LLMs with 100% ONNX Runtime GPU operator deployment:
  - Qwen3-4B/8B
  - InternLM3-8B
  - Phi-4-mini-Instruct
  - Gemma3-4B/12B-it
- ✅ Hardware support expansion:
  - Intel OpenVINO
  - NVIDIA CUDA GPU
  - Windows DirectML GPU (supports integrated and discrete GPUs)

## Roadmap

- [ ] Video upscaling - enhance resolution
- [ ] Real-time player - live transcription and translation
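The Real-Time Factor column in the benchmark table above is processing time divided by media duration, so multiplying it by the clip length gives the expected wall-clock time. A quick sketch for the 7602-second test video:

```python
def processing_seconds(rtf: float, media_seconds: float) -> float:
    """Real-Time Factor = processing time / media duration,
    so processing time = RTF * media duration."""
    return rtf * media_seconds

VIDEO_SECONDS = 7602  # test video length from the benchmark table

# RTF 0.08 (Silero + SenseVoiceSmall, no denoiser or LLM)
print(round(processing_seconds(0.08, VIDEO_SECONDS)))  # 608 seconds
# RTF 0.50 (GTCRN + Silero + SenseVoiceSmall + Qwen2.5-7B-Instruct)
print(round(processing_seconds(0.50, VIDEO_SECONDS)))  # 3801 seconds
```

In other words, the fastest CPU-only pipeline transcribes the two-hour test video in about ten minutes, while adding the LLM translation step takes roughly half the video's runtime.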

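The hardware back-ends listed above map onto ONNX Runtime execution providers. A minimal sketch of the usual selection pattern, preferring accelerators and falling back to CPU; the provider names are real ONNX Runtime identifiers, but the availability list below is a hypothetical example (in real code it would come from `onnxruntime.get_available_providers()`):

```python
def pick_providers(available: list[str], preferred: list[str]) -> list[str]:
    """Keep the preferred providers that are actually available,
    always ending with the CPU provider as a fallback."""
    return [p for p in preferred if p in available] + ["CPUExecutionProvider"]

PREFERRED = [
    "CUDAExecutionProvider",      # NVIDIA CUDA
    "ROCMExecutionProvider",      # AMD ROCm
    "MIGraphXExecutionProvider",  # AMD MIGraphX
    "OpenVINOExecutionProvider",  # Intel OpenVINO
    "DmlExecutionProvider",       # Windows DirectML
    "CoreMLExecutionProvider",    # Apple CoreML
]

# Hypothetical machine with only CUDA and CPU available:
print(pick_providers(["CUDAExecutionProvider", "CPUExecutionProvider"], PREFERRED))
# → ['CUDAExecutionProvider', 'CPUExecutionProvider']
```

The resulting list is what you would pass as the `providers` argument when creating an `onnxruntime.InferenceSession`.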

F5-TTS-ONNX

license:apache-2.0

Qwen_Android_ONNX_Runtime

license:apache-2.0