FluidInference

32 models

parakeet-tdt-0.6b-v3-coreml

🧃 parakeet-tdt-0.6b-v3: Multilingual Speech-to-Text Model (Core ML)

Discord: https://discord.gg/WNsvaCtmDe | GitHub: https://github.com/FluidInference/FluidAudio

On-device multilingual ASR model converted to Core ML for Apple platforms. This model powers FluidAudio's batch ASR and is the same model used in our backend. It supports 25 European languages and is optimized for low-latency, private, offline transcription. Conversion script and benchmarks: https://github.com/FluidInference/mobius/tree/main/models/tts/parakeet-tdt-v3-0.6b/coreml

Highlights:
- Core ML: runs fully on-device (ANE/CPU) on Apple Silicon.
- Multilingual: 25 European languages; see model usage in FluidAudio for examples.
- Performance: ~110× RTF on M4 Pro for batch ASR (1 min of audio ≈ 0.5 s).
- Privacy: no network calls required once models are downloaded.

Use cases:
- Batch transcription of complete audio files on macOS/iOS.
- Local dictation and note-taking apps where privacy and latency matter.
- Embedded ASR in production apps via the FluidAudio Swift framework.

Model details:
- Architecture: Parakeet TDT v3 (Token Duration Transducer, 0.6B parameters)
- Input audio: 16 kHz, mono, Float32 PCM in range [-1, 1]
- Languages: 25 European languages (multilingual)
- Precision: mixed precision optimized for Core ML execution (ANE/CPU)
- Real-time factor (RTF): ~110× on M4 Pro in batch mode
- Throughput and latency vary with device, input duration, and compute units (ANE/CPU).

For the quickest integration, use the FluidAudio Swift framework, which handles model loading, audio preprocessing, and decoding. For more examples (including CLI usage and benchmarking), see the FluidAudio repository: https://github.com/FluidInference/FluidAudio

Included:
- Core ML model artifacts suitable for use via the FluidAudio APIs (preferred) or directly with Core ML.
- Tokenizer and configuration assets, included and managed by FluidAudio's loaders.

Limitations: primary coverage is European languages; performance may degrade for non-European languages.

Apache 2.0. See the FluidAudio repository for details and usage guidance.
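The Float32 PCM input format above can be illustrated with a small preprocessing sketch. This is stdlib-only Python for clarity; `pcm16_to_model_input` is a hypothetical helper name, not part of the FluidAudio API:

```python
import math

def pcm16_to_model_input(pcm):
    """Scale 16-bit PCM integer samples to floats in [-1, 1],
    the input range the model card specifies (16 kHz, mono, Float32)."""
    return [max(-1.0, min(1.0, s / 32768.0)) for s in pcm]

# 10 ms of a 440 Hz tone at 16 kHz, quantized to int16-range values
pcm = [int(32767 * math.sin(2 * math.pi * 440 * n / 16000)) for n in range(160)]
samples = pcm16_to_model_input(pcm)
```

In a real app the samples would come from a decoded audio file or the microphone, already resampled to 16 kHz mono before this scaling step.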

license:cc-by-4.0
391,582
35

parakeet-ctc-110m-coreml

license:cc-by-4.0
10,878
1

parakeet-realtime-eou-120m-coreml

8,330
4

speaker-diarization-coreml

license:cc-by-4.0
7,075
32

silero-vad-coreml

license:mit
4,674
8

parakeet-tdt-0.6b-v2-coreml

license:cc-by-4.0
2,024
6

diar-streaming-sortformer-coreml

license:cc-by-4.0
940
3

kokoro-82m-coreml

Based on the original kokoro model; see https://github.com/FluidInference/FluidAudio for inference. We generated the same strings to produce audio between 1 s and ~300 s in order to test speed across a range of varying inputs on the PyTorch CPU, MPS, and MLX pipelines, and compared them against the native Swift version with Core ML models. Each pipeline warmed up the models by running through once with pseudo inputs, and we then compared raw inference time with the model already loaded. For the Core ML model, we traded lower memory and very slightly faster inference for a longer initial warm-up. Note that the kokoro model in PyTorch has a memory leak issue: https://github.com/hexgrad/kokoro/issues/152 The following tests were run on an M4 Pro MacBook Pro with 48 GB RAM. If you have another device, please do try replicating them! I wasn't able to run the MPS model for longer durations; even with `PYTORCH_ENABLE_MPS_FALLBACK=1` enabled, it kept crashing on the longer strings. Note that it takes ~15 s to compile the model on the first run; subsequent runs are shorter, and we expect ~2 s to load.
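The warm-up-then-measure methodology described above can be sketched as a small timing harness. This is a generic sketch; `run_pipeline` stands in for any of the compared TTS pipelines:

```python
import time

def benchmark(run_pipeline, warmup_input, inputs):
    """Run once with a pseudo input to absorb load/compile cost,
    then time raw inference for each real input."""
    run_pipeline(warmup_input)  # warm-up pass, excluded from timings
    timings = []
    for text in inputs:
        start = time.perf_counter()
        run_pipeline(text)
        timings.append(time.perf_counter() - start)
    return timings

# Dummy pipeline standing in for a real TTS call
timings = benchmark(lambda text: len(text), "warm-up",
                    ["short", "medium text", "longer input string"])
```

Comparing per-input timings across pipelines this way keeps one-time costs (model load, graph compilation) out of the numbers, which matters here because the Core ML model trades a longer warm-up for faster steady-state inference.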

license:apache-2.0
621
8

pocket-tts-coreml

license:cc-by-4.0
394
0

qwen3-asr-0.6b-coreml

license:apache-2.0
295
7

parakeet-tdt-0.6b-v2-ov

license:cc-by-4.0
55
0

parakeet-0.6b-ja-coreml

license:cc-by-4.0
53
0

cohere-transcribe-03-2026-coreml

46
2

whisper-large-v3-turbo-int4-ov-npu

Model creator: OpenAI. Original model: whisper-large-v3-turbo. Description: This is the whisper-large-v3-turbo model converted to the OpenVINO™ IR (Intermediate Representation) format with weights compressed to INT4, targeting Intel NPUs. The provided OpenVINO™ IR model is compatible with: OpenVINO version 2025.2.0 and higher; Optimum Intel 1.23.0 and higher.

license:apache-2.0
25
0

nemotron-speech-streaming-en-0.6b-coreml

21
1

whisper-large-v3-turbo-fp16-ov-npu

Model creator: OpenAI. Original model: whisper-large-v3-turbo. Description: This is the whisper-large-v3-turbo model converted to the OpenVINO™ IR (Intermediate Representation) format with weights compressed to FP16, targeting Intel NPUs. The provided OpenVINO™ IR model is compatible with: OpenVINO version 2025.2.0 and higher; Optimum Intel 1.23.0 and higher.

license:apache-2.0
18
0

ls-eend-coreml

license:mit
17
1

qwen3-8b-int4-ov-npu

Model converted specifically for NPU on Intel devices.

license:apache-2.0
13
1

whisper-large-v3-turbo-int8-ov-npu

Model creator: OpenAI. Original model: whisper-large-v3-turbo. Description: This is the whisper-large-v3-turbo model converted to the OpenVINO™ IR (Intermediate Representation) format with weights compressed to INT8, targeting Intel NPUs. The provided OpenVINO™ IR model is compatible with: OpenVINO version 2025.2.0 and higher; Optimum Intel 1.23.0 and higher.

```python
#!/usr/bin/env python3
import time
from pathlib import Path

import librosa
import openvino_genai
import requests
from huggingface_hub import snapshot_download


def download_model(model_id="FluidInference/whisper-large-v3-turbo-int8-ov-npu"):
    """Download model from HuggingFace Hub"""
    local_dir = Path("models") / model_id.split("/")[-1]
    if local_dir.exists() and any(local_dir.iterdir()):
        return str(local_dir)
    print("Downloading model...")
    snapshot_download(
        repo_id=model_id,
        local_dir=str(local_dir),
        local_dir_use_symlinks=False,
    )
    return str(local_dir)


def download_hf_audio_samples():
    """Download audio samples from Hugging Face"""
    samples_dir = Path("sample_audios")
    samples_dir.mkdir(exist_ok=True)
    downloaded = []
    whisper_samples = [
        ("https://cdn-media.huggingface.co/speech_samples/sample1.flac", "sample1.flac"),
        ("https://cdn-media.huggingface.co/speech_samples/sample2.flac", "sample2.flac"),
    ]
    for url, filename in whisper_samples:
        filepath = samples_dir / filename
        if filepath.exists():
            downloaded.append(str(filepath))
            continue
        try:
            response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"})
            response.raise_for_status()
            with open(filepath, "wb") as f:
                f.write(response.content)
            downloaded.append(str(filepath))
        except Exception as e:
            print(f"Error downloading {filename}: {e}")
    return downloaded


def read_audio(filepath):
    """Read audio file and convert to 16 kHz mono"""
    try:
        raw_speech, _ = librosa.load(filepath, sr=16000)
        return raw_speech.tolist()
    except Exception as e:
        print(f"Error reading {filepath}: {e}")
        return None


def test_whisper_on_file(pipe, filepath):
    """Test Whisper on a single audio file"""
    config = pipe.get_generation_config()
    config.language = "<|en|>"
    config.task = "transcribe"
    config.return_timestamps = True
    config.max_new_tokens = 448
    raw_speech = read_audio(filepath)
    if raw_speech is None:
        return None
    duration = len(raw_speech) / 16000
    start_time = time.time()
    result = pipe.generate(raw_speech, config)
    inference_time = time.time() - start_time
    return {
        "file": filepath,
        "duration": duration,
        "inference_time": inference_time,
        "rtf": inference_time / duration,
        "transcription": str(result),
    }


def main():
    # Download model
    model_path = download_model()

    # Initialize pipeline on NPU
    print("\nInitializing NPU...")
    start_time = time.time()
    pipe = openvino_genai.WhisperPipeline(model_path, "NPU")
    init_time = time.time() - start_time

    # Collect test files
    test_files = list(Path(".").glob("*.wav"))
    if Path("samples/c/whisper_speech_recognition").exists():
        test_files.extend(Path("samples/c/whisper_speech_recognition").glob("*.wav"))

    # Download HF samples
    test_files.extend(Path(f) for f in download_hf_audio_samples())

    # Test all files
    print(f"\nTesting {len(test_files)} files...")
    results = []
    for audio_file in test_files:
        result = test_whisper_on_file(pipe, str(audio_file))
        if result:
            results.append(result)
            print(f"[OK] {Path(result['file']).name}: RTF={result['rtf']:.2f}x")

    # Print summary
    if results:
        total_duration = sum(r["duration"] for r in results)
        total_inference = sum(r["inference_time"] for r in results)
        avg_rtf = total_inference / total_duration
        print(f"\n{'=' * 50}")
        print("NPU Performance Summary")
        print(f"{'=' * 50}")
        print(f"Model load time: {init_time:.1f}s")
        print(f"Files tested: {len(results)}")
        print(f"Total audio: {total_duration:.1f}s")
        print(f"Total inference: {total_inference:.1f}s")
        print(f"Average RTF: {avg_rtf:.2f}x"
              f"{' [Faster than real-time]' if avg_rtf < 1.0 else ''}")
        for r in results:
            trans = r["transcription"]
            if len(trans) > 60:
                trans = trans[:57] + "..."
            print(f"- {Path(r['file']).name}: \"{trans}\"")


if __name__ == "__main__":
    main()
```

license:apache-2.0
13
1

qwen3-1.7b-int4-ov-npu

Model converted specifically for NPU on Intel devices.

license:apache-2.0
12
0

phi-4-mini-instruct-int4-ov-npu

license:mit
11
0

qwen3-tts-coreml

license:apache-2.0
10
0

speaker-diarization-ov

Pyannote and WeSpeaker models converted to OpenVINO for speaker diarization and identification.

license:mit
10
0

whisper-tiny-int4-ov

8
0

phi-4-mini-instruct-fp16-ov-npu

license:mit
7
0

qwen3-0.6b-int4-ov-npu

Model converted specifically for NPU on Intel devices.

license:apache-2.0
5
0

qwen3-4b-int4-ov-npu

license:apache-2.0
4
0

whisper-large-v3-turbo-qnn

license:apache-2.0
4
0

qwen3-4b-fp16-npu-ov

Qwen3-4B-fp16-ov. Model creator: Qwen. Original model: Qwen3-4B. Description: This is the Qwen3-4B model converted to the OpenVINO™ IR (Intermediate Representation) format with weights compressed to FP16. The provided OpenVINO™ IR model is compatible with: OpenVINO version 2025.1.0 and higher; Optimum Intel 1.24.0 and higher. Usage: 1. Install the packages required for using the Optimum Intel integration with the OpenVINO backend; for more examples and possible optimizations, refer to Inference with Optimum Intel. 2. Install the packages required for using OpenVINO GenAI; more GenAI usage examples can be found in the OpenVINO GenAI library docs and samples. You can find more detailed usage examples in OpenVINO Notebooks. The original model is distributed under the Apache License Version 2.0; more details can be found in Qwen3-4B. Intel is committed to respecting human rights and avoiding causing or contributing to adverse impacts on human rights. See Intel's Global Human Rights Principles. Intel's products and software are intended only to be used in applications that do not cause or contribute to adverse impacts on human rights.

license:apache-2.0
2
0

Qwen3-8B-int4-ov

license:apache-2.0
1
0

Qwen3-8B-int8-ov

Qwen3-8B-int8-ov. Model creator: Qwen. Original model: Qwen3-8B. Description: This is the Qwen3-8B model converted to the OpenVINO™ IR (Intermediate Representation) format with weights compressed to INT8 by NNCF. Weight compression was performed using `nncf.compress_weights`; for more information on quantization, check the OpenVINO model optimization guide. The provided OpenVINO™ IR model is compatible with: OpenVINO version 2025.1.0 and higher; Optimum Intel 1.24.0 and higher. Usage: 1. Install the packages required for using the Optimum Intel integration with the OpenVINO backend; for more examples and possible optimizations, refer to Inference with Optimum Intel. 2. Install the packages required for using OpenVINO GenAI; more GenAI usage examples can be found in the OpenVINO GenAI library docs and samples. You can find more detailed usage examples in OpenVINO Notebooks. The original model is distributed under the Apache License Version 2.0; more details can be found in Qwen3-8B. Intel is committed to respecting human rights and avoiding causing or contributing to adverse impacts on human rights. See Intel's Global Human Rights Principles. Intel's products and software are intended only to be used in applications that do not cause or contribute to adverse impacts on human rights.

license:apache-2.0
1
0

parakeet-tdt-0.6b-v3-ov

license:cc-by-4.0
0
1