audio-flamingo-3-hf

by nvidia · Audio Model · 1 language · 8.9K downloads
Edge AI: Mobile · Laptop · Server
Quick Summary

Audio Flamingo 3: Advancing Audio Intelligence with Fully Open Large Audio-Language Models

Description: Audio Flamingo 3 (AF3) is a fully open, state-of-the-art large audio-language model.

Code Examples

Usage

```bash
pip install --upgrade pip
pip install transformers==5.0.0rc1 accelerate
```
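The `transformers==5.0.0rc1` pin matters: older installs will not know the AF3 model type and will fail to load the checkpoint. A minimal sketch for checking the installed version before loading (the helper name `installed_major` is ours, not part of any library):

```python
from importlib import metadata


def installed_major(package: str):
    """Return the installed major version of `package`, or None if it is absent.

    Pre-release strings such as "5.0.0rc1" still start with a plain numeric
    major component, so splitting on "." is enough here.
    """
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
    major = version.split(".")[0]
    return int(major) if major.isdigit() else None


# A package that is not installed reports None rather than raising.
print(installed_major("not-a-real-package-xyz"))  # None
```
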
vLLM Inference (5-7x faster)

```bash
VLLM_USE_PRECOMPILED=1 uv pip install -U --pre \
  --override <(printf 'transformers>=5.0.0rc1\n') \
  "vllm[audio] @ git+https://github.com/vllm-project/vllm.git"
```

```python
import os
from pathlib import Path

from vllm import LLM, SamplingParams

os.environ["VLLM_ALLOW_LONG_MAX_MODEL_LEN"] = "1"

# audio_url = Path("./audio_file.mp3").expanduser().resolve().as_uri()   # local file -> file://...
audio_url = "https://huggingface.co/datasets/nvidia/AudioSkills/resolve/main/assets/WhDJDIviAOg_120_10.mp3"  # web URL -> https://...

prompt = "Transcribe the input speech."

llm = LLM(
    model="nvidia/audio-flamingo-3-hf",
    allowed_local_media_path=str(Path.cwd()),
    max_model_len=20000,
)
sp = SamplingParams(max_tokens=4096, temperature=0.0, repetition_penalty=1.2)

print(
    llm.chat(
        [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "audio_url", "audio_url": {"url": audio_url}},
                ],
            }
        ],
        sp,
    )[0]
    .outputs[0]
    .text
)
```
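The commented-out line above builds a `file://` URI for a local file; vLLM only accepts such URIs when the path falls under `allowed_local_media_path`, which is why the `LLM(...)` call passes the current working directory. A small sketch of that URI construction (the file name is illustrative and need not exist for `as_uri()` to succeed):

```python
from pathlib import Path

# as_uri() requires an absolute path, so resolve the relative path first.
local_audio = Path("./audio_file.mp3").expanduser().resolve()
audio_url = local_audio.as_uri()

print(audio_url.startswith("file://"))  # True
```

Keep the audio file somewhere under the directory passed to `allowed_local_media_path`, or vLLM will refuse to read it.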
Flash Attention 2

```bash
pip install flash-attn --no-build-isolation
```

```python
import torch
from transformers import AudioFlamingo3ForConditionalGeneration

model_id = "nvidia/audio-flamingo-3-hf"
device = "cuda"  # FlashAttention 2 requires a CUDA device
torch_dtype = torch.bfloat16

model = AudioFlamingo3ForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, attn_implementation="flash_attention_2"
).to(device)
```
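flash-attn needs a CUDA toolchain to build, so it may be absent in some environments. One hedged way to degrade gracefully is to probe for the package and fall back to PyTorch's built-in scaled-dot-product attention (`"sdpa"`, the Transformers default) when it is missing:

```python
import importlib.util

# Use FlashAttention 2 only when the flash_attn package is importable;
# otherwise fall back to SDPA, which needs no extra install.
attn_impl = "flash_attention_2" if importlib.util.find_spec("flash_attn") else "sdpa"
print(attn_impl)
```

Pass the resulting string as `attn_implementation=attn_impl` in the `from_pretrained(...)` call above.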
