audio-flamingo-3-hf

by nvidia · Audio Model · 1 language · 8.9K downloads
Edge AI: Mobile · Laptop · Server
Quick Summary

Audio Flamingo 3: Advancing Audio Intelligence with Fully Open Large Audio-Language Models

Description: Audio Flamingo 3 (AF3) is a fully open, state-of-the-art large audio-language model.

Code Examples

Usage

```bash
pip install --upgrade pip
pip install transformers==5.0.0rc1 accelerate
```
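The `transformers==5.0.0rc1` pin matters: older installs will not know the AF3 model type and will fail to load the checkpoint. A minimal sketch for checking the installed version before loading (the helper name `installed_major` is ours, not part of any library):

```python
from importlib import metadata


def installed_major(package: str):
    """Return the installed major version of `package`, or None if it is absent.

    Pre-release strings such as "5.0.0rc1" still start with a plain numeric
    major component, so splitting on "." is enough here.
    """
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
    major = version.split(".")[0]
    return int(major) if major.isdigit() else None


# A package that is not installed reports None rather than raising.
print(installed_major("not-a-real-package-xyz"))  # None
```
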
vLLM Inference (5-7x faster)

```bash
VLLM_USE_PRECOMPILED=1 uv pip install -U --pre \
  --override <(printf 'transformers>=5.0.0rc1\n') \
  "vllm[audio] @ git+https://github.com/vllm-project/vllm.git"
```

```python
import os
from pathlib import Path

from vllm import LLM, SamplingParams

os.environ["VLLM_ALLOW_LONG_MAX_MODEL_LEN"] = "1"

# audio_url = Path("./audio_file.mp3").expanduser().resolve().as_uri()   # local file -> file://...
audio_url = "https://huggingface.co/datasets/nvidia/AudioSkills/resolve/main/assets/WhDJDIviAOg_120_10.mp3"  # web URL -> https://...

prompt = "Transcribe the input speech."

llm = LLM(
    model="nvidia/audio-flamingo-3-hf",
    allowed_local_media_path=str(Path.cwd()),
    max_model_len=20000,
)
sp = SamplingParams(max_tokens=4096, temperature=0.0, repetition_penalty=1.2)

print(
    llm.chat(
        [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "audio_url", "audio_url": {"url": audio_url}},
                ],
            }
        ],
        sp,
    )[0]
    .outputs[0]
    .text
)
```
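The commented-out line above builds a `file://` URI for a local file; vLLM only accepts such URIs when the path falls under `allowed_local_media_path`, which is why the `LLM(...)` call passes the current working directory. A small sketch of that URI construction (the file name is illustrative and need not exist for `as_uri()` to succeed):

```python
from pathlib import Path

# as_uri() requires an absolute path, so resolve the relative path first.
local_audio = Path("./audio_file.mp3").expanduser().resolve()
audio_url = local_audio.as_uri()

print(audio_url.startswith("file://"))  # True
```

Keep the audio file somewhere under the directory passed to `allowed_local_media_path`, or vLLM will refuse to read it.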
Flash Attention 2

```bash
pip install flash-attn --no-build-isolation
```

```python
import torch
from transformers import AudioFlamingo3ForConditionalGeneration

model_id = "nvidia/audio-flamingo-3-hf"
device = "cuda"  # FlashAttention 2 requires a CUDA device
torch_dtype = torch.bfloat16

model = AudioFlamingo3ForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, attn_implementation="flash_attention_2"
).to(device)
```
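flash-attn needs a CUDA toolchain to build, so it may be absent in some environments. One hedged way to degrade gracefully is to probe for the package and fall back to PyTorch's built-in scaled-dot-product attention (`"sdpa"`, the Transformers default) when it is missing:

```python
import importlib.util

# Use FlashAttention 2 only when the flash_attn package is importable;
# otherwise fall back to SDPA, which needs no extra install.
attn_impl = "flash_attention_2" if importlib.util.find_spec("flash_attn") else "sdpa"
print(attn_impl)
```

Pass the resulting string as `attn_implementation=attn_impl` in the `from_pretrained(...)` call above.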
