# AudioLDM 2
AudioLDM 2 is a latent text-to-audio diffusion model capable of generating realistic audio samples given any text input. It is available in the 🧨 Diffusers library from v0.21.0 onwards.

AudioLDM 2 was proposed in the paper AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining by Haohe Liu et al. AudioLDM 2 takes a text prompt as input and predicts the corresponding audio. It can generate text-conditional sound effects, human speech and music. This is the original, base version of the AudioLDM 2 model, also referred to as audioldm2-full.

There are three official AudioLDM 2 checkpoints. Two of these checkpoints are applicable to the general task of text-to-audio generation; the third is trained exclusively on text-to-music generation. All checkpoints share the same model size for the text encoders and VAE, and differ only in the size and depth of the UNet. See the table below for details on the three official checkpoints:

| Checkpoint      | Task          | UNet Model Size | Total Model Size | Training Data / h |
|-----------------|---------------|-----------------|------------------|-------------------|
| audioldm2       | Text-to-audio | 350M            | 1.1B             | 1150k             |
| audioldm2-large | Text-to-audio | 750M            | 1.5B             | 1150k             |
| audioldm2-music | Text-to-music | 350M            | 1.1B             | 665k              |

- Original Repository
- 🧨 Diffusers Pipeline
- Paper
- Demo

For text-to-audio generation, the AudioLDM2Pipeline can be used to load pre-trained weights and generate text-conditional audio outputs. The resulting audio output can be saved as a .wav file.

Prompts: Descriptive prompt inputs work best: you can use adjectives to describe the sound (e.g. "high quality" or "clear") and make the prompt context specific (e.g., "water stream in a forest" instead of "stream"). It's best to use general terms like "cat" or "dog" instead of specific names or abstract objects that the model may not be familiar with.
The quality of the generated waveforms can vary significantly based on the seed: try generating with different seeds until you find a satisfactory result. Multiple waveforms can be generated in one go by setting `num_waveforms_per_prompt` to a value greater than 1. Automatic scoring is then performed between the generated waveforms and the prompt text, and the audios are ranked from best to worst accordingly. The following example demonstrates how to construct a good audio generation using the aforementioned tips: