camenduru

207 models • 1 total models in database

Sort by:

FLUX.1-dev-ungated

`FLUX.1 [dev]` is a 12 billion parameter rectified flow transformer capable of generating images from text descriptions. For more information, please read our blog post. Key Features 1. Cutting-edge output quality, second only to our state-of-the-art model `FLUX.1 [pro]`. 2. Competitive prompt following, matching the performance of closed source alternatives . 3. Trained using guidance distillation, making `FLUX.1 [dev]` more efficient. 4. Open weights to drive new scientific research, and empower artists to develop innovative workflows. 5. Generated outputs can be used for personal, scientific, and commercial purposes as described in the [`FLUX.1 [dev]` Non-Commercial License](https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md). Usage We provide a reference implementation of `FLUX.1 [dev]`, as well as sampling code, in a dedicated github repository. Developers and creatives looking to build on top of `FLUX.1 [dev]` are encouraged to use this as a starting point. API Endpoints The FLUX.1 models are also available via API from the following sources - bfl.ml (currently `FLUX.1 [pro]`) - replicate.com - fal.ai - mystic.ai ComfyUI `FLUX.1 [dev]` is also available in Comfy UI for local inference with a node-based workflow. To use `FLUX.1 [dev]` with the 🧨 diffusers python library, first install or upgrade diffusers To learn more check out the diffusers documentation --- Limitations - This model is not intended or able to provide factual information. - As a statistical model this checkpoint might amplify existing societal biases. - The model may fail to generate output that matches the prompts. - Prompt following is heavily influenced by the prompting-style. Out-of-Scope Use The model and its derivatives may not be used - In any way that violates any applicable national, federal, state, local or international law or regulation. - For the purpose of exploiting, harming or attempting to exploit or harm minors in any way; including but not limited to the solicitation, creation, acquisition, or dissemination of child exploitative content. - To generate or disseminate verifiably false information and/or content with the purpose of harming others. - To generate or disseminate personal identifiable information that can be used to harm an individual. - To harass, abuse, threaten, stalk, or bully individuals or groups of individuals. - To create non-consensual nudity or illegal pornographic content. - For fully automated decision making that adversely impacts an individual's legal rights or otherwise creates or modifies a binding, enforceable obligation. - Generating or facilitating large-scale disinformation campaigns. License This model falls under the [`FLUX.1 [dev]` Non-Commercial License](https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md).

—

Video-to-Video

—

potat1

—

159

VistaDream

—

FLUX.1-Fill-dev-ungated

`FLUX.1 Fill [dev]` is a 12 billion parameter rectified flow transformer capable of filling areas in existing images based on a text description. For more information, please read our blog post. Key Features 1. Cutting-edge output quality, second only to our state-of-the-art model `FLUX.1 Fill [pro]`. 2. Blends impressive prompt following with completing the structure of your source image. 3. Trained using guidance distillation, making `FLUX.1 Fill [dev]` more efficient. 4. Open weights to drive new scientific research, and empower artists to develop innovative workflows. 5. Generated outputs can be used for personal, scientific, and commercial purposes as described in the [`FLUX.1 [dev]` Non-Commercial License](https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md). Usage We provide a reference implementation of `FLUX.1 Fill [dev]`, as well as sampling code, in a dedicated github repository. Developers and creatives looking to build on top of `FLUX.1 Fill [dev]` are encouraged to use this as a starting point. API Endpoints The FLUX.1 models are also available in our API bfl.ml To use `FLUX.1 Fill [dev]` with the 🧨 diffusers python library, first install or upgrade diffusers Then you can use `FluxFillPipeline` to run the model To learn more check out the diffusers documentation Limitations - This model is not intended or able to provide factual information. - As a statistical model this checkpoint might amplify existing societal biases. - The model may fail to generate output that matches the prompts. - Prompt following is heavily influenced by the prompting-style. - There may be slight-color shifts in areas that are not filled in - Filling in complex textures may produce lines at the edges of the filled-area. Out-of-Scope Use The model and its derivatives may not be used - In any way that violates any applicable national, federal, state, local or international law or regulation. - For the purpose of exploiting, harming or attempting to exploit or harm minors in any way; including but not limited to the solicitation, creation, acquisition, or dissemination of child exploitative content. - To generate or disseminate verifiably false information and/or content with the purpose of harming others. - To generate or disseminate personal identifiable information that can be used to harm an individual. - To harass, abuse, threat

—

VideoComposer

—

MeiGen-MultiTalk

MeiGen-MultiTalk • Audio-Driven Multi-Person Conversational Video Generation > We present MultiTalk, an open-source audio-driven multi-person conversational video generation model with the state-of-the-art lip synchronization accuracy. > Key features: > - 💬 Realistic Conversations - Supports single & multi-person generation > - 👥 Interactive Character Control - Direct virtual humans via prompts > - 🎤 Generalization Performances - Supports the generation of cartoon character and singing > - 📺 Resolution Flexibility: 480p & 720p output at arbitrary aspect ratios > - ⏱️ Long Video Generation: Support video generation up to 15 seconds This repository hosts the model weights for MultiTalk. For installation, usage instructions, and further documentation, please visit our GitHub repository. Method We propose a novel framework, MultiTalk, for audio-driven multi-person conversational video generation. We investigate several schemes for audio injection and introduce the Label Rotary Position Embedding (L-RoPE) method. By assigning identical labels to audio embeddings and video latents, it effectively activates specific regions within the audio cross-attention map, thereby resolving incorrect binding issues. To localize the region of the specified person, we introduce the adaptive person localization by computing the similarity between the features of the given region of a person in the reference image and all the features of the whole video. Citation If you find our work helpful, please cite us. License Agreement The models in this repository are licensed under the Apache 2.0 License. We claim no rights over the your generated contents, granting you the freedom to use them while ensuring that your usage complies with the provisions of this license. You are fully accountable for your use of the models, which must not involve sharing any content that violates applicable laws, causes harm to individuals or groups, disseminates personal information intended for harm, spreads misinformation, or targets vulnerable populations.

license:apache-2.0

IMAGDressing

license:apache-2.0

xenmon-xl-v1

—

svd_xt_1_1_unet

—

DemoFusion

—

DreamClear

license:apache-2.0

Envision3D

—

InstantID

license:apache-2.0

TripoSR

—

plushies

—

MiniGPT4-7B

NaNK

llama

parakeet-rnnt-1.1b

NaNK

—

joy-caption-alpha-two

—

midstreet

—

ThemeStation

—

IICF

—

MiniGPT4

llama

champ

—

FLUX.1_Kontext-Lightning

Update 7/9/25: This model is now quantized and implemented in this example space. Seeing preliminary VRAM usage at around ~10GB with faster inferencing. Will be experimenting with different weights and schedulers to find particularly well-performing libraries. Highly experimental, will update with more details later. - 6-8 steps - Euler, SGM Uniform (Recommended, feel free to play around) Getting mixed results now, feel free to play around and share. Experimenting with FLUX.1-dev LoRAs and how it affects Kontext-dev. This model has been fused with acceleration LoRAs. License This model falls under the [FLUX.1 \[dev\] Non-Commercial License](https://github.com/black-forest-labs/flux/blob/main/modellicenses/LICENSE-FLUX1-dev), please familiarize yourself with the license.

—

xenmon-xl-v2

—

wonder3d-v1.0

license:agpl-3.0

marigold-e2e-ft-normals

license:apache-2.0

flux1-kontext-dev_fp8_e4m3fn_diffusers

—

XTTS-v1

—

test_gpu

—

EvoVLM-JP-v1-7B-4bit

NaNK

—

Meta-Llama-3.1-8B-Instruct

NaNK

llama

test-10

—

kosmos-2-patch14-224

—

IF-II-L-v1.0

—

Mixtral-8x22B-Instruct-v0.1

NaNK

—

RMBG-1.4

—

MythoMax-L2-13b

NaNK

llama

robustsam-vit-huge

license:mit

marigold-e2e-ft-depth

license:apache-2.0

SUPIR

—

165

FLUX.1-dev

—

142

Wav2Lip

—

YoloWorld-EfficientSAM

—

gaussian-splatting

—

SMPLer X

—

one-shot-talking-face

—

HandRefiner

—

stable-diffusion-3.5-large

—

xl_sliders

—

text2-video-zero

—

big-lama

—

dust3r

—

Diffutoon

—

instant-ngp

—

one-shot-talking-face-20.04-t4

—

one-shot-talking-face-20.04-a10

—

show

—

openpose

—

joy-caption-alpha-one

—

SadTalker

—

textdiffuser

—

video-retalking

—

OOTDiffusion

—

improved-aesthetic-predictor

—

potat1_dataset

—

3d-photo-inpainting

—

DWPose

—

ios-emoji-xl

—

beats

—

PeRF

—

dreamtalk

—

MagicDance

—

Multi-LoRA-Composition

—

PuLID

license:apache-2.0

one-shot-talking-face-20.04

—

DragGAN

—

StableSR

—

facechain

—

DeepFilterNet2

—

ProPainter

—

OpenVoice

—

cv_ddcolor_image-colorization

license:apache-2.0

Arc2Face

—

MuseTalk

—

CogVideoX-5b-8bit

NaNK

—

google_t5_v1.1

—

oasis-500m

license:mit

DimensionX

—

HunyuanVideo

—

memo

MEMO: Memory-Guided Diffusion for Expressive Talking Video Generation Longtao Zheng\, Yifan Zhang\, Hanzhong Guo\, Jiachun Pan, Zhenxiong Tan, Jiahao Lu, Chuanxin Tang, Bo An, Shuicheng Yan This repository contains the example inference script for the MEMO-preview model. The gif demo below is compressed. See our project page for full videos. > Our code will download the checkpoint from Hugging Face automatically, and the models for face analysis and vocal separation will be downloaded to `miscmodeldir` of `configs/inference.yaml`. If you want to download the models manually, please download the checkpoint from here and specify the path in `modelnameorpath` of `configs/inference.yaml`. > We tested the code on H100 and RTX 4090 GPUs using CUDA 12. Under the default settings (fps=30, inferencesteps=20), the inference time is around 1 second per frame on H100 and 2 seconds per frame on RTX 4090. We welcome community contributions to improve the inference speed or interfaces like ComfyUI. Our work is made possible thanks to high-quality open-source talking video datasets (including HDTF, VFHQ, CelebV-HQ, MultiTalk, and MEAD) and some pioneering works (such as EMO and Hallo). We acknowledge the potential of AI in generating talking videos, with applications spanning education, virtual assistants, and entertainment. However, we are equally aware of the ethical, legal, and societal challenges that misuse of this technology could pose. To reduce potential risks, we have only open-sourced a preview model for research purposes. Demos on our website use publicly available materials. We welcome copyright concerns—please contact us if needed, and we will address issues promptly. Users are required to ensure that their actions align with legal regulations, cultural norms, and ethical standards. It is strictly prohibited to use the model for creating malicious, misleading, defamatory, or privacy-infringing content, such as deepfake videos for political misinformation, impersonation, harassment, or fraud. We strongly encourage users to review generated content carefully, ensuring it meets ethical guidelines and respects the rights of all parties involved. Users must also ensure that their inputs (e.g., audio and reference images) and outputs are used with proper authorization. Unauthorized use of third-party intellectual property is strictly forbidden. While users may claim ownership of content generated by the model, they must ensure compliance with copyright laws, particularly when involving public figures' likeness, voice, or other aspects protected under personality rights. If you find our work useful, please use the following citation:

license:apache-2.0