WithAnyone


# WithAnyone: Towards Controllable and ID Consistent Image Generation

The model was presented in the paper WithAnyone: Towards Controllable and ID Consistent Image Generation.

[Paper](https://arxiv.org/abs/2510.14975) | [Project page](https://doby-xu.github.io/WithAnyone/) | [Code](https://github.com/Doby-Xu/WithAnyone) | [MultiID-Bench](https://huggingface.co/datasets/WithAnyone/MultiID-Bench) | [MultiID-2M](https://huggingface.co/datasets/WithAnyone/MultiID-2M) | [Demo](https://huggingface.co/spaces/WithAnyone/WithAnyonedemo) | [License](https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md)

## Abstract

Identity-consistent generation has become an important focus in text-to-image research, with recent models achieving notable success in producing images aligned with a reference identity. Yet, the scarcity of large-scale paired datasets containing multiple images of the same individual forces most approaches to adopt reconstruction-based training. This reliance often leads to a failure mode we term copy-paste, where the model directly replicates the reference face rather than preserving identity across natural variations in pose, expression, or lighting. Such over-similarity undermines controllability and limits the expressive power of generation. To address these limitations, we (1) construct a large-scale paired dataset MultiID-2M, tailored for multi-person scenarios, providing diverse references for each identity; (2) introduce a benchmark that quantifies both copy-paste artifacts and the trade-off between identity fidelity and variation; and (3) propose a novel training paradigm with a contrastive identity loss that leverages paired data to balance fidelity with diversity. These contributions culminate in WithAnyone, a diffusion-based model that effectively mitigates copy-paste while preserving high identity similarity.
Extensive qualitative and quantitative experiments demonstrate that WithAnyone significantly reduces copy-paste artifacts, improves controllability over pose and expression, and maintains strong perceptual quality. User studies further validate that our method achieves high identity fidelity while enabling expressive controllable generation.

## Model Zoo

| Model | Description | Download |
|-|-|-|
| WithAnyone 1.0 - FLUX.1 [dev] | Just use this one. | HuggingFace |
| WithAnyone.K Preview - FLUX.1 Kontext [dev] | For t2i generation with FLUX.1 Kontext | HuggingFace |
| WithAnyone.Ke Preview - FLUX.1 Kontext [dev] | For face-editing with FLUX.1 Kontext | HuggingFace |

If you just want to try it out, please use the base model WithAnyone - FLUX.1 [dev]. The other models are for the following use cases:

### WithAnyone.K

This is a preliminary version of WithAnyone with FLUX.1 Kontext. It can be used for text-to-image generation with multiple given identities. However, its stability and quality are not as good as those of the base model. Please use it with caution. We are working on improving it.

### WithAnyone.Ke

This is a face-editing version of WithAnyone with FLUX.1 Kontext, leveraging the editing capabilities of FLUX.1 Kontext. Please use it with `gradioedit.py` instead of `gradioapp.py`. It is still a preliminary version, and we are working on improving it.

## Highlight of WithAnyone

- Controllable: WithAnyone aims to mitigate the "copy-paste" artifacts in face generation. Previous methods tend to copy and paste the reference face directly onto the generated image, leading to poor controllability of expressions, hairstyles, accessories, and even poses. They fall into a clear trade-off between similarity and copy-paste: the more similar the generated face is to the reference, the more copy-paste artifacts it has. WithAnyone is an attempt to break this trade-off.
- Multi-ID Generation: WithAnyone can generate multiple given identities in a single image.
  With the help of controllable face generation, all generated faces can fit harmoniously in one group photo.

Use `pip install -r requirements.txt` to install the necessary packages.

You can download the necessary model checkpoints in one of two ways:

1. Run the inference scripts directly. The checkpoints will be downloaded automatically by the `hf_hub_download` function in the code to your `$HF_HOME` (default: `~/.cache/huggingface`).
2. Use `huggingface-cli download` to download the following repositories, then run the inference scripts:
   - `black-forest-labs/FLUX.1-dev`
   - `xlabs-ai/xflux_text_encoders`
   - `openai/clip-vit-large-patch14`
   - `google/siglip-base-patch16-256-i18n`
   - `withanyone/withanyone`

You can download only the checkpoints you need to speed up setup and save disk space.
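The selective download can also be scripted from Python. The following is a minimal sketch, assuming the `huggingface_hub` package is installed and that you have access to the gated `black-forest-labs/FLUX.1-dev` repo. The helper name `fetch_files` and its `downloader` parameter are illustrative and not part of the WithAnyone code; the two file names are the FLUX.1-dev files this README names for the CLI.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# (repo_id, filename) pairs: only the two FLUX.1-dev files this README names.
FLUX_FILES: List[Tuple[str, str]] = [
    ("black-forest-labs/FLUX.1-dev", "flux1-dev.safetensors"),
    ("black-forest-labs/FLUX.1-dev", "ae.safetensors"),
]


def fetch_files(
    files: Iterable[Tuple[str, str]],
    downloader: Optional[Callable[..., str]] = None,  # hypothetical hook, eases testing
) -> List[str]:
    """Download each (repo_id, filename) pair and return the local file paths."""
    if downloader is None:
        # Imported lazily so this module loads even without huggingface_hub.
        from huggingface_hub import hf_hub_download
        downloader = hf_hub_download
    # hf_hub_download stores files in the Hugging Face cache and returns the path.
    return [downloader(repo_id=repo, filename=name) for repo, name in files]

# Usage (requires network access and a logged-in Hugging Face account):
# local_paths = fetch_files(FLUX_FILES)
```

Passing a custom `downloader` makes it possible to dry-run the list of files without touching the network.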
Example for `black-forest-labs/FLUX.1-dev`:

- `huggingface-cli download black-forest-labs/FLUX.1-dev flux1-dev.safetensors`
- `huggingface-cli download black-forest-labs/FLUX.1-dev ae.safetensors`

Ignore the text encoder in the `black-forest-labs/FLUX.1-dev` model repo (it is there for `diffusers` calls). All checkpoints together require about 37 GB of disk space.

After downloading, point the checkpoint arguments of the inference script to the local paths of the downloaded checkpoints.

WithAnyone uses the ArcFace model for face embedding. It will be downloaded automatically to `./models/`. However, the download has a known bug that leaves the model files in a nested directory. If you see an error like `assert 'detection' in self.models`, move the model directory manually:

- `mv models/antelopev2/ models/antelopev2_`
- `mv models/antelopev2_/antelopev2/ models/antelopev2/`
- `rm -rf models/antelopev2_ antelopev2.zip`

The Gradio GUI demo is a good starting point for experimenting with WithAnyone. Run it with `gradioapp.py`.

❗ WithAnyone requires face bounding boxes (bboxes) that indicate where the faces should be. You can provide them in one of the following ways:

1. Upload an example image with the desired face locations in `Mask Configuration (Option 1: Automatic)`. The face bboxes will be extracted automatically, and faces will be generated in the same locations. Do not worry if the given image has a different resolution or aspect ratio; the face bboxes will be resized accordingly.
2. Input face bboxes directly in `Mask Configuration (Option 2: Manual)`. The format is `x1,y1,x2,y2` for each face, one per line.
3. (NOT recommended) Leave both options empty, and the face bboxes will be chosen randomly from a pre-defined set.

⭕ WithAnyone works well with LoRA. If you have any stylized LoRA checkpoints, pass `--additionallorackpt` with the checkpoint path when launching the demo. The LoRA will be merged into the diffusion model.

⭕ In `Advanced Options`, there is a slider controlling whether outputs are more "similar in spirit" or "similar in form" to the reference faces.
- Move the slider to the right to preserve more details from the reference image (expression, makeup, accessories, hairstyle, etc.). Identity will also be better preserved.
- Move it to the left for more freedom and creativity. Stylization can be stronger, and hairstyle and makeup can be changed.

### How the slider works and some tips

The slider controls the relative weight of the SigLIP embedding and the ArcFace embedding. The former preserves more mid-level semantic details, while the latter preserves more high-level identity information. SigLIP is a general image embedding model that captures more than just faces, while ArcFace is a face-specific embedding model that captures only identity information. When using a high ArcFace weight (slider to the left), please describe the identity in more detail in the prompt, since the ArcFace embedding may lose information such as hairstyle, skin color, body build, and age.

## 💡 Tips for Better Results

Be prepared for the first few runs to be less than satisfying.

- Provide detailed prompts describing the identity. WithAnyone is "controllable", so it needs more information to be controlled. Here are some things that might go wrong if not specified:
  - Skin color (results are generally fine, but for people of Asian descent the model may generate a darker skin tone if none is specified);
  - Age (e.g., instead of "a man", try "a young man"; if not specified, the model may generate an older figure);
  - Body build;
  - Hairstyle;
  - Accessories (glasses, hats, earrings, etc.);
  - Makeup.
- Use the slider to balance between "Resemblance in Spirit" and "Resemblance in Form" according to your needs. If you want to preserve more details from the reference image, move the slider to the right; if you want more freedom and creativity, move it to the left.
- Try it with LoRAs from the community. They are usually fantastic.

You can use `inferwithanyone.py` for batch inference. The script supports generating multiple images with MultiID-Bench.
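Batch inference on MultiID-Bench needs a local copy of the benchmark. As a minimal sketch, assuming the `huggingface_hub` package is installed and that the benchmark is the `WithAnyone/MultiID-Bench` dataset repo linked at the top of this card, the data could be fetched as follows. The function name `fetch_multiid_bench`, the default `local_dir`, and the `downloader` parameter are illustrative and not part of the WithAnyone code.

```python
from typing import Callable, Optional

# Dataset repo id, taken from the MultiID-Bench link in this card.
BENCH_REPO = "WithAnyone/MultiID-Bench"


def fetch_multiid_bench(
    local_dir: str = "./MultiID-Bench",  # hypothetical target folder
    downloader: Optional[Callable[..., str]] = None,  # hypothetical hook, eases testing
) -> str:
    """Download the benchmark dataset repo and return its local path."""
    if downloader is None:
        # Imported lazily so this module loads even without huggingface_hub.
        from huggingface_hub import snapshot_download
        downloader = snapshot_download
    # repo_type="dataset" is needed because the benchmark is a dataset repo,
    # not a model repo.
    return downloader(repo_id=BENCH_REPO, repo_type="dataset", local_dir=local_dir)

# Usage (requires network access):
# bench_dir = fetch_multiid_bench()
```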
Convert the MultiID-Bench parquet file to a folder of images and a JSON file using `MultiIDBench/parquet2bench.py`. The `dataroot` should then be `p1/untar`, `p2/untar`, or `p3/`, depending on which subset you want to evaluate, and the `evaljsonpath` should be the corresponding JSON file converted from the parquet file.

## ⚙️ Face Edit with FLUX.1 Kontext

You can use `gradioedit.py` for face editing with FLUX.1 Kontext and WithAnyone.Ke.

The code of WithAnyone is released under the Apache License 2.0, while the WithAnyone model and associated datasets are made available solely for non-commercial academic research purposes.

- License Terms: The WithAnyone model is distributed under the [FLUX.1 [dev] Non-Commercial License v1.1.1](https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md). All underlying base models remain governed by their respective original licenses and terms, which shall continue to apply in full. Users must comply with all such applicable licenses when using this project.
- Permitted Use: This project may be used for lawful academic research, analysis, and non-commercial experimentation only. Any form of commercial use, redistribution for profit, or application that violates applicable laws, regulations, or ethical standards is strictly prohibited.
- User Obligations: Users are solely responsible for ensuring that their use of the model and dataset complies with all relevant laws, regulations, institutional review policies, and third-party license terms.
- Disclaimer of Liability: The authors, developers, and contributors make no warranties, express or implied, regarding the accuracy, reliability, or fitness of this project for any particular purpose. They shall not be held liable for any damages, losses, or legal claims arising from the use or misuse of this project, including but not limited to violations of law or ethical standards by end users.
- Acceptance of Terms: By downloading, accessing, or using this project, you acknowledge and agree to be bound by the applicable license terms and legal requirements, and you assume full responsibility for all consequences resulting from your use.

## 🌹 Acknowledgement

We thank the following prior art for their excellent open source work:

- PuLID
- UNO
- UniPortrait
- InfiniteYou
- DreamO
- UMO

If you find this project useful in your research, please consider citing the paper: https://arxiv.org/abs/2510.14975
