[ControlNet SDXL Inpainting] Support inpainting of ControlNet SDXL#4694
yiyixuxu merged 24 commits into huggingface:main
Conversation
|
Wow, I really need this. Does it work now? I always get black images with it. Could you post the API usage? Thanks a lot! |
I discovered some issues today, but it should generate sensible images, rather than black ones ... Let me complete this by this week. Feel free to add my discord: harutatsuakiyama |
…sers into sdxl_ctrl_inpaint
I fixed the issue yesterday. The code should work as expected. |
|
I use the following pipeline, but it still generates a black image.
def inpaint_with_controlnet():
    import torch
    from diffusers import StableDiffusionXLInpaintPipeline
    from diffusers.utils import load_image
    from diffusers import StableDiffusionXLControlNetPipeline, ControlNetModel, UniPCMultistepScheduler
    from pipeline_controlnet_inpaint_sd_xl import StableDiffusionXLControlNetInpaintPipeline

    img_url = "https://user-images.githubusercontent.com/8084808/262496067-e01fb3c9-aece-4560-ae64-6354fdd789d7.png"
    mask_url = "https://user-images.githubusercontent.com/8084808/262496139-234e0049-43ab-415b-ae6d-4cbb96055f6d.png"
    control_image_url = img_url

    # Compute openpose conditioning image.
    from controlnet_aux import OpenposeDetector
    openpose = OpenposeDetector.from_pretrained("lllyasviel/ControlNet")
    control_image = openpose(load_image(control_image_url))

    controlnet = ControlNetModel.from_pretrained("thibaud/controlnet-openpose-sdxl-1.0", torch_dtype=torch.float16)
    pipe = StableDiffusionXLControlNetInpaintPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        controlnet=controlnet,
        torch_dtype=torch.float16,
    )
    pipe.to("cuda")

    init_image = load_image(img_url).convert("RGB")
    mask_image = load_image(mask_url).convert("RGB")
    prompt = "hand"
    strength = 0.5
    controlnet_conditioning_scale = 1.0

    image = pipe(
        prompt=prompt,
        image=init_image,
        mask_image=mask_image,
        control_image=control_image,
        controlnet_conditioning_scale=controlnet_conditioning_scale,
        strength=strength,
    ).images[0]
    image.save('result.jpg')
Thank you for the code! You need to use torch.float32 instead of torch.float16. I tested the following code, should work: Feel free to add my discord and we can discuss there. |
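The tested code referred to above is not preserved on this page. As a rough sketch of the suggested change only, the loading step from the earlier snippet could be written with the precision as a parameter that defaults to float32. This assumes the pipeline is importable from `diffusers` as `StableDiffusionXLControlNetInpaintPipeline` (at the time of this comment it lived in a local file), and `build_pipeline` is a hypothetical helper name, not part of the PR.

```python
def build_pipeline(dtype_name="float32", device="cuda"):
    """Load the SDXL ControlNet inpainting pipeline in the requested precision.

    Hypothetical helper: the checkpoints are the ones used in the snippet above.
    """
    # Heavy imports stay inside the function so that defining it is cheap.
    import torch
    from diffusers import ControlNetModel, StableDiffusionXLControlNetInpaintPipeline

    # torch.float32 by default, following the advice in this comment.
    dtype = getattr(torch, dtype_name)

    controlnet = ControlNetModel.from_pretrained(
        "thibaud/controlnet-openpose-sdxl-1.0", torch_dtype=dtype
    )
    pipe = StableDiffusionXLControlNetInpaintPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        controlnet=controlnet,
        torch_dtype=dtype,
    )
    return pipe.to(device)
```

Calling `build_pipeline()` would replace the two `from_pretrained` calls and `pipe.to("cuda")` in the snippet above; the rest of that function would stay unchanged.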
|
Very cool PR! @yiyixuxu can you give this a look? :-) |
yiyixuxu
left a comment
There was a problem hiding this comment.
Thanks! excellent work!
I think the two main things left are:
- Refactor with a mask_image_processor https://github.com/huggingface/diffusers/pull/4444/files
- Add MultiControlnet support
|
|
||
|
|
||
| # Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion_inpaint.prepare_mask_and_masked_image | ||
| def prepare_mask_and_masked_image(image, mask, height, width, return_image=False): |
There was a problem hiding this comment.
We just deprecated this function :)
in this PR #4444 (comment)
let's update this PR too
There was a problem hiding this comment.
Updated
self.image_processor = VaeImageProcessor(vae_scale_factor=self.vae_scale_factor)
self.mask_processor = VaeImageProcessor(
    vae_scale_factor=self.vae_scale_factor, do_normalize=False, do_binarize=True, do_convert_grayscale=True
)
self.control_image_processor = VaeImageProcessor(vae_scale_factor=self.vae_scale_factor, do_convert_rgb=True, do_normalize=False)
| self.control_image_processor = VaeImageProcessor( | ||
| vae_scale_factor=self.vae_scale_factor, do_convert_rgb=True, do_normalize=False | ||
| ) | ||
| self.watermark = StableDiffusionXLWatermarker() |
There was a problem hiding this comment.
add a mask_processor here
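To make the role of a mask processor concrete: with `do_convert_grayscale=True`, `do_binarize=True` and `do_normalize=False`, the mask ends up as a single channel of 0/1 values instead of being rescaled like a regular image. The plain-Python sketch below imitates that on a tiny nested-list image. It is an illustration only, not the `VaeImageProcessor` implementation; the helper names, the luma weights and the 0.5 threshold are assumptions.

```python
def to_grayscale(rgb_pixels):
    """Collapse rows of (r, g, b) values in [0, 255] to one luma channel in [0, 1]."""
    return [
        [(0.299 * r + 0.587 * g + 0.114 * b) / 255.0 for (r, g, b) in row]
        for row in rgb_pixels
    ]


def binarize(gray, threshold=0.5):
    """Map every value to 0.0 or 1.0; 1.0 marks the pixels to repaint."""
    return [[1.0 if value >= threshold else 0.0 for value in row] for row in gray]


# A 2x2 RGB mask: bright top row, dark bottom row.
rgb_mask = [
    [(255, 255, 255), (200, 200, 200)],
    [(30, 30, 30), (0, 0, 0)],
]

mask = binarize(to_grayscale(rgb_mask))
print(mask)  # [[1.0, 1.0], [0.0, 0.0]]
```

No normalization step appears here on purpose: a binary mask rescaled to [-1, 1] would no longer be usable as a 0/1 blend weight.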
| generator = torch.Generator(device=device).manual_seed(seed) | ||
|
|
||
| controlnet_embedder_scale_factor = 2 | ||
| control_image = randn_tensor( |
There was a problem hiding this comment.
I think we accept image tensor in [0,1] range, so should not use randn_tensor here
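The range concern can be illustrated without any tensor library. The sketch below compares uniform samples, which stay in [0, 1), with standard-normal samples, which mostly fall outside that interval. The helper names are hypothetical stand-ins for the test utilities, not the utilities themselves.

```python
import random


def uniform_floats(n, seed=0):
    """Stand-in for a [0, 1) test tensor: n uniform samples."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


def normal_floats(n, seed=0):
    """Stand-in for randn-style data: n standard-normal samples."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def fraction_outside_unit_range(values):
    """Share of values that are not valid [0, 1] image intensities."""
    return sum(1 for v in values if not 0.0 <= v <= 1.0) / len(values)


uniform_outside = fraction_outside_unit_range(uniform_floats(1000))
normal_outside = fraction_outside_unit_range(normal_floats(1000))
print(uniform_outside, normal_outside)
```

Roughly two thirds of standard-normal samples are negative or greater than 1, so a dummy control image built that way does not resemble the input the pipeline expects.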
There was a problem hiding this comment.
Thank you! Corrected.
control_image = (
    floats_tensor(
        (1, 3, 32 * controlnet_embedder_scale_factor, 32 * controlnet_embedder_scale_factor),
        rng=random.Random(seed),
    )
    .to(device)
    .cpu()
)
| init_image = init_image.cpu().permute(0, 2, 3, 1)[0] | ||
|
|
||
| controlnet_embedder_scale_factor = 2 | ||
| image = Image.fromarray(np.uint8(init_image)).convert("RGB").resize((64, 64)) |
There was a problem hiding this comment.
the dummy image and mask_image are just 2 black images here
let's do something similar as https://github.com/huggingface/diffusers/pull/4536/files#diff-b65a24df736726ca6f92c71567b77c2a9832ee6142ee2dcbdb08e9addcb6da4b
There was a problem hiding this comment.
Followed the link's code,
image = floats_tensor((1, 3, 32, 32), rng=random.Random(seed)).to(device)
image = image.cpu().permute(0, 2, 3, 1)[0]
mask_image = torch.ones_like(image)
controlnet_embedder_scale_factor = 2
control_image = (
    floats_tensor(
        (1, 3, 32 * controlnet_embedder_scale_factor, 32 * controlnet_embedder_scale_factor),
        rng=random.Random(seed),
    )
    .to(device)
    .cpu()
)
| assert np.abs(image_slice_1.flatten() - image_slice_3.flatten()).max() > 1e-4 | ||
|
|
||
| # Ignore float16 for SDXL | ||
| def test_float16_inference(self): |
There was a problem hiding this comment.
This was unintentional. Removed the disabling.
|
Thank you @yiyixuxu and @patrickvonplaten. I will work on comments this week. |
|
Borrowing ideas from PR 4811. Work in progress. |
|
Hey @viiika, Could we maybe work on this PR together? @harutatsuakiyama can you maybe invite @viiika as a collaborator for this PR to your fork so that we can work here? @viiika , it's quite rare that we have two PRs about the same feature popping up almost at the same time - very sorry for the potentially duplicated work. Would it be ok to pass onto this PR because:
That would be very nice if we could collaborate here 🙏 |
| return mask | ||
|
|
||
|
|
||
| def prepare_mask_and_masked_image(image, mask, height, width, return_image: bool = False): |
There was a problem hiding this comment.
Can we remove this function and instead use the new mask processor logic: #4444
There was a problem hiding this comment.
@harutatsuakiyama I think you can delete this function now if not used?
|
I still maintain that #4811 already supports several of the new features mentioned in #4694, such as MultiControlnet, the API usage, no randn_tensor for control_image, and even the refactor with a mask_image_processor that you just mentioned. Its coding style is also more consistent with pipeline_stable_diffusion_xl_inpaint, compared to StableDiffusionControlNetInpaintPipeline, which is adapted from StableDiffusionInpaintPipeline. I believe #4811 requires almost no effort to review, because it is kept in sync with the latest pipeline_stable_diffusion_xl/pipeline_stable_diffusion_xl_inpaint. That said, which PR to merge is up to you. If you choose #4811, I believe it could be merged in less than a day. |
|
Also, if you still prefer to continue with #4694, that's fine with me, and I will do my best to help fix problems. I just think merging #4694 will take a few weeks because of the many problems to handle, and it might introduce some design inconsistencies. A lot of current research relies on this pipeline, so I hope it gets merged soon. |
|
Hi @yiyixuxu. Thanks for the review. I have addressed the review comments:
My local tests show no issues. Please let me know if further changes are required :-) |
| ] = None, | ||
| height: Optional[int] = None, | ||
| width: Optional[int] = None, | ||
| strength: float = 1.0, |
There was a problem hiding this comment.
| strength: float = 1.0, | |
| strength: float =0.9999, |
| The height in pixels of the generated image. | ||
| width (`int`, *optional*, defaults to self.unet.config.sample_size * self.vae_scale_factor): | ||
| The width in pixels of the generated image. | ||
| strength (`float`, *optional*, defaults to 1.): |
There was a problem hiding this comment.
| strength (`float`, *optional*, defaults to 1.): | |
| strength (`float`, *optional*, defaults to 0.9999): |
There was a problem hiding this comment.
Changed. Out of curiosity, may I ask why?
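The question is not answered in this thread. One observable difference between the two defaults can be sketched, assuming the pipeline trims its schedule the way the existing diffusers img2img and inpainting pipelines do in `get_timesteps`: `strength=1.0` keeps every denoising step, while `0.9999` drops the first one. The function name `steps_kept` is hypothetical, and this is not claimed to be the reviewer's reason.

```python
def steps_kept(num_inference_steps, strength):
    """Return (t_start, number of denoising steps actually run)."""
    # Number of steps, counted from the noisy end, that the strength selects.
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    # Index of the first scheduler step that is executed.
    t_start = max(num_inference_steps - init_timestep, 0)
    return t_start, num_inference_steps - t_start


full = steps_kept(50, 1.0)
almost_full = steps_kept(50, 0.9999)
half = steps_kept(50, 0.5)
print(full, almost_full, half)  # (0, 50) (1, 49) (25, 25)
```

With 50 steps, `int(50 * 0.9999)` truncates to 49, so the two defaults differ by exactly one step in this arithmetic.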
|
|
||
| control_image = control_images | ||
| else: | ||
| assert False |
There was a problem hiding this comment.
| assert False | |
| raise ValueError(f"{controlnet.__class__} is not supported.") |
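A small sketch of why the suggested `raise` is preferable to `assert False`: assertions are stripped when Python runs with `-O`, and a bare `assert False` carries no message, whereas the exception always fires and names the offending class. The two classes and the function below are hypothetical stand-ins, not the pipeline's code.

```python
class SingleControlNet:
    """Hypothetical stand-in for a supported ControlNet class."""


class UnknownControlNet:
    """Hypothetical stand-in for an unsupported class."""


def prepare_control_image(controlnet, control_image):
    """Return the control image for supported models, fail loudly otherwise."""
    if isinstance(controlnet, SingleControlNet):
        return control_image
    # An explicit exception survives `python -O` and explains what went wrong.
    raise ValueError(f"{controlnet.__class__} is not supported.")


try:
    prepare_control_image(UnknownControlNet(), control_image=None)
except ValueError as err:
    message = str(err)
    print(message)
```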
patrickvonplaten
left a comment
There was a problem hiding this comment.
Good to merge once @yiyixuxu is ok with it :-)
|
@viiika could you maybe drop your email here so that we can add you as a co-author via https://docs.github.com/en/pull-requests/committing-changes-to-your-project/creating-and-editing-commits/creating-a-commit-with-multiple-authors |
Sure. My primary GitHub email for this account is 1355864570@qq.com. Thank you very much! |
|
@harutatsuakiyama |
@harutatsuakiyama could you add @viiika as an author here that would be very nice ❤️ |
Co-authored-by: Jiabin Bai 1355864570@qq.com
|
Hi @yiyixuxu, @patrickvonplaten, and @viiika, I have addressed the new code review comments:
As for the failing tests, it seems the previous failure was due to network issues (500 bad gate). My local tests pass. Please let me know if further changes are required. |
|
@harutatsuakiyama |
|
Thank you @yiyixuxu. I just realized that diffusers.utils.dummy_torch_and_transformers_objects.py has some style problems. I have fixed them. Let me know if other things are required. The output of the style run is shown below:
python utils/check_copies.py --fix_and_overwrite
python utils/check_dummies.py --fix_and_overwrite
black examples scripts src tests utils
All done! ✨ 🍰 ✨
613 files left unchanged.
ruff examples scripts src tests utils --fix
examples/community/lpw_stable_diffusion_xl.py:1141:42: E721 Do not compare types, use `isinstance()`
examples/community/stable_diffusion_xl_reference.py:703:42: E721 Do not compare types, use `isinstance()`
src/diffusers/experimental/rl/value_guided_sampling.py:79:12: E721 Do not compare types, use `isinstance()`
src/diffusers/pipelines/audio_diffusion/pipeline_audio_diffusion.py:181:12: E721 Do not compare types, use `isinstance()`
src/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl.py:827:42: E721 Do not compare types, use `isinstance()`
src/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl_img2img.py:909:20: E721 Do not compare types, use `isinstance()`
src/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl_inpaint.py:1132:20: E721 Do not compare types, use `isinstance()`
src/diffusers/pipelines/t2i_adapter/pipeline_stable_diffusion_xl_adapter.py:877:42: E721 Do not compare types, use `isinstance()`
tests/pipelines/consistency_models/test_consistency_models.py:190:12: E721 Do not compare types, use `isinstance()`
tests/pipelines/unidiffuser/test_unidiffuser.py:112:12: E721 Do not compare types, use `isinstance()`
tests/pipelines/unidiffuser/test_unidiffuser.py:548:12: E721 Do not compare types, use `isinstance()`
tests/pipelines/unidiffuser/test_unidiffuser.py:651:12: E721 Do not compare types, use `isinstance()`
Found 12 errors.
make: *** [Makefile:59: style] Error 1 |
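For context on the E721 messages in this log: the rule flags direct type comparisons such as `type(x) == str`, which give a different answer from `isinstance()` as soon as subclasses are involved. A minimal sketch with a hypothetical `Prompt` subclass:

```python
class Prompt(str):
    """Hypothetical str subclass used only for this illustration."""


def is_str_by_type(value):
    # The pattern E721 flags: exact type comparison ignores subclasses.
    return type(value) == str


def is_str_by_isinstance(value):
    # The recommended form: subclasses count as str as well.
    return isinstance(value, str)


p = Prompt("hand")
print(is_str_by_type(p), is_str_by_isinstance(p))  # False True
```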
|
Ahh I see, I need to run the test for the doc builder. Let me do that. I hope that will be the last failing test. Sorry for the failing tests again. Can I ask for hints about how to fix this error? @yiyixuxu Also, could we get access to run the tests, for more efficient debugging? I have tried locally, and it seems to be correct ... |
| >>> mask_image = load_image(mask_url).convert("RGB") | ||
|
|
||
| >>> original_width, original_height = init_image.size | ||
| >>> new_width = int(original_width / 2) |
There was a problem hiding this comment.
This is to save CUDA memory. Removed in the new code.
| self, | ||
| prompt: Union[str, List[str]] = None, | ||
| prompt_2: Optional[Union[str, List[str]]] = None, | ||
| image: Union[ |
There was a problem hiding this comment.
let's use a custom type PipelineImageInput (was recently introduced)
| List[PIL.Image.Image], | ||
| List[np.ndarray], | ||
| ] = None, | ||
| mask_image: Union[torch.FloatTensor, PIL.Image.Image] = None, |
There was a problem hiding this comment.
I think mask_image should be of same type as image no? PipelineImageInput
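The idea behind a shared image-input type can be sketched as a single alias used for both arguments, so that `image` and `mask_image` cannot drift apart. The alias below is hypothetical and only mirrors the concept; its member list is taken from the `Union` in the hunk above plus tensor lists as an assumption, and it is not the actual definition of `PipelineImageInput`.

```python
from typing import TYPE_CHECKING, List, Union

if TYPE_CHECKING:  # these imports are only needed by type checkers
    import numpy as np
    import PIL.Image
    import torch

# Hypothetical alias: one name for every accepted image representation.
ImageInput = Union[
    "PIL.Image.Image",
    "np.ndarray",
    "torch.FloatTensor",
    List["PIL.Image.Image"],
    List["np.ndarray"],
    List["torch.FloatTensor"],
]


def inpaint(image: ImageInput = None, mask_image: ImageInput = None):
    """Signature-only sketch: both arguments accept the same set of types."""
    return image, mask_image
```

Using one alias also shortens the signature considerably compared with repeating the full `Union[...]` for each parameter.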
| latent_model_input = torch.cat([latent_model_input, mask, masked_image_latents], dim=1) | ||
|
|
||
| # predict the noise residual | ||
| added_cond_kwargs = {"text_embeds": add_text_embeds, "time_ids": add_time_ids} |
There was a problem hiding this comment.
I don't think this line is needed? it has not changed from line 1452
| projection_class_embeddings_input_dim=80, # 6 * 8 + 32 | ||
| cross_attention_dim=64, | ||
| ) | ||
| torch.manual_seed(0) |
There was a problem hiding this comment.
Why do we need to fix the seed here? I don't think we have any randomness here, no?
There was a problem hiding this comment.
I followed the test here: https://github.com/huggingface/diffusers/blob/main/tests/pipelines/controlnet/test_controlnet_sdxl.py
| image_latents_params = TEXT_TO_IMAGE_IMAGE_PARAMS | ||
|
|
||
| def get_dummy_components(self): | ||
| torch.manual_seed(0) |
There was a problem hiding this comment.
| projection_class_embeddings_input_dim=80, # 6 * 8 + 32 | ||
| cross_attention_dim=64, | ||
| ) | ||
| torch.manual_seed(0) |
There was a problem hiding this comment.
Similarly, follow test here: https://github.com/huggingface/diffusers/blob/main/tests/pipelines/controlnet/test_controlnet_sdxl.py
|
Regarding the quality test: can you make sure you are up to date? cc @DN6 here, we need help with the tests! |
|
I found the test issue: some lines in the docstring are too long. |
|
Hi @yiyixuxu. I removed For now, I strongly believe the code should be able to pass tests (fingers crossed 🙏) |
|
Hi @yiyixuxu, thanks for the new review round. I have addressed the comments:
Also, I strongly believe the code should be able to pass the tests (fingers crossed 🙏). Let me know if further changes are required. |
…uggingface#4694) * [ControlNet SDXL Inpainting] Support inpainting of ControlNet SDXL Co-authored-by: Jiabin Bai 1355864570@qq.com --------- Co-authored-by: Harutatsu Akiyama <kf.zy.qin@gmail.com>
Overview:
This PR introduces the implementation of the inference pipeline for ControlNet with SDXL and inpainting.
Files Modified/Added:
srcs/pipelines/controlnet/pipeline_control_inpaint_sd_xl.py
tests/pipelines/controlnet/test_controlnet_inpaint_sdx.py
Visualizations:
To better understand the impact and functionality of the implemented pipeline, the following visualizations are provided: