[Community] reference only control #3435
Conversation
The documentation is not available anymore as the PR was closed or merged.

Should we include the group norm modification in this PR as well: https://github.com/Mikubill/sd-webui-controlnet/pull/1278/files#diff-8c8d004eed5a3078434f6fbde15c178e472565ebfcb3119f308f9292c8eb7514R458 ? Right now, using the group norm (reference_adain+attn) gives the best results for reference-only. I would definitely like to see this added, either in this PR or a subsequent one.
patrickvonplaten left a comment
Works for me! Thanks!

Think there is one merge conflict that we need to resolve, and then we can get this one merged :-)

Will you add ControlNet support? How can we use it with ControlNet? @okotaku

It doesn't work for me for

When I run the example I get the error:

You can't use the version from PyPI; you need to pull the master branch and install from local (that, or wait for the next release).
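A sketch of the install-from-source step described above, assuming the upstream huggingface/diffusers repository:

```shell
# Install diffusers from the latest source instead of the PyPI release
pip install git+https://github.com/huggingface/diffusers.git

# Or, equivalently, clone and install an editable local copy
git clone https://github.com/huggingface/diffusers.git
cd diffusers
pip install -e .
```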
Thanks, I see.

@kadirnar I will add a ControlNet version after this PR is merged.

@wangdong-ivymobile Thank you for your report. I fixed this bug in the latest commit.

Just FYI, @lllyasviel fixed a bug recently: Mikubill/sd-webui-controlnet#1309

This is great 💯 Can we add an inpaint feature like in this repo?

@jfischoff Thank you for your suggestion. I fixed the style fidelity rule in the latest commit; it is based on Mikubill/sd-webui-controlnet#1309.

@okotaku I really appreciate the work you are doing!

I tried to use this with multiple images (with a slight modification), but the result is very bad. I wonder if there are some tricks to make it better?

I also tried multi-ControlNet with reference on the WebUI; the result is also not good, but in a different way. I wonder if the mechanism is different?
```python
latent_model_input = self.scheduler.scale_model_input(latent_model_input, t)

# ref only part
noise = torch.randn_like(ref_image_latents)
```
I believe this should be

```python
noise = randn_tensor(
    ref_image_latents.shape,
    generator=generator,
    device=ref_image_latents.device,
    dtype=ref_image_latents.dtype,
)
```

(from the diffusers utils) to ensure the generation is deterministic.
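To illustrate why the generator matters: `torch.randn_like` takes no `generator` argument, so its output changes on every run, while drawing with an explicitly seeded `torch.Generator` (which is what diffusers' `randn_tensor` forwards to `torch.randn`) is reproducible. A minimal sketch:

```python
import torch

shape = (1, 4, 8, 8)  # stand-in for ref_image_latents.shape

# torch.randn_like has no `generator` parameter, so this noise
# differs from run to run and cannot be seeded per-call:
latents = torch.zeros(shape)
noise_uncontrolled = torch.randn_like(latents)

# Passing a seeded generator, as randn_tensor does, makes the
# draw reproducible: two identically seeded generators agree.
gen_a = torch.Generator().manual_seed(0)
gen_b = torch.Generator().manual_seed(0)
noise_a = torch.randn(shape, generator=gen_a)
noise_b = torch.randn(shape, generator=gen_b)

print(torch.equal(noise_a, noise_b))  # → True
```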

@okotaku that makes sense. Thanks for answering my question.

Cool, let's merge this one!
* add reference only control
* add reference only control
* add reference only control
* fix lint
* fix lint
* reference adain
* bugfix EulerAncestralDiscreteScheduler
* fix style fidelity rule
* fix default output size
* del unused line
* fix deterministic

Hi. `TypeError: Transformer2DModel.forward() got an unexpected keyword argument 'attention_mask'`

@learningyan #3508

@okotaku When I comment out these codes, the code runs, but the quality of the generated images is poor, so I'm not sure whether commenting them out is correct.

Reference image:

Inference code:

```python
pipe = StableDiffusionReferencePipeline.from_pretrained(
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
result_img = pipe(ref_image=input_image,
```

result_img:

The results in sd-webui-controlnet (see Mikubill/sd-webui-controlnet#1236):

So how can I get a similar result to sd-webui-controlnet?
You need to update diffusers too.
You can change the base model. Also, that result is reference attention only; see Mikubill/sd-webui-controlnet#1236 (comment). When examining the details more closely, it appears that the actual model being used is anythingv3 with a custom model named animevae.pt, etc.

I just pulled the latest diffusers from git and installed it following the important note, but got this error (StableDiffusionReferencePipeline works fine): `AttributeError: 'StableDiffusionControlNetReferencePipeline' object has no attribute '_default_height_width'`

@miiiz you can copy the code of that function from the reference file into the ControlNet reference file.

Hi there, this thread is a bit long and confusing. Is there any documentation somewhere that describes the classes StableDiffusionReferencePipeline and StableDiffusionControlNetReferencePipeline?

@okotaku Thanks for your diffusers implementation. Based on your code, I achieved a cross-image region drag based on the reference scheme:

1. Use the inpaint ControlNet to extract the inpainted region's features from another image.
2. Use the segment-anything ControlNet to keep a reasonable pose.

@gasvn Really cool projects! Thank you for your contribution.

Two questions here:

Yes, you need to use the
For anyone coming to this later, you'll definitely need custom_pipeline, like so:
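As a sketch of what that might look like — the community pipeline id `stable_diffusion_reference`, the base checkpoint, the image URL, and the call arguments below are assumptions, and the code needs model weights and a GPU to actually run:

```python
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import load_image

# Assumed: the community pipeline is published as "stable_diffusion_reference"
# and the base checkpoint id is a placeholder.
pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    custom_pipeline="stable_diffusion_reference",
    torch_dtype=torch.float16,
).to("cuda")

ref_image = load_image("https://example.com/reference.png")  # placeholder URL

result = pipe(
    ref_image=ref_image,
    prompt="a photo of a dog",
    reference_attn=True,    # reference-only attention injection
    reference_adain=False,  # group-norm (AdaIN) variant off
).images[0]
```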

@djj0s3 does that mean you can't use reference-only with other ControlNets using

@amrakm are you trying to use reference-only as a ControlNet to pass into another pipe with other ControlNets? Interesting idea! One option you can try is to import this class directly and use it as

I see the code has reference_attn and reference_adain, but how do I use reference_only?

I still face the same issue. Could you please give me any advice? @okotaku

Normalize the reference image into [-1, 1], not [0, 1].
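For example, if the reference image is loaded as floats in [0, 1], the usual rescale (a sketch using NumPy) is:

```python
import numpy as np

def to_minus_one_one(image: np.ndarray) -> np.ndarray:
    """Rescale an image from [0, 1] to [-1, 1], the input range
    the Stable Diffusion VAE expects."""
    return 2.0 * image - 1.0

img = np.array([0.0, 0.5, 1.0])
print(to_minus_one_one(img))  # → [-1.  0.  1.]
```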

@okotaku @patrickvonplaten the
Refer to Mikubill/sd-webui-controlnet#1236

Reference Image

Output Image of `reference_attn=True` and `reference_adain=False`

Output Image of `reference_attn=False` and `reference_adain=True`

Output Image of `reference_attn=True` and `reference_adain=True`