Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
@cene555 are you still working on the PR, or is it ready for review? Let us know.
great! I left some feedback, mostly nits; we can wait to make changes until @patrickvonplaten gives his review.
I think the main to-dos left are:
- add docstring examples: I've tested our doc examples for text2img, img2img and inpaint, and all work great. The only ones missing are the controlnets
- add `# copy from` statements everywhere (YiYi can help with this)
- add tests and docs (YiYi can help with this)
- make `PriorEmb2Emb` work as intended - either add an `add_noise` function to unclip or replace the unclip scheduler with ddpm (YiYi can help with this)
- I'm not too sure why we need this `try` statement - can you explain? If any refactor is needed, let me know #3903 (comment)
KandinskyInpaintPipeline,
KandinskyPipeline,
KandinskyPriorPipeline,
Kandinsky2_2_DecoderControlnetImg2ImgPipeline,
@patrickvonplaten
these Decoder pipelines are the main pipelines that contain the image generation process - this is consistent with the naming in the original repo, and I think it makes sense.
However, it might be a little confusing for our users - should we rename them?
Yes, I think it would be nicer to remove `_Decoder` here.
    return time_image_embeds + time_text_embeds


class ImageTimeEmbedding(nn.Module):
self,
unet: UNet2DConditionModel,
scheduler: DDPMScheduler,
vae: VQModel,
I think we should be consistent with 2.1 in diffusers and name it `movq`?
@cene555 is the model a movq here or just a vae?
# YiYi notes: only reason this pipeline can't work with unclip scheduler is that can't pass down this argument
# need to use DDPM scheduler instead
# prev_timestep=prev_timestep,
we can remove this note now:)
class Kandinsky2_2_DecoderControlnetPipeline(DiffusionPipeline):
    """
    Pipeline for text-to-image generation using Kandinsky
Suggested change:
- Pipeline for text-to-image generation using Kandinsky
+ Pipeline for text-to-image generation using Kandinsky with ControlNet guidance.
noise_pred, _ = noise_pred.split(latents.shape[1], dim=1)

# compute the previous noisy sample x_t -> x_t-1
try:
I don't understand this here: do we have a `scheduler.step` that does not accept a `generator` argument?
In any case, we should be very clear about which schedulers this pipeline can work with, and make sure it works with them.
class Kandinsky2_2_DecoderImg2ImgPipeline(DiffusionPipeline):
    """
    Pipeline for text-to-image generation using Kandinsky
Suggested change:
- Pipeline for text-to-image generation using Kandinsky
+ Pipeline for image-to-image generation using Kandinsky
text_encoder ([`MultilingualCLIP`]):
    Frozen text-encoder.
tokenizer ([`XLMRobertaTokenizer`]):
    Tokenizer of class
noise = randn_tensor(shape, generator=generator, device=device, dtype=dtype)

# get latents
try:
we need to make sure it either works with ddpm, or add an `add_noise` method to the unclip scheduler
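For reference, the standard forward-diffusion noising step (the same formula `DDPMScheduler.add_noise` implements) is what the unclip scheduler would need. The standalone function below is a sketch of that formula, not the actual scheduler method:

```python
import torch


def add_noise(original_samples, noise, timesteps, alphas_cumprod):
    # Forward diffusion process:
    #   x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise
    sqrt_alpha_prod = alphas_cumprod[timesteps] ** 0.5
    sqrt_one_minus_alpha_prod = (1 - alphas_cumprod[timesteps]) ** 0.5
    # Broadcast the per-timestep scalars over the sample dimensions.
    while sqrt_alpha_prod.dim() < original_samples.dim():
        sqrt_alpha_prod = sqrt_alpha_prod.unsqueeze(-1)
        sqrt_one_minus_alpha_prod = sqrt_one_minus_alpha_prod.unsqueeze(-1)
    return sqrt_alpha_prod * original_samples + sqrt_one_minus_alpha_prod * noise
```

With such a method on the scheduler, `PriorEmb2Emb` could noise the image embeddings to the chosen start timestep directly, regardless of which scheduler class is configured.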
text_encoder ([`MultilingualCLIP`]):
    Frozen text-encoder.
tokenizer ([`XLMRobertaTokenizer`]):
    Tokenizer of class
self.num_image_text_embeds = num_image_text_embeds
self.image_embeds = nn.Linear(image_embed_dim, self.num_image_text_embeds * cross_attention_dim)
self.norm = nn.LayerNorm(cross_attention_dim)
@@ -26,7 +26,10 @@
from .embeddings import (
Changes here look all good to me! It would just be nice / important to add some tests and docstrings.
self.input_hint_block = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),
    nn.SiLU(),
    nn.Conv2d(16, 16, 3, padding=1),
    nn.SiLU(),
    nn.Conv2d(16, 32, 3, padding=1, stride=2),
    nn.SiLU(),
    nn.Conv2d(32, 32, 3, padding=1),
    nn.SiLU(),
    nn.Conv2d(32, 96, 3, padding=1, stride=2),
    nn.SiLU(),
    nn.Conv2d(96, 96, 3, padding=1),
    nn.SiLU(),
    nn.Conv2d(96, 256, 3, padding=1, stride=2),
    nn.SiLU(),
    nn.Conv2d(256, 4, 3, padding=1)
)
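The stack above is easy to sanity-check: the three stride-2 convolutions reduce the spatial size by 8x, and the final conv maps to 4 channels (presumably the latent depth). A minimal reconstruction, assuming a 64x64 hint image:

```python
import torch
from torch import nn

# Rebuilt copy of the hint encoder quoted above, just to verify its output shape.
input_hint_block = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.SiLU(),
    nn.Conv2d(16, 16, 3, padding=1), nn.SiLU(),
    nn.Conv2d(16, 32, 3, padding=1, stride=2), nn.SiLU(),   # 64 -> 32
    nn.Conv2d(32, 32, 3, padding=1), nn.SiLU(),
    nn.Conv2d(32, 96, 3, padding=1, stride=2), nn.SiLU(),   # 32 -> 16
    nn.Conv2d(96, 96, 3, padding=1), nn.SiLU(),
    nn.Conv2d(96, 256, 3, padding=1, stride=2), nn.SiLU(),  # 16 -> 8
    nn.Conv2d(256, 4, 3, padding=1),
)

hint = torch.randn(1, 3, 64, 64)
out = input_hint_block(hint)  # shape (1, 4, 8, 8)
```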
@@ -375,6 +375,27 @@ def forward(self, text_embeds: torch.FloatTensor, image_embeds: torch.FloatTensor):
Changes look good to me!
KandinskyInpaintPipeline,
KandinskyPipeline,
KandinskyPriorPipeline,
Kandinsky2_2_DecoderControlnetImg2ImgPipeline,
Suggested change:
- Kandinsky2_2_DecoderControlnetImg2ImgPipeline,
+ KandinskyControlnetV22Img2ImgPipeline,
Kandinsky2_2_DecoderControlnetPipeline,
Kandinsky2_2_DecoderImg2ImgPipeline,
Kandinsky2_2_DecoderPipeline,
Kandinsky2_2PriorEmb2EmbPipeline,
Kandinsky2_2PriorPipeline,
Kandinsky2_2_DecoderInpaintPipeline,
Suggested change:
- Kandinsky2_2_DecoderControlnetPipeline,
- Kandinsky2_2_DecoderImg2ImgPipeline,
- Kandinsky2_2_DecoderPipeline,
- Kandinsky2_2PriorEmb2EmbPipeline,
- Kandinsky2_2PriorPipeline,
- Kandinsky2_2_DecoderInpaintPipeline,
+ KandinskyV22ControlnetPipeline,
+ KandinskyV22Img2ImgPipeline,
+ KandinskyV22DecoderPipeline,
+ KandinskyV22PriorEmb2EmbPipeline,
+ KandinskyV22PriorPipeline,
+ KandinskyV22InpaintPipeline,
What does this PR do?
Fixes # (issue)
Before submitting
See the documentation guidelines, and the tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.