Merged
36 commits
- `7e20a8b` T2I-Adapter-XL (MC-E, Aug 21, 2023)
- `a45cc6a` update (MC-E, Aug 22, 2023)
- `cd829c9` update (MC-E, Aug 22, 2023)
- `44b3454` add pipeline (MC-E, Aug 23, 2023)
- `d288b3b` modify pipeline (MC-E, Aug 23, 2023)
- `c8a20ef` modify pipeline (MC-E, Aug 23, 2023)
- `3dc1eba` modify pipeline (MC-E, Aug 23, 2023)
- `e0f73a7` modify pipeline (MC-E, Aug 23, 2023)
- `b299b8f` modify pipeline (MC-E, Aug 23, 2023)
- `98a6e69` modify modeling_text_unet (MC-E, Aug 23, 2023)
- `d146aa2` fix styling. (sayakpaul, Aug 23, 2023)
- `512e8b4` fix: copies. (sayakpaul, Aug 23, 2023)
- `db7e9b2` adapter settings (MC-E, Aug 23, 2023)
- `cc43a7b` new test case (MC-E, Aug 23, 2023)
- `fa7a218` new test case (MC-E, Aug 23, 2023)
- `dc203bd` debugging (sayakpaul, Aug 23, 2023)
- `708ee6a` debugging (sayakpaul, Aug 23, 2023)
- `4877a70` debugging (sayakpaul, Aug 23, 2023)
- `c518df5` debugging (sayakpaul, Aug 23, 2023)
- `3280104` debugging (sayakpaul, Aug 23, 2023)
- `0dc053f` debugging (sayakpaul, Aug 23, 2023)
- `5ceed5f` debugging (sayakpaul, Aug 23, 2023)
- `ae98b48` debugging (sayakpaul, Aug 23, 2023)
- `d595689` revert prints. (sayakpaul, Aug 23, 2023)
- `7a570b6` new test case (MC-E, Aug 23, 2023)
- `e1c60a1` remove print (MC-E, Aug 23, 2023)
- `8e78422` org test case (MC-E, Aug 23, 2023)
- `9cc021a` add test_pipeline (MC-E, Aug 23, 2023)
- `714a39b` styling. (sayakpaul, Aug 23, 2023)
- `7bb5d5b` fix copies. (sayakpaul, Aug 23, 2023)
- `b359793` modify test parameter (MC-E, Aug 23, 2023)
- `5d73983` style. (sayakpaul, Aug 24, 2023)
- `297e776` add adapter-xl doc (MC-E, Aug 28, 2023)
- `7857fa9` double quotes in docs (MC-E, Aug 28, 2023)
- `9b502fd` Fix potential type mismatch (MC-E, Aug 29, 2023)
- `d5c901d` style. (sayakpaul, Aug 29, 2023)
75 changes: 73 additions & 2 deletions docs/source/en/api/pipelines/stable_diffusion/adapter.md
@@ -29,10 +29,11 @@ This model was contributed by the community contributor [HimariO](https://github
| Pipeline | Tasks | Demo
|---|---|:---:|
| [StableDiffusionAdapterPipeline](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_adapter.py) | *Text-to-Image Generation with T2I-Adapter Conditioning* | -
| [StableDiffusionXLAdapterPipeline](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_xl_adapter.py) | *Text-to-Image Generation with T2I-Adapter Conditioning on StableDiffusion-XL* | -

## Usage example with the base model of Stable Diffusion 1.4/1.5

The following is a simple example of how to use a *T2IAdapter* checkpoint with Diffusers for inference with Stable Diffusion 1.4/1.5.
All adapters use the same pipeline.

1. Images are first converted into the appropriate *control image* format.
@@ -93,6 +94,62 @@ out_image = pipe(

![img](https://huggingface.co/datasets/diffusers/docs-images/resolve/main/t2i-adapter/color_output.png)

## Usage example with the base model of Stable Diffusion XL

The following is a simple example of how to use a *T2IAdapter* checkpoint with Diffusers for inference with Stable Diffusion XL.
All adapters use the same pipeline.

1. Images are first converted into the appropriate *control image* format.
2. The *control image* and *prompt* are passed to the [`StableDiffusionXLAdapterPipeline`].

Let's have a look at a simple example using the [Sketch Adapter](https://huggingface.co/Adapter/t2iadapter/tree/main/sketch_sdxl_1.0).

```python
from diffusers.utils import load_image

sketch_image = load_image("https://huggingface.co/Adapter/t2iadapter/resolve/main/sketch.png").convert("L")
```

![img](https://huggingface.co/Adapter/t2iadapter/resolve/main/sketch.png)

Then, create the adapter pipeline:

```py
import torch
from diffusers import (
    T2IAdapter,
    StableDiffusionXLAdapterPipeline,
    DDPMScheduler,
)

model_id = "stabilityai/stable-diffusion-xl-base-1.0"
adapter = T2IAdapter.from_pretrained(
    "Adapter/t2iadapter", subfolder="sketch_sdxl_1.0", torch_dtype=torch.float16, adapter_type="full_adapter_xl"
)
scheduler = DDPMScheduler.from_pretrained(model_id, subfolder="scheduler")

pipe = StableDiffusionXLAdapterPipeline.from_pretrained(
    model_id, adapter=adapter, safety_checker=None, torch_dtype=torch.float16, variant="fp16", scheduler=scheduler
)

pipe.to("cuda")
```

Finally, pass the prompt and control image to the pipeline:

```py
# fix the random seed, so you will get the same result as the example
generator = torch.Generator().manual_seed(42)

sketch_image_out = pipe(
prompt="a photo of a dog in real world, high quality",
negative_prompt="extra digit, fewer digits, cropped, worst quality, low quality",
image=sketch_image,
generator=generator,
guidance_scale=7.5
).images[0]
```

![img](https://huggingface.co/Adapter/t2iadapter/resolve/main/sketch_output.png)

## Available checkpoints

@@ -113,6 +170,9 @@ Non-diffusers checkpoints can be found under [TencentARC/T2I-Adapter](https://hu
|[TencentARC/t2iadapter_depth_sd15v2](https://huggingface.co/TencentARC/t2iadapter_depth_sd15v2)||
|[TencentARC/t2iadapter_sketch_sd15v2](https://huggingface.co/TencentARC/t2iadapter_sketch_sd15v2)||
|[TencentARC/t2iadapter_zoedepth_sd15v1](https://huggingface.co/TencentARC/t2iadapter_zoedepth_sd15v1)||
|[Adapter/t2iadapter, subfolder='sketch_sdxl_1.0'](https://huggingface.co/Adapter/t2iadapter/tree/main/sketch_sdxl_1.0)||
|[Adapter/t2iadapter, subfolder='canny_sdxl_1.0'](https://huggingface.co/Adapter/t2iadapter/tree/main/canny_sdxl_1.0)||
|[Adapter/t2iadapter, subfolder='openpose_sdxl_1.0'](https://huggingface.co/Adapter/t2iadapter/tree/main/openpose_sdxl_1.0)||

## Combining multiple adapters

@@ -185,3 +245,14 @@ However, T2I-Adapter performs slightly worse than ControlNet.
- disable_vae_slicing
- enable_xformers_memory_efficient_attention
- disable_xformers_memory_efficient_attention

## StableDiffusionXLAdapterPipeline
[[autodoc]] StableDiffusionXLAdapterPipeline
- all
- __call__
- enable_attention_slicing
- disable_attention_slicing
- enable_vae_slicing
- disable_vae_slicing
- enable_xformers_memory_efficient_attention
- disable_xformers_memory_efficient_attention
1 change: 1 addition & 0 deletions src/diffusers/__init__.py
@@ -191,6 +191,7 @@
    StableDiffusionPix2PixZeroPipeline,
    StableDiffusionSAGPipeline,
    StableDiffusionUpscalePipeline,
    StableDiffusionXLAdapterPipeline,
    StableDiffusionXLControlNetImg2ImgPipeline,
    StableDiffusionXLControlNetPipeline,
    StableDiffusionXLImg2ImgPipeline,
44 changes: 44 additions & 0 deletions src/diffusers/models/adapter.py
@@ -128,6 +128,8 @@ def __init__(

        if adapter_type == "full_adapter":
            self.adapter = FullAdapter(in_channels, channels, num_res_blocks, downscale_factor)
        elif adapter_type == "full_adapter_xl":
            self.adapter = FullAdapterXL(in_channels, channels, num_res_blocks, downscale_factor)
        elif adapter_type == "light_adapter":
            self.adapter = LightAdapter(in_channels, channels, num_res_blocks, downscale_factor)
        else:
@@ -184,6 +186,48 @@ def forward(self, x: torch.Tensor) -> List[torch.Tensor]:
        return features


class FullAdapterXL(nn.Module):
    def __init__(
        self,
        in_channels: int = 3,
        channels: List[int] = [320, 640, 1280, 1280],
        num_res_blocks: int = 2,
        downscale_factor: int = 16,
    ):
        super().__init__()

        in_channels = in_channels * downscale_factor**2

        self.unshuffle = nn.PixelUnshuffle(downscale_factor)
        self.conv_in = nn.Conv2d(in_channels, channels[0], kernel_size=3, padding=1)

        self.body = []
        # blocks to extract XL features with dimensions of [320, 64, 64], [640, 64, 64], [1280, 32, 32], [1280, 32, 32]
        for i in range(len(channels)):
            if i == 1:
                self.body.append(AdapterBlock(channels[i - 1], channels[i], num_res_blocks))
            elif i == 2:
                self.body.append(AdapterBlock(channels[i - 1], channels[i], num_res_blocks, down=True))
            else:
                self.body.append(AdapterBlock(channels[i], channels[i], num_res_blocks))

        self.body = nn.ModuleList(self.body)
        # XL has one fewer downsampling
        self.total_downscale_factor = downscale_factor * 2 ** (len(channels) - 2)

    def forward(self, x: torch.Tensor) -> List[torch.Tensor]:
        x = self.unshuffle(x)
        x = self.conv_in(x)

        features = []

        for block in self.body:
            x = block(x)
            features.append(x)

        return features


class AdapterBlock(nn.Module):
    def __init__(self, in_channels, out_channels, num_res_blocks, down=False):
        super().__init__()
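The comment inside `FullAdapterXL` lists the feature dimensions the blocks are meant to produce; a minimal pure-Python sketch of that shape arithmetic (the function name and the 1024x1024 input size are illustrative assumptions, not part of the diffusers code):

```python
# Trace the shapes FullAdapterXL produces for a 1024x1024 control image,
# mirroring the block layout above: block i == 1 changes only the channel
# count, block i == 2 is the single down=True block, and the remaining
# blocks keep both channels and resolution.
def adapter_xl_feature_shapes(image_size=1024, in_channels=3,
                              channels=(320, 640, 1280, 1280), downscale_factor=16):
    # PixelUnshuffle: spatial size shrinks by the factor, channels grow by factor**2
    unshuffled_channels = in_channels * downscale_factor**2  # 3 * 16**2 = 768
    size = image_size // downscale_factor                    # 1024 // 16 = 64
    shapes = []
    for i, ch in enumerate(channels):
        if i == 2:  # the only AdapterBlock built with down=True
            size //= 2
        shapes.append((ch, size, size))
    return unshuffled_channels, shapes

chans, shapes = adapter_xl_feature_shapes()
print(chans)   # 768
print(shapes)  # [(320, 64, 64), (640, 64, 64), (1280, 32, 32), (1280, 32, 32)]
```

The printed list matches the dimension comment in the class body, which is a quick way to sanity-check the `i == 1` / `i == 2` special cases.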
7 changes: 7 additions & 0 deletions src/diffusers/models/unet_2d_condition.py
@@ -965,6 +965,13 @@ def forward(
                cross_attention_kwargs=cross_attention_kwargs,
                encoder_attention_mask=encoder_attention_mask,
            )
            # To support T2I-Adapter-XL
            if (
                is_adapter
                and len(down_block_additional_residuals) > 0
                and sample.shape == down_block_additional_residuals[0].shape
            ):
                sample += down_block_additional_residuals.pop(0)

        if is_controlnet:
            sample = sample + mid_block_additional_residual
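The UNet hunk above consumes a leftover adapter residual at the mid block only when its shape matches the current sample; a minimal sketch of that rule, using shape tuples as stand-ins for tensors (the helper name is a hypothetical illustration, not a diffusers API):

```python
# Pop-and-apply rule from the UNet change above: a residual is consumed
# only when its shape matches the current sample, so the extra
# T2I-Adapter-XL residual left over after the down blocks is applied
# at the mid block and never at a mismatched resolution.
def consume_matching_residual(sample_shape, residuals):
    """Pop and return the first residual if its shape matches the sample, else None."""
    if residuals and residuals[0] == sample_shape:
        return residuals.pop(0)
    return None

residuals = [(1280, 32, 32)]                                 # leftover adapter residual
print(consume_matching_residual((1280, 32, 32), residuals))  # (1280, 32, 32)
print(residuals)                                             # []
print(consume_matching_residual((1280, 32, 32), residuals))  # None
```

Because the standard SD adapters produce exactly one residual per down block, their list is empty by the mid block and this branch is a no-op for them.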
2 changes: 1 addition & 1 deletion src/diffusers/pipelines/__init__.py
@@ -118,7 +118,7 @@
    StableDiffusionXLInstructPix2PixPipeline,
    StableDiffusionXLPipeline,
)
from .t2i_adapter import StableDiffusionAdapterPipeline, StableDiffusionXLAdapterPipeline
from .text_to_video_synthesis import TextToVideoSDPipeline, TextToVideoZeroPipeline, VideoToVideoSDPipeline
from .unclip import UnCLIPImageVariationPipeline, UnCLIPPipeline
from .unidiffuser import ImageTextPipelineOutput, UniDiffuserModel, UniDiffuserPipeline, UniDiffuserTextDecoder
1 change: 1 addition & 0 deletions src/diffusers/pipelines/t2i_adapter/__init__.py
@@ -12,3 +12,4 @@
    from ...utils.dummy_torch_and_transformers_objects import *  # noqa F403
else:
    from .pipeline_stable_diffusion_adapter import StableDiffusionAdapterPipeline
    from .pipeline_stable_diffusion_xl_adapter import StableDiffusionXLAdapterPipeline