From 9f2d7e2fba658bc56e0f37f6e552082632520ff6 Mon Sep 17 00:00:00 2001 From: Patrick von Platen Date: Thu, 6 Jul 2023 18:44:26 +0200 Subject: [PATCH 1/6] finish sd xl docs --- .../stable_diffusion/stable_diffusion_xl.mdx | 114 +++++++++++++++++- src/diffusers/utils/import_utils.py | 2 +- 2 files changed, 109 insertions(+), 7 deletions(-) diff --git a/docs/source/en/api/pipelines/stable_diffusion/stable_diffusion_xl.mdx b/docs/source/en/api/pipelines/stable_diffusion/stable_diffusion_xl.mdx index b87d51af233b..fa0cc5bcb27a 100644 --- a/docs/source/en/api/pipelines/stable_diffusion/stable_diffusion_xl.mdx +++ b/docs/source/en/api/pipelines/stable_diffusion/stable_diffusion_xl.mdx @@ -12,22 +12,124 @@ specific language governing permissions and limitations under the License. # Stable diffusion XL -Stable Diffusion 2 is a text-to-image _latent diffusion_ model built upon the work of [Stable Diffusion 1](https://stability.ai/blog/stable-diffusion-public-release). -The project to train Stable Diffusion 2 was led by Robin Rombach and Katherine Crowson from [Stability AI](https://stability.ai/) and [LAION](https://laion.ai/). +Stable Diffusion XL was proposed in [SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis](https://arxiv.org/abs/2307.01952) by Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna, Robin Rombach -*The Stable Diffusion 2.0 release includes robust text-to-image models trained using a brand new text encoder (OpenCLIP), developed by LAION with support from Stability AI, which greatly improves the quality of the generated images compared to earlier V1 releases. The text-to-image models in this release can generate images with default resolutions of both 512x512 pixels and 768x768 pixels. 
-These models are trained on an aesthetic subset of the [LAION-5B dataset](https://laion.ai/blog/laion-5b/) created by the DeepFloyd team at Stability AI, which is then further filtered to remove adult content using [LAION’s NSFW filter](https://openreview.net/forum?id=M3Y74vmsMcY).* +The abstract of the paper is the following: -For more details about how Stable Diffusion 2 works and how it differs from Stable Diffusion 1, please refer to the official [launch announcement post](https://stability.ai/blog/stable-diffusion-v2-release). +*We present SDXL, a latent diffusion model for text-to-image synthesis. Compared to previous versions of Stable Diffusion, SDXL leverages a three times larger UNet backbone: The increase of model parameters is mainly due to more attention blocks and a larger cross-attention context as SDXL uses a second text encoder. We design multiple novel conditioning schemes and train SDXL on multiple aspect ratios. We also introduce a refinement model which is used to improve the visual fidelity of samples generated by SDXL using a post-hoc image-to-image technique. We demonstrate that SDXL shows drastically improved performance compared the previous versions of Stable Diffusion and achieves results competitive with those of black-box state-of-the-art image generators.* ## Tips +- Stable Diffusion XL works especially well with image sizes between 768 and 1024 pixels. 
+- The output image of Stable Diffusion XL can be improved by making use of a refiner, as shown below. + ### Available checkpoints: - *Text-to-Image (1024x1024 resolution)*: [stabilityai/stable-diffusion-xl-base-0.9](https://huggingface.co/stabilityai/stable-diffusion-xl-base-0.9) with [`StableDiffusionXLPipeline`] - *Image-to-Image / Refiner (1024x1024 resolution)*: [stabilityai/stable-diffusion-xl-refiner-0.9](https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-0.9) with [`StableDiffusionXLImg2ImgPipeline`] -TODO +## Usage Example + +Before using SDXL, make sure to have `transformers`, `accelerate`, `safetensors` and `invisible-watermark` installed. +You can install the libraries as follows: + +``` +pip install transformers +pip install accelerate +pip install safetensors +pip install "invisible-watermark>=2.0" +``` + +### *Text-to-Image* + +You can use SDXL as follows for *text-to-image*: + +```py +from diffusers import StableDiffusionXLPipeline +import torch + +pipe = StableDiffusionXLPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-0.9", torch_dtype=torch.float16, variant="fp16", use_safetensors=True) +pipe.to("cuda") + +prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k" +image = pipe(prompt=prompt).images[0] +``` + +### Refining the image output + +The image can be refined by making use of [stabilityai/stable-diffusion-xl-refiner-0.9](https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-0.9). +In this case, you only have to output the `latents` from the base model. 
+ +```py +from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline +import torch + +pipe = StableDiffusionXLPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-0.9", torch_dtype=torch.float16, variant="fp16", use_safetensors=True) +pipe.to("cuda") + +refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained("stabilityai/stable-diffusion-xl-refiner-0.9", torch_dtype=torch.float16, use_safetensors=True, variant="fp16") +refiner.to("cuda") + +prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k" + +image = pipe(prompt=prompt, output_type="latent").images[0] +image = refiner(prompt=prompt, image=image[None, :]).images[0] +``` + +### Loading single file checkpoints / original file format + +By making use of [`~diffusers.loaders.FromSingleFileMixin.from_single_file`] you can also load the +original file format into `diffusers`: + +```py +from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline +import torch + +pipe = StableDiffusionXLPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-0.9", torch_dtype=torch.float16, variant="fp16", use_safetensors=True) +pipe.to("cuda") + +refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained("stabilityai/stable-diffusion-xl-refiner-0.9", torch_dtype=torch.float16, use_safetensors=True, variant="fp16") +refiner.to("cuda") +``` + +### Memory optimization via model offloading + +If you are seeing out-of-memory errors, we recommend making use of [`StableDiffusionXLPipeline.enable_model_cpu_offload`]. + +```diff +- pipe.to("cuda") ++ pipe.enable_model_cpu_offload() +``` + +and + +```diff +- refiner.to("cuda") ++ refiner.enable_model_cpu_offload() +``` + +### Speed-up inference with `torch.compile` + +You can speed up inference by making use of `torch.compile`. This should give you a speed-up of **ca.** 20%. 
+ +```diff ++ pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True) ++ refiner.unet = torch.compile(refiner.unet, mode="reduce-overhead", fullgraph=True) +``` + +### Running with `torch` < 2.0 + +**Note** that if you want to run Stable Diffusion XL with `torch` < 2.0, please make sure to enable xformers +attention: + +``` +pip install xformers +``` + +```py ++ pipe.enable_xformers_memory_efficient_attention() ++ refiner.enable_xformers_memory_efficient_attention() +``` ## StableDiffusionXLPipeline diff --git a/src/diffusers/utils/import_utils.py b/src/diffusers/utils/import_utils.py index 287992207e5a..3a7539cfb0fb 100644 --- a/src/diffusers/utils/import_utils.py +++ b/src/diffusers/utils/import_utils.py @@ -504,7 +504,7 @@ def is_invisible_watermark_available(): # docstyle-ignore INVISIBLE_WATERMARK_IMPORT_ERROR = """ -{0} requires the invisible-watermark library but it was not found in your environment. You can install it with pip: `pip install git+https://github.com/patrickvonplaten/invisible-watermark.git@remove_onnxruntime_depedency` +{0} requires the invisible-watermark library but it was not found in your environment. 
You can install it with pip: `pip install "invisible-watermark>=2.0"` """ From f32ef3bd77b98d1296970229efbeae4b656050a2 Mon Sep 17 00:00:00 2001 From: Patrick von Platen Date: Thu, 6 Jul 2023 18:46:24 +0200 Subject: [PATCH 2/6] make style --- .../stable_diffusion/stable_diffusion_xl.mdx | 24 +++++++++++++------ 1 file changed, 17 insertions(+), 7 deletions(-) diff --git a/docs/source/en/api/pipelines/stable_diffusion/stable_diffusion_xl.mdx b/docs/source/en/api/pipelines/stable_diffusion/stable_diffusion_xl.mdx index fa0cc5bcb27a..6a6f2d38fb29 100644 --- a/docs/source/en/api/pipelines/stable_diffusion/stable_diffusion_xl.mdx +++ b/docs/source/en/api/pipelines/stable_diffusion/stable_diffusion_xl.mdx @@ -48,7 +48,9 @@ You can use SDXL as follows for *text-to-image*: from diffusers import StableDiffusionXLPipeline import torch -pipe = StableDiffusionXLPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-0.9", torch_dtype=torch.float16, variant="fp16", use_safetensors=True) +pipe = StableDiffusionXLPipeline.from_pretrained( + "stabilityai/stable-diffusion-xl-base-0.9", torch_dtype=torch.float16, variant="fp16", use_safetensors=True +) pipe.to("cuda") prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k" @@ -64,10 +66,14 @@ In this case, you only have to output the `latents` from the base model. 
from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline import torch -pipe = StableDiffusionXLPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-0.9", torch_dtype=torch.float16, variant="fp16", use_safetensors=True) +pipe = StableDiffusionXLPipeline.from_pretrained( + "stabilityai/stable-diffusion-xl-base-0.9", torch_dtype=torch.float16, variant="fp16", use_safetensors=True +) pipe.to("cuda") -refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained("stabilityai/stable-diffusion-xl-refiner-0.9", torch_dtype=torch.float16, use_safetensors=True, variant="fp16") +refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained( + "stabilityai/stable-diffusion-xl-refiner-0.9", torch_dtype=torch.float16, use_safetensors=True, variant="fp16" +) refiner.to("cuda") prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k" @@ -85,10 +91,14 @@ original file format into `diffusers`: from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline import torch -pipe = StableDiffusionXLPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-0.9", torch_dtype=torch.float16, variant="fp16", use_safetensors=True) +pipe = StableDiffusionXLPipeline.from_pretrained( + "stabilityai/stable-diffusion-xl-base-0.9", torch_dtype=torch.float16, variant="fp16", use_safetensors=True +) pipe.to("cuda") -refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained("stabilityai/stable-diffusion-xl-refiner-0.9", torch_dtype=torch.float16, use_safetensors=True, variant="fp16") +refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained( + "stabilityai/stable-diffusion-xl-refiner-0.9", torch_dtype=torch.float16, use_safetensors=True, variant="fp16" +) refiner.to("cuda") ``` @@ -127,8 +137,8 @@ pip install xformers ``` ```py -+ pipe.enable_xformers_memory_efficient_attention() -+ refiner.enable_xformers_memory_efficient_attention() ++pipe.enable_xformers_memory_efficient_attention() 
++refiner.enable_xformers_memory_efficient_attention() ``` ## StableDiffusionXLPipeline From bf5a42ec88c92daa700b51f5e7356d4988219098 Mon Sep 17 00:00:00 2001 From: Patrick von Platen Date: Thu, 6 Jul 2023 18:47:21 +0200 Subject: [PATCH 3/6] Apply suggestions from code review --- .../en/api/pipelines/stable_diffusion/stable_diffusion_xl.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/en/api/pipelines/stable_diffusion/stable_diffusion_xl.mdx b/docs/source/en/api/pipelines/stable_diffusion/stable_diffusion_xl.mdx index 6a6f2d38fb29..64abb9eef8c8 100644 --- a/docs/source/en/api/pipelines/stable_diffusion/stable_diffusion_xl.mdx +++ b/docs/source/en/api/pipelines/stable_diffusion/stable_diffusion_xl.mdx @@ -136,7 +136,7 @@ attention: pip install xformers ``` -```py +```diff +pipe.enable_xformers_memory_efficient_attention() +refiner.enable_xformers_memory_efficient_attention() ``` From 6ee990eb130460460b69742524cdf3d0907399b3 Mon Sep 17 00:00:00 2001 From: Patrick von Platen Date: Thu, 6 Jul 2023 18:53:06 +0200 Subject: [PATCH 4/6] uP --- .github/workflows/build_documentation.yml | 17 ++++++----------- .github/workflows/build_pr_documentation.yml | 18 ++++++------------ 2 files changed, 12 insertions(+), 23 deletions(-) diff --git a/.github/workflows/build_documentation.yml b/.github/workflows/build_documentation.yml index 79d2cdec0672..f06cdf2ca3cf 100644 --- a/.github/workflows/build_documentation.yml +++ b/.github/workflows/build_documentation.yml @@ -11,17 +11,12 @@ on: jobs: build: steps: - - name: Install dependencies - run: | - apt-get update && apt-get install libsndfile1-dev libgl1 -y - - - name: Build doc - uses: huggingface/doc-builder/.github/workflows/build_main_documentation.yml@main - with: - commit_sha: ${{ github.sha }} - package: diffusers - notebook_folder: diffusers_doc - languages: en ko zh + uses: huggingface/doc-builder/.github/workflows/build_main_documentation.yml@main + with: + commit_sha: ${{ github.sha }} + 
package: diffusers + notebook_folder: diffusers_doc + languages: en ko zh secrets: token: ${{ secrets.HUGGINGFACE_PUSH }} diff --git a/.github/workflows/build_pr_documentation.yml b/.github/workflows/build_pr_documentation.yml index 248644b7e9cd..d09411e37287 100644 --- a/.github/workflows/build_pr_documentation.yml +++ b/.github/workflows/build_pr_documentation.yml @@ -9,15 +9,9 @@ concurrency: jobs: build: - steps: - - name: Install dependencies - run: | - apt-get update && apt-get install libsndfile1-dev libgl1 -y - - - name: Build doc - uses: huggingface/doc-builder/.github/workflows/build_pr_documentation.yml@main - with: - commit_sha: ${{ github.event.pull_request.head.sha }} - pr_number: ${{ github.event.number }} - package: diffusers - languages: en ko zh + uses: huggingface/doc-builder/.github/workflows/build_pr_documentation.yml@main + with: + commit_sha: ${{ github.event.pull_request.head.sha }} + pr_number: ${{ github.event.number }} + package: diffusers + languages: en ko zh From f5896f5dc56ac9b0db38cc452814b396125a0991 Mon Sep 17 00:00:00 2001 From: Patrick von Platen Date: Thu, 6 Jul 2023 19:10:57 +0200 Subject: [PATCH 5/6] uP --- .github/workflows/build_documentation.yml | 1 + .github/workflows/build_pr_documentation.yml | 1 + 2 files changed, 2 insertions(+) diff --git a/.github/workflows/build_documentation.yml b/.github/workflows/build_documentation.yml index f06cdf2ca3cf..f8c5dcac428b 100644 --- a/.github/workflows/build_documentation.yml +++ b/.github/workflows/build_documentation.yml @@ -14,6 +14,7 @@ jobs: uses: huggingface/doc-builder/.github/workflows/build_main_documentation.yml@main with: commit_sha: ${{ github.sha }} + install_libgl1: 'true' package: diffusers notebook_folder: diffusers_doc languages: en ko zh diff --git a/.github/workflows/build_pr_documentation.yml b/.github/workflows/build_pr_documentation.yml index d09411e37287..d7eb119104f2 100644 --- a/.github/workflows/build_pr_documentation.yml +++ 
b/.github/workflows/build_pr_documentation.yml @@ -13,5 +13,6 @@ jobs: with: commit_sha: ${{ github.event.pull_request.head.sha }} pr_number: ${{ github.event.number }} + install_libgl1: 'true' package: diffusers languages: en ko zh From 08ec72ead3c78f794f670f17601f4dd5c13680ba Mon Sep 17 00:00:00 2001 From: Patrick von Platen Date: Thu, 6 Jul 2023 19:12:45 +0200 Subject: [PATCH 6/6] Correct --- .github/workflows/build_documentation.yml | 2 +- .github/workflows/build_pr_documentation.yml | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/.github/workflows/build_documentation.yml b/.github/workflows/build_documentation.yml index f8c5dcac428b..8fdae99883f8 100644 --- a/.github/workflows/build_documentation.yml +++ b/.github/workflows/build_documentation.yml @@ -14,7 +14,7 @@ jobs: uses: huggingface/doc-builder/.github/workflows/build_main_documentation.yml@main with: commit_sha: ${{ github.sha }} - install_libgl1: 'true' + install_libgl1: true package: diffusers notebook_folder: diffusers_doc languages: en ko zh diff --git a/.github/workflows/build_pr_documentation.yml b/.github/workflows/build_pr_documentation.yml index d7eb119104f2..18b606ca754c 100644 --- a/.github/workflows/build_pr_documentation.yml +++ b/.github/workflows/build_pr_documentation.yml @@ -13,6 +13,6 @@ jobs: with: commit_sha: ${{ github.event.pull_request.head.sha }} pr_number: ${{ github.event.number }} - install_libgl1: 'true' + install_libgl1: true package: diffusers languages: en ko zh