From 9f2d7e2fba658bc56e0f37f6e552082632520ff6 Mon Sep 17 00:00:00 2001
From: Patrick von Platen <patrick.v.platen@gmail.com>
Date: Thu, 6 Jul 2023 18:44:26 +0200
Subject: [PATCH 1/6] finish sd xl docs

---
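Reviewer note (not part of the commit message): the usage section added here asks readers to install `transformers`, `accelerate`, `safetensors` and `invisible_watermark` before using SDXL. A minimal sketch of how a reader might check for those packages before loading a pipeline follows. The helper name `missing_sdxl_dependencies` is hypothetical and not part of `diffusers`, and the import name `imwatermark` for the invisible-watermark package is an assumption.

```python
from importlib.util import find_spec

# Import names assumed for the packages listed in the docs; "imwatermark" is
# the module name the invisible-watermark distribution is assumed to install.
SDXL_IMPORT_NAMES = ("transformers", "accelerate", "safetensors", "imwatermark")


def missing_sdxl_dependencies(import_names=SDXL_IMPORT_NAMES):
    """Return the import names that cannot be found in the current environment."""
    return [name for name in import_names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_sdxl_dependencies()
    print("missing:", ", ".join(missing) if missing else "none")
```

An empty list would suggest that all four packages are importable; it does not check that the installed versions satisfy `invisible-watermark>=2.0`.
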
 .../stable_diffusion/stable_diffusion_xl.mdx  | 114 +++++++++++++++++-
 src/diffusers/utils/import_utils.py           |   2 +-
 2 files changed, 109 insertions(+), 7 deletions(-)

diff --git a/docs/source/en/api/pipelines/stable_diffusion/stable_diffusion_xl.mdx b/docs/source/en/api/pipelines/stable_diffusion/stable_diffusion_xl.mdx
index b87d51af233b..fa0cc5bcb27a 100644
--- a/docs/source/en/api/pipelines/stable_diffusion/stable_diffusion_xl.mdx
+++ b/docs/source/en/api/pipelines/stable_diffusion/stable_diffusion_xl.mdx
@@ -12,22 +12,124 @@ specific language governing permissions and limitations under the License.
 
 # Stable diffusion XL
 
-Stable Diffusion 2 is a text-to-image _latent diffusion_ model built upon the work of [Stable Diffusion 1](https://stability.ai/blog/stable-diffusion-public-release). 
-The project to train Stable Diffusion 2 was led by Robin Rombach and Katherine Crowson from [Stability AI](https://stability.ai/) and [LAION](https://laion.ai/).
+Stable Diffusion XL was proposed in [SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis](https://arxiv.org/abs/2307.01952) by Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna, and Robin Rombach.
 
-*The Stable Diffusion 2.0 release includes robust text-to-image models trained using a brand new text encoder (OpenCLIP), developed by LAION with support from Stability AI, which greatly improves the quality of the generated images compared to earlier V1 releases. The text-to-image models in this release can generate images with default resolutions of both 512x512 pixels and 768x768 pixels. 
-These models are trained on an aesthetic subset of the [LAION-5B dataset](https://laion.ai/blog/laion-5b/) created by the DeepFloyd team at Stability AI, which is then further filtered to remove adult content using [LAION’s NSFW filter](https://openreview.net/forum?id=M3Y74vmsMcY).*
+The abstract of the paper is the following:
 
-For more details about how Stable Diffusion 2 works and how it differs from Stable Diffusion 1, please refer to the official [launch announcement post](https://stability.ai/blog/stable-diffusion-v2-release).
+*We present SDXL, a latent diffusion model for text-to-image synthesis. Compared to previous versions of Stable Diffusion, SDXL leverages a three times larger UNet backbone: The increase of model parameters is mainly due to more attention blocks and a larger cross-attention context as SDXL uses a second text encoder. We design multiple novel conditioning schemes and train SDXL on multiple aspect ratios. We also introduce a refinement model which is used to improve the visual fidelity of samples generated by SDXL using a post-hoc image-to-image technique. We demonstrate that SDXL shows drastically improved performance compared the previous versions of Stable Diffusion and achieves results competitive with those of black-box state-of-the-art image generators.*
 
 ## Tips
 
+- Stable Diffusion XL works especially well with image sizes between 768 and 1024 pixels.
+- The output image of Stable Diffusion XL can be improved by using a refiner, as shown below.
+
 ### Available checkpoints:
 
 - *Text-to-Image (1024x1024 resolution)*: [stabilityai/stable-diffusion-xl-base-0.9](https://huggingface.co/stabilityai/stable-diffusion-xl-base-0.9) with [`StableDiffusionXLPipeline`]
 - *Image-to-Image / Refiner (1024x1024 resolution)*: [stabilityai/stable-diffusion-xl-refiner-0.9](https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-0.9) with [`StableDiffusionXLImg2ImgPipeline`]
 
-TODO
+## Usage Example
+
+Before using SDXL, make sure to have `transformers`, `accelerate`, `safetensors` and `invisible_watermark` installed.
+You can install the libraries as follows:
+
+```
+pip install transformers
+pip install accelerate
+pip install safetensors
+pip install "invisible-watermark>=2.0"
+```
+
+### *Text-to-Image*
+
+You can use SDXL as follows for *text-to-image*:
+
+```py
+from diffusers import StableDiffusionXLPipeline
+import torch
+
+pipe = StableDiffusionXLPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-0.9", torch_dtype=torch.float16, variant="fp16", use_safetensors=True)
+pipe.to("cuda")
+
+prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
+image = pipe(prompt=prompt).images[0]
+```
+
+### Refining the image output
+
+The image can be refined by making use of [stabilityai/stable-diffusion-xl-refiner-0.9](https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-0.9).
+In this case, you only have to output the `latents` from the base model.
+
+```py
+from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline
+import torch
+
+pipe = StableDiffusionXLPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-0.9", torch_dtype=torch.float16, variant="fp16", use_safetensors=True)
+pipe.to("cuda")
+
+refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained("stabilityai/stable-diffusion-xl-refiner-0.9", torch_dtype=torch.float16, use_safetensors=True, variant="fp16")
+refiner.to("cuda")
+
+prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
+
+image = pipe(prompt=prompt, output_type="latent").images[0]
+image = refiner(prompt=prompt, image=image[None, :]).images[0]
+```
+
+### Loading single file checkpoints / original file format
+
+By making use of [`~diffusers.loaders.FromSingleFileMixin.from_single_file`] you can also load the 
+original file format into `diffusers`:
+
+```py
+from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline
+import torch
+
+pipe = StableDiffusionXLPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-0.9", torch_dtype=torch.float16, variant="fp16", use_safetensors=True)
+pipe.to("cuda")
+
+refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained("stabilityai/stable-diffusion-xl-refiner-0.9", torch_dtype=torch.float16, use_safetensors=True, variant="fp16")
+refiner.to("cuda")
+```
+
+### Memory optimization via model offloading 
+
+If you are seeing out-of-memory errors, we recommend making use of [`StableDiffusionXLPipeline.enable_model_cpu_offload`].
+
+```diff
+- pipe.to("cuda")
++ pipe.enable_model_cpu_offload()
+```
+
+and 
+
+```diff
+- refiner.to("cuda")
++ refiner.enable_model_cpu_offload()
+```
+
+### Speed-up inference with `torch.compile`
+
+You can speed up inference by making use of `torch.compile`. This should give you a speed-up of roughly 20%.
+
+```diff
++ pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
++ refiner.unet = torch.compile(refiner.unet, mode="reduce-overhead", fullgraph=True)
+```
+
+### Running with `torch` < 2.0
+
+**Note** that if you want to run Stable Diffusion XL with `torch` < 2.0, please make sure to enable xformers 
+attention:
+
+```
+pip install xformers
+```
+
+```py
++ pipe.enable_xformers_memory_efficient_attention()
++ refiner.enable_xformers_memory_efficient_attention()
+```
 
 ## StableDiffusionXLPipeline
 
diff --git a/src/diffusers/utils/import_utils.py b/src/diffusers/utils/import_utils.py
index 287992207e5a..3a7539cfb0fb 100644
--- a/src/diffusers/utils/import_utils.py
+++ b/src/diffusers/utils/import_utils.py
@@ -504,7 +504,7 @@ def is_invisible_watermark_available():
 
 # docstyle-ignore
 INVISIBLE_WATERMARK_IMPORT_ERROR = """
-{0} requires the invisible-watermark library but it was not found in your environment. You can install it with pip: `pip install git+https://github.com/patrickvonplaten/invisible-watermark.git@remove_onnxruntime_depedency`
+{0} requires the invisible-watermark library but it was not found in your environment. You can install it with pip: `pip install "invisible-watermark>=2.0"`
 """
 
 

From f32ef3bd77b98d1296970229efbeae4b656050a2 Mon Sep 17 00:00:00 2001
From: Patrick von Platen <patrick.v.platen@gmail.com>
Date: Thu, 6 Jul 2023 18:46:24 +0200
Subject: [PATCH 2/6] make style

---
 .../stable_diffusion/stable_diffusion_xl.mdx  | 24 +++++++++++++------
 1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/docs/source/en/api/pipelines/stable_diffusion/stable_diffusion_xl.mdx b/docs/source/en/api/pipelines/stable_diffusion/stable_diffusion_xl.mdx
index fa0cc5bcb27a..6a6f2d38fb29 100644
--- a/docs/source/en/api/pipelines/stable_diffusion/stable_diffusion_xl.mdx
+++ b/docs/source/en/api/pipelines/stable_diffusion/stable_diffusion_xl.mdx
@@ -48,7 +48,9 @@ You can use SDXL as follows for *text-to-image*:
 from diffusers import StableDiffusionXLPipeline
 import torch
 
-pipe = StableDiffusionXLPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-0.9", torch_dtype=torch.float16, variant="fp16", use_safetensors=True)
+pipe = StableDiffusionXLPipeline.from_pretrained(
+    "stabilityai/stable-diffusion-xl-base-0.9", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
+)
 pipe.to("cuda")
 
 prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
@@ -64,10 +66,14 @@ In this case, you only have to output the `latents` from the base model.
 from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline
 import torch
 
-pipe = StableDiffusionXLPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-0.9", torch_dtype=torch.float16, variant="fp16", use_safetensors=True)
+pipe = StableDiffusionXLPipeline.from_pretrained(
+    "stabilityai/stable-diffusion-xl-base-0.9", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
+)
 pipe.to("cuda")
 
-refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained("stabilityai/stable-diffusion-xl-refiner-0.9", torch_dtype=torch.float16, use_safetensors=True, variant="fp16")
+refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
+    "stabilityai/stable-diffusion-xl-refiner-0.9", torch_dtype=torch.float16, use_safetensors=True, variant="fp16"
+)
 refiner.to("cuda")
 
 prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
@@ -85,10 +91,14 @@ original file format into `diffusers`:
 from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline
 import torch
 
-pipe = StableDiffusionXLPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-0.9", torch_dtype=torch.float16, variant="fp16", use_safetensors=True)
+pipe = StableDiffusionXLPipeline.from_pretrained(
+    "stabilityai/stable-diffusion-xl-base-0.9", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
+)
 pipe.to("cuda")
 
-refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained("stabilityai/stable-diffusion-xl-refiner-0.9", torch_dtype=torch.float16, use_safetensors=True, variant="fp16")
+refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
+    "stabilityai/stable-diffusion-xl-refiner-0.9", torch_dtype=torch.float16, use_safetensors=True, variant="fp16"
+)
 refiner.to("cuda")
 ```
 
@@ -127,8 +137,8 @@ pip install xformers
 ```
 
 ```py
-+ pipe.enable_xformers_memory_efficient_attention()
-+ refiner.enable_xformers_memory_efficient_attention()
++pipe.enable_xformers_memory_efficient_attention()
++refiner.enable_xformers_memory_efficient_attention()
 ```
 
 ## StableDiffusionXLPipeline

From bf5a42ec88c92daa700b51f5e7356d4988219098 Mon Sep 17 00:00:00 2001
From: Patrick von Platen <patrick.v.platen@gmail.com>
Date: Thu, 6 Jul 2023 18:47:21 +0200
Subject: [PATCH 3/6] Apply suggestions from code review

---
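Reviewer note (not part of the commit message): the block touched by this patch applies only when running with `torch` < 2.0. A minimal sketch of that version gate follows, assuming the version string has the usual `major.minor...` shape. The helper name `needs_xformers` is hypothetical and not part of `diffusers` or `torch`.

```python
import re


def needs_xformers(torch_version: str) -> bool:
    """Return True when the given torch version string is older than 2.0."""
    # Only the leading "major.minor" is inspected; suffixes such as "+cu117"
    # or ".dev2023" are ignored.
    match = re.match(r"(\d+)\.(\d+)", torch_version)
    if match is None:
        raise ValueError(f"unrecognized version string: {torch_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) < (2, 0)
```

With loaded pipelines, one might call `pipe.enable_xformers_memory_efficient_attention()` and `refiner.enable_xformers_memory_efficient_attention()` only when `needs_xformers(torch.__version__)` is true.
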
 .../en/api/pipelines/stable_diffusion/stable_diffusion_xl.mdx   | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/en/api/pipelines/stable_diffusion/stable_diffusion_xl.mdx b/docs/source/en/api/pipelines/stable_diffusion/stable_diffusion_xl.mdx
index 6a6f2d38fb29..64abb9eef8c8 100644
--- a/docs/source/en/api/pipelines/stable_diffusion/stable_diffusion_xl.mdx
+++ b/docs/source/en/api/pipelines/stable_diffusion/stable_diffusion_xl.mdx
@@ -136,7 +136,7 @@ attention:
 pip install xformers
 ```
 
-```py
+```diff
 +pipe.enable_xformers_memory_efficient_attention()
 +refiner.enable_xformers_memory_efficient_attention()
 ```

From 6ee990eb130460460b69742524cdf3d0907399b3 Mon Sep 17 00:00:00 2001
From: Patrick von Platen <patrick.v.platen@gmail.com>
Date: Thu, 6 Jul 2023 18:53:06 +0200
Subject: [PATCH 4/6] uP

---
 .github/workflows/build_documentation.yml    | 17 ++++++-----------
 .github/workflows/build_pr_documentation.yml | 18 ++++++------------
 2 files changed, 12 insertions(+), 23 deletions(-)

diff --git a/.github/workflows/build_documentation.yml b/.github/workflows/build_documentation.yml
index 79d2cdec0672..f06cdf2ca3cf 100644
--- a/.github/workflows/build_documentation.yml
+++ b/.github/workflows/build_documentation.yml
@@ -11,17 +11,12 @@ on:
 jobs:
   build:
     steps:
-      - name: Install dependencies
-        run: |
-          apt-get update && apt-get install libsndfile1-dev libgl1 -y
-
-      - name: Build doc
-        uses: huggingface/doc-builder/.github/workflows/build_main_documentation.yml@main
-        with:
-          commit_sha: ${{ github.sha }}
-          package: diffusers
-          notebook_folder: diffusers_doc
-          languages: en ko zh
+      uses: huggingface/doc-builder/.github/workflows/build_main_documentation.yml@main
+      with:
+        commit_sha: ${{ github.sha }}
+        package: diffusers
+        notebook_folder: diffusers_doc
+        languages: en ko zh
 
     secrets:
       token: ${{ secrets.HUGGINGFACE_PUSH }}
diff --git a/.github/workflows/build_pr_documentation.yml b/.github/workflows/build_pr_documentation.yml
index 248644b7e9cd..d09411e37287 100644
--- a/.github/workflows/build_pr_documentation.yml
+++ b/.github/workflows/build_pr_documentation.yml
@@ -9,15 +9,9 @@ concurrency:
 
 jobs:
   build:
-    steps:
-      - name: Install dependencies
-        run: |
-          apt-get update && apt-get install libsndfile1-dev libgl1 -y
-
-      - name: Build doc
-        uses: huggingface/doc-builder/.github/workflows/build_pr_documentation.yml@main
-        with:
-          commit_sha: ${{ github.event.pull_request.head.sha }}
-          pr_number: ${{ github.event.number }}
-          package: diffusers
-          languages: en ko zh
+    uses: huggingface/doc-builder/.github/workflows/build_pr_documentation.yml@main
+    with:
+      commit_sha: ${{ github.event.pull_request.head.sha }}
+      pr_number: ${{ github.event.number }}
+      package: diffusers
+      languages: en ko zh

From f5896f5dc56ac9b0db38cc452814b396125a0991 Mon Sep 17 00:00:00 2001
From: Patrick von Platen <patrick.v.platen@gmail.com>
Date: Thu, 6 Jul 2023 19:10:57 +0200
Subject: [PATCH 5/6] uP

---
 .github/workflows/build_documentation.yml    | 1 +
 .github/workflows/build_pr_documentation.yml | 1 +
 2 files changed, 2 insertions(+)

diff --git a/.github/workflows/build_documentation.yml b/.github/workflows/build_documentation.yml
index f06cdf2ca3cf..f8c5dcac428b 100644
--- a/.github/workflows/build_documentation.yml
+++ b/.github/workflows/build_documentation.yml
@@ -14,6 +14,7 @@ jobs:
       uses: huggingface/doc-builder/.github/workflows/build_main_documentation.yml@main
       with:
         commit_sha: ${{ github.sha }}
+        install_libgl1: 'true'
         package: diffusers
         notebook_folder: diffusers_doc
         languages: en ko zh
diff --git a/.github/workflows/build_pr_documentation.yml b/.github/workflows/build_pr_documentation.yml
index d09411e37287..d7eb119104f2 100644
--- a/.github/workflows/build_pr_documentation.yml
+++ b/.github/workflows/build_pr_documentation.yml
@@ -13,5 +13,6 @@ jobs:
     with:
       commit_sha: ${{ github.event.pull_request.head.sha }}
       pr_number: ${{ github.event.number }}
+      install_libgl1: 'true'
       package: diffusers
       languages: en ko zh

From 08ec72ead3c78f794f670f17601f4dd5c13680ba Mon Sep 17 00:00:00 2001
From: Patrick von Platen <patrick.v.platen@gmail.com>
Date: Thu, 6 Jul 2023 19:12:45 +0200
Subject: [PATCH 6/6] Correct

---
 .github/workflows/build_documentation.yml    | 2 +-
 .github/workflows/build_pr_documentation.yml | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/.github/workflows/build_documentation.yml b/.github/workflows/build_documentation.yml
index f8c5dcac428b..8fdae99883f8 100644
--- a/.github/workflows/build_documentation.yml
+++ b/.github/workflows/build_documentation.yml
@@ -14,7 +14,7 @@ jobs:
       uses: huggingface/doc-builder/.github/workflows/build_main_documentation.yml@main
       with:
         commit_sha: ${{ github.sha }}
-        install_libgl1: 'true'
+        install_libgl1: true
         package: diffusers
         notebook_folder: diffusers_doc
         languages: en ko zh
diff --git a/.github/workflows/build_pr_documentation.yml b/.github/workflows/build_pr_documentation.yml
index d7eb119104f2..18b606ca754c 100644
--- a/.github/workflows/build_pr_documentation.yml
+++ b/.github/workflows/build_pr_documentation.yml
@@ -13,6 +13,6 @@ jobs:
     with:
       commit_sha: ${{ github.event.pull_request.head.sha }}
       pr_number: ${{ github.event.number }}
-      install_libgl1: 'true'
+      install_libgl1: true
       package: diffusers
       languages: en ko zh