Conversation
The documentation is not available anymore as the PR was closed or merged.
```python
if self.config.timestep_spacing == "linspace":
    timesteps = np.linspace(0, self.config.num_train_timesteps - 1, num_inference_steps, dtype=float)[::-1].copy()
elif self.config.timestep_spacing == "leading":
    step_ratio = self.config.num_train_timesteps // self.num_inference_steps
```
This new spacing doesn't give drastically better results, but better results nevertheless, IMO. It's also needed to get 1-to-1 the same results as the original code.
Does the original code (XL) use this new spacing scheme, though?
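For reference, a minimal NumPy sketch (not the scheduler itself) of how the two spacings diverge, assuming `num_train_timesteps=1000` and 50 inference steps:

```python
import numpy as np

num_train_timesteps = 1000
num_inference_steps = 50

# "linspace": spread the steps evenly over [0, 999], both endpoints included.
linspace = np.linspace(0, num_train_timesteps - 1, num_inference_steps, dtype=float)[::-1].copy()

# "leading": integer stride from 0, so the schedule never starts at the final
# training timestep (999), which is one source of mismatch vs. "linspace".
step_ratio = num_train_timesteps // num_inference_steps
leading = (np.arange(0, num_inference_steps) * step_ratio)[::-1].copy()

print(linspace[:3])  # ~[999.0, 978.6, 958.2]
print(leading[:3])   # [980, 960, 940]
```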
```py
num_transformer_blocks (`int` or `Tuple[int]`, *optional*, defaults to 1):
    The number of transformer blocks of type [`~models.attention.BasicTransformerBlock`]. Only relevant for
    [`~models.unet_2d_blocks.CrossAttnDownBlock2D`], [`~models.unet_2d_blocks.CrossAttnUpBlock2D`],
    [`~models.unet_2d_blocks.UNetMidBlock2DCrossAttn`].
```
So a Transformer block can be a UNet block? I don't find the `num_transformer_blocks` name to be a good one to encompass all the blocks we're supporting here, but I cannot think of a better one, either. So, okay to ignore, I guess.
Yeah, good point. Maybe `transformer_layers_per_block` is better?
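For illustration, a hedged sketch of what the per-block variant could look like at construction time, assuming the rename to `transformer_layers_per_block` lands; the SDXL-style depths below are an assumption, not this PR's config:

```python
from diffusers import UNet2DConditionModel

# Sketch: a tuple gives each block its own transformer depth, e.g. deeper
# BasicTransformerBlock stacks in the lower-resolution cross-attention blocks.
unet = UNet2DConditionModel(
    block_out_channels=(320, 640, 1280),
    down_block_types=("DownBlock2D", "CrossAttnDownBlock2D", "CrossAttnDownBlock2D"),
    up_block_types=("CrossAttnUpBlock2D", "CrossAttnUpBlock2D", "UpBlock2D"),
    transformer_layers_per_block=(1, 2, 10),  # assumed SDXL-like per-block depths
    cross_attention_dim=2048,
)
```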
```diff
-def convert_open_clip_checkpoint(checkpoint):
-    text_model = CLIPTextModel.from_pretrained("stabilityai/stable-diffusion-2", subfolder="text_encoder")
+def convert_open_clip_checkpoint(checkpoint, prefix="cond_stage_model.model."):
+    # text_model = CLIPTextModel.from_pretrained("stabilityai/stable-diffusion-2", subfolder="text_encoder")
```
Are we not affecting the SD 2 conversion process with this one?
Need to double-check!
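As a sanity check, a sketch of how the new `prefix` default should keep the SD 2 path intact; the SDXL prefix below is an assumption:

```python
# SD 2: the default prefix matches the previously hard-coded behavior.
sd2_text_model = convert_open_clip_checkpoint(checkpoint)  # prefix="cond_stage_model.model."

# SDXL: pass the checkpoint's own second-text-encoder prefix explicitly (assumed value).
xl_text_model = convert_open_clip_checkpoint(
    xl_checkpoint, prefix="conditioner.embedders.1.model."
)
```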
```python
num_train_timesteps = original_config.model.params.timesteps or 1000
beta_start = original_config.model.params.linear_start or 0.02
beta_end = original_config.model.params.linear_end or 0.085
```
Where are these numbers coming from? I'd make a note for our future reference.
Ah, this is hacky for now and shouldn't be this way.
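For future reference: the SD v1/v2 YAML configs ship `linear_start: 0.00085` and `linear_end: 0.012`, so the hard-coded fallbacks above look swapped and off by an order of magnitude. A less hacky sketch (hypothetical, not this PR's code):

```python
# Fall back to the values the original SD YAML configs actually ship, and read
# the keys defensively so a missing field does not silently change magnitudes.
params = original_config.model.params
num_train_timesteps = params.get("timesteps", 1000)
beta_start = params.get("linear_start", 0.00085)  # SD v1/v2 configs ship 0.00085
beta_end = params.get("linear_end", 0.012)        # SD v1/v2 configs ship 0.012
```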
```python
text_encoder_lora_scale = (
    cross_attention_kwargs.get("scale", None) if cross_attention_kwargs is not None else None
)
(
```
Four tensors are returned instead of just one. The first two are the usual positive and negative prompt embeddings that are passed into cross-attention. The last two "pooled" embeds are used to additionally condition the time embedding.
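A sketch of unpacking the four tensors described above; the names follow the SDXL pipeline's `encode_prompt`, but treat the exact signature as an assumption:

```python
(
    prompt_embeds,                  # positive prompt, fed to cross-attention
    negative_prompt_embeds,         # negative prompt, fed to cross-attention
    pooled_prompt_embeds,           # pooled positive embeds -> added time embedding
    negative_pooled_prompt_embeds,  # pooled negative embeds -> added time embedding
) = self.encode_prompt(
    prompt,
    device,
    num_images_per_prompt,
    do_classifier_free_guidance,
    negative_prompt,
    lora_scale=text_encoder_lora_scale,
)
```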
Fix embeddings for classic SD models.
```py
    This parameter controls whether to use Karras sigmas (Karras et al. (2022) scheme) for step sizes in the
    noise schedule during the sampling process. If True, the sigmas will be determined according to a sequence
    of noise levels {σi} as defined in Equation (5) of the paper https://arxiv.org/pdf/2206.00364.pdf.
timestep_spacing (`str`, default `"linspace"`):
```
Those changes should also work well for other schedulers.
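For instance, a hedged sketch of toggling both config fields when swapping in a different scheduler (assuming it also exposes `timestep_spacing`):

```python
from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler

# Both options are plain config fields, so they can be overridden when
# re-creating a scheduler from an existing pipeline's config.
pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-0.9")
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config,
    use_karras_sigmas=True,       # Karras et al. (2022) sigma spacing
    timestep_spacing="trailing",  # assumption: this scheduler exposes the field
)
```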
src/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl_img2img.py
```py
    A kwargs dictionary that if specified is passed along to the `AttentionProcessor` as defined under
    `self.processor` in
    [diffusers.cross_attention](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/cross_attention.py).
guidance_rescale (`float`, *optional*, defaults to 0.7):
```
Ah yes, we should probably fix this in a follow-up PR! Sorry, just noticed the comment here. Would you like to open a PR here maybe, @bghira? :-)
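For context on `guidance_rescale`: it blends the classifier-free-guidance output with a variance-rescaled version, following "Common Diffusion Noise Schedules and Sample Steps are Flawed" (arXiv:2305.08891). A sketch mirroring the `rescale_noise_cfg` helper used by the pipelines:

```python
def rescale_noise_cfg(noise_cfg, noise_pred_text, guidance_rescale=0.0):
    # Compare the std of the guided prediction with the text-conditional one.
    std_text = noise_pred_text.std(dim=list(range(1, noise_pred_text.ndim)), keepdim=True)
    std_cfg = noise_cfg.std(dim=list(range(1, noise_cfg.ndim)), keepdim=True)
    # Rescale the guided output toward the text-conditional std (fixes overexposure).
    noise_pred_rescaled = noise_cfg * (std_text / std_cfg)
    # Blend with the original result so images don't end up "plain looking".
    return guidance_rescale * noise_pred_rescaled + (1 - guidance_rescale) * noise_cfg
```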
This reverts commit 491bc9f.
He just merged the feature branch to …
thanks
* Add new text encoder
* add transformers depth
* More
* Correct conversion script
* Fix more
* Fix more
* Correct more
* correct text encoder
* Finish all
* proof that in works in run local xl
* clean up
* Get refiner to work
* Add red castle
* Fix batch size
* Improve pipelines more
* Finish text2image tests
* Add img2img test
* Fix more
* fix import
* Fix embeddings for classic models (huggingface#3888) Fix embeddings for classic SD models.
* Allow multiple prompts to be passed to the refiner (huggingface#3895)
* finish more
* Apply suggestions from code review
* add watermarker
* Model offload (huggingface#3889)
* Model offload.
* Model offload for refiner / img2img
* Hardcode encoder offload on img2img vae encode. Saves some GPU RAM in img2img / refiner tasks so it remains below 8 GB.

---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* correct
* fix
* clean print
* Update install warning for `invisible-watermark`
* add: missing docstrings.
* fix and simplify the usage example in img2img.
* fix setup for watermarking.
* Revert "fix setup for watermarking." This reverts commit 491bc9f.
* fix: watermarking setup.
* fix: op.
* run make fix-copies.
* make sure tests pass
* improve convert
* make tests pass
* make tests pass
* better error message
* fiinsh
* finish
* Fix final test

---------
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Usage for `stabilityai/stable-diffusion-xl-base-0.9`:

In addition, make sure to install `transformers`, `safetensors`, `accelerate` as well as the invisible watermark. You can then use the model as follows.
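The original snippet was stripped; this is a plausible reconstruction, with an illustrative prompt, and the exact `from_pretrained` flags should be treated as assumptions:

```python
# pip install invisible-watermark transformers safetensors accelerate
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-0.9",
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
)
pipe.to("cuda")

prompt = "An astronaut riding a green horse"  # illustrative
image = pipe(prompt=prompt).images[0]
```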
When using `torch >= 2.0`, you can improve the inference speed by 20-30% with `torch.compile`; simply wrap the unet with torch compile before running the pipeline. If you are limited by GPU VRAM, you can enable CPU offloading by calling `pipe.enable_model_cpu_offload` instead of `.to("cuda")`. Both options are sketched below.
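A sketch of both options (independent of each other; `pipe` as created above):

```python
# Option 1 (torch >= 2.0): compile the unet for a 20-30% speedup.
pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)

# Option 2 (low VRAM): offload submodules to CPU; call this INSTEAD of pipe.to("cuda").
pipe.enable_model_cpu_offload()
```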
.to("cuda"):Usage for
"stabilityai/stable-diffusion-xl-refiner-0.9"When using
torch >= 2.0, you can improve the inference speed by 20-30% with torch.compile. Simple wrap the unet with torch compile before running the pipeline:If you are limited by GPU VRAM, you can enable cpu offloading by calling
pipe.enable_model_cpu_offloadinstead of
.to("cuda"):