Optimize test files by fixing CPU-offloading usage#8409
yiyixuxu merged 5 commits into huggingface:main
Conversation
@tolgacangoz
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
* Refactor code to remove unnecessary calls to `to(torch_device)`
* Refactor code to remove unnecessary calls to `to("cuda")`
* Update pipeline_stable_diffusion_diffedit.py
This pull request refactors the code to remove unnecessary calls to `to(torch_device)` and `to("cuda")`. These calls were redundant, consumed memory unnecessarily, and can be safely removed without affecting the code's functionality.

There are also comparisons between `output_without_offload` and `output_with_offload` in the test files. I tried with SD-1.5-fp16 in Colab. After two forward passes (with and without offloading), the occupied system RAM is ~5.1 GB. But if I initialize the pipeline again before `pipeline.enable_sequential_cpu_offload()`, the occupied system RAM is ~2.4 GB (1-1.5 GB of RAM is already occupied by the system initially). The difference is ~0.5 GB for `pipeline.enable_model_cpu_offload()`, and I couldn't see much difference in GPU VRAM. The time cost of adding a second initialization was almost zero. What should be done for these places:

diffusers/tests/pipelines/test_pipelines_common.py
Lines 1362 to 1384 in 867a2b0
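The re-initialization pattern described above could be sketched roughly as follows. This is an illustrative sketch only, not the actual test code: `DummyPipeline` is a hypothetical stand-in for a diffusers pipeline, and only the method names `to` and `enable_sequential_cpu_offload` mirror the real diffusers API.

```python
# Sketch of the proposed test flow: instead of reusing the pipeline that was
# already moved to the accelerator, build a fresh instance before enabling
# CPU offloading, so the first pipeline's weights can be freed.

class DummyPipeline:
    """Hypothetical stand-in for a diffusers pipeline; tracks where its weights live."""

    def __init__(self):
        self.device = "cpu"  # freshly loaded weights start on the CPU

    def to(self, device):
        self.device = device
        return self

    def enable_sequential_cpu_offload(self):
        # The real method keeps weights on the CPU and moves submodules to the
        # accelerator on demand; here we only model the resting device.
        self.device = "cpu"

    def __call__(self, prompt):
        return f"image for {prompt!r} (device={self.device})"


def run_comparison():
    # Pass 1: no offloading, pipeline explicitly moved to the accelerator.
    pipe = DummyPipeline().to("cuda")
    output_without_offload = pipe("a photo")

    # Pass 2 (the proposed change): re-initialize instead of reusing `pipe`,
    # then enable offloading -- no redundant .to("cuda") call beforehand.
    del pipe
    pipe = DummyPipeline()
    pipe.enable_sequential_cpu_offload()
    output_with_offload = pipe("a photo")

    return output_without_offload, output_with_offload
```

The point of `del pipe` plus a fresh construction is that the first pipeline's weights become collectable before the offloaded copy is created, which matches the ~5.1 GB vs ~2.4 GB RAM observation reported above.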
@sayakpaul @yiyixuxu @DN6