Optimize test files by fixing CPU-offloading usage#8409
yiyixuxu merged 5 commits into huggingface:main
Conversation
@tolgacangoz
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
* Refactor code to remove unnecessary calls to `to(torch_device)`
* Refactor code to remove unnecessary calls to `to("cuda")`
* Update pipeline_stable_diffusion_diffedit.py
This pull request refactors the code to remove unnecessary calls to `to(torch_device)` and `to("cuda")`. These calls were redundant, consumed memory unnecessarily, and can be safely removed without affecting the code's functionality.

There are also comparisons between `output_without_offload` and `output_with_offload` in the test files. I tried with SD-1.5-fp16 in Colab. After two forward passes (with and without offloading), the occupied system RAM is ~5.1 GB. But if I initialize the pipeline again before `pipeline.enable_sequential_cpu_offload()`, the occupied system RAM is ~2.4 GB (1-1.5 GB of RAM is already occupied by the system initially). The difference is ~0.5 GB for `pipeline.enable_model_cpu_offload()`, and I couldn't see much difference in GPU VRAM. The time cost of adding a second initialization was almost zero. What should be done for these places:

diffusers/tests/pipelines/test_pipelines_common.py
Lines 1362 to 1384 in 867a2b0
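The re-initialization pattern described above could be sketched roughly as follows. This is an illustrative sketch only, not the actual test code: `DummyPipeline` is a hypothetical stand-in for a diffusers pipeline, and only the method names `to` and `enable_sequential_cpu_offload` mirror the real diffusers API.

```python
# Sketch of the proposed test flow: instead of reusing the pipeline that was
# already moved to the accelerator, build a fresh instance before enabling
# CPU offloading, so the first pipeline's weights can be freed.

class DummyPipeline:
    """Hypothetical stand-in for a diffusers pipeline; tracks where its weights live."""

    def __init__(self):
        self.device = "cpu"  # freshly loaded weights start on the CPU

    def to(self, device):
        self.device = device
        return self

    def enable_sequential_cpu_offload(self):
        # The real method keeps weights on the CPU and moves submodules to the
        # accelerator on demand; here we only model the resting device.
        self.device = "cpu"

    def __call__(self, prompt):
        return f"image for {prompt!r} (device={self.device})"


def run_comparison():
    # Pass 1: no offloading, pipeline explicitly moved to the accelerator.
    pipe = DummyPipeline().to("cuda")
    output_without_offload = pipe("a photo")

    # Pass 2 (the proposed change): re-initialize instead of reusing `pipe`,
    # then enable offloading -- no redundant .to("cuda") call beforehand.
    del pipe
    pipe = DummyPipeline()
    pipe.enable_sequential_cpu_offload()
    output_with_offload = pipe("a photo")

    return output_without_offload, output_with_offload
```

The point of `del pipe` plus a fresh construction is that the first pipeline's weights become collectable before the offloaded copy is created, which matches the ~5.1 GB vs ~2.4 GB RAM observation reported above.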
@sayakpaul @yiyixuxu @DN6