add accelerate to load models with smaller memory footprint #361
patrickvonplaten merged 21 commits into huggingface:main from piEsposito:main
Conversation
The documentation is not available anymore as the PR was closed or merged.
src/diffusers/configuration_utils.py
Outdated
init_dict, unused_kwargs = cls.extract_init_dict(config_dict, **kwargs)

model = cls(**init_dict)
device_map = kwargs.pop("low_cpu_mem_usage", None)
Ideally we would like to try to keep configuration_utils.py framework- and component-independent. Could we maybe try to set:
with accelerate.init_empty_weights():
    model, unused_kwargs = cls.from_config(
        config_path,
        cache_dir=cache_dir,
        return_unused_kwargs=True,
        force_download=force_download,
        resume_download=resume_download,
        proxies=proxies,
        local_files_only=local_files_only,
        use_auth_token=use_auth_token,
        revision=revision,
        subfolder=subfolder,
        device_map=device_map,
        **kwargs,
    )
in modeling_utils.py instead?
src/diffusers/modeling_utils.py
Outdated
# Set model in evaluation mode to deactivate DropOut modules by default
model.eval()
if device_map is not None:
If possible, it would be very nice if all accelerate logic were added only here.
Just did it. Had to put the model creation and checkpoint loading after grabbing the weight and config files, to avoid splitting the accelerate logic into two places.
patrickvonplaten
left a comment
Hey @piEsposito,
Sorry for replying only now :-/
Thanks a lot for the PR! It looks really nice already. One important thing that (if possible) would be good to change: add the functionality only to modeling_utils.py and not to configuration_utils.py, because configuration_utils is also used for the schedulers.
Could you maybe give this a try?
…ove it from configuration utils
@patrickvonplaten I've addressed your comments and moved the accelerate logic to modeling utils. Also created some tests to ensure memory usage gets lower and results stay the same. Thank you for taking the time to carefully review this PR.
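A hedged sketch of the kind of parameter-equality check such a test can rely on (the helper name and tiny demo models here are illustrative, not the PR's actual test code):

```python
# Sketch: verify that two loading paths produce identical weights.
# Assumes only torch; `models_match` is a hypothetical helper.
import torch

def models_match(model_a: torch.nn.Module, model_b: torch.nn.Module) -> bool:
    """Return True if both models have the same parameter names and values."""
    params_a = dict(model_a.named_parameters())
    params_b = dict(model_b.named_parameters())
    if params_a.keys() != params_b.keys():
        return False
    return all(torch.allclose(params_a[k], params_b[k]) for k in params_a)

# Tiny demonstration with deterministic init in place of the two loaders:
torch.manual_seed(0)
a = torch.nn.Linear(4, 4)
torch.manual_seed(0)
b = torch.nn.Linear(4, 4)
print(models_match(a, b))  # True
```

In the PR's setting, `a` and `b` would instead be the same checkpoint loaded with and without `device_map="auto"`.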
src/diffusers/modeling_utils.py
Outdated
import torch
from torch import Tensor, device

import accelerate
We should not make accelerate a hard dependency here. Could you wrap it into a:
if accelerate_is_available():
    import accelerate
else:
    raise ImportError("Please install accelerate via `pip install accelerate`")
below the if device_map == "auto" method?
@patrickvonplaten I've just done it, thank you for the suggestion.
src/diffusers/modeling_utils.py
Outdated
from huggingface_hub import hf_hub_download
from huggingface_hub.utils import EntryNotFoundError, RepositoryNotFoundError, RevisionNotFoundError
from requests import HTTPError
from transformers.utils import is_accelerate_available
We cannot do this either because transformers is not hard requirement 😅
I should have given you more details in my previous comments - very sorry that the feedback cycle takes so much time. Will try hard to reply faster here now.
In short, can you copy this code https://github.com/huggingface/transformers/blob/e5b7cff5fe65eac9e54ba88fa3935b3270db0207/src/transformers/utils/import_utils.py#L528 into https://github.com/huggingface/diffusers/blob/main/src/diffusers/utils/import_utils.py
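The linked transformers helper essentially probes for the package without importing it. A minimal stdlib-only sketch of what ends up copied into diffusers' import_utils.py might look like this (the real helper may additionally check the installed version via importlib metadata):

```python
# Sketch of an availability check in the spirit of transformers'
# import_utils.py: probe for the package without actually importing it,
# so accelerate never becomes a hard dependency.
import importlib.util

def is_accelerate_available() -> bool:
    """Return True if the `accelerate` package can be imported."""
    return importlib.util.find_spec("accelerate") is not None
```

Callers can then guard the actual `import accelerate` behind this check and raise a friendly ImportError otherwise.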
Hey, thank you for clarifying that. I've just fixed it. And no problem. I'm sure you are very busy and am very thankful for the time you took to guide me through this PR.
patrickvonplaten
left a comment
Just need to clean up the is_accelerate_available() comment and then it should be good to go :-)
@patrickvonplaten finished implementing all requested changes. Please let me know if anything else comes to your mind.
This looks good to me!

Hey folks, any update here?
from_auto_class = kwargs.pop("_from_auto", False)
torch_dtype = kwargs.pop("torch_dtype", None)
subfolder = kwargs.pop("subfolder", None)
device_map = kwargs.pop("device_map", None)
Sorry, I overlooked this the first time.
@piEsposito could you also add some docstring here?
E.g. 3,4 lines under line 264
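A possible shape for that docstring; the wording below is illustrative only, not the text that landed in the PR:

```python
# Illustrative docstring fragment for the new kwarg; the exact wording
# in the merged PR may differ. Wrapped in a function so it is runnable.
def device_map_doc_sketch() -> str:
    """
    device_map (`str`, *optional*):
        Specifies where each submodule should go. If set to `"auto"`,
        `accelerate` computes a placement across the available devices
        and loads the weights with a reduced CPU memory footprint.
        Requires `accelerate` to be installed.
    """
    return device_map_doc_sketch.__doc__
```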
PR is good to merge for me! Played around with:

#!/usr/bin/env python3
from diffusers import UNet2DConditionModel

model = UNet2DConditionModel.from_pretrained("CompVis/stable-diffusion-v1-3", device_map="auto", subfolder="unet")
import ipdb; ipdb.set_trace()

on 1, >1, and no GPU machines and it works as expected.
patrickvonplaten
left a comment
@patil-suraj @anton-l would be nice if one of you could take a look :-)
@piEsposito in a follow-up PR it would be nice if you could implement this as well for the more global:

from diffusers import DiffusionPipeline
pipeline = DiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-3", device_map="auto")

functionality :-) Think this could then have very widespread adoption. At the moment 99% of users load models via the pipeline interface.
patil-suraj
left a comment
LGTM, thanks a lot for working on this @piEsposito !
@patrickvonplaten great idea. I can start working on that today. Thanks!
…ace#361)

* add accelerate to load models with smaller memory footprint
* remove low_cpu_mem_usage as it is redundant
* move accelerate init weights context to modeling utils
* add test to ensure results are the same when loading with accelerate
* add tests to ensure RAM usage gets lower when using accelerate
* move accelerate logic to single snippet under modeling utils and remove it from configuration utils
* format code to pass quality check
* fix imports with isort
* add accelerate to test extra deps
* only import accelerate if device_map is set to auto
* move accelerate availability check to diffusers import utils
* format code

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Closes #281