🚨🚨🚨 Enforce single model initialization #21431

Merged
sgugger merged 14 commits into main from init_fixes on Feb 9, 2023

Conversation

@sgugger
Collaborator

@sgugger sgugger commented Feb 2, 2023

What does this PR do?

There are currently three problems with the model inits:

Problem 1: When not using the fast init (in practice, when using the model constructor or AutoXxx.from_config instead of from_pretrained), weights are initialized multiple times. @stas00 showed the example of OPTForCausalLM, where post_init() is called three times: in OPTForCausalLM, OPTModel and OPTDecoder. Each of those calls launches a recursive call of _init_weights over all submodules of the model, so the weights end up initialized three times.
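A minimal, torch-free sketch of this triple init: the class names mirror the OPT hierarchy, but `Module`, `apply` and `post_init` here are simplified stand-ins for the real `nn.Module` / `PreTrainedModel` machinery.

```python
# Hypothetical reproduction of problem 1: each wrapper class calls
# post_init(), and each call recursively re-initializes every submodule.
init_counts = {}

def init_weights(module):
    # Stand-in for _init_weights: just count how often each module is hit.
    name = type(module).__name__
    init_counts[name] = init_counts.get(name, 0) + 1

class Module:
    def __init__(self):
        self.submodules = []

    def apply(self, fn):
        # Mimics nn.Module.apply: recurse into children, then visit self.
        for child in self.submodules:
            child.apply(fn)
        fn(self)

    def post_init(self):
        self.apply(init_weights)

class OPTDecoder(Module):
    def __init__(self):
        super().__init__()
        self.post_init()  # init #1 for the decoder weights

class OPTModel(Module):
    def __init__(self):
        super().__init__()
        self.submodules.append(OPTDecoder())
        self.post_init()  # re-inits the decoder (init #2)

class OPTForCausalLM(Module):
    def __init__(self):
        super().__init__()
        self.submodules.append(OPTModel())
        self.post_init()  # re-inits everything below (init #3)

OPTForCausalLM()
print(init_counts)  # {'OPTDecoder': 3, 'OPTModel': 2, 'OPTForCausalLM': 1}
```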

Problem 2: The fast init (of the random weights of the head in from_pretrained) and the non-fast init (as above) are not always equivalent. This is because in from_pretrained, _init_weights is only called on the leaf modules whose weights are not present in the checkpoint, but sometimes _init_weights contains class checks for bigger modules (here is one example in OneFormer).

Problem 3: Some models have a _init_weights function that initializes the same weights in two different ways. Take the OneFormer example again: it initializes a weight that is a Conv2d, but _init_weights is applied recursively, so that Conv2d will also be initialized here with a different rule.

This PR kills these three birds with one stone by slightly changing the _init_weights machinery: it looks for a private _is_hf_initialized attribute on the module and skips the init if that attribute is present and True. Of course, when a module is initialized, this private attribute is set to True once the initialization is done.
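A plain-Python sketch of that guard, with a simple object standing in for `nn.Module` and `guarded_init` standing in for the wrapper around `_init_weights`; only the attribute name `_is_hf_initialized` is taken from the PR.

```python
# Minimal sketch of the skip-if-already-initialized guard.
calls = []

class Submodule:
    pass

def _init_weights(module):
    # Stand-in for a model's real _init_weights: record each invocation.
    calls.append(type(module).__name__)

def guarded_init(module):
    # Skip any module whose weights were already initialized once.
    if getattr(module, "_is_hf_initialized", False):
        return
    _init_weights(module)
    module._is_hf_initialized = True

m = Submodule()
guarded_init(m)  # initializes and sets the flag
guarded_init(m)  # no-op: the flag short-circuits the call
print(calls)     # ['Submodule'] -- weights touched exactly once
```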

This PR gets the 🚨🚨🚨 sign because it might break users' code if they were relying on the (buggy) init of composite models: if a model has an encoder or backbone that is initialized differently from the rest, the init of that encoder/backbone was previously erased by the bigger model's init.

@sgugger sgugger requested review from LysandreJik and stas00 February 2, 2023 20:38
@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Feb 2, 2023

The documentation is not available anymore as the PR was closed or merged.

@sgugger
Collaborator Author

sgugger commented Feb 3, 2023

@stas00 In initial discussions with @LysandreJik, he mentioned he preferred not having a wrapper. The argument about init-weights code in the wild is a sound one, though, so I showed how it could look with the last two commits.

@LysandreJik
Member

Thanks for the PR, and for showing the two options! I feel like the wrapper is a little bit magical, but would make contributions simpler while reducing the complexity of the code.

I would go with the wrapper, if possible.

elif isinstance(module, OneFormerTransformerDecoder):
nn.init.xavier_uniform_(module.query_input_projection.weight, gain=xavier_std)
nn.init.constant_(module.query_input_projection.bias, 0)
module.query_input_projection._is_hf_initialized = True
Contributor

@stas00 stas00 Feb 3, 2023

once all instances of the OneFormerTransformerDecoder submodule have _is_hf_initialized = True, this code would never run, no? _init_weights won't get called on this submodule anymore.

Collaborator Author

The goal with this is to avoid module.query_input_projection being initialized another time: since it is a Conv2d, there is a path for Conv2d in the succession of tests here. This is an example of a fix for problem 2 in the PR description.
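For illustration, a torch-free mock-up of this interaction: the classes and init helpers are hypothetical stand-ins, the traversal order is simplified to visit the parent first, and only the `_is_hf_initialized` flag mirrors the PR.

```python
# Without the flag, the leaf Conv2d would also hit the generic Conv2d
# branch of _init_weights and be re-initialized with a different rule.
inits = []

class Conv2d:
    pass

class TransformerDecoder:
    def __init__(self):
        self.query_input_projection = Conv2d()

def xavier_init(module):
    inits.append(("xavier", module))

def default_init(module):
    inits.append(("default", module))

def _init_weights(module):
    if getattr(module, "_is_hf_initialized", False):
        return  # already handled, skip
    if isinstance(module, TransformerDecoder):
        xavier_init(module.query_input_projection)
        # Flag the projection so the generic Conv2d branch never touches it.
        module.query_input_projection._is_hf_initialized = True
    elif isinstance(module, Conv2d):
        default_init(module)

decoder = TransformerDecoder()
# Traversal order simplified to parent-first for illustration.
for module in (decoder, decoder.query_input_projection):
    _init_weights(module)

print([kind for kind, _ in inits])  # ['xavier'] -- the Conv2d rule never fires
```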

Contributor

@stas00 stas00 Feb 3, 2023

Got it. I can see now why; it's because, as you said, later in this function we have:

elif isinstance(module, (nn.Linear, nn.Conv2d, nn.BatchNorm2d)):

Contributor

@stas00 stas00 Feb 3, 2023

Here is an idea: pass the param key name to _init_weights in addition to the module, where possible? That way one could also branch on the key name and shortcut the "switch case".

Contributor

So instead of "init a Conv2d this way unless the param belongs to this parent module", it becomes "for this param key only, use this init".

Contributor

@stas00 stas00 Feb 3, 2023

But I guess it'd be difficult if the key isn't always fully qualified, depending on how the model was instantiated (the full stack, or say just the decoder). Perhaps just checking the last segment of the param name would give enough context?

Collaborator Author

That would require changing the signature of all _init_weights, so we're back to changing all models ;-) I think it's probably easier this way even if it looks more convoluted at first glance.

Contributor

ah, didn't think of that one! you're correct, Sylvain.

It would have been useful if the init function returned the list of params it touched; then the outside caretaker could do the accounting automatically. But again, this adds more complexity.

Contributor

@stas00 stas00 Feb 3, 2023

This reminds me of the Deepspeed external parameter special case that was originally an issue for the same reason.

So let's please document this special case, using the full example with the two isinstance branches, to show what to do when a submodule inits weights that are outside its immediate descendants.

Collaborator Author

Will add it to the add model doc.

@stas00
Contributor

stas00 commented Feb 3, 2023

Thank you for making it simpler for the end user, Sylvain - I will test this today on m4 and get back to you.

Contributor

@stas00 stas00 left a comment

Tested this on the m4 issue that started this whole investigation.

This PR solves the problem. My init tests that check the expected mean and variance now pass!

Thank you, Sylvain!

Collaborator Author

This leftover in BART clashes with the new logic and testing. It is fixed here and in several copies (and in practice does exactly the same thing, since weights are now only initialized once).

Collaborator Author

Fixes the weird hack for init of those modules (see below).

Collaborator Author

Hard-coded values on a random model, which is now initialized differently since there is only one init rule.

Collaborator Author

Same here.

Collaborator Author

Fixes this to get no init in the subconfigs as well.

@stas00
Contributor

stas00 commented Feb 9, 2023

Thank you for doing a massive adjustment work and the explanations, Sylvain!

This is hard work and very awesome for everybody to benefit from!

@sgugger
Collaborator Author

sgugger commented Feb 9, 2023

Last failing test is flaky so this is good for final review!

Member

@LysandreJik LysandreJik left a comment

Great change! Thanks for spending time on this, it's very nice to be sure that the initialisation is correct across methods and models.

LGTM!

The above command will create a model according to the default parameters as defined in `BrandNewBertConfig()` with
random weights, thus making sure that the `init()` methods of all components work.

Note that all random initialization should happen in the `_init_weights` method of your `BrandNewBertPreTrainedModel`
Member

Love the docs! Thanks for spending time on them, it's worthwhile.

config_and_inputs = self.model_tester.prepare_config_and_inputs()
self.model_tester.create_and_check_for_question_answering(*config_and_inputs)

def test_save_load_fast_init_from_base(self):
Member

Amazing to remove the special case

@sgugger sgugger changed the title from "Enforce single model initialization" to "🚨🚨🚨 Enforce single model initialization" on Feb 9, 2023
@sgugger sgugger merged commit 04b2f13 into main Feb 9, 2023
@sgugger sgugger deleted the init_fixes branch February 9, 2023 20:46
@stas00
Contributor

stas00 commented Feb 9, 2023

so it didn't make it into https://github.com/huggingface/transformers/releases/tag/v4.26.1, right?

do you know if you plan another hotfix release in the future or plan to wait for 4.27.0?

Asking as I need to anchor requirements on this fix for m4, where I found this bug.

@sgugger
Collaborator Author

sgugger commented Feb 10, 2023

This won't be until 4.27.0 as it could come with bugs we need to fix (and it's not a regression fix so won't go in a patch).

@stas00
Contributor

stas00 commented Feb 10, 2023

Thank you for the clarity, Sylvain. 4.27.0 it is.
