Hi!
Thank you for releasing PerturbDiff. It is a very nice piece of work, and I found it straightforward to start training from scratch on the Replogle dataset. During training, I ran into a few questions about the default hyperparameter settings and would appreciate any guidance:
- The default learning rate is 2e-3, decaying to 2e-4. Is that too large? On wandb the training loss fluctuates a lot.
- `data.use_cell_set=32` seems very high: grouping the dataset by `(pert, cell_line, batch)` gives 2.67 cells per group on average, and more than 99.8% of groups have fewer than 32 cells, so most cells used in training are duplicates. Even with this value set to 4, about 90% of groups need padding. I wonder whether a relatively high `use_cell_set` is necessary (perhaps it reduces gradient variance, even though most cells are duplicates), or whether a smaller value would work.
- Roughly how many steps does the whole training take? I used the default 200k steps, and the validation loss did not seem to have converged yet.
- GPU utilization seems quite low (under 10% for both compute and memory). Could you share any tips for speeding up training?
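For reference, this is roughly how I computed the group-size statistics quoted above. It is a minimal sketch with toy data; the column names (`pert`, `cell_line`, `batch`) are my assumptions about the metadata layout, not PerturbDiff's actual schema.

```python
import pandas as pd

# Toy stand-in for the per-cell metadata table; in practice this would be
# the Replogle observation metadata with one row per cell.
obs = pd.DataFrame({
    "pert": ["g1", "g1", "g2", "g2", "g2", "g3"],
    "cell_line": ["K562"] * 6,
    "batch": [0, 0, 0, 0, 1, 1],
})

# Number of cells in each (pert, cell_line, batch) group.
sizes = obs.groupby(["pert", "cell_line", "batch"]).size()

print("mean cells per group:", sizes.mean())
print("fraction of groups with fewer than 32 cells:", (sizes < 32).mean())
print("fraction of groups with fewer than 4 cells:", (sizes < 4).mean())
```

On the real dataset, the mean is 2.67 and the fraction below 32 exceeds 99.8%, which is what prompted the question.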
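In case it helps frame the last question: with low GPU utilization my first guess would be an input-pipeline bottleneck. Below is a generic PyTorch sketch of the usual DataLoader knobs (worker processes, pinned memory, persistent workers); it uses a dummy dataset, since I don't know which of these PerturbDiff's config already exposes.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy dataset standing in for the real perturbation data.
ds = TensorDataset(torch.randn(1024, 64))

loader = DataLoader(
    ds,
    batch_size=128,
    num_workers=2,          # overlap CPU-side loading with GPU compute
    pin_memory=True,        # faster host-to-device copies
    persistent_workers=True # keep workers alive across epochs
)

for (x,) in loader:
    pass  # training step would go here

print("batches per epoch:", len(loader))  # 1024 / 128 = 8
```

If tuning these doesn't move utilization, profiling one training step (e.g. with `torch.profiler`) would show whether the time goes to data loading, host-device transfer, or the model itself.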
Thanks in advance for your help!