Question about Replogle training settings #3

@ZichongWang

Hi!

Thank you for releasing PerturbDiff. It is a very nice piece of work, and I found it straightforward to start training from scratch on the Replogle dataset. During training I ran into a few questions about the default hyperparameter settings, and I would appreciate any guidance:

  1. The default learning rate is 2e-3, decaying to 2e-4. Is that too large? On wandb the training loss fluctuates a lot.
  2. data.use_cell_set=32 seems very high: the dataset grouped by (pert, cell_line, batch) has only 2.67 cells per group on average, and more than 99.8% of groups have fewer than 32 cells, so most cells used in training are duplicates. Even setting this to 4, about 90% of groups still need padding. I wonder whether a relatively high use_cell_set is necessary (perhaps it helps reduce gradient variance, though most cells are duplicates), or whether a smaller value would work just as well?
  3. Roughly how many steps does the whole training take? I used the default 200k steps and the validation loss does not seem to have converged yet.
  4. GPU utilization seems quite low (less than 10% of both compute and memory); could you share any tips for speeding up training?
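For reference, this is roughly how I computed the per-group statistics in point 2 — a minimal sketch assuming the per-cell metadata sits in a pandas DataFrame with columns `pert`, `cell_line`, and `batch` (the column names here are my own placeholders, not necessarily the ones used in the repo):

```python
import pandas as pd

# Toy per-cell metadata standing in for the real Replogle obs table;
# in practice this would be loaded from the dataset's annotation file.
obs = pd.DataFrame({
    "pert": ["g1", "g1", "g2", "g2", "g2", "g3"],
    "cell_line": ["K562"] * 6,
    "batch": ["b1", "b1", "b1", "b2", "b2", "b1"],
})

# Number of cells in each (pert, cell_line, batch) group.
sizes = obs.groupby(["pert", "cell_line", "batch"]).size()

# Average cells per group, and the fraction of groups that would need
# padding/duplication when sampling use_cell_set=32 cells per group.
mean_cells_per_group = sizes.mean()
frac_groups_padded = (sizes < 32).mean()
print(mean_cells_per_group, frac_groups_padded)
```

On the real dataset the first number comes out around 2.67 and the second above 99.8%, which is what prompted question 2.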

Thanks in advance for your help!
