Question about Replogle training settings #3

@ZichongWang

Hi!

Thank you for releasing PerturbDiff. It is a very nice piece of work, and I found it straightforward to start training from scratch on the Replogle dataset. During training I ran into a few questions about the default hyperparameter settings, and I would appreciate any guidance:

  1. The default learning rate is 2e-3, decaying to 2e-4. Is that too large? On wandb the training loss fluctuates a lot.
  2. data.use_cell_set=32 seems very high: the dataset grouped by (pert, cell_line, batch) has only 2.67 cells per group on average, and more than 99.8% of groups have fewer than 32 cells, so most cells used in training are duplicates. Even setting this to 4, about 90% of groups still need padding. I wonder whether a relatively high use_cell_set is necessary (perhaps it helps reduce gradient variance, though most cells are duplicates), or whether a smaller value would work just as well?
  3. Roughly how many steps does the whole training take? I used the default 200k steps and the validation loss does not seem to have converged yet.
  4. GPU utilization seems quite low (less than 10% of both compute and memory); could you share any tips for speeding up training?
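For reference, this is roughly how I computed the per-group statistics in point 2 — a minimal sketch assuming the per-cell metadata sits in a pandas DataFrame with columns `pert`, `cell_line`, and `batch` (the column names here are my own placeholders, not necessarily the ones used in the repo):

```python
import pandas as pd

# Toy per-cell metadata standing in for the real Replogle obs table;
# in practice this would be loaded from the dataset's annotation file.
obs = pd.DataFrame({
    "pert": ["g1", "g1", "g2", "g2", "g2", "g3"],
    "cell_line": ["K562"] * 6,
    "batch": ["b1", "b1", "b1", "b2", "b2", "b1"],
})

# Number of cells in each (pert, cell_line, batch) group.
sizes = obs.groupby(["pert", "cell_line", "batch"]).size()

# Average cells per group, and the fraction of groups that would need
# padding/duplication when sampling use_cell_set=32 cells per group.
mean_cells_per_group = sizes.mean()
frac_groups_padded = (sizes < 32).mean()
print(mean_cells_per_group, frac_groups_padded)
```

On the real dataset the first number comes out around 2.67 and the second above 99.8%, which is what prompted question 2.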

Thanks in advance for your help!
