
Feature/alignn model #12

Merged
saraheisenach merged 4 commits into main from feature/alignn-model on Feb 5, 2023

Conversation

Member

@sidnb13 sidnb13 commented Jan 9, 2023

  • Adds implementations of the ALIGNN model (both our own implementation and LLNL's Graphite implementation).
  • Implements basic functionality to load from a checkpoint file (saving is already implemented) and resume training for long jobs.
  • Implements the ability to register a new transform in common/transforms.py via decorator and specify an arbitrary list of them to compose via the config file. These are applied during preprocessing.
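The decorator-based transform registry described above could look roughly like this. This is a minimal sketch, not the actual contents of common/transforms.py; the names register_transform, TRANSFORM_REGISTRY, and compose_from_config are hypothetical, as is the exact GetY behavior.

```python
# Hypothetical sketch of a decorator-based transform registry.
TRANSFORM_REGISTRY = {}


def register_transform(cls):
    """Class decorator that records a transform under its class name."""
    TRANSFORM_REGISTRY[cls.__name__] = cls
    return cls


@register_transform
class GetY:
    """Illustrative transform: select the target at `index` from data['y']."""

    def __init__(self, index=0):
        self.index = index

    def __call__(self, data):
        data["y"] = data["y"][self.index]
        return data


def compose_from_config(transform_list):
    """Instantiate each {"name": ..., "args": {...}} entry and chain them,
    mirroring how a list of transforms might be composed from the config."""
    transforms = [
        TRANSFORM_REGISTRY[t["name"]](**t.get("args", {})) for t in transform_list
    ]

    def composed(data):
        for t in transforms:
            data = t(data)
        return data

    return composed
```

With this shape, a config entry such as `[{"name": "GetY", "args": {"index": 0}}]` would be enough to rebuild the composed pipeline applied during preprocessing.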

@sidnb13 sidnb13 requested review from saraheisenach, shuyijia and vxfung and removed request for saraheisenach, shuyijia and vxfung January 9, 2023 16:38
Contributor

@saraheisenach saraheisenach left a comment


Nice work :)! Main things that I think need fixing are:

  1. Formatting seems to be messed up. A .flake8 file was added, and I think it may be interfering with the pre-commit formatting that should be in place. Try setting up pre-commit again, deleting .flake8, and running all the files through it.
  2. You accidentally (I think) deleted some files, so those need to be added back in.
  3. A few code changes and moving some stuff around, but aside from that it looks great!

sidnb13 added a commit that referenced this pull request Jan 15, 2023
sidnb13 added a commit that referenced this pull request Jan 15, 2023
@sidnb13

This comment was marked as resolved.

Contributor

@saraheisenach saraheisenach left a comment


Thanks for making those changes! A few things:

  1. There are still some comments from last week that you may have missed. I also added a couple of new ones.
  2. Did you set up pre-commit with the config in the repo so that it runs before you commit, and is that working correctly? I noticed a couple of instances where the code style had changed even though you hadn't added anything, so I just wanted to check that it's working properly for you.

Member Author

sidnb13 commented Jan 19, 2023

Hi Sarah, looks like I glossed over a couple of things; sorry about that.

  1. I've addressed the changes you mentioned above, as well as the ones I missed from before. I also removed common/metrics.py since we're switching to W&B; it was some test code I had written a few months back.
  2. I was able to get the pre-commit hook reconfigured on my end; I think it was installed incorrectly earlier.

sidnb13 added a commit that referenced this pull request Jan 19, 2023
Contributor

@saraheisenach saraheisenach left a comment


Great job! I only see a few small bug fixes and then a couple of places where we can remove some unnecessary parameters.

sidnb13 added a commit that referenced this pull request Jan 20, 2023
Member Author

sidnb13 commented Jan 20, 2023

A note on the GetY dimensionality bug, for documentation purposes:

  • When the GetY transform is applied to a graph, data.y becomes a 0D tensor, which the DOS and CGCNN models rely on to calculate the output dimension.
  • Refactoring GetY into a transform that must be specified in the config introduces a new bug: it is no longer applied "on the fly" in data.py when get_dataset is called (unless specified). Instead, get_dataset fetches the processed data, and the 0D tensor created in processor.py is implicitly converted into a 1D tensor, causing issues for DOS and CGCNN. The fix is to change the line self.output_dim = len(data[0][self.target_attr][0]) to self.output_dim = len(data[0][self.target_attr]).
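The 0D-vs-1D mismatch can be reproduced in miniature. NumPy arrays are used here purely for illustration (the repo works with torch tensors, which have the same len() semantics); the variable names are hypothetical.

```python
import numpy as np

# GetY applied on the fly yields a 0D target, e.g. y = array(1.5).
y_0d = np.array(1.5)

# Loading preprocessed data implicitly promotes it to 1D: array([1.5]).
y_1d = np.atleast_1d(y_0d)

# Old code pattern: len(y[0]) fails on the 1D case,
# because y_1d[0] is an unsized 0D scalar.
try:
    output_dim = len(y_1d[0])
except TypeError:
    output_dim = None  # "len() of unsized object"

# Fixed pattern: len(y) on the 1D tensor gives the output dimension directly.
output_dim = len(y_1d)
```

This is why dropping the trailing `[0]` from the `self.output_dim` line resolves the failure for DOS and CGCNN.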

def get_dataset(
-    data_path, target_index: int = 0, transform_type="GetY", large_dataset=False
+    data_path,
+    transform_list: list = [],
Contributor


Do we want this to be

Suggested change
-    transform_list: list = [],
+    transform_list: list = ["GetY"],

This goes back to the comment where we were discussing the GetY behavior. I am under the impression that if GetY isn't included as a transform, there will be a failure somewhere in our code because y is used. If that's the case, do we want to throw an error here when GetY isn't included? I think in the future it would make sense to change how we approach y vs. other targets, but for now we depend on using y, so I think it would fail. Double-check, though, because maybe I'm wrong.

Member Author


I think it should remain transform_list: List[dict] = []. Since we instantiate GetY with an argument (and possibly more in the future), the assertion makes more sense for now, as the user should specify GetY.

     r=cutoff_radius,
     n_neighbors=n_neighbors,
     edge_steps=edge_steps,
+    transforms=dataset_config.get("transforms", []),
Contributor


See my other comment above about maybe making this the default list:

Suggested change
-    transforms=dataset_config.get("transforms", []),
+    transforms=dataset_config.get("transforms", ["GetY"]),

Member Author


Same as above.

-    dataset = get_dataset(dataset_path, target_index)
+    dataset = get_dataset(
+        dataset_path,
+        transform_list=dataset_config.get("transforms", []),
Contributor


If you decide that GetY is needed in order to avoid errors, then I think this should instead be

Suggested change
-    transform_list=dataset_config.get("transforms", []),
+    transform_list=dataset_config.get("transforms", ["GetY"]),

Member Author


Same as I commented above: the issue is that the transforms list holds dictionaries with parameters, so it would be necessary to specify this in the config.

Member Author

sidnb13 commented Jan 24, 2023

In both preprocessor.py and data.py we assert the existence of GetY to prevent issues downstream, since the transform has an argument that is specified in the config and wouldn't make sense to specify elsewhere. Downstream transform functionality then shouldn't need to check for this.
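That guard might look like the following sketch. The function name assert_get_y and the {"name": ..., "args": ...} entry shape are assumptions for illustration, not the exact code in preprocessor.py or data.py.

```python
def assert_get_y(transform_list):
    """Fail fast if GetY is missing from the configured transforms,
    since downstream models read the target from data.y.

    Each entry is assumed to be a dict like
    {"name": "GetY", "args": {"index": 0}}, matching the config format.
    """
    names = [t.get("name") for t in transform_list]
    assert "GetY" in names, (
        "GetY must be included in the configured transforms; "
        "downstream models depend on data.y being set."
    )
```

Asserting once at dataset construction keeps the check in one place, so the individual transforms and downstream model code stay agnostic of it.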

@saraheisenach saraheisenach merged commit 8a2a418 into main Feb 5, 2023
@saraheisenach saraheisenach deleted the feature/alignn-model branch March 14, 2023 21:58
