
Feature/alignn model #12

Merged
saraheisenach merged 4 commits into main from feature/alignn-model on Feb 5, 2023

Conversation

Member

@sidnb13 sidnb13 commented Jan 9, 2023

  • Adds implementations of the ALIGNN model (both our own implementation and LLNL's Graphite implementation).
  • Implements basic functionality to load from a checkpoint file (saving is already implemented) and resume training for long jobs.
  • Implements the ability to register a new transform in common/transforms.py via decorator and specify an arbitrary list of them to compose via the config file. These are applied during preprocessing.
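The decorator-based transform registry described above could look roughly like this. This is a minimal sketch, not the actual contents of common/transforms.py; the names register_transform, TRANSFORM_REGISTRY, and compose_from_config are hypothetical, as is the exact GetY behavior.

```python
# Hypothetical sketch of a decorator-based transform registry.
TRANSFORM_REGISTRY = {}


def register_transform(cls):
    """Class decorator that records a transform under its class name."""
    TRANSFORM_REGISTRY[cls.__name__] = cls
    return cls


@register_transform
class GetY:
    """Illustrative transform: select the target at `index` from data['y']."""

    def __init__(self, index=0):
        self.index = index

    def __call__(self, data):
        data["y"] = data["y"][self.index]
        return data


def compose_from_config(transform_list):
    """Instantiate each {"name": ..., "args": {...}} entry and chain them,
    mirroring how a list of transforms might be composed from the config."""
    transforms = [
        TRANSFORM_REGISTRY[t["name"]](**t.get("args", {})) for t in transform_list
    ]

    def composed(data):
        for t in transforms:
            data = t(data)
        return data

    return composed
```

With this shape, a config entry such as `[{"name": "GetY", "args": {"index": 0}}]` would be enough to rebuild the composed pipeline applied during preprocessing.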

@sidnb13 sidnb13 requested review from saraheisenach, shuyijia and vxfung and removed request for saraheisenach, shuyijia and vxfung January 9, 2023 16:38
Contributor

@saraheisenach saraheisenach left a comment


Nice work :)! Main things that I think need fixing are:

  1. Formatting seems to be messed up. A .flake8 file was added, and I think it may be interfering with the pre-commit formatting that should be in place. Try setting up pre-commit again, deleting .flake8, and running all the files through it.
  2. You accidentally (I think) deleted some files, so those need to be added back in.
  3. A few code changes and moving some stuff around, but aside from that it looks great!

sidnb13 added a commit that referenced this pull request Jan 15, 2023
sidnb13 added a commit that referenced this pull request Jan 15, 2023
@sidnb13

This comment was marked as resolved.

Contributor

@saraheisenach saraheisenach left a comment


Thanks for making those changes! A few things:

  1. There are still some comments from last week that you may have missed. I also added a couple of new ones.
  2. Did you set up pre-commit with the config in the repo so that it runs before you commit, and is that working correctly? I noticed a couple of instances where the code style had changed even though you hadn't added anything, so I just wanted to check that it's working properly for you.

Member Author

sidnb13 commented Jan 19, 2023

Hi Sarah, looks like I glossed over a couple of things; sorry about that.

  1. I've addressed the changes you mentioned above, as well as the ones I missed from before. I also removed common/metrics.py since we're switching to W&B; it was some test code I had written a few months back.
  2. I was able to get the pre-commit hook reconfigured on my end; I think it was installed incorrectly earlier.

sidnb13 added a commit that referenced this pull request Jan 19, 2023
Contributor

@saraheisenach saraheisenach left a comment


Great job! I only see a few small bug fixes and then a couple of places where we can remove some unnecessary parameters.

sidnb13 added a commit that referenced this pull request Jan 20, 2023
Member Author

sidnb13 commented Jan 20, 2023

A note on the GetY dimensionality bug, for documentation purposes:

  • When the GetY transform is applied to a graph, data.y becomes a 0D tensor, which the DOS and CGCNN models rely on to calculate the output dimension.
  • Refactoring GetY into a transform that must be specified in the config introduces a new bug: it is no longer applied "on the fly" in data.py when get_dataset is called (unless specified). Instead, get_dataset fetches the processed data, and the 0D tensor created in processor.py is implicitly converted into a 1D tensor, causing issues for DOS and CGCNN. The fix is to change the line self.output_dim = len(data[0][self.target_attr][0]) to self.output_dim = len(data[0][self.target_attr]).
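The 0D-vs-1D mismatch can be reproduced in miniature. NumPy arrays are used here purely for illustration (the repo works with torch tensors, which have the same len() semantics); the variable names are hypothetical.

```python
import numpy as np

# GetY applied on the fly yields a 0D target, e.g. y = array(1.5).
y_0d = np.array(1.5)

# Loading preprocessed data implicitly promotes it to 1D: array([1.5]).
y_1d = np.atleast_1d(y_0d)

# Old code pattern: len(y[0]) fails on the 1D case,
# because y_1d[0] is an unsized 0D scalar.
try:
    output_dim = len(y_1d[0])
except TypeError:
    output_dim = None  # "len() of unsized object"

# Fixed pattern: len(y) on the 1D tensor gives the output dimension directly.
output_dim = len(y_1d)
```

This is why dropping the trailing `[0]` from the `self.output_dim` line resolves the failure for DOS and CGCNN.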

def get_dataset(
-    data_path, target_index: int = 0, transform_type="GetY", large_dataset=False
+    data_path,
+    transform_list: list = [],
Contributor


Do we want this to be

Suggested change
-    transform_list: list = [],
+    transform_list: list = ["GetY"],

This goes back to the comment where we were discussing the GetY behavior. I am under the impression that if GetY isn't included as a transform, there will be a failure somewhere in our code because y is used. If that's the case, do we want to throw an error here when GetY isn't included? I think in the future it would make sense to change how we approach y vs. other targets, but for now we depend on using y, so I think it would fail. Double-check, though, because maybe I'm wrong.

Member Author


I think it should remain transform_list: List[dict] = []. Since we instantiate GetY with an argument (and possibly more in the future), the assertion makes more sense for now, as the user should specify GetY.

     r=cutoff_radius,
     n_neighbors=n_neighbors,
     edge_steps=edge_steps,
+    transforms=dataset_config.get("transforms", []),
Contributor


See my other comment above about maybe making this the default list:

Suggested change
-    transforms=dataset_config.get("transforms", []),
+    transforms=dataset_config.get("transforms", ["GetY"]),

Member Author


Same as above.

-    dataset = get_dataset(dataset_path, target_index)
+    dataset = get_dataset(
+        dataset_path,
+        transform_list=dataset_config.get("transforms", []),
Contributor


If you decide that GetY is needed in order to avoid errors, then I think this should instead be

Suggested change
-    transform_list=dataset_config.get("transforms", []),
+    transform_list=dataset_config.get("transforms", ["GetY"]),

Member Author


Same as I commented above: the issue is that the transforms list holds dictionaries with parameters, so it would be necessary to specify this in the config.

Member Author

sidnb13 commented Jan 24, 2023

In both preprocessor.py and data.py we assert the existence of GetY to prevent issues downstream, since the transform has an argument that is specified in the config and wouldn't make sense to specify elsewhere. Downstream transform functionality then shouldn't need to check for this.
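That guard might look like the following sketch. The function name assert_get_y and the {"name": ..., "args": ...} entry shape are assumptions for illustration, not the exact code in preprocessor.py or data.py.

```python
def assert_get_y(transform_list):
    """Fail fast if GetY is missing from the configured transforms,
    since downstream models read the target from data.y.

    Each entry is assumed to be a dict like
    {"name": "GetY", "args": {"index": 0}}, matching the config format.
    """
    names = [t.get("name") for t in transform_list]
    assert "GetY" in names, (
        "GetY must be included in the configured transforms; "
        "downstream models depend on data.y being set."
    )
```

Asserting once at dataset construction keeps the check in one place, so the individual transforms and downstream model code stay agnostic of it.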

@saraheisenach saraheisenach merged commit 8a2a418 into main Feb 5, 2023
@saraheisenach saraheisenach deleted the feature/alignn-model branch March 14, 2023 21:58
