-
Notifications
You must be signed in to change notification settings - Fork 3.9k
Add GraphLand benchmark #10458
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Add GraphLand benchmark #10458
Conversation
for more information, see https://pre-commit.ci
Could someone help me to fix the problems with the imports of |
Alright, moving imports into function bodies has solved our problems. However, it seems like |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #10458 +/- ##
==========================================
- Coverage 86.11% 85.09% -1.02%
==========================================
Files 496 510 +14
Lines 33655 35962 +2307
==========================================
+ Hits 28981 30602 +1621
- Misses 4674 5360 +686 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There are also some linter issues with types, but they occur not only in |
@rusty1s @akihironitta @wsad1 I wanted to kindly ping regarding this PR and share an update: GraphLand benchmark has been accepted to NeurIPS this year, which I believe highlights its potential value for the community. I would greatly appreciate it if you could find some time to review the changes — I think merging this would be very timely and beneficial for many users. Thanks for your help! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to help get this merged, can you add examples/graphland.py which has an argparser to switch between all possible graphland choices. I know running each one would be time consuming but if you can atleast run like 1/2 of the available datasets and attach the logs of the runs this will make it alot easier to merge.
For those runs, please use the latest nvidia pyg container from https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pyg
This will ensure your aligned with the latest pyg stack.
Additionally I would ask that you add 1-3 sentences to examples/README.md explainaing what graphland is and what your example showcases.
Lastly, please fix the existing CI issues so that CI is passing w green checks
d318a18
to
c0c6c61
Compare
@puririshi98 Thanks for picking up our PR! I have managed to fix the linter issues and also added an example on using GraphLand datasets for node property prediction. Hope this can be merged now. |
can you share a log of running the example? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just need a log of an example run in the latest nvidia container (ideally you could run atleast 2-3 datasets to loosely confirm it is dataset agnostic.
To do this just clone your branch inside the container and pip uninstall torch-geometric
then pip install .
inside your branch.
Please also test that your unit tests pass and share a log of that as well
@puririshi98 Here are the commands I have executed staying at the root of
And the logs of
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, @akihironitta to final review and merge
examples/graphland.py
Outdated
) | ||
model = _get_model(dataset) | ||
model = model.cuda() | ||
dataset = dataset.copy().cuda() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it deep-copying the dataset?
examples/graphland.py
Outdated
GRAPHLAND_DATASETS = [ | ||
'hm-categories', | ||
'pokec-regions', | ||
'web-topics', | ||
'tolokers-2', | ||
'city-reviews', | ||
'artnet-exp', | ||
'web-fraud', | ||
'hm-prices', | ||
'avazu-ctr', | ||
'city-roads-M', | ||
'city-roads-L', | ||
'twitch-views', | ||
'artnet-views', | ||
'web-traffic', | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unused
examples/graphland.py
Outdated
optimizer.zero_grad() | ||
loss.backward() | ||
optimizer.step() | ||
return loss.detach().cpu().item() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's defer device blocking call to when it's needed so that the subsequent evaluation can start sooner
return loss.detach().cpu().item() | |
return loss |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your contribution! Great work! PTAL at a few minor commits I pushed to the branch. Once the remaining comments are addressed, this PR is ready for merge 🚀
test_node_id = np.where(test_graph_mask)[0] | ||
test_node_id = torch.tensor(test_node_id, | ||
dtype=torch.long) # type: ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should avoid copying numpy:
test_node_id = np.where(test_graph_mask)[0] | |
test_node_id = torch.tensor(test_node_id, | |
dtype=torch.long) # type: ignore | |
test_node_id = np.where(test_graph_mask)[0] | |
test_node_id = torch.from_numpy(test_node_id) |
test_graph_mask = (raw_data['masks']['train'] | ||
| raw_data['masks']['val'] | ||
| raw_data['masks']['test']) | ||
test_graph_mask = torch.tensor(test_graph_mask, dtype=torch.bool) | ||
|
||
test_label_mask = raw_data['masks']['test'] & labeled_mask | ||
test_label_mask = torch.tensor(test_label_mask, dtype=torch.bool) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
val_graph_mask = (raw_data['masks']['train'] | ||
| raw_data['masks']['val']) | ||
val_graph_mask = torch.tensor(val_graph_mask, dtype=torch.bool) | ||
|
||
val_label_mask = raw_data['masks']['val'] & labeled_mask | ||
val_label_mask = torch.tensor(val_label_mask, dtype=torch.bool) | ||
|
||
val_node_id = np.where(val_graph_mask)[0] | ||
val_node_id = torch.tensor(val_node_id, | ||
dtype=torch.long) # type: ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
# >>> construct Data objects | ||
edge_index = raw_data['edges'].T | ||
edge_index = torch.tensor(edge_index, dtype=torch.long) | ||
|
||
# --- train | ||
train_graph_mask = raw_data['masks']['train'] | ||
train_graph_mask = torch.tensor(train_graph_mask, dtype=torch.bool) | ||
|
||
train_label_mask = raw_data['masks']['train'] & labeled_mask | ||
train_label_mask = torch.tensor(train_label_mask, dtype=torch.bool) | ||
|
||
train_node_id = np.where(train_graph_mask)[0] | ||
train_node_id = torch.tensor(train_node_id, | ||
dtype=torch.long) # type: ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
note: we want to make the CI weekly instead of after every commit, i will look into this while im back unless @akihironitta or @gvbazhenov can set this up |
GraphLand is a new graph benchmark for node property prediction that covers diverse industrial applications and includes graphs with different sizes, structural characteristics, and feature sets.