gvbazhenov
Contributor

@gvbazhenov gvbazhenov commented Sep 16, 2025

GraphLand is a new graph benchmark for node property prediction that covers diverse industrial applications and includes graphs with different sizes, structural characteristics, and feature sets.

@gvbazhenov
Contributor Author

Could someone help me to fix the problems with the imports of pandas, sklearn and yaml that are required in the implemented class? I do not understand how to organize them so that the tests pass.

@gvbazhenov
Contributor Author

Could someone help me to fix the problems with the imports of pandas, sklearn and yaml that are required in the implemented class? I do not understand how to organize them so that the tests pass.

Alright, moving the imports into function bodies has solved our problems. However, it seems that yaml is not installed in the test environment. How can we fix that and still use yaml in our implementation?
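
For readers following along, moving imports into function bodies can be sketched roughly as below. This is only an illustration of the general pattern, not the code from this PR: the helper `require` and the class `TinyDataset` are hypothetical names, and the error message wording is an assumption.

```python
import importlib


def require(module_name, purpose):
    """Import an optional dependency only at the point where it is needed."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Re-raise with a hint so the user knows which package is missing.
        raise ImportError(
            f"'{module_name}' is required for {purpose}; "
            f"please install it to use this feature") from exc


class TinyDataset:
    """Hypothetical dataset class: importing it never touches yaml."""
    def load_info(self, text):
        # yaml is resolved here, at call time, not at module import time.
        yaml = require("yaml", "reading dataset metadata")
        return yaml.safe_load(text)
```

With this layout, test collection can import the module even when the optional package is absent; only the code path that calls `load_info` needs it.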


codecov bot commented Sep 17, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 85.09%. Comparing base (c211214) to head (d5926bc).
⚠️ Report is 123 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #10458      +/-   ##
==========================================
- Coverage   86.11%   85.09%   -1.02%     
==========================================
  Files         496      510      +14     
  Lines       33655    35962    +2307     
==========================================
+ Hits        28981    30602    +1621     
- Misses       4674     5360     +686     

@gvbazhenov
Contributor Author

There are also some linter issues with types, but as I understand it, they are not limited to torch_geometric/datasets/graphland.py. Can we skip them if all other tests pass?

@gvbazhenov
Contributor Author

@rusty1s @akihironitta @wsad1 I wanted to kindly ping regarding this PR and share an update: GraphLand benchmark has been accepted to NeurIPS this year, which I believe highlights its potential value for the community. I would greatly appreciate it if you could find some time to review the changes — I think merging this would be very timely and beneficial for many users. Thanks for your help!

@puririshi98 puririshi98 self-requested a review October 2, 2025 17:37
Contributor

@puririshi98 puririshi98 left a comment

To help get this merged, can you add examples/graphland.py with an argument parser that switches between all available GraphLand datasets? I know running each one would be time-consuming, but if you can run at least about half of the available datasets and attach the logs of the runs, this will be a lot easier to merge.
For those runs, please use the latest NVIDIA PyG container from https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pyg

This will ensure you're aligned with the latest PyG stack.

Additionally, I would ask that you add 1-3 sentences to examples/README.md explaining what GraphLand is and what your example showcases.
Lastly, please fix the existing CI issues so that CI passes with green checks.
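
A rough sketch of the kind of argument parser requested here, for illustration only. The dataset names are copied from the list quoted in the review further down this thread; the function name `build_parser`, the defaults, and the help strings are assumptions, and the actual examples/graphland.py may be organized differently. The `--split` flag is left unrestricted because only the values `RL` and `THI` appear in this thread.

```python
import argparse

# Dataset names as quoted later in this thread.
GRAPHLAND_DATASETS = [
    'hm-categories', 'pokec-regions', 'web-topics', 'tolokers-2',
    'city-reviews', 'artnet-exp', 'web-fraud', 'hm-prices', 'avazu-ctr',
    'city-roads-M', 'city-roads-L', 'twitch-views', 'artnet-views',
    'web-traffic',
]


def build_parser():
    """Build a parser that selects one GraphLand dataset and a split."""
    parser = argparse.ArgumentParser(
        description='Node property prediction on a GraphLand dataset')
    parser.add_argument('--name', choices=GRAPHLAND_DATASETS,
                        default='tolokers-2', help='dataset to load')
    # Assumption: split names are not validated here, since the full set of
    # valid values is not stated in this thread.
    parser.add_argument('--split', default='RL', help='split identifier')
    return parser
```

Calling `build_parser().parse_args()` at the top of a script would then accept invocations such as `--name avazu-ctr --split THI`, matching the commands shown in the logs below.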

@gvbazhenov gvbazhenov force-pushed the graphland branch 4 times, most recently from d318a18 to c0c6c61 October 6, 2025 19:36
@gvbazhenov
Contributor Author

@puririshi98 Thanks for picking up our PR! I have managed to fix the linter issues and also added an example on using GraphLand datasets for node property prediction. Hope this can be merged now.

@puririshi98
Contributor

For those runs, please use the latest nvidia pyg container from https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pyg

Can you share a log of running the example?

Contributor

@puririshi98 puririshi98 left a comment

LGTM, just need a log of an example run in the latest NVIDIA container (ideally you could run at least 2-3 datasets to loosely confirm it is dataset-agnostic).
To do this, just clone your branch inside the container, run pip uninstall torch-geometric, then run pip install . inside your branch.

Please also check that your unit tests pass and share a log of that as well.

@gvbazhenov
Copy link
Contributor Author

@puririshi98 Here are the commands I executed from the root of the pytorch_geometric repository:

> docker run --gpus all -it --network=host --rm --mount type=bind,source=(pwd),target=/workspace nvcr.io/nvidia/pyg:25.09-py3 bash
> pip uninstall torch-geometric
... Successfully uninstalled torch-geometric-2.7.0
> pip install .
... Successfully installed torch-geometric-2.7.0
> cd examples
> python graphland.py --name tolokers-2 --split RL
Extracting datasets/tolokers-2/raw/tolokers-2.zip
Processing...
Done!
100%|████████████████████████████████████████| 100/100 [00:03<00:00, 28.84it/s, loss=0.4327, train=49.89, val=43.11, test=44.79]
Best metrics: train=49.78, val=43.11, test=44.76
> python graphland.py --name avazu-ctr --split THI
Extracting datasets/avazu-ctr/raw/avazu-ctr.zip
Processing...
Done!
100%|████████████████████████████████████████| 100/100 [00:27<00:00,  3.63it/s, loss=0.7874, train=21.32, val=15.32, test=27.06]
Best metrics: train=21.32, val=15.32, test=27.06
> rm -rf datasets
> exit

And the logs of pytest:

> pytest test/datasets/test_graphland.py
============================================================== test session starts ===============================================================
platform linux -- Python 3.10.18, pytest-8.4.2, pluggy-1.6.0 -- .../bin/python3.10
cachedir: .pytest_cache
rootdir: ...
configfile: pyproject.toml
plugins: xdist-3.8.0
collected 6 items                                                                                                                                

test/datasets/test_graphland.py::test_transductive_graphland[hm-categories] PASSED
test/datasets/test_graphland.py::test_transductive_graphland[tolokers-2] PASSED
test/datasets/test_graphland.py::test_transductive_graphland[avazu-ctr] PASSED
test/datasets/test_graphland.py::test_inductive_graphland[hm-categories] PASSED
test/datasets/test_graphland.py::test_inductive_graphland[tolokers-2] PASSED
test/datasets/test_graphland.py::test_inductive_graphland[avazu-ctr] PASSED

================================================================ warnings summary ================================================================
test/datasets/test_graphland.py::test_inductive_graphland[hm-categories]
  .../lib/python3.10/site-packages/sklearn/preprocessing/_encoders.py:246: UserWarning: Found unknown categories in columns [3] during transform. These unknown categories will be encoded as all zeros
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
==================================================== 6 passed, 1 warning in 92.21s (0:01:32) =====================================================

Contributor

@puririshi98 puririshi98 left a comment

LGTM. @akihironitta, over to you for final review and merge.

)
model = _get_model(dataset)
model = model.cuda()
dataset = dataset.copy().cuda()
Member

Why is it deep-copying the dataset?

Comment on lines 14 to 29
GRAPHLAND_DATASETS = [
    'hm-categories',
    'pokec-regions',
    'web-topics',
    'tolokers-2',
    'city-reviews',
    'artnet-exp',
    'web-fraud',
    'hm-prices',
    'avazu-ctr',
    'city-roads-M',
    'city-roads-L',
    'twitch-views',
    'artnet-views',
    'web-traffic',
]
Member

Unused

optimizer.zero_grad()
loss.backward()
optimizer.step()
return loss.detach().cpu().item()
Member

Let's defer the device-blocking call until the value is actually needed, so that the subsequent evaluation can start sooner.

Suggested change
- return loss.detach().cpu().item()
+ return loss
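
To illustrate the idea behind this suggestion, here is a minimal sketch, assuming a toy linear model and random data (neither comes from the PR). The training step hands back a tensor instead of calling `.cpu().item()`, and conversion to Python floats happens once at the end. The sketch returns `loss.detach()` rather than the bare `loss` from the suggestion, which is a small deviation that only drops the autograd graph. It runs on CPU as written; the benefit of deferring the transfer applies when the model lives on a CUDA device.

```python
import torch


def train_step(model, optimizer, x, y):
    """Run one optimization step and return the loss as a detached tensor."""
    model.train()
    loss = torch.nn.functional.mse_loss(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # No .cpu()/.item() here: on a CUDA device those calls make the host wait
    # for queued kernels to finish. detach() only drops the autograd graph.
    return loss.detach()


torch.manual_seed(0)
model = torch.nn.Linear(4, 1)  # hypothetical stand-in model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
x, y = torch.randn(16, 4), torch.randn(16, 1)

losses = [train_step(model, optimizer, x, y) for _ in range(5)]
# Convert to Python floats once, when the numbers are actually needed.
loss_values = torch.stack(losses).tolist()
```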

Member

@akihironitta akihironitta left a comment

Thank you for your contribution! Great work! PTAL at a few minor commits I pushed to the branch. Once the remaining comments are addressed, this PR is ready for merge 🚀

Comment on lines +639 to +641
test_node_id = np.where(test_graph_mask)[0]
test_node_id = torch.tensor(test_node_id,
                            dtype=torch.long)  # type: ignore
Member

We should avoid copying the NumPy array:

Suggested change
- test_node_id = np.where(test_graph_mask)[0]
- test_node_id = torch.tensor(test_node_id,
-                             dtype=torch.long)  # type: ignore
+ test_node_id = np.where(test_graph_mask)[0]
+ test_node_id = torch.from_numpy(test_node_id)
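
The difference between the two conversions can be seen in a small sketch like the following, using a made-up four-element mask (not data from the PR). `torch.tensor` always copies its input, whereas `torch.from_numpy` returns a tensor that shares memory with the array and keeps its dtype.

```python
import numpy as np
import torch

mask = np.array([True, False, True, True])  # toy mask for illustration
node_id = np.where(mask)[0]  # integer index array: [0, 2, 3]

copied = torch.tensor(node_id, dtype=torch.long)  # allocates a new buffer
shared = torch.from_numpy(node_id)  # reuses the NumPy buffer, no copy

# Writing to the array is visible through the shared tensor only.
node_id[0] = 3
copied_first = copied[0].item()  # unchanged by the write above
shared_first = shared[0].item()  # reflects the write above

# Boolean masks convert the same way and keep their dtype.
mask_tensor = torch.from_numpy(mask)
```

Because the memory is shared, this is safe when the array is a fresh temporary, as with the output of `np.where`, but later in-place edits to the array would show through in the tensor.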

Comment on lines +631 to +637
test_graph_mask = (raw_data['masks']['train']
                   | raw_data['masks']['val']
                   | raw_data['masks']['test'])
test_graph_mask = torch.tensor(test_graph_mask, dtype=torch.bool)

test_label_mask = raw_data['masks']['test'] & labeled_mask
test_label_mask = torch.tensor(test_label_mask, dtype=torch.bool)
Member

ditto

Comment on lines +603 to +612
val_graph_mask = (raw_data['masks']['train']
                  | raw_data['masks']['val'])
val_graph_mask = torch.tensor(val_graph_mask, dtype=torch.bool)

val_label_mask = raw_data['masks']['val'] & labeled_mask
val_label_mask = torch.tensor(val_label_mask, dtype=torch.bool)

val_node_id = np.where(val_graph_mask)[0]
val_node_id = torch.tensor(val_node_id,
                           dtype=torch.long)  # type: ignore
Member

ditto

Comment on lines +571 to +584
# >>> construct Data objects
edge_index = raw_data['edges'].T
edge_index = torch.tensor(edge_index, dtype=torch.long)

# --- train
train_graph_mask = raw_data['masks']['train']
train_graph_mask = torch.tensor(train_graph_mask, dtype=torch.bool)

train_label_mask = raw_data['masks']['train'] & labeled_mask
train_label_mask = torch.tensor(train_label_mask, dtype=torch.bool)

train_node_id = np.where(train_graph_mask)[0]
train_node_id = torch.tensor(train_node_id,
                             dtype=torch.long)  # type: ignore
Member

ditto

@puririshi98
Contributor

Note: we want to run the CI weekly instead of after every commit. I will look into this when I'm back, unless @akihironitta or @gvbazhenov can set it up.
