Add GraphLand benchmark #10458

gvbazhenov · 2025-09-16T15:10:36Z

GraphLand is a new graph benchmark for node property prediction that covers diverse industrial applications and includes graphs with different sizes, structural characteristics, and feature sets.

gvbazhenov · 2025-09-16T15:35:43Z

Could someone help me to fix the problems with the imports of pandas, sklearn and yaml that are required in the implemented class? I do not understand how to organize them so that the tests pass.

gvbazhenov · 2025-09-17T11:15:12Z

Alright, moving the imports into function bodies has solved our problems. However, it seems that yaml is not installed in the test environment. How can we fix that and use yaml in our implementation?
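The pattern mentioned above (importing optional dependencies inside the function that needs them, rather than at module level) can be sketched roughly as follows. This is only an illustration, not the code from the PR: the helper name `load_optional`, the function `read_config`, and the error message are hypothetical, and the standard-library `json` module stands in for an optional package such as `yaml`.

```python
import importlib
from types import ModuleType


def load_optional(name: str) -> ModuleType:
    """Import an optional dependency on first use (hypothetical helper)."""
    try:
        return importlib.import_module(name)
    except ImportError as err:
        # Re-raise with a hint so the user knows which package is missing.
        raise ImportError(
            f"'{name}' is required for this dataset; install it first"
        ) from err


def read_config(text: str) -> dict:
    # The import happens inside the function body, so merely importing this
    # module never touches the optional package.
    json = load_optional("json")  # stand-in for an optional package
    return json.loads(text)


print(read_config('{"a": 1}'))
```

With this layout, a test suite can still import the module when the optional package is absent; only the code paths that actually call the function need it installed.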

codecov · 2025-09-17T11:24:25Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 85.09%. Comparing base (c211214) to head (d5926bc).
⚠️ Report is 123 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master   #10458      +/-   ##
==========================================
- Coverage   86.11%   85.09%   -1.02%     
==========================================
  Files         496      510      +14     
  Lines       33655    35962    +2307     
==========================================
+ Hits        28981    30602    +1621     
- Misses       4674     5360     +686

gvbazhenov · 2025-09-17T11:28:31Z

There are also some linter issues with types, but, as I understand it, they are not limited to torch_geometric/datasets/graphland.py. Can we skip them if all other tests pass?

gvbazhenov · 2025-10-01T15:01:21Z

@rusty1s @akihironitta @wsad1 I wanted to kindly ping regarding this PR and share an update: GraphLand benchmark has been accepted to NeurIPS this year, which I believe highlights its potential value for the community. I would greatly appreciate it if you could find some time to review the changes — I think merging this would be very timely and beneficial for many users. Thanks for your help!

puririshi98

To help get this merged, can you add examples/graphland.py with an argument parser that switches between all available GraphLand datasets? I know running each one would be time-consuming, but if you can run at least 1/2 of the available datasets and attach the logs of the runs, it will be a lot easier to merge.
For those runs, please use the latest nvidia pyg container from https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pyg

This will ensure you're aligned with the latest PyG stack.

Additionally, please add 1-3 sentences to examples/README.md explaining what GraphLand is and what your example showcases.
Lastly, please fix the existing CI issues so that CI passes with green checks.
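A minimal sketch of the kind of argument parser requested here is shown below. The dataset names are taken from the `GRAPHLAND_DATASETS` list in the examples/graphland.py hunk in this thread, and the `--name` and `--split` flags appear in the posted run logs. Everything else is an assumption: the defaults are hypothetical, and only the two split values seen in the logs are listed, so the actual script may accept more.

```python
import argparse

# Dataset names as listed in the examples/graphland.py hunk in this thread.
GRAPHLAND_DATASETS = [
    'hm-categories', 'pokec-regions', 'web-topics', 'tolokers-2',
    'city-reviews', 'artnet-exp', 'web-fraud', 'hm-prices', 'avazu-ctr',
    'city-roads-M', 'city-roads-L', 'twitch-views', 'artnet-views',
    'web-traffic',
]

# Only the split names that appear in the posted run logs (assumption: the
# real script may support additional values).
SPLITS_SEEN_IN_LOGS = ['RL', 'THI']


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description='GraphLand example (sketch)')
    # `choices` rejects unknown dataset names before any download starts.
    parser.add_argument('--name', choices=GRAPHLAND_DATASETS,
                        default='tolokers-2')  # hypothetical default
    parser.add_argument('--split', choices=SPLITS_SEEN_IN_LOGS,
                        default='RL')  # hypothetical default
    return parser


args = build_parser().parse_args(['--name', 'avazu-ctr', '--split', 'THI'])
print(args.name, args.split)
```

Restricting `--name` through `choices` keeps the list of supported datasets in one place and makes `--help` print all of them.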

gvbazhenov · 2025-10-06T19:41:52Z

@puririshi98 Thanks for picking up our PR! I have managed to fix the linter issues and have also added an example of using GraphLand datasets for node property prediction. I hope this can be merged now.

puririshi98 · 2025-10-06T20:15:37Z

For those runs, please use the latest nvidia pyg container from https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pyg

Can you share a log of running the example?

puririshi98

LGTM, just need a log of an example run in the latest NVIDIA container (ideally you could run at least 2-3 datasets to loosely confirm it is dataset-agnostic).
To do this, just clone your branch inside the container, run pip uninstall torch-geometric, and then run pip install . inside your branch.

Please also check that your unit tests pass and share a log of that as well.

gvbazhenov · 2025-10-07T11:27:35Z

@puririshi98 Here are the commands I executed from the root of the pytorch_geometric repository:

> docker run --gpus all -it --network=host --rm --mount type=bind,source=(pwd),target=/workspace nvcr.io/nvidia/pyg:25.09-py3 bash
> pip uninstall torch-geometric
... Successfully uninstalled torch-geometric-2.7.0
> pip install .
... Successfully installed torch-geometric-2.7.0
> cd examples
> python graphland.py --name tolokers-2 --split RL
Extracting datasets/tolokers-2/raw/tolokers-2.zip
Processing...
Done!
100%|████████████████████████████████████████| 100/100 [00:03<00:00, 28.84it/s, loss=0.4327, train=49.89, val=43.11, test=44.79]
Best metrics: train=49.78, val=43.11, test=44.76
> python graphland.py --name avazu-ctr --split THI
Extracting datasets/avazu-ctr/raw/avazu-ctr.zip
Processing...
Done!
100%|████████████████████████████████████████| 100/100 [00:27<00:00,  3.63it/s, loss=0.7874, train=21.32, val=15.32, test=27.06]
Best metrics: train=21.32, val=15.32, test=27.06
> rm -rf datasets
> exit

And the logs of pytest:

> pytest test/datasets/test_graphland.py
============================================================== test session starts ===============================================================
platform linux -- Python 3.10.18, pytest-8.4.2, pluggy-1.6.0 -- .../bin/python3.10
cachedir: .pytest_cache
rootdir: ...
configfile: pyproject.toml
plugins: xdist-3.8.0
collected 6 items                                                                                                                                

test/datasets/test_graphland.py::test_transductive_graphland[hm-categories] PASSED
test/datasets/test_graphland.py::test_transductive_graphland[tolokers-2] PASSED
test/datasets/test_graphland.py::test_transductive_graphland[avazu-ctr] PASSED
test/datasets/test_graphland.py::test_inductive_graphland[hm-categories] PASSED
test/datasets/test_graphland.py::test_inductive_graphland[tolokers-2] PASSED
test/datasets/test_graphland.py::test_inductive_graphland[avazu-ctr] PASSED

================================================================ warnings summary ================================================================
test/datasets/test_graphland.py::test_inductive_graphland[hm-categories]
  .../lib/python3.10/site-packages/sklearn/preprocessing/_encoders.py:246: UserWarning: Found unknown categories in columns [3] during transform. These unknown categories will be encoded as all zeros
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
==================================================== 6 passed, 1 warning in 92.21s (0:01:32) =====================================================

puririshi98

LGTM, @akihironitta for final review and merge

akihironitta · 2025-10-10T17:46:28Z

examples/graphland.py

+    )
+    model = _get_model(dataset)
+    model = model.cuda()
+    dataset = dataset.copy().cuda()


Why is it deep-copying the dataset?

akihironitta · 2025-10-10T17:47:27Z

examples/graphland.py

+GRAPHLAND_DATASETS = [
+    'hm-categories',
+    'pokec-regions',
+    'web-topics',
+    'tolokers-2',
+    'city-reviews',
+    'artnet-exp',
+    'web-fraud',
+    'hm-prices',
+    'avazu-ctr',
+    'city-roads-M',
+    'city-roads-L',
+    'twitch-views',
+    'artnet-views',
+    'web-traffic',
+]


akihironitta · 2025-10-10T17:49:23Z

examples/graphland.py

+    optimizer.zero_grad()
+    loss.backward()
+    optimizer.step()
+    return loss.detach().cpu().item()


Let's defer the device-blocking call until it's needed, so that the subsequent evaluation can start sooner.

Suggested change

-    return loss.detach().cpu().item()
+    return loss
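The idea behind this suggestion can be sketched as follows: the training step returns a tensor, and the conversion to a Python number (which, on a CUDA device, waits for the queued kernels to finish) happens later, only where the value is actually consumed. This is an illustrative toy loop, not the code from examples/graphland.py; the model, data, and function name are made up, and the sketch returns `loss.detach()` rather than the bare `loss` from the suggestion so that the returned tensor does not hold on to the autograd graph.

```python
import torch


def train_step(model, optimizer, x, y):
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()
    # Return a tensor; detaching does not force a host-device synchronization.
    return loss.detach()


torch.manual_seed(0)
model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
x, y = torch.randn(8, 4), torch.randn(8, 1)

# No blocking call inside the loop: the losses stay on the device.
losses = [train_step(model, optimizer, x, y) for _ in range(5)]

# Convert to Python floats once, when the values are needed (e.g. logging).
history = [loss.item() for loss in losses]
print(history)
```

On CPU the two variants behave the same; the difference only matters when the model runs on an accelerator with asynchronous execution.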

akihironitta

Thank you for your contribution! Great work! PTAL at a few minor commits I pushed to the branch. Once the remaining comments are addressed, this PR is ready for merge 🚀

akihironitta · 2025-10-10T18:02:28Z

torch_geometric/datasets/graphland.py

+        test_node_id = np.where(test_graph_mask)[0]
+        test_node_id = torch.tensor(test_node_id,
+                                    dtype=torch.long)  # type: ignore


We should avoid copying the NumPy array:

Suggested change

-        test_node_id = np.where(test_graph_mask)[0]
-        test_node_id = torch.tensor(test_node_id,
-                                    dtype=torch.long)  # type: ignore
+        test_node_id = np.where(test_graph_mask)[0]
+        test_node_id = torch.from_numpy(test_node_id)
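The difference between the two calls can be seen in a small toy example: `torch.tensor` allocates new memory and copies the data, whereas `torch.from_numpy` returns a tensor that shares the NumPy array's buffer. The mask values below are made up for illustration. Note also that `torch.from_numpy` keeps the array's dtype instead of casting to `torch.long`; for the output of `np.where` this is the platform index type, which is 64-bit on typical 64-bit builds.

```python
import numpy as np
import torch

test_graph_mask = np.array([True, False, True, True, False])  # toy values
test_node_id = np.where(test_graph_mask)[0]  # positions of True entries

copied = torch.tensor(test_node_id, dtype=torch.long)  # allocates and copies
shared = torch.from_numpy(test_node_id)  # reuses the NumPy buffer

# A write through the NumPy array is visible only in the shared tensor.
test_node_id[0] = 7
print(copied.tolist())  # [0, 2, 3]
print(shared.tolist())  # [7, 2, 3]
```

Because the memory is shared, the NumPy array should not be modified afterwards unless the change is meant to show up in the tensor as well.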

akihironitta · 2025-10-10T18:02:39Z

torch_geometric/datasets/graphland.py

+        test_graph_mask = (raw_data['masks']['train']
+                           | raw_data['masks']['val']
+                           | raw_data['masks']['test'])
+        test_graph_mask = torch.tensor(test_graph_mask, dtype=torch.bool)
+
+        test_label_mask = raw_data['masks']['test'] & labeled_mask
+        test_label_mask = torch.tensor(test_label_mask, dtype=torch.bool)


akihironitta · 2025-10-10T18:02:56Z

torch_geometric/datasets/graphland.py

+        val_graph_mask = (raw_data['masks']['train']
+                          | raw_data['masks']['val'])
+        val_graph_mask = torch.tensor(val_graph_mask, dtype=torch.bool)
+
+        val_label_mask = raw_data['masks']['val'] & labeled_mask
+        val_label_mask = torch.tensor(val_label_mask, dtype=torch.bool)
+
+        val_node_id = np.where(val_graph_mask)[0]
+        val_node_id = torch.tensor(val_node_id,
+                                   dtype=torch.long)  # type: ignore


akihironitta · 2025-10-10T18:03:10Z

torch_geometric/datasets/graphland.py

+        # >>> construct Data objects
+        edge_index = raw_data['edges'].T
+        edge_index = torch.tensor(edge_index, dtype=torch.long)
+
+        # --- train
+        train_graph_mask = raw_data['masks']['train']
+        train_graph_mask = torch.tensor(train_graph_mask, dtype=torch.bool)
+
+        train_label_mask = raw_data['masks']['train'] & labeled_mask
+        train_label_mask = torch.tensor(train_label_mask, dtype=torch.bool)
+
+        train_node_id = np.where(train_graph_mask)[0]
+        train_node_id = torch.tensor(train_node_id,
+                                     dtype=torch.long)  # type: ignore
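Read together, the train, val, and test hunks above suggest the following scheme: the graph mask of each split is the union of that split's nodes with those of all earlier splits, the label mask keeps only the labeled nodes of the split itself, and the node ids are the positions selected by the graph mask. The toy sketch below reproduces that logic with plain NumPy. The mask values are invented, and the dictionaries `graph_masks`, `label_masks`, and `node_ids` are names introduced here for illustration, not identifiers from the PR.

```python
import numpy as np

# Toy stand-ins for raw_data['masks'] and labeled_mask (values are made up).
masks = {
    'train': np.array([True, True, False, False, False, False]),
    'val': np.array([False, False, True, True, False, False]),
    'test': np.array([False, False, False, False, True, True]),
}
labeled_mask = np.array([True, False, True, True, True, False])

# Graph masks grow cumulatively: each later split also sees earlier nodes.
graph_masks = {
    'train': masks['train'],
    'val': masks['train'] | masks['val'],
    'test': masks['train'] | masks['val'] | masks['test'],
}
# Label masks keep only the labeled nodes of the split itself.
label_masks = {split: masks[split] & labeled_mask for split in masks}
# Node ids are the positions of the nodes present in each graph.
node_ids = {split: np.where(m)[0] for split, m in graph_masks.items()}

for split in ('train', 'val', 'test'):
    print(split, node_ids[split].tolist(), label_masks[split].tolist())
```

In this toy case the train graph contains nodes 0 and 1, the val graph nodes 0 to 3, and the test graph all six nodes, while each label mask selects only labeled nodes from its own split.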


puririshi98 · 2025-10-10T18:31:02Z

Note: we want the CI to run weekly instead of after every commit. I will look into this when I'm back, unless @akihironitta or @gvbazhenov can set this up.

Gleb Bazhenov and others added 4 commits September 16, 2025 12:55

add graphland source and tests

ba3e5dd

Merge branch 'pyg-team:master' into graphland

2844683

Merge branch 'pyg-team:master' into graphland

e49aa04

Merge branch 'graphland' of https://github.com/gvbazhenov/pyg into gr…

37d1c5c

gvbazhenov requested review from akihironitta, rusty1s and wsad1 as code owners September 16, 2025 15:10

pre-commit-ci bot and others added 6 commits September 16, 2025 15:12

[pre-commit.ci] auto fixes from pre-commit.com hooks

7f10551

for more information, see https://pre-commit.ci

try to fix dataset class and test

3acac12

Merge branch 'graphland' of https://github.com/gvbazhenov/pyg into gr…

0569554

try to fix imports

d7a4951

try to fix imports

4d93e68

resolve conflicts

f3d45aa

Gleb Bazhenov added 3 commits September 17, 2025 13:42

fix dtypes

4241b84

try to move imports into functions

09839e3

reduce number of tests

d2ae707

Gleb Bazhenov and others added 3 commits September 17, 2025 14:31

add pull request link to changelog

7bd1d67

Merge branch 'pyg-team:master' into graphland

5f4381c

Merge branch 'graphland' of https://github.com/gvbazhenov/pyg into gr…

01989af

akihironitta added feature 1 - Priority P1 dataset labels Sep 20, 2025

puririshi98 self-requested a review October 2, 2025 17:37

puririshi98 requested changes Oct 2, 2025

Merge branch 'master' into graphland

8e52704

Merge branch 'pyg-team:master' into graphland

1dc2786

gvbazhenov force-pushed the graphland branch from ea12e3b to b3c0091 Compare October 6, 2025 15:32

try to fix linter issues

e5a5e36

gvbazhenov force-pushed the graphland branch 4 times, most recently from d318a18 to c0c6c61 Compare October 6, 2025 19:36

puririshi98 requested changes Oct 6, 2025

add example

5db89fc

gvbazhenov force-pushed the graphland branch from c0c6c61 to 5db89fc Compare October 7, 2025 11:09

puririshi98 approved these changes Oct 7, 2025

akihironitta reviewed Oct 10, 2025

update

d5926bc

akihironitta approved these changes Oct 10, 2025
