Skip to content

Conversation

lionelkusch
Copy link
Collaborator

@lionelkusch lionelkusch commented Aug 8, 2025

This is a method of marginal method for computing importance.

I based this PR on the PR #220 for the API and PR #265 for the testing tools.

The figures can be improved with some suggestions.

The main limitation of this implementation is the limitation of the effect of one feature. Scikitlearn proposes to go to 3 features but the importance score can be tricky to compute for more than 1 feature. I decided not to increase the number of features for the moment and avoid problems with the importance.

This method can be a good example to improving the other methods because it is based on the implementation of scikit learn which consider a lot of more cases than the basic ones.

I literally copied the code of scikit learn and their example.
What is the best way to manage license issues? @bthirion

Reference:
Original method: Friedman, Jerome H. 2001. “Greedy Function Approximation: A Gradient Boosting Machine.” The Annals of Statistics 29 (5): 1189–1232. https://doi.org/10.1214/aos/1013203451.
Extension to Variable of Importance: Greenwell, Brandon M., Bradley C. Boehmke, and Andrew J. McCarthy. 2018. “A Simple and Effective Model-Based Variable Importance Measure.” arXiv. https://doi.org/10.48550/arXiv.1805.04755.
Implementation:

Copy link

codecov bot commented Aug 11, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.19%. Comparing base (bafa4ed) to head (1a117a4).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #318      +/-   ##
==========================================
+ Coverage   98.10%   98.19%   +0.08%     
==========================================
  Files          22       22              
  Lines        1161     1163       +2     
==========================================
+ Hits         1139     1142       +3     
+ Misses         22       21       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@bthirion
Copy link
Collaborator

This is a method of marginal method for computing importance.

I based this PR on the PR #220 for the API and PR #265 for the testing tools.

The figures can be improved with some suggestions.

The main limitation of this implementation is the limitation of the effect of one feature. Scikitlearn proposes to go to 3 features but the importance score can be tricky to compute for more than 1 feature. I decided not to increase the number of features for the moment and avoid problems with the importance.

This method can be a good example to improving the other methods because it is based on the implementation of scikit learn which consider a lot of more cases than the basic ones.

I literally copied the code of scikit learn and their example. What is the best way to manage license issues? @bthirion

No worries, you can copy sklearn's code.

@bthirion
Copy link
Collaborator

Can you make this PR on top of another one that allows to make readable diffs. Otherwise, it's not possible to work on it.
Best,

@lionelkusch
Copy link
Collaborator Author

Can you make this PR on top of another one that allows to make readable diffs. Otherwise, it's not possible to work on it. Best,

done

@lionelkusch lionelkusch marked this pull request as ready for review August 27, 2025 17:44
Copy link
Collaborator

@bthirion bthirion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just had a look at the example so far.

Copy link
Collaborator

@bthirion bthirion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A first pass on the pdp module.

Notes
-----
Based on scikit-learn's _grid_from_X implementation:
https://github.com/scikit-learn/scikit-learn/blob/c5497b7f7eacfaff061cf68e09bcd48aa93d4d6b/sklearn/inspection/_partial_dependence.py#L40
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we simply import it ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I need to modify it to get the ICEs.

@bthirion
Copy link
Collaborator

bthirion commented Sep 1, 2025

What is the best way to manage license issues? @bthirion

You can copy BSD code without constraint. We should however acknowledge the origin of the code in the documentation.

Why didn't you simply add a light wrapper on top of sklearn's function ?

@lionelkusch
Copy link
Collaborator Author

What is the best way to manage license issues? @bthirion

You can copy BSD code without constraint. We should however acknowledge the origin of the code in the documentation.

Why didn't you simply add a light wrapper on top of sklearn's function ?

The scikitlearn implementation is not adapted to compute variable importance because they don't provide access to ICE. Furthermore, the current implementation is limited to one feature at a time instead of doing several at once.

Moreover, the plot of the scikitlearn presents some inconvenient, as I mentioned in issue #51.

Copy link
Collaborator

@jpaillard jpaillard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good overall but the example is a bit lengthy.
Could we simplify by, for instance, using only the MLP or boosted tree?

@lionelkusch lionelkusch added the API 2 Refactoring following the second version of API label Sep 9, 2025
@lionelkusch lionelkusch changed the title Add Partial Dependance Plot Add Partial Dependence Plot Sep 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API 2 Refactoring following the second version of API
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants