Add Partial Dependence Plot #318
base: main
Conversation
Codecov Report ✅ All modified and coverable lines are covered by tests.

@@ Coverage Diff @@
##            main     #318    +/-  ##
==========================================
+ Coverage  98.10%   98.19%   +0.08%
==========================================
  Files         22       22
  Lines       1161     1163      +2
==========================================
+ Hits        1139     1142      +3
+ Misses        22       21      -1
No worries, you can copy sklearn's code.
Can you base this PR on top of the other one, so that the diffs are readable? Otherwise, it's not possible to work on it.
Done.
I just had a look at the example so far.
Co-authored-by: bthirion <[email protected]>
A first pass on the pdp module.
Notes
-----
Based on scikit-learn's _grid_from_X implementation:
https://github.com/scikit-learn/scikit-learn/blob/c5497b7f7eacfaff061cf68e09bcd48aa93d4d6b/sklearn/inspection/_partial_dependence.py#L40
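For context, the idea behind a grid-building helper like _grid_from_X can be sketched as follows. This is a minimal illustration with a hypothetical function name, not scikit-learn's actual implementation: it builds an evaluation grid between two percentiles of a feature, falling back to the unique values when there are few of them.

```python
import numpy as np

def grid_from_percentiles(X_col, grid_resolution=100, percentiles=(0.05, 0.95)):
    """Hypothetical sketch of a percentile-based evaluation grid.

    Mirrors the idea of sklearn's _grid_from_X: evaluate the model on a
    grid spanning the bulk of the feature's distribution, not its extremes.
    """
    low, high = np.percentile(X_col, [100 * p for p in percentiles])
    uniques = np.unique(X_col)
    if uniques.size < grid_resolution:
        # Few distinct values (e.g. a categorical-like column): use them all.
        return uniques
    return np.linspace(low, high, num=grid_resolution)

rng = np.random.default_rng(0)
x = rng.normal(size=500)
grid = grid_from_percentiles(x, grid_resolution=50)
```

Clipping to percentiles avoids evaluating the model in regions with almost no data, where extrapolation would make the curves unreliable.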
Can't we simply import it ?
No, I need to modify it to get the ICEs.
You can copy BSD-licensed code without constraint. We should, however, acknowledge the origin of the code in the documentation. Why didn't you simply add a light wrapper on top of sklearn's function?
The scikit-learn implementation is not suited to computing variable importance because it doesn't provide access to the ICE curves. Furthermore, it is limited to one feature at a time instead of handling several at once. Moreover, scikit-learn's plot has some drawbacks, as I mentioned in issue #51.
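To illustrate the ICE/PDP relationship being discussed, here is a minimal numpy sketch (not this PR's actual code; `predict` stands in for any fitted model): each ICE curve is the prediction for one sample as a single feature sweeps over a grid, and the PDP is the average of those curves.

```python
import numpy as np

# Toy data and a linear stand-in for a fitted model's predict().
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
w = np.array([2.0, -1.0, 0.5, 0.0, 3.0])
predict = lambda X: X @ w

feature = 0
grid = np.linspace(X[:, feature].min(), X[:, feature].max(), num=20)

# ICE: one curve per sample, obtained by forcing the chosen feature to
# each grid value while keeping all other features at their observed values.
ice = np.empty((X.shape[0], grid.size))
for j, v in enumerate(grid):
    X_mod = X.copy()
    X_mod[:, feature] = v
    ice[:, j] = predict(X_mod)

pdp = ice.mean(axis=0)  # the PDP is the average of the ICE curves
```

Keeping `ice` around (rather than only `pdp`, as scikit-learn's averaged output does) is what allows per-sample importance computations downstream.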
It looks good overall but the example is a bit lengthy.
Could we simplify by, for instance, using only the MLP or boosted tree?
This is a marginal method for computing variable importance.
I based this PR on the PR #220 for the API and PR #265 for the testing tools.
The figures can be improved with some suggestions.
The main limitation of this implementation is that it considers the effect of only one feature at a time. Scikit-learn supports up to three features, but the importance score can be tricky to compute for more than one feature, so I decided not to increase the number of features for now and avoid problems with the importance computation.
This method can be a good example for improving the other methods, because it is based on scikit-learn's implementation, which handles many more cases than the basic ones.
I literally copied scikit-learn's code and their example.
What is the best way to manage license issues? @bthirion
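For reference, the Greenwell et al. (2018) importance measure cited below reduces to a one-liner on top of the PDP: the importance of a feature is the spread (standard deviation) of its partial dependence curve, so a flat curve means no importance. A minimal numpy sketch, with `predict` standing in for any fitted model (not this PR's actual code):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3))
w = np.array([2.0, 0.0, -0.5])
predict = lambda X: X @ w  # stand-in for a fitted model's predict()

def pdp_importance(X, feature, grid_resolution=30):
    """Std dev of the PDP curve, per Greenwell et al. (2018)."""
    grid = np.linspace(X[:, feature].min(), X[:, feature].max(),
                       num=grid_resolution)
    pdp = np.empty(grid.size)
    for j, v in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, feature] = v       # force the feature to the grid value
        pdp[j] = predict(X_mod).mean()
    return pdp.std()

importances = [pdp_importance(X, f) for f in range(X.shape[1])]
```

With this toy linear model, the feature with zero weight gets a flat PDP and hence zero importance, while larger weights yield larger importances.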
References:
Original method: Friedman, Jerome H. 2001. “Greedy Function Approximation: A Gradient Boosting Machine.” The Annals of Statistics 29 (5): 1189–1232. https://doi.org/10.1214/aos/1013203451.
Extension to variable importance: Greenwell, Brandon M., Bradley C. Boehmke, and Andrew J. McCarthy. 2018. “A Simple and Effective Model-Based Variable Importance Measure.” arXiv. https://doi.org/10.48550/arXiv.1805.04755.
Implementation: