
Conversation

lionelkusch
Collaborator

Update the model of CFI, PFI and LOCO for API 2.

@lionelkusch lionelkusch added the API 2 Refactoring following the second version of API label Sep 2, 2025

codecov bot commented Sep 2, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.19%. Comparing base (d159dca) to head (3c52789).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #372      +/-   ##
==========================================
+ Coverage   98.10%   98.19%   +0.09%     
==========================================
  Files          22       22              
  Lines        1159     1222      +63     
==========================================
+ Hits         1137     1200      +63     
  Misses         22       22              


Collaborator

@jpaillard jpaillard left a comment


It looks good but the diff seems very large for this small change.
Is there a reason for all the other modifications?

@lionelkusch
Collaborator Author

I reorganized the parameters in the init a bit and moved the docstring to the class, because I plan to do this in all the other classes.

Looking into it in more detail, I missed adding some parts. I will add them and ask you to review afterwards. Sorry about that.

Collaborator

@bthirion bthirion left a comment


This PR is definitely an improvement, thx.

).pvalue
return self.importances_

def fit_importance(
Collaborator


I find it disturbing that fit_importance has a behavior that is quite different from simply calling fit, then importance.

  • Could we add a check to the .fit() method to ensure that the estimator is fitted, and fit it if not?
  • Could we allow passing a list of fitted estimators matching the number of splits? That could typically be relevant for users who want to pass DL models, trained beforehand, through skorch for instance.
  • If the models are not fitted, can we store them, in estimators_ for instance? It is useful to check the predictive performance in addition to the importance.

Collaborator Author


  • Could we add a check to the .fit() method to ensure that the estimator is fitted, and fit it if not?

From my point of view, I don't think so, because the goal is to compute the importance within the cross-validation, as in the example plot_model_agnostic__importance.

  • Could we allow passing a list of fitted estimators matching the number of splits? That could typically be relevant for users who want to pass DL models, trained beforehand, through skorch for instance.

This also requires having the indices of the cross-validation. At this point, it's better for the user to write the loop themselves.

  • If the models are not fitted, can we store them, in estimators_ for instance? It is useful to check the predictive performance in addition to the importance.

Yes, I will add this.
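For reference, the loop the user would write themselves could look like this (a sketch using only scikit-learn; compute_importance is a placeholder for the per-fold CFI/LOCO/PFI computation, not an actual hidimstat function):

```python
import numpy as np
from sklearn.base import clone
from sklearn.model_selection import KFold


def cross_val_importance(estimator, X, y, compute_importance, n_splits=5):
    """Fit one clone per fold and collect the per-fold importances.

    The fitted estimators are returned as well, so predictive
    performance can be checked in addition to the importance.
    """
    estimators_, importances_ = [], []
    for train, test in KFold(n_splits=n_splits).split(X):
        est = clone(estimator).fit(X[train], y[train])
        estimators_.append(est)
        importances_.append(compute_importance(est, X[test], y[test]))
    return estimators_, np.asarray(importances_)
```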

Collaborator Author


I made the modification; tell me if it's OK.

Collaborator


IMO, there are too many patches that are not optimal and would be avoided by creating a dedicated BasePerturbationCV:

  • The computation of p-values by taking the mean over folds is not valid
  • A big benefit of model-agnostic approaches (LOCO, CFI...) is to support DL models. However, it is not reasonable to force the training of DL models in the 'fit' of Hidimstat's methods. We should allow passing a list of fitted estimators to support this use case.

Collaborator Author

@lionelkusch lionelkusch Sep 12, 2025


What is DL?

I agree that this is not an optimal approach. However, it would require a redesign of the CV usage and of the estimator management. @bthirion doesn't want CV to be a parameter of fit_importances, and for the moment the estimator needs to be fitted before usage.

I don't see the point of having another class BasePerturbationCV if it only modifies fit_importances.
Passing a list of fitted estimators together with a CV could be an idea, but it is difficult to assert the link between these two objects.

  • The computation of p-values by taking the mean over folds is not valid

Do you have a better solution?
I thought of using the function aggregate_pvalue, but I don't know whether it's correct in this case.
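For what it's worth, SciPy offers combine_pvalues for aggregating per-fold p-values, e.g. with Fisher's method. Note it assumes independent p-values, which CV folds only approximate; this is a toy sketch, not a claim about what aggregate_pvalue does:

```python
import numpy as np
from scipy.stats import combine_pvalues

# p-values obtained on each of k folds (toy numbers)
fold_pvalues = np.array([0.04, 0.10, 0.02, 0.20, 0.07])

# Fisher's method: -2 * sum(log p) follows chi2(2k) under the null,
# assuming the per-fold p-values are independent (only approximately
# true for CV folds, which share training data).
result = combine_pvalues(fold_pvalues, method="fisher")
combined_p = result.pvalue
```

Unlike averaging, this is a calibrated combination rule under its independence assumption.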

Collaborator


DL: Deep Learning

Collaborator

@bthirion bthirion left a comment


Thx for the progress. Please find a few suggestions enclosed.

Collaborator

@bthirion bthirion left a comment


We're almost there.

Attributes
----------
features_groups : dict
Mapping of feature groups identified during fit.
Collaborator


This is no longer accurate IIUC.

Collaborator Author


What do you mean?
It's still accurate in this version of the code.

Collaborator

@jpaillard jpaillard left a comment


Thank you.
I agree with the goal of the modifications, but I believe it is not optimal to implement the CV by simply patching the current class; we need a dedicated class BasePerturbationCV.

Comment on lines +270 to +272
self.importances_ = np.mean(self.importances_cv_, axis=0)
self.pvalues_ = (
None if self.pvalues_cv_[0] is None else np.mean(self.pvalues_cv_, axis=0)
Collaborator


That looks problematic:

  • The p-value of the CV estimator is computed over the k test statistics, where k is the number of folds, so self.pvalues_cv_ should be 1d. Even if it were 2d, taking the mean of p-values is not, in general, a p-value.
  • I see the problem that leaving self.pvalues_ as None would leave the instance "not-fitted"; for me, this calls for creating a sub-class BasePerturbationCV.
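The first bullet can be illustrated with a quick simulation: under the null hypothesis a valid p-value is uniform on [0, 1], but the mean over k folds concentrates around 0.5, so it no longer behaves like a p-value (toy simulation, independent folds assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
k, n_repeats = 5, 100_000

# Under the null hypothesis, each fold's p-value is uniform on [0, 1].
null_pvalues = rng.uniform(size=(n_repeats, k))
mean_over_folds = null_pvalues.mean(axis=1)

# A valid p-value satisfies P(p <= alpha) = alpha under the null;
# the fold-averaged quantity rejects far less often than alpha = 0.05,
# so thresholding it no longer controls the error rate.
rejection_rate = (mean_over_folds <= 0.05).mean()
```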

Collaborator Author


  • I see the problem that leaving self.pvalues_ as None would leave the instance "not-fitted"; for me, this calls for creating a sub-class BasePerturbationCV.

This is not a problem, because p-values cannot be computed by all methods, such as LOCO.
The check is based only on importances_.


Comment on lines +205 to +207
self.pvalues_ = ttest_1samp(
test_result, 0.0, axis=1, alternative="greater"
).pvalue
Collaborator Author


As issue #48 mentions, should I propose a better way to compute the p-value?

If you want the test function to be passed as a parameter, do you have suggestions for its signature?

Collaborator


I suggest something similar to scikit-learn's metrics: support both strings ('ttest', 'wilcoxon', 'corrected-ttest', ...) and functions, e.g. lambda x: ttest_1samp(x, 0.0, axis=1, alternative="greater")[1]

Collaborator Author


The signature would be more like this:
test(diff_loss) -> pvalue

Do you think that losses - mean_losses as the parameter is enough, or do we need more information?

Collaborator

@jpaillard jpaillard Sep 12, 2025


The signature you describe looks good. However, I think it would be nice to also support passing a string for classical tests. That would save the user from defining a function that follows the described signature while fixing its other parameters (e.g. axis=1, alternative="greater", ...).

Collaborator Author


I don't really like having strings because I find them difficult to manage, but in this case it can be interesting.
I will try to add a function for it.
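Such a helper could look like the sketch below, in the spirit of scikit-learn's scorer lookup. The string names, and the use of wilcoxon with an axis argument (SciPy >= 1.11), are assumptions on my side; the signature follows test(diff_loss) -> pvalue as discussed above:

```python
from scipy.stats import ttest_1samp, wilcoxon

# Named tests mapping a string to a test(diff_loss) -> pvalue callable
# (hypothetical names; diff_loss has shape (n_features, n_samples)).
_NAMED_TESTS = {
    "ttest": lambda x: ttest_1samp(x, 0.0, axis=1, alternative="greater").pvalue,
    "wilcoxon": lambda x: wilcoxon(x, axis=1, alternative="greater").pvalue,
}


def check_test(test):
    """Resolve a string name or a callable into a test(diff_loss) -> pvalue."""
    if callable(test):
        return test
    try:
        return _NAMED_TESTS[test]
    except KeyError:
        raise ValueError(
            f"Unknown test {test!r}; expected one of {sorted(_NAMED_TESTS)} "
            "or a callable."
        )
```

This mirrors how scikit-learn resolves scoring="accuracy" versus a custom scorer function.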

Collaborator


The point is that ttest_1samp is originally a SciPy function; keeping a similar API helps users.
