
Commit 3b75403

lionelkusch, bthirion, and jpaillard authored
Add tools for testing and make an example with CPI (#265)
* remove unnecessary function for testing
* add a check on the number of features for X
* add assertion on the model
* improve the check_fit
* update data_generation
* add function for the generation of data for the tests
* improve test for CPI
* fix tests
* improve generation of model
* change multivariate function but need to fix the function reid
* change the generation of data
* Add TODO
  It requires some reflection about the method. The threshold based on the coefficient for estimating the weight seems strange. The signal-to-noise ratio equation also seems odd.
* add tests for scenario
* fix tests
* Update src/hidimstat/_utils/scenario.py
  Co-authored-by: bthirion <[email protected]>
* Fix docstring in the PR
* fix noise_mag
* improve test of knockoff
* small improvement
* fix estimation of variance
* clean noise_std
* change name of the sigma
* add the possibility of continuous support
* fix test by increasing the snr and finding a good seed
* fix test for noise_std
* fix tests
* fix tests
* fix error
* fix dcrt example
* fix bug in example by modification of example
* fix size of tests
* Apply suggestions from code review
  Co-authored-by: Joseph Paillard <[email protected]>
* fix time and seed for test
* Fix number of threadpool to 1
* increase coverage
* add a warning
* add test for warning
* Modify snr for better approach
* Apply suggestions from code review
  Co-authored-by: bthirion <[email protected]>
* Update src/hidimstat/_utils/scenario.py
  Co-authored-by: bthirion <[email protected]>
* modify sentences of the tests
* fix situation of 0 support
* change name of noise including spatial information
* change name of the noise function
* Change parameter continuous
* fix range of parameter for rho_noise_time
* fix message of the error
* fix message of error
* fix test
* fix error in name parameters
* fix name in the error
* fix tests
* fix generation of data parameters
* fix the mismatch of feature
* fix error when the support of noise was zero
* fix bug in the assertion
* fix assertion
* Update src/hidimstat/_utils/scenario.py
  Co-authored-by: bthirion <[email protected]>
* transform beta into boolean array
* rename noise serial
* Improve comment of shuffle
* done
* remove assertion on number of jobs
* Improve docstring
* fix bug in tests
* fix tests cpi
* Remove unnecessary tests
* fix test noise_std
* replace n_times by n_targets
* fix change name
* Change name for index
* refactor data generation with spatial
* fix tests for scenario
* change the example
* update regression test
* update test
* update noise_std test
* fix test_senario
* change to snr
* Increase the robustness of reid test
  I modified the parameters to make the tests more robust
* update docstring
* Put back cov in the tests
* Modify warning message
* fix tests
* simplify the assert
* change name of the file
* fix docstring
* replace sigma with signal_noise_ratio
* Update the example
* change error message
* Improve test of the noise
* fix format of pyproject
* Improve the different types of indexing
* fix a bug in the condition
* Update test/test_conditional_permutation_importance.py
  Co-authored-by: Joseph Paillard <[email protected]>
* rename the main function
* update the condition for the number of features
* remove the count of index
* improve the test and the coverage
* update the modification
* Apply suggestions from code review
  Co-authored-by: bthirion <[email protected]>
* finish merge
* fix example
* remove the categorical in function of cpi test
* fix rename
* rename some tests
* Update examples/plot_dcrt_example.py
  fix error in the computation of type-1 error.
  Co-authored-by: Joseph Paillard <[email protected]>
* fix format

---------

Co-authored-by: bthirion <[email protected]>
Co-authored-by: Joseph Paillard <[email protected]>
1 parent 94d7d82 commit 3b75403

22 files changed: +1420 -565 lines changed

examples/plot_2D_simulation_example.py

Lines changed: 4 additions & 4 deletions
@@ -69,7 +69,7 @@
     ensemble_clustered_inference_pvalue,
 )
 from hidimstat.statistical_tools.p_values import zscore_from_pval
-from hidimstat._utils.scenario import multivariate_simulation
+from hidimstat._utils.scenario import multivariate_simulation_spatial
 
 #############################################################################
 # Specific plotting functions
@@ -167,12 +167,12 @@ def plot(maps, titles):
 shape = (40, 40)
 n_features = shape[1] * shape[0]
 roi_size = 4  # size of the edge of the four predictive regions
-sigma = 2.0  # noise standard deviation
+signal_noise_ratio = 10.0  # signal-to-noise ratio
 smooth_X = 1.0  # level of spatial smoothing introduced by the Gaussian filter
 
 # generating the data
-X_init, y, beta, epsilon, _, _ = multivariate_simulation(
-    n_samples, shape, roi_size, sigma, smooth_X, seed=1
+X_init, y, beta, epsilon = multivariate_simulation_spatial(
+    n_samples, shape, roi_size, signal_noise_ratio, smooth_X, seed=1
 )
 
 ##############################################################################
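
Note on the change from `sigma` to `signal_noise_ratio`: the examples now set the noise level relative to the signal instead of as an absolute standard deviation. The sketch below shows one way such a parameter can be applied. It assumes the ratio is defined on Euclidean norms, which is not necessarily the definition used in `hidimstat._utils.scenario`, and `scale_noise_to_snr` is a hypothetical helper, not a library function.

```python
import numpy as np


def scale_noise_to_snr(signal, noise, signal_noise_ratio):
    """Rescale `noise` so that ||signal|| / ||noise|| equals the target ratio.

    Assumption: the ratio is defined on Euclidean norms; the library may
    use a different definition.
    """
    signal = np.asarray(signal, dtype=float)
    noise = np.asarray(noise, dtype=float)
    noise_norm = np.linalg.norm(noise)
    if signal_noise_ratio <= 0 or noise_norm == 0:
        raise ValueError("signal_noise_ratio and the noise norm must be positive")
    return noise * np.linalg.norm(signal) / (signal_noise_ratio * noise_norm)


rng = np.random.default_rng(1)
signal = rng.normal(size=100)  # stands in for X @ beta
noise = scale_noise_to_snr(signal, rng.normal(size=100), signal_noise_ratio=10.0)
achieved_ratio = np.linalg.norm(signal) / np.linalg.norm(noise)
y = signal + noise
```

With this convention a larger `signal_noise_ratio` means less noise, the opposite direction from the old `sigma` parameter.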

examples/plot_dcrt_example.py

Lines changed: 16 additions & 8 deletions
@@ -20,7 +20,7 @@
 from sklearn.linear_model import LassoCV
 
 from hidimstat import D0CRT
-from hidimstat._utils.scenario import multivariate_1D_simulation
+from hidimstat._utils.scenario import multivariate_simulation
 
 #############################################################################
 # Processing the computations
@@ -38,14 +38,20 @@
 # Number of relevant variables
 n_signal = 2
 # Signal-to-noise ratio
-snr = 4
+signal_noise_ratio = 4
 # Correlation coefficient
 rho = 0.8
 # Nominal false positive rate
 alpha = 5e-2
 
-X, y, _, __ = multivariate_1D_simulation(
-    n_samples=n, n_features=p, support_size=n_signal, rho=rho, seed=sim_ind
+X, y, beta_true, noise = multivariate_simulation(
+    n_samples=n,
+    n_features=p,
+    support_size=n_signal,
+    rho=rho,
+    signal_noise_ratio=signal_noise_ratio,
+    shuffle=True,
+    seed=sim_ind,
 )
 
 # Applying a ReLU function on the outcome y to get non-linear relationships
@@ -58,8 +64,9 @@
 results_list.append(
     {
         "model": "Lasso",
-        "type-1 error": sum(pvals_lasso[n_signal:] < alpha) / (p - n_signal),
-        "power": sum(pvals_lasso[:n_signal] < alpha) / (n_signal),
+        "type-1 error": sum(pvals_lasso[np.logical_not(beta_true)] < alpha)
+        / (p - n_signal),
+        "power": sum(pvals_lasso[beta_true] < alpha) / (n_signal),
     }
 )

@@ -73,8 +80,9 @@
 results_list.append(
     {
         "model": "RF",
-        "type-1 error": sum(pvals_forest[n_signal:] < alpha) / (p - n_signal),
-        "power": sum(pvals_forest[:n_signal] < alpha) / (n_signal),
+        "type-1 error": sum(pvals_forest[np.logical_not(beta_true)] < alpha)
+        / (p - n_signal),
+        "power": sum(pvals_forest[beta_true] < alpha) / (n_signal),
     }
 )
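
Note on the metric change: because the data are now generated with `shuffle=True`, the relevant features no longer sit in the first `n_signal` columns, so the example indexes p-values with the boolean support `beta_true` instead of positional slices. A minimal sketch of that computation follows. `type1_error_and_power` is a hypothetical helper and the p-values are made up for illustration. The type-1 error is the number of false rejections divided by the number of null features, and the power is the number of true rejections divided by the support size.

```python
import numpy as np


def type1_error_and_power(pvals, support, alpha):
    """Empirical type-1 error and power for one simulation run."""
    pvals = np.asarray(pvals, dtype=float)
    support = np.asarray(support, dtype=bool)
    rejected = pvals < alpha
    n_signal = np.count_nonzero(support)
    n_null = support.size - n_signal
    # Guard against an empty null set or an empty support.
    type1 = rejected[~support].sum() / n_null if n_null else float("nan")
    power = rejected[support].sum() / n_signal if n_signal else float("nan")
    return float(type1), float(power)


# Illustrative values only: two relevant features at shuffled positions.
pvals = np.array([0.001, 0.20, 0.03, 0.60, 0.04])
support = np.array([True, False, True, False, False])
type1, power = type1_error_and_power(pvals, support, alpha=0.05)
```

Here one of the three null features is rejected and both relevant features are detected.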

examples/plot_importance_classification_iris.py

Lines changed: 1 addition & 1 deletion
@@ -116,7 +116,7 @@ def run_one_fold(X, y, model, train_index, test_index, vim_name="CFI", groups=No
     GridSearchCV(SVC(kernel="rbf"), {"C": np.logspace(-3, 3, 10)}),
 ]
 cv = KFold(n_splits=5, shuffle=True, random_state=0)
-groups = {ft: i for i, ft in enumerate(dataset.feature_names)}
+groups = {ft: [i] for i, ft in enumerate(dataset.feature_names)}
 out_list = Parallel(n_jobs=5)(
     delayed(run_one_fold)(
         X, y, model, train_index, test_index, vim_name=vim_name, groups=groups
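
Note on the `groups` change: each dictionary value is now a one-element list of column indices instead of a bare integer, presumably because the importance methods expect every group to be a collection of columns. The sketch below illustrates that shape using the iris feature names. `check_groups` is a hypothetical validator written for this note, not part of hidimstat.

```python
feature_names = [
    "sepal length (cm)",
    "sepal width (cm)",
    "petal length (cm)",
    "petal width (cm)",
]

# One group per feature, each holding a list with a single column index.
groups = {ft: [i] for i, ft in enumerate(feature_names)}

# The same structure also describes multi-column groups.
merged_groups = {"sepal": [0, 1], "petal": [2, 3]}


def check_groups(groups, n_features):
    """Hypothetical check: every group is a list of valid column indices."""
    for name, columns in groups.items():
        if not isinstance(columns, list):
            raise TypeError(f"group {name!r} must be a list of column indices")
        for col in columns:
            if not 0 <= col < n_features:
                raise ValueError(f"group {name!r} has invalid column {col}")
    return True


groups_ok = check_groups(groups, len(feature_names))
merged_ok = check_groups(merged_groups, len(feature_names))
```

The pre-change form `{ft: i ...}` would fail this check because its values are integers.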

examples/plot_knockoff_aggregation.py

Lines changed: 29 additions & 7 deletions
@@ -31,7 +31,7 @@
     model_x_knockoff_pvalue,
 )
 from hidimstat.statistical_tools.multiple_testing import fdp_power
-from hidimstat._utils.scenario import multivariate_1D_simulation_AR
+from hidimstat._utils.scenario import multivariate_simulation
 
 
 #############################################################################
@@ -49,14 +49,14 @@
 # Number of variables
 n_features = 150
 # Correlation parameter
-rho = 0.4
+rho = 0.5
 # Ratio of number of variables with non-zero coefficients over total
 # coefficients
 sparsity = 0.2
 # Desired controlled False Discovery Rate (FDR) level
 fdr = 0.1
 # signal-to-noise ratio
-snr = 10
+signal_noise_ratio = 10
 # number of repetitions for the bootstraps
 n_bootstraps = 25
 # seed for the random generator
@@ -73,11 +73,26 @@
 #######################################################################
 # Define the function for running the three procedures on the same data
 # ---------------------------------------------------------------------
-def single_run(n_samples, n_features, rho, sparsity, snr, fdr, n_bootstraps, seed=None):
+def single_run(
+    n_samples,
+    n_features,
+    rho,
+    sparsity,
+    signal_noise_ratio,
+    fdr,
+    n_bootstraps,
+    seed=None,
+):
     # Generate data
-    X, y, _, non_zero_index = multivariate_1D_simulation_AR(
-        n_samples, n_features, rho=rho, sparsity=sparsity, seed=seed, snr=snr
+    X, y, beta_true, noise = multivariate_simulation(
+        n_samples,
+        n_features,
+        rho=rho,
+        support_size=int(n_features * sparsity),
+        signal_noise_ratio=signal_noise_ratio,
+        seed=seed,
     )
+    non_zero_index = np.where(beta_true)[0]
 
     # Use model-X Knockoffs [1]
     selected, test_scores, threshold, X_tildes = model_x_knockoff(
@@ -165,7 +180,14 @@ def effect_number_samples(n_samples):
     parallel = Parallel(n_jobs, verbose=joblib_verbose)
     results = parallel(
         delayed(single_run)(
-            n_samples, n_features, rho, sparsity, snr, fdr, n_bootstraps, seed=seed
+            n_samples,
+            n_features,
+            rho,
+            sparsity,
+            signal_noise_ratio,
+            fdr,
+            n_bootstraps,
+            seed=seed,
         )
         for seed in seed_list
     )
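
Note on the knockoff example: the old generator took `sparsity` and returned the indices of the non-zero coefficients directly, while the new call takes a `support_size` and returns a support array from which the indices are recovered with `np.where`. The sketch below reproduces only that bookkeeping. The random support is a stand-in built for illustration and does not call the library.

```python
import numpy as np

n_features = 150
sparsity = 0.2

# int() truncates toward zero, so a product such as 0.29 * 100 gives 28;
# round() may be preferable when the sparsity is not an exact fraction.
support_size = int(n_features * sparsity)

# Stand-in for the support returned by the data generator.
rng = np.random.default_rng(0)
beta_true = np.zeros(n_features, dtype=bool)
beta_true[rng.choice(n_features, size=support_size, replace=False)] = True

# np.where on a 1-D array returns a one-element tuple; [0] extracts the indices.
non_zero_index = np.where(beta_true)[0]
```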

pyproject.toml

Lines changed: 6 additions & 0 deletions
@@ -110,3 +110,9 @@ markers = ["slow: marks tests as slow (deselect with '-m \"not slow\"')"]
 # pytest-timeout
 timeout = 60 # an individual test should not take more than 60s
 session_timeout = 1200 # all the tests should run within 20 min
+
+[tool.pytest_env]
+OPENBLAS_NUM_THREADS = 1
+BLIS_NUM_THREADS = 1
+MKL_NUM_THREADS = 1
+OMP_NUM_THREADS = 1
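
Note on the thread settings: the `[tool.pytest_env]` table is read by the pytest-env plugin (assumed to be installed; the dependency change is not visible in this excerpt) and sets these variables for the test session, matching the "Fix number of threadpool to 1" entry in the commit message. A rough equivalent in plain Python is sketched below. `limit_numeric_threads` is a hypothetical helper, and variables of this kind are typically read when the numerical library is first loaded, so it would need to run before importing NumPy or SciPy.

```python
import os

# Variable names taken from the pyproject.toml change above.
THREAD_VARS = (
    "OPENBLAS_NUM_THREADS",
    "BLIS_NUM_THREADS",
    "MKL_NUM_THREADS",
    "OMP_NUM_THREADS",
)


def limit_numeric_threads(n_threads=1, environ=None):
    """Set the thread-count variables; returns the mapping that was updated."""
    if environ is None:
        environ = os.environ
    for name in THREAD_VARS:
        # Environment values must be strings.
        environ[name] = str(n_threads)
    return environ


# Demonstrated on a plain dict so the real environment is left untouched.
demo_env = limit_numeric_threads(1, environ={})
```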

src/hidimstat/_utils/exception.py

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+class InternalError(BaseException):
+    """
+    Error raised for an internal error of the library
+
+    Parameters
+    ----------
+    message: str
+        Explanation of the error
+    """
+
+    def __init__(self, message):
+        self.message = message
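
Note on the base class: `InternalError` derives from `BaseException`, not `Exception`, so a generic `except Exception` handler will not intercept it. Whether that is intentional is not stated in the commit. The sketch below copies the class from the diff and shows the consequence. `failing_step` and the surrounding handlers are hypothetical.

```python
class InternalError(BaseException):
    """Copy of the class added in this commit, for demonstration only."""

    def __init__(self, message):
        self.message = message


def failing_step():
    # Hypothetical function that hits an inconsistent internal state.
    raise InternalError("unexpected internal state")


caught_by_generic_handler = None
try:
    try:
        failing_step()
    except Exception:
        # Not reached: InternalError is not a subclass of Exception.
        caught_by_generic_handler = True
except InternalError as err:
    caught_by_generic_handler = False
    message = err.message
```

Deriving from `Exception` would instead let the inner handler catch the error, which is the convention the Python documentation recommends for user-defined exceptions.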
