-
Notifications
You must be signed in to change notification settings - Fork 42
Adding purged cross-validation for time series datasets #115
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good to me, just a few cosmetics nits below.
- The second, third and fourth steps are repeated with each of the k parts, | ||
so that k evaluations are generated | ||
- Finally, the evaluation metrics are averaged to provide the cross-validation | ||
metrics. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a note outside the scope of this PR: it'd be nice if in the metadata's description we could use a pointer to a file, something like:
"description": {"file": "./readme.md"}
@@ -0,0 +1,18 @@ | |||
# Script for purged k-fold cross-validation | |||
|
|||
The objective of this script is create a purged k-fold cross validation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to create
# Script for purged k-fold cross-validation | ||
|
||
The objective of this script is create a purged k-fold cross validation | ||
starting form any classification model |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: from
" predict " (if regression? "regressions" | ||
"classifications") | ||
".") | ||
"code" 106})))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indentation looks off
(and (= model-type "linearregression") (not regression?)))) | ||
(when error | ||
(raise {"message" (str "The " model-type " cannot be used to" | ||
" predict " (if regression? "regressions" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indentation looks off
batch (round (/ rows k-folds)) | ||
k-fold-fn (lambda (x) | ||
(log-info "range" (str (+ 1 (* x batch))) (str (+ 1 (* (+ x 1) batch)))) | ||
(create-dataset {"origin_dataset" dataset-id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i would define a variable for range in the let, instead of computing it twice
pruning-rows (round (* (/ rows 100) 7.5))) | ||
(log-info "range" (str (+ 1 pruning-rows)) (str (- rows pruning-rows 1))) | ||
(create-dataset {"origin_dataset" ds-id | ||
"range" [(+ 1 pruning-rows) (- rows pruning-rows 1)]}))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same thing about defining a variable for range (there could even be a function for computing it)
|
||
The following script performs a k-fold cross-validation compatible with time | ||
series datasets. Test datasets are created by sampling linearly the original | ||
dataset and some data is removed from the test dataset edges to avoid leakage. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe it is worth mentioning what a dataset "edge" is, unless it's pretty common terminology in this context (i for one don't know what it is :))
No description provided.