-
Notifications
You must be signed in to change notification settings - Fork 15
Delimited mr #154
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Delimited mr #154
Conversation
Codecov Report
@@ Coverage Diff @@
## master #154 +/- ##
==========================================
- Coverage 89.21% 89.08% -0.13%
==========================================
Files 92 92
Lines 5478 5507 +29
==========================================
+ Hits 4887 4906 +19
- Misses 591 601 +10
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #154 +/- ##
==========================================
- Coverage 90.04% 89.91% -0.13%
==========================================
Files 105 104 -1
Lines 6466 6414 -52
==========================================
- Hits 5822 5767 -55
- Misses 644 647 +3
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very exciting. I know that we've struggled with making this easy, and it will be great to be able to do this.
R/make-array.R
Outdated
@@ -80,6 +80,101 @@ makeMR <- function (subvariables, name, selections, ...) { | |||
return(vardef) | |||
} | |||
|
|||
|
|||
#' Create Multiple Response Variable from Delimited |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Super nitpicky: Create a Multiple Response Variable from Delimited Lists or Create Multiple Response Variables from Delimited Lists
R/make-array.R
Outdated
#' @param unanswered Character string indicating non-response | ||
#' @param ... Other arguments to be passed on to [makeMR()] | ||
#' | ||
#' @return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a return string here
R/make-array.R
Outdated
#' @param delim The delimiter separating the responses | ||
#' @param name The name of the resulting MR variable | ||
#' @param selected A character string used to indicate a selection | ||
#' @param not_selected Character string identifying non-selection |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add references to the default values that are specified
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think "selected" and "not_selected" arguments are worth having. And I'm guessing that if you convert to the derived expression (using ~=
on the server), you won't have room for defining subvariable category names anyway.
R/make-array.R
Outdated
v <- as.vector(var) | ||
} else { | ||
halt(dQuote("var"), " must be a Categorical or Text Crunch Variable.") | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, dQuote("var")
could be dQuote(substitute(var))
to get the actual variable string that was given
R/make-array.R
Outdated
ds <- loadDataset(datasetReference(var)) | ||
addVariables(ds, vardefs) | ||
hide(var) | ||
ds <- refresh(ds) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can get rid of this refresh
line, if you change line 122 to ds <- addVariables(ds, vardefs)
and possible also change line 123 to var <- hide(var)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can get rid of all of this if we accept #162 and you use that approach instead.
tests/testthat/test-make-array.R
Outdated
c("not_selected", "selected", "No Data")) | ||
expect_identical(as.vector(ds$mr_5$maple), | ||
structure(c(2L, 2L, 1L, 1L), .Label = c("not_selected", "selected" | ||
), class = "factor")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would be more readable as factor(c("selected", "selected", "not_selected", "not_selected"), levels=c("not_selected", "selected"))
tests/testthat/test-make-array.R
Outdated
unanswered = v[is.na(v)]) | ||
expect_equivalent(varDef, expected) | ||
}) | ||
c("maple; birch", "oak; maple; birch", "birch; sugar maple", "maple butter; oak") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this line leftover?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup!
tests/testthat/test-make-array.R
Outdated
not_selected = "No", | ||
unanswered = v[is.na(v)]) | ||
expect_equivalent(varDef, expected) | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This expectation doesn't seem quite right: I would have expected "oak" to be matched a few times given those values.
tests/testthat/test-make-array.R
Outdated
}) | ||
|
||
test_that("createSubvarDef generates the correct variable definition", { | ||
v <- c("maple; birch", "oak; maple; birch", "birch; sugar maple", "maple butter; oak", NA) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it would be good to also test where one of the values is selected in all of the values.
R/make-array.R
Outdated
vardefs <- lapply(cats, function(x) createSubvarDef(v, x, delim, | ||
selected, not_selected, unanswered, missing = is.na(v))) | ||
ds <- loadDataset(datasetReference(var)) | ||
addVariables(ds, vardefs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this is avoidable, but maybe it would be good to add a check that these variables don't already exist / if they do rename them? It seems a bit problematic that someone could only ever use this command once on a variable (without going through and manually deleting the subvariables that were created).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about just changing the subvariable names to varname_selection
? In other words if the delimited variable was called trees
then the subvariables would be trees_oak
trees_maple
etc. My preference would be to keep the function consistent so that it always named things in the same way and didn't change behaviour based of the other variables in a dataset.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
varname_selection
is nice, but it doesn't ensure uniqueness. A safer way is to create the variable in one request (not upload and then bind, which is not atomic). And in doing it in a single request, you may be able to omit aliases for the subvariables, in which case the backend should generate (guaranteed valid) aliases for them.
It looks like there's a zz9 bug which prevents creating a MR variable with derived subvariable definitions. https://www.pivotaltracker.com/story/show/152975005 & The ideal scenario is to create the multiple response variable definition in one step without having to first create the subvariables, but for the time being we probably need to use the two step process until the bug is resolved. |
R/make-array.R
Outdated
deriv <- zfunc("case", new_cat) | ||
deriv$args[[2]] <- zfunc("is_missing", var) | ||
deriv$args[[3]] <- zfunc("~=", var, buildDelimRegex(str, delim)) | ||
deriv$references <- list(name = str, alias = paste0(alias(var), "_", str)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably check that these aliases are unique in the dataset. Currently the concatenate the source variable's alias with the item name (str
).
It should be doable to check that this alias doesn't exist, and if it does do something else.
- Changed map names to accomodate no periods in Mongo - Added escape functionality for regex metacharacters
Create Multiple Response variables from delimited text or categorical variables. The process this function uses to generate the MR is: