
Delimited mr #154


Merged
merged 19 commits into from
Mar 15, 2018
1 change: 1 addition & 0 deletions NAMESPACE
Original file line number Diff line number Diff line change
@@ -181,6 +181,7 @@ export(mergeFork)
export(mkdir)
export(modifyWeightVariables)
export(moveToGroup)
export(makeMRFromText)
export(mv)
export(newDataset)
export(newDatasetByColumn)
1 change: 1 addition & 0 deletions NEWS.md
@@ -1,5 +1,6 @@
## crunch 1.20.1 (under development)
* Variables can now be converted from one type to another with server-side derivations. Have a text input that is only numbers, and want a variable that is a true numeric? Simply use `ds$id_var_numeric <- as.Numeric(ds$id_var)`. There are `as.*` methods for all Crunch data types except for array-like variables.
* `makeMRFromText()` to take a variable imported as delimited strings, parse the multiple-response options, and return a (derived) multiple_response variable.
* Added support for setting population sizes on datasets with `setPopulation(ds, size = 24.13e6, magnitude = 3)` and for getting population sizes (or magnitudes) with `popSize(ds)` and `popMagnitude(ds)` respectively.
* Add `options(crunch.show.progress)` to govern whether to report progress of long-running requests. Default is `TRUE`, but set it to `FALSE` to run quietly.
* Export `pollProgress()` and recommend using that when a long-running request fails to complete within the local timeout.
131 changes: 131 additions & 0 deletions R/make-array.R
@@ -80,6 +80,137 @@ makeMR <- function (subvariables, name, selections, ...) {
return(vardef)
}


#' Create Multiple Response Variable from Delimited lists
#'
#' Surveys often record multiple response questions in delimited lists where
#' each respondent's selections are separated by a delimiter like `;` or `|`.
#' This function breaks the delimited responses into subvariables, uploads those
#' subvariables to Crunch, and finally creates a multiple response variable from
#' them.
#'
#' @param var The variable containing the delimited responses
#' @param delim The delimiter separating the responses
#' @param name The name of the resulting MR variable
#' @param selected A character string used to indicate a selection, defaults to
#' "selected"
#' @param not_selected Character string identifying non-selection, defaults to
#' "not_selected"
#' @param unanswered Character string indicating non-response, defaults to NA.
#' @param ... Other arguments to be passed on to [makeMR()]
#'
#' @return a Multiple response variable definition
#' @export
makeMRFromText <- function (var,
                            delim,
                            name,
                            selected = "selected",
                            not_selected = "not_selected",
                            unanswered = NA,
                            ...) {
    if (missing(name)) {
        halt("Must supply a name for the new variable")
    }
    if (is.Categorical(var) || is.Text(var)) {
        uniques <- names(table(var))
    } else {
        # deparse() so that a call such as ds$var is quoted as one string
        halt(dQuote(deparse(substitute(var))),
            " must be a Categorical or Text Crunch Variable.")
    }
    # split on the literal delimiter; without fixed = TRUE a delimiter such as
    # "|" would be interpreted as a regular expression
    items <- unique(unlist(strsplit(uniques, delim, fixed = TRUE)))
    # make a derivation expression for each unique item
    subvarderivs <- lapply(items, function(x) createSubvarDeriv(var, x, delim,
        selected, not_selected, unanswered))
    names(subvarderivs) <- gsub("\\.", "_", items) # mongo errors if there are dots in the names

    # generate the ZCL to make an array from the subvariable derivations, and
    # then do selection magic to make an MR
    derivation <- zfunc("select_categories",
        zfunc("array",
            zfunc("select", list(map=subvarderivs),
                list(value=I(c(1, 2, 3, 4, 5))))),
        # use the caller's `selected` label rather than a hard-coded string
        list(value=I(selected)))

    # hide the original variable
    var <- hide(var)
    return(VariableDefinition(derivation=derivation, name=name, ...))
}

#' Create subvariable derivation expressions
#'
#' This function creates a single subvariable definition based on a character string
#' to search for and an originating variable. It uses regex to determine whether
#' a string is present in a delimited list, then substitutes the user supplied values
#' to indicate selection, non-selection, and missingness.
#'
#'
#' @inheritParams makeMRFromText
#' @param str A string whose presence indicates a selection
#' @keywords internal
#'
#' @return A derivation expression for a single subvariable
createSubvarDeriv <- function (var, str, delim, selected, not_selected,
                               unanswered) {
    if (is.na(unanswered)) {
        unanswered <- "No Data"
    }
    new_cat_type <- list(
        value = list(
            class = "categorical",
            categories = list(
                list("id" = 1,
                    "name" = unanswered,
                    "numeric_value" = NA,
                    "missing" = TRUE),
                list("id" = 2,
                    "name" = selected,
                    "numeric_value" = NA,
                    "missing" = FALSE),
                list("id" = 3,
                    "name" = not_selected,
                    "numeric_value" = NA,
                    "missing" = FALSE)
            )
        )
    )
    new_cat <- list(column = I(1:3), type = new_cat_type)
    deriv <- zfunc("case", new_cat)
    deriv$args[[2]] <- zfunc("is_missing", var)
    deriv$args[[3]] <- zfunc("~=", var, buildDelimRegex(str, delim))
    new_alias <- paste0(alias(var), "_", gsub("\\.", "_", str)) # Mongo doesn't allow aliases with dots
    deriv$references <- list(name = str, alias = new_alias)
    return(deriv)
}

#' Build Regex to find delimited items.
#'
#' A delimited item `maple` can appear in a list in four ways
#' 1. At the start of a list `maple; oak`
#' 1. In the middle of a list `oak; maple; birch`
#' 1. At the end of a list `oak; maple`
#' 1. Alone with no delimiters `maple`
#'
#' This function builds a regular expression that captures those four cases.
#' It is broken out of [createSubvarDeriv()] mostly for testing purposes.
#'
#' @inheritParams createSubvarDeriv
#'
#' @return A character string
#' @keywords internal
buildDelimRegex <- function (str, delim){
    # the delimiter needs to be escaped in case it's a regex character
    delim <- escapeRegex(delim)
    str <- escapeRegex(str)
    regex <- paste0(
        "^", str, delim, "|",
        delim, str, delim, "|",
        delim, str, "$", "|",
        "^", str, "$")
    return(regex)
}

It would be nice if this (possibly optionally) ignored white space, so that the same delimiter could be used for 'oak;maple;pine' as well as 'oak; maple; pine'.
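
One way the whitespace suggestion could be approached, sketched in base R only: allow optional whitespace on either side of the delimiter in the pattern, and trim the items when collecting the unique values. `buildDelimRegexWS()` and `escape_regex()` are hypothetical names used for illustration and are not part of this PR. The check uses local `grepl()` rather than the server-side `~=` match, whose regex dialect may differ.

```r
# Hypothetical stand-in for escapeRegex() from R/misc.R so the sketch runs alone
escape_regex <- function (string) {
    out <- gsub("([.|()\\^{}+$*?])", "\\\\\\1", string)
    return(gsub("(\\[|\\])", "\\\\\\1", out))
}

# Hypothetical variant of buildDelimRegex(): tolerate spaces around the delimiter
buildDelimRegexWS <- function (str, delim) {
    ws <- "[[:space:]]*"
    delim <- paste0(ws, escape_regex(delim), ws)
    str <- escape_regex(str)
    return(paste0(
        "^", ws, str, delim, "|",
        delim, str, delim, "|",
        delim, str, ws, "$", "|",
        "^", ws, str, ws, "$"))
}

responses <- c("oak;maple;pine", "oak; maple; pine", "maple", "oak; birch")

# Trim items so that " maple" and "maple" collapse into one subvariable
items <- unique(trimws(unlist(strsplit(responses, ";", fixed = TRUE))))

# Which responses contain "maple" as a whole item, with or without padding
maple_hits <- grepl(buildDelimRegexWS("maple", ";"), responses)
```

Both pieces would be needed together: trimming alone would produce clean item names whose patterns fail to match padded responses, and a tolerant pattern alone would still yield duplicate subvariables for `maple` and ` maple`.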



#' @rdname makeArray
#' @export
deriveArray <- function (subvariables, name, selections, ...) {
Expand Down
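
To make the derivation in `makeMRFromText()` concrete, here is a purely local approximation in base R of what the resulting subvariables contain. It is a sketch under stated assumptions: the real function builds ZCL expressions that are evaluated on the Crunch server, while `simulate_mr_from_text()` is a hypothetical helper that makes no Crunch API calls and matches whole items by splitting instead of by regex.

```r
# Hypothetical local approximation of makeMRFromText(); no Crunch API calls
simulate_mr_from_text <- function (x, delim, selected = "selected",
                                   not_selected = "not_selected") {
    # item discovery: split every unique response on the literal delimiter
    items <- unique(unlist(strsplit(unique(x[!is.na(x)]), delim, fixed = TRUE)))
    out <- lapply(items, function (item) {
        # an item counts as selected when it is a whole element of the list
        hit <- vapply(strsplit(x, delim, fixed = TRUE),
            function (parts) item %in% parts, logical(1))
        col <- ifelse(hit, selected, not_selected)
        col[is.na(x)] <- NA  # missing responses stay missing in every column
        col
    })
    names(out) <- gsub("\\.", "_", items)  # mirror the dot-to-underscore renaming
    as.data.frame(out, stringsAsFactors = FALSE)
}

trees <- c("maple; oak", "oak; maple; birch", "birch", NA)
mr <- simulate_mr_from_text(trees, "; ")
print(mr)
```

Each distinct item becomes one column with three possible states, which corresponds to the three categories (unanswered, selected, not selected) that `createSubvarDeriv()` defines for every subvariable.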
15 changes: 15 additions & 0 deletions R/misc.R
@@ -266,3 +266,18 @@ has.function <- function (query, funcs) {

return(FALSE)
}

#' Escape regex special characters
#'
#' This function takes a string and escapes all of the special characters in the string.
#' So VB.NET becomes VB\.NET. Note that R will print this as VB\\.NET, but `cat` reveals
#' that there's only one `\`.
#' @param string A character vector to escape
#'
#' @return `string` with regex special characters escaped
#' @keywords internal
#' @examples
#' \dontrun{
#' escapeRegex("Tom&Jerry")
#' escapeRegex(".Net")
#' }
escapeRegex <- function(string) {
    out <- gsub("([.|()\\^{}+$*?])", "\\\\\\1", string)
    return(gsub("(\\[|\\])", "\\\\\\1", out))
}
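
A quick standalone check of the escaping behaviour described in the roxygen block, runnable in base R. The function body is copied from the diff under the hypothetical name `escape_regex()` so that the example does not need the package loaded.

```r
# Copy of escapeRegex() from the diff, renamed so it runs without the package
escape_regex <- function (string) {
    out <- gsub("([.|()\\^{}+$*?])", "\\\\\\1", string)
    return(gsub("(\\[|\\])", "\\\\\\1", out))
}

escaped <- escape_regex("VB.NET")
cat(escaped, "\n")  # cat() shows the single backslash before the dot

# An unescaped dot matches any character; the escaped one only a literal dot
loose_match <- grepl("VB.NET", "VBxNET")
strict_miss <- grepl(escaped, "VBxNET")
strict_hit <- grepl(escaped, "VB.NET")

# Delimiters that are regex metacharacters are escaped the same way
pipe_escaped <- escape_regex("|")
```

This is why `buildDelimRegex()` escapes both the item and the delimiter before pasting them into the pattern: an item such as `VB.NET` or a delimiter such as `|` would otherwise change the meaning of the regex.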
30 changes: 30 additions & 0 deletions man/buildDelimRegex.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

35 changes: 35 additions & 0 deletions man/createSubvarDeriv.Rd


36 changes: 36 additions & 0 deletions man/mrFromDelim.Rd


70 changes: 70 additions & 0 deletions tests/testthat/app.crunch.io/api/datasets/mr_from_delim.json
@@ -0,0 +1,70 @@
{
"element": "shoji:entity",
"self": "https://app.crunch.io/api/datasets/mr_from_delim/",
"catalogs": {
"batches": "https://app.crunch.io/api/datasets/mr_from_delim/batches/",
"users": "https://app.crunch.io/api/datasets/mr_from_delim/users/",
"variables": "https://app.crunch.io/api/datasets/mr_from_delim/variables/",
"actions": "https://app.crunch.io/api/datasets/mr_from_delim/actions/",
"savepoints": "https://app.crunch.io/api/datasets/mr_from_delim/savepoints/",
"boxdata": "https://app.crunch.io/api/datasets/mr_from_delim/boxdata/",
"filters": "https://app.crunch.io/api/datasets/mr_from_delim/filters/",
"multitables": "https://app.crunch.io/api/datasets/mr_from_delim/multitables/",
"comparisons": "https://app.crunch.io/api/datasets/mr_from_delim/comparisons/",
"forks": "https://app.crunch.io/api/datasets/mr_from_delim/forks/",
"permissions": "https://app.crunch.io/api/datasets/mr_from_delim/permissions/",
"joins": "https://app.crunch.io/api/datasets/mr_from_delim/joins/",
"decks": "https://app.crunch.io/api/datasets/mr_from_delim/decks/",
"parent": "https://app.crunch.io/api/datasets/",
"weight_variables": "https://app.crunch.io/api/datasets/mr_from_delim/weight_variables/"
},
"fragments": {
"preferences": "https://app.crunch.io/api/datasets/mr_from_delim/preferences/",
"stream": "https://app.crunch.io/api/datasets/mr_from_delim/stream/",
"settings": "https://app.crunch.io/api/datasets/mr_from_delim/settings/",
"visit": "https://app.crunch.io/api/datasets/mr_from_delim/visit/",
"state": "https://app.crunch.io/api/datasets/mr_from_delim/state/",
"table": "https://app.crunch.io/api/datasets/mr_from_delim/table/",
"pk": "https://app.crunch.io/api/datasets/mr_from_delim/pk/",
"exclusion": "https://app.crunch.io/api/datasets/mr_from_delim/exclusion/"
},
"views": {
"cube": "https://app.crunch.io/api/datasets/mr_from_delim/cube/",
"export": "https://app.crunch.io/api/datasets/mr_from_delim/export/",
"summary": "https://app.crunch.io/api/datasets/mr_from_delim/summary/",
"applied_filters": "https://app.crunch.io/api/datasets/mr_from_delim/filters/applied/"
},
"specification": "https://app.crunch.io/api/specifications/datasets/",
"description": "Detail for a given dataset",
"body": {
"size": {
"rows": 4,
"columns": 1
},
"current_editor_name": "Me",
"owner_name": "Me",
"name": "for testing functionality of MR from delimited text",
"end_date": null,
"access_time": "2017-04-12T19:07:06.351000",
"notes": "",
"current_editor": "https://app.crunch.io/api/users/me/",
"creation_time": "2017-04-12T14:34:00.015000",
"archived": false,
"start_date": null,
"modification_time": "2017-04-12T14:34:03.239000",
"app_settings": {
"crunch": {
"deleted_rogue_vp": true
}
},
"owner": "https://app.crunch.io/api/users/me/",
"permissions": {
"edit": true,
"change_permissions": true,
"view": true
},
"is_published": true,
"id": "mr_from_delim",
"description": ""
}
}