About the tsgx package


tsgx stands for "table summary generator." The package generates statistical summary tables and follows tidyverse conventions.

The package allows you to:

  • generate frequency tables and cross-tabulations (two-way tables or higher);
  • extract multiple-letter response variables from survey data;
  • include frequency and/or percent distributions in the generated tables;
  • specify whether the 'percent to total' is computed by row (default) or by column;
  • export an Excel file with default formatting/styling, which can also be customized.

Installation

You may install the tsgx package from either GitHub or CRAN.

# Install devtools if it is not yet installed on your machine
if (!requireNamespace('devtools', quietly = TRUE)) {
  install.packages('devtools')
}

# Option 1: install the package from GitHub
devtools::install_github('yng-me/tsgx')

# Option 2: install the package from CRAN
install.packages('tsgx')

Then load the package after installation.

library(tsgx)

tsgx core functions

1. generate_frequency

This function generates a frequency distribution table (marginal table) of a categorical variable x, specified in its second argument. If x_group is not specified, it returns five (5) columns by default: (1) categories of x, (2) frequency of each category, (3) percent to total, (4) cumulative frequency, and (5) cumulative percent to total.

generate_frequency(
  .data,
  x,
  x_group = NULL,
  x_label = get_config('x_label'),
  sort_frequency = FALSE,
  x_as_group = FALSE,
  include_total = TRUE,
  include_cumulative = TRUE,
  exclude_zero_value = FALSE
)

Parameters:

`.data` Required. A data frame, data frame extension (e.g. a `tibble`), a lazy data frame (e.g. from `dbplyr` or `dtplyr`), or Arrow data format.
`x` Required. Variable to be used as categories.
`x_group` Accepts a vector of string/character as grouping variables present in the input `.data`.
`x_label` Stubhead label or label for `x`.
`x_as_group` Whether to use the `x` variable as the top-level grouping.
`sort_frequency` Whether to sort the output. If set to `TRUE`, the frequency will be sorted in descending order.
`include_total` Whether to include the row total.
`include_cumulative` Whether to include the cumulative columns.
`exclude_zero_value` Whether to drop categories with zero (0) values.
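
To make the five default columns described above concrete, here is a minimal base R sketch that builds them by hand. It is an illustration of the arithmetic only, not the tsgx implementation; the toy data and the column names (`category`, `frequency`, `percent`, and so on) are assumptions made for this example.

```r
# Base R sketch of the five default columns (not the tsgx implementation).
# Toy data: hypothetical values standing in for a categorical variable x.
species <- c("Adelie", "Adelie", "Chinstrap", "Gentoo", "Gentoo", "Gentoo")

counts <- table(species)  # frequency of each category

freq_table <- data.frame(
  category  = names(counts),                           # (1) categories of x
  frequency = as.integer(counts),                      # (2) frequency
  percent   = as.numeric(100 * counts / sum(counts)),  # (3) percent to total
  stringsAsFactors = FALSE
)
freq_table$cumulative_frequency <- cumsum(freq_table$frequency)  # (4)
freq_table$cumulative_percent   <- cumsum(freq_table$percent)    # (5)

print(freq_table)
```

The last row of the cumulative columns always equals the grand total and 100 percent, which is a quick sanity check on any frequency table.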

Example 1.1: Basic usage

library(palmerpenguins)

generate_frequency(penguins, species)

Example 1.2: Add grouping variable and define label for x

penguins |> 
  generate_frequency(
    x = sex, 
    x_group = 'species', 
    x_label = 'Sex'
  )
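
As a rough picture of what a grouped frequency table contains, the base R sketch below counts one variable within the levels of another and computes the percent within each group. It assumes that percentages are taken relative to each group's total, which is an assumption for illustration and not confirmed tsgx behaviour; the data values are made up.

```r
# Toy data standing in for a survey extract (hypothetical values).
df <- data.frame(
  species = c("Adelie", "Adelie", "Adelie", "Gentoo", "Gentoo"),
  sex     = c("female", "male", "male", "female", "female"),
  stringsAsFactors = FALSE
)

# Count each sex within each species.
grouped <- as.data.frame(
  table(species = df$species, sex = df$sex),
  responseName = "frequency",
  stringsAsFactors = FALSE
)

# Percent to total within each species group (assumed denominator).
grouped$percent <- 100 * grouped$frequency /
  ave(grouped$frequency, grouped$species, FUN = sum)

# Order rows so each group's categories appear together.
grouped <- grouped[order(grouped$species, grouped$sex), ]
print(grouped)
```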

Example 1.3: Add grouping variable, use x as group, and exclude column total

penguins |> 
  generate_frequency(
    x = sex, 
    x_group = 'species', 
    x_as_group = TRUE, 
    include_total = FALSE
  )

Example 1.4: Exclude cumulative values and sort the output by frequency

penguins |> 
  generate_frequency(
    x = species, 
    x_label = 'Species', 
    sort_frequency = TRUE, 
    include_cumulative = FALSE
  )
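
The effect of sorting by frequency, and of dropping zero-count categories, can be sketched in base R as follows. This only mirrors what the `sort_frequency` and `exclude_zero_value` arguments are described as doing; it is not how tsgx implements them, and the extra `"Macaroni"` level is a hypothetical unused category.

```r
# A factor with an unused level ("Macaroni") yields a zero-count category.
species <- factor(
  c("Gentoo", "Adelie", "Gentoo", "Chinstrap", "Gentoo", "Adelie"),
  levels = c("Adelie", "Chinstrap", "Gentoo", "Macaroni")
)

counts   <- table(species)
sorted   <- sort(counts, decreasing = TRUE)  # analogous to sort_frequency = TRUE
non_zero <- sorted[sorted > 0]               # analogous to exclude_zero_value = TRUE

print(non_zero)
```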

Example 1.5: Exclude cumulative values and define multiple grouping variables

dplyr::starwars |> 
  generate_frequency(
    x = sex, 
    x_group = c('skin_color', 'gender'), 
    x_label = 'Sex', 
    include_cumulative = FALSE
  )

2. generate_crosstab

generate_crosstab extends the functionality of generate_frequency by allowing you to generate cross-tabulations of two (2) or more categorical variables.

generate_crosstab(
  .data,
  x,
  y = NULL,
  x_group = NULL,
  y_group = NULL,
  x_label = get_config('x_label'),
  y_group_separator = '>',
  x_as_group = FALSE,
  total_by = 'row',
  group_values_by = 'statistics',
  include_frequency = TRUE,
  include_proportion = TRUE,
  include_column_total = TRUE,
  convert_to_percent = TRUE,
  format_precision = 2,
  total_label = NULL,
  ...
) 

Parameters:

`.data` Required. A data frame, data frame extension (e.g. a `tibble`), a lazy data frame (e.g. from `dbplyr` or `dtplyr`), or Arrow data format.
`x` Required. Variable to be used as categories.
`y` Variable to be used as columns (like in `pivot_wider`). If not supplied, `generate_frequency` will be used instead.
`x_group` Accepts a vector of string/character as grouping variables.
`y_group` Accepts a vector of string/character as grouping variables in the column.
`x_label` Stubhead label or label for `x`.
`x_as_group` Whether to use the `x` variable as the top-level grouping.
`y_group_separator` A character string that defines the column separator to be used to show table hierarchy.
`total_by` Accepts `row` | `column`. Whether percentages are computed relative to row totals (default) or column totals.
`group_values_by` Accepts `statistics` | `indicators`.
`include_frequency` Whether to include frequency columns.
`include_proportion` Whether to include proportion/percentage columns.
`include_column_total` Whether to include the column total.
`convert_to_percent` Whether to express values as percentages (`TRUE`) or proportions (`FALSE`).
`format_precision` *[Not yet implemented]* Specify the precision of rounding the percent or proportion. Default is `2`.
`total_label` Custom label for the column total.
`...` Valid arguments for `generate_frequency`.

Example 2.1: Basic usage

penguins |> 
  generate_crosstab(
    x = species, 
    y = sex
  )

Example 2.2: Percent/proportion total by column

penguins |> 
  generate_crosstab(
    x = species, 
    y = sex,
    total_by = 'column'
  )
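
The difference between row-wise and column-wise percentages can be illustrated with base R's `prop.table()`. The sketch below is not the tsgx implementation; it only shows the two denominators that `total_by = 'row'` and `total_by = 'column'` refer to, using made-up data.

```r
# Toy data (hypothetical values).
species <- c("Adelie", "Adelie", "Adelie", "Gentoo", "Gentoo", "Gentoo")
sex     <- c("female", "male", "male", "female", "female", "male")

tab <- table(species, sex)  # two-way frequency table

# Row percentages: each row (species) sums to 100.
row_pct <- 100 * prop.table(tab, margin = 1)

# Column percentages: each column (sex) sums to 100.
col_pct <- 100 * prop.table(tab, margin = 2)

print(round(row_pct, 2))
print(round(col_pct, 2))
```

Row percentages answer "what share of each species is female?", while column percentages answer "what share of females belongs to each species?".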

Example 2.3: Exclude frequencies

penguins |> 
  generate_crosstab(
    x = species, 
    y = sex,
    include_frequency = FALSE
  )

Example 2.4: Exclude percentages/proportions

penguins |> 
  generate_crosstab(
    x = species, 
    y = sex,
    include_proportion = FALSE
  )

Example 2.5: Add row grouping variable and exclude percentages/proportions

penguins |> 
  generate_crosstab(
    x = species, 
    y = sex, 
    x_group = 'island',
    include_proportion = FALSE
  )

Example 2.6: Add column grouping variable, exclude frequencies, and set convert_to_percent to FALSE

penguins |> 
  generate_crosstab(
    x = species, 
    y = sex, 
    y_group = 'island',
    convert_to_percent = FALSE,
    include_frequency = FALSE
  )

3. generate_multiple_response

This function generates a summary table from multiple-response data.

generate_multiple_response(
  .data,
  x,
  ...,
  y = NULL,
  x_group = NULL,
  x_label = get_config('x_label'),
  x_as_group = FALSE,
  y_group_separator = '>',
  group_values_by = 'statistics',
  value_to_count = 1,
  include_frequency = TRUE,
  include_proportion = TRUE,
  format_precision = 2,
  convert_to_percent = TRUE
) 

Parameters:

`.data` Required. A data frame, data frame extension (e.g. a tibble), a lazy data frame (e.g. from dbplyr or dtplyr), or Arrow data format.
`x` Required. Variable to be used as categories.
`...` Columns holding the (generally binary-coded) responses, selected using tidyselect syntax.
`y` Column holding a letter-coded response.
`x_group` Accepts a vector of string/character as grouping variables.
`x_label` Stubhead label or label for `x`.
`x_as_group` Use `x` as top-level grouping. Applicable only if `x_group` is specified.
`y_group_separator` A character string that defines the column separator to be used to show table hierarchy.
`group_values_by` Accepts `statistics` | `indicators`.
`include_frequency` Whether to include frequency columns.
`include_proportion` Whether to include proportion/percentage columns.
`convert_to_percent` Whether to express values as percentages (`TRUE`) or proportions (`FALSE`).
`format_precision` *[Not yet implemented]* Specify the precision of rounding the percent or proportion. Default is `2`.

Example 3.1: Basic usage (extract multiple-letter response)

df <- data.frame(
  category = c("G1", "G1", "G2", "G1", "G2", "G1"),
  response = c("AB", "AC", "B", "ABC", "AB", "C"),
  A = c(1, 1, 0, 1, 1, 0),
  B = c(1, 0, 1, 1, 1, 0),
  C = c(0, 1, 0, 1, 0, 1)
) 

df |> generate_multiple_response(category, y = response)
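
To show what extracting a multiple-letter response involves, the base R sketch below splits each response string into its letters and counts them by category. It is a hand-rolled illustration under the assumption that each letter marks one selected option; it is not the tsgx implementation.

```r
# Same letter-coded data as in the example above.
df <- data.frame(
  category = c("G1", "G1", "G2", "G1", "G2", "G1"),
  response = c("AB", "AC", "B", "ABC", "AB", "C"),
  stringsAsFactors = FALSE
)

# Split each response string into single letters, e.g. "ABC" -> "A", "B", "C".
letters_list <- strsplit(as.character(df$response), split = "")

# Repeat each category once per selected letter, then cross-tabulate.
long <- data.frame(
  category = rep(df$category, lengths(letters_list)),
  letter   = unlist(letters_list),
  stringsAsFactors = FALSE
)
counts <- table(long$category, long$letter)

print(counts)
```

The resulting counts agree with the column sums of the binary `A`, `B`, and `C` columns in the example data, since both encodings describe the same selections.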

Example 3.2: Basic usage (wide format multiple response)

df |> generate_multiple_response(category, A:C)
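
For wide-format data, the underlying count is a per-category tally of how often each column equals the counted value. A minimal base R sketch, assuming the counted value is `1` (the default of `value_to_count` in the signature above) and not representing the tsgx implementation:

```r
# Same binary-coded columns as in the example data.
df <- data.frame(
  category = c("G1", "G1", "G2", "G1", "G2", "G1"),
  A = c(1, 1, 0, 1, 1, 0),
  B = c(1, 0, 1, 1, 1, 0),
  C = c(0, 1, 0, 1, 0, 1)
)

# For each category, count the rows where each response column equals 1.
counts <- aggregate(
  cbind(A, B, C) ~ category,
  data = df,
  FUN  = function(v) sum(v == 1)
)

print(counts)
```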

4. generate_as_list

generate_as_list(
  .data,
  list_group,
  x,
  ...,
  fn = 'generate_crosstab',
  list_name_overall = 'ALL',
  exclude_overall = FALSE,
  collapse_overall = TRUE,
  save_as_excel = FALSE,
  formatted = TRUE,
  filename = NULL
)

Example 4.1: Basic usage

penguins |> 
  generate_as_list(
    list_group = island,
    x = species, 
    sex
  )
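
Judging only from the argument names (`list_group`, `list_name_overall`, `exclude_overall`), the function appears to produce one table per level of the list group plus an overall table. The base R sketch below illustrates that idea; the helper `crosstab()`, the data values, and the position of the `ALL` element are all assumptions for illustration and may not match what tsgx actually returns.

```r
# Toy data (hypothetical values).
df <- data.frame(
  island  = c("Biscoe", "Biscoe", "Dream", "Dream", "Dream"),
  species = c("Adelie", "Gentoo", "Adelie", "Chinstrap", "Adelie"),
  sex     = c("female", "male", "male", "female", "female"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: a plain two-way frequency table.
crosstab <- function(d) table(species = d$species, sex = d$sex)

# One table per island, plus an overall table named "ALL".
by_island <- lapply(split(df, df$island), crosstab)
tables    <- c(list(ALL = crosstab(df)), by_island)

print(names(tables))
print(tables$Dream)
```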

License

MIT (see LICENSE.md). The repository also contains a LICENSE file whose license type is not recognized.
