Code Monkey home page Code Monkey logo

xltabr's Introduction

Coverage Status Build Status Cran Status Cran Downloads

Warning: xltabr is in early development. Please raise an issue if you find any bugs

Introduction

xltabr allows you to write formatted cross tabulations to Excel using openxlsx. It has been developed to help automate the process of publishing Official Statistics.

The package works best when the input dataframe is the output of a crosstabulation performed by reshape2:dcast. This allows the package to autorecognise various elements of the cross tabulation, which can be styled accordingly.

For example, given a crosstabulation ct produced by reshape2, the following code produces the table shown.

titles = c("Breakdown of car statistics", "Cross tabulation of drive and age against type*")
footers = "*age as of January 2015"
wb <- xltabr::auto_crosstab_to_wb(ct, titles = titles, footers = footers)
openxlsx::openXL(wb)

image

This readme provides a variety of examples of increasing complexity. It is based on a simulated dataset built into the package, which you can see here.

Getting started

Much of xltabr utility comes from its ability to automatically format cross tabulations which have been produced by reshape2:dcast.

The package provides a core convenience function called xltabr::auto_crosstab_to_xl. This wraps more advanced functionality, at the cost of reducing flexibility.

The following code assumes you've read in the synthetic data as follows:

# Read in data 
path <- system.file("extdata", "synthetic_data.csv", package="xltabr")
df <- read.csv(path, stringsAsFactors = FALSE)

Example 1: Simple cross tabulation to Excel

# Create a cross tabulation using reshape2
ct <- reshape2::dcast(df, drive + age  ~ type, value.var= "value", margins=c("drive", "age"), fun.aggregate = sum)
ct <- dplyr::arrange(ct, -row_number())

# Use the main convenience function from xltabr to output to excel
tab <- xltabr::auto_crosstab_to_wb(ct, return_tab = TRUE)  #wb is an openxlsx workbook object
openxlsx::openXL(tab$wb)

image

Example 2: Standard data frame to Excel

There is also a convenience function to write a standard data.frame to Excel:

wb <- xltabr::auto_df_to_wb(mtcars)
openxlsx::openXL(wb)

image

Example 3: Add in titles and footers

titles = c("Breakdown of car statistics", "Cross tabulation of drive and age against type*")
footers = "*age as of January 2015"
wb <- xltabr::auto_crosstab_to_wb(ct, titles = titles, footers = footers)
openxlsx::openXL(wb)

image

Example 4: Supply custom styles

path <- system.file("extdata", "styles_pub.xlsx", package = "xltabr")
cell_path <- system.file("extdata", "style_to_excel_number_format_alt.csv", package = "xltabr")
xltabr::set_style_path(path)
xltabr::set_cell_format_path(cell_path)
wb <- xltabr::auto_crosstab_to_wb(ct)
openxlsx::openXL(wb)

image

Example 5: Output more than one table

# Change back to default styles
xltabr::set_style_path()
xltabr::set_cell_format_path()

# Create second crosstab
ct2 <- reshape2::dcast(df, drive + age ~ colour, value.var= "value", margins=c("drive", "age"), fun.aggregate = sum)
ct2 <- dplyr::arrange(ct2, -row_number())

tab <- xltabr::auto_crosstab_to_wb(ct, titles = titles, footers = c(footers, ""), return_tab = TRUE)

titles2 = c("Table 2: More car statistics", "Cross tabulation of drive and age against colour*")
footers2 = "*age as of January 2015"
wb <- xltabr::auto_crosstab_to_wb(ct2, titles = titles2, footers = footers2, insert_below_tab = tab)
openxlsx::openXL(wb)

image

Example 6: Output more than one table, with different styles

tab <- xltabr::auto_crosstab_to_wb(ct, titles = titles, footers = c(footers, ""), return_tab = TRUE)

xltabr::set_style_path(path)
xltabr::set_cell_format_path(cell_path)

wb <- xltabr::auto_crosstab_to_wb(ct2, titles = titles2, footers = footers2, insert_below_tab = tab)
openxlsx::openXL(wb)

# Change back to default styles
xltabr::set_style_path()
xltabr::set_cell_format_path()

image

Example 7: Auoindent off

ct <- reshape2::dcast(df, drive + age  ~ type, value.var= "value",  fun.aggregate = sum)
wb <- xltabr::auto_crosstab_to_wb(ct, titles = titles, footers = c(footers, ""), indent = FALSE, left_header_colnames = c("drive", "age"))
openxlsx::openXL(wb)

image

auto_crosstab_to_wb options

The following provides a list of all the options you can provide to auto_crosstab_to_wb

## Take a cross tabulation produced by 'reshape2::dcast' and output a formatted openxlsx wb object
## 
## Description:
## 
##      Take a cross tabulation produced by 'reshape2::dcast' and output a formatted openxlsx wb object
## 
## Usage:
## 
##      auto_crosstab_to_wb(df, auto_number_format = TRUE, top_headers = NULL,
##        titles = NULL, footers = NULL, auto_open = FALSE, indent = TRUE,
##        left_header_colnames = NULL, vertical_border = TRUE, return_tab = FALSE,
##        auto_merge = TRUE, insert_below_tab = NULL, total_text = NULL,
##        include_header_rows = TRUE, wb = NULL, ws_name = NULL,
##        number_format_overrides = list(), fill_non_values_with = list(na = NULL,
##        nan = NULL, inf = NULL, neg_inf = NULL), allcount_to_level_translate = NULL,
##        left_header_col_widths = NULL, body_header_col_widths = NULL)
##      
## Arguments:
## 
##       df: A data.frame.  The cross tabulation to convert to Excel
## 
## auto_number_format: Whether to automatically detect number format
## 
## top_headers: A list.  Custom top headers. See 'add_top_headers()'
## 
##   titles: The title.  A character vector.  One element per row of title
## 
##  footers: Table footers.  A character vector.  One element per row of footer.
## 
## auto_open: Boolean. Automatically open Excel output.
## 
##   indent: Automatically detect level of indentation of each row
## 
## left_header_colnames: The names of the columns that you want to designate as left headers
## 
## vertical_border: Boolean. Do you want a left border?
## 
## return_tab: Boolean.  Return a tab object rather than a openxlsx workbook object
## 
## auto_merge: Boolean.  Whether to merge cells in the title and footers to width of body
## 
## insert_below_tab: A existing tab object.  If provided, this table will be written on the same sheet, below the provided tab.
## 
## total_text: The text that is used for the 'grand total' of a cross tabulation
## 
## include_header_rows: Boolean - whether to include or omit the header rows
## 
##       wb: A existing openxlsx workbook.  If not provided, a new one will be created
## 
##  ws_name: The name of the worksheet you want to write to
## 
## number_format_overrides: e.g. list("colname1" = "currency1") see auto_style_number_formatting
## 
## fill_non_values_with: Manually specify a list of strings that will replace non numbers types NA, NaN, Inf and -Inf. e.g. list(na = '*', nan = '', inf = '-', neg_inf = '-'). Note: NaNs are not treated as NAs.
## 
## allcount_to_level_translate: Manually specify how to translate summary levels into header formatting
## 
## left_header_col_widths: Width of row header columns you wish to set in Excel column width units. If singular, value is applied to all row header columns. If a vector, vector must have length equal to the number of row headers in workbook. Use special case "auto" for
##           automatic sizing. Default (NULL) leaves column widths unchanged.
## 
## body_header_col_widths: Width of body header columns you wish to set in Excel column width units. If singular, value is applied to all body columns. If a vector, vector must have length equal to the number of body headers in workbook. Use special case "auto" for
##           automatic sizing. Default (NULL) leaves column widths unchanged.
## 
## Examples:
## 
##      crosstab <- read.csv(system.file("extdata", "example_crosstab.csv", package="xltabr"))
##      wb <- auto_crosstab_to_wb(crosstab)
## 

Advanced usage

The simple examples above wrap lower-level functions. These functions can be used to customise the output in a number of ways.

The following example shows the range of functions available.

tab <- xltabr::initialise() %>%  #Options here for providing an existing workbook, changing worksheet name, and position of table in wb
  xltabr::add_title(title_text) %>% # Optional title_style_names allows user to specify formatting
  xltabr::add_top_headers(h_list) %>% # Optional row_style_names and col_style_names allows custom formatting
  xltabr::add_body(df) %>%  #Optional left_header_colnames, row_style_names, left_header_style_names col_style names
  xltabr::add_footer(footer_text) %>% # Optional footer_style_names
  xltabr::auto_detect_left_headers() %>% # Auto detect left headers through presence of keyword, default = '(all)'
  xltabr::auto_detect_body_title_level() %>% # Auto detect level of emphasis of each row in body, through presence of keyword
  xltabr::auto_style_indent() %>% # Consolidate all left headers into a single column, with indentation to signify emphasis level
  xltabr::auto_merge_title_cells() %>% # merge the cells in the title
  xltabr::auto_merge_footer_cells() # merge the cells in the footer

The convenience functions contain further examples of how to build up a tab. See here.

Implementation diagrams.

See here

xltabr's People

Contributors

isichei avatar robinl avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

xltabr's Issues

Dealing with redactions / NULLs

Often figures will be redacted and replaced with a "*". This functionality might be useful as this is currently a very cumbersome task to do manually. Need to think how to generalise and only apply to body.

Allow automatic cross tabs with multiple top header rows

When we use

reshape2::dcast and provide a formula with multiple categories in the top headers, dcast concatenates categories with _ into a single row. See here.

We want these to be able to occupy multiple rows in the excel spreadsheet

In practice, this probably means we need to provide a kind of xltabr::dcast type function, which wraps the reshape2 one, running reshape2::dcast under the hood, but returning a top_headers list which is compatible with the xltabr::add_top_headers function.

An alternative would be to provide a xltabr::get_top_headers_from_croostab type function, but I can't see a way of parsing out the list of top header rows unambiguously. The _ is not an unambiguous delimiter because it can occur in the columns.

A plausible algnorithm could be to replace _ in the cols of data in the data frame which is crosstabulated with e.g. --+--, dcaston these new columns, then split column headers on _, th

Formatting is different if title and footer is added

Compare the results of the following two chunks:
xltabr:::combine_all_styles(tab) gives the same result for all the body cells, but the styling that's actually applied to the workbook is different

# Test 1
ct <- reshape2::dcast(mtcars, am + gear ~ cyl, value.var= "mpg", margins=c("am", "gear"), fun.aggregate = mean)
headers <- colnames(ct)
tab <- xltabr::initialise()
tab <- xltabr::add_top_headers(tab, headers)
tab <- xltabr::add_body(tab, ct)
tab <- xltabr:::auto_detect_left_headers(tab)
tab <- xltabr:::auto_detect_body_title_level(tab)
tab <- xltabr:::auto_style_indent(tab)
tab <- xltabr::auto_style_number_formatting(tab)
tab <- xltabr:::write_all_elements_to_wb(tab)
tab <- xltabr:::add_styles_to_wb(tab)
xltabr:::combine_all_styles(tab)

if (open_output) openxlsx::openXL(tab$wb) else {
  if (file.exists("test1.xlsx")) file.remove("test1.xlsx")
  openxlsx::saveWorkbook(tab$wb, "test1.xlsx")
}

tab$body$body_df
# Test 2
ct <- reshape2::dcast(mtcars, am + gear ~ cyl, value.var= "mpg", margins=c("am", "gear"), fun.aggregate = mean)
headers <- colnames(ct)
tab <- xltabr::initialise()
tab <- xltabr::add_top_headers(tab, headers)
tab <- xltabr::add_body(tab, ct)
tab <- xltabr::add_title(tab, "Here is a title")
tab <- xltabr::add_footer(tab, "Here is a footer")
tab <- xltabr:::auto_detect_left_headers(tab)
tab <- xltabr:::auto_detect_body_title_level(tab)
tab <- xltabr:::auto_style_indent(tab)
tab <- xltabr::auto_style_number_formatting(tab)
tab <- xltabr:::write_all_elements_to_wb(tab)
tab <- xltabr:::add_styles_to_wb(tab)
xltabr:::combine_all_styles(tab)

if (open_output) openxlsx::openXL(tab$wb) else {
  if (file.exists("test2.xlsx")) file.remove("test2.xlsx")
  openxlsx::saveWorkbook(tab$wb, "test2.xlsx")
}

Merge cells

Do we need to incorporate any functionality that merges cells in multi-dimensional cross tabulations?

Needs style name checker

Code currently assumes all style names e.g. "left_header" are specified in the styles.xlsx and therefore will exist in style_catelogue. Need to add code to check this before calling add_styles_to_wb or any other style catelogue code.

Improve number detection

Ideas:

Look at numbers in column - if max(num) > 1000, use a different format to eg. if the max is 1. Choose decimal places accordingly

Ability to update style, with overrides

Need to implement functionality that allows you to add additional style elements to a cell. At the moment, you can impose a new style, but this delete all previous styling

Tidy up core tab definition

Need to make sure we're happy with the core properties on tab, particularly so that the tab keeps track of e.g. its extent, how many header and footer rows it has etc, and we have a consistent way for keeping all of these fields up to date as we change things

Fix R build ignore

This is causing some issues with the r cmd check, e.g. the travis file - need to add all non essential files in github

Fails when style[["numFmt"]] is null

Tracked bug down to here:

out_style <- openxlsx::createStyle(

Whehn openxlsx::createStyle get passed numFmt =NULL it raises and error Error in if (numFmt == "date") { : argument is of length zero (the error seems to be unrelated to "date", it's just that when numFmt is NULL the if statement cannot evaluate.

To reproduce this error, use the following code:

library(magrittr)

headers <- names(mtcars)

tab <- xltabr::initialise() %>%
  xltabr::add_top_headers(headers) %>%
  xltabr::add_body(mtcars) %>% 
  xltabr::auto_style_number_formatting()

tab <- xltabr:::write_all_elements_to_wb(tab)

xltabr:::add_styles_to_wb(tab)


Note output from

xltabr:::top_headers_get_cell_styles_table(tab)
xltabr:::body_get_cell_styles_table(tab)

looks fine

2 minor bugs

  • We're currently using a deprecated sheet function
  • Fixed a tiny bug in initialise

Fix incorrect indentation and add regressio ntest

      a     b     c d      e          f    meta_row_             meta_left_header_row_
1     a     b (all) 1 1001.1 2017-01-01 body|title_3 body|left_header|title_3|indent_3
2     a (all) (all) 2 2001.1 2017-01-02 body|title_2 body|left_header|title_2|indent_2
3 (all) (all) (all) 3 3001.1 2017-01-03 body|title_1 body|left_header|title_1|indent_1
4     a     b     c 4 4001.1 2017-01-04         body                  body|left_header

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.