Code Monkey home page Code Monkey logo

edibble's Introduction

edibble

Lifecycle: experimental

The main aims of edibble R-package are:

  1. facilitate statistical thinking of adapting experimental designs to different conditions by implementing the grammar of experimental design to generate new designs, and
  2. lubricate the project flow of generating experimental data.

The name origin of edibble is a play on tibble (a modern take on data frames) with motivation as tibble for experimental design.

The grammatical expressions to generate design can draw parallel to dplyr so some functions may be decoupled to a package called edplyr 🤔

Experimental data

The focus of tidyverse R-package is well suited for the data science project workflow as illustrated below in (B) (from Grolemund and Wickham 2017). For experimental data, statististical project begin before obtaining data as depicted below in (A). The focus of edibble is to facilitate work in (A).

Statistical thinking

The central idea of the grammar of experimental design can be illustrated by drawing analogy to the (layered) grammar of graphics implemented in ggplot2.

Grammar of graphics defines independent graphical components that make possible to graph complex plots and make it extensible. For example, in figure (B) below, a stacked barplot is transformed to a pie chart by transforming the coordinate system from Cartersian to polar coordinates. The corresponding functions to generate these figures in base R are barplot and pie. These singular purpose graphic functions can only generate the corresponding “named plots”. Hence why ggplot2 is powerful as the functions are no longer limiting the creation of different plots.

In experimental design, agricolae is by far the most popular R package on the CRAN task view of experimental design. Just as there are “named plots”, there are “named experimental designs”. E.g. completely randomised design (CRD), randomised complete block design (CRBD), balanced incomplete block design (BIBD) and so on. The functions to generate designs in agricolae are motivated by these named designs (e.g. design.crd() and design.rcbd()). These functions are limiting the adaptation of experimental designs to different conditions in a similar manner to the base plots. Figure (A) below shows how CRD is tranformed to RCBD by imposing a blocking structure on the units (provided the units and treatment number are conformable).

Below are connections between different named experimental designs.

Below show examples where the elements of experimental designs are identified.

Project Flow

  • edibble will integrate an easy way to export the design in a rectangular tabular format so there is a smooth translation from the generation of experimental design to data collection.

  • edibble will allow easy visualisation of experimental designs that serve as a communication medium in a number scenarios. For example, an easy way to visualise the experimental design would ensure that all parties involved can digest the design quickly and correct any mistake as needed before conducting the experiment.

API Prototype

The core elements of experimental design are trts and units. Specifications of more complex designs include:

  • phase for multi-phase designs;
  • index for temporal components (default value 0);
  • keys that uniquely identify observational units;
  • map define mapping structures;
  • traits for observations;
  • strata for nested designs; (questioning this)
  • block and cluster are synonyms for grouped units.

These are manipulated by the verb that precedes these elements separated by _. Some verbs include set, get, group, modify, split, cross, combine, apply, assign, randomise, measure, allocate, permute, cluster.

The prototype code below generates a classic split-plot deisgn:

design <- edibble() %>% 
  # [    ]
  set_trts(A = 1:3, B = c("X", "Y")) %>% 
  # [ T1: A:{1 2 3}  ]
  # [ T2: B:{X Y}    
  set_units(18) %>% 
  # [ T1: A:{1 2 3}  ]
  # [ T2: B:{X Y}    ]
  # [ U1: 1, ..., 18                ]
  group_units(size = 3) %>% 
  # [ T1: A:{1 2 3}  ]
  # [ T2: B:{X Y}    ]
  # [ U1: 1, ..., 18                ]
  # [ U2: [1,2,3], [4,5,6],...,[16,17,18]]
  randomise_map(.U1 ~ .T1,
                .U2 ~ .T2) 
  # [ T1: A:{1 2 3}  ]
  # [ T2: B:{X Y}    ]
  # [ U1: 1, ..., 18                ]
  # [ U2: [1,2,3], [4,5,6],...,[16,17,18]]
  # U1 -> T1 [random]
  # U2 -> T2 [random]

The input argument of edibble function can also include data followed by the user specifying which columns are treatment, units, grouping and so on.

Optimal Design

Ideally edibble will also allow for model-based design by feeding in the statistical model in its framework.

Sampling Surveys

While the initial focus will be experiments were resources are well defined, but the final goal is to implement workflow to facilitate survey sampling designs as well. Sample surveys primarily focus on how to sample including the sample size needed. These can be implemented by say a function (that may be user-defined) that takes into account previous settings.

sample_size <- function(d) {
  pwrout <- pwr::pwr.t.test(d = d, sig.level=.05, 
                            power = .90, type = 'two.sample')
  return(2 * ceiling(pwrout$n))
}

edibble() %>% 
  set_trts(t = 2) %>% 
  set_units(n = sample_size(0.3))

Outputs

The resulting output of edibble will be a modified tibble (built from pillar). It will include summary of treatments and units. The plan is also to have the number of levels of factors printed as well.

As under the hood, edibble output is just a data frame, it is easy to export the output to a spreadsheet using standard exporting tools for data frames. While this seems such a small issue, this is important for lubricating the generation of experimental design to data collection. Ultimately the person collecting the data should aim to add extra column(s) for the observed data to the edibble output.

The edibble output can also be used to quickly generate a sketch of the design by adding one extra function in the pipeline. E.g.

design %>% visualise_design()

Using a similar mechanism to lazy_dt in dtplyr, edibble should also be able to animate the steps required to generate the design. This will make it useful for presentations or general communication to ensure all parties involved in the generation of experimental design are all on the same page.

design %>% animate_design()

These functions are designed to visualise experimental design fast. Some more fine tuning can be provided so that users can customise the visualisation outputs for publications or reports. E.g. modify_trts(.T1, images = c("drugA.jpg", "drugB.jpg")) to modify from standard circle units to user supplied images.

*Note this API prototype code is still experimental and bound to change.

ANOVA Table

Another table output which edibble will support is ANOVA tables. These include the so-called skeleton ANOVA table which is essentially ANOVA without the data (no sum of squares, mean squares and so on). The edibble ANOVA output will especially be useful for complex nested experiment and non-orthogonal designs.

Hasse diagrams

Another visualisation output in edibble will include Hasse diagram to show relationshional structures with different sources of variation.

Installation

Currently the package is not available for installation.

Related Work

  • DeclareDesign although motivation is different to edibble.
  • desplot for visualising designs.
  • designr for balanced factorial designs with crossed and nested random and fixed effect to data frame
  • dae for functions useful in the design and ANOVA of experiments (consider using this for the backend)

More info

  • I presented this work on 27th Nov 2019 at NUMBATs, Monash. The slide can be found here.

Acknowlegement

Thanks to Francis Hui, Di Cook, Rob Hyndman and Nick Tierney for their feedback on the proposed idea.

edibble's People

Contributors

emitanaka avatar

Watchers

James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.