larmarange / broom.helpers



A set of functions to facilitate manipulation of tibbles produced by broom

Home Page: https://larmarange.github.io/broom.helpers/

License: GNU General Public License v3.0

R 100.00%

broom.helpers's Introduction

broom.helpers

The broom.helpers package provides a suite of functions to work with regression model broom::tidy() tibbles.

The suite includes functions to group regression model terms by variable, insert reference and header rows for categorical variables, add variable labels, and more.

broom.helpers is used, in particular, by gtsummary::tbl_regression() for producing nicely formatted tables of model coefficients and by ggstats::ggcoef_model() for plotting model coefficients.

Installation & Documentation

To install the stable version:

install.packages("broom.helpers")

Documentation of the stable version: https://larmarange.github.io/broom.helpers/

To install the development version:

remotes::install_github("larmarange/broom.helpers")

Documentation of the development version: https://larmarange.github.io/broom.helpers/dev/

Examples

all-in-one wrapper

mod1 <- lm(Sepal.Length ~ Sepal.Width + Species, data = iris)
library(broom.helpers)
ex1 <- mod1 %>% tidy_plus_plus()
ex1
#> # A tibble: 4 × 17
#>   term              variable  var_label var_class var_type var_nlevels contrasts
#>   <chr>             <chr>     <chr>     <chr>     <chr>          <int> <chr>    
#> 1 Sepal.Width       Sepal.Wi… Sepal.Wi… numeric   continu…          NA <NA>     
#> 2 Speciessetosa     Species   Species   factor    categor…           3 contr.tr…
#> 3 Speciesversicolor Species   Species   factor    categor…           3 contr.tr…
#> 4 Speciesvirginica  Species   Species   factor    categor…           3 contr.tr…
#> # ℹ 10 more variables: contrasts_type <chr>, reference_row <lgl>, label <chr>,
#> #   n_obs <dbl>, estimate <dbl>, std.error <dbl>, statistic <dbl>,
#> #   p.value <dbl>, conf.low <dbl>, conf.high <dbl>
dplyr::glimpse(ex1)
#> Rows: 4
#> Columns: 17
#> $ term           <chr> "Sepal.Width", "Speciessetosa", "Speciesversicolor", "S…
#> $ variable       <chr> "Sepal.Width", "Species", "Species", "Species"
#> $ var_label      <chr> "Sepal.Width", "Species", "Species", "Species"
#> $ var_class      <chr> "numeric", "factor", "factor", "factor"
#> $ var_type       <chr> "continuous", "categorical", "categorical", "categorica…
#> $ var_nlevels    <int> NA, 3, 3, 3
#> $ contrasts      <chr> NA, "contr.treatment", "contr.treatment", "contr.treatm…
#> $ contrasts_type <chr> NA, "treatment", "treatment", "treatment"
#> $ reference_row  <lgl> NA, TRUE, FALSE, FALSE
#> $ label          <chr> "Sepal.Width", "setosa", "versicolor", "virginica"
#> $ n_obs          <dbl> 150, 50, 50, 50
#> $ estimate       <dbl> 0.8035609, 0.0000000, 1.4587431, 1.9468166
#> $ std.error      <dbl> 0.1063390, NA, 0.1121079, 0.1000150
#> $ statistic      <dbl> 7.556598, NA, 13.011954, 19.465255
#> $ p.value        <dbl> 4.187340e-12, NA, 3.478232e-26, 2.094475e-42
#> $ conf.low       <dbl> 0.5933983, NA, 1.2371791, 1.7491525
#> $ conf.high      <dbl> 1.013723, NA, 1.680307, 2.144481

mod2 <- glm(
  response ~ poly(age, 3) + stage + grade * trt,
  na.omit(gtsummary::trial),
  family = binomial,
  contrasts = list(
    stage = contr.treatment(4, base = 3),
    grade = contr.sum
  )
)
ex2 <- mod2 %>%
  tidy_plus_plus(
    exponentiate = TRUE,
    variable_labels = c(age = "Age (in years)"),
    add_header_rows = TRUE,
    show_single_row = "trt"
  )
ex2
#> # A tibble: 17 × 19
#>    term   variable var_label var_class var_type var_nlevels header_row contrasts
#>    <chr>  <chr>    <chr>     <chr>     <chr>          <int> <lgl>      <chr>    
#>  1 <NA>   age      Age (in … nmatrix.3 continu…          NA TRUE       <NA>     
#>  2 poly(… age      Age (in … nmatrix.3 continu…          NA FALSE      <NA>     
#>  3 poly(… age      Age (in … nmatrix.3 continu…          NA FALSE      <NA>     
#>  4 poly(… age      Age (in … nmatrix.3 continu…          NA FALSE      <NA>     
#>  5 <NA>   stage    T Stage   factor    categor…           4 TRUE       contr.tr…
#>  6 stage1 stage    T Stage   factor    categor…           4 FALSE      contr.tr…
#>  7 stage2 stage    T Stage   factor    categor…           4 FALSE      contr.tr…
#>  8 stage3 stage    T Stage   factor    categor…           4 FALSE      contr.tr…
#>  9 stage4 stage    T Stage   factor    categor…           4 FALSE      contr.tr…
#> 10 <NA>   grade    Grade     factor    categor…           3 TRUE       contr.sum
#> 11 grade1 grade    Grade     factor    categor…           3 FALSE      contr.sum
#> 12 grade2 grade    Grade     factor    categor…           3 FALSE      contr.sum
#> 13 grade3 grade    Grade     factor    categor…           3 FALSE      contr.sum
#> 14 trtDr… trt      Chemothe… character dichoto…           2 NA         contr.tr…
#> 15 <NA>   grade:t… Grade * … <NA>      interac…          NA TRUE       <NA>     
#> 16 grade… grade:t… Grade * … <NA>      interac…          NA FALSE      <NA>     
#> 17 grade… grade:t… Grade * … <NA>      interac…          NA FALSE      <NA>     
#> # ℹ 11 more variables: contrasts_type <chr>, reference_row <lgl>, label <chr>,
#> #   n_obs <dbl>, n_event <dbl>, estimate <dbl>, std.error <dbl>,
#> #   statistic <dbl>, p.value <dbl>, conf.low <dbl>, conf.high <dbl>
dplyr::glimpse(ex2)
#> Rows: 17
#> Columns: 19
#> $ term           <chr> NA, "poly(age, 3)1", "poly(age, 3)2", "poly(age, 3)3", …
#> $ variable       <chr> "age", "age", "age", "age", "stage", "stage", "stage", …
#> $ var_label      <chr> "Age (in years)", "Age (in years)", "Age (in years)", "…
#> $ var_class      <chr> "nmatrix.3", "nmatrix.3", "nmatrix.3", "nmatrix.3", "fa…
#> $ var_type       <chr> "continuous", "continuous", "continuous", "continuous",…
#> $ var_nlevels    <int> NA, NA, NA, NA, 4, 4, 4, 4, 4, 3, 3, 3, 3, 2, NA, NA, NA
#> $ header_row     <lgl> TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, F…
#> $ contrasts      <chr> NA, NA, NA, NA, "contr.treatment(base=3)", "contr.treat…
#> $ contrasts_type <chr> NA, NA, NA, NA, "treatment", "treatment", "treatment", …
#> $ reference_row  <lgl> NA, NA, NA, NA, NA, FALSE, FALSE, TRUE, FALSE, NA, FALS…
#> $ label          <chr> "Age (in years)", "Age (in years)", "Age (in years)²", …
#> $ n_obs          <dbl> NA, 92, 56, 80, NA, 46, 50, 35, 42, NA, 63, 53, 57, 90,…
#> $ n_event        <dbl> NA, 31, 17, 22, NA, 17, 12, 13, 12, NA, 20, 16, 18, 30,…
#> $ estimate       <dbl> NA, 20.2416394, 1.2337899, 0.4931553, NA, 1.0047885, 0.…
#> $ std.error      <dbl> NA, 2.3254455, 2.3512842, 2.3936657, NA, 0.4959893, 0.5…
#> $ statistic      <dbl> NA, 1.29340459, 0.08935144, -0.29533409, NA, 0.00963137…
#> $ p.value        <dbl> NA, 0.1958712, 0.9288026, 0.7677387, NA, 0.9923154, 0.1…
#> $ conf.low       <dbl> NA, 0.225454425, 0.007493208, 0.004745694, NA, 0.379776…
#> $ conf.high      <dbl> NA, 2315.587655, 100.318341, 74.226179, NA, 2.683385, 1…

fine control

ex3 <- mod1 %>%
  # perform initial tidying of model
  tidy_and_attach() %>%
  # add reference row
  tidy_add_reference_rows() %>%
  # add term labels
  tidy_add_term_labels() %>%
  # remove intercept
  tidy_remove_intercept()
ex3
#> # A tibble: 4 × 16
#>   term              variable  var_label var_class var_type var_nlevels contrasts
#>   <chr>             <chr>     <chr>     <chr>     <chr>          <int> <chr>    
#> 1 Sepal.Width       Sepal.Wi… Sepal.Wi… numeric   continu…          NA <NA>     
#> 2 Speciessetosa     Species   Species   factor    categor…           3 contr.tr…
#> 3 Speciesversicolor Species   Species   factor    categor…           3 contr.tr…
#> 4 Speciesvirginica  Species   Species   factor    categor…           3 contr.tr…
#> # ℹ 9 more variables: contrasts_type <chr>, reference_row <lgl>, label <chr>,
#> #   estimate <dbl>, std.error <dbl>, statistic <dbl>, p.value <dbl>,
#> #   conf.low <dbl>, conf.high <dbl>
dplyr::glimpse(ex3)
#> Rows: 4
#> Columns: 16
#> $ term           <chr> "Sepal.Width", "Speciessetosa", "Speciesversicolor", "S…
#> $ variable       <chr> "Sepal.Width", "Species", "Species", "Species"
#> $ var_label      <chr> "Sepal.Width", "Species", "Species", "Species"
#> $ var_class      <chr> "numeric", "factor", "factor", "factor"
#> $ var_type       <chr> "continuous", "categorical", "categorical", "categorica…
#> $ var_nlevels    <int> NA, 3, 3, 3
#> $ contrasts      <chr> NA, "contr.treatment", "contr.treatment", "contr.treatm…
#> $ contrasts_type <chr> NA, "treatment", "treatment", "treatment"
#> $ reference_row  <lgl> NA, TRUE, FALSE, FALSE
#> $ label          <chr> "Sepal.Width", "setosa", "versicolor", "virginica"
#> $ estimate       <dbl> 0.8035609, NA, 1.4587431, 1.9468166
#> $ std.error      <dbl> 0.1063390, NA, 0.1121079, 0.1000150
#> $ statistic      <dbl> 7.556598, NA, 13.011954, 19.465255
#> $ p.value        <dbl> 4.187340e-12, NA, 3.478232e-26, 2.094475e-42
#> $ conf.low       <dbl> 0.5933983, NA, 1.2371791, 1.7491525
#> $ conf.high      <dbl> 1.013723, NA, 1.680307, 2.144481

ex4 <- mod2 %>%
  # perform initial tidying of model
  tidy_and_attach(exponentiate = TRUE) %>%
  # add variable labels, including a custom value for age
  tidy_add_variable_labels(labels = c(age = "Age in years")) %>%
  # add reference rows for categorical variables
  tidy_add_reference_rows() %>%
  # add an estimate value for reference terms
  tidy_add_estimate_to_reference_rows(exponentiate = TRUE) %>%
  # add header rows for categorical variables
  tidy_add_header_rows()
ex4
#> # A tibble: 20 × 17
#>    term   variable var_label var_class var_type var_nlevels header_row contrasts
#>    <chr>  <chr>    <chr>     <chr>     <chr>          <int> <lgl>      <chr>    
#>  1 (Inte… (Interc… (Interce… <NA>      interce…          NA NA         <NA>     
#>  2 <NA>   age      Age in y… nmatrix.3 continu…          NA TRUE       <NA>     
#>  3 poly(… age      Age in y… nmatrix.3 continu…          NA FALSE      <NA>     
#>  4 poly(… age      Age in y… nmatrix.3 continu…          NA FALSE      <NA>     
#>  5 poly(… age      Age in y… nmatrix.3 continu…          NA FALSE      <NA>     
#>  6 <NA>   stage    T Stage   factor    categor…           4 TRUE       contr.tr…
#>  7 stage1 stage    T Stage   factor    categor…           4 FALSE      contr.tr…
#>  8 stage2 stage    T Stage   factor    categor…           4 FALSE      contr.tr…
#>  9 stage3 stage    T Stage   factor    categor…           4 FALSE      contr.tr…
#> 10 stage4 stage    T Stage   factor    categor…           4 FALSE      contr.tr…
#> 11 <NA>   grade    Grade     factor    categor…           3 TRUE       contr.sum
#> 12 grade1 grade    Grade     factor    categor…           3 FALSE      contr.sum
#> 13 grade2 grade    Grade     factor    categor…           3 FALSE      contr.sum
#> 14 grade3 grade    Grade     factor    categor…           3 FALSE      contr.sum
#> 15 <NA>   trt      Chemothe… character dichoto…           2 TRUE       contr.tr…
#> 16 trtDr… trt      Chemothe… character dichoto…           2 FALSE      contr.tr…
#> 17 trtDr… trt      Chemothe… character dichoto…           2 FALSE      contr.tr…
#> 18 <NA>   grade:t… Grade * … <NA>      interac…          NA TRUE       <NA>     
#> 19 grade… grade:t… Grade * … <NA>      interac…          NA FALSE      <NA>     
#> 20 grade… grade:t… Grade * … <NA>      interac…          NA FALSE      <NA>     
#> # ℹ 9 more variables: contrasts_type <chr>, reference_row <lgl>, label <chr>,
#> #   estimate <dbl>, std.error <dbl>, statistic <dbl>, p.value <dbl>,
#> #   conf.low <dbl>, conf.high <dbl>
dplyr::glimpse(ex4)
#> Rows: 20
#> Columns: 17
#> $ term           <chr> "(Intercept)", NA, "poly(age, 3)1", "poly(age, 3)2", "p…
#> $ variable       <chr> "(Intercept)", "age", "age", "age", "age", "stage", "st…
#> $ var_label      <chr> "(Intercept)", "Age in years", "Age in years", "Age in …
#> $ var_class      <chr> NA, "nmatrix.3", "nmatrix.3", "nmatrix.3", "nmatrix.3",…
#> $ var_type       <chr> "intercept", "continuous", "continuous", "continuous", …
#> $ var_nlevels    <int> NA, NA, NA, NA, NA, 4, 4, 4, 4, 4, 3, 3, 3, 3, 2, 2, 2,…
#> $ header_row     <lgl> NA, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALS…
#> $ contrasts      <chr> NA, NA, NA, NA, NA, "contr.treatment(base=3)", "contr.t…
#> $ contrasts_type <chr> NA, NA, NA, NA, NA, "treatment", "treatment", "treatmen…
#> $ reference_row  <lgl> NA, NA, NA, NA, NA, NA, FALSE, FALSE, TRUE, FALSE, NA, …
#> $ label          <chr> "(Intercept)", "Age in years", "Age in years", "Age in …
#> $ estimate       <dbl> 0.5266376, NA, 20.2416394, 1.2337899, 0.4931553, NA, 1.…
#> $ std.error      <dbl> 0.4130930, NA, 2.3254455, 2.3512842, 2.3936657, NA, 0.4…
#> $ statistic      <dbl> -1.55229592, NA, 1.29340459, 0.08935144, -0.29533409, N…
#> $ p.value        <dbl> 0.1205914, NA, 0.1958712, 0.9288026, 0.7677387, NA, 0.9…
#> $ conf.low       <dbl> 0.227717775, NA, 0.225454425, 0.007493208, 0.004745694,…
#> $ conf.high      <dbl> 1.164600, NA, 2315.587655, 100.318341, 74.226179, NA, 2…

broom.helpers's People


raphidoc matthieurouland shaoyoucheng priestleymuhindo ddsjoberg hadley michaelchirico

broom.helpers's Issues

Improve error messaging

What is your opinion on improving the error messaging in situations like the one below, where the model is created within an apply() or map() setting and the stats::*() functions called on the model objects fail?

library(tidyverse)
library(gtsummary)
#> #Uighur
library(survival)

# Set up map statement to create different models
tibble(grade = c("I", "II", "III")) %>%
  mutate(df_model = map(grade, ~ trial %>% filter(grade == ..1))) %>%
  mutate(
    mv_formula_char = "Surv(ttdeath, death) ~ trt + age + marker",
    mv_formula = map(mv_formula_char, ~ as.formula(.x)),
    mv_model_form =
      map2(
        mv_formula, df_model,
        ~ coxph(..1, data = ..2)
      ),
    mv_tbl_form =
      map(
        mv_model_form,
        ~ broom.helpers::tidy_plus_plus(..1, exponentiate = TRUE)
      )
  )
#> Error: Problem with `mutate()` input `mv_tbl_form`.
#> x the ... list contains fewer than 2 elements
#> i Input `mv_tbl_form` is `map(mv_model_form, ~broom.helpers::tidy_plus_plus(..1, exponentiate = TRUE))`.

Created on 2020-08-31 by the reprex package (v0.3.0)

In gtsummary, we added an error message like this: ddsjoberg/gtsummary#231

`var_class` incorrect for integers

In the example below, am is an integer class variable. But in the broom.helpers tibble, its class is reported as numeric.

library(broom.helpers)

tibble::as_tibble(mtcars) %>%
  dplyr::mutate(
    am = as.integer(am),
    vs = as.logical(vs)
  ) %>%
  {lm(mpg ~ am + vs + hp + factor(cyl), .)} %>%
  tidy_and_attach() %>%
  tidy_identify_variables() 
#> # A tibble: 6 x 8
#>   term      variable   var_class var_type  estimate std.error statistic  p.value
#>   <chr>     <chr>      <chr>     <chr>        <dbl>     <dbl>     <dbl>    <dbl>
#> 1 (Interce~ <NA>       <NA>      intercept  24.4       2.57      9.51   6.01e-10
#> 2 am        am         numeric   continuo~   5.16      1.45      3.55   1.49e- 3
#> 3 vsTRUE    vs         logical   categori~   2.57      1.94      1.32   1.97e- 1
#> 4 hp        hp         numeric   continuo~  -0.0469    0.0145   -3.23   3.35e- 3
#> 5 factor(c~ factor(cy~ factor    categori~  -2.65      1.80     -1.48   1.52e- 1
#> 6 factor(c~ factor(cy~ factor    categori~  -0.277     3.49     -0.0795 9.37e- 1

Created on 2020-10-08 by the reprex package (v0.3.0)
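The `nmatrix.3` and `logical` values that appear in var_class elsewhere on this page suggest, though this is an assumption, that var_class is computed with something like base R's `stats::.MFclass()`, which reports both integer and double vectors as "numeric". The sketch below contrasts it with `class()`; `var_class_keep_integer()` is a hypothetical helper, not part of broom.helpers.

```r
# Assumption: var_class behaves like stats::.MFclass(), which collapses
# integer and double vectors into "numeric".
am_int <- as.integer(c(1, 0, 1))

mf_class <- stats::.MFclass(am_int)  # "numeric"
base_class <- class(am_int)          # "integer"

# Hypothetical helper: keep the first class() of the column, but fall back
# to .MFclass() for matrix terms such as poly() output ("nmatrix.3").
var_class_keep_integer <- function(x) {
  if (is.matrix(x)) stats::.MFclass(x) else class(x)[1]
}

var_class_keep_integer(am_int)         # "integer"
var_class_keep_integer(poly(1:10, 3))  # "nmatrix.3"
```

For factors, logicals and characters, `class(x)[1]` and `.MFclass()` agree, so in this sketch only the integer case changes.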

label column filled when using `Hmisc::rcspline.eval()` and `poly()`, but not for other categorical variables

Everything so far is looking amazing!

I noticed that the label is applied inconsistently: for categorical variables the header row has no value in the label column, whereas for Hmisc::rcspline.eval() and poly() terms it does.

Obviously not a big deal, but it may be worth addressing for consistency.

library(broom.helpers)
library(gtsummary)

mod <- glm(age ~ grade + Hmisc::rcspline.eval(marker), data = trial, family = gaussian)

mod %>%
  tidy_and_attach() %>%
  tidy_identify_variables() %>%
  tidy_add_variable_labels() %>%
  tidy_add_reference_rows() %>%
  tidy_add_header_rows() %>%
  select(1:2, label) 
#> # A tibble: 9 x 3
#>   term                       variable                  label                    
#>   <chr>                      <chr>                     <chr>                    
#> 1 (Intercept)                <NA>                      (Intercept)              
#> 2 <NA>                       grade                     <NA>                     
#> 3 grade_ref                  grade                     I                        
#> 4 gradeII                    grade                     II                       
#> 5 gradeIII                   grade                     III                      
#> 6 <NA>                       Hmisc::rcspline.eval(mar~ Hmisc::rcspline.eval(mar~
#> 7 Hmisc::rcspline.eval(mark~ Hmisc::rcspline.eval(mar~ Hmisc::rcspline.eval(mar~
#> 8 Hmisc::rcspline.eval(mark~ Hmisc::rcspline.eval(mar~ Hmisc::rcspline.eval(mar~
#> 9 Hmisc::rcspline.eval(mark~ Hmisc::rcspline.eval(mar~ Hmisc::rcspline.eval(mar~

Created on 2020-08-14 by the reprex package (v0.3.0)

Error when identify variables is run after remove intercept

There is a merging error when the remove intercept function is run before the identify variables function: the output contains two columns for var_nlevels (var_nlevels.x and var_nlevels.y).

library(broom.helpers)

lm(age ~ marker, gtsummary::trial) %>%
  tidy_and_attach() %>%
  tidy_remove_intercept() %>%
  tidy_identify_variables() # looks like a merging error (two cols for var_nlevels)
#> # A tibble: 1 x 10
#>   term  variable var_class var_type var_nlevels.x estimate std.error statistic
#>   <chr> <chr>    <chr>     <chr>            <int>    <dbl>     <dbl>     <dbl>
#> 1 mark~ marker   numeric   continu~            NA  -0.0545      1.26   -0.0434
#> # ... with 2 more variables: p.value <dbl>, var_nlevels.y <int>

Created on 2020-10-15 by the reprex package (v0.3.0)
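The `.x`/`.y` suffixes are the usual signature of a join in which both sides carry the same non-key column. A minimal base-R sketch with `merge()` reproduces the symptom and one defensive fix (dropping columns that are about to be recomputed); it assumes, without confirmation from the source, that variable information is joined onto the tidy table by `term`.

```r
# Toy stand-ins: the tidy table already has var_nlevels from an earlier step
tidy_tbl <- data.frame(term = "marker", estimate = -0.0545, var_nlevels = NA_integer_)
var_info <- data.frame(term = "marker", variable = "marker", var_nlevels = NA_integer_)

# Naive join: the shared non-key column is duplicated with suffixes
naive <- merge(tidy_tbl, var_info, by = "term")
names(naive)  # "term" "estimate" "var_nlevels.x" "variable" "var_nlevels.y"

# Defensive join: drop overlapping columns from the left side first
overlap <- setdiff(intersect(names(tidy_tbl), names(var_info)), "term")
safe <- merge(tidy_tbl[setdiff(names(tidy_tbl), overlap)], var_info, by = "term")
names(safe)   # "term" "estimate" "variable" "var_nlevels"
```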

Improve examples in the documentation

Once integrated in gtsummary and GGally, update README

Missing label for nnet::multinom() categorical variables

I noticed a labelling error in nnet::multinom(). The label column is missing the variable label for the stage variable in the example below.

library(gtsummary)
#> #BlackLivesMatter

nnet::multinom(grade ~ age + stage, data = trial, trace = FALSE) %>%
  broom.helpers::tidy_plus_plus(add_header_rows = TRUE) %>%
  dplyr::select(y.level, variable, term, var_label, label, estimate)
#> # A tibble: 12 x 6
#>    y.level variable term    var_label label estimate
#>    <chr>   <chr>    <chr>   <chr>     <chr>    <dbl>
#>  1 II      age      age     Age       Age    0.00813
#>  2 II      stage    <NA>    T Stage   <NA>  NA      
#>  3 II      stage    stageT1 T Stage   T1     0      
#>  4 II      stage    stageT2 T Stage   T2    -0.497  
#>  5 II      stage    stageT3 T Stage   T3    -1.04   
#>  6 II      stage    stageT4 T Stage   T4    -0.634  
#>  7 III     age      age     Age       Age    0.0110 
#>  8 III     stage    <NA>    T Stage   <NA>  NA      
#>  9 III     stage    stageT1 T Stage   T1     0      
#> 10 III     stage    stageT2 T Stage   T2     0.128  
#> 11 III     stage    stageT3 T Stage   T3    -0.214  
#> 12 III     stage    stageT4 T Stage   T4     0.291

Created on 2020-10-04 by the reprex package (v0.3.0)

Include `var_label` in subsequent calls

If tidy_add_variable_labels() is run after tidy_add_reference_rows(), the labels are filled correctly.

library(broom.helpers)
library(gtsummary)
library(survival)

mod <- lm(ttdeath ~ grade, trial)

mod %>%
  tidy_and_attach() %>%
  tidy_identify_variables() %>%
  tidy_add_reference_rows() %>%
  tidy_add_variable_labels() %>%
  tidy_add_header_rows() %>%
  select(term, variable, var_label, label)
#> # A tibble: 5 x 4
#>   term        variable var_label   label      
#>   <chr>       <chr>    <chr>       <chr>      
#> 1 (Intercept) <NA>     (Intercept) (Intercept)
#> 2 <NA>        grade    Grade       Grade      
#> 3 grade_ref   grade    Grade       I          
#> 4 gradeII     grade    Grade       II         
#> 5 gradeIII    grade    Grade       III

But if the functions are called in the opposite order, var_label does not fill all rows associated with the variable.

mod %>%
  tidy_and_attach() %>%
  tidy_identify_variables() %>%
  tidy_add_variable_labels() %>%
  tidy_add_reference_rows() %>%
  tidy_add_header_rows() %>%
  select(term, variable, var_label, label)
#> # A tibble: 5 x 4
#>   term        variable var_label   label      
#>   <chr>       <chr>    <chr>       <chr>      
#> 1 (Intercept) <NA>     (Intercept) (Intercept)
#> 2 <NA>        grade    <NA>        <NA>       
#> 3 grade_ref   grade    <NA>        I          
#> 4 gradeII     grade    Grade       II         
#> 5 gradeIII    grade    Grade       III

I think it is fine to have an order dependency between these functions, but a note to users would be helpful, or even an error, as already happens when tidy_add_variable_labels() is called after tidy_add_header_rows().
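Because labels are complete when tidy_add_variable_labels() runs after tidy_add_reference_rows(), one way to surface the order dependency is a guard inside tidy_add_reference_rows() that checks whether labels are already present. This is a minimal sketch, assuming the presence of a var_label column is a reliable signal; `warn_if_out_of_order()` is a hypothetical helper, not a broom.helpers function.

```r
# Hypothetical guard: warn when `column` (added by a later step) is already
# present, meaning `fn` is being run too late in the pipeline.
warn_if_out_of_order <- function(x, column, fn, should_precede) {
  if (column %in% names(x)) {
    warning(
      sprintf("`%s()` should be called before `%s()`.", fn, should_precede),
      call. = FALSE
    )
  }
  invisible(x)
}

# A toy tidy table that has already been through the labelling step
labelled <- data.frame(
  term = c("gradeII", "gradeIII"),
  variable = "grade",
  var_label = "Grade"
)

msg <- tryCatch(
  warn_if_out_of_order(labelled, "var_label",
                       fn = "tidy_add_reference_rows",
                       should_precede = "tidy_add_variable_labels"),
  warning = function(w) conditionMessage(w)
)
msg
```

A warning rather than an error keeps existing pipelines running while still pointing users to the expected order.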

Wrap `model.frame()` in `tryCatch()` ?

Perhaps it would be a good idea to wrap the call to stats::model.frame(model) in model_get_model_frame.R in tryCatch(), in case the regression model does not have a method for it (like mice models).
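A minimal sketch of the suggested wrapper, assuming that returning `NULL` plus a message is an acceptable fallback. `safe_model_frame()` and the `no_frame_model` class are hypothetical names used only for illustration.

```r
# Hypothetical wrapper: return NULL (with a message) instead of failing
# when a model object has no usable model.frame() method.
safe_model_frame <- function(model) {
  tryCatch(
    stats::model.frame(model),
    error = function(e) {
      message("Unable to retrieve the model frame: ", conditionMessage(e))
      NULL
    }
  )
}

# A standard lm() fit: the model frame is returned as usual
mf <- safe_model_frame(lm(Sepal.Length ~ Species, data = iris))
nrow(mf)  # 150

# An object without formula or terms: model.frame() errors, so NULL comes back
bad <- suppressMessages(safe_model_frame(structure(list(), class = "no_frame_model")))
is.null(bad)  # TRUE
```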

Custom tidiers

Should we add a table to the vignette with a list of compatibles models, with a note column to specify model-specific information about compatibility?

Update the new term for reference rows

When a reference row is added, rather than creating a new term "{varname}_ref", I suggest that you either keep the term name consistent with the other terms (in the example below, the new term would be "gradeI"), or leave it blank.

I know it's unlikely, but if someone had a variable with the level _ref, I think things would fall apart somewhere.

library(broom.helpers)
library(gtsummary)

mod <- glm(age ~ grade, data = trial, family = gaussian)

mod %>%
  tidy_and_attach() %>%
  tidy_identify_variables() %>%
  tidy_add_variable_labels() %>%
  tidy_add_reference_rows() %>%
  tidy_add_header_rows() %>% 
  select(variable, term, reference_row, label, header_row)
#> # A tibble: 5 x 5
#>   variable term        reference_row label       header_row
#>   <chr>    <chr>       <lgl>         <chr>       <lgl>     
#> 1 <NA>     (Intercept) NA            (Intercept) FALSE     
#> 2 grade    <NA>        NA            <NA>        TRUE      
#> 3 grade    grade_ref   TRUE          I           FALSE     
#> 4 grade    gradeII     FALSE         II          FALSE     
#> 5 grade    gradeIII    FALSE         III         FALSE

Created on 2020-08-14 by the reprex package (v0.3.0)
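With treatment contrasts, `model.matrix()` names each dummy column `paste0(variable, level)`, so applying the same rule to the reference level yields a term consistent with the others. This base-R sketch uses iris; `Speciessetosa` is also the reference-row term shown in the README output at the top of this page.

```r
# model.matrix() names treatment-contrast columns as paste0(variable, level)
mm <- model.matrix(~ Species, data = iris)
colnames(mm)  # "(Intercept)" "Speciesversicolor" "Speciesvirginica"

# Same naming rule applied to the reference (first) level
ref_level <- levels(iris$Species)[1]
ref_term <- paste0("Species", ref_level)
ref_term      # "Speciessetosa"
```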

Improve code coverage

cf. https://codecov.io/gh/larmarange/broom.helpers/tree/master/R

Unify the broom.helpers and gtsummary select helpers

At the moment, the broom.helpers and gtsummary select helpers are created independently. When both packages are loaded, one package masks the other's all_*() selecting functions... which is not good! I've been thinking about a way to unify the syntax, and I think I've come up with something.

Proposed changes:

Create a universal select function, and export it. This function will help construct each of the other helpers. For example, if the function were called select_constructor(), we could define all_continuous() with the code below, which would select variables with type continuous.
```
all_continuous <- function() select_constructor("variable", "var_type", "continuous")
```

The reason for the constructor is that I can later use it in gtsummary to easily construct selecting functions that do not apply in the broom.helpers setting, without having to recreate the environments in which we're selecting or define new scoping functions.

That brings us to the second point: we'd also need to export the scoping function so I can reuse it in gtsummary.
I recall a notification I received where you indicated we could add an all_interactions() selector and, I think, another one... but I can't find that message. I'll add that here too. With the general format, it's actually very easy to add new select functions.
You had also mentioned at some point adding all_factor(), all_character(), etc. functions. I do not suggest you do this. Since I initially released those select functions, {tidyselect} has been updated to allow for selection using predicate functions, e.g. trial %>% select(where(is.character)). I plan to deprecate those functions so I do not need to support any superfluous functions.

The only front-facing changes here will be exporting two new functions that help us write and use the selecting functions in other packages. I'll start putting together a PR.
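The constructor idea can be illustrated as a plain function factory. This is a base-R sketch only: unlike the proposed all_continuous(), which takes no arguments and relies on tidyselect scoping, these selectors receive the data explicitly so that no scoping machinery is needed. The body of `select_constructor()` here is an assumption; only its argument order follows the proposal.

```r
# Function factory: returns a selector picking the values of `column`
# for rows where `filter_col` is one of `values`.
select_constructor <- function(column, filter_col, values) {
  function(data) {
    unique(data[[column]][data[[filter_col]] %in% values])
  }
}

all_continuous <- select_constructor("variable", "var_type", "continuous")
all_categorical <- select_constructor("variable", "var_type", "categorical")

# Toy version of the output of tidy_identify_variables()
tidy_tbl <- data.frame(
  variable = c("Sepal.Width", "Species", "Species", "Species"),
  var_type = c("continuous", "categorical", "categorical", "categorical")
)

all_continuous(tidy_tbl)   # "Sepal.Width"
all_categorical(tidy_tbl)  # "Species"
```

New selectors then take one line each, which matches the point that the general format makes additions easy.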

Management of `poly()`

Add a helper to convert poly(var, 4) into var in the variable column and to produce more explicit terms (e.g. var^1, var^2, var^3, var^4).
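A minimal sketch of such a helper based on a regular expression over the coefficient names, assuming they follow the `poly(var, n)k` pattern seen in the README output (e.g. `poly(age, 3)2`). `parse_poly_term()` is hypothetical; non-poly terms are passed through unchanged.

```r
# Hypothetical helper: map "poly(age, 3)2" to variable "age" and label "age^2"
parse_poly_term <- function(term) {
  pattern <- "^poly\\(([^,]+),.*\\)([0-9]+)$"
  is_poly <- grepl(pattern, term)
  variable <- ifelse(is_poly, trimws(sub(pattern, "\\1", term)), NA_character_)
  degree <- ifelse(is_poly, sub(pattern, "\\2", term), NA_character_)
  data.frame(
    term = term,
    variable = variable,
    label = ifelse(is_poly, paste0(variable, "^", degree), term)
  )
}

res <- parse_poly_term(c("poly(age, 3)1", "poly(age, 3)2", "stage2"))
res
```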

Consistency of args passed

This is so minor, but wanted to point it out just in case!

The tidy_plus_plus() function accepts the argument conf.int= as well as ..., which are passed to tidy_fun=. Is there a reason to include conf.int= here, but not in tidy_and_attach(), for example?

There are other common tidy arguments not included, e.g. exponentiate=. To be consistent, should conf.int= argument be removed?

Release version 1.0.0

Prepare for release:

devtools::check(remote = TRUE, manual = TRUE)
devtools::check_win_devel()
Polish NEWS
Polish pkgdown reference index

Submit to CRAN:

usethis::use_version()
Update cran-comments.md
devtools::submit_cran()
Approve email

Wait for CRAN...

Accepted 🎉
Create GitHub release
Remove file CRAN-RELEASE
usethis::use_dev_version()

Add tidy_select_variables()

This function will allow keeping only certain variables in the output.

Two arguments: keep and drop.

To be added also to tidy_plus_plus()
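The intended keep/drop semantics can be sketched on a plain data frame. This assumes filtering happens on the `variable` column, with `keep` applied first and `drop` then removing variables from what is left; `tidy_select_variables_sketch()` is a hypothetical stand-in, not the eventual function.

```r
# Hypothetical keep/drop filter on the `variable` column of a tidy table
tidy_select_variables_sketch <- function(x, keep = NULL, drop = NULL) {
  v <- x$variable
  keep_rows <- if (is.null(keep)) rep(TRUE, length(v)) else v %in% keep
  drop_rows <- v %in% drop
  x[keep_rows & !drop_rows, , drop = FALSE]
}

tidy_tbl <- data.frame(
  variable = c("Sepal.Width", "Species", "Species", "Species"),
  estimate = c(0.80, 0, 1.46, 1.95)
)

tidy_select_variables_sketch(tidy_tbl, keep = "Species")  # the three Species rows
tidy_select_variables_sketch(tidy_tbl, drop = "Species")  # only Sepal.Width
```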

Easier identification of dichotomous variables and all_categorical(), all_continuous(), all_dichotomous() helpers

Dear @ddsjoberg

I would like your opinion on the two following points.

First, it could be relevant to better identify dichotomous variables. One option would be to evolve the var_type column created by tidy_identify_variables() and, for dichotomous variables, replace the value "categorical" with "dichotomous", knowing that all dichotomous variables are also categorical. But this might have side effects in gtsummary.

An alternative could be to generate an additional column dichotomous equal to TRUE, FALSE or NA (for continuous variables).

Identifying dichotomous variables directly in tidy_identify_variables() would be useful later by simplifying the code of tidy_add_header_rows() when applying show_single_row.
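Both options can be derived from var_nlevels, assuming a categorical variable with exactly two levels is what "dichotomous" means here. This toy sketch uses variable names from the trial data and builds both columns side by side for comparison; neither column name is taken from the package.

```r
tidy_tbl <- data.frame(
  variable = c("age", "stage", "trt"),
  var_type = c("continuous", "categorical", "categorical"),
  var_nlevels = c(NA, 4L, 2L)
)

is_dichotomous <- tidy_tbl$var_type == "categorical" & tidy_tbl$var_nlevels %in% 2L

# Option 1: refine var_type in place
tidy_tbl$var_type_refined <- ifelse(is_dichotomous, "dichotomous", tidy_tbl$var_type)

# Option 2: keep var_type and add a logical column (NA for continuous variables)
tidy_tbl$dichotomous <- ifelse(tidy_tbl$var_type == "categorical", is_dichotomous, NA)

tidy_tbl
```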

Second, tidy selectors such as all_categorical(), all_continuous() and all_dichotomous() could be useful in broom.helpers as well. However, I do not know whether code could be shared between gtsummary and broom.helpers, or whether we could avoid any conflict.

As you developed these functions and implemented the tidy selectors in broom.helpers, what do you think?

Best

Add a function `tidy_add_estimate_for_reference_rows(exponentiate = FALSE)`

For treatment and SAS contrasts, will set reference rows equal to 0
For sum contrasts, will use dummy.coef to populate the estimate of the reference row.
For other contrasts, will do nothing
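The two rules above can be checked with base R on iris (this is not the package's implementation). With `contr.sum`, the last level has no coefficient of its own; the level effects sum to zero, so `dummy.coef()` recovers the missing estimate as minus the sum of the others. With treatment contrasts, the reference level's dummy coefficient is exactly 0.

```r
# Sum contrasts: coefficients are (Intercept), Species1, Species2;
# the last level ("virginica") is implicit
mod_sum <- lm(Sepal.Length ~ Species, data = iris,
              contrasts = list(Species = contr.sum))
coef(mod_sum)

# dummy.coef() expands the fit back to one effect per level
dc <- dummy.coef(mod_sum)
dc$Species

# Level effects sum to zero, so the implicit level is minus the sum of the others
ref_estimate <- -sum(coef(mod_sum)[c("Species1", "Species2")])
all.equal(unname(dc$Species["virginica"]), unname(ref_estimate))  # TRUE

# Treatment contrasts: the reference level's dummy coefficient is 0
mod_trt <- lm(Sepal.Length ~ Species, data = iris)
dummy.coef(mod_trt)$Species[["setosa"]]  # 0
```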

Error identifying variables in multinom with sum contrast

Add strict option for functions

As a developer, it would be helpful to have the option for some broom.helpers functions to fail when they cannot execute the requested action.

I am integrating broom.helpers into gtsummary now, and these two scenarios have come up so far:

When I run broom.helpers::tidy_identify_variables() if the variables cannot be identified, I would like to be able to have the function error. As it is currently written, I would need to inspect the returned object to check if the variables were indeed identified.
When I run broom.helpers::tidy_add_header_rows(show_single_row=) for a variable that cannot be put on a single row.

Perhaps the argument could be something like tidy_plus_plus(strict=)? It would be similar to how purrr has pluck() and chuck().
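The pluck()/chuck() analogy maps onto a `strict` flag that turns a soft failure (a message, result still returned) into an error the caller can catch. A minimal sketch, where `identify_variables_sketch()` is hypothetical and the message text is taken from the broom.helpers output quoted further down this page.

```r
# Hypothetical `strict` pattern: message and continue by default, error if strict
identify_variables_sketch <- function(terms, variables, strict = FALSE) {
  if (all(is.na(variables))) {
    msg <- "Unable to identify the list of variables."
    if (strict) stop(msg, call. = FALSE)
    message(msg)
  }
  data.frame(term = terms, variable = variables)
}

# Lenient: a message is emitted and a result is still returned
lenient <- suppressMessages(identify_variables_sketch("trtDrug B", NA_character_))

# Strict: the same situation raises an error
strict_err <- tryCatch(
  identify_variables_sketch("trtDrug B", NA_character_, strict = TRUE),
  error = function(e) conditionMessage(e)
)
strict_err  # "Unable to identify the list of variables."
```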

tidy_add_header_rows() error with continuous * categorical interaction

When tidy_add_header_rows() is run on the model below, the interaction term should be on two rows: a header row with the label column equal to factor(response) * Marker Level (ng/mL), and a second row with the label column equal to 1 * Marker Level (ng/mL) and containing the estimate.

library(gtsummary)
library(broom.helpers)
lm(age ~ factor(response) * marker, trial) %>%
  broom.helpers::tidy_and_attach() %>%
  broom.helpers::tidy_identify_variables() %>%
  broom.helpers::tidy_add_variable_labels() %>%
  broom.helpers::tidy_add_reference_rows() %>%
  broom.helpers::tidy_add_header_rows() %>%
  select(variable, var_type, var_label, label, estimate)
#> # A tibble: 6 x 5
#>   variable          var_type  var_label                 label           estimate
#>   <chr>             <chr>     <chr>                     <chr>              <dbl>
#> 1 <NA>              intercept (Intercept)               (Intercept)        44.0 
#> 2 factor(response)  categori~ factor(response)          factor(respons~    NA   
#> 3 factor(response)  categori~ factor(response)          0                  NA   
#> 4 factor(response)  categori~ factor(response)          1                   9.12
#> 5 marker            continuo~ Marker Level (ng/mL)      Marker Level (~     2.01
#> 6 factor(response)~ interact~ factor(response) * Marke~ 1 * Marker Lev~    -5.34

Created on 2020-09-01 by the reprex package (v0.3.0)

Add an option `no_reference_row` to `tidy_add_reference_rows()`

This would avoid adding a reference row for certain variables (which could be useful in some cases, e.g. a forest plot), in particular when no header rows are added.

Variable names for models with no model.frame method

When there is no model.frame() method, the user sees a very informative message. (super helpful!)

The resulting table has a column for variable, but its values are NA. Can we use the term as the variable for these models? In gtsummary (and in broom.helpers too), we use the variable name for further manipulation, but without a name these variables cannot be selected.

I know the term is not the proper variable name, but I think the printed message is enough of a cue to users that the original variable names are not available.

library(broom.helpers)
library(gtsummary)

# make up some interval censored data 
trial2 <-
  trial %>% 
  dplyr::mutate(
    lint = dplyr::case_when(
      death == 1 ~ runif(200) + 2,
      death == 0 ~ ttdeath
    ),
    rint = dplyr::case_when(
      death == 1 ~ ttdeath,
      death == 0 ~ Inf
    )
  )

# Write a custom tidier
tidy_ic_sp <- function(x, exponentiate =  FALSE, conf.level = 0.95, ...) {
  tidy <-
    tibble::tibble(
      term = names(x[["coefficients"]]),
      estimate = x[["coefficients"]],
      std.error = sqrt(diag(x[["var"]])),
      statistic = summary(x)$summaryParameters[, "z-value"],
      p.value = summary(x)$summaryParameters[, "p"],
      conf.low = confint(x, level = conf.level)[, 1],
      conf.high = confint(x, level = conf.level)[, 2]
    )
  
  if (isTRUE(exponentiate))
    tidy <- dplyr::mutate_at(tidy, dplyr::vars(estimate, conf.low, conf.high), exp)
  
  tidy
}

# fit the interval-censored survival model with icenReg::ic_sp()
icenReg::ic_sp(
  survival::Surv(lint, rint, type = "interval2") ~ trt,
  model = "ph",
  bs_samples = 3,
  data = trial2
) %>%
  # tidy up with broom.helpers
  tidy_and_attach(tidy_fun = tidy_ic_sp) %>%
  tidy_identify_variables() %>%
  tidy_add_variable_labels() %>%
  tidy_add_header_rows() %>%
  select(term, variable, var_label, label, estimate)
#> x Unable to identify the list of variables.
#>   
#>   This is usually due to an error calling `stats::model.frame(x)`or `stats::model.matrix(x)`.
#>   It could be the case if that type of model does not implement these methods.
#>   Rarely, this error may occur if the model object was created within
#>   a functional programming framework (e.g. using `lappy()`, `purrr::map()`, etc.).
#> # A tibble: 1 x 5
#>   term      variable var_label label     estimate
#>   <chr>     <chr>    <chr>     <chr>        <dbl>
#> 1 trtDrug B <NA>     trtDrug B trtDrug B    0.160

Created on 2020-10-19 by the reprex package (v0.3.0)
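A minimal sketch of the proposed fallback, in base R. `fill_variable_from_term()` is a hypothetical helper, not a broom.helpers function. It assumes the tidy table has `term` and `variable` columns, and it deliberately leaves the intercept's `variable` as NA.

```r
# Hypothetical helper (not part of broom.helpers): when `variable` is NA for a
# non-intercept row, reuse the term name so the row can still be selected.
fill_variable_from_term <- function(x) {
  needs_name <- is.na(x$variable) & !(x$term %in% "(Intercept)")
  x$variable[needs_name] <- x$term[needs_name]
  x
}

# Toy tidy table mimicking the ic_sp output above, plus an intercept row
res <- data.frame(
  term = c("(Intercept)", "trtDrug B"),
  variable = c(NA_character_, NA_character_),
  stringsAsFactors = FALSE
)
fill_variable_from_term(res)
```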

`survival::coxph()` strips labels from categorical variables... but you can still access them.

In the example below, the variable grade does have a label, "Grade", but tidy_add_variable_labels() does not pick it up. The label can still be retrieved from the model frame.

Can we please update the internals to grab the label using the method below if not found in the typical manner?

library(broom.helpers)
library(gtsummary)
library(survival)
#> Warning: package 'survival' was built under R version 4.0.2

mod <- coxph(Surv(ttdeath, death) ~ grade, trial)

mod %>%
  tidy_and_attach() %>%
  tidy_identify_variables() %>%
  tidy_add_reference_rows() %>%
  tidy_add_variable_labels() %>%
  tidy_add_header_rows() %>%
  select(term, variable, var_label, label)
#> # A tibble: 4 x 4
#>   term      variable var_label label
#>   <chr>     <chr>    <chr>     <chr>
#> 1 <NA>      grade    grade     grade
#> 2 grade_ref grade    grade     I    
#> 3 gradeII   grade    grade     II   
#> 4 gradeIII  grade    grade     III

# get the grade label from a coxph object
model.frame.default(mod)$grade %>% attr("label")
#> [1] "Grade"

Created on 2020-08-14 by the reprex package (v0.3.0)
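A sketch of the "fall back to the model frame" lookup, under the assumption that the frame passed in is something like `model.frame.default(mod)` from the reprex. `get_labels_with_fallback()` is a hypothetical helper: it returns each variable's `"label"` attribute, or the variable name when no label is set.

```r
# Hypothetical helper: look up a "label" attribute on each variable of a data
# frame (e.g. a model frame), falling back to the variable name.
get_labels_with_fallback <- function(frame, variables) {
  vapply(
    variables,
    function(v) {
      lab <- attr(frame[[v]], "label", exact = TRUE)
      if (is.null(lab)) v else as.character(lab)
    },
    character(1)
  )
}

# Toy data: `grade` is labelled, `age` is not
df <- data.frame(grade = factor(c("I", "II", "III")), age = c(50, 60, 70))
attr(df$grade, "label") <- "Grade"
get_labels_with_fallback(df, c("grade", "age"))
```

This returns `c(grade = "Grade", age = "age")`.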

model attribute is lost in some cases

By the way, since the model attribute is lost in some cases, should we add a call to tidy_attach_model() at the end of each tidy_*() function as a safeguard?

Originally posted by @larmarange in #13 (comment)
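One way to apply that safeguard without touching every function body is a wrapper. This is a sketch only. `preserve_model_attr()` and `strip_model_attr()` are hypothetical names, and the sketch assumes the model is stored in `attr(x, "model")`.

```r
# Hypothetical wrapper: if a tidy_*() style function drops the "model"
# attribute, restore it from the input.
preserve_model_attr <- function(f) {
  function(x, ...) {
    model <- attr(x, "model")
    res <- f(x, ...)
    if (is.null(attr(res, "model")) && !is.null(model)) {
      attr(res, "model") <- model
    }
    res
  }
}

# Toy step that loses the attribute
strip_model_attr <- function(x) {
  attr(x, "model") <- NULL
  x
}

safe_step <- preserve_model_attr(strip_model_attr)
x <- data.frame(term = "hp")
attr(x, "model") <- "my model"
attr(safe_step(x), "model")
```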

Ref row label not added

When one runs tidy_add_reference_rows() after tidy_add_term_labels(), the reference row label is not shown. It makes sense why this occurs, but at a minimum a message would help alert users to run the functions in a different order to get the desired output.

library(broom.helpers)

# build regression model
lm(mpg ~ factor(cyl) + hp, mtcars) %>%
  # perform initial tidying of model
  tidy_and_attach() %>%
  # add the cyl levels
  tidy_add_term_labels() %>%
  # add reference row cyl
  tidy_add_reference_rows() %>%
  knitr::kable()

| term | variable | var_class | var_type | estimate | std.error | statistic | p.value | var_label | contrasts | label | reference_row |
|:--|:--|:--|:--|--:|--:|--:|--:|:--|:--|:--|:--|
| (Intercept) | NA | NA | intercept | 28.6501182 | 1.5877870 | 18.044056 | 0.0000000 | (Intercept) | NA | (Intercept) | NA |
| factor(cyl)4 | factor(cyl) | factor | categorical | NA | NA | NA | NA | factor(cyl) | contr.treatment | NA | TRUE |
| factor(cyl)6 | factor(cyl) | factor | categorical | -5.9676551 | 1.6392776 | -3.640418 | 0.0010921 | factor(cyl) | contr.treatment | 6 | FALSE |
| factor(cyl)8 | factor(cyl) | factor | categorical | -8.5208508 | 2.3260749 | -3.663188 | 0.0010286 | factor(cyl) | contr.treatment | 8 | FALSE |
| hp | hp | numeric | continuous | -0.0240388 | 0.0154079 | -1.560163 | 0.1299540 | hp | NA | hp | NA |

Created on 2020-08-27 by the reprex package (v0.3.0)
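A sketch of the suggested message. `check_reference_rows_order()` is hypothetical. It assumes that the presence of a `label` column is enough to tell that tidy_add_term_labels() has already run.

```r
# Hypothetical check, to be called at the start of tidy_add_reference_rows():
# if term labels already exist, the new reference rows will lack a label.
check_reference_rows_order <- function(x) {
  if ("label" %in% names(x)) {
    message(
      "tidy_add_reference_rows() was called after tidy_add_term_labels(); ",
      "reference rows will have no label. ",
      "Consider calling tidy_add_reference_rows() first."
    )
  }
  invisible(x)
}

check_reference_rows_order(data.frame(term = "hp", label = "hp"))
```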

Add broom.helpers class to tibbles?

Should we add a broom.helpers class to the tibbles? I think this could help down the line to ensure we're working with the correct object type.

class(x) <- c("broom.helpers", class(x))
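A slightly fuller sketch of the idea, with a guard so the class is not stacked twice and a predicate that downstream functions could use for input checks. Both function names are hypothetical.

```r
# Hypothetical helpers: tag a tidy table with a "broom.helpers" class and
# test for it.
add_broom_helpers_class <- function(x) {
  if (!inherits(x, "broom.helpers")) {
    class(x) <- c("broom.helpers", class(x))
  }
  x
}

is_broom_helpers <- function(x) inherits(x, "broom.helpers")

res <- add_broom_helpers_class(data.frame(term = "hp", estimate = -0.02))
is_broom_helpers(res)
class(res)
```

The class is prepended, so existing data.frame or tibble methods still apply through inheritance.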

Should variable be populated for intercept terms, or stay NA as it currently does?

I started rewriting the broom.helpers section of tbl_regression() to use tidy_plus_plus() instead of the individual functions. One reason to use tidy_plus_plus() over a series of separate tidy_*() calls is that it will be easier for me to give users access to its other arguments, so they can change the resulting table if they like (e.g. adding the informative contrast labels @gorkang is working on).

One sticking point is that I treat the intercept like a variable. For example, users can change the intercept label using tbl_regression(label = list("(Intercept)" ~ "b0", age ~ "Patient Age")). Is there a way where the gtsummary API does not change, and I can use tidy_plus_plus()?

My first thought was to simply have an option in tidy_identify_variables() that populates the intercept variable column with the term name. But I am not sure if this will cause problems with other subsequent functions. What do you think?
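A sketch of that option in base R. `populate_intercept_variable()` and its `intercept_as_variable` argument are hypothetical. The sketch assumes intercept rows are identified by `var_type == "intercept"`, as in the tidy tables shown in this thread.

```r
# Hypothetical option: copy the term name into `variable` for intercept rows
# so they can be selected by name downstream.
populate_intercept_variable <- function(x, intercept_as_variable = FALSE) {
  if (isTRUE(intercept_as_variable)) {
    is_intercept <- !is.na(x$var_type) & x$var_type == "intercept"
    x$variable[is_intercept] <- x$term[is_intercept]
  }
  x
}

tbl <- data.frame(
  term = c("(Intercept)", "age"),
  variable = c(NA, "age"),
  var_type = c("intercept", "continuous"),
  stringsAsFactors = FALSE
)
populate_intercept_variable(tbl, intercept_as_variable = TRUE)
```

Keeping the default `FALSE` would leave current behaviour unchanged for other subsequent functions.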

Once released, add a DOI (through Zenodo)

Add a `model_get_model_frame()` method for mice objects

The mice package does not include a model.frame() method for the resulting regression models from multiply imputed data sets.

Would you be OK with adding one here? I need to look up the exact code, but it will be something like this (I can add it if you're OK with it):

#' @export
#' @rdname model_get_model_frame
model_get_model_frame.mipo <- function(model) {
  # check that the mice package is installed
  if (!requireNamespace("mice", quietly = TRUE)) {
    stop("The mice package is required.", call. = FALSE)
  }

  # grab input mice data

  # extract a single dataset for our use of finding labels
  mice::complete(...)
}

Add a function model_get_coefficients_type()

Inspired by gtsummary:::estimate_header(), add a function to identify model type and coefficient type.

An additional function tidy_identify_model_type() could add model_type and coefficient_type as attributes to the results.

It will be useful for the redesign of GGally::ggcoef.

@ddsjoberg, let me know if you think it could be relevant for gtsummary as well. I know that gtsummary also manages the corresponding footnotes and translations, but I do not think that this last part is in the scope of broom.helpers.

clean backticks in variable names for interaction-only terms

Test

library(dplyr)

lm(
  hp ~ factor(`number + cylinders`):`miles per galon` + factor(`type of transmission`),
  data = mtcars %>%
    rename(`miles per galon` = mpg, `type of transmission` = am, `number + cylinders` = cyl)
)

The variable name miles per galon should have its backticks removed.
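A minimal sketch of the cleaning step, assuming the raw variable name of an interaction term is available as a single string. `remove_backticks()` is a hypothetical helper. It removes every backtick, so each part of the interaction is cleaned, not only the first one.

```r
# Hypothetical helper: strip all backticks from a (possibly interaction)
# variable name.
remove_backticks <- function(x) gsub("`", "", x, fixed = TRUE)

remove_backticks("factor(`number + cylinders`):`miles per galon`")
# "factor(number + cylinders):miles per galon"
```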

Column ordering suggestion

It would be helpful to have a standardized order in which the columns appear as additional information is added to the tidy tibble. For example, all the original columns could remain on the right side of the tibble, and all new columns would be added to the left side.

The ordering of the columns (no matter the order the functions are called) would also be standardized. The order would be selected to make it easier to digest the information in the table. For example, when the variable is added, rather than it perhaps ending up in the middle of the tibble, it would always be near the beginning. Below is a suggested ordering:

library(broom.helpers)

lm(mpg ~ factor(cyl) + hp, mtcars) %>%
  tidy_plus_plus() %>% 
  dplyr::select(any_of(c("variable", "var_label", "var_class", "var_type", 
                         "contrasts", "reference_row", "label")), 
                everything()) %>%
  knitr::kable()

| variable | var_label | var_class | var_type | contrasts | reference_row | label | term | estimate | std.error | statistic | p.value | conf.low | conf.high |
|:--|:--|:--|:--|:--|:--|:--|:--|--:|--:|--:|--:|--:|--:|
| factor(cyl) | factor(cyl) | factor | categorical | contr.treatment | TRUE | 4 | factor(cyl)4 | NA | NA | NA | NA | NA | NA |
| factor(cyl) | factor(cyl) | factor | categorical | contr.treatment | FALSE | 6 | factor(cyl)6 | -5.9676551 | 1.6392776 | -3.640418 | 0.0010921 | -9.3255631 | -2.6097471 |
| factor(cyl) | factor(cyl) | factor | categorical | contr.treatment | FALSE | 8 | factor(cyl)8 | -8.5208508 | 2.3260749 | -3.663188 | 0.0010286 | -13.2855993 | -3.7561022 |
| hp | hp | numeric | continuous | NA | NA | hp | hp | -0.0240388 | 0.0154079 | -1.560163 | 0.1299540 | -0.0556005 | 0.0075228 |

Created on 2020-08-27 by the reprex package (v0.3.0)

A simple re-ordering function could be added to the end of each tidy_*() function.

order_tidy_columns <- function(x) {
  dplyr::select(x, 
                any_of(c("variable", "var_label", "var_class", "var_type", 
                         "contrasts", "reference_row", "label")), 
                everything())
}

rename keep parameter to include?

@ddsjoberg

In tidy_select_variables(), should we rename keep to include for consistency?

What do you think?
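If we rename it, a soft deprecation would keep existing code working. Here is a sketch with a hypothetical `select_variables_sketch()` whose filtering is simplified to exact matches on `variable`. `include` is the new name, and `keep` is still accepted but warns.

```r
# Hypothetical sketch: `include` replaces `keep`; `keep` still works but
# emits a deprecation warning.
select_variables_sketch <- function(x, include = NULL, keep = NULL) {
  if (!is.null(keep)) {
    warning("`keep` is deprecated; please use `include` instead.", call. = FALSE)
    if (is.null(include)) include <- keep
  }
  if (is.null(include)) return(x)
  x[x$variable %in% include, , drop = FALSE]
}

tbl <- data.frame(variable = c("age", "grade", "trt"), stringsAsFactors = FALSE)
select_variables_sketch(tbl, include = c("age", "trt"))
```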

`tidy_add_variable_labels()` error with interaction only model

The model below only has an interaction term (no main effects), and the variable label is not correct.

library(gtsummary)
library(broom.helpers)
lm(age ~ factor(response):marker, trial) %>%
  broom.helpers::tidy_and_attach() %>%
  broom.helpers::tidy_identify_variables() %>%
  broom.helpers::tidy_add_variable_labels() %>%
  select(variable, var_type, var_label, estimate)
#> # A tibble: 3 x 4
#>   variable                var_type    var_label                 estimate
#>   <chr>                   <chr>       <chr>                        <dbl>
#> 1 <NA>                    intercept   (Intercept)                 46.6  
#> 2 factor(response):marker interaction NA * Marker Level (ng/mL)    0.396
#> 3 factor(response):marker interaction NA * Marker Level (ng/mL)    0.102

Created on 2020-09-01 by the reprex package (v0.3.0)

Also, if we add tidy_add_term_labels(), the label is also wrong, but in a different way.

library(gtsummary)
library(broom.helpers)
lm(age ~ factor(response):marker, trial) %>%
  broom.helpers::tidy_and_attach() %>%
  broom.helpers::tidy_identify_variables() %>%
  broom.helpers::tidy_add_term_labels() %>%
  broom.helpers::tidy_add_variable_labels() %>%
  select(variable, var_type, var_label, label, estimate)
#> # A tibble: 3 x 5
#>   variable               var_type    var_label               label      estimate
#>   <chr>                  <chr>       <chr>                   <chr>         <dbl>
#> 1 <NA>                   intercept   (Intercept)             (Intercep~   46.6  
#> 2 factor(response):mark~ interaction NA * Marker Level (ng/~ NA * NA       0.396
#> 3 factor(response):mark~ interaction NA * Marker Level (ng/~ NA * NA       0.102

Created on 2020-09-01 by the reprex package (v0.3.0)

`tidy_remove_intercept()` removes terms from model (when they are named horribly)

This is VERY much an edge case, but I wanted to let you know. It seems that if a variable name has a + in it, tidy_remove_intercept() will remove both the intercept and the variable from the model. If this is a complicated fix, perhaps just show a message to the user, e.g. "More than one row was removed from the table. An error possibly occurred, likely due to unusual naming conventions used for terms."

library(gtsummary)

trial2 <- 
  trial %>% 
  dplyr::mutate(`treatment +name` = trt)


glm(response ~ `treatment +name`, 
    trial2, 
    family = binomial(link = "logit")) %>%
  broom.helpers::tidy_and_attach() %>%
  broom.helpers::tidy_remove_intercept()
#> # A tibble: 0 x 8
#> # ... with 8 variables: term <chr>, variable <chr>, var_class <chr>,
#> #   var_type <chr>, estimate <dbl>, std.error <dbl>, statistic <dbl>,
#> #   p.value <dbl>

Created on 2020-10-02 by the reprex package (v0.3.0)
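A sketch of the suggested fallback message. `remove_intercept_sketch()` is hypothetical, and identifying intercepts through `var_type == "intercept"` is an assumption about the criterion. The sketch does not fix the root cause. It only counts the rows it removes and warns when there is more than one.

```r
# Hypothetical sketch: remove intercept rows and warn if more than one row
# would be dropped.
remove_intercept_sketch <- function(x) {
  is_intercept <- !is.na(x$var_type) & x$var_type == "intercept"
  if (sum(is_intercept) > 1) {
    message("More than one row was removed from the table. ",
            "This may be due to unusual naming conventions used for terms.")
  }
  x[!is_intercept, , drop = FALSE]
}

tbl <- data.frame(
  term = c("(Intercept)", "trtDrug B"),
  var_type = c("intercept", "dichotomous"),
  stringsAsFactors = FALSE
)
remove_intercept_sketch(tbl)
```

Models with several legitimate intercepts (e.g. ordinal regressions) would also trigger the message, so the wording should stay tentative.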

Add a vignette

Add quiet option

Any function that prints messages should have a quiet= option. This could be helpful to developers who do not want the broom.helpers messages to print.
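One way to implement this consistently is to route every message through a single helper. This is a sketch. `inform_unless_quiet()` and `identify_variables_sketch()` are hypothetical names, and the message text is borrowed from the output shown earlier in this thread.

```r
# Hypothetical helper: all messages go through this function, so one `quiet`
# argument can silence them.
inform_unless_quiet <- function(..., quiet = FALSE) {
  if (!isTRUE(quiet)) message(...)
  invisible(NULL)
}

# Hypothetical tidy_*() style step that forwards its own `quiet` argument
identify_variables_sketch <- function(x, quiet = FALSE) {
  if (all(is.na(x$variable))) {
    inform_unless_quiet("x Unable to identify the list of variables.", quiet = quiet)
  }
  x
}

tbl <- data.frame(term = "trtDrug B", variable = NA_character_)
identify_variables_sketch(tbl)                # prints the message
identify_variables_sketch(tbl, quiet = TRUE)  # silent
```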

`show_single_row=` not working for categorical-continuous interaction

In the example below, I am requesting the interaction term "factor(response):marker" be printed on a single row, but it is being ignored.

library(broom.helpers)
library(gtsummary)

lm(age ~ factor(response) * marker, trial) %>%
  tidy_and_attach() %>%
  tidy_identify_variables() %>%
  tidy_add_reference_rows() %>%
  tidy_add_variable_labels() %>%
  tidy_add_header_rows(show_single_row = "factor(response):marker") %>%
  knitr::kable()

| term | variable | var_label | var_class | var_type | header_row | contrasts | reference_row | label | estimate | std.error | statistic | p.value |
|:--|:--|:--|:--|:--|:--|:--|:--|:--|--:|--:|--:|--:|
| (Intercept) | NA | (Intercept) | NA | intercept | NA | NA | NA | (Intercept) | 43.985685 | 1.906507 | 23.071342 | 0.0000000 |
| NA | factor(response) | factor(response) | factor | categorical | TRUE | contr.treatment | NA | factor(response) | NA | NA | NA | NA |
| factor(response)0 | factor(response) | factor(response) | factor | categorical | FALSE | contr.treatment | TRUE | 0 | NA | NA | NA | NA |
| factor(response)1 | factor(response) | factor(response) | factor | categorical | FALSE | contr.treatment | FALSE | 1 | 9.117623 | 3.536300 | 2.578294 | 0.0107814 |
| marker | marker | Marker Level (ng/mL) | numeric | continuous | NA | NA | NA | Marker Level (ng/mL) | 2.007188 | 1.609824 | 1.246836 | 0.2141828 |
| NA | factor(response):marker | factor(response) * Marker Level (ng/mL) | NA | interaction | TRUE | NA | NA | factor(response) * Marker Level (ng/mL) | NA | NA | NA | NA |
| factor(response)1:marker | factor(response):marker | factor(response) * Marker Level (ng/mL) | NA | interaction | FALSE | NA | NA | 1 * Marker Level (ng/mL) | -5.337195 | 2.647510 | -2.015930 | 0.0453914 |

Created on 2020-09-03 by the reprex package (v0.3.0)

Add pkgdown website

Improve pkgdown website

header row missing after running `tidy_plus_plus()`

The header row for cyl is missing when using tidy_plus_plus(), but the documentation indicates it should have been added.

library(broom.helpers)
# no header row for cyl 
lm(mpg ~ factor(cyl), mtcars) %>%
  tidy_plus_plus()
#> # A tibble: 3 x 14
#>   term  variable var_class var_type estimate std.error statistic   p.value
#>   <chr> <chr>    <chr>     <chr>       <dbl>     <dbl>     <dbl>     <dbl>
#> 1 fact~ factor(~ factor    categor~    NA        NA        NA    NA       
#> 2 fact~ factor(~ factor    categor~    -6.92      1.56     -4.44  1.19e- 4
#> 3 fact~ factor(~ factor    categor~   -11.6       1.30     -8.90  8.57e-10
#> # ... with 6 more variables: conf.low <dbl>, conf.high <dbl>, contrasts <chr>,
#> #   reference_row <lgl>, var_label <chr>, label <chr>

# has header row
lm(mpg ~ factor(cyl), mtcars) %>%
  tidy_and_attach() %>%
  tidy_add_reference_rows() %>%
  tidy_add_header_rows() 
#> # A tibble: 5 x 13
#>   term  variable var_class var_type estimate std.error statistic   p.value
#>   <chr> <chr>    <chr>     <chr>       <dbl>     <dbl>     <dbl>     <dbl>
#> 1 (Int~ <NA>     <NA>      interce~    26.7      0.972     27.4   2.69e-22
#> 2 <NA>  factor(~ factor    categor~    NA       NA         NA    NA       
#> 3 fact~ factor(~ factor    categor~    NA       NA         NA    NA       
#> 4 fact~ factor(~ factor    categor~    -6.92     1.56      -4.44  1.19e- 4
#> 5 fact~ factor(~ factor    categor~   -11.6      1.30      -8.90  8.57e-10
#> # ... with 5 more variables: contrasts <chr>, reference_row <lgl>,
#> #   var_label <chr>, label <chr>, header_row <lgl>

Created on 2020-08-17 by the reprex package (v0.3.0)

Warning is printed for intercept-only models

When I run an intercept-only model, the tibble is returned but a warning is also printed.

library(broom.helpers)
lm(mpg ~ 1, mtcars) %>%
  broom.helpers::tidy_and_attach() %>%
  broom.helpers::tidy_identify_variables() %>%
  broom.helpers::tidy_add_header_rows() 
#> Warning in min(.data$rank): no non-missing arguments to min; returning Inf
#> # A tibble: 1 x 12
#>   term  variable var_label var_class var_type header_row contrasts label
#>   <chr> <chr>    <chr>     <chr>     <chr>    <lgl>      <chr>     <chr>
#> 1 (Int~ <NA>     (Interce~ <NA>      interce~ NA         <NA>      (Int~
#> # ... with 4 more variables: estimate <dbl>, std.error <dbl>, statistic <dbl>,
#> #   p.value <dbl>

Created on 2020-09-01 by the reprex package (v0.3.0)
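The warning itself comes from calling min() on an empty vector: with no variable rows, `.data$rank` is empty, so min() warns and returns Inf. Below is a guard that avoids it. `safe_min()` is a hypothetical helper, and where exactly it would be used inside tidy_add_header_rows() is an assumption.

```r
# Hypothetical helper: minimum of the non-missing values, or NA (without a
# warning) when there are none, e.g. for an intercept-only model.
safe_min <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) NA_real_ else min(x)
}

safe_min(numeric(0))  # NA, no warning
safe_min(c(3, 1, 2))  # 1
```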

Add `label=` argument to `tidy_add_header_rows()`

Is there a way to modify the header row label? I didn't immediately see one. If there is not, one way to solve this is by adding an argument, tidy_add_header_rows(label=).
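A sketch of how such an argument could behave. `relabel_header_rows()` is hypothetical, and so is the named-vector form of `label`. Header rows whose variable appears in `names(label)` get the new text, and every other row keeps its label.

```r
# Hypothetical sketch of a label= argument for header rows.
relabel_header_rows <- function(x, label = NULL) {
  if (is.null(label)) return(x)
  is_header <- !is.na(x$header_row) & x$header_row
  to_change <- is_header & x$variable %in% names(label)
  x$label[to_change] <- unname(label[x$variable[to_change]])
  x
}

tbl <- data.frame(
  variable = c("factor(cyl)", "factor(cyl)", "hp"),
  header_row = c(TRUE, FALSE, NA),
  label = c("factor(cyl)", "6", "hp"),
  stringsAsFactors = FALSE
)
relabel_header_rows(tbl, label = c("factor(cyl)" = "Cylinders"))
```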

Check goodpractice::goodpractice()

Add message when user requests single row for variable that cannot be put on a single row

The model below includes factor(cyl), which has 3 levels. When we request that it be displayed on a single row, nothing happens (because it can't be shown on a single row), and there is no message about the request being ignored.

A message to the user in this case would be helpful.

library(broom.helpers)
lm(mpg ~ hp + factor(cyl) + factor(am), mtcars) %>%
  broom.helpers::tidy_and_attach() %>%
  broom.helpers::tidy_identify_variables() %>%
  broom.helpers::tidy_add_header_rows(show_single_row = c("factor(am)", "factor(cyl)")) 
#> # A tibble: 6 x 12
#>   term  variable var_label var_class var_type header_row contrasts label
#>   <chr> <chr>    <chr>     <chr>     <chr>    <lgl>      <chr>     <chr>
#> 1 (Int~ <NA>     (Interce~ <NA>      interce~ NA         <NA>      (Int~
#> 2 hp    hp       hp        numeric   continu~ NA         <NA>      hp   
#> 3 <NA>  factor(~ factor(c~ factor    categor~ TRUE       contr.tr~ fact~
#> 4 fact~ factor(~ factor(c~ factor    categor~ FALSE      contr.tr~ 6    
#> 5 fact~ factor(~ factor(c~ factor    categor~ FALSE      contr.tr~ 8    
#> 6 fact~ factor(~ factor(a~ factor    categor~ NA         contr.tr~ 1    
#> # ... with 4 more variables: estimate <dbl>, std.error <dbl>, statistic <dbl>,
#> #   p.value <dbl>

Created on 2020-09-01 by the reprex package (v0.3.0)
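A sketch of the check. `check_single_row()` is hypothetical. It counts the non-reference terms per variable (treating every row as non-reference if there is no `reference_row` column, as in the reprex above) and reports the requested variables that have more than one.

```r
# Hypothetical check: which variables requested in show_single_row cannot be
# displayed on a single row?
check_single_row <- function(x, show_single_row) {
  ref <- if ("reference_row" %in% names(x)) x$reference_row else rep(FALSE, nrow(x))
  non_ref <- x[is.na(ref) | !ref, , drop = FALSE]
  n_terms <- table(non_ref$variable)
  too_many <- show_single_row[show_single_row %in% names(n_terms)[n_terms > 1]]
  if (length(too_many) > 0) {
    message("The following variables cannot be displayed on a single row ",
            "and will be ignored: ", paste(too_many, collapse = ", "))
  }
  invisible(too_many)
}

# Variables as in the reprex above (intercept, hp, cyl 6 and 8, am 1)
tbl <- data.frame(
  variable = c(NA, "hp", "factor(cyl)", "factor(cyl)", "factor(am)"),
  stringsAsFactors = FALSE
)
check_single_row(tbl, c("factor(am)", "factor(cyl)"))
```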

`tidy_add_reference_rows()` erroneously adds reference row to interaction-only model

library(broom.helpers)
library(gtsummary)
#> #BlackLivesMatter

lm(age ~ factor(response):marker, trial) %>%
  tidy_and_attach() %>%
  tidy_identify_variables() %>%
  tidy_add_reference_rows() %>%
  knitr::kable()

| term | variable | var_class | var_type | contrasts | reference_row | estimate | std.error | statistic | p.value |
|:--|:--|:--|:--|:--|:--|--:|--:|--:|--:|
| (Intercept) | NA | NA | intercept | NA | NA | 46.6357738 | 1.632164 | 28.5729753 | 0.0000000 |
| factor(response)0:marker | factor(response):marker | NA | interaction | NA | NA | 0.3957856 | 1.507993 | 0.2624585 | 0.7932857 |
| factor(response)1:marker | factor(response):marker | NA | interaction | NA | NA | 0.1015807 | 1.653558 | 0.0614316 | 0.9510877 |
| factor(response)0 | factor(response) | NA | NA | NA | TRUE | NA | NA | NA | NA |

Created on 2020-09-03 by the reprex package (v0.3.0)
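One possible guard, sketched under an assumption: reference rows should only be added for categorical variables that actually appear as a main effect in the tidy table. `variables_needing_reference_row()` is a hypothetical helper. In the interaction-only model above, factor(response) never appears on its own, so it would get no reference row.

```r
# Hypothetical helper: categorical variables present as main effects.
variables_needing_reference_row <- function(x) {
  is_cat <- !is.na(x$var_type) & x$var_type %in% c("categorical", "dichotomous")
  vars <- unique(x$variable[is_cat])
  vars[!is.na(vars)]
}

# Interaction-only model, as in the table above
interaction_only <- data.frame(
  variable = c(NA, "factor(response):marker", "factor(response):marker"),
  var_type = c("intercept", "interaction", "interaction"),
  stringsAsFactors = FALSE
)
variables_needing_reference_row(interaction_only)  # character(0)
```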

Customize categorical term labels with a glue pattern

Maybe a feature that could be added in broom.helpers (and therefore also implemented in gtsummary) is a function, tidy_rename_categorical_terms(), that would allow the type of renaming you want, but after model computation, at the moment the table is built. For example:

mod %>% tidy_and_attach() %>% tidy_rename_categorical_terms(pattern = "{variable} [{term}-{reference}]")
You would be able to choose whatever pattern you want.

Cf. ddsjoberg/gtsummary#677

Note: a second argument should allow selecting which variables to rename.
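A base R sketch of the pattern filling, used here instead of the glue package. `rename_terms_sketch()` and `fill_pattern()` are hypothetical, and a `reference` column holding the reference level is an assumption. The `variables` argument plays the role of the second argument mentioned in the note.

```r
# Hypothetical helper: replace each "{field}" placeholder with its value.
fill_pattern <- function(pattern, values) {
  for (f in names(values)) {
    pattern <- gsub(paste0("{", f, "}"), as.character(values[[f]]), pattern, fixed = TRUE)
  }
  pattern
}

# Hypothetical sketch of tidy_rename_categorical_terms(): rewrite `label` for
# the selected variables using columns of the tidy table.
rename_terms_sketch <- function(x, pattern, variables = unique(x$variable)) {
  fields <- intersect(c("variable", "term", "reference"), names(x))
  if (!"label" %in% names(x)) x$label <- NA_character_
  rows <- which(x$variable %in% variables)
  x$label[rows] <- vapply(
    rows,
    function(i) fill_pattern(pattern, as.list(x[i, fields, drop = FALSE])),
    character(1)
  )
  x
}

tbl <- data.frame(
  term = c("factor(cyl)6", "factor(cyl)8", "hp"),
  variable = c("factor(cyl)", "factor(cyl)", "hp"),
  reference = c("4", "4", NA),
  label = c("6", "8", "hp"),
  stringsAsFactors = FALSE
)
rename_terms_sketch(tbl, "{variable} [{term}-{reference}]", variables = "factor(cyl)")
```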

Release broom.helpers 1.1.0

Prepare for release:

devtools::check(remote = TRUE, manual = TRUE)
revdepcheck::revdep_reset()
revdepcheck::revdep_check(num_workers = 4)
Polish NEWS
Polish pkgdown reference index

Submit to CRAN:

usethis::use_version()
Update cran-comments.md
devtools::submit_cran() (CRAN team on vacation until August 24)
Approve email

Wait for CRAN...

Accepted 🎉
usethis::use_github_release()
Remove file CRAN-RELEASE
usethis::use_dev_version()
