
broom.helpers's People

Contributors

actions-user, ddsjoberg, jerryekohe, larmarange, michaelchirico

broom.helpers's Issues

Missing label for nnet::multinom() categorical variables

I noticed a labelling error in nnet::multinom(). The label column is missing the variable label for the stage variable in the example below.

library(gtsummary)
#> #BlackLivesMatter

nnet::multinom(grade ~ age + stage, data = trial, trace = FALSE) %>%
  broom.helpers::tidy_plus_plus(add_header_rows = TRUE) %>%
  dplyr::select(y.level, variable, term, var_label, label, estimate)
#> # A tibble: 12 x 6
#>    y.level variable term    var_label label estimate
#>    <chr>   <chr>    <chr>   <chr>     <chr>    <dbl>
#>  1 II      age      age     Age       Age    0.00813
#>  2 II      stage    <NA>    T Stage   <NA>  NA      
#>  3 II      stage    stageT1 T Stage   T1     0      
#>  4 II      stage    stageT2 T Stage   T2    -0.497  
#>  5 II      stage    stageT3 T Stage   T3    -1.04   
#>  6 II      stage    stageT4 T Stage   T4    -0.634  
#>  7 III     age      age     Age       Age    0.0110 
#>  8 III     stage    <NA>    T Stage   <NA>  NA      
#>  9 III     stage    stageT1 T Stage   T1     0      
#> 10 III     stage    stageT2 T Stage   T2     0.128  
#> 11 III     stage    stageT3 T Stage   T3    -0.214  
#> 12 III     stage    stageT4 T Stage   T4     0.291

Created on 2020-10-04 by the reprex package (v0.3.0)

Update the new term for reference rows

When a reference row is added, rather than creating a new term "{varname}_ref", I suggest that you either keep the term name consistent with the other terms (in the example below, the new term would be "gradeI"), or leave it blank.

I know it's unlikely, but if someone had a variable with the level _ref, I think things would fall apart somewhere.

library(broom.helpers)
library(gtsummary)

mod <- glm(age ~ grade, data = trial, family = gaussian)

mod %>%
  tidy_and_attach() %>%
  tidy_identify_variables() %>%
  tidy_add_variable_labels() %>%
  tidy_add_reference_rows() %>%
  tidy_add_header_rows() %>% 
  select(variable, term, reference_row, label, header_row)
#> # A tibble: 5 x 5
#>   variable term        reference_row label       header_row
#>   <chr>    <chr>       <lgl>         <chr>       <lgl>     
#> 1 <NA>     (Intercept) NA            (Intercept) FALSE     
#> 2 grade    <NA>        NA            <NA>        TRUE      
#> 3 grade    grade_ref   TRUE          I           FALSE     
#> 4 grade    gradeII     FALSE         II          FALSE     
#> 5 grade    gradeIII    FALSE         III         FALSE

Created on 2020-08-14 by the reprex package (v0.3.0)
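
A minimal sketch of the first option (keeping the term name consistent with the other terms). It assumes treatment contrasts, where the first factor level is the reference; `reference_term()` is a hypothetical helper, not an existing broom.helpers function.

```r
# Hypothetical helper: name the reference row like the other terms,
# i.e. the variable name followed by the reference level.
# Assumes treatment contrasts, where the first level is the reference.
reference_term <- function(variable, x) {
  paste0(variable, levels(x)[1])
}

grade <- factor(c("I", "II", "III"))
reference_term("grade", grade)
#> [1] "gradeI"
```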

label column filled when using `Hmisc::rcspline.eval()` and `poly()` , but not for other categorical variables

Everything so far is looking amazing!

I noticed that the label is applied inconsistently: for categorical variables, the header row has no value in the label column, while for results from Hmisc::rcspline.eval() and poly() the label column does have a value.

Obviously not a big deal, but it may be worth addressing to remain consistent.

library(broom.helpers)
library(gtsummary)

mod <- glm(age ~ grade + Hmisc::rcspline.eval(marker), data = trial, family = gaussian)

mod %>%
  tidy_and_attach() %>%
  tidy_identify_variables() %>%
  tidy_add_variable_labels() %>%
  tidy_add_reference_rows() %>%
  tidy_add_header_rows() %>%
  select(1:2, label) 
#> # A tibble: 9 x 3
#>   term                       variable                  label                    
#>   <chr>                      <chr>                     <chr>                    
#> 1 (Intercept)                <NA>                      (Intercept)              
#> 2 <NA>                       grade                     <NA>                     
#> 3 grade_ref                  grade                     I                        
#> 4 gradeII                    grade                     II                       
#> 5 gradeIII                   grade                     III                      
#> 6 <NA>                       Hmisc::rcspline.eval(mar~ Hmisc::rcspline.eval(mar~
#> 7 Hmisc::rcspline.eval(mark~ Hmisc::rcspline.eval(mar~ Hmisc::rcspline.eval(mar~
#> 8 Hmisc::rcspline.eval(mark~ Hmisc::rcspline.eval(mar~ Hmisc::rcspline.eval(mar~
#> 9 Hmisc::rcspline.eval(mark~ Hmisc::rcspline.eval(mar~ Hmisc::rcspline.eval(mar~

Created on 2020-08-14 by the reprex package (v0.3.0)

Unify the broom.helpers and gtsummary select helpers

At the moment, the broom.helpers and gtsummary select helpers are created independently. When both packages are loaded, one package will mask the other's all_*() selecting functions, which is not good! I've been thinking of a way to unify the syntax, and I think I've come up with something.

Proposed changes:

  1. Create a universal select function, and export it. This function will help construct each of the other helpers. For example, if the function were called select_constructor(), we could define all_continuous() with the code below, which would select variables with type continuous.
    all_continuous <- function() select_constructor("variable", "var_type", "continuous")

The reason for the constructor is that I can later use it in gtsummary to easily construct selecting functions that do not apply in the broom.helpers setting, BUT I would not need to recreate the environments within which we're selecting or define new scoping functions.

  2. That brings us to the second point: we'd also need to export the scoping function so I can reuse it in gtsummary.
  3. I recall a notification I received where you indicated we could add an all_interactions() selector and I think another one...but I can't find that message. I'll add that here too. With the general format, it's actually very easy to add new select functions.
  4. You had also mentioned at some point about adding all_factor(), all_character(), etc. functions. I do not suggest you do this. Since I initially released those select functions, {tidyselect} has been updated to allow for selection using predicate functions, e.g. trial %>% select(where(is.character)). It's in my plan to deprecate those functions so I do not need to support any superfluous functions.

The only front-facing changes here, will be exporting two new functions that help us write and use the selecting functions in other packages. I'll start putting together a PR.
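
To make the constructor idea concrete, here is a simplified sketch in base R. It is an assumption about how such a function could look, not the implementation that ended up in either package: it skips the tidyselect scoping entirely and simply returns a closure that is applied to a tidy table. The name `select_constructor()` comes from the proposal above; everything else is illustrative.

```r
# Hypothetical constructor: returns a selector that, given a tidy table `x`,
# returns the unique values of `select_col` on rows where `filter_col`
# equals `value`.
select_constructor <- function(select_col, filter_col, value) {
  function(x) {
    keep <- !is.na(x[[filter_col]]) & x[[filter_col]] == value
    unique(x[[select_col]][keep])
  }
}

# New helpers become one-liners
all_continuous <- function() select_constructor("variable", "var_type", "continuous")
all_categorical <- function() select_constructor("variable", "var_type", "categorical")

# Toy table standing in for the output of tidy_identify_variables()
tidy_tbl <- data.frame(
  variable = c(NA, "grade", "grade", "age"),
  var_type = c("intercept", "categorical", "categorical", "continuous"),
  stringsAsFactors = FALSE
)

all_continuous()(tidy_tbl)
#> [1] "age"
```

The point of the closure is that gtsummary could call the same constructor with different column names or values without redefining the selection machinery.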

Ref row label not added

When one runs tidy_add_reference_rows() after tidy_add_term_labels() the reference row label is not shown. It makes sense why this occurs, but I think at minimum a message to users would be helpful to alert them to run the functions in a different order to get desired output.

library(broom.helpers)

# build regression model
lm(mpg ~ factor(cyl) + hp, mtcars) %>%
  # perform initial tidying of model
  tidy_and_attach() %>%
  # add the cyl levels
  tidy_add_term_labels() %>%
  # add reference row cyl
  tidy_add_reference_rows() %>%
  knitr::kable()
| term | variable | var_class | var_type | estimate | std.error | statistic | p.value | var_label | contrasts | label | reference_row |
|---|---|---|---|---|---|---|---|---|---|---|---|
| (Intercept) | NA | NA | intercept | 28.6501182 | 1.5877870 | 18.044056 | 0.0000000 | (Intercept) | NA | (Intercept) | NA |
| factor(cyl)4 | factor(cyl) | factor | categorical | NA | NA | NA | NA | factor(cyl) | contr.treatment | NA | TRUE |
| factor(cyl)6 | factor(cyl) | factor | categorical | -5.9676551 | 1.6392776 | -3.640418 | 0.0010921 | factor(cyl) | contr.treatment | 6 | FALSE |
| factor(cyl)8 | factor(cyl) | factor | categorical | -8.5208508 | 2.3260749 | -3.663188 | 0.0010286 | factor(cyl) | contr.treatment | 8 | FALSE |
| hp | hp | numeric | continuous | -0.0240388 | 0.0154079 | -1.560163 | 0.1299540 | hp | NA | hp | NA |

Created on 2020-08-27 by the reprex package (v0.3.0)

Add broom.helpers class to tibbles?

Should we add a broom.helpers class to the tibbles? I think this can help down the line in ensuring we're working with the correct object types.

class(x) <- c("broom.helpers", class(x))
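
A short sketch of how the class could then be used for input checks. Both function names are hypothetical and only illustrate the idea.

```r
# Hypothetical helpers for adding and checking the proposed class
as_broom_helpers <- function(x) {
  if (!inherits(x, "broom.helpers")) {
    class(x) <- c("broom.helpers", class(x))
  }
  x
}

assert_broom_helpers <- function(x) {
  if (!inherits(x, "broom.helpers")) {
    stop("`x` must be a broom.helpers tibble.", call. = FALSE)
  }
  invisible(x)
}

x <- as_broom_helpers(data.frame(term = "(Intercept)"))
class(x)
#> [1] "broom.helpers" "data.frame"
```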

`survival::coxph()` strips labels from categorical variables....but you can access them.

In the example below, the variable grade does have a label, "Grade", but it is not picked up: var_label shows "grade" instead. The label can still be retrieved, though!

Can we please update the internals to grab the label using the method below if not found in the typical manner?

library(broom.helpers)
library(gtsummary)
library(survival)
#> Warning: package 'survival' was built under R version 4.0.2

mod <- coxph(Surv(ttdeath, death) ~ grade, trial)

mod %>%
  tidy_and_attach() %>%
  tidy_identify_variables() %>%
  tidy_add_reference_rows() %>%
  tidy_add_variable_labels() %>%
  tidy_add_header_rows() %>%
  select(term, variable, var_label, label)
#> # A tibble: 4 x 4
#>   term      variable var_label label
#>   <chr>     <chr>    <chr>     <chr>
#> 1 <NA>      grade    grade     grade
#> 2 grade_ref grade    grade     I    
#> 3 gradeII   grade    grade     II   
#> 4 gradeIII  grade    grade     III

# get the grade label from a coxph object
model.frame.default(mod)$grade %>% attr("label")
#> [1] "Grade"

Created on 2020-08-14 by the reprex package (v0.3.0)

Column ordering suggestion

It would be helpful to have a standardized order in which the columns appear as additional information is added to the tidy tibble. For example, all the original columns could remain on the right side of the tibble, and all new columns would be added to the left side of the tibble.

The ordering of the columns (no matter the order the functions are called) would also be standardized. The order would be selected to make it easier to digest the information in the table. For example, when the variable is added, rather than it perhaps ending up in the middle of the tibble, it would always be near the beginning. Below is a suggested ordering:

library(broom.helpers)

lm(mpg ~ factor(cyl) + hp, mtcars) %>%
  tidy_plus_plus() %>% 
  dplyr::select(any_of(c("variable", "var_label", "var_class", "var_type", 
                         "contrasts", "reference_row", "label")), 
                everything()) %>%
  knitr::kable()
| variable | var_label | var_class | var_type | contrasts | reference_row | label | term | estimate | std.error | statistic | p.value | conf.low | conf.high |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| factor(cyl) | factor(cyl) | factor | categorical | contr.treatment | TRUE | 4 | factor(cyl)4 | NA | NA | NA | NA | NA | NA |
| factor(cyl) | factor(cyl) | factor | categorical | contr.treatment | FALSE | 6 | factor(cyl)6 | -5.9676551 | 1.6392776 | -3.640418 | 0.0010921 | -9.3255631 | -2.6097471 |
| factor(cyl) | factor(cyl) | factor | categorical | contr.treatment | FALSE | 8 | factor(cyl)8 | -8.5208508 | 2.3260749 | -3.663188 | 0.0010286 | -13.2855993 | -3.7561022 |
| hp | hp | numeric | continuous | NA | NA | hp | hp | -0.0240388 | 0.0154079 | -1.560163 | 0.1299540 | -0.0556005 | 0.0075228 |

Created on 2020-08-27 by the reprex package (v0.3.0)

A simple re-ordering function could be added to the end of each tidy_*() function.

order_tidy_columns <- function(x) {
  dplyr::select(x, 
                any_of(c("variable", "var_label", "var_class", "var_type", 
                         "contrasts", "reference_row", "label")), 
                everything())
}

`var_class` incorrect for integers

In the example below, am is an integer class variable. But in the broom.helpers tibble, the class is indicated as numeric.

library(broom.helpers)

tibble::as_tibble(mtcars) %>%
  dplyr::mutate(
    am = as.integer(am),
    vs = as.logical(vs)
  ) %>%
  {lm(mpg ~ am + vs + hp + factor(cyl), .)} %>%
  tidy_and_attach() %>%
  tidy_identify_variables() 
#> # A tibble: 6 x 8
#>   term      variable   var_class var_type  estimate std.error statistic  p.value
#>   <chr>     <chr>      <chr>     <chr>        <dbl>     <dbl>     <dbl>    <dbl>
#> 1 (Interce~ <NA>       <NA>      intercept  24.4       2.57      9.51   6.01e-10
#> 2 am        am         numeric   continuo~   5.16      1.45      3.55   1.49e- 3
#> 3 vsTRUE    vs         logical   categori~   2.57      1.94      1.32   1.97e- 1
#> 4 hp        hp         numeric   continuo~  -0.0469    0.0145   -3.23   3.35e- 3
#> 5 factor(c~ factor(cy~ factor    categori~  -2.65      1.80     -1.48   1.52e- 1
#> 6 factor(c~ factor(cy~ factor    categori~  -0.277     3.49     -0.0795 9.37e- 1

Created on 2020-10-08 by the reprex package (v0.3.0)
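
One possible explanation, offered as an assumption rather than a confirmed diagnosis of the broom.helpers internals: base R records integer columns as "numeric" in the `dataClasses` attribute of a model's terms, so any code that reads classes from there would report numeric. The class stored in the model frame itself is still integer.

```r
# Integer predictor in a plain lm()
d <- transform(mtcars, am = as.integer(am))
mod <- lm(mpg ~ am, data = d)

# Class as recorded in the terms object: integers are reported as "numeric"
attr(stats::terms(mod), "dataClasses")[["am"]]
#> [1] "numeric"

# Class of the column actually stored in the model frame
class(stats::model.frame(mod)$am)
#> [1] "integer"
```

If the class were taken from the model frame columns instead, integer variables would keep their class.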

Easier identification of dichotomous variables and all_categorical(), all_continuous(), all_dichotomous() helpers

Dear @ddsjoberg

I would like your opinion on the two following points.

First, it could be useful to identify dichotomous variables more clearly. One option would be to extend var_type, created by tidy_identify_variables(), and replace the value "categorical" with "dichotomous" for dichotomous variables, knowing that all dichotomous variables are also categorical. But this might have side effects in gtsummary.

An alternative could be to generate an additional column dichotomous equal to TRUE, FALSE or NA (for continuous variables).

Identifying dichotomous variables directly in tidy_identify_variables() would be useful later by simplifying the code of tidy_add_header_rows() when applying show_single_row.
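
A rough sketch of both options in base R, assuming the number of levels of each variable is already known (here passed in as `var_nlevels`). The function names are hypothetical.

```r
# Option 1 (hypothetical): refine var_type, relabelling categorical
# variables with exactly two levels as "dichotomous"
refine_var_type <- function(var_type, var_nlevels) {
  is_dicho <- var_type == "categorical" & !is.na(var_nlevels) & var_nlevels == 2
  ifelse(is_dicho, "dichotomous", var_type)
}

# Option 2 (hypothetical): keep var_type and add a logical flag that is
# NA for non-categorical variables
flag_dichotomous <- function(var_type, var_nlevels) {
  ifelse(var_type == "categorical", var_nlevels == 2, NA)
}

var_type <- c("continuous", "categorical", "categorical")
var_nlevels <- c(NA, 2L, 3L)

refine_var_type(var_type, var_nlevels)   # "continuous", "dichotomous", "categorical"
flag_dichotomous(var_type, var_nlevels)  # NA, TRUE, FALSE
```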

Second, tidy selectors such as all_categorical(), all_continuous() and all_dichotomous() could be useful in broom.helpers as well. However, I do not know whether the code could be shared between gtsummary and broom.helpers, or whether we could avoid conflicts.

As you developed these functions and you are the one who implemented tidy selectors in broom.helpers, what do you think?

Best

Add message when user requests single row for variable that cannot be put on a single row

The model below includes factor(cyl), which has 3 levels. When we request that it be displayed on a single row, nothing happens (because it can't be shown on a single row), and there is no message about the request being ignored.

A message to the user in this case would be helpful.

library(broom.helpers)
lm(mpg ~ hp + factor(cyl) + factor(am), mtcars) %>%
  broom.helpers::tidy_and_attach() %>%
  broom.helpers::tidy_identify_variables() %>%
  broom.helpers::tidy_add_header_rows(show_single_row = c("factor(am)", "factor(cyl)")) 
#> # A tibble: 6 x 12
#>   term  variable var_label var_class var_type header_row contrasts label
#>   <chr> <chr>    <chr>     <chr>     <chr>    <lgl>      <chr>     <chr>
#> 1 (Int~ <NA>     (Interce~ <NA>      interce~ NA         <NA>      (Int~
#> 2 hp    hp       hp        numeric   continu~ NA         <NA>      hp   
#> 3 <NA>  factor(~ factor(c~ factor    categor~ TRUE       contr.tr~ fact~
#> 4 fact~ factor(~ factor(c~ factor    categor~ FALSE      contr.tr~ 6    
#> 5 fact~ factor(~ factor(c~ factor    categor~ FALSE      contr.tr~ 8    
#> 6 fact~ factor(~ factor(a~ factor    categor~ NA         contr.tr~ 1    
#> # ... with 4 more variables: estimate <dbl>, std.error <dbl>, statistic <dbl>,
#> #   p.value <dbl>

Created on 2020-09-01 by the reprex package (v0.3.0)
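
A sketch of what such a check might look like, in base R. It assumes a variable can go on a single row only when it has exactly one non-missing term; `check_single_row()` and the message wording are hypothetical.

```r
# Hypothetical check: `x` needs the columns `variable` and `term`.
# Variables with more than one term cannot be shown on a single row.
check_single_row <- function(x, show_single_row) {
  n_terms <- table(x$variable[!is.na(x$term)])
  multi_row <- names(n_terms)[as.vector(n_terms) > 1]
  bad <- show_single_row[show_single_row %in% multi_row]
  if (length(bad) > 0) {
    message(
      "Cannot be displayed on a single row (request ignored): ",
      paste(bad, collapse = ", ")
    )
  }
  invisible(bad)
}

x <- data.frame(
  variable = c("hp", "factor(cyl)", "factor(cyl)", "factor(am)"),
  term = c("hp", "factor(cyl)6", "factor(cyl)8", "factor(am)1"),
  stringsAsFactors = FALSE
)

check_single_row(x, c("factor(am)", "factor(cyl)"))
#> Cannot be displayed on a single row (request ignored): factor(cyl)
```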

Release version 1.0.0

Prepare for release:

  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • Polish NEWS
  • Polish pkgdown reference index

Submit to CRAN:

  • usethis::use_version()
  • Update cran-comments.md
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • Create GitHub release
  • Remove file CRAN-RELEASE
  • usethis::use_dev_version()

Clean backticks in variable names for interaction-only terms

Test

lm(hp ~ factor(`number + cylinders`) : `miles per galon` + factor(`type of transmission`), mtcars %>% rename(`miles per galon` = mpg, `type of transmission` = am, `number + cylinders` = cyl))

miles per galon should have ticks removed
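
For illustration, stripping the backticks can be done with a fixed-string replacement. `strip_backticks()` is a hypothetical name; where exactly this would be applied inside the package is not specified here.

```r
# Hypothetical helper: remove all backticks from variable or term names
strip_backticks <- function(x) {
  gsub("`", "", x, fixed = TRUE)
}

strip_backticks("`miles per galon`")
#> [1] "miles per galon"
```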

Variable names for models with no model.frame method

When there is no model.frame() method, the user sees a very informative message. (super helpful!)

The resulting table has a column for variable, but its values are NA. Can we add the term as the variable for these models? In gtsummary, we use the variable name to do further manipulation (and in broom.helpers too), but with no name these variables cannot be selected.

I know the term is not the proper variable name, but I think the printed message is enough of a cue to users that the original variable names are not available.

library(broom.helpers)
library(gtsummary)

# make up some interval censored data 
trial2 <-
  trial %>% 
  dplyr::mutate(
    lint = dplyr::case_when(
      death == 1 ~ runif(200) + 2,
      death == 0 ~ ttdeath
    ),
    rint = dplyr::case_when(
      death == 1 ~ ttdeath,
      death == 0 ~ Inf
    )
  )

# Write a custom tidier
tidy_ic_sp <- function(x, exponentiate =  FALSE, conf.level = 0.95, ...) {
  tidy <-
    tibble::tibble(
      term = names(x[["coefficients"]]),
      estimate = x[["coefficients"]],
      std.error = sqrt(diag(x[["var"]])),
      statistic = summary(x)$summaryParameters[, "z-value"],
      p.value = summary(x)$summaryParameters[, "p"],
      conf.low = confint(x, level = conf.level)[, 1],
      conf.high = confint(x, level = conf.level)[, 2]
    )
  
  if (exponentiate == TRUE)
    tidy <- dplyr::mutate_at(tidy, vars(estimate, conf.low, conf.high), exp)
  
  tidy
}

# fit the interval-censored survival model with icenReg::ic_sp()
icenReg::ic_sp(
  survival::Surv(lint, rint, type = "interval2") ~ trt,
  model = "ph",
  bs_samples = 3,
  data = trial2
) %>%
  # tidy up with broom.helpers
  tidy_and_attach(tidy_fun = tidy_ic_sp) %>%
  tidy_identify_variables() %>%
  tidy_add_variable_labels() %>%
  tidy_add_header_rows() %>%
  select(term, variable, var_label, label, estimate)
#> x Unable to identify the list of variables.
#>   
#>   This is usually due to an error calling `stats::model.frame(x)`or `stats::model.matrix(x)`.
#>   It could be the case if that type of model does not implement these methods.
#>   Rarely, this error may occur if the model object was created within
#>   a functional programming framework (e.g. using `lappy()`, `purrr::map()`, etc.).
#> # A tibble: 1 x 5
#>   term      variable var_label label     estimate
#>   <chr>     <chr>    <chr>     <chr>        <dbl>
#> 1 trtDrug B <NA>     trtDrug B trtDrug B    0.160

Created on 2020-10-19 by the reprex package (v0.3.0)

Management of `poly()`

Add a helper to convert poly(var, 4) into var in variable and to produce more explicit terms (e.g. var^1, var^2, var^3, var^4)
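
A possible starting point, assuming the term names follow the usual `poly(var, degree)k` pattern produced by lm() and glm(). Both helpers are hypothetical and only handle simple calls with a bare variable name as the first argument.

```r
# Matches terms such as "poly(hp, 3)2": captures the variable and the power
poly_pattern <- "^poly\\(([^,)]+),[^)]*\\)([0-9]+)$"

# Hypothetical helper: "poly(hp, 3)2" -> "hp^2"; other terms are unchanged
clean_poly_terms <- function(terms) {
  sub(poly_pattern, "\\1^\\2", terms)
}

# Hypothetical helper: "poly(hp, 3)2" -> "hp"; other terms are unchanged
poly_variable <- function(terms) {
  sub(poly_pattern, "\\1", terms)
}

terms <- c("poly(hp, 3)1", "poly(hp, 3)2", "poly(hp, 3)3", "wt")
clean_poly_terms(terms)  # "hp^1", "hp^2", "hp^3", "wt"
poly_variable(terms)     # "hp", "hp", "hp", "wt"
```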

`tidy_add_reference_rows()` erroneously adds reference row to interaction-only model

library(broom.helpers)
library(gtsummary)
#> #BlackLivesMatter

lm(age ~ factor(response):marker, trial) %>%
  tidy_and_attach() %>%
  tidy_identify_variables() %>%
  tidy_add_reference_rows() %>%
  knitr::kable()
| term | variable | var_class | var_type | contrasts | reference_row | estimate | std.error | statistic | p.value |
|---|---|---|---|---|---|---|---|---|---|
| (Intercept) | NA | NA | intercept | NA | NA | 46.6357738 | 1.632164 | 28.5729753 | 0.0000000 |
| factor(response)0:marker | factor(response):marker | NA | interaction | NA | NA | 0.3957856 | 1.507993 | 0.2624585 | 0.7932857 |
| factor(response)1:marker | factor(response):marker | NA | interaction | NA | NA | 0.1015807 | 1.653558 | 0.0614316 | 0.9510877 |
| factor(response)0 | factor(response) | NA | NA | NA | TRUE | NA | NA | NA | NA |

Created on 2020-09-03 by the reprex package (v0.3.0)

`show_single_row=` not working for categorical-continuous interaction

In the example below, I am requesting the interaction term "factor(response):marker" be printed on a single row, but it is being ignored.

library(broom.helpers)
library(gtsummary)

lm(age ~ factor(response) * marker, trial) %>%
  tidy_and_attach() %>%
  tidy_identify_variables() %>%
  tidy_add_reference_rows() %>%
  tidy_add_variable_labels() %>%
  tidy_add_header_rows(show_single_row = "factor(response):marker") %>%
  knitr::kable()
| term | variable | var_label | var_class | var_type | header_row | contrasts | reference_row | label | estimate | std.error | statistic | p.value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| (Intercept) | NA | (Intercept) | NA | intercept | NA | NA | NA | (Intercept) | 43.985685 | 1.906507 | 23.071342 | 0.0000000 |
| NA | factor(response) | factor(response) | factor | categorical | TRUE | contr.treatment | NA | factor(response) | NA | NA | NA | NA |
| factor(response)0 | factor(response) | factor(response) | factor | categorical | FALSE | contr.treatment | TRUE | 0 | NA | NA | NA | NA |
| factor(response)1 | factor(response) | factor(response) | factor | categorical | FALSE | contr.treatment | FALSE | 1 | 9.117623 | 3.536300 | 2.578294 | 0.0107814 |
| marker | marker | Marker Level (ng/mL) | numeric | continuous | NA | NA | NA | Marker Level (ng/mL) | 2.007188 | 1.609824 | 1.246836 | 0.2141828 |
| NA | factor(response):marker | factor(response) * Marker Level (ng/mL) | NA | interaction | TRUE | NA | NA | factor(response) * Marker Level (ng/mL) | NA | NA | NA | NA |
| factor(response)1:marker | factor(response):marker | factor(response) * Marker Level (ng/mL) | NA | interaction | FALSE | NA | NA | 1 * Marker Level (ng/mL) | -5.337195 | 2.647510 | -2.015930 | 0.0453914 |

Created on 2020-09-03 by the reprex package (v0.3.0)

`tidy_remove_intercept()` removes terms from model (when they are named horribly)

This is VERY much an edge case, but wanted to let you know. It seems that if a variable name has a + in it, tidy_remove_intercept() will remove both the intercept and the variable from the model. If this is a complicated fix, perhaps just a message to the user, "more than one row was removed from the table. possible error occurred likely due to unusual naming conventions used for terms."

library(gtsummary)

trial2 <- 
  trial %>% 
  dplyr::mutate(`treatment +name` = trt)


glm(response ~ `treatment +name`, 
    trial2, 
    family = binomial(link = "logit")) %>%
  broom.helpers::tidy_and_attach() %>%
  broom.helpers::tidy_remove_intercept()
#> # A tibble: 0 x 8
#> # ... with 8 variables: term <chr>, variable <chr>, var_class <chr>,
#> #   var_type <chr>, estimate <dbl>, std.error <dbl>, statistic <dbl>,
#> #   p.value <dbl>

Created on 2020-10-02 by the reprex package (v0.3.0)
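
A sketch of the suggested safeguard: compare the row counts before and after, and tell the user when more than one row disappeared. This is an illustration in base R that matches the intercept by exact term name; it makes no claim about how tidy_remove_intercept() is implemented, and the function name is hypothetical.

```r
# Hypothetical wrapper: drop the intercept by exact match on `term`
# and report when more than one row was removed
remove_intercept_checked <- function(x) {
  keep <- is.na(x$term) | x$term != "(Intercept)"
  out <- x[keep, , drop = FALSE]
  n_removed <- nrow(x) - nrow(out)
  if (n_removed > 1) {
    message(
      n_removed, " rows were removed from the table. ",
      "This may be due to unusual naming conventions used for terms."
    )
  }
  out
}

x <- data.frame(
  term = c("(Intercept)", "`treatment +name`Drug B"),
  estimate = c(-0.9, 0.2),
  stringsAsFactors = FALSE
)

remove_intercept_checked(x)$term
#> [1] "`treatment +name`Drug B"
```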

Add a function model_get_coefficients_type()

Inspired by gtsummary:::estimate_header(), add a function to identify model type and coefficient type.

An additional function tidy_identify_model_type() could add model_type and coefficient_type as attributes to the results.

It will be useful for the redesign of GGally::ggcoef

To @ddsjoberg , let me know if you think it could be relevant for gtsummary as well. I know that in gtsummary you also manage corresponding footnotes and translation. But I do not think that this last part is in the scope of broom.helpers.
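
As a starting point for discussion, a simplified sketch that inspects the model class and, for glm objects, the family and link. The function name and the returned labels are placeholders, not a proposal for the final API, and only a few model types are covered.

```r
# Hypothetical sketch: guess the type of coefficients from the model object
guess_coefficients_type <- function(model) {
  if (inherits(model, "glm")) {
    family <- model$family$family
    link <- model$family$link
    if (family == "binomial" && link == "logit") return("logistic")
    if (family == "poisson" && link == "log") return("poisson")
  }
  if (inherits(model, "coxph")) return("prop_hazard")
  "generic"
}

guess_coefficients_type(glm(am ~ hp, data = mtcars, family = binomial))
#> [1] "logistic"
guess_coefficients_type(lm(mpg ~ hp, data = mtcars))
#> [1] "generic"
```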

tidy_add_header_rows() error with continuous * categorical interaction

When tidy_add_header_rows() is run on the model below, the interaction term should be on two rows. It should have a header row with label column equal to factor(response) * Marker Level (ng/mL), and a second row with label column 1 * Marker Level (ng/mL) with the estimate.

library(gtsummary)
library(broom.helpers)
lm(age ~ factor(response) * marker, trial) %>%
  broom.helpers::tidy_and_attach() %>%
  broom.helpers::tidy_identify_variables() %>%
  broom.helpers::tidy_add_variable_labels() %>%
  broom.helpers::tidy_add_reference_rows() %>%
  broom.helpers::tidy_add_header_rows() %>%
  select(variable, var_type, var_label, label, estimate)
#> # A tibble: 6 x 5
#>   variable          var_type  var_label                 label           estimate
#>   <chr>             <chr>     <chr>                     <chr>              <dbl>
#> 1 <NA>              intercept (Intercept)               (Intercept)        44.0 
#> 2 factor(response)  categori~ factor(response)          factor(respons~    NA   
#> 3 factor(response)  categori~ factor(response)          0                  NA   
#> 4 factor(response)  categori~ factor(response)          1                   9.12
#> 5 marker            continuo~ Marker Level (ng/mL)      Marker Level (~     2.01
#> 6 factor(response)~ interact~ factor(response) * Marke~ 1 * Marker Lev~    -5.34

Created on 2020-09-01 by the reprex package (v0.3.0)

Error when identify variables run after remove intercept

There is a merging error when the remove intercept function is run before the identify variables function (there are two columns for var_nlevels).

library(broom.helpers)

lm(age ~ marker, gtsummary::trial) %>%
  tidy_and_attach() %>%
  tidy_remove_intercept() %>%
  tidy_identify_variables() # looks like a merging error (two cols for var_nlevels)
#> # A tibble: 1 x 10
#>   term  variable var_class var_type var_nlevels.x estimate std.error statistic
#>   <chr> <chr>    <chr>     <chr>            <int>    <dbl>     <dbl>     <dbl>
#> 1 mark~ marker   numeric   continu~            NA  -0.0545      1.26   -0.0434
#> # ... with 2 more variables: p.value <dbl>, var_nlevels.y <int>

Created on 2020-10-15 by the reprex package (v0.3.0)
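
The .x/.y suffixes suggest that a column already present in the tibble is being joined in a second time. The toy example below reproduces that mechanism with base `merge()` and shows one way to avoid it, by dropping the columns that are about to be re-created. It is an illustration of the pattern, not the package's actual code.

```r
tidy_tbl <- data.frame(term = "marker", estimate = -0.0545, var_nlevels = NA_integer_)
var_info <- data.frame(term = "marker", variable = "marker", var_nlevels = NA_integer_)

# Naive join: the shared non-key column is duplicated with suffixes
names(merge(tidy_tbl, var_info, by = "term"))
# "term", "estimate", "var_nlevels.x", "variable", "var_nlevels.y"

# Drop the columns that the join will re-create, then join
dup <- setdiff(intersect(names(tidy_tbl), names(var_info)), "term")
clean <- merge(tidy_tbl[setdiff(names(tidy_tbl), dup)], var_info, by = "term")
names(clean)
# "term", "estimate", "variable", "var_nlevels"
```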

Should variable be populated for intercept terms? or stay NA as current?

I started re-writing the broom.helpers section of tbl_regression() to use tidy_plus_plus() instead of the individual functions. One of the reasons to use plus-plus over a series of other tidy_*() functions is that it will be easier for me to give users access to the other arguments in tidy_plus_plus() so they can change the resulting table if they like (e.g. adding the informative contrast labels @gorkang is working on).

One sticking point is that I treat the intercept like a variable. For example, users can change the intercept label using tbl_regression(label = list("(Intercept)" ~ "b0", age ~ "Patient Age")). Is there a way where the gtsummary API does not change, and I can use tidy_plus_plus()?

My first thought was to simply have an option in tidy_identify_variables() that populates the intercept variable column with the term name. But I am not sure if this will cause problems with other subsequent functions. What do you think?
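
In base R terms, the option I have in mind would amount to something like the sketch below. The function name is made up, and whether later tidy_*() functions cope with a non-missing intercept variable is exactly the open question.

```r
# Hypothetical option: copy the term name into `variable` for intercept rows
fill_intercept_variable <- function(x) {
  is_intercept <- !is.na(x$var_type) & x$var_type == "intercept" & is.na(x$variable)
  x$variable[is_intercept] <- x$term[is_intercept]
  x
}

x <- data.frame(
  term = c("(Intercept)", "age"),
  variable = c(NA, "age"),
  var_type = c("intercept", "continuous"),
  stringsAsFactors = FALSE
)

fill_intercept_variable(x)$variable
#> [1] "(Intercept)" "age"
```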

Custom tidiers

Should we add a table to the vignette with a list of compatible models, with a note column to specify model-specific information about compatibility?

Consistency of args passed

This is so minor, but wanted to point it out just in case!

The tidy_plus_plus() fn accepts the arg conf.int= and also the ... which are passed to tidy_fun=. Is there a reason to include conf.int= here, but not in tidy_and_attach(), for example?

There are other common tidy arguments not included, e.g. exponentiate=. To be consistent, should conf.int= argument be removed?

Add a `model_get_model_frame()` method for mice objects

The mice package does not include a model.frame() method for the resulting regression models from multiply imputed data sets.

Would you be ok with adding one here? I need to look up the exact code, but it'll be something like this (I can add it if you're ok with it).

#' @export
#' @rdname model_get_model_frame
model_get_model_frame.mipo <- function(model) {
  # add check that the mice package is installed
  
  # grab input mice data

  # extract a single dataset for our use of finding labels
   mice::complete(...)
}

Improve error messaging

What is your opinion on improving the error messaging in situations like the one below: where the model is created within an apply() or map() setting and the stats::*() functions called on model objects fail.

library(tidyverse)
library(gtsummary)
#> #Uighur
library(survival)

# Set up map statement to create different models
tibble(grade = c("I", "II", "III")) %>%
  mutate(df_model = map(grade, ~ trial %>% filter(grade == ..1))) %>%
  mutate(
    mv_formula_char = "Surv(ttdeath, death) ~ trt + age + marker",
    mv_formula = map(mv_formula_char, ~ as.formula(.x)),
    mv_model_form =
      map2(
        mv_formula, df_model,
        ~ coxph(..1, data = ..2)
      ),
    mv_tbl_form =
      map(
        mv_model_form,
        ~ broom.helpers::tidy_plus_plus(..1, exponentiate = TRUE)
      )
  )
#> Error: Problem with `mutate()` input `mv_tbl_form`.
#> x the ... list contains fewer than 2 elements
#> i Input `mv_tbl_form` is `map(mv_model_form, ~broom.helpers::tidy_plus_plus(..1, exponentiate = TRUE))`.

Created on 2020-08-31 by the reprex package (v0.3.0)

In gtsummary, we added an error message like this: ddsjoberg/gtsummary#231

`tidy_add_variable_labels()` error with interaction only model

The model below only has an interaction term (no main effects), and the variable label is not correct.

library(gtsummary)
library(broom.helpers)
lm(age ~ factor(response):marker, trial) %>%
  broom.helpers::tidy_and_attach() %>%
  broom.helpers::tidy_identify_variables() %>%
  broom.helpers::tidy_add_variable_labels() %>%
  select(variable, var_type, var_label, estimate)
#> # A tibble: 3 x 4
#>   variable                var_type    var_label                 estimate
#>   <chr>                   <chr>       <chr>                        <dbl>
#> 1 <NA>                    intercept   (Intercept)                 46.6  
#> 2 factor(response):marker interaction NA * Marker Level (ng/mL)    0.396
#> 3 factor(response):marker interaction NA * Marker Level (ng/mL)    0.102

Created on 2020-09-01 by the reprex package (v0.3.0)

Also, if we add a tidy_add_term_labels() the label is also wrong, but in a different way.

library(gtsummary)
library(broom.helpers)
lm(age ~ factor(response):marker, trial) %>%
  broom.helpers::tidy_and_attach() %>%
  broom.helpers::tidy_identify_variables() %>%
  broom.helpers::tidy_add_term_labels() %>%
  broom.helpers::tidy_add_variable_labels() %>%
  select(variable, var_type, var_label, label, estimate)
#> # A tibble: 3 x 5
#>   variable               var_type    var_label               label      estimate
#>   <chr>                  <chr>       <chr>                   <chr>         <dbl>
#> 1 <NA>                   intercept   (Intercept)             (Intercep~   46.6  
#> 2 factor(response):mark~ interaction NA * Marker Level (ng/~ NA * NA       0.396
#> 3 factor(response):mark~ interaction NA * Marker Level (ng/~ NA * NA       0.102

Created on 2020-09-01 by the reprex package (v0.3.0)

Warning is printed for intercept only models

When I run an intercept-only model, the tibble is returned, but a warning is printed as well.

library(broom.helpers)
lm(mpg ~ 1, mtcars) %>%
  broom.helpers::tidy_and_attach() %>%
  broom.helpers::tidy_identify_variables() %>%
  broom.helpers::tidy_add_header_rows() 
#> Warning in min(.data$rank): no non-missing arguments to min; returning Inf
#> # A tibble: 1 x 12
#>   term  variable var_label var_class var_type header_row contrasts label
#>   <chr> <chr>    <chr>     <chr>     <chr>    <lgl>      <chr>     <chr>
#> 1 (Int~ <NA>     (Interce~ <NA>      interce~ NA         <NA>      (Int~
#> # ... with 4 more variables: estimate <dbl>, std.error <dbl>, statistic <dbl>,
#> #   p.value <dbl>

Created on 2020-09-01 by the reprex package (v0.3.0)
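
The warning text is what base R emits when `min()` receives no non-missing values. A guard along these lines would avoid it; `safe_min()` is a hypothetical helper, and where the rank is computed inside tidy_add_header_rows() is not shown here.

```r
# Hypothetical guard: return NA instead of warning when there is nothing to take the minimum of
safe_min <- function(x) {
  if (length(x) == 0 || all(is.na(x))) {
    return(NA_real_)
  }
  min(x, na.rm = TRUE)
}

safe_min(numeric(0))
#> [1] NA
safe_min(c(3, 1, NA))
#> [1] 1
```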

Add strict option for functions

As a developer, it would be helpful to have the option for some broom.helpers functions to fail when they cannot execute the requested action.

I am integrating broom.helpers into gtsummary now, and these two scenarios have come up so far:

  1. When I run broom.helpers::tidy_identify_variables() if the variables cannot be identified, I would like to be able to have the function error. As it is currently written, I would need to inspect the returned object to check if the variables were indeed identified.

  2. When I run broom.helpers::tidy_add_header_rows(show_single_row=) for a variable that cannot be put on a single row, I would also like to be able to have the function error.

Perhaps the arg could be something like tidy_plus_plus(strict=)? It would be similar to how purrr has pluck() and chuck().
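
Internally, a strict= argument could funnel through one small helper like the sketch below. The helper name and its return value are assumptions for illustration only.

```r
# Hypothetical internal helper: error when strict = TRUE, otherwise only message
report_problem <- function(msg, strict = FALSE) {
  if (isTRUE(strict)) {
    stop(msg, call. = FALSE)
  }
  message(msg)
  invisible(FALSE)
}

# Default behaviour: the user is informed and execution continues
report_problem("Unable to identify the list of variables.")

# Developer behaviour: the same situation becomes an error
result <- try(report_problem("Unable to identify the list of variables.", strict = TRUE), silent = TRUE)
inherits(result, "try-error")
#> [1] TRUE
```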

Include `var_label` in subsequent calls

If tidy_add_variable_labels() is run after tidy_add_reference_rows(), the labels are filled correctly.

library(broom.helpers)
library(gtsummary)
library(survival)

mod <- lm(ttdeath ~ grade, trial)

mod %>%
  tidy_and_attach() %>%
  tidy_identify_variables() %>%
  tidy_add_reference_rows() %>%
  tidy_add_variable_labels() %>%
  tidy_add_header_rows() %>%
  select(term, variable, var_label, label)
#> # A tibble: 5 x 4
#>   term        variable var_label   label      
#>   <chr>       <chr>    <chr>       <chr>      
#> 1 (Intercept) <NA>     (Intercept) (Intercept)
#> 2 <NA>        grade    Grade       Grade      
#> 3 grade_ref   grade    Grade       I          
#> 4 gradeII     grade    Grade       II         
#> 5 gradeIII    grade    Grade       III

But if the two functions are called in the opposite order, var_label is not filled for all rows associated with the variable.

mod %>%
  tidy_and_attach() %>%
  tidy_identify_variables() %>%
  tidy_add_variable_labels() %>%
  tidy_add_reference_rows() %>%
  tidy_add_header_rows() %>%
  select(term, variable, var_label, label)
#> # A tibble: 5 x 4
#>   term        variable var_label   label      
#>   <chr>       <chr>    <chr>       <chr>      
#> 1 (Intercept) <NA>     (Intercept) (Intercept)
#> 2 <NA>        grade    <NA>        <NA>       
#> 3 grade_ref   grade    <NA>        I          
#> 4 gradeII     grade    Grade       II         
#> 5 gradeIII    grade    Grade       III

I think it is fine for these functions to have an order dependency, but a note passed to the users would be helpful. Or even an error, as happens when tidy_add_variable_labels() is called after tidy_add_header_rows().
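
As an alternative to a note or an error, the functions that add rows could backfill var_label from the rows that already carry it. The following is only a base R sketch of that idea under stated assumptions: fill_var_label() is a hypothetical name, not a broom.helpers function, and the input simply mimics the variable and var_label columns of the second output above.

```r
# Hypothetical helper, NOT part of broom.helpers.
# Copies an existing var_label to rows of the same variable where it is missing.
fill_var_label <- function(x) {
  known <- !is.na(x$variable) & !is.na(x$var_label)
  lookup <- unique(x[known, c("variable", "var_label")])
  idx <- match(x$variable, lookup$variable)
  to_fill <- is.na(x$var_label) & !is.na(idx)
  x$var_label[to_fill] <- lookup$var_label[idx[to_fill]]
  x
}

# Toy version of the incomplete output shown above.
tbl <- data.frame(
  term = c("(Intercept)", NA, "grade_ref", "gradeII", "gradeIII"),
  variable = c(NA, "grade", "grade", "grade", "grade"),
  var_label = c("(Intercept)", NA, NA, "Grade", "Grade"),
  stringsAsFactors = FALSE
)

filled <- fill_var_label(tbl)
```

This sketch only repairs var_label. The label column of the header row would still need to be handled separately.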

Add tidy_select_variables()

This function will make it possible to keep only certain variables in the output.

Two arguments: keep and drop.

It should also be added to tidy_plus_plus().
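
A rough base R sketch of the intended behaviour follows. The semantics of keep and drop are assumptions made for illustration (character vectors of variable names matched against the variable column), and select_variables_sketch() is a hypothetical stand-in, not the proposed function itself.

```r
# Hypothetical sketch of the proposed keep/drop behaviour.
# `keep` and `drop` are assumed to be character vectors of variable names.
select_variables_sketch <- function(x, keep = NULL, drop = NULL) {
  selected <- rep(TRUE, nrow(x))
  # With `keep`, only rows of the listed variables remain. Rows with a
  # missing variable (e.g. the intercept) are dropped in this sketch.
  if (!is.null(keep)) selected <- selected & x$variable %in% keep
  # With `drop`, rows of the listed variables are removed.
  if (!is.null(drop)) selected <- selected & !(x$variable %in% drop)
  x[selected, , drop = FALSE]
}

# Toy tidy table.
tidy_tbl <- data.frame(
  term = c("(Intercept)", "age", "stageT2", "stageT3"),
  variable = c(NA, "age", "stage", "stage"),
  stringsAsFactors = FALSE
)

only_stage <- select_variables_sketch(tidy_tbl, keep = "stage")
without_stage <- select_variables_sketch(tidy_tbl, drop = "stage")
```

Whether the intercept should survive a keep selection is a design choice that the real function would have to settle.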

Wrap `model.frame()` in `tryCatch()`?

It might be a good idea to wrap the call to stats::model.frame(model) in model_get_model_frame.R in a tryCatch(), in case the regression model does not have a method for it (as with mice models).
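
A minimal sketch of such a wrapper is shown below. The name safe_model_frame() and the choice to return NULL on failure are assumptions for illustration, and the object of class "unsupported_model" is a made-up stand-in for a model type without a model.frame() method.

```r
# Hypothetical wrapper: returns the model frame, or NULL if it cannot be built.
safe_model_frame <- function(model) {
  tryCatch(
    stats::model.frame(model),
    error = function(e) {
      message("model.frame() failed: ", conditionMessage(e))
      NULL
    }
  )
}

# A model with a model.frame() method.
fit <- lm(mpg ~ factor(cyl), data = mtcars)
mf_ok <- safe_model_frame(fit)

# A made-up object for which model.frame() cannot work.
unsupported <- structure(list(), class = "unsupported_model")
mf_missing <- safe_model_frame(unsupported)
```

Downstream code would then need to handle a NULL model frame explicitly.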

Release broom.helpers 1.1.0

Prepare for release:

  • devtools::check(remote = TRUE, manual = TRUE)
  • revdepcheck::revdep_reset()
  • revdepcheck::revdep_check(num_workers = 4)
  • Polish NEWS
  • Polish pkgdown reference index

Submit to CRAN:

  • usethis::use_version()
  • Update cran-comments.md
  • devtools::submit_cran() (CRAN team on vacation until August 24)
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • usethis::use_github_release()
  • Remove file CRAN-RELEASE
  • usethis::use_dev_version()

Customize categorical term labels with a glue pattern

Maybe a feature that could be added to broom.helpers (and therefore also implemented in gtsummary) would be a function tidy_rename_categorical_terms() that lets you do the kind of renaming you want, but after the model has been computed, at the moment the table is built. For example:

mod %>% tidy_and_attach() %>% tidy_rename_categorical_terms(pattern = "{variable} [{term}-{reference}]")
You would be able to choose whatever pattern you want.

Cf. ddsjoberg/gtsummary#677

Note: a second argument should allow to select which variables to rename.
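
The core of the idea can be sketched with glue::glue_data(), which evaluates a pattern once per row of a data frame. Everything else here is assumed for illustration: rename_terms_sketch() is a hypothetical name, the reference column is an assumed column holding the reference level, and term holds the level name rather than the model term.

```r
library(glue)

# Hypothetical sketch: builds a label for each row from a glue pattern.
rename_terms_sketch <- function(x, pattern) {
  # glue_data() looks up {variable}, {term}, ... in the columns of `x`.
  x$label <- as.character(glue::glue_data(x, pattern))
  x
}

# Toy table; `reference` is an assumed column, not one broom.helpers returns.
terms_tbl <- data.frame(
  variable = c("grade", "grade"),
  term = c("II", "III"),
  reference = c("I", "I"),
  stringsAsFactors = FALSE
)

renamed <- rename_terms_sketch(terms_tbl, "{variable} [{term}-{reference}]")
renamed$label
```

A second argument selecting which variables to rename, as suggested in the note above, could restrict the rows to which the pattern is applied.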

Header row missing after running `tidy_plus_plus()`

The header row for cyl is missing when using tidy_plus_plus(), but the documentation indicates it should have been added.

library(broom.helpers)
# no header row for cyl 
lm(mpg ~ factor(cyl), mtcars) %>%
  tidy_plus_plus()
#> # A tibble: 3 x 14
#>   term  variable var_class var_type estimate std.error statistic   p.value
#>   <chr> <chr>    <chr>     <chr>       <dbl>     <dbl>     <dbl>     <dbl>
#> 1 fact~ factor(~ factor    categor~    NA        NA        NA    NA       
#> 2 fact~ factor(~ factor    categor~    -6.92      1.56     -4.44  1.19e- 4
#> 3 fact~ factor(~ factor    categor~   -11.6       1.30     -8.90  8.57e-10
#> # ... with 6 more variables: conf.low <dbl>, conf.high <dbl>, contrasts <chr>,
#> #   reference_row <lgl>, var_label <chr>, label <chr>

# has header row
lm(mpg ~ factor(cyl), mtcars) %>%
  tidy_and_attach() %>%
  tidy_add_reference_rows() %>%
  tidy_add_header_rows() 
#> # A tibble: 5 x 13
#>   term  variable var_class var_type estimate std.error statistic   p.value
#>   <chr> <chr>    <chr>     <chr>       <dbl>     <dbl>     <dbl>     <dbl>
#> 1 (Int~ <NA>     <NA>      interce~    26.7      0.972     27.4   2.69e-22
#> 2 <NA>  factor(~ factor    categor~    NA       NA         NA    NA       
#> 3 fact~ factor(~ factor    categor~    NA       NA         NA    NA       
#> 4 fact~ factor(~ factor    categor~    -6.92     1.56      -4.44  1.19e- 4
#> 5 fact~ factor(~ factor    categor~   -11.6      1.30      -8.90  8.57e-10
#> # ... with 5 more variables: contrasts <chr>, reference_row <lgl>,
#> #   var_label <chr>, label <chr>, header_row <lgl>

Created on 2020-08-17 by the reprex package (v0.3.0)

Add quiet option

Any function that prints messages should have a quiet= option. This could be helpful to devs who do not want the broom.helpers messages to print.
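
As an illustration, the pattern could look like the sketch below. The function add_rows_sketch() and its message text are hypothetical, and count_messages() is only a small helper used here to show the difference between the two modes.

```r
# Hypothetical function following the proposed pattern: messages are
# emitted with message() and skipped entirely when quiet = TRUE.
add_rows_sketch <- function(x, quiet = FALSE) {
  if (!quiet) message("Adding reference rows.")
  x
}

# Helper for the demonstration: counts (and muffles) messages raised by `expr`.
count_messages <- function(expr) {
  n <- 0
  withCallingHandlers(
    expr,
    message = function(m) {
      n <<- n + 1
      invokeRestart("muffleMessage")
    }
  )
  n
}

tbl <- data.frame(term = "gradeII", stringsAsFactors = FALSE)
n_default <- count_messages(add_rows_sketch(tbl))
n_quiet <- count_messages(add_rows_sketch(tbl, quiet = TRUE))
```

Until such an argument exists, callers can wrap calls in suppressMessages(), although that also hides messages from other packages.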
