Code Monkey home page Code Monkey logo

rsimsum's Introduction

{rsimsum}: Analysis of Simulation Studies Including Monte Carlo Error

R-CMD-check Codecov Test Coverage CRAN Status CRAN Logs Monthly Downloads CRAN Logs Total Downloads JOSS DOI Lifecycle: Stable

rsimsum is an R package that can compute summary statistics from simulation studies. rsimsum is modelled upon a similar package available in Stata, the user-written command simsum (White I.R., 2010).

The aim of rsimsum is to help to report simulation studies, including understanding the role of chance in results of simulation studies: Monte Carlo standard errors and confidence intervals based on them are computed and presented to the user by default. rsimsum can compute a wide variety of summary statistics: bias, empirical and model-based standard errors, relative precision, relative error in model standard error, mean squared error, coverage, bias. Further details on each summary statistic are presented elsewhere (White I.R., 2010; Morris et al, 2019).

The main function of rsimsum is called simsum and can handle simulation studies with a single estimand of interest at a time. Missing values are excluded by default, and it is possible to define boundary values to drop estimated values or standard errors exceeding such limits. It is possible to define a variable representing methods compared with the simulation study, and it is possible to define by factors, that is, factors that vary between the different simulated scenarios (data-generating mechanisms, DGMs). However, methods and DGMs are not strictly required: in that case, a simulation study with a single scenario and a single method is assumed. Finally, rsimsum provides a function named multisimsum that allows summarising simulation studies with multiple estimands as well.

An important step of reporting a simulation study consists in visualising the results; therefore, rsimsum exploits the R package ggplot2 to produce a portfolio of opinionated data visualisations for quick exploration of results, inferring colours and facetting by data-generating mechanisms. rsimsum includes methods to produce (1) plots of summary statistics with confidence intervals based on Monte Carlo standard errors (forest plots, lolly plots), (2) zipper plots to graphically visualise coverage by directly plotting confidence intervals, (3) plots for method-wise comparisons of estimates and standard errors (scatter plots, Bland-Altman plots, ridgeline plots), and (4) heat plots. The latter is a visualisation type that has not been traditionally used to present results of simulation studies, and consists in a mosaic plot where the factor on the x-axis is the methods compared with the current simulation study and the factor on the y-axis is the data-generating factors. Each tile of the mosaic plot is coloured according to the value of the summary statistic of interest, with a red colour representing values above the target value and a blue colour representing values below the target.

Installation

You can install rsimsum from CRAN:

install.packages("rsimsum")

Alternatively, it is possible to install the development version from GitHub using the remotes package:

# install.packages("remotes")
remotes::install_github("ellessenne/rsimsum")

Example

This is a basic example using data from a simulation study on missing data (type help("MIsim", package = "rsimsum") in the R console for more information):

library(rsimsum)
data("MIsim", package = "rsimsum")
s <- simsum(data = MIsim, estvarname = "b", true = 0.5, se = "se", methodvar = "method", x = TRUE)
#> 'ref' method was not specified, CC set as the reference
s
#> Summary of a simulation study with a single estimand.
#> True value of the estimand: 0.5 
#> 
#> Method variable: method 
#>  Unique methods: CC, MI_LOGT, MI_T 
#>  Reference method: CC 
#> 
#> By factors: none
#> 
#> Monte Carlo standard errors were computed.

We set x = TRUE as it will be required for some plot types.

Summarising the results:

summary(s)
#> Values are:
#>  Point Estimate (Monte Carlo Standard Error)
#> 
#> Non-missing point estimates/standard errors:
#>    CC MI_LOGT MI_T
#>  1000    1000 1000
#> 
#> Average point estimate:
#>      CC MI_LOGT   MI_T
#>  0.5168  0.5009 0.4988
#> 
#> Median point estimate:
#>      CC MI_LOGT   MI_T
#>  0.5070  0.4969 0.4939
#> 
#> Average variance:
#>      CC MI_LOGT   MI_T
#>  0.0216  0.0182 0.0179
#> 
#> Median variance:
#>      CC MI_LOGT   MI_T
#>  0.0211  0.0172 0.0169
#> 
#> Bias in point estimate:
#>               CC         MI_LOGT             MI_T
#>  0.0168 (0.0048) 0.0009 (0.0042) -0.0012 (0.0043)
#> 
#> Relative bias in point estimate:
#>               CC         MI_LOGT             MI_T
#>  0.0335 (0.0096) 0.0018 (0.0083) -0.0024 (0.0085)
#> 
#> Empirical standard error:
#>               CC         MI_LOGT            MI_T
#>  0.1511 (0.0034) 0.1320 (0.0030) 0.1344 (0.0030)
#> 
#> % gain in precision relative to method CC:
#>               CC          MI_LOGT             MI_T
#>  0.0000 (0.0000) 31.0463 (3.9375) 26.3682 (3.8424)
#> 
#> Mean squared error:
#>               CC         MI_LOGT            MI_T
#>  0.0231 (0.0011) 0.0174 (0.0009) 0.0181 (0.0009)
#> 
#> Model-based standard error:
#>               CC         MI_LOGT            MI_T
#>  0.1471 (0.0005) 0.1349 (0.0006) 0.1338 (0.0006)
#> 
#> Relative % error in standard error:
#>                CC         MI_LOGT             MI_T
#>  -2.6594 (2.2055) 2.2233 (2.3323) -0.4412 (2.2695)
#> 
#> Coverage of nominal 95% confidence interval:
#>               CC         MI_LOGT            MI_T
#>  0.9430 (0.0073) 0.9490 (0.0070) 0.9430 (0.0073)
#> 
#> Bias-eliminated coverage of nominal 95% confidence interval:
#>               CC         MI_LOGT            MI_T
#>  0.9400 (0.0075) 0.9490 (0.0070) 0.9430 (0.0073)
#> 
#> Power of 5% level test:
#>               CC         MI_LOGT            MI_T
#>  0.9460 (0.0071) 0.9690 (0.0055) 0.9630 (0.0060)

Vignettes

rsimsum comes with 5 vignettes. In particular, check out the introductory one:

vignette(topic = "A-introduction", package = "rsimsum")

The list of vignettes could be obtained by typing the following in the R console:

vignette(package = "rsimsum")

Visualising results

As of version 0.2.0, rsimsum can produce a variety of plots: among others, lolly plots, forest plots, zipper plots, etc.:

library(ggplot2)
autoplot(s, type = "lolly", stats = "bias")

autoplot(s, type = "zip")

With rsimsum 0.5.0 the plotting functionality has been completely rewritten, and new plot types have been implemented:

  • Scatter plots for method-wise comparisons, including Bland-Altman type plots;
autoplot(s, type = "est_ba")
#> `geom_smooth()` using formula = 'y ~ x'

  • Ridgeline plots.
autoplot(s, type = "est_ridge")
#> Picking joint bandwidth of 0.0295

Nested loop plots have been implemented in rsimsum 0.6.0:

data("nlp", package = "rsimsum")
s.nlp <- rsimsum::simsum(
  data = nlp, estvarname = "b", true = 0, se = "se",
  methodvar = "model", by = c("baseline", "ss", "esigma")
)
#> 'ref' method was not specified, 1 set as the reference
autoplot(s.nlp, stats = "bias", type = "nlp")

Finally, as of rsimsum 0.7.1 contour plots and hexbin plots have been implemented as well:

autoplot(s, type = "est_density")
#> `geom_smooth()` using formula = 'y ~ x'

autoplot(s, type = "est_hex")
#> `geom_smooth()` using formula = 'y ~ x'

They provide a useful alternative when there are several data points with large overlap (e.g. in a scatterplot).

The plotting functionality now extend the S3 generic autoplot: see ?ggplot2::autoplot and ?rsimsum::autoplot.simsum for further details.

More details and information can be found in the vignettes dedicated to plotting:

vignette(topic = "C-plotting", package = "rsimsum")
vignette(topic = "D-nlp", package = "rsimsum")

Citation

If you find rsimsum useful, please cite it in your publications:

citation("rsimsum")
#> To cite package 'rsimsum' in publications use:
#> 
#>   Gasparini, (2018). rsimsum: Summarise results from Monte Carlo simulation studies.
#>   Journal of Open Source Software, 3(26), 739, https://doi.org/10.21105/joss.00739
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Article{,
#>     author = {Alessandro Gasparini},
#>     title = {rsimsum: Summarise results from Monte Carlo simulation studies},
#>     journal = {Journal of Open Source Software},
#>     year = {2018},
#>     volume = {3},
#>     issue = {26},
#>     pages = {739},
#>     doi = {10.21105/joss.00739},
#>     url = {https://doi.org/10.21105/joss.00739},
#>   }

References

  • White, I.R. 2010. simsum: Analyses of simulation studies including Monte Carlo error. The Stata Journal, 10(3): 369-385
  • Morris, T.P., White, I.R. and Crowther, M.J. 2019. Using simulation studies to evaluate statistical methods. Statistics in Medicine, 38: 2074-2102
  • Gasparini, A. 2018. rsimsum: Summarise results from Monte Carlo simulation studies. Journal of Open Source Software, 3(26):739

rsimsum's People

Contributors

ellessenne avatar kaladani avatar lorenzo-guizzaro avatar mllg avatar samperochkin avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

rsimsum's Issues

When using the `by` argument all factorial combinations of the elements must exist in the summary dataset else an error is thrown.

Is your feature request related to a problem?
When using the by argument all factorial combinations of the elements must exist in the summary dataset else an error is thrown.

But occasionally we are in a situation where not all combinations exist (it may be that certain combinations of the factors are not of interest). In my situation it is that we want to plot just the subset where factor A is not equal to factor B

Below is a contrived situation where we are not interested in settings where ss=100 and esigma=2. This throws an error.

nlp.subset <- nlp %>% filter(!(ss==100 & esigma==2))
s.nlp.subset <- rsimsum::simsum(
  data = nlp.subset, estvarname = "b", true = 0, se = "se",
  methodvar = "model", by = c("baseline", "ss", "esigma")
)

Describe the solution you'd like
It would be great if we could get an nlp plot for only the combination of factors that exist in the dataset that is passed to simsum.

Describe alternatives you've considered
Currently I'm unable to use a nlp plot for subsets of settings which are not factorial in nature.

Zip plot is missing/overlapping confidence intervals.

Describe the bug
image
Notice there are some overlapping or seemingly missing CIs. I suspect it's because of this snippet of code in the plot-types.R (line 128-137)

### Compute ranking for each data split
  for (i in seq_along(data)) {
    for (j in seq_along(data[[i]])) {
      A <- rank(data[[i]][[j]][["z"]])
      B <- max(A)
      data[[i]][[j]][["rank"]] <- A / B
    }
    data[[i]] <- .br(data[[i]])
  }
  data <- .br(data)

To Reproduce
I'll use a sampled data set from my simulation.

alpha_hat <- c(0.656359464634274,0.526704505967593,0.475676356717568,0.611939636587922,0.433070092974109,0.531604887583618,0.3642027920398,0.531331114468528,0.588184525037812,0.540509288564473,0.630490278895853,0.419239614571934,0.311393213642424,0.567859683275186,0.475294354930879,0.335069477933224,0.393376022987897,0.488597053641829,0.44111704925641,0.557230281119142,0.535521802048021,0.360641641250493,0.376916394535027,0.700009157933441,0.555590820626339,0.432729743038088,0.47602591651468,0.407165879254845,0.648791431736419,0.402439346836696,0.422020613532311,0.461372936266911,0.610453889026044,0.496020984616336,0.761399591470052,0.510093438756691,0.458627498094725,0.437950861818385,0.421842676858636,0.443855078787984,0.50969516325107,0.587861788774241,0.291916330122512,0.434125117893412,0.433454777024997,0.511627050589271,0.459155155336153,0.429396260785516,0.456188743738326,0.515155142417543,0.563332002383864,0.371271603813827,0.419740675799457,0.333860698365792,0.550980951167078,0.360815702482087,0.500727639504405,0.60203471591322,0.306630734081198,0.269764664128963,0.510332439298986,0.59799154560772,0.558552615764875,0.431083460127381,0.45976278550078,0.470787444713912,0.452324069240977,0.703791145336808,0.647075248755021,0.832873099582439,0.495935534495554,0.706556551771328,0.583224205575491,0.687724839249629,0.601849868297012,0.360831845580547,0.478834996706738,0.599706040542091,0.397495413439359,0.421958750530489,0.345298899443421,0.390183319076788,0.414163909071519,0.320115727483855,0.441042037469589,0.381943055461818,0.321881286547786,0.400309390417769,0.495307622132507,0.45342339815497,0.550015327079968,0.409066439214623,0.576445248812615,0.466516714928062,0.234683566538901,0.459852842421986,0.597533187100067,0.403086064443031,0.451341483478344,0.666399673328944,0.234458230000808,0.478848118448268,0.624614855114165,0.522972041510484,0.535158133007044,0.548072254219487,0.537181720821365,0.393642353453084,0.326127227582349,0.430601003638648,0.436634844609421,0.535971402274416,0.476503450580366,0.395516117032172,0.296819555672007,0.588739836506621,0.450412961897042,0.668888542609458,0.367039351829482,0.517593633344588,0.546332564525076,0.433356156007874,0.361247659864539,0.522646057074194,0.701978649558848,0.282893148712807,0.368777137029908,0.385916892048012,0.478428169180524,0.556735336661497,0.461782338107608,0.240889539175967,0.450442556936949,0.50395907501438,0.594767998824096,0.376631000454423,0.548667042325616,0.619626506280488,0.373297291489086,0.327423398242716,0.535797694529746,0.620001370086969,0.478428693647047,0.267380682893406,0.664936740364319,0.320958716760762,0.574130125264438,0.51683659026773,0.383530662423678,0.653676024926081,0.400056675628522,0.465787304179851,0.628506483741265,0.37416834008451,0.359278109711074,0.525553364014043,0.453795930785715,0.492027397507949,0.553506356133726,0.501756925805646,0.484465875220644,0.462224849453566,0.484083035949571,0.491544869953084,0.424319468574546,0.460388785526096,0.35218640663993,0.540480282117303,0.62107684275757,0.644100839717473,0.440324403747399,0.387972431408694,0.498022570557973,0.515264882653275,0.329038890471325,0.351338762916926,0.415220904394264,0.546514921861891,0.365011517557668,0.465330645969789,0.240349001740987,0.516012353672969,0.570196827720793,0.467949716267977,0.791813963666541,0.446265816875625,0.456659982044231,0.581921696646041,0.501456806088658,0.433932103817059,0.47819598563163,0.469649715706258,0.514535210015505,0.480760313244703,0.4826917981395,0.54125148252936,0.484310901487429,0.467584743003643,0.469866032133875,0.541475708763287)
alpha_se <- c(0.0877388154467468,0.0880965779931617,0.0871032776606541,0.0870952699388111,0.0860859772099564,0.0844096916262812,0.0847779508620272,0.0876910246069201,0.0874591696788071,0.0881192091567087,0.0888768086820969,0.0863118464597798,0.0831056940197959,0.0873145486068549,0.0841897599464942,0.0855609263238139,0.0857505076762107,0.0864398693651952,0.0852459947844958,0.0861676550845069,0.0853344112023075,0.0845581886908714,0.084562556477393,0.0887632345833936,0.0887708595029702,0.0834631767827538,0.0864141915945574,0.0862500598468023,0.0882076406070766,0.0852975369513632,0.0856892173201958,0.087308748973989,0.0872976041035144,0.0881334747993714,0.0908928205894117,0.0872485147785255,0.0868809047244885,0.0847561448220887,0.0860052294288993,0.0850672333592025,0.085545193545447,0.0878336356036802,0.082985657881651,0.0840846711437843,0.0858425442979095,0.0850820661220407,0.0852488529344636,0.0840337622612644,0.0852160043336433,0.0886245308952232,0.0885179064768261,0.0841318627048585,0.0864231415812021,0.0855925114070241,0.0885779377746889,0.0859240099529746,0.0867139916183447,0.0867056742865064,0.0850604611239242,0.0831182679445331,0.0847818967973359,0.086424804446052,0.0867490104857623,0.0848188685273843,0.0852141839616433,0.0861225047009766,0.0849519514593007,0.0877433604870608,0.0885318639357468,0.090695995893096,0.0869244122056348,0.0916963923068694,0.086388462540225,0.0908382152929422,0.0882429173575183,0.0847401873923082,0.0849225515128795,0.0875193377590677,0.0875068315575047,0.0852585816971395,0.08385447113872,0.0839859962066267,0.0852962831929812,0.0855989508915641,0.0858309887352697,0.0849566737404435,0.0839556902000267,0.0863805769063671,0.0849294951555101,0.0855723290403614,0.0864596848120583,0.0858742741180604,0.0899018655735256,0.0866704583505727,0.085091797765269,0.0844344369609142,0.0884664929065622,0.0845884380924548,0.0852570862815936,0.0895219654024349,0.0831617430271058,0.0869228284332124,0.087229139976361,0.0892452217149429,0.0880488652500995,0.0873585438532981,0.0857115173412019,0.085533600915808,0.0823869937765032,0.0841550814699975,0.0852316279093306,0.0862300147752962,0.0870915712316277,0.0876293288949928,0.0838083790323801,0.0872049493275479,0.0876130958680178,0.0873840948470912,0.0866888628464038,0.0848829346886492,0.0846121169369514,0.0856475790230622,0.084865955471184,0.0877627675089955,0.0883307166164756,0.0833776956154428,0.0845355334393845,0.0852651502251628,0.0871530694056119,0.0869473322272491,0.0856248393598056,0.0842310396350257,0.0851942616840299,0.0865559373810271,0.0863320837281201,0.0852141830301668,0.0863202108429177,0.087425506536314,0.087071603627396,0.0840080187145488,0.0851211925609332,0.0872372179847335,0.0872616328043539,0.0843151946801957,0.0905852973313057,0.0854785124063171,0.0885656377192217,0.0870511884010271,0.083701386255851,0.0889173373612042,0.0847990463514356,0.085206286226956,0.089164236315406,0.0854715264099306,0.0844259774674939,0.0872937976564537,0.0850787819383495,0.0874717112662698,0.0879503296185651,0.0856681094812232,0.0847651870036599,0.083889226966191,0.0897029284772413,0.0861083382370717,0.0835363826507567,0.0853061291719781,0.0839888077409142,0.0883860899643039,0.0882127469767168,0.0871186903772599,0.0839470030628867,0.0853619698527822,0.0860274629386376,0.0882814209472338,0.0837348518904601,0.0844389033173324,0.0846287395937193,0.0870233729653114,0.0846554097159615,0.0858466598604219,0.0825044485516709,0.0846911472880101,0.0866654946222681,0.0849506678475894,0.0898233360807913,0.0839726457354748,0.0856256441265963,0.0870032488395095,0.0852893933550954,0.0849672160314836,0.0848249319771885,0.0876146908692248,0.0866914277354994,0.0869564398271178,0.0854766334806529,0.0854470399926068,0.0855142243100405,0.08786827774222,0.0836169828656378,0.0847226220963862)
df <- data.frame(alpha_hat, alpha_se)
ss <- simsum(sample_data, estvarname = "alpha_hat", se = "alpha_se", true = 1, x = T)
autoplot(ss, "zip")

Expected behaviour
Here's my basic implementation using the percent_rank() function.

df %>%
  mutate(alpha_empse = sd(alpha_hat),
         alpha_ci_lower = alpha_hat - qnorm(0.975) * alpha_se,
         alpha_ci_upper = alpha_hat + qnorm(0.975) * alpha_se,
         alpha_z_score = (alpha_hat - 1) / alpha_se) %>%
  mutate(covering = alpha_ci_lower < 1 & 1 < alpha_ci_upper) %>%
  mutate(ci_percent = percent_rank(abs(alpha_z_score))) %>%
  ggplot(aes(y = ci_percent, x = alpha_hat, color = covering)) +
  geom_linerange(aes(xmin = alpha_ci_lower, xmax = alpha_ci_upper))

image

# OR
df %>%
  mutate(alpha_empse = sd(alpha_hat),
         alpha_ci_lower = alpha_hat - qnorm(0.975) * alpha_se,
         alpha_ci_upper = alpha_hat + qnorm(0.975) * alpha_se,
         alpha_z_score = (alpha_hat - 1) / alpha_se) %>%
  mutate(covering = alpha_ci_lower < 1 & 1 < alpha_ci_upper) %>%
  mutate(ci_rank = rank(abs(alpha_z_score))) %>%
  mutate(ci_percent = ci_rank / max(ci_rank)) %>%
  ggplot(aes(y = ci_percent, x = alpha_hat, color = covering)) +
  geom_linerange(aes(xmin = alpha_ci_lower, xmax = alpha_ci_upper))

image

System information:

R version 4.1.2 (2021-11-01)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 11.5.2

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dplyr_1.0.7    ggplot2_3.3.5  rsimsum_0.11.1

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.7       pillar_1.6.4     compiler_4.1.2   plyr_1.8.6       tools_4.1.2      digest_0.6.28   
 [7] lifecycle_1.0.1  tibble_3.1.5     gtable_0.3.0     checkmate_2.0.0  pkgconfig_2.0.3  rlang_0.4.12    
[13] cli_3.1.0        DBI_1.1.1        rstudioapi_0.13  xfun_0.28        fastmap_1.1.0    withr_2.4.2     
[19] knitr_1.36       generics_0.1.0   vctrs_0.3.8      grid_4.1.2       tidyselect_1.1.1 glue_1.4.2      
[25] R6_2.5.1         fansi_0.5.0      clipr_0.8.0      farver_2.1.0     purrr_0.3.4      magrittr_2.0.1  
[31] scales_1.1.1     backports_1.2.1  ggridges_0.5.3   ellipsis_0.3.2   htmltools_0.5.2  assertthat_0.2.1
[37] colorspace_2.0-2 labeling_0.4.2   utf8_1.2.2       munsell_0.5.0    crayon_1.4.2   

User-provided, replication-wise confidence intervals for coverage

Idea from the simulation course, t-test example: passing the t-based confidence intervals as columns in data:

simsum(data, ..., ci.limits = c("column.lower", "column.upper"), ...)

This is compatible with the current functionality, in which case the column should have the same values for each replication.

Missing parameter names in multisimsum with true values as a column in data

Need to re-think how to identify parameter names from column in data, e.g. the following is broken:

library(rsimsum)
data("frailty", package = "rsimsum")
frailty$true <- ifelse(frailty$par == "trt", -0.50, 0.75)
ms <- multisimsum(data = frailty, par = "par", estvarname = "b", true = "true")
ms
#> 
#> Estimands variable: par 
#>  Unique estimands: fv, trt 
#>  True values: fv = true, trt = NA 
#> 
#> Method variable: none
#> 
#> By factors: none
#> 
 #> Monte Carlo standard errors were computed.

Created on 2020-08-17 by the reprex package (v0.3.0)

Release rsimsum 0.11.1

Prepare for release:

  • Check current CRAN check results
  • Polish NEWS
  • devtools::build_readme()
  • urlchecker::url_check()
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • rhub::check_for_cran()
  • revdepcheck::revdep_check(num_workers = 4)
  • Update cran-comments.md
  • Check on macOS Builder
  • Close #38

Submit to CRAN:

  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • usethis::use_github_release()
  • usethis::use_dev_version()

Release rsimsum 0.11.3

Prepare for release:

  • git pull
  • Check current CRAN check results
  • Polish NEWS
  • devtools::build_readme()
  • urlchecker::url_check()
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • rhub::check_for_cran()
  • revdepcheck::revdep_check(num_workers = 4)
  • Update cran-comments.md
  • git push

Submit to CRAN:

  • usethis::use_version('patch')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • git push
  • usethis::use_github_release()
  • usethis::use_dev_version()
  • make docs
  • git push

Release rsimsum 0.10.0

Prepare for release:

  • Check current CRAN check results
  • Polish NEWS
  • devtools::build_readme()
  • urlchecker::url_check()
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • rhub::check_for_cran()
  • revdepcheck::revdep_check(num_workers = 4)
  • Update cran-comments.md
  • Review pkgdown reference index for, e.g., missing topics
  • Draft blog post

Submit to CRAN:

  • usethis::use_version('minor')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • usethis::use_github_release()
  • usethis::use_dev_version()
  • Finish blog post
  • Tweet
  • Add link to blog post in pkgdown news menu
  • Close this issue 👍

true variable cannot also be in by

We might want a situation where the true parameter value is varying across scenario settings

Currently simsum gives a warning message (rather than an error) if the true variable is also in by. But the nlp plot does not work.

A contrived example below is shown where esigma is assumed to be the true value of the parameter (estvarname)

s.nlp.true <- rsimsum::simsum(
  data = nlp, estvarname = "b", true = "esigma", se = "se",
  methodvar = "model", by = c("baseline", "ss", "esigma")
)
autoplot(s.nlp.true, stats = "bias", type = "nlp")

Work-around
A work-around is to create a copy of the true parameter.

nlp$esigma.copy <- nlp$esigma
s.nlp.true2 <- rsimsum::simsum(
  data = nlp, estvarname = "b", true = "esigma.copy", se = "se",
  methodvar = "model", by = c("baseline", "ss", "esigma")
)
autoplot(s.nlp.true2, stats = "bias", type = "nlp")

Relative bias

Add relative bias as a performance measure, including its Monte Carlo error.

Error 403 for some URLs

Need to fix some URLs:

> urlchecker::url_check()
✖ Error: vignettes/A-introduction.Rmd:335:157 403: Forbidden
* White, I.R., and P. Royston. 2009. _Imputing missing covariate values for the Cox model_. Statistics in Medicine 28(15):1982-1998 <[doi:10.1002/sim.3618](https://doi.org/10.1002/sim.3618)>
                                                                                                                                                            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Error: README.md:319:28 403: Forbidden
  \<[doi:10.1002/sim.8086](https://doi.org/10.1002/sim.8086)\>
                           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Error: vignettes/A-introduction.Rmd:334:162 403: Forbidden
* Morris, T.P., White, I.R. and Crowther, M.J. 2019. _Using simulation studies to evaluate statistical methods_. Statistics in Medicine, <[doi:10.1002/sim.8086](https://doi.org/10.1002/sim.8086)>
                                                                                                                                                                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Error: vignettes/A-introduction.Rmd:336:139 403: Forbidden
* Little, R.J.A., and D.B. Rubin. 2002. _Statistical analysis with missing data_. 2nd ed. Hoboken, NJ: Wiley <[doi:10.1002/9781119013563](https://onlinelibrary.wiley.com/doi/book/10.1002/9781119013563)>
                                                                                                                                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

autoplot() with type = "est_density"

This warning appeared when rebuilding the README file, need to look into it:

autoplot(s, type = "est_density")
#> Warning: `stat(level)` was deprecated in ggplot2 3.4.0.
#> ℹ Please use `after_stat(level)` instead.
#> ℹ The deprecated feature was likely used in the rsimsum package.
#>   Please report the issue at <https://github.com/ellessenne/rsimsum/issues>.
#> `geom_smooth()` using formula = 'y ~ x'

Release rsimsum 0.13.0

Prepare for release:

  • git pull
  • Check current CRAN check results
  • Check if any deprecation processes should be advanced, as described in Gradual deprecation
  • Polish NEWS
  • usethis::use_github_links()
  • urlchecker::url_check()
  • devtools::build_readme()
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • revdepcheck::revdep_check(num_workers = 4)
  • make docs
  • make pre_submission_test
  • Update cran-comments.md
  • git push

Submit to CRAN:

  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • usethis::use_dev_version(push = TRUE)

Improve control functionality

I should re-implement the control logic, e.g., see the implementation of survival::coxph.control:

survival::coxph.control
#> function (eps = 1e-09, toler.chol = .Machine$double.eps^0.75, 
#>     iter.max = 20, toler.inf = sqrt(eps), outer.max = 10, timefix = TRUE) 
#> {
#>     if (iter.max < 0) 
#>         stop("Invalid value for iterations")
#>     if (eps <= 0) 
#>         stop("Invalid convergence criteria")
#>     if (eps <= toler.chol) 
#>         warning("For numerical accuracy, tolerance should be < eps")
#>     if (toler.inf <= 0) 
#>         stop("The inf.warn setting must be >0")
#>     if (!is.logical(timefix)) 
#>         stop("timefix must be TRUE or FALSE")
#>     list(eps = eps, toler.chol = toler.chol, iter.max = as.integer(iter.max), 
#>         toler.inf = toler.inf, outer.max = as.integer(outer.max), 
#>         timefix = timefix)
#> }
#> <bytecode: 0x7fda8182a3c8>
#> <environment: namespace:survival>

Created on 2023-05-27 with reprex v2.0.2

Bug in autoplot

library(rsimsum)
library(ggplot2)
data("tt", package = "rsimsum")
s6 <- simsum(data = tt, estvarname = "diff", se = "se", true = -1, x = TRUE)
autoplot(s6, type = "est")
#> Error in .subset2(x, i, exact = exact): attempt to select less than one element in get1index

Created on 2020-02-28 by the reprex package (v0.3.0)

Passing a column of p-values for calculating power

Suggested by @Kaladani (in #33):

I'm glad I could help.
As an idea for the the next update (post 0.1.0) you might want to consider to allow to add a column of p-values by the user.
This way, power could be calculated even if the model is using neither z- nor t-test.
This might be particularly useful to allow for e.g. rejection rate of model fit and comparison tests.
It would also align stronger to Morris et. a. 2019 definition of power/rejection estimation.

Clarifications within Introduction vignette

Issue for JOSS review.

Below are some items I feel would benefit from some additional clarification in the vignette to ensure users are comfortable with using the simsum function and understanding the output.

  • The data from the first example are described well, however there was very little discussion of how to use the simsum function except for showing the function used with the MIsim data. For example, how do we know the true value is 0.5 or -0.5, why is ref = "CC" and not specified in the second example, etc?
  • I found the discussion of the output from simsum to be relatively brief which leaves a lot of burden on the user to understand the output coming from the function with no discussion. I believe some discussion of the output would improve the user experience and confidence in understanding how to interpret the specific statistics.
  • I wonder if a table would be useful for what the values in the "stat" column represent after running the get_data function.
  • From the vignette and simsum documentation, it is not completely clear to me the difference between the methodvar and by arguments. I believe the main distinction seems to be within vs between simulation conditions (this was gathered from the JOSS paper), however from the documentation itself, I was unsure.

Plotting functions

Issue for JOSS review.

A few questions/clarifications I had regarding the plotting output and functions.

  • The pattern and zip plot functions do not need to specify the by argument, but the other plotting functions do. Is it possible to reuse the by argument from the simsum output object? This would be more consistent across the plotting functions and I also think would be easier for users to not have to reuse the same arguments across functions.
  • Is the sample size order in the figures customizable? In some of the figures from the plotting vignette they go from smallest to largest, but others go from largest to smallest. It'd be great if this could be customizable, otherwise being consistent would be second best.
  • Specific to the plotting vignette. I understand what "multiple estimands" means for the multsimsum function, however to ensure the vignette is more approachable to a variety of audiences, I would consider adding synonyms for the word estimands used in other fields.

Equations not rendering in Vignettes

Issue for JOSS review.

The equations are currently not rendering in any of the vignettes. This occurred when I used install.packages("rsimsum") and installing directly from GitHub with devtools::install_github("ellessenne/rsimsum", build_vignettes = TRUE).

This makes the introduction vignette particularly difficult and also makes it difficult to understand the computations the package does without users digging into the code.

Statement of Need

Issue for JOSS review.

I found the statement of need well articulated in the JOSS paper itself, however I wonder if this would be useful to also articulate in the README of the software repository. Users are probably more likely to see the package README first compared to the JOSS paper, therefore being more clear about the uses and goals of the package in the README is beneficial in my mind.

Release rsimsum 0.10.1

Prepare for release:

Submit to CRAN:

  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • usethis::use_github_release()
  • usethis::use_dev_version()

Conflicts with {broom}'s tidy()

The following doesn't recognise {rsimsum}'s tidy() function if the {broom} package is loaded:

library(broom)
library(rsimsum)

data("MIsim", package = "rsimsum")
s <- rsimsum::simsum(
  data = MIsim, estvarname = "b", true = 0.5, se = "se",
  methodvar = "method", x = TRUE
)
#> 'ref' method was not specified, CC set as the reference

library(ggplot2)
autoplot(s, type = "lolly")
#> Error: No tidy method recognized for this list.

Created on 2021-11-01 by the reprex package (v2.0.1)

Issue when number of simulations per data-generating mechanism are not of equal size

Describe the bug

Hi - thanks for the excellent package! I think there is an issue when running simsum() / multisimsum() on a dataset from a simulation study when nsim differs per data-generating mechanism.

To Reproduce

Here is an example with the frailty dataset from the package:

# Load library and data
library(rsimsum)
data("frailty", package = "rsimsum")

# Duplicate last row and and to data
frailty <- rbind.data.frame(frailty, frailty[nrow(frailty), ])

# Check nsim
with(frailty, table(model, interaction(fv_dist, par)))

model               Gamma.fv Log-Normal.fv Gamma.trt Log-Normal.trt
  Cox, Gamma            1000          1000      1000           1000
  Cox, Log-Normal       1000          1000      1000           1000
  RP(P), Gamma          1000          1000      1000           1000
  RP(P), Log-Normal     1000          1000      1000           1001

Attempt with multisimsum() :

ms1 <- multisimsum(
  data = frailty,
  par = "par", 
  true = c(trt = -0.50, fv = 0.75),
  estvarname = "b",
  se = "se",
  ref = "Cox, Gamma",
  methodvar = "model",
  by = "fv_dist"
)

Error in stats::cor(data[[i]][[ref]][[estvarname]], data[[i]][[x]][[estvarname]],  : 
  incompatible dimensions

So it is getting stuck here it seems..

Expected behaviour

Should return a multisimsum object with correct nsim values.

System information:

R version 4.0.3 (2020-10-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17134)

Matrix products: default

locale:
[1] LC_COLLATE=Dutch_Netherlands.1252  LC_CTYPE=Dutch_Netherlands.1252   
[3] LC_MONETARY=Dutch_Netherlands.1252 LC_NUMERIC=C                      
[5] LC_TIME=Dutch_Netherlands.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rsimsum_0.9.1

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.5       dplyr_1.0.2      crayon_1.3.4     plyr_1.8.6       grid_4.0.3      
 [6] R6_2.5.0         ggridges_0.5.2   lifecycle_0.2.0  gtable_0.3.0     backports_1.2.0 
[11] magrittr_2.0.1   scales_1.1.1     pillar_1.4.7     ggplot2_3.3.2    rlang_0.4.9     
[16] rstudioapi_0.13  generics_0.1.0   vctrs_0.3.6      checkmate_2.0.0  ellipsis_0.3.1  
[21] tools_4.0.3      glue_1.4.2       purrr_0.3.4      munsell_0.5.0    compiler_4.0.3  
[26] pkgconfig_2.0.3  colorspace_2.0-0 tidyselect_1.1.0 tibble_3.0.4

Additional context

I think it would be a welcome fix since the number of simulated datasets will likely vary if for example you vary the sample size per scenario, in which case lower values of sample size would need larger nsim to achieve the same Monte-Carlo error for some parameter :-)

Thanks!
-Ed

Release rsimsum 0.11.0

Prepare for release:

  • Check current CRAN check results
  • Polish NEWS
  • devtools::build_readme()
  • urlchecker::url_check()
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • rhub::check_for_cran()
  • Check on R Mac Builder
  • revdepcheck::revdep_check(num_workers = 4)
  • Update cran-comments.md
  • Review pkgdown reference index for, e.g., missing topics
  • Draft blog post

Submit to CRAN:

  • usethis::use_version('minor')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • usethis::use_github_release()
  • usethis::use_dev_version()
  • Finish blog post
  • Tweet
  • Add link to blog post in pkgdown news menu

Discrepancy between code and formula in paper

Hi- thank you for this extremely helpful package! I have noticed a discrepancy between the following line of code:

if (!is.null(se)) relerror_mcse <- 100 * (modelse / empse) * sqrt(se2_var / (4 * nsim * modelse^4) + 1 / (2 * nsim - 1))

and the corresponding formula in Table 6 of the Morris et al. Statistics in Medicine paper. To match what's written in the paper, the line of code would need to be
if (!is.null(se)) relerror_mcse <- 100 * (modelse / empse) * sqrt(se2_var / (4 * nsim * modelse^4) + 1 / (2 * (nsim - 1))) (parens around the final "nsim - 1"). I don't know which one is correct, but I'm guessing it's the one in the paper.

Issue reported by CRAN regarding @docType

As per e-mail from CRAN:

[...]

You have file 'rsimsum/man/rsimsum.Rd' with \docType{package}, likely
intended as a package overview help file, but without the appropriate
PKGNAME-package \alias as per "Documenting packages" in R-exts.

This seems to be the consequence of the breaking change

 Using @docType package no longer automatically adds a -package alias.
 Instead document _PACKAGE to get all the defaults for package
 documentation.

in roxygen2 7.0.0 (2019-11-12) having gone unnoticed, see
<https://github.com/r-lib/roxygen2/issues/1491>.

As explained in the issue, to get the desired PKGNAME-package \alias
back, you should either change to the new approach and document the new
special sentinel

 "_PACKAGE"

or manually add

 @aliases rsimsum-package

if remaining with the old approach.

[...]

estvarname cannot be named est

Describe the bug
estvarname cannot be called "est". An easy fix is just to call it something else, but am reporting this anyway for completeness!

data("nlp", package = "rsimsum")
nlp.rename <- nlp %>% rename(est = b)
s.nlp.estvarname <- rsimsum::simsum(
  data = nlp.rename, estvarname = "est", true = 0, se = "se",
  methodvar = "model", by = c("baseline", "ss", "esigma")
)

Multiple columns identifying methods

Multiple columns identifying methods could be collapsed into a single column and be processed like that:

data[[".method"]] <- do.call(interaction, as.list(data[methodvar]))

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.