indrajeetpatil / ggstatsplot

Enhancing {ggplot2} plots with statistical analysis 📊📣

Home Page: https://indrajeetpatil.github.io/ggstatsplot/

License: GNU General Public License v3.0

R 89.77% TeX 9.73% Shell 0.09% Makefile 0.41%
bayes-factors datascience dataviz effect-size ggplot-extension hypothesis-testing non-parametric-statistics r regression-models statistical-analysis

ggstatsplot's Introduction

About Me

I am a software developer with a passion for data science and full-stack web development. With several years of experience in building data-driven products, I have acquired a deep understanding of how to create robust and scalable software solutions that solve business problems. I have a proven track record of delivering high-quality, user-friendly software solutions to real-world problems; my open-source software packages have been downloaded over 40 million times. Additionally, I have 10+ years of experience in using data science tools to draw insights from complex experimental datasets. My skills include statistical modelling, experimental design, data analysis, visualization, and communication. I have 20+ peer-reviewed publications with 8000+ citations and 2 online books.

Software development

I have authored over a dozen R packages that deal with:

  • statistical analysis and reporting (e.g. easystats project, a collection of 10 packages)
  • data visualization (e.g. ggstatsplot: Enhancing {ggplot2} plots with statistical analysis)
  • code analysis (lintr: Static code analysis for R)
  • code formatting (styler: Non-invasive pretty-printing of R code)

Other authored packages include:

datawizard, performance, parameters, insight, effectsize, bayestestR, modelbased, ggsignif, see, statsExpressions, report, correlation, styler, ospsuite, ospsuite.utils, esqlabsR

Presentations

If you are interested in good programming and software development practices, check out my slide decks.

ggstatsplot's People

Contributors

antoinesoetewey, csoneson, danheck, dependabot[bot], emilhvitfeldt, hbaniecki, ibecav, indrajeetpatil, mikemahoney218, wibeasley

ggstatsplot's Issues

Bug in gghistostats

Hi,

Love this package. Forked to learn and study more. Encountered a bug in gghistostats that generates:

Error in as.list.environment(x, all.names = TRUE) :
object 'len' not found

This happens for what should be identical behavior between the data frame and non-data-frame (vector) input formats. I tried both the CRAN and GitHub versions of the code. I'll put some sleuthing into it, but I have only just started looking at your code, which is quite complex.

Thank you. Chuck

# insert reprex here
library(ggstatsplot)
# Minimum reproducible example
gghistostats(
  x = ToothGrowth$len,
  xlab = "Tooth length",
  bar.measure = "mix"
)
gghistostats(
  data = ToothGrowth,
  x = len, 
  xlab = "Tooth length",
  bar.measure = "mix"
)
gghistostats(
  data = ToothGrowth,
  x = len, 
  xlab = "Tooth length"
)

`theme_wsj()` doesn't work with ggstatsplot layer

# doesn't work
ggstatsplot::ggscatterstats(
  data = iris,
  x = Sepal.Length,
  y = Petal.Width,
  ggtheme = ggthemes::theme_wsj(),
  ggstatsplot.layer = TRUE
)
#> Error in unit(rep(just$hjust, n), "npc"): 'x' and 'units' must have length > 0

# works
ggstatsplot::ggscatterstats(
  data = iris,
  x = Sepal.Length,
  y = Petal.Width,
  ggtheme = ggthemes::theme_wsj(),
  ggstatsplot.layer = FALSE
)
#> Warning: The plot is not a `ggplot` object and therefore can't be further modified with `ggplot2` functions.
#> 

Created on 2018-09-25 by the reprex package (v0.2.1)

Typo in explanation test

Please briefly describe your problem and what output you expect. If you have a question, please don't use this form. Instead, ask on https://stackoverflow.com/ or https://community.rstudio.com/.

Please include a minimal reproducible example (AKA a reprex). If you've never heard of a reprex before, start by reading https://www.tidyverse.org/help/#reprex.


Brief description of the problem

"t-tets" is written instead of "t-test" in last line of first paragraph


standardizing regression coefficients

@ibecav Opening a new issue to discuss how to standardize regression coefficients.

The existing functions that do this:

The latter is the most general in the sense that it can work with any model object.

I think we should do something similar to by_2sd: write a stand-alone function (maybe with S3 methods) that takes regression model objects and outputs a tidy dataframe with standardized estimates and their confidence intervals (using broom::tidy()/broom.mixed::tidy() in the backend).

I still think this issue should be given a low priority compared to writing tests, because there is still the option of rescaling variables, which can alleviate resolution issues. I am a bit bummed that all results from the ggcoefstats function with merMod objects are incorrect in the CRAN vignettes due to the broom.mixed bug (bbolker/broom.mixed#30). If there were tests, this would have been caught immediately.
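The stand-alone function proposed above could look roughly like the following base R sketch. It is only an illustration for plain `lm` models: `standardize_lm()` is a hypothetical name, it standardizes by refitting on z-scored variables, and it does not use broom or handle mixed models, factors, or interactions specially.

```r
# Hypothetical helper (not part of ggstatsplot): refit an lm on z-scored
# variables and return standardized estimates with confidence intervals.
standardize_lm <- function(formula, data, conf.level = 0.95) {
  vars <- all.vars(formula)
  scaled <- data[, vars, drop = FALSE]

  # z-score every numeric variable that appears in the formula
  is_num <- vapply(scaled, is.numeric, logical(1))
  scaled[is_num] <- lapply(scaled[is_num], function(v) as.numeric(scale(v)))

  fit <- stats::lm(formula, data = scaled)
  ci <- stats::confint(fit, level = conf.level)

  out <- data.frame(
    term = names(stats::coef(fit)),
    estimate = unname(stats::coef(fit)),
    conf.low = unname(ci[, 1]),
    conf.high = unname(ci[, 2]),
    stringsAsFactors = FALSE
  )

  # the intercept is zero by construction after z-scoring, so drop it
  out[out$term != "(Intercept)", ]
}

std <- standardize_lm(mpg ~ hp + wt, data = mtcars)
print(std)
```

For an additive model like this one, each standardized estimate should equal the raw coefficient multiplied by sd(x) / sd(y), which is a quick way to sanity-check the output.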

When marginal=TRUE on ggscatterstats, graph does not display when you run the chunk within an R notebook

This issue appears to be specific to running a chunk within an R notebook.

When I run ggscatterstats within a chunk in an R notebook with marginal=TRUE, I get the error "Warning: This function doesn't return ggplot2 object and is not further modifiable with ggplot2 commands."

Here is my code:

library(ggstatsplot)
ggscatterstats(cars,speed,dist,marginal=TRUE)

I should note I also tried with messages=FALSE. That suppressed the message, but did not render the plot.

And for clarity, here is a screenshot of it not working with marginal, and working without marginal.

image

I love those histograms, so I hope there is a way to do this in notebooks!

Feature request ggpiestats

I'd like to suggest that we add to ggpiestats something similar to test.k for gghistostats, that is, a way to change the number of decimal places shown for the labels on the pie slices. Right now it just displays a rounded integer.
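A minimal sketch of the label formatting being requested, in base R. The helper name `format_perc()` and its `k` argument are hypothetical and only illustrate the idea; this is not how ggpiestats builds its labels.

```r
# Hypothetical helper: turn proportions into percentage labels with
# `k` digits after the decimal point (k = 0 mimics the rounded integers).
format_perc <- function(prop, k = 0) {
  paste0(formatC(prop * 100, format = "f", digits = k), "%")
}

props <- c(0.3754, 0.6246)

labels_default <- format_perc(props)        # integer percentages
labels_k2 <- format_perc(props, k = 2)      # two decimal places

print(labels_default)
print(labels_k2)
```

Using formatC() with format = "f" keeps trailing zeros, so every label has the same number of decimals.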

bug in pairwise_p for Student's t test comparisons

set.seed(123)
library(ggstatsplot)

# works properly (with the defaults)
pairwise_p(movies_wide,
           mpaa,
           rating,
           var.equal = FALSE)
#> Note: The parametric pairwise multiple comparisons test used-
#>  Games-Howell test.
#>  Adjustment method for p-values: holm
#> 
#> # A tibble: 3 x 11
#>   group1 group2 mean.difference conf.low conf.high    se t.value    df
#>   <chr>  <chr>            <dbl>    <dbl>     <dbl> <dbl>   <dbl> <dbl>
#> 1 PG-13  R                0.219    0.064     0.375 0.047   3.31  1142.
#> 2 PG-13  PG              -0.104   -0.362     0.154 0.077   0.952  309.
#> 3 R      PG              -0.323   -0.573    -0.074 0.075   3.05   277.
#> # ... with 3 more variables: p.value <dbl>, significance <chr>,
#> #   p.value.label <chr>

# doesn't work properly
pairwise_p(movies_wide,
           mpaa,
           rating,
           var.equal = TRUE)
#> Warning: Expected 2 pieces. Additional pieces discarded in 2 rows [1, 3].
#> Note: The parametric pairwise multiple comparisons test used-
#>  Student's t-test.
#>  Adjustment method for p-values: holm
#> 
#> # A tibble: 5 x 8
#>   group1 group2 mean.difference conf.low conf.high  p.value significance
#>   <chr>  <chr>            <dbl>    <dbl>     <dbl>    <dbl> <chr>       
#> 1 PG     13               0.104  -0.140      0.348 NA       <NA>        
#> 2 R      PG               0.323   0.0944     0.552  0.00283 **          
#> 3 R      PG               0.219   0.0570     0.381  0.00283 **          
#> 4 PG-13  PG              NA      NA         NA      0.316   ns          
#> 5 R      PG-13           NA      NA         NA      0.00310 **          
#> # ... with 1 more variable: p.value.label <chr>

Created on 2018-12-14 by the reprex package (v0.2.1)

This issue is specific to dataframes where x factor levels have a - in their name, which messes with the following code (esp. L331-335):

if (isTRUE(var.equal) || isTRUE(paired)) {
  df <-
    dplyr::full_join(
      # mean difference and its confidence intervals
      x = stats::aov(formula = y ~ x, data = data) %>%
        stats::TukeyHSD(x = .) %>%
        broom::tidy(x = .) %>%
        dplyr::select(
          .data = .,
          comparison, estimate, conf.low, conf.high
        ) %>%
        # splitting on every "-" breaks level names that contain a hyphen
        tidyr::separate(
          data = .,
          col = comparison,
          into = c("group1", "group2"),
          sep = "-"
        ) %>%
        dplyr::rename(.data = ., mean.difference = estimate),
      # adjusted p-values from the pairwise t-tests
      y = broom::tidy(
        stats::pairwise.t.test(
          x = data$y,
          g = data$x,
          p.adjust.method = p.adjust.method,
          paired = paired,
          alternative = "two.sided",
          na.action = na.omit
        )
      ) %>%
        ggstatsplot::signif_column(data = ., p = p.value),
      by = c("group1", "group2")
    )
}
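One possible way around the hyphen problem, sketched in base R with made-up data: instead of parsing the comparison strings, build the group pairs from the factor levels and match them against the TukeyHSD row names. This is only an illustration of the idea, not the fix that was applied in the package.

```r
set.seed(123)

# toy data whose factor levels contain a hyphen, as in the mpaa example
dat <- data.frame(
  x = factor(rep(c("PG", "PG-13", "R"), each = 10)),
  y = rnorm(30)
)

# TukeyHSD labels each comparison as "<level j>-<level i>"
tuk <- stats::TukeyHSD(stats::aov(y ~ x, data = dat))[["x"]]
comparisons <- rownames(tuk)

# naive splitting on "-" gives three pieces whenever a level has a hyphen
n_pieces <- lengths(strsplit(comparisons, "-", fixed = TRUE))

# build every pair from the levels, then recreate the same labels
pairs <- utils::combn(levels(dat$x), 2)
lookup <- data.frame(
  comparison = paste(pairs[2, ], pairs[1, ], sep = "-"),
  group1 = pairs[2, ],
  group2 = pairs[1, ],
  stringsAsFactors = FALSE
)

# attach the estimates by label, so no string parsing is needed
res <- merge(
  lookup,
  data.frame(
    comparison = comparisons,
    mean.difference = unname(tuk[, "diff"]),
    stringsAsFactors = FALSE
  ),
  by = "comparison"
)
print(res)
```

Because the labels are generated from the levels rather than split apart, hyphens inside level names no longer matter.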

Problem installing ggstatplot - NON ZERO EXIT status

Hi,

As prescribed,
tried installing ggstatsplot from CRAN
but get the error message (below).
Same error
if I try to install from GITHUB ("If you have time" install code)

ERROR messages at the end of install...

  • installing source package ‘rgl’ ...
    ** package ‘rgl’ successfully unpacked and MD5 sums checked
    checking for gcc... gcc -std=gnu99
    checking whether the C compiler works... yes
    checking for C compiler default output file name... a.out
    checking for suffix of executables...
    checking whether we are cross compiling... no
    checking for suffix of object files... o
    checking whether we are using the GNU C compiler... yes
    checking whether gcc -std=gnu99 accepts -g... yes
    checking for gcc -std=gnu99 option to accept ISO C89... none needed
    checking how to run the C preprocessor... gcc -std=gnu99 -E
    checking for gcc... (cached) gcc -std=gnu99
    checking whether we are using the GNU C compiler... (cached) yes
    checking whether gcc -std=gnu99 accepts -g... (cached) yes
    checking for gcc -std=gnu99 option to accept ISO C89... (cached) none needed
    checking for libpng-config... yes
    configure: using libpng-config
    configure: using libpng dynamic linkage
    checking for X... libraries , headers
    checking GL/gl.h usability... no
    checking GL/gl.h presence... no
    checking for GL/gl.h... no
    checking GL/glu.h usability... no
    checking GL/glu.h presence... no
    checking for GL/glu.h... no
    configure: error: missing required header GL/gl.h
    ERROR: configuration failed for package ‘rgl’
  • removing ‘/home/ray/R/i686-pc-linux-gnu-library/3.5/rgl’
    Error in i.p(...) :
    (converted from warning) installation of package ‘rgl’ had non-zero exit status

And so, ggstatsplot is not installed...
Help!
SFd99
San Francisco

Using the latest RStudio with R 3.5.1 on Ubuntu Linux 14.04 (32-bit).

Package Installation without .rmd files

Hi - my employer blocks .rmd files and this unfortunately prevents me from being able to install the package; I was wondering if there was a way to install it without these - not sure how this would work, but figured I would ask. Thanks for your time.

this is a snippet of the error: /README.Rmd': Permission denied

Shapiro-Wilk test

When I try to plot between-group statistics with ggbetweenstats, it fails for samples larger than 5000 because of the Shapiro-Wilk test:

dat <- data.frame(x = c(rep(1,2500),rep(2,2501)), y = rnorm(5001))
ggbetweenstats(data = dat, x = x, y = y)
Warning:  aesthetic `x` was not a factor; converting it to factor
Reference:  Welch's t-test is used as a default. (Delacre, Lakens, & Leys, International Review of Social Psychology, 2017).
Error in stats::shapiro.test(data$y) :
  sample size must be between 3 and 5000

This could be fixed easily by using ks.test(y, "pnorm") instead.
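A rough sketch of the suggested fallback, in base R. `normality_test()` is a hypothetical helper, not a ggstatsplot function. Note one caveat: when the mean and standard deviation are estimated from the same sample, the Kolmogorov-Smirnov p-value is conservative, so this is a pragmatic workaround rather than an exact equivalent of the Shapiro-Wilk test.

```r
# Hypothetical helper: Shapiro-Wilk where it is defined (3 <= n <= 5000),
# Kolmogorov-Smirnov against a fitted normal distribution otherwise.
normality_test <- function(y) {
  y <- y[!is.na(y)]
  n <- length(y)

  if (n < 3) {
    stop("need at least 3 non-missing values")
  }

  if (n <= 5000) {
    stats::shapiro.test(y)
  } else {
    stats::ks.test(y, "pnorm", mean = mean(y), sd = stats::sd(y))
  }
}

set.seed(123)
small_res <- normality_test(rnorm(100))
large_res <- normality_test(rnorm(5001))

print(small_res$method)
print(large_res$method)
```

With the sample from the report (5001 observations), the second branch is taken and the call no longer errors.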

Feature request for new function ggbarstats

So I find piecharts very appealing for looking at univariate situations, although there are many who dislike them even when you add labels as you have. But as soon as you move to bivariate cases especially when one of the variables has multiple factor levels, I (and my students more so) have a hard time interpreting multiple pie charts depicting the relationship between two variables.

So, looking at the usual Titanic example, I can imagine a function that is very similar in nature but shifts to percentage bars with labels.

You'll notice all I have really done is take hunks of your current code and change the call to ggplot in one small way...

library(ggstatsplot)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(ggplot2)
ggpiestats(Titanic_full, main = Survived, condition = Class, title = "Test")
#> Note: Results from faceted one-sample proportion tests:
#> 
#> # A tibble: 4 x 7
#>   condition No     Yes    `Chi-squared`    df `p-value` significance
#>   <fct>     <chr>  <chr>          <dbl> <dbl>     <dbl> <chr>       
#> 1 1st       37.54% 62.46%         20.2      1     0     ***         
#> 2 2nd       58.60% 41.40%          8.43     1     0.004 **          
#> 3 3rd       74.79% 25.21%        174.       1     0     ***         
#> 4 Crew      76.05% 23.95%        240.       1     0     ***         
#> Note: 95% CI for Cramer's V was computed with 25 bootstrap samples.
#> 

# this is the crucial piece of ggpiestats defunctionalized
Titanic_full %>%
  group_by(Class,Survived) %>%
  summarize(counts = n()) %>% 
  mutate(perc = (counts / sum(counts)))-> tempdf
# only real changes are geom bar and percent y axis
ggplot(tempdf, aes(fill=Survived, y=perc, x=Class)) +
  geom_bar(stat="identity", position="fill") +
  ylab("Percent") +
  scale_y_continuous(labels = scales::percent, breaks = seq(0, 1, by = 0.10)) +
  geom_label(aes(label = paste0(round(x = perc*100, digits = 1), "%")), show.legend = FALSE, position = position_fill(vjust = 0.5)) +
  ggtitle("test", subtitle = subtitle_contigency_tab(Titanic_full,Class,Survived))
#> Note: 95% CI for Cramer's V was computed with 25 bootstrap samples.
#> 

Created on 2018-10-26 by the reprex package (v0.2.1)
Thanks for considering

unable to further modify plots w/ ggplot2 syntax

Love the package, thanks!

However, I'm not having any luck modifying a grouped_ggbetweenstats plot with subsequent + ggplot code, see the example below. I'm an intermediate coder at best so maybe missing something obvious? In any case it works with the ungrouped version but doesn't seem to pass the command to the two plots in the grouped version. Thanks for all your work on this!


Brief description of the problem

# generate some data
d1 <- data.frame(target=rep(c('boy','girl'),100),
                 rating=rnorm(100),
                 gender=rep(c('boy','boy','girl','girl'),25))

# these work as expected

ggbetweenstats(data=d1,
               x=target,y=rating) +
  scale_y_continuous(breaks=seq(-3,3,.5))

ggbetweenstats(data=d1,
               x=target,y=rating) +
  ggplot2::labs(subtitle = NULL)

# but don't do anything in the grouped_versions

grouped_ggbetweenstats(data=d1,
                       x=target,y=rating, grouping.var = gender,
                       title.prefix = 'Participant Gender') + ggplot2::labs(subtitle = NULL)

grouped_ggbetweenstats(data=d1,
                       x=target,y=rating, grouping.var = gender,
                       title.prefix = 'Participant Gender') +
  scale_y_continuous(breaks=seq(-3,3,.5))

Bug in lm_effsize_ci

There's a bug in lm_effsize_ci that is limited to the case where the input object is of type anova and you specify partial = FALSE. Note that partial = TRUE works.

Reprex

library(ggstatsplot)
# works as it should
ggstatsplot:::lm_effsize_ci((lm(mpg ~ hp * wt, mtcars)), effsize = "eta", partial = FALSE)
#> # A tibble: 3 x 8
#>   term  F.value   df1   df2  p.value  etasq conf.low conf.high
#>   <chr>   <dbl> <int> <int>    <dbl>  <dbl>    <dbl>     <dbl>
#> 1 hp      146.      1    28 1.23e-12 0.602   0.472       0.772
#> 2 wt       54.5     1    28 4.86e- 8 0.224   0.0666      0.357
#> 3 hp:wt    14.1     1    28 8.11e- 4 0.0580 -0.00395     0.123
# works as it should
ggstatsplot:::lm_effsize_ci((aov(mpg ~ hp * wt, mtcars)), effsize = "eta", partial = FALSE)
#> # A tibble: 3 x 8
#>   term  F.value   df1   df2  p.value  etasq conf.low conf.high
#>   <chr>   <dbl> <dbl> <dbl>    <dbl>  <dbl>    <dbl>     <dbl>
#> 1 hp      146.      1    28 1.23e-12 0.602   0.475       0.763
#> 2 wt       54.5     1    28 4.86e- 8 0.224   0.0734      0.363
#> 3 hp:wt    14.1     1    28 8.11e- 4 0.0580 -0.00767     0.118
# fails even though anova and aov are just different aspect of the same analysis
ggstatsplot:::lm_effsize_ci(anova(aov(mpg ~ hp * wt, mtcars)), effsize = "eta", partial = FALSE)
#> Warning in anova.lm(model): ANOVA F-tests on an essentially perfect fit are
#> unreliable

#> (the same warning is printed 6 times in total)
#> Error in mutate_impl(.data, dots): Evaluation error: Result 2 is not a length 1 atomic vector.
# fails for lm as well
ggstatsplot:::lm_effsize_ci(anova(lm(mpg ~ hp * wt, mtcars)), effsize = "eta", partial = FALSE)
#> Warning in anova.lm(model): ANOVA F-tests on an essentially perfect fit are
#> unreliable

#> (the same warning is printed 43 times in total)
#> Error in mutate_impl(.data, dots): Evaluation error: Result 2 is not a length 1 atomic vector.
# but if I remove partial = FALSE it works for the next four
ggstatsplot:::lm_effsize_ci((lm(mpg ~ hp * wt, mtcars)), effsize = "eta", partial = FALSE)
#> # A tibble: 3 x 8
#>   term  F.value   df1   df2  p.value  etasq conf.low conf.high
#>   <chr>   <dbl> <int> <int>    <dbl>  <dbl>    <dbl>     <dbl>
#> 1 hp      146.      1    28 1.23e-12 0.602   0.473       0.755
#> 2 wt       54.5     1    28 4.86e- 8 0.224   0.0793      0.365
#> 3 hp:wt    14.1     1    28 8.11e- 4 0.0580 -0.00410     0.115
ggstatsplot:::lm_effsize_ci(anova(aov(mpg ~ hp * wt, mtcars)), effsize = "eta")
#> # A tibble: 3 x 8
#>   term  F.value   df1   df2  p.value partial.etasq conf.low conf.high
#>   <chr>   <dbl> <int> <int>    <dbl>         <dbl>    <dbl>     <dbl>
#> 1 hp      146.      1    28 1.23e-12         0.839   0.699      0.892
#> 2 wt       54.5     1    28 4.86e- 8         0.661   0.414      0.773
#> 3 hp:wt    14.1     1    28 8.11e- 4         0.335   0.0729     0.539
ggstatsplot:::lm_effsize_ci(anova(aov(mpg ~ hp * wt, mtcars)), effsize = "eta", conf.level = .99)
#> # A tibble: 3 x 8
#>   term  F.value   df1   df2  p.value partial.etasq conf.low conf.high
#>   <chr>   <dbl> <int> <int>    <dbl>         <dbl>    <dbl>     <dbl>
#> 1 hp      146.      1    28 1.23e-12         0.839   0.638      0.906
#> 2 wt       54.5     1    28 4.86e- 8         0.661   0.322      0.801
#> 3 hp:wt    14.1     1    28 8.11e- 4         0.335   0.0236     0.593
ggstatsplot:::lm_effsize_ci(anova(aov(mpg ~ hp * wt, mtcars)), effsize = "eta", conf.level = .99, nboot = 100)
#> # A tibble: 3 x 8
#>   term  F.value   df1   df2  p.value partial.etasq conf.low conf.high
#>   <chr>   <dbl> <int> <int>    <dbl>         <dbl>    <dbl>     <dbl>
#> 1 hp      146.      1    28 1.23e-12         0.839   0.638      0.906
#> 2 wt       54.5     1    28 4.86e- 8         0.661   0.322      0.801
#> 3 hp:wt    14.1     1    28 8.11e- 4         0.335   0.0236     0.593


Created on 2018-10-04 by the reprex package (v0.2.1)
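For reference, the point estimates in the tables above can be recomputed directly from an anova object in base R. The sketch below assumes sequential (Type I) sums of squares, which is what stats::anova() returns for an lm; it covers only the point estimates and says nothing about how lm_effsize_ci computes its confidence intervals.

```r
# sequential ANOVA table for the model used in the reprex
tab <- stats::anova(stats::lm(mpg ~ hp * wt, data = mtcars))

ss <- tab[["Sum Sq"]]
is_resid <- rownames(tab) == "Residuals"

effects <- data.frame(
  term = rownames(tab)[!is_resid],
  # eta-squared: effect sum of squares over the total sum of squares
  etasq = ss[!is_resid] / sum(ss),
  # partial eta-squared: effect SS over (effect SS + residual SS)
  partial.etasq = ss[!is_resid] / (ss[!is_resid] + ss[is_resid]),
  stringsAsFactors = FALSE
)
print(effects)
```

The two columns should agree, up to rounding, with the etasq and partial.etasq values reported in the reprex, which suggests the failure with partial = FALSE lies in how the anova input is handled rather than in the formula itself.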

adding missing tests

Some of the following functions were discovered to have bugs that went unnoticed because there were no tests for them:

  • context("subtitle_ggscatterstats")
      • Pearson's r
      • percentage bend correlations
      • bayes factor

  • context("subtitle_t_onesample")
      • Wilcox test
      • robust location measure test

  • context("subtitle_mann_nonparametric")
      • within-subjects design

  • context("effctsize_ci")
    (new functions introduced in 0.0.7)
      • yuend_ci
      • kw_eta_h_ci

  • context("pairwise comparisons")
      • tests for pairwise_p() function

user question

require("foreign")
#> Loading required package: foreign
library(foreign)
## import and descriptives
aggression <-
  read.spss(
    "http://www.people.fas.harvard.edu/~mair/datasets/aggression.sav",
    to.data.frame = TRUE,
    use.value.labels = FALSE
  )
#> re-encoding from CP1252
colnames(aggression) <-
  c("car", "sex", "age", "frequency", "duration", "honk")
aggression[, 1] <-
  factor(aggression[, 1], labels = c("BMW", "Ford KA"))
aggression[, 2] <-
  factor(aggression[, 2], labels = c("male", "female"))
head(aggression)
#>   car  sex age frequency duration honk
#> 1 BMW male   7         5        2    8
#> 2 BMW male   8         1        4    1
#> 3 BMW male  NA         3        1    4
#> 4 BMW male   6         1        7    1
#> 5 BMW male   6         0       NA    0
#> 6 BMW male   6         1        9    3
dim(aggression)
#> [1] 127   6

## DV: honking duration; Factor: car (BMW vs. Ford)
# hist(aggression$duration)

using ggstatsplot:

ggstatsplot::ggbetweenstats(
  data = aggression,
  x = car,
  y = duration,
  messages = FALSE
)

ggstatsplot::ggbetweenstats(
  data = na.omit(aggression),
  x = car,
  y = duration,
  messages = FALSE
) 

Created on 2018-09-24 by the reprex package (v0.2.1)

ggpiestats not recycling colors

I am trying to use ggpiestats on a variable that has more than 8 levels. The first 8 levels work fine, but any level beyond the eighth does not get a color assigned. Perhaps that's by design, but it seems dysfunctional. Could it at least recycle colors so the extras are not white? Or am I doing something wrong?

Very, very very simple reprex below

library(ggstatsplot)
mainexample <- rep(c("one","two","three","four","five","six","seven","eight","nine","ten"),5)
mainexample<-as.data.frame(mainexample)
# This shows the problem
ggpiestats(data = mainexample, main = "mainexample")

# I can of course manually force a different palette
ggpiestats(data = mainexample, main = "mainexample", palette = "Set3")

Created on 2018-10-24 by the reprex package (v0.2.1)
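Two ways the request above could be met, sketched in base R with an illustrative 8-colour palette (the hex codes are just example values): recycle the palette when there are more levels than colours, or interpolate extra colours between the existing ones. Neither is claimed to be what ggpiestats does internally.

```r
# an illustrative qualitative palette with 8 colours
base_pal <- c(
  "#1B9E77", "#D95F02", "#7570B3", "#E7298A",
  "#66A61E", "#E6AB02", "#A6761D", "#666666"
)

n_levels <- 10

# option 1: recycle, so levels 9 and 10 reuse colours 1 and 2
recycled <- rep_len(base_pal, n_levels)

# option 2: interpolate between the 8 colours to get 10 distinct ones
interpolated <- grDevices::colorRampPalette(base_pal)(n_levels)

print(recycled)
print(interpolated)
```

Recycling keeps the original colours but repeats them; interpolation keeps every slice distinct at the cost of colours that are closer together.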

Loading additional palettes doesn't seem to work

I'm trying to load additional palettes in to help me plot a variable with more than 8 categories. However, when I do that, I get the following error:

Error in ggstatsplot::ggbetweenstats(data = sums_inc, x = AnnualIncome, :
unused arguments (ggstatsplot.layer = FALSE, package = "wesanderson")

Any advice/insights on what to do?

ggstatsplot::ggbetweenstats(
  data = sums_inc,
  x = AnnualIncome,
  y = negative_affect.z,
  mean.plotting = F,
  mean.label.size = 3.5,
  k = 2,
  xlab = "Annual Income",
  ylab = "Negative Affect",
  title = "Income and Negative Affect",
  plot.type = "boxviolin",
  type = F,
  ggtheme = ggthemes::theme_fivethirtyeight(),
  messages = FALSE,
  ggstatsplot.layer = FALSE,
  package = "wesanderson",
  palette = "Darjeeling1"
)

Is it possible to change color palette for ggscatterstats?

I noticed that a custom color palette can be passed to the plot with the arguments package=xxx, palette=xxx.

This works in most functions of ggstatsplot, but it seems to be impossible to pass those arguments to ggscatterstats.

Here is the error message:

Error in ggscatterstats(., , : unused argument (package = "ggsci")

color control

Terrific package.
I'm using ggstatsplot::ggbetweenstats and couldn't find any function that might control for the fill color of the points that are created in the violin plot that is produced.
Similar to how you have xfill and yfill arguments for the ggscatterstats script, is there an option within ggbetweenstats that could function where you pass in a vector of colors associated with each group?
I'd like to use a 2 hue, 2 color system where light/dark represents one variable, and the other color (say yellow/purple) represents a different variable.
In your between stats example combining plots, I can get half way there because the same scheme is used for each subplot, but I don't want to split my 4 groups apart.

Here's the plot I have so far:

image

What I'd like is to have something that allows me to substitute those four colors with the four hex-code specified colors I want, which would ultimately produce something where the first two groups are yellow (light yellow, dark yellow) and the last two groups are purple (light purple, dark purple).

Apologies if this already exists and I can't find it in the function descriptions!

Thanks

Missing images in the documentation?

Hi Indrajeet Patil,

it seems as if the images of the documentation here on GitHub went missing - they won't load (at least using Firefox).

Best,
Jonas


Using ggcoefstats to display a tbl_df containing the output of a brms model

Hi,

I tried displaying the output of a Bayesian regression model using ggcoefstats but I just can't manage to get it to work. I don't have reproducible code as it basically involves one function (ggcoefstats()) and one line of code, but here's my workflow, described in words:

  1. After fitting my model using brms() and saving the output as an .RDS file, I load the .RDS file and convert it to a tidied tibble using broom.mixed::tidy(), which supports brms;

  2. I then try displaying the tibble using ggcoefstats, but I get an error without any description. To be more precise, I'm trying ggstatsplot::ggcoefstats(x = RT_model_tidy), without any test statistic specification. Note that the tibble has all the appropriate columns, as required (i.e., term and estimate).

Thanks in advance for any tips!

not shortening names for `ggcorrmat` output when confidence intervals are returned

This stems from the fact that psych::corr.test function itself produces shortened names when confidence intervals are needed, even if minlength argument is changed.

# for reproducibility
set.seed(123)

# creating the object
res_df <- psych::corr.test(
    x = dplyr::select(iris, -Species),
    y = NULL,
    use = "pairwise",
    alpha = .05,
    ci = TRUE,
    minlength = 20
  )

# checking confidence intervals
res_df$ci
#>                  lower          r       upper            p
#> Spl.L-Spl.W -0.2726932 -0.1175698  0.04351158 1.518983e-01
#> Spl.L-Ptl.L  0.8270363  0.8717538  0.90550805 1.038667e-47
#> Spl.L-Ptl.W  0.7568971  0.8179411  0.86483606 2.325498e-37
#> Spl.W-Ptl.L -0.5508771 -0.4284401 -0.28794993 4.513314e-08
#> Spl.W-Ptl.W -0.4972130 -0.3661259 -0.21869663 4.073229e-06
#> Ptl.L-Ptl.W  0.9490525  0.9628654  0.97298532 4.675004e-86

Created on 2018-10-31 by the reprex package (v0.2.1)
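
One possible workaround, sketched in base R: rebuild the full pair names from the column names and assign them to the confidence interval table. This assumes the rows of res_df$ci follow the same pair order as utils::combn(), which matches the output shown above but is not something the source confirms for every input.

```r
# Column names used in the reprex above (iris without Species)
vars <- setdiff(names(iris), "Species")

# All unordered pairs, in the order suggested by the rows of res_df$ci
pairs <- utils::combn(vars, 2)
full_names <- paste(pairs[1, ], pairs[2, ], sep = "-")
full_names

# These could then replace the abbreviated row names, for example:
# rownames(res_df$ci) <- full_names
```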

possible issue with devel version of broom.mixed?

Testing devel version of ggstatsplot with devel version of broom.mixed (0.2.3, on GitHub, about to go to CRAN ...) I get

Quitting from lines 556-575 (ggcoefstats.Rmd) 
Error: processing vignette 'ggcoefstats.Rmd' failed with diagnostics:
replacement has 1 row, data has 0
Execution halted

As far as I have been able to dig in this seems to come from inside tidy.clm(), which is not part of broom.mixed ... can you double-check on your end please?

 1: plotlist %>% purrr::map(.x = ., .f = ~ggstatsplot::ggcoefstats(x = ordinal:
 2: withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
 3: eval(quote(`_fseq`(`_lhs`)), env, env)
 4: eval(quote(`_fseq`(`_lhs`)), env, env)
 5: `_fseq`(`_lhs`)
 6: freduce(value, `_function_list`)
 7: withVisible(function_list[[k]](value))
 8: function_list[[k]](value)
 9: purrr::map(.x = ., .f = ~ggstatsplot::ggcoefstats(x = ordinal::clm(formula 
10: .f(.x[[i]], ...)
11: ggstatsplot::ggcoefstats(x = ordinal::clm(formula = as.factor(rating) ~ bel
12: broom::tidy(x = x, conf.int = TRUE, conf.level = conf.level, quick = FALSE,
13: tidy.clm(x = x, conf.int = TRUE, conf.level = conf.level, quick = FALSE, co
14: process_clm(ret, x, conf.int = conf.int, conf.level = conf.level, exponenti
15: `[<-`(`*tmp*`, ret$term %in% names(x$zeta), "coefficient_type", value = "ze
16: `[<-.data.frame`(`*tmp*`, ret$term %in% names(x$zeta), "coefficient_type", 

changing `k`/`digits` argument doesn't make any difference to `ggcorrmat` plot correlations

This should display correlations with 3 digits after the decimal point.

# for reproducibility
set.seed(123)

# as a default this function outputs a correlalogram plot
ggstatsplot::ggcorrmat(
  data = ggplot2::msleep,
  corr.method = "robust",                    # correlation method
  sig.level = 0.001,                         # threshold of significance
  p.adjust.method = "holm",                  # p-value adjustment method for multiple comparisons
  cor.vars = c(sleep_rem, awake:bodywt),     # a range of variables can be selected  
  cor.vars.names = c("REM sleep",            # variable names
                     "time awake", 
                     "brain weight", 
                     "body weight"), 
  matrix.type = "upper",                     # type of visualization matrix
  digits = 3,                                # no. of digits after decimal point
  colors = c("#B2182B", "white", "#4D4D4D"), 
  title = "Correlalogram for mammals sleep dataset",
  subtitle = "sleep units: hours; weight units: kilograms"
)

Created on 2018-11-21 by the reprex package (v0.2.1)

Treatment of NAs in gghistostats with regard to central tendency measures

In gghistostats, if you give it a vector with NAs, it will still plot the histogram, but it won't draw the mean/median. It might be worth making what the function accepts for plotting consistent with what it displays central tendency for.

(As noted on twitter, "The na.rm = TRUE arguments are indeed missing for the geom_line() function in gghistostats and that's why it's behaving this way.")
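
The underlying behaviour is easy to see in base R: summary statistics propagate NA unless na.rm = TRUE is set. The vector below is made up for illustration.

```r
x <- c(2.1, 3.4, NA, 5.0, 4.5)

# Without na.rm, a single NA makes the summary NA
mean_raw <- mean(x)
median_raw <- median(x)

# With na.rm = TRUE, the NA is dropped before computing
mean_clean <- mean(x, na.rm = TRUE)      # 3.75
median_clean <- median(x, na.rm = TRUE)  # 3.95

c(mean_raw, median_raw, mean_clean, median_clean)
```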

Factor Re-leveling supported?

Hi - thank you so much for this package, it's my absolute favorite thing. I'm not sure if this is an issue so I apologize if I'm posting in the wrong place, but I was wondering if there's a way to re-order levels so that each grouping would be sorted in the same order; see below - is it possible to re-sort the 'Sepal.Width' grouping in ascending order so it looks like the other groups or is each group sorted independently by design? Thank you!

image

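library(reshape2)     # provides melt()
library(ggstatsplot)

# reshape iris to long format: one row per (Species, variable, value)
melt <- melt(iris)

grouped_ggbetweenstats(melt, Species, value, grouping.var = variable, messages = FALSE)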

making `ggscatterstats` consistent with the overall API

Since #38 was solved, the ggscatterstats arguments label.var and label.expression work only with characters but not with bare expressions. This is inconsistent with the API principles for the rest of the functions in the package. It would be nice if this function's arguments behaved the same way as those of other functions and accepted both bare expressions and characters.

# setup
set.seed(123)
library(ggstatsplot)
library(ggplot2)

# works
ggscatterstats(
  msleep,
  brainwt,
  sleep_total,
  marginal = FALSE,
  label.var = "genus",
  label.expression = "brainwt > 3"
)

# doesn't work
ggscatterstats(
  msleep,
  brainwt,
  sleep_total,
  marginal = FALSE,
  label.var = genus,
  label.expression = brainwt > 3
)
#> Error in parse_exprs(x): object 'brainwt' not found

Created on 2018-12-14 by the reprex package (v0.2.1)
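
A minimal sketch of how a function can accept a condition either way. It uses base R's substitute() rather than rlang to stay dependency-free, and label_rows() is a hypothetical helper, not the package's implementation.

```r
# Hypothetical helper: accept the condition as a bare expression or a string.
label_rows <- function(data, condition) {
  cond <- substitute(condition)
  # A quoted condition arrives as a character constant; parse it into a call
  if (is.character(cond)) {
    cond <- parse(text = cond)[[1]]
  }
  # Evaluate with the data frame's columns in scope
  keep <- eval(cond, envir = data, enclos = parent.frame())
  # which() drops rows where the condition is NA
  data[which(keep), , drop = FALSE]
}

by_string <- label_rows(mtcars, "mpg > 30")
by_bare <- label_rows(mtcars, mpg > 30)
identical(by_string, by_bare)
```

One limitation of this sketch: a string stored in a variable and passed by name is seen as a symbol, not as a character constant, so it would need separate handling.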

Preparing for 0.0.6 release

To do:

  • Check font size for theme_ggstatsplot function and if you wanna change it
  • Clean up Rmd using gramr package
  • Change merMod tidiers as soon as broom.mixed is on CRAN.
  • Attempt to reduce size of vignettes to prevent R CMD CRAN from producing a NOTE.
  • Add tests for all the new functions being exported to prepare text subtitles for results.
  • Change how the bf.message is currently implemented.
  • Add warning message to every grouped_ function that they can't be further modified.

displaying outlier labels properly in case `plot.type = "violin"`

Since geom_violin(), unlike geom_boxplot(), doesn't have outlier point highlighting, figure out a way to better display the outliers.

set.seed(123)

# plot
ggstatsplot::ggbetweenstats(
  data = ToothGrowth,
  x = supp,
  y = len,
  plot.type = "violin",
  messages = FALSE,
  results.subtitle = FALSE,
  outlier.tagging = TRUE,
  outlier.coef = 0.75,
  mean.plotting = FALSE,
  sample.size.label = FALSE
)

Created on 2018-12-11 by the reprex package (v0.2.1)

`ggcoefstats` displays incorrect labels for anova + partial omega-squared combo

All p-values are below 0.05 but confidence intervals contain 0. This traces back to sjstats::omega_sq() (strengejacke/sjstats#51).

# for reproducibility
set.seed(123)
library(ggstatsplot)

# to speed up the calculation, let's use only 10% of the data
movies_10 <-
  dplyr::sample_frac(tbl = ggstatsplot::movies_long, size = 0.1)

# `aov` object
stats.object <- stats::aov(formula = rating ~ mpaa * genre,
                           data = movies_10)

# plot
ggstatsplot::ggcoefstats(x = stats.object, effsize = "omega")

Created on 2018-11-17 by the reprex package (v0.2.1)

purrr::pmap not working with ggscatterstats when expression is used

The expression is not evaluated properly and so there are 0 rows in label_data and geom_label_repel fails.

library(tidyverse)

# for reproducibility
set.seed(123)

# let's split the dataframe and create a list by mpaa rating
mpaa_list <- ggstatsplot::movies_wide %>%
  base::split(x = ., f = .$mpaa, drop = TRUE)

# this created a list with 4 elements, one for each mpaa rating
# you can check the structure of the file for yourself
# str(mpaa_list)

# checking the length and names of each element
length(mpaa_list)
#> [1] 4
names(mpaa_list)
#> [1] "NC-17" "PG"    "PG-13" "R"

# running function on every element of this list note that if you want the same
# value for a given argument across all elements of the list, you need to
# specify it just once
plot_list1 <- purrr::pmap(
  .l = list(
    data = mpaa_list,
    x = "budget",
    y = "rating",
    xlab = "Budget (in millions of US dollars)",
    ylab = "Rating on IMDB",
    title = list(
      "MPAA Rating: NC-17",
      "MPAA Rating: PG",
      "MPAA Rating: PG-13",
      "MPAA Rating: R"
    ),
    label.var = list("title", "year", "votes", "genre"),
    label.expression = list(
      ("rating" > 8.5 &
        "budget" < 50),
      ("rating" > 8.5 &
        "budget" < 100),
      ("rating" > 8 & "budget" < 50),
      ("rating" > 9 & "budget" < 10)
    ),
    type = list("r", "np", "p", "np"),
    method = list(MASS::rlm, "lm", "lm", "lm"),
    marginal.type = list("histogram", "boxplot", "density", "violin"),
    centrality.para = "mean",
    xfill = list("#56B4E9", "#009E73", "#999999", "#0072B2"),
    yfill = list("#D55E00", "#CC79A7", "#F0E442", "#D55E00"),
    ggtheme = list(
      ggplot2::theme_grey(),
      ggplot2::theme_classic(),
      ggplot2::theme_light(),
      ggplot2::theme_minimal()
    ),
    messages = FALSE
  ),
  .f = ggstatsplot::ggscatterstats
)
#> Error: Aesthetics must be either length 1 or the same as the data (1): x, y

# combine plots
ggstatsplot::combine_plots(
  plotlist = plot_list1,
  nrow = 4
)

Created on 2018-09-01 by the reprex package (v0.2.0.9000).
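
A likely contributing factor, shown in base R: an expression such as ("rating" > 8.5 & "budget" < 50) compares the literal strings "rating" and "budget" with numbers and is evaluated immediately when the list is built, so it collapses to a single logical value instead of one value per row. The toy data frame below is made up; quote() is used here only to illustrate delayed evaluation and is not necessarily how ggscatterstats expects the argument.

```r
# Evaluated eagerly: string-versus-number comparison, one value in total
eager <- ("rating" > 8.5 & "budget" < 50)
length(eager)

# Kept unevaluated, then evaluated against the columns of a data frame
lazy <- quote(rating > 8.5 & budget < 50)
df <- data.frame(rating = c(9, 7, 8.8), budget = c(20, 10, 80))
per_row <- eval(lazy, df)
per_row
#> [1]  TRUE FALSE FALSE
```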

Possible bug in ggscatterstats

label.expression works for single conditions but seems to fail silently for a joint condition. See the reprex (sorry, for some reason on my machine it's only rendering the final plot, but I assure you the first two are working).

library(ggstatsplot)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
# Remove non unique movies for clarity
# Test "length < 60" it works
ggscatterstats(
  data = distinct(movies_long, title, year, .keep_all = TRUE),
  x = length,
  y = budget, 
  label.expression = "length < 60", 
  label.var = "title"
)
#> Warning: This plot can't be further modified with `ggplot2` functions.
#> In case you want a `ggplot` object, set `marginal = FALSE`.
#> 
# Remove non unique movies for clarity
# Test "budget > 150" it works
ggscatterstats(
  data = distinct(movies_long, title, year, .keep_all = TRUE),
  x = length,
  y = budget, 
  label.expression = "budget > 150", 
  label.var = "title"
)
#> Warning: This plot can't be further modified with `ggplot2` functions.
#> In case you want a `ggplot` object, set `marginal = FALSE`.
#> 
# Remove non unique movies for clarity
# Try both and it silently drops the labels
ggscatterstats(
  data = distinct(movies_long, title, year, .keep_all = TRUE),
  x = length,
  y = budget, 
  label.expression = "budget > 150 & length < 60", 
  label.var = "title"
)
#> Warning: This plot can't be further modified with `ggplot2` functions.
#> In case you want a `ggplot` object, set `marginal = FALSE`.
#> 

Created on 2018-12-05 by the reprex package (v0.2.1)

robust anova not working when `NA`s present in data

# data
ggplot2::msleep
#> # A tibble: 83 x 11
#>    name  genus vore  order conservation sleep_total sleep_rem sleep_cycle
#>    <chr> <chr> <chr> <chr> <chr>              <dbl>     <dbl>       <dbl>
#>  1 Chee~ Acin~ carni Carn~ lc                  12.1      NA        NA    
#>  2 Owl ~ Aotus omni  Prim~ <NA>                17         1.8      NA    
#>  3 Moun~ Aplo~ herbi Rode~ nt                  14.4       2.4      NA    
#>  4 Grea~ Blar~ omni  Sori~ lc                  14.9       2.3       0.133
#>  5 Cow   Bos   herbi Arti~ domesticated         4         0.7       0.667
#>  6 Thre~ Brad~ herbi Pilo~ <NA>                14.4       2.2       0.767
#>  7 Nort~ Call~ carni Carn~ vu                   8.7       1.4       0.383
#>  8 Vesp~ Calo~ <NA>  Rode~ <NA>                 7        NA        NA    
#>  9 Dog   Canis carni Carn~ domesticated        10.1       2.9       0.333
#> 10 Roe ~ Capr~ herbi Arti~ lc                   3        NA        NA    
#> # ... with 73 more rows, and 3 more variables: awake <dbl>, brainwt <dbl>,
#> #   bodywt <dbl>

# with `WRS2` works
WRS2::t1way(formula = sleep_rem ~ vore, 
            data = ggplot2::msleep)
#> Call:
#> WRS2::t1way(formula = sleep_rem ~ vore, data = ggplot2::msleep)
#> 
#> Test statistic: F = 2.7569 
#> Degrees of freedom 1: 3 
#> Degrees of freedom 2: 9.37 
#> p-value: 0.10159 
#> 
#> Explanatory measure of effect size: 0.78

# when bootstrapping, it doesn't work
subtitle_ggbetween_rob_anova(
  data = ggplot2::msleep,
  x = vore,
  y = sleep_rem
)
#> Error in subtitle_ggbetween_rob_anova(data = ggplot2::msleep, x = vore, : could not find function "subtitle_ggbetween_rob_anova"

Created on 2018-09-25 by the reprex package (v0.2.1)

Warn linux users of required linux packages for ggstatsplot installation

Dear Indrajeet Patil,

Thank you so much for your hard work on this package!

I'm using Arch Linux, and was getting errors when trying to install this package, both from CRAN and directly from GitHub.
Then I found out that Linux users need to have OpenGL libraries installed in order to install this package (namely libx11, mesa and the Mesa OpenGL Utility library, glu), given that its dependency 'rgl' requires them.
Could you warn Linux users about this in the installation instructions in the README?

Best regards,
João

why is this not working?

# needed libraries
library(ggstatsplot)
library(tidyverse)

# data
df <- data.frame(x = c(1:100))
df$y <- 2 + 3 * df$x + rnorm(100, sd = 40)

# looking at the structure
str(df)
#> 'data.frame':    100 obs. of  2 variables:
#>  $ x: int  1 2 3 4 5 6 7 8 9 10 ...
#>  $ y: num  22.57 16.74 8.33 19.36 120.54 ...
colnames(df)
#> [1] "x" "y"

# adding results from correlation
ggscatterstats(df, x, y)
#> Error in filter_impl(.data, quo): Evaluation error: object 'x' not found.

Created on 2018-09-12 by the reprex package (v0.2.0.9000).

NA omission is too harsh for `grouped_` variant of some functions

The same dataset yields different sample sizes (n) across the bare and grouped variants of the function.

For example, with ggbetweenstats:

set.seed(123)
library(ggplot2)
library(ggstatsplot)

# create a dataset
df <- ggplot2::msleep
df$group <- "1"

# bare function
ggbetweenstats(df,
               vore,
               brainwt,
               messages = FALSE,
               outlier.label = conservation)

# grouped function
grouped_ggbetweenstats(
  df,
  vore,
  brainwt,
  grouping.var = group,
  outlier.label = conservation,
  messages = FALSE
)

Created on 2018-12-12 by the reprex package (v0.2.1)

Plus, outlier.tagging is not TRUE and yet that column is getting evaluated. Needs to be fixed.
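
The difference between the two strategies can be sketched in base R. The data frame is made up: the column called unrelated stands in for a variable (such as the outlier label) that is not part of the analysis but still contains NAs.

```r
df <- data.frame(
  group = c("a", "a", "b", "b", "b"),
  outcome = c(1.2, 3.4, 2.2, NA, 5.1),
  unrelated = c(NA, 10, NA, 12, 13)
)

# Listwise deletion across every column
n_all <- nrow(stats::na.omit(df))

# Deletion based only on the columns used in the analysis
cols <- c("group", "outcome")
n_used <- nrow(df[stats::complete.cases(df[, cols]), ])

c(all_columns = n_all, analysis_columns = n_used)
#>      all_columns analysis_columns
#>                2                4
```

Here listwise deletion halves the usable sample even though only one row is missing a value that the analysis needs.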

Adaptation to new `effsize` version

The new version of the effsize package breaks a test that was designed to adapt to a known bug.

The test output with version 0.7.4 (available on https://github.com/mtorchiano/effsize)

1. Failure: parametric t-test works (between-subjects without NAs) (@test_subtitle_t_parametric.R#63) 

This is a possible correction to the test case:

testthat::test_that(
  desc = "parametric t-test works (between-subjects without NAs)",
  code = {

    # ggstatsplot output
    set.seed(123)
    using_function1 <-
      suppressWarnings(
        ggstatsplot::subtitle_t_parametric(
          data = dplyr::filter(
            ggstatsplot::movies_long,
            genre == "Action" | genre == "Drama"
          ),
          x = genre,
          y = rating,
          effsize.type = "d",
          effsize.noncentral = TRUE,
          var.equal = TRUE,
          conf.level = .99,
          k = 5,
          messages = FALSE
        )
      )

    # expected output
    # this test will have to be changed with the next release of `effsize`
    # d here should be negative but is displayed as positive
    # this is a bug in effsize and has been fixed in the development version
    # (https://github.com/mtorchiano/effsize/commit/3561d93f9e9f5a61b3460ba120b316f7e4c3352f)
    set.seed(123)
    results1 <-
      ggplot2::expr(
        paste(
          italic("t"),
          "(",
          "1317.00000",
          ") = ",
          "-9.46816",
          ", ",
          italic("p"),
          " = ",
          "< 0.001",
          ", ",
          italic("d"),
          " = ",
          #"0.51775",
          "-0.56364",  ## FIX
          ", CI"["99%"],
          " [",
          #"0.36213",
          "-0.71947",  ## FIX
          ", ",
          #"0.67319",
          "-0.40762",  ## FIX
          "]",
          ", ",
          italic("n"),
          " = ",
          1319L
        )
      )

    
    # testing overall call
    testthat::expect_equal(using_function1, results1)
  }
)
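
For reference, the sign convention can be checked by hand. The sketch below computes Cohen's d with a pooled standard deviation in base R; cohens_d() is a hypothetical helper written for illustration, not the effsize implementation, and the numbers are made up.

```r
# Hypothetical helper: Cohen's d with a pooled standard deviation
cohens_d <- function(x, y) {
  nx <- length(x)
  ny <- length(y)
  pooled_sd <- sqrt(((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2))
  (mean(x) - mean(y)) / pooled_sd
}

lower <- c(4.1, 5.0, 5.6, 4.8, 5.2)
higher <- c(6.3, 6.9, 7.4, 6.1, 7.0)

d_forward <- cohens_d(lower, higher)  # negative: first group has the smaller mean
d_reverse <- cohens_d(higher, lower)  # same magnitude, opposite sign

c(d_forward, d_reverse)
```

With this convention, a negative t statistic and a negative d go together, which is consistent with the corrected expected values in the test above.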

adding support for `gamlss` class objects in `ggcoefstats`

library(gamlss)
#> Loading required package: splines
#> Loading required package: gamlss.data
#> Loading required package: gamlss.dist
#> Loading required package: MASS
#> Loading required package: nlme
#> Loading required package: parallel
#>  **********   GAMLSS Version 5.1-2  **********
#> For more on GAMLSS look at http://www.gamlss.org/
#> Type gamlssNews() to see new features/changes/bug fixes.
library(tidyverse)
library(ggstatsplot)

g <- gamlss(
  y ~ pb(x),
  sigma.fo = ~ pb(x),
  family = BCT,
  data = abdom,
  method = mixed(1, 20)
)
#> GAMLSS-RS iteration 1: Global Deviance = 4771.925 
#> GAMLSS-CG iteration 1: Global Deviance = 4771.013 
#> GAMLSS-CG iteration 2: Global Deviance = 4770.994 
#> GAMLSS-CG iteration 3: Global Deviance = 4770.994

broom::tidy(g, conf.int = TRUE)
#>   parameter        term     estimate   std.error   statistic       p.value
#> 1        mu (Intercept) -64.44299460 1.328921129 -48.4927158 1.889994e-210
#> 2        mu       pb(x)  10.69463541 0.057769202 185.1269371  0.000000e+00
#> 3     sigma (Intercept)  -2.65041283 0.108045909 -24.5304321  8.093605e-93
#> 4     sigma       pb(x)  -0.01002512 0.003784911  -2.6487067  8.290567e-03
#> 5        nu (Intercept)  -0.10715726 0.557434072  -0.1922331  8.476237e-01
#> 6       tau (Intercept)   2.49483399 0.301271895   8.2810047  7.765827e-16
confint(g)
#> Warning in vcov.gamlss(object, robust = robust): Additive terms exists in the  mu formula. 
#>   Standard errors for the linear terms maybe are not appropriate
#> Warning in vcov.gamlss(object, robust = robust): Additive terms exists in the  sigma formula. 
#>   Standard errors for the linear terms maybe are not appropriate
#>                 2.5 %    97.5 %
#> (Intercept) -67.05752 -61.82847
#> pb(x)        10.58121  10.80806

ggcoefstats(x = g)
#> Note: No 95% confidence intervals available for regression coefficients from gamlss object, so skipping whiskers in the plot.
#> 
#> Error in `levels<-`(`*tmp*`, value = as.character(levels)): factor level [2] is duplicated

Created on 2018-10-07 by the reprex package (v0.2.1)
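
A possible explanation, offered as an assumption and not as a confirmed diagnosis: the tidy output repeats the same term, for example (Intercept), once per distribution parameter, so using term alone as factor levels produces duplicates. The sketch below copies the parameter and term columns from the output above and builds unique labels by combining them.

```r
# Toy copy of the `parameter` and `term` columns from the tidy output above
tidy_df <- data.frame(
  parameter = c("mu", "mu", "sigma", "sigma", "nu", "tau"),
  term = c("(Intercept)", "pb(x)", "(Intercept)", "pb(x)",
           "(Intercept)", "(Intercept)"),
  stringsAsFactors = FALSE
)

# Terms repeat across parameters
has_dup_terms <- anyDuplicated(tidy_df$term) > 0

# Prefix each term with its parameter so that every label is unique
tidy_df$label <- paste(tidy_df$parameter, tidy_df$term, sep = ": ")
label_factor <- factor(tidy_df$label, levels = tidy_df$label)

has_dup_terms
levels(label_factor)
```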

Bug in `ggbetweenstats` `var.equal`

Just formalizing a bug I noted earlier. I tested a lot of permutations and it appears that it is var.equal that breaks the tibble of mean differences that is output. As a side effect, it also makes the resulting plots inaccurate.

library(ggstatsplot)
# works
ggstatsplot::ggbetweenstats(
  data = movies_long,
  x = mpaa,
  y = rating,
  pairwise.comparisons = TRUE
)
#> Note: 95% CI for effect size estimate was computed with 100 bootstrap samples.
#> 
#> # tibble [3 × 11]
#>   group1 group2 mean.difference conf.low conf.high    se t.value    df
#>   <chr>  <chr>            <dbl>    <dbl>     <dbl> <dbl>   <dbl> <dbl>
#> 1 R      PG-13           -0.219   -0.375    -0.064 0.047   3.31  1142.
#> 2 R      PG              -0.323   -0.573    -0.074 0.075   3.05   277.
#> 3 PG-13  PG              -0.104   -0.362     0.154 0.077   0.952  309.
#> # ... with 3 more variables: p.value <dbl>, significance <chr>,
#> #   p.value.label <chr>
#> Note: Shapiro-Wilk Normality Test for rating : p-value = < 0.001
#> 
#> Note: Bartlett's test for homogeneity of variances for factor mpaa : p-value = 0.004
#> 

# gives incorrect effect size tibble
ggstatsplot::ggbetweenstats(
  data = movies_long,
  x = mpaa,
  y = rating,
  pairwise.comparisons = TRUE,
  var.equal = TRUE
)
#> Note: 95% CI for effect size estimate was computed with 100 bootstrap samples.
#> 
#> Warning: Expected 2 pieces. Additional pieces discarded in 2 rows [1, 3].
#> # tibble [5 × 8]
#>   group1 group2 mean.difference conf.low conf.high  p.value significance
#>   <chr>  <chr>            <dbl>    <dbl>     <dbl>    <dbl> <chr>       
#> 1 PG     13               0.104  -0.140      0.348 NA       <NA>        
#> 2 R      PG               0.323   0.0944     0.552  0.00283 **          
#> 3 R      PG               0.219   0.0570     0.381  0.00283 **          
#> 4 PG-13  PG              NA      NA         NA      0.316   ns          
#> 5 R      PG-13           NA      NA         NA      0.00310 **          
#> # ... with 1 more variable: p.value.label <chr>
#> Note: Shapiro-Wilk Normality Test for rating : p-value = < 0.001
#> 
#> Note: Bartlett's test for homogeneity of variances for factor mpaa : p-value = 0.004
#> 

Created on 2018-12-14 by the reprex package (v0.2.1)
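
The "Expected 2 pieces" warning suggests, as an assumption the source does not confirm, that comparison labels such as "PG-13-PG" are being split on the hyphen, which is ambiguous when a level name itself contains one. The sketch below reproduces the ambiguity in base R and shows one way around it by matching against the known level names. split_pair() is a hypothetical helper, and the labels are constructed for illustration.

```r
levels_known <- c("PG", "PG-13", "R")
labels <- c("PG-13-PG", "R-PG", "R-PG-13")

# Naive split: labels containing "PG-13" yield three pieces instead of two
n_pieces <- lengths(strsplit(labels, "-", fixed = TRUE))
n_pieces
#> [1] 3 2 3

# Hypothetical helper: find the pair of known levels that rebuilds the label
split_pair <- function(label, lv) {
  for (a in lv) {
    for (b in lv) {
      if (identical(paste(a, b, sep = "-"), label)) {
        return(c(group1 = a, group2 = b))
      }
    }
  }
  c(group1 = NA_character_, group2 = NA_character_)
}

pairs <- t(vapply(labels, split_pair, character(2), lv = levels_known))
pairs
```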

diagnosing Ubuntu fails in Travis

@IndrajeetPatil, in response to #23, I'll try a few things in sequence to isolate the problem, including

  • temporarily disable OS-X builds in the matrix (in case they're timing out --but I don't think that's the case).

At the very least, that should buy you some extra run time on Travis before the time-limit is reached.

  • experiment w/ location of the package sources, which hopefully addresses this error message
The command "eval sudo apt-get install -y r-cran-stringi r-cran-magrittr r-cran-curl r-cran-jsonlite r-cran-rcpp r-cran-bindrcpp r-cran-rcppeigen r-cran-openssl r-cran-rlang r-cran-utf8 r-cran-gss r-cran-haven r-cran-data.table r-cran-dplyr r-cran-purrr r-cran-tidyr r-cran-readr r-cran-minqa r-cran-mvtnorm r-cran-nloptr r-cran-sparsem r-cran-lme4 r-cran-httpuv r-cran-markdown r-cran-sem r-cran-readxl r-cran-openxlsx r-cran-pander " failed. Retrying, 2 of 3.

goals for `0.0.8` release

Planned date: early Feb 2019

(Goal for release date: last week of December)

To do:

  • Massively refactor subtitle maker functions to avoid repetition of code across functions
  • Go full rlang rather than using short-cuts
  • Since #38 was solved, the ggscatterstats arguments label.var and label.expression work only with characters but not with bare expressions. This is inconsistent with the API principles for the rest of the functions in the package. See if there is a way to implement this.
  • Refactor grouped_ variants of functions to remove the ugly purrr hack they currently implement
  • Refactor code to remove stats::na.omit(). Take a more fine-grained approach to remove NAs only from columns of interest.
  • Showing both 50% and 95% CIs for ggcoefstats (like in Bayesian inference plots: e.g., https://twitter.com/tjmahr/status/1048226472710873089)
  • Check font size for theme_ggstatsplot function; maybe give user arguments option to change all aspects of the theme?
  • Clean up .Rmd file language using gramr package
  • When there are many levels in a factor, ggpiestats labels can overlap; give the option to have the labels to be either "internal" (current default) or "external" to the slices
  • Add group option for ggscatterstats to support grouped marginals (https://github.com/daattali/ggExtra/blob/master/inst/vignette_files/ggExtra_files/figure-markdown_strict/ggmarginal-grouping-1.png)
  • Add ggplot.function argument to grouped_ variants to make modifications with ggplot2 functions to customize the plot
  • Make the package lighter; the number of dependencies is getting out of control

writing unit tests for bootstrapped effect sizes

Somehow even after setting the seed to the same value, the subtitle prepared by the bare function and the one computed in the function environment consistently differ slightly. This is because the confidence intervals for effect size are not identical.

How do you write tests for such cases?
Want to make sure here that the entire call is identical in the helper subtitle function and its instantiation in the plotting function.

# plot
set.seed(123)
p <- ggstatsplot::ggbetweenstats(
  data = mtcars,
  x = cyl,
  y = wt,
  nboot = 50,
  var.equal = TRUE,
  messages = FALSE,
  k = 3
)


# subtitle
set.seed(123)
p_subtitle <- ggstatsplot::subtitle_anova_parametric(
  data = mtcars,
  x = cyl,
  y = wt,
  nboot = 50,
  var.equal = TRUE,
  messages = FALSE,
  k = 3
)

# checking if these two are equal
p$labels$subtitle
#> paste(italic("F"), "(", 2, ",", "29", ") = ", "22.911", ", ", 
#>     italic("p"), " = ", "< 0.001", ", ", omega["p"]^2, " = ", 
#>     "0.578", ", CI"["95%"], " [", "0.432", ", ", "0.774", "]", 
#>     ", ", italic("n"), " = ", 32L)

p_subtitle
#> paste(italic("F"), "(", 2, ",", "29", ") = ", "22.911", ", ", 
#>     italic("p"), " = ", "< 0.001", ", ", omega["p"]^2, " = ", 
#>     "0.578", ", CI"["95%"], " [", "0.431", ", ", "0.770", "]", 
#>     ", ", italic("n"), " = ", 32L)

Created on 2018-11-29 by the reprex package (v0.2.1)
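
One possible cause, stated as an assumption: if the plotting function draws random numbers before it reaches the bootstrap step, the random number generator is in a different state than in the bare subtitle call, even with the same seed. The base R sketch below illustrates the effect with a hypothetical boot_ci() helper and shows a tolerance-based comparison as one way to test such output.

```r
# Hypothetical helper: percentile bootstrap CI for the mean
boot_ci <- function(x, nboot = 50) {
  means <- replicate(nboot, mean(sample(x, replace = TRUE)))
  stats::quantile(means, c(0.025, 0.975))
}

# Same seed, same sequence of random draws: identical intervals
set.seed(123)
ci_a <- boot_ci(mtcars$wt)
set.seed(123)
ci_b <- boot_ci(mtcars$wt)
identical(ci_a, ci_b)

# Same seed, but one extra draw first: the interval will generally differ
set.seed(123)
invisible(runif(1))
ci_c <- boot_ci(mtcars$wt)

# Compare with a tolerance instead of requiring exact equality
all.equal(ci_a, ci_c, tolerance = 0.1)
```

In a testthat suite, the tolerance argument of expect_equal() serves the same purpose; the alternative is to test the deterministic parts of the subtitle exactly and the bootstrapped limits only approximately.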

bug in `grouped_` variants of functions?

As the README mentions, all functions in ggstatsplot are supposed to work irrespective of whether you enter a character (x = "x") or a bare expression (x = x). But this doesn't seem to be working for grouped_ variants of functions for the grouping.var argument?

Definitely something is not right with the way I've implemented this using rlang, my Achilles heel.

@ibecav You wanna take a look at this?

library(ggstatsplot)

# works
ggstatsplot::grouped_ggbetweenstats(
  data = dplyr::sample_frac(tbl = ggstatsplot::movies_long, size = 0.25) %>%
    dplyr::filter(.data = ., mpaa %in% c("R", "PG-13"), genre %in% c("Drama", "Comedy")),
  x = genre,
  y = rating,
  grouping.var = mpaa
)

# doesn't work
ggstatsplot::grouped_ggbetweenstats(
  data = dplyr::sample_frac(tbl = ggstatsplot::movies_long, size = 0.25) %>%
    dplyr::filter(.data = ., mpaa %in% c("R", "PG-13"), genre %in% c("Drama", "Comedy")),
  x = genre,
  y = rating,
  grouping.var = "mpaa"
)
#> Error in arrange_impl(.data, dots): incorrect size (1) at position 1, expecting : 341

12. stop(structure(list(message = "incorrect size (1) at position 1, expecting : 341",
        call = arrange_impl(.data, dots), cppstack = structure(list(
            file = "", line = -1L, stack = "C++ stack not available on this system"), class = "Rcpp_stack_trace")), class = c("Rcpp::exception",
    "C++Error", "error", "condition")))
11. arrange_impl(.data, dots)
10. arrange.tbl_df(.data = ., !!rlang::enquo(grouping.var))
9. dplyr::arrange(.data = ., !!rlang::enquo(grouping.var))
8. function_list[[i]](value)
7. freduce(value, `_function_list`)
6. `_fseq`(`_lhs`)
5. eval(quote(`_fseq`(`_lhs`)), env, env)
4. eval(quote(`_fseq`(`_lhs`)), env, env)
3. withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
2. df %<>% dplyr::mutate_if(.tbl = ., .predicate = purrr::is_bare_character,
       .funs = ~as.factor(.)) %>% dplyr::mutate_if(.tbl = ., .predicate = is.factor,
       .funs = ~base::droplevels(.)) %>% dplyr::filter(.data = .,
       !is.na(!!rlang::enquo(grouping.var))) %>% dplyr::arrange(.data = .,  ... at grouped_ggbetweenstats.R#132
1. ggstatsplot::grouped_ggbetweenstats(data = dplyr::sample_frac(tbl = ggstatsplot::movies_long,
       size = 0.25) %>% dplyr::filter(.data = ., mpaa %in% c("R",
       "PG-13"), genre %in% c("Drama", "Comedy")), x = genre, y = rating,
       grouping.var = "mpaa")

Created on 2018-11-13 by the reprex package (v0.2.1)
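
The traceback shows grouping.var being passed on via !!rlang::enquo(grouping.var). With a string, that would hand dplyr::arrange() a length-one character constant instead of a column, which is consistent with the "incorrect size (1)" message, although this reading is an assumption. The base R sketch below shows one way to resolve the column name from either form; split_by() is a hypothetical helper, not the package's code. In rlang, ensym() is documented to accept both strings and bare names.

```r
# Hypothetical helper: resolve a column given as a bare name or a string
split_by <- function(data, grouping.var) {
  var <- substitute(grouping.var)
  # A string stays a character constant; a bare name is a symbol to deparse
  col <- if (is.character(var)) var else deparse(var)
  stopifnot(col %in% names(data))
  split(data, data[[col]])
}

bare <- split_by(iris, Species)
quoted <- split_by(iris, "Species")
identical(bare, quoted)
names(bare)
```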

goals for 0.0.7

(Goal for release date: last week of December)

To do:

  • Add groupedstats as dependencies and import shared functions from there
  • Go full rlang rather than using short-cuts
  • Refactor code to remove stats::na.omit(). Take a more fine-grained approach to remove NAs only from columns of interest.
  • Add results.subtitle argument to all functions
  • Get ggcoefstats to work with dataframe arguments
  • Showing both 50% and 95% CIs for ggcoefstats (like in Bayesian inference plots: e.g., https://twitter.com/tjmahr/status/1048226472710873089)
  • Add many more tests and get the code coverage to at least 50%
    (currently at 14%: https://github.com/IndrajeetPatil/ggstatsplot/tree/master/tests)
  • Check font size for theme_ggstatsplot function; give user arguments option to change all aspects of the theme?
  • Clean up Rmd using gramr package
  • Change k = 2 for all functions to follow APA guidelines
  • Add Bayes Factors to ggscatterstats, ggpiestats, and ggbetweenstats (anova designs)
  • When there are many levels in a factor, ggpiestats labels can overlap; give the option to have the labels to be either "internal" (current default) or "external" to the slices
  • Add group option for ggscatterstats to support grouped marginals (https://github.com/daattali/ggExtra/blob/master/inst/vignette_files/ggExtra_files/figure-markdown_strict/ggmarginal-grouping-1.png)
  • Add ggplot.function argument to grouped_ variants to make modifications with ggplot2 functions to customize the plot
  • Add pairwise comparisons support for ggbetweenstats
  • Add new function ggdotplotstats for dot plots/charts
  • Change 95% CI to have 95% as a subscript
