kassambara / rstatix Goto Github PK

Pipe-friendly Framework for Basic Statistical Tests in R

Home Page: https://rpkgs.datanovia.com/rstatix/

R 99.86% JavaScript 0.14%

rstatix's Introduction

rstatix

Provides a simple and intuitive pipe-friendly framework, coherent with the ‘tidyverse’ design philosophy, for performing basic statistical tests, including t-test, Wilcoxon test, ANOVA, Kruskal-Wallis and correlation analyses.

The output of each test is automatically transformed into a tidy data frame to facilitate visualization.

Additional functions are available for reshaping, reordering, manipulating and visualizing correlation matrix. Functions are also included to facilitate the analysis of factorial experiments, including purely ‘within-Ss’ designs (repeated measures), purely ‘between-Ss’ designs, and mixed ‘within-and-between-Ss’ designs.

It’s also possible to compute several effect size metrics, including “eta squared” for ANOVA, “Cohen’s d” for t-test and “Cramer’s V” for the association between categorical variables. The package contains helper functions for identifying univariate and multivariate outliers, assessing normality and homogeneity of variances.

Key functions

Descriptive statistics

get_summary_stats(): Compute summary statistics for one or multiple numeric variables. Can handle grouped data.
freq_table(): Compute frequency table of categorical variables.
get_mode(): Compute the mode of a vector, that is the most frequent values.
identify_outliers(): Detect univariate outliers using boxplot methods.
mahalanobis_distance(): Compute Mahalanobis Distance and Flag Multivariate Outliers.
shapiro_test() and mshapiro_test(): Univariate and multivariate Shapiro-Wilk normality test.

Comparing means

t_test(): perform one-sample, two-sample and pairwise t-tests
wilcox_test(): perform one-sample, two-sample and pairwise Wilcoxon tests
sign_test(): perform sign test to determine whether there is a median difference between paired or matched observations.
anova_test(): an easy-to-use wrapper around car::Anova() to perform different types of ANOVA tests, including independent measures ANOVA, repeated measures ANOVA and mixed ANOVA.
get_anova_test_table(): extract ANOVA table from anova_test() results. Can apply sphericity correction automatically in the case of within-subject (repeated measures) designs.
welch_anova_test(): Welch one-Way ANOVA test. A pipe-friendly wrapper around the base function stats::oneway.test(). This is is an alternative to the standard one-way ANOVA in the situation where the homogeneity of variance assumption is violated.
kruskal_test(): perform kruskal-wallis rank sum test
friedman_test(): Provides a pipe-friendly framework to perform a Friedman rank sum test, which is the non-parametric alternative to the one-way repeated measures ANOVA test.
get_comparisons(): Create a list of possible pairwise comparisons between groups.
get_pvalue_position(): autocompute p-value positions for plotting significance using ggplot2.

Facilitating ANOVA computation in R

factorial_design(): build factorial design for easily computing ANOVA using the car::Anova() function. This might be very useful for repeated measures ANOVA, which is hard to set up with the car package.
anova_summary(): Create beautiful summary tables of ANOVA test results obtained from either car::Anova() or stats::aov(). The results include ANOVA table, generalized effect size and some assumption checks, such as Mauchly’s test for sphericity in the case of repeated measures ANOVA.

Post-hoc analyses

tukey_hsd(): performs tukey post-hoc tests. Can handle different inputs formats: aov, lm, formula.
dunn_test(): compute multiple pairwise comparisons following Kruskal-Wallis test.
games_howell_test(): Performs Games-Howell test, which is used to compare all possible combinations of group differences when the assumption of homogeneity of variances is violated.
emmeans_test(): pipe-friendly wrapper arround emmeans function to perform pairwise comparisons of estimated marginal means. Useful for post-hoc analyses following up ANOVA/ANCOVA tests.

Comparing proportions

prop_test(), pairwise_prop_test() and row_wise_prop_test(). Performs one-sample and two-samples z-test of proportions. Wrappers around the R base function prop.test() but have the advantage of performing pairwise and row-wise z-test of two proportions, the post-hoc tests following a significant chi-square test of homogeneity for 2xc and rx2 contingency tables.
fisher_test(), pairwise_fisher_test() and row_wise_fisher_test(): Fisher’s exact test for count data. Wrappers around the R base function fisher.test() but have the advantage of performing pairwise and row-wise fisher tests, the post-hoc tests following a significant chi-square test of homogeneity for 2xc and rx2 contingency tables.
chisq_test(), pairwise_chisq_gof_test(), pairwise_chisq_test_against_p(): Performs chi-squared tests, including goodness-of-fit, homogeneity and independence tests.
binom_test(), pairwise_binom_test(), pairwise_binom_test_against_p(): Performs exact binomial test and pairwise comparisons following a significant exact multinomial test. Alternative to the chi-square test of goodness-of-fit-test when the sample.
multinom_test(): performs an exact multinomial test. Alternative to the chi-square test of goodness-of-fit-test when the sample size is small.
mcnemar_test(): performs McNemar chi-squared test to compare paired proportions. Provides pairwise comparisons between multiple groups.
cochran_qtest(): extension of the McNemar Chi-squared test for comparing more than two paired proportions.
prop_trend_test(): Performs chi-squared test for trend in proportion. This test is also known as Cochran-Armitage trend test.

Comparing variances

levene_test(): Pipe-friendly framework to easily compute Levene’s test for homogeneity of variance across groups. Handles grouped data.
box_m(): Box’s M-test for homogeneity of covariance matrices

Effect Size

cohens_d(): Compute cohen’s d measure of effect size for t-tests.
wilcox_effsize(): Compute Wilcoxon effect size (r).
eta_squared() and partial_eta_squared(): Compute effect size for ANOVA.
kruskal_effsize(): Compute the effect size for Kruskal-Wallis test as the eta squared based on the H-statistic.
friedman_effsize(): Compute the effect size of Friedman test using the Kendall’s W value.
cramer_v(): Compute Cramer’s V, which measures the strength of the association between categorical variables.

Correlation analysis

Computing correlation:

cor_test(): correlation test between two or more variables using Pearson, Spearman or Kendall methods.
cor_mat(): compute correlation matrix with p-values. Returns a data frame containing the matrix of the correlation coefficients. The output has an attribute named “pvalue”, which contains the matrix of the correlation test p-values.
cor_get_pval(): extract a correlation matrix p-values from an object of class cor_mat().
cor_pmat(): compute the correlation matrix, but returns only the p-values of the correlation tests.
as_cor_mat(): convert a cor_test object into a correlation matrix format.

Reshaping correlation matrix:

cor_reorder(): reorder correlation matrix, according to the coefficients, using the hierarchical clustering method.
cor_gather(): takes a correlation matrix and collapses (or melt) it into long format data frame (paired list)
cor_spread(): spread a long correlation data frame into wide format (correlation matrix).

Subsetting correlation matrix:

cor_select(): subset a correlation matrix by selecting variables of interest.
pull_triangle(), pull_upper_triangle(), pull_lower_triangle(): pull upper and lower triangular parts of a (correlation) matrix.
replace_triangle(), replace_upper_triangle(), replace_lower_triangle(): replace upper and lower triangular parts of a (correlation) matrix.

Visualizing correlation matrix:

cor_as_symbols(): replaces the correlation coefficients, in a matrix, by symbols according to the value.
cor_plot(): visualize correlation matrix using base plot.
cor_mark_significant(): add significance levels to a correlation matrix.

Adjusting p-values, formatting and adding significance symbols

adjust_pvalue(): add an adjusted p-values column to a data frame containing statistical test p-values
add_significance(): add a column containing the p-value significance level
p_round(), p_format(), p_mark_significant(): rounding and formatting p-values

Extract information from statistical tests

Extract information from statistical test results. Useful for labelling plots with test outputs.

get_pwc_label(): Extract label from pairwise comparisons.
get_test_label(): Extract label from statistical tests.
create_test_label(): Create labels from user specified test results.

Data manipulation helper functions

These functions are internally used in the rstatix and in the ggpubr R package to make it easy to program with tidyverse packages using non standard evaluation.

df_select(), df_arrange(), df_group_by(): wrappers arround dplyr functions for supporting standard and non standard evaluations.
df_nest_by(): Nest a tibble data frame using grouping specification. Supports standard and non standard evaluations.
df_split_by(): Split a data frame by groups into subsets or data panel. Very similar to the function df_nest_by(). The only difference is that, it adds labels to each data subset. Labels are the combination of the grouping variable levels.
df_unite(): Unite multiple columns into one.
df_unite_factors(): Unite factor columns. First, order factors levels then merge them into one column. The output column is a factor.
df_label_both(), df_label_value(): functions to label data frames rows by by one or multiple grouping variables.
df_get_var_names(): Returns user specified variable names. Supports standard and non standard evaluation.

Others

doo(): alternative to dplyr::do for doing anything. Technically it uses nest() + mutate() + map() to apply arbitrary computation to a grouped data frame.
sample_n_by(): sample n rows by group from a table
convert_as_factor(), set_ref_level(), reorder_levels(): Provides pipe-friendly functions to convert simultaneously multiple variables into a factor variable.
make_clean_names(): Pipe-friendly function to make syntactically valid column names (for input data frame) or names (for input vector).
counts_to_cases(): converts a contingency table or a data frame of counts into a data frame of individual observations.

Installation and loading

Install the latest developmental version from GitHub as follow:

if(!require(devtools)) install.packages("devtools")
devtools::install_github("kassambara/rstatix")

Or install from CRAN as follow:

install.packages("rstatix")

Loading packages

library(rstatix)  
library(ggpubr)  # For easy data-visualization

Descriptive statistics

# Summary statistics of some selected variables
#::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
iris %>% 
  get_summary_stats(Sepal.Length, Sepal.Width, type = "common")
#> # A tibble: 2 x 10
#>   variable         n   min   max median   iqr  mean    sd    se    ci
#>   <fct>        <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Sepal.Length   150   4.3   7.9    5.8   1.3  5.84 0.828 0.068 0.134
#> 2 Sepal.Width    150   2     4.4    3     0.5  3.06 0.436 0.036 0.07

# Whole data frame
#::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
iris %>% get_summary_stats(type = "common")
#> # A tibble: 4 x 10
#>   variable         n   min   max median   iqr  mean    sd    se    ci
#>   <fct>        <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Sepal.Length   150   4.3   7.9   5.8    1.3  5.84 0.828 0.068 0.134
#> 2 Sepal.Width    150   2     4.4   3      0.5  3.06 0.436 0.036 0.07 
#> 3 Petal.Length   150   1     6.9   4.35   3.5  3.76 1.76  0.144 0.285
#> 4 Petal.Width    150   0.1   2.5   1.3    1.5  1.20 0.762 0.062 0.123


# Grouped data
#::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
iris %>%
  group_by(Species) %>% 
  get_summary_stats(Sepal.Length, type = "mean_sd")
#> # A tibble: 3 x 5
#>   Species    variable         n  mean    sd
#>   <fct>      <fct>        <dbl> <dbl> <dbl>
#> 1 setosa     Sepal.Length    50  5.01 0.352
#> 2 versicolor Sepal.Length    50  5.94 0.516
#> 3 virginica  Sepal.Length    50  6.59 0.636

Comparing two means

To compare the means of two groups, you can use either the function t_test() (parametric) or wilcox_test() (non-parametric). In the following example the t-test will be illustrated.

Data

Preparing the demo data set:

df <- ToothGrowth
df$dose <- as.factor(df$dose)
head(df)
#>    len supp dose
#> 1  4.2   VC  0.5
#> 2 11.5   VC  0.5
#> 3  7.3   VC  0.5
#> 4  5.8   VC  0.5
#> 5  6.4   VC  0.5
#> 6 10.0   VC  0.5

One-sample test

The one-sample test is used to compare the mean of one sample to a known standard (or theoretical / hypothetical) mean (mu).

df %>% t_test(len ~ 1, mu = 0)
#> # A tibble: 1 x 7
#>   .y.   group1 group2         n statistic    df        p
#> * <chr> <chr>  <chr>      <int>     <dbl> <dbl>    <dbl>
#> 1 len   1      null model    60      19.1    59 6.94e-27
# One-sample test of each dose level
df %>% 
  group_by(dose) %>%
  t_test(len ~ 1, mu = 0)
#> # A tibble: 3 x 8
#>   dose  .y.   group1 group2         n statistic    df        p
#> * <fct> <chr> <chr>  <chr>      <int>     <dbl> <dbl>    <dbl>
#> 1 0.5   len   1      null model    20      10.5    19 2.24e- 9
#> 2 1     len   1      null model    20      20.0    19 3.22e-14
#> 3 2     len   1      null model    20      30.9    19 1.03e-17

Compare two independent groups

Create a simple box plot with p-values:

# T-test
stat.test <- df %>% 
  t_test(len ~ supp, paired = FALSE) 
stat.test
#> # A tibble: 1 x 8
#>   .y.   group1 group2    n1    n2 statistic    df      p
#> * <chr> <chr>  <chr>  <int> <int>     <dbl> <dbl>  <dbl>
#> 1 len   OJ     VC        30    30      1.92  55.3 0.0606

# Create a box plot
p <- ggboxplot(
  df, x = "supp", y = "len", 
  color = "supp", palette = "jco", ylim = c(0,40)
  )
# Add the p-value manually
p + stat_pvalue_manual(stat.test, label = "p", y.position = 35)

Customize labels using glue expression:

p +stat_pvalue_manual(stat.test, label = "T-test, p = {p}", 
                      y.position = 36)

Grouped data: compare supp levels after grouping the data by “dose”

# Statistical test
stat.test <- df %>%
  group_by(dose) %>%
  t_test(len ~ supp) %>%
  adjust_pvalue() %>%
  add_significance("p.adj")
stat.test
#> # A tibble: 3 x 11
#>   dose  .y.   group1 group2    n1    n2 statistic    df       p   p.adj
#>   <fct> <chr> <chr>  <chr>  <int> <int>     <dbl> <dbl>   <dbl>   <dbl>
#> 1 0.5   len   OJ     VC        10    10    3.17    15.0 0.00636 0.0127 
#> 2 1     len   OJ     VC        10    10    4.03    15.4 0.00104 0.00312
#> 3 2     len   OJ     VC        10    10   -0.0461  14.0 0.964   0.964  
#> # … with 1 more variable: p.adj.signif <chr>

# Visualization
ggboxplot(
  df, x = "supp", y = "len",
  color = "supp", palette = "jco", facet.by = "dose",
  ylim = c(0, 40)
  ) +
  stat_pvalue_manual(stat.test, label = "p.adj", y.position = 35)

Compare paired samples

# T-test
stat.test <- df %>% 
  t_test(len ~ supp, paired = TRUE) 
stat.test
#> # A tibble: 1 x 8
#>   .y.   group1 group2    n1    n2 statistic    df       p
#> * <chr> <chr>  <chr>  <int> <int>     <dbl> <dbl>   <dbl>
#> 1 len   OJ     VC        30    30      3.30    29 0.00255

# Box plot
p <- ggpaired(
  df, x = "supp", y = "len", color = "supp", palette = "jco", 
  line.color = "gray", line.size = 0.4, ylim = c(0, 40)
  )
p + stat_pvalue_manual(stat.test, label = "p", y.position = 36)

Multiple pairwise comparisons

Pairwise comparisons: if the grouping variable contains more than two categories, a pairwise comparison is automatically performed.

# Pairwise t-test
pairwise.test <- df %>% t_test(len ~ dose)
pairwise.test
#> # A tibble: 3 x 10
#>   .y.   group1 group2    n1    n2 statistic    df        p    p.adj p.adj.signif
#> * <chr> <chr>  <chr>  <int> <int>     <dbl> <dbl>    <dbl>    <dbl> <chr>       
#> 1 len   0.5    1         20    20     -6.48  38.0 1.27e- 7 2.54e- 7 ****        
#> 2 len   0.5    2         20    20    -11.8   36.9 4.40e-14 1.32e-13 ****        
#> 3 len   1      2         20    20     -4.90  37.1 1.91e- 5 1.91e- 5 ****
# Box plot
ggboxplot(df, x = "dose", y = "len")+
  stat_pvalue_manual(
    pairwise.test, label = "p.adj", 
    y.position = c(29, 35, 39)
    )

Multiple pairwise comparisons against reference group: each level is compared to the ref group

# Comparison against reference group
#::::::::::::::::::::::::::::::::::::::::
# T-test: each level is compared to the ref group
stat.test <- df %>% t_test(len ~ dose, ref.group = "0.5")
stat.test
#> # A tibble: 2 x 10
#>   .y.   group1 group2    n1    n2 statistic    df        p    p.adj p.adj.signif
#> * <chr> <chr>  <chr>  <int> <int>     <dbl> <dbl>    <dbl>    <dbl> <chr>       
#> 1 len   0.5    1         20    20     -6.48  38.0 1.27e- 7 1.27e- 7 ****        
#> 2 len   0.5    2         20    20    -11.8   36.9 4.40e-14 8.80e-14 ****
# Box plot
ggboxplot(df, x = "dose", y = "len", ylim = c(0, 40)) +
  stat_pvalue_manual(
    stat.test, label = "p.adj.signif", 
    y.position = c(29, 35)
    )

# Remove bracket
ggboxplot(df, x = "dose", y = "len", ylim = c(0, 40)) +
  stat_pvalue_manual(
    stat.test, label = "p.adj.signif", 
    y.position = c(29, 35),
    remove.bracket = TRUE
    )

Multiple pairwise comparisons against all (base-mean): Comparison of each group against base-mean.

# T-test
stat.test <- df %>% t_test(len ~ dose, ref.group = "all")
stat.test
#> # A tibble: 3 x 10
#>   .y.   group1 group2    n1    n2 statistic    df         p   p.adj p.adj.signif
#> * <chr> <chr>  <chr>  <int> <int>     <dbl> <dbl>     <dbl>   <dbl> <chr>       
#> 1 len   all    0.5       60    20     5.82   56.4   2.90e-7 8.70e-7 ****        
#> 2 len   all    1         60    20    -0.660  57.5   5.12e-1 5.12e-1 ns          
#> 3 len   all    2         60    20    -5.61   66.5   4.25e-7 8.70e-7 ****
# Box plot with horizontal mean line
ggboxplot(df, x = "dose", y = "len") +
  stat_pvalue_manual(
    stat.test, label = "p.adj.signif", 
    y.position = 35,
    remove.bracket = TRUE
    ) +
  geom_hline(yintercept = mean(df$len), linetype = 2)

ANOVA test

# One-way ANOVA test
#:::::::::::::::::::::::::::::::::::::::::
df %>% anova_test(len ~ dose)
#> ANOVA Table (type II tests)
#> 
#>   Effect DFn DFd      F        p p<.05   ges
#> 1   dose   2  57 67.416 9.53e-16     * 0.703

# Two-way ANOVA test
#:::::::::::::::::::::::::::::::::::::::::
df %>% anova_test(len ~ supp*dose)
#> ANOVA Table (type II tests)
#> 
#>      Effect DFn DFd      F        p p<.05   ges
#> 1      supp   1  54 15.572 2.31e-04     * 0.224
#> 2      dose   2  54 92.000 4.05e-18     * 0.773
#> 3 supp:dose   2  54  4.107 2.20e-02     * 0.132

# Two-way repeated measures ANOVA
#:::::::::::::::::::::::::::::::::::::::::
df$id <- rep(1:10, 6) # Add individuals id
# Use formula
# df %>% anova_test(len ~ supp*dose + Error(id/(supp*dose)))
# or use character vector
df %>% anova_test(dv = len, wid = id, within = c(supp, dose))
#> ANOVA Table (type III tests)
#> 
#> $ANOVA
#>      Effect DFn DFd       F        p p<.05   ges
#> 1      supp   1   9  34.866 2.28e-04     * 0.224
#> 2      dose   2  18 106.470 1.06e-10     * 0.773
#> 3 supp:dose   2  18   2.534 1.07e-01       0.132
#> 
#> $`Mauchly's Test for Sphericity`
#>      Effect     W     p p<.05
#> 1      dose 0.807 0.425      
#> 2 supp:dose 0.934 0.761      
#> 
#> $`Sphericity Corrections`
#>      Effect   GGe      DF[GG]    p[GG] p[GG]<.05   HFe      DF[HF]    p[HF]
#> 1      dose 0.838 1.68, 15.09 2.79e-09         * 1.008 2.02, 18.15 1.06e-10
#> 2 supp:dose 0.938 1.88, 16.88 1.12e-01           1.176 2.35, 21.17 1.07e-01
#>   p[HF]<.05
#> 1         *
#> 2

# Use model as arguments
#:::::::::::::::::::::::::::::::::::::::::
.my.model <- lm(yield ~ block + N*P*K, npk)
anova_test(.my.model)
#> ANOVA Table (type II tests)
#> 
#>   Effect DFn DFd      F     p p<.05   ges
#> 1  block   4  12  4.959 0.014     * 0.623
#> 2      N   1  12 12.259 0.004     * 0.505
#> 3      P   1  12  0.544 0.475       0.043
#> 4      K   1  12  6.166 0.029     * 0.339
#> 5    N:P   1  12  1.378 0.263       0.103
#> 6    N:K   1  12  2.146 0.169       0.152
#> 7    P:K   1  12  0.031 0.863       0.003
#> 8  N:P:K   0  12     NA    NA  <NA>    NA

Correlation tests

# Data preparation
mydata <- mtcars %>% 
  select(mpg, disp, hp, drat, wt, qsec)
head(mydata, 3)
#>                mpg disp  hp drat    wt  qsec
#> Mazda RX4     21.0  160 110 3.90 2.620 16.46
#> Mazda RX4 Wag 21.0  160 110 3.90 2.875 17.02
#> Datsun 710    22.8  108  93 3.85 2.320 18.61

# Correlation test between two variables
mydata %>% cor_test(wt, mpg, method = "pearson")
#> # A tibble: 1 x 8
#>   var1  var2    cor statistic        p conf.low conf.high method 
#>   <chr> <chr> <dbl>     <dbl>    <dbl>    <dbl>     <dbl> <chr>  
#> 1 wt    mpg   -0.87     -9.56 1.29e-10   -0.934    -0.744 Pearson

# Correlation of one variable against all
mydata %>% cor_test(mpg, method = "pearson")
#> # A tibble: 5 x 8
#>   var1  var2    cor statistic        p conf.low conf.high method 
#>   <chr> <chr> <dbl>     <dbl>    <dbl>    <dbl>     <dbl> <chr>  
#> 1 mpg   disp  -0.85     -8.75 9.38e-10  -0.923     -0.708 Pearson
#> 2 mpg   hp    -0.78     -6.74 1.79e- 7  -0.885     -0.586 Pearson
#> 3 mpg   drat   0.68      5.10 1.78e- 5   0.436      0.832 Pearson
#> 4 mpg   wt    -0.87     -9.56 1.29e-10  -0.934     -0.744 Pearson
#> 5 mpg   qsec   0.42      2.53 1.71e- 2   0.0820     0.670 Pearson

# Pairwise correlation test between all variables
mydata %>% cor_test(method = "pearson")
#> # A tibble: 36 x 8
#>    var1  var2    cor statistic        p conf.low conf.high method 
#>    <chr> <chr> <dbl>     <dbl>    <dbl>    <dbl>     <dbl> <chr>  
#>  1 mpg   mpg    1       Inf    0.         1          1     Pearson
#>  2 mpg   disp  -0.85     -8.75 9.38e-10  -0.923     -0.708 Pearson
#>  3 mpg   hp    -0.78     -6.74 1.79e- 7  -0.885     -0.586 Pearson
#>  4 mpg   drat   0.68      5.10 1.78e- 5   0.436      0.832 Pearson
#>  5 mpg   wt    -0.87     -9.56 1.29e-10  -0.934     -0.744 Pearson
#>  6 mpg   qsec   0.42      2.53 1.71e- 2   0.0820     0.670 Pearson
#>  7 disp  mpg   -0.85     -8.75 9.38e-10  -0.923     -0.708 Pearson
#>  8 disp  disp   1       Inf    0.         1          1     Pearson
#>  9 disp  hp     0.79      7.08 7.14e- 8   0.611      0.893 Pearson
#> 10 disp  drat  -0.71     -5.53 5.28e- 6  -0.849     -0.481 Pearson
#> # … with 26 more rows

Correlation matrix

# Compute correlation matrix
#::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
cor.mat <- mydata %>% cor_mat()
cor.mat
#> # A tibble: 6 x 7
#>   rowname   mpg  disp    hp   drat    wt   qsec
#> * <chr>   <dbl> <dbl> <dbl>  <dbl> <dbl>  <dbl>
#> 1 mpg      1    -0.85 -0.78  0.68  -0.87  0.42 
#> 2 disp    -0.85  1     0.79 -0.71   0.89 -0.43 
#> 3 hp      -0.78  0.79  1    -0.45   0.66 -0.71 
#> 4 drat     0.68 -0.71 -0.45  1     -0.71  0.091
#> 5 wt      -0.87  0.89  0.66 -0.71   1    -0.17 
#> 6 qsec     0.42 -0.43 -0.71  0.091 -0.17  1

# Show the significance levels
#::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
cor.mat %>% cor_get_pval()
#> # A tibble: 6 x 7
#>   rowname      mpg     disp           hp       drat        wt       qsec
#>   <chr>      <dbl>    <dbl>        <dbl>      <dbl>     <dbl>      <dbl>
#> 1 mpg     0.       9.38e-10 0.000000179  0.0000178  1.29e- 10 0.0171    
#> 2 disp    9.38e-10 0.       0.0000000714 0.00000528 1.22e- 11 0.0131    
#> 3 hp      1.79e- 7 7.14e- 8 0            0.00999    4.15e-  5 0.00000577
#> 4 drat    1.78e- 5 5.28e- 6 0.00999      0          4.78e-  6 0.62      
#> 5 wt      1.29e-10 1.22e-11 0.0000415    0.00000478 2.27e-236 0.339     
#> 6 qsec    1.71e- 2 1.31e- 2 0.00000577   0.62       3.39e-  1 0

# Replacing correlation coefficients by symbols
#::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
cor.mat %>%
  cor_as_symbols() %>%
  pull_lower_triangle()
#>   rowname mpg disp hp drat wt qsec
#> 1     mpg                         
#> 2    disp   *                     
#> 3      hp   *    *                
#> 4    drat   +    +  .             
#> 5      wt   *    *  +    +        
#> 6    qsec   .    .  +

# Mark significant correlations
#::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
cor.mat %>%
  cor_mark_significant()
#>   rowname       mpg      disp        hp      drat    wt qsec
#> 1     mpg                                                   
#> 2    disp -0.85****                                         
#> 3      hp -0.78****  0.79****                               
#> 4    drat  0.68**** -0.71****   -0.45**                     
#> 5      wt -0.87****  0.89****  0.66**** -0.71****           
#> 6    qsec     0.42*    -0.43* -0.71****     0.091 -0.17


# Draw correlogram using R base plot
#::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
cor.mat %>%
  cor_reorder() %>%
  pull_lower_triangle() %>% 
  cor_plot()

rstatix's People

Contributors

Stargazers

Watchers

Forkers

mutual-ai mz-cy-han1998 carlosdancing lazycrazyowl guhjy babasaraki fsebasg gravitytrope acaro12 ntk321 karthy257 romainfrancois muefong tbudylin dockerytong hyiltiz hannesoberreiter bhaskar-jr chencaf sultan-mia ivanchoanb msyll830 srikanthkris housemeyster jmbarbone forked-packages shicheng-guo zwael ashkanpakzad wwzhang luffyluffy genomicsnx hufanglq owain-s fabifuu lupupam khemlalnirmalkar lixiang117423 psychelzh akshnaidu shiry123 snijesh llrs mys721tx sand94 drvictorvs stacithornton erhan-cene jakub-jedrusiak gene-knight9

rstatix's Issues

add_xy_position() can not be used in dodged plots

Hi,
Kassambara
I found add_xy_position can not be used in dodged plots. Here is an example:

df <- read.csv("data.csv")
df

id density_IIB A B D
1 1 13748.63 CC TT qq
2 2 15551.73 CC TT Qq
3 3 15446.55 CC TT Qq
4 4 18932.55 CC TT qq
5 5 22789.17 GC CT Qq
6 6 17179.53 GG CC Qq
7 7 22583.82 GC CT Qq
8 8 15747.07 GG CC Qq
9 9 13943.97 GG CC QQ
10 10 18707.16 GC CT qq
11 11 19263.11 CC TT Qq
12 12 17850.69 CC TT QQ

library(tidyverse)
library(RColorBrewer)
library(rstatix)

Prepareing the data frame:

df2 <- df %>%
  gather("class", "genotype", 3:5) %>% 
  unite(class2, class:genotype, sep = "_", remove = FALSE) 
df2

id density_IIB class2 class genotype
1 1 13748.63 A_CC A CC
2 2 15551.73 A_CC A CC
3 3 15446.55 A_CC A CC
4 4 18932.55 A_CC A CC
5 5 22789.17 A_GC A GC
6 6 17179.53 A_GG A GG
7 7 22583.82 A_GC A GC
8 8 15747.07 A_GG A GG
9 9 13943.97 A_GG A GG
10 10 18707.16 A_GC A GC
11 11 19263.11 A_CC A CC
12 12 17850.69 A_CC A CC
13 1 13748.63 B_TT B TT
14 2 15551.73 B_TT B TT
15 3 15446.55 B_TT B TT
16 4 18932.55 B_TT B TT
17 5 22789.17 B_CT B CT
18 6 17179.53 B_CC B CC
19 7 22583.82 B_CT B CT
20 8 15747.07 B_CC B CC
21 9 13943.97 B_CC B CC
22 10 18707.16 B_CT B CT
23 11 19263.11 B_TT B TT
24 12 17850.69 B_TT B TT
25 1 13748.63 D_qq D qq
26 2 15551.73 D_Qq D Qq
27 3 15446.55 D_Qq D Qq
28 4 18932.55 D_qq D qq
29 5 22789.17 D_Qq D Qq
30 6 17179.53 D_Qq D Qq
31 7 22583.82 D_Qq D Qq
32 8 15747.07 D_Qq D Qq
33 9 13943.97 D_QQ D QQ
34 10 18707.16 D_qq D qq
35 11 19263.11 D_Qq D Qq
36 12 17850.69 D_QQ D QQ

Computing p-value using the pairwise t test:

df2_stat <- df2 %>% 
  group_by(class) %>% 
  pairwise_t_test(
  density_IIB ~ class2, pool.sd = FALSE,
  p.adjust.method = "bonferroni"
) %>% 
  # dplyr::group_by(class) %>%  I tried to use the group function but it doesn't work...
  add_xy_position(x = "class",dodge = 1,step.increase = .1)

And then ploting using ggplot2:

ggplot()+
  geom_boxplot(data = df2, aes(x = class, y = density_IIB, fill = class2),
               position = position_dodge(1))+
  scale_fill_manual(values = rep(brewer.pal(3, "Dark2"), each = 3))+
  ggpubr::stat_pvalue_manual(data = df2_stat, hide.ns = FALSE)

And I got the plot like this:

So how can I adjust the position of the p-values correctly? Do you have any suggestion?
I know using facet may be a better choice, but if I don't want to facet the plot, what should I do?

Thank you very much!
TZ

Conflicting results in anova_test() for formula and argument input for two-way and higher order designs

When conducting a between-subjects ANOVA with two or more factors using anova_test(), the results depend on how the model was defined (as formula vs. as function arguments). As checked with JASP and SPSS, the results are only correct, if the model is defined via function arguments. The formula call gives wrong results.

library(tidyverse)
library(rstatix)
#> 
#> Attache Paket: 'rstatix'
#> The following object is masked from 'package:stats':
#> 
#>     filter
library(datarium)

satisfaction <- datarium::jobsatisfaction

satisfaction %>% 
  anova_test(score ~ gender*education_level, type = 3)
#> Coefficient covariances computed by hccm()
#> ANOVA Table (type III tests)
#> 
#>                   Effect DFn DFd       F        p p<.05   ges
#> 1                 gender   1  52   1.547 2.19e-01       0.029
#> 2        education_level   2  52 132.432 3.92e-21     * 0.836
#> 3 gender:education_level   2  52   7.338 2.00e-03     * 0.220

satisfaction %>% 
  anova_test(dv = score, between = c(gender, education_level), type = 3)
#> Coefficient covariances computed by hccm()
#> ANOVA Table (type III tests)
#> 
#>                   Effect DFn DFd       F        p p<.05   ges
#> 1                 gender   1  52   0.586 4.48e-01       0.011
#> 2        education_level   2  52 189.172 1.37e-24     * 0.879
#> 3 gender:education_level   2  52   7.338 2.00e-03     * 0.220

^{Created on 2020-05-05 by the reprex package (v0.3.0)}

Prepare for tidyr v1.0.0

We see problems in our revdep checks for tidyr v1.0.0.

Since we've interacted recently about 2 other packages and you seem to be preparing a release, maybe you can kill multiple birds with one stone. Here's what we're seeing:

https://github.com/tidyverse/tidyr/blob/master/revdep/problems.md#rstatix

Adapt to dplyr dev version which no longer support adding an extra class to a data frame

suppressPackageStartupMessages(library(dplyr))

mydata <- as_tibble(ToothGrowth)
class(mydata)
#> [1] "tbl_df"     "tbl"        "data.frame"

# Using an additional "custom_class"
mydata_2 <- mydata
class(mydata_2) <- c(class(mydata_2), "custom_class")
class(mydata_2)
#> [1] "tbl_df"       "tbl"          "data.frame"   "custom_class"

# This works
mydata %>% mutate(len_2 = 2*len)
#> # A tibble: 60 x 4
#>      len supp   dose len_2
#>    <dbl> <fct> <dbl> <dbl>
#>  1   4.2 VC      0.5   8.4
#>  2  11.5 VC      0.5  23  
#>  3   7.3 VC      0.5  14.6
#>  4   5.8 VC      0.5  11.6
#>  5   6.4 VC      0.5  12.8
#>  6  10   VC      0.5  20  
#>  7  11.2 VC      0.5  22.4
#>  8  11.2 VC      0.5  22.4
#>  9   5.2 VC      0.5  10.4
#> 10   7   VC      0.5  14  
#> # … with 50 more rows

# This no longer works in dplyr dev version
mydata_2 %>% mutate(len_2 = 2*len)
#> Error: Input must be a vector, not a `tbl_df/tbl/data.frame/custom_class` object.
#> Backtrace:
#>      █
#>   1. ├─mydata_2 %>% mutate(len_2 = 2 * len)
#>   2. │ ├─base::withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
#>   3. │ └─base::eval(quote(`_fseq`(`_lhs`)), env, env)
#>   4. │   └─base::eval(expr, envir, enclos)
#>   5. │     └─`_fseq`(`_lhs`)
#>   6. │       └─magrittr::freduce(value, `_function_list`)
#>   7. │         ├─base::withVisible(function_list[[k]](value))
#>   8. │         └─function_list[[k]](value)
#>   9. │           ├─dplyr::mutate(., len_2 = 2 * len)
#>  10. │           └─dplyr:::mutate.data.frame(., len_2 = 2 * len)
#>  11. │             └─dplyr:::mutate_cols(.data, ...)
#>  12. │               └─DataMask$new(.data, caller_env())
#>  13. │                 └─.subset2(public_bind_env, "initialize")(...)
#>  14. │                   └─dplyr::group_rows(data)
#>  15. │                     ├─dplyr::group_data(.data)
#>  16. │                     └─dplyr:::group_data.data.frame(.data)
#>  17. │                       └─vctrs::vec_init(.data[, 0], 1)
#>  18. └─vctrs:::stop_scalar_type(...)
#>  19.   └─vctrs:::stop_vctrs(msg, "vctrs_error_scalar_type", actual = x)

^{Created on 2020-03-29 by the reprex package (v0.3.0.9001)}

Session info

sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 3.3.2 (2016-10-31)
#>  os       macOS Sierra 10.12.6        
#>  system   x86_64, darwin13.4.0        
#>  ui       X11                         
#>  language (EN)                        
#>  collate  fr_FR.UTF-8                 
#>  ctype    fr_FR.UTF-8                 
#>  tz       Europe/Paris                
#>  date     2020-03-29                  
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version     date       lib source                            
#>  assertthat    0.2.1.9000  2019-08-04 [1] Github (hadley/assertthat@50dc4b0)
#>  backports     1.1.5       2019-10-02 [1] CRAN (R 3.3.2)                    
#>  cli           2.0.2       2020-02-28 [1] CRAN (R 3.3.2)                    
#>  crayon        1.3.4       2017-09-16 [1] CRAN (R 3.3.2)                    
#>  digest        0.6.25      2020-02-23 [1] CRAN (R 3.3.2)                    
#>  dplyr       * 0.8.99.9002 2020-03-28 [1] Github (tidyverse/dplyr@c7f2936)  
#>  evaluate      0.14        2019-05-28 [1] CRAN (R 3.3.2)                    
#>  fansi         0.4.1       2020-01-08 [1] CRAN (R 3.3.2)                    
#>  fs            1.3.1       2019-05-06 [1] CRAN (R 3.3.2)                    
#>  generics      0.0.2       2018-11-29 [1] CRAN (R 3.3.2)                    
#>  glue          1.3.2       2020-03-12 [1] CRAN (R 3.3.2)                    
#>  highr         0.8         2019-03-20 [1] CRAN (R 3.3.2)                    
#>  htmltools     0.4.0       2019-10-04 [1] CRAN (R 3.3.2)                    
#>  knitr         1.28        2020-02-06 [1] CRAN (R 3.3.2)                    
#>  lifecycle     0.2.0.9000  2020-03-28 [1] Github (r-lib/lifecycle@355dcba)  
#>  magrittr      1.5         2014-11-22 [1] CRAN (R 3.3.0)                    
#>  pillar        1.4.3       2019-12-20 [1] CRAN (R 3.3.2)                    
#>  pkgconfig     2.0.3       2019-09-22 [1] CRAN (R 3.3.2)                    
#>  purrr         0.3.3       2019-10-18 [1] CRAN (R 3.3.2)                    
#>  R6            2.4.1       2019-11-12 [1] CRAN (R 3.3.2)                    
#>  Rcpp          1.0.3       2019-11-08 [1] CRAN (R 3.3.2)                    
#>  reprex        0.3.0.9001  2020-03-03 [1] Github (tidyverse/reprex@a019cc4) 
#>  rlang         0.4.5.9000  2020-03-28 [1] Github (r-lib/rlang@a90b04b)      
#>  rmarkdown     2.1.1       2020-03-03 [1] Github (rstudio/rmarkdown@1025379)
#>  rstudioapi    0.10        2019-03-19 [1] CRAN (R 3.3.2)                    
#>  sessioninfo   1.1.1       2018-11-05 [1] CRAN (R 3.3.2)                    
#>  stringi       1.4.6       2020-02-17 [1] CRAN (R 3.3.2)                    
#>  stringr       1.4.0       2019-02-10 [1] CRAN (R 3.3.2)                    
#>  styler        1.0.2       2018-06-26 [1] CRAN (R 3.3.2)                    
#>  tibble        2.1.3       2019-06-06 [1] CRAN (R 3.3.2)                    
#>  tidyselect    1.0.0       2020-01-27 [1] CRAN (R 3.3.2)                    
#>  utf8          1.1.4       2018-05-24 [1] CRAN (R 3.3.2)                    
#>  vctrs         0.2.99.9010 2020-03-28 [1] Github (r-lib/vctrs@67c49a1)      
#>  withr         2.1.2.9000  2020-03-03 [1] Github (jimhester/withr@16d47fd)  
#>  xfun          0.12        2020-01-13 [1] CRAN (R 3.3.2)                    
#>  yaml          2.2.1       2020-02-01 [1] CRAN (R 3.3.2)                    
#> 
#> [1] /Library/Frameworks/R.framework/Versions/3.3/Resources/library

bug in `get_summary_stats`: type = "quantile" not working

library(rstatix)
#> 
#> Attaching package: 'rstatix'
#> The following object is masked from 'package:stats':
#> 
#>     filter

iris %>% 
    rstatix::get_summary_stats(Petal.Length, type = "quantile", probs = seq(0, 1, 0.25))
#> Error: Can't subset columns that don't exist.
#> x The column `data` doesn't exist.

^{Created on 2020-04-28 by the reprex package (v0.3.0)}

Other types ("mean", "mean_sd"...) are working fine.

Add sample count

This is a wonderful tool. I also like your ggpubr and survminer.

Sample count of each group is an important measure of a statistical test. Is it possible to add this info to the result? I think it will help the user to understand their data and also introduce a variable for visualization.

Best,
Shixiang

feature request: pass lm() specific arguments (e.g., weights) to lm() calls

stats::lm() takes additional arguments such as "weights". It would be great to be able to pass this argument to calls of it within anova_test() and factorial_design()

Some issues with p.adjust.method and combining p.values from multiple tests

Hi again,
I'm trying to use your package for stuff I normally do in R. And so as I face difficulties, I'm posting here. If I'm doing something obviously wrong apologies again.
I've three issues this time:

In t_test although F1 says that p.adjust.method = "holm" by default, it doesn't seem to be correcting p.values by default. In fact the p.adjust.method seems to be broken for this function and could potentially be a bug.
In pairwise_t_test, setting p.adjust.method = "none" still returns p.adj and p.adj.signif columns (eventhough these don't have adjusted values)
I'm trying to combine two different tests. In the first test I perform t_test with mu = 100. In the second test I perform pairwise tests using either a list of comparisons or pairwise_t_test. So my requirement here is that first perform the two tests without p value adjustment. Then combine the tables and adjust the p.values (so that p values are adjusted across all tests rather than just within a single test). And then finally plot these in a plot.
Here's the code I used in the previous issue as an example code here again with the issues commented.
I use t_test / wilcox_test.
I think the issues are reproducible in the both of them (and maybe other tests that I've not checked).
Again, if I'm doing something obviously wrong as last time, apologies in advance!

####### Problems with output of t_test --------


# Required package --------------------
library(ggpubr)
library(rstatix) # rstatix version: 0.4.0
# I'm on Fedora 31 with R updated and all packages updated, if that detail is relevant

# Generate dataset ------------------------
set.seed(24)
labels <- c("A", "B", "C", "D", "E", "F", "G")

test_dataset <- data.frame(label = NULL, value = NULL)
means_vector <- c(75, 125, 75, 55, 45, 35, 25)
for (i in 1:length(means_vector)) {
  ran_nums <- rnorm(n = 20, mean = means_vector[i], sd = 50)
  temp_data <- data.frame(label = labels[i], value = ran_nums)
  test_dataset <- rbind(test_dataset, temp_data)
}




# Perform the test ------------------------

#### PROBLEM 1 ####

# These two give the same results, eventhough by default p.adjust.method = "holm"
# (When I press F1 on wilcox_test this is what is shown regarding p.adjust.method:
# wilcox_test(data, formula, comparisons = NULL, ref.group = NULL,
#             p.adjust.method = "holm", paired = FALSE, exact = NULL,
#             alternative = "two.sided", mu = 0, conf.level = 0.95,
#             detailed = FALSE)
# )

test_dataset_pvalues_1 <- test_dataset %>% 
  group_by(label) %>% 
  wilcox_test(value ~ 1, mu = 100) 

test_dataset_pvalues_2 <- test_dataset %>% 
  group_by(label) %>% 
  wilcox_test(value ~ 1, mu = 100, p.adjust.method = "none") 

test_dataset_pvalues_1 == test_dataset_pvalues_2 # shows both are identical

# These two give same results
test_dataset_pvalues_1 <- test_dataset %>% 
  group_by(label) %>% 
  t_test(value ~ 1, mu = 100) 

test_dataset_pvalues_2 <- test_dataset %>% 
  group_by(label) %>% 
  t_test(value ~ 1, mu = 100, p.adjust.method = "none") 

test_dataset_pvalues_1 == test_dataset_pvalues_2 # shows both are identical

# These two give same results
test_dataset_pvalues_1 <- test_dataset %>% 
  group_by(label) %>% 
  t_test(value ~ 1, mu = 100, p.adjust.method = "holm") 

test_dataset_pvalues_2 <- test_dataset %>% 
  group_by(label) %>% 
  t_test(value ~ 1, mu = 100, p.adjust.method = "none") 

test_dataset_pvalues_1 == test_dataset_pvalues_2 # shows both are identical

# These two give same results
test_dataset_pvalues_1 <- test_dataset %>% 
  group_by(label) %>% 
  t_test(value ~ 1, mu = 100, p.adjust.method = "holm") 

test_dataset_pvalues_2 <- test_dataset %>% 
  group_by(label) %>% 
  t_test(value ~ 1, mu = 100, p.adjust.method = "BH") 

test_dataset_pvalues_1 == test_dataset_pvalues_2 # shows both are identical

# So basically the p.adjust.method doesn't seem to be functioning




#### PROBLEM 2 ####

# In pairwise_t_test when asked for not adjusting p values, this setting is respected, but p.adj columns are returned anyway (with non adjusted pvalues)
# If this is intentional behaviour - that's okay (although it could be misleading) - but this behaviour seems to be different from what you have going for the t_test function where these extra p.adj columns are not given out

test_dataset_pvalues_1 <- test_dataset %>% 
  #group_by(label) %>%  # Why does uncommenting this line give an error?
  pairwise_t_test(value~label) 


test_dataset_pvalues_2 <- test_dataset %>% 
  #group_by(label) %>%  # Why does uncommenting this line give an error?
  pairwise_t_test(value~label, p.adjust.method = "holm") 

test_dataset_pvalues_1 == test_dataset_pvalues_2 # shows both are identical AS IT SHOULD IN THIS CASE



# However, when setting p.adjust.method = "none" although it works, why are the p.adj and p.adj.signif columns returned?
# These columns have the same p values as the non adjusted one - so the correction has not taken place as asked, but the columns are returned anyway when they should not be right?
test_dataset_pvalues_3 <- test_dataset %>% 
  #group_by(label) %>%  # Why does uncommenting this line give an error?
  pairwise_t_test(value~label, p.adjust.method = "none") 


##### PROBLEM 3 #####
# How do I combine results of tests for performing multiple testing correction and plotting in same plot?

# For example, in my random dataset generated here, I need to perform first comparisons with a mu = 100
test_dataset_pvalues_1 <- test_dataset %>% 
  group_by(label) %>% 
  wilcox_test(value ~ 1, mu = 100) 

# Then I need to make some pairwise comparisons
comparisons_list <- list(c("A", "B"), c("A", "D"), c("A", "F"))
test_dataset_pvalues_2 <- test_dataset %>% 
  #group_by(label) %>% 
  wilcox_test(value ~ label, comparisons = comparisons_list, p.adjust.method = "none")

# Or I need to make all pairwise comparisons
test_dataset_pvalues_2 <- test_dataset %>% 
  #group_by(label) %>% 
  pairwise_wilcox_test(value ~ label, p.adjust.method = "none")

# So now I need to report both test_dataset_pvalues_1 and 2 in my plot
# But first I need to combine them and perform multiple testing correction across all tests
# Currently I can only perform multiple testing correction within the set of tests performed, but often you need to perform multiple testing correction across all tests performed right?
# There's no easy way to combine test_dataset_pvalues_1 and 2 because of two reasons:
# The extra p.adj columns even with multiple testing is not performed
# And the n1 n2 columns issue which would be absent in the tests performed with mu = 100 (table 1)
# So how would you recommend this complex testing paradigm be achieved in a pipe?
# Having one common table will also aid their addition to the plot

# Plot the data -------------------------------
myplot <- ggplot(test_dataset, aes(x = label, y = value)) +
  geom_boxplot() +
  geom_hline(yintercept = 100, linetype = 2, colour = "red") 

# Add p-values on the plot -----------------------
# Right now I've just added to two tables without pvalue adjustment for this example.
# I'd like to combine the two tables, adjust pvalues and then plot them together.

test_dataset_pvalues_1  <- test_dataset_pvalues_1 %>%
  add_xy_position(x = "label")

myplot + 
  stat_pvalue_manual(test_dataset_pvalues_1, label = "p", 
                            remove.bracket = TRUE, hjust = 1) +
  stat_pvalue_manual(test_dataset_pvalues_2, label = "p", 
                     remove.bracket = FALSE, y.position = 200) # How to auto get y.position here?

Pairwise-T-test values all the same?!

Hi guys!

I am using the Rstatix package to perform a repeated measures ANOVA and Pairwise-T-test for my bachelor's thesis. However, I recently stumbled across a problem. At first, everything was fine. But all of a sudden the Pairwise T-test started generating the same values for each ANOVA-group (this was not the case at first). I have been using the same script continuously and restarted R multiple times but keep having the same issues. I hope someone here can help!

Background information: my data consists of 9 participants who's sleep onset has been measured during three time points: week, weekdays and weekends. My goal is to compare these three time points. My data table consists of the columns "Participant" (number 1-9), "Time" (Week, Weekdays or Weekend) and "Sleeponset" (numeric variable for sleep onset).

I use the following script to perform the ANOVA and Pairwise-T-test:

aov <- anova_test(data = anovaSleeponset, dv = Sleeponset, wid = Participant, within = Time)
get_anova_table(aov)

pairwisecomp <- anovaSleeponset %>% pairwise_t_test(Sleeponset~Time, paired = TRUE, p.adjust.method = "bonferroni")
pairwisecomp

I receive the following output:

ANOVA Table (type III tests)

  Effect DFn DFd      F     p p<.05   ges
1   Time   2  16 10.433 0.001     * 0.023

.y.        group1   group2      n1    n2 statistic    df     p p.adj p.adj.signif
* <chr>      <chr>    <chr>    <int> <int>     <dbl> <dbl> <dbl> <dbl> <chr>       
1 Sleeponset Week     Weekdays     9     9      3.23     8 0.012 0.036 *           
2 Sleeponset Week     Weekend      9     9     -3.23     8 0.012 0.036 *           
3 Sleeponset Weekdays Weekend      9     9     -3.23     8 0.012 0.036 *

As you can see, the Pairwise-T values are similar for all comparisons. This happens all the time. Also when I compare other number variables like sleep efficiency or sleep duration. Can someone help me out?!

chisq_test() does not appear to accept piped data

chisq_test() seems to only expect a table. Since the goal of the package is to work with pipes, it seems like this is counterintuitive:

require(tidyverse)
#> Loading required package: tidyverse

diamonds %>% rstatix::chisq_test(cut, clarity)
#> Error in as.list.environment(environment()): object 'clarity' not found

diamonds %>% table(cut, clarity) %>% rstatix::chisq_test(cut, clarity)
#> Error in table(., cut, clarity): object 'clarity' not found

table(diamonds$cut, diamonds$clarity) %>% rstatix::chisq_test(cut, clarity)
#> Error in as.list.environment(environment()): object 'clarity' not found

table(diamonds$cut, diamonds$clarity) %>% rstatix::chisq_test()
#> # A tibble: 1 x 6
#>       n statistic     p    df method          p.signif
#> * <int>     <dbl> <dbl> <int> <chr>           <chr>   
#> 1 53940     4391.     0    28 Chi-square test ****

^{Created on 2020-05-05 by the reprex package (v0.3.0)}

How to get the y.position value from add_xy_position( ) function

Hello. I have a question about adding pairwise comparison p-value to the plot.

In your Kruskal Wallis tutorial, before drawing the boxplot, you add_xy_position() to the pairwise comparison result pwc, so pwc have some additional columns such as y.position. I was wondering, how you get these y.postion values?

In the add_xy_position() help document, it is said the fun argument is a summary statistics function. My question is, this is a summary statistics for whom? I set fun = "median", but the result doesn't look like the median.

A reproducible example is here:

library(rstatix)
#> 
#> Attaching package: 'rstatix'
#> The following object is masked from 'package:stats':
#> 
#>     filter
library(ggpubr)
#> Loading required package: ggplot2
#> Loading required package: magrittr

data("PlantGrowth")

# conduct Kruskal Wallis test
res.kruskal <- PlantGrowth %>% 
  kruskal_test(weight ~ group)

# conduct pairwise comparison
pwc <- PlantGrowth %>% 
  dunn_test(weight ~ group, p.adjust.method = "bonferroni")

# Visualization: box plots with p-values
pwc <- pwc %>% 
  add_xy_position(x = "group", fun = "median")

ggboxplot(PlantGrowth, x = "group", y = "weight") +
  stat_pvalue_manual(pwc) +
  labs(subtitle = get_test_label(res.kruskal, detailed = TRUE),
       caption = get_pwc_label(pwc))

^{Created on 2020-02-14 by the reprex package (v0.3.0)}

And there are other options for fun like "mean_sd", "mean_se", "mean_ci" and so on. Could you please provide more detailed information?

Change "groups" column name in the output of add_xy_position()

groups should be changed to for exampl .groups. to avoid possible conflict with an input data containing the same column name

wilcox_test and adjust_pvalue with groups

Hi,

this is maybe more a statistical question rather than rstatix, but I would like to make sure I'm using the functions correctly. When using a a two-sample wilcox_test, adjusting p values doesn't really change anything (when using 2 groups data). I read the documentation where it states: "For a grouped data, if pairwise test is performed, then the p-values are adjusted for each group level independently." But I don't completely grasp it.
Let me include some reproducible example:

# Required package 
library(ggpubr)
library(rstatix)

# dataset 
data("ToothGrowth")

# wilcox_test
pwc <- ToothGrowth %>% 
  wilcox_test(len ~ dose) %>%
  adjust_pvalue(p.col = "p", output.col = "p.adj",method = "BH") %>%
  add_significance() 
pwc

pwc2 <- ToothGrowth %>% 
  wilcox_test(len ~ supp) %>%
  adjust_pvalue(p.col = "p", output.col = "p.adj",method = "BH") %>%
  add_significance() 
pwc2

Output is:

pwc
A tibble: 3 x 9
.y. group1 group2 n1 n2 statistic p p.adj p.adj.signif

1 len 0.5 1 20 20 33.5 0.00000702 1.05e-5 ****
2 len 0.5 2 20 20 1.5 0.0000000841 2.52e-7 ****
3 len 1 2 20 20 61 0.000177 1.77e-4 ***

pwc2
A tibble: 1 x 9
.y. group1 group2 n1 n2 statistic p p.adj p.adj.signif

1 len OJ VC 30 30 576. 0.0645 0.0645 ns

Why do pairwise comparisons between just 2 groups not get p.adj? (maybe this was not the perfect example because in here it's ns, but it happens also when significant).
Is the p-value obtained from wilcox_test using just 2 groups then already adjusted? Or if not, is it then correct to report it as is?

I'm afraid my lockdown brain is struggling even more than usual with the basic stats.
Thanks a bunch for the help!

Degree of Freedom is different between rstatix and afex packages

I have found that df (Degree of Freedom) reported by rstatix package is different from what is returned by afex package.

For example, you could see the difference in the following. F(2, 18) in rstatix package; F(1.38, 12.42) in afex package.

Would you mind addressing this issue using the reproducible code below?:

# Load packages:
suppressPackageStartupMessages(library(tidyverse))
suppressPackageStartupMessages(library(rstatix))
suppressPackageStartupMessages(library(afex))
library(papaja)

# Prepare data:
data(selfesteem, package = "datarium")

# Transform data into a long format:
df <- selfesteem %>%
  tidyr::gather(key = "time", value = "score", -id) %>%
  rstatix::convert_as_factor(id, time)

# Anova results from `rstatix` package:
res.aov1 <- rstatix::anova_test(data = df, dv = score, wid = id, within = time)
res.aov1
#> ANOVA Table (type III tests)
#> 
#> $ANOVA
#>   Effect DFn DFd      F        p p<.05   ges
#> 1   time   2  18 55.469 2.01e-08     * 0.829
#> 
#> $`Mauchly's Test for Sphericity`
#>   Effect     W     p p<.05
#> 1   time 0.551 0.092      
#> 
#> $`Sphericity Corrections`
#>   Effect  GGe      DF[GG]    p[GG] p[GG]<.05   HFe      DF[HF]    p[HF]
#> 1   time 0.69 1.38, 12.42 2.16e-06         * 0.774 1.55, 13.94 6.03e-07
#>   p[HF]<.05
#> 1         *
rstatix::get_test_label(res.aov1, description = NULL, detailed = TRUE, type = "text")
#> [1] "F(2,18) = 55.47, p = <0.0001, eta2[g] = 0.829"

# Anova results from `afex` package:
res.aov2 <- afex::aov_ez(data = df, dv = "score", id = "id", within = "time")
res.aov2
#> Anova Table (Type 3 tests)
#> 
#> Response: score
#>   Effect          df  MSE         F ges p.value
#> 1   time 1.38, 12.42 1.34 55.47 *** .83  <.0001
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '+' 0.1 ' ' 1
#> 
#> Sphericity correction method: GG
papaja::apa_print(res.aov2)$full_result$time
#> [1] "$F(1.38, 12.42) = 55.47$, $\\mathit{MSE} = 1.34$, $p < .001$, $\\hat{\\eta}^2_G = .829$"

^{Created on 2019-12-27 by the reprex package (v0.3.0.9001)}

Session info

sessioninfo::session_info()
#> - Session info ---------------------------------------------------------------
#>  setting  value                       
#>  version  R version 3.6.1 (2019-07-05)
#>  os       Windows 10 x64              
#>  system   x86_64, mingw32             
#>  ui       RTerm                       
#>  language (EN)                        
#>  collate  English_United States.1252  
#>  ctype    English_United States.1252  
#>  tz       America/New_York            
#>  date     2019-12-27                  
#> 
#> - Packages -------------------------------------------------------------------
#>  package      * version     date       lib
#>  abind          1.4-5       2016-07-21 [1]
#>  afex         * 0.25-1      2019-12-26 [1]
#>  assertthat     0.2.1       2019-03-21 [1]
#>  backports      1.1.5       2019-10-02 [1]
#>  boot           1.3-23      2019-07-05 [1]
#>  broom          0.5.3.9000  2019-12-19 [1]
#>  car            3.0-6       2019-12-23 [1]
#>  carData        3.0-3       2019-11-16 [1]
#>  cellranger     1.1.0       2016-07-27 [1]
#>  cli            2.0.0       2019-12-09 [1]
#>  coda           0.19-3      2019-07-05 [1]
#>  codetools      0.2-16      2018-12-24 [1]
#>  colorspace     1.4-1       2019-03-18 [1]
#>  crayon         1.3.4       2017-09-16 [1]
#>  curl           4.3         2019-12-02 [1]
#>  data.table     1.12.8      2019-12-09 [1]
#>  DBI            1.0.0       2018-05-02 [1]
#>  dbplyr         1.4.2       2019-11-08 [1]
#>  digest         0.6.23      2019-11-23 [1]
#>  dplyr        * 0.8.3       2019-07-04 [1]
#>  ellipsis       0.3.0       2019-09-20 [1]
#>  emmeans        1.4.3.01    2019-11-28 [1]
#>  estimability   1.3         2018-02-11 [1]
#>  evaluate       0.14        2019-05-28 [1]
#>  fansi          0.4.0       2018-10-05 [1]
#>  forcats      * 0.4.0       2019-02-17 [1]
#>  foreign        0.8-72      2019-08-02 [1]
#>  fs             1.3.1       2019-05-06 [1]
#>  generics       0.0.2       2018-11-29 [1]
#>  ggplot2      * 3.2.1.9000  2019-12-23 [1]
#>  glue           1.3.1.9000  2019-12-19 [1]
#>  gtable         0.3.0       2019-03-25 [1]
#>  haven          2.2.0.9000  2019-11-08 [1]
#>  highr          0.8         2019-03-20 [1]
#>  hms            0.5.2       2019-10-30 [1]
#>  htmltools      0.4.0       2019-10-04 [1]
#>  httr           1.4.1       2019-08-05 [1]
#>  jsonlite       1.6         2018-12-07 [1]
#>  knitr          1.26.1      2019-12-05 [1]
#>  lattice        0.20-38     2018-11-04 [1]
#>  lifecycle      0.1.0       2019-08-01 [1]
#>  lme4         * 1.1-21      2019-03-05 [1]
#>  lmerTest       3.1-1       2019-12-13 [1]
#>  lubridate      1.7.4       2018-04-11 [1]
#>  magrittr       1.5         2014-11-22 [1]
#>  MASS           7.3-51.4    2019-03-31 [1]
#>  Matrix       * 1.2-18      2019-11-27 [1]
#>  minqa          1.2.4       2014-10-09 [1]
#>  modelr         0.1.5       2019-08-08 [1]
#>  multcomp       1.4-10      2019-03-05 [1]
#>  munsell        0.5.0       2018-06-12 [1]
#>  mvtnorm        1.0-11      2019-06-19 [1]
#>  nlme           3.1-142     2019-11-07 [1]
#>  nloptr         1.2.1       2018-10-03 [1]
#>  numDeriv       2016.8-1.1  2019-06-06 [1]
#>  openxlsx       4.1.4       2019-12-06 [1]
#>  papaja       * 0.1.0.9842  2019-12-19 [1]
#>  pillar         1.4.3       2019-12-20 [1]
#>  pkgconfig      2.0.3       2019-09-22 [1]
#>  plyr           1.8.5       2019-12-10 [1]
#>  purrr        * 0.3.3       2019-10-18 [1]
#>  R6             2.4.1       2019-11-12 [1]
#>  Rcpp           1.0.3       2019-11-08 [1]
#>  readr        * 1.3.1.9000  2019-12-23 [1]
#>  readxl         1.3.1.9000  2019-12-17 [1]
#>  reprex         0.3.0.9001  2019-11-13 [1]
#>  reshape2       1.4.3       2017-12-11 [1]
#>  rio            0.5.16      2018-11-26 [1]
#>  rlang          0.4.2.9000  2019-12-26 [1]
#>  rmarkdown      2.0.5       2019-12-20 [1]
#>  rstatix      * 0.3.1.999   2019-12-23 [1]
#>  rvest          0.3.5.9000  2019-11-09 [1]
#>  sandwich       2.5-1       2019-04-06 [1]
#>  scales         1.1.0       2019-11-18 [1]
#>  sessioninfo    1.1.1       2018-11-05 [1]
#>  stringi        1.4.3       2019-03-12 [1]
#>  stringr      * 1.4.0.9000  2019-11-11 [1]
#>  styler         1.2.0.9000  2019-11-13 [1]
#>  survival       3.1-8       2019-12-03 [1]
#>  TH.data        1.0-10      2019-01-21 [1]
#>  tibble       * 2.1.3       2019-06-06 [1]
#>  tidyr        * 1.0.0.9000  2019-12-09 [1]
#>  tidyselect     0.2.99.9000 2019-12-19 [1]
#>  tidyverse    * 1.3.0.9000  2019-12-17 [1]
#>  vctrs          0.2.99.9000 2019-12-26 [1]
#>  withr          2.1.2       2018-03-15 [1]
#>  xfun           0.11.3      2019-11-22 [1]
#>  xml2           1.2.2.9000  2019-12-19 [1]
#>  xtable         1.8-4       2019-04-21 [1]
#>  yaml           2.2.0       2018-07-25 [1]
#>  zip            2.0.4       2019-09-01 [1]
#>  zoo            1.8-6       2019-05-28 [1]
#>  source                               
#>  CRAN (R 3.6.0)                       
#>  Github (singmann/afex@23af68c)       
#>  CRAN (R 3.6.1)                       
#>  CRAN (R 3.6.1)                       
#>  CRAN (R 3.6.1)                       
#>  Github (tidymodels/broom@3c922d5)    
#>  CRAN (R 3.6.1)                       
#>  CRAN (R 3.6.1)                       
#>  CRAN (R 3.6.1)                       
#>  CRAN (R 3.6.1)                       
#>  CRAN (R 3.6.1)                       
#>  CRAN (R 3.6.1)                       
#>  CRAN (R 3.6.1)                       
#>  CRAN (R 3.6.1)                       
#>  CRAN (R 3.6.1)                       
#>  CRAN (R 3.6.1)                       
#>  CRAN (R 3.6.0)                       
#>  Github (hadley/dbplyr@99d4db4)       
#>  CRAN (R 3.6.1)                       
#>  CRAN (R 3.6.1)                       
#>  CRAN (R 3.6.1)                       
#>  CRAN (R 3.6.2)                       
#>  CRAN (R 3.6.0)                       
#>  CRAN (R 3.6.1)                       
#>  CRAN (R 3.6.1)                       
#>  CRAN (R 3.6.1)                       
#>  CRAN (R 3.6.1)                       
#>  CRAN (R 3.6.1)                       
#>  CRAN (R 3.6.1)                       
#>  Github (hadley/ggplot2@d05e437)      
#>  Github (tidyverse/glue@b9ffe6c)      
#>  CRAN (R 3.6.1)                       
#>  Github (tidyverse/haven@690f9d7)     
#>  CRAN (R 3.6.1)                       
#>  CRAN (R 3.6.1)                       
#>  CRAN (R 3.6.1)                       
#>  CRAN (R 3.6.1)                       
#>  CRAN (R 3.6.1)                       
#>  Github (yihui/knitr@33d69c3)         
#>  CRAN (R 3.6.1)                       
#>  CRAN (R 3.6.1)                       
#>  CRAN (R 3.6.0)                       
#>  CRAN (R 3.6.2)                       
#>  CRAN (R 3.6.1)                       
#>  CRAN (R 3.6.1)                       
#>  CRAN (R 3.6.1)                       
#>  CRAN (R 3.6.1)                       
#>  CRAN (R 3.6.0)                       
#>  CRAN (R 3.6.1)                       
#>  CRAN (R 3.6.1)                       
#>  CRAN (R 3.6.1)                       
#>  CRAN (R 3.6.0)                       
#>  CRAN (R 3.6.1)                       
#>  CRAN (R 3.6.0)                       
#>  CRAN (R 3.6.0)                       
#>  CRAN (R 3.6.1)                       
#>  Github (crsh/papaja@081b13f)         
#>  CRAN (R 3.6.2)                       
#>  CRAN (R 3.6.1)                       
#>  CRAN (R 3.6.1)                       
#>  CRAN (R 3.6.1)                       
#>  CRAN (R 3.6.1)                       
#>  CRAN (R 3.6.1)                       
#>  Github (hadley/readr@4bb6ab2)        
#>  Github (hadley/readxl@641ce4e)       
#>  Github (tidyverse/reprex@27aa69a)    
#>  CRAN (R 3.6.1)                       
#>  CRAN (R 3.6.0)                       
#>  Github (r-lib/rlang@ce4f717)         
#>  Github (rstudio/rmarkdown@7f51e23)   
#>  Github (kassambara/rstatix@7c03d07)  
#>  Github (hadley/rvest@a5200f2)        
#>  CRAN (R 3.6.1)                       
#>  CRAN (R 3.6.1)                       
#>  CRAN (R 3.6.1)                       
#>  CRAN (R 3.6.0)                       
#>  Github (hadley/stringr@80aaaac)      
#>  Github (r-lib/styler@a8acde5)        
#>  CRAN (R 3.6.1)                       
#>  CRAN (R 3.6.0)                       
#>  CRAN (R 3.6.1)                       
#>  Github (hadley/tidyr@885d1af)        
#>  Github (tidyverse/tidyselect@6b8f176)
#>  Github (tidyverse/tidyverse@1d7f9b7) 
#>  Github (r-lib/vctrs@fa48942)         
#>  CRAN (R 3.6.1)                       
#>  Github (yihui/xfun@4fcd6ff)          
#>  Github (r-lib/xml2@9fa8a7b)          
#>  CRAN (R 3.6.0)                       
#>  CRAN (R 3.6.0)                       
#>  CRAN (R 3.6.1)                       
#>  CRAN (R 3.6.1)                       
#> 
#> [1] C:/Program Files/R/R-3.6.1/library

Add function to compute generalized eta squared

allow for random intercept in anova_test?

Hi! This is not a bug - just a request. Is it possible to allow for random intercepts in anova_test? I have added covariates, but the typical notation ( 1 | variable) was ignored when I attempted to use it in the function. Your tutorial on datanovia was super helpful - thank you!

Add the function make_clean_names()

Pipe friendly function to replace make.names()

Add the option `conf.int` in wilcox_test()

Optionally (if argument conf.int is true), a nonparametric confidence interval and an estimator for the pseudomedian (one-sample case) or for the difference of the location parameters x-y is computed.

Error from name collisions in shapiro_test()

Hello,

Thank you for your wonderful packages. They are really appreciated.
I found that shapiro_test() throws an error if it is provided with column names "value" or "variable" as input.

Reproducible example:

data("selfesteem2", package = "datarium")

selfesteem2 <- selfesteem2 %>%
  gather(key = "time", value = "value", t1, t2, t3) %>%
  convert_as_factor(id, time)

selfesteem2 %>%
  group_by(treatment, time) %>%
  shapiro_test(value)

Error: Column variable is unknown

data("selfesteem2", package = "datarium")

selfesteem2 <- selfesteem2 %>%
  gather(key = "time", value = "variable", t1, t2, t3) %>%
  convert_as_factor(id, time)

selfesteem2 %>%
  group_by(treatment, time) %>%
  shapiro_test(variable)

Error: Result must have length 12, not 0

feature request: confidence intervals for effect sizes in `get_anova_table`

Will be nice to get CIs for both ges and pes

library(rstatix)
#> 
#> Attaching package: 'rstatix'
#> The following object is masked from 'package:stats':
#> 
#>     filter

data("ToothGrowth")
df <- ToothGrowth
df$id <- rep(1:10, 6)
get_anova_table(df %>% anova_test(len ~ supp * dose + Error(id / (supp * dose))), correction = "GG")
#> ANOVA Table (type III tests)
#> 
#>      Effect  DFn   DFd       F        p p<.05   ges
#> 1      supp 1.00  9.00  34.866 2.28e-04     * 0.224
#> 2      dose 1.68 15.09 106.470 2.79e-09     * 0.773
#> 3 supp:dose 1.88 16.88   2.534 1.12e-01       0.132

^{Created on 2019-12-17 by the reprex package (v0.3.0)}

Session info

devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 3.6.2 (2019-12-12)
#>  os       macOS Mojave 10.14.6        
#>  system   x86_64, darwin15.6.0        
#>  ui       X11                         
#>  language (EN)                        
#>  collate  en_US.UTF-8                 
#>  ctype    en_US.UTF-8                 
#>  tz       Europe/Berlin               
#>  date     2019-12-17                  
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version    date       lib source                          
#>  abind         1.4-5      2016-07-21 [1] CRAN (R 3.6.0)                  
#>  assertthat    0.2.1      2019-03-21 [1] CRAN (R 3.6.0)                  
#>  backports     1.1.5      2019-10-02 [1] CRAN (R 3.6.0)                  
#>  broom         0.5.3      2019-12-14 [1] CRAN (R 3.6.0)                  
#>  callr         3.4.0      2019-12-09 [1] CRAN (R 3.6.0)                  
#>  car           3.0-5      2019-11-15 [1] CRAN (R 3.6.0)                  
#>  carData       3.0-3      2019-11-16 [1] CRAN (R 3.6.0)                  
#>  cellranger    1.1.0      2016-07-27 [1] CRAN (R 3.6.0)                  
#>  cli           2.0.0      2019-12-09 [1] CRAN (R 3.6.0)                  
#>  crayon        1.3.4      2017-09-16 [1] CRAN (R 3.6.0)                  
#>  curl          4.3        2019-12-02 [1] CRAN (R 3.6.0)                  
#>  data.table    1.12.8     2019-12-09 [1] CRAN (R 3.6.0)                  
#>  desc          1.2.0      2018-05-01 [1] CRAN (R 3.6.0)                  
#>  devtools      2.2.1      2019-09-24 [1] CRAN (R 3.6.0)                  
#>  digest        0.6.23     2019-11-23 [1] CRAN (R 3.6.0)                  
#>  dplyr         0.8.3.9000 2019-10-15 [1] Github (tidyverse/dplyr@55f4151)
#>  ellipsis      0.3.0      2019-09-20 [1] CRAN (R 3.6.0)                  
#>  evaluate      0.14       2019-05-28 [1] CRAN (R 3.6.0)                  
#>  fansi         0.4.0      2018-10-05 [1] CRAN (R 3.6.0)                  
#>  forcats       0.4.0      2019-02-17 [1] CRAN (R 3.6.0)                  
#>  foreign       0.8-72     2019-08-02 [2] CRAN (R 3.6.2)                  
#>  fs            1.3.1      2019-05-06 [1] CRAN (R 3.6.0)                  
#>  generics      0.0.2      2018-11-29 [1] CRAN (R 3.6.0)                  
#>  glue          1.3.1      2019-03-12 [1] CRAN (R 3.6.0)                  
#>  haven         2.2.0      2019-11-08 [1] CRAN (R 3.6.1)                  
#>  highr         0.8        2019-03-20 [1] CRAN (R 3.6.0)                  
#>  hms           0.5.2      2019-10-30 [1] CRAN (R 3.6.0)                  
#>  htmltools     0.4.0      2019-10-04 [1] CRAN (R 3.6.0)                  
#>  knitr         1.26       2019-11-12 [1] CRAN (R 3.6.0)                  
#>  lattice       0.20-38    2018-11-04 [2] CRAN (R 3.6.2)                  
#>  lifecycle     0.1.0      2019-08-01 [1] CRAN (R 3.6.0)                  
#>  magrittr      1.5        2014-11-22 [1] CRAN (R 3.6.0)                  
#>  memoise       1.1.0      2017-04-21 [1] CRAN (R 3.6.0)                  
#>  nlme          3.1-142    2019-11-07 [2] CRAN (R 3.6.2)                  
#>  openxlsx      4.1.4      2019-12-06 [1] CRAN (R 3.6.0)                  
#>  pillar        1.4.2      2019-06-29 [1] CRAN (R 3.6.0)                  
#>  pkgbuild      1.0.6      2019-10-09 [1] CRAN (R 3.6.0)                  
#>  pkgconfig     2.0.3      2019-09-22 [1] CRAN (R 3.6.0)                  
#>  pkgload       1.0.2      2018-10-29 [1] CRAN (R 3.6.0)                  
#>  prettyunits   1.0.2      2015-07-13 [1] CRAN (R 3.6.0)                  
#>  processx      3.4.1      2019-07-18 [1] CRAN (R 3.6.0)                  
#>  ps            1.3.0      2018-12-21 [1] CRAN (R 3.6.0)                  
#>  purrr         0.3.3      2019-10-18 [1] CRAN (R 3.6.0)                  
#>  R6            2.4.1      2019-11-12 [1] CRAN (R 3.6.0)                  
#>  Rcpp          1.0.3      2019-11-08 [1] CRAN (R 3.6.1)                  
#>  readxl        1.3.1      2019-03-13 [1] CRAN (R 3.6.0)                  
#>  remotes       2.1.0      2019-06-24 [1] CRAN (R 3.6.0)                  
#>  rio           0.5.16     2018-11-26 [1] CRAN (R 3.6.0)                  
#>  rlang         0.4.2.9000 2019-12-16 [1] Github (r-lib/rlang@ec7c1ed)    
#>  rmarkdown     2.0        2019-12-12 [1] CRAN (R 3.6.0)                  
#>  rprojroot     1.3-2      2018-01-03 [1] CRAN (R 3.6.0)                  
#>  rstatix     * 0.3.1      2019-12-16 [1] CRAN (R 3.6.2)                  
#>  sessioninfo   1.1.1      2018-11-05 [1] CRAN (R 3.6.0)                  
#>  stringi       1.4.3      2019-03-12 [1] CRAN (R 3.6.0)                  
#>  stringr       1.4.0      2019-02-10 [1] CRAN (R 3.6.0)                  
#>  testthat      2.3.1      2019-12-01 [1] CRAN (R 3.6.0)                  
#>  tibble        2.1.3      2019-06-06 [1] CRAN (R 3.6.0)                  
#>  tidyr         1.0.0      2019-09-11 [1] CRAN (R 3.6.0)                  
#>  tidyselect    0.2.5      2018-10-11 [1] CRAN (R 3.6.0)                  
#>  usethis       1.5.1      2019-07-04 [1] CRAN (R 3.6.0)                  
#>  vctrs         0.2.0.9007 2019-12-16 [1] Github (r-lib/vctrs@7228f79)    
#>  withr         2.1.2      2018-03-15 [1] CRAN (R 3.6.0)                  
#>  xfun          0.11       2019-11-12 [1] CRAN (R 3.6.0)                  
#>  yaml          2.2.0      2018-07-25 [1] CRAN (R 3.6.0)                  
#>  zip           2.0.4      2019-09-01 [1] CRAN (R 3.6.0)                  
#> 
#> [1] /Users/patil/Library/R/3.6/library
#> [2] /Library/Frameworks/R.framework/Versions/3.6/Resources/library

get_summary_stats result is sorted alphabetically by column #1

A simple workable example (using piping):
data.frame(c=rnorm(50,10,5), b=rnorm(50,100,20), a=rnorm(50,0,1)) %>% get_summary_stats(a, c, b, type="mean_sd")

I would have expected the output to be in the order of a,c,b as per the get_summary_stats command
Instead it is sorted alphabetically

Lossy cast from <tbl_df< to >> ... to <data.frame<>>.

The "lossy cast" error

Here are the codes that I used:

ToothGrowth %>% str

'data.frame': 60 obs. of 3 variables:
$ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
$ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
$ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...

ToothGrowth %>% mutate(dose = as.factor(.$dose)) %>% 
group_by(supp) %>%
  kruskal_test(len ~ dose)

Lossy cast from <tbl_df<
.y. : character
n : integer
statistic: double
df : integer
p : double
method : character

to <data.frame<>>.
Dropped variables: .y., n, statistic, df, p, methodTraceback:

ToothGrowth %>% mutate(dose = as.factor(.$dose)) %>% group_by(supp) %>%
. kruskal_test(len ~ dose)

withVisible(eval(quote(_fseq(_lhs)), env, env))

eval(quote(_fseq(_lhs)), env, env)

eval(quote(_fseq(_lhs)), env, env)

_fseq(_lhs)

freduce(value, _function_list)

withVisible(function_list[k])

function_list[k]

kruskal_test(., len ~ dose)

data %>% doo(kruskal_test, formula, ...) %>% set_attrs(args = args) %>%
. add_class(c("rstatix_test", "kruskal_test"))

withVisible(eval(quote(_fseq(_lhs)), env, env))

eval(quote(_fseq(_lhs)), env, env)

eval(quote(_fseq(_lhs)), env, env)

_fseq(_lhs)

freduce(value, _function_list)

function_list[i]

doo(., kruskal_test, formula, ...)

suppressWarnings(unnest(.results))

withCallingHandlers(expr, warning = function(w) invokeRestart("muffleWarning"))

unnest(.results)

tidyr::unnest(data, cols = cols, ...)

unnest.data.frame(data, cols = cols, ...)

unchop(data, !(!cols), keep_empty = keep_empty, ptype = ptype)

vec_rbind(!(!(!x)), .ptype = ptype)

vec_cast_dispatch(x = x, to = to, x_arg = x_arg, to_arg = to_arg)

vec_cast.data.frame(x = x, to = to, x_arg = x_arg, to_arg = to_arg)

vec_cast.data.frame.data.frame(x = x, to = to, x_arg = x_arg,
. to_arg = to_arg)

df_lossy_cast(out = out, x = x, to = to)

maybe_lossy_cast(result = out, x = x, to = to, lossy = length(extra) >
. 0, locations = int(), details = inline_list("Dropped variables: ",
. extra, quote = "`"), .subclass = "vctrs_error_cast_lossy_dropped")

withRestarts(vctrs_restart_error_cast_lossy = function() result,
. stop_lossy_cast(x = x, to = to, result = result, locations = locations,
. details = details, ..., x_arg = x_arg, to_arg = to_arg,
. message = message, .subclass = .subclass))

withOneRestart(expr, restarts[[1L]])

doWithOneRestart(return(expr), restart)

stop_lossy_cast(x = x, to = to, result = result, locations = locations,
. details = details, ..., x_arg = x_arg, to_arg = to_arg, message = message,
. .subclass = .subclass)

stop_vctrs(message, x = x, y = to, to = to, result = result,
. locations = locations, details = details, ..., .subclass = c(.subclass,
. "vctrs_error_cast_lossy"))

abort(message, .subclass = c(.subclass, "vctrs_error"), ...)

Here's the session info:
R version 3.4.3 (2017-11-30)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] reshape2_1.4.3 rstatix_0.2.0.999 dplyr_0.8.3 jsonlite_1.6
[5] formatR_1.7

loaded via a namespace (and not attached):
[1] Rcpp_1.0.2 lattice_0.20-38 tidyr_1.0.0 prettyunits_1.0.2
[5] ps_1.3.0 assertthat_0.2.1 zeallot_0.1.0 rprojroot_1.3-2
[9] digest_0.6.22 IRdisplay_0.7.0 R6_2.4.0 plyr_1.8.4
[13] repr_1.0.1 backports_1.1.5 MatrixModels_0.4-1 evaluate_0.14
[17] pillar_1.4.2 rlang_0.4.0 curl_4.2 uuid_0.1-2
[21] minqa_1.2.4 SparseM_1.77 car_2.1-6 callr_3.3.2
[25] nloptr_1.2.1 Matrix_1.2-17 desc_1.2.0 devtools_2.2.1
[29] splines_3.4.3 lme4_1.1-21 stringr_1.4.0 broom_0.5.2
[33] compiler_3.4.3 pkgconfig_2.0.3 base64enc_0.1-4 pkgbuild_1.0.5
[37] mgcv_1.8-29 htmltools_0.4.0 nnet_7.3-12 tidyselect_0.2.5
[41] tibble_2.1.3 crayon_1.3.4 withr_2.1.2 MASS_7.3-51.4
[45] grid_3.4.3 nlme_3.1-141 lifecycle_0.1.0 magrittr_1.5
[49] cli_1.1.0 stringi_1.4.3 fs_1.3.1 remotes_2.1.0
[53] testthat_2.2.1 ellipsis_0.3.0 generics_0.0.2 vctrs_0.2.0
[57] boot_1.3-23 IRkernel_1.0.2 tools_3.4.3 glue_1.3.1
[61] purrr_0.3.3 processx_3.4.1 pkgload_1.0.2 parallel_3.4.3
[65] pbkrtest_0.4-7 sessioninfo_1.1.1 memoise_1.1.0 pbdZMQ_0.3-3
[69] usethis_1.5.1 quantreg_5.51

Thanks!!

rstatix 0.1.2 on its way to CRAN

get_y_position(): add argument stack to autocompute p-value positions for stacked bar plot

CRAN Check Failure for Upcoming broom Release

Hi there! The broom dev team just ran reverse dependency checks on the upcoming broom 0.7.0 release and found new errors/test failures for the CRAN version of this package. I've pasted the results below, which seem to result from our decision to deprecate tidier methods for summary() objects.

checking examples ... ERROR

...
Running examples in ‘rstatix-Ex.R’ failed
The error most likely occurred in:

> ### Name: emmeans_test
> ### Title: Pairwise Comparisons of Estimated Marginal Means
> ### Aliases: emmeans_test get_emmeans
> 
> ### ** Examples
> 
> # Data preparation
> df <- ToothGrowth
> df$dose <- as.factor(df$dose)
> 
> # Pairwise comparisons
> res <- df %>%
+  group_by(supp) %>%
+  emmeans_test(len ~ dose, p.adjust.method = "bonferroni")
Error in summary.emmGrid(x, ...) : 
  formal argument "infer" matched by multiple actual arguments
Calls: %>% ... <Anonymous> -> tidy -> tidy.emmGrid -> tidy_emmeans -> summary
Execution halted

Please call tidy.*() on the model object itself, rather than its summary.🙂

We hope to submit this new version of the package to CRAN in the coming weeks. If you encounter any problems fixing these issues, please feel free to reach out!

y-position .. Is there any way for y-position becomes automatic?

Hello,
Any possibility of y-position becomes automatic?

stat.test <- aov(mean_child_interest ~ Country, data = stats_child_interest_complete) %>%
  tukey_hsd()

ggbarplot(stats_child_interest_complete, 
          x = "Country", 
          y = "mean_child_interest", 
          color = "Country",
          add = "mean_se",
          position = position_dodge()) 

ggboxplot(stats_child_interest_complete, x = "Country", y = "mean_child_interest") +
    stat_pvalue_manual(
    stat.test, label = "p.adj", 
    **y.position = c(7, 8, 9,10,11)**
  )

Uninformative error message for hedges.correction = TRUE

This code gives uninformative error message because it is not clear, what is object 'y':

data(mice, package = "datarium")
rstatix::cohens_d(mice, weight ~ 1, mu = 25, hedges.correction = TRUE)
#> Error in get_cohens_d(data, formula, paired = paired, var.equal = var.equal, : object 'y' not found

^{Created on 2020-04-23 by the reprex package (v0.3.0)}

feature request: supporting Hedge's g

rstatix currently supports Cohen's d, but it will be nice to have a similar function for Hedge's g, which is much more preferable from meta-analysis perspective

feature request: posthoc test for Freidman's test

Currently, rstatix supports posthoc comparisons for Kruskal-Wallis test with a function for Dunn test.

Similarly, will you consider supporting posthoc comparisons for Friedman test with a function for Durbin-Conover test?

Feature request: Conover-Iman test as a Kruskal-Wallis post-hoc

In addition to the Dunn's test, could Conover-Iman [1] test be implemented in rstatix? Conover-Iman test has greater statistical power.

1: https://permalink.lanl.gov/object/tr?what=info:lanl-repo/lareport/LA-07677-MS

bug in cohens_d for one sample t-test

For one sample t-tests, cohens_d ignores mu argument.

> ToothGrowth %>% cohens_d(len ~ 1, mu = 0) # default mu value
# A tibble: 1 x 6
  .y.   group1 group2     effsize     n magnitude
* <chr> <chr>  <chr>        <dbl> <int> <ord>    
1 len   1      null model    2.46    60 large    

> ToothGrowth %>% cohens_d(len ~ 1, mu = 5)
# A tibble: 1 x 6
  .y.   group1 group2     effsize     n magnitude
* <chr> <chr>  <chr>        <dbl> <int> <ord>    
1 len   1      null model    2.46    60 large    

> ToothGrowth %>% cohens_d(len ~ 1, mu = 10000000)
# A tibble: 1 x 6
  .y.   group1 group2     effsize     n magnitude
* <chr> <chr>  <chr>        <dbl> <int> <ord>    
1 len   1      null model    2.46    60 large

Add description for performing single tests to the readme.md file

Thanks for developing this package.
I was wondering if you could add the description for performing single tests to the readme.md file which can then be used as quick reference.
You've a short description on this issue here: kassambara/ggpubr#79
However, many aspects of that description are lacking.
For example, it took me a while to figure out what len ~ 1 meant in (t_test(len ~ 1, mu = 2).

Update the website for rstatix

Online documentation misses some topics, e.g., there is no reference for functions related to proportions (https://rpkgs.datanovia.com/rstatix/reference/index.html):

Maybe automatic deployment of pkgdown website would be helpful? It could be applied with:

usethis::use_github_action("pkgdown")

Bug report: `var` must evaluate to a single number or a column name, not NULL

I use the latest version from GitHub.

Code:

df <- ToothGrowth
df$dose <- as.factor(df$dose)
head(df)

library(rstatix)  
library(ggpubr)  # For easy data-visualization

stat.test <- df %>% t_test(len ~ dose, ref.group = "all")
stat.test
# Box plot with horizontal mean line
ggboxplot(df, x = "dose", y = "len") +
  stat_pvalue_manual(
    stat.test, label = "p.adj.signif", 
    y.position = 35,
    remove.bracket = TRUE
  ) +
  geom_hline(yintercept = mean(df$len), linetype = 2)

Print:

> ggboxplot(df, x = "dose", y = "len") +
+   stat_pvalue_manual(
+     stat.test, label = "p.adj.signif", 
+     y.position = 35,
+     remove.bracket = TRUE
+   ) +
+   geom_hline(yintercept = mean(df$len), linetype = 2)
错误: `var` must evaluate to a single number or a column name, not NULL
Call `rlang::last_error()` to see a backtrace
> rlang::last_error()
<error>
message: `var` must evaluate to a single number or a column name, not NULL
class:   `rlang_error`
backtrace:
  1. ggpubr::stat_pvalue_manual(...)
 13. dplyr::pull(., NULL)
 22. tidyselect::vars_pull(names(.data), !!enquo(var))
 23. ggpubr::stat_pvalue_manual(...)
Call `rlang::last_trace()` to see the full backtrace

Session:

> devtools::session_info()
─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 setting  value                                 
 version  R version 3.6.0 RC (2019-04-21 r76409)
 os       macOS High Sierra 10.13.6             
 system   x86_64, darwin15.6.0                  
 ui       RStudio                               
 language (EN)                                  
 collate  zh_CN.UTF-8                           
 ctype    zh_CN.UTF-8                           
 tz       Asia/Shanghai                         
 date     2019-08-26                            

─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 package     * version   date       lib source                             
 abind         1.4-5     2016-07-21 [1] CRAN (R 3.6.0)                     
 assertthat    0.2.1     2019-03-21 [1] CRAN (R 3.6.0)                     
 backports     1.1.4     2019-04-10 [1] CRAN (R 3.6.0)                     
 broom         0.5.2     2019-04-07 [1] CRAN (R 3.6.0)                     
 callr         3.3.0     2019-07-04 [1] CRAN (R 3.6.0)                     
 car           3.0-3     2019-05-27 [1] CRAN (R 3.6.0)                     
 carData       3.0-2     2018-09-30 [1] CRAN (R 3.6.0)                     
 cellranger    1.1.0     2016-07-27 [1] CRAN (R 3.6.0)                     
 cli           1.1.0     2019-03-19 [1] CRAN (R 3.6.0)                     
 colorspace    1.4-1     2019-03-18 [1] CRAN (R 3.6.0)                     
 crayon        1.3.4     2017-09-16 [1] CRAN (R 3.6.0)                     
 curl          3.3       2019-01-10 [1] CRAN (R 3.6.0)                     
 data.table    1.12.2    2019-04-07 [1] CRAN (R 3.6.0)                     
 desc          1.2.0     2018-05-01 [1] CRAN (R 3.6.0)                     
 devtools      2.1.0     2019-07-06 [1] CRAN (R 3.6.0)                     
 digest        0.6.20    2019-07-04 [1] CRAN (R 3.6.0)                     
 dplyr         0.8.3     2019-07-04 [1] CRAN (R 3.6.0)                     
 fansi         0.4.0     2018-10-05 [1] CRAN (R 3.6.0)                     
 forcats       0.4.0     2019-02-17 [1] CRAN (R 3.6.0)                     
 foreign       0.8-71    2018-07-20 [2] CRAN (R 3.6.0)                     
 fs            1.3.1     2019-05-06 [1] CRAN (R 3.6.0)                     
 generics      0.0.2     2018-11-29 [1] CRAN (R 3.6.0)                     
 ggplot2     * 3.2.0     2019-06-16 [1] CRAN (R 3.6.0)                     
 ggpubr      * 0.2.1     2019-06-23 [1] CRAN (R 3.6.0)                     
 ggsignif      0.5.0     2019-02-20 [1] CRAN (R 3.6.0)                     
 glue          1.3.1     2019-03-12 [1] CRAN (R 3.6.0)                     
 gtable        0.3.0     2019-03-25 [1] CRAN (R 3.6.0)                     
 haven         2.1.1     2019-07-04 [1] CRAN (R 3.6.0)                     
 hms           0.5.0     2019-07-09 [1] CRAN (R 3.6.0)                     
 lattice       0.20-38   2018-11-04 [2] CRAN (R 3.6.0)                     
 lazyeval      0.2.2     2019-03-15 [1] CRAN (R 3.6.0)                     
 magrittr    * 1.5       2014-11-22 [1] CRAN (R 3.6.0)                     
 memoise       1.1.0     2017-04-21 [1] CRAN (R 3.6.0)                     
 munsell       0.5.0     2018-06-12 [1] CRAN (R 3.6.0)                     
 nlme          3.1-140   2019-05-12 [2] CRAN (R 3.6.0)                     
 openxlsx      4.1.0.1   2019-05-28 [1] CRAN (R 3.6.0)                     
 pacman      * 0.5.1     2019-03-11 [1] CRAN (R 3.6.0)                     
 pillar        1.4.2     2019-06-29 [1] CRAN (R 3.6.0)                     
 pkgbuild      1.0.3     2019-03-20 [1] CRAN (R 3.6.0)                     
 pkgconfig     2.0.2     2018-08-16 [1] CRAN (R 3.6.0)                     
 pkgload       1.0.2     2018-10-29 [1] CRAN (R 3.6.0)                     
 prettyunits   1.0.2     2015-07-13 [1] CRAN (R 3.6.0)                     
 processx      3.4.0     2019-07-03 [1] CRAN (R 3.6.0)                     
 ps            1.3.0     2018-12-21 [1] CRAN (R 3.6.0)                     
 purrr         0.3.2     2019-03-15 [1] CRAN (R 3.6.0)                     
 R6            2.4.0     2019-02-14 [1] CRAN (R 3.6.0)                     
 Rcpp          1.0.1     2019-03-17 [1] CRAN (R 3.6.0)                     
 readxl        1.3.1     2019-03-13 [1] CRAN (R 3.6.0)                     
 remotes       2.1.0     2019-06-24 [1] CRAN (R 3.6.0)                     
 rio           0.5.16    2018-11-26 [1] CRAN (R 3.6.0)                     
 rlang         0.4.0     2019-06-25 [1] CRAN (R 3.6.0)                     
 rprojroot     1.3-2     2018-01-03 [1] CRAN (R 3.6.0)                     
 rstatix     * 0.1.2.999 2019-08-26 [1] Github (kassambara/rstatix@ce03500)
 rstudioapi    0.10      2019-03-19 [1] CRAN (R 3.6.0)                     
 scales        1.0.0     2018-08-09 [1] CRAN (R 3.6.0)                     
 sessioninfo   1.1.1     2018-11-05 [1] CRAN (R 3.6.0)                     
 testthat      2.1.1     2019-04-23 [1] CRAN (R 3.6.0)                     
 tibble        2.1.3     2019-06-06 [1] CRAN (R 3.6.0)                     
 tidyr         0.8.3     2019-03-01 [1] CRAN (R 3.6.0)                     
 tidyselect    0.2.5     2018-10-11 [1] CRAN (R 3.6.0)                     
 usethis       1.5.1     2019-07-04 [1] CRAN (R 3.6.0)                     
 utf8          1.1.4     2018-05-24 [1] CRAN (R 3.6.0)                     
 vctrs         0.2.0     2019-07-05 [1] CRAN (R 3.6.0)                     
 withr         2.1.2     2018-03-15 [1] CRAN (R 3.6.0)                     
 zeallot       0.1.0     2018-01-28 [1] CRAN (R 3.6.0)                     
 zip           2.0.3     2019-07-03 [1] CRAN (R 3.6.0)                     

[1] /Users/wsx/R_library
[2] /Library/Frameworks/R.framework/Versions/3.6/Resources/library

`rstatix::identify_outliers` falters in the presence of NAs

rstatix::identify_outliers(ggplot2::msleep, brainwt)
#> Error in quantile.default(x, 0.25): missing values and NaN's not allowed if 'na.rm' is FALSE

^{Created on 2019-08-14 by the reprex package (v0.3.0)}

prop_test

Any chance at adding prop_test, which would give p-values based on test for equivalence of proportions in the event the values passed in take on only two values?

cohens_d: Is there a way output values with directionality?

Dear Alboukadel KASSAMBARA,

Thanks for a great library. I have been searching for a library to calculate a confidence interval of Cohen’s d from a Welch test. And the way you implement bootstrapping to do that seems like a way to go.

However, when I use the cohens_d function, it returns only positive values for effsize, conf.low, conf.high. So I guess, the effsize is the absolute value of d, or |d| and the conf values are also the absolute values of the bootstrapping bounds. This is problematic when conf.low and conf.high have different directionality (e.g., -.2 to .3). Is there a way to output values with directionality, as opposed to using absoluate values?

Thanks so much,

Narun

Handling of missing values in anova_test() in mixed designs

Repeated measures ANOVA relies on listwise case deletion, dependent on the used variables. However, if there are missing values in variables not included in the analysis, these cases are currently omitted as well. Can this be changed?

library(tidyverse)
library(rstatix)
#> 
#> Attache Paket: 'rstatix'
#> The following object is masked from 'package:stats':
#> 
#>     filter
library(datarium)

# Create an additional variable with values for each participant and 5 missing
additional <- 1:45

missing <- sample(additional, size = 5)

additional[missing] <- NA


# Add that as a new column to the "anxiety" dataset
anxiety_add <- anxiety %>% add_column(additional, .after = "id")

anxiety_long <- anxiety_add %>% 
  pivot_longer(
    cols = t1:t3,
    names_to = "time",
    values_to = "score"
  )


# Not that we don't use the column "additional"
anxiety_long %>%
  anova_test(dv = score, wid = id, within = time, between = group)
#> Warning: NA detected in rows: 7,8,9,13,14,15,37,38,39,46,47,48,70,71,72.
#> Removing this rows before the analysis.
#> ANOVA Table (type III tests)
#> 
#> $ANOVA
#>       Effect DFn DFd       F        p p<.05   ges
#> 1      group   2  37   4.553 1.70e-02     * 0.193
#> 2       time   2  74 352.627 1.48e-38     * 0.195
#> 3 group:time   4  74  98.270 7.96e-29     * 0.119
#> 
#> $`Mauchly's Test for Sphericity`
#>       Effect     W     p p<.05
#> 1       time 0.888 0.117      
#> 2 group:time 0.888 0.117      
#> 
#> $`Sphericity Corrections`
#>       Effect   GGe     DF[GG]    p[GG] p[GG]<.05   HFe      DF[HF]    p[HF]
#> 1       time 0.899 1.8, 66.53 6.42e-35         * 0.942 1.88, 69.72 1.80e-36
#> 2 group:time 0.899 3.6, 66.53 3.79e-26         * 0.942 3.77, 69.72 2.73e-27
#>   p[HF]<.05
#> 1         *
#> 2         *

^{Created on 2020-03-27 by the reprex package (v0.3.0)}

Feature request: bias correction for Cramer's V

Add an option to calculate a bias-corrected version of Crammer's V. E.g., as in this package:
https://github.com/cran/rcompanion/blob/master/R/cramerV.r

freq_table() removes rows with any NAs

Hi!

freq_table()'s na.rm argument will remove rows with NA in any column if TRUE, not just the variable names used to create the frequency table. It seems counterintuitive to me, but if it is intentional rather than a bug, perhaps it could be stated in the help text?

Thanks for a great package!

//Fredrik

Submitting v0.2.0 to CRAN

How to add results of a t_test or wilcox_test performed with a mu value to a ggplot?

So I performed a wilcox_test (or t_test) using your wonderful package since it has pipe friendly functions.
I've trouble adding p-values to the plot.
The tests performed here are not pairwise tests, but rather against a mu value.
Here's the code:

# For reproducing randomly generated data
set.seed(24)

# Generate dataset
labels <- c("A", "B", "C", "D", "E", "F", "G")

test_dataset <- data.frame(label = NULL, value = NULL)
means_vector <- c(75, 125, 75, 55, 45, 35, 25)
for (i in 1:length(means_vector)) {
  ran_nums <- rnorm(n = 20, mean = means_vector[i], sd = 50)
  temp_data <- data.frame(label = labels[i], value = ran_nums)
  test_dataset <- rbind(test_dataset, temp_data)
}

# Perform the test
library(rstatix)
test_dataset_pvalues <- test_dataset %>% 
  group_by(label) %>% 
  wilcox_test(value ~ 1, mu = 100) # wilcox test against of all labels against mu = 100

# Plot the data
library(ggplot2)
ggplot(test_dataset, aes(x = label, y = value)) +
  geom_boxplot() +
  geom_hline(yintercept = 100, linetype = 2, colour = "red") +
  stat_pvalue_manual(test_dataset_pvalues, label = "p", y.position = 150, remove.bracket = T)

This is what the output looks like:

So, how do I add the p-values over each label in this case?

double-checking p-values from Dunn test

The p-values from dunn.test and rstatix don't match up, and I am not sure why there is this discrepancy. I also checked the same with popular GUI softwares like jamvoi and their p-values are the same as the ones outputed by dunn.test. So I thought I would raise this issue.

library(dunn.test)

invisible(capture.output(df <-
                 as.data.frame(dunn.test(
                   x = mtcars$wt,
                   g = as.factor(mtcars$cyl),
                   table = FALSE,
                   kw = FALSE,
                   label = FALSE,
                   alpha = 0.05,
                   method = "none"
                 )), 
                 file = NULL))

tibble::as_tibble(df)
#> # A tibble: 3 x 5
#>    chi2     Z           P  P.adjusted comparisons
#>   <dbl> <dbl>       <dbl>       <dbl> <fct>      
#> 1  22.8 -1.84 0.0332      0.0332      4 - 6      
#> 2  22.8 -4.76 0.000000988 0.000000988 4 - 8      
#> 3  22.8 -2.22 0.0132      0.0132      6 - 8

library(rstatix)
#> 
#> Attaching package: 'rstatix'
#> The following object is masked from 'package:stats':
#> 
#>     filter

dunn_test(mtcars, wt ~ cyl, p.adjust.method = "none")
#> # A tibble: 3 x 9
#>   .y.   group1 group2    n1    n2 statistic          p      p.adj p.adj.signif
#> * <chr> <chr>  <chr>  <int> <int>     <dbl>      <dbl>      <dbl> <chr>       
#> 1 wt    4      6         11     7      1.84 0.0663     0.0663     ns          
#> 2 wt    4      8         11    14      4.76 0.00000198 0.00000198 ****        
#> 3 wt    6      8          7    14      2.22 0.0263     0.0263     *

^{Created on 2020-05-28 by the reprex package (v0.3.0.9001)}

Session info

sessioninfo::session_info()
#> - Session info ---------------------------------------------------------------
#>  setting  value                                             
#>  version  R Under development (unstable) (2020-02-28 r77874)
#>  os       Windows 10 x64                                    
#>  system   x86_64, mingw32                                   
#>  ui       RTerm                                             
#>  language (EN)                                              
#>  collate  English_United States.1252                        
#>  ctype    English_United States.1252                        
#>  tz       Europe/Berlin                                     
#>  date     2020-05-28                                        
#> 
#> - Packages -------------------------------------------------------------------
#>  package     * version     date       lib source                           
#>  abind         1.4-5       2016-07-21 [1] CRAN (R 4.0.0)                   
#>  assertthat    0.2.1       2019-03-21 [1] CRAN (R 4.0.0)                   
#>  backports     1.1.7       2020-05-13 [1] CRAN (R 4.0.0)                   
#>  broom         0.7.0.9000  2020-05-26 [1] Github (tidymodels/broom@a56ab06)
#>  car           3.0-8       2020-05-21 [1] CRAN (R 4.0.0)                   
#>  carData       3.0-4       2020-05-22 [1] CRAN (R 4.0.0)                   
#>  cellranger    1.1.0       2016-07-27 [1] CRAN (R 4.0.0)                   
#>  cli           2.0.2       2020-02-28 [1] CRAN (R 4.0.0)                   
#>  crayon        1.3.4       2017-09-16 [1] CRAN (R 4.0.0)                   
#>  curl          4.3         2019-12-02 [1] CRAN (R 4.0.0)                   
#>  data.table    1.12.8      2019-12-09 [1] CRAN (R 4.0.0)                   
#>  digest        0.6.25      2020-02-23 [1] CRAN (R 4.0.0)                   
#>  dplyr         0.8.99.9003 2020-05-25 [1] Github (tidyverse/dplyr@735e6a2) 
#>  dunn.test   * 1.3.5       2017-10-27 [1] CRAN (R 4.0.0)                   
#>  ellipsis      0.3.1       2020-05-15 [1] CRAN (R 4.0.0)                   
#>  evaluate      0.14        2019-05-28 [1] CRAN (R 4.0.0)                   
#>  fansi         0.4.1       2020-01-08 [1] CRAN (R 4.0.0)                   
#>  forcats       0.5.0       2020-03-01 [1] CRAN (R 4.0.0)                   
#>  foreign       0.8-75      2020-01-20 [2] CRAN (R 4.0.0)                   
#>  fs            1.4.1       2020-04-04 [1] CRAN (R 4.0.0)                   
#>  generics      0.0.2       2018-11-29 [1] CRAN (R 4.0.0)                   
#>  glue          1.4.1       2020-05-13 [1] CRAN (R 4.0.0)                   
#>  haven         2.3.0       2020-05-24 [1] CRAN (R 4.0.0)                   
#>  highr         0.8         2019-03-20 [1] CRAN (R 4.0.0)                   
#>  hms           0.5.3       2020-01-08 [1] CRAN (R 4.0.0)                   
#>  htmltools     0.4.0       2019-10-04 [1] CRAN (R 4.0.0)                   
#>  knitr         1.28        2020-02-06 [1] CRAN (R 4.0.0)                   
#>  lifecycle     0.2.0.9000  2020-03-16 [1] Github (r-lib/lifecycle@355dcba) 
#>  magrittr      1.5         2014-11-22 [1] CRAN (R 4.0.0)                   
#>  openxlsx      4.1.5       2020-05-06 [1] CRAN (R 4.0.0)                   
#>  pillar        1.4.4       2020-05-05 [1] CRAN (R 4.0.0)                   
#>  pkgconfig     2.0.3       2019-09-22 [1] CRAN (R 4.0.0)                   
#>  purrr         0.3.4       2020-04-17 [1] CRAN (R 4.0.0)                   
#>  R6            2.4.1       2019-11-12 [1] CRAN (R 4.0.0)                   
#>  Rcpp          1.0.4.6     2020-04-09 [1] CRAN (R 4.0.0)                   
#>  readxl        1.3.1       2019-03-13 [1] CRAN (R 4.0.0)                   
#>  reprex        0.3.0.9001  2020-03-25 [1] Github (tidyverse/reprex@a019cc4)
#>  rio           0.5.16      2018-11-26 [1] CRAN (R 4.0.0)                   
#>  rlang         0.4.6       2020-05-02 [1] CRAN (R 4.0.0)                   
#>  rmarkdown     2.1         2020-01-20 [1] CRAN (R 4.0.0)                   
#>  rstatix     * 0.5.0       2020-04-28 [1] CRAN (R 4.0.0)                   
#>  rstudioapi    0.11        2020-02-07 [1] CRAN (R 4.0.0)                   
#>  sessioninfo   1.1.1       2018-11-05 [1] CRAN (R 4.0.0)                   
#>  stringi       1.4.6       2020-02-17 [1] CRAN (R 4.0.0)                   
#>  stringr       1.4.0       2019-02-10 [1] CRAN (R 4.0.0)                   
#>  styler        1.3.2.9000  2020-05-17 [1] Github (r-lib/styler@8dad103)    
#>  tibble        3.0.1       2020-04-20 [1] CRAN (R 4.0.0)                   
#>  tidyr         1.1.0       2020-05-20 [1] CRAN (R 4.0.0)                   
#>  tidyselect    1.1.0       2020-05-11 [1] CRAN (R 4.0.0)                   
#>  utf8          1.1.4       2018-05-24 [1] CRAN (R 4.0.0)                   
#>  vctrs         0.3.0       2020-05-09 [1] Github (r-lib/vctrs@5b71d88)     
#>  withr         2.2.0       2020-04-20 [1] CRAN (R 4.0.0)                   
#>  xfun          0.14        2020-05-20 [1] CRAN (R 4.0.0)                   
#>  yaml          2.2.1       2020-02-01 [1] CRAN (R 4.0.0)                   
#>  zip           2.0.4       2019-09-01 [1] CRAN (R 4.0.0)                   
#> 
#> [1] C:/Users/inp099/Documents/R/win-library/4.0
#> [2] C:/Program Files/R/R-devel/library

What kind of t.test does t_test perform? Welch?

Hi,
Can you mention in the help docs for the test which kind of t test is performed by the t_test and pairwise_t_test functions?
The base functions mention Welch t-test as the default test:
var.equal a logical variable indicating whether to treat the two variances as being equal. If TRUE then the pooled variance is used to estimate the variance otherwise the Welch (or Satterthwaite) approximation to the degrees of freedom is used.
This var.equal is FALSE by default.

(I was wondering if t_test implementation is exactly the same as t.test)

Thanks in advance

CRITICAL issue: p values depend on the order of factor levels: Games-Howell test

The p values of the Games-Howell test depend on the order of factor levels (find the p.adj and p.adj.signif columns in the examples below). I find this as a critical issue, which must be solved as soon as possible.

library(tidyverse)
library(rstatix)
#> 
#> Attaching package: 'rstatix'
#> The following object is masked from 'package:stats':
#> 
#>     filter

# Incorrect p values ------------------------------------------------
levels(PlantGrowth$group)
#> [1] "ctrl" "trt1" "trt2"

rstatix::games_howell_test(PlantGrowth, weight ~ group)
#> # A tibble: 3 x 8
#>   .y.    group1 group2 estimate conf.low conf.high p.adj p.adj.signif
#> * <chr>  <chr>  <chr>     <dbl>    <dbl>     <dbl> <dbl> <chr>       
#> 1 weight ctrl   trt1     -0.371   -1.17      0.430 1     ns          
#> 2 weight ctrl   trt2      0.494   -0.101     1.09  0.113 ns          
#> 3 weight trt1   trt2      0.865    0.114     1.62  0.024 *

# Incorrect p values ------------------------------------------------
PlantGrowth2 <- 
  PlantGrowth %>% 
  mutate(group = fct_reorder(group, weight, .desc = TRUE))  
levels(PlantGrowth2$group)
#> [1] "trt2" "ctrl" "trt1"

rstatix::games_howell_test(PlantGrowth2, weight ~ group)
#> # A tibble: 3 x 8
#>   .y.    group1 group2 estimate conf.low conf.high p.adj p.adj.signif
#> * <chr>  <chr>  <chr>     <dbl>    <dbl>     <dbl> <dbl> <chr>       
#> 1 weight trt2   ctrl     -0.494    -1.09     0.101     1 ns          
#> 2 weight trt2   trt1     -0.865    -1.62    -0.114     1 ns          
#> 3 weight ctrl   trt1     -0.371    -1.17     0.430     1 ns

# Correct p values --------------------------------------------------
PlantGrowth3 <- 
  PlantGrowth %>% 
  mutate(group = fct_reorder(group, weight))
levels(PlantGrowth3$group)
#> [1] "trt1" "ctrl" "trt2"

rstatix::games_howell_test(PlantGrowth3, weight ~ group)
#> # A tibble: 3 x 8
#>   .y.    group1 group2 estimate conf.low conf.high p.adj p.adj.signif
#> * <chr>  <chr>  <chr>     <dbl>    <dbl>     <dbl> <dbl> <chr>       
#> 1 weight trt1   ctrl      0.371   -0.430      1.17 0.475 ns          
#> 2 weight trt1   trt2      0.865    0.114      1.62 0.024 *           
#> 3 weight ctrl   trt2      0.494   -0.101      1.09 0.113 ns

^{Created on 2020-04-27 by the reprex package (v0.3.0)}

You may compare the results with PMCMRplus::gamesHowellTest() and userfriendlyscience::posthocTGH()

bug in `tukey_hsd`: wrong grouping labels when `-` present

The grouping levels here are a-0 and b-1, but that's not what is returned:

library(rstatix)

g <- c(rep("a-0", 50), rep("b-1", 50))
x <- rnorm(100)
(df <- tibble::as_tibble(cbind.data.frame(g, x)))
#> # A tibble: 100 x 2
#>    g          x
#>    <fct>  <dbl>
#>  1 a-0   -2.08 
#>  2 a-0   -0.527
#>  3 a-0    2.44 
#>  4 a-0   -0.556
#>  5 a-0    0.886
#>  6 a-0    0.655
#>  7 a-0    0.882
#>  8 a-0   -0.262
#>  9 a-0    0.223
#> 10 a-0    0.117
#> # ... with 90 more rows

rstatix::tukey_hsd(aov(x ~ g, data = df))
#> Warning: Expected 2 pieces. Additional pieces discarded in 1 rows [1].
#> # A tibble: 1 x 8
#>   term  group1 group2 estimate conf.low conf.high p.adj p.adj.signif
#> * <chr> <chr>  <chr>     <dbl>    <dbl>     <dbl> <dbl> <chr>       
#> 1 g     1      b        -0.127   -0.521     0.267 0.524 ns

^{Created on 2019-12-23 by the reprex package (v0.3.0)}

Session info

devtools::session_info()
#> - Session info ---------------------------------------------------------------
#>  setting  value                       
#>  version  R version 3.6.1 (2019-07-05)
#>  os       Windows 10 x64              
#>  system   x86_64, mingw32             
#>  ui       RTerm                       
#>  language (EN)                        
#>  collate  English_United States.1252  
#>  ctype    English_United States.1252  
#>  tz       Europe/Berlin               
#>  date     2019-12-23                  
#> 
#> - Packages -------------------------------------------------------------------
#>  package     * version    date       lib source                          
#>  abind         1.4-5      2016-07-21 [1] CRAN (R 3.5.0)                  
#>  assertthat    0.2.1      2019-03-21 [1] CRAN (R 3.6.0)                  
#>  backports     1.1.5      2019-10-02 [1] CRAN (R 3.6.1)                  
#>  broom         0.5.3.9000 2019-12-15 [1] local                           
#>  callr         3.4.0      2019-12-09 [1] CRAN (R 3.6.1)                  
#>  car           3.0-6      2019-12-23 [1] CRAN (R 3.6.1)                  
#>  carData       3.0-3      2019-11-16 [1] CRAN (R 3.6.1)                  
#>  cellranger    1.1.0      2016-07-27 [1] CRAN (R 3.5.1)                  
#>  cli           2.0.0.9000 2019-12-23 [1] Github (r-lib/cli@0293ae7)      
#>  crayon        1.3.4      2017-09-16 [1] CRAN (R 3.5.1)                  
#>  curl          4.3        2019-12-02 [1] CRAN (R 3.6.1)                  
#>  data.table    1.12.8     2019-12-09 [1] CRAN (R 3.6.1)                  
#>  desc          1.2.0      2019-11-11 [1] Github (r-lib/desc@61205f6)     
#>  devtools      2.2.1      2019-09-24 [1] CRAN (R 3.6.1)                  
#>  digest        0.6.23     2019-11-23 [1] CRAN (R 3.6.1)                  
#>  dplyr         0.8.3.9000 2019-10-10 [1] Github (tidyverse/dplyr@dcfc1d1)
#>  ellipsis      0.3.0      2019-09-20 [1] CRAN (R 3.6.1)                  
#>  evaluate      0.14       2019-05-28 [1] CRAN (R 3.6.0)                  
#>  fansi         0.4.0      2018-11-05 [1] Github (brodieG/fansi@ab11e9c)  
#>  forcats       0.4.0      2019-02-17 [1] CRAN (R 3.5.2)                  
#>  foreign       0.8-71     2018-07-20 [2] CRAN (R 3.6.1)                  
#>  fs            1.3.1      2019-05-06 [1] CRAN (R 3.6.0)                  
#>  generics      0.0.2      2019-03-05 [1] Github (r-lib/generics@c15ac43) 
#>  glue          1.3.1      2019-03-12 [1] CRAN (R 3.6.0)                  
#>  haven         2.2.0      2019-11-08 [1] CRAN (R 3.6.1)                  
#>  highr         0.8        2019-03-20 [1] CRAN (R 3.6.0)                  
#>  hms           0.5.2      2019-10-30 [1] CRAN (R 3.6.1)                  
#>  htmltools     0.4.0      2019-10-04 [1] CRAN (R 3.6.1)                  
#>  knitr         1.26       2019-11-12 [1] CRAN (R 3.6.1)                  
#>  lifecycle     0.1.0      2019-08-01 [1] CRAN (R 3.6.1)                  
#>  magrittr      1.5        2014-11-22 [1] CRAN (R 3.5.1)                  
#>  memoise       1.1.0      2017-04-21 [1] CRAN (R 3.6.0)                  
#>  openxlsx      4.1.4      2019-12-06 [1] CRAN (R 3.6.1)                  
#>  pillar        1.4.3      2019-12-20 [1] CRAN (R 3.6.1)                  
#>  pkgbuild      1.0.6      2019-10-09 [1] CRAN (R 3.6.1)                  
#>  pkgconfig     2.0.3      2019-09-22 [1] CRAN (R 3.6.1)                  
#>  pkgload       1.0.2      2018-10-29 [1] CRAN (R 3.6.0)                  
#>  prettyunits   1.0.2      2015-07-13 [1] CRAN (R 3.5.1)                  
#>  processx      3.4.1      2019-07-18 [1] CRAN (R 3.6.1)                  
#>  ps            1.3.0      2018-12-21 [1] CRAN (R 3.6.0)                  
#>  purrr         0.3.3      2019-10-18 [1] CRAN (R 3.6.1)                  
#>  R6            2.4.1      2019-11-12 [1] CRAN (R 3.6.1)                  
#>  Rcpp          1.0.3      2019-11-08 [1] CRAN (R 3.6.1)                  
#>  readxl        1.3.1      2019-03-13 [1] CRAN (R 3.6.0)                  
#>  remotes       2.1.0      2019-06-24 [1] CRAN (R 3.6.0)                  
#>  rio           0.5.16     2018-11-26 [1] CRAN (R 3.6.0)                  
#>  rlang         0.4.2      2019-11-23 [1] CRAN (R 3.6.1)                  
#>  rmarkdown     2.0        2019-12-12 [1] CRAN (R 3.6.1)                  
#>  rprojroot     1.3-2      2018-01-03 [1] CRAN (R 3.5.1)                  
#>  rstatix     * 0.3.1      2019-12-16 [1] CRAN (R 3.6.1)                  
#>  sessioninfo   1.1.1      2018-11-05 [1] CRAN (R 3.6.0)                  
#>  stringi       1.4.3      2019-03-12 [1] CRAN (R 3.6.0)                  
#>  stringr       1.4.0      2019-02-10 [1] CRAN (R 3.6.0)                  
#>  testthat      2.3.1      2019-12-01 [1] CRAN (R 3.6.1)                  
#>  tibble        2.1.3      2019-06-06 [1] CRAN (R 3.6.1)                  
#>  tidyr         1.0.0      2019-09-11 [1] CRAN (R 3.6.1)                  
#>  tidyselect    0.2.5      2018-10-11 [1] CRAN (R 3.5.1)                  
#>  usethis       1.5.1.9000 2019-12-12 [1] Github (r-lib/usethis@23dd62c)  
#>  utf8          1.1.4      2018-05-24 [1] CRAN (R 3.5.1)                  
#>  vctrs         0.2.1      2019-12-17 [1] CRAN (R 3.6.1)                  
#>  withr         2.1.2      2018-03-15 [1] CRAN (R 3.5.1)                  
#>  xfun          0.11       2019-11-12 [1] CRAN (R 3.6.1)                  
#>  yaml          2.2.0      2018-07-25 [1] CRAN (R 3.5.1)                  
#>  zeallot       0.1.0      2018-01-28 [1] CRAN (R 3.5.1)                  
#>  zip           2.0.4      2019-09-01 [1] CRAN (R 3.6.1)                  
#> 
#> [1] C:/Users/inp099/Documents/R/win-library/3.6
#> [2] C:/Program Files/R/R-3.6.1/library

Issues with cor_mat

Hello,

I was trying to answer a question and installed the package. However, cor_mat fails with the error:

Error: x must be a vector, not a tbl_df/tbl/data.frame/cor_test object.
Run rlang::last_error() to see where the error occurred.

I ran the example below which led to the above error:

mtcars %>%
  select(mpg, disp, hp, drat, wt, qsec) ->my_data

my_data %>% cor_mat(mpg, hp, wt)

I suspect this is to do with my dplyr version although that seems unlikely. My session info for reference:

R version 3.6.3 (2020-02-29)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18362)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252   
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rstatix_0.4.0     dplyr_0.8.99.9000

loaded via a namespace (and not attached):
 [1] zip_2.0.4          Rcpp_1.0.3         pillar_1.4.3       compiler_3.6.3    
 [5] cellranger_1.1.0   forcats_0.5.0.9000 tools_3.6.3        lifecycle_0.2.0   
 [9] tibble_2.1.3       nlme_3.1-144       lattice_0.20-38    pkgconfig_2.0.3   
[13] rlang_0.4.5        openxlsx_4.1.4     rstudioapi_0.11    curl_4.3          
[17] haven_2.2.0        rio_0.5.16         generics_0.0.2     vctrs_0.2.4       
[21] hms_0.5.3          grid_3.6.3         tidyselect_1.0.0   glue_1.3.2        
[25] data.table_1.12.8  R6_2.4.1           readxl_1.3.1       foreign_0.8-75    
[29] carData_3.0-3      tidyr_1.0.2        purrr_0.3.3        car_3.0-6         
[33] magrittr_1.5       ellipsis_0.3.0     backports_1.1.5    abind_1.4-5       
[37] stringi_1.4.6      broom_0.5.5        crayon_1.3.4

pairwise_binom_test_against_p() fails with unnamed vector

x = c(100, 100)
rstatix::pairwise_binom_test_against_p(x)
#> Error in names(groups) <- paste0("grp", 1:length(x)): attempt to set an attribute on NULL

^{Created on 2020-05-06 by the reprex package (v0.3.0)}

inconsistency in Dunn test p-values?

Although z-values for Dunn test are similar between dunn.test and rstatix, the p-values are different. Any ideas on why this might be? Is rstatix using a different way to compute p-values?

set.seed(123)
library(dunn.test)
library(rstatix)

# data
x <- c(2.9, 3.0, 2.5, 2.6, 3.2) # normal subjects
y <- c(3.8, 2.7, 4.0, 2.4) # with obstructive airway disease
z <- c(2.8, 3.4, 3.7, 2.2, 2.0) # with asbestosis
x <- c(x, y, z)
g <- factor(rep(1:3, c(5, 4, 5)),
            labels = c("Normal",
                       "COPD",
                       "Asbestosis"))

# dunn.test

df <- dunn.test(x, g, method = "none")
#>   Kruskal-Wallis rank sum test
#> 
#> data: x and g
#> Kruskal-Wallis chi-squared = 0.7714, df = 2, p-value = 0.68
#> 
#> 
#>                              Comparison of x by g                              
#>                                 (No adjustment)                                
#> Col Mean-|
#> Row Mean |   Asbestos       COPD
#> ---------+----------------------
#>     COPD |  -0.855235
#>          |     0.1962
#>          |
#>   Normal |  -0.226778   0.641426
#>          |     0.4103     0.2606
#> 
#> alpha = 0.05
#> Reject Ho if p <= alpha/2

tibble::as_tibble(cbind.data.frame(
  "comparisons" = df$comparisons,
  "z.value" = df$Z,
  "p.value" = df$P,
  "p.adjusted" = df$P.adjusted
))
#> # A tibble: 3 x 4
#>   comparisons         z.value p.value p.adjusted
#>   <fct>                 <dbl>   <dbl>      <dbl>
#> 1 Asbestosis - COPD    -0.855   0.196      0.196
#> 2 Asbestosis - Normal  -0.227   0.410      0.410
#> 3 COPD - Normal         0.641   0.261      0.261

# rstatix

m <- cbind.data.frame(x, g)

rstatix::dunn_test(m, x ~ g, p.adjust.method = "none")
#> # A tibble: 3 x 9
#>   .y.   group1 group2        n1    n2 statistic     p p.adj p.adj.signif
#> * <chr> <chr>  <chr>      <int> <int>     <dbl> <dbl> <dbl> <chr>       
#> 1 x     Normal COPD           5     4     0.641 0.521 0.521 ns          
#> 2 x     Normal Asbestosis     5     5    -0.227 0.821 0.821 ns          
#> 3 x     COPD   Asbestosis     4     5    -0.855 0.392 0.392 ns

^{Created on 2019-12-24 by the reprex package (v0.3.0)}

Session info

devtools::session_info()
#> - Session info ---------------------------------------------------------------
#>  setting  value                       
#>  version  R version 3.6.1 (2019-07-05)
#>  os       Windows 10 x64              
#>  system   x86_64, mingw32             
#>  ui       RTerm                       
#>  language (EN)                        
#>  collate  English_United States.1252  
#>  ctype    English_United States.1252  
#>  tz       Europe/Berlin               
#>  date     2019-12-24                  
#> 
#> - Packages -------------------------------------------------------------------
#>  package     * version    date       lib source                          
#>  abind         1.4-5      2016-07-21 [1] CRAN (R 3.5.0)                  
#>  assertthat    0.2.1      2019-03-21 [1] CRAN (R 3.6.0)                  
#>  backports     1.1.5      2019-10-02 [1] CRAN (R 3.6.1)                  
#>  broom         0.5.3.9000 2019-12-15 [1] local                           
#>  callr         3.4.0      2019-12-09 [1] CRAN (R 3.6.1)                  
#>  car           3.0-6      2019-12-23 [1] CRAN (R 3.6.1)                  
#>  carData       3.0-3      2019-11-16 [1] CRAN (R 3.6.1)                  
#>  cellranger    1.1.0      2016-07-27 [1] CRAN (R 3.5.1)                  
#>  cli           2.0.0.9000 2019-12-23 [1] Github (r-lib/cli@0293ae7)      
#>  crayon        1.3.4      2017-09-16 [1] CRAN (R 3.5.1)                  
#>  curl          4.3        2019-12-02 [1] CRAN (R 3.6.1)                  
#>  data.table    1.12.8     2019-12-09 [1] CRAN (R 3.6.1)                  
#>  desc          1.2.0      2019-11-11 [1] Github (r-lib/desc@61205f6)     
#>  devtools      2.2.1      2019-09-24 [1] CRAN (R 3.6.1)                  
#>  digest        0.6.23     2019-11-23 [1] CRAN (R 3.6.1)                  
#>  dplyr         0.8.3.9000 2019-10-10 [1] Github (tidyverse/dplyr@dcfc1d1)
#>  dunn.test   * 1.3.5      2017-10-27 [1] CRAN (R 3.6.0)                  
#>  ellipsis      0.3.0      2019-09-20 [1] CRAN (R 3.6.1)                  
#>  evaluate      0.14       2019-05-28 [1] CRAN (R 3.6.0)                  
#>  fansi         0.4.0      2018-11-05 [1] Github (brodieG/fansi@ab11e9c)  
#>  forcats       0.4.0      2019-02-17 [1] CRAN (R 3.5.2)                  
#>  foreign       0.8-71     2018-07-20 [2] CRAN (R 3.6.1)                  
#>  fs            1.3.1      2019-05-06 [1] CRAN (R 3.6.0)                  
#>  generics      0.0.2      2019-03-05 [1] Github (r-lib/generics@c15ac43) 
#>  glue          1.3.1      2019-03-12 [1] CRAN (R 3.6.0)                  
#>  haven         2.2.0      2019-11-08 [1] CRAN (R 3.6.1)                  
#>  highr         0.8        2019-03-20 [1] CRAN (R 3.6.0)                  
#>  hms           0.5.2      2019-10-30 [1] CRAN (R 3.6.1)                  
#>  htmltools     0.4.0      2019-10-04 [1] CRAN (R 3.6.1)                  
#>  knitr         1.26       2019-11-12 [1] CRAN (R 3.6.1)                  
#>  lifecycle     0.1.0      2019-08-01 [1] CRAN (R 3.6.1)                  
#>  magrittr      1.5        2014-11-22 [1] CRAN (R 3.5.1)                  
#>  memoise       1.1.0      2017-04-21 [1] CRAN (R 3.6.0)                  
#>  openxlsx      4.1.4      2019-12-06 [1] CRAN (R 3.6.1)                  
#>  pillar        1.4.3      2019-12-20 [1] CRAN (R 3.6.1)                  
#>  pkgbuild      1.0.6      2019-10-09 [1] CRAN (R 3.6.1)                  
#>  pkgconfig     2.0.3      2019-09-22 [1] CRAN (R 3.6.1)                  
#>  pkgload       1.0.2      2018-10-29 [1] CRAN (R 3.6.0)                  
#>  prettyunits   1.0.2      2015-07-13 [1] CRAN (R 3.5.1)                  
#>  processx      3.4.1      2019-07-18 [1] CRAN (R 3.6.1)                  
#>  ps            1.3.0      2018-12-21 [1] CRAN (R 3.6.0)                  
#>  purrr         0.3.3      2019-10-18 [1] CRAN (R 3.6.1)                  
#>  R6            2.4.1      2019-11-12 [1] CRAN (R 3.6.1)                  
#>  Rcpp          1.0.3      2019-11-08 [1] CRAN (R 3.6.1)                  
#>  readxl        1.3.1      2019-03-13 [1] CRAN (R 3.6.0)                  
#>  remotes       2.1.0      2019-06-24 [1] CRAN (R 3.6.0)                  
#>  rio           0.5.16     2018-11-26 [1] CRAN (R 3.6.0)                  
#>  rlang         0.4.2      2019-11-23 [1] CRAN (R 3.6.1)                  
#>  rmarkdown     2.0        2019-12-12 [1] CRAN (R 3.6.1)                  
#>  rprojroot     1.3-2      2018-01-03 [1] CRAN (R 3.5.1)                  
#>  rstatix     * 0.3.1      2019-12-16 [1] CRAN (R 3.6.1)                  
#>  sessioninfo   1.1.1      2018-11-05 [1] CRAN (R 3.6.0)                  
#>  stringi       1.4.3      2019-03-12 [1] CRAN (R 3.6.0)                  
#>  stringr       1.4.0      2019-02-10 [1] CRAN (R 3.6.0)                  
#>  testthat      2.3.1      2019-12-01 [1] CRAN (R 3.6.1)                  
#>  tibble        2.1.3      2019-06-06 [1] CRAN (R 3.6.1)                  
#>  tidyr         1.0.0      2019-09-11 [1] CRAN (R 3.6.1)                  
#>  tidyselect    0.2.5      2018-10-11 [1] CRAN (R 3.5.1)                  
#>  usethis       1.5.1.9000 2019-12-12 [1] Github (r-lib/usethis@23dd62c)  
#>  utf8          1.1.4      2018-05-24 [1] CRAN (R 3.5.1)                  
#>  vctrs         0.2.1      2019-12-17 [1] CRAN (R 3.6.1)                  
#>  withr         2.1.2      2018-03-15 [1] CRAN (R 3.5.1)                  
#>  xfun          0.11       2019-11-12 [1] CRAN (R 3.6.1)                  
#>  yaml          2.2.0      2018-07-25 [1] CRAN (R 3.5.1)                  
#>  zeallot       0.1.0      2018-01-28 [1] CRAN (R 3.5.1)                  
#>  zip           2.0.4      2019-09-01 [1] CRAN (R 3.6.1)                  
#> 
#> [1] C:/Users/inp099/Documents/R/win-library/3.6
#> [2] C:/Program Files/R/R-3.6.1/library

kassambara / rstatix Goto Github PK

rstatix's Introduction

rstatix

Key functions

Descriptive statistics

Comparing means

Facilitating ANOVA computation in R

Post-hoc analyses

Comparing proportions

Comparing variances

Effect Size

Correlation analysis

Adjusting p-values, formatting and adding significance symbols

Extract information from statistical tests

Data manipulation helper functions

Others

Installation and loading

Descriptive statistics

Comparing two means

Data

One-sample test

Compare two independent groups

Compare paired samples

Multiple pairwise comparisons

ANOVA test

Correlation tests

Correlation matrix

Related articles

rstatix's People

Contributors

Stargazers

Watchers

Forkers

rstatix's Issues

Recommend Projects

Recommend Topics

Recommend Org