easystats / blog

:mega: The collaborative blog

Home Page: https://easystats.github.io/blog/

blog easystats rstats statistics-modeling

blog's Introduction

easystats: An R Framework for Easy Statistical Modeling, Visualization, and Reporting

What is easystats?

easystats is a collection of R packages, which aims to provide a unifying and consistent framework to tame, discipline, and harness the scary R statistics and their pesky models.

However, there is not (yet) a unique “easystats” way of doing data analysis. Instead, start with one package and, when you face a new challenge, check whether there is an easystats answer for it in other packages. You will slowly uncover how using them together makes your life easier. And, who knows, you might even end up using them all.

Installation

Type	Source	Command
Release	CRAN	`install.packages("easystats")`
Development	r-universe	`install.packages("easystats", repos = "https://easystats.r-universe.dev")`
Development	GitHub	`remotes::install_github("easystats/easystats")`

Finally, easystats sometimes depends on additional packages for specific functions, and these are not downloaded by default. If you want to benefit from the full easystats experience without any hiccups, simply run the following:

easystats::install_suggested()

Citation

To cite the package, run the following command:

citation("easystats")
To cite easystats in publications use:

  Lüdecke, Patil, Ben-Shachar, Wiernik, Bacher, Thériault, & Makowski
  (2022). easystats: Framework for Easy Statistical Modeling,
  Visualization, and Reporting. CRAN.
  doi:10.32614/CRAN.package.easystats
  <https://doi.org/10.32614/CRAN.package.easystats>

A BibTeX entry for LaTeX users is

  @Article{,
    title = {easystats: Framework for Easy Statistical Modeling, Visualization, and Reporting},
    author = {Daniel Lüdecke and Mattan S. Ben-Shachar and Indrajeet Patil and Brenton M. Wiernik and Etienne Bacher and Rémi Thériault and Dominique Makowski},
    journal = {CRAN},
    doi = {https://doi.org/10.32614/CRAN.package.easystats},
    year = {2022},
    note = {R package},
    url = {https://easystats.github.io/easystats/},
  }

If you want to do this only for certain packages in the ecosystem, have a look at this article on how you can do so! https://easystats.github.io/easystats/articles/citation.html

Getting started

Each easystats package has a different scope and purpose. This means your best way to start is to explore and pick the one(s) that you feel might be useful to you. However, as they are built with a “bigger picture” in mind, you will realize that using more of them creates a smooth workflow, as these packages are meant to work together. Ideally, these packages work in unison to cover all aspects of statistical analysis and data visualization.

report: 📜 🎉 Automated statistical reporting of objects in R
correlation: 🔗 Your all-in-one package to run correlations
modelbased: 📈 Estimate effects, group averages and contrasts between groups based on statistical models
bayestestR: 👻 Great for beginners or experts of Bayesian statistics
effectsize: 🐉 Compute, convert, interpret and work with indices of effect size and standardized parameters
see: 🎨 The plotting companion to create beautiful results visualizations
parameters: 📊 Obtain a table containing all information about the parameters of your models
performance: 💪 Models’ quality and performance metrics (R2, ICC, LOO, AIC, BF, …)
insight: 🔮 For developers, a package to help you work with different models and packages
datawizard: 🧙 Magic potions to clean and transform your data

Frequently Asked Questions

How is easystats different from the tidyverse?

You’ve probably already heard about the tidyverse, another very popular collection of packages (ggplot2, dplyr, tidyr, …) that also makes using R easier. So, should you pick the tidyverse or easystats? Pick both!

Indeed, these two ecosystems have been designed with very different goals in mind. The tidyverse packages are primarily made to create a new R experience, where data manipulation and exploration are intuitive and consistent. On the other hand, easystats focuses more on the final stretch of the analysis: understanding and interpreting your results and reporting them in a manuscript or a report, while following best practices. You can definitely use the easystats functions in a tidyverse workflow!

easystats + tidyverse = ❤️

Can easystats be useful to advanced users and/or developers?

Yes, definitely! easystats is built in terms of modules that are general enough to be used inside other packages. For instance, the insight package is made to easily implement support for post-processing of pretty much all regression model packages under the sun. We use it in all the easystats packages, but it is also used in other non-easystats packages, such as ggstatsplot, modelsummary, ggeffects, and more.

So why not in yours?

Moreover, the easystats packages are very lightweight, with a minimal set of dependencies, which again makes them a great choice if you want to rely on them.
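As a rough illustration of how insight can be used from other code, here is a minimal sketch. The model and dataset (`lm()` on the built-in `mtcars` data) are arbitrary choices made for this example, and the insight calls are guarded so the snippet also runs where the package is not installed.

```r
# Fit an ordinary linear model on a built-in dataset
# (the model itself is an arbitrary choice for illustration).
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Query the model through insight instead of digging into the object.
if (requireNamespace("insight", quietly = TRUE)) {
  # Name of the response variable
  print(insight::find_response(fit))
  # Names of the predictors, grouped by model component
  print(insight::find_predictors(fit))
  # Estimated coefficients as a data frame
  print(insight::get_parameters(fit))
}
```

The appeal for package developers is that the same calls are meant to work across many model classes, so downstream code does not need one branch per modeling package.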

Documentation

Websites

Each easystats package has a dedicated website.

For example, the website for parameters is https://easystats.github.io/parameters/.

Blog

In addition to the websites containing documentation for these packages, you can also read posts from the easystats blog: https://easystats.github.io/blog/posts/.

Other learning resources

In addition to these websites and blog posts, you can also check out the following presentations and talks to learn more about this ecosystem:

https://easystats.github.io/easystats/articles/resources.html

Dependencies

easystats packages are designed to be lightweight, i.e., they don’t have any third-party hard dependencies, other than base-R packages or other easystats packages! If you develop R packages, this means that you can safely use easystats packages as dependencies in your own packages, without the risk of ending up in dependency hell.

library(deepdep)

plot_dependencies("easystats", depth = 2, show_stamp = FALSE)

As we can see, the only exception is the {see} package, which is responsible for plotting and creating figures and relies on {ggplot2}, which does have a substantial number of dependencies.

Usage

Total downloads

Total	insight	datawizard	parameters	performance	bayestestR	effectsize	correlation	see	modelbased	report	easystats
22,663,701	6,645,177	4,004,503	2,765,087	2,682,787	2,629,672	2,087,821	690,405	572,721	338,671	186,895	59,962

Trend

Contributing

We are happy to receive bug reports, suggestions, questions, and (most of all) contributions to fix problems and add features. Pull Requests for contributions are encouraged.

Here are some simple ways in which you can contribute (in increasing order of commitment):

Read and correct any inconsistencies in the documentation
Raise issues about bugs or wanted features
Review code
Add new functionality

Code of Conduct

Please note that the ‘easystats’ project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

blog's Issues

index page empty

It seems that the index page is now empty (not sure if this is a new error or not):

https://easystats.github.io/blog/

It's not a big deal, as users will quickly click on "posts" and navigate where they want, but still, would be nicer with the posts being displayed on the main page.

I am not sure what the reason for that is. Since we didn't change anything regarding the index page (which is automatically generated by blogdown), it might be due to an update of blogdown? Not sure, will investigate.

broken image references

For this post: https://easystats.github.io/blog/posts/bayestestr_evidence_ani/

update portfolio: a few new members are missing from the list

https://easystats.github.io/blog/portfolio/

How to release a blog post

Add a .Rmd file to the content/posts folder
Run blogdown::serve_site() so it builds the corresponding HTML and shows the site locally
If everything is ok, just push everything

scope

@strengejacke What do you think 😄 ? We can just add Rmd files in the content/posts folder, run blogdown::serve_site() command and push...

Announcing parameters and see updates

I'd like to write two to three posts introducing some of the news that shipped with the recent updates of parameters and see, mostly combining both packages (accompanied by bayestestR when showing new plotting functions in see). What do you think?

[Bookdown request] A textbook for learning stats using easystats

Thanks for all the wonderful packages in easystats.

I understand that there is already a lot of excellent documentation and many blog posts, but are there plans for a complete textbook that teaches basic concepts in stats and the use of easystats in a systematic way?

For example, moderndive is a very good bookdown book for learning intro-level stats and also the infer package. Spatial Data Science is recommended for learning spatial analysis using sf and stars. Similarly, since the philosophy of easystats is to make R stats easy by creating many new and intuitive tools, it might be good to have a stats-newbie-friendly book that guides readers step by step and paints the big picture of how all these tools make learning stats easier.

content

The maintainer of R-bloggers asked to add a bit more content first before he connects it to the platform. I'll re-adapt some of the posts from the psycho blog and add them here.

Polish and publish the R programming tutorial

Currently, here:
https://github.com/easystats/easystats/blob/main/WIP/programming.Rmd

Post authorship?

What do we think of having the post author's name attached to the blog post?
(Not in a vain kind of way, just in an informative one - but I'm totally cool with this not even being on the table 😀 )

new post: bootstrap stuff

Basically show how to bootstrap a complete analysis.

bootstrap_parameters
w/ emmeans
standardized parameters

(Maybe also show some assumption violations with performance functions?)
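To make the idea of bootstrapping a complete analysis concrete, here is a minimal sketch in base R. It does not use bootstrap_parameters; it swaps in plain nonparametric case resampling with `lm()` on the built-in `mtcars` data, and the model formula, the number of replicates, and the helper name `boot_coefs` are all assumptions made for this example.

```r
set.seed(123)

# Hypothetical helper: refit the model on one resampled copy of the data
# and return its coefficients.
boot_coefs <- function(data) {
  idx <- sample(nrow(data), replace = TRUE)
  coef(lm(mpg ~ wt + hp, data = data[idx, ]))
}

# 1000 bootstrap replicates: one row per coefficient, one column per replicate.
boots <- replicate(1000, boot_coefs(mtcars))

# Percentile 95% intervals for each coefficient.
ci <- t(apply(boots, 1, quantile, probs = c(0.025, 0.975)))
ci
```

Because the whole model fit sits inside the resampling function, any downstream quantity (marginal means, standardized parameters) could in principle be computed per replicate in the same way.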

Hypothesis testing in the Bayesian framework

From here..

Hypothesis testing framework

Probability of direction

pd is a measure of existence based only on the posterior: it is the larger of the two proportions of the posterior that lie on either side of zero.
In a hypothesis testing framework, it tests

H0: theta = 0
H1: theta>0 OR theta<0 (not theta≠0, as it takes the maximal side only)

(?? The pd is a measure of certainty: how certain we are that theta is not 0, obtained by testing how probable the most probable sign of theta is.)
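The definition above can be sketched in a few lines of base R. The posterior draws are simulated here purely for illustration (the mean, the standard deviation, and the number of draws are assumptions); in practice they would come from a fitted Bayesian model, and bayestestR offers p_direction for this purpose.

```r
set.seed(1)

# Hypothetical posterior draws for a parameter theta (simulated for illustration).
posterior <- rnorm(4000, mean = 0.3, sd = 0.2)

# Share of the posterior on each side of zero.
p_pos <- mean(posterior > 0)
p_neg <- mean(posterior < 0)

# pd is the larger of the two shares, so it always lies between 0.5 and 1.
pd <- max(p_pos, p_neg)
pd
```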

ROPE

ROPE is a measure of significance that is based on the posterior and on some pre-conceived notion of a "small" effect.
In a hypothesis testing framework, it tests

H0: theta ∈ ROPE
H1: theta ∉ ROPE
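As a minimal sketch of this idea in base R, the snippet below computes the share of posterior draws that fall inside a ROPE. The draws are simulated, the ROPE bounds of -0.1 and 0.1 are an arbitrary assumption standing in for the "small effect" range, and the full posterior is used rather than a credible interval.

```r
set.seed(1)

# Hypothetical posterior draws for theta (simulated for illustration).
posterior <- rnorm(4000, mean = 0.3, sd = 0.2)

# Assumed region of practical equivalence: effects this small count as negligible.
rope <- c(-0.1, 0.1)

# Proportion of the posterior inside the ROPE.
in_rope <- mean(posterior >= rope[1] & posterior <= rope[2])
in_rope
```

A small proportion speaks against H0 (theta inside the ROPE), and a large one supports it, which is why this index can also be used to support the null.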

Bayes factors

BF is a relative measure of evidence. In the case where one model is the point null, it tests the relative probability of the data between the models.
In a hypothesis testing framework, it tests

H1: Data were generated by M₁ (as specified by priors)
H2: Data were generated by M₂ (as specified by priors)
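A small worked example may help here. The sketch below compares a point null against a model with a uniform prior for binomial data, using base R only. The data (15 successes in 20 trials) and the Beta(1, 1) prior are assumptions chosen for illustration.

```r
# Hypothetical data: k successes in n trials.
n <- 20
k <- 15

# M1: point null, theta fixed at 0.5.
m1 <- dbinom(k, n, prob = 0.5)

# M2: theta unknown, with a uniform Beta(1, 1) prior.
# Its marginal likelihood averages the binomial likelihood over the prior.
m2 <- integrate(
  function(theta) dbinom(k, n, theta) * dbeta(theta, 1, 1),
  lower = 0, upper = 1
)$value

# Bayes factor for M2 relative to M1: how much more probable
# the data are under M2 than under M1.
bf_21 <- m2 / m1
bf_21
```

The result depends on the prior given to theta under M2, which is what "as specified by priors" refers to: a different prior yields a different marginal likelihood and therefore a different Bayes factor.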

p-MAP

There's also the p-MAP, which isn't getting much love from us... We are waiting for feedback from Prof Jeff Mills, whose work inspired this index.

What is common between the indices

Are comparative: ROPE, BF, p-MAP
Examine the posterior values: pd, ROPE, p-MAP
Tell you if an effect exists: pd, p-MAP, BF (when comparing to the point-null)
Tell you if an effect is significant: ROPE, BF (when comparing to an interval-null, which a user would have to construct themselves...)
Can be used to "test" a single parameter: all (BF by comparing to a null constraint on that parameter).
Can be used to compare models: BF
Can be used to support the null: BF, ROPE

I think the main vignette and guidelines should be along one or more of these ^ lines...

It would be interesting to formalize it and develop it even further. Maybe starting with a blogpost? "Hypothesis testing in the Bayesian framework". Such conceptualization could potentially also be integrated as a paragraph in the intro of the significance paper.

performance announcement

As performance just got on CRAN, we should make an announcement post for it :)

We could re-use the bayestestR ANN one that we did.

DataCamp

As DataCamp became discredited in the community lately, we might think about removing the link to their site on our resources page.

Though I'm not familiar with all incidents in the DataCamp file in detail, I guess this might conflict with our code of conduct.

Dealing with images paths

rstudio/blogdown#45

It seems that either we manually copy the images to the static folder and then link them accordingly, or it might work if we set relative paths to False here. However, I remember having some issues with this when I tried to set this blog up, so it'll be safer to do some experiments in a few days.

people

@pdwaggoner @mattansb could you guys add yourselves in the monster hunters list.

Trouble rendering the website

Can someone else please try rendering the website ASAP.

If I try, I get the following error:

==> rmarkdown::render_site(encoding = 'UTF-8')

Start building sites … 
hugo v0.92.0-B3549403+extended windows/amd64 BuildDate=2022-01-12T08:23:18Z VendorInfo=gohugoio
ERROR 2022/01/27 04:03:31 Page.Hugo is deprecated and will be removed in Hugo 0.93.0. Use the global hugo function.
ERROR 2022/01/27 04:03:31 Page.RSSLink is deprecated and will be removed in Hugo 0.93.0. Use the Output Format's link, e.g. something like:
    {{ with .OutputFormats.Get "RSS" }}{{ .RelPermalink }}{{ end }}
ERROR 2022/01/27 04:03:31 render of "page" failed: execute of template failed: template: _default/single.html:11:7: executing "footer" at <partial "page-single/footer.html" .>: error calling partial: execute of template failed: template: partials/page-single/footer.html:6:3: executing "partials/page-single/footer.html" at <partial "highlight-js.html" .>: error calling partial: "C:\Users\IndrajeetPatil\Documents\GitHub\blog\themes\hyde-hyde\layouts\partials\highlight-js.html:1:10": execute of template failed: template: partials/highlight-js.html:1:10: executing "partials/highlight-js.html" at <(not (isset .Params "highlight")) and ((isset .Params "highlight") .Params.highlight)>: can't give argument to non-function not (isset .Params "highlight")
ERROR 2022/01/27 04:03:31 render of "page" failed: execute of template failed: template: _default/single.html:11:7: executing "footer" at <partial "page-single/footer.html" .>: error calling partial: execute of template failed: template: partials/page-single/footer.html:6:3: executing "partials/page-single/footer.html" at <partial "highlight-js.html" .>: error calling partial: "C:\Users\IndrajeetPatil\Documents\GitHub\blog\themes\hyde-hyde\layouts\partials\highlight-js.html:1:10": execute of template failed: template: partials/highlight-js.html:1:10: executing "partials/highlight-js.html" at <(not (isset .Params "highlight")) and ((isset .Params "highlight") .Params.highlight)>: can't give argument to non-function not (isset .Params "highlight")
ERROR 2022/01/27 04:03:31 render of "page" failed: execute of template failed: template: _default/single.html:11:7: executing "footer" at <partial "page-single/footer.html" .>: error calling partial: execute of template failed: template: partials/page-single/footer.html:6:3: executing "partials/page-single/footer.html" at <partial "highlight-js.html" .>: error calling partial: "C:\Users\IndrajeetPatil\Documents\GitHub\blog\themes\hyde-hyde\layouts\partials\highlight-js.html:1:10": execute of template failed: template: partials/highlight-js.html:1:10: executing "partials/highlight-js.html" at <(not (isset .Params "highlight")) and ((isset .Params "highlight") .Params.highlight)>: can't give argument to non-function not (isset .Params "highlight")
ERROR 2022/01/27 04:03:31 render of "page" failed: execute of template failed: template: _default/single.html:11:7: executing "footer" at <partial "page-single/footer.html" .>: error calling partial: execute of template failed: template: partials/page-single/footer.html:6:3: executing "partials/page-single/footer.html" at <partial "highlight-js.html" .>: error calling partial: "C:\Users\IndrajeetPatil\Documents\GitHub\blog\themes\hyde-hyde\layouts\partials\highlight-js.html:1:10": execute of template failed: template: partials/highlight-js.html:1:10: executing "partials/highlight-js.html" at <(not (isset .Params "highlight")) and ((isset .Params "highlight") .Params.highlight)>: can't give argument to non-function not (isset .Params "highlight")
Total in 401 ms
Error: Error building site: failed to render pages: render of "page" failed: execute of template failed: template: _default/single.html:11:7: executing "footer" at <partial "page-single/footer.html" .>: error calling partial: execute of template failed: template: partials/page-single/footer.html:6:3: executing "partials/page-single/footer.html" at <partial "highlight-js.html" .>: error calling partial: "C:\Users\IndrajeetPatil\Documents\GitHub\blog\themes\hyde-hyde\layouts\partials\highlight-js.html:1:10": execute of template failed: template: partials/highlight-js.html:1:10: executing "partials/highlight-js.html" at <(not (isset .Params "highlight")) and ((isset .Params "highlight") .Params.highlight)>: can't give argument to non-function not (isset .Params "highlight")

==> The site has been generated to the directory 'docs'.

** Note that normally you cannot just open the .html files in this directory to view them in a browser. This directory need to be served before you can preview web pages correctly (e.g., you may deploy the folder to a web server). Alternatively, blogdown::serve_site() gives you a local preview of the site.

For now, I am just including the static HTML generated on my local machine to avoid a 404, but the page looks completely off, and we need to fix it ASAP by clearing the site and rendering it again.

Ideas for blogposts

insight
- CRAN announcement
- Difference between terms, parameters, predictors etc.
bayestestR
- CRAN announcement
- rnorm_perfect
- distribution
- maybe elaborate more on equivalence_test(), based on the discussion with Aki
- Introduce the bayesfactor function (@mattansb)
- describe_posteriors
- Bayesian approach to frequentist algorithms (easystats/bayestestR#219)
- ...
performance
- CRAN announcement
- Variance components / R2 / ICC for mixed models
- check_model
- present the check_ family
- Overview / comparison of performance functions
- ...
parameters
- CRAN announcement
- ~~find_distribution and distribution classification~~
- Data standardization vs. data normalization: also introduce bayestestR::estimate_density.df and see
- Present .*. and parameters_selection
- How to interpret coefficients in a regression (interactions and nested models)
- Parameters standardization
- n_factors
- psych support
- efa_to_cfa and graph plots for lavaan plots
- check_factorstructure
- parameters is also interesting for developers: parameters_type
- ...
report
- CRAN announcement
- report_participants
- ...
correlation
- CRAN announcement
- How to plot correlations
- ...
estimate
- CRAN announcement
- lighthouse plots
- The world is non-linear: polynomial, splines and GAMs (and their linear segmentation interpretation)
- Signal processing features: smoothing, find_inversions
- ...
see
- CRAN announcement
- Plotting examples for bayestestR functions (not all, just a small scope like p_direction, rope, ...)
- ...
easystats
- "we are growing, now 5 packages on CRAN"
- easystats_update()
effectsize
- CRAN announcement
- Data standardization (normal vs. robust)

If you guys have ideas about posts feel free to add :)

bayestestR

Now that bayestestR is on CRAN, maybe we can publish a blog post? I really like the readme-file, actually, we could almost copy/paste that as a first post.

Fix broken URLs

@DominiqueMakowski any ideas how to quickly restore or fix the broken URLs? Many blog pages are affected, I think, though I fixed some of them.

Game of Thrones bait

(just kidding of course:) since GoT madness just started again, we should make a post titled "Do you know which Game of Thrones character dies next?". In which we say "Well, neither do we. However, have you heard of the easystats project?" and then go on and present easystats. I bet such a post would be clicked on a lot 😅 😅

Feedly

We should think about adding our blog to feedly:
https://www.feedly.com/factory.html

Fix report_anova blog and report_correlation posts

Because we need to re-knit all the posts every time we build the website, I had to deactivate the output from the two report_ posts (which were initially used as placeholders anyway): they fail because report is currently broken (AGAIN). We need to fix it, then fix the posts accordingly.

Modernize the blog

We should modernize the blog (probably re-make it using the new blogdown template) so that it's less clunky to use and update. I'm not sure if it will preserve the connection to R-bloggers, but if not we can always update it. @IndrajeetPatil master of the infrastructure pliz help :)

comment on the p-direction post

I couldn't figure out how to leave comments on the blog, so I hope you don't mind if I open an issue relating to the p-direction post. Apologies in advance if this is obvious and just considered too much for the blog post!

You can compute traditional p-values for Bayesian estimators using the bootstrap. Using max a posteriori (MAP) will then produce results identical to the traditional p-value derived from penalized maximum likelihood where the prior is considered the "penalty". But MAP isn't a Bayesian estimator and doesn't have the nice properties of the two common Bayesian estimators, the posterior mean (minimizes expected square error) and posterior median (minimizes expected absolute error). Deriving a point estimate isn't particularly Bayesian, but at least the posterior mean and median have natural probabilistic interpretations as an expectation and the point at which a 50% probability obtains. With those estimators, results will vary from MAP based on how skewed the distribution is.

A bigger issue is that MAP doesn't even exist for our bread and butter hierarchical models. The frequentist approach is to use maximum marginal likelihood (this is often called "empirical Bayes" in that the MML estimate is for the hierarchical or "prior" parameters). This leads to underestimates of lower-level regression coefficient uncertainty by construction, as you see in packages like lme4 in R.

Part of the point of Bayesian inference is to not have to collapse to a point estimate. When we want to do downstream predictive inference, we don't want to just plug in an estimate, we want to do posterior predictive inference and average over our uncertainty in parameter estimation.

Defining what it means for a prior to be "informative" is tricky and it wasn't defined in this post. This is particularly vexed because of changes of variables. A prior defined for a probability variable in [0, 1] that's flat is very different from a flat prior for a log odds variable in (-infinity, infinity). A flat prior on [0, 1] under the logit transform leads to a standard logistic prior on the log odds. That's not flat. In MLE, changing variables doesn't matter, but it does in Bayes.
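The change-of-variables point can be checked with a short simulation in base R. This is only a sketch: the seed, the number of draws, and the chosen quantiles are arbitrary assumptions.

```r
set.seed(42)

# Flat prior on the probability scale.
theta <- runif(1e5)

# The same draws expressed on the log-odds scale.
log_odds <- qlogis(theta)

# Empirical quantiles of the transformed draws next to those of the
# standard logistic distribution.
probs <- c(0.05, 0.25, 0.5, 0.75, 0.95)
rbind(
  empirical = quantile(log_odds, probs),
  logistic  = qlogis(probs)
)

# Density of the standard logistic at a few points: highest at 0, not flat.
dlogis(c(-4, 0, 4))
```

The two rows of quantiles should agree closely, and the density values show that most of the implied prior mass on the log-odds scale sits near zero rather than being spread evenly.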

I wouldn't say that changing the threshold for significance with regularization is a good thing. While regularization can be good for error control (trading variance for bias), the whole notion of a dichotomous up/down decision through significance is the problem, not the threshold used. Also, we tend to use regularization that is not to zero, but to the population mean. This is also common in frequentist penalized maximum likelihood estimates (see, e.g., Efron and Morris's famous paper on predicting batting average in baseball using empirical Bayes, which despite the name, is a frequentist max-marginal likelihood method). That's even better for error control than shrinkage, but it's going to have the "wrong" effect on this notion of p-direction unless you talk about p-direction of the difference from the population estimate, rather than the random effect itself (that is, you don't want to say Connecticut is significantly different than zero, but significantly different than other states).

P.S. For reference, Gelman et al. use a similar, but not equivalent notion, in calculating posterior predictive p-values in Bayesian Data Analysis, but without flipping signs (so that either values near 0 or near 1 are evidence the model doesn't fit the data well). These are not intended to be used in hypothesis tests, though, just as a diagnostic.

ressources.md should be resources.md

The file "ressources.md" should be "resources.md". I guess the menu link Ressources is created automagically from the filename and will correct itself with the name change?