Code Monkey home page Code Monkey logo

statisticalrethinking.jl's Introduction

StatisticalRethinking v4

Project Status Documentation Build Status

Purpose of this package

The StatisticalRethinking.jl package contains functions comparable to the functions in the R package "rethinking" associated with the book Statistical Rethinking by Richard McElreath.

These functions are used in Jupyter and Pluto notebook projects specifically intended for hands-on use while studying the book or taking the course.

Currently there are 3 of these notebook projects:

  1. Max Lapan's rethinking-2ed-julia which uses Turing.jl and Jupyter notebooks.

  2. The SR2TuringPluto.jl project, also Turing.jl based but using Pluto.jl instead of Jupyter. It is based on Max Lapan's work above.

  3. The SR2StanPluto.jl project, which uses Stan as implemented in StanSample.jl and StanQuap.jl. See StanJulia.

There is a fourth option to study the Turing.jl versions of the models in the Statistical Rethinking book which is in the form of a package and Franklin web pages: TuringModels.jl.

Why a StatisticalRethinking v4?

Over time more options become available to express the material covered in Statistical Rethinking, e.g. the use of KeyedArrays (provided by AxisKeys.jl) for the representation of mcmc chains.

Other examples are the recently developed ParetoSmooth.jl which could be used in the PSIS related examples as a replacement for ParetoSmoothedImportanceSampling.jl and the preliminary work by SHMUMA on Dagitty.jl (a potential replacement for StructuralCausalModels.jl).

While StatisticalRethinking v3 focused on making StatisticalRethinking.jl mcmc package independent, StatisticalRethinking v4 aims at de-coupling it from a specific graphical package and thus enables new choices for graphics, e.g. using Makie.jl and AlgebraOfGraphics.jl.

StatisticalRethinking.jl v4 also fits better with the new setup of Pluto notebooks which keep track of used package versions in the notebooks themselves (see here).

Workflow of StatisticalRethinkingJulia (v4):

  1. Data preparation, typically using CSV.jl, DataFrames.jl and some statistical methods from StatsBase.jl and Statistics.jl. In some cases simulations are used which need Distributions.jl and a few special methods (available in StatisticalRethinking.jl).

  2. Define the mcmc model, e.g. using StanSample.jl or Turing.jl, and obtain draws from the model.

  3. Capture the draws for further processing. In Turing that is usually done using MCMCChains.jl, in StanSample.jl v4 it's mostly in the form of a DataFrame, a StanTable, a KeyedArray chains (obtained from AxisKeys.jl).

  4. For further processing, the projects nearly always convert chains to a DataFrame.

  5. Inspect the chains using statistical and visual methods. In many cases this will need one or more statistical packages and one of the graphical options.

Currently visual options are StatsPlots/Plots based, e.g. in MCMCChains.jl and StatisticalRethinkingPlots.jl.

  1. The above 5 steps could all be done by just using StanSample.jl or Turing.jl.

The book Statistical Rethinking has a different objective and studies how models compare, how models can help (or mislead) and why multilevel modeling might help in some cases.

  1. For this, additional packages are available, explained and demonstrated, e.g. StructuralCausalModels.jl, ParetoSmoothedImportanceSampling.jl and quite a few more.

Using StatisticalRethinking v4

To work through the StatisticalRethinking book using Julia and Turing or Stan, download either one of the above mentioned projects and start Pluto (or Jupyter).

An early, experimental version of StructuralCausalModels.jl is also included as a dependency in the StatisticalRethinking.jl package.

In the meantime I will definitely keep my eyes on Dagitty.jl, Omega.jl and CausalInference.jl. In particular Dagitty.jl has very similar objectives as StructuralCausalModels.jl and over time might replace it in the StatisticalRethinkingJulia ecosystem. For now, StructuralCausalModels does provide ways to convert DAGs to Dagitty and ggm formats.

Similarly, a dependency ParetoSmoothedImportanceSampling.jl is used which provides PSIS and WAIC statistics for model comparison.

Versions

As listed in issue #145 recently it was noticed that some very old Jupyter notebook files are still present which makes an initial download, e.g. when dev-ing the package, rather long. This is not a problem when you just add the package.

I am planning to address that in v5.

Version 4

  • Drop the heavy use of @reexport.
  • Enable a future switch to Makie.jl and AlgebraOfGraphics.jl by moving all graphics to StatisticalRethinkingPlots and StatisticalRethinkingMakie (in the future).
  • Many more improvements by Max Lapan (@shmuma).

Versions 3.2.1 - 3.3.6

  • Improvements by Max Lapan.
  • Added trankplot.jl.
  • Add compare() and plot_models() abstractions.

Version 3.2.0

  • Option to retieve sampling results as a NamedTuple.
  • Added new method to plotbounds() to handle NamedTuples.
  • Added plotlines().

Versions v3.1.1 - 3.1.8

  • Updates from CompatHelper.
  • Switch to Github actions (CI, Documenter).
  • Updates from Rik Huijzer (link function).
  • Redo quap() based on StanOptimize.
  • Start Updating notebooks in ch 2-8 using new quap().
  • Redoing and updating the models in the models subdirectory.

Version 3.1.0

Align (stanbased) quap with Turing quap. quap() now returns a NamedTuple that includes a field distr which represents the quadratic Normal (MvNormal) approximation.

Version 3.0.0

StatisticalRethinking.jl v3 is independent of the underlying mcmc package. All scripts previously in StatisticalRethinking.jl v2 holding the snippets have been replaced by Pluto notebooks in the above mentioned mcmc specific project repositories.

Initially SR2TuringPluto.jl will lag SR2StanPluto.jl somewhat but later this year both will cover the same chapters.

It is the intention to develop tests for StatisticalRethinking.jl v3 that work across the different mcmc implementations. This will limit dependencies to the test/Project.toml.

Version 2.2.9

Currently the latest release available in the StatisticalRethinking.jl v2 format.

Installation

To install the package (from the REPL):

] add StatisticalRethinking

but in most cases this package will be a dependency of another package or project, e.g. SR2StanPluto.jl or SR2TuringPluto.jl.

Documentation

  • STABLEdocumentation of the most recently tagged version.
  • DEVELdocumentation of the in-development version.

Acknowledgements

Of course, without the excellent textbook by Richard McElreath, this package would not have been possible. The author has also been supportive of this work and gave permission to use the datasets.

Questions and issues

Question and contributions are very welcome, as are feature requests and suggestions. Please open an issue if you encounter any problems or have a question.

codecov

Coverage Status

statisticalrethinking.jl's People

Contributors

alecloudenback avatar eljfe avatar github-actions[bot] avatar goedman avatar josephwagner avatar karajan9 avatar pitmonticone avatar rikhuijzer avatar shmuma avatar torkar avatar tpapp avatar yibeichan avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

statisticalrethinking.jl's Issues

Plotbounds for Turing

Is there any code to plot the bounds using Turing? I have succeeded in plotting the linear models but haven't been able to plot the quadratic or cubic models.

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

PI doc string needs updating

Hi,
Thanks for this package!

Small issue with the doc string. The doc string proposes the prob= argument, where it should be perc_prob=

To reproduce, ... way down at the bottom is the PI(1:10; prob=0.1) argument.

help?> PI
search: plot_density_interval PI pi Pipe pie pie! pipeline PipeBuffer sinpi cospi cispi getpid rem2pi mod2pi groupindices Pair print

  PI
  ≡≡≡≡

  Compute percentile central interval of data. Returns vector of bounds.

  PI(data; perc_prob)
  

  Required arguments
  ====================

    •  data: iterable over data values

  Optional arguments
  ====================

    •  prob::Float64=0.89: percentile interval to calculate

  Examples
  ==========

  julia> using StatisticalRethinking
  
  julia> PI(1:10)
  2-element Vector{Float64}:
   1.495
   9.505
  
  julia> PI(1:10; prob=0.1)
  2-element Vector{Float64}:
   5.05
   5.95

It got me at first.

'.git' folder is bloated with 1 Go of data !

My branch is up to date
'

julia du -h StatisticalRethinking.jl
28K StatisticalRethinking.jl/.git/hooks
0 StatisticalRethinking.jl/.git/info
0 StatisticalRethinking.jl/.git/logs/refs/heads
0 StatisticalRethinking.jl/.git/logs/refs/remotes/origin
0 StatisticalRethinking.jl/.git/logs/refs/remotes
0 StatisticalRethinking.jl/.git/logs/refs
0 StatisticalRethinking.jl/.git/logs
0 StatisticalRethinking.jl/.git/objects/info
1.2G StatisticalRethinking.jl/.git/objects/pack
1.2G StatisticalRethinking.jl/.git/objects
0 StatisticalRethinking.jl/.git/refs/heads
0 StatisticalRethinking.jl/.git/refs/remotes/origin
0 StatisticalRethinking.jl/.git/refs/remotes
0 StatisticalRethinking.jl/.git/refs/tags
0 StatisticalRethinking.jl/.git/refs
1.2G StatisticalRethinking.jl/.git
0 StatisticalRethinking.jl/chapters/00
16K StatisticalRethinking.jl/chapters/02
16K StatisticalRethinking.jl/chapters/03
44K StatisticalRethinking.jl/chapters/04
8.0K StatisticalRethinking.jl/chapters/05
12K StatisticalRethinking.jl/chapters/09
96K StatisticalRethinking.jl/chapters
1.8M StatisticalRethinking.jl/data
16K StatisticalRethinking.jl/docs/src
24K StatisticalRethinking.jl/docs
140K StatisticalRethinking.jl/notebooks/00
1.5M StatisticalRethinking.jl/notebooks/02
3.1M StatisticalRethinking.jl/notebooks/03
12M StatisticalRethinking.jl/notebooks/04
960K StatisticalRethinking.jl/notebooks/05
128K StatisticalRethinking.jl/notebooks/09
18M StatisticalRethinking.jl/notebooks
0 StatisticalRethinking.jl/scripts/00
16K StatisticalRethinking.jl/scripts/02
20K StatisticalRethinking.jl/scripts/03
96K StatisticalRethinking.jl/scripts/04
8.0K StatisticalRethinking.jl/scripts/05
16K StatisticalRethinking.jl/scripts/09
156K StatisticalRethinking.jl/scripts
0 StatisticalRethinking.jl/src/quaps
16K StatisticalRethinking.jl/src
16K StatisticalRethinking.jl/test
1.2G StatisticalRethinking.jl /0.7s
➜ julia cd StatisticalRethinking.jl/'

So for about 4 Mo of data the .git is 1 Go
Some branches needs squashing ?

Low effective sample size in m12.6.jl

Hi -

The Turing model in m12.6.jl appears to have a low effective sample size. m12.1.jl and m12.2.jl seem to be good, but I have not tested the others because their run times are long. It's not clear why all the models have long run times. From what I gathered, there might be ongoing performance issues with Turing that have not been fully resolved (see https://github.com/TuringLang/Turing.jl/issues/325). So the run time issue may not be related to the model specification of the examples.

m12.6.jl

using StatisticalRethinking
using Turing

Turing.setadbackend(:reverse_diff)

d = CSV.read(joinpath(dirname(Base.pathof(StatisticalRethinking)), "..", "data",
    "Kline.csv"), delim=';');
size(d) # Should be 10x5

d[:log_pop] = map((x) -> log(x), d[:population]);
d[:society] = 1:10;

@model m12_6(total_tools, log_pop, society) = begin

    N = length(total_tools)

    α ~ Normal(0, 10)
    βp ~ Normal(0, 1)

    σ_society ~ Truncated(Cauchy(0, 1), 0, Inf)

    N_society = length(unique(society)) #10

    α_society = Vector{Real}(undef, N_society)

    α_society ~ [Normal(0, σ_society)]

    for i ∈ 1:N
        λ = exp(α + α_society[society[i]] + βp*log_pop[i])
        total_tools[i] ~ Poisson(λ)
    end
end

posterior = sample(m12_6(d[:total_tools], d[:log_pop],
    d[:society]), Turing.NUTS(4000, 1000, 0.95));

describe(posterior)

Results

[NUTS] Sampling...100% Time: 0:03:10
[NUTS] Finished with
  Running time        = 189.01329048200012;
  #lf / sample        = 0.0035;
  #evals / sample     = 186.61175;
  pre-cond. metric    = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0,....
Iterations = 1:4000
Thinning interval = 1
Chains = 1
Samples per chain = 4000

Empirical Posterior Estimates:
                   Mean             SD           Naive SE           MCSE          ESS
            α    0.770457654    0.8398108547   0.013278575523   0.09904739668   71.891383
 α_society[9]    0.337460060    0.4270482149   0.006752225149   0.06140214467   48.371187
       lf_num    0.003500000    0.2213594362   0.003500000000   0.00350000000 4000.000000
 α_society[8]   -0.140869387    0.2497780406   0.003949337588   0.02613925834   91.310766
      elapsed    0.047253323    0.0828856831   0.001310537721   0.00092100913 4000.000000
      epsilon    0.005160425    0.0016025041   0.000025337815   0.00005154582  966.521282
 α_society[3]    0.021898862    0.2591870665   0.004098107351   0.02527437584  105.163680
 α_society[1]   -0.096965256    0.4072306578   0.006438882059   0.05316658157   58.668294
α_society[10]   -0.171333714    0.3778259984   0.005973953571   0.04304737075   77.035413
 α_society[5]    0.076119591    0.2271698255   0.003591870321   0.01887655095  144.829156
     eval_num  186.611750000   23.2244882861   0.367211402380   2.91686468678   63.395691
 α_society[7]    0.221995564    0.3807670154   0.006020455133   0.05339425422   50.854524
    σ_society    0.413636355    0.4968162365   0.007855354429   0.07478289106   44.135399
 α_society[2]    0.139162211    0.2897333942   0.004581087200   0.03116427768   86.433675
 α_society[6]   -0.279303630    0.2994251039   0.004734326584   0.03499527610   73.207836
           βp    0.042494276    2.2630663543   0.035782220879   0.25111254635   81.219021
           lp -865.815326541 7545.5204017129 119.305153003407 826.42652778225   83.362340
 α_society[4]    0.423954360    0.3775198792   0.005969113402   0.05089604479   55.018859
       lf_eps    0.005160425    0.0016025041   0.000025337815   0.00005154582  966.521282

Reexporting of packages

I've noticed that StatisticalRethinking.jl is reexporting lots of packages which are not being used in methods.
For example, BSpline, has no references, but was exported.

This is a bit confusing, because it adds "fake dependencies" of the package and issues with importing both StatisticalRethinking and original package which was reexported.

To illustrate, if we do this:

using StatisticalRethiking
using Interpolations

methods(interpolate)

Will produce a warning
WARNING: both StatisticalRethinking and Interpolations export "interpolate"; uses of it in module Main must be qualified

Similar situation is with other reexported modules. This is very surprising that by importing StatisticalRethinking we're automatically getting full DataFrames methods set into our namespace. What if I don't want to use data frames? From my perspective, user should have a control over the functionality it gets by importing modules he is going to use, so, this "extra surpise bonus" is not very polite behaviour of StatisticalRethiking.

StatisticalRethinking.jl v3: Just a set of methods (such as plotcoef, precis, etc.)?

In StatisticalRethinkingJulia/TuringModels.jl#23 (comment) I suggested to end up with a StatisticalRethinking.jl to provide just a set of methods (such as plotcoef, precis, plotbounds, link, quap etc which are useful for all 3 ...Models.jl repos) and move the actual use of these methods to the specific mcmc model repositories. In the book these methods are typically explained in the Overthinking sections.

This is a significant change in direction so I wanted to post very early on to get feedback.

Someone who wants to take the StatisticalRethinking course and is interested in Turing would use either TuringModels.jl (for just the models) or a still to be created StatisticalRethinkingTuring.jl. Or these 2 repos could be combined.

Similarly, for Stan it would mean a lot of the current contents of StatisticalRethinking.jl would either move to StanModels.jl (basically all models) and the core course materials would end up in StatisticalRethinkingStan.jl.

Just a thought for now as more and more folks (rightly so!) seem interested in a Turing specific version.

Problem with opening file and possibly a flux dependency

Hi Rob-

I want to report a problem I am having on my machine. I am using Julia 1.0.3 on Ubuntu 16.0. Here is the error I receive when I try to run the tests.

(v1.0) pkg> test StatisticalRethinking
   Testing StatisticalRethinking
    Status `/tmp/tmpPmsCDn/Manifest.toml`
  [621f4979] AbstractFFTs v0.3.2
  [1520ce14] AbstractTrees v0.2.1
  [79e6a3ab] Adapt v0.4.2
  [ec485272] ArnoldiMethod v0.0.4
  [7d9fca2a] Arpack v0.3.0
  [13072b0f] AxisAlgorithms v0.3.0
  [76274a88] Bijectors v0.2.5
  [9e28174c] BinDeps v0.8.10
  [b99e7846] BinaryProvider v0.5.3
  [a74b3585] Blosc v0.5.1
  [e1450e63] BufferedStreams v1.0.0
  [631607c0] CMake v1.1.1
  [d5fb7624] CMakeWrapper v0.2.2
  [336ed68f] CSV v0.4.3
  [159f3aea] Cairo v0.5.6
  [49dc2e85] Calculus v0.4.1
  [324d7699] CategoricalArrays v0.5.2
  [aaaa29a8] Clustering v0.12.1
  [593b3428] CmdStan v4.5.1
  [944b1d66] CodecZlib v0.5.1
  [3da002f7] ColorTypes v0.7.5
  [5ae59095] Colors v0.9.5
  [861a8166] Combinatorics v0.7.0
  [bbf7d656] CommonSubexpressions v0.2.0
  [34da2185] Compat v1.4.0
  [a81c6b42] Compose v0.7.2
  [8f4d0f93] Conda v1.1.1
  [d38c429a] Contour v0.5.1
  [7ad07ef1] CoupledFields v0.1.0
  [a93c6f00] DataFrames v0.17.0
  [9a8bc11e] DataStreams v0.4.1
  [864edb3b] DataStructures v0.15.0
  [e7dc6d0d] DataValues v0.4.6
  [01453d9d] DiffEqDiffTools v0.7.1
  [163ba53b] DiffResults v0.0.3
  [b552c78f] DiffRules v0.0.7
  [b4f34e82] Distances v0.7.4
  [31c24e10] Distributions v0.16.4
  [ffbed154] DocStringExtensions v0.6.0
  [8f5d6c58] EzXML v0.9.0
  [7a1cc6ca] FFTW v0.2.4
  [5789e2e9] FileIO v1.0.5
  [53c48c17] FixedPointNumbers v0.5.3
  [587475ba] Flux v0.7.0
  [f6369f11] ForwardDiff v0.10.2
  [38e38edf] GLM v1.0.2
  [28b8d3ca] GR v0.37.0
  [c91e804a] Gadfly v1.0.1
  [a2bd30eb] Graphics v0.4.0
  [f67ccb44] HDF5 v0.11.0
  [0862f596] HTTPClient v0.2.1
  [a1b4810d] Hexagons v0.2.0
  [d9be37ee] Homebrew v0.7.1
  [9b13fd28] IndirectArrays v0.5.0
  [d25df0c9] Inflate v0.1.1
  [a98d9a8b] Interpolations v0.11.1
  [c8e1da08] IterTools v1.1.1
  [1c8ee90f] IterableTables v0.10.0
  [82899510] IteratorInterfaceExtensions v0.1.1
  [4138dd39] JLD v0.9.1
  [682c06a0] JSON v0.20.0
  [e5e0dc1b] Juno v0.5.4
  [5ab0869b] KernelDensity v0.5.1
  [1b4a561d] LegacyStrings v0.4.1
  [b27032c2] LibCURL v0.4.1
  [522f3ed2] LibExpat v0.5.0
  [6f1fad26] Libtask v0.2.1
  [2ec943e9] Libz v1.0.0
  [093fc24a] LightGraphs v1.2.0
  [d3d80556] LineSearches v7.0.1
  [98b081ad] Literate v1.0.2
  [4345ca2d] Loess v0.5.0
  [1671dc4f] MCMCChain v0.2.2
  [1914dd2f] MacroTools v0.4.4
  [5424a776] Mamba v0.12.0
  [dbb5928d] MappedArrays v0.2.1
  [442fdcdd] Measures v0.3.0
  [e89f7d12] Media v0.5.0
  [e1d29d7a] Missings v0.4.0
  [78c3b35d] Mocking v0.5.7
  [d41bc354] NLSolversBase v7.1.2
  [872c559c] NNlib v0.4.3
  [77ba4419] NaNMath v0.3.2
  [86f7a689] NamedArrays v0.9.2
  [b8a86587] NearestNeighbors v0.4.3
  [4d1e1d77] Nullables v0.0.8
  [510215fc] Observables v0.2.3
  [6fe1bfb0] OffsetArrays v0.9.1
  [429524aa] Optim v0.17.2
  [bac558e1] OrderedCollections v1.0.2
  [90014a1f] PDMats v0.9.6
  [d96e819e] Parameters v0.10.3
  [69de0a69] Parsers v0.2.16
  [ccf2f8ad] PlotThemes v0.3.0
  [995b91a9] PlotUtils v0.5.5
  [91a5bcdd] Plots v0.22.5
  [f27b6e38] Polynomials v0.5.2
  [85a6dd25] PositiveFactorizations v0.2.1
  [92933f4c] ProgressMeter v0.9.0
  [1fd47b50] QuadGK v2.0.3
  [df47a6cb] RData v0.6.0
  [ce6b1742] RDatasets v0.6.1
  [c84ed2f1] Ratios v0.3.0
  [3cdcf5f2] RecipesBase v0.6.0
  [189a3867] Reexport v0.2.0
  [ae029012] Requires v0.5.2
  [79098fc4] Rmath v0.5.0
  [992d4aef] Showoff v0.2.1
  [699a6c99] SimpleTraits v0.8.0
  [a2af1166] SortingAlgorithms v0.3.1
  [276daf66] SpecialFunctions v0.7.2
  [8f1571ae] StanMCMCChain v4.1.0
  [60ddc479] StatPlots v0.9.0
  [90137ffa] StaticArrays v0.10.2
  [2d09df54] StatisticalRethinking v0.4.0
  [2913bbd2] StatsBase v0.27.0
  [4c63d2b9] StatsFuns v0.7.0
  [3eaba693] StatsModels v0.4.0
  [3783bdb8] TableTraits v0.4.1
  [382cd787] TableTraitsUtils v0.3.1
  [bd369af6] Tables v0.1.14
  [f269a46b] TimeZones v0.8.4
  [3bb67fe8] TranscodingStreams v0.8.1
  [fce5fe82] Turing v0.6.5
  [30578b45] URIParser v0.4.0
  [81def892] VersionParsing v1.1.3
  [ea10d353] WeakRefStrings v0.5.3
  [cc8bc4a8] Widgets v0.4.4
  [c17dfb99] WinRPM v0.4.2
  [efce3f68] WoodburyMatrices v0.4.1
  [a5390f91] ZipFile v0.8.0
  [2a0f44e3] Base64  [`@stdlib/Base64`]
  [8bf52ea8] CRC32c  [`@stdlib/CRC32c`]
  [ade2ca70] Dates  [`@stdlib/Dates`]
  [8bb1440f] DelimitedFiles  [`@stdlib/DelimitedFiles`]
  [8ba89e20] Distributed  [`@stdlib/Distributed`]
  [9fa8497b] Future  [`@stdlib/Future`]
  [b77e0a4c] InteractiveUtils  [`@stdlib/InteractiveUtils`]
  [76f85450] LibGit2  [`@stdlib/LibGit2`]
  [8f399da3] Libdl  [`@stdlib/Libdl`]
  [37e2e46d] LinearAlgebra  [`@stdlib/LinearAlgebra`]
  [56ddb016] Logging  [`@stdlib/Logging`]
  [d6f4376e] Markdown  [`@stdlib/Markdown`]
  [a63ad114] Mmap  [`@stdlib/Mmap`]
  [44cfe95a] Pkg  [`@stdlib/Pkg`]
  [de0858da] Printf  [`@stdlib/Printf`]
  [9abbd945] Profile  [`@stdlib/Profile`]
  [3fa0cd96] REPL  [`@stdlib/REPL`]
  [9a3f8284] Random  [`@stdlib/Random`]
  [ea8e919c] SHA  [`@stdlib/SHA`]
  [9e88b42a] Serialization  [`@stdlib/Serialization`]
  [1a1011a3] SharedArrays  [`@stdlib/SharedArrays`]
  [6462fe0b] Sockets  [`@stdlib/Sockets`]
  [2f01184e] SparseArrays  [`@stdlib/SparseArrays`]
  [10745b16] Statistics  [`@stdlib/Statistics`]
  [4607b0f0] SuiteSparse  [`@stdlib/SuiteSparse`]
  [8dfed614] Test  [`@stdlib/Test`]
  [cf7118a7] UUIDs  [`@stdlib/UUIDs`]
  [4ec0a83e] Unicode  [`@stdlib/Unicode`]
WARNING: Method definition <=(Flux.Tracker.TrackedReal{T} where T<:Real, Flux.Tracker.TrackedReal{T} where T<:Real) in module Tracker at /home/dfish/.julia/packages/Flux/T3PhK/src/tracker/lib/real.jl:45 overwritten in module Turing at /home/dfish/.julia/packages/Turing/kroIM/src/core/ad.jl:145.
WARNING: Method definition <=(Flux.Tracker.TrackedReal{T} where T<:Real, Flux.Tracker.TrackedReal{T} where T<:Real) in module Turing at /home/dfish/.julia/packages/Turing/kroIM/src/core/ad.jl:145 overwritten in module Tracker at /home/dfish/.julia/packages/Flux/T3PhK/src/tracker/lib/real.jl:45.
WARNING: Method definition <=(Flux.Tracker.TrackedReal{T} where T<:Real, Flux.Tracker.TrackedReal{T} where T<:Real) in module Tracker at /home/dfish/.julia/packages/Flux/T3PhK/src/tracker/lib/real.jl:45 overwritten in module Turing at /home/dfish/.julia/packages/Turing/kroIM/src/core/ad.jl:145.
┌ Warning: Package StatisticalRethinking does not have Flux in its dependencies:
│ - If you have StatisticalRethinking checked out for development and have
│   added Flux as a dependency but haven't updated your primary
│   environment's manifest file, try `Pkg.resolve()`.
│ - Otherwise you may need to report an issue with StatisticalRethinking
└ Loading Flux into StatisticalRethinking from project dependency, future warnings for StatisticalRethinking are suppressed.
┌ Warning: The call to compilecache failed to create a usable precompiled cache file for StatisticalRethinking [2d09df54-9d0f-5258-8220-54c2a3d4fbee]
│   exception = Required dependency KernelDensity [5ab0869b-81aa-558d-bb23-cbf5423bbe9b] failed to load from a cache file.
└ @ Base loading.jl:969
loaded
┌ Warning: Package StatisticalRethinking does not have Flux in its dependencies:
│ - If you have StatisticalRethinking checked out for development and have
│   added Flux as a dependency but haven't updated your primary
│   environment's manifest file, try `Pkg.resolve()`.
│ - Otherwise you may need to report an issue with StatisticalRethinking
└ Loading Flux into StatisticalRethinking from project dependency, future warnings for StatisticalRethinking are suppressed.
[ Info: generating plain script file from `~/.julia/packages/StatisticalRethinking/iUqnm/scripts/00/clip-01-03.jl`
[ Info: not running on Travis, skipping links will not be correct.
[ Info: writing result to `~/.julia/packages/StatisticalRethinking/iUqnm/chapters/00/clip-01-03.jl`
ERROR: LoadError: SystemError: opening file /home/dfish/.julia/packages/StatisticalRethinking/iUqnm/chapters/00/clip-01-03.jl: Permission denied
Stacktrace:
 [1] #systemerror#39(::Nothing, ::Function, ::String, ::Bool) at ./error.jl:106
 [2] systemerror at ./error.jl:106 [inlined]
 [3] #open#293(::Nothing, ::Nothing, ::Nothing, ::Bool, ::Nothing, ::Function, ::String) at ./iostream.jl:283
 [4] #open at ./iostream.jl:0 [inlined]
 [5] open(::String, ::String) at ./iostream.jl:339
 [6] #open#294(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::getfield(Base, Symbol("##254#255")){String,Tuple{}}, ::String, ::Vararg{String,N} where N) at ./iostream.jl:367
 [7] open at ./iostream.jl:367 [inlined]
 [8] write at ./io.jl:283 [inlined]
 [9] #script#9(::typeof(identity), ::typeof(identity), ::String, ::Bool, ::Bool, ::Bool, ::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::String, ::String) at /home/dfish/.julia/packages/Literate/OTyBa/src/Literate.jl:276
 [10] script(::String, ::String) at /home/dfish/.julia/packages/Literate/OTyBa/src/Literate.jl:235
 [11] (::getfield(Main, Symbol("##3#4")))() at /home/dfish/.julia/packages/StatisticalRethinking/iUqnm/test/runtests.jl:27
 [12] cd(::getfield(Main, Symbol("##3#4")), ::String) at ./file.jl:96
 [13] top-level scope at /home/dfish/.julia/packages/StatisticalRethinking/iUqnm/test/runtests.jl:19
 [14] include at ./boot.jl:317 [inlined]
 [15] include_relative(::Module, ::String) at ./loading.jl:1044
 [16] include(::Module, ::String) at ./sysimg.jl:29
 [17] include(::String) at ./client.jl:392
 [18] top-level scope at none:0
in expression starting at /home/dfish/.julia/packages/StatisticalRethinking/iUqnm/test/runtests.jl:6
ERROR: Package StatisticalRethinking errored during testing

trankplots() and precis() function

Discussed in https://github.com/orgs/StatisticalRethinkingJulia/discussions/156

Originally posted by Zhe-Jessica March 2, 2023
Hi, I notice that trankplots() and precis() function has been added to StatisticalRethinking.jl during the previous update. This documentation (see below) also mentioned this two function.
https://stanjulia.github.io/Stan.jl/stable/WALKTHROUGH2/
I tested them in the 09-Chapter 9. Markov Chain Monte Carlo.ipynb, but these functions don't work. The error shows 'trankplots not defined' and 'no method matching precis'. I think these two functions are very useful. Is it possible to give an example code for how to use them?

Dependency issue with psisloo function

Whey I try to use compare in :psis mode, I get

UndefVarError: psisloo not defined

Stacktrace:
 [1] compare(m::Vector{Matrix{Float64}}, ::Val{:psis}; mnames::Vector{String})
   @ StatisticalRethinking ~/.julia/packages/StatisticalRethinking/Uy2Qf/src/compare_models.jl:82
 [2] compare(m::Vector{Matrix{Float64}}, type::Symbol; mnames::Vector{String})
   @ StatisticalRethinking ~/.julia/packages/StatisticalRethinking/Uy2Qf/src/compare_models.jl:132
 [3] compare(m::Vector{Matrix{Float64}}, type::Symbol)
   @ StatisticalRethinking ~/.julia/packages/StatisticalRethinking/Uy2Qf/src/compare_models.jl:132
 [4] top-level scope
   @ In[92]:1
 [5] eval
   @ ./boot.jl:360 [inlined]
 [6] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
   @ Base ./loading.jl:1116

I found this function in ParetoSmoothedImportanceSampling.jl, but we should have it in dependecies list.

StatisticalRethinking v4 (SR4)

Notes/ideas:

  1. The Stan side chains of SR4 will be based on AxisKeys.jl's KeyedArrays, e.g.:
julia> cans = read_samples(m5_1s)
3-dimensional KeyedArray(NamedDimsArray(...)) with keys:
↓   iteration ∈ 1000-element UnitRange{Int64}
→   chain ∈ 4-element UnitRange{Int64}
□   param ∈ 103-element Vector{Symbol}
And data, 1000×4×103 Array{Float64, 3}:
[showing 3 of 103 slices]
[:, :, 1] ~ (:, :, :a):
         (1)          (2)          (3)           (4)
    (1)    0.164875    -0.0316173    0.117778     -0.0587963
    (2)    0.0683526   -0.112191    -0.152213     -0.132654
    (3)   -0.0570758   -0.0428441    0.0723148    -0.0818758
    (4)    0.0736573    0.0684627   -0.153873      0.0619487
      ⋮                                          
  (996)    0.100458    -0.0313996   -0.111339     -0.0978581
  (997)    0.0250855   -0.228102    -0.109103      0.0180321
  (998)   -0.0987535    0.12663      0.00225604   -0.00989224
  (999)   -0.0401901    0.127126    -0.0615762    -0.00764688
 (1000)    0.0292844   -0.0801682   -0.036483     -0.114162

[:, :, 52] ~ (:, :, Symbol("mu.49")):
         (1)          (2)          (3)           (4)
    (1)    0.0642467   -0.12793     -0.00297304   -0.147075
    (2)   -0.0459464   -0.245202    -0.255521     -0.213369
    (3)   -0.159283    -0.141249    -0.0534471    -0.169041
    (4)   -0.0377947   -0.0500161   -0.280926     -0.0721203
      ⋮                                          
  (996)    0.0236626   -0.0940149   -0.223359     -0.20557
  (997)   -0.108117    -0.331829    -0.226459     -0.0873184
  (998)   -0.198844     0.0263513   -0.0773827    -0.123203
  (999)   -0.136195     0.0142719   -0.150853     -0.135881
 (1000)   -0.117577    -0.18169     -0.155621     -0.249585

[:, :, 103] ~ (:, :, Symbol("log_lik.50")):
         (1)         (2)         (3)         (4)
    (1)   -1.00343    -0.793367   -1.05779    -0.863638
    (2)   -0.969577   -0.901926   -0.808977   -0.722824
    (3)   -0.807445   -0.838343   -1.07894    -0.74081
    (4)   -1.04428    -0.991569   -0.875301   -1.1158
      ⋮                                      
  (996)   -0.896697   -0.694199   -0.837646   -0.777261
  (997)   -1.07123    -0.986055   -0.859606   -0.879211
  (998)   -0.769609   -0.864846   -0.633001   -0.816084
  (999)   -0.742747   -0.98111    -0.624656   -1.04728
 (1000)   -1.19845    -0.779751   -0.962009   -0.893048

and:

julia> axiskeys(chns)
(1:1000, 1:4, [:a, :bA, :sigma, 
    Symbol("mu.1"), Symbol("mu.2")  ... Symbol("mu.49)" , Symbol("mu.50"), 
    Symbol("log_lik.1"), Symbol("log_lik.2")  ... Symbol("log_lik.49"), Symbol("log_lik.50")]
)

Using the overloaded matrix() method from the Tables.jl API to extract a vector parameter:

julia> chns_log_lik = matrix(chns, :log_lik)
3-dimensional KeyedArray(NamedDimsArray(...)) with keys:
↓   iteration ∈ 1000-element UnitRange{Int64}
→   chain ∈ 4-element UnitRange{Int64}
□   param ∈ 50-element view(::Vector{Symbol},...)
And data, 1000×4×50 view(::Array{Float64, 3}, :, :, [54, 55   …  102, 103]) with eltype Float64:
[showing 3 of 50 slices]
[:, :, 1] ~ (:, :, Symbol("log_lik.1")):
         (1)        (2)        (3)        (4)
    (1)   -1.70719   -2.21329   -1.82935   -2.08948
    (2)   -1.89324   -2.28388   -2.30598   -2.51493
    (3)   -2.22125   -2.12325   -1.75062   -2.38044
    (4)   -1.72746   -1.89979   -2.24047   -1.81943
      ⋮                                   
  (996)   -1.87282   -2.46479   -2.22433   -2.36482
  (997)   -1.85819   -2.08163   -2.19739   -2.02657
  (998)   -2.34589   -2.09177   -2.59066   -2.38515
  (999)   -2.36031   -1.99972   -2.74304   -1.81838
 (1000)   -1.71913   -2.30865   -1.94024   -2.4405

[:, :, 26] ~ (:, :, Symbol("log_lik.26")):
         (1)         (2)         (3)         (4)
    (1)   -0.910393   -0.741409   -0.811551   -0.845247
    (2)   -0.807438   -0.676429   -0.763983   -0.718875
    (3)   -0.739511   -0.786707   -0.868633   -0.722171
    (4)   -0.937871   -0.795258   -0.734667   -0.809258
      ⋮                                      
  (996)   -0.897659   -0.713518   -0.749929   -0.691888
  (997)   -0.806372   -0.966478   -0.747799   -0.782096
  (998)   -0.717109   -0.73475    -0.625633   -0.639031
  (999)   -0.687475   -0.754141   -0.589675   -0.872287
 (1000)   -0.862781   -0.717588   -0.834025   -0.617522

[:, :, 50] ~ (:, :, Symbol("log_lik.50")):
         (1)         (2)         (3)         (4)
    (1)   -1.00343    -0.793367   -1.05779    -0.863638
    (2)   -0.969577   -0.901926   -0.808977   -0.722824
    (3)   -0.807445   -0.838343   -1.07894    -0.74081
    (4)   -1.04428    -0.991569   -0.875301   -1.1158
      ⋮                                      
  (996)   -0.896697   -0.694199   -0.837646   -0.777261
  (997)   -1.07123    -0.986055   -0.859606   -0.879211
  (998)   -0.769609   -0.864846   -0.633001   -0.816084
  (999)   -0.742747   -0.98111    -0.624656   -1.04728
 (1000)   -1.19845    -0.779751   -0.962009   -0.893048

Other commonly used methods are DataFrame(chains), concatenation of chains, etc.

  1. SR4 will decouple plot choices from SR4 (e.g. I plan to introduce separate packages StatisticalRethinking(Stats?)Plots and StatisticalRethinkingMakie).

  2. All Stan and Turing specific parts will be handled in the corresponding @require sections (currently I envisage Turing, StanSample, StanQuap, DiffEqBayesStan, MCMCChains and LogDensityProblems sections). See e.g. see issue, where the conversion to a DataFrame requires MCMCChains.jl.

  3. MCMCChains will be dropped as a standard dependency of SR4 (it has StatsPlots as a dependency). If required it needs to be provided by the (project) environment.

  4. I'm considering to drop the heavy use of @reexport in Sr4, e.g. see isssue

  5. I'm considering to produce all output results as a KeyedArray, e.g. see an example. This will in some places also affect packages in StanJulia, e.g. StanOptimize, StanQuap and DiffEqBayesStan. These will be updated accordingly and all brought to what I consider v4 level. StanSample.jl is pretty much done and by default returns a KeyedArray chain as shown above.

  6. A lot of documentation updates!! Introductory docs for StatisticalRethinking.jl and Stan.jl on Github, on-line docs for functions in all sub packages.

Version 0.9.0 without bounds

Version 0.9.0 (which weirdly doesn't have a tag, by the way, I wasn't even aware that's possible) does not seem to have any bounds on its dependencies. This means whenever compat doesn't work out perfectly it installs this version instead.
I learned this when I added StatisticalRethinking and instead of adding 2.2.3 and downgrading StatsBase from 0.33.0 to 0.32.2 I got StatisticalRethinking 0.9.0.

I'm not sure if that's fixable (I'm really not a Pkg guru) but if it is it would be great.

Sidenotes:
I'm not quite sure why the current version can't be installed, I suspect this open compat PR on Clustering.

When I installed 0.9.0 I also stumbled upon tpapp/DynamicHMC.jl#128 which is (I think) why this broke things for @ym-han over on Zulip.

Switching to ParetoSmooth.jl

Although at some point I will proceed in replacing the inner parts of ParetoSmoothedImportanceSampling.jl and/or StatsModelComparisons.jl over to ParetoSmooth.jl, for now I have upgraded the first package to Julia 1.7.

ParetoSmooth.jl currently does not play nice with Pluto, produces a warning on my system somewhere in loop vectorization code and also the resulting structs do not line up well with what StatisticalRethinking.jl needs (the latter 2 definitely can be fixed, but it is not currently my top priority).

Note: ParetoSmooth.jl is MIT licensed, ParetoSmoothedImportanceSampling.jl is GPL licensed (as most of StatisticalRethinkingJulia is).

Problem with Chains to DataFrame conversion

Hello!

I found the issue with dataframe conversion method defined in df.jl.
Resulting dataframe might have columns in the wrong order in comparison to the original chain.

How to reproduce:

using Turing
using Distributions
using StatisticalRethinking

@model function model(y)
    μ ~ Normal(5, 1)
    σ ~ Uniform(0, 1)
    b ~ Uniform(0, 1)
    y ~ Normal(μ, σ) + b
end

chain = sample(model(10), NUTS(), 1000)
display(describe(ch2)[1]);

first(DataFrame(ch2), 10)

The output from the chain is following

Summary Statistics
  parameters      mean       std   naive_se      mcse        ess      rhat   e ⋯
      Symbol   Float64   Float64    Float64   Float64    Float64   Float64     ⋯

           μ    7.5558    0.7538     0.0238    0.0299   449.4640    1.0019     ⋯
           σ    0.8266    0.1614     0.0051    0.0095   335.4502    1.0006     ⋯
           b    0.6971    0.2339     0.0074    0.0113   549.3266    1.0028     ⋯

But the resulting dataframe is
image

So, parameters b and μ are swapped.

Without importing StatisticalRethinking package, everything works as expected.
Why do we need this conversion at all? Turing provides Table.jl interface, which does the conversion without any extra methods.

Citation

Hi there

Thanks for your work here. How should I cite this package?

Exercise Solutions

Would you be open to pull requests with answers to the solutions at the end of each chapter?

Either add to /chapters/[XX]/exercises/ or create a new repo?

Thinking it could be structured as, e.g. /chapters/02/exercises/2E with the four easy responses in there, etc.

It seems like the promised (end of chapter 1) solutions aren't actually to be found online, and it would also be great to have a Julia version of community-driven answers.

Weird dependency error

Some time ago I start getting the error on initial import of the StatisticalRethinking. It happens only if Turing is imported first:

using Turing
using StatisticalRethinking

The error message is

┌ Warning: Error requiring `MCMCChains` from `StatisticalRethinking`
│   exception = (ErrorException("cannot assign a value to variable MCMCChains.MCMCChains from module StatisticalRethinking"), Union{Ptr{Nothing}, Base.InterpreterIP}[Ptr{Nothing} @0x00007fa4d6c81a7f, Ptr{Nothing} @0x00007fa4d6d101df, Ptr{Nothing} @0x00007fa4d6d31b3d, Ptr{Nothing} @0x00007fa4d6d14968, Ptr{Nothing} @0x00007fa4d6d15297, Base.InterpreterIP in top-level CodeInfo for StatisticalRethinking at statement 0, Ptr{Nothing} @0x00007fa4d6d30b31, Ptr{Nothing} @0x00007fa4d6d32949, Ptr{Nothing} @0x00007fa48e9a7f01, Ptr{Nothing} @0x00007fa48e9a7f2c, Ptr{Nothing} @0x00007fa4d6cf7769, Ptr{Nothing} @0x00007fa4d6d13f15, Ptr{Nothing} @0x00007fa4d6d13bce, Ptr{Nothing} @0x00007fa4d6d14811, Ptr{Nothing} @0x00007fa4d6d14b90, Ptr{Nothing} @0x00007fa4d6d1504a, Base.InterpreterIP in MethodInstance for err(::Any, ::Module, ::String) at statement 2, Ptr{Nothing} @0x00007fa48e9a7e77, Ptr{Nothing} @0x00007fa48e9a7e8c, Ptr{Nothing} @0x00007fa4d6cf7769, Ptr{Nothing} @0x00007fa4d6d13f15, Ptr{Nothing} @0x00007fa4d6d13bce, Ptr{Nothing} @0x00007fa4d6d148a5, Ptr{Nothing} @0x00007fa4d6d14b90, Ptr{Nothing} @0x00007fa4d6d1504a, Base.InterpreterIP in MethodInstance for withpath(::Any, ::String) at statement 10, Ptr{Nothing} @0x00007fa48e9a7df3, Ptr{Nothing} @0x00007fa48e9a7e1c, Ptr{Nothing} @0x00007fa4d6cf7769, Ptr{Nothing} @0x00007fa4d6d13f15, Ptr{Nothing} @0x00007fa4d6d13bce, Ptr{Nothing} @0x00007fa4d6d14811, Ptr{Nothing} @0x00007fa4d6d1504a, Base.InterpreterIP in MethodInstance for listenpkg(::Any, ::Base.PkgId) at statement 3, Ptr{Nothing} @0x00007fa48e9a7475, Ptr{Nothing} @0x00007fa48e9a75fc, Ptr{Nothing} @0x00007fa4d6cf7769, Ptr{Nothing} @0x00007fa4d6d304b5, Ptr{Nothing} @0x00007fa4d6d2617c, Ptr{Nothing} @0x00007fa499a24389, Ptr{Nothing} @0x00007fa499aaf62f, Ptr{Nothing} @0x00007fa499ab18a0, Ptr{Nothing} @0x00007fa499ab3c1d, Ptr{Nothing} @0x00007fa499ac3cec, Ptr{Nothing} @0x00007fa4d6cf7769, Ptr{Nothing} @0x00007fa4d6d2fcd1, Ptr{Nothing} @0x00007fa4d6d31d97, Ptr{Nothing} @0x00007fa4d6d30d7c, Ptr{Nothing} @0x00007fa4d6d32949, Ptr{Nothing} @0x00007fa499a9faec, Ptr{Nothing} @0x00007fa499a9fe89, Ptr{Nothing} @0x00007fa4d6cf7769, Ptr{Nothing} @0x00007fa499a9c9e3, Ptr{Nothing} @0x00007fa4d6cf7769, Ptr{Nothing} @0x00007fa4d6d08546, Ptr{Nothing} @0x00007fa499a6c7ee, Ptr{Nothing} @0x00007fa499a6ca1e, Ptr{Nothing} @0x00007fa499a6ca3c, Ptr{Nothing} @0x00007fa4d6cf7769, Ptr{Nothing} @0x00007fa4d6d18c9a, Ptr{Nothing} @0x0000000000000000])
└ @ Requires /home/shmuma/.julia/packages/Requires/7Ncym/src/require.jl:49

This appears only on initial import, if I restart the cell, it works without warnings and works properly.

If I add Optim import, the message changes:

using Optim
using Turing
using StatisticalRethinking

┌ Warning: Package StatisticalRethinking does not have Optim in its dependencies:
│ - If you have StatisticalRethinking checked out for development and have
│   added Optim as a dependency but haven't updated your primary
│   environment's manifest file, try `Pkg.resolve()`.
│ - Otherwise you may need to report an issue with StatisticalRethinking
│ Loading Optim into StatisticalRethinking from project dependency, future warnings for StatisticalRethinking are suppressed.
└ @ nothing nothing:910
┌ Warning: Error requiring `MCMCChains` from `StatisticalRethinking`
│   exception = (ErrorException("cannot assign a value to variable MCMCChains.MCMCChains from module StatisticalRethinking"), Union{Ptr{Nothing}, Base.InterpreterIP}[Ptr{Nothing} @0x00007f36d181da7f, Ptr{Nothing} @0x00007f36d18ac1df, Ptr{Nothing} @0x00007f36d18cdb3d, Ptr{Nothing} @0x00007f36d18b0968, Ptr{Nothing} @0x00007f36d18b1297, Base.InterpreterIP in top-level CodeInfo for StatisticalRethinking at statement 0, Ptr{Nothing} @0x00007f36d18ccb31, Ptr{Nothing} @0x00007f36d18ce949, Ptr{Nothing} @0x00007f3689566731, Ptr{Nothing} @0x00007f368956675c, Ptr{Nothing} @0x00007f36d1893769, Ptr{Nothing} @0x00007f36d18aff15, Ptr{Nothing} @0x00007f36d18afbce, Ptr{Nothing} @0x00007f36d18b0811, Ptr{Nothing} @0x00007f36d18b0b90, Ptr{Nothing} @0x00007f36d18b104a, Base.InterpreterIP in MethodInstance for err(::Any, ::Module, ::String) at statement 2, Ptr{Nothing} @0x00007f36895666a7, Ptr{Nothing} @0x00007f36895666bc, Ptr{Nothing} @0x00007f36d1893769, Ptr{Nothing} @0x00007f36d18aff15, Ptr{Nothing} @0x00007f36d18afbce, Ptr{Nothing} @0x00007f36d18b08a5, Ptr{Nothing} @0x00007f36d18b0b90, Ptr{Nothing} @0x00007f36d18b104a, Base.InterpreterIP in MethodInstance for withpath(::Any, ::String) at statement 10, Ptr{Nothing} @0x00007f3689566623, Ptr{Nothing} @0x00007f368956664c, Ptr{Nothing} @0x00007f36d1893769, Ptr{Nothing} @0x00007f36d18aff15, Ptr{Nothing} @0x00007f36d18afbce, Ptr{Nothing} @0x00007f36d18b0811, Ptr{Nothing} @0x00007f36d18b104a, Base.InterpreterIP in MethodInstance for listenpkg(::Any, ::Base.PkgId) at statement 3, Ptr{Nothing} @0x00007f3689564225, Ptr{Nothing} @0x00007f36895643ac, Ptr{Nothing} @0x00007f36d1893769, Ptr{Nothing} @0x00007f36d18cc4b5, Ptr{Nothing} @0x00007f36d18c217c, Ptr{Nothing} @0x00007f369453b389, Ptr{Nothing} @0x00007f36945c628f, Ptr{Nothing} @0x00007f36945c8500, Ptr{Nothing} @0x00007f36945ca87d, Ptr{Nothing} @0x00007f36945d077c, Ptr{Nothing} @0x00007f36d1893769, Ptr{Nothing} @0x00007f36d18cbcd1, Ptr{Nothing} @0x00007f36d18cdd97, Ptr{Nothing} @0x00007f36d18ccd7c, Ptr{Nothing} @0x00007f36d18ce949, Ptr{Nothing} @0x00007f36945b6aec, Ptr{Nothing} @0x00007f36945b6e89, Ptr{Nothing} @0x00007f36d1893769, Ptr{Nothing} @0x00007f36945b39e3, Ptr{Nothing} @0x00007f36d1893769, Ptr{Nothing} @0x00007f36d18a4546, Ptr{Nothing} @0x00007f36945837ee, Ptr{Nothing} @0x00007f3694583a1e, Ptr{Nothing} @0x00007f3694583a3c, Ptr{Nothing} @0x00007f36d1893769, Ptr{Nothing} @0x00007f36d18b4c9a, Ptr{Nothing} @0x0000000000000000])
└ @ Requires /home/shmuma/.julia/packages/Requires/7Ncym/src/require.jl:49

My dependencies are

      Status `~/work/rethinking/rethinking-2ed-julia/Project.toml`
  [488c2830] BSplines v0.3.2
  [336ed68f] CSV v0.9.6
  [d56128e0] Dagitty v0.0.1 `https://github.com/Shmuma/Dagitty.jl#main`
  [a93c6f00] DataFrames v1.2.2
  [31c24e10] Distributions v0.25.20
  [38e38edf] GLM v1.5.1
  [7073ff75] IJulia v1.23.2
  [5ab0869b] KernelDensity v0.6.3
  [b964fa9f] LaTeXStrings v1.2.1
  [429524aa] Optim v1.4.1
  [a68b5a21] ParetoSmooth v0.6.6
  [91a5bcdd] Plots v1.22.6
  [49802e3a] ProgressBars v1.4.0
  [2d09df54] StatisticalRethinking v4.3.0 `https://github.com/StatisticalRethinkingJulia/StatisticalRethinking.jl#master`
  [e1a513d0] StatisticalRethinkingPlots v0.9.3 `https://github.com/StatisticalRethinkingJulia/StatisticalRethinkingPlots.jl#master`
  [2913bbd2] StatsBase v0.33.11
  [4c63d2b9] StatsFuns v0.9.12
  [f3b207a7] StatsPlots v0.14.28
  [fce5fe82] Turing v0.18.0

Function precis() is not robust to missing values

Hi!

I found a problem with precis function, which crashes if dataframe contains missing values.
In section 4.5 of the book, "cherry blossoms" dataset is being analized, where 50% of cases contain missing values.

How to reproduce:
You can take the data here: https://github.com/Shmuma/rethinking-2ed-julia/blob/main/data/cherry_blossoms.csv

using CSV
using DataFrames
using StatisticalRethinking

d = DataFrame(CSV.File("cherry_blossoms.csv", missingstring="NA"))
println(first(d, 5))
precis(d)

The result is

10×5 DataFrame
 Row │ year   doy      temp      temp_upper  temp_lower 
     │ Int64  Int64?   Float64?  Float64?    Float64?   
─────┼──────────────────────────────────────────────────
   1 │   801  missing   missing     missing     missing 
   2 │   802  missing   missing     missing     missing 
   3 │   803  missing   missing     missing     missing 
   4 │   804  missing   missing     missing     missing 
   5 │   805  missing   missing     missing     missing 
   6 │   806  missing   missing     missing     missing 
   7 │   807  missing   missing     missing     missing 
   8 │   808  missing   missing     missing     missing 
   9 │   809  missing   missing     missing     missing 
  10 │   810  missing   missing     missing     missing 
ArgumentError: quantiles are undefined in presence of missing values

Stacktrace:
  [1] _quantilesort!(v::Vector{Union{Missing, Int64}}, sorted::Bool, minp::Float64, maxp::Float64)
    @ Statistics /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Statistics/src/Statistics.jl:959
  [2] #quantile!#52
    @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Statistics/src/Statistics.jl:943 [inlined]
  [3] quantile(v::Vector{Union{Missing, Int64}}, p::Float64; sorted::Bool, alpha::Float64, beta::Float64)
    @ Statistics /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Statistics/src/Statistics.jl:1052
  [4] quantile(v::Vector{Union{Missing, Int64}}, p::Float64)
    @ Statistics /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Statistics/src/Statistics.jl:1052
  [5] _broadcast_getindex_evalf
    @ ./broadcast.jl:648 [inlined]
  [6] _broadcast_getindex
    @ ./broadcast.jl:621 [inlined]
  [7] getindex
    @ ./broadcast.jl:575 [inlined]
  [8] copyto_nonleaf!(dest::Vector{Float64}, bc::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Tuple{Base.OneTo{Int64}}, typeof(quantile), Tuple{Base.Broadcast.Extruded{Vector{AbstractVector{T} where T}, Tuple{Bool}, Tuple{Int64}}, Float64}}, iter::Base.OneTo{Int64}, state::Int64, count::Int64)
    @ Base.Broadcast ./broadcast.jl:1078
  [9] copy
    @ ./broadcast.jl:930 [inlined]
 [10] materialize
    @ ./broadcast.jl:883 [inlined]
 [11] precis(df::DataFrame; io::IJulia.IJuliaStdio{Base.PipeEndpoint}, digits::Int64, depth::Float64, alpha::Float64)
    @ StatisticalRethinking ~/.julia/packages/StatisticalRethinking/VpDQW/src/precis.jl:26
 [12] precis(df::DataFrame)
    @ StatisticalRethinking ~/.julia/packages/StatisticalRethinking/VpDQW/src/precis.jl:22
 [13] top-level scope
    @ In[23]:3
 [14] eval
    @ ./boot.jl:360 [inlined]
 [15] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
    @ Base ./loading.jl:1116

If I drop rows with missing values, only 787 of 1215 cases will be left. Precis will work, but produce result different from the book

dd = disallowmissing(d[completecases(d),:]);
println(size(d), size(dd))
precis(dd)

The output is

(1215, 5)(787, 5)
┌────────────┬──────────────────────────────────────────────────────────┐
│      param │    mean      std     5.5%     50%    94.5%     histogram │
├────────────┼──────────────────────────────────────────────────────────┤
│       year │  1533.4  291.123  1008.61  1563.0  1935.77  ▁▃▃▅▅▅█████▇ │
│        doy │ 104.921   6.2577     95.0   105.0    115.0      ▁▁▅██▅▂▁ │
│       temp │  6.1004   0.6834      5.1    6.06   7.3662      ▁▅▆█▄▂▁▁ │
│ temp_upper │  6.9376    0.812     5.81     6.8   8.4177      ▂█▆▂▁▁▁▁ │
│ temp_lower │  5.2635   0.7622   4.1923    5.25   6.6277   ▁▁▁▃▆█▅▂▁▁▁ │
└────────────┴──────────────────────────────────────────────────────────┘

Typo?

Likelihood is spelled "likelyhood" in the code of Chapter 2. Happy to fork and send PR.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.