Comments (31)

itsdfish avatar itsdfish commented on May 24, 2024

No worries. That happens. Nonetheless, thank you for taking the time to code up these hierarchical models. I could not find any examples until Rob directed me to your work.

itsdfish avatar itsdfish commented on May 24, 2024

I forgot to mention my system and software specs:

versioninfo()
Julia Version 1.0.3
Commit 099e826241 (2018-12-18 01:34 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, haswell)
Environment:
  JULIA_CMDSTAN_HOME = /home/dfish/cmdstan-2.17.1
  JULIA_EDITOR = atom -a
  JULIA_NUM_THREADS = 4
[fce5fe82] Turing v0.6.4
[2d09df54] StatisticalRethinking v0.4.0

torkar avatar torkar commented on May 24, 2024

You're forgetting the most likely reason, i.e., me messing things up when I translated the models ;)

torkar avatar torkar commented on May 24, 2024

itsdfish avatar itsdfish commented on May 24, 2024

Will do.

I'm not very familiar with Poisson regression, but your code appears to be equivalent to the Stan model found here. So it's not clear to me why the effective sample size is so low.

goedman avatar goedman commented on May 24, 2024

Chris,

I think the results should be something like:

Iterations = 1:1000
Thinning interval = 1
Chains = 1,2,3,4
Samples per chain = 1000

Empirical Posterior Estimates:
                   Mean            SD         Naive SE         MCSE         ESS    
         lp__  928.043391500  2.9801830251 0.047120831017 0.10412981885  819.097589
accept_stat__    0.936485280  0.0890638723 0.001408223469 0.00168386155 1000.000000
   stepsize__    0.054044275  0.0023661368 0.000037411907 0.00037883775   39.009752
  treedepth__    5.814750000  0.4684947271 0.007407552047 0.01877323347  622.775117
 n_leapfrog__   68.612000000 29.6538242614 0.468868130001 0.85627002690 1000.000000
  divergent__    0.000000000  0.0000000000 0.000000000000 0.00000000000         NaN
     energy__ -921.505192750  3.9165026073 0.061925343505 0.12391326621  998.990207
            a    1.101570431  0.7094342566 0.011217140506 0.02253883312  990.744009
           bp    0.260531749  0.0765491451 0.001210348257 0.00239107665 1000.000000
  a_society.1   -0.197773466  0.2306720863 0.003647245927 0.00536358286 1000.000000
  a_society.2    0.044530077  0.2139312867 0.003382550643 0.00518870890 1000.000000
  a_society.3   -0.041954795  0.1845972632 0.002918739008 0.00433489806 1000.000000
  a_society.4    0.326263470  0.1895911420 0.002997699165 0.00557424414 1000.000000
  a_society.5    0.046117209  0.1762603065 0.002786920147 0.00413155547 1000.000000
  a_society.6   -0.310893727  0.2033643874 0.003215473297 0.00550744282 1000.000000
  a_society.7    0.146506828  0.1701311145 0.002690009113 0.00373704400 1000.000000
  a_society.8   -0.164561717  0.1799417945 0.002845129585 0.00416658848 1000.000000
  a_society.9    0.276327443  0.1738077910 0.002748142474 0.00486988593 1000.000000
 a_society.10   -0.089019497  0.2748128829 0.004345173202 0.00812819300 1000.000000
sigma_society    0.304580809  0.1224826068 0.001936620057 0.00369141520 1000.000000

I've added two versions of the Stan model: the one you posted above and the version generated by McElreath's ulam().

I think Richard is also short-changing himself; his models are fine. In general, it might be possible to optimize Turing's performance for these models, as has been demonstrated in the original publication on Turing (and in some other examples I've seen). In your case, where you often need to compute the likelihood yourself, that is where I think there might be opportunities.

In your case above, I think there is still an issue with describe() and NUTS() sampling.

itsdfish avatar itsdfish commented on May 24, 2024

Hi Rob-

Can you point me to the example or publication you are referring to? I would like to see if I can figure out an optimization. You also mentioned there might be something wrong with describe() and NUTS(). Do you mean a bug or problem in Turing, or do you mean that the NUTS parameters are poorly selected? Thanks.

goedman avatar goedman commented on May 24, 2024

Hi Chris, does this link work? Or http://turing.ml? See the bottom of that page.

Yes, in Turing the NUTS samples in the MCMCChain include the adaptation samples. In the Turing run above, only the final 3000 samples should be used.

itsdfish avatar itsdfish commented on May 24, 2024

Thanks, Rob. Those links work. I'll see if I can make sense out of the papers.

Ok. Thanks for clarifying that adaptation is the same as warmup or burn-in. I thought it referred to something else. Other packages such as Stan and Mamba automatically exclude the burn-in samples from the chain, but I did not find any info about how to do this. Are you aware of an elegant way to remove the burn-in samples from the chain? Thanks

goedman avatar goedman commented on May 24, 2024

I use something like chn2 = MCMCChain.Chains(chn.value[1001:4000, :, :], names=chn.names) for 1000 adaptation samples.
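
For reference, that one-liner can be wrapped in a small helper (a sketch built from the constructor call above; the chn.value and chn.names fields are taken from that snippet and may differ in other MCMCChain versions):

# Drop the first n_adapt iterations from every chain (sketch, not a package function).
function drop_adapt(chn, n_adapt::Int)
    return MCMCChain.Chains(chn.value[(n_adapt + 1):end, :, :], names = chn.names)
end

chn2 = drop_adapt(chn, 1000)   # keep iterations 1001 onwards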

itsdfish avatar itsdfish commented on May 24, 2024

Ok. That makes sense. I suppose your Stan code does something similar in the background and returns the chain without the burn-in samples, as you described.

itsdfish avatar itsdfish commented on May 24, 2024

Hi Rob-

I have a question about the computation of the effective sample size, particularly when multiple chains are used. As far as I can tell from the code here (line 118), the effective sample size appears to be equal to the number of samples in a single chain. I'm not familiar with the details of effective sample size, but this strikes me as odd. It seems to me that it could underestimate the effective sample size when samples within each chain are mostly independent. I'm pretty sure this is different from Stan's native calculation. Am I misunderstanding something? In either case, this makes it difficult to compare the Stan version of the model (which has multiple chains) and the Turing version.
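
For what it's worth, the usual autocorrelation-based idea is roughly ESS = m*n / (1 + 2*Σ ρ_t), with the lag-t autocorrelations ρ_t pooled over the m chains. Below is a minimal sketch of that idea (my own illustration, not MCMCChain's or Stan's actual computation, which also uses between/within-chain variances and more careful truncation):

using Statistics

function ess_sketch(draws::Matrix{Float64})      # rows = iterations, columns = chains
    n, m = size(draws)
    ρ = Float64[]
    for t in 1:(n - 1)
        # lag-t autocorrelation, averaged over the m chains
        r = mean(cor(draws[1:(n - t), j], draws[(1 + t):n, j]) for j in 1:m)
        r <= 0 && break                          # truncate at the first non-positive lag
        push!(ρ, r)
    end
    return m * n / (1 + 2 * sum(ρ))
end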

goedman avatar goedman commented on May 24, 2024

I'll have to refresh my memory, but I seem to recall that multiple chains are important for the correlation needed to compute the ESS. I'll have a look. Have you considered WAIC and friends? Another area I need to dig deeper into!

itsdfish avatar itsdfish commented on May 24, 2024

Thanks, Rob. That makes sense. Each chain is independent, so that should factor into the calculation somehow. I only have a cursory understanding of WAIC and LOO. From what I can gather, these are useful global model comparison metrics because they account for model flexibility, unlike BIC, which simply penalizes based on the number of parameters. I think something like that could be useful.
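
For reference, WAIC itself is short to compute from a matrix of pointwise log-likelihoods; here is a generic sketch (not a StatisticalRethinking.jl function), with S posterior draws and N observations:

using Statistics, StatsFuns   # logsumexp

# WAIC = -2 * (lppd - p_WAIC)
function waic_sketch(loglik::Matrix{Float64})    # S×N matrix of log-likelihood values
    S, N  = size(loglik)
    lppd  = sum(logsumexp(loglik[:, i]) - log(S) for i in 1:N)   # log pointwise predictive density
    pwaic = sum(var(loglik[:, i]) for i in 1:N)                  # effective number of parameters
    return -2 * (lppd - pwaic)
end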

goedman avatar goedman commented on May 24, 2024

The Stan docs section on ESS is well written. For quite a few of the models in SR (StatisticalRethinking), once they are correct, the values are actually pretty close. I have always found the same for the examples shared between Mamba and Stan.

As far as I understand, LOO and the approach in SR's clip-38s (derived from an example by Cameron Pfiffer) provide some insight into the predictive behavior of a model. Chapter 5 of SR talks a lot about WAIC and other methods to compare models, hence I mentioned it.

goedman avatar goedman commented on May 24, 2024

Oh, I forgot to mention that, quickly scanning the MCMCChain ESS estimate, I agree that it seems like a placeholder for now.

goedman avatar goedman commented on May 24, 2024

Chris, looking at a slightly more complicated model, I see the results below. These ESS estimates seem pretty reasonable/comparable. Note also Richard Torkar's note in m8.1 in SR:

# Here we will use Turing's NUTS support, which is currently (2018) the original NUTS by [Hoffman & Gelman](http://www.stat.columbia.edu/~gelman/research/published/nuts.pdf) and not the one that's in Stan 2.18.2, i.e., Appendix A.5 in: https://arxiv.org/abs/1701.02434

The line 118 you mention above retrieves the ESS values from Turing.

Turing m8.1 in SR (I have updated m8.1 to do 500 warmup samples and 500 samples, as in R's rethinking):

julia> describe(posterior2)
Iterations = 1:500
Thinning interval = 1
Chains = 1
Samples per chain = 500

Empirical Posterior Estimates:
              Mean                   SD                       Naive SE              MCSE        ESS   
       α    9.236001727  0.139979858032590220862090 0.0062600895608328340977144 0.0072621843 371.53229
  lf_num    0.000000000  0.000000000000000000000000 0.0000000000000000000000000 0.0000000000       NaN
      βA   -1.959147343  0.227559963323169772708710 0.0101767909389593319557932 0.0150545723 228.48351
      βR   -0.207507225  0.076837852113119081698400 0.0034362932114002028849420 0.0045783692 281.66236
       σ    0.948133574  0.056199375996908093600091 0.0025133125004431302630670 0.0014341875 500.00000
 elapsed    0.118077238  0.064602519801849694891516 0.0028891125158942432271658 0.0031892969 410.30708
 epsilon    0.027768298  0.000000000000000013891686 0.0000000000000000006212551 0.0000000000 500.00000
eval_num   45.760000000 24.580944768079657336556920 1.0992932690518784077937653 1.0886321693 500.00000
     βAR    0.401405080  0.123608980277352148435810 0.0055279616505918040270107 0.0063330853 380.95123
      lp -248.501695452  1.606431364589359755967735 0.0718417946481911401379605 0.1464664013 120.29521
  lf_eps    0.027768298  0.000000000000000013891686 0.0000000000000000006212551 0.0000000000 500.00000

For Stan I see:

precis(m8.1stan)
       mean   sd  5.5% 94.5% n_eff Rhat
a      9.22 0.13  9.01  9.44   272 1.01
bR    -0.20 0.08 -0.32 -0.08   287 1.01
bA    -1.94 0.22 -2.28 -1.60   270 1.01
bAR    0.39 0.13  0.18  0.58   283 1.00
sigma  0.95 0.05  0.87  1.03   458 1.00

The equivalent Stan model (generated in ulam()):

> stancode(m8.1stan)
data{
    real log_gdp[170];
    int cont_africa[170];
    real rugged[170];
}
parameters{
    real a;
    real bR;
    real bA;
    real bAR;
    real sigma;
}
model{
    vector[170] mu;
    sigma ~ cauchy( 0 , 2 );
    bAR ~ normal( 0 , 10 );
    bA ~ normal( 0 , 10 );
    bR ~ normal( 0 , 10 );
    a ~ normal( 0 , 100 );
    for ( i in 1:170 ) {
        mu[i] = a + bR * rugged[i] + bA * cont_africa[i] + bAR * rugged[i] * cont_africa[i];
    }
    log_gdp ~ normal( mu , sigma );
}
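
For comparison, a Turing version of the same model might look roughly like the sketch below (written in more recent Turing syntax than the v0.6.4 used above, and not necessarily the exact m8.1 code in SR):

using Turing

@model function m8_1(log_gdp, rugged, cont_africa)
    σ   ~ truncated(Cauchy(0, 2), 0, Inf)
    βAR ~ Normal(0, 10)
    βA  ~ Normal(0, 10)
    βR  ~ Normal(0, 10)
    α   ~ Normal(0, 100)
    for i in eachindex(log_gdp)
        μ = α + βR * rugged[i] + βA * cont_africa[i] + βAR * rugged[i] * cont_africa[i]
        log_gdp[i] ~ Normal(μ, σ)
    end
end

# assuming the data vectors are already defined; 0.65 is the NUTS target acceptance rate
chn = sample(m8_1(log_gdp, rugged, cont_africa), NUTS(0.65), 1000)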

itsdfish avatar itsdfish commented on May 24, 2024

Hi Rob-

Thanks for the information. I just want to clarify something about ESS since it's a bit outside my domain. Do you think it is computing ESS correctly for multiple chains, or do you think it's provisional or a placeholder?

Based on your post above, it looks like Turing and Stan are comparable for m8.1. However, do you think that m12.6 is still problematic? The ESS seems low to me (≈ 100 out of 4000 samples). I cannot use Stan as a reference point because of some issues with StatisticalRethinking.jl; I will post a separate issue for that.

goedman avatar goedman commented on May 24, 2024

All parameters look ok. So I tend to trust those. The ESS values are indeed wildly different.

Turing:

[NUTS] Finished with
  Running time        = 202.84388444799984;
  #lf / sample        = 0.00325;
  #evals / sample     = 188.8355;
  pre-cond. metric    = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0,....
Iterations = 1:4000
Thinning interval = 1
Chains = 1
Samples per chain = 4000

Empirical Posterior Estimates:
                   Mean           SD          Naive SE         MCSE          ESS    
            α   1.004089220   1.2021757984 0.01900806835  0.172012176882   48.844715
 α_society[9]   0.314035152   0.2962339976 0.00468387076  0.037324092782   62.992853
       lf_num   0.003250000   0.2055480479 0.00325000000  0.003250000000 4000.000000
 α_society[8]  -0.157876771   0.2677581639 0.00423362830  0.029809711153   80.680746
      elapsed   0.050710971   0.0762050853 0.00120490819  0.001355418667 3160.974426
      epsilon   0.004408781   0.0014156289 0.00002238306  0.000060675340  544.345235
 α_society[3]  -0.005504786   0.2712899685 0.00428947103  0.030908583257   77.038758
 α_society[1]  -0.188267215   0.3305963408 0.00522718712  0.038080326319   75.369349
α_society[10]  -0.087416257   0.5682651300 0.00898506063  0.079245853626   51.421995
 α_society[5]   0.074527668   0.2654575076 0.00419725173  0.030766726238   74.443614
     eval_num 188.835500000  13.5055003382 0.21354071004  0.826825404786  266.804857
 α_society[7]   0.174685320   0.2564394570 0.00405466383  0.026811173121   91.482508
    σ_society   0.397327646   0.3571602505 0.00564719941  0.046690337250   58.515675
 α_society[2]   0.095448172   0.3248486834 0.00513630867  0.039157069186   68.824310
 α_society[6]  -0.296009482   0.3331305547 0.00526725655  0.038717492136   74.031150
           βp   0.261908634   0.1764018894 0.00278915877  0.022422130828   61.894591
           lp -59.368315214 307.1271096473 4.85610598835 19.546096707429  246.897231
 α_society[4]   0.373266952   0.2832887740 0.00447918881  0.035566545669   63.441779
       lf_eps   0.004408781   0.0014156289 0.00002238306  0.000060675340  544.345235

vs. Stan:

Iterations = 1:1000
Thinning interval = 1
Chains = 1,2,3,4
Samples per chain = 1000

Empirical Posterior Estimates:
                   Mean            SD         Naive SE       MCSE         ESS    
         lp__  927.902390500  3.1524718081 0.04984495586 0.1116004144  797.941050
accept_stat__    0.932014939  0.0929430048 0.00146955794 0.0028291129 1000.000000
   stepsize__    0.051629625  0.0048496173 0.00007667918 0.0007764632   39.009752
  treedepth__    5.898500000  0.4327176460 0.00684186673 0.0161227264  720.331226
 n_leapfrog__   72.140000000 31.9035242366 0.50443900987 1.1058622347  832.291130
  divergent__    0.000000000  0.0000000000 0.00000000000 0.0000000000         NaN
     energy__ -921.434353000  4.0427745607 0.06392187839 0.1355470971  889.566883
            a    1.106856487  0.7239294856 0.01144633020 0.0199497764 1000.000000
           bp    0.260403109  0.0781991382 0.00123643694 0.0022459240 1000.000000
  a_society.1   -0.205620510  0.2471820296 0.00390829105 0.0046528145 1000.000000
  a_society.2    0.040094892  0.2189659306 0.00346215535 0.0051518934 1000.000000
  a_society.3   -0.043773446  0.1958565531 0.00309676401 0.0046023241 1000.000000
  a_society.4    0.327232229  0.1905474924 0.00301282039 0.0043900470 1000.000000
  a_society.5    0.039594977  0.1783490834 0.00281994661 0.0035126859 1000.000000
  a_society.6   -0.321556290  0.2083186557 0.00329380716 0.0046689237 1000.000000
  a_society.7    0.145628976  0.1741910386 0.00275420215 0.0037959779 1000.000000
  a_society.8   -0.172159660  0.1851656094 0.00292772535 0.0048882419 1000.000000
  a_society.9    0.275060569  0.1740551069 0.00275205288 0.0044513545 1000.000000
 a_society.10   -0.092800500  0.2826756400 0.00446949431 0.0080161625 1000.000000
sigma_society    0.311313021  0.1294821470 0.00204729250 0.0041120504  991.523286

itsdfish avatar itsdfish commented on May 24, 2024

I noticed that the standard deviations are much larger for the Turing model. Here is a comparison of the group-level parameters (Stan vs. Turing) for illustration:

bp:               .078  vs.  .176
a:                .732  vs.  1.20
sigma_society:    .129  vs.  .357

Something still appears to be wrong, but the cause still eludes me.

goedman avatar goedman commented on May 24, 2024

You’re right. I also see very different ESS values using rethinking’s m12.6. I need to try a few longer CmdStan runs.

itsdfish avatar itsdfish commented on May 24, 2024

I removed the 1000 adaptation samples from the Turing model in order to remove their influence on the results. I also reduced the target acceptance rate in case the sampler was exploring the posterior distribution poorly. Here is what I found:

target acceptance: .95

[NUTS] Sampling...100% Time: 0:03:12
[NUTS] Finished with
  Running time        = 191.70206823999996;
  #lf / sample        = 0.00325;
  #evals / sample     = 188.65775;
  pre-cond. metric    = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0,....
Iterations = 1:3000
Thinning interval = 1
Chains = 1
Samples per chain = 3000

Empirical Posterior Estimates:

                   Mean                 SD                        Naive SE               MCSE         ESS    
            α   0.955611675 0.801634334337607890041966 0.014635773592911533597105134 0.1107388075   52.402612
 α_society[9]   0.286028801 0.183044089778219759567790 0.003341912566317725621217738 0.0146428337  156.264801
       lf_num   0.000000000 0.000000000000000000000000 0.000000000000000000000000000 0.0000000000         NaN
 α_society[8]  -0.166639868 0.187179102369929323268494 0.003417407222052633135322974 0.0127429559  215.761702
      elapsed   0.048227234 0.004355363856269956965295 0.000079517701007391448767871 0.0001553981  785.520417
      epsilon   0.004825649 0.000000000000000003470025 0.000000000000000000063353705 0.0000000000 3000.000000
 α_society[3]  -0.035785666 0.193764587794183457125641 0.003537641186018815157854345 0.0121440603  254.578070
 α_society[1]  -0.183159532 0.247343255556180280363421 0.004515849350496164808788091 0.0182076812  184.540149
α_society[10]  -0.126572347 0.323704385147619511098327 0.005910006456956382221712420 0.0430817967   56.455928
 α_society[5]   0.056918914 0.177181710159165506368595 0.003234880647717240393507199 0.0110812576  255.658172
     eval_num 189.812000000 4.205062447026735483746052 0.076773585265147187284284769 0.1614776594  678.141009
 α_society[7]   0.163834551 0.177038217340657666465376 0.003232260839266015445964131 0.0112609137  247.164898
    σ_society   0.325944709 0.129135298073802834428747 0.002357677190839174621167507 0.0120228296  115.365661
 α_society[2]   0.077992902 0.219588506331196392507010 0.004009119276215408408936547 0.0183418367  143.328833
 α_society[6]  -0.307410240 0.209517337158618810466493 0.003825245725006362357972778 0.0137775619  231.257162
           βp   0.275186134 0.088409607759201452870101 0.001614131215663279755173631 0.0123424606   51.309222
           lp -39.104524881 3.107350230374036126335113 0.056732193841491133923238266 0.2402903520  167.227518
 α_society[4]   0.350421754 0.190805347326677005037610 0.003483613094114301408921186 0.0139371449  187.427562
       lf_eps   0.004825649 0.000000000000000003470025 0.000000000000000000063353705 0.0000000000 3000.000000


target acceptance: .85

[NUTS] Sampling...100% Time: 0:03:09
[NUTS] Finished with
  Running time        = 189.14339445700003;
  #lf / sample        = 0.00425;
  #evals / sample     = 186.443;
  pre-cond. metric    = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0,....
Iterations = 1:3000
Thinning interval = 1
Chains = 1
Samples per chain = 3000

Empirical Posterior Estimates:
                   Mean                   SD                        Naive SE                       MCSE                 ESS   
            α   0.8134787273 0.7108561634839954956888164 0.012978398529392150273498174 0.09870334012745726748327968   51.86803
 α_society[9]   0.2926361406 0.1932001188635565080975454 0.003527335440474975002816960 0.01444417530174513936147562  178.90780
       lf_num   0.0000000000 0.0000000000000000000000000 0.000000000000000000000000000 0.00000000000000000000000000        NaN
 α_society[8]  -0.1590971061 0.1967833141053074996751349 0.003592755335870048347124417 0.01316524637746332125620707  223.41821
      elapsed   0.0483080565 0.0042024196058919943350118 0.000076725333808300508068104 0.00014354289387865013763630  857.10777
      epsilon   0.0045317049 0.0000000000000000017350127 0.000000000000000000031676853 0.00000000000000000032213006   29.00967
 α_society[3]  -0.0067304782 0.2090444577254731983728675 0.003816612167255891904793508 0.01757726410466757538397786  141.44082
 α_society[1]  -0.1556434701 0.2309705730640927157093500 0.004216926432903289785614209 0.01828831932067467858082921  159.50185
α_society[10]  -0.1601049601 0.2953917351050680539792381 0.005393090553921215009325341 0.03187889692910577527085181   85.85985
 α_society[5]   0.0798737772 0.2023645587702860015877349 0.003694654455935517979592309 0.01595521579849443552512334  160.86573
     eval_num 189.9480000000 2.6381083889706475886782755 0.048165049126094559850397303 0.04802872703590684022589841 3000.00000
 α_society[7]   0.1770457609 0.2012846923210417360294855 0.003674938882157381420806574 0.01570978361751502669352298  164.16520
    σ_society   0.3391237501 0.1384958578410105112244111 0.002528576848685006213390825 0.01229016688522075774481213  126.98663
 α_society[2]   0.1178371570 0.2373396707365764413122378 0.004333209715109055497339607 0.02248878835938617273826345  111.38034
 α_society[6]  -0.3173406748 0.2123688058028609937366582 0.003877306181622033019740226 0.01472276552064881796622053  208.06674
           βp   0.2895390438 0.0764195787805128229353002 0.001395224237771000401486798 0.01027541364087441837227299   55.31089
           lp -39.5093136047 3.1589412559542640046572615 0.057674112790661702376837638 0.25098154520111654042224814  158.41617
 α_society[4]   0.3901159990 0.2133900603185035771414135 0.003895951652794414372255050 0.01942082646812128346458515  120.72938
       lf_eps   0.0045317049 0.0000000000000000017350127 0.000000000000000000031676853 0.00000000000000000032213006   29.00967

Removing the adaptation samples seems to have resolved the problem with the inflated standard deviations. (I can provide my functions for removing burn-in/adaptation samples if that would be useful.) Unfortunately, changing the target acceptance rate did not improve the ESS. So I think there is still something wrong, and this could contribute, in part, to the slow run times. Here are some possibilities:

  1. An error in the ESS calculation
  2. An error somewhere in the NUTS sampler
  3. An optimization in Stan's NUTS sampler that is not present in the Turing version
  4. Stan's reparameterization of the Poisson changed the geometry of the posterior, making sampling more efficient (see the sketch after this list)
  5. Something else I have not thought of
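
On possibility 4, purely as an illustration of what such a reparameterization can look like: a non-centered version of the varying intercepts, sketched in more recent Turing syntax with priors along the lines used for m12.6 (a generic sketch, not necessarily what the ulam()-generated Stan model does):

using Turing

@model function m12_6_noncentered(total_tools, log_pop, society, n_society)
    α  ~ Normal(0, 10)
    βp ~ Normal(0, 1)
    σ_society ~ truncated(Cauchy(0, 1), 0, Inf)
    z  ~ filldist(Normal(0, 1), n_society)    # standardized society offsets
    α_society = σ_society .* z                # implies α_society[j] ~ Normal(0, σ_society)
    for i in eachindex(total_tools)
        λ = exp(α + α_society[society[i]] + βp * log_pop[i])
        total_tools[i] ~ Poisson(λ)
    end
end

Sampling the standardized offsets z and scaling them by σ_society changes the posterior geometry without changing the model, which is the kind of effect possibility 4 is about.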

itsdfish avatar itsdfish commented on May 24, 2024

I also wanted to look at the Gelman-Rubin rhat statistics. I ran the Turing version with an acceptance rate of .95, four chains, 4000 samples each, and the 1000 adaptation samples removed.

Iterations = 1:2000
Thinning interval = 1
Chains = 1,2,3,4
Samples per chain = 2000

                    PSRF           97.5%    
            α   1.0570000×10⁰  1.1500000×10⁰
 α_society[9]   1.0050000×10⁰  1.0120000×10⁰
       lf_num             NaN            NaN
 α_society[8]   1.0020000×10⁰  1.0040000×10⁰
      elapsed   1.0140000×10⁰  1.0420000×10⁰
      epsilon 2.13076927×10¹⁴ 4.3694579×10¹⁴
 α_society[3]   1.0110000×10⁰  1.0320000×10⁰
 α_society[1]   1.0280000×10⁰  1.0800000×10⁰
α_society[10]   1.0230000×10⁰  1.0620000×10⁰
 α_society[5]   1.0050000×10⁰  1.0150000×10⁰
     eval_num   1.1520000×10⁰  1.1540000×10⁰
 α_society[7]   1.0060000×10⁰  1.0170000×10⁰
    σ_society   1.0070000×10⁰  1.0150000×10⁰
 α_society[2]   1.0210000×10⁰  1.0590000×10⁰
 α_society[6]   1.0020000×10⁰  1.0050000×10⁰
           βp   1.0530000×10⁰  1.1380000×10⁰
           lp   1.0040000×10⁰  1.0110000×10⁰
 α_society[4]   1.0070000×10⁰  1.0180000×10⁰
       lf_eps 2.13076927×10¹⁴ 4.3694579×10¹⁴

Convergence looks good for some parameters, while for others, like βp and α, the PSRF values are somewhat large.

itsdfish avatar itsdfish commented on May 24, 2024

By the way, if the speed problem persists after we figure out the ESS issue, I was thinking about opening an issue on Turing. Currently, with four chains, 4000 samples, 1000 burn-in samples, and a target acceptance rate of .95, Turing requires 196 seconds whereas Stan only requires 2.18 seconds, a gap of nearly two orders of magnitude. Supposedly, Julia is capable of being within a factor of 2 of C. It makes me wonder whether there are still optimizations to be made in Turing or whether Julia does not perform well outside of micro-benchmarks.

goedman avatar goedman commented on May 24, 2024

Naively, I have always assumed part of the extra time needed by Turing might be AD-related. In another thread I saw a post by Tamas Papp where he mentioned that for large models he still codes derivatives manually. And there is Richard's remark on the differences between Turing's NUTS and what is in Stan currently.

itsdfish avatar itsdfish commented on May 24, 2024

Thanks for the info. I'll see if I can track down those posts. I was under the impression that Stan also uses AD. Perhaps it's better optimized, or it uses explicit derivative information for typical models.

itsdfish avatar itsdfish commented on May 24, 2024

I came across this pull request for Turing where a model from this package was producing a low ESS. I'm not sure if it's related to the problem described here.

goedman avatar goedman commented on May 24, 2024

Hi Chris, interesting. I think we should bring the low ESS values for m12.6 up on Turing, together with the above link. This is not something to which we (I, at least) can add a lot of value.

itsdfish avatar itsdfish commented on May 24, 2024

Sounds good. I'll go ahead and make a reference to this issue on the Turing issue. Thanks!

itsdfish avatar itsdfish commented on May 24, 2024

Rob, I will close this issue since it's most likely due to the adaptation bug in Turing.

goedman avatar goedman commented on May 24, 2024

Thanks Chris, it was also getting pretty long (but useful for me).
