Code Monkey home page Code Monkey logo

mgcviz's People

Contributors

mfasiolo avatar ndevln avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar

mgcviz's Issues

l_densCheck should use scale_fill_gradient2

Arguably, dFun is on a divergent scale with a natural midpoint at 0, so it should use a divergent color scale with fixed midpoint = 0 so that a divergence of 0 is visually identical across models, the border beween under- and over-represented residual values is more vivid and the colors corresponding to "no relevant divergence between observed and expected density" cannot change depending on the extremes of the observed distances.

Thx for the great package. Just playing around with settings and options and adding my impressions :)

Confidence bands for 2D contours

Is it possible to produce a 2D contour map of a 3D surface with additional contours representing the error in estimation? For an example in base R, see below, which uses the predict.gam() function in package mgcv...

image

Improve recursion efficiency in check1D and check2D

When check1D is called with a list as argument (e.g. check1D(o, list("x1", "x2", "x3"))) it is then internally called recursively (only one level of recursion) and a plotGam object is returned containing a list 3 ggplots.

However, the residuals are recomputed 3 times (in this case), hence it would be be better to compute them once, and then call the rest of the code. Same for check2D, where this could be particularly important if we are checking all pairs of variables.

Problem with running vignette

Hi

library(mgcViz)
n  <- 1e3
x1 <- rnorm(n)
x2 <- rnorm(n)
dat <- data.frame("x1" = x1, "x2" = x2,
                  "y" = sin(x1) + 0.5 * x2^2 + pmax(x2, 0.2) * rnorm(n))
b <- gam(y ~ s(x1, x2), data = dat, method = "REML")
b <- getViz(b)
plot(sm(b, 1)) + l_fitRaster() + l_fitContour() + l_points()

The plot returns an error :

Error in UseMethod("rescale") : 
  pas de méthode pour 'rescale' applicable pour un objet de classe "AsIs"

Here is my session Info (I suspect a package version problem):

> sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] fr_FR.UTF-8/fr_FR.UTF-8/fr_FR.UTF-8/C/fr_FR.UTF-8/fr_FR.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] mgcViz_0.0.1  rgl_0.98.1    ggplot2_2.2.1 mgcv_1.8-22   nlme_3.1-131 

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.13       compiler_3.4.1     RColorBrewer_1.1-2 plyr_1.8.4         bindr_0.1         
 [6] viridis_0.4.0      tools_3.4.1        mvnfast_0.2.2      digest_0.6.12      viridisLite_0.2.0 
[11] jsonlite_1.5       tibble_1.3.4       gtable_0.2.0       lattice_0.20-35    pkgconfig_2.0.1   
[16] rlang_0.1.2        Matrix_1.2-10      rstudioapi_0.7     shiny_1.0.5        GGally_1.3.2      
[21] yaml_2.1.14        bindrcpp_0.2       gridExtra_2.3      httr_1.3.1         dplyr_0.7.4       
[26] knitr_1.16         htmlwidgets_0.9    grid_3.4.1         glue_1.1.1         reshape_0.8.7     
[31] data.table_1.10.4  R6_2.2.2           plotly_4.7.1       tidyr_0.7.1        purrr_0.2.2.2     
[36] magrittr_1.5       matrixStats_0.52.2 scales_0.5.0       htmltools_0.3.6    assertthat_0.2.0  
[41] mime_0.5           xtable_1.8-2       colorspace_1.3-2   httpuv_1.3.5       KernSmooth_2.23-15
[46] miniUI_0.1.1       lazyeval_0.2.0     munsell_0.4.3     
``

Re-factor check.mgcv.smooth.2D and check.mgcv.smooth.MD

The idea seems to be that the check.something methods are just wrappers for more specific plots. See also this issue. Hence here it would make sense to extract a gridCheck2D function from check.mgcv.smooth.2D.

Currently the problem with check.mgcv.smooth.1D and 2D is that they really check what's happening along a 1 or 2 covariates, the do not really care about the specific smooth considered.

Polishing qq.gam

Outstanding issues:

  1. When method=="simul2", do we really need rank()? Can we just use 1:n? Especially given that
    rank() will miss unobserved levels of discrete variables.

  2. When method=="simul1" we use .quBySort() to get the quantiles by sorting. Check how the
    quantiles are being estimated from the order statistics. Same when we calculate the quantiles
    using rowOrderStats().

  3. Provide gamlss and extended families with CDF function, otherwise we won't be able to use
    method=="tnorm" or "tunif" with them! Do this by extending the fix.family.cdf() function.

l_densCheck's default dFun should use the model's scale not the residual's standard deviation

Not sure how often this comes up if you don't set the scale manually but it seems dangerous to me:

library(mgcv)
library(mgcViz)
n <- 1e3
x1 <- rnorm(n)
x2 <- rnorm(n)
dat <- data.frame("x1" = x1, 
                  "x2" = x2, 
                  "y" = rnorm(n, sin(x1) + x2^2, sd = 1))
# fit with much too small scale:
b_scaled <- gam(y ~ s(x1) + s(x2), data=dat, scale = .1)
qq.gam(b_scaled)

check1D(getViz(b_scaled), x = "x1") + l_densCheck() + scale_fill_gradient2()
#> Scale for 'fill' is already present. Adding another scale for 'fill',
#> which will replace the existing scale.

# looks ok (unstructured wiggles), but we *know* the model variance is much too small:

# Comparing with Gaussian density with model's scale, *not* empirical sd of residuals:
fasiolo_dist <- function(.ed, .gr, .y) {
  td <- dnorm(.gr, 0, sd = .1)
  sign(.ed - td) * (abs(sqrt(.ed) - sqrt(td)))^1/3
}

check1D(getViz(b_scaled), x = "x1") + l_densCheck(dFun = fasiolo_dist) + 
  scale_fill_gradient2()
#> Scale for 'fill' is already present. Adding another scale for 'fill',
#> which will replace the existing scale.

# looks as it should: shows underdispersion of residuals

Created on 2018-09-11 by the reprex package (v0.2.0).

plot.mgcv.smooth.1d conditional density artefacts

Look at the left of the plot. Strangely this does not seem to happen for different seeds.

set.seed(414)
library(mgcViz)
n  <- 1e6
x1 <- rnorm(n)
x2 <- rnorm(n)
dat <- data.frame("x1" = x1, "x2" = x2,
                  "y" = sin(x1) + 0.5 * x2^2 + pmax(x2, 0.2) * rnorm(n))
b <- bam(y ~ s(x1)+s(x2), data = dat, method = "fREML", discrete = TRUE)
v <- getViz(b)
plot(v(2), resDen = "cond", alpha.rug = 0.05, maxpo = 1e4)

Re-write bin1D() in Rcpp

The double loop would definitely go much faster in Rcpp, and this should not be too hard to do.

Getting rid of the `scheme` option

This scheme option is just a pain at the moment, I would propose to get rid of it entirely.
In particular, I think we need to remove it from

  • plot.mgcv.smooth.2D
  • plot.mgcv.smooth.MD
  • plot.sos.smooth

and maybe more.

In plot.mgcv.smooth.2D the persp or scheme == 2 option could become a separate function,
while the scheme = 0 with se = TRUE can be achieved by not calling geom_raster and by adding
the contour lines for the confidence intervals. We can add options to do that. scheme==3 is like
what we already have (scheme=2) but in black and white. We can modify the colour scale afterwards,
so do not need for this scheme.

Scalable version of check.gam

Currently the two scatter plots produced by check.gam do not handle large data sets. It would be useful to implement them via hexagonal binning, at least when the number of observations is, say, > 5e4.

Add posterior checking tools

Add a postCheck function, that simulated responses from the posterior and summarizes them using a scalar valued function. The full data can be summarized as well, and possibly bootstrapped. Then we can add some layers such as histograms or density plots.

Add animations for visualization of effect uncertainty

Section 4 of the paper "Graphics for uncertainty" from Bowman explain who to visualise the uncertainty of the fitted effects by simulating curves from the posterior, interpolating and then plotting them using animations.

This could clearly be implemented in mgcViz, probably using gganimate.

Add function to get smooth effect plots on response scale

At the moment we need to use the trans function to plot on response scale. This can be quite laborious. Maybe it is possible to do something automatically, possibly using the derivatives of the inverse link to transform the standard errors as well. Actually it might be possible to do this for general transformations, with transformation to response scale being a special case.

Example:

#######
# Simulate some data
library(mgcViz)
n  <- 2e3
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)
dat <- data.frame("x1" = x1, "x2" = x2, "x3" = x3,
                  "y" = abs( 0.1 + sin(x1) + 0.5 * x2^2 + 0.2*x3 + 0.2*rnorm(n) ))
b <- gamV(y ~ s(x1, x2) + s(x3), data = dat, family = Gamma(link=log))

# Plot s(x1, x2) on link scale
plot(sm(b, 1))

# Plot s(x1, x2) on response scale
plot(sm(b, 1), 
     trans = function(x){ 
       b$family$linkinv(coef(b)[1] + x)
       })

# We added the intercept and inverted the link function (if you use the identity link this is not necessary).
# Also, above s(x3) is left to zero, so we are conditioning on x3 being approximately 0 as can be seen from plot(sm(b, 2))

# To condition on a different value of x3 we must use predict.
# Here s3 is the value of s(x3) at x3 = 2 (the values of x1 and x2 do not matter here)
s3 <- predict(b, newdata = data.frame(x1 = 0, x2 = 0, x3 = 2), type = "terms")[2]

# Plotting on response scale
plot(sm(b, 1), 
     trans = function(x){
       b$family$linkinv(coef(b)[1] + s3 + x)
       })

Add wormplot option to qq.gam

To get a wormplot, it should be sufficient to subtract Dq (i.e. the x axis) from D, conf and df in .plot.qq.gam().

Allow bare variable names in l_XXX calls

Thx for this great package!

I think it would be even more awesome if I didn't have to "quote" all my "variables".
check1D(v, x = x3) 👍 , check1D(v, x = "x3") 👎

Add simulation-based approach to cdCheck

cdCheck allows to compare the residuals distribution only with uniform or Gaussian densities. We can cover model residuals types (e.g. Pearson) by simulating residuals from the model and use them to estimate the model-based residual density.

Notice that this might result in fairly noisy density estimates in the tails of the x distributions. It would be nice to be able to pass weights to simulate.gam, so that we could simulate more responses where we have little data. This could be achieved by re-sampling the elements of mu, wt and scale (the arguments of family$rd).

Create shiny version of qq.gam then enables zooming

It would be quite easy to create an app where by drawing a rectangle on the plot (basic shiny feature) we get the corresponding xlim and ylim, which then get passed to zoo.qqGam(), which does the binning and the plotting.

add horizontal line

Hi, is there a way to add a horizontal line to each graph in print(plot(b, allTerms = T), pages = 1) ?
I would be interested to display a horizontal line at 0 in each of the graphs.
Thanks!

Add methods for plotting mqgam objects

We want methods for plotting multiple effects (one per quantile) on the same plot.
Easy to do for 1D smooths, much harder to see how to do it for multivariate effects.

Same scale on "by factor" smooths

1D and 2D by factor smooths are currently plotted on an individual basis each with their own scale. This is particularly strange for 2D smooth. Maybe there is room for modifying plot.gamViz so that it plots them on a grid, or at least using the same y-scale (for 1D plots) or colour-scale (for 2D rasters). A short-term solution for the 2D case is manual intervention:

# Simulate some data
library(mgcViz)
library(viridis)
set.seed(3)
dat <- gamSim(1,n=2000,dist="normal",scale=20)
dat$fac <- as.factor( sample(c("A1", "A2", "A3"), nrow(dat), replace = TRUE) )
dat$logi <- as.logical( sample(c(TRUE, FALSE), nrow(dat), replace = TRUE) )
bs <- "cr"; k <- 12
b <- gamV(y~fac + s(x0, x1, by = fac), data=dat)

# Different scales
print(plot(b), pages = 1)

# Same scale: need to add raster and contour manually to avoid having them added
# automatically by print.plotSmooth()
print(plot(b) + l_fitRaster() + l_fitContour() +
      scale_fill_gradientn(colours = viridis(50, begin = 0.2), 
                          limits = c(-6,7), name = "s(x)"), pages = 1) 

Dealing with factor variables

At the moment layers such as l_gridCheck1D do no work with factor variables. These functions need to be modified to cover factor variables.

Create a check.mgcv.smooth.1D method

We need to create something similar to check.gam. That is a function that produces an set of plots, probably based on gridCheck, cdCheck, ACF or qq.gam and probably including also a plot from plot.mgcv.smooth.1D.

Hence check.mgcv.smooth.1D would essentially be just a wrapper. Given that gridCheck and qq.gam simulate data from the model, there is potential here for simulating the data only once, and passing it to both methods.

Consistent naming for classes

Functions such as check.gam() output objects of class "check.gam". I might be better renaming these
to something like "checkGam" to avoid confusion/conflicts.

Faster sorting

In order to speed up qq.gam, we should keep on the lookout for faster sorting functions.
A promising one is data.table::fsort() which unfortunately works only for positive vectors
at the moment.

Simultaneous confidence intervals

We could add an option to l_ciLines, l_ciContour etc that allows to get simultaneous credible intervals using the approach described in section 6.10.2 of Wood's book (2nd ed).

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.