mfasiolo / mgcviz Goto Github PK

View Code? Open in Web Editor NEW

75.0 75.0 9.0 38.75 MB

An R package for interactive visualization of GAM models

Home Page: https://mfasiolo.github.io/mgcViz/

R 100.00%

mgcviz's People

Contributors

Stargazers

Watchers

Forkers

dill guhjy nemochina2008 valnoacco ndevln huyaodong dallak ningyile artlesshao

mgcviz's Issues

Extend plotRGL to smooths on sphere

We could follow the approach used here:

https://statnmap.com/2017-11-01-spatial-interpolation-on-earth-as-a-3d-sphere/

and we could add contours as well (not sure how hard it is).

l_densCheck should use scale_fill_gradient2

Arguably, dFun is on a divergent scale with a natural midpoint at 0, so it should use a divergent color scale with fixed midpoint = 0 so that a divergence of 0 is visually identical across models, the border beween under- and over-represented residual values is more vivid and the colors corresponding to "no relevant divergence between observed and expected density" cannot change depending on the extremes of the observed distances.

Thx for the great package. Just playing around with settings and options and adding my impressions :)

Confidence bands for 2D contours

Is it possible to produce a 2D contour map of a 3D surface with additional contours representing the error in estimation? For an example in base R, see below, which uses the predict.gam() function in package mgcv...

Improve recursion efficiency in check1D and check2D

When check1D is called with a list as argument (e.g. check1D(o, list("x1", "x2", "x3"))) it is then internally called recursively (only one level of recursion) and a plotGam object is returned containing a list 3 ggplots.

However, the residuals are recomputed 3 times (in this case), hence it would be be better to compute them once, and then call the rest of the code. Same for check2D, where this could be particularly important if we are checking all pairs of variables.

Give gridFun in l_gridCheck1D access to the covariates

If we let the user define a function

gridFun = function(.x, .v1){
 ....
}

then they could, for instance, regress the residuals .x on the variable .v1, to get useful diagnostics.

plot.mgcv.smooth1D dTrans argument for conditional density of residuals

I removed the dTrans argument as it was not called.

Do we want to reintroduce it as an argument for plot.mgcv.smooth1D or should we keep only the default (cubic root) ?

Problem with running vignette

library(mgcViz)
n  <- 1e3
x1 <- rnorm(n)
x2 <- rnorm(n)
dat <- data.frame("x1" = x1, "x2" = x2,
                  "y" = sin(x1) + 0.5 * x2^2 + pmax(x2, 0.2) * rnorm(n))
b <- gam(y ~ s(x1, x2), data = dat, method = "REML")
b <- getViz(b)
plot(sm(b, 1)) + l_fitRaster() + l_fitContour() + l_points()

The plot returns an error :

Error in UseMethod("rescale") : 
  pas de méthode pour 'rescale' applicable pour un objet de classe "AsIs"

Here is my session Info (I suspect a package version problem):

> sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] fr_FR.UTF-8/fr_FR.UTF-8/fr_FR.UTF-8/C/fr_FR.UTF-8/fr_FR.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] mgcViz_0.0.1  rgl_0.98.1    ggplot2_2.2.1 mgcv_1.8-22   nlme_3.1-131 

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.13       compiler_3.4.1     RColorBrewer_1.1-2 plyr_1.8.4         bindr_0.1         
 [6] viridis_0.4.0      tools_3.4.1        mvnfast_0.2.2      digest_0.6.12      viridisLite_0.2.0 
[11] jsonlite_1.5       tibble_1.3.4       gtable_0.2.0       lattice_0.20-35    pkgconfig_2.0.1   
[16] rlang_0.1.2        Matrix_1.2-10      rstudioapi_0.7     shiny_1.0.5        GGally_1.3.2      
[21] yaml_2.1.14        bindrcpp_0.2       gridExtra_2.3      httr_1.3.1         dplyr_0.7.4       
[26] knitr_1.16         htmlwidgets_0.9    grid_3.4.1         glue_1.1.1         reshape_0.8.7     
[31] data.table_1.10.4  R6_2.2.2           plotly_4.7.1       tidyr_0.7.1        purrr_0.2.2.2     
[36] magrittr_1.5       matrixStats_0.52.2 scales_0.5.0       htmltools_0.3.6    assertthat_0.2.0  
[41] mime_0.5           xtable_1.8-2       colorspace_1.3-2   httpuv_1.3.5       KernSmooth_2.23-15
[46] miniUI_0.1.1       lazyeval_0.2.0     munsell_0.4.3     
``

Re-factor check.mgcv.smooth.2D and check.mgcv.smooth.MD

The idea seems to be that the check.something methods are just wrappers for more specific plots. See also this issue. Hence here it would make sense to extract a gridCheck2D function from check.mgcv.smooth.2D.

Currently the problem with check.mgcv.smooth.1D and 2D is that they really check what's happening along a 1 or 2 covariates, the do not really care about the specific smooth considered.

Add layer for plotting derivatives of fitted smooth

Polishing qq.gam

Outstanding issues:

When method=="simul2", do we really need rank()? Can we just use 1:n? Especially given that
rank() will miss unobserved levels of discrete variables.
When method=="simul1" we use .quBySort() to get the quantiles by sorting. Check how the
quantiles are being estimated from the order statistics. Same when we calculate the quantiles
using rowOrderStats().
Provide gamlss and extended families with CDF function, otherwise we won't be able to use
method=="tnorm" or "tunif" with them! Do this by extending the fix.family.cdf() function.

Enable support for unconditional==TRUE

See .initializeXXX, in particular: stop("Not supported at the moment")

l_densCheck's default dFun should use the model's scale not the residual's standard deviation

Not sure how often this comes up if you don't set the scale manually but it seems dangerous to me:

library(mgcv)
library(mgcViz)
n <- 1e3
x1 <- rnorm(n)
x2 <- rnorm(n)
dat <- data.frame("x1" = x1, 
                  "x2" = x2, 
                  "y" = rnorm(n, sin(x1) + x2^2, sd = 1))
# fit with much too small scale:
b_scaled <- gam(y ~ s(x1) + s(x2), data=dat, scale = .1)
qq.gam(b_scaled)

check1D(getViz(b_scaled), x = "x1") + l_densCheck() + scale_fill_gradient2()
#> Scale for 'fill' is already present. Adding another scale for 'fill',
#> which will replace the existing scale.

# looks ok (unstructured wiggles), but we *know* the model variance is much too small:

# Comparing with Gaussian density with model's scale, *not* empirical sd of residuals:
fasiolo_dist <- function(.ed, .gr, .y) {
  td <- dnorm(.gr, 0, sd = .1)
  sign(.ed - td) * (abs(sqrt(.ed) - sqrt(td)))^1/3
}

check1D(getViz(b_scaled), x = "x1") + l_densCheck(dFun = fasiolo_dist) + 
  scale_fill_gradient2()
#> Scale for 'fill' is already present. Adding another scale for 'fill',
#> which will replace the existing scale.

# looks as it should: shows underdispersion of residuals

Created on 2018-09-11 by the reprex package (v0.2.0).

plot.mgcv.smooth.1d conditional density artefacts

Look at the left of the plot. Strangely this does not seem to happen for different seeds.

set.seed(414)
library(mgcViz)
n  <- 1e6
x1 <- rnorm(n)
x2 <- rnorm(n)
dat <- data.frame("x1" = x1, "x2" = x2,
                  "y" = sin(x1) + 0.5 * x2^2 + pmax(x2, 0.2) * rnorm(n))
b <- bam(y ~ s(x1)+s(x2), data = dat, method = "fREML", discrete = TRUE)
v <- getViz(b)
plot(v(2), resDen = "cond", alpha.rug = 0.05, maxpo = 1e4)

Re-write bin1D() in Rcpp

The double loop would definitely go much faster in Rcpp, and this should not be too hard to do.

Plot only residuals signs in plotRGL when using qgam

Getting rid of the `scheme` option

This scheme option is just a pain at the moment, I would propose to get rid of it entirely.
In particular, I think we need to remove it from

plot.mgcv.smooth.2D
plot.mgcv.smooth.MD
plot.sos.smooth

and maybe more.

In plot.mgcv.smooth.2D the persp or scheme == 2 option could become a separate function,
while the scheme = 0 with se = TRUE can be achieved by not calling geom_raster and by adding
the contour lines for the confidence intervals. We can add options to do that. scheme==3 is like
what we already have (scheme=2) but in black and white. We can modify the colour scale afterwards,
so do not need for this scheme.

Scalable version of check.gam

Currently the two scatter plots produced by check.gam do not handle large data sets. It would be useful to implement them via hexagonal binning, at least when the number of observations is, say, > 5e4.

Add posterior checking tools

Add a postCheck function, that simulated responses from the posterior and summarizes them using a scalar valued function. The full data can be summarized as well, and possibly bootstrapped. Then we can add some layers such as histograms or density plots.

Add animations for visualization of effect uncertainty

Section 4 of the paper "Graphics for uncertainty" from Bowman explain who to visualise the uncertainty of the fitted effects by simulating curves from the posterior, interpolating and then plotting them using animations.

This could clearly be implemented in mgcViz, probably using gganimate.

Avoid automatic rescaling of alpha scale in plot.mgcv.smooth.2D()

See here for a possible solution:

https://stackoverflow.com/questions/22408820/telling-ggplot-not-to-scale-alpha

Create animated version of plotSlice

Instead of plotting a panel of slices, we could create an animation. Probably using gganimate.

Add function to get smooth effect plots on response scale

At the moment we need to use the trans function to plot on response scale. This can be quite laborious. Maybe it is possible to do something automatically, possibly using the derivatives of the inverse link to transform the standard errors as well. Actually it might be possible to do this for general transformations, with transformation to response scale being a special case.

Example:

#######
# Simulate some data
library(mgcViz)
n  <- 2e3
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)
dat <- data.frame("x1" = x1, "x2" = x2, "x3" = x3,
                  "y" = abs( 0.1 + sin(x1) + 0.5 * x2^2 + 0.2*x3 + 0.2*rnorm(n) ))
b <- gamV(y ~ s(x1, x2) + s(x3), data = dat, family = Gamma(link=log))

# Plot s(x1, x2) on link scale
plot(sm(b, 1))

# Plot s(x1, x2) on response scale
plot(sm(b, 1), 
     trans = function(x){ 
       b$family$linkinv(coef(b)[1] + x)
       })

# We added the intercept and inverted the link function (if you use the identity link this is not necessary).
# Also, above s(x3) is left to zero, so we are conditioning on x3 being approximately 0 as can be seen from plot(sm(b, 2))

# To condition on a different value of x3 we must use predict.
# Here s3 is the value of s(x3) at x3 = 2 (the values of x1 and x2 do not matter here)
s3 <- predict(b, newdata = data.frame(x1 = 0, x2 = 0, x3 = 2), type = "terms")[2]

# Plotting on response scale
plot(sm(b, 1), 
     trans = function(x){
       b$family$linkinv(coef(b)[1] + s3 + x)
       })

Make plot.mgcv.smooth.2D() work when scheme == 0 or 1

Add a l_pvRaster() layers that plots the significance of the fit across 2D

Same can be done for 1D smooths (in some way).

Add wormplot option to qq.gam

To get a wormplot, it should be sufficient to subtract Dq (i.e. the x axis) from D, conf and df in .plot.qq.gam().

Allow bare variable names in l_XXX calls

Thx for this great package!

I think it would be even more awesome if I didn't have to "quote" all my "variables".
check1D(v, x = x3) 👍 , check1D(v, x = "x3") 👎

Add simulation-based approach to cdCheck

cdCheck allows to compare the residuals distribution only with uniform or Gaussian densities. We can cover model residuals types (e.g. Pearson) by simulating residuals from the model and use them to estimate the model-based residual density.

Notice that this might result in fairly noisy density estimates in the tails of the x distributions. It would be nice to be able to pass weights to simulate.gam, so that we could simulate more responses where we have little data. This could be achieved by re-sampling the elements of mu, wt and scale (the arguments of family$rd).

Passing ... to ggplot functions

We need to work out how to do this without generating too many warnings.

Create shiny version of qq.gam then enables zooming

It would be quite easy to create an app where by drawing a rectangle on the plot (basic shiny feature) we get the corresponding xlim and ylim, which then get passed to zoo.qqGam(), which does the binning and the plotting.

Strange residuals in plotRGL when using linear functional smooth

Residuals seems to be in a corner of the covariate space, maybe the same problem occurs if using l_points with standard 2d smooth plots.

Transform .initializeXXX into a proper function

It seems that using the eval() trick is unnecessary, in that using a proper function would not cause any
performance handicap (no large object will be copied).

add horizontal line

Hi, is there a way to add a horizontal line to each graph in print(plot(b, allTerms = T), pages = 1) ?
I would be interested to display a horizontal line at 0 in each of the graphs.
Thanks!

Add methods for plotting mqgam objects

We want methods for plotting multiple effects (one per quantile) on the same plot.
Easy to do for 1D smooths, much harder to see how to do it for multivariate effects.

Add level numbering in contour plots

Needed in plot.mgcv.smooth.2D, plot.sos.smooth, check.mgcv.smooth.2D and maybe others.

Same scale on "by factor" smooths

1D and 2D by factor smooths are currently plotted on an individual basis each with their own scale. This is particularly strange for 2D smooth. Maybe there is room for modifying plot.gamViz so that it plots them on a grid, or at least using the same y-scale (for 1D plots) or colour-scale (for 2D rasters). A short-term solution for the 2D case is manual intervention:

# Simulate some data
library(mgcViz)
library(viridis)
set.seed(3)
dat <- gamSim(1,n=2000,dist="normal",scale=20)
dat$fac <- as.factor( sample(c("A1", "A2", "A3"), nrow(dat), replace = TRUE) )
dat$logi <- as.logical( sample(c(TRUE, FALSE), nrow(dat), replace = TRUE) )
bs <- "cr"; k <- 12
b <- gamV(y~fac + s(x0, x1, by = fac), data=dat)

# Different scales
print(plot(b), pages = 1)

# Same scale: need to add raster and contour manually to avoid having them added
# automatically by print.plotSmooth()
print(plot(b) + l_fitRaster() + l_fitContour() +
      scale_fill_gradientn(colours = viridis(50, begin = 0.2), 
                          limits = c(-6,7), name = "s(x)"), pages = 1)