"Problem" (quoted from initial README entry ~February 2023)
Currently the stat_* functions extract metadata after the fortify() method is called by accessing a “last SoilProfileCollection” cached in a special environment, ggspc.env, exported by the package.
It was previously recognized that this can be “fixed” with an explicit reference to the dataset in the stat or geom calls (e.g. ggplot() + stat_depth_weighted(loafercreek)).
Demonstration of Problem
This causes problems when data are defined in the ggplot() call, such as the following unexpected behavior.
library(ggspc)
library(ggplot2)
data(loafercreek, package = "soilDB")
data(gopheridge, package = "soilDB")
aqp::nrow(loafercreek)
x <- ggplot(loafercreek, aes(clay, hillslopeprof))
y <- ggplot(gopheridge, aes(clay, hillslopeprof))
x + geom_boxplot() # works
## Warning: Removed 167 rows containing non-finite values (`stat_boxplot()`).
y + geom_boxplot() # works
## Warning: Removed 80 rows containing non-finite values (`stat_boxplot()`).
x + stat_depth_weighted() # error
## Error in `stat_depth_weighted()`:
## ! Problem while computing stat.
## ℹ Error occurred in the 1st layer.
## Caused by error in `$<-.data.frame`:
## ! replacement has 317 rows, data has 626
x + stat_depth_weighted(loafercreek) # works
y + stat_depth_weighted() # error
## Error in `stat_depth_weighted()`:
## ! Problem while computing stat.
## ℹ Error occurred in the 1st layer.
## Caused by error in `$<-.data.frame`:
## ! replacement has 626 rows, data has 317
y + stat_depth_weighted(gopheridge) # works
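The row-count mismatches above are consistent with each stat reading whichever collection was cached most recently, rather than the one its plot was built from. Below is a minimal base R sketch of that failure mode, using plain lists and data frames in place of SoilProfileCollection objects. The names `cache_env`, `fake_fortify()` and `fake_stat()` are hypothetical stand-ins for illustration, not ggspc internals.

```r
# Hypothetical stand-in for a shared package-level cache such as ggspc.env
cache_env <- new.env()

# Stand-in for fortify(): returns the plot data and, as a side effect,
# remembers the object it was last called on
fake_fortify <- function(obj) {
  assign("last", obj, envir = cache_env)
  obj$data
}

# Stand-in for a stat: pulls metadata from the cache, not from its input
fake_stat <- function(data) {
  meta <- get("last", envir = cache_env)$meta
  data$meta <- meta  # errors when the cached object has an incompatible row count
  data
}

# Two toy "collections" with different numbers of rows
a <- list(data = data.frame(clay = runif(6)), meta = seq_len(6))
b <- list(data = data.frame(clay = runif(4)), meta = seq_len(4))

x <- fake_fortify(a)  # cache now holds `a`
y <- fake_fortify(b)  # cache now holds `b`, silently replacing `a`

res_y <- try(fake_stat(y), silent = TRUE)  # succeeds: cache matches `y`
res_x <- try(fake_stat(x), silent = TRUE)  # fails: cache holds `b`, not `a`
```

In this toy version, whichever object was fortified last "wins", which mirrors how the explicit stat_depth_weighted(loafercreek) call appears to reset the cache before the later y + stat_depth_weighted() failure.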
Path forward
Looking around at other packages that hack ggplot2 and/or provide special methods for S4 classes, most give examples in the ggplot() + stat_depth_weighted(loafercreek) style: the S4 object is not supplied to the ggplot() function. The general idea that full object information can persist past fortify() is probably not consistent with the design of ggplot2/ggproto objects.
With adequate warning and documentation, the above workaround could be a simple and effective way of doing things. This would require revising the few examples that suggest doing it the other way, and warning users who will almost certainly try to pass their SoilProfileCollection directly to ggplot2::ggplot(). Rather than overwriting a last SPC object in a shared package environment, I have tried returning the SPC as an attribute of the ggplot() result, but I don’t know whether it is possible to expose that attribute to the stat_* functions or ggproto classes by any standard method.
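The warning mentioned above could be as simple as a class check on the data argument. The sketch below assumes a hypothetical helper, `check_plot_data()`, and uses an S3-classed list as a stand-in for a real SoilProfileCollection. Where such a check would actually live in the package is left open.

```r
# Hypothetical helper: warn when a SoilProfileCollection is passed as plot data
check_plot_data <- function(data) {
  if (inherits(data, "SoilProfileCollection")) {
    warning(
      "Passing a SoilProfileCollection to ggplot() is not recommended; ",
      "supply it to the stat_* call instead.",
      call. = FALSE
    )
  }
  invisible(data)
}

# S3 stand-in for a real S4 SoilProfileCollection (illustration only)
fake_spc <- structure(list(), class = "SoilProfileCollection")

# Capture the warning message rather than letting it print
msg <- tryCatch(
  check_plot_data(fake_spc),
  warning = function(w) conditionMessage(w)
)

# An ordinary data frame passes through without a warning
ok <- tryCatch({
  check_plot_data(data.frame())
  TRUE
}, warning = function(w) FALSE)
```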
An alternate approach might also include providing a custom ggplot()-like method, or an object that can store a custom dataset-specific environment (rather than a package environment).
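One way to picture the dataset-specific environment idea is a constructor that gives each plot object its own private environment, so later steps look up metadata there instead of in a shared cache. Here is a base R sketch under that assumption. `new_spc_plot()` and `add_stat()` are hypothetical names, and the sketch does not address how (or whether) such an environment could be reached from ggproto methods.

```r
# Hypothetical constructor: each plot carries its own environment
new_spc_plot <- function(obj) {
  env <- new.env()
  assign("spc", obj, envir = env)
  structure(list(data = obj$data, env = env), class = "spc_plot")
}

# Hypothetical stat step: reads metadata from the plot's own environment
add_stat <- function(plot) {
  meta <- get("spc", envir = plot$env)$meta
  out <- plot$data
  out$meta <- meta
  out
}

# Two toy "collections" with different numbers of rows
a <- list(data = data.frame(clay = runif(6)), meta = seq_len(6))
b <- list(data = data.frame(clay = runif(4)), meta = seq_len(4))

x <- new_spc_plot(a)
y <- new_spc_plot(b)  # does not disturb the environment attached to `x`

res_x <- add_stat(x)  # uses metadata from `a`
res_y <- add_stat(y)  # uses metadata from `b`
```

Because nothing is shared between `x` and `y`, the order in which the plots are created or used no longer matters.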
Perhaps there is still a better, more canonical ggplot2 way to implement this that I have not yet figured out.