hannameyer / CAST
Developer Version of the R package CAST: Caret Applications for Spatio-Temporal models
Home Page: https://hannameyer.github.io/CAST/
Thank you for the great package!
I am attempting to run aoa without specifying a model (just newdata and weights). However, I'm getting the same results whether I supply a table of weights or not. Reading through the code, it seems as though if weight cannot be extracted from a trained model early on, it gets assigned as an object of class error, over-writing any table of weights initially supplied to the function. Do I have this right?
First of all, thank you for the excellent package and companion articles.
While looking over the aoa code, it occurred to me that some of the complexity associated with handling categorical variables could be simplified by switching to a different distance metric. Gower's generalized distance metric is ideal because it can integrate mixtures of ratio, nominal, and ordinal data types. Also, the metric automatically includes scaling / centering. There are a couple of implementations:
It would appear that the knnx.dist function does all of the heavy lifting in aoa.
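To make the idea concrete, here is a minimal base-R sketch of a Gower-style distance between one query row and a set of training rows. The function name gower_sketch and the toy data are made up for illustration. It scales numeric differences by the range of the training data and treats every factor as nominal, so it is a simplification and not a replacement for gower::gower_dist() or cluster::daisy().

```r
# Gower-style distance between a single query row and every row of `data`.
# Assumption: numeric columns are scaled by the range observed in `data`;
# all non-numeric columns are treated as nominal (0 = match, 1 = mismatch).
gower_sketch <- function(query, data) {
  stopifnot(identical(names(query), names(data)), nrow(query) == 1)
  per_var <- matrix(
    vapply(names(data), function(v) {
      if (is.numeric(data[[v]])) {
        rng <- diff(range(data[[v]]))
        if (rng == 0) rep(0, nrow(data)) else abs(data[[v]] - query[[v]]) / rng
      } else {
        as.numeric(as.character(data[[v]]) != as.character(query[[v]]))
      }
    }, numeric(nrow(data))),
    nrow = nrow(data)
  )
  # Unweighted mean of the per-variable dissimilarities
  rowMeans(per_var)
}

# Hypothetical training data with one numeric and one nominal predictor
train <- data.frame(
  elev = c(100, 200, 300, 500),
  soil = factor(c("clay", "sand", "clay", "loam"))
)
d <- gower_sketch(train[1, ], train)
print(d)
```

Because every variable contributes a value between 0 and 1, no separate scaling or centering step is needed before the distances are averaged.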
A quick benchmark of a couple candidate methods.
library(gower)
library(cluster)
library(FNN)
library(microbenchmark)
set.seed(10101)
n <- 1000
a <- rnorm(n = n, mean = 0, sd = 2)
x <- rnorm(n = n, mean = 0, sd = 2)
y <- rnorm(n = n, mean = 0, sd = 2)
z <- data.frame(x, y, a)
microbenchmark(
gower = gower_dist(z[1:10, ], z),
knn = knnx.dist(data = z, query = z[1:10, ], k = 1),
daisy = daisy(z, metric = 'gower')
)
The interface and resulting objects aren't directly compatible, but it does seem like gower::gower_dist() is a reasonable candidate in terms of speed. The main reason to consider cluster::daisy is that it can accommodate all variable types, while gower::gower_dist() does not yet differentiate between nominal / ordinal factors.
Unit: microseconds
expr min lq mean median uq max neval cld
gower 395.7 444.70 523.737 497.35 559.0 874.3 100 a
knn 772.6 794.05 892.615 842.70 925.2 1382.7 100 a
daisy 56398.0 73496.70 100253.478 78571.80 88727.8 276262.1 100 b
Profiling data for aoa run in a single thread:
This was performed with a model based on 1,030 observations as applied to a raster stack
dimensions : 3628, 2351, 8529428, 18 (nrow, ncol, ncell, nlayers)
I'll follow up with a small example dataset that contains nominal and ordinal variables.
Hi there,
More of an enhancement suggestion but also a question. Any advice on parallelizing ffs()? I'm using ranger to create species distribution models for many plant species and have ~70 covariates, resulting in ffs reporting over 4000 individual models being trained. I have ~20 cores at my disposal, and I think I could see major speed improvements with a multicore implementation similar to aoa.
Thanks,
Rob
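For context on where the "over 4000 models" figure comes from, here is a small sketch of the worst-case model count. It assumes ffs first trains every pair of predictors and then adds one remaining variable per round (the procedure described for Algorithm 1 in Meyer et al., 2018). The helper name max_ffs_models is hypothetical and not part of CAST.

```r
# Upper bound on the number of models a forward feature selection trains,
# assuming: all combinations of `min_var` predictors are tried first, then
# each round tries every remaining predictor and keeps exactly one of them.
max_ffs_models <- function(n_predictors, min_var = 2) {
  stopifnot(n_predictors >= min_var)
  initial <- choose(n_predictors, min_var)   # exhaustive first stage
  remaining <- n_predictors - min_var        # predictors left after stage one
  initial + remaining * (remaining + 1) / 2  # remaining + (remaining - 1) + ... + 1
}

n70 <- max_ffs_models(70)
n12 <- max_ffs_models(12)
print(c(n70 = n70, n12 = n12))
```

Under these assumptions 70 covariates give at most 4761 models, which matches the order of magnitude reported above. The count is an upper bound because the selection stops early once no additional variable improves the error.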
Hi,
I have been using the aoa() function with parallelization as follows:
´´´
cl <- makePSOCKcluster(detectCores()-2)
registerDoParallel(cl)
AOA <- aoa(df, model_sp, cl)
stopCluster(cl)
´´´
When I check the task manager processes and performance tabs, it doesn't seem like the computer is using its cores. CPU usage is quite low (below 20%), and it takes quite some time to finish.
(dataset: 70.000 lines, 17 columns, 10folds 5reps cv)
Any idea on what might be happening? :)
In addition, I tried using the calibrate_aoa() function with multiCV=T, it took days and ended up throwing an error in the end. I'll run it again when I have some time and post the message.
Some sets of training distances produce thresholds that are larger than the maximum training distance.
I expected based on Meyer & Pebesma 2021 that the threshold would lie within the range of training distances:
"The outlier-removed maximum DI of the training data is the one used as threshold for the AOA (boxplot in Figure 2b) where outliers are defined as values greater than the upper whisker (i.e. larger than the 75-percentile plus 1.5 times the IQR of the DI values of the cross-validated training data)."
Issue #46 and commit c92a3f2 changed how this threshold is calculated.
Here are two examples where the updated calculation produces a threshold value that is larger than the maximum training distance:
check_threshold <- function(di) {
cat("max(di) = ")
cat(
max(di),
fill = TRUE
)
# threshold calculated by `trainDI()` with code from `CAST` <= `v0.7.0`
cat("threshold <= v0.7.0: ")
cat(
max(di[!(di > (stats::quantile(di, 0.75) + 1.5 * stats::IQR(di)))])
)
cat(" ")
cat(
grDevices::boxplot.stats(di)$stats[5],
fill = TRUE
)
# threshold calculated by `trainDI()` with code from `CAST` >= `v0.7.1`
# * issue github.com/HannaMeyer/CAST/issues/46
# * commit https://github.com/HannaMeyer/CAST/commit/c92a3f2923545268db86e5bc1da6d8966d797d94
cat("threshold >= v0.7.1: ")
cat(
stats::quantile(di, 0.75, na.rm = TRUE) + 1.5 * stats::IQR(di, na.rm = TRUE),
fill = TRUE
)
}
set.seed(17)
check_threshold(rpois(100, 2))
#> max(di) = 5
#> threshold <= v0.7.0: 5 5
#> threshold >= v0.7.1: 6
set.seed(17)
check_threshold(rnorm(100))
#> max(di) = 2.442327
#> threshold <= v0.7.0: 2.442327 2.442327
#> threshold >= v0.7.1: 2.560418
Created on 2023-06-30 with reprex v2.0.2
Meyer, H., & Pebesma, E. (2021). Predicting into unknown space? Estimating the area of applicability of spatial prediction models. Methods in Ecology and Evolution, 12(9), 1620–1633. https://doi.org/10.1111/2041-210X.13650
Hello @HannaMeyer,
in a current use-case I came across the issue that terra's SpatRaster objects are currently not supported by CAST.
Of course, users can coerce their SpatRaster objects to RasterStack before using CAST functionality. However, in a recent PR #15 I included that functionality within the aoa and calibrate_aoa functions so that SpatRasters now won't throw an error and the output is also a SpatRaster object. It's a rather "dirty" solution, but I was wondering if a transition to terra is something you are interested in for the package? And if so, should Raster objects still be supported, since existing code might rely on the raster package? In that regard, if a transition to terra is considered valuable for CAST, I guess it would make sense to "flip" the dependencies so that the general code base works on SpatRasters and Raster objects would be coerced beforehand?
Let me know if that is something you are interested in and I could put in some work on the mentioned PR.
When I try to use a recipe with ffs it doesn't recognize the "response" from the recipe object. Is there a workaround for this? Thanks.
rec_test <-
recipe(stream ~ ., data = testing) %>%
update_role(sb12, new_role = "performance var") %>%
step_center(all_predictors()) %>%
step_scale(all_predictors()) %>%
step_pca(contains("_30agg"), prefix = "pca_B", threshold = 0.9)
sb12 <- CreateSpacetimeFolds(train,spacevar = "sb12",k = 10, seed = 123 )
ctrltest <- trainControl(method="repeatedcv",
repeats = 5,
allowParallel = TRUE,
returnResamp = "all",
verbose = FALSE,
index = sb12$index)
set.seed(1234)
ffstest <- ffs(rec_test,
data = testing,
metric = "Kappa",
method = "rf",
trControl = ctrltest)
Error in ffs(rec_test, data = testing, metric = "Kappa", method = "rf", :
argument "response" is missing, with no default
Hi!
Thank you for creating this package; it has been a pleasure using it so far.
I am using the knndm function to see if I can better manage spatial autocorrelation in my RF model, but I am having some issues at the global_validation step.
I get an error:
"Error in global_validation(model4) : Global performance could not be estimated because predictions were not saved. Train model with savePredictions='final'."
I figured this had to do with how the data is fed into the knndm function; since this is a classification model, I don't have any reference. The parameters used in the 'trainControl' function should be okay (including the line savePredictions='final'), but I'm confused.
Any assistance would be greatly appreciated; I'm still a beginner!
Thank you
Emme
knndm_folds4 <- knndm(pts4, modeldomain=studyArea, k = 5)
knndm_folds4
plot(knndm_folds4)
ctrl4 <- trainControl(method="cv",
index=knndm_folds4$indx_train,
savePredictions='final')
model4 <- train(abpres~., data = Train4[,-c(2:3)], method="rf", trcontrol = ctrl4)
model4
lengths(model4$pred)
global_validation(model4)
global_validation(model4)
Error in global_validation(model4) :
Global performance could not be estimated because predictions were not saved.
Train model with savePredictions='final'
(Coordinate data is projected)
head(pts4)
abpres BtmSalinity_fall_max BtmStress_fall_max BtmStress_fall_min BtmTemp_fall_max BtmTemp_fall_min
1 abs 35.04016 0.084448554 0.0318813547 5.066860 4.3416319
3 pres 31.07378 0.003851054 0.0003281421 11.879930 4.8315330
5 pres 32.51310 0.005012250 0.0002675029 5.575860 0.2846721
6 abs 34.97558 0.047943920 0.0008219921 5.541814 3.9962437
7 pres 34.75494 0.058296647 0.0227264166 4.417707 2.7072399
8 pres 33.39942 0.019661102 0.0067749284 2.136045 -0.4165318
MLD_fall_max MLD_fall_min SurfaceTemp_fall_max geometry
1 195.44728 12.422209 9.134846 POINT (926113.1 1210112)
3 39.75787 8.601914 18.005045 POINT (-144911.3 -436612)
5 33.26155 4.898076 13.206921 POINT (491246.7 -249519.5)
6 29.63660 4.868274 22.026590 POINT (-128115.7 -818321.2)
7 33.98904 5.724236 11.875775 POINT (841782.8 -177986.8)
8 31.39414 7.639641 16.741175 POINT (483544 -490805.2)
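One possible explanation (an assumption, not confirmed in this thread): the train() call above spells the argument trcontrol, whereas caret names it trControl. R argument names are case-sensitive, so a misspelled name is not matched and ends up in the `...` arguments, leaving the default control (which does not save predictions) in effect. The toy function below is a stand-in written only to show this matching behaviour; it is not caret code.

```r
# Toy stand-in for a modelling function that has a `trControl` argument
# followed by `...` (hypothetical; it only mimics the argument layout).
toy_train <- function(x, trControl = "default control", ...) {
  list(trControl = trControl, ignored = names(list(...)))
}

# Lower-case "c": not matched to `trControl`, silently absorbed by `...`
wrong <- toy_train(1, trcontrol = "custom control")

# Correct spelling: the custom control is actually used
right <- toy_train(1, trControl = "custom control")

print(wrong$trControl)
print(wrong$ignored)
print(right$trControl)
```

If this is the cause, changing the call to use trControl = ctrl4 should make the predictions available to global_validation().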
Hi @HannaMeyer
here are some examples of how you/users could use the autoplot() generics in {mlr3spatiotempcv} to visualize partitions created via {CAST}.
For spacetime, one can choose whether to show the omitted observations or not (show_omitted = TRUE/FALSE).
All Cstf functions also have a 2D plotting generic, but when the dataset is spatiotemporal, this option is limited due to overplotting in a 2D space.
Available from {mlr3spatiotempcv} >= 0.2.1.9003 (important bugfix for spacevar + timevar plotting in 0.3.0.9005).
library(mlr3)
library(mlr3spatiotempcv)
data <- cookfarm_sample
# tweak Date variable for plotting
data$Date <- rep(c(
"2020-01-01", "2020-02-01", "2020-03-01", "2020-04-01",
"2020-05-01"
), times = 1, each = 100)
b <- mlr3::as_data_backend(data)
b$hash <- "_mlr3_tasks_cookfarm_"
task <- TaskRegrST$new(
id = "cookfarm", b, target = "PHIHOX",
extra_args = list(
coordinate_names = c("x", "y"), coords_as_features = FALSE,
crs = 26911
)
)
# time out --------------------------------------------------------------------
rsp <- rsmp("sptcv_cstf", folds = 5, time_var = "Date")
set.seed(42)
rsp$instantiate(task)
# without omitted, we have no values on the y-axis and the plot is not shown
autoplot(rsp, task, fold_id = 5, show_omitted = TRUE, plot3D = TRUE)
# space out -------------------------------------------------------------------
rsp <- rsmp("sptcv_cstf", folds = 5, space_var = "SOURCEID")
set.seed(42)
rsp$instantiate(task)
# without omitted, we have no values on the y-axis and the plot is not shown
autoplot(rsp, task, fold_id = 5, show_omitted = TRUE, plot3D = TRUE)
# spacetime out --------------------------------------------------------------------
rsp <- rsmp("sptcv_cstf", folds = 5, time_var = "Date", space_var = "SOURCEID")
set.seed(42)
rsp$instantiate(task)
# without omitted, we have no values on the y-axis and the plot is not shown
autoplot(rsp, task, fold_id = 5, show_omitted = TRUE, plot3D = TRUE)
Created on 2021-04-01 by the reprex package (v1.0.0)
Sorry, me again.
When running the calibrate_aoa function I'm getting the error
Error: Package "scam" needed for this function to work. Please install it.
The error is resolved after installing the package.
Cheers
Hi Hanna,
when running the example code from the function reference of aoa() or trainDI() I get the following error from the trainDI() call:
> #...then calculate the DI of the trained model:
> DI = trainDI(model=model)
negative weights were set to 0
Error in get.knnx(data, query, k, algorithm) :
DLL requires the use of native symbols
The example uses the default arguments (method = "L2"). For method = "MD" there is no error.
I couldn't figure out a workaround so far. Only lead I found was this thread.
Thanks for your help!
R version 4.3.0 (2023-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.2 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: Europe/Vaduz
tzcode source: system (glibc)
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] latticeExtra_0.6-29 viridis_0.6.2 viridisLite_0.4.0 caret_6.0-90 lattice_0.21-8 ggplot2_3.4.2
[7] terra_1.5-21 CAST_0.8.1 dplyr_1.1.2 sf_1.0-12
loaded via a namespace (and not attached):
[1] gtable_0.3.0 recipes_1.0.6 vctrs_0.6.2 tools_4.3.0 generics_0.1.2
[6] stats4_4.3.0 parallel_4.3.0 tibble_3.2.1 proxy_0.4-26 fansi_1.0.2
[11] pkgconfig_2.0.3 ModelMetrics_1.2.2.2 Matrix_1.5-4.1 KernSmooth_2.23-20 data.table_1.14.2
[16] RColorBrewer_1.1-2 lifecycle_1.0.3 FNN_1.1.3 compiler_4.3.0 stringr_1.5.0
[21] munsell_0.5.0 codetools_0.2-19 class_7.3-21 prodlim_2019.11.13 pillar_1.9.0
[26] MASS_7.3-59 classInt_0.4-3 gower_1.0.0 iterators_1.0.14 rpart_4.1.19
[31] foreach_1.5.2 nlme_3.1-162 parallelly_1.30.0 lava_1.6.10 tidyselect_1.2.0
[36] digest_0.6.29 stringi_1.7.6 future_1.24.0 reshape2_1.4.4 purrr_1.0.1
[41] listenv_0.8.0 splines_4.3.0 grid_4.3.0 colorspace_2.0-2 cli_3.6.1
[46] magrittr_2.0.2 randomForest_4.7-1.1 survival_3.5-3 utf8_1.2.2 future.apply_1.8.1
[51] e1071_1.7-9 withr_2.5.0 scales_1.2.1 lubridate_1.8.0 jpeg_0.1-9
[56] globals_0.14.0 nnet_7.3-18 gridExtra_2.3 timeDate_3043.102 png_0.1-7
[61] hardhat_1.3.0 rlang_1.1.0 Rcpp_1.0.10 glue_1.6.2 DBI_1.1.2
[66] pROC_1.18.0 ipred_0.9-12 rstudioapi_0.13 R6_2.5.1 plyr_1.8.6
[71] units_0.8-0
Dear @HannaMeyer and colleagues,
thanks for the nice package and clear tutorials.
Please be aware that the csample function in tutorial 2 is not defined. I found it in tutorial 3, but you might want to define it beforehand.
CreateSpacetimeFolds currently returns an error when passing a tibble for x.
I know this might not be the best place to start a discussion, but this way, others might be able to take part.
As a scientist, I'm torn. An automatic selection of variables is driven by statistical and not theoretical considerations. Of course, the whole thing depends on the research question and the research approach. Fortunately, that's a new field to be explored :-)
Are you planning to integrate backward, stepwise and other approaches of selection?
Hello CAST team,
thank you for the nice package!
A dependency on "twosamples" package is not installed during initial installation:
indices_knndm <- knndm(splotdata,predictors_sp,k=3)
error: "Error in loadNamespace(x) : there is no package called ‘twosamples’ "
Sincerely,
Dear Hanna,
if n in factorial(n) is larger than 170, the result is NaN or Inf (R version 4.0.1). I wonder about this because it did not happen before (before = 18 months ago :-). Did you include the line lately? I ask because for the rainfall paper there were more than 400 variables. While there are ways to compute the factorial for larger n with big integers (gmp::factorialZ), this does not solve the problem because the matrix definition won't understand them.
Is perf_all the matrix you mentioned? If it is only relevant when you want to look up the ffs/error function, one could make it optional to execute.
Cheers
Thomas
Line 142 in f7a2824
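The overflow described above can be reproduced in base R, and the binomial coefficient can be computed without any factorials via choose(). The sketch below only illustrates the arithmetic; whether the referenced line in ffs can be rewritten this way is an assumption on my side.

```r
# factorial() is finite up to n = 170 and overflows to Inf beyond that
f170 <- factorial(170)
f171 <- suppressWarnings(factorial(171))

# Number of predictor pairs for 400 variables, written with factorials:
# Inf / (2 * Inf) evaluates to NaN
n <- 400
k <- 2
via_factorial <- suppressWarnings(
  factorial(n) / (factorial(k) * factorial(n - k))
)

# The same quantity via choose(), which never forms the large factorials
via_choose <- choose(n, k)

print(c(f170 = f170, f171 = f171))
print(c(via_factorial = via_factorial, via_choose = via_choose))
```

With choose() the number of pairs for 400 variables is an ordinary double (79800) that can be used directly as a row count when allocating a matrix.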
Hi Hanna,
Thank you a lot for the great package!
I noticed that when using the package with the ranger method, it is not possible to use a tuneGrid dataframe as with caret. It is just possible to set the tuneLength argument (luckily).
When using a tune grid, R throws the following error:
[1] "model using NDVI,soil_moist will be trained now..."
Something is wrong; all the RMSE metric values are missing:
      RMSE        Rsquared        MAE
 Min.   : NA   Min.   : NA   Min.   : NA
 1st Qu.: NA   1st Qu.: NA   1st Qu.: NA
 Median : NA   Median : NA   Median : NA
 Mean   :NaN   Mean   :NaN   Mean   :NaN
 3rd Qu.: NA   3rd Qu.: NA   3rd Qu.: NA
 Max.   : NA   Max.   : NA   Max.   : NA
 NA's   :1     NA's   :1     NA's   :1
Error: Stopping
In addition: Warning message:
In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
I tried adding the metric argument metric = c("RMSE"), but it didn't work.
Here is my code:
train_ffs_model <- function(data){
#tuneGrid_ffs <- expand.grid(mtry = 3, splitrule = "variance", min.node.size = 5)
predictors <- setdiff(names(data), c("x","y","region","Lstmean","geometry"))
folds <- CAST::CreateSpacetimeFolds(data,spacevar = "region",k=7)
model <- CAST::ffs(data[,predictors],data$Lstmean,
method="ranger",
importance = "permutation",
tuneLength = 1,
#tuneGrid = tuneGrid_ffs,
trControl=trainControl(method="cv",number=10,
index = folds$index,indexOut = folds$indexOut))
return(model)
}
library(parallel)
library(doParallel)
cl <- makePSOCKcluster(4)
registerDoParallel(cl)
ffs_models_03_11 <- purrr::map(.x = data_years_03_11, .f = train_ffs_model)
stopCluster(cl)
I ended up commenting out the tuneGrid and relying only on tuneLength. However, I would like to have more control over the hyperparameters. And since the data frames are quite big (80k rows), ranger is much faster than RF.
Or am I doing something wrong?
Thank you!
When trying to calculate the variable importance of my ffsmodel using varImp(ffsmodel_LLO), it breaks with:
Error in varImp[, "%IncMSE"] : subscript out of bounds
According to https://stackoverflow.com/a/24043890, the importance hasn't been calculated in ffs(). Could you add it in future?
With reference to Algorithm 1 in Meyer et al., 2018.
If I have 4 predictors, Algorithm 1 evaluates all the possible couples to find the best one.
The algorithm recursively adds the remaining variables i=3,4.
The algorithm stops if mean(error of model_i) > mean(error of model_best).
This means that if the error increases with 3 variables, the algorithm stops and doesn't check what happens with 4.
What if there is a pair of remaining variables that interact and improve model performance? In other words,
what if with 3 variables performance decreases while with 4 variables it increases? If the algorithm stops at 3, we will never know.
Is that right, or am I missing something?
Thanks
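The situation described in this question can be reproduced with a toy version of the procedure. The sketch below follows the steps as stated above (evaluate all pairs, add one variable at a time, stop when the best candidate is worse than the current best); forward_select and toy_error are hypothetical names, and the error values are invented so that only the full set of four variables beats the best pair. It is not the ffs() implementation.

```r
# Greedy forward selection as described in the question.
# `error_fn` takes a character vector of variable names and returns an error.
forward_select <- function(vars, error_fn) {
  pairs <- combn(vars, 2, simplify = FALSE)
  pair_err <- vapply(pairs, error_fn, numeric(1))
  best <- pairs[[which.min(pair_err)]]
  best_err <- min(pair_err)
  repeat {
    candidates <- setdiff(vars, best)
    if (length(candidates) == 0) break
    cand_err <- vapply(candidates, function(v) error_fn(c(best, v)), numeric(1))
    if (min(cand_err) > best_err) break  # stopping rule as stated above
    best <- c(best, candidates[which.min(cand_err)])
    best_err <- min(cand_err)
  }
  list(selected = best, error = best_err)
}

# Invented error surface: the pair (a, b) is the best pair, any third
# variable makes things worse, but all four variables together are best.
toy_error <- function(v) {
  key <- paste(sort(v), collapse = "")
  if (key == "ab") return(0.5)
  if (key == "abcd") return(0.2)
  if (length(v) == 2) return(0.8)
  0.6
}

res <- forward_select(c("a", "b", "c", "d"), toy_error)
print(res)
print(toy_error(c("a", "b", "c", "d")))
```

Under this stopping rule the search ends with the pair (a, b) and never evaluates the four-variable model, so the greedy procedure can indeed miss a combination that only helps jointly. That is the usual trade-off of forward selection against an exhaustive search.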
Hi again
The line
###for the spatial CV: RMSE(AOA_spatial$AOA)==1],values)(AOA_spatial$AOA)==1])
fails in my R session; the same goes for the RMSE of the random CV.
You probably want to add the argument na.rm=T there.
thanks!
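As an illustration of the suggestion, here is a small base-R RMSE helper with an na.rm switch. The function name rmse and the example vectors are made up; caret::RMSE() offers an na.rm argument as well, so this sketch only shows what the flag changes.

```r
# Root mean squared error; with na.rm = TRUE, pairs in which either the
# prediction or the observation is NA are dropped before averaging.
rmse <- function(pred, obs, na.rm = FALSE) {
  sqrt(mean((pred - obs)^2, na.rm = na.rm))
}

# Hypothetical values, e.g. raster cells where some cells are NA
pred <- c(1, 2, NA, 4)
obs  <- c(1, 4, 3, NA)

without_flag <- rmse(pred, obs)
with_flag <- rmse(pred, obs, na.rm = TRUE)
print(c(without_flag = without_flag, with_flag = with_flag))
```

Without the flag a single NA cell makes the whole result NA, which is likely what happens when the raster values contain missing cells.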
Thanks for this package, it is very helpful. One point to note:
It looks like the minimum value reset in line 183 of ffs should be .mtry instead of mtry.
When running the example within the ffs function with method = ranger, it is clear that the instances with mtry > # of predictors aren't skipped in the same way as when the tuneLength argument is set. See example below.
library(CAST)
#> Warning: package 'CAST' was built under R version 4.1.3
library(doParallel)
#> Warning: package 'doParallel' was built under R version 4.1.3
#> Loading required package: foreach
#> Warning: package 'foreach' was built under R version 4.1.3
#> Loading required package: iterators
#> Warning: package 'iterators' was built under R version 4.1.3
#> Loading required package: parallel
library(lubridate)
#> Warning: package 'lubridate' was built under R version 4.1.2
#>
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#>
#> date, intersect, setdiff, union
library(ggplot2)
#> Warning: package 'ggplot2' was built under R version 4.1.2
library(caret)
#> Warning: package 'caret' was built under R version 4.1.3
#> Loading required package: lattice
#> Warning: package 'lattice' was built under R version 4.1.2
library(tictoc)
#> Warning: package 'tictoc' was built under R version 4.1.1
cl <- makeCluster(3)
registerDoParallel(cl)
#load and prepare dataset:
dat <- get(load(system.file("extdata","Cookfarm.RData",package="CAST")))
trainDat <- dat[dat$altitude==-0.3&year(dat$Date)==2012&week(dat$Date)%in%c(13:14),]
#visualize dataset:
ggplot(data = trainDat, aes(x=Date, y=VW)) + geom_line(aes(colour=SOURCEID))
#create folds for Leave Location Out Cross Validation:
set.seed(10)
indices <- CreateSpacetimeFolds(trainDat,spacevar = "SOURCEID",k=3)
ctrl <- trainControl(method="cv",index = indices$index)
#define potential predictors:
predictors <- c("DEM","TWI","BLD","Precip_cum","cday","MaxT_wrcc",
"Precip_wrcc","NDRE.M","Bt","MinT_wrcc","Northing","Easting")
tuneGrid<- data.frame(
.mtry = c(2,5,7),
.splitrule = "variance",
.min.node.size = c(5))
tic()
#run ffs model with Leave Location out CV
set.seed(10)
ffsmodel <- ffs(trainDat[,predictors],trainDat$VW,method="ranger",
tuneGrid = tuneGrid, num.trees = 100,trControl=ctrl)
#> [1] "model using DEM,TWI will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in train.default(predictors[, minGrid[i, ]], response, method =
#> method, : missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 120"
#> [1] "model using DEM,BLD will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 119"
#> [1] "model using DEM,Precip_cum will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 118"
#> [1] "model using DEM,cday will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 117"
#> [1] "model using DEM,MaxT_wrcc will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 116"
#> [1] "model using DEM,Precip_wrcc will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 115"
#> [1] "model using DEM,NDRE.M will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 114"
#> [1] "model using DEM,Bt will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 113"
#> [1] "model using DEM,MinT_wrcc will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 112"
#> [1] "model using DEM,Northing will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 111"
#> [1] "model using DEM,Easting will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 110"
#> [1] "model using TWI,BLD will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 109"
#> [1] "model using TWI,Precip_cum will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 108"
#> [1] "model using TWI,cday will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 107"
#> [1] "model using TWI,MaxT_wrcc will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 106"
#> [1] "model using TWI,Precip_wrcc will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 105"
#> [1] "model using TWI,NDRE.M will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 104"
#> [1] "model using TWI,Bt will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 103"
#> [1] "model using TWI,MinT_wrcc will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 102"
#> [1] "model using TWI,Northing will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 101"
#> [1] "model using TWI,Easting will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 100"
#> [1] "model using BLD,Precip_cum will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 99"
#> [1] "model using BLD,cday will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 98"
#> [1] "model using BLD,MaxT_wrcc will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 97"
#> [1] "model using BLD,Precip_wrcc will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 96"
#> [1] "model using BLD,NDRE.M will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 95"
#> [1] "model using BLD,Bt will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 94"
#> [1] "model using BLD,MinT_wrcc will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 93"
#> [1] "model using BLD,Northing will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 92"
#> [1] "model using BLD,Easting will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 91"
#> [1] "model using Precip_cum,cday will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 90"
#> [1] "model using Precip_cum,MaxT_wrcc will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 89"
#> [1] "model using Precip_cum,Precip_wrcc will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 88"
#> [1] "model using Precip_cum,NDRE.M will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 87"
#> [1] "model using Precip_cum,Bt will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 86"
#> [1] "model using Precip_cum,MinT_wrcc will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 85"
#> [1] "model using Precip_cum,Northing will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 84"
#> [1] "model using Precip_cum,Easting will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 83"
#> [1] "model using cday,MaxT_wrcc will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 82"
#> [1] "model using cday,Precip_wrcc will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 81"
#> [1] "model using cday,NDRE.M will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 80"
#> [1] "model using cday,Bt will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 79"
#> [1] "model using cday,MinT_wrcc will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 78"
#> [1] "model using cday,Northing will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 77"
#> [1] "model using cday,Easting will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 76"
#> [1] "model using MaxT_wrcc,Precip_wrcc will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 75"
#> [1] "model using MaxT_wrcc,NDRE.M will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 74"
#> [1] "model using MaxT_wrcc,Bt will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 73"
#> [1] "model using MaxT_wrcc,MinT_wrcc will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 72"
#> [1] "model using MaxT_wrcc,Northing will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 71"
#> [1] "model using MaxT_wrcc,Easting will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 70"
#> [1] "model using Precip_wrcc,NDRE.M will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 69"
#> [1] "model using Precip_wrcc,Bt will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 68"
#> [1] "model using Precip_wrcc,MinT_wrcc will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 67"
#> [1] "model using Precip_wrcc,Northing will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 66"
#> [1] "model using Precip_wrcc,Easting will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 65"
#> [1] "model using NDRE.M,Bt will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 64"
#> [1] "model using NDRE.M,MinT_wrcc will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 63"
#> [1] "model using NDRE.M,Northing will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 62"
#> [1] "model using NDRE.M,Easting will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 61"
#> [1] "model using Bt,MinT_wrcc will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 60"
#> [1] "model using Bt,Northing will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 59"
#> [1] "model using Bt,Easting will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 58"
#> [1] "model using MinT_wrcc,Northing will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 57"
#> [1] "model using MinT_wrcc,Easting will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 56"
#> [1] "model using Northing,Easting will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 55"
#> [1] "vars selected: DEM,BLD with RMSE NaN"
#> [1] "model using additional variable TWI will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in train.default(predictors[, c(startvars, nextvars[i])], response, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 54"
#> [1] "model using additional variable Precip_cum will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 53"
#> [1] "model using additional variable cday will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 52"
#> [1] "model using additional variable MaxT_wrcc will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 51"
#> [1] "model using additional variable Precip_wrcc will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 50"
#> [1] "model using additional variable NDRE.M will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 49"
#> [1] "model using additional variable Bt will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 48"
#> [1] "model using additional variable MinT_wrcc will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 47"
#> [1] "model using additional variable Northing will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 46"
#> [1] "model using additional variable Easting will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 45"
#> [1] "vars selected: DEM,BLD,Northing with RMSE NaN"
#> [1] "model using additional variable TWI will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 44"
#> [1] "model using additional variable Precip_cum will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 43"
#> [1] "model using additional variable cday will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 42"
#> [1] "model using additional variable MaxT_wrcc will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 41"
#> [1] "model using additional variable Precip_wrcc will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 40"
#> [1] "model using additional variable NDRE.M will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 39"
#> [1] "model using additional variable Bt will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 38"
#> [1] "model using additional variable MinT_wrcc will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 37"
#> [1] "model using additional variable Easting will be trained now..."
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> missing values found in aggregated results
#> [1] "maximum number of models that still need to be trained: 36"
#> [1] "vars selected: DEM,BLD,Northing with RMSE NaN"
#> Note: No increase in performance found using more than 3 variables
ffsmodel
#> Random Forest
#>
#> 490 samples
#> 3 predictor
#>
#> No pre-processing
#> Resampling: Cross-Validated (10 fold)
#> Summary of sample sizes: 332, 326, 322
#> Resampling results across tuning parameters:
#>
#> mtry RMSE Rsquared MAE
#> 2 0.06282679 0.2913655 0.04683066
#> 5 NaN NaN NaN
#> 7 NaN NaN NaN
#>
#> Tuning parameter 'splitrule' was held constant at a value of variance
#>
#> Tuning parameter 'min.node.size' was held constant at a value of 5
#> RMSE was used to select the optimal model using the smallest value.
#> The final values used for the model were mtry = 2, splitrule = variance
#> and min.node.size = 5.
toc()
#> 36.04 sec elapsed
stopCluster(cl)
Created on 2022-04-24 by the reprex package (v2.0.1)
This package depends on (depends, imports or suggests) raster and one or more of the retiring packages rgdal, rgeos or maptools (https://r-spatial.org/r/2022/04/12/evolution.html, https://r-spatial.org/r/2022/12/14/evolution2.html). Since raster 3.6.3, all use of external FOSS library functionality has been transferred to terra, making the retiring packages very likely redundant. It would help greatly if you could remove dependencies on the retiring packages as soon as possible.
First, thanks for your work on CAST. It is a very nice package and I am looking forward to further developments.
I recently ran into an issue while trying to run the tutorial https://cran.r-project.org/web/packages/CAST/vignettes/AOA-tutorial.html with my own data. I ran the function aoa, but the AOA$AOA results were only zeros.
AOA <- aoa(newdata = newdata, model = mod1, returnTrainDI = TRUE, cl = cl)
I found the issue was that I am using a tibble when training the model as below:
mod1 <- train(x = mytbl[,predictorNames],
y = mytbl$response,
method = "rf",
importance = TRUE,
tuneGrid = expand.grid(mtry = c(2:length(predictorNames))),
trControl = trainControl(method = "cv", savePredictions = TRUE))
Because of that, model$trainingData is also a tibble, and on line 168, newdata[,catvar] becomes NA, because I have one categorical predictor. tibble has a different dropping behavior than data.frame when a single column is returned. Specifically, unique(train[,catvar]) returns a one-column tibble instead of a vector.
Line 168 in b34bc35
The solution for me was to use mytbl <- as.data.frame(mytbl) before training the model, but I would suggest adding the following at the beginning of the aoa function so that it handles tibbles as well:
if(is.null(train)){train <- as.data.frame(model$trainingData)}
I don't have a ready reprex but I hope my description is sufficient to understand the issue.
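In place of a full reprex, here is a minimal sketch of the dropping behavior described above. The data frame and its column names are made up for illustration; the tibble part only runs if the tibble package is installed.

```r
# Toy data with one categorical predictor (hypothetical, not the poster's data)
df <- data.frame(num = c(1.5, 2.5, 3.5), cat = c("a", "b", "a"))

# data.frame: selecting a single column with [, j] drops to a plain vector
vec <- df[, "cat"]
is.data.frame(vec)        # FALSE

# drop = FALSE keeps the data.frame structure, which is what a tibble
# does by default for tb[, "cat"]
one_col <- df[, "cat", drop = FALSE]
is.data.frame(one_col)    # TRUE

# [[ ]] returns a vector for data.frames and tibbles alike
lev <- unique(df[["cat"]])

# If tibble is installed, show the behavior directly
if (requireNamespace("tibble", quietly = TRUE)) {
  tb <- tibble::as_tibble(df)
  # single-column selection is still a one-column tibble
  stopifnot(is.data.frame(tb[, "cat"]))
  # after conversion, the usual data.frame dropping applies again
  stopifnot(!is.data.frame(as.data.frame(tb)[, "cat"]))
}
```

Besides converting with as.data.frame(), indexing with [[ ]] is another way to get a vector regardless of the input class.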
Hello,
I've been reading through the documentation for the ffs function, and I haven't been able to figure out a way to change what the threshold is for classifying predicted values. Am I missing something, or is this a missing feature? Perhaps the issue stems more from the caret package, but it is with the ffs function that I run into issues.
Here is an example of the problem I'm having:
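library(caret)
library(CAST)
library(dplyr) # %>%, arrange(), mutate() used below
library(PresenceAbsence) # optimal.thresholds(), cmx(), Kappa() used below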
test_data <- structure(list(presence = c("no", "yes", "no", "no", "no", "no",
"yes", "no", "no", "no", "no", "no", "no", "yes", "no", "no",
"no", "no", "no", "no", "yes", "no", "no", "yes", "yes", "yes",
"no", "no", "no", "no", "yes", "no", "no", "no", "no", "no",
"no", "no", "no", "no", "no", "no", "no", "no", "no", "yes",
"no", "no", "no", "no", "yes", "no", "no", "yes", "no", "no",
"yes", "no", "no", "no", "no", "no", "no", "no", "no", "no",
"no", "no", "no", "no", "no", "no", "no", "no", "no", "no", "no",
"no", "no", "no", "yes", "no", "yes", "no", "no", "no", "no",
"no", "yes", "no", "no", "no", "no", "no", "yes", "no", "no",
"no", "no", "no"), annual_precip = c(153L, 200L, 235L, 281L,
296L, 200L, 130L, 127L, 294L, 169L, 221L, 242L, 105L, 173L, 420L,
212L, 116L, 252L, 153L, 167L, 243L, 186L, 412L, 179L, 237L, 107L,
147L, 231L, 157L, 286L, 185L, 154L, 176L, 205L, 84L, 209L, 87L,
247L, 380L, 146L, 218L, 119L, 420L, 420L, 200L, 195L, 199L, 411L,
419L, 188L, 127L, 156L, 108L, 195L, 183L, 397L, 152L, 122L, 148L,
152L, 219L, 159L, 152L, 107L, 367L, 393L, 115L, 252L, 241L, 169L,
297L, 310L, 199L, 147L, 142L, 226L, 118L, 289L, 246L, 237L, 153L,
113L, 203L, 220L, 76L, 101L, 346L, 133L, 154L, 305L, 156L, 233L,
442L, 130L, 125L, 127L, 117L, 199L, 211L, 109L), precip_wettest_Q = c(74L,
92L, 118L, 125L, 130L, 85L, 69L, 61L, 142L, 84L, 104L, 104L,
54L, 84L, 183L, 101L, 56L, 125L, 74L, 75L, 112L, 91L, 175L, 94L,
102L, 58L, 75L, 102L, 87L, 125L, 91L, 76L, 86L, 115L, 46L, 100L,
45L, 123L, 156L, 69L, 103L, 66L, 172L, 183L, 85L, 100L, 85L,
169L, 177L, 92L, 70L, 75L, 53L, 95L, 88L, 167L, 74L, 66L, 71L,
75L, 104L, 78L, 75L, 62L, 153L, 162L, 70L, 124L, 121L, 82L, 140L,
143L, 84L, 75L, 70L, 113L, 65L, 127L, 122L, 108L, 75L, 61L, 94L,
116L, 39L, 52L, 136L, 66L, 75L, 129L, 76L, 117L, 183L, 69L, 64L,
63L, 65L, 92L, 94L, 55L), mean_diurnal_range = c(7L, 7L, 7L,
9L, 5L, 7L, 7L, 6L, 7L, 7L, 7L, 6L, 6L, 7L, 6L, 8L, 6L, 8L, 7L,
7L, 6L, 7L, 5L, 7L, 6L, 6L, 7L, 7L, 6L, 9L, 7L, 6L, 8L, 6L, 5L,
7L, 6L, 7L, 4L, 6L, 7L, 7L, 4L, 6L, 7L, 7L, 8L, 4L, 5L, 7L, 7L,
7L, 6L, 7L, 7L, 5L, 6L, 6L, 7L, 6L, 6L, 7L, 6L, 5L, 4L, 4L, 6L,
7L, 8L, 7L, 7L, 7L, 8L, 7L, 7L, 6L, 7L, 8L, 8L, 7L, 7L, 6L, 7L,
6L, 5L, 6L, 4L, 6L, 7L, 6L, 6L, 7L, 4L, 7L, 6L, 7L, 6L, 7L, 7L,
6L), isothermality = c(14L, 13L, 14L, 17L, 11L, 16L, 13L, 14L,
15L, 14L, 14L, 14L, 14L, 14L, 13L, 16L, 14L, 15L, 14L, 16L, 14L,
14L, 12L, 14L, 13L, 14L, 15L, 14L, 14L, 17L, 14L, 13L, 15L, 15L,
13L, 16L, 13L, 14L, 10L, 13L, 14L, 15L, 11L, 13L, 16L, 15L, 16L,
10L, 11L, 14L, 14L, 14L, 14L, 14L, 14L, 11L, 13L, 14L, 14L, 13L,
14L, 14L, 13L, 12L, 10L, 11L, 12L, 14L, 15L, 14L, 14L, 14L, 16L,
15L, 14L, 13L, 14L, 16L, 15L, 14L, 14L, 14L, 13L, 14L, 13L, 13L,
11L, 14L, 14L, 14L, 13L, 14L, 11L, 15L, 13L, 14L, 13L, 13L, 14L,
14L)), row.names = c(NA, -100L), class = c("tbl_df", "tbl", "data.frame"
))
If I create a model using all 4 predictor variables, the Kappa statistic calculated by the train function is 0.08:
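set.seed(2354)
# trControlCon is not defined in the original post; it is assumed here to match
# the 3-fold CV in the output below and the trainControl passed to ffs() later
trControlCon <- trainControl(method = 'cv', number = 3, classProbs = TRUE,
savePredictions = TRUE)
model <- train(presence ~ .,
trControl = trControlCon,
method = 'glm',
family = 'binomial',
metric = 'Kappa',
data = test_data
)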
Generalized Linear Model
100 samples
4 predictor
2 classes: 'no', 'yes'
No pre-processing
Resampling: Cross-Validated (3 fold)
Summary of sample sizes: 66, 67, 67
Resampling results:
Accuracy Kappa
0.8404635 0.08159167
However, this is based on a default threshold of 0.5. According to the calculations below, however, the threshold that would maximize Kappa is 0.2.
dt <- model$pred[,c("rowIndex", "obs", "yes")] %>%
arrange(rowIndex) %>%
mutate(obs = ifelse(obs == "yes", TRUE, FALSE),
rowIndex = as.character(rowIndex))
ths <- optimal.thresholds(dt, opt.methods = "MaxKappa")
Method yes
MaxKappa 0.2
If I calculate Kappa using this threshold, I get a much higher Kappa statistic of about 0.24.
cmx_test <- cmx(dt, ths$yes[1])
Kappa(cmx_test)
Kappa Kappa.sd
0.2412141 0.1228811
This becomes an issue when I try to use the ffs function. With Kappa as the metric for variable selection, I run into many cases where all of the 2- and 3-variable combinations have a Kappa statistic of 0 or less, so the algorithm stops. When this happens, all of the predicted values are returned as "no", because the probability of "yes" is less than 0.5 for all of the observations. However, if the threshold had been 0.2 instead of 0.5, I suspect the Kappa value would have varied more between predictor combinations, and likely more variables would be selected.
FF <- ffs(predictors = test_data[,2:5],
response = test_data$presence,
trControl = trainControl(method = 'cv', number = 3, classProbs = TRUE,
savePredictions = TRUE),
minVar = 2,
method = 'glm',
family = 'binomial',
metric = "Kappa"
)
FF$perf_all
  var1               var2               var3                     Kappa         SE nvar
1 annual_precip      precip_wettest_Q                       0.00000000 0.00000000    2
2 annual_precip      mean_diurnal_range                     0.00000000 0.00000000    2
3 annual_precip      isothermality                          0.00000000 0.00000000    2
4 precip_wettest_Q   mean_diurnal_range                    -0.01587302 0.01587302    2
5 precip_wettest_Q   isothermality                          0.00000000 0.00000000    2
6 mean_diurnal_range isothermality                          0.00000000 0.00000000    2
7 annual_precip      precip_wettest_Q   mean_diurnal_range -0.01587302 0.01587302    3
8 annual_precip      precip_wettest_Q   isothermality       0.00000000 0.00000000    3
Do you have any suggestions as to how I could customize the threshold used to calculate the classification metrics for ffs to avoid this issue?
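One possible direction, sketched under assumptions rather than as a tested fix: caret's trainControl() accepts a summaryFunction, which is called with the held-out observations and, when classProbs = TRUE, one probability column per class level. A custom function could therefore compute Kappa at a cutoff other than 0.5. In the sketch below, the function names kappa_at_threshold and kappa_02_summary, the metric name "Kappa02", the fixed 0.2 cutoff, and the ">=" convention for classifying "yes" are all illustrative assumptions; the toy vectors are made up and are not the data from this issue.

```r
# Cohen's kappa for a binary outcome at a chosen probability threshold.
# obs: logical vector (TRUE = "yes"); prob: predicted probability of "yes".
kappa_at_threshold <- function(obs, prob, threshold) {
  pred <- prob >= threshold                    # assumed convention: >= cutoff means "yes"
  po <- mean(pred == obs)                      # observed agreement
  pe <- mean(pred) * mean(obs) +               # agreement expected by chance
    (1 - mean(pred)) * (1 - mean(obs))
  if (pe == 1) return(0)                       # degenerate case: a single class everywhere
  (po - pe) / (1 - pe)
}

# Hypothetical caret-style summary function (data, lev, model).
# Assumes the positive class is labelled "yes" and classProbs = TRUE,
# so that `data` contains a probability column named "yes".
kappa_02_summary <- function(data, lev = NULL, model = NULL) {
  obs <- data$obs == "yes"
  c(Kappa02 = kappa_at_threshold(obs, data[["yes"]], threshold = 0.2))
}

# Made-up example: no probability reaches 0.5, so every prediction is "no".
obs  <- c(TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)
prob <- c(0.45, 0.30, 0.25, 0.10, 0.05, 0.15, 0.10, 0.05)

kappa_default <- kappa_at_threshold(obs, prob, 0.5)  # 0: all predictions are "no"
kappa_lowered <- kappa_at_threshold(obs, prob, 0.2)

toy <- data.frame(obs = factor(ifelse(obs, "yes", "no")), yes = prob)
print(kappa_02_summary(toy))
```

If this behaves as in plain caret, the function could be supplied via trainControl(method = 'cv', number = 3, classProbs = TRUE, savePredictions = TRUE, summaryFunction = kappa_02_summary) together with metric = "Kappa02". Whether ffs handles such a custom metric as intended is untested here.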
Hi all,
Adapting some code from the MEE-AOA repo, I believe I can calculate an AOA like this:
set.seed(123)
library(CAST)
library(caret)
library(virtualspecies)
npoints <- 50
meansPCA <- c(3, -1)
sdPCA <- c(2, 2)
simulateResponse <- c("bio2","bio5","bio10", "bio13", "bio14","bio19")
studyarea <- c(-15, 65, 30, 75)
predictors_global <- raster::brick(
system.file(
"extdata/bioclim_global.grd",
package = "CAST"
)
)
predictors <- crop(predictors_global, extent(studyarea))
mask <- predictors[[1]]
values(mask)[!is.na(values(mask))] <- 1
response_vs <- generateSpFromPCA(
predictors[[simulateResponse]],
means = meansPCA,
sds = sdPCA,
plot = FALSE
)
response <- response_vs$suitab.raster
mask <- rasterToPolygons(mask,dissolve=TRUE)
samplepoints <- spsample(mask,npoints,"random")
trainDat <- extract(predictors,samplepoints,df=TRUE)
trainDat$response <- extract (response,samplepoints)
trainDat <- trainDat[complete.cases(trainDat),]
model <- train(trainDat[,names(predictors)],
trainDat$response,
method="rf",
importance=TRUE,
trControl = trainControl(method="none"))
AOA <- aoa(trainDat, model=model)
According to the 2021 paper, I believe the AOA threshold after this should be equal to "the 75-percentile plus 1.5 times the IQR of the DI values of the cross-validated training data". Calculating that using quantile() and IQR() gives us these results:
di <- attr(AOA$AOA, "TrainDI")
(threshold_quantile <- stats::quantile(di, 0.75))
#> 75%
#> 0.3059488
(threshold_iqr <- (1.5 * stats::IQR(di)))
#> [1] 0.3392091
threshold_quantile + threshold_iqr
#> 75%
#> 0.6451579
But the AOA threshold returned by aoa() doesn't match that calculation:
AOA$parameters$threshold
#> [1] 0.4770295
If I'm right and this is unexpected, it seems to be due to the use of boxplot.stats() here:
Line 221 in afcba3f
That gives us the threshold that CAST returns:
grDevices::boxplot.stats(di)$stats[5]
#> [1] 0.4770295
But I'm not entirely sure what boxplot.stats() actually does. For instance, imagine that we cut off the last di value in our vector:
di[50]
#> [1] 0.2120274
di <- di[1:49]
Because it's a rather low number, both our 75th percentile and IQR increase:
(threshold_quantile <- stats::quantile(di, 0.75))
#> 75%
#> 0.3101567
(threshold_iqr <- (1.5 * stats::IQR(di)))
#> [1] 0.3523555
threshold_quantile + threshold_iqr
#> 75%
#> 0.6625121
But boxplot.stats() returns the same value as before:
grDevices::boxplot.stats(di)$stats[5]
#> [1] 0.4770295
Created on 2022-12-11 by the reprex package (v2.0.1)
Apologies if I'm misunderstanding something here! The return here just didn't match my expectations.
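For context on the behaviour described above: according to the R documentation, the fifth element of boxplot.stats()$stats is the extreme of the upper whisker, that is, the most extreme data point lying no more than coef (default 1.5) times the box length above the upper hinge. It is therefore always an observed value rather than the fence itself, which would be consistent with it being smaller than the 75th percentile plus 1.5 times the IQR, and with it staying unchanged when a low value is dropped. A minimal sketch with a made-up vector x (not the DI values from the reprex):

```r
# Toy data (hypothetical, not the DI values from the reprex above).
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 30)

# "Fence" as described in the paper: 75th percentile + 1.5 * IQR.
fence <- unname(stats::quantile(x, 0.75) + 1.5 * stats::IQR(x))

# End of the upper whisker as returned by boxplot.stats().
whisker <- grDevices::boxplot.stats(x)$stats[5]

# Largest observation that does not exceed the fence.
largest_inside <- max(x[x <= fence])

print(c(fence = fence, whisker = whisker, largest_inside = largest_inside))
```

Note that boxplot.stats() builds its box from hinges (as in fivenum()) rather than from quantile(), so the two approaches can differ slightly at the edges even apart from the fence-versus-data-point distinction.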
Would you be willing to make a version of CreateSpacetimeFolds for rsample?
The size of the returned object (especially in RAM) gets quite large with many predictors.
This is because a large data.frame is created for perf_all, with rows corresponding to the model runs and columns corresponding to the predictors.
Line 224 gets rid of the empty rows at the bottom of the data.frame; however, empty columns are still left after the ffs stops. For example, with 116 predictors, 8 were selected by the ffs, yet perf_all still has all 119 columns, including one for every predictor:
length(colnames(perf_all_big$perf_all))
[1] 119
To get rid of the columns you could use e.g.
bestmodel$perf_all <- bestmodel$perf_all[,colSums(is.na(bestmodel$perf_all)) != nrow(bestmodel$perf_all)]
Again the example with reduced size:
cutting <- big_perf_all[, colSums(is.na(big_perf_all)) != nrow(big_perf_all)]
> object.size(big_perf_all)
3604768 bytes (3.4 mb)
> object.size(cutting)
480784 bytes (0.4 mb)
Greetings, Marvin
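The column-dropping idea above can be tried on a small stand-in object. This is a sketch only: the data.frame below is made up to mimic the shape of perf_all (a few filled columns plus several columns that are entirely NA) and is not output from ffs.

```r
# Toy stand-in for perf_all: three filled columns ...
perf_all <- data.frame(
  var1  = c("a", "b"),
  var2  = c("b", "c"),
  Kappa = c(0.1, 0.2),
  stringsAsFactors = FALSE
)
# ... plus five columns that were never filled (all NA).
perf_all[paste0("var", 3:7)] <- NA

# Keep only columns that contain at least one non-NA value.
keep    <- colSums(!is.na(perf_all)) > 0
trimmed <- perf_all[, keep, drop = FALSE]

print(names(trimmed))
print(object.size(perf_all))
print(object.size(trimmed))
```

Using drop = FALSE keeps the result a data.frame even if only one column survives.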
Is it possible to use the predict function with objects of classes other than terra rasters?