mllg / batchtools
Tools for computation on batch systems
Home Page: https://mllg.github.io/batchtools/
License: GNU Lesser General Public License v3.0
Maybe rely completely on rdocumentation.com and CRAN for hosting manuals and vignettes. Currently the man pages are broken.
... as soon as https://github.com/MangoTheCat/processx/ is released. Benefit over cfSSH with only local workers: it works on Windows.
The term "hyperparameters" is used very often. This does not make sense, as batchtools is not only about ML.
It should be "parameters".
currently we have these magic numbers in the code
default.wait = 5
....
wait = wait * 1.025
While I would strongly argue that we should have very reasonable defaults here, so that most users never have to touch them, these two should be configurable via options.
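A minimal sketch of how the two constants could be read from options, assuming hypothetical option names batchtools.default.wait and batchtools.wait.factor (batchtools may name or structure these differently). The fallbacks stay 5 and 1.025, so users who never set the options keep the current behaviour.

```r
# Hypothetical option names; the fallbacks reproduce the current magic numbers.
next_wait <- function(wait = NULL) {
  if (is.null(wait)) {
    # first call: start from the configurable default wait
    return(getOption("batchtools.default.wait", 5))
  }
  # later calls: grow the wait time by a configurable factor
  wait * getOption("batchtools.wait.factor", 1.025)
}

w1 <- next_wait()    # default wait
w2 <- next_wait(w1)  # default wait times the growth factor

# Overriding both options, then restoring the previous settings.
old <- options(batchtools.default.wait = 1, batchtools.wait.factor = 2)
w3 <- next_wait(next_wait())
options(old)
```

Reading the options at call time (rather than once at package load) means a user can change them mid-session without reloading anything.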
See #15
The source file should contain something like
setwd(dir)
Your main file
library(batchtools)
library(plyr)
dir = "/home/probst/Random_Forest/RFParset"
setwd(paste0(dir,"/results"))
unlink("probs-test", recursive = TRUE)
regis = makeExperimentRegistry("probs-test",
source = "/home/probst/Random_Forest/RFParset/code/probst_defs.R"
)
regis$cluster.functions = makeClusterFunctionsMulticore(debug = TRUE)
addProblem(name = as.character(1), data = 1)
addAlgorithm("eval", fun = function(job, data, instance, ...) {
x = list(...)
data + x$x + 1
})
set.seed(124)
ades = data.frame(x = sample(1:100))
addExperiments(algo.designs = list(eval = ades))
ids = getJobTable()$job.id
ids = chunkIds(ids, chunk.size = 10)
submitJobs(ids)
getStatus()
Result:
OS cmd: /nfsmb/koll/probst/R/x86_64-pc-linux-gnu-library/3.2/batchtools/bin/linux-helper list-jobs probs-test
OS result (exit code 0):
character(0)
Status for 100 jobs:
Submitted: 100 (100.0%)
Queued : 0 ( 0.0%)
Started : 0 ( 0.0%)
Running : 0 ( 0.0%)
Done : 0 ( 0.0%)
Error : 0 ( 0.0%)
we discussed this on hangout
happens if:
apparently the latter is a user error on our part, but this was still hard to figure out and completely broke our workflow after a package update
resolution:
I cannot say exactly how this happens, but starting from a somewhat messed-up state:
> getStatus()
Status for 660 jobs:
Submitted : 540 ( 81.8%)
Queued : 0 ( 0.0%)
Started : 55 ( 8.3%)
Running : 0 ( 0.0%)
Done : 23 ( 3.5%)
Error : 32 ( 4.8%)
Expired : 485 ( 73.5%)
Now I run
> submit.ids = chunkIds(ids = findNotDone(), chunk.size=5)
> submitJobs(ids = submit.ids, resources=list(walltime = 60^2, memory = 4000))
Error in submitJobs(ids = submit.ids, resources = list(walltime = 60^2, :
Assertion on 'ids$chunk' failed: Contains missing values.
Looking into it, it seems that the cause is convertIds(), which generates something like
> submit.ids2 = batchtools:::convertIds(reg = reg, ids = submit.ids, default = batchtools:::.findNotSubmitted(reg = reg), keep.extra = c("job.id", "chunk"))
> submit.ids2
job.id chunk
1: 1 42
2: 2 79
3: 3 35
4: 4 21
5: 5 43
---
633: NA NA
634: NA NA
635: NA NA
636: NA NA
637: NA NA
Messaging back to the registry as soon as the job starts is currently not implemented and should not be necessary if listJobs would differentiate between running and queued jobs.
Should be pretty simple to implement.
We need a capital R in the conf file name on LiDO: ~/.batchtools.conf.r is not used, and only ~/.batchtools.conf.R switches to Torque. Is this intended?
I needed an hour to find this issue...
Additional info: ExperimentRegistry.R uses .batchtools.conf.r, while Registry.R uses .batchtools.conf.R. I switched the first one to R.
I added a test for multi-row results in 0b83fe8. Do we expect the results from tab.expect, or is tab correct?
I inspected makeClusterFunctionsTorque.R and I cannot understand how you handle the template. It seems you read the template file once, template = cfReadBrewTemplate(template, "##"), and then drop it, as it is not returned by the function. How is the template then added to the registry via a conf.file? How is the template used when submitting jobs? Does the current version of batchtools instead rely on batchtools:::findTemplateFile always recovering some template file later?
All the best
make_data <- function(data, scale, job=NULL) {
gamSim(eg = 1, n = 4000, dist = "normal", scale = scale,
verbose = FALSE)
}
fit_model <- function(data, job=NULL, instance) {
m <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = instance)
m$coefficients
}
library(batchtools)
file.dir <- paste0("testtest_", Sys.Date())
reg <- makeExperimentRegistry(file.dir = file.dir, packages = "mgcv",
seed = 1)
reg$cluster.functions <- makeClusterFunctionsMulticore(25)
saveRegistry()
# Add problem and algorithms to the registry
addProblem(name = "make_data", data = NULL, fun = make_data, seed = 1)
addAlgorithm(name = "fit_model", fun = fit_model)
# Add experiments
problems <- list(make_data = data.frame(scale = 2 ^ (-4 : 4)))
addExperiments(problems, repls = 50)
#
options(error = function( ) dump.frames("batchtools.dump", to.file = TRUE))
# testJob(1)
submitJobs()
This results in
# Submitting 450 jobs in 450 chunks using cluster functions 'Parallel' ...
# Submitting [===================================----------------] 70% eta: 14sError in mcfork(detached) :
# unable to fork, possible reason: Cannot allocate memory
Closing the session, reopening a fresh one, loading the registry and calling submitJobs() again immediately triggers the same error after "x files synced" is written to the console.
Exactly the same experiment setup works fine with reg$cluster.functions <- makeClusterFunctionsSocket(25) instead of reg$cluster.functions <- makeClusterFunctionsMulticore(25).
Until the R session that throws the "unable to fork" error is closed, nothing else works properly (some browser tabs crash (?), and all other open R sessions fail with "Cannot allocate memory").
> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=de_DE.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=de_DE.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=de_DE.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] mgcv_1.8-13 nlme_3.1-128 batchtools_0.1 data.table_1.9.6
loaded via a namespace (and not attached):
[1] lattice_0.20-33 snow_0.4-1 prettyunits_1.0.2 digest_0.6.10
[5] assertthat_0.1 chron_2.3-47 grid_3.3.1 R6_2.1.3
[9] backports_1.0.3 magrittr_1.5 progress_1.0.2 stringi_1.1.1
[13] Matrix_1.2-6 checkmate_1.8.1 tools_3.3.1 parallel_3.3.1
> packageDescription("batchtools")
Package: batchtools
Title: Tools for Computation on Batch Systems
Version: 0.1
[...]
Built: R 3.3.0; x86_64-pc-linux-gnu; 2016-08-22 15:16:02 UTC; unix
[...]
RemoteSha: e34e069ce00e2d9e727cfedaf7e2278751f0cfad
[...]
MWE:
library(batchtools)
reg = makeRegistry(file.dir = NA)
ids = batchMap(function(x) Sys.sleep(x), x = c(2, 6, 6, 6, 4))
ids = chunkIds(ids)
submitJobs(ids, resources = list(chunk.ncpus = 4))
getJobStatus()[, list(submitted, started, time.queued)]
This gives on my system:
submitted started time.queued
1: 2016-04-14 11:24:20 2016-04-14 11:24:14 -6 secs
2: 2016-04-14 11:24:20 2016-04-14 11:24:14 -6 secs
3: 2016-04-14 11:24:20 2016-04-14 11:24:14 -6 secs
4: 2016-04-14 11:24:20 2016-04-14 11:24:14 -6 secs
5: 2016-04-14 11:24:20 2016-04-14 11:24:16 -4 secs
At first I thought the calculation of time.queued was wrong, but it is the submitted date that lies after started. A Linux system gives the correct number.
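The negative values follow directly from the two timestamps. A small worked example using the first row of the output above, assuming time.queued is computed as started minus submitted (which matches the numbers shown):

```r
# Timestamps from row 1 of the output above; tz is fixed so the result does
# not depend on the local time zone.
submitted <- as.POSIXct("2016-04-14 11:24:20", tz = "UTC")
started   <- as.POSIXct("2016-04-14 11:24:14", tz = "UTC")

# Assumed definition: queue time = start time - submission time.
time.queued <- difftime(started, submitted, units = "secs")
print(time.queued)

# A sanity check that flags the inconsistent ordering reported here.
inconsistent <- started < submitted
```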
I've begun experimenting with batchtools by migrating some previous projects that used BatchJobs and BatchExperiments. We use SGE on our cluster. Here are the contents of my working template file (much of which was copied from my existing template for BatchJobs):
#!/bin/bash
# The name of the job, can be anything, simply used when displaying the list of running jobs
#$ -N "my_job"
# Combining output/error messages into one file
#$ -j y
# Giving the name of the output log file
#$ -o <%= log.file %>
# One needs to tell the queue system to use the current directory as the working directory
# Or else the script may fail as it will execute in your top level home directory /home/username
#$ -cwd
# use environment variables
#$ -V
# define multiple cores per node
#$ -pe smp <%= resources$n.cores %>
# use resource reservation
#$ -R y
# we merge R output with stdout from SGE, which gets then logged via -o option
module load cluster-setup
module unload R
module load R/3.2.0
Rscript -e 'batchtools::doJobCollection("<%= uri %>")' /dev/stdout
exit 0
Everything works, so that's great! In my previous template I had the following, which used job.name and arrayjobs:
# The name of the job, can be anything, simply used when displaying the list of running jobs
#$ -N <%= job.name %>
# use job arrays
#$ -t 1-<%= arrayjobs %>
You can see that I simply hard-coded a job name in my new template, but is there a more elegant way to define the job name dynamically? Also how would I go about specifying arrayjobs? Thanks for your great work on this package!
...
should also be fine.
Hi,
I tried to use makeClusterFunctionsMulticore.
My registry looks like this:
regis = makeExperimentRegistry("probs-muell",
packages = c("mlr", "OpenML"),
source = "/nfsmb/koll/probst/Random_Forest/RFParset/code/probst_defs.R",
work.dir = paste0(dir,"/results")
)
regis$cluster.functions = makeClusterFunctionsMulticore()
Then I try to submit it (after adding the algorithms, etc.):
ids = getJobTable()$job.id
ids = chunkIds(ids, chunk.size = 30)
submitJobs(ids)
getStatus()
gives me the following (also after waiting longer):
getStatus()
Status for 240 jobs:
Submitted: 240 (100.0%)
Queued : 0 ( 0.0%)
Started : 0 ( 0.0%)
Running : 0 ( 0.0%)
Done : 0 ( 0.0%)
Error : 0 ( 0.0%)
I do not find anything done in my results folder.
What is wrong here?
unlink("registry", recursive = TRUE)
reg = makeExperimentRegistry()
addProblem("p1")
addAlgorithm("a1", fun = function(instance, method, ...) {
str(method)
return(method)
})
ades = data.frame(method = c("a", "b"))
addExperiments(algo.designs = list(a1 = ades))
testJob(1)
this shows that "method" is a factor with one element, and 2 levels.
a) This is almost never what you want. You want "method" to be a string.
For potential problems, see mllg/checkmate#75 (the switch problem there).
NB: these are NOT the same issue. The one in checkmate is about guarding against this; this one is about not creating the problem in the first place, also with respect to code you later don't control.
b) Of course one can say that this is a user error, as stringsAsFactors = FALSE was not set in the data.frame() call.
But this will happen a million times, even to experienced R coders.
Although I usually dislike warnings, this seems to be a clear case where a warning would be beneficial.
The other option would be to auto-convert the factor columns of the design to character.
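A minimal sketch that combines both suggestions: convert the factor columns of a design to character and warn when doing so. defactor_design() is a hypothetical helper, not part of batchtools. Since R 4.0.0, data.frame() defaults to stringsAsFactors = FALSE, so the example sets it to TRUE explicitly to reproduce the situation described above.

```r
# Hypothetical helper: turn factor columns of a design into character columns.
defactor_design <- function(design) {
  is.fac <- vapply(design, is.factor, logical(1))
  if (any(is.fac)) {
    # tell the user which columns were touched
    warning("converting factor column(s) to character: ",
            paste(names(design)[is.fac], collapse = ", "))
    design[is.fac] <- lapply(design[is.fac], as.character)
  }
  design
}

ades <- data.frame(method = c("a", "b"), stringsAsFactors = TRUE)
ades <- suppressWarnings(defactor_design(ades))
str(ades$method)
```

Converting (rather than only warning) also protects code you don't control, which is the point made in the NB above.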
If a problem has a problem seed, there should be a way to cache its results. Use a hash of the problem id and the problem seed as a unique identifier?
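A minimal in-memory sketch of the idea, assuming the problem function is deterministic given its seed. A plain "id:seed" string stands in for the hash suggested above; a real implementation might hash these values and store instances on disk instead. instance_cache and get_instance() are hypothetical names.

```r
# Hypothetical cache of problem instances, keyed by problem id and seed.
instance_cache <- new.env(parent = emptyenv())

get_instance <- function(prob.id, seed, fun, ...) {
  key <- paste0(prob.id, ":", seed)
  if (!exists(key, envir = instance_cache, inherits = FALSE)) {
    # cache miss: seed the RNG and compute the instance once
    set.seed(seed)
    assign(key, fun(...), envir = instance_cache)
  }
  get(key, envir = instance_cache, inherits = FALSE)
}

# Usage: the problem function runs only once for the same id and seed.
calls <- 0
make_inst <- function(n) {
  calls <<- calls + 1
  sample(n)
}
a <- get_instance("p1", 1, make_inst, n = 5)
b <- get_instance("p1", 1, make_inst, n = 5)
```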
In BatchJobs, I think the argument resources (a named list) of submitJobs() was the only way to pass a variable to the template, where it is also named resources. Is this the case for batchtools as well? It looks so from inspecting the code.
There are two reasons why I ask:
If resources is the only one, then couldn't one just attach its fields, so that the template doesn't have to use resources$foo etc. each time? Maybe what I'm fishing for is a generic args argument that is a named list (or environment) whose elements are attached to the template evaluation environment. Then it could look like this:
resources <- list(ncpus = 4, walltime = 3600, memory = 2.0)
submitJobs(reg, args = resources)
and the template could immediately access ncpus, walltime, and memory without having to use resources$ncpus etc. The downside might be that it is less clear which arguments come from the submitJobs() call and which come from the internals of batchtools.
Just a thought; I wondered whether you had already considered this in the past.
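A sketch of the proposed behaviour in plain R, assuming template code is evaluated in an environment: list2env() exposes each element of the list as a top-level variable, while the list itself stays available as resources for existing templates. template_env() is hypothetical and does not use batchtools' actual brew-based template handling.

```r
# Hypothetical helper: build an evaluation environment for template code.
template_env <- function(args, parent = baseenv()) {
  # every element of `args` becomes a variable (ncpus, walltime, ...)
  env <- list2env(args, envir = new.env(parent = parent))
  # keep the list itself, so templates using resources$foo still work
  assign("resources", args, envir = env)
  env
}

resources <- list(ncpus = 4, walltime = 3600, memory = 2.0)
env <- template_env(resources)

# Template-style expressions evaluated in that environment.
line.new <- eval(quote(sprintf("ncpus=%d walltime=%d", ncpus, walltime)), env)
line.old <- eval(quote(resources$ncpus), env)
```

If this were adopted, name clashes between user-supplied elements and internal template variables would need a precedence rule, which is the downside mentioned above.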
I think it should be 'Slurm' not 'SLURM', cf. http://slurm.schedmd.com/ and https://en.wikipedia.org/wiki/Slurm_Workload_Manager (Wikipedia implies it was SLURM in the past).
This affects some of your function names.
Hi,
makeRegistry and makeExperimentRegistry have two different defaults for conf.file:
makeRegistry = function(file.dir = "registry", work.dir = getwd(), conf.file = "~/.batchtools.conf.R", ...)
makeExperimentRegistry = function(file.dir = "registry", work.dir = getwd(), conf.file ="~/.batchtools.conf.r",...)
(Just the difference between .R and .r)
I don't think this is intended?
Because there is no way (yet) to create a connection which combines its output with a prefix, output is delayed until the job is terminated. Find a better workaround.
We should implement a function to summarize the registry. For instance, it is hard to get the added problems from an experiment registry, so an object containing this and other interesting information about the registry is needed.
as you chose to use Rscript. it is really confusing to users to not do that.
see #15
Hi,
this is a problem that came up recently in a project, and I couldn't find a way to write it down properly with batchtools.
I have a couple of instances and some algos. To simplify, imagine that the problems don't have any params, so I just have p_1, ..., p_k. For the algos I would pre-create, as a data.frame, the different config settings I want to compute and study. But: the algo configs should not be the same for every p_i.
Reason: instead of "variance reduction" (= trying out the same setting for each p_i), I want more "exploration" to learn better how the params affect the algo performance.
Problem: batchtools does not allow this, as I have to specify "algo.design", which is then used for every p_i.
This is conceptually problematic, as what I just outlined is an extremely common, or at least potential, approach in experimental design.
Solution (?):
Instead of always forcing the user to enter prob.design and algo.design and then internally computing the cross product, let the user pass the combined design directly as a single data.frame / data.table. Then the user has complete control.
So in my case I would pass something like
prob.id, algo.id, algo.par.1, algo.par.2, ..., algo.par.m
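A minimal sketch of how such a combined design could be built in plain R: each problem draws its own random subset of configs from a made-up grid of algo params. The grid values, m, and the single algo.id "algo" are illustrative assumptions; how batchtools would accept such a table is exactly the open question of this issue.

```r
set.seed(1)
probs <- c("p_1", "p_2", "p_3")
# Illustrative parameter grid for one algorithm.
grid <- expand.grid(algo.par.1 = c(1, 2, 5), algo.par.2 = c("x", "y"),
                    stringsAsFactors = FALSE)
m <- 2  # number of configs sampled per problem

# One block of rows per problem, each with its own random configs.
design <- do.call(rbind, lapply(probs, function(p) {
  rows <- grid[sample(nrow(grid), m), , drop = FALSE]
  data.frame(prob.id = p, algo.id = "algo", rows,
             row.names = NULL, stringsAsFactors = FALSE)
}))
design
```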
If you want to delete some jobs from the registry, removeExperiments(ids) deletes all result files, not only the results of the specified ids!
Here is an example:
library(batchtools)
reg <- makeExperimentRegistry(file.dir = "test_registry")
subsample <- function(data, job) {
n <- nrow(data)
train <- sample(n, floor(n * 0.5))
test <- setdiff(seq(n), train)
list(test = test, train = train)
}
data("iris", package = "datasets")
addProblem(name = "iris", data = iris, fun = subsample, seed = 123, reg = reg)
forest.wrapper <- function(job, data, instance, ...) {
library("randomForest")
mod <- randomForest(Species ~ ., data = data,
subset = instance$train, ...)
pred <- predict(mod, newdata = data[instance$test, ])
table(data$Species[instance$test], pred)
}
addAlgorithm(reg = reg, name = "forest", fun = forest.wrapper)
minsplit <- c(5, 10, 20, 6)
cp <- c(0.01, 0.1, 0.01, 0.1)
ntree <- c(100, 500, 1000, 200)
design <- data.frame(minsplit = minsplit, cp = cp, ntree = ntree)
algodes <- list( forest = design)
addExperiments(reg = reg, algo.designs = algodes, repls = 2)
summarizeExperiments(reg = reg)
submitJobs()
getStatus()
all_jobs <- getJobPars(reg = reg)
jobs_500 <- subset(all_jobs, all_jobs$ntree == 500)
res_1 <- reduceResults(reg = reg, ids = jobs_500$job.id, fun = function(x, y) c(x, y))
res_1
> removeExperiments(ids = jobs_500$job.id)
Removing 2 Experiments
Cleaning up 1 job definitions
Removing 2 obsolete result files
Removing 2 obsolete log files
job.id
1: 3
2: 4
> res_2 <- reduceResults(reg = reg, fun = function(x, y) c(x, y))
Error in gzfile(file, "rb") : cannot open the connection
In addition: Warning message:
In gzfile(file, "rb") :
cannot open compressed file '/test_registry/results/1.rds', probable reason 'No such file or directory'
It's kind of a hassle to conduct simple experiments with batchtools that are more complicated than batchMap but too simple for the algorithm/problem distinction.
For example, if I have a static problem without parameters, I still have to generate the problem.design list.
At the moment it kind of looks like this:
for (obj.fun in obj.funs){
addProblem(name = getID(obj.fun), data = list(obj.fun = obj.fun))
}
pdes = lapply(obj.funs, function(x) data.frame())
names(pdes) = sapply(obj.funs, getID)
This is kind of suboptimal to read and comprehend.
This issue is part of this JOSS review
In the Pi example vignette, there is the sentence "see one of the other vignettes for an example". Maybe give the title of the vignette that is meant?
This issue is part of this JOSS review
The AppVeyor badge in the README is red and refers to the estimateRuntimes branch, while the build for master is not failing (and the master build is the one you get when clicking on the badge). I'm not sure it's a real problem; I wonder whether it would make more sense to have two badges or to show the AppVeyor badge for the master branch only.
If I try batchtools with the SGE template it fails on our cluster with:
Fatal error occurred: 101. Command 'qsub' produced exit code 1. Output: 'Unable to run job: denied: "528799ad34365a4c3e32ebb3f08bfd98" is not a valid object name (cannot start with a digit)
Which comes from the SGE template file:
#$ -N <%= job.hash %>
The fix can be as simple as prefixing something to the hash:
#$ -N job<%= job.hash %>
I'm not sure if this is a difference between the original Sun Grid Engine and the Son of Grid Engine that we use.
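The same fix as a small R helper, for cases where the job name is built in R code rather than directly in the template. sge_job_name() is hypothetical; it only prefixes names that start with a digit, which is the restriction named in the qsub error above.

```r
# Hypothetical helper: make a job name acceptable to qsub -N.
sge_job_name <- function(hash, prefix = "job") {
  # vectorised: only names starting with a digit get the prefix
  ifelse(grepl("^[0-9]", hash), paste0(prefix, hash), hash)
}

sge_job_name(c("528799ad34365a4c3e32ebb3f08bfd98", "abc123"))
```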
currently one has to write this
reg = makeExperimentRegistry(file.dir = NA, make.default = FALSE)
prob = addProblem(reg = reg, "p1", data = 1)
prob = addProblem(reg = reg, "p2", data = 2)
algo = addAlgorithm(reg = reg, "a", fun = function(...) list(...))
algo.designs = list(a = data.table(par1 = 1, par2 = 2))
addExperiments(reg = reg, prob.designs = list(p1 = data.table()), algo.designs = algo.designs)
BE offered the possibility to shorten the last line to
addExperiments(reg = reg, prob.designs = "p1", algo.designs = algo.designs)
and I would suggest that we allow this again.
This is a simple and useful convenience for the user: if I pass a character vector c("x", "y", "z"), it is internally transformed to list(x = data.table(), y = data.table(), z = data.table()).
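A minimal sketch of that transformation. as_designs() is hypothetical; it uses data.frame() as the default constructor to stay dependency-free, and data.table::data.table could be passed as empty to match the description above. Lists are passed through unchanged.

```r
# Hypothetical helper: expand a character vector into a named list of empty designs.
as_designs <- function(x, empty = data.frame) {
  if (is.character(x)) {
    x <- setNames(lapply(x, function(id) empty()), x)
  }
  x
}

pdes <- as_designs(c("p1", "p2"))
str(pdes)
```

Under that assumption, the shortened call would internally become something like addExperiments(prob.designs = as_designs("p1"), algo.designs = algo.designs).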
Currently the build test on AppVeyor fails. The reason is an issue with r-appveyor, which doesn't load dependencies of dependencies, as mentioned here: krlmlr/r-appveyor#69.
As a workaround, I added the missing packages to the appveyor.yml in commit 40ab0d3 to load these packages manually.
I am getting my results with
reduceResultsDataTable(ids = ids_classif, fun = function(r) as.data.frame(as.list(r)), reg = regis, fill = TRUE)
and it takes about 5 hours (50000 jobs). Is there a way to make it faster, maybe by parallelizing it?
This issue is part of this JOSS review
The link to https://mllg.github.io/batchtools/ (really nice that the package has a website, by the way) could be added to the repository description, so that it is easier to find.
Hi,
I tried to use batchtools on Windows.
I get the following error when calling getStatus after creating my experiment.
> getStatus()
Error: 'mccollect' is not an exported object from 'namespace:parallel'
Looking at the function mccollect, it seems not to be available on Windows:
https://stat.ethz.ch/R-manual/R-devel/library/parallel/html/mcparallel.html
Should I use BatchExperiments instead?
Raphael
This was found by one of our master's students.
Here is a smallish example:
library(batchtools)
reg <- makeExperimentRegistry(file.dir = "test_registry")
subsample <- function(data, job) {
n <- nrow(data)
train <- sample(n, floor(n * 0.5))
test <- setdiff(seq(n), train)
list(test = test, train = train)
}
data("iris", package = "datasets")
addProblem(name = "iris", data = iris, fun = subsample, seed = 123, reg = reg)
forest.wrapper <- function(job, data, instance, ...) {
library("randomForest")
mod <- randomForest(Species ~ ., data = data,
subset = instance$train, ...)
pred <- predict(mod, newdata = data[instance$test, ])
table(data$Species[instance$test], pred)
}
addAlgorithm(reg = reg, name = "forest", fun = forest.wrapper)
minsplit <- c(5, 10, 20, 6)
cp <- c(0.01, 0.1, 0.01, 0.1)
ntree <- c(100, 500, 1000, 200)
design <- data.frame(minsplit = minsplit, cp = cp, ntree = ntree)
algodes <- list( forest = design)
addExperiments(reg = reg, algo.designs = algodes, repls = 2)
summarizeExperiments(reg = reg)
submitJobs()
getStatus()
all_jobs <- getJobPars(reg = reg)
jobs_500 <- subset(all_jobs, all_jobs$ntree == 500)
res_1 <- reduceResults(reg = reg, ids = jobs_500$job.id, fun = function(x, y) c(x, y))
res_1
removeExperiments(ids = jobs_500$job.id)
res_2 <- reduceResults(reg = reg, fun = function(x, y) c(x, y))
[Same results if we use the data.table]
Also, if we specify no ids in removeExperiments, the documentation says that no jobs are deleted, but it seems that all results are deleted as well.
Currently problem and algorithm are returned as characters, whereas nominal parameters are stored as factors in the result of getJobPars. This causes problems for statistical methods like ranger, where data.matrix(data) is used, which returns NA on character columns.
Does it make sense to store everything as factors by default?
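A minimal sketch of the conversion a user could apply before such a call: turn every character column into a factor so that data.matrix() sees integer codes. chars_to_factors() is hypothetical, and the table is a made-up stand-in for getJobPars() output.

```r
# Made-up stand-in for a job parameter table.
pars <- data.frame(job.id = 1:4,
                   problem = c("iris", "iris", "iris", "iris"),
                   algorithm = c("forest", "forest", "svm", "svm"),
                   ntree = c(100, 500, 100, 500),
                   stringsAsFactors = FALSE)

# Hypothetical helper: convert all character columns to factors.
chars_to_factors <- function(df) {
  is.chr <- vapply(df, is.character, logical(1))
  df[is.chr] <- lapply(df[is.chr], factor)
  df
}

# Factor columns become their integer level codes in the numeric matrix.
m <- data.matrix(chars_to_factors(pars))
m
```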
There is no function resetJobs like in BatchJobs. Maybe this could be helpful, although it is also possible to resubmit the jobs just by calling submitJobs(ids) again.
This issue is part of this JOSS review
The JSS paper is mentioned as a resource, which is understandable. I have a few comments/questions on this:
Is there any way to update the JSS paper to mention the new package? Or at least to add a warning in https://www.jstatsoft.org/article/view/v064i11 ?
Could you specify in more detail in the README what is still applicable from the JSS paper (e.g. the section about "We use an ExperimentRegistry where the job definition is split into creating problems and algorithms." according to one batchtools vignette) and what parts of the interfaces changed (besides there now being one single package)? I guess one can get this information by reading NEWS.md and the JSS paper, but NEWS.md contains other information too. This would also make the novelty of the software described in the JOSS paper clearer.
The second paragraph of the abstract of the JSS paper is really nice (the list with letters), I wonder if it is allowed to have it in your README / in an intro vignette too.
The cluster functions rely on issuing a couple of specific system commands. We do this by having (user-configurable) R code in which these commands are generated and then executed.
A better approach might be this:
We should add another layer of abstraction: the batch commands.
This is a small set of aliases / scripts, like this:
btsubmit
btkill
btlist
We ship these, like we did with the cluster functions before, but users could also adapt them if they have to.
This has a couple of advantages:
At some point I would actually also ship some extra commands which only really make sense on the console:
btkill-all: kill everything from the CLI the hard way
btlist: with a better overview option, so you can see what is currently running, and in which state.
One could later even add stuff like show-active-user or show-queues, which we currently have as "bad throw-away" versions for LiDO and which do not work elsewhere.
I fixed some issues in waitForJobs in 7925170. However, the progress bar disappears before all jobs are done. An MWE will follow.
This issue is part of this JOSS review
In the reference section of the package website, each .Rd is an entry, and there are many of them. I wonder if it would make the documentation easier to read in that part if the author used a _pkgdown.yml for creating groups, see e.g. this one with the resulting reference section
One really always wants to have the job and algo ids in the results.
Yes, one can call getJobTable and reduceResultsDataTable and then join / merge the two,
but this is somewhat unintuitive and cumbersome for the absolute default use case.
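For reference, the manual join mentioned above amounts to a merge on job.id. A minimal sketch with two small made-up tables standing in for the parameter table and the reduced results (the column names are assumptions):

```r
# Made-up stand-ins for the job table and the reduced results.
pars <- data.frame(job.id = 1:3, problem = "p1",
                   algorithm = c("a1", "a1", "a2"), stringsAsFactors = FALSE)
res  <- data.frame(job.id = c(1L, 3L), result = c(0.91, 0.87))

# Inner join: only jobs that have a result are kept.
with.ids <- merge(pars, res, by = "job.id")
with.ids
```

Using all.x = TRUE in merge() would instead keep jobs without a result, with NA in the result column.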
Hey, just following up on our discussion at useR about a standardized filename format for automatic lookup of template files. My proposal is to use filenames of the format:
.batchtools.<scheduler>.tmpl
Examples:
.batchtools.slurm.tmpl
.batchtools.torque.tmpl
.batchtools.sge.tmpl
.batchtools.openlava.tmpl
.batchtools.lsf.tmpl
One reason for the <scheduler> part is that I can imagine projects with collaborators who work on different systems, and this would provide (at least some) flexibility to keep multiple template files in the same (Git) repository.
Where should these template files be located? The default could be to search for the first available file in:
. (the current working directory)
~ (the home directory)
I'm hesitant about whether it is a good idea to also fall back to a generic template file provided by the package. The reason I'm not sure is that it might be confusing, and it's not clear whether it will work for all schedulers and all setups. On the other hand, if it is possible to find a good enough template, then it's pretty neat.
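A minimal sketch of the proposed lookup: return the first existing .batchtools.<scheduler>.tmpl in the given directories (current working directory first, then home), or NA if none exists. find_template() is hypothetical, and the search order simply follows the proposal; the optional package fallback is left out.

```r
# Hypothetical lookup following the proposed naming scheme.
find_template <- function(scheduler, dirs = c(".", "~")) {
  candidates <- file.path(path.expand(dirs),
                          sprintf(".batchtools.%s.tmpl", tolower(scheduler)))
  hits <- candidates[file.exists(candidates)]
  # first match wins; NA signals "no template found"
  if (length(hits) > 0L) hits[[1L]] else NA_character_
}

find_template("slurm")
```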
I'm having a problem that occurs only with LSF.
The simplest reproducible code is this:
> btlapply(1:3, function(x) x^2)
Sourcing configuration file '~/.batchtools.conf.R' ...
Adding 3 jobs ...
Error: $ operator is invalid for atomic vectors
My config file and template follow.
I have some experience with BatchJobs, so I have a working LSF setup. Perhaps I do not understand the setup, or is there a problem?
Your help is greatly appreciated.
I will be running tens of millions of jobs with batchtools and will be happy to report results:
###cluster.functions = makeClusterFunctionsLSF()
cluster.functions = makeClusterFunctionsLSF(template="/xxxx/users/yyyy/lsf_bob0.tmpl")
###cluster.functions = makeClusterFunctionsInteractive()
mail.start = "none"
mail.done = "none"
mail.error = "none"
#BSUB-J <%= job.hash %>
Rscript -e 'batchtools::doJobCollection("<%= uri %>")'
This issue is part of this JOSS review
This is again only a suggestion: the JSS paper has a flowchart for BatchExperiments. Maybe a flowchart would make sense to explain the relationships between batchtools functions?
It would be nice to have a handy option to define the resources needed by a specific algorithm or problem within its definition.
Currently you generate everything and then have to select afterwards which jobs to start with which resources.
Further remark: sending jobs with different resource demands in separate batches might lead to underutilized queues, so it would be beneficial to be able to submit jobs in a stratified way.
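A minimal sketch of the stratified idea: interleave job ids from different resource groups round-robin, and look up per-job resources from a per-algorithm table. interleave_ids(), the grouping, and resources.by.algo are hypothetical; whether batchtools can submit with per-job resources in one call is not assumed here.

```r
# Hypothetical helper: order ids round-robin across groups.
interleave_ids <- function(ids, group) {
  # position of each job within its group (1st, 2nd, ...)
  rank.in.group <- ave(seq_along(ids), group, FUN = seq_along)
  # order() is stable, so ties keep their original order
  ids[order(rank.in.group)]
}

ids <- 1:6
algo <- c("a", "a", "a", "b", "b", "c")
resources.by.algo <- list(a = list(memory = 1000), b = list(memory = 8000),
                          c = list(memory = 2000))

ord <- interleave_ids(ids, algo)
# Per-job resource lookup in submission order.
res <- resources.by.algo[algo[match(ord, ids)]]
```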
In BE we had a parallel reduction method, for good reason: for registries with a large number of results, or a non-trivial reduction that does at least a little "computation" or transformation, you don't want to wait for hours (if you are on a parallel system).
Is this supported now? I don't think so.
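A generic sketch of a chunked parallel reduction (not BE's actual implementation): split the results into chunks, reduce each chunk in a worker, then reduce the partial results. This is only valid if f is associative and init is its identity element. parallel::mclapply forks, so mc.cores must stay 1 on Windows.

```r
parallel_reduce <- function(x, f, init, n.chunks = 4L, mc.cores = 1L) {
  # contiguous chunks, so order-sensitive (but associative) f still works
  chunk <- cut(seq_along(x), breaks = min(n.chunks, length(x)), labels = FALSE)
  parts <- split(x, chunk)
  partial <- parallel::mclapply(parts, function(p) Reduce(f, p, init),
                                mc.cores = mc.cores)
  Reduce(f, partial, init)
}

# In a registry, x would be a list of per-job results (e.g. loaded with
# loadResult()); here a plain list of numbers stands in.
total <- parallel_reduce(as.list(1:100), `+`, 0)
```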
This issue is part of this JOSS review
The README states "As a successor of the packages BatchJobs and BatchExperiments, batchtools".
Maybe you could explain why it is a successor of those packages, e.g. what it does better? Is it meant to replace the other two packages?
In the READMEs of those two packages I could not find a reference to batchtools. It might make sense to add a link to batchtools in the README (and even in the documentation, for non-GitHub users) of BatchJobs and BatchExperiments, if batchtools does some tasks better.