License: Other

Makefile 0.01% R 96.25% C 3.73%

bbmisc's Introduction

BBmisc

Miscellaneous helper functions for and from B. Bischl and some other guys at TU Dortmund, mainly for package development.

Official CRAN release site: http://cran.r-project.org/web/packages/BBmisc/index.html
R Documentation in HTML: http://berndbischl.github.io/BBmisc/man
Run this in R to install the current GitHub version:
```
devtools::install_github("berndbischl/BBmisc")
```

bbmisc's Issues

makeDataFrame: do we allow factor as a column type?

If yes, how do we prespecify levels....?

Really needed?

col.names row.names args should allow integers too

Check all methods.

checkArg misses 'lower'

If I understand the documentation correctly, the function "checkArg" should throw errors on these two checks, because the vector contains values outside the given bounds:

cTest <- c(1, 0, 10, 3)
checkArg(cTest, "numeric", lower = 1)
checkArg(cTest, "numeric", upper = 8)
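A minimal base-R sketch of the behaviour the reporter expects: a bound check that fails as soon as any element lies outside the bounds. The helper name `checkBounds` is hypothetical and is not part of BBmisc or checkmate.

```r
# Hypothetical helper, only to illustrate the expected semantics.
checkBounds <- function(x, lower = -Inf, upper = Inf) {
  # fail if ANY element violates a bound, not only the first one
  if (any(x < lower)) stop("some elements are below 'lower'")
  if (any(x > upper)) stop("some elements are above 'upper'")
  invisible(TRUE)
}

cTest <- c(1, 0, 10, 3)
# both calls should signal an error: 0 < 1 and 10 > 8
lowerResult <- tryCatch(checkBounds(cTest, lower = 1),
                        error = function(e) conditionMessage(e))
upperResult <- tryCatch(checkBounds(cTest, upper = 8),
                        error = function(e) conditionMessage(e))
```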

matchDataframeSubset

We currently have an unexported function to find a certain subset of a data.frame which matches against something.

It has a FIXME that it is not used anywhere but it is probably useful in general?

convertListOfRowsToDataFrame must be reread and better tested

list of vectors of the same type must also be supported

system3 popen cannot allocate memory issue

Hi!

I'm testing system3 which I find suiting all my needs in terms of return values. After a number of invocations I start getting:

Error in system2(command = command, args = args, stdout = stdout, stderr = stderr, :
cannot popen ''ciop-publish' -r -m /var/lib/hadoop-0.20/cache/mapred/mapred/local/taskTracker/fbrito/jobcache/job_201401131608_0076/attempt_201401131608_0076_r_000000_0/work/tmp/output/2012-04-04_cluster_1.wkt', probable reason 'Cannot allocate memory'
Calls: rciop.publish -> system3

Have you ever come across this popen issue?

TIA!

Fabrice

checkArg should be removed soon / deprecated

We have checkmate now

Naming conventions in merge

Can we pls only make exceptions for stuff like is.error, where we test for types and so on, and therefore the R naming convention takes preference?

And in all other cases keep camel case?

IE isSuperSet, isSubset?

add nice debug printer for arbitrary objects

Use requireNamespace in requirePackages?

See http://r-pkgs.had.co.nz/namespace.html

Jakob's new equals function

JakobR introduced a new equals operator that handles NAs in a different way.

It is currently in todo-files because:

I do not like the name, as it does not tell me what the operator does
I am not sure how often it is needed
I need to publish a new version.

We can use this thread to discuss it.

Automatically enable progress bar for apply family of functions

extractSubList: bug in use.names

When we simplify to a matrix, we call names() on the result.

This is wrong; we have to set only the col-names or row-names!

Chance for renaming and unifying

Recently we decided to rename BBmisc to HELL. There will be one update to BBmisc soon to inform users about our new package.

In line with the renaming of the package we have the possibility to arrive at an agreement concerning naming conventions. Therefore each developer should go through the currently available functions and point out possible candidates for the renaming procedure.

convertRows/ColsToList bug

It is unclear how use.names is to be interpreted. We have names for rows and cols of the input.
Currently does work perfectly.

Unit test and extend.

all our new apply functions should dispatch to parallelMap

Obvious extension.

Do we have a circular dependency then?

Test do.call2 in other packages

It is pretty nasty to test w.r.t. scoping, but I think we should give it a try in mlr, BatchJobs, etc.

Use case: function arguments given in a list-like structure + more arguments in the current environment. I.e. something like do.call(fun, c(args, list(data=data, foo=bar))) will become do.call2(fun, data=data, foo=bar, .args=args) which will not inflict a copy of data and foo.
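The idea behind this use case can be sketched in base R: build the call from the unevaluated argument expressions and evaluate it in the caller's frame, so that large objects are referenced by name instead of being embedded in the call. This is only an illustration under that assumption; `doCallSketch` is a hypothetical name and not the BBmisc implementation of do.call2.

```r
# Hypothetical sketch, not the actual do.call2 from BBmisc.
doCallSketch <- function(fun, ..., .args = list()) {
  # capture the expressions passed via ... without evaluating them
  dots <- as.list(substitute(list(...)))[-1L]
  # assemble fun(<dots>, <.args>) as a call object
  cl <- as.call(c(list(as.name(fun)), dots, .args))
  # evaluate where the symbols in dots are defined
  eval(cl, envir = parent.frame())
}

data <- data.frame(x = 1:10)
firstRows <- doCallSketch("head", data, n = 3)
viaArgs <- doCallSketch("head", data, .args = list(n = 2))
```

Note that in this sketch the elements of `.args` are still evaluated values; only the arguments passed through `...` stay symbolic.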

makeProgressBar() outputs to stdout - ideally stderr

BBmisc::makeProgressBar() outputs to standard output. The preferred approach for such messages is to write to standard error, cf. message(). The problem with sending this to stdout is that report generators such as Sweave, knitr, R.rsp etc. echo stdout to the generated report. This means that the BBmisc progress bar clutters up the report (and won't be displayed as "progress" to the user while the code is actually running).

I would consider this a major issue or even a critical one, since it renders BatchJobs progress bars useless and unusable for report generators.

EXAMPLE:

> library("BBmisc")
> options(BBmisc.ProgressBar.width=50)
> bar <- makeProgressBar(max=5, label="test-bar")
> bar$inc(1)
test-bar |++++                 |  20% (00:00:00)>
> bfr <- capture.output(bar$inc(1))
> bfr
[1] "\rtest-bar |++++++++             |  40% (00:00:04)"

capture.output() captures stdout - not stderr.

> sessionInfo()
R version 3.1.0 Patched (2014-04-17 r65403)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] BBmisc_1.5

loaded via a namespace (and not attached):
[1] tools_3.1.0
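The difference between the two streams can be reproduced without BBmisc. The two printer functions below are hypothetical stand-ins for a progress bar; the sketch only shows that capture.output() picks up text written with cat() to stdout, but not text written to stderr().

```r
# Hypothetical stand-ins for a progress bar update.
toStdout <- function() cat("\rtest-bar |++++ | 20%")
toStderr <- function() cat("\rtest-bar |++++ | 20%", file = stderr())

# capture.output() captures stdout by default
capturedOut <- capture.output(toStdout())
# nothing is captured here, the text goes straight to stderr
capturedErr <- capture.output(toStderr())
```

A report generator that echoes stdout would therefore include the first variant in the report and leave the second one alone.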

constant vector in normalize, method = range

Is this a bug? It happens for every constant vector.

normalize(c(0, 0), method = "range")
[1] NaN NaN

From a mathematical point of view, it's correct. On the other hand, when normalizing I don't want to think about "uhm ... can my vector be constant?"; I just want to use this function without additional prechecks. Perhaps a better behaviour would be to return the min/max/mean value of the given range?
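One possible shape of the proposed behaviour, as a sketch: range normalization that returns the midpoint of the target range when the input is constant. The function name `normalizeRange` and the choice of the midpoint are assumptions for illustration, not what BBmisc::normalize does.

```r
# Hypothetical sketch, not BBmisc::normalize.
normalizeRange <- function(x, range = c(0, 1)) {
  span <- max(x) - min(x)
  # constant vector: avoid 0 / 0 and fall back to the middle of the range
  if (span == 0) return(rep(mean(range), length(x)))
  (x - min(x)) / span * diff(range) + range[1]
}

constantCase <- normalizeRange(c(0, 0))
regularCase <- normalizeRange(c(1, 2, 3))
```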

Remove deprecated funs in HELL 1.0

stringsAsFactos, convertDfCols, listToShortString

But we need to check this with a rev-dep test first.

add span function for num vecs

That is, max - min, if that really does not exist in R?
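For reference, base R has no single function for this, but composing range() and diff() gives the same value:

```r
x <- c(3, 9, 4)
# range(x) is c(min(x), max(x)), so diff() of it is max(x) - min(x)
spanOfX <- diff(range(x))
```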

showInfo function for console logging messages?

Abstraction and saves if statement in client code.
But currently we handle this by a separate function arg in client code; do we always want to pass that?

rowLapply: why don't we pass list elements directly to fun args

We convert a row to a list, then apply fun to it.

It is often more convenient to directly bind the elements to the args of fun.
This is currently not supported.
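What "binding the elements to the args of fun" could look like in base R, as a sketch: each row is turned into a named list and passed through do.call(), so the column names are matched against the argument names. The function `addCols` is a hypothetical example.

```r
df <- data.frame(a = 1:3, b = c(10, 20, 30))

# hypothetical function whose argument names match the column names
addCols <- function(a, b) a + b

res <- lapply(seq_len(nrow(df)), function(i) {
  # one row as a named list: list(a = ..., b = ...)
  do.call(addCols, as.list(df[i, , drop = FALSE]))
})
```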

add function measureTime

Michel can you add the code here you wanted?

Have requirePackages check package versions

This should be parsed from the description of the package (in particular suggests).
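A sketch of a version check using only requireNamespace() and utils::packageVersion(). The helper name `hasPackageVersion` is hypothetical, and the minimum version is passed in by hand here instead of being parsed from a DESCRIPTION file.

```r
# Hypothetical helper, not part of BBmisc.
hasPackageVersion <- function(pkg, minVersion) {
  # requireNamespace() returns FALSE if the package cannot be loaded
  requireNamespace(pkg, quietly = TRUE) &&
    utils::packageVersion(pkg) >= minVersion
}

statsOk <- hasPackageVersion("stats", "1.0.0")
missingOk <- hasPackageVersion("thisPackageShouldNotExist123", "1.0.0")
```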

EDA error on features page, maxsat12-pms

See here

## Error: length of 'dimnames' [2] not equal to array extent
## Error: if argument 'digits' is a vector of length more than one, it must
## have length equal to 3 ( ncol(x) + 1 )

http://berndbischl.github.io/coseal-algsel-benchmark-repo/task-pages/maxsat12-pms/features.html

Feature Request: Normalize to different ranges

I want to normalize a matrix of values with the range method. I can specify the range for the normalization with the range argument, but I can only define one range, which will be used for every row / col.

Now I have an application in which I want to normalize every col to a different range. So perhaps it would be good to have the possibility to specify a matrix of normalization ranges?

(Of course I can also do it at the moment by mapping normalize to every col of my matrix or multiplying each row with the wanted factor after the normalization. But the code is bad ...)
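The workaround mentioned above, written out as a base-R sketch. The helper `rescale` and the layout of `ranges` (one column per matrix column, first row lower bound, second row upper bound) are assumptions for illustration, not an existing BBmisc interface.

```r
# Hypothetical helper: map x linearly onto [lo, hi].
rescale <- function(x, lo, hi) {
  (x - min(x)) / (max(x) - min(x)) * (hi - lo) + lo
}

m <- cbind(a = c(1, 2, 3), b = c(10, 20, 30))
# assumed layout: row 1 = lower bounds, row 2 = upper bounds
ranges <- cbind(a = c(0, 1), b = c(-1, 1))

res <- sapply(colnames(m), function(j) {
  rescale(m[, j], ranges[1, j], ranges[2, j])
})
```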

coalesce should be able to handle functions

Currently it is not possible to pass functions as parameters to coalesce. The following fails:

x = coalesce(NULL, min)
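For comparison, a minimal sketch of a "first non-NULL" helper that treats functions like any other value. The name `firstNonNull` is hypothetical and this is not the BBmisc implementation; it relies on `...length()` and `...elt()`, which need R 3.5.0 or later.

```r
# Hypothetical sketch, not BBmisc::coalesce.
firstNonNull <- function(...) {
  for (i in seq_len(...length())) {
    el <- ...elt(i)  # evaluates only the i-th argument
    if (!is.null(el)) return(el)
  }
  NULL
}

x <- firstNonNull(NULL, min)
```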

document and test first, last and binpack

tests should be moved to tests/testthat

See https://github.com/hadley/testthat and https://github.com/hadley/testthat/blob/master/NEWS.md. Incompatible with the infamous test_all.R file...

convertColsToList messes with data types

I think this is not how it is supposed to be, is it?

df.tmp = data.frame(a = 1:10, b = letters[1:10])
convertColsToList(df.tmp)
[[1]]
 [1] " 1" " 2" " 3" " 4" " 5" " 6" " 7" " 8" " 9" "10"

[[2]]
 [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
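The padded strings in the output look like what happens when a mixed-type data.frame goes through as.matrix(), although that is an assumption about the cause. In base R, as.list() on a data.frame keeps each column's type:

```r
df.tmp <- data.frame(a = 1:10, b = letters[1:10])

# one list element per column, types preserved
cols <- as.list(df.tmp)

# a matrix has a single type, so mixed columns are coerced to character
asMatrix <- as.matrix(df.tmp)
```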

getRelativepath

Why is this not exported?

Is it finished?

add first / last element function for lists and vectors

methods / is : Remove s4 dead code

In some cases we still call is(...), but we do not import methods.

We really do not use this S4 functionality and want to switch to checkmate later on anyway.

Remove it.

finalize do.call2

allow repeated indexing in extractSublist, when you pass a vector of indices

global univariate optimizer

IMHO this might also go into its own package

Must be done in C.

Here are some pointers:

http://arxiv.org/pdf/1307.3522v1.pdf

http://www.mat.univie.ac.at/~neum/glopt/1d.html

http://www.mat.univie.ac.at/~neum/software/ls/

http://ctt.sbras.ru/interval/Library/Thematic/GlobOptim/GlOptUsingIA-1.pdf

Bug in normalize

The documentation says:
For data.frames, only the numeric columns are touched, all others are left unchanged.

But normalize(iris) drops the class col; all non-numeric cols seem to be dropped.

Can I fix it? Or do you want the function to drop non-numeric cols?
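A sketch of the documented behaviour in base R, where only numeric columns are standardized and everything else is kept. The function name `normalizeNumericCols` is hypothetical and the code is not the BBmisc implementation.

```r
# Hypothetical sketch of "touch only numeric columns".
normalizeNumericCols <- function(df) {
  isNum <- vapply(df, is.numeric, logical(1))
  # replace numeric columns in place, leave the rest untouched
  df[isNum] <- lapply(df[isNum], function(x) (x - mean(x)) / sd(x))
  df
}

res <- normalizeNumericCols(iris)
```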

Helper function to take apart path elements, basename and file extension?
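Base R and the tools package already cover the individual pieces, so such a helper could be a thin wrapper. The name `splitPath` and the returned structure are assumptions for illustration.

```r
# Hypothetical wrapper around existing base/tools functions.
splitPath <- function(path) {
  list(
    dir  = dirname(path),
    base = basename(path),
    name = tools::file_path_sans_ext(basename(path)),
    ext  = tools::file_ext(path)
  )
}

parts <- splitPath("dir/sub/report.txt")
```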

extractSubList(xs, ..., simplify=TRUE) returns a list() if xs is empty

I sometimes use extractSubList with simplify=TRUE as an index into another vector/list:

named_vector[extractSubList(someList, "names")]

This fails in the corner-case when someList is empty, since extractSubList returns a list() in that case. This could be fixed by returning some empty vector or NULL in that case (and if simplify == TRUE).

NULL seems like a good idea at first, but I would argue against it since returning NULL can be dangerous when the user doesn't expect it (x$abc = 1; x$a = unexpected_null() ; x$a gives 1). I would suggest raw(0) or logical(0).
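Both points can be reproduced in base R. The sketch below uses vapply() as a type-stable stand-in for extractSubList (an assumption, not the BBmisc code) and then shows the partial-matching pitfall with NULL.

```r
someList <- list()
named_vector <- c(a = 1, b = 2)

# type-stable extraction: an empty input gives character(0), not list()
idx <- vapply(someList, function(el) el[["names"]], character(1))
sel <- named_vector[idx]  # empty selection, no error

# why returning NULL is risky: assigning NULL is a no-op here,
# and $ then partially matches "a" against "abc"
x <- list()
x$abc <- 1
x$a <- NULL
partial <- x$a
```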

Extend 'choices' test of "checkArg" to vectors

An argument of a function has to contain a vector with defined entries. Currently it is not possible to check this with "checkArg".
The last example should not return a warning.

Examples:
checkArg("One", choices = c("One", "Two", "Three")) # ok
checkArg("1", choices = c("One", "Two", "Three")) # gives expected warning
checkArg(c("One", "Three"), choices = c("One", "Two", "Three")) # ok, but warning
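The requested semantics, every element of the vector must be one of the choices, can be sketched in base R as follows. `checkChoices` is a hypothetical helper and not the checkArg implementation.

```r
# Hypothetical helper: all elements of x must be in choices.
checkChoices <- function(x, choices) {
  bad <- setdiff(x, choices)
  if (length(bad) > 0L) {
    stop("invalid value(s): ", paste(bad, collapse = ", "))
  }
  invisible(TRUE)
}

choices <- c("One", "Two", "Three")
okVector <- checkChoices(c("One", "Three"), choices)
badValue <- tryCatch(checkChoices("1", choices), error = function(e) e)
```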

deep copy function for objects that contain envirs

Make coalesce's documentation more clear about it swallowing errors

> coalesce(stop("I will die before I return NULL"))
NULL

... surprised me. This is a bit of a problem when someone uses coalesce together with a function call that never returns NULL but which may fail, and then expects the return of coalesce to be non-NULL.

dapply extension?

I think it is not really documented, that we will get columns in the result df.

Although if smart one could guess...

I just had a scenario where I wanted the call results (lists) as rows.
Is this still dapply? I mean should we extend its usage for this?

Don't evaluate .args in do.call2

Hi,

currently .args in do.call2 gets evaluated so using .args instead of the ... doesn't really give a speed boost.

You could avoid that by replacing

    c(list(as.name(fun)), ddd, args)

with

    c(list(as.name(fun)), ddd, as.list(substitute(args)[-1]))

EDIT2: The solution used to be a lot more complicated. I'd completely forgotten that I'd already solved the same problem months ago. Benchmarks are for the old version but the differences are minuscule.

Speed comparison (code below). do.call3 is do.call2 with the above modification:

Unit: microseconds
                              expr      min       lq       mean     median        uq        max neval cld
         do.call("head", list(DT)) 3660.877 3881.071 18856.1073 10748.9050 12374.700  73379.548    10  ab
  do.call("head", list(quote(DT)))  599.625  756.372   787.4424   776.4325   821.062    984.653    10  a 
              do.call2("head", DT)  825.105  861.182   894.3355   877.1990   926.805    994.293    10  a 
 do.call2("head", args = list(DT)) 4088.513 7095.027 23175.9467 11260.6695 14250.389 100241.001    10   b
              do.call3("head", DT)  611.443  814.842   850.4215   858.6945   869.269   1065.826    10  a 
 do.call3("head", args = list(DT))  829.771  865.847   918.1589   920.1185   956.661   1016.376    10  a

library(data.table)
library(microbenchmark)

# large table so that copying or evaluating it is measurable
N <- 1e6
DT <- data.table(x = rnorm(N), y = runif(N), z = rpois(N, 1))

microbenchmark(
  do.call("head", list(DT)),
  do.call("head", list(quote(DT))),
  do.call2("head", DT),
  do.call2("head", args = list(DT)),
  do.call3("head", DT),
  do.call3("head", args = list(DT)),
  times = 1e1
)

remove attachFunction clearFunctionEnv

put in todofiles

ensureVector() not so integer friendly

Please check if that is the desired outcome

> x = 3
> ensureVector(x, n = 3)
[1] 3 3 3
> ensureVector(x, n = 3, cl = "integer")
[1] 3
> ensureVector(x, n = 3, cl = "numeric")
[1] 3 3 3
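The output above is consistent with a plain class check, since the literal 3 is a double and not an integer. The sketch below reproduces that behaviour under the assumption that the class test works like inherits(); `ensureVectorSketch` is a hypothetical name and not the BBmisc source.

```r
# Hypothetical sketch: replicate x only if it has class cl.
ensureVectorSketch <- function(x, n, cl = NULL) {
  if (length(x) == 1L && n > 1L && (is.null(cl) || inherits(x, cl))) {
    rep(x, n)
  } else {
    x
  }
}

asDouble <- ensureVectorSketch(3, n = 3, cl = "integer")    # 3 is a double
asInteger <- ensureVectorSketch(3L, n = 3, cl = "integer")  # 3L is an integer
asNumeric <- ensureVectorSketch(3, n = 3, cl = "numeric")
```

Writing the value as 3L would be one way to get the replicated result with cl = "integer".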

rowSapply for matrices?

How do I best do this operation in R?

Given a matrix A,
I would like to apply a function to every row.
This function returns a vector (a new row).

I want the resulting matrix.
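In base R this is usually done with apply() over rows followed by a transpose, because apply() stores each function result as a column. The matrix and the row function `reverseRow` below are hypothetical examples.

```r
A <- matrix(1:6, nrow = 2)  # rows: (1, 3, 5) and (2, 4, 6)

# hypothetical row function that returns a new row
reverseRow <- function(r) rev(r)

# apply() returns the results as columns, so transpose back
res <- t(apply(A, 1, reverseRow))
```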

HELL: redesign apply functions

We REALLY need a more consistent naming style and - more importantly - a CONSISTENT - behavior.

Here some ideas, REALLY not finished, please discuss this:

element A: Type that goes in. Examples:
lists, vectors, df, matrix, array. what else?

-- sub question for A: Depending on the type here, we can apply to different stuff.
Example: rows, cols, or in general a margin for an array.
Is it always clear what we bind to the args of the fun? and in what way?

element B:
extra arguments. Always with more.args or ...? Or what?

element C: Type that should be produced.
Here we can either "simplify" (which often means "guess") or let the user specify the
type. I am strongly in favor of specifying the type, so it is visible in the signature.

--subquestion for C:
Like above we might want to have results as rows or cols.

element D:
How should the names of the input be stuck to the output?

empty stuff:
Often we want to be able to do 0 applications and still get a result of the correct type.
This is often annoying in R. v*apply does this correctly.

Further questions:

Are there packages that already implement this correctly?
What other niceties did we aim for in the current BBmisc apply wrappers?
Have we covered everything here?
Can we automatically display a progress bar, if wanted?
Can we automatically parallelize? Do we want this?
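The "empty stuff" point can be seen directly in base R: with zero applications, sapply() falls back to an empty list, while vapply() returns an empty vector of the declared type.

```r
# zero applications: sapply() cannot guess the type and returns list()
viaSapply <- sapply(list(), function(el) el * 2)

# vapply() knows the type from its FUN.VALUE template
viaVapply <- vapply(list(), function(el) el * 2, numeric(1))
```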