bbmisc's Introduction

bbmisc's People

Contributors

berndbischl, coorsaa, danielhorn, ja-thomas, jakob-r, jakobbossek, mb706, mllg, olafmersmann, studerus, surmann

bbmisc's Issues

Don't evaluate .args in do.call2

Hi,

Currently, .args in do.call2 gets evaluated, so using .args instead of ... doesn't really give a speed boost.

You could avoid that by replacing

    c(list(as.name(fun)), ddd, args)

with

    c(list(as.name(fun)), ddd, as.list(substitute(args)[-1]))

EDIT2: The solution used to be a lot more complicated. I'd completely forgotten that I'd already solved the same problem months ago. Benchmarks are for the old version, but the differences are minuscule.

Speed comparison (code below). do.call3 is do.call2 with the above modification:

Unit: microseconds
                              expr      min       lq       mean     median        uq        max neval cld
         do.call("head", list(DT)) 3660.877 3881.071 18856.1073 10748.9050 12374.700  73379.548    10  ab
  do.call("head", list(quote(DT)))  599.625  756.372   787.4424   776.4325   821.062    984.653    10  a 
              do.call2("head", DT)  825.105  861.182   894.3355   877.1990   926.805    994.293    10  a 
 do.call2("head", args = list(DT)) 4088.513 7095.027 23175.9467 11260.6695 14250.389 100241.001    10   b
              do.call3("head", DT)  611.443  814.842   850.4215   858.6945   869.269   1065.826    10  a 
 do.call3("head", args = list(DT))  829.771  865.847   918.1589   920.1185   956.661   1016.376    10  a 
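
library(BBmisc)          # provides do.call2
library(data.table)
library(microbenchmark)
# do.call3 (do.call2 with the modification above) must be defined separately

N<-1e6
DT<-data.table(x=rnorm(N),y=runif(N),z=rpois(N,1))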

microbenchmark(
  do.call("head", list(DT)),
  do.call("head", list(quote(DT))),
  do.call2("head", DT),
  do.call2("head", args=list(DT)),
  do.call3("head", DT),
  do.call3("head", args=list(DT)),
  times=1e1
)
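
The substitution idea from this issue can be sketched in base R. This is only an illustration, not BBmisc's actual do.call2: doCallLazy is a hypothetical name, and the sketch assumes that .args is written as a literal list(...) call at the call site.

```r
# Hypothetical sketch (not BBmisc's do.call2): build the call from
# unevaluated expressions so that large objects are referenced by name.
doCallLazy = function(fun, ..., .args = list()) {
  # unevaluated expressions passed via ...
  ddd = as.list(substitute(list(...)))[-1L]
  # unevaluated elements of .args; requires a literal list(...) call
  args = as.list(substitute(.args))[-1L]
  cl = as.call(c(list(as.name(fun)), ddd, args))
  # evaluate in the caller's frame, where the referenced symbols live
  eval(cl, envir = parent.frame())
}

x = c(3, 1, 2)
res = doCallLazy("sum", x, .args = list(10))
```

If .args is passed as a variable rather than a literal list(...), substitute() only sees a symbol and this approach would need a fallback.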

ensureVector() not so integer friendly

Please check whether this is the desired outcome:

> x = 3
> ensureVector(x, n = 3)
[1] 3 3 3
> ensureVector(x, n = 3, cl = "integer")
[1] 3
> ensureVector(x, n = 3, cl = "numeric")
[1] 3 3 3
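
A likely explanation, assuming the class check is strict, is that a bare 3 is a double and not an integer. The sketch below shows this with base R and adds a hypothetical integer-friendly variant; ensureVectorSketch is an illustrative name, not BBmisc's implementation.

```r
x = 3
class(x)        # "numeric": a bare 3 is stored as a double
is.integer(x)   # FALSE
is.integer(3L)  # TRUE: only the L suffix creates an integer

# Hypothetical integer-friendly variant (not BBmisc's implementation):
# whole-number doubles are accepted when cl = "integer".
ensureVectorSketch = function(x, n, cl = NULL) {
  matches = is.null(cl) || cl %in% class(x) ||
    (identical(cl, "integer") && is.numeric(x) && all(x == round(x)))
  if (matches && length(x) == 1L) rep(x, n) else x
}

res = ensureVectorSketch(x, n = 3, cl = "integer")
```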

extractSubList(xs, ..., simplify=TRUE) returns a list() if xs is empty

I sometimes use extractSubList with simplify=TRUE as an index into another vector/list:

named_vector[extractSubList(someList, "names")]

This fails in the corner-case when someList is empty, since extractSubList returns a list() in that case. This could be fixed by returning some empty vector or NULL in that case (and if simplify == TRUE).

NULL seems like a good idea at first, but I would argue against it since returning NULL can be dangerous when the user doesn't expect it (x$abc = 1; x$a = unexpected_null() ; x$a gives 1). I would suggest raw(0) or logical(0).
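
Both points can be illustrated in base R. The first part reproduces the partial-matching pitfall with NULL. The second part uses a hypothetical helper, extractChar (not part of BBmisc), which relies on vapply to return a typed empty vector for empty input.

```r
# Why NULL is risky: $ partially matches names on lists.
x = list()
x$abc = 1
x$a = NULL    # assigning NULL creates no element "a"
x$a           # partially matches "abc"

# Hypothetical typed extraction (not BBmisc's extractSubList): vapply
# yields an empty character vector for empty input instead of list().
extractChar = function(xs, element) {
  vapply(xs, function(el) el[[element]], character(1L))
}

named_vector = c(a = 1, b = 2)
idx = extractChar(list(), "names")
res = named_vector[idx]   # empty result, no error
```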

Feature Request: Normalize to different ranges

I want to normalize a matrix of values with the range method. I can specify the range for the normalization with the range argument, but I can only define one range, which will be used for every row / col.

Now I have an application in which I want to normalize every col to a different range. So perhaps it would be good to have the possibility to specify a matrix of normalization ranges?

(Of course I can also do it at the moment by mapping normalize to every col of my matrix or multiplying each row with the wanted factor after the normalization. But that makes the code ugly ...)
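
The workaround of mapping over columns could look roughly like this in base R. scaleToRange and the layout of the ranges matrix (one column per input column, first row lower bound, second row upper bound) are assumptions made for illustration, not BBmisc's interface.

```r
# Hypothetical helper: rescale a vector linearly to [lower, upper].
scaleToRange = function(x, lower, upper) {
  (x - min(x)) / (max(x) - min(x)) * (upper - lower) + lower
}

m = cbind(a = c(1, 2, 3), b = c(10, 20, 40))
# assumed layout: row 1 = lower bounds, row 2 = upper bounds
ranges = cbind(a = c(0, 1), b = c(-1, 1))

res = sapply(seq_len(ncol(m)), function(j) {
  scaleToRange(m[, j], ranges[1L, j], ranges[2L, j])
})
colnames(res) = colnames(m)
```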

HELL: redesign apply functions

We REALLY need a more consistent naming style and - more importantly - CONSISTENT behavior.

Here are some ideas, REALLY not finished, please discuss this:

  • element A: Type that goes in. Examples:
    lists, vectors, df, matrix, array. what else?

-- subquestion for A: Depending on the type here, we can apply to different stuff.
Example: rows, cols, or in general a margin for an array.
Is it always clear what we bind to the args of the fun? And in what way?

  • element B:
    extra arguments. Always with more.args or ...? Or what?
  • element C: Type that should be produced.
    Here we can either "simplify" (which often means "guess") or let the user specify the
    type. I am strongly in favor of specifying the type, so it is visible in the signature.

-- subquestion for C:
Like above we might want to have results as rows or cols.

  • element D:
    How should the names of the input be stuck to the output?
  • empty stuff:
    Often we want to be able to do 0 applications and get a result of the correct type.
    This is often annoying in R. v*apply does this correctly.

Further questions:

  • Are there packages that already implement this correctly?
  • What other niceties did we aim for in the current BBmisc apply wrappers?
    Have we covered everything here?
  • Can we automatically display a progress bar, if wanted?
  • Can we automatically parallelize? Do we want this?
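
Elements C, D and the empty case can be illustrated with base R alone. This is a small sketch of existing behavior, not a design proposal for the new wrappers.

```r
# sapply "guesses" the result type; for empty input it falls back to list().
emptyGuess = sapply(list(), length)

# vapply takes the result type from its signature, even for 0 applications.
emptyTyped = vapply(list(), length, integer(1L))

# Names of the input are carried over to the output.
xs = list(a = 1:3, b = 1:5)
lens = vapply(xs, length, integer(1L))
```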

constant vector in normalize, method = range

Is this a bug? It happens for every constant vector.

normalize(c(0, 0), method = "range")
[1] NaN NaN

From a mathematical point of view, it's correct. On the other hand, when normalizing I don't want to think about "uhm ... can my vector be constant?"; I just want to use this function without additional prechecks. Perhaps a better behaviour would be to return the min/max/mean value of the given range?
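
The NaN comes from dividing 0 by 0 when max and min coincide. A guarded variant could look like the sketch below. normalizeRange is a hypothetical name, and returning the midpoint of the target range for constant input is just one of the options mentioned above.

```r
# Hypothetical range normalization with a guard for constant vectors
# (not BBmisc's normalize).
normalizeRange = function(x, range = c(0, 1)) {
  span = max(x) - min(x)
  # constant input: avoid 0/0 and return the midpoint of the target range
  if (span == 0) return(rep(mean(range), length(x)))
  (x - min(x)) / span * diff(range) + range[1L]
}

unguarded = (c(0, 0) - 0) / 0       # 0/0 gives NaN
guarded = normalizeRange(c(0, 0))
regular = normalizeRange(c(1, 2, 3))
```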

Test do.call2 in other packages

It is pretty nasty to test w.r.t. scoping, but I think we should give it a try in mlr, BatchJobs, etc.

Use case: function arguments given in a list-like structure + more arguments in the current environment. I.e. something like do.call(fun, c(args, list(data=data, foo=bar))) will become do.call2(fun, data=data, foo=bar, .args=args) which will not inflict a copy of data and foo.

dapply extension?

I think it is not really documented that we will get columns in the result df.

Although a smart reader could guess...

I just had a scenario where I wanted the call results (lists) as rows.
Is this still dapply? I mean should we extend its usage for this?

rowSapply for matrices?

How do I best do this operation in R?

  • given a matrix A
  • I would like to apply a function to every row.
  • this function returns a vector (a new row)

I want the resulting matrix.
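
One common base R idiom for this is apply over rows followed by a transpose, because apply stores each result as a column. A minimal sketch, with f standing in for any function that returns a new row:

```r
A = matrix(1:6, nrow = 2)

# example row function: returns a vector of the same length (a new row)
f = function(row) rev(row)

# apply() collects the results column-wise, so transpose to get rows back
res = t(apply(A, 1, f))
```

Note that if f returns a single value per row, apply returns a plain vector and the transpose yields a one-row matrix, so that case would need separate handling.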

convertColsToList messes with data types

I think this is not how it is supposed to be, is it?

df.tmp = data.frame(a = 1:10, b = letters[1:10])
convertColsToList(df.tmp)
[[1]]
 [1] " 1" " 2" " 3" " 4" " 5" " 6" " 7" " 8" " 9" "10"

[[2]]
 [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
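
The output above is consistent with a conversion through as.matrix, which coerces a mixed data.frame to a character matrix. Whether convertColsToList actually does this is an assumption. The sketch contrasts it with as.list, which keeps the column types.

```r
df.tmp = data.frame(a = 1:10, b = letters[1:10], stringsAsFactors = FALSE)

# as.matrix on mixed columns yields a character matrix
m = as.matrix(df.tmp)
typeof(m)

# as.list keeps each column's own type
cols = as.list(df.tmp)
typeof(cols$a)
```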

convertRows/ColsToList bug

It is unclear how use.names is to be interpreted. We have names for rows and cols of the input.
Currently this does not work perfectly.

Unit test and extend.

Jakob's new equals function

JakobR introduced a new equals operator that handles NAs in a different way.

It is currently in todo-files because:

  • I do not like the name, as it does not tell me what the operator does
  • I am not sure how often it is needed
  • I need to publish a new version.

We can use this thread to discuss it.

methods / is : Remove s4 dead code

In some cases we still call is(...), but we do not import methods.

We really do not use this S4 functionality and want to switch to checkmate later on anyway.

Remove it.

checkArg misses 'lower'

If I understand the documentation correctly, the function "checkArg" should throw errors on these two checks, because the vector violates the boundaries:

cTest <- c(1, 0, 10, 3)
checkArg(cTest, "numeric", lower = 1)
checkArg(cTest, "numeric", upper = 8)
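
The expected behavior can be written down as a small base R sketch. checkBounds is a hypothetical stand-in for the bounds part of checkArg, not its real implementation.

```r
# Hypothetical bounds check: error if any element lies outside [lower, upper].
checkBounds = function(x, lower = -Inf, upper = Inf) {
  if (any(x < lower)) stop("some elements are below 'lower'")
  if (any(x > upper)) stop("some elements are above 'upper'")
  invisible(TRUE)
}

cTest = c(1, 0, 10, 3)
# both calls should fail: 0 < 1 and 10 > 8
lowerResult = tryCatch(checkBounds(cTest, lower = 1), error = function(e) "error")
upperResult = tryCatch(checkBounds(cTest, upper = 8), error = function(e) "error")
```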

Bug in normalize

The documentation says:
For data.frames, only the numeric columns are touched, all others are left unchanged.

But normalize(iris) drops the class col; all non-numeric cols seem to be dropped.

Can I fix it? Or do you want the function to drop non-numeric cols?
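
The documented behavior could be sketched as follows. normalizeNumericCols is a hypothetical helper that standardizes the numeric columns (mean 0, sd 1) and leaves all others in place. It is not the actual fix applied in BBmisc.

```r
# Hypothetical sketch: touch only numeric columns, keep the rest unchanged.
normalizeNumericCols = function(df) {
  num = vapply(df, is.numeric, logical(1L))
  df[num] = lapply(df[num], function(x) (x - mean(x)) / sd(x))
  df
}

res = normalizeNumericCols(iris)
names(res)   # same columns as iris, including Species
```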

Naming conventions in merge

Can we pls only make exceptions for stuff like is.error, where we test for types and so on, and therefore the R naming convention takes preference?

And in all other cases keep camel case?

I.e. isSuperSet, isSubset?

Chance for renaming and unifying

Recently we decided to rename BBmisc to HELL. There will be one update to BBmisc soon to inform users about our new package.

In line with the renaming of the package we have the possibility to arrive at an agreement concerning naming conventions. Therefore each developer should go through the currently available functions and point out possible candidates for the renaming procedure.

Extend 'choices' test of "checkArg" to vectors

An argument of a function has to contain a vector with defined entries. Currently it is not possible to check this with "checkArg".
The last example should not return a warning.

Examples:
checkArg("One", choices = c("One", "Two", "Three")) # ok
checkArg("1", choices = c("One", "Two", "Three")) # gives expected warning
checkArg(c("One", "Three"), choices = c("One", "Two", "Three")) # ok, but warning
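
In base R, the vectorized check requested here amounts to a subset test. A minimal sketch, with isInChoices as a hypothetical helper name:

```r
# Hypothetical helper: TRUE if every element of x is one of the choices.
isInChoices = function(x, choices) {
  all(x %in% choices)
}

choices = c("One", "Two", "Three")
single = isInChoices("One", choices)
invalid = isInChoices("1", choices)
several = isInChoices(c("One", "Three"), choices)
```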

makeProgressBar() outputs to stdout - ideally stderr

BBmisc::makeProgressBar() outputs to the standard output. The preferred approach is to output such messages to standard error, cf. message(). The problem with sending this to stdout is that report generators such as Sweave, knitr, R.rsp etc. echo stdout to the generated report. This means that the progress bar of BBmisc clutters up the report (and won't be displayed as "progress" to the user while actually running).

I would consider this a major issue or even a critical one, since it renders BatchJobs progress bars useless and unusable for report generators.

EXAMPLE:

> library("BBmisc")
> options(BBmisc.ProgressBar.width=50)
> bar <- makeProgressBar(max=5, label="test-bar")
> bar$inc(1)
test-bar |++++                 |  20% (00:00:00)>
> bfr <- capture.output(bar$inc(1))
> bfr
[1] "\rtest-bar |++++++++             |  40% (00:00:04)"

capture.output() captures stdout - not stderr.

> sessionInfo()
R version 3.1.0 Patched (2014-04-17 r65403)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] BBmisc_1.5

loaded via a namespace (and not attached):
[1] tools_3.1.0
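
The difference between the two streams can be checked with base R alone. The sketch only demonstrates cat() versus message() and makes no claim about how makeProgressBar is implemented internally.

```r
# cat() writes to stdout by default, so capture.output() sees it
out = capture.output(cat("progress to stdout\n"))

# message() writes to stderr, so nothing is captured here
err = capture.output(message("progress to stderr"))

length(out)
length(err)
```

Writing the bar with message() or cat(..., file = stderr()) would therefore keep it out of captured report output.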

system3 popen cannot allocate memory issue

Hi!

I'm testing system3, which I find suits all my needs in terms of return values. After a number of invocations I start getting:

Error in system2(command = command, args = args, stdout = stdout, stderr = stderr, :
cannot popen ''ciop-publish' -r -m /var/lib/hadoop-0.20/cache/mapred/mapred/local/taskTracker/fbrito/jobcache/job_201401131608_0076/attempt_201401131608_0076_r_000000_0/work/tmp/output/2012-04-04_cluster_1.wkt', probable reason 'Cannot allocate memory'
Calls: rciop.publish -> system3

Have you ever come across this popen issue?

TIA!

Fabrice

matchDataframeSubset

We currently have an unexported function to find a certain subset of a data.frame which matches against something.

It has a FIXME that it is not used anywhere but it is probably useful in general?
