
magrittr's Introduction

magrittr


Overview

The magrittr package offers a set of operators which make your code more readable by:

  • structuring sequences of data operations left-to-right (as opposed to from the inside out),
  • avoiding nested function calls,
  • minimizing the need for local variables and function definitions, and
  • making it easy to add steps anywhere in the sequence of operations.

The operators pipe their left-hand side values forward into expressions that appear on the right-hand side, i.e. one can replace f(x) with x %>% f(), where %>% is the (main) pipe-operator. When coupling several function calls with the pipe-operator, the benefit will become more apparent. Consider this pseudo example:

the_data <-
  read.csv('/path/to/data/file.csv') %>%
  subset(variable_a > x) %>%
  transform(variable_c = variable_a/variable_b) %>%
  head(100)

Four operations are performed to arrive at the desired data set, and they are written in a natural order: the same as the order of execution. Also, no temporary variables are needed. If yet another operation is required, it is straightforward to add to the sequence of operations wherever it may be needed.

If you are new to magrittr, the best place to start is the pipes chapter in R for data science.

Installation

# The easiest way to get magrittr is to install the whole tidyverse:
install.packages("tidyverse")

# Alternatively, install just magrittr:
install.packages("magrittr")

# Or the development version from GitHub:
# install.packages("devtools")
pak::pak("tidyverse/magrittr")

Usage

Basic piping

  • x %>% f is equivalent to f(x)
  • x %>% f(y) is equivalent to f(x, y)
  • x %>% f %>% g %>% h is equivalent to h(g(f(x)))

Here, “equivalent” is not technically exact: evaluation is non-standard, and the left-hand side is evaluated before being passed on to the right-hand side expression. However, in most cases this has no practical implication.

The argument placeholder

  • x %>% f(y, .) is equivalent to f(y, x)
  • x %>% f(y, z = .) is equivalent to f(y, z = x)

Re-using the placeholder for attributes

It is straightforward to use the placeholder several times in a right-hand side expression. However, when the placeholder only appears in nested expressions, magrittr will still apply the first-argument rule. The reason is that in most cases this produces cleaner code.

x %>% f(y = nrow(.), z = ncol(.)) is equivalent to f(x, y = nrow(x), z = ncol(x))

The behavior can be overruled by enclosing the right-hand side in braces:

x %>% {f(y = nrow(.), z = ncol(.))} is equivalent to f(y = nrow(x), z = ncol(x))

Building (unary) functions

A pipeline starting with the dot placeholder (.) will return a function which can later be used to apply the pipeline to values. Building functions in magrittr is therefore similar to building other values.

f <- . %>% cos %>% sin 
# is equivalent to 
f <- function(.) sin(cos(.)) 

Pipe with exposition of variables

Many functions accept a data argument, e.g. lm and aggregate, which is very useful in a pipeline where data is first processed and then passed into such a function. There are also functions that do not have a data argument, for which it is useful to expose the variables in the data. This is done with the %$% operator:

iris %>%
  subset(Sepal.Length > mean(Sepal.Length)) %$%
  cor(Sepal.Length, Sepal.Width)
#> [1] 0.3361992

data.frame(z = rnorm(100)) %$%
  ts.plot(z)

Code of Conduct

Please note that the magrittr project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

magrittr's People

Contributors

ajschumacher, batpigandme, bfgray3, casallas, cathblatter, davisvaughan, dchudz, dlebauer, gaborcsardi, hadley, hughparsonage, isomorphisms, jdnewmil, jimhester, kevinushey, leerssej, lionel-, mpadge, pkq, prosoitos, robertzk, romainfrancois, rsaporta, salim-b, smbache, steromano, tonytonov, trevorld, wibeasley

magrittr's Issues

Shorthands for multiply_by et al.

The "operator replacement" function multiply_by and its companions are helpful but too wordy in my opinion. How about defining, say,

`%>*%` <- `*`
`%>+%` <- `+`

Then, instead of 5 %>% add(2) %>% multiply_by(3), it's simply 5 %>+% 2 %>*% 3. This is shorter, and readability could be improved with syntax highlighting in the editor. Of course:

> 5 %>+% 2 %>*% 3
[1] 21
> 5 + 2 * 3
[1] 11

but that's intended. (This issue is a bit of a counterpart to #19.)

Piping with function operators

Hello,

I'm trying to figure out how to use function operators with magrittr. For example, I would like to use a function such as plyr::colwise in a chain of pipes:

data %>% colwise(mean)

This does not work since it is interpreted as colwise(mean, data). A workaround is

data %>% colwise(mean)()

But this is ugly and not very legible.

So I thought to do:

data %>% (colwise(mean))

to force the evaluation of the function operator, but it did not work.
It was interpreted as `(`(data, colwise(mean)).

One possibility to make the pipe operator compatible with this behavior is to add the following line in its source code:

else {
  if (identical(rhs[[1]], quote(`(`))) rhs <- rhs[2]     ## Add this line
  dots <- c(FALSE, vapply(rhs[-1], identical, quote(.),
                          FUN.VALUE = logical(1)))

Wrapping the function operator with parentheses is still not ideal in my opinion, because one purpose of piping is to limit the number of parentheses around. However, I find it much better than adding empty parentheses at the end. Maybe someone will have a better idea.

A place for compose

Currently considering whether there is a place for a compose function in magrittr, although one may say it's not exactly composition (ceci n'est pas un composite!). To try it out, I have implemented it in the dev branch. The idea is the following: when creating pipelines, there are steps that may well occur in other pipelines later on, and therefore really should be composed into a re-usable step (function). I was not interested in traditional (simple) composition, but wanted something more magrittr-like. Furthermore, suppose you have three functions, f(x), g(y), h(z). Then in a call h(g(f(x))), h will not know x or f(x), only g(f(x)). In the implementation here, one can access the inputs to previous sub-functions in the later functions. To illustrate a few examples:

# Compose a mean-absolute error function. 
# note that the default argument to each sub-function is `.`
mae <- compose(abs, mean(., na.rm = TRUE))
rnorm(100) %>% mae

# the above is equivalent to the following
mae <- compose(abs, x ~ mean(x, na.rm = TRUE))
mae <- compose(. ~ abs(.), x ~ mean(x, na.rm = TRUE))

# The following example shows how to name and access inputs in later functions:
f <- compose(x ~ sin(x), y ~ cos(x*y))
f(1:100)  

In general the proposed syntax for each component in compose is

  • symbol will be converted into function(.) symbol(.)
  • a call will be converted into function(.) call
  • a formula call will name the argument, e.g. x ~ somefun(x, arg1, arg2, ...) will be function(x) somefun(x, arg1, arg2, ...).

The direction, as indicated in the examples, is that the results of the left-most functions are piped along into the right-most functions...

Edit: I have also added an operator, %,% (any ideas for a better name?), which allows e.g.

somefun <- abs %,% cos %,% sum(., na.rm = TRUE)

Also, since the auto-generated and nested nature of the composed functions is quite hard to decipher, I added a generic print method for composed functions, for prettier output.

Still not really sure whether this stuff belongs in magrittr?!
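
For readers who want to experiment, here is a minimal sketch of left-to-right composition in the same spirit; it assumes plain functions only and does not implement the formula syntax or argument naming described above:

compose2 <- function(...) {
  fns <- list(...)
  # apply the functions left to right, starting from the input value
  function(.) Reduce(function(value, f) f(value), fns, .)
}

mae <- compose2(abs, function(.) mean(., na.rm = TRUE))
rnorm(100) %>% mae()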

Create backpipe %<% operator

There are times when it is desirable to reverse the direction of the pipe operator to match the nesting of parentheses. The motivating example is a shiny application in which the shiny commands build up HTML. A backpipe operator would preserve the hierarchy of the HTML while cleaning up the code immensely. Consider:

require(shiny)
# Original Shiny: 
div( class="color-green",  h1( "Header 1" ) )

# Current magrittr:
h1( "Header 1" ) %>% div( class="color-green" )    

# proposed 
div( class="color-green" )  %<% h1( "Header 1" ) 

The proposed syntax not only makes the code more readable, but also preserves the hierarchical nature of the HTML, so the code more closely matches the generated HTML.

(BTW, I am already doing this with a shiny application by simply cloning %>% and reversing the order of the arguments. I have not investigated how this would work with the changes for 1.1.0. I am happy to submit code if @smbache thinks this is a useful feature.)
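
A minimal sketch of that clone-and-reverse approach (assuming magrittr is attached):

`%<%` <- function(lhs, rhs) eval.parent(substitute(rhs %>% lhs))

div(class = "color-green") %<% h1("Header 1")
# evaluates as h1("Header 1") %>% div(class = "color-green")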

Extract "curry" operation -- or a "tee" operator

The semantics of the %>% operator can be interpreted as follows:

  1. Apply a "curry" operation to the right-hand side, returning a function with one parameter.
  2. Call this function.

The %>% operator is very useful in itself, however, sometimes I wish I could isolate step 1. Assume the following simplified scenario:

x <- rnorm(100)
x <- Filter(function(x) x > 0, x)

I prefer the Filter to the redundant x[x > 0]. How would one write this with the help of magrittr::is_greater_than?

  • If we had access to the "curry" functionality in step 1:

    x <- Filter(curry(is_greater_than(., 0)), x)
    

    Or, shorter and more readable:

    x <- Filter(curry(. > 0), x)
    
  • If there was a "tee" operator %tee% that returns both input and output in a list:

    x <- x %tee% is_greater_than(., 0) %>% do.call(extract, .)
    

I think the first option is easier to read.

I'm aware of functional::Curry, but it'd be really great to have a variant that follows the rules in magrittr, to be able to easily interchange currying and piping.
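
As an aside, more recent magrittr versions get close to the requested currying via functional sequences: a pipeline starting with . is itself a function (sketch, assuming magrittr >= 1.5):

library(magrittr)
x <- rnorm(100)
x <- Filter(. %>% is_greater_than(0), x)   # the functional sequence acts as the curried predicate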

Handling two (or more?) streams simultaneously

This may be taking pipelines a bit too far but it may also be nice to have:

A common scenario I encounter is that I have two data sources which need to be processed in a certain way, and are then both fed into one function. As an example, consider the case of two non-tidy data sources which I want to join:

a = data.frame(A=1:3, B=2:4, C=3:5)
b = data.frame(A=1:3, B=2:4, C=3:5) * 2
aa = melt(a, value.name='A')
bb = melt(b, value.name='B')
result = inner_join(aa, bb, by  = 'variable')

Somehow I imagine that it could be more readable to write this as a pipeline (at least once the simple melt operation here becomes a more complex chain of operations):

result = {
    a %>%
    … %>%
    melt(value.name = 'A')
} %++%
{
    b %>%
    … %>%
    melt(value.name = 'B')
} %>%
inner_join(by = 'variable')

The %++% would somehow construct a stream “tuple” here.

I’m not even convinced myself that this has any advantage over the first, stream-less variant. I’m also not aware of any other programming languages which would support this, so there may not be much precedent (but visual languages where the developer constructs streams, such as Knime, do of course support this).
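
A rough sketch of what %++% could look like (hypothetical operator; it assumes reshape2 and dplyr for melt and inner_join, and omits the elided … steps):

`%++%` <- function(lhs, rhs) list(lhs, rhs)   # pair the two processed streams

result <- {
    a %>% melt(value.name = "A")
  } %++% {
    b %>% melt(value.name = "B")
  } %>%
  { inner_join(.[[1]], .[[2]], by = "variable") }   # the braced step unpacks the pair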

Simpler argument for set_names aliases

At the moment, the functions for setting names, colnames and rownames are direct aliases to the corresponding primitive functions. As a result, this requires wrapping the names in a vector, i.e.:

data %>% set_names(c("name1", "name2"))

I think the c() is superfluous and makes the code slightly less easy to read. It may make sense to implement set_names in the following way:

set_names <- function(x, ...) {
  value <- c(...)
  names(x) <- value
  x
}

This way, we could simply call set_names like this:

data %>% set_names("name1", "name2")

FR: promote %>% to a normal operator and move to R core

%>% conflict with %>% from the operators package,

I know this is a problem with R's poor exporting functions, but I do have a suggested solution. Magrittr's %>% operator is so useful that it should be pushed into R core and given either its own operator perhaps
bullet • ascii(149) � -or-
middle dot · ascii(183) ·

%>% is a more flexible variant of the de-referencing operators in other languages.

It would be necessary to choose a character that renders well in RStudio, vim, emacs, ... and prints well on Stack Overflow, GitHub, Google Groups, etc.

`%<>%` is evaluated in wrong environment

%<>% breaks when the user redefines <- because it evaluates the assignment (via <-) in the caller’s environment. This shouldn’t happen; instead, while both lhs and rhs should be evaluated in the caller’s environment, <- should be evaluated in the package’s environment (or ditched in favour of assign).

I’d submit a PR but pipe looks sufficiently complicated that I prefer not to meddle. I do believe it would be enough to replace the assignment statement with

assign(deparse(lhs), result[["value"]], envir = parent)

Problem when using missing() together with pipes

I just stumbled upon an odd error caused by missing() when implementing magrittr pipes in my package; below is an example that illustrates it:

fa <- function(a, b){
  a %>% 
    fb(b=b)
}
fb <- function(a, b){
  if(missing(b)){
    print("missing")
  }else{
    print(b)
  }
}

fa(a=1)

Throws: Error in print(b) : argument "b" is missing, with no default

PS. Thank you for this excellent package - I'm excited as a kid on X-mas...

Piping print() to ggplot doesn't work in a for loop

In a for loop, you explicitly need to use print() in order to display ggplots. It doesn't seem to work here, however; it prints only the metainformation.

for (i in 1:3) {
  ggplot(data.frame(x=rnorm(1e3)), aes(x=x)) + geom_histogram() %>%
    print()
}
## geom_histogram:
## stat_bin:
## position_stack: (width = NULL, height = NULL)
## geom_histogram:
## stat_bin:
## position_stack: (width = NULL, height = NULL)
## geom_histogram:
## stat_bin:
## position_stack: (width = NULL, height = NULL)

# This works though.
for (i in 1:3) {
  print(ggplot(data.frame(x=rnorm(1e3)), aes(x=x)) + geom_histogram())
}

On magrittr version 1.5, ggplot2 version 1.0.0.
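
The likely cause is operator precedence: %>% binds tighter than +, so only geom_histogram() is piped into print(), which matches the metainformation shown above. A sketch of a fix is to parenthesise the whole plot before piping:

for (i in 1:3) {
  (ggplot(data.frame(x = rnorm(1e3)), aes(x = x)) + geom_histogram()) %>%
    print()
}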

New alias for `[[<-`

I cannot find a functional equivalent of [[<-; maybe there isn't one. In that case it would be nice to have one, so that I can write

list(a = 1, b = 2, c = 3) %>%
  set(b = "foo")

or maybe rather

list(a = 1, b = 2, c = 3) %>%
  set("b", "foo")

The function replace is actually almost good, but it drops the attributes of the value, so

foo <- "foo"
class(foo) <- "bar"
list(a = 1, b = 2, c = 3) %>%
  replace("b", foo)

only works if I write

foo <- "foo"
class(foo) <- "bar"
list(a = 1, b = 2, c = 3) %>%
  replace("b", list(foo))

Btw. while at it, maybe it makes sense to define set and set2, analogously to extract and extract2.

Error when installing vignettes?

using devtools::install_github("smbache/magrittr", build_vignettes = TRUE) gives an error:

Quitting from lines 40-48 (magrittr.Rmd)
Error: processing vignette 'magrittr.Rmd' failed with diagnostics:
:8:1: unexpected symbol
7:
8: weekly
^
Execution halted

Do I need to use different arguments to install_github ?

problems with $ operator

Sorry if this is a duplicate, I think I saw it reported somewhere else but can't find it again.

mtcars %>% cbind(.$carb,.)
Error in .$carb : object of type 'closure' is not subsettable

I was wondering if this is a bug or another "design" decision that will make me a better programmer?

.[[ in list()

Is this expected behavior?

iris %>%
  list(.[[1]], .[[2]]) %>%
  str

# List of 3
# $ :'data.frame':  150 obs. of  5 variables:
#   ..$ Sepal.Length: num [1:150] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
# ..$ Sepal.Width : num [1:150] 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
# ..$ Petal.Length: num [1:150] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
# ..$ Petal.Width : num [1:150] 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
# ..$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
# $ : num [1:150] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
# $ : num [1:150] 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...

NSE versions of some common aliases, e.g. extract

Some aliases, e.g. extract <- '[', while useful, are a little clumsy when used in pipelines.
Here is an example, slice, which is like extract, but adds a little "sugar". In particular, it allows for using column names in data.frame objects (via NSE) and has two keywords which can be used in indexing: .last and .all, see below.

Not sure if this kind of stuff is too fancy for magrittr or should be placed somewhere else. On the one hand, I'd like to keep magrittr as small as possible, on the other I'd often like to use this myself.

#' Slice a rectangular object
#'
#' This is an alternative to "[" for slicing data.frames or matrices. For
#' data.frames, \code{slice} supports non-standard evaluation in the sense that
#' column names are directly available, which is useful in pipelines. There are
#' also two new keywords: \code{.last}, which evaluates to the number of rows or
#' columns respectively, and \code{.all}, which evaluates to
#' \code{seq_len(NROW(.))} and \code{seq_len(NCOL(.))}, respectively. The index
#' arguments default to NULL, so there is no need to add commas when e.g.
#' \code{j} is not used.
#' 
#' @param . data.frame or matrix
#' @param i the row index/indices or slicing expression
#' @param j the column index/indices or slicing expression
#' @return a subset of the original data argument
#' @export
#' @examples 
#' iris %>% slice(1:10)
#' iris %>% slice(100:.last)
#' iris %>% slice(Species == "setosa")
#' iris %>% slice(.all, 3:.last)
#' iris %>% slice(, 3:.last)
#' iris %>% slice(.all %% 2 == 0)
#' iris %>% slice(1:5, .all %% 2 == 0)
slice <- function(., i, j)
{
  if (missing(i))
    i <- seq_len(NROW(.))
  else 
    i <- eval(call("substitute", substitute(i), 
                   list(.last = NROW(.), .all = seq_len(NROW(.)))))

  if (missing(j))
    j <- seq_len(NCOL(.))
  else 
    j <- eval(call("substitute", substitute(j), 
                   list(.last = NCOL(.), .all = seq_len(NCOL(.)))))

  if (is.data.frame(.))
    eval(call("[", ., i, j), ., parent.frame())
  else
    eval.parent(call("[", ., i, j))
}

How to avoid piping to first argument?

Consider the following function:

f1 <- function(x,y,z="hello") {
     cat("x =",x,"\n")
     cat("y =",y,"\n")
     cat("z =",z,"\n")
}
> 1:10 %>% f1(mean(.),length(.))
x = 1 2 3 4 5 6 7 8 9 10 
y = 5.5 
z = 10 

If I really want f1(mean(1:10), length(1:10)), how can I do that?
Oh, a lambda expression can do that.

> 1:10 %>% (l(f1(mean(.),length(.))))
x = 5.5 
y = 10 
z = hello 

Is there a cleaner way?
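
Another option with current magrittr is to wrap the right-hand side in braces, which suppresses the first-argument rule:

> 1:10 %>% { f1(mean(.), length(.)) }
x = 5.5 
y = 10 
z = hello 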

Matching the lhs in dplyr vs magrittr

Not really an issue but more of a general comment.
Using the dplyr %.% operator, I got used to referencing the left-hand side in a chain as __prev, e.g.

iris %.% mutate(rownames = rownames(`__prev`))

This always felt a bit hacky, but actually worked fine in my experience.
While I do like the idea of replacing __prev with a nicer placeholder symbol, I am not convinced that . is best for the job. It seems that using . forces the implementation to only match the outermost call, in order to allow for the same symbol to be used with a different meaning in inner formulas. So we can do stuff like

iris %>% aggregate(. ~ Species, ., mean)

(which is not particularly readable imo), but to reproduce my above dplyr code in magrittr I would need something like

iris %>% l(x -> mutate(x, rownames = rownames(x)))

(is there a better way?).
I was wondering if using a slightly "heavier" placeholder - maybe ._ or something like that - and actually having it matched in the full rhs, like __prev in dplyr, would turn out more powerful/readable.

strange issue using pipes and qplot

All of the following calls

rnorm(1000) %>% qplot
rnorm(1000) %>% qplot(.)
rnorm(1000) %>% {qplot(.)}

yield the following error:

Error in as.data.frame.default(x[[i]], optional = TRUE, stringsAsFactors = stringsAsFactors) : 
  cannot coerce class "c("gg", "ggplot")" to a data.frame

Strangely, however, the following works

rnorm(1000) %>% {x=.; qplot(x)}

but that breaks the pipe magic... something else that works is debug(qplot), then calling str(x) once debugging, which makes me think there may be some sort of lazy evaluation happening (perhaps in ggplot2?) which doesn't play nicely with the . placeholder?

I can confirm that this wasn't an issue in magrittr 1.1.0

Calling return() from a pipe

I guess this cannot be implemented, but maybe I am missing something:

f <- function() {
   "this is the pipe" %>% return()
   "no, it is not"
}
f()
# [1] "no, it is not"

R CMD check and no visible binding for global variable '.'

This is more of a question than an issue. You are probably aware of R CMD check complaining about the dot. I could just ignore the complaints of course, but they sometimes signal real errors, so every time I see the NOTE in the output, I need to check.

I just put this in the package:

. <- "Shut up"

Do you see any potential problems with this? It seems to me that it will not interfere with the pipes, but I don't know much about magrittr internals.
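
Another common approach, assuming this lives in package code, is to declare the dot via utils::globalVariables(), e.g. in R/zzz.R:

if (getRversion() >= "2.15.1") utils::globalVariables(".")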

Wrong precedence of %>% vs $

magrittr fiddles with the expected precedence of %>% and $. Consider the following MWE:

x = list(f = function (x) x)
1 : 10 %>% x$f

If %>% is defined naively1, this works. If it’s imported from magrittr, we get the error message

Error in `1:10`$lst : 3 arguments passed to '$' which requires 2

I’m guessing this is due to the fact that magrittr needs to handle . on the right-hand side of %>% (e.g. foo %>% .$bar). However, it’s unintuitive.

I’m aware that there are (at least) two workarounds:

1 : 10 %>% x$f()
1 : 10 %>% (x$f)

Nevertheless, I’m wondering whether it wouldn’t be possible (and desirable) to make the initial syntax work. In particular, it’s quite puzzling to the user that the second workaround differs semantically from the initial (non-working) code, since the use of parentheses here conveys the same operator precedence that R uses when parsing the non-parenthesised expression.

This issue has cropped up in combination with modules (klmr/modules#43) but it’s equally relevant when programming with Reference Classes.


1 ``%>% = function (x, f) f(x)

%>% does not work well with data.table

I am trying to use %>% with data.tables but I had problems:

x = data.table(cars)
25 %>% x[speed == .] # Fails with: Error in function_list[[k]](value) : object 'speed' not found

25 %>% {x[speed == .]} # Ok

New operator for lapply/sapply/vapply, etc.

Just an idea. How about having an operator that automagically does lapply/sapply on a chain? I end up writing code like this many times:

filenames <- files %>% 
  sapply(lambda(x -> rename(x$metadata$.path))) %>%
  no_dup_names   

and instead of this it would be nice to apply the lambda to each element of the list/vector, with some simpler syntax. E.g.

filenames <- files %[[>%
  rename(.$metadata$.path) %>%
  no_dup_names

I guess it is challenging to make this work without introducing a new operator for each of the *apply commands (and there are many), but maybe having lapply() and sapply() is reasonable. The lapply operator could be %[>%, and the sapply operator could be %[[>%.

I realize I could also do

filenames <- files %>%
  lapply(extract2, "metadata") %>%
  sapply(extract2, ".path") %>%
  no_dup_names

but this is also somewhat cumbersome.
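
A rough sketch of how such an operator could be prototyped (hypothetical name %[>%; files, rename and no_dup_names are the user-defined objects from the example above):

`%[>%` <- function(lhs, rhs) {
  # turn the right-hand side expression into a unary function of `.`
  f <- eval.parent(substitute(function(.) rhs))
  lapply(lhs, f)
}

filenames <- files %[>% rename(.$metadata$.path) %>% no_dup_names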

A variant of assignment

I'd be happy if magrittr had an assignment operator like %=>% (or %->%). For example,

Batting %>%
   group_by(playerID) %>%
   summarise(total = sum(G)) %=>% Batting_raw %>% 
   arrange(desc(total)) %=>% Batting_ordered_all %>% 
   head(5) %=>% Batting_head

Thus we can easily have objects that are generated during the call chain.
Objects should be generated in the environment calling this function.

I imagine this is easy to implement, but unfortunately I have no time to do it.
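
A minimal sketch of such an operator: it assigns the left-hand value to the name on the right in the caller's environment and passes the value along unchanged (environment handling inside more complex pipelines may need extra care).

`%=>%` <- function(lhs, rhs) {
  assign(deparse(substitute(rhs)), lhs, envir = parent.frame())
  lhs
}

1:10 %>% sum() %=>% total %>% sqrt()
# [1] 7.416198
total
# [1] 55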

@kohske

Make alias for assignment?

Hello,

the one thing that bugs me about the current chaining syntax is that I have to use the standard assignment operator <- if I want to capture the final outcome of a chain of operations. It would be more elegant if I could just pipe the final result into a variable, e.g.:

x %>% f() %>% g() %>% h() %>% y

where y is the target variable. The above is probably not possible in R, but what about:

x %>% f() %>% g() %>% h() %>% assign_to(y)

which would be equivalent to

y <- x %>% f() %>% g() %>% h()

That should be easy enough to implement, no?

%<% vs dplyr::%.%

I like the idea of being able to place the LHS in an arbitrary position in the RHS call, but it seems like overkill to be able to insert it in any arbitrary position - do you really want to be able to do x %>% f(g(), h(i, j(k(.))))?

If you made it only match the . symbol in the first call, then you wouldn't need %>>%. This would simplify the package with minimal loss of functionality.

Performance loss when using pipe chaining

I'm not sure whether it's appropriate to copy/paste issue tidyverse/dplyr#454 here, just for your convenience.

I'm not sure whether there is significant performance loss when the pipe chain gets longer using %>% implemented by magrittr, compared with traditional code and with pipeR.

See details here.

Here are some plots to illustrate the performance loss (plot images: piper-performance1, piper-performance2, piper-performance3).

Although performance is not necessarily a goal of pipe operators, it might be an issue for those who want fluent, readable code but also do very intensive computing with it.

Best practices in programming?

I just ran afoul of the "bare functions preceded by :: and :::" thing (#61). I am successfully using statements like x %>% foo::yo %>% blah(foobar) in a package I am developing, but tests were mysteriously failing on Travis. I now know that's due to differences in the CRAN and GitHub versions of magrittr. Lesson learned, and I think I know what my options are going forward.

But the incident raises this question: are there certain ways of using magrittr that are safe in "interactive" analysis or in scripts that are categorically a bad idea in programming and inside packages?

I'm thinking specifically about dplyr and how @hadley provides two functions, e.g. dplyr::select() and dplyr::select_(), for use in analysis vs. programming, respectively. Is there anything analogous for magrittr?

. should be piped in any place in the calling function

I just create a demo pipeline operator %|% with the following code:

`%|%` <- function(x,f) {
  call <- match.call()
  if(is.name(call$f)) {
    f(x)
  } else if(is.call(call$f)) {
    env <- new.env()
    env$`.` <- x
    eval(call$f,envir = env)
  } else {
    stop("Error: unsupported type of function call")
  }
}

It not only allows me to call functions like:

rnorm(100,10,1) %|% plot

but also makes it convenient to pass parameters in a flexible way, just as %>% does:

rnorm(100,10,1) %|% log %|% diff %|% plot(., col="red", type="l")

Note that . represents the result evaluated from the previous segment in the chain.

But when I use %>% in {magrittr}, it does not seem to support the following code:

rnorm(100) %>% plot(.,col="red",main=sprintf("length: %d",length(.)))

My code allows this:

rnorm(100) %>% plot(.,col="red",main=sprintf("length: %d",length(.)))

This works since the evaluating environment contains the definition of `.`.

A function to invoke functions

For example, something like:

invoke <- function(f, ...) f(...)

This would be useful for pipelines when one of the functions returns a function:

create_adder <- function(value) {
  function(x) x + value
}

1+1 %>% create_adder() %>% invoke(3)
# [1] 5

use of `.` by itself within a `{ }` block?

Is it possible to use the . within a block and not within an explicit function call?

This works normally:

library(magrittr)
mtcars %>% subset(drat > 0) %>% head()
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

But when I try to do something similar inside of a block, it fails:

mtcars %>% { browser(); .; }
## Called from: function_list[[k]](value)
## Browse[1]> 
## debug at #1: .
## Browse[2]> . %>% subset(drat > 0)
## Functional sequence with the following components:
## 
##  1. subset(., drat > 0)
## 
## Use 'functions' to extract the individual functions. 
## Browse[2]> subset(., drat > 0) %>% head()
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

sessionInfo()
## R version 3.1.2 (2014-10-31)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## 
## locale:
## [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] magrittr_1.5
## 
## loaded via a namespace (and not attached):
## [1] compiler_3.1.2 tools_3.1.2   

dplyr::mutate and %>%

The following does not seem to work:

tmp <- data.frame(date = "2013-01-01")
tmp %>% mutate(date = date %>% as.Date("%Y-%m-%d"))

whereas, using only dplyr, this will work:

tmp <- data.frame(date = "2013-01-01")
tmp %.% mutate(date = date %.% as.Date("%Y-%m-%d"))

Not sure why.

History

It's (sometimes) possible to access the result of the last pipe evaluation through .Last.value. This is restricted to interactive sessions, and it's not possible to access e.g. the second-to-last entry.

Use case: I'm performing a complex transformation on a data frame and use %>% head to view the first few lines of the result. Then I realize that in fact I want to see more, but to view the same results with %>% View, I need to re-run the entire pipe.

How about having magrittr memoize the last few (10? 20?) evaluations (with the RHS as descriptor) and provide access to it? This would also fix #43 (from a user's perspective).

Fails with functions without parens preceded by double colon

For example:
rnorm(1000) %>% ggplot2::qplot

gives:

Error in `rnorm(1000)`::ggplot2 : unused argument (qplot)

I have made a quick patch here (56a7540) that treats such calls as anonymous function calls. However, I didn't find a way to check whether either the function or the namespace exists.

One problem when evaluating LHS

Consider the example:

rnorm(100) %>% 
    arima0(order = c(1, 0, 1)) %>% 
    predict(5)

This will produce the error:

Error in parse(text = object$series) : <text>:1:1: unexpected input
1: _
    ^

The reason lies in the call resulting from the evaluated pipe:

Call:
arima0(x = `__LHS`, order = c(1, 0, 1))

Coefficients:
          ar1     ma1  intercept
      -0.7596  0.8630     0.0016
s.e.   0.0704  0.4441     0.0997

Diggin' into predict.arima0, one finds

eval.parent(parse(text = object$series))

where

object$series == "__LHS"

Operator for piping into an lapply?

Is there any chance of adding an operator to pipe into an lapply operation? For example,

x <- list(a=1:10, b=11:20, c=21:30)
x.means <- x %>apply% mean

would be equivalent to lapply(x, mean). And it could also allow arbitrary expressions/blocks including the dot:

# Calculate coefficient of variation ( stdev divided by mean )
x.cvs <- x %>apply% { sd(.) / mean(.); } # Need braces because of operator precedence

This would be equivalent to lapply(x, function(xi) { sd(xi) / mean(xi); }). In this case, you are saved from having to type the function(xi) boilerplate and you can just write only the body. As for naming, I've used %>apply% here, but maybe something like %>each%? It almost becomes like a list comprehension or Ruby's .each method. Maybe one could also add a similar operator for filtering (e.g. x %>filter% { mean(.) > 0; }). And then make them both streaming (like Python generators) and able to accept streaming input instead of just vectors and lists. And then optionally parallelize them. But that would all be future stuff. For now, how about an %>apply% operator?

And yes, I know that you can already do x %>% lapply(function(xi) sd(xi) / mean(xi)), but then you still need the boilerplate of the function declaration.

%>% does not work correctly when I use "lazy evaluation"

Hi smbache,

For example, suppose I write a function which returns NULL when evaluating the "expr" argument causes an error.

null.when.error <- function(expr){
  tryCatch(expr, error=function(e)NULL)
}

In this situation, the code below works well and I get the NULL value correctly.

null.when.error(as.POSIXct("2013-333-222"))

On the other hand, the code below generates the usual R error rather than a NULL value.

as.POSIXct("2013-333-222") %>% null.when.error

I think this issue is related to R's lazy evaluation, and that current magrittr does not handle lazy evaluation correctly, right?
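
One workaround (a sketch, not a fix of the underlying behaviour) is to pipe a quoted expression and evaluate it inside the tryCatch, so the error is raised where it can be caught:

null.when.error2 <- function(expr) {
  tryCatch(eval(expr), error = function(e) NULL)
}

quote(as.POSIXct("2013-333-222")) %>% null.when.error2
# NULL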

Question: Why not use f from pryr instead of lambda?

Hi,

as far as I can see, f() from pryr is just as concise as lambda and somewhat more powerful (it can take multiple arguments and is a universal solution for quicker anonymous functions).

l(x~x>0)
f(x,x>0)

Is there a reason not to adapt that function for lambda, i.e. l() is like f() but evaluated immediately, with arguments passed treated as inputs instead of defaults like for f()? That would make the syntax more consistent. :)

Functions with envir = parent.frame()

Compare the following two statements:

"file.rda" %>% load
load("file.rda")

The former will load the data into a temporary environment; the reason is that load is declared as

load <- function (file, envir = parent.frame(), verbose = FALSE)

The same happens with assign and get.

This is unexpected at the very least; I'm not sure if we should call it a bug. Worse, I'm not sure whether this issue can be solved at all in a robust way.

Similar to #32.
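
For load specifically, a workaround (sketch) is to pass the target environment explicitly, so the parent.frame() default is never used:

"file.rda" %>% load(envir = globalenv())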

Deparsed name of argument

Until recently, the following showed a pane titled "iris" in RStudio:

iris %>% View

Now, the pane is titled ".". I assume this was introduced with the functional sequences. Is this behavior desired?

If not: Perhaps it's possible to use only parts of the pipe, e.g.

iris %>% has %>% been %>% processed %>% by(1, "overly") %>% long::pipe() %>% View

could show something like ... %>% by(1, "overly") %>% long::pipe(), with a configurable maximum number of characters.

Caching

Not saving intermediate results also means... not saving intermediate results. When creating longish pipes (10 steps or so), testing the pipe always means executing all steps. This slows down development.

How about providing, for interactive mode, some kind of caching functionality. The value of a %>% call could be reused if both the LHS (hash? address?) and the RHS (parse tree equality? string equality?) haven't changed. If values are taken from the cache, a message could indicate this. Of course, there also should be an easy way to invalidate the cache.

In a preliminary test, inserting R.cache::evalWithMemoization into the pipe didn't work due to eager evaluation of the pipe. Still, the functionality in R.cache could be helpful here.

Need simpler way of putting conditionals inline with %>%

Here's an example of what I'd like to do:

rowcount <- TRUE
mtcars %>% function(x) if(rowcount) nrow(x) else x
# [1] 32

rowcount <- FALSE
mtcars %>% function(x) if(rowcount) nrow(x) else x
# [prints out mtcars]

But the syntax is clunky.

One thought is to make x %>% NULL simply return x. Then, because if(F) 123 returns NULL, x %>% if(F) 123 would just return x. But this doesn't quite work since the if isn't evaluated before being processed by %>%.

Any thoughts on how to do this cleanly?
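
One option with current magrittr is a braced right-hand side, which keeps the pipe but drops the anonymous-function boilerplate (sketch):

rowcount <- TRUE
mtcars %>% { if (rowcount) nrow(.) else . }
# [1] 32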

Order of operations

Are the pipe operators by design made to evaluate before logical comparisons? For example:

bool <- rep(TRUE, 5)

bool == bool %>% sum()
## FALSE FALSE FALSE FALSE FALSE

# What I expected it to do, because of reading left to right
(bool == bool) %>% sum()
## 5

# What it actually does
bool == (bool %>% sum())
## FALSE FALSE FALSE FALSE FALSE
