tidyverse / magrittr
Improve the readability of R code with the pipe
Home Page: https://magrittr.tidyverse.org
License: Other
magrittr fiddles with the expected precedence of %>% and $. Consider the following MWE:
x = list(f = function (x) x)
1 : 10 %>% x$f
If %>% is defined naively [1], this works. If it’s imported from magrittr, we get the error message
Error in `1:10`$lst : 3 arguments passed to '$' which requires 2
I’m guessing this is due to the fact that magrittr needs to handle . on the right-hand side of %>% (e.g. foo %>% .$bar). However, it’s unintuitive.
I’m aware that there are (at least) two workarounds:
1 : 10 %>% x$f()
1 : 10 %>% (x$f)
Nevertheless, I’m wondering whether it wouldn’t be possible (and desirable) to make the initial syntax work. In particular, it’s quite puzzling to the user that the second workaround differs semantically from the initial (non-working) code, since the use of parentheses here conveys the same operator precedence that R uses when parsing the non-parenthesised expression.
This issue has cropped up in combination with modules (klmr/modules#43) but it’s equally relevant when programming with Reference Classes.
[1] `%>%` = function (x, f) f(x)
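To make the footnote's definition concrete, here is a minimal sketch of a naive pipe. The operator name %naive% is made up for this illustration so that it does not clash with magrittr's %>%; it says nothing about how magrittr itself is implemented.

```r
# Naive pipe: the right-hand side is evaluated like any other argument,
# so R's ordinary precedence rules decide what it is.
`%naive%` <- function(x, f) f(x)

x <- list(f = function(x) x)

# `$` and `:` both bind tighter than %any% operators, so this parses as
# (1:10) %naive% (x$f) and simply calls x$f(1:10).
result <- 1:10 %naive% x$f
```

With this definition the un-parenthesised form from the MWE runs without the parentheses or the trailing () that the workarounds need.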
%>% conflicts with %>% from the operators package.
I know this is a problem with R's poor handling of exported functions, but I do have a suggested solution. magrittr's %>% operator is so useful that it should be pushed into R core and given its own operator, perhaps either
bullet • (character code 149) -or-
middle dot · (character code 183)
%>% is a more flexible variant of the de-referencing operators in other languages.
It would be necessary to choose a character that renders well in RStudio, vim, emacs, ... and prints well on Stack Overflow, GitHub, Google Groups, etc.
Consider the following function:
f1 <- function(x,y,z="hello") {
cat("x =",x,"\n")
cat("y =",y,"\n")
cat("z =",z,"\n")
}
> 1:10 %>% f1(mean(.),length(.))
x = 1 2 3 4 5 6 7 8 9 10
y = 5.5
z = 10
If I really want f1(mean(1:10),length(1:10)), how can I do that?
Oh, a lambda expression can do that.
> 1:10 %>% (l(f1(mean(.),length(.))))
x = 5.5
y = 10
z = hello
Is there a cleaner way?
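One possible answer, assuming magrittr 1.5 or later is installed: wrapping the right-hand side in braces switches off first-argument insertion. In this sketch f1 is rewritten to return its arguments rather than print them, purely so the result can be inspected.

```r
library(magrittr)

# Variant of f1 from the question that returns its arguments instead of
# printing them (a change made only for this illustration).
f1 <- function(x, y, z = "hello") list(x = x, y = y, z = z)

# Default rule: `.` only appears inside nested calls, so the input is still
# inserted as the first argument: f1(1:10, mean(1:10), length(1:10)).
first_arg <- 1:10 %>% f1(mean(.), length(.))

# Braces disable first-argument insertion: f1(mean(1:10), length(1:10)).
braced <- 1:10 %>% { f1(mean(.), length(.)) }
```

Here braced$z keeps its default "hello", whereas first_arg$z is the length of the input.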
Hi smbache,
Suppose I write a function that returns NULL when evaluating its expr argument raises an error.
null.when.error <- function(expr){
tryCatch(expr, error=function(e)NULL)
}
In this situation, the code below works well and I get NULL as expected.
null.when.error(as.POSIXct("2013-333-222"))
On the other hand, the code below raises the usual R error instead of returning NULL.
as.POSIXct("2013-333-222") %>% null.when.error
I think this issue is related to lazy evaluation in R, and the current magrittr does not evaluate its left-hand side lazily, right?
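The difference can be reproduced in base R without magrittr. The sketch below uses two made-up operators, %eager% and %lazy%, to show why forcing the left-hand side before calling the right-hand side lets the error escape; it is an illustration of the mechanism, not a description of magrittr's internals.

```r
null.when.error <- function(expr) {
  tryCatch(expr, error = function(e) NULL)
}

# Eager pipe (hypothetical): the left-hand side is forced before the
# right-hand side is called, so the error is raised outside tryCatch().
`%eager%` <- function(lhs, rhs) {
  value <- lhs
  rhs(value)
}

# Lazy pipe (hypothetical): the unevaluated promise is handed on, so the
# left-hand side is only evaluated inside tryCatch().
`%lazy%` <- function(lhs, rhs) rhs(lhs)

eager_result <- try(stop("boom") %eager% null.when.error, silent = TRUE)
lazy_result  <- stop("boom") %lazy% null.when.error
```

The eager variant yields a "try-error" object, while the lazy variant returns NULL as the wrapper intends.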
This is more of a question than an issue. You are probably aware of R CMD check complaining about the dot. I could just ignore the complaints of course, but they sometimes signal real errors, so every time I see the NOTE in the output, I need to check.
I just put this in the package:
. <- "Shut up"
Do you see any potential problems with this? It seems to me that it will not interfere with the pipes, but I don't know much about magrittr internals.
Are the pipe operators by design made to evaluate before logical comparisons? For example:
bool <- rep(TRUE, 5)
bool == bool %>% sum()
## FALSE FALSE FALSE FALSE FALSE
# What I expected it to do, because of reading left to right
(bool == bool) %>% sum()
## 5
# What it actually does
bool == (bool %>% sum())
## FALSE FALSE FALSE FALSE FALSE
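The grouping comes from R's parser rather than from magrittr: every %any% operator binds tighter than comparison operators. A small base R sketch that inspects the parse tree (no pipe needs to be defined for this):

```r
# quote() only parses; neither `bool` nor `%>%` has to exist.
expr <- quote(bool == bool %>% sum())

outer_call <- as.character(expr[[1]])       # operator applied last
inner_call <- as.character(expr[[3]][[1]])  # operator inside the right operand
```

The outermost call is ==, and the pipe sits inside its right operand, which matches the "what it actually does" reading above.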
For example:
rnorm(1000) %>% ggplot2::qplot
gives:
Error in `rnorm(1000)`::ggplot2 : unused argument (qplot)
I have made a quick patch here (56a7540), that treats such calls as anonymous function calls. However, I didn't find a way to check if either the function or the namespace exist.
This may be taking pipelines a bit too far but it may also be nice to have:
A common scenario I encounter is that I have two data sources which need to be processed in a certain way, and are then both fed into one function. As an example, consider the case of two non-tidy data sources which I want to join:
a = data.frame(A=1:3, B=2:4, C=3:5)
b = data.frame(A=1:3, B=2:4, C=3:5) * 2
aa = melt(a, value.name='A')
bb = melt(b, value.name='B')
result = inner_join(aa, bb, by = 'variable')
Somehow I imagine that it could be more readable to write this as a pipeline (at least once the simple melt operation here becomes a more complex chain of operations):
result = {
a %>%
… %>%
melt(value.name = 'A')
} %++%
{
b %>%
… %>%
melt(value.name = 'B')
} %>%
inner_join(by = 'variable')
The %++% would somehow construct a stream “tuple” here.
I’m not even convinced myself that this has any advantage over the first, stream-less variant. I’m also not aware of any other programming languages which would support this, so there may not be much precedent (but visual languages where the developer constructs streams, such as Knime, do of course support this).
I'm not sure whether it's appropriate to copy/paste issue tidyverse/dplyr#454 here, just for your convenience.
I'm not sure whether there is significant performance loss when the pipe chain gets longer using %>% implemented by magrittr, compared with traditional code and with pipeR.
See details here.
Here are some plots to illustrate the performance loss.
Performance is not necessarily a goal of pipe operators, but it might be an issue for those who want fluent, readable code and also do very intensive computing with it.
I'd be happy if magrittr had an assignment operator like %=>% (or %->%). For example,
Batting %>%
group_by(playerID) %>%
summarise(total = sum(G)) %=>% Batting_raw %>%
arrange(desc(total)) %=>% Batting_ordered_all %>%
head(5) %=>% Batting_head
Thus we can easily have objects that are generated during the call chain.
Objects should be generated in the environment calling this function.
I imagine this is easy to implement, but unfortunately I have no time to do it.
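A rough base R sketch of what such an operator could look like. The name %=>% is taken from the request; the implementation is an assumption for illustration and only handles a bare variable name on the right.

```r
# Hypothetical %=>%: store the left-hand side under the given name in the
# calling environment, then pass the value on unchanged.
`%=>%` <- function(lhs, name) {
  target <- deparse(substitute(name))
  assign(target, lhs, envir = parent.frame())
  lhs
}

kept    <- c(3, 1, 2) %=>% scores_raw
ordered <- sort(kept, decreasing = TRUE) %=>% scores_ordered
```

Because all %any% operators share one precedence level and group left to right, an operator like this could sit between %>% steps in the way the Batting example suggests.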
Until recently, the following showed a pane titled "iris" in RStudio:
iris %>% View
Now, the pane is titled ".". I assume this was introduced with the functional sequences. Is this behavior desired?
If not: Perhaps it's possible to use only parts of the pipe, e.g.
iris %>% has %>% been %>% processed %>% by(1, "overly") %>% long::pipe() %>% View
could show something like ... %>% by(1, "overly") %>% long::pipe(), with a configurable maximum number of characters.
Here's an example of what I'd like to do:
rowcount <- TRUE
mtcars %>% function(x) if(rowcount) nrow(x) else x
# [1] 32
rowcount <- FALSE
mtcars %>% function(x) if(rowcount) nrow(x) else x
# [prints out mtcars]
But the syntax is clunky.
One thought is to make x %>% NULL simply return x. Then, because if(F) 123 returns NULL, x %>% if(F) 123 would just return x. But this doesn't quite work since the if isn't evaluated before being processed by %>%.
Any thoughts on how to do this cleanly?
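One way this can be written, assuming magrittr 1.5 or later: a braced right-hand side is evaluated as an ordinary expression with . bound to the input, so the if can live inside it.

```r
library(magrittr)

rowcount <- TRUE
counted <- mtcars %>% { if (rowcount) nrow(.) else . }

rowcount <- FALSE
unchanged <- mtcars %>% { if (rowcount) nrow(.) else . }
```

This avoids the explicit function(x) wrapper, at the cost of spelling out the else branch.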
Currently considering whether there is a place for a compose function in magrittr, although one may say it's not exactly composition (ceci n'est pas un composite!). To try it out, I have implemented it in the dev branch. The idea is the following: when creating pipelines, there are steps that may well occur in other pipelines later on, and therefore really should be composed into a re-usable step (function). I was not interested in traditional (simple) composition, but wanted something more magrittr-like. Furthermore, suppose you have three functions, f(x), g(y), h(z). Then in a call h(g(f(x))), h will not know x or f(x), only g(f(x)). In the implementation here, one can access the inputs to previous sub-functions in the later functions. To illustrate a few examples:
# Compose a mean-absolute error function.
# note that the default argument to each sub-function is `.`
mae <- compose(abs, mean(., na.rm = TRUE))
rnorm(100) %>% mae
# the above is equivalent to following
mae <- compose(abs, x ~ mean(x, na.rm = TRUE))
mae <- compose(. ~ abs(.), x ~ mean(x, na.rm = TRUE))
# The following example shows how to name and access inputs in later functions:
f <- compose(x ~ sin(x), y ~ cos(x*y))
f(1:100)
In general the proposed syntax for each component in compose is
symbol: will be converted into function(.) symbol(.)
call: will be converted into function(.) call
formula call: will name the argument, e.g. x ~ somefun(x, arg1, arg2, ...) will be function(x) somefun(x, arg1, arg2, ...).
The direction, as indicated in the examples, is that the results of the left-most functions are piped along into the right-most functions...
Edit: I have also added an operator, %,% (any ideas for a better name?), which allows e.g.
somefun <- abs %,% cos %,% sum(., na.rm = TRUE)
Also, since the autogenerated and nested nature of the composed functions is quite hard to decipher, I added a generic print for the composed functions, for prettier output.
Still not really sure whether this stuff belongs in magrittr?!
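For comparison, here is the "traditional (simple)" left-to-right composition that the proposal sets out to go beyond, as a base R sketch. The name compose_simple is made up for this example; it accepts only plain functions and does not support the call or formula syntax, nor access to earlier inputs.

```r
# Simple left-to-right composition: each step receives the previous result.
compose_simple <- function(...) {
  steps <- list(...)
  function(x) Reduce(function(value, step) step(value), steps, x)
}

# Same two steps as the first compose example above: abs, then mean.
mae <- compose_simple(abs, function(x) mean(x, na.rm = TRUE))
result <- mae(c(-1, 3, NA, -2))
```

The direction matches the proposal: the left-most function runs first and its result is handed to the next one.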
%<>% breaks when the user redefines <- because it evaluates the assignment (via <-) in the caller’s environment. This shouldn’t happen; instead, while both lhs and rhs should be evaluated in the caller’s environment, <- should be evaluated in the package’s environment (or ditched in favour of assign).
I’d submit a PR but pipe looks sufficiently complicated that I prefer not to meddle. I do believe it would be enough to replace the assignment statement with
assign(deparse(lhs), result[["value"]], envir = parent)
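The assign-based idea can be sketched in isolation. The operator below is a hypothetical stand-in named %assign_pipe%, not magrittr's %<>%; it assumes the left-hand side is a bare variable name and the right-hand side is a function.

```r
# Hypothetical compound-assignment pipe that writes back with assign()
# instead of evaluating `<-` in the caller's environment.
`%assign_pipe%` <- function(lhs, rhs) {
  parent <- parent.frame()
  result <- rhs(lhs)
  # deparse(substitute(lhs)) recovers the variable name, e.g. "values"
  assign(deparse(substitute(lhs)), result, envir = parent)
  invisible(result)
}

values <- c(4, 9, 16)
values %assign_pipe% sqrt
```

After the call, values holds the square roots. Targets such as x$a or x[1] would need more work than assign() offers.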
I guess this cannot be implemented, but maybe I am missing something:
f <- function() {
"this is the pipe" %>% return()
"no, it is not"
}
f()
# [1] "no, it is not"
Not saving intermediate results also means... not saving intermediate results. When creating longish pipes (10 steps or so), testing the pipe always means executing all steps. This slows down development.
How about providing, for interactive mode, some kind of caching functionality? The value of a %>% call could be reused if both the LHS (hash? address?) and the RHS (parse tree equality? string equality?) haven't changed. If values are taken from the cache, a message could indicate this. Of course, there should also be an easy way to invalidate the cache.
In a preliminary test, inserting R.cache::evalWithMemoization into the pipe didn't work due to eager evaluation of the pipe. Still, the functionality in R.cache could be helpful here.
Hello,
I'm trying to figure out how to use function operators with magrittr. For example, I would like to use a function such as plyr::colwise in a chain of pipes:
data %>% colwise(mean)
This does not work since it is interpreted as colwise(mean, data). A workaround is
data %>% colwise(mean)()
But this is ugly and not very legible.
So I thought to do:
data %>% (colwise(mean))
to force the evaluation of the function operator, but it did not work.
It was interpreted as "("(data, colwise(mean))
One possibility to make the pipe operator compatible with this behavior is to add the following line in its source code:
else {
if (identical(rhs[[1]], quote(`(`))) rhs <- rhs[2] ## Add this line
dots <- c(FALSE, vapply(rhs[-1], identical, quote(.),
FUN.VALUE = logical(1)))
Wrapping the function operator with parentheses is still not ideal in my opinion, because one purpose of piping is to limit the number of parentheses around. However, I find it much better than adding empty parentheses at the end. Maybe someone will have a better idea.
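In magrittr 1.5 and later, a parenthesised right-hand side is evaluated first and the resulting function is then applied to the left-hand side, which is the behaviour requested here. A short sketch, with col_op as a made-up stand-in for a function operator such as plyr::colwise so the example does not depend on plyr:

```r
library(magrittr)

# Hypothetical function operator: takes a function, returns a function
# that applies it to every column of a data frame.
col_op <- function(f) function(df) vapply(df, f, numeric(1))

data <- data.frame(a = 1:3, b = 4:6)

# The parentheses make magrittr evaluate col_op(mean) first, then call
# the result on `data`.
col_means <- data %>% (col_op(mean))
```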
Compare the following two statements:
"file.rda" %>% load
load("file.rda")
The former will load the data into a temporary environment; the reason is that load is declared as
load <- function (file, envir = parent.frame(), verbose = FALSE)
The same happens with assign and get.
This is at least unexpected, I'm not sure if we should call it a bug. Worse, I'm not sure if this issue can be solved at all in a robust way.
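The effect is not specific to the pipe: any function that calls load() on the caller's behalf changes what parent.frame() resolves to. A base R sketch, with made-up helper and variable names, that also shows the explicit-environment workaround:

```r
path <- tempfile(fileext = ".rda")
pipe_demo_value <- 42
save(pipe_demo_value, file = path)
rm(pipe_demo_value)

# Any wrapper behaves like this: parent.frame() inside load() is the
# wrapper's own frame, which is discarded when the wrapper returns.
load_via_wrapper <- function(file) load(file)
load_via_wrapper(path)
visible_after_wrapper <- exists("pipe_demo_value", inherits = FALSE)

# Naming the target environment explicitly sidesteps the default.
target <- new.env()
load_into <- function(file, envir) load(file, envir = envir)
load_into(path, target)
found_in_target <- exists("pipe_demo_value", envir = target, inherits = FALSE)
```

In a pipeline the same idea would read "file.rda" %>% load(envir = globalenv()), which is robust but more verbose than the bare call.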
Similar to #32.
It's (sometimes) possible to access the result of the last pipe evaluation through .Last.value. This is restricted to interactive sessions, and it's not possible to access e.g. the second-to-last entry.
Use case: I'm performing a complex transformation on a data frame and use %>% head to view the first few lines of the result. Then I realize that in fact I want to see more, but to view the same results with %>% View, I need to re-run the entire pipe.
How about having magrittr memoize the last few (10? 20?) evaluations (with the RHS as descriptor) and provide access to it? This would also fix #43 (from a user's perspective).
All of the following calls
rnorm(1000) %>% qplot
rnorm(1000) %>% qplot(.)
rnorm(1000) %>% {qplot(.)}
Yield the following error
Error in as.data.frame.default(x[[i]], optional = TRUE, stringsAsFactors = stringsAsFactors) :
cannot coerce class "c("gg", "ggplot")" to a data.frame
Strangely, however, the following works
rnorm(1000) %>% {x=.; qplot(x)}
but that breaks the pipe magic... Something else that works is debug(qplot), then calling str(x) once debugging, which makes me think there may be some sort of lazy evaluation happening (perhaps in ggplot2?) which doesn't play nicely with the . placeholder?
I can confirm that this wasn't an issue in magrittr 1.1.0
Using devtools::install_github("smbache/magrittr", build_vignettes = TRUE) gives an error:
Quitting from lines 40-48 (magrittr.Rmd)
Error: processing vignette 'magrittr.Rmd' failed with diagnostics:
:8:1: unexpected symbol
7:
8: weekly
^
Execution halted
Do I need to use different arguments to install_github?
Is there any chance of adding an operator to pipe into an lapply operation? For example,
x <- list(a=1:10, b=11:20, c=21:30)
x.means <- x %>apply% mean
would be equivalent to lapply(x, mean). And it could also allow arbitrary expressions/blocks including the dot:
# Calculate coefficient of variation ( stdev divided by mean )
x.cvs <- x %>apply% { sd(.) / mean(.); } # Need braces because of operator precedence
This would be equivalent to lapply(x, function(xi) { sd(xi) / mean(xi); }). In this case, you are saved from having to type the function(xi) boilerplate and can write only the body. As for naming, I've used %>apply% here, but maybe something like %>each%? It almost becomes like a list comprehension or Ruby's .each method. Maybe one could also add a similar operator for filtering (e.g. x %>filter% { mean(.) > 0; }). And then make them both streaming (like Python generators) and able to accept streaming input instead of just vectors and lists. And then optionally parallelize them. But that would all be future stuff. For now, how about an %>apply% operator?
And yes, I know that you can already do x %>% lapply(function(xi) sd(xi) / mean(xi)), but then you still need the boilerplate of the function declaration.
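A base R sketch of how the requested operator might behave. This is an assumed implementation written for illustration: it handles a bare function name or an expression that uses the dot, and nothing else (for instance, a call without a dot such as mean(na.rm = TRUE) is not treated specially).

```r
# Hypothetical %>apply%: apply the right-hand side to each element.
`%>apply%` <- function(lhs, rhs) {
  rhs_expr <- substitute(rhs)
  caller <- parent.frame()
  if (is.name(rhs_expr)) {
    # Bare function name, e.g. x %>apply% mean
    return(lapply(lhs, rhs))
  }
  # Otherwise evaluate the expression once per element, with `.` bound
  # to that element and the caller's environment as fallback.
  lapply(lhs, function(element) eval(rhs_expr, list(. = element), caller))
}

x <- list(a = 1:10, b = 11:20, c = 21:30)
x.means <- x %>apply% mean
x.cvs   <- x %>apply% { sd(.) / mean(.) }
```

The braces in the second call are needed for the reason given above: %any% binds tighter than /.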
I am trying to use %>% with data.tables but I had problems:
x = data.table(cars)
25 %>% x[speed == .] # Fails with: Error in function_list[[k]](value) : object 'speed' not found
25 %>% {x[speed == .]} # Ok
Perhaps just:
set_class <- `class<-`
I just stumbled upon an odd error due to missing() when implementing magrittr pipes in my package; below is an example that illustrates it:
fa <- function(a, b){
a %>%
fb(b=b)
}
fb <- function(a, b){
if(missing(b)){
print("missing")
}else{
print(b)
}
}
fa(a=1)
Throws: Error in print(b) : argument "b" is missing, with no default
PS. Thank you for this excellent package - I'm excited as a kid on X-mas...
Sorry if this is a duplicate, I think I saw it reported somewhere else but can't find it again.
mtcars %>% cbind(.$carb,.)
Error in .$carb : object of type 'closure' is not subsettable
I was wondering if this is a bug or another "design" decision that will make me a better programmer?
This seems to have been merged into the master branch readme:
install_github("smbache/magrittr", ref = "dev")
I assume this should be:
install_github("smbache/magrittr")
?
Cheers
Is this expected behavior?
iris %>%
list(.[[1]], .[[2]]) %>%
str
# List of 3
# $ :'data.frame': 150 obs. of 5 variables:
# ..$ Sepal.Length: num [1:150] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
# ..$ Sepal.Width : num [1:150] 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
# ..$ Petal.Length: num [1:150] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
# ..$ Petal.Width : num [1:150] 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
# ..$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
# $ : num [1:150] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
# $ : num [1:150] 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
Don't trust the french guy. Magritte was Belgian.
For reference, please see:
http://en.wikipedia.org/wiki/File:MagrittePipe.jpg
http://en.wikipedia.org/wiki/The_Treachery_of_Images
The following does not seem to work:
tmp <- data.frame(date = "2013-01-01")
tmp %>% mutate(date = date %>% as.Date("%Y-%m-%d"))
whereas, using only dplyr, this will work:
tmp <- data.frame(date = "2013-01-01")
tmp %.% mutate(date = date %.% as.Date("%Y-%m-%d"))
Not sure why.
I started creating a gist for R-specific keybindings in Sublime Text 3 which made me wonder:
Do you guys have any suggestions for a (universal) "pipe-forward" keyboard shortcut??
I thought maybe ctrl + alt + .
might work--any thoughts?
The "operator replacement" function multiply_by and its companions are helpful but too wordy in my opinion. How about defining, say,
`%>*%` <- `*`
`%>+%` <- `+`
Then, instead of 5 %>% add(2) %>% multiply_by(3), it's simply 5 %>+% 2 %>*% 3. This is shorter, and readability could be improved with syntax highlighting in the editor. Of course:
> 5 %>+% 2 %>*% 3
[1] 21
> 5 + 2 * 3
[1] 11
but that's intended. (This issue is a bit of a counterpart to #19.)
Just an idea. How about having an operator that automagically does lapply/sapply on a chain? I end up writing code like this many times:
filenames <- files %>%
sapply(lambda(x -> rename(x$metadata$.path))) %>%
no_dup_names
and instead of this it would be nice to apply the lambda to each element of the list/vector, with some simpler syntax. E.g.
filenames <- files %[[>%
rename(.$metadata$.path) %>%
no_dup_names
I guess it is challenging to make this work without introducing a new operator for each of the *apply commands (and there are many), but maybe having lapply() and sapply() is reasonable. The lapply operator could be %[>%, and the sapply operator could be %[[>%.
I realize I could also do
filenames <- files %>%
lapply(extract2, "metadata") %>%
sapply(extract2, ".path") %>%
no_dup_names
but this is also somewhat cumbersome.
Is it possible to use the . within a block and not within an explicit function call?
This works normally:
library(magrittr)
mtcars %>% subset(drat > 0) %>% head()
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
But when I try to do something similar inside of a block, it fails:
mtcars %>% { browser(); .; }
## Called from: function_list[[k]](value)
## Browse[1]>
## debug at #1: .
## Browse[2]> . %>% subset(drat > 0)
## Functional sequence with the following components:
##
## 1. subset(., drat > 0)
##
## Use 'functions' to extract the individual functions.
## Browse[2]> subset(., drat > 0) %>% head()
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
sessionInfo()
## R version 3.1.2 (2014-10-31)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
##
## locale:
## [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C LC_TIME=English_United States.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] magrittr_1.5
##
## loaded via a namespace (and not attached):
## [1] compiler_3.1.2 tools_3.1.2
At the moment, the functions for setting names, colnames and rownames are direct aliases to the corresponding primitive functions. As a result, this requires wrapping the names in a vector, ie:
data %>% set_names(c("name1", "name2"))
I think the c() is superfluous and makes the code slightly less easy to read. It may make sense to implement set_names in the following way:
set_names <- function(x, ...) {
value <- c(...)
names(x) <- value
x
}
This way, we could simply call set_names like this:
data %>% set_names("name1", "name2")
I cannot find a functional equivalent of [[<-, maybe there isn't one. In which case it would be nice to have one, so that I can write
list(a = 1, b = 2, c = 3) %>%
set(b = "foo")
or maybe rather
list(a = 1, b = 2, c = 3) %>%
set("b", "foo")
The function replace is actually almost good, but it drops the attributes of the value, so
foo <- "foo"
class(foo) <- "bar"
list(a = 1, b = 2, c = 3) %>%
replace("b", foo)
only works if I write
foo <- "foo"
class(foo) <- "bar"
list(a = 1, b = 2, c = 3) %>%
replace("b", list(foo))
Btw. while at it, maybe it makes sense to define set and set2, analogously to extract and extract2.
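A minimal base R sketch of the requested helper. The name set2 follows the naming suggested above and is hypothetical; the function simply wraps [[<- so that the value, including its attributes, is stored as a single element.

```r
# Hypothetical functional form of `[[<-`: replace one element and return
# the modified object, so it can be used as a pipeline step.
set2 <- function(x, name, value) {
  x[[name]] <- value
  x
}

foo <- "foo"
class(foo) <- "bar"

updated <- set2(list(a = 1, b = 2, c = 3), "b", foo)
```

In a pipeline this would read list(a = 1, b = 2, c = 3) %>% set2("b", foo), with no need to wrap foo in list().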
Hello,
the one thing that bugs me about the current chaining syntax is that I have to use the standard assignment operator <- if I want to capture the final outcome of a chain of operations. It would be more elegant if I could just pipe the final result into a variable, e.g.:
x %>% f() %>% g() %>% h() %>% y
where y is the target variable. The above is probably not possible in R, but what about:
x %>% f() %>% g() %>% h() %>% assign_to(y)
which would be equivalent to
y <- x %>% f() %>% g() %>% h()
That should be easy enough to implement, no?
For example, something like:
invoke <- function(f, ...) f(...)
This would be useful for pipelines when one of the functions returns a function:
create_adder <- function(value) {
function(x) x + value
}
1+1 %>% create_adder() %>% invoke(3)
# [1] 5
In a for loop, you explicitly need to use print() in order to display ggplots. It doesn't seem to work here, however; it prints only the metainformation.
for (i in 1:3) {
ggplot(data.frame(x=rnorm(1e3)), aes(x=x)) + geom_histogram() %>%
print()
}
## geom_histogram:
## stat_bin:
## position_stack: (width = NULL, height = NULL)
## geom_histogram:
## stat_bin:
## position_stack: (width = NULL, height = NULL)
## geom_histogram:
## stat_bin:
## position_stack: (width = NULL, height = NULL)
# This works though.
for (i in 1:3) {
print(ggplot(data.frame(x=rnorm(1e3)), aes(x=x)) + geom_histogram())
}
On magrittr version 1.5, ggplot2 version 1.0.0.
What exactly is the difference between ":=" and "%<>%"? Are they inter-changeable? Or is one the super-set of the other? Thanks!
The semantics of the %>% operator can be interpreted as follows:
The %>% operator is very useful in itself; however, sometimes I wish I could isolate step 1. Assume the following simplified scenario:
x <- rnorm(100)
x <- Filter(function(x) x > 0, x)
I prefer the Filter to the redundant x[x > 0]. How would one write this with the help of magrittr::is_greater_than?
If we had access to the "curry" functionality in step 1:
x <- Filter(curry(is_greater_than(., 0)), x)
Or, shorter and more readable:
x <- Filter(curry(. > 0), x)
If there was a "tee" operator %tee% that returns both input and output in a list:
x <- x %tee% is_greater_than(., 0) %>% do.call(extract, .)
I think the first option is easier to read.
I'm aware of functional::Curry, but it'd be really great to have a variant that follows the rules in magrittr, to be able to easily interchange currying and piping.
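Something close to the requested curry is available in magrittr 1.5 and later: a pipeline that starts with the dot placeholder is not evaluated but returned as a function. A short sketch, assuming such a version is installed (is_positive is a name made up here):

```r
library(magrittr)

# A pipeline starting with `.` builds a function instead of a value;
# this one is equivalent to function(.) is_greater_than(., 0).
is_positive <- . %>% is_greater_than(0)

x <- c(-1.5, 2, -3, 4.25)
positives <- Filter(is_positive, x)
```

Because the right-hand side follows the usual piping rules, the same step can be moved between a Filter call and an ordinary pipeline without rewriting it.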
Consider the example:
rnorm(100) %>%
arima0(order = c(1, 0, 1)) %>%
predict(5)
This will produce the error:
Error in parse(text = object$series) : <text>:1:1: unexpected input
1: _
^
The reason lies in the call resulting from the evaluated pipe:
Call:
arima0(x = `__LHS`, order = c(1, 0, 1))
Coefficients:
ar1 ma1 intercept
-0.7596 0.8630 0.0016
s.e. 0.0704 0.4441 0.0997
Diggin' into predict.arima0, one finds
eval.parent(parse(text = object$series))
where
object$series == "__LHS"
Some aliases, e.g. extract <- '[', while useful, are a little clumsy when used in pipelines.
Here is an example, slice, which is like extract, but adds a little "sugar". In particular, it allows for using column names in data.frame objects (via NSE) and has two keywords which can be used in indexing: .last and .all, see below.
Not sure if this kind of stuff is too fancy for magrittr or should be placed somewhere else. On the one hand, I'd like to keep magrittr as small as possible, on the other I'd often like to use this myself.
#' Slice a rectangular object
#'
#' This is an alternative to "[" for slicing data.frames or matrices. For
#' data.frames, \code{slice} supports non-standard evaluation in the sense that
#' column names are directly available, which is useful in pipelines. There are
#' also two new keywords: \code{.last} which evaluates to the number of rows or
#' columns respectively, and \code{.all} which evaluates to
#' \code{seq_len(NROW(.))}, and \code{seq_len(NCOL(.))}, respectively. The index
#' arguments default to NULL, so there is no need to add commas when e.g.
#' \code{j} is not used.
#'
#' @param . data.frame or matrix
#' @param i the row index/indices or slicing expression
#' @param j the column index/indices or slicing expression
#' @return a subset of the original data argument
#' @export
#' @examples
#' iris %>% slice(1:10)
#' iris %>% slice(100:.last)
#' iris %>% slice(Species == "setosa")
#' iris %>% slice(.all, 3:.last)
#' iris %>% slice(, 3:.last)
#' iris %>% slice(.all %% 2 == 0)
#' iris %>% slice(1:5, .all %% 2 == 0)
slice <- function(., i, j)
{
  if (missing(i))
    i <- seq_len(NROW(.))
  else
    i <- eval(call("substitute", substitute(i),
                   list(.last = NROW(.), .all = seq_len(NROW(.)))))
  if (missing(j))
    j <- seq_len(NCOL(.))
  else
    j <- eval(call("substitute", substitute(j),
                   list(.last = NCOL(.), .all = seq_len(NCOL(.)))))
  if (is.data.frame(.))
    eval(call("[", ., i, j), ., parent.frame())
  else
    eval.parent(call("[", ., i, j))
}
Hi,
as far as I can see it f() from pryr is just as concise and somewhat more powerful (as it can take multiple arguments and is a universal solution for quicker anonymous functions) than lambda.
l(x~x>0)
f(x,x>0)
Is there a reason not to adapt that function for lambda, i.e. l() is like f() but evaluated immediately, with arguments passed treated as inputs instead of defaults like for f()? That would make the syntax more consistent. :)
There are times when it is desirable to reverse the direction of the pipe operator to match the order of parenthesis. The motivating example is a shiny application in which the shiny commands build up HTML. A backpipe operator would preserve the hierarchy of the HTML while cleaning up the code immensely. Consider:
require(shiny)
# Original Shiny:
div( class="color-green", h1( "Header 1" ) )
# Current magrittr:
h1( "Header 1" ) %>% div( class="color-green" )
# proposed
div( class="color-green" ) %<% h1( "Header 1" )
The proposed syntax not only makes the code more readable, but also preserves the hierarchical nature of the HTML, so the code more closely matches the generated HTML.
( BTW, I am already doing this with shiny application by simply cloning %>% and reversing the order of the arguments. I have not investigated how this would work with the changes for 1.1.0. I am happy to submit code if @smbache thinks this is a useful feature. )
I just created a demo pipeline operator %|% with the following code:
`%|%` <- function(x,f) {
call <- match.call()
if(is.name(call$f)) {
f(x)
} else if(is.call(call$f)) {
env <- new.env()
env$`.` <- x
eval(call$f,envir = env)
} else {
stop("Error: unsupported type of function call")
}
}
It not only allows me to call functions like:
rnorm(100,10,1) %|% plot
but is also convenient for passing parameters in a flexible way, just as %>% does:
rnorm(100,10,1) %|% log %|% diff %|% plot(., col="red", type="l")
Note that . represents the result evaluated from the previous segment in the chain.
But when I use %>% in {magrittr}, it does not seem to support the following code:
rnorm(100) %>% plot(.,col="red",main=sprintf("length: %d",length(.)))
My code allows this:
rnorm(100) %|% plot(.,col="red",main=sprintf("length: %d",length(.)))
since the evaluating environment contains the definition of the dot.
I like the idea of being able to place the LHS in an arbitrary position in the RHS call, but it seems like overkill to be able to insert it in any arbitrary position - do you really want to be able to do x %>% f(g(), h(i, j(k(.))))?
If you made it only match the . symbol in the first call, then you wouldn't need %>>%. This would simplify the package with minimal loss of functionality.
Not really an issue but more of a general comment.
Using the dplyr %.% operator, I got used to referencing the left-hand side in a chain as __prev, e.g.
iris %.% mutate(rownames = rownames(`__prev`))
This always felt a bit hacky, but actually worked fine in my experience.
While I do like the idea of replacing __prev with a nicer placeholder symbol, I am not convinced that . is best for the job. It seems that using . forces the implementation to only match the outermost call, in order to allow for the same symbol to be used with a different meaning in inner formulas. So we can do stuff like
iris %>% aggregate(. ~ Species, ., mean)
(which is not particularly readable imo), but to reproduce my above dplyr code in magrittr I would need something like
iris %>% l(x -> mutate(x, rownames = rownames(x)))
(is there a better way?).
I was wondering if using a slightly "heavier" placeholder - maybe ._ or something like that - and actually having it matched in the full rhs, like __prev in dplyr, would turn out more powerful/readable.
I just ran afoul of the "bare functions preceded by :: and :::" thing (#61). I am successfully using statements like x %>% foo::yo %>% blah(foobar) in a package I am developing, but tests were mysteriously failing on Travis. I now know that's due to differences in the CRAN and GitHub versions of magrittr. Lesson learned and I think I know what my options are going forward.
But the incident raises this question: are there certain ways of using magrittr that are safe in "interactive" analysis or in scripts that are categorically a bad idea in programming and inside packages?
I'm thinking specifically about dplyr and how @hadley provides two functions, e.g. dplyr::select() and dplyr::select_(), for use in analysis vs. programming, respectively. Is there anything analogous for magrittr?