dplyr's Introduction

tidyverse

Overview

The tidyverse is a set of packages that work in harmony because they share common data representations and API design. The tidyverse package is designed to make it easy to install and load core packages from the tidyverse in a single command.

If you’d like to learn how to use the tidyverse effectively, the best place to start is R for Data Science (2e).

Installation

# Install from CRAN
install.packages("tidyverse")
# Install the development version from GitHub
# install.packages("pak")
pak::pak("tidyverse/tidyverse")

If you’re compiling from source, you can run pak::pkg_system_requirements("tidyverse") to see the complete set of system packages needed on your machine.

Usage

library(tidyverse) will load the core tidyverse packages and print a condensed summary of conflicts with other packages you have loaded:

library(tidyverse)
#> ── Attaching core tidyverse packages ─────────────────── tidyverse 2.0.0.9000 ──
#> ✔ dplyr     1.1.3     ✔ readr     2.1.4
#> ✔ forcats   1.0.0     ✔ stringr   1.5.0
#> ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
#> ✔ lubridate 1.9.3     ✔ tidyr     1.3.0
#> ✔ purrr     1.0.2     
#> ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ dplyr::lag()    masks stats::lag()
#> ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

You can see conflicts created later with tidyverse_conflicts():

library(MASS)
#> 
#> Attaching package: 'MASS'
#> The following object is masked from 'package:dplyr':
#> 
#>     select
tidyverse_conflicts()
#> ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ dplyr::lag()    masks stats::lag()
#> ✖ MASS::select()  masks dplyr::select()
#> ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
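
When two attached packages export the same name, one way to stay unambiguous is to call the function with an explicit package:: prefix. The sketch below assumes dplyr is installed and uses the built-in mtcars data; the variable names are illustrative only.

```r
library(dplyr)

# dplyr::filter() keeps the rows of a data frame that match a condition.
four_cyl <- dplyr::filter(mtcars, cyl == 4)
nrow(four_cyl)  # 11

# stats::filter() is still reachable by its full name; here it applies
# a 3-point moving average to a numeric series.
smoothed <- stats::filter(1:5, rep(1 / 3, 3))
as.numeric(smoothed)  # NA 2 3 4 NA
```

The prefix works whether or not either package is attached, so it is also a reasonable habit inside functions and scripts that others will run.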

And you can check that all tidyverse packages are up-to-date with tidyverse_update():

tidyverse_update()
#> The following packages are out of date:
#>  * broom (0.4.0 -> 0.4.1)
#>  * DBI   (0.4.1 -> 0.5)
#>  * Rcpp  (0.12.6 -> 0.12.7)
#>  
#> Start a clean R session then run:
#> install.packages(c("broom", "DBI", "Rcpp"))

Packages

As well as the core tidyverse, installing this package also installs a selection of other packages that you’re likely to use frequently, but probably not in every analysis. This includes packages for:

  • Working with specific types of vectors:

    • hms, for times.
  • Importing other types of data:

    • feather, for sharing with Python and other languages.
    • haven, for SPSS, SAS and Stata files.
    • httr, for web APIs.
    • jsonlite, for JSON.
    • readxl, for .xls and .xlsx files.
    • rvest, for web scraping.
    • xml2, for XML.
  • Modelling:

    • modelr, for modelling within a pipeline.
    • broom, for turning models into tidy data.

Code of Conduct

Please note that the tidyverse project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

dplyr's People

Contributors

arunsrinivasan, batpigandme, billdenney, cderv, coolbutuseless, cosinequanon, davisvaughan, earowang, eibanez, hadley, hannes, ilarischeinin, javierluraschi, jennybc, jimhester, kevinushey, krlmlr, leondutoit, lindbrook, lionel-, maurolepore, mine-cetinkaya-rundel, pimentel, romainfrancois, s-fleck, salim-b, sfirke, steveharoz, yutannihilation, zeehio

dplyr's Issues

Fail to build on Windows 64bit R-studio

  * installing *source* package 'dplyr' ...
    ** libs
    g++ -m64 -I"C:/PROGRA1/R/R-301.1/include" -DNDEBUG -I"C:/Users/jta/Documents/R/win-library/3.0/Rcpp/include" -I"d:/RCompile/CRANpkg/extralibs64/local/include" -O2 -Wall -mtune=core2 -c RcppExports.cpp -o RcppExports.o
    g++ -m64 -I"C:/PROGRA1/R/R-301.1/include" -DNDEBUG -I"C:/Users/jta/Documents/R/win-library/3.0/Rcpp/include" -I"d:/RCompile/CRANpkg/extralibs64/local/include" -O2 -Wall -mtune=core2 -c split-indices.cpp -o split-indices.o
    g++ -m64 -shared -s -static-libgcc -o dplyr.dll tmp.def RcppExports.o split-indices.o C:/Users/jta/Documents/R/win-library/3.0/Rcpp/lib/x64/libRcpp.a -Ld:/RCompile/CRANpkg/extralibs64/local/lib/x64 -Ld:/RCompile/CRANpkg/extralibs64/local/lib -LC:/PROGRA1/R/R-301.1/bin/x64 -lR
    installing to C:/Users/jta/Documents/R/win-library/3.0/dplyr/libs/x64
    ** R
    ** data
    ** inst
    ** tests
    ** preparing package for lazy loading
    ** help
    *** installing help indices
    ** building package indices
    ** testing if installed package can be loaded
    Error in namespaceExport(ns, exports) :
    undefined exports: join.tbl_sqlite
    Error: loading failed
    Execution halted
    ERROR: loading failed
  * removing 'C:/Users/jta/Documents/R/win-library/3.0/dplyr'

I tried installing RSQLite and later RSQLite.extfuns in case SQLite dependencies were missing, but neither solved the problem.

I have the devtools package and Rtools installed.

Other types of grouping

  • bootstrap
  • binning (continuous data)
  • moving window/shingles
  • accumulating window
  • individual rows
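
Several of these grouping styles can be illustrated in base R. This is only a sketch of the ideas, not a proposed dplyr API; the data and variable names are made up for the example.

```r
x <- c(1, 4, 6, 9, 12)

# Binning: cut() assigns each value to an interval, which then acts as a group.
bins <- cut(x, breaks = c(0, 5, 10, 15))
bin_means <- tapply(x, bins, mean)  # 2.5, 7.5, 12 for (0,5], (5,10], (10,15]

# Moving window of width 3: each row of embed() holds one window.
window_means <- rowMeans(embed(x, 3))

# Accumulating window: the mean of everything seen so far.
running_means <- cumsum(x) / seq_along(x)

# Bootstrap: each replicate is a group of indices sampled with replacement.
set.seed(1)
boot_groups <- replicate(3, sample(seq_along(x), replace = TRUE), simplify = FALSE)
```

Binning gives every row exactly one group, whereas the window and bootstrap cases let a row belong to several groups at once, which is what makes them different from an ordinary group by.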

Error install dplyr package on Mac

I have been trying to install "dplyr" using "devtools" on my Mac and keep getting this output and error.

devtools::install_github("dplyr")

Installing github repo(s) dplyr/master from hadley
Installing dplyr.zip from https://github.com/hadley/dplyr/archive/master.zip
Installing dplyr
'/Library/Frameworks/R.framework/Resources/bin/R' --vanilla CMD INSTALL
'/private/var/folders/Mn/MnBlvG3QH+eJFvQUQ38vvk+++TI/-Tmp-/RtmpwOIX5d/dplyr-master'
--library='/Library/Frameworks/R.framework/Versions/3.0/Resources/library' --with-keep.source

* installing *source* package 'dplyr' ...
** libs
sh: make: command not found
ERROR: compilation failed for package 'dplyr'
* removing '/Library/Frameworks/R.framework/Versions/3.0/Resources/library/dplyr'
Error: Command failed (1)

Here is the sessionInfo()

R version 3.0.1 (2013-05-16)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] graphics grDevices utils datasets grid stats methods base

other attached packages:
[1] devtools_1.2

loaded via a namespace (and not attached):
[1] colorspace_1.2-2 dichromat_2.0-0 digest_0.6.3 evaluate_0.4.3 formatR_0.8 gtable_0.1.2
[7] httr_0.2 labeling_0.2 lattice_0.20-15 MASS_7.3-26 memoise_0.1 munsell_0.4
[13] parallel_3.0.1 plyr_1.8 proto_0.3-10 rCharts_0.3.5 RColorBrewer_1.0-5 RCurl_1.95-4.1
[19] reshape2_1.2.2 RJSONIO_1.0-3 stringr_0.6.2 tools_3.0.1 whisker_0.3-2 yaml_2.1.7

Loading "assertthat" seems to work fine using devtools::install_github("assertthat"), so I'm not sure what the error is. "plyr" is loaded, but not attached so I'm not sure if that's the issue. Maybe I'm missing something.
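
The "sh: make: command not found" line in the log means the shell that R launched could not find make, which a package with compiled code needs but a pure-R package does not. A quick check from the R session might look like the sketch below; the list of tools to look for is an assumption, since the compiler name varies by platform.

```r
# Look up build tools on the PATH; Sys.which() returns "" for anything it cannot find.
tools <- Sys.which(c("make", "g++"))
missing_tools <- names(tools)[!nzchar(tools)]

if (length(missing_tools) > 0) {
  message("Not found on PATH: ", paste(missing_tools, collapse = ", "))
} else {
  message("Build tools found: ", paste(tools, collapse = ", "))
}
```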

Simplify generated sql

By recognising that from can be:

  • table name
  • join
  • select

And when it's a table name or a join, you can omit a layer of selects.
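
A minimal sketch of the idea, using a hypothetical select_query() helper (not dplyr's actual SQL generator): the source is wrapped in a subquery only when it is itself a select, and a bare table name is used directly. Joins are left out to keep the example short, and the alias "sub" is arbitrary.

```r
# Hypothetical helper: build a SELECT, wrapping `from` only when it is a select.
select_query <- function(from, where = NULL) {
  is_subquery <- inherits(from, "select_query")
  source <- if (is_subquery) paste0("(", from, ") AS sub") else from
  sql <- paste("SELECT * FROM", source)
  if (!is.null(where)) {
    sql <- paste(sql, "WHERE", where)
  }
  structure(sql, class = "select_query")
}

q1 <- select_query("flights", where = "month = 1")
q2 <- select_query(q1, where = "day = 1")

unclass(q1)  # "SELECT * FROM flights WHERE month = 1"
unclass(q2)  # "SELECT * FROM (SELECT * FROM flights WHERE month = 1) AS sub WHERE day = 1"
```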

Bigquery backend

(Some initial code at http://code.google.com/p/google-bigquery-r-client/source/browse/googlebigquery/R/bigquery_client.R)

Querying differences:

  • can't use *
  • can use within/flatten for nested data (out of scope for dplyr?)
  • table unions might be more important
  • join for small databases (<8 meg), join each for larger
  • group by requires all variables to be in select

Mutate/summarise bugs

From Dave Cooper:

# mutate and summarize bugs.R

rm(list = ls())

require(plyr)
require(ggplot2)

d = diamonds

#########################
# MUTATE BUG
# this works
e = mutate(d,
  cut2 = as.character(cut) )
e = mutate(e,
  cut2 = ifelse(cut2 == 'Fair', '*Fairest*', cut2))
plyr::count(e, ~cut+cut2)

# this fails
e = mutate(d,
  cut2 = as.character(cut),
  cut2 = ifelse(cut2 == 'Fair', '*Fairest*', cut2))
plyr::count(e, ~cut+cut2)

# but this works!
e = mutate(d,
  cut2 = as.character(cut),
  cut3 = ifelse(cut2 == 'Fair', '*Fairest*', cut2))
plyr::count(e, ~cut+cut2+cut3)

#########################
# SAME PROBLEM, BUT WITH SUMMARIZE
# this works
e = summarize(d,
  cut2 = as.character(cut) )
e = summarize(e,
  cut2 = ifelse(cut2 == 'Fair', '*Fairest*', cut2))
count(e, ~cut2)

# this fails
e = summarize(d,
  cut2 = as.character(cut),
  cut2 = ifelse(cut2 == 'Fair', '*Fairest*', cut2))
count(e, ~cut2)

# but this works!
e = summarize(d,
  cut2 = as.character(cut),
  cut3 = ifelse(cut2 == 'Fair', '*Fairest*', cut2))
count(e, ~cut2+cut3)


#########################
# MUTATE with REVALUE
# this works
e = mutate(d,
 cut2 = revalue(cut, c(Fair = '*Fairest*')) )
e = mutate(e,
 cut2 = revalue(cut2, c('*Fairest*' = '*Fairest of All*')) )
count(e, ~cut+cut2)

# this fails
e = mutate(d,
 cut2 = revalue(cut, c(Good = '*Fairest*')),
 cut2 = revalue(cut2, c('*Fairest*' = '*Fairest of All*')) )
count(e, ~cut+cut2)

# but this works!
e = mutate(d,
 cut2 = revalue(cut, c(Good = '*Fairest*')),
 cut3 = revalue(cut2, c('*Fairest*' = '*Fairest of All*')) )
count(e, ~cut+cut2+cut3)

Printing sources should display succinct column info

From Dave Cooper:

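# Compact preview: the first few elements (or rows and columns) of x,
# followed by its size and class.
h = function(x, r=6, c=8) {
  if (is.null(dim(x))) {
    # Vectors: show up to r elements.
    cat(format(x[seq_len(min(r, length(x)))], digits=2), '...')
    cat('\nlength=', length(x), ', class=', class(x), '\n', sep='')
  } else {
    # Matrices and data frames: show up to r rows and c columns.
    print(x[seq_len(min(r, nrow(x))), seq_len(min(c, ncol(x))), drop=FALSE], digits=2)
    cat('dim = (', dim(x)[1], ', ', dim(x)[2], '), class=', class(x), '\n', sep='')
  }
}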

Safer escaping

  • escape_sql -> name
  • build_sql to automatically escape names/calls.
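
The distinction between escaping names and escaping values can be sketched with two small helpers. Both function names are hypothetical, and the quoting follows standard SQL conventions (double quotes for identifiers, single quotes for string literals, with embedded quotes doubled); individual databases may differ.

```r
# Hypothetical helpers, assuming standard SQL quoting rules.
escape_ident <- function(x) {
  paste0('"', gsub('"', '""', x, fixed = TRUE), '"')
}

escape_string <- function(x) {
  paste0("'", gsub("'", "''", x, fixed = TRUE), "'")
}

sql <- paste(
  "SELECT", escape_ident("last name"),
  "FROM", escape_ident("people"),
  "WHERE", escape_ident("last name"), "=", escape_string("O'Brien")
)
sql
```

Keeping the two cases in separate functions means a caller cannot accidentally quote a column name as if it were a value, or the other way round.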

installation with cygwin

Hi, has anyone tried to install dplyr in R on a Windows machine?
The workstation has Cygwin installed, including the make and gcc commands.

I'm constantly getting the error:
ERROR: compilation failed for package 'dplyr'

Dave

Installation Error Macosx

I get an error when trying to install dplyr via devtools. I also have the latest XCode (5.0)/Command Line Tools.

> install_github("dplyr")
Installing github repo(s) dplyr/master from hadley
Downloading dplyr.zip from https://github.com/hadley/dplyr/archive/master.zip
Installing package from /var/folders/tx/5yp3lm_j6_l076sm5htmvxxm0000gn/T//RtmpXuhEU8/dplyr.zip
Installing dplyr
'/Library/Frameworks/R.framework/Resources/bin/R' --vanilla CMD INSTALL  \
  '/private/var/folders/tx/5yp3lm_j6_l076sm5htmvxxm0000gn/T/RtmpXuhEU8/dplyr-master'  \
  --library='/Library/Frameworks/R.framework/Versions/3.0/Resources/library' --with-keep.source --install-tests 

* installing *source* package 'dplyr' ...
** libs
llvm-g++-4.2 -arch x86_64 -I/Library/Frameworks/R.framework/Resources/include -DNDEBUG  -I/usr/local/include -I"/Library/Frameworks/R.framework/Versions/3.0/Resources/library/Rcpp/include"   -fPIC  -mtune=core2 -g -O2  -c RcppExports.cpp -o RcppExports.o
llvm-g++-4.2 -arch x86_64 -I/Library/Frameworks/R.framework/Resources/include -DNDEBUG  -I/usr/local/include -I"/Library/Frameworks/R.framework/Versions/3.0/Resources/library/Rcpp/include"   -fPIC  -mtune=core2 -g -O2  -c split-indices.cpp -o split-indices.o
llvm-g++-4.2 -arch x86_64 -dynamiclib -Wl,-headerpad_max_install_names -undefined dynamic_lookup -single_module -multiply_defined suppress -L/usr/local/lib -L/usr/local/lib -o dplyr.so RcppExports.o split-indices.o /Library/Frameworks/R.framework/Versions/3.0/Resources/library/Rcpp/lib/libRcpp.a -F/Library/Frameworks/R.framework/.. -framework R -Wl,-framework -Wl,CoreFoundation
installing to /Library/Frameworks/R.framework/Versions/3.0/Resources/library/dplyr/libs
** R
** data
** inst
** tests
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
Error in namespaceExport(ns, exports) : 
  undefined exports: join.tbl_sqlite
Error: loading failed
Execution halted
ERROR: loading failed
* removing '/Library/Frameworks/R.framework/Versions/3.0/Resources/library/dplyr'
Error: Command failed (1)

> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] devtools_1.3 vimcom_0.9-8

loaded via a namespace (and not attached):
[1] digest_0.6.3   evaluate_0.4.7 httr_0.2       memoise_0.1    parallel_3.0.1 RCurl_1.95-4.1 stringr_0.6.2 
[8] tools_3.0.1    whisker_0.3-2 

Strict version of translate_sql

A strict version shouldn't pass arbitrary, unrecognised functions through to SQL; it could be activated by a global option. It will need translations filled in for a considerable number of simple mathematical functions.
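
One way such a check could work is sketched below. Everything here is hypothetical: the list of known functions, the helper names, and the option name sketch.strict_sql are made up for illustration and are not part of dplyr.

```r
# Hypothetical whitelist of functions that have a SQL translation.
known_functions <- c("+", "-", "*", "/", "(", "sqrt", "abs", "round")

# Collect the names of all functions called anywhere in an expression.
called_functions <- function(expr) {
  if (!is.call(expr)) {
    return(character())
  }
  head <- if (is.name(expr[[1]])) as.character(expr[[1]]) else character()
  unique(c(head, unlist(lapply(as.list(expr)[-1], called_functions))))
}

# In strict mode, fail on any function without a known translation.
check_translatable <- function(expr, strict = getOption("sketch.strict_sql", FALSE)) {
  unknown <- setdiff(called_functions(expr), known_functions)
  if (strict && length(unknown) > 0) {
    stop("No SQL translation for: ", paste(unknown, collapse = ", "), call. = FALSE)
  }
  unknown
}

check_translatable(quote(sqrt(x) + foo(y)))  # lenient: returns "foo"

options(sketch.strict_sql = TRUE)
try(check_translatable(quote(sqrt(x) + foo(y))))  # strict: raises an error
```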

Failed to install

devtools::install_github("dplyr")

...
** testing if installed package can be loaded
Error : object 'as.data.table' not found whilst loading namespace 'dplyr'
Error: loading failed
Execution halted
ERROR: loading failed
* removing 'D:/Software/R/R/win-library/3.0/dplyr'
Error: Command failed (1)

sessionInfo()

R version 3.0.2 (2013-09-25)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_United Kingdom.1252 
[2] LC_CTYPE=English_United Kingdom.1252   
[3] LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.1252    
attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     
loaded via a namespace (and not attached):
[1] devtools_1.3   digest_0.6.3   evaluate_0.5.1 httr_0.2       memoise_0.1   
[6] parallel_3.0.2 RCurl_1.95-4.1 stringr_0.6.2  tools_3.0.2    whisker_0.3-2 

RTools 3.0

I quickly looked in the code, but could find nothing strange. Any ideas?

Develop flexible SQL backend

Packages reviewed: RPostgreSQL, RMySQL, MonetDB.R, RODBC, RJDBC

Differences:

  • RPostgreSQL: has dbApply which could be used to implement do, multiple open connections, prepared queries use special syntax, windowing, explain output is json, subqueries need to be named, ANALYZE called automatically (but not on temporary tables)
  • MonetDB.R: looks like prepared statements might be supported through dbSendQuery, no semi_joins?
  • RMySQL: also has dbApply, no prepared queries
  • RODBC: non-DBI, manual transactions

Other things that I may need to make generic:

  • variable name escaping
  • translation of R variable types to db types (for creation)

Steps to make dplyr adapt to different sql variants:

  • Make query a base class and add subclasses for other databases
  • Turn tbl_sqlite object into a tbl_sql object - method would dispatch on src/con

Changes needed:

  • Query needs to become base class
  • Existing tbl_sqlite methods should become tbl_sql methods
  • When implementing first non-sqlite sql adaptor, gradually change

PostgreSQL and MonetDB are clearly the most important and should be tackled first. MonetDB might be simpler to start with since it implements a more limited subset of SQL.
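
Dispatching on the connection can be sketched with plain S3, taking variable name escaping as the example. The generic name and the connection classes below are hypothetical stand-ins, not the classes used by any DBI driver; the only database-specific fact assumed is that MySQL quotes identifiers with backticks.

```r
# Generic with a default that follows standard SQL identifier quoting.
sql_escape_ident <- function(con, x) UseMethod("sql_escape_ident")
sql_escape_ident.default <- function(con, x) paste0('"', x, '"')

# A backend overrides only what differs from the default.
sql_escape_ident.sketch_mysql <- function(con, x) paste0("`", x, "`")

# Stand-in connection objects; real ones would come from a DBI driver.
sqlite_con <- structure(list(), class = "sketch_sqlite")
mysql_con <- structure(list(), class = "sketch_mysql")

sql_escape_ident(sqlite_con, "year")  # "\"year\""
sql_escape_ident(mysql_con, "year")   # "`year`"
```

With this shape, shared behaviour lives in the default method and each new backend adds methods only where its SQL dialect diverges.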

Cannot install dplyr

I'm having trouble installing the dplyr package. I'm running Mac OS X 10.8.5. I've installed Xcode and the command line tools, uninstalled them, and installed them again (3 or 4 times over), and restarted my computer numerous times, and each time I get this error:

* installing *source* package 'dplyr' ...
** libs
llvm-g++-4.2 -arch x86_64 -I/Library/Frameworks/R.framework/Resources/include -DNDEBUG  -I/usr/local/include -I"/Library/Frameworks/R.framework/Versions/3.0/Resources/library/Rcpp/include"   -fPIC  -mtune=core2 -g -O2  -c RcppExports.cpp -o RcppExports.o
/bin/sh: llvm-g++-4.2: command not found
make: *** [RcppExports.o] Error 127
ERROR: compilation failed for package 'dplyr'
* removing '/Library/Frameworks/R.framework/Versions/3.0/Resources/library/dplyr'
Error: Command failed (1)

Any ideas? Thanks!

Implement regroup

So that you can easily ungroup, do some operation and then regroup.
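
With current dplyr the same effect can be had by saving the grouping variables before ungrouping and restoring them afterwards. This is a sketch using functions that exist in dplyr 1.0 and later (group_vars(), ungroup(), across(), all_of()); it is not the regroup function proposed here, and the column mpg_rank is just an example.

```r
library(dplyr)

by_cyl_gear <- group_by(mtcars, cyl, gear)

# Remember the grouping, drop it for one step, then put it back.
vars <- group_vars(by_cyl_gear)
result <- by_cyl_gear %>%
  ungroup() %>%
  mutate(mpg_rank = min_rank(desc(mpg))) %>%  # ranked across all rows
  group_by(across(all_of(vars)))

group_vars(result)  # "cyl" "gear"
```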
