pins-r's Introduction

pins

The pins package publishes data, models, and other R objects, making it easy to share them across projects and with your colleagues. You can pin objects to a variety of pin boards, including folders (to share on a networked drive or with services like DropBox), Posit Connect, Amazon S3, Google Cloud Storage, Azure storage, and Microsoft 365 (OneDrive and SharePoint). Pins can be automatically versioned, making it straightforward to track changes, re-run analyses on historical data, and undo mistakes.

pins 1.0.0 includes a new, more explicit API and greater support for versioning. The legacy API (pin(), pin_get(), and board_register()) will continue to work, but new features will only be implemented in the new API, so we encourage you to switch to it as soon as possible. Learn more in vignette("pins-update").

You can use pins from Python as well as R. For example, you can use one language to read a pin created with the other. Learn more about pins for Python.

Installation

You can install pins from CRAN with:

install.packages("pins")

You can install the development version from GitHub:

# install.packages("pak")
pak::pak("rstudio/pins-r")

Usage

To use the pins package, you must first create a pin board. A good place to start is board_folder(), which stores pins in a directory you specify. Here I’ll use a special version of board_folder() called board_temp() which creates a temporary board that’s automatically deleted when your R session ends. This is great for examples, but obviously you shouldn’t use it for real work!

library(pins)

board <- board_temp()
board
#> Pin board <pins_board_folder>
#> Path:
#> '/var/folders/hv/hzsmmyk9393_m7q3nscx1slc0000gn/T/Rtmpvoaxgw/pins-142d05cc7724a'
#> Cache size: 0

You can “pin” (save) data to a board with pin_write(). It takes three arguments: the board to pin to, an object, and a name:

board %>% pin_write(head(mtcars), "mtcars")
#> Guessing `type = 'rds'`
#> Creating new version '20231108T211157Z-8df40'
#> Writing to pin 'mtcars'

As you can see, the data is saved as an .rds file by default, but depending on what you’re saving and who else you want to read it, you might use the type argument to instead save it as a Parquet, Arrow, CSV, or JSON file.
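As a minimal sketch of the type argument, the following writes the same data frame as CSV rather than rds and then checks what was recorded. The pin name "mtcars-csv" is made up for illustration. CSV is a plain-text format, so R-specific details of the object may not survive the round trip.

```r
library(pins)

board <- board_temp()

# Store the data as CSV instead of the default rds
pin_write(board, head(mtcars), "mtcars-csv", type = "csv")

# The storage type is recorded in the pin's metadata
meta <- pin_meta(board, "mtcars-csv")
meta$type

# Reading the pin gives back a data frame parsed from the CSV file
cars <- pin_read(board, "mtcars-csv")
names(cars)
```

Choosing a language-neutral type such as CSV or Parquet is mostly useful when the pin will be read from something other than R.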

You can later retrieve the pinned data with pin_read():

board %>% pin_read("mtcars")
#>                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
#> Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
#> Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
#> Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
#> Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
#> Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
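Writing to the same name again updates the pin. The sketch below assumes a board created with versioned = TRUE so that earlier versions are kept; the pin name "mtcars-versions" is hypothetical. It lists the stored versions with pin_versions() and reads each one back by its identifier.

```r
library(pins)

# Keep every version instead of only the latest one
board <- board_temp(versioned = TRUE)

# Two writes with different contents create two versions
pin_write(board, head(mtcars, 3), "mtcars-versions", type = "rds")
pin_write(board, head(mtcars, 6), "mtcars-versions", type = "rds")

# One row per stored version
versions <- pin_versions(board, "mtcars-versions")
nrow(versions)

# Read each version by its identifier and count its rows
rows <- vapply(
  versions$version,
  function(v) nrow(pin_read(board, "mtcars-versions", version = v)),
  integer(1)
)
```

Calling pin_read() without a version argument returns the most recent version.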

A board on your computer is a good place to start, but the real power of pins comes when you use a board that’s shared with multiple people. To get started, you can use board_folder() with a directory on a shared drive or in Dropbox, or if you use Posit Connect you can use board_connect():

board <- board_connect()
#> Connecting to Posit Connect 2023.01.0 at <https://colorado.posit.co/rsc>
board %>% pin_write(tidy_sales_data, "sales-summary", type = "rds")
#> Writing to pin 'hadley/sales-summary'

Then, someone else (or an automated Rmd report) can read and use your pin:

board <- board_connect()
board %>% pin_read("hadley/sales-summary")

You can easily control who gets to access the data using the Posit Connect permissions pane.
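If you don’t have Posit Connect, the same write-then-read pattern can be sketched with board_folder(). Here a directory under tempdir() stands in for a shared drive or synced folder; the directory and the pin name "mtcars-shared" are assumptions for illustration only.

```r
library(pins)

# Stand-in for a shared location such as a network drive
shared_dir <- file.path(tempdir(), "shared-pins")
dir.create(shared_dir, showWarnings = FALSE)

# One person writes a pin to the shared directory...
board_writer <- board_folder(shared_dir)
pin_write(board_writer, head(mtcars), "mtcars-shared", type = "rds")

# ...and anyone who points a board at the same directory can read it
board_reader <- board_folder(shared_dir)
pin_list(board_reader)
shared <- pin_read(board_reader, "mtcars-shared")
```

With a folder board, access control is whatever the underlying file system or sync service provides.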

The pins package also includes boards that allow you to share data on services like Amazon’s S3 (board_s3()), Azure’s blob storage (board_azure()), and Google Cloud Storage (board_gcs()). Learn more in vignette("pins").

pins-r's People

Contributors

atusy, chronofanz, colearendt, duju211, edavidaja, fh-mthomson, gardiners, gshotwell, hadley, hongooi73, ijlyttle, javierluraschi, jduckles, jthomasmock, juliasilge, kellobri, kevinykuo, machow, mzorko, ojziff, olivroy, rkb965, salim-b, sellorm, smingerson, thomaszwagerman, tomsing1, uchidamizuki, wurli, yue-jiang

pins-r's Issues

Error in View : Board '' not a board, available boards: local, packages

via addin, search for something with select box set to "all", then click on one dataset:

> pins:::ui_addin_pin_find()

Listening on http://127.0.0.1:5397
[1] "fivethirtyeight/mediacloud_trump"

> View(pins::pin_preview(pins::pin_get("fivethirtyeight/mediacloud_trump", board = "")))
Error in View : Board '' not a board, available boards: local, packages

Can't replace GitHub pin if file is larger than 1MB

Workaround: pin_remove() the pin then pin_create().

When replacing an existing pin with a pin larger than 1 MB:

[1] "This API returns blobs up to 1 MB in size. The requested blob is too large to fetch via the API, but you can use the Git Data API to request blobs up to 100 MB in size."

and

"sha" wasn't supplied. 

Support for caching in pin_get()

Caching is currently only supported in pin(); it should be enabled in pin_get() as well. Example:

pins::pin_get("gunnvant/game-of-thrones-srt", "kaggle") %>%
    dir(full.names = TRUE) %>%
    purrr::map(~jsonlite::read_json(.x)) %>%
    unlist() %>%
    tibble::tibble(captions = .)

In addition, when a zip is downloaded it should return all files, not the folder, such that the following works:

pins::pin_get("gunnvant/game-of-thrones-srt", "kaggle") %>%
    purrr::map(~jsonlite::read_json(.x)) %>%
    unlist() %>%
    tibble::tibble(captions = .)

Default pin goes to temporary directory?

pins::pin("https://raw.githubusercontent.com/facebook/prophet/master/examples/example_retail_sales.csv")
#> [1] "/private/tmp/RtmpK3Ni0B/file8ebc1adfe617/temp/example_retail_sales/example_retail_sales.csv"

I found this surprising, since I thought the point of pin() was to cache it persistently.

Function names

I think it would be easier to see the shape of the api if you used pin()/board() as a prefix rather than a suffix

  • find_pin() -> pin_find()

  • get_pin() -> pin_get()

  • use_board() -> board_use()

  • get_board() -> board_get()

  • register_board() -> board_register()

install `zip` automatically?

Should pins try to install zip?

==> R CMD INSTALL --no-multiarch --with-keep.source pins

* installing to library ‘/home/key/R/x86_64-redhat-linux-gnu-library/3.6’
ERROR: dependency ‘zip’ is not available for package ‘pins’

Pin paths need namespacing

At a minimum, you should be able to support slashes in the board name. And if there are no slashes, assume some user-specific namespace.

Documentation request - Access settings for RStudio Connect

Hi,

It would be great if there could be some documentation on how to pin a resource that is used by a shiny app which is hosted on Connect

Do the access settings have to be manually set on the resource to mirror the access on the Shiny app?
If so, what error message would a user get if they don't have access to the resource but they do have access to the Shiny app?
Is there a way to automatically mirror the settings between the shiny app and the resource?

Thanks

Iain

Support for config files and persistent boards

If you are using a board constantly, you don't want to be running board_register_*() in every session. We could implement board_persist() to persist all boards (or particular ones) across R sessions by creating a config file with the config package.

More helpful error when forget `rsconnect` API key?

This works as expected:

> pins::board_register(server = "https://colorado.rstudio.com/rsc", key = Sys.getenv("RSCONNECT_API_KEY"), name = "jkr_rsc", board = "rsconnect")
> pin(iris, description = "The iris data set", board = "jkr_rsc")
# A tibble: 150 x 5
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
          <dbl>       <dbl>        <dbl>       <dbl> <fct>  
 1          5.1         3.5          1.4         0.2 setosa 
 2          4.9         3            1.4         0.2 setosa 
 3          4.7         3.2          1.3         0.2 setosa 
 4          4.6         3.1          1.5         0.2 setosa 
 5          5           3.6          1.4         0.2 setosa 
 6          5.4         3.9          1.7         0.4 setosa 
 7          4.6         3.4          1.4         0.3 setosa 
 8          5           3.4          1.5         0.2 setosa 
 9          4.4         2.9          1.4         0.2 setosa 
10          4.9         3.1          1.5         0.1 setosa 
# … with 140 more rows
Warning messages:
1: invalid uid value replaced by that for user 'nobody' 
2: invalid gid value replaced by that for user 'nobody' 

This throws unhelpful error:

> pins::board_register(server = "https://colorado.rstudio.com/rsc", name = "jkr_rsc", board = "rsconnect")
> pin(iris, description = "The iris data set", board = "jkr_rsc")
Error in gsub(paste0("\\{\\{", template, "\\}\\}"), value, html_index) : 
  invalid 'replacement' argument
In addition: Warning message:
In value[[3L]](cond) :
  Error searching 'jkr_rsc' board: server named 'colorado.rstudio.com/rsc' does not exist

Not including https throw error?

This breaks because server is https. Any way to handle more gracefully? Most rsconnect instances are going to be https, so possible to try automatically? Or at least to warn on that? Perhaps on connect step?

> pins::board_register_rsconnect(server = "colorado.rstudio.com/rsc", key = Sys.getenv("RSCONNECT_API_KEY"))
> pins::board_connect("rsconnect")
> pins::pin(mtcars, "mtcars", board = "rsconnect")
No encoding supplied: defaulting to UTF-8.
Error in board_pin_create.rsconnect(board, store_path, name = name, metadata = metadata,  : 
  Failed to create pin: Operation failed with status 404: 404 page not found

Changing URL works fine:

> pins::board_register_rsconnect(server = "https://colorado.rstudio.com/rsc", key = Sys.getenv("RSCONNECT_API_KEY"))
> pins::board_connect("rsconnect")
> pins::pin(mtcars, "mtcars", board = "rsconnect")
# A tibble: 32 x 11
     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1  21       6  160    110  3.9   2.62  16.5     0     1     4     4
 2  21       6  160    110  3.9   2.88  17.0     0     1     4     4
 3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
 4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
 5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
 6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
 7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
 8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2
 9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2
10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
# … with 22 more rows
Warning messages:
1: invalid uid value replaced by that for user 'nobody' 
2: invalid gid value replaced by that for user 'nobody'

Pin should be an S3 generic

And then instead of pin_connection(), you should consider pin(con_define()).

Then the documentation for pin() can describe the different types of things that you can pin: a dataset, a URL, a database connection, etc.

Warning when package is not loaded

pins::pin_get("gunnvant/game-of-thrones-srt")
Warning message:
In data(cranfiles, envir = .globals$datasets) :
  data set ‘cranfiles’ not found

Note: Does not trigger from within the pins project.

Breaks on list-columns

Not always getting the same error

Original Problem

class_types <- tibble::tribble(
    ~classroom_type, 
    "Standard",      
    "Admin"        
)
class_types$attrs <- tibble::tribble(
  ~R,    ~Python, ~`R Packages`,  ~`Python Packages`, ~Tensorflow,
  "V??",  "V??",   "tidyverse, ...", "numpy, ...",    "Yes", 
  "No",   "No",   "None",            "None",         "No"   
)

pins::pin(class_types, 
          name = "virtual_class_types2", 
          description = "Types (Images) for Virtual Classrooms", 
          board = "rsconnect")
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,  : 
  arguments imply differing number of rows: 2, 5

Repro attempt

list_df <- tibble::tibble(col = list(1:5, 3:10))
pins::pin(list_df, "list_df", "Testing a list df", "rsconnect")    
Error in write.table(x, file.path(path, "data.csv"), row.names = FALSE,  : 
  unimplemented type 'list' in 'EncodeElement'
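For comparison, here is a hedged sketch of the same repro using the newer pin_write()/pin_read() API described in the README, with type = "rds" chosen explicitly so that no CSV encoding is involved. It uses a temporary board rather than the "rsconnect" board from the report, so it says nothing about how a Connect board behaves.

```r
library(pins)

# Same shape as the repro above: a single list-column
list_df <- tibble::tibble(col = list(1:5, 3:10))

board <- board_temp()

# rds serialises the R object as-is, so list-columns need no text encoding
pin_write(board, list_df, "list_df", type = "rds")

restored <- pin_read(board, "list_df")
restored$col
```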

Support for folder pins

One might want to pin a set of images, multiple CSVs, or other groups of files. Currently we only support one resource at a time. The workaround is to create a zip and upload that as the resource.
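In the newer API, pin_upload() takes a character vector of file paths, which is one way to approach this request. The sketch below is an illustration under that assumption; the file names and the pin name "example-files" are hypothetical.

```r
library(pins)

board <- board_temp()

# Two small files standing in for a set of related resources
dir <- tempfile("multi-file-")
dir.create(dir)
paths <- file.path(dir, c("a.csv", "b.csv"))
write.csv(head(mtcars), paths[1], row.names = FALSE)
write.csv(head(iris), paths[2], row.names = FALSE)

# Upload both files under a single pin name
pin_upload(board, paths, "example-files")

# pin_download() returns the local paths of the cached files
cached <- pin_download(board, "example-files")
basename(cached)
```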

CRAN is requesting no data to be saved in user's home folder

In pins 0.1.0, the list of registered boards and default local board is stored in ~/.pins; however, CRAN is requesting this to not be the default since packages are not allowed to write outside the temp folder without user consent.

Instead, we need to opt-in to all boards and let users choose where data is stored by explicitly calling:

board_register_local()

The default board should become a temp board that makes use of the temp folder only, anything else would have to be explicitly requested by the user.

Publish the pin with the code in RStudio Connect

It seems that when I use the pin function to publish a pin to RSC, I can't schedule jobs because no code was uploaded with the pin. Is there an additional argument in the function that allows that?

Extract zip files

Related to #12

Is it possible to pin a zip file and in some way automatically extract it?

pin() should give more information about what it's doing

Not sure exactly how much information is needed, but I think at least indicating that it's being downloaded or retrieved from local cache is important.

e.g.

url <- "https://raw.githubusercontent.com/facebook/prophet/master/examples/example_retail_sales.csv"

pins::pin(url)
# Downloading https://raw.githubusercontent.com/.../example_retail_sales.csv
# Caching to /tmp/example_retail_sales
pins::pin(url)
# Retrieving url from local cache

Error when retrieving kaggle files

Retrieving some pins from kaggle boards triggers:

error -1 in extracting from zip file

This is caused by utils::unzip not being able to extract all files, while zip::unzip can.

Roadmap available?

This is a very exciting package! It will enable FAIR data sets in an organization. Do you have a roadmap that you could share? I am wondering specifically about versioning, storing additional metadata with a resource, and how best to administer a board on RStudio Connect.

Multiple pins that match name are not returned properly

https://github.com/rstudio/pins/blob/bfb0f6d979f76d04ee99f56bc7b2f9ce49825d93/R/board_rsconnect.R#L177

I'm not sure whether this line is the root of the issue, or just a symptom of a problem elsewhere, but when trying to find a pin (but the name returns 2 results), we get broken behavior on Connect:

# before this line
                            name                                                                   description  type
1  alex.gold/virtual_class_types    Config Vars for Virtual Classrooms. A table pin with 3 rows and 6 columns. table
2 alex.gold/virtual_class_types2 Types (Images) for Virtual Classrooms. A table pin with 2 rows and 9 columns. table 
metadata
1 {"id":373,"guid":"c5da3c8a-5d8a-4ce7-b294-dc3849982e29",...}
2       {"id":372,"guid":"5fc8daf8-ccce-4caa-8ed1-d0481daf9d69",...}

# after
                            name                                                                   description  type                           metadata
1  alex.gold/virtual_class_types    Config Vars for Virtual Classrooms. A table pin with 3 rows and 6 columns. table {"type":"table","rows":3,"cols":6}
2 alex.gold/virtual_class_types2 Types (Images) for Virtual Classrooms. A table pin with 2 rows and 9 columns. table {"type":"table","rows":2,"cols":9}

Because this line messes up the metadata object, later attempts to pull out URLs and such will fail, resulting in an empty set and the pin not returned.

Moreover, debugging this behavior was kinda painful: I was trying to ensure that the content was where I expected it on Connect, that I had permissions, that pins was addressing the right RSC server, etc., and all of this information is hidden and not logged. At the very least, making some information available in pin_list regarding the URL to visit the content directly, or somewhat more verbose error logging when things go awry, would be very helpful for understanding what pins is trying to grab.

Support auto-registering boards

Many times, we use the defaults for a board, like:

pins::board_register_rsconnect()
pins::pin(iris, board = "rsconnect")

It would make sense to simplify this by allowing:

pins::pin(iris, board = "rsconnect")

and simply auto-connect when possible.

Registering a GitHub board could check for repo existence or offer to create one?

Hi,

I am just discovering the package and trying to pin some data.
I tried naively this thinking it would set up everything.

library(pins)
board_register_github(repo = "cderv/pins-board")

There is no error, warning or message, but pinning does not work because the repo does not exist yet on GitHub.

pin(iris, description = "The iris data set", board = "github")

I would have expected the registration process to warn about it or help create the repository.

Also, the GitHub vignette does not explicitly say to create the repository before registering.

Would this be a documentation clarification only, or would you accept a change so that board_register_github() checks for existence and, if the repo does not exist, warns and asks for manual creation?

How do you update a pin?

Or if pin() already lets you overwrite an existing pin, how do you refer to a previous version of the data?

How/where does pins store a dataset?

From the main README, it is not clear to me if pin will cache the dataset just for the extent of your RStudio session, or write it to a file, and if so, where, so I can find what I downloaded last time? ... I see this is documented in a dedicated article, boards, it's just that for me, this is the very first question I'd have :-)

Perhaps it's overkill, but I wonder if it would make sense to print a message to the user, as part of pin?

As an aside, now that I've inspected where things are stored, I wonder if on Linux this might be .pins (convention for application data)?
