httr2's Introduction

httr2 (pronounced hitter2) is a ground-up rewrite of httr that provides a pipeable API with an explicit request object that solves more problems felt by packages that wrap APIs (e.g. built-in rate-limiting, retries, OAuth, secure secrets, and more).

Installation

You can install httr2 from CRAN with:

install.packages("httr2")

Usage

To use httr2, start by creating a request:

library(httr2)

req <- request("https://r-project.org")
req
#> <httr2_request>
#> GET https://r-project.org
#> Body: empty

You can tailor this request with the req_ family of functions:

# Add custom headers
req |> req_headers("Accept" = "application/json")
#> <httr2_request>
#> GET https://r-project.org
#> Headers:
#> • Accept: 'application/json'
#> Body: empty

# Add a body, turning it into a POST
req |> req_body_json(list(x = 1, y = 2))
#> <httr2_request>
#> POST https://r-project.org
#> Body: json encoded data

# Automatically retry if the request fails
req |> req_retry(max_tries = 5)
#> <httr2_request>
#> GET https://r-project.org
#> Body: empty
#> Policies:
#> • retry_max_tries: 5

# Change the HTTP method
req |> req_method("PATCH")
#> <httr2_request>
#> PATCH https://r-project.org
#> Body: empty

And see exactly what httr2 will send to the server with req_dry_run():

req |> req_dry_run()
#> GET / HTTP/1.1
#> Host: r-project.org
#> User-Agent: httr2/1.0.0.9000 r-curl/5.2.1 libcurl/8.4.0
#> Accept: */*
#> Accept-Encoding: deflate, gzip

Use req_perform() to perform the request, retrieving a response:

resp <- req_perform(req)
resp
#> <httr2_response>
#> GET https://www.r-project.org/
#> Status: 200 OK
#> Content-Type: text/html
#> Body: In memory (6854 bytes)

The resp_ functions help you extract various useful components of the response:

resp |> resp_content_type()
#> [1] "text/html"
resp |> resp_status_desc()
#> [1] "OK"
resp |> resp_body_html()
#> {html_document}
#> <html lang="en">
#> [1] <head>\n<meta http-equiv="Content-Type" content="text/html; charset=UTF-8 ...
#> [2] <body>\n    <div class="container page">\n      <div class="row">\n       ...

Major differences to httr

  • You can now create and modify a request without performing it. This means that there’s now a single function to perform the request and fetch the result: req_perform(). req_perform() replaces httr::GET(), httr::POST(), httr::DELETE(), and more.

  • HTTP errors are automatically converted into R errors. Use req_error() to override the defaults (which turn all 4xx and 5xx responses into errors) or to add additional details to the error message.

  • You can automatically retry if the request fails or encounters a transient HTTP error (e.g. a 429 rate-limit response). req_retry() defines the maximum number of retries, which errors are transient, and how long to wait between tries.

  • OAuth support has been totally overhauled to directly support many more flows and to make it much easier to both customise the built-in flows and to create your own.

  • You can manage secrets (often needed for testing) with secret_encrypt() and friends. You can obfuscate mildly confidential data with obfuscate(), preventing it from being scraped from published code.

  • You can automatically cache all cacheable results with req_cache(). Relatively few API responses are cacheable, but when they are it typically makes a big difference.
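As a minimal sketch of how the error and retry policies from the list above can be attached to a request before anything is sent: the URL and the choice to treat only 5xx responses as errors are assumptions made for illustration, not httr2 defaults.

```r
library(httr2)

req <- request("https://example.com/api") |>
  # Assumption for this sketch: only 5xx responses become R errors
  req_error(is_error = function(resp) resp_status(resp) >= 500) |>
  # Make at most 3 attempts when the request fails transiently
  req_retry(max_tries = 3)

# Nothing has been sent yet; `req` is just a description of the request
req
```

Because the policies live on the request object, they can be inspected or tested without touching the network.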

Acknowledgements

httr2 wouldn’t be possible without curl, openssl, jsonlite, and jose, which are all maintained by Jeroen Ooms. Big thanks also go to Jenny Bryan and Craig Citro, who have given me much useful feedback on both the design of the internals and the user-facing API.

httr2's Issues

URL substitution

Think about the common patterns used to document REST APIs, i.e. the colon signals a placeholder, and hooking this up nicely to glue(). Relates to glue transformers.

GitHub:
GET /repos/:owner/:repo/topics

Canvas:
DELETE /api/v1/courses/:course_id/discussion_topics/:topic_id
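One way the colon-placeholder idea could look, sketched in base R. The helper name fill_template() is hypothetical and is not part of httr2 or glue; it handles only the path, not the leading HTTP method.

```r
# Hypothetical helper: replace each ":name" path segment with a supplied value
fill_template <- function(template, ...) {
  values <- list(...)
  pieces <- strsplit(template, "/", fixed = TRUE)[[1]]
  is_placeholder <- startsWith(pieces, ":")
  names_needed <- substring(pieces[is_placeholder], 2)

  # Fail early if any placeholder has no matching argument
  missing <- setdiff(names_needed, names(values))
  if (length(missing) > 0) {
    stop("Missing values for: ", paste(missing, collapse = ", "), call. = FALSE)
  }

  # Percent-encode each value so it is safe to use as a path segment
  pieces[is_placeholder] <- vapply(
    values[names_needed],
    function(v) utils::URLencode(as.character(v), reserved = TRUE),
    character(1)
  )
  paste(pieces, collapse = "/")
}

fill_template("/repos/:owner/:repo/topics", owner = "r-lib", repo = "httr2")
#> [1] "/repos/r-lib/httr2/topics"
```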

serialize NULL as null for JSON

In httr2, PUT/POST/PATCH with a JSON body should turn a NULL field into a JSON null, not an empty object. The current behaviour makes it unnecessarily hard to interface with REST APIs that distinguish between nulls and empty objects.

This is arguably a wart in jsonlite::toJSON, but changing that or httr now would probably break too much code. Since httr2 will be a new package, there shouldn't be any legacy issues.

# current behaviour, as seen in httr:::body_config
jsonlite::toJSON(list(a=1, b=list(), c=list(x=2, y=NULL)), auto_unbox=TRUE)
# {"a":1,"b":[],"c":{"x":2,"y":{}}}

# better behaviour
jsonlite::toJSON(list(a=1, b=list(), c=list(x=2, y=NULL)), auto_unbox=TRUE, null="null")
# {"a":1,"b":[],"c":{"x":2,"y":null}}

Check first arguments

Give a clear error message if a resp_ function doesn't get a response object or a req_ function doesn't get a request object.
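A minimal sketch of such a check. The helper name check_request() and the wording of the message are assumptions; the class name httr2_request is taken from the printed request objects shown earlier.

```r
# Hypothetical helper: stop with a readable message unless `req` is a request
check_request <- function(req, arg = "req") {
  if (!inherits(req, "httr2_request")) {
    stop(
      sprintf("`%s` must be an HTTP request object, not <%s>.", arg, class(req)[[1]]),
      call. = FALSE
    )
  }
  invisible(req)
}
```

A matching check for responses would test for the httr2_response class in the same way.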

Can we use multifetch in req_fetch()?

Two big advantages:

  • only have to maintain one code path, not one in req_fetch() and one in req_multi_fetch()
  • easier to do progress bars which are otherwise challenging inside of curl callbacks

Will need to add additional queue of requests that need to be added to the pool in the future. And keep looping until both pool and queue are empty.
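The pool-plus-queue loop described above can be simulated in base R. This is only a sketch of the control flow: run_with_queue() is a hypothetical name, requests are plain strings, and "completing" a request is faked by taking the oldest one out of the pool.

```r
# Hypothetical simulation: keep looping until both pool and queue are empty
run_with_queue <- function(requests, pool_size = 2) {
  queue <- requests      # requests still waiting to be added to the pool
  pool <- character()    # requests currently "in flight"
  done <- character()

  while (length(pool) > 0 || length(queue) > 0) {
    # Top up the pool from the queue while there is free capacity
    n_take <- min(pool_size - length(pool), length(queue))
    if (n_take > 0) {
      pool <- c(pool, queue[seq_len(n_take)])
      queue <- queue[-seq_len(n_take)]
    }
    # Pretend the oldest in-flight request has just completed
    done <- c(done, pool[1])
    pool <- pool[-1]
  }
  done
}

run_with_queue(c("a", "b", "c"), pool_size = 2)
#> [1] "a" "b" "c"
```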

Broadly consider redaction

When is a user likely to accidentally leak confidential information when trying to get help?

  • What else needs to be redacted apart from the Authorization header?
  • Should req_dry_run() redact?
  • Should req_verbose() redact?

Make caching easier

Make it possible to designate some cache directory for storing body content.

Check Azure certificate based auth

https://docs.microsoft.com/en-us/azure/active-directory/develop/v2-oauth2-auth-code-flow#redeem-a-code-for-an-access-token

Something like this:

client_id <- "my_client_id"
claims <- list(
  aud = "https://login.microsoftonline.com/{tenant}/v2.0",
  iss = client_id,
  sub = client_id
)
oauth_app(
  oauth_client("id"),
  endpoints = c(
    token = "https://login.microsoftonline.com/{tenant}/oauth2/v2.0/token"
  ),
  auth = "jwt_enc",
  auth_params = list(
    claims = claims,
    key = "path/to/private/key",
    header = list(
      x5t = base64url_encode(openssl::sha1(openssl::read_cert("path/to/certificate")))
    )
  )
)

Login at https://portal.azure.com/. Need to figure out how to create app, and how to create certificate.

Create apps at https://portal.azure.com/#blade/Microsoft_AAD_RegisteredApps/ApplicationsListBlade

See code in https://github.com/Azure/AzureAuth/blob/6f6ecf4ea47e3da35afe1e05e5c5a196a00b8aae/R/cert_creds.R

Pagination

Help with auto-traversal, as we do in gh. There are some pretty standard ways of doing this.
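One of the standard patterns is a Link response header carrying a rel="next" URL. A base R sketch of extracting that URL follows; next_link() is a hypothetical helper, the example URLs are made up, and the naive comma split assumes the URLs themselves contain no commas.

```r
# Hypothetical helper: return the rel="next" URL from a Link header, or NULL
next_link <- function(link_header) {
  parts <- strsplit(link_header, ",\\s*")[[1]]
  is_next <- grepl('rel="next"', parts, fixed = TRUE)
  if (!any(is_next)) {
    return(NULL)
  }
  # Keep only the URL between the angle brackets
  sub("^\\s*<([^>]*)>.*$", "\\1", parts[is_next][[1]])
}

next_link('<https://api.example.com/items?page=2>; rel="next", <https://api.example.com/items?page=5>; rel="last"')
#> [1] "https://api.example.com/items?page=2"
```

Auto-traversal would then repeat the request against this URL until next_link() returns NULL.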

httr2

httr is a library for two main tasks: creating HTTP requests and parsing the responses. Currently this dichotomy is a little muddled because:

  • There is no explicit request object - it is only created internally by
    the request functions. This tends to lead to large request functions that
    have many arguments passed in ...
  • There's no consistent naming scheme that indicates whether a function
    works with a request or a response. This is particularly confusing for
    functions like content_type(): does it set the content type of
    the request or extract the content type of the response?

Additionally, httr was designed prior to the pipe and so uses ... rather than functions that modify an object. This makes the API feel rather different to similar APIs (e.g. rvest). It also makes it harder to test, because you can only easily test the result of issuing a request, not the internal request object. Overall the API feels a little dated, and a little underdesigned.

Request API

Basics

req("http://url.com/") %>% req_fetch()

# req_fetch could be a generic so you could still do
req_fetch("http://url.com/")

Performing the request

The HTTP method doesn't affect the input arguments or the output type, so that suggests that the key API verb should be the output, rather than the HTTP method:

req("http://url.com") %>% req_fetch()
req("http://url.com") %>% req_async()
req("http://url.com") %>% req_save(path = path)
req("http://url.com") %>% req_stream(fun = fun)

The request verb would default to GET, unless a body was set, in which case it would change to POST. Otherwise, you could override it yourself in two ways:

req("http://url.com") %>% req_method("DELETE") %>% req_fetch()
req("http://url.com") %>% req_fetch("DELETE")

The first form would be most useful when generating partial requests for API wrappers.

Url

# Would generally make smaller functions that took multiple arguments
# This would make iterative url construction much nicer for APIs
req("http://url.com/") %>% 
  req_path("a", username, "b") %>%  # replaces
  req_path_suffix("y") %>%  # appends
  req_path_prefix("x") %>%  # for completeness
  req_query(y = 1) %>% 
  req_params() %>% 
  req_fetch()

# req would need to parse the url so you could start with
req("http://url.com/api/v1/?q=2")

Body

# This would also lead to a nicer API for bodies
req("http://url.com/") %>% 
  req_body_json(a = 1, b = 2, c = 3) %>% 
  req_fetch() 

# Setting body would change default request type, but you could
# override
req("http://url.com/") %>% 
  req_body_json(a = 1, b = 2, c = 3) %>% 
  req_fetch("PUT") 

req("http://url.com/") %>% 
  req_body_json(a = 1, b = 2, c = 3) %>% 
  req_save(path = "~/Desktop/bigfile.blah") 
  # If path is directory, automatically add name from url?

# req_body_file()
# req_body_form()
# req_body_json()
# req_body_multipart()
# req_body_raw()

Authentication

req("http://url.com/") %>% req_auth_basic()
req("http://url.com/") %>% req_auth_oauth1()
req("http://url.com/") %>% req_auth_oauth2()

Headers

req("http://url.com/") %>% 
  req_header(`Content-type` = "application/json") %>% 
  req_header(a_list_from_somewhere_else) # would unlist() inputs as needed.
# list(...) %>% flatten() %>% map_chr(as.character)

Curl

# And there would be a new function for setting specific curl requests
# These would be applied after other request parameters
req_config()

Response API

Basics

resp <- req("http://google.com") %>% req_fetch()

# Pipeable API to check that the response is what you expect
resp %>% 
  resp_check_ok() %>% 
  resp_check_body_xml() %>% 
  resp_content_xml()

# Other functions extract headers
resp %>% resp_headers()
resp %>% resp_content_type()

# And other http components
resp %>% resp_status()
resp %>% resp_url()
resp %>% resp_timings()

Content

resp %>% resp_content_raw()
resp %>% resp_content_text(encoding = "UTF-8")

# Would rely on user to check content type using helper from above
resp %>% resp_content_json()
resp %>% resp_content_xml()
resp %>% resp_content_html()
resp %>% resp_content_png()
resp %>% resp_content_jpeg()

Would not have resp_content_auto() because I think time has shown that this is a bad idea.

New package?

Should be a new package, httr2?

Pros:

  • Can start from scratch without having to work with existing API
  • Could aim for high unit test coverage from the beginning.
  • Documentation will be less confusing because it doesn't have to describe
    two APIs.

Cons:

  • May re-introduce bugs because I miss important logic
  • Will have to maintain two packages in the short term (in the long term
    would deprecate httr).

I think the pros probably outweigh the cons - the API will be a sufficiently large change that it's worth starting from scratch.

@craigcitro, @jeroenooms, @jennybc I'd love your thoughts on this, if you have a little spare time.

Checking content type

In resp_body_json() the only accepted media type seems to be application/json. Many APIs have vendor-specific types, e.g. application/vnd.github-issue.text+json (see other examples on Swagger or RFC 6838).
It would be great if these media types were also recognized as valid JSON media types.
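A possible check, sketched in base R: accept application/json itself and any application/* type ending in the +json suffix. The helper name is_json_type() is hypothetical and this is not how resp_body_json() is known to be implemented.

```r
# Hypothetical helper: TRUE for application/json and application/*+json
is_json_type <- function(content_type) {
  # Drop parameters such as "; charset=utf-8" and normalise case
  type <- tolower(trimws(strsplit(content_type, ";", fixed = TRUE)[[1]][[1]]))
  type == "application/json" || grepl("^application/[^/]+\\+json$", type)
}

is_json_type("application/vnd.github-issue.text+json")
#> [1] TRUE
is_json_type("text/html")
#> [1] FALSE
```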

Figure out shiny integration

r-lib/gargle#157

Code in PR currently uses OAuth as gate to access app; might also want to use it as optional feature (i.e. log in to save this file to your google drive), so will also need to work out that flow.

Lightweight json class?

e.g.

print.httr2_json <- function(x, ...) {
  # Label the object, then show the list as pretty-printed JSON
  cat("<List from JSON>\n")
  print(jsonlite::toJSON(unclass(x), auto_unbox = TRUE, pretty = TRUE))
  invisible(x)
}

Because it's a relatively compact format. OTOH maybe this will be confusing because it's not obvious you can treat it like a list?

Think about encryption

  • Basic with built-in password to avoid storing client_secret in plain text
  • Password argument to auth or cache to make it harder for a different app to use your tokens

Read from stream

  • Provide timeout, optionally inf
  • Callback function + chunk size
  • Callback function has way to gracefully terminate
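The three requirements above can be sketched with a plain base R connection. stream_chunks() is a hypothetical name, and the convention that the callback returns FALSE to stop early is an assumption made for this sketch.

```r
# Hypothetical helper: read a connection chunk by chunk, handing each chunk
# to `callback`; stops at end of input, on timeout, or when callback returns FALSE
stream_chunks <- function(con, callback, chunk_size = 1024L, timeout_sec = Inf) {
  start <- Sys.time()
  repeat {
    elapsed <- as.numeric(difftime(Sys.time(), start, units = "secs"))
    if (elapsed > timeout_sec) break

    chunk <- readBin(con, what = "raw", n = chunk_size)
    if (length(chunk) == 0) break               # end of stream

    if (isFALSE(callback(chunk))) break         # graceful early termination
  }
  invisible(NULL)
}

# Usage: stop after two chunks of 4 bytes each
seen <- character()
con <- rawConnection(charToRaw("abcdefghij"))
stream_chunks(con, function(chunk) {
  seen <<- c(seen, rawToChar(chunk))
  length(seen) < 2
}, chunk_size = 4L)
close(con)
seen
#> [1] "abcd" "efgh"
```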

Multi-VERB partials

In wrapper packages, it's common to create thin wrappers around several httr::VERB()s, and use a common, e.g., base URL, user agent, token-injecting strategy, etc. Think about the best pattern for facilitating this. Could possibly extend to other API-wide aspects, like batching, retries, throttling. Make it easier to propagate shared data and policies.
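With an explicit request object, one possible pattern is a function that returns a partially built request carrying the shared settings, which each endpoint wrapper then extends. The API URL, the user agent string, and the function names below are made up for illustration.

```r
library(httr2)

# Shared settings live in one place: base URL, user agent, retry policy
base_req <- function() {
  request("https://api.example.com") |>
    req_user_agent("mypkg (https://example.com/mypkg)") |>
    req_retry(max_tries = 3)
}

# Thin wrappers only add what is specific to the endpoint
get_thing <- function(id) {
  base_req() |> req_url_path_append("things", id)
}
delete_thing <- function(id) {
  get_thing(id) |> req_method("DELETE")
}

delete_thing("42")
```

Each wrapper returns a request rather than performing it, so callers decide when to call req_perform().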

HTTP state handler / HTTP event handler

Discussion/Idea

For the interpretation of robots.txt files, it turns out that a lot depends on the outcome of the HTTP request: a server error, a client error, a robots.txt file being returned, some other format being returned, ...

I put some effort into designing a callback / event handler system. It will do, but it's anything but elegant.

The question is whether handling different states/events and status messages is a broad enough problem to be handled in httr(2) in a better thought-through, consistent, and robust manner.

States and events I have got to handle (just to give an idea of the problem space):

  • 404
  • client error other than 404
  • 5xx (aka server error)
  • mime type is not "text/plain"
  • response data looks like HTML, XML or JSON (aka it does not look like plain text)
  • redirects without domain change
  • redirect to subdomain www
  • redirects with domain change

Automatically remove old tokens

I think it makes sense to have some default expiration policy, so that tokens that haven't been used for x days (maybe default to 30?) are automatically deleted.
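A sketch of such a policy in base R, assuming tokens are cached as files in one directory. prune_cache() is a hypothetical name, and file modification time is used as a stand-in for "last used", which is an assumption of this sketch.

```r
# Hypothetical helper: delete cached files not modified in `max_age_days` days
prune_cache <- function(dir, max_age_days = 30, now = Sys.time()) {
  files <- list.files(dir, full.names = TRUE)
  age_days <- as.numeric(difftime(now, file.mtime(files), units = "days"))
  stale <- files[age_days > max_age_days]
  unlink(stale)
  invisible(stale)   # return what was removed, for logging or testing
}
```

For the modification time to track usage, the cache would need to touch a token file each time it is read.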

Redesign OAuth

  • Break down into smaller pieces since there's so much variation across sites
  • Copy token caching from gargle/rtweet
  • Provide more flows

Extract query parameters

I spent a few minutes trying out httr2. Great work with the new OAuth system. I never really managed to use OAuth with httr, but it only took me two minutes to make it work in httr2 😄

I want to extract the URL query parameters of a request and wondered whether there is an easier way than the following:

library(httr2)
req <- request("https://example.com/path/to/page?name=ferret&color=purple")

req_queries <- url_parse(req$url)$query
req_queries$name
#> [1] "ferret"

Created on 2021-07-29 by the reprex package (v2.0.0)

OAuth as a new package?

@jennybc and I discussed this just a bit in DMs - she suggested opening an issue here.

The major feature crul does not have that httr does have is OAuth. It's not a common use case in scientific web resources, so there's not a big push for me to support it. However, it would be great if there was an easy way for crul users to incorporate OAuth with a separate package.

There are only two higher-level HTTP libraries now, but surely more will come along in the future, and maybe httr and httr2 will even co-exist.

Breaking out OAuth into a separate library that could be integrated into any HTTP library seems to make sense. In Python there's https://github.com/oauthlib/oauthlib and Ruby has https://github.com/oauth-xx/oauth2 (I may be missing some other libs that are more widely used).

From what Jenny tells me, OAuth in httr is pretty integrated into the package, so maybe it isn't that easy to do?

API wrapping vignette

  • Basics of core function: request, possibly template.
  • How to improve error handling with req_error()
  • Auth (including basic OAuth discussion)

Return mocked responses

With httr we can hook into requests via set_callback, which I use to make webmockr and vcr work. Can there be something similar here?

cc @maelle

Consider moving endpoints to individual flows

And auth to the client. Also add key to client as in #33, and then think about whether auth_params could just be .... Then the app object could go away.

# old
client <- oauth_client(
  "732991327087-cdl705uujluehert8a47rhr0umetg5ut.apps.googleusercontent.com",
  obfuscated("RmOi9uaoCNx8o6XmVEMU9A_fopiN5-iQ")
)
app <- oauth_app(client, endpoints = c(
  authorization = "https://accounts.google.com/o/oauth2/auth",
  token = "https://accounts.google.com/o/oauth2/token",
  device_authorization = "https://oauth2.googleapis.com/device/code"
))

oauth_flow_auth_code(app, scope = "https://www.googleapis.com/auth/userinfo.email")
oauth_flow_device(app, scope = "https://www.googleapis.com/auth/userinfo.email")

# new
client <- oauth_client(
  "732991327087-cdl705uujluehert8a47rhr0umetg5ut.apps.googleusercontent.com",
  obfuscated("RmOi9uaoCNx8o6XmVEMU9A_fopiN5-iQ")
)
oauth_flow_auth_code(client, 
  auth_url = "https://accounts.google.com/o/oauth2/auth",
  token_url = "https://accounts.google.com/o/oauth2/token",
  scope = "https://www.googleapis.com/auth/userinfo.email"
)
oauth_flow_device(client, 
  auth_url = "https://oauth2.googleapis.com/device/code",
  token_url = "https://accounts.google.com/o/oauth2/token",
  scope = "https://www.googleapis.com/auth/userinfo.email"
)

That makes it much clearer which URLs each flow requires, and would simplify oauth_flow_check_app().

If you're providing multiple auth flows for a single API, you'd need to avoid repeating the token URLs in some other way.
