Comments (5)

sporkmonger avatar sporkmonger commented on June 7, 2024

The client should not be calling normalize, but this is a mistake in the client, not the URI parser. However, if it's been like this for a while, resolving it will likely cause breaking changes in any projects that have em-http-request as a dependency and that have been relying on this functionality. Those are the 'edge cases' Ilya was referring to.

If Ilya needs convincing, point him at OAuth and ask him if he's ever tried using em-http-request in conjunction with an auth mechanism that signs parts of the URI. If you pre-normalize like this, the signatures won't match and the request will fail in ways that are nearly impossible to debug.

However, the particular case you've given is not an example of a semantic change. All URI-aware software should treat those two as equivalent. The main problem here is simply that if I give an HTTP client a URI, I expect it to make a request against exactly the byte-for-byte data I give it. Pre-normalizing is the kind of magic Ruby is known and sometimes reviled for, and we shouldn't be making a habit of that.

from addressable.

igrigorik avatar igrigorik commented on June 7, 2024

Bob, the normalize! call in em-http is a fairly recent addition. Perhaps I misunderstood the utility / semantics of it? I assumed the same behavior as the built-in URI lib...

ruby-1.9.2-p0 > require 'uri'
ruby-1.9.2-p0 > u = URI.parse('http://example.com/path?a=%28%2B%29')
ruby-1.9.2-p0 > u.normalize
#<URI::HTTP:0x00000101959fa0 URL:http://example.com/path?a=%28%2B%29>
ruby-1.9.2-p0 > u.normalize.to_s
"http://example.com/path?a=%28%2B%29"
ruby-1.9.2-p0 > u.query
"a=%28%2B%29"
ruby-1.9.2-p0 > require 'addressable/uri'
ruby-1.9.2-p0 > a = Addressable::URI.parse('http://example.com/path?a=%28%2B%29')
ruby-1.9.2-p0 > a.normalize!
#<Addressable::URI:0x80e588c0 URI:http://example.com/path?a=(+)>
ruby-1.9.2-p0 > a.query
"a=%28%2B%29"

I'm guessing you're following the normalization spec? [1] If that's the case, this is a tricky one. In theory, the URIs should be the same; in practice (due to server implementations) they are not. At the same time, the last thing I want to do is reimplement parts of Addressable in em-http.

It seems like saying "client shouldn't call normalize" defeats the purpose of the lib? Having said that, it's a catch-22 because that's what the spec says you should do. Ugh!

Any suggestions for how to deal with this?

[1] http://labs.apache.org/webarch/uri/rfc/rfc3986.html#normalize-encoding


sporkmonger avatar sporkmonger commented on June 7, 2024

Addressable performs encoding normalization as per the spec, yes. It also performs all the other normalization steps given, like path segment normalization and so on. The problem is not that Addressable's normalization is non-conformant. The problem is that an HTTP client must not perform normalization prior to sending the request. Nowhere does any spec require a generalized HTTP client perform normalization prior to sending the request. That's always something that should be done manually.

Normalization can and often does result in a new identifier. It's a process that attempts to produce a new URI that points to the same resource as the original URI. From the spec: "Implementations may use logic based on the definitions provided by this specification to reduce the probability of false negatives." In other words, any time you perform normalization, you run the risk of a false negative; i.e., a new URI that points to the wrong resource.

In the case of an HTTP client, it's critical that the client makes a request against precisely the same URI it was given. Because of the way HTTP splits the URI in half and only passes the request URI section to the server, it's OK to normalize the scheme and authority piece. But a client should not attempt to normalize the path or query components unless explicitly requested to do so.
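A minimal sketch of that split, using only Ruby's standard `uri` library rather than Addressable. The helper name `normalize_authority_only` is hypothetical and is not part of em-http-request or Addressable; it lowercases the scheme and host and leaves the path and query exactly as given.

```ruby
require 'uri'

# Hypothetical helper (not an em-http-request or Addressable API).
# Normalizes only the parts that never reach the server in the request line:
# the scheme and the host. Path and query are passed through byte-for-byte.
def normalize_authority_only(raw)
  uri = URI.parse(raw)
  uri.scheme = uri.scheme.downcase if uri.scheme
  uri.host   = uri.host.downcase   if uri.host
  uri.to_s
end

puts normalize_authority_only('HTTP://Example.COM/Path?a=%28%2B%29')
# prints http://example.com/Path?a=%28%2B%29
```

The path keeps its capital "P" and the query keeps its percent-encoding, so whatever was signed or expected by the server is what goes on the wire.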

And as I pointed out to the other guy, OAuth 1.0 is a perfect example of why this is so important. If you were to sign a request prior to passing it through to the HTTP client, and then the client performed normalization, the signatures would no longer match. On top of that, the problem would be nearly impossible to debug, because it would work for almost all requests, and the signature would fail only when the URI contained something that normalization rewrites, such as a percent-encoded character whose canonical form is unencoded.
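To illustrate the failure mode, here is a rough sketch. It is a simplified stand-in for request signing, not the actual OAuth 1.0 signature base string algorithm: it just computes an HMAC-SHA1 over the exact URI string. The `sign` helper and the secret are hypothetical; the two URI strings are the pre- and post-normalization forms from the transcript earlier in this thread.

```ruby
require 'openssl'

# Hypothetical, simplified signer: HMAC-SHA1 over the exact URI bytes.
# Real OAuth 1.0 builds a more elaborate base string; the point here is only
# that the signature depends on the bytes that were signed.
def sign(uri_string, secret)
  OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new('SHA1'), secret, uri_string)
end

secret     = 'hypothetical-shared-secret'
signed_uri = 'http://example.com/path?a=%28%2B%29' # what the caller signed
sent_uri   = 'http://example.com/path?a=(+)'       # what a normalizing client would send

caller_signature = sign(signed_uri, secret)
server_signature = sign(sent_uri, secret) # the server signs what it received

puts caller_signature == server_signature
# prints false
```

A URI that normalization leaves unchanged would produce matching signatures, which is why the bug only shows up for a small fraction of requests.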


sporkmonger avatar sporkmonger commented on June 7, 2024

Also, in the particular example you gave, both implementations are quite possibly wrong. The most correct normalization may actually be http://example.com/path?a=(%2B), depending on the context.

However, to be clear, this one could probably be argued two different ways according to two different specifications (RFC 3986 vs HTML 4.01). Which pretty much should make this the perfect example of why you don't want to normalize here.
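A quick way to see the two readings diverge is a sketch that assumes the server decodes the query string as an HTML form (application/x-www-form-urlencoded), where a literal `+` means a space. That server behavior is an assumption for illustration; `URI.decode_www_form` is from Ruby's standard library.

```ruby
require 'uri'

original   = 'a=%28%2B%29' # query as given to the client
normalized = 'a=(+)'       # query after normalize!, per the transcript in this thread
alternate  = 'a=(%2B)'     # the alternative normalization mentioned above

p URI.decode_www_form(original)   # => [["a", "(+)"]]
p URI.decode_www_form(normalized) # => [["a", "( )"]]
p URI.decode_www_form(alternate)  # => [["a", "(+)"]]
```

Under form decoding, the fully decoded query turns the plus sign into a space, so the server would see a different value than the caller intended, while the form that keeps `%2B` encoded round-trips.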


igrigorik avatar igrigorik commented on June 7, 2024

Bob, that makes sense. Let me take a pass over the code in em-http. I should be able to remove the normalize! call without too much trouble, since it's localized to a single location. I just have to make sure that the requests are dispatched correctly in a few edge cases.

