Comments (5)
The client should not be calling normalize, but this is a mistake in the client, not the URI parser. However, if it's been like this for a while, fixing it will likely cause breaking changes in any projects that depend on em-http-request and have been relying on this behavior, i.e., the 'edge cases' Ilya was referring to.
If Ilya needs convincing, point him at OAuth and ask him if he's ever tried using em-http-request in conjunction with an auth mechanism that signs parts of the URI. If you pre-normalize like this, the signatures won't match and the request will fail in ways that are nearly impossible to debug.
However, the particular case you've given is not an example of a semantic change. All URI-aware software should treat those two as equivalent. The main problem here is simply that if I give an HTTP client a URI, I expect it to make a request against exactly the byte-for-byte data I give it. Pre-normalizing is the kind of magic Ruby is known and sometimes reviled for, and we shouldn't be making a habit of that.
from addressable.
Bob, the normalize! call in em-http is a fairly recent addition. Perhaps I misunderstood the utility / semantics of it? I assumed the same behavior as the built-in URI lib...
ruby-1.9.2-p0 > require 'uri'
ruby-1.9.2-p0 > u = URI.parse('http://example.com/path?a=%28%2B%29')
ruby-1.9.2-p0 > u.normalize
 => #<URI::HTTP:0x00000101959fa0 URL:http://example.com/path?a=%28%2B%29>
ruby-1.9.2-p0 > u.normalize.to_s
 => "http://example.com/path?a=%28%2B%29"
ruby-1.9.2-p0 > u.query
 => "a=%28%2B%29"
ruby-1.9.2-p0 > require 'addressable/uri'
ruby-1.9.2-p0 > a = Addressable::URI.parse('http://example.com/path?a=%28%2B%29')
ruby-1.9.2-p0 > a.normalize!
 => #<Addressable::URI:0x80e588c0 URI:http://example.com/path?a=(+)>
ruby-1.9.2-p0 > a.query
 => "a=%28%2B%29"
I'm guessing you're following the normalization spec? [1] If that's the case, this is a tricky one. In theory, the URIs should be the same; in practice (due to server implementations) they are not. At the same time, the last thing I want to do is reimplement parts of Addressable in em-http.
It seems like saying "client shouldn't call normalize" defeats the purpose of the lib? Having said that, it's a catch-22 because that's what the spec says you should do. Ugh!
Any suggestions for how to deal with this?
[1] http://labs.apache.org/webarch/uri/rfc/rfc3986.html#normalize-encoding
from addressable.
Addressable performs encoding normalization as per the spec, yes. It also performs all the other normalization steps given, like path segment normalization and so on. The problem is not that Addressable's normalization is non-conformant. The problem is that an HTTP client must not perform normalization prior to sending the request. Nowhere does any spec require a general-purpose HTTP client to perform normalization prior to sending the request. That's always something that should be done manually.
Normalization can and often does result in a new identifier. It's a process that attempts to produce a new URI that points to the same resource as the original URI. From the spec: "Implementations may use logic based on the definitions provided by this specification to reduce the probability of false negatives." In other words, any time you perform normalization, you run the risk of a false negative; i.e., a new URI that points to the wrong resource.
In the case of an HTTP client, it's critical that the client makes a request against precisely the same URI it was given. Because of the way HTTP splits the URI in half and only passes the request URI section to the server, it's OK to normalize the scheme and authority piece. But a client should not attempt to normalize the path or query components unless explicitly requested to do so.
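To make that split concrete, here is a minimal sketch using only Ruby's standard `uri` library. The helper name `split_for_request` is hypothetical and is not part of em-http or Addressable; it only illustrates lowercasing the scheme and host while passing the path and query through untouched.

```ruby
require 'uri'

# Hypothetical helper (not an em-http or Addressable API): normalize only the
# scheme and authority, and keep the path and query exactly as given.
def split_for_request(raw)
  uri = URI.parse(raw)
  {
    scheme: uri.scheme.downcase,      # scheme is case-insensitive
    host: uri.host.downcase,          # host is case-insensitive
    port: uri.port,
    request_target: uri.request_uri   # path + query, not re-encoded
  }
end

parts = split_for_request('HTTP://Example.COM/path?a=%28%2B%29')
puts parts[:scheme]          # http
puts parts[:host]            # example.com
puts parts[:request_target]  # /path?a=%28%2B%29
```

The request target reaches the server with its original percent-encoding, so whatever the caller signed or constructed is what goes over the wire.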
And as I pointed out to the other guy, OAuth 1.0 is a perfect example of why this is so important. If you were to sign a request prior to passing it through to the HTTP client, and then the client performed normalization, the signatures would no longer match. On top of that, the problem would be nearly impossible to debug, because it would work for almost all requests; the signature would fail only when the URI contained something that was not already in canonical form.
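A rough illustration of the failure mode, under loud assumptions: `sign` below is a plain HMAC-SHA1 over the URI string, not the real OAuth 1.0 signature base string, and `decode_unreserved` is a hypothetical stand-in for a client-side normalizer that decodes percent-encoded unreserved characters (which RFC 3986 permits). Neither is taken from em-http or Addressable.

```ruby
require 'openssl'

UNRESERVED = /\A[A-Za-z0-9\-._~]\z/

# Hypothetical stand-in for a normalizing client: decodes percent-encoded
# unreserved characters and leaves every other escape alone.
def decode_unreserved(str)
  str.gsub(/%([0-9A-Fa-f]{2})/) do
    hex = Regexp.last_match(1)
    char = hex.hex.chr
    char.match?(UNRESERVED) ? char : "%#{hex.upcase}"
  end
end

# Simplified signature: HMAC-SHA1 over the raw URI string (an assumption,
# not the OAuth 1.0 algorithm).
def sign(secret, text)
  OpenSSL::HMAC.hexdigest('SHA1', secret, text)
end

secret = 'shared-secret'

signed_uri = 'http://example.com/users/%7Ebob?a=%28%2B%29'
sent_uri   = decode_unreserved(signed_uri)
puts sent_uri                                            # http://example.com/users/~bob?a=%28%2B%29
puts sign(secret, signed_uri) == sign(secret, sent_uri)  # false

untouched = 'http://example.com/path?a=%28%2B%29'
puts decode_unreserved(untouched) == untouched           # true
```

A URI the normalizer rewrites no longer matches what was signed, while one it happens to leave alone still verifies, which is why the failure only shows up on some requests.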
from addressable.
Also, in the particular example you gave, both implementations are quite possibly wrong. The most correct normalization may actually be http://example.com/path?a=(%2B), depending on the context.
However, to be clear, this one could probably be argued two different ways according to two different specifications (RFC 3986 vs HTML 4.01). That should make this the perfect example of why you don't want to normalize here.
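One way to see the two readings side by side is to decode both query strings as form data with Ruby's standard `uri` library. This is only a sketch of the HTML form-encoding interpretation, where a literal `+` means a space; a server that reads the query as a generic RFC 3986 component would treat `+` as a plus sign.

```ruby
require 'uri'

original   = 'a=%28%2B%29'  # the query as originally given
normalized = 'a=(+)'        # the query after Addressable's normalize! in this thread

# Form decoding treats a literal "+" as a space, so the two queries carry
# different values for "a" under this interpretation.
p URI.decode_www_form(original)    # [["a", "(+)"]]
p URI.decode_www_form(normalized)  # [["a", "( )"]]
```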
from addressable.
Bob, that makes sense. Let me take a pass over the code in em-http. I should be able to remove the normalize! call without too much trouble, since it's localized to a single location. I just have to make sure that the requests are dispatched correctly in a few edge cases.
from addressable.