JuliaWeb / URIs.jl
URI parsing in Julia
Home Page: https://juliahub.com/docs/URIs
License: Other
Julia 1.4.0
HTTP.jl 0.8.14
MbedTLS.jl 1.0.1
The use of etc... in the documentation for HTTP.URI is pretty ambiguous. In particular, the query keyword is not well documented. One could reasonably assume the following should work:
julia> HTTP.URI(scheme="https", host="julialang.org", path="/foo/bar", query="id"=>1234)
ERROR: BoundsError: attempt to access Int64
at index [2]
Stacktrace:
[1] indexed_iterate(::Int64, ::Int64, ::Nothing) at ./tuple.jl:90
[2] (::HTTP.URIs.var"#16#17")(::Int64) at ./none:0
[3] iterate at ./generator.jl:47 [inlined]
[4] join(::Base.GenericIOBuffer{Array{UInt8,1}}, ::Base.Generator{Pair{String,Int64},HTTP.URIs.var"#16#17"}, ::String) at ./strings/io.jl:296
[5] sprint(::Function, ::Base.Generator{Pair{String,Int64},HTTP.URIs.var"#16#17"}, ::Vararg{Any,N} where N; context::Nothing, sizehint::Int64) at ./strings/io.jl:105
[6] sprint at ./strings/io.jl:101 [inlined]
[7] join at ./strings/io.jl:301 [inlined]
[8] escapeuri at /Users/bieganek/.julia/packages/HTTP/GkPBm/src/URIs.jl:320 [inlined]
[9] merge(::HTTP.URIs.URI; scheme::String, userinfo::SubString{String}, host::String, port::SubString{String}, path::String, query::Pair{String,Int64}, fragment::SubString{String}) at /Users/bieganek/.julia/packages/HTTP/GkPBm/src/URIs.jl:87
[10] #URI#3 at /Users/bieganek/.julia/packages/HTTP/GkPBm/src/URIs.jl:66 [inlined]
[11] top-level scope at REPL[22]:1
It turns out the correct syntax is either
julia> HTTP.URI(scheme="https", host="julialang.org", path="/foo/bar", query=["id"=>1234])
HTTP.URI("https://julialang.org/foo/bar?id=1234")
or
julia> HTTP.URI(scheme="https", host="julialang.org", path="/foo/bar", query=Dict("id"=>1234))
HTTP.URI("https://julialang.org/foo/bar?id=1234")
Heck, I even tried this:
julia> HTTP.URI(; scheme="https", host="julialang.org", path="/foo/bar", "id"=>1234)
ERROR: TypeError: in Type, in parameter, expected Type, got Tuple{String}
Stacktrace:
[1] top-level scope at REPL[28]:1
though I was pretty sure it wouldn't work...
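The BoundsError above is consistent with the query value being iterated as a collection of key => value pairs: iterating a lone Pair yields "id" and then 1234, and destructuring 1234 into a key and a value fails at index 2. A minimal sketch of a workaround, assuming the URIs.jl keyword constructor behaves like the HTTP.URI calls shown above; normalizequery is a hypothetical helper, not part of either package:

```julia
using URIs

# Hypothetical helper (not part of URIs.jl or HTTP.jl): wrap a lone Pair in a
# vector so that `query` always receives a collection of key => value pairs.
normalizequery(q::Pair) = [q]
normalizequery(q) = q

uri = URI(scheme="https", host="julialang.org", path="/foo/bar",
          query=normalizequery("id" => 1234))
println(uri)
```

Vectors of pairs and Dicts pass through unchanged, so the two working forms shown above keep working.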
How do I use URIs.jl to add a query parameter to a URL that might already have query parameters?
e.g.
URIs.magicfunction(URI("http://hello.com/page?x=123"), Dict("y" => "no")) ==
URI("http://hello.com/page?x=123&y=no")
URIs.magicfunction(URI("http://hello.com/page"), Dict("y" => "no")) ==
URI("http://hello.com/page?y=no")
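One way to get this behavior with the existing exports might be to read the current parameters with queryparams, merge in the new ones, and rebuild the URI. This is only a sketch: addqueryparams is a hypothetical name (standing in for magicfunction above), parameter order in the result is not guaranteed because a Dict is used, and repeated keys collapse to a single value.

```julia
using URIs

# Hypothetical helper, not part of URIs.jl: merge extra parameters into the
# existing query and rebuild the URI with the combined set.
function addqueryparams(uri::URI, extra::AbstractDict)
    params = Dict{String,String}(queryparams(uri))
    for (k, v) in extra
        params[string(k)] = string(v)   # new values override existing keys
    end
    return URI(uri; query=params)
end

u = addqueryparams(URI("http://hello.com/page?x=123"), Dict("y" => "no"))
println(queryparams(u))
```

Comparing queryparams of the results, rather than the URI strings, avoids depending on the order in which the Dict is iterated.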
I guess it makes sense that you may not want to implicitly convert S15->String, but for someone who may not care what a String15 is (for example, because it was read from a CSV), it's a confusing error.
Should the fix be defining _bytes(s::AbstractString) = _bytes(String(s))? Not sure if that's right.
The quoted section of code doesn't actually check whether a URI contains invalid characters. For example, ' ' (the space character) is not allowed in the host part of an authority (or anywhere in a URI, as far as I can tell), but it still makes it through and can lead to some weird requests. A second @ is not allowed either.
According to RFC 3986, software that accepts user-typed URIs "should attempt to recognize and strip both delimiters and embedded whitespace".
Note that the regex from the RFC for separating valid URIs into components doesn't check for invalid characters that are not explicitly excluded in the grammar definitions, so it is not enough to ensure that a URI is valid.
Here is an MWE showing the fault:
julia> using HTTP
julia> badURL = "http://foo@127.0.0.1 @test.com:8080/test"
julia> t = HTTP.URIs.URI(badURL)
julia> for f in 1:fieldcount(typeof(t))
           n = fieldname(typeof(t),f)
           println("$n - '$(getfield(t, f))'")
       end
uri - 'http://foo@127.0.0.1 @test.com:8080/test'
scheme - 'http'
userinfo - 'foo'
host - '127.0.0.1 @test.com'
port - '8080'
path - '/test'
query - ''
fragment - ''
julia> HTTP.get(badURL)
and in a separate terminal session:
<snip>:~ $ nc -l -p 8080
GET /test HTTP/1.1
Host: 127.0.0.1 @test.com
Content-Length: 0
That the request actually goes through is a problem with Sockets.getaddrinfo though, since that does no checking either and the underlying OS library returns 127.0.0.1 for the (invalid) host 127.0.0.1 @test.com on my machine.
julia> versioninfo()
Julia Version 1.2.0
Commit c6da87ff4b (2019-08-20 00:03 UTC)
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-6.0.1 (ORCJIT, skylake)
Environment:
JULIA_NUM_THREADS = 4
(v1.2) pkg> st
Status `~/.julia/environments/v1.2/Project.toml`
[cd3eb016] HTTP v0.8.5
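A character-level check on the host could catch the case above. The sketch below uses only Base Julia and covers just the reg-name production of RFC 3986 Section 3.2.2 (which also admits dotted IPv4 addresses); bracketed IP literals such as [::1] are deliberately not handled. The names REG_NAME and isvalidregname are hypothetical and not part of HTTP.jl or URIs.jl.

```julia
# reg-name = *( unreserved / pct-encoded / sub-delims )   (RFC 3986, Section 3.2.2)
const REG_NAME = r"^(?:[A-Za-z0-9._~!$&'()*+,;=-]|%[0-9A-Fa-f]{2})*\z"

# Hypothetical check: true only if every character of `host` is allowed in a reg-name.
isvalidregname(host::AbstractString) = occursin(REG_NAME, host)

println(isvalidregname("127.0.0.1 @test.com"))  # space and '@' are rejected
println(isvalidregname("test.com"))
```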
julia> using URIs
julia> uri = URI("postgresql://localhost:5432/postgres")
URI("postgresql://localhost:5432/postgres")
julia> URI(uri; userinfo="user:password")
ERROR: ArgumentError: backtrace() requires scheme in uses_authority || isempty(host)
Stacktrace:
[1] macro expansion
@ ~/.julia/packages/URIs/o9DQG/src/debug.jl:52 [inlined]
[2] URI(uri::URI; scheme::SubString{String}, userinfo::String, host::SubString{String}, port::SubString{String}, path::SubString{String}, query::SubString{String}, fragment::SubString{String})
@ URIs ~/.julia/packages/URIs/o9DQG/src/URIs.jl:70
[3] top-level scope
@ REPL[4]:1
julia> URIs.URI(scheme = "ssh", host="test.com", userinfo="testuser" )
ERROR: ArgumentError: backtrace() requires scheme in uses_authority || isempty(host)
Stacktrace:
[1] macro expansion
@ ~/.julia/packages/URIs/o9DQG/src/debug.jl:52 [inlined]
[2] URI(uri::URI; scheme::String, userinfo::SubString{String}, host::String, port::SubString{String}, path::SubString{String}, query::SubString{String}, fragment::SubString{String})
@ URIs ~/.julia/packages/URIs/o9DQG/src/URIs.jl:70
[3] #URI#4
@ ~/.julia/packages/URIs/o9DQG/src/URIs.jl:83 [inlined]
[4] top-level scope
@ REPL[17]:1
I see ssh is not present in the code at line 282 in 5a1a0a1. Any reason to exclude ssh?
I just want to point out that RFC 3986 covers both URLs and URNs under the umbrella of URIs. But there is no mention of URNs anywhere in this project. Granted, it is possible to parse a URN as a URI. But the data types still have fields that are URL specific (e.g., host, path) and don't contain any of the terms associated with URNs from RFC 8141 (e.g., NID or NSS).
Also, the isvalid function isn't actually performing the checks from RFC 8141 relating to valid URNs. The result is that invalid URNs (e.g., urn:--name:foo) are considered valid by the isvalid function and valid URNs (e.g., urn:example:weather?=op=map&lat=39.56&lon=-104.85&datetime=1969-07-21T02:56:15Z) are considered invalid by the isvalid function.
In short, I think in order to really claim RFC 3986 compatibility, some work needs to be done to add support for RFC 8141 or the documentation should be updated to reflect that the library is really meant for supporting only URLs.
Thoughts?
Are we happy with the API which is exported from this package, and is there public API which should be documented but which is not exported?
Not being the author of this code, it's not entirely clear to me where the boundaries lie but I'm happy to do some legwork. @quinnj any thoughts?
At least we should document the exports (queryparams, absuri, escapepath) which are currently undocumented.
Also pinging @samoconnor on the off chance he's around and interested in commenting on some old code of his :-)
Of course, this doesn't block a 0.1 release. We should probably just get that ball rolling asap.
Maybe queryparams could return Dict{String,String}() in the empty case.
julia> import HTTP
julia> x = HTTP.URI("http://hey.you")
HTTP.URI("http://hey.you")
julia> HTTP.queryparams(x)
Dict{Any,Any}()
julia> x = HTTP.URI("http://hey.you?a=1")
HTTP.URI("http://hey.you?a=1")
julia> HTTP.queryparams(x)
Dict{String,String} with 1 entry:
"a" => "1"
julia> x = HTTP.URI("http://hey.you?a=b")
HTTP.URI("http://hey.you?a=b")
julia> HTTP.queryparams(x)
Dict{String,String} with 1 entry:
"a" => "b"
u = URI("//foo.com:bar")
using Test
@test u.port == "bar"
breaks the parsing of port as expected from https://www.ietf.org/rfc/rfc3986.html#section-3.2.3
The port subcomponent of authority is designated by an optional port
number in decimal following the host and delimited from it by a
single colon (":") character.
port = *DIGIT
The regex part is too broad @ https://github.com/JuliaWeb/URIs.jl/blob/master/src/URIs.jl#L94
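The port = *DIGIT rule quoted above translates into a one-line check. A minimal sketch in Base Julia; isvalidport is a hypothetical name, not an existing URIs.jl function:

```julia
# port = *DIGIT, so an empty port is allowed but "bar" is not.
# Hypothetical helper; Base.isdigit only accepts the ASCII digits 0-9.
isvalidport(port::AbstractString) = all(isdigit, port)

println(isvalidport("8080"))
println(isvalidport("bar"))
```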
splitpath does not handle URL parameters correctly
splitpath does not accept a URI
julia> URIs.splitpath("/a/b?c=d")
1-element Array{String,1}:
"a"
julia> URIs.splitpath("/a/b")
2-element Array{String,1}:
"a"
"b"
julia> URIs.splitpath(URI("/a/b?c=d"))
ERROR: MethodError: no method matching splitpath(::URI)
Closest candidates are:
splitpath(::AbstractString) at /Users/fons/.julia/packages/URIs/1jrj1/src/URIs.jl:382
Stacktrace:
[1] top-level scope at REPL[9]:1
julia> URIs.splitpath(URI("/a/b?c=d").path)
2-element Array{String,1}:
"a"
"b"
URIs.jl v1.1.0
Attempting to create a URI with only a host results in the following error:
julia> using URIs
julia> URI(; host="example.com")
ERROR: ArgumentError: backtrace() requires scheme in uses_authority || isempty(host)
Stacktrace:
[1] macro expansion
@ ~/.julia/packages/URIs/o9DQG/src/debug.jl:52 [inlined]
[2] URI(uri::URI; scheme::SubString{String}, userinfo::SubString{String}, host::String, port::SubString{String}, path::SubString{String}, query::SubString{String}, fragment::SubString{String})
@ URIs ~/.julia/packages/URIs/o9DQG/src/URIs.jl:70
[3] #URI#4
@ ~/.julia/packages/URIs/o9DQG/src/URIs.jl:83 [inlined]
[4] top-level scope
@ REPL[3]:1
The strange part about this error is the reference to backtrace().
While playing with SnoopCompile, I found a method invalidation:
inserting joinpath(uri::URIs.URI, parts::String...) in URIs at /Users/tlienart/.julia/packages/URIs/1jrj1/src/URIs.jl:466 invalidated:
mt_backedges: 1: signature Tuple{typeof(joinpath), Any, String} triggered MethodInstance for Artifacts.jointail(::Any, ::String) (0 children)
I'm not quite savvy enough to know what to do about this though (see also timholy/SnoopCompile.jl#210)
Hi there --
While I was finishing up writing tests for #19, I noticed that the normpath function doesn't correctly handle two of the abnormal examples mentioned in RFC 3986 Section 5.4.2:
julia> # The following is supposed to come out as http://a/b/c/g.
julia> URIs.normpath("http://a/b/c/g.")
"http://a/b/c/g./"
julia> # The following is supposed to come out as http://a/b/c/g..
julia> URIs.normpath("http://a/b/c/g..")
"http://a/b/c/g../"
Representing an array in query parameter is tricky and there is no single true way, but I think at least one of the often-used approaches should be supported.
Current behavior is:
julia> using URIs
julia> queryparams("foo=bar,qux")
Dict{String,String} with 1 entry:
"foo" => "bar,qux"
julia> queryparams("foo[]=bar&foo[]=qux")
Dict{String,String} with 1 entry:
"foo[]" => "qux"
julia> queryparams("foo=bar&foo=qux")
Dict{String,String} with 1 entry:
"foo" => "qux"
julia> queryparams("foo%5B%5D=bar&foo%5B%5D=qux")
Dict{String,String} with 1 entry:
"foo[]" => "qux"
I would expect all of them, or at least the ones with repeated foo, to return an array instead of a string with a single value.
It would make type stability more complicated, because the return type is currently Dict{String,String}, so accommodating arrays would probably mean changing it to Dict{String,Union{Vector{String},String}}?
I'm not sure what the proper solution should look like, but we definitely should not lose parameters as the current behavior does.
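One type-stable alternative to a Union value type would be to map every key to a vector. The sketch below is only an illustration under that assumption: queryparamsmulti is a hypothetical function, not part of URIs.jl, and it relies only on the exported unescapeuri. It handles the repeated-key forms above but leaves comma-separated values as a single string.

```julia
using URIs

# Hypothetical alternative to queryparams (not part of URIs.jl): every key maps
# to a vector, so repeated keys keep all of their values in order.
function queryparamsmulti(query::AbstractString)
    result = Dict{String,Vector{String}}()
    for field in split(query, '&'; keepempty=false)
        parts = split(field, '='; limit=2)
        key = unescapeuri(String(parts[1]))
        value = length(parts) == 2 ? unescapeuri(String(parts[2])) : ""
        push!(get!(result, key, String[]), value)
    end
    return result
end

println(queryparamsmulti("foo=bar&foo=qux"))
```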
julia 1.8.5
URIs 1.4.2
Is that a regression, or do I really need more coffee?
julia> using URIs
julia> URI("http://www.example.com"; query=Dict("a" => "b"))
ERROR: MethodError: no method matching URI(::String; query=Dict("a" => "b"))
Closest candidates are:
URI(::String, ::SubString{String}, ::SubString{String}, ::SubString{String}, ::SubString{String}, ::SubString{String}, ::SubString{String}, ::SubString{String}) at ~/.julia/packages/URIs/gpp9J/src/URIs.jl:47 got unsupported keyword argument "query"
URI(::AbstractString) at ~/.julia/packages/URIs/gpp9J/src/URIs.jl:146 got unsupported keyword argument "query"
URI(::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any) at ~/.julia/packages/URIs/gpp9J/src/URIs.jl:47 got unsupported keyword argument "query"
This issue is used to trigger TagBot; feel free to unsubscribe.
If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.
RFC 3986 Sec. 5.2 specifies a way to resolve URI references relative to a base URI. I'm thinking that it might be useful to have a resolvereference function (similar to Go's ResolveReference and Python's urljoin) that implements this functionality. E.g.:
julia> using URIs
julia> uri = URI("https://www.example.org/foo/bar/");
julia> resolvereference(uri, URI("/baz/"))
URI("https://www.example.org/baz/")
julia> resolvereference(uri, URI("baz/"))
URI("https://www.example.org/foo/bar/baz/")
julia> resolvereference(uri, URI("../baz/"))
URI("https://www.example.org/foo/baz/")
julia> resolvereference(uri, URI("https://www.google.com"))
URI("https://www.google.com")
Julia 1.4.2
HTTP.jl 0.8.16
MbedTLS.jl 1.0.2
When I pass scheme, host, and query arguments to HTTP.URI(), the returned object has no .uri field; rather, .uri equals "", an empty string. When I pass a string URL to HTTP.URI(), the returned object has a non-empty .uri field.
HTTP.URI(str) works as expected:
HTTP.URI("https://example.com").uri
# "https://example.com"
HTTP.URI(; scheme="", host="", port="", etc...) does not:
HTTP.URI(scheme="https", host="example.com").uri
# ""
Is this intended?
I'm fine doing the following to get what I want, but the above behavior just wasn't what I was expecting.
HTTP.URI(scheme="https", host="example.com") |> string
# "https://example.com"
I was trying to merge some new query params into existing ones with merge, but to my surprise merge creates a new URI with the given path, query, etc. instead of merging them with the old ones. This was quite confusing until I read the implementation, since merge on Dict actually merges the old collection and the new one together. I'm wondering if merge should be renamed to something clearer?
On the other hand, I'm wondering if there could be a method that adds new properties to the existing query. E.g., it's useful when a REST API is provided via https://auth.ibmq.com/api, which is parsed as "https://auth.ibmq.com" with "/api" as its path; a method that joins the paths of the REST API calls would then be very convenient for producing things like "https://auth.ibmq.com/api/loginWithAuth".
PS. I think letting the URI constructor take some keywords probably makes more sense than overloading merge in this case.
JavaScript:
> new URL("./style.css", "http://x.org/wow/index.html").href
"http://x.org/wow/style.css"
> new URL("./style.css", "http://x.org/wow/index.html/").href
"http://x.org/wow/index.html/style.css"
URIs master:
julia> joinpath(URI("http://x.org/wow/index.html"), "./style.css")
URI("http://x.org/wow/index.html/style.css")
julia> joinpath(URI("http://x.org/wow/index.html/"), "./style.css")
URI("http://x.org/wow/index.html/style.css")
Hello,
I wonder if URIs can parse data URI scheme ie https://datatracker.ietf.org/doc/html/rfc2397
An example can be found here
Kind regards
PS : related issue might be JuliaLang/Downloads.jl#122
RegExp that could be used https://gist.github.com/khanzadimahdi/bab8a3416bdb764b9eda5b38b35735b8
julia> URIs.splitpath(URIs.URI("https://asdf.com/a/b?x=y#z"))
2-element Vector{String}:
"a"
"b"
julia> URIs.splitpath("https://asdf.com/a/b?x=y#z")
5-element Vector{String}:
"https:"
""
"asdf.com"
"a"
"b"
Based on the documentation for splitpath, I expected both to return the same result.
Example:
julia> HTTP.URI(; scheme="https", host="julialang.org", path="foo/bar")
ERROR: ArgumentError: merge(::HTTP.URIs.URI; scheme::String, userinfo::SubString{String}, host::String, port::SubString{String}, path::String, query::SubString{String}, fragment::SubString{String}) requires !(scheme in ["http", "https"]) || (isempty(path) || path[1] == '/')
Stacktrace:
[1] macro expansion at /Users/bieganek/.julia/packages/HTTP/GkPBm/src/debug.jl:52 [inlined]
[2] merge(::HTTP.URIs.URI; scheme::String, userinfo::SubString{String}, host::String, port::SubString{String}, path::String, query::SubString{String}, fragment::SubString{String}) at /Users/bieganek/.julia/packages/HTTP/GkPBm/src/URIs.jl:81
[3] #URI#3 at /Users/bieganek/.julia/packages/HTTP/GkPBm/src/URIs.jl:66 [inlined]
[4] top-level scope at REPL[31]:1
That's not a terribly helpful error message. If you look really closely you can see the (isempty(path) || path[1] == '/') part at the end, but it's easy to miss.
In addition to a better error message, it would be a good idea to explicitly mention in the docstring that path must start with a forward slash.
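Until the message or docstring improves, callers could normalize the path before passing it to the constructor. A minimal sketch in Base Julia that mirrors the (isempty(path) || path[1] == '/') requirement from the error; ensureleadingslash is a hypothetical helper, not part of HTTP.jl or URIs.jl:

```julia
# Hypothetical helper: leave empty paths alone and prepend "/" to any other
# path that does not already start with one.
ensureleadingslash(path::AbstractString) =
    (isempty(path) || startswith(path, '/')) ? String(path) : "/" * path

println(ensureleadingslash("foo/bar"))
```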
Let's run
import URIs: URI
using Test
@testset begin
    u = URI(raw"c:\windows\temp") # on windows
    @test isdir(string(u))
    @test_throws StackOverflowError isdir(u)
end;
Those two tests pass. IMO, the second one should not throw.
This happens because stat(path...) = stat(joinpath(path...)) implicitly expects a conversion to a string @ https://github.com/JuliaLang/julia/blob/v1.7.3/base/stat.jl#L193
whereas joinpath(::URI) returns a URI, so the call loops infinitely.
Maybe we should add
Base.stat(a::URI) = stat(string(a))
That fixes the problem when calling isdir and isfile.
As @fonsp pointed out in #39, URIs.jl does not technically handle Unicode characters correctly, at least according to RFC 3986. IETF RFC 3986 Sec. 1.2.1 implies that URIs should only contain characters from the US-ASCII charset and should percent-encode additional characters (RFC 3987 makes this a little more explicit). URIs.jl, however, will accept and work with any string as its input regardless of the underlying character set:
julia> using URIs
julia> url = URI("https://a/🌟/e")
URI("https://a/🌟/e")
julia> url.path
"/🌟/e"
After diving into it for a bit, there seems to be a split in how the standard / canonical library for URI handling works in many other languages. In JavaScript, Go, and Rust, passing in a URI that uses Unicode will either force the URI to be percent-encoded or raise an error:
>> new URL("https://a/🌟/e").pathname
"/%F0%9F%8C%9F/e"
package main

import (
	"fmt"
	"net/url"
	"os"
)

func main() {
	url, err := url.Parse("https://a/🌟/e")
	if err != nil {
		fmt.Fprintf(os.Stderr, "Error parsing url: %s", err)
		return
	}
	fmt.Printf("%s\n", url)
	// Prints https://a/%F0%9F%8C%9F/e
}
Rust's http crate will actually panic if you try to feed it a Unicode URI at all, e.g.:
use http::Uri;

fn main() {
    let uri = Uri::from_static("https://a/🌟/e");
    println!("{}", uri.path());
}
$ cargo run
Finished dev [unoptimized + debuginfo] target(s) in 0.01s
Running `target/debug/uri`
thread 'main' panicked at 'static str is not valid URI: invalid uri character', /home/kernelmethod/.cargo/registry/src/github.com-1ecc6299db9ec823/http-0.2.7/src/uri/mod.rs:365:23
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
But this isn't universally the case: in Python and Java, the Unicode encoding is preserved:
>>> from urllib.parse import urlparse
>>> url = urlparse("https://a/🌟/e")
>>> url.path
'/🌟/e'
import java.net.*;

class URITesting {
    public static void main(String[] args) {
        try {
            URI url = new URI("https://a/🌟/e");
            System.out.printf("path = %s\n", url.getPath());
        }
        catch (URISyntaxException ex) {
            System.out.println(ex);
        }
        // System.out.println("Hello, World!");
    }
}
One potential difference between these languages is that Java's java.net.URI tries to comply with RFC 2396, whereas Python's urllib.parse.urlparse seems to try to comply with a mix of standards.
In any case, there's a bit of a dilemma here -- this library doesn't quite implement the RFC as specified, which is also an issue that has cropped up in other places, e.g. in the implementation of normpath (#20) and joinpath (related issue: #18). As far as this issue is concerned, it seems like there are three ways URIs.jl could go:
urllib.parse module.
I would think that option (1) is the most preferable of all of these -- this library says that it implements URIs according to RFC 3986, so it should comply with that RFC.
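For reference, the percent-encoding that the JavaScript and Go examples above apply to non-ASCII characters can be sketched in Base Julia as follows. This is an illustration only: percentencode_nonascii is a hypothetical name, it is not how URIs.jl behaves today, and it leaves every ASCII character (including reserved ones) untouched.

```julia
# Hypothetical helper: percent-encode each UTF-8 byte of every non-ASCII
# character and copy ASCII characters through unchanged.
function percentencode_nonascii(s::AbstractString)
    io = IOBuffer()
    for c in s
        if isascii(c)
            print(io, c)
        else
            for b in codeunits(string(c))
                print(io, '%', uppercase(string(b, base=16, pad=2)))
            end
        end
    end
    return String(take!(io))
end

println(percentencode_nonascii("/🌟/e"))  # same encoding as the JavaScript output above
```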