JuliaWeb / URIParser.jl
Uniform Resource Identifier (URI) parser in Julia
License: Other
julia> get("http://nominatim.openstreetmap.org/search?format=json&q=México D.F.")
assertion failed: c < 0x80
while loading In[63], in expression starting on line 1
in is_url_char at /Users/jiahao/.julia/v0.3/URIParser/src/parser.jl:1
in parse_url at /Users/jiahao/.julia/v0.3/URIParser/src/parser.jl:267
in get at /Users/jiahao/.julia/v0.3/Requests/src/Requests.jl:575
julia> response = get("http://nominatim.openstreetmap.org/search",
query={"format"=>"json", "q"=>"México D.F."})
Response(400 Bad Request, 17 Headers, 393 Bytes in Body)
julia> response.data
"<html><body><h1>Bad Request</h1><p>Nominatim has encountered an error with your request.</p><p><b>Details:</b> Illegal query string (not an UTF-8 string): M跩co D.F.</p><p>If you feel this error is incorrect feel free to report the bug in the <a href=\"http://trac.openstreetmap.org\">OSM bug database</a>. Please include the error message above and the URL you used.</p>\n</body></html>\n\r\n0\r\n\r\n"
Possibly related to #9?
The tag name "V0.2.0" is not of the appropriate SemVer form (vX.Y.Z).
cc: @malmaud
Would anyone else be interested in being able to convert a query to a Dict for easily accessing the key/values? Is this done in any other package?
I could have a go at a PR.
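As a rough illustration of what such a conversion might look like, here is a minimal sketch in plain Julia (1.0 or later). The name query_to_dict is hypothetical and is not part of URIParser; the sketch does no percent-decoding, and later duplicate keys overwrite earlier ones.

```julia
# Hypothetical helper, not part of URIParser: split "a=1&b=2" into a Dict.
function query_to_dict(query::AbstractString)
    params = Dict{String,String}()
    for pair in split(query, '&'; keepempty = false)
        kv = split(pair, '='; limit = 2)        # split on the first '=' only
        key = String(kv[1])
        value = length(kv) == 2 ? String(kv[2]) : ""
        params[key] = value                     # later duplicates overwrite earlier ones
    end
    return params
end

params = query_to_dict("format=json&q=Julia&flag")
```

Here params["format"] is "json", and a key without "=" (flag) maps to an empty string.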
looks like metadata is behind by a couple of commits
Is it just me, or is this technically incorrect?
julia> str = "hey there newline: \n"
"hey there newline: \n"
julia> URIParser.escape(str)
"hey%20there%20newline%3A%20%A"
whereas this service correctly gives
hey%20there%20newline%3A%20%0A
I have an API I'm trying to hit, but it's rejecting because the newlines aren't escaped properly
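For reference, a percent-escape is always "%" followed by exactly two hex digits, so byte 0x0A has to be written as %0A. A minimal sketch of an encoder that pads correctly, assuming Julia 1.0 or later; the names is_unreserved and percent_encode are hypothetical and this is not URIParser's implementation:

```julia
# Hypothetical helpers, not URIParser's code.
# RFC 3986 "unreserved" characters: ALPHA / DIGIT / "-" / "." / "_" / "~"
is_unreserved(b::UInt8) =
    (UInt8('A') <= b <= UInt8('Z')) || (UInt8('a') <= b <= UInt8('z')) ||
    (UInt8('0') <= b <= UInt8('9')) ||
    b in (UInt8('-'), UInt8('.'), UInt8('_'), UInt8('~'))

function percent_encode(s::AbstractString)
    io = IOBuffer()
    for b in codeunits(s)                       # iterate over UTF-8 bytes
        if is_unreserved(b)
            write(io, b)
        else
            # pad = 2 guarantees two hex digits, so 0x0a becomes "%0A", not "%A"
            print(io, '%', uppercase(string(b, base = 16, pad = 2)))
        end
    end
    return String(take!(io))
end

escaped = percent_encode("hey there newline: \n")
# escaped == "hey%20there%20newline%3A%20%0A"
```

Because it works on bytes rather than characters, the same sketch also escapes non-ASCII input such as "ü" to "%C3%BC".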
Hello,
when running CI for one of my packages, it shows
WARNING: deprecated syntax "immutable" at /home/travis/.julia/v0.7/URIParser/src/parser.jl:11.
Use "struct" instead.
That's just a warning...
Kind regards
Hi,
Any ideas why build is failing? Any hints as to how to fix it? I could have a go at a PR with some guidance.
Best regards,
Eric
Edit: Looks like an error on Linux (OS X is passing) during the build step. I'm on Windows and unlikely to be able to help fix this one.
https://travis-ci.org/JuliaWeb/URIParser.jl/jobs/115475359#L125
I tried installing three times on a clean version of Julia Version 0.5.0-rc3+0. Is this a known issue?
versioninfo():
Julia Version 0.5.0-rc3+0
Commit e6f843b* (2016-08-22 23:43 UTC)
Platform Info:
System: Linux (x86_64-suse-linux)
CPU: Intel(R) Core(TM) i5-4200U CPU @ 1.60GHz
WORD_SIZE: 64
BLAS: liblapack.so.3
LAPACK: liblapack
LIBM: libopenlibm
LLVM: libLLVM-3.7.1 (ORCJIT, haswell)
Result:
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.5.0-rc3+0 (2016-08-22 23:43 UTC)
 _/ |\__'_|_|_|\__'_|  |
|__/                   |  x86_64-suse-linux
julia> Pkg.add("URIParser")
INFO: Initializing package repository /home/me/.julia/v0.5
INFO: Cloning METADATA from https://github.com/JuliaLang/METADATA.jl
INFO: Installing Compat v0.9.0
INFO: Installing URIParser v0.1.6
signal (11): Segmentation fault
while loading no file, in expression starting on line 0
strlen at /lib64/libc.so.6 (unknown line)
unknown function (ip: 0x7fa4be432d23)
unknown function (ip: 0x7fa4be4338ef)
git_diff_tree_to_index at /usr/lib64/libgit2.so.23 (unknown line)
git_diff_tree_to_workdir_with_index at /usr/lib64/libgit2.so.23 (unknown line)
macro expansion at ./libgit2/error.jl:97 [inlined]
#diff_tree#77 at ./libgit2/diff.jl:16
unknown function (ip: 0x7fa4b9f1b6c9)
diff_tree at ./libgit2/diff.jl:4
unknown function (ip: 0x7fa4b9f1accd)
#isdiff#85 at ./libgit2/libgit2.jl:85
#isdiff at ./<missing>:0
#5 at ./pkg/read.jl:182
with at ./libgit2/types.jl:638
unknown function (ip: 0x7fa4b9f35266)
requires_path at ./pkg/read.jl:181
requires_list at ./pkg/read.jl:195 [inlined]
requires_list at ./pkg/read.jl:195 [inlined]
build! at ./pkg/entry.jl:574
build! at ./pkg/entry.jl:619
build at ./pkg/entry.jl:641
resolve at ./pkg/entry.jl:557
resolve at ./pkg/entry.jl:476
edit at ./pkg/entry.jl:30
#2 at ./task.jl:360
unknown function (ip: 0x7fa4b9f1f59f)
jl_call_method_internal at /home/abuild/rpmbuild/BUILD/julia-0.5.0-rc3/src/julia_internal.h:189 [inlined]
jl_apply_generic at /home/abuild/rpmbuild/BUILD/julia-0.5.0-rc3/src/gf.c:1930
jl_apply at /home/abuild/rpmbuild/BUILD/julia-0.5.0-rc3/src/julia.h:1392 [inlined]
start_task at /home/abuild/rpmbuild/BUILD/julia-0.5.0-rc3/src/task.c:253
unknown function (ip: 0xffffffffffffffff)
Allocations: 5900013 (Pool: 5897160; Big: 2853); GC: 11
Segmentation fault (core dumped)
Thanks
Found a few gaps while testing URIParser. Here's a list of stuff to be done / merged from URLParse.jl.
Should this package recognize "relative URLs" as described in https://www.ietf.org/rfc/rfc1808.txt?
Python's urllib.parse seems to do this — see https://docs.python.org/3/library/urllib.parse.html
It would be nice to add a writemime(io, ::@MIME("text/html"), x::URI) = ... method to display a URI as a clickable link in IJulia.
After JuliaLang/julia#19449, soon to be merged for Julia 0.6, you will no longer be able to access the raw bytes of a string via string.data; instead, do Vector{UInt8}(string). For example, this affects:
Line 54 in 6ae90da
X-Reference from FTPClient.jl
When the URI contains multiple @ characters, URIParser fails with various errors depending on how many @ characters are passed in.
Examples:
using URIParser
URI("ftp://[email protected]:[email protected]")
> ERROR: Port must be numeric (decimal)
using URIParser
URI("ftp://[email protected]:p@[email protected]")
> ERROR: Port must be numeric (decimal)
using URIParser
URI("ftp://user:p@[email protected]")
> ERROR: Unexpected character '@' in host
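One lenient approach some parsers take is to split the authority at the last "@", so that any earlier "@" stays in the userinfo. Strictly speaking, RFC 3986 expects an "@" inside userinfo to be percent-encoded as %40, so this is a tolerance choice rather than the spec. A minimal sketch, assuming Julia 1.0 or later; split_authority and the example.com host are hypothetical and not taken from URIParser:

```julia
# Hypothetical helper: separate "userinfo@host" at the LAST '@'.
function split_authority(authority::AbstractString)
    at = findlast(isequal('@'), authority)
    at === nothing && return ("", String(authority))     # no userinfo present
    userinfo = String(authority[1:prevind(authority, at)])
    hostport = String(authority[nextind(authority, at):end])
    return (userinfo, hostport)
end

userinfo, hostport = split_authority("user:p@ss@example.com:21")
# userinfo == "user:p@ss", hostport == "example.com:21"
```

Splitting userinfo off first also means the ":" inside the password is never mistaken for the port separator.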
Upon building HttpParser:
WARNING: Compat.ASCIIString is deprecated, use String instead.
likely near /Users/solver/.julia/v0.6/URIParser/src/parser.jl:11
WARNING: Compat.ASCIIString is deprecated, use String instead.
likely near /Users/solver/.julia/v0.6/URIParser/src/URIParser.jl:17
According to RFC 3986, a scheme must start with an alpha character, but can contain digits and a small set of other symbols. From the document:
scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
However, if I try URI("s3://bucket/key"), I get:
ERROR: Unexpected character 3 after schema
in parse_url at C:\Users\s2maki\.julia\v0.4\URIParser\src\parser.jl:212
in call at C:\Users\s2maki\.julia\v0.4\URIParser\src\parser.jl:294
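The quoted grammar translates directly into a regular expression. A minimal sketch, assuming Julia 0.7 or later for occursin; SCHEME_RE and is_valid_scheme are hypothetical names, not URIParser internals:

```julia
# scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )   (RFC 3986, section 3.1)
const SCHEME_RE = r"^[A-Za-z][A-Za-z0-9+.-]*\z"

# Hypothetical helper: true if `s` is a syntactically valid scheme.
is_valid_scheme(s::AbstractString) = occursin(SCHEME_RE, s)

ok_s3  = is_valid_scheme("s3")        # true: a digit is allowed after the first letter
ok_git = is_valid_scheme("git+ssh")   # true
bad    = is_valid_scheme("3s")        # false: must start with a letter
```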
julia> URI("google.com", "/some/path")
URI(http://google.com:80/some/path)
:)
julia> URI("google.com/some/path")
ERROR: Unexpected character . after schema
in error at error.jl:21
in parse_url at /Users/westley/.julia/URIParser/src/parser.jl:205
in URI at /Users/westley/.julia/URIParser/src/parser.jl:287
:(
escape seems to produce %-escaped UTF-8, but unescape does not reproduce the original character.
julia> s = URIParser.escape("ü")
"%C3%BC"
julia> s = URIParser.unescape(s)
"ü"
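For the round trip to work, the decoded values need to be collected as raw bytes and only then interpreted as UTF-8; turning each %XX into its own character would give two characters for "%C3%BC". A minimal sketch, assuming Julia 1.0 or later; hexval and percent_decode are hypothetical names and this is not URIParser's unescape:

```julia
# Hypothetical helper: value of one ASCII hex digit, or nothing if it is not hex.
function hexval(b::UInt8)
    if UInt8('0') <= b <= UInt8('9')
        return b - UInt8('0')
    elseif UInt8('a') <= b <= UInt8('f')
        return b - UInt8('a') + 0x0a
    elseif UInt8('A') <= b <= UInt8('F')
        return b - UInt8('A') + 0x0a
    end
    return nothing
end

# Hypothetical decoder: gather bytes first, build the String once at the end.
function percent_decode(s::AbstractString)
    units = codeunits(s)
    n = length(units)
    bytes = UInt8[]
    i = 1
    while i <= n
        if units[i] == UInt8('%') && i + 2 <= n
            hi = hexval(units[i + 1])
            lo = hexval(units[i + 2])
            if hi !== nothing && lo !== nothing
                push!(bytes, (hi << 4) | lo)
                i += 3
                continue
            end
        end
        push!(bytes, units[i])    # literal byte, or a '%' not followed by two hex digits
        i += 1
    end
    return String(bytes)
end

decoded = percent_decode("%C3%BC")   # "ü"
```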
Please bump in metadata; this package is causing deprecation warnings with contains whenever BinDeps is used.
escape does not escape some strings with Unicode.
julia> escape("深圳")
"深圳"
escape should probably not try to re-escape strings that are already escaped.
julia> escape("%E3%82%A2")
"%25E3%2582%25A2"
The current behavior makes it challenging to work with any sort of Unicode data. Ref: JuliaWeb/Requests.jl#44
julia> using URIParser
julia> URI("http://www.cairographics.org/releases/pixman-0.28.2.tar.gz")
ERROR: stack overflow
in URI at /home/m/.julia/URIParser/src/parser.jl:32 (repeats 1025 times)
Looking at the code, yeah, how is that going to work?
URI(schema::ASCIIString,host::ASCIIString,port::Integer,path,query::ASCIIString="",fragment="",userinfo="",specifies_authority=false) =
URI(schema,host,uint16(port),path,query,fragment,userinfo)
I'm not sure about the details, but it might be nice to allow URIs of non-ASCII text (i.e. use String rather than ASCIIString in the type definition). The RFC is a bit vague on this. Perhaps it isn't technically allowed but it seems possible to encounter in the wild (e.g. ☃.net will resolve in a browser). On a practical level it's somewhat annoying to not be able to pass UTF8Strings or SubStrings thereof to methods in Requests.jl.
Currently:
julia> URI("https://example.com/data.csv")
URI(https://example.com/data.csv)
Proposed:
julia> URI("https://example.com/data.csv")
URI("https://example.com/data.csv")
esc.jl uses the deprecated utf8 and bytestring.
parser.jl defines a method for the nonexistent Base.writemime.
Patch to reduce deprecation warnings...
(Not sure how Base.writemime should be replaced)
diff --git a/src/esc.jl b/src/esc.jl
index 36acd6b..0babe6b 100644
--- a/src/esc.jl
+++ b/src/esc.jl
@@ -2,12 +2,12 @@ const escaped_regex = r"%([0-9a-fA-F]{2})"
# Escaping
const control_array = vcat(map(UInt8, 0:parse(Int,"1f",16)))
-const control = utf8(ascii(control_array)*"\x7f")
-const space = utf8(" ")
-const delims = utf8("%<>\"")
-const unwise = utf8("(){}|\\^`")
+const control = ascii(String(control_array)*"\x7f")
+const space = String(" ")
+const delims = String("%<>\"")
+const unwise = String("(){}|\\^`")
-const reserved = utf8(",;/?:@&=+\$![]'*#")
+const reserved = String(",;/?:@&=+\$![]'*#")
# Strings to be escaped
# (Delims goes first so '%' gets escaped first.)
const unescaped = delims * reserved * control * space * unwise
@@ -27,7 +27,7 @@ function unescape(str)
end
push!(r, c)
end
- return bytestring(r)
+ return String(r)
end
unescape_form(str) = unescape(replace(str, "+", " "))
diff --git a/src/parser.jl b/src/parser.jl
index 7a01669..a8e1d35 100644
--- a/src/parser.jl
+++ b/src/parser.jl
@@ -322,6 +322,7 @@ function print(io::IO, uri::URI)
end
end
+#=
function Base.writemime(io::IO, ::MIME"text/html", uri::URI)
print(io, "<a href=\"")
print(io, uri)
@@ -329,3 +330,4 @@ function Base.writemime(io::IO, ::MIME"text/html", uri::URI)
print(io, uri)
print(io, "</a>")
end
+=#
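On the open question about Base.writemime: since Julia 0.5 rich display is done by adding a three-argument Base.show method instead. A minimal sketch, assuming Julia 1.0 or later; the Link type is a hypothetical stand-in for URI, and the sketch does not HTML-escape the URL:

```julia
# Hypothetical stand-in type, used here instead of URIParser's URI.
struct Link
    url::String
end

# Replacement for the old writemime(io, ::MIME"text/html", x) method.
function Base.show(io::IO, ::MIME"text/html", l::Link)
    print(io, "<a href=\"", l.url, "\">", l.url, "</a>")
end

html = sprint(show, MIME("text/html"), Link("https://example.com/"))
```

Front ends such as IJulia call this method when they can render text/html, which is what makes the value show up as a clickable link.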
Escaping the following query string:
query = "q=(and subreddit:'politics' timestamp:..1406404322)&sort=new&syntax=cloudsearch"
produces the following:
q = escape(query)
q%3D(and%20subreddit%3A%27politics%27%20timestamp%3A..1406404322)%26sort%3Dnew%26syntax%3Dcloudsearch
Passing this query to the Reddit API via
u = "http://www.reddit.com/search.json?"
url = "$(u)$(q)"
r = get(url, headers = headers)
Fails in both Julia Requests and in python requests.get. The same query escapes to the following string in python's Requests module:
url = u"http://www.reddit.com/search.json?q=(and%20subreddit:'politics'%20timestamp:..1406404322)&sort=new&syntax=cloudsearch"
Notice that the '=' and ':' are not percent-escaped. This URL works in both Python's and Julia's requests modules.
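A common way around this is to escape each key and value separately and only then join them with "=" and "&", so the delimiters themselves are never escaped. A minimal sketch, assuming Julia 1.0 or later; encode_component and build_query are hypothetical names, and this encoder is stricter than the Python output above (it also escapes ":", "'", "(" and ")"):

```julia
# Bytes that RFC 3986 lists as "unreserved"; everything else gets escaped.
const UNRESERVED_BYTES = Set(UInt8.(collect(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~")))

# Hypothetical helper: percent-encode a single key or value.
function encode_component(s::AbstractString)
    parts = String[]
    for b in codeunits(s)
        push!(parts, b in UNRESERVED_BYTES ? string(Char(b)) :
                     "%" * uppercase(string(b, base = 16, pad = 2)))
    end
    return join(parts)
end

# Hypothetical helper: escape components, then add the '=' and '&' delimiters.
build_query(params) =
    join([encode_component(k) * "=" * encode_component(v) for (k, v) in params], "&")

q = build_query([
    "q"      => "(and subreddit:'politics' timestamp:..1406404322)",
    "sort"   => "new",
    "syntax" => "cloudsearch",
])
```

The resulting string keeps q=, &sort= and &syntax= intact while the value of q is fully escaped.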
Could we tag a version 1.0.0?
That would make it much easier for packages that depend on this one here to use semver compat bounds in Project.toml. The problem is that pre 1.0.0, every minor update is considered breaking, whereas starting with 1.0.0 one can use the full semver semantics, which make everything easier.
Also, it seems that this package is so widely used with the current API, that one might as well declare what we have as stable and done, right?