
ocaml-uri's Introduction

Uri -- an RFC3986 URI/URL parsing library

This is an OCaml implementation of the RFC3986 specification for parsing URIs or URLs.

Installation

Via OPAM

The OPAM package manager can be used to install this library from source.

opam install uri

Locally

You can build the source code locally via the dune build system.

opam install uri --deps-only
eval `opam config env`
dune build
dune runtest

These commands install the dependencies via OPAM, build the library, and then run the tests in the lib_test/ directory.

Usage

Once installed, the following ocamlfind packages are available for your use:

  • uri - the base Uri module
  • uri-re - the legacy implementation. Originally, uri used re to parse a string. Since 4.0.0, it uses angstrom. If something breaks with uri.4.0.0, you should compare with uri-re and submit an issue. uri-re is deprecated and will be removed in the next release (see #150).
  • uri.top - the toplevel printers for use with utop
  • uri-sexp - provides converters to and from s-expressions (via a Uri_sexp.t type alias)
  • uri.services - the Uri_services module that provides the equivalent of services(5)
  • uri.services_full - the Uri_services_full module that provides a complete copy of the /etc/services file. This is quite large and normally not needed.

ocaml-uri's Issues

Slashes in path components

# Uri.(to_string (of_string "/foo%2Fbar"));;
- : string = "/foo/bar"

Result should be:

# Uri.(to_string (of_string "/foo%2Fbar"));;
- : string = "/foo%2Fbar"

It seems that some functions treat the path as a whole and encode/decode the whole path instead of each component. As a result, simply marking / as unsafe doesn't solve the problem.
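The difference between decoding the whole path and decoding each component can be sketched with the standard library alone. The helpers below (pct_decode, pct_encode_segment, decode_path, encode_path) are hypothetical names written for this illustration and are not the functions Uri uses internally; the encoder conservatively escapes everything except unreserved characters.

```ocaml
(* Hypothetical helpers for illustration; none of these are part of Uri. *)

let is_unreserved c =
  match c with
  | 'A' .. 'Z' | 'a' .. 'z' | '0' .. '9' | '-' | '.' | '_' | '~' -> true
  | _ -> false

(* Numeric value of a hexadecimal digit, or None for any other character. *)
let hex_value c =
  match c with
  | '0' .. '9' -> Some (Char.code c - Char.code '0')
  | 'a' .. 'f' -> Some (Char.code c - Char.code 'a' + 10)
  | 'A' .. 'F' -> Some (Char.code c - Char.code 'A' + 10)
  | _ -> None

(* Decode %XX escapes; malformed escapes are copied through unchanged. *)
let pct_decode s =
  let n = String.length s in
  let buf = Buffer.create n in
  let rec go i =
    if i < n then begin
      let escape =
        if s.[i] = '%' && i + 2 < n then
          (hex_value s.[i + 1], hex_value s.[i + 2])
        else (None, None)
      in
      match escape with
      | Some hi, Some lo ->
        Buffer.add_char buf (Char.chr ((hi * 16) + lo));
        go (i + 3)
      | _ ->
        Buffer.add_char buf s.[i];
        go (i + 1)
    end
  in
  go 0;
  Buffer.contents buf

(* Escape every byte of a single segment except unreserved characters. *)
let pct_encode_segment s =
  let buf = Buffer.create (String.length s) in
  String.iter
    (fun c ->
      if is_unreserved c then Buffer.add_char buf c
      else Buffer.add_string buf (Printf.sprintf "%%%02X" (Char.code c)))
    s;
  Buffer.contents buf

(* Split on '/' first, then decode each segment on its own. *)
let decode_path path = List.map pct_decode (String.split_on_char '/' path)

(* Encode each segment on its own, then join with '/'. *)
let encode_path segments =
  String.concat "/" (List.map pct_encode_segment segments)
```

With this ordering, "/foo%2Fbar" decodes to two segments (an empty one and "foo/bar") and encodes back to the original string, whereas decoding the whole path before splitting yields three segments.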

I will let you know if I start working on a fix.

Uri.of_string does not actually invoke pct_decode

Is this expected behaviour?

> let s = "udp%3A//tracker.openbittorrent.com%3A80";;
> let uri = Uri.of_string s;;
val uri : Uri.t = <abstr>
> Uri.scheme uri;;
- : string option = None
> let uri = Uri.of_string (Uri.pct_decode s);;
val uri : Uri.t = <abstr>
> Uri.scheme uri;;
- : string option = Some "udp"

Functorial interface to support RFC3987

Hello,

Do you think it would be possible to provide a functorial interface to implement RFC3987 (IRIs)?

As I see it now, the functor could take as parameter two modules:

  • one for regexp definitions (equivalent of Uri_re),
  • one for encode/decode (equivalent of Uri.Pct).

If you think it's possible, I will give it a try.

Regards,

  • zoggy

Add a Uri.set counterpart to Uri.make

It would be useful to have a Uri.set function (maybe with a better name?) with all of the same optional arguments as Uri.make, which takes an existing URI and replaces whichever values are supplied by the optional arguments.

val make : ?scheme:string -> ?userinfo:string -> ?host:string ->
  ?port:int -> ?path:string -> ?query:(string * string list) list ->
  ?fragment:string -> unit -> t

val set : ?scheme:string option -> ?userinfo:string option -> ?host:string option ->
  ?port:int option -> ?path:string option -> ?query:(string * string list) list option ->
  ?fragment:string option -> t -> t

For each optional argument provided, the result would be equivalent to stacking calls to the corresponding Uri.with_* function. If all of the optional arguments were elided then the given URI would be returned as-is.
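The doubly-optional arguments in the proposed signature can be illustrated on a toy record. This is only a sketch: the record type t below is a hypothetical stand-in (the real Uri.t is abstract), it covers just three components, and set here is not an existing Uri function.

```ocaml
(* Toy stand-in for Uri.t, for illustration only. *)
type t = { scheme : string option; host : string option; port : int option }

let empty = { scheme = None; host = None; port = None }

(* Each argument is doubly optional:
   - omitted           keeps the current value,
   - ~host:None        clears the component,
   - ~host:(Some "h")  replaces it. *)
let set ?scheme ?host ?port t =
  let pick current = function None -> current | Some v -> v in
  { scheme = pick t.scheme scheme;
    host = pick t.host host;
    port = pick t.port port }
```

Calling set with no optional arguments returns a record equal to its input, which matches the "returned as-is" behaviour described above.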

interface is inconsistent

Functions with primes shouldn't have them, and functions without primes should; get_query_param* is the current exception.

Behavior of with_path

Say I have a URI of

https://api.stripe.com/v1/charges as Uri.t

and then I want to add something like "q2asddf_token_thing" so that the final Uri.t will be:

https://api.stripe.com/v1/charges/q2asddf_token_thing

I don't want to do string-level conversion or concatenation, of course, so I thought Uri.with_path would help me here, but all it does is completely replace the path, giving me this instead:

https://api.stripe.com/q2asddf_token_thing

Is this correct behavior? To me this is bad.
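Since with_path replaces the whole path, one possible workaround is to compute the new path from the old one first. The helper below, append_segment, is a hypothetical name and not part of Uri; it assumes the segment needs no escaping and works on the path string only.

```ocaml
(* Hypothetical helper: join a base path and one extra segment with
   exactly one '/' between them. *)
let append_segment base segment =
  let n = String.length base in
  if n > 0 && base.[n - 1] = '/' then base ^ segment
  else base ^ "/" ^ segment

(* Intended use with the library (not executed here):
     Uri.with_path u (append_segment (Uri.path u) "q2asddf_token_thing") *)
```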

path segment functions

There should be interface functions to operate over path segments and provide the same kind of services that are available to query components.

Internal representation for IP addresses

I just checked IPv6 support and I noticed that the internal representation of IPv6 addresses is an int32 "fourple".
I ran some benchmarks of my own on comparison speed, and an int64 tuple seems to be faster (if you are interested, I can provide the code).
Is there a special reason to use int32 and not int64? Support for 32-bit architectures?

Using Uri with Google Distance Matrix API (separator issue)

I am using Uri to create a Google Distance Matrix request, and I find it a bit awkward: the Google API uses the pipe character "|" to separate parameter values, but Uri uses commas ",", and I could not find a way to override that:

utop # let x = Uri.of_string "http://www.github.com/";;
val x : Uri.t = <abstr>

utop # let y = Uri.add_query_param x ("origins", ["Bristol"; "Cambridge"; "London"]);;
val y : Uri.t = <abstr>

utop # Uri.to_string y;;
- : string = "http://www.github.com/?origins=Bristol,Cambridge,London"

Although this is RFC-compliant, it is not what the (silly) API expects.

I solved the problem by concatenating the values before passing them to Uri, which consequently percent-encoded the separator. It worked with the Google API, but it left me wondering: what if it had not (i.e. in the case of a less tolerant web service)? Since Lwt and Cohttp require Uri.t, would that mean no OCaml for me in this project?

utop # let c = String.concat ~sep:"|" ["Bristol"; "Cambridge"; "London"];;
val c : string = "Bristol|Cambridge|London"

utop # let y = Uri.add_query_param x ("origins", [c]);;
val y : Uri.t = <abstr>

utop # Uri.to_string y;;
- : string = "http://www.github.com/?origins=Bristol%7CCambridge%7CLondon"

I understand that the way Google API handles this is not RFC compliant but it is a real-world API, so maybe there is a scope for a compromise?

If Uri functions took an optional ~sep parameter, like Jane Street's Core.String.concat does, this would not be a problem.

What do you think?
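A rough sketch of what an optional separator could look like, written against a plain association list rather than Uri.t. The function render_query and its ?sep argument are hypothetical, and the sketch performs no percent-encoding, so it assumes keys and values are already safe to emit.

```ocaml
(* Hypothetical renderer: joins the values of each parameter with [sep]
   (default ",") and the parameters with "&". No escaping is applied. *)
let render_query ?(sep = ",") params =
  let render_param (key, values) =
    match values with
    | [] -> key
    | _ -> key ^ "=" ^ String.concat sep values
  in
  String.concat "&" (List.map render_param params)
```

For example, render_query ~sep:"|" [("origins", ["Bristol"; "Cambridge"; "London"])] produces the pipe-separated form, while omitting ~sep keeps the comma-separated default.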

RFC6874

utop # let a = Uri.of_string "http://[ff02::1%25wlan0]:6788";;                               
val a : Uri.t = http:// 

This is a valid URI representation as per RFC6874

Package does not compile on Raspbian Stretch (Raspberry Pi 3)

When running opam install uri on a Raspberry Pi 3 (running Raspbian Stretch), I get the following error:

#=== ERROR while compiling uri.1.9.4 ==========================================#
# command      jbuilder build -p uri -j 3
# path         /home/pi/.opam/4.04.1/.opam-switch/build/uri.1.9.4
# exit-code    1
# env-file     /home/pi/.opam/log/uri-1642-e7966f.env
# output-file  /home/pi/.opam/log/uri-1642-e7966f.out
### output ###
# [...]
# /tmp/camlasmfc3e02.s:188838: Error: offset out of range
# /tmp/camlasmfc3e02.s:188841: Error: offset out of range
# /tmp/camlasmfc3e02.s:188845: Error: offset out of range
# /tmp/camlasmfc3e02.s:188860: Error: offset out of range
# /tmp/camlasmfc3e02.s:188864: Error: offset out of range
# /tmp/camlasmfc3e02.s:188871: Error: offset out of range
# /tmp/camlasmfc3e02.s:188892: Error: offset out of range
# /tmp/camlasmfc3e02.s:188896: Error: offset out of range
# /tmp/camlasmfc3e02.s:188900: Error: offset out of range
# /tmp/camlasmfc3e02.s:188917: Error: offset out of range
# File "etc/uri_services_full.ml", line 1:
# Error: Assembler error, input left in file /tmp/camlasmfc3e02.s

Note that compilation on an AMD64 machine completes without error. Is there perhaps something funky with my particular Raspbian installation, or does anyone else encounter the same problem?
(I can post the assembly file on /tmp if that helps.)

Uri.encoded_of_query and nulls

I'm not sure if this use case should be supported but I wanted to bring it up. The AWS S3 API requires signing of the request. This signing requires that query parameters that are null are presented with an equal sign.

S3 wants: https://foo.com?a=&b=4&c=
Uri.encoded_of_query would return: "a&b=4&c"

Should there be a configuration parameter in Uri to allow one to deal with null query params differently?

It seems to me (and this is probably correct) that omitting the equals sign saves a few bytes in the general case. I haven't yet taken the time to look at what the standards say.

Interested to hear your thoughts. Thanks.

Trevor
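The S3-style rendering can be sketched on a plain association list of the same shape as the query type. The function name encoded_of_query_with_equals is hypothetical, and the sketch skips percent-encoding entirely, so it only illustrates where the equals sign goes.

```ocaml
(* Hypothetical renderer: every key is followed by '=', even when it has
   no values. Keys and values are assumed to be already escaped. *)
let encoded_of_query_with_equals params =
  let render_param (key, values) = key ^ "=" ^ String.concat "," values in
  String.concat "&" (List.map render_param params)
```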

Uri.query_of_encoded "" returns a non-empty list

Uri.query_of_encoded "" returns a non-empty list:

utop # Uri.query_of_encoded "";;
- : (string * string list) list = [("", [])]   

It surprised me that it has this behavior. Is it somehow part of the spec, or is it a bug?
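For comparison, here is a minimal stdlib-only sketch of a parser that maps the empty string to the empty list. It is not the library's implementation: query_of_encoded_sketch is a hypothetical name, and the sketch does no percent-decoding.

```ocaml
(* Hypothetical parser: "" yields [], "a&b=4" yields [("a", []); ("b", ["4"])].
   Values are split on ','; nothing is percent-decoded. *)
let query_of_encoded_sketch s =
  if s = "" then []
  else
    String.split_on_char '&' s
    |> List.map (fun pair ->
           match String.index_opt pair '=' with
           | None -> (pair, [])
           | Some i ->
             let key = String.sub pair 0 i in
             let value =
               String.sub pair (i + 1) (String.length pair - i - 1)
             in
             (key, String.split_on_char ',' value))
```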

Convenience functions for extracting multiple arguments

Pulling from rgrinberg/opium#32 (comment) it would be nice to have a function for extracting multiple query parameters with a single call, similar to:

(* only match if all query params are present *)
val get_query_params : Uri.t -> string list -> (string list list) option
(* allow for partial matches of parameters list *)
val get_query_params' : Uri.t -> string list -> (string list option) list

usable as:

match get_query_params uri ["ua"; "xx"; "yyy"] with
| Some [ua; xx; yyy] -> ...
| _ -> ...

or, if you're expecting and want to only accept a single argument for a parameter:

match get_query_params uri ["ua"; "xx"; "yyy"] with
| Some [[ua]; [xx]; yyy] -> ... (* Only match if ua and xx are single-valued, yyy gets everything *)
| _ -> ...
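One way the two proposed functions could be written, as a sketch only. To stay self-contained, these versions take the query as an association list (the type Uri.query returns) instead of a Uri.t, which differs from the signatures proposed above.

```ocaml
(* Some values-per-name if every requested name is present, else None. *)
let get_query_params query names =
  let rec collect acc = function
    | [] -> Some (List.rev acc)
    | name :: rest ->
      (match List.assoc_opt name query with
       | Some values -> collect (values :: acc) rest
       | None -> None)
  in
  collect [] names

(* One result per requested name; missing names become None. *)
let get_query_params' query names =
  List.map (fun name -> List.assoc_opt name query) names
```

With the library, the first argument would be Uri.query uri.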

Installation fails on most recent opam (2016-02-21)

I tried opam install uri. The message is:

ocamlfind: Package `sexplib.syntax' not found
W: Field 'pkg_sexplib_syntax' is not set: Command ''/home/simon/.opam/4.02.3/bin/ocamlfind' query -format %d sexplib.syntax > '/tmp/oasis-bd53a8.txt'' terminated with error code 2
E: Cannot find findlib package sexplib.syntax
E: Failure("1 configuration error")

I suppose the packaging of sexplib has changed, now that ppx replaces camlp4.

Allow configuration of pct encoding

I am writing a simple client to interact with AWS S3.

Their docs state that request signing requires a URI encoding in which Uri-Encode() must enforce the following rules:

  • URI encode every byte except the unreserved characters: 'A'-'Z', 'a'-'z', '0'-'9', '-', '.', '_', and '~'.
  • The space character is a reserved character and must be encoded as "%20" (and not as "+").
  • Each Uri-encoded byte is formed by a '%' and the two-digit hexadecimal value of the byte.
  • Letters in the hexadecimal value must be uppercase, for example "%1A".
  • Encode the forward slash character, '/', everywhere except in the object key name. For example, if the object key name is photos/Jan/sample.jpg, the forward slash in the key name is not encoded.

As discussed in other issues on this project, the standards and the reality of this sort of thing are very different. I think the answer is to give the end user a few functions to exert more control. This would allow both standards-compliant APIs and arbitrarily non-compliant APIs to be used.

Thoughts? Thanks!
Trevor
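The rules quoted above are small enough to sketch directly with the standard library. The function aws_uri_encode and its ?encode_slash flag are hypothetical names for this illustration; nothing here is part of Uri.

```ocaml
(* Hypothetical encoder following the quoted rules: keep unreserved
   characters, emit every other byte as '%' plus two uppercase hex digits.
   Pass ~encode_slash:false for object key names, where '/' is kept. *)
let aws_uri_encode ?(encode_slash = true) s =
  let buf = Buffer.create (String.length s * 3) in
  String.iter
    (fun c ->
      match c with
      | 'A' .. 'Z' | 'a' .. 'z' | '0' .. '9' | '-' | '.' | '_' | '~' ->
        Buffer.add_char buf c
      | '/' when not encode_slash -> Buffer.add_char buf c
      | _ -> Buffer.add_string buf (Printf.sprintf "%%%02X" (Char.code c)))
    s;
  Buffer.contents buf
```

The space character falls through to the last case and is therefore written as %20, never as "+".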

Support URNs

It would be great if Uri supported URNs. While Uri.of_string works on them, Uri.to_string percent-encodes some ':' characters when it shouldn't:

# Uri.to_string(Uri.of_string "urn:uuid:6fb03739-4ac3-5a9b-b5de-1d22890bbee6");;
 - : string = "urn:uuid%3A6fb03739-4ac3-5a9b-b5de-1d22890bbee6"

Possibly, the components should also be updated for these.

Argument ordering prevents chaining

Current argument ordering for with_*, having the Uri.t parameter first, prevents chaining of the form:

Uri.(empty |> with_host (Some "hostname") |> with_port (Some 12345))

Suggest perhaps either flipping the ordering for with_* which will break things, or perhaps introducing equivalent set_* values with the flipped order. Thoughts?
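The second option can be sketched on a toy record. The type t and the with_* functions below only mimic the argument order of the library (value being updated first); set_host and set_port are the hypothetical flipped variants.

```ocaml
(* Toy stand-in for Uri.t, for illustration only. *)
type t = { host : string option; port : int option }

let empty = { host = None; port = None }

(* Same argument order as the existing with_* functions. *)
let with_host t host = { t with host }
let with_port t port = { t with port }

(* Hypothetical flipped variants: the value being updated comes last,
   so they compose with |>. *)
let set_host host t = with_host t host
let set_port port t = with_port t port

let example = empty |> set_host (Some "hostname") |> set_port (Some 12345)
```

Adding flipped wrappers leaves the existing with_* functions untouched, so current callers would not break.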

`path` discrepancy?

Shouldn't these turn out the same?

# let uri = Uri.of_string "http://www.meow.com" |>
               fun u -> Uri.with_path u "kitten";;
# Uri.path uri;;
- : string = "kitten"
# Uri.path @@ Uri.of_string @@ Uri.to_string uri;;
- : string = "/kitten"

Empty path segments handled incorrectly

utop # Uri.(to_string (resolve "" Uri.empty (of_string "/foo/bar/..")));;
- : bytes = "/foo/"
utop # Uri.(to_string (resolve "" Uri.empty (of_string "/foo/bar//..")));;
- : bytes = "/foo/bar"
utop # Uri.(to_string (resolve "" Uri.empty (of_string "/foo/bar///..")));;
- : bytes = "/foo/bar/"
utop # Uri.(to_string (resolve "" Uri.empty (of_string "/foo/bar////..")));;
- : bytes = "/foo/bar//"
utop # Uri.(to_string (resolve "" Uri.empty (of_string "/foo/bar//../baz")));;
- : bytes = "/foo/barbaz"

Suggested by @dbuenzli.

path depends on whether there is a host or not

# Uri.path @@ Uri.make ~host:"bla.com" ~path:"" () ;;
- : bytes = ""
# Uri.path @@ Uri.make ~host:"bla.com" ~path:"/" () ;;
- : bytes = "/"
# Uri.path @@ Uri.make ~host:"bla.com" ~path:"foo/" () ;;
- : bytes = "/foo/"
# Uri.path @@ Uri.make ~path:"" () ;;
- : bytes = ""
# Uri.path @@ Uri.make ~path:"/" () ;;
- : bytes = "/" 
# Uri.path @@ Uri.make ~path:"foo/" () ;;
- : bytes = "foo/"

I guess it's expected, but it's very annoying, since it forces one to reimplement some path handling outside of uri. I suppose it's related to #53.

Fails to install with 4.04.0

When installing ocaml-uri under 4.04.0 with opam, it fails with the following message:

Error: broken invariant in parsetree: Let with no bindings.

It sounds like this error is related to:

I would love to help fix this, but I haven't been able to find any good resources on what "let with no bindings" means.

This is possibly a duplicate of #91, but that issue contains very little information, so I'm not sure.

normalise path function

Need a normalise path function that will strip out .. and other unsafe path components into a canonical version. See RFC3986
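As a starting point, the remove_dot_segments procedure from RFC 3986 section 5.2.4 can be transcribed almost step by step. The sketch below works on plain strings with the standard library only; it is not the library's code, and the helper names (starts_with, drop, pop) are local to the example.

```ocaml
let starts_with prefix s =
  let pl = String.length prefix in
  String.length s >= pl && String.sub s 0 pl = prefix

(* Drop the first [n] characters of [s]. *)
let drop n s = String.sub s n (String.length s - n)

(* Remove the most recently emitted segment, if any. *)
let pop = function [] -> [] | _ :: rest -> rest

let remove_dot_segments path =
  (* [output] holds emitted segments in reverse order; each segment keeps
     its leading '/', so an empty segment is stored as "/". *)
  let rec loop input output =
    if input = "" then String.concat "" (List.rev output)
    (* Step A: strip a leading "../" or "./" *)
    else if starts_with "../" input then loop (drop 3 input) output
    else if starts_with "./" input then loop (drop 2 input) output
    (* Step B: "/./" or a final "/." becomes "/" *)
    else if starts_with "/./" input then loop (drop 2 input) output
    else if input = "/." then loop "/" output
    (* Step C: "/../" or a final "/.." becomes "/" and drops one segment *)
    else if starts_with "/../" input then loop (drop 3 input) (pop output)
    else if input = "/.." then loop "/" (pop output)
    (* Step D: a lone "." or ".." is discarded *)
    else if input = "." || input = ".." then loop "" output
    else begin
      (* Step E: move the first segment, with its leading '/', to output *)
      let len =
        match String.index_from_opt input 1 '/' with
        | Some i -> i
        | None -> String.length input
      in
      loop (drop len input) (String.sub input 0 len :: output)
    end
  in
  loop path []
```

Because each empty segment is kept as its own "/" entry, a ".." after "//" removes only the empty segment, so "/foo/bar//.." becomes "/foo/bar/" under this procedure.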

percent encoding in URI string

...is inconsistent at the moment (using the pct_encoded type more strongly would be a good idea, as it always needs to be explicitly decoded to be safe).

Uri.pct_encode: single quotes

Since the URI can be between single quotes in HTML, wouldn't it be a good idea that Uri.pct_encode also percent-encodes single quotes?
