urly's Introduction

What is Urly

Urly is a tiny Clojure library that unifies parsing of URIs, URLs and URL-like values, such as the relative href values found in real-world HTML.

Why Urly Was Necessary

java.net.URI and java.net.URL generally do a great job of parsing URIs and URLs that are valid per the RFCs. However, when working with real-world HTML markup, it is common to come across href attribute values that are not valid URIs or URLs but are recognized and accepted by Web browsers. Normalization and resolution of such values cannot rely on java.net.URI or java.net.URL alone because both throw exceptions on malformed input (URISyntaxException and MalformedURLException, respectively).
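
As a small illustration that uses only the JDK classes (no Urly functions), the sketch below shows both constructors rejecting values a browser would accept. The helper names uri-error and url-error and the sample hrefs are made up for this example.

```clojure
(import '[java.net URI URL URISyntaxException MalformedURLException])

(defn uri-error
  "Returns the simple class name of the exception thrown by (URI. s),
  or nil if parsing succeeds. Hypothetical helper for illustration."
  [^String s]
  (try
    (URI. s)
    nil
    (catch URISyntaxException e
      (.getSimpleName (class e)))))

(defn url-error
  "Same idea for java.net.URL. Hypothetical helper for illustration."
  [^String s]
  (try
    (URL. s)
    nil
    (catch MalformedURLException e
      (.getSimpleName (class e)))))

;; A space in the path is fine for browsers but illegal for java.net.URI
(uri-error "http://example.com/my document.pdf") ;; => "URISyntaxException"

;; A relative href has no protocol, which java.net.URL requires
(url-error "/search?q=Clojure")                  ;; => "MalformedURLException"
```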

Urly tries to make this less painful.

Supported Clojure versions

Urly is built from the ground up for Clojure 1.3+ and JDK 6+.

Usage

Installation

With Leiningen

[clojurewerkz/urly "1.0.0"]

clojurewerkz.urly.UrlLike

The central concept in Urly is the UrlLike class. It unifies java.net.URI and java.net.URL as much as practical and also supports relative href attribute values like "/search?q=Clojure". UrlLike instances are immutable and perform only safe normalizations: for example, they use a default pathname of "/" and lowercase protocols and hostnames, but not pathnames.

Because UrlLike instances are immutable, UrlLike#mutateProtocol, UrlLike#mutatePath and similar methods do not change the instance they are called on; they return a new, modified instance (see examples below).

Urly is built around Clojure protocols, so most functions are polymorphic and can take strings as well as instances of

  • clojurewerkz.urly.UrlLike
  • java.net.URI
  • java.net.URL

as their first argument.
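
To show how protocol-based polymorphism of this kind works in general, here is a minimal sketch. The protocol HostLookup and its function host-name are hypothetical names invented for this example; they are not part of Urly's API, and Urly's own protocols may be organized differently.

```clojure
(import '[java.net URI URL])

;; Hypothetical protocol, for illustration only; not part of Urly.
(defprotocol HostLookup
  (host-name [input] "Returns the host part of input, or nil."))

(extend-protocol HostLookup
  String
  ;; Strings are parsed first, then dispatched again as a URI
  (host-name [s] (host-name (URI. s)))
  URI
  (host-name [u] (.getHost u))
  URL
  (host-name [u] (.getHost u)))

;; The same function accepts all three input types
(host-name "http://clojure.org")        ;; => "clojure.org"
(host-name (URI. "http://clojure.org")) ;; => "clojure.org"
(host-name (URL. "http://clojure.org")) ;; => "clojure.org"
```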

Key Functions

(ns my.app
  (:refer-clojure :exclude [resolve])
  (:use clojurewerkz.urly.core)
  (:import [java.net URI URL]))

;; Instantiate a UrlLike instance
(url-like (URL. "http://clojure.org"))
(url-like (URI. "http://clojure.org"))
(url-like "http://clojure.org")

;; unlike java.net.URI, valid Internet domain names like "clojure.org" and "amazon.co.uk"
;; will be recognized as hostnames, not paths
(url-like "clojure.org")
(url-like "amazon.co.uk")


;; accessing parts of the URL

(let [u (url-like "http://clojure.org")]
  (protocol-of u)  ;; => "http"
  (.getProtocol u) ;; => "http"
  (.getSchema u)   ;; => "http"
  (host-of u)      ;; => "clojure.org"
  (.getHost u)     ;; => "clojure.org"
  (.getHostname u) ;; => "clojure.org"
  (port-of u)     ;; => -1
  (path-of u)     ;; => "/", path is normalized to be "/" if not specified
  (query-of u)    ;; => nil
  (fragment-of u) ;; => nil
  (tld-of u)      ;; => "org"
  ;; returns all of the above as an immutable Clojure map
  (as-map u))

;; absolute & relative URLs

(absolute? "/faq") ;; => false
(relative? "/faq") ;; => true

(absolute? (java.net.URL. "http://clojure.org")) ;; => true
(relative? (java.net.URL. "http://clojure.org")) ;; => false

;; resolving URIs

(resolve (URI. "http://clojure.org") (URI. "/Protocols"))                   ;; => (URI. "http://clojure.org/Protocols")
(resolve (URI. "http://clojure.org") "/Protocols")                          ;; => (URI. "http://clojure.org/Protocols")
(resolve (URI. "http://clojure.org") (URL. "http://clojure.org/Protocols")) ;; => (URI. "http://clojure.org/Protocols")
(resolve "http://clojure.org"        (URI. "/Protocols"))                   ;; => (URI. "http://clojure.org/Protocols")
(resolve "http://clojure.org"        (URL. "http://clojure.org/Protocols")) ;; => (URI. "http://clojure.org/Protocols")

;; mutating URL parts

(let [u (url-like "http://clojure.org")]
  ;; returns a UrlLike instance that represents "http://clojure.org/Protocols"
  (.mutatePath u "/Protocols")
  ;; returns a UrlLike instance that represents "https://clojure.org/"
  (.mutateProtocol u "https")
  ;; returns a UrlLike instance with query string URL-encoded using UTF-8 as encoding
  (encode-query (url-like "http://clojuredocs.org/search?x=0&y=0&q=%22predicate function%22~10"))
  ;; returns a UrlLike instance that represents "http://clojure.org/"
  (-> u (.mutateQuery "search=protocols")
        (.withoutQueryStringAndFragment))
  ;; the same via Clojure API
  (-> u (mutate-query "search=protocols")
        (.withoutQueryStringAndFragment))
  ;; returns a UrlLike instance with the same parts as u but no query string
  (.withoutQuery u)
  ;; returns a UrlLike instance with the same parts as u but no fragment (#hash)
  (.withoutFragment u)
  ;; returns a UrlLike instance whose query string is "x=0&y=0&q=%22predicate+function%22~10"
  ;; (set by mutate-query, then lowercased by the function passed to mutate-query-with)
  (-> u (mutate-query "x=0&y=0&q=%22PREDICATE+FUNCTION%22~10")
        (mutate-query-with (fn [^String s] (.toLowerCase s)))))



;; stripping of extra protocol prefixes (commonly found in URLs on the Web)

(eliminate-extra-protocol-prefixes "http://https://broken-cms.com") ;; => https://broken-cms.com
(eliminate-extra-protocol-prefixes "https://http://broken-cms.com") ;; => http://broken-cms.com
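
For readers curious how this kind of prefix stripping can be done, the following is a rough sketch with a regular expression. The function strip-extra-protocol-prefixes is a hypothetical stand-in written for this example; Urly's actual implementation is not shown in this README and may differ.

```clojure
(require '[clojure.string :as string])

;; Hypothetical helper; not Urly's eliminate-extra-protocol-prefixes.
(defn strip-extra-protocol-prefixes
  "Drops every leading http:// or https:// prefix that is immediately
  followed by another one, keeping only the last prefix."
  [^String s]
  ;; The lookahead (?=...) ensures at least one prefix is left in place
  (string/replace s #"(?i)^(?:https?://)+(?=https?://)" ""))

(strip-extra-protocol-prefixes "http://https://broken-cms.com") ;; => "https://broken-cms.com"
(strip-extra-protocol-prefixes "https://http://broken-cms.com") ;; => "http://broken-cms.com"
(strip-extra-protocol-prefixes "http://clojure.org")            ;; => "http://clojure.org"
```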

Documentation & Examples

Documentation site for Urly is coming in the future (sorry!). Please see our extensive test suite for more code examples.

Continuous Integration

CI is hosted by travis-ci.org

Urly Is a ClojureWerkz Project

Urly is part of the group of Clojure libraries known as ClojureWerkz, together with Monger, Neocons, Langohr, Elastisch, Quartzite, Welle and several others.

Development

Urly uses Leiningen 2. Make sure you have it installed and then run tests against all supported Clojure versions using

lein with-profile dev javac
lein all test

Then create a branch and make your changes on it. Once you are done with your changes and all tests pass, submit a pull request on GitHub.

License

Copyright (C) 2011-2012 Michael S. Klishin

Distributed under the Eclipse Public License, the same as Clojure.

urly's People

Contributors

bitdeli-chef, emidln, esehara, ifesdjeen, michaelklishin, ricardojmendez, tutysara

urly's Issues

Malformed query params lead to inconsistent behavior

Hello,

I had a url with malformed query params, something like:

(def u "www.example.com/?foo=%20%")

Now the following:

(urly/url-like u)

returns nil,

Whereas:

(urly/url-like (java.net.URI. u))

throws CompilerException java.net.URISyntaxException: Malformed escape pair at index 24: www.example.com/?foo=%20% and

(urly/url-like (java.net.URL. u))

throws CompilerException java.net.MalformedURLException: no protocol: www.example.com/?foo=%20%

I know that this is an edge case, but especially the first case when nil is returned can be quite confusing so I thought it was worth mentioning.

Thanks!

Documentation of ClojureWerkz deprecation

Hi Michael,

I noticed that you moved urly to the deprecated projects section on ClojureWerkz but there's no notice here of that fact. Also, no real explanation of what that means is given. Is this project abandoned? Are there alternatives you could recommend? Any light you could shed on this would be much appreciated.

Cheers.

absolute? not consistent with URI.isAbsolute()

Looks like url-like is mis-reporting relative urls as absolute. If I pass a java.net.URI instance to absolute?, the url is correctly reported as not being absolute, but if I pass a url-like instance, then the opposite is true:

user=> (absolute? (url-like "foo.html"))
true
user=> (.isAbsolute (java.net.URI. "foo.html"))
false

I am a Clojure noob, so I am afraid I only have a guess as to the problem based on checking out the source: I think it has to do with the relative URL being mis-detected as a domain name. The above relative URL does not throw an IllegalArgumentException as expected in the code.

user=> (let [idn (InternetDomainName/from "foo.html")] (UrlLike/from idn))
#<UrlLike http://foo.html/>

Guava dependency and CLJS compatibility

Here's the exception:

Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.io.ByteStreams.limit(Ljava/io/InputStream;J)Ljava/io/InputStream;
    at com.google.javascript.jscomp.CommandLineRunner.getDefaultExterns(CommandLineRunner.java:939)
    at cljs.closure$load_externs.invoke(closure.clj:235)
    at cljs.closure$optimize.doInvoke(closure.clj:769)

And when I exclude the guava dependency, I get

Caused by: java.lang.ClassNotFoundException: com.google.common.net.InternetDomainName
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at clojure.lang.DynamicClassLoader.findClass(DynamicClassLoader.java:61)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
    at java.lang.Class.forName0(Native Method)

Guava seems like a really heavy dependency for this one class. Is there any other way to achieve this functionality?

url-like doesn't work on URLs with braces

I think the code demonstrates the issue in a concise way:

=> (use '[clojurewerkz.urly.core :only (url-like)])
nil
=> (url-like "test.com?q={foo}")
nil
=> (url-like "test.com?q={foo")
nil
=> (url-like "test.com?q=foo}")
nil
=> (url-like "test.com?q=foo")
#<UrlLike test.com?q=foo>

This is happening with urly versions 1.0.0 and 2.0.0-alpha5. I haven't checked any other versions.

Cheers.

Push latest version to Clojars

Hi,

It looks like the latest version on Clojars is Alpha 5 from two years ago, which depends on guava 11.0.1. Could you push the latest code as an alpha 6 for the updated Guava 18 dependencies?

Host of URIs w/ port but w/o protocol

user=> (urly/host-of (urly/url-like "localhost:4711"))
nil
user=> (urly/host-of (urly/url-like "http://localhost:4711"))
"localhost"
user=> (-> "coogle.com:9000" urly/url-like urly/host-of)
nil
user=> (-> "http://coogle.com:9000" urly/url-like urly/host-of)
"coogle.com"
user=> (-> "coogle.com" urly/url-like urly/host-of)
"coogle.com"

I'm not sure if this is a bug or just not supported?
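
One possible explanation, assuming these strings end up being parsed by java.net.URI (an assumption; the thread does not confirm it): in URI syntax, everything before the first colon is a valid scheme name here, so "localhost:4711" is read as an opaque URI with scheme "localhost" and no host. The sketch below uses only java.net.URI, and uri-parts is a hypothetical helper.

```clojure
(import '[java.net URI])

(defn uri-parts
  "Hypothetical helper: returns the parts of s as java.net.URI sees them."
  [^String s]
  (let [u (URI. s)]
    {:scheme  (.getScheme u)
     :host    (.getHost u)
     :port    (.getPort u)
     :opaque? (.isOpaque u)}))

;; Without a protocol, the text before the colon is taken as the scheme
(uri-parts "localhost:4711")
;; => {:scheme "localhost", :host nil, :port -1, :opaque? true}

;; With a protocol, host and port are parsed as expected
(uri-parts "http://localhost:4711")
;; => {:scheme "http", :host "localhost", :port 4711, :opaque? false}
```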

normalize-url throws NullPointerException on URLs with ">"

I'm using urly (version 1.0.0) to manipulate some URLs in my Clojure project. With some URLs, core/normalize-url throws a NullPointerException.

The entry in my leiningen project.clj:

[clojurewerkz/urly "1.0.0"]

Relevant output from a REPL:

user=> (require '[clojurewerkz.urly.core])
nil
user=> (clojurewerkz.urly.core/normalize-url "http://a.com/foo>")
NullPointerException   clojurewerkz.urly.core/eval46973/fn--46977 (core.clj:326)

user=> (pst)
NullPointerException 
    clojurewerkz.urly.core/eval46973/fn--46977 (core.clj:326)
    clojurewerkz.urly.core/eval46937/fn--46951/G--46928--46956 (core.clj:319)
    user/eval47073 (NO_SOURCE_FILE:1)
    clojure.lang.Compiler.eval (Compiler.java:6511)
    clojure.lang.Compiler.eval (Compiler.java:6477)
    clojure.core/eval (core.clj:2797)
    clojure.main/repl/read-eval-print--6405 (main.clj:245)
    clojure.main/repl/fn--6410 (main.clj:266)
    clojure.main/repl (main.clj:266)
    clojure.tools.nrepl.middleware.interruptible-eval/evaluate/fn--825 (interruptible_eval.clj:56)
    clojure.core/apply (core.clj:601)
    clojure.core/with-bindings* (core.clj:1771)

Release 2.0.0-alpha6

2.0.0-alpha5 is a few years old. We should do a 2.0.0-alpha6.

Any preferred approach for tracking these? git-flow? Something like gitlab-flow?

(url-like "#some-fragment") adds an absolute path (i.e., #<UrlLike /#some-fragment>)

I bumped into the following inconsistency:

user=> (java.net.URI. "#fragment")
#<URI #fragment>
user=> (.toURI (urly/url-like "#fragment"))
#<URI /#fragment>

And indeed (which is what actually bit me):

user=> (urly/url-like "#fragment")
#<UrlLike /#fragment>

Investigating, it turns out that UrlLike.fromURI doesn't seem prepared to deal with a url that's only an anchor (such as you'd have to navigate within the same page). It basically returns

new UrlLike(lowerCaseOrNull(uri.getScheme()), uri.getUserInfo(), uri.getHost(), uri.getAuthority(), uri.getPort(), pathOrDefault(uri.getPath()), uri.getQuery(), uri.getFragment());

which, in the case of a URL consisting solely of a fragment, uses a default path of "/".

Now, I'm happy to produce a pull request. I just want to make sure that this is not by design or something.
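
For discussion, one way the defaulting rule could be narrowed is sketched below in Clojure rather than in the library's Java. The function path-or-default here is a hypothetical variant written for this thread, not the existing pathOrDefault, and it assumes the desired behavior is to default to "/" only when the URI has an authority.

```clojure
(import '[java.net URI])

;; Hypothetical variant; not the library's pathOrDefault.
(defn path-or-default
  "Returns \"/\" for an empty path only when the URI has an authority;
  otherwise returns the path unchanged."
  [^URI uri]
  (let [path (.getPath uri)]
    (if (and (empty? path) (some? (.getAuthority uri)))
      "/"
      path)))

(path-or-default (URI. "http://clojure.org")) ;; => "/"
(path-or-default (URI. "#fragment"))          ;; => ""
(path-or-default (URI. "/faq#top"))           ;; => "/faq"
```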

Parsing URLs with IDN.

I was trying to use urly to convert unicode domain names to punycode (using .mutateHost), and here is what I've found:

(require '[clojurewerkz.urly.core :as urly])
(import '[java.net URL URI IDN])

(let [my-idn-url "http://фитомаркет-онлайн.рф/test.html"
      url-like (urly/url-like my-idn-url)
      url (URL. my-idn-url)
      uri (URI. my-idn-url)
      all [url-like uri url]]
  (doall (map println all))
  ; #<UrlLike http:/test.html>
  ; #<URI http://фитомаркет-онлайн.рф/test.html>
  ; #<URL http://фитомаркет-онлайн.рф/test.html>
  (doall (map #(println (.getHost %)) all))
  ; nil
  ; nil
  ; фитомаркет-онлайн.рф
  (doall (map #(println (.getAuthority %)) all))
  ; фитомаркет-онлайн.рф
  ; фитомаркет-онлайн.рф
  ; фитомаркет-онлайн.рф
  (doall (map (comp println urly/url-like) [uri url]))
  ; #<UrlLike http:/test.html>
  ; #<UrlLike <malformed URI>>  <-- That's weird
  ;
  ; And here is the solution
  (let [correct-url-like (urly/url-like url)
        host (.getHost correct-url-like)]
    (println correct-url-like)
    ; #<UrlLike <malformed URI>>  <-- Double weird
    (println host)
    ; фитомаркет-онлайн.рф
    (->
      correct-url-like
      (.mutateHost (IDN/toASCII host))
      (println))
    ; #<UrlLike http://xn----7sbbsnkdkeodcfy0agz.xn--p1ai/test.html>
    ; Hooray!
    ))

I'm not sure if it's a bug in urly, more likely it's in java.net.URI, can you confirm?

Versions:

Mac OS X 10.8.5
===
java version "1.7.0_25"
Java(TM) SE Runtime Environment (build 1.7.0_25-b15)
Java HotSpot(TM) 64-Bit Server VM (build 23.25-b01, mixed mode)
===
REPL-y 0.2.0
Clojure 1.5.1

Exception surprise

Hello

My intent in using urly is to validate user-supplied URLs/URIs. However, I was surprised when I encountered this:
=> (let [u (url-like "http://127.0.0.1")] (as-map u))
IllegalArgumentException Not a valid domain name: '127.0.0.1' com.google.common.base.Preconditions.checkArgument (Preconditions.java:115)

Am I doing something wrong or are my expectations out of line? (clojure 1.5.1).

Thanks,
Frank

How do I check if the URL uses the http scheme?

Hello Michael,

Thanks for the great library!

Using Urly, given a URL, is it possible to check whether the URL uses the http scheme?

E.g., given

(def url "http://google.com")

(scheme-http? url) ;; would return => true

Is there a function like scheme-http? to check whether a URL uses the http scheme?

Thanks,
Murtaza
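
A helper like the one asked about can be sketched in a few lines. The version below uses only java.net.URI so that it is self-contained; scheme-http? is the hypothetical name from the question, not an existing Urly function. With Urly itself, comparing the result of protocol-of (shown in the README above) against "http" should work along the same lines.

```clojure
(import '[java.net URI])

;; Hypothetical helper named after the question; not part of Urly.
(defn scheme-http?
  "Returns true if s parses as a URI whose scheme is http (case-insensitive)."
  [^String s]
  (let [scheme (.getScheme (URI. s))]
    (boolean (and scheme (.equalsIgnoreCase "http" scheme)))))

(scheme-http? "http://google.com")  ;; => true
(scheme-http? "https://google.com") ;; => false
(scheme-http? "/faq")               ;; => false
```

Note that this sketch throws URISyntaxException for input java.net.URI cannot parse, so callers validating user input would need to catch that.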

resolve is more strict than url-like

There are some URLs that url-like accepts but that resolve rejects (by throwing URISyntaxException). For example:

=> (url-like "http://example.com/my document.pdf")
#<UrlLike http://example.com/my%20document.pdf>
=> (resolve "http://example.com/" "my document.pdf")
URISyntaxException Illegal character in path at index 2: my document.pdf  java.net.URI$Parser.fail (URI.java:2829)

Is this by design? Intuitively I'd expect resolve to be no more or less lenient than url-like, but maybe I'm overlooking something.
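
As a possible workaround at the JDK level, the multi-argument java.net.URI constructors percent-encode illegal characters, unlike the single-string constructor. The sketch below relies on that; lenient-resolve is a hypothetical helper and does not describe how Urly's resolve works.

```clojure
(import '[java.net URI])

;; Hypothetical helper; not Urly's resolve.
(defn lenient-resolve
  "Resolves relative-path against base, percent-encoding characters that
  are illegal in a URI path (such as spaces)."
  [^String base ^String relative-path]
  ;; The (scheme, host, path, fragment) constructor quotes illegal characters
  (let [quoted (URI. nil nil relative-path nil)]
    (str (.resolve (URI. base) quoted))))

(lenient-resolve "http://example.com/" "my document.pdf")
;; => "http://example.com/my%20document.pdf"
```

The second argument is treated purely as a path, so a "?" in it would be encoded rather than starting a query string.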
