Code Monkey home page Code Monkey logo

Comments (7)

weavejester avatar weavejester commented on August 24, 2024 1

In a separate PR, please.

from ring-codec.

weavejester avatar weavejester commented on August 24, 2024

If you can come up with an alternative implementation that's faster in most real time cases, and not significantly slower in edge cases, then sure. I'll need benchmarks, though!

from ring-codec.

bsless avatar bsless commented on August 24, 2024

Reading carefully the implementation of URLDecoder/decode looks like it only uses the encoding when it encounters hex characters, so the naive solution should be correct in all cases.
I'll upload detailed benchmarks tomorrow

from ring-codec.

bsless avatar bsless commented on August 24, 2024

Please find below the rationale, implementation and benchmark results

Rationale

  1. Base implementation we compare against
  2. Charset lookup incurs some overhead. We can cut it by def-ing the default charset and providing coercion
  3. Decode only changes the string if it contains + or %. The theory is that checking they're contained in it is faster than calling into decode in the happy path, and the overhead is negligible in the sad path

Implementations:

1 - avoid charset lookup

(defn- to-charset
  ^Charset [s]
  (if (string? s)
    (Charset/forName s)
    s))

(def ^Charset utf-8 (to-charset "UTF-8"))

(defn form-decode-str'
  ([encoded]
   (form-decode-str' encoded utf-8))
  ([^String encoded encoding]
   (try
     (URLDecoder/decode encoded (to-charset encoding))
     (catch Exception _ nil))))

2 - avoid calling into decode unnecessarily

(defn form-decode-str''
  ([encoded]
   (form-decode-str'' encoded utf-8))
  ([^String encoded encoding]
   (try
     (if (or
          (.contains encoded "+")
          (.contains encoded "%"))
       (URLDecoder/decode encoded (to-charset encoding))
       encoded)
     (catch Exception _ nil))))

Methodology

  • Use criterium.
  • Input sizes with logarithmic size.
  • Add the char which requires decoding at the end of the string, requiring a full scan
(def results
  (into
   []
   (for [n [1 2 4 8 16 32]
         :let [s (apply str (repeat n "a"))
               s' (str s "+")]]
     {:size n
      :no-decode0 (cc/quick-benchmark (form-decode-str s) nil)
      :no-decode1 (cc/quick-benchmark (form-decode-str' s) nil)
      :no-decode2 (cc/quick-benchmark (form-decode-str'' s) nil)
      :decode0 (cc/quick-benchmark (form-decode-str s') nil)
      :decode1 (cc/quick-benchmark (form-decode-str' s') nil)
      :decode2 (cc/quick-benchmark (form-decode-str'' s') nil)})))

Results

Timing is measured in ns:

size no-decode0 no-decode1 no-decode2 decode0 decode1 decode2
1 10.89 9.39 5.06 18.11 16.48 18.26
2 14.13 12.69 6.41 21.44 19.84 22.76
4 18.90 17.86 6.35 25.66 24.34 26.46
8 28.89 27.38 6.34 38.42 33.96 40.15
16 48.19 46.60 5.94 55.97 54.17 57.04
32 93.64 89.69 8.06 110.20 101.04 98.86

sad-path

happy-path

Overhead between decode1 and decode2 should be only scanning the string before decoding it:

overhead

Based on these results, the full implementation (2) looks good to me. What do you think?


pngs rendered with plotly and clerk

from ring-codec.

weavejester avatar weavejester commented on August 24, 2024

Thank you for the detailed analysis. This looks like a good improvement to the existing implementation. Would you be able to open a PR, and link back to this issue?

from ring-codec.

bsless avatar bsless commented on August 24, 2024

Absolutely

While we're at it, would you like me to replace all instances where encoding is provided as String to Charset, or would you like that to be a different PR?

from ring-codec.

bsless avatar bsless commented on August 24, 2024

Since the form-decode-str PR could build on the charset PR, would you like the Charset PR first?
I can close #38 and split it

from ring-codec.

Related Issues (18)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.