Comments (9)

blueberry commented on July 16, 2024

Can you share an example project with the specific dependencies and exactly the code that you tried?

For example, in the code you've posted, you never define p+ and p* (my mistake, early morning :), and I can't see how large n is. Maybe foldmap just throws an exception? Or n might be quite large, so 20 ms is what it should be? What is your hardware, Java version, and OS?

from fluokitten.

blueberry commented on July 16, 2024

Just for reference, I've just tried this code with the current Neanderthal snapshot on the same machine I used for writing that post (i7 4790k) and got the following:

(with-progress-reporting (quick-bench (foldmap p+ 0.0 p* nx ny)))
;;  Execution time mean : 189.349159 µs

It's a little bit faster than reported in the blog post (198 µs) but close. Please try to run that benchmark on your machine a couple of times and see whether there are some changes. Maybe the combination of the version and the settings of your JVM/OS have something to do with that, but it is difficult to say without data.
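For readers who want to reproduce the measurement, the setup behind that benchmark line might look roughly like the sketch below. It is an assumption, not the exact code from the blog post: the namespaces and the `dv` constructor are the usual Fluokitten, Neanderthal and Criterium entry points, and `n` is set to the value reported further down in this thread.

```clojure
;; Sketch only: assumes Fluokitten, Neanderthal and Criterium are on the classpath.
(require '[uncomplicate.fluokitten.core :refer [foldmap]]
         '[uncomplicate.neanderthal.native :refer [dv]]
         '[criterium.core :refer [quick-bench with-progress-reporting]])

;; Primitive-hinted addition and multiplication, so no boxing is needed.
(defn p+ ^double [^double x ^double y] (+ x y))
(defn p* ^double [^double x ^double y] (* x y))

;; Vector length as reported later in the thread.
(def n 100000)

;; Two native double vectors filled with 0, 1, ..., n-1.
(def nx (dv (range n)))
(def ny (dv (range n)))

;; Sum of element-wise products, computed without an intermediate vector.
(with-progress-reporting (quick-bench (foldmap p+ 0.0 p* nx ny)))
```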

randomizedthinking commented on July 16, 2024

I should have specified all parameters in the first place. The full configuration is:

  • n=100000
  • JVM version: OpenJDK 1.8.0
  • OS: Linux Debian
  • CPU: i5-3470
  • Clojure 1.9.0

The computer I use is slow, but that doesn't explain the huge performance difference. I also tried it on another machine (Xeon E5-2686) this morning, with the same results.

Further checking shows that fold is reasonably fast, while fmap is the bottleneck. Another observation: in your post, you showed that

(fold (fmap * nx ny))
;; => ClassCastException clojure.core$_STAR_ cannot be cast to clojure.lang.IFn$DDD  uncomplicate.neanderthal.impl.buffer-block/vector-fmap* (buffer_block.clj:349)

Yet in my tests, this code runs. I suspect you have a different fmap which takes advantage of native MKL or GPU libraries, so you have super performance.

blueberry commented on July 16, 2024

I think that I know what might be the source of the problem: the old Clojure compiler was somewhat inconsistent in applying protocol implementations, so it dispatches to the non-primitive function implementation in your case (why? I don't know). In my tests, Clojure 1.10 fixed this non-determinism. Please upgrade the project to 1.10 and report the timings.

BTW fmap does not use any MKL/GPU acceleration, it just eliminates various JVM bottlenecks. fold does use MKL in a few simple cases where that is possible.
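The distinction between primitive and non-primitive functions can be checked directly at the REPL. The sketch below uses plain Clojure only. The names `p+` and `p*` follow the benchmark line earlier in the thread, but their definitions here are an assumption about how such functions are typically written:

```clojure
;; Functions with primitive double hints on the arguments and the return value
;; implement the primitive interface clojure.lang.IFn$DDD (double, double -> double).
(defn p+ ^double [^double x ^double y] (+ x y))
(defn p* ^double [^double x ^double y] (* x y))

;; The hinted functions can be called through the primitive interface...
(println (instance? clojure.lang.IFn$DDD p+))
(println (instance? clojure.lang.IFn$DDD p*))

;; ...while clojure.core/* is an ordinary boxed function, which is why the
;; primitive code path rejects it with a ClassCastException.
(println (instance? clojure.lang.IFn$DDD *))
```

If the first two checks print `true` and the last prints `false`, a call such as `(fmap * nx ny)` that still succeeds is most likely going through a slower, non-primitive implementation.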

randomizedthinking commented on July 16, 2024

Just tested on Clojure 1.10, but the performance is still sluggish. Also, (fold (fmap * nx ny)) runs under 1.10 in my case.

randomizedthinking commented on July 16, 2024

I created a project so you can check it out: fluokitten_test. Below are my results:

Estimating sampling overhead
Warming up for JIT optimisations 10000000000 ...
compilation occurred before 463355 iterations
compilation occurred before 59757179 iterations
compilation occurred before 116734838 iterations
compilation occurred before 118124537 iterations
compilation occurred before 177418361 iterations
compilation occurred before 236712185 iterations
Estimating execution count ...
Sampling ...
Final GC...
Checking GC...
Finding outliers ...
Bootstrapping ...
Checking outlier significance
Warming up for JIT optimisations 5000000000 ...
Estimating execution count ...
Sampling ...
Final GC...
Checking GC...
Finding outliers ...
Bootstrapping ...
Checking outlier significance
Evaluation count : 36 in 6 samples of 6 calls.
Execution time mean : 16.717164 ms
Execution time std-deviation : 188.162809 µs
Execution time lower quantile : 16.481262 ms ( 2.5%)
Execution time upper quantile : 16.925771 ms (97.5%)
Overhead used : 9.915312 ns

blueberry commented on July 16, 2024

When I tried your project as-is on my computer (but starting the benchmark from the REPL instead of main), I got 4 ms.

Then I added the direct linking option to :jvm-opts in Leiningen, and got a significant speedup: 800 microseconds.

:jvm-opts ^:replace ["-Dclojure.compiler.direct-linking=true"
                       "-XX:MaxDirectMemorySize=16g" "-XX:+UseLargePages"] 

I restarted your project a few times with different versions of Neanderthal (SNAPSHOT and 0.20.4) and Clojure (1.8.0 and 1.10.0), and I always got the same result.

However, when I started the repl from the benchmarks example project (https://github.com/uncomplicate/neanderthal/blob/master/examples/benchmarks/src/benchmarks/map_reduce.clj) I always get around 200 microseconds, as reported in the blog post.

So, it is definitely related to JVM/Clojure compiler settings, and possibly to the order in which Clojure loads namespaces. I don't have time now to compare your project further and see if there is another setting that you've missed. Can you try the code from the benchmarks project and report your numbers? Seeing that our CPUs got 20 ms vs 4 ms for the initial version, I would expect you to get around 1 ms with the benchmarks project.

randomizedthinking commented on July 16, 2024

Thanks for the prompt reply. I will check under the options provided, and report back later.

randomizedthinking commented on July 16, 2024

I have now found the cause of the issue. In addition to the direct-linking option you pointed out, another factor is the *unchecked-math* option: it has to be set to either true or :warn-on-boxed globally. Setting the option only in the module won't work.

The fluokitten_test project is updated with the change. Now I get around 240 µs as the end result.
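Putting the two settings from this thread together, a Leiningen project.clj might look roughly like the sketch below. This is an assumption about the final configuration, not a copy of the updated fluokitten_test project: the project version string is a placeholder, the benchmarking dependency is omitted, and the dependency versions are the ones mentioned in the thread.

```clojure
;; Sketch of a project.clj combining both settings discussed above.
(defproject fluokitten_test "0.1.0-SNAPSHOT" ; placeholder version
  :dependencies [[org.clojure/clojure "1.10.0"]
                 [uncomplicate/neanderthal "0.20.4"]]
  ;; Set *unchecked-math* for every namespace in the build, not just one module.
  :global-vars {*unchecked-math* :warn-on-boxed}
  ;; Direct linking, as suggested earlier in the thread.
  :jvm-opts ^:replace ["-Dclojure.compiler.direct-linking=true"])
```

Using :global-vars applies the binding before any namespace is compiled, which is one way to get the global effect that a per-file `(set! *unchecked-math* ...)` does not provide.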
