Comments (9)
Can you share an example project with the specific dependencies and exactly the code that you tried?
For example, in the code you've posted, you never define p+ and p* (my mistake, early morning :), and I can't see how large n is. Maybe foldmap just throws an exception? Or n might be quite large, so 20ms is what it should be? What is your hardware, Java version, and OS?
from fluokitten.
Just for reference, I've just tried this code with the current Neanderthal snapshot on the same machine I used for writing that post (i7 4790k) and got the following:
(with-progress-reporting (quick-bench (foldmap p+ 0.0 p* nx ny)))
;; Execution time mean : 189.349159 µs
It's a little bit faster than reported in the blog post (198 µs) but close. Please try to run that benchmark on your machine a couple of times and see whether there are some changes. Maybe the combination of the version and the settings of your JVM/OS have something to do with that, but it is difficult to say without data.
I should have specified all parameters in the first place. The full configuration is:
- n=100000
- JVM version: OpenJDK 1.8.0
- OS: Linux Debian
- CPU: i5-3470
- Clojure 1.9.0
The computer I use is slow, yet it doesn't explain the huge performance differences. I also tried it on another Xeon E5-2686 machine this morning -- same results.
Further checking shows that fold is reasonably fast, yet fmap is the bottleneck. Another observation: in your post, you showed that this code throws an exception:
(fold (fmap * nx ny))
;; => ClassCastException clojure.core$_STAR_ cannot be cast to clojure.lang.IFn$DDD uncomplicate.neanderthal.impl.buffer-block/vector-fmap* (buffer_block.clj:349)
Yet in my tests, this code runs. I suspect you have a different fmap which takes advantage of native MKL or GPU libraries, so you have super performance.
I think that I know what might be the source of the problem: the old Clojure compiler was somewhat inconsistent in applying protocol implementations, so it dispatches to the non-primitive function implementation in your case (why? I don't know). In my tests, Clojure 1.10 fixed this non-determinism. Please upgrade the project to 1.10 and report the timings.
BTW, fmap does not use any MKL/GPU acceleration; it just eliminates various JVM bottlenecks. fold does use MKL in a few simple cases where that is possible.
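To illustrate the primitive versus non-primitive distinction discussed above, here is a minimal sketch. It assumes that p+ and p* from the benchmark are defined as type-hinted functions along these lines; the exact definitions are not shown in this thread, so treat them as an assumption.

```clojure
;; Assumed definitions: primitive-hinted binary functions on doubles.
(defn p+ ^double [^double x ^double y] (+ x y))
(defn p* ^double [^double x ^double y] (* x y))

;; A fn compiled with these hints implements clojure.lang.IFn$DDD
;; (double, double -> double), so it can be called without boxing.
(instance? clojure.lang.IFn$DDD p+) ; true
(instance? clojure.lang.IFn$DDD p*) ; true

;; clojure.core/* has no primitive signature, which matches the
;; ClassCastException quoted earlier in the thread.
(instance? clojure.lang.IFn$DDD *)  ; false
```

If a call such as (fmap * nx ny) runs instead of throwing, that would be consistent with the dispatch ending up on a non-primitive (boxed) code path.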
Just tested on Clojure 1.10, but the performance is still sluggish. Also, (fold (fmap * nx ny)) runs under 1.10 in my case.
I created a project so you can check it out: fluokitten_test. Below are my run results:
Estimating sampling overhead
Warming up for JIT optimisations 10000000000 ...
compilation occurred before 463355 iterations
compilation occurred before 59757179 iterations
compilation occurred before 116734838 iterations
compilation occurred before 118124537 iterations
compilation occurred before 177418361 iterations
compilation occurred before 236712185 iterations
Estimating execution count ...
Sampling ...
Final GC...
Checking GC...
Finding outliers ...
Bootstrapping ...
Checking outlier significance
Warming up for JIT optimisations 5000000000 ...
Estimating execution count ...
Sampling ...
Final GC...
Checking GC...
Finding outliers ...
Bootstrapping ...
Checking outlier significance
Evaluation count : 36 in 6 samples of 6 calls.
Execution time mean : 16.717164 ms
Execution time std-deviation : 188.162809 µs
Execution time lower quantile : 16.481262 ms ( 2.5%)
Execution time upper quantile : 16.925771 ms (97.5%)
Overhead used : 9.915312 ns
When I tried your project as-is on my computer (but starting the benchmark from the repl instead of main), I got 4ms.
Then I added the direct linking option to :jvm-opts in Leiningen, and got a significant speedup: 800 microseconds.
:jvm-opts ^:replace ["-Dclojure.compiler.direct-linking=true"
"-XX:MaxDirectMemorySize=16g" "-XX:+UseLargePages"]
I restarted your project a few times with different versions of Neanderthal (SNAPSHOT and 0.20.4) and Clojure (1.8.0 and 1.10.0), and I always got the same result.
However, when I started the repl from the benchmarks example project (https://github.com/uncomplicate/neanderthal/blob/master/examples/benchmarks/src/benchmarks/map_reduce.clj) I always get around 200 microseconds, as reported in the blog post.
So, it is definitely related to JVM/Clojure compiler settings, and possibly the order in which Clojure loads namespaces. I don't have time now to compare your project further and see if there is another setting that you've missed. Can you try the code from the benchmarks project and report your numbers (seeing that our CPUs got 20ms vs 4ms for the initial version, I would expect that you'll get around 1ms with the benchmarks project)?
Thanks for the prompt reply. I will check under the options provided, and report back later.
I have now found the cause of the issue. In addition to the direct-linking option you pointed out, another factor is the *unchecked-math* option: it has to be set to either true or :warn-on-boxed globally. Setting the option only in the module won't work.
The fluokitten_test project is updated with the change. Now I get around 240 µs as the end result.
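For readers who want to reproduce the two settings, a minimal project.clj sketch might look like the following. The project name and version string are placeholders, the dependency versions are the ones mentioned in this thread, and Leiningen's :global-vars key is used here as one way to set *unchecked-math* globally; the actual fluokitten_test project may do it differently.

```clojure
;; project.clj (sketch; project name and version are placeholders)
(defproject fluokitten-test "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.10.0"]
                 [uncomplicate/neanderthal "0.20.4"]]
  ;; Direct linking, as suggested earlier in the thread.
  :jvm-opts ^:replace ["-Dclojure.compiler.direct-linking=true"]
  ;; Set *unchecked-math* for all code Leiningen loads for this project,
  ;; rather than with a per-file (set! *unchecked-math* ...).
  :global-vars {*unchecked-math* :warn-on-boxed})
```

With :warn-on-boxed, the compiler also prints a warning wherever arithmetic falls back to boxed math, which can help locate any remaining slow paths.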