permcode's Introduction

Various implementations of the permutation-encoding function
described in this Stack Overflow question:

    http://stackoverflow.com/q/39623081/417501
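
This README does not spell out the encoding itself.  As a rough
orientation only, the sketch below assumes the task is to rank a
permutation via its Lehmer code; that assumption is not confirmed
here, and perm_rank is a hypothetical name rather than a function
from this repository.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Rank a permutation of 0..n-1 via its Lehmer code (assumed technique,
 * not taken from the repository).  For each position i, count the later
 * elements that are smaller than perm[i] and use that count as a digit
 * in the factorial number system.  Runs in O(n^2) time; n must be at
 * most 20 so that the result fits into 64 bits.
 */
uint64_t perm_rank(const unsigned char *perm, size_t n)
{
    uint64_t rank = 0;

    assert(n <= 20);

    for (size_t i = 0; i < n; i++) {
        uint64_t smaller = 0;

        for (size_t j = i + 1; j < n; j++)
            if (perm[j] < perm[i])
                smaller++;

        /* Horner's scheme: the digit for position i ends up with weight (n-1-i)! */
        rank = rank * (n - i) + smaller;
    }

    return rank;
}
```

With this scheme the identity permutation {0, 1, 2} ranks as 0 and
the reversed permutation {2, 1, 0} ranks as 5, the largest value for
n = 3.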

Use the mk(1) utility to run the benchmarks. Set CC to your
C compiler and CFLAGS to the flags you want to test:

    mk CC=... CFLAGS=...

You might want to exclude bmi2 if your CPU is an older one without
BMI2 support.  You might want to exclude vector on 32-bit targets.
You should exclude treeasm and bmi2 on non-x86 targets.  Here are the
results for some machines:

    AMD Turion(tm) II Neo N54L Dual-Core Processor (gcc 4.7.2)
    baseline    0.1000s  0.1000s  0.2000s  10.0000ns  10.0000ns  20.0000ns
    count       3.5800s  3.2300s  6.8100s 358.0000ns 323.0000ns 681.0000ns
    bitcount    3.6100s  0.2400s  3.8500s 361.0000ns  24.0000ns 385.0000ns
    decrement   6.9100s  3.7900s 10.7000s 691.0000ns 379.0000ns 1070.0000ns
    bin4        4.6200s  3.6300s  8.2500s 462.0000ns 363.0000ns 825.0000ns
    bin5        4.2600s  3.6700s  7.9300s 426.0000ns 367.0000ns 793.0000ns
    bin8        4.8500s  3.6700s  8.5200s 485.0000ns 367.0000ns 852.0000ns
    vector      0.9500s  0.7500s  1.7000s  95.0000ns  75.0000ns 170.0000ns
    shuffle     0.4700s  0.7700s  1.2400s  47.0000ns  77.0000ns 124.0000ns
    tree        3.4700s  2.8200s  6.2900s 347.0000ns 282.0000ns 629.0000ns
    treeasm     2.1700s  1.4200s  3.5900s 217.0000ns 142.0000ns 359.0000ns
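
The columns are not labelled.  Judging from the numbers alone, in the
AMD Turion table above and in the other result tables, the third
column is the sum of the first two, and the three ns columns are the
three second columns divided by what appears to be 10^7 iterations.
The iteration count is inferred from the ratio, not stated anywhere
in this README, and per_op_ns is a hypothetical helper that only
illustrates the conversion:

```c
#include <assert.h>

/*
 * Convert a total run time in seconds into nanoseconds per iteration.
 * The iteration count is passed in explicitly because 10^7 is only
 * inferred from the tables.
 */
double per_op_ns(double total_seconds, double iterations)
{
    assert(iterations > 0);

    return total_seconds / iterations * 1e9;
}
```

For instance, per_op_ns(3.58, 1e7) gives about 358, which matches the
first ns column of the count row for the AMD Turion.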


    Intel(R) Core(TM) i7-4910MQ CPU @ 2.90GHz (clang 3.8.1)
    baseline    0.0391s  0.0391s  0.0781s   3.9062ns   3.9062ns   7.8125ns
    count       1.3750s  1.4297s  2.8047s 137.5000ns 142.9688ns 280.4688ns
    bitcount    1.5547s  0.1172s  1.6719s 155.4688ns  11.7188ns 167.1875ns
    decrement   2.3281s  1.4062s  3.7344s 232.8125ns 140.6250ns 373.4375ns
    bin4        2.2422s  1.5547s  3.7969s 224.2188ns 155.4688ns 379.6875ns
    bin5        2.0547s  1.6562s  3.7109s 205.4688ns 165.6250ns 371.0938ns
    bin8        2.5859s  1.5625s  4.1484s 258.5938ns 156.2500ns 414.8438ns
    vector      0.6328s  0.4297s  1.0625s  63.2812ns  42.9688ns 106.2500ns
    shuffle     0.1328s  0.3438s  0.4766s  13.2812ns  34.3750ns  47.6562ns
    tree        1.9766s  1.6641s  3.6406s 197.6562ns 166.4062ns 364.0625ns
    treeasm     1.1406s  0.5938s  1.7344s 114.0625ns  59.3750ns 173.4375ns
    bmi2        0.2344s  0.1250s  0.3594s  23.4375ns  12.5000ns  35.9375ns


    Intel(R) Core(TM) i7-4910MQ CPU @ 2.90GHz (gcc 6.2.0)
    baseline    0.0391s  0.0312s  0.0703s   3.9062ns   3.1250ns   7.0312ns
    count       1.5312s  1.4453s  2.9766s 153.1250ns 144.5312ns 297.6562ns
    bitcount    1.5078s  0.0703s  1.5781s 150.7812ns   7.0312ns 157.8125ns
    decrement   2.1875s  1.7969s  3.9844s 218.7500ns 179.6875ns 398.4375ns
    bin4        2.1562s  1.7734s  3.9297s 215.6250ns 177.3438ns 392.9688ns
    bin5        2.0703s  1.8281s  3.8984s 207.0312ns 182.8125ns 389.8438ns
    bin8        2.0547s  1.8672s  3.9219s 205.4688ns 186.7188ns 392.1875ns
    vector      0.3594s  0.2891s  0.6484s  35.9375ns  28.9062ns  64.8438ns
    shuffle     0.1953s  0.3672s  0.5625s  19.5312ns  36.7188ns  56.2500ns
    tree        2.0781s  1.7734s  3.8516s 207.8125ns 177.3438ns 385.1562ns
    treeasm     1.4297s  0.7422s  2.1719s 142.9688ns  74.2188ns 217.1875ns
    bmi2        0.0938s  0.0703s  0.1641s   9.3750ns   7.0312ns  16.4062ns

    RPi 1b BCM2708 (gcc 4.6.3)
    baseline    0.8500s  0.8400s  1.6900s  85.0000ns  84.0000ns 169.0000ns
    count      14.4900s  9.8000s 24.2900s 1449.0000ns 980.0000ns 2429.0000ns
    bitcount   15.2800s  8.6400s 23.9200s 1528.0000ns 864.0000ns 2392.0000ns
    decrement  25.3700s 16.9600s 42.3300s 2537.0000ns 1696.0000ns 4233.0000ns
    bin4       23.3600s 17.2500s 40.6100s 2336.0000ns 1725.0000ns 4061.0000ns
    bin5       22.1400s 17.1300s 39.2700s 2214.0000ns 1713.0000ns 3927.0000ns
    bin8       22.7800s 16.6400s 39.4200s 2278.0000ns 1664.0000ns 3942.0000ns
    shuffle     2.7500s  3.4000s  6.1500s 275.0000ns 340.0000ns 615.0000ns
    tree        8.9500s  9.5300s 18.4800s 895.0000ns 953.0000ns 1848.0000ns

    RPi 1b BCM2708 (clang 3.0-6.2)
    baseline    1.0500s  1.0500s  2.1000s 105.0000ns 105.0000ns 210.0000ns
    count      14.0400s  9.1500s 23.1900s 1404.0000ns 915.0000ns 2319.0000ns
    bitcount   12.4900s  4.8100s 17.3000s 1249.0000ns 481.0000ns 1730.0000ns
    decrement  28.5200s 18.6600s 47.1800s 2852.0000ns 1866.0000ns 4718.0000ns
    bin4       17.3200s 10.7600s 28.0800s 1732.0000ns 1076.0000ns 2808.0000ns
    bin5       16.6900s 12.9600s 29.6500s 1669.0000ns 1296.0000ns 2965.0000ns
    bin8       17.5400s 10.8500s 28.3900s 1754.0000ns 1085.0000ns 2839.0000ns
    shuffle     4.4500s  4.7800s  9.2300s 445.0000ns 478.0000ns 923.0000ns
    tree       10.0500s  9.0300s 19.0800s 1005.0000ns 903.0000ns 1908.0000ns

    RPi 3 BCM2709 (clang 3.8.0)
    baseline    0.5156s  0.5156s  1.0312s  51.5625ns  51.5625ns 103.1250ns
    count      19.6797s 12.4297s 32.1094s 1967.9688ns 1242.9688ns 3210.9375ns
    bitcount   18.1172s  1.8281s 19.9453s 1811.7188ns 182.8125ns 1994.5313ns
    decrement  31.7578s 13.5000s 45.2578s 3175.7812ns 1350.0000ns 4525.7812ns
    bin4       23.0391s 14.2031s 37.2422s 2303.9062ns 1420.3125ns 3724.2188ns
    bin5       22.2500s 15.6641s 37.9141s 2225.0000ns 1566.4062ns 3791.4062ns
    bin8       25.9453s 14.6953s 40.6406s 2594.5312ns 1469.5312ns 4064.0625ns
    vector      5.1719s  4.0625s  9.2344s 517.1875ns 406.2500ns 923.4375ns
    shuffle     2.1953s  2.2656s  4.4609s 219.5312ns 226.5625ns 446.0938ns
    tree       13.8516s 13.4453s 27.2969s 1385.1562ns 1344.5312ns 2729.6875ns
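
Whether bmi2 needs to be excluded on a given x86 machine can be
checked before running the benchmarks.  The sketch below is not part
of this repository: cpu_has_bmi2 is a hypothetical helper, and it
assumes a reasonably recent gcc or clang that provides the
__builtin_cpu_supports builtin with the "bmi2" feature name.  On
other compilers or non-x86 targets it simply reports 0.

```c
/*
 * Report whether the running CPU supports the BMI2 extension.
 * Returns 1 if it does, 0 if it does not or if the check is
 * unavailable (non-x86 target or a compiler without the builtin).
 */
int cpu_has_bmi2(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    /* Initialise the CPU feature data before querying it. */
    __builtin_cpu_init();
    return __builtin_cpu_supports("bmi2") ? 1 : 0;
#else
    return 0;
#endif
}
```

If the function returns 0, leave bmi2 out of the run.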
