Code Monkey home page Code Monkey logo

branchless_bisect's Introduction

Beating Bisect With Branchless Binary Search

I recently read this article: https://probablydance.com/2023/04/27/beautiful-branchless-binary-search/, and I was inspired. First I implemented the same algorithm in pure Python:

def branchless_bisect_left(arr, value):
begin, end=0,len(arr)
length = end - begin
if length == 0:
return end
step = 1 << (length.bit_length() - 1)
if step != length and arr[step]<value:
length -= step + 1
if length == 0:
return end
begin = end - step
step >>= 1
for s in (step:=step >> 1 for _ in range(step.bit_length())):
begin += s*(arr[s+begin]<value)
return begin+int(arr[begin]<value)

And then I compared it against sortedcontainers's implementation of bisect_left across a large range of array sizes:

image

Pretty handily beats it! However, most people using Python would probably be using bisect_left from the bisect library in the stdlib. To try to beat that implementation, though, I will have to descend into C-land. Here's my implementation as a Python C-extension:

static PyObject* bisect(PyObject *list_obj, PyObject *value, PyObject *compare){
long size = PyList_Size(list_obj);
long begin = 0, end = size;
long length = end - begin;
if (length == 0) {
return PyLong_FromLong(end);
}
long step = bit_floor_shifted(length);
if (step != length && PyObject_RichCompareBool(PyList_GetItem(list_obj, begin + step - 1), value, compare)) {
begin += step;
length -= step;
}
long next = 0;
for (step >>= 1; step != 0; step >>=1) {
if ((next = begin + step) < size) {
begin += PyObject_RichCompareBool(PyList_GetItem(list_obj, next), value, compare) * step;
}
}

image

That beats it as well! Admittedly this is only for arrays of size up to 2**29, but still pretty cool.

Now, you might be asking, how does it perform on non-powers-of-two? I made a graph using the following parameters:

sizes = [i for i in range(0, 2**15)]

image

I also checked whether it successfully compiled with a CMOVE instruction:

bf: e8 00 00 00 00 call c4 <_bl_bisect_left+0xa4>
c4: 83 f8 01 cmp $0x1,%eax
c7: 4c 0f 44 fb cmove %rbx,%r15
cb: 48 83 fb 02 cmp $0x2,%rbx
cf: 73 35 jae 106 <_bl_bisect_left+0xe6>

It does! You can see the full compiled dump in objdump_output.txt. Now, here are all of them combined (what you will get if you run main.py):

image

This repo contains the submodule and benchmarking code I used to obtain these results.

branchless_bisect's People

Contributors

juliusgeo avatar thatbirdguythatuknownot avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.