pygram11

Simple and fast histogramming in Python accelerated with OpenMP with help from pybind11.

pygram11 provides functions for very fast histogram calculations (and the variance in each bin) in one and two dimensions. The API is very simple; documentation can be found here (you'll also find some benchmarks there).

Installing

From PyPI

Binary wheels are provided for Linux, macOS, and Windows. They can be installed from PyPI via pip:

pip install pygram11

From conda-forge

For installation via the conda package manager, pygram11 is available from conda-forge:

conda install pygram11 -c conda-forge

From Source

All you need is a C++14 compiler and OpenMP. If you are using a relatively modern GCC release on Linux, you probably don't have to worry about the OpenMP dependency. On macOS, you can install libomp from Homebrew (pygram11 compiles on Apple Silicon devices with Python >= 3.9 and libomp installed from Homebrew). With those dependencies met, run:

git clone https://github.com/douglasdavis/pygram11.git --recurse-submodules
cd pygram11
pip install .

Or let pip handle the cloning procedure:

pip install git+https://github.com/douglasdavis/pygram11.git@main

Tests are run on Python versions >= 3.8 and binary wheels are provided for those versions.

In Action

A histogram (with fixed bin width) of weighted data in one dimension:

>>> import numpy as np
>>> import pygram11
>>> rng = np.random.default_rng(123)
>>> x = rng.standard_normal(10000)
>>> w = rng.uniform(0.8, 1.2, x.shape[0])
>>> h, err = pygram11.histogram(x, bins=40, range=(-4, 4), weights=w)
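To make explicit what the weighted call returns, here is a plain-NumPy reference sketch. It assumes that `err` holds the square root of the sum of squared weights in each bin (the usual statistical uncertainty of a weighted bin count); `weighted_hist_reference` is a hypothetical helper, not part of pygram11. Values falling exactly on a bin edge may be assigned differently than in pygram11.

```python
import numpy as np


def weighted_hist_reference(x, bins, range_, weights):
    """Return per-bin sum of weights and sqrt(sum of squared weights)."""
    counts, _ = np.histogram(x, bins=bins, range=range_, weights=weights)
    sumw2, _ = np.histogram(x, bins=bins, range=range_, weights=weights**2)
    return counts, np.sqrt(sumw2)


# Same inputs as the pygram11 example above.
rng = np.random.default_rng(123)
x = rng.standard_normal(10000)
w = rng.uniform(0.8, 1.2, x.shape[0])
h_ref, err_ref = weighted_hist_reference(x, 40, (-4, 4), w)
```

If the assumption about `err` is right, `np.allclose(h, h_ref)` and `np.allclose(err, err_ref)` should both hold for this data.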

A histogram with fixed bin width that stores the underflow and overflow in the first and last bins:

>>> x = rng.standard_normal(1000000)
>>> h, __ = pygram11.histogram(x, bins=20, range=(-3, 3), flow=True)

where we've used __ to catch the None returned when weights are absent.

A histogram in two dimensions with variable width bins:

>>> x = rng.standard_normal(1000)
>>> y = rng.standard_normal(1000)
>>> xbins = [-2.0, -1.0, -0.5, 1.5, 2.0, 3.1]
>>> ybins = [-3.0, -1.5, -0.1, 0.8, 2.0, 2.8]
>>> h, err = pygram11.histogram2d(x, y, bins=[xbins, ybins])
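For comparison, NumPy's `np.histogram2d` accepts the same list-of-edges form for variable width bins. The sketch below reproduces the unweighted two-dimensional example with NumPy; because the bins have different areas, it also divides by the bin area to get counts per unit area, which is often what you want to plot. Entries outside the edges are simply dropped here, and values landing exactly on an edge may be assigned differently than in pygram11.

```python
import numpy as np

rng = np.random.default_rng(123)
x = rng.standard_normal(1000)
y = rng.standard_normal(1000)
xbins = [-2.0, -1.0, -0.5, 1.5, 2.0, 3.1]
ybins = [-3.0, -1.5, -0.1, 0.8, 2.0, 2.8]

# Unweighted 2D histogram with variable width bins (no under/overflow).
h_ref, xedges, yedges = np.histogram2d(x, y, bins=[xbins, ybins])

# Each bin has its own area, so normalize to counts per unit area.
areas = np.outer(np.diff(xedges), np.diff(yedges))
density = h_ref / areas
```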

Manually controlling OpenMP acceleration with context managers:

>>> with pygram11.omp_disabled():  # disable all thresholds.
...     result, _ = pygram11.histogram(x, bins=10, range=(-3, 3))
...
>>> with pygram11.omp_forced(key="thresholds.var1d"):  # force a single threshold.
...     result, _ = pygram11.histogram(x, bins=[-3, -2, 0, 2, 3])
...
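Judging by the key names, these context managers adjust per-function thresholds that decide when the OpenMP code path is used. The sketch below shows one way such a mechanism could work with `contextlib.contextmanager`; the `CONFIG` dictionary, its values, and the helper names are hypothetical and do not reflect pygram11's actual internals.

```python
import sys
from contextlib import contextmanager

# Hypothetical config: minimum input size at which a histogram function
# would switch to its parallel (OpenMP) code path. Values are made up.
CONFIG = {"thresholds.fix1d": 10_000, "thresholds.var1d": 5_000}


def use_parallel(n, key):
    """Return True if an input of length n would take the parallel path."""
    return n >= CONFIG[key]


@contextmanager
def thresholds_overridden(value, key=None):
    """Temporarily set one threshold (or all of them), restoring on exit."""
    keys = [key] if key is not None else list(CONFIG)
    saved = {k: CONFIG[k] for k in keys}
    try:
        for k in keys:
            CONFIG[k] = value
        yield
    finally:
        CONFIG.update(saved)  # restore even if the body raised


def omp_disabled_sketch(key=None):
    # An unreachable threshold means the parallel path is never taken.
    return thresholds_overridden(sys.maxsize, key)


def omp_forced_sketch(key=None):
    # A zero threshold means the parallel path is always taken.
    return thresholds_overridden(0, key)
```

Restoring the saved values in a `finally` clause keeps the global setting correct even if the call inside the `with` block raises.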

Histogramming multiple weight variations for the same data, then putting the result in a DataFrame (the input pandas DataFrame will be interpreted as a NumPy array):

>>> import pandas as pd
>>> N = 10000
>>> weights = pd.DataFrame({"weight_a": np.abs(rng.standard_normal(N)),
...                         "weight_b": rng.uniform(0.5, 0.8, N),
...                         "weight_c": rng.uniform(0.0, 1.0, N)})
>>> data = rng.standard_normal(N)
>>> count, err = pygram11.histogram(data, bins=20, range=(-3, 3), weights=weights, flow=True)
>>> count_df = pd.DataFrame(count, columns=weights.columns)
>>> err_df = pd.DataFrame(err, columns=weights.columns)
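The multiple-weight call can be cross-checked with a column-by-column NumPy loop. This sketch assumes the result has one column per weight variation, as the DataFrame construction above implies, and that `err` is the square root of the per-bin sum of squared weights. It mimics `flow=True` by clipping the data to the histogram range, so underflow lands in the first bin and overflow in the last (NumPy's last bin includes its right edge). `multiweight_hist_reference` is a hypothetical helper; a pandas DataFrame of weights can be passed directly or via `weights.to_numpy()`.

```python
import numpy as np


def multiweight_hist_reference(x, bins, range_, weights, flow=False):
    """Histogram one data array once per column of a 2D weight array.

    Returns (counts, err), both shaped (bins, n_weight_columns), where err
    is sqrt(sum of w**2) in each bin. With flow=True, entries outside the
    range are folded into the first/last bin by clipping x to the range.
    """
    x = np.asarray(x, dtype=float)
    weights = np.asarray(weights, dtype=float)
    lo, hi = range_
    if flow:
        x = np.clip(x, lo, hi)
    counts = np.empty((bins, weights.shape[1]))
    sumw2 = np.empty_like(counts)
    for j in range(weights.shape[1]):
        wj = weights[:, j]
        counts[:, j], _ = np.histogram(x, bins=bins, range=range_, weights=wj)
        sumw2[:, j], _ = np.histogram(x, bins=bins, range=range_, weights=wj**2)
    return counts, np.sqrt(sumw2)


# Same three weight variations as above, as a plain (N, 3) array.
rng = np.random.default_rng(123)
N = 10000
weights = np.column_stack([
    np.abs(rng.standard_normal(N)),
    rng.uniform(0.5, 0.8, N),
    rng.uniform(0.0, 1.0, N),
])
data = rng.standard_normal(N)
count, err = multiweight_hist_reference(data, 20, (-3, 3), weights, flow=True)
```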

I also wrote a blog post with some simple examples.

Other Libraries

  • boost-histogram provides Pythonic object oriented histograms.
  • fast-histogram provides simple and fast histogramming in Python using the NumPy C API (no variance or overflow support).
  • To calculate histograms in Python on a GPU, see cupy.histogram.

If there is something you'd like to see in pygram11, please open an issue or pull request.
