
numba-examples's Introduction

Numba Examples

This repository contains examples of using Numba to implement various algorithms. If you want to browse the examples and performance results, head over to the examples site.

In the repository is a benchmark runner (called numba_bench) that walks a directory tree of benchmarks, executes them, saves the results in JSON format, then generates HTML pages with pretty-printed source code and performance plots.

We are actively seeking new Numba examples! Keep reading to learn how benchmarks are run and how to make new ones.

Running Benchmarks

Assuming you have this repository checked out into the current directory, do the following to set up your environment to run the benchmarks:

conda create -n numba_bench --file conda-requirements.txt
source activate numba_bench
python setup.py install

The most common way to run the benchmarks is like this:

numba_bench -o results

or

conda install cudatoolkit # required for Numba GPU support
numba_bench -o results -r gpu

to run all the benchmarks, including the "gpu" only benchmarks. The results/ directory will contain an index.html with a list of the examples that were run, and each subdirectory will contain a results.json and results.html file containing the raw performance data and generated plot HTML, respectively.

There are additional options to numba_bench that can be useful:

  • --skip-existing: Skip any example with a results.json already present in the output directory.
  • --verify-only: Only run the verification step for each implementation in each benchmark. Very fast and good for debugging, but does not produce results.json or results.html.
  • --run-only: Only run benchmarks, don't generate HTML output.
  • --plot-only: Only generate HTML output, don't run benchmarks.
  • --root: Set the root of the tree of benchmarks. By default it is the current directory.
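
For example, to run only the verification step for every benchmark, which is a fast way to check that all implementations still produce correct output without writing any result files:

numba_bench -o results --verify-only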

In addition, substrings can be listed on the command line to limit which benchmarks are run. For example, this command:

numba_bench -o results waveform pdf

will run any benchmark under the tree whose directory name contains waveform or pdf.

Making a benchmark

A benchmark is a directory containing at least two files:

  • A bench.yaml describing the benchmark and how to run it.
  • One or more Python files that contain the benchmark functions.

A typical bench.yaml looks like this:

name: Zero Suppression
description: |
    Map all samples of a waveform below a certain absolute magnitude to zero
input_generator: impl.py:input_generator
xlabel: Number of elements
validator: impl.py:validator
implementations:
    - name: numpy
      description: Basic NumPy implementation
      function: impl.py:numpy_zero_suppression
    - name: numba_single_thread_ufunc
      description: Numba single threaded ufunc
      function: impl.py:numba_zero_suppression
    - name: numba_gpu_ufunc
      description: |
          Numba GPU ufunc.  Note this will be slower than CPU!
          There is not enough work for the GPU to do, so the fixed overhead dominates.
      function: gpu.py:numba_zero_suppression
      requires:
          - gpu
baseline: numpy

The top-level keys are:

  • name: A short name for the example. Used in plot titles, page titles, etc. Keep it short.
  • description: A Markdown description of the example. This can be multiple lines and is put at the top of the example page.
  • input_generator: Python function to call to generate input data. Format is filename:function_name.
  • validator: Python function to verify that output is correct. (You don't want to benchmark a function that gives wrong answers!)
  • implementations: A list of implementations to test. Being able to compare multiple implementations is important to see whether Numba is providing any benefit. Different implementations also have different scaling characteristics, which is helpful to compare.
  • baseline: The name of the implementation to use as the "reference" when computing speedup ratios for the other implementations.

Each implementation also defines:

  • name: Short name of implementation. Used in legends, tabs, and other places.
  • description: Longer Markdown description of implementation. Can be multi-line.
  • function: Python function with implementation. Note that multiple implementations can be in the same file, or they can be in different files.
  • requires: A list of strings indicating resources that are required to run this benchmark. If the benchmark runner is not told (with a command line option) that it has the required resources for an implementation, that implementation will be skipped.

Benchmarking Process

When benchmarking an example, the runner does the following:

  1. All of the functions are loaded into memory by calling execfile on all of the Python scripts mentioned in bench.yaml. No file is loaded more than once, even if multiple implementations refer to it.
  2. The input generator is called. Then, for each input set it yields and for each implementation that is defined:
    1. The implementation is called once, and the output (along with the input) sent to the validator function to be checked. This also triggers any JIT compilation in the implementation so it does not contribute to the time measurement.
    2. The implementation is called many times with the same input in a loop to get a more accurate time measurement, using roughly the same automatic scaling logic as %timeit in Jupyter, so each batch of calls takes between 0.2 and 2 seconds. The best of three batches is recorded (see the sketch after this list).
  3. Results are grouped by category (see description of input generator below) so that one plot will be made for each unique category and each plot will contain one series per implementation.
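
The timing loop in step 2 behaves roughly like the following sketch (hypothetical function and parameter names; the actual runner may differ in detail):

import time

def time_implementation(impl, input_args, input_kwargs, batches=3):
    # Grow the number of calls per batch until one batch takes at least
    # 0.2 seconds, similar in spirit to %timeit's autoranging.
    iterations = 1
    while True:
        start = time.perf_counter()
        for _ in range(iterations):
            impl(*input_args, **input_kwargs)
        elapsed = time.perf_counter() - start
        if elapsed >= 0.2:
            break
        iterations *= 10
    # Keep the best of three batches and report time per call.
    best = elapsed
    for _ in range(batches - 1):
        start = time.perf_counter()
        for _ in range(iterations):
            impl(*input_args, **input_kwargs)
        best = min(best, time.perf_counter() - start)
    return best / iterations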

Input Generators

An input generator is a Python generator that yields dictionaries each containing an input data set to benchmark. An example looks like:

import numpy as np

def input_generator():
    for dtype in [np.int16, np.float32, np.float64]:
        for size in [100, 1000, 10000, 50000]:
            name = np.dtype(dtype).name
            input_array = np.random.normal(loc=0.0, scale=5.0, size=size)
            # add a pulse train
            input_array += 50 * np.clip(np.cos(np.linspace(0.0, 1.0, num=size)*np.pi*10), 0, 1.0)
            input_array = input_array.astype(dtype)
            yield dict(category=('%s' % name,), x=size, input_args=(input_array, 8.0), input_kwargs={})

Each dictionary has the following keys:

  • category: A tuple of strings that can be used to create different plots for the same example. In the case above, the category indicates the data type of the array, so a separate plot will be made for int16, float32, and float64. The category could also be used to group inputs into shape categories like square arrays versus tall-and-skinny arrays (see the sketch after this list).
  • x: A float or int that denotes the input size. The meaning of this value is entirely up to the example author. It will be used as the x-axis in the performance plots. Usually number of array elements is a good choice, but it could be some other size metric.
  • input_args: A tuple of positional arguments to the implementation function
  • input_kwargs: A dictionary of keyword arguments to the implementation function
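
For instance, a hypothetical generator for a benchmark over 2-D arrays could use the category tuple to produce separate plots for square versus tall-and-skinny inputs (illustrative only; not from this repository):

import numpy as np

def input_generator():
    for dtype in [np.float32, np.float64]:
        name = np.dtype(dtype).name
        for n in [100, 1000]:
            # Square arrays and tall-and-skinny arrays get separate plots
            yield dict(category=(name, 'square'), x=n * n,
                       input_args=(np.random.rand(n, n).astype(dtype),),
                       input_kwargs={})
            yield dict(category=(name, 'tall_skinny'), x=n * 10,
                       input_args=(np.random.rand(n, 10).astype(dtype),),
                       input_kwargs={})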

Validator

A validator function takes one set of input args and kwargs yielded by the input generator, and the output from the execution of one of the implementations, and determines if that output is correct. An example looks like:

def validator(input_args, input_kwargs, impl_output):
    # We're using the Numpy implementation as the reference
    expected = numpy_zero_suppression(*input_args, **input_kwargs)
    np.testing.assert_array_equal(expected, impl_output)

As the comment notes, we are treating the NumPy implementation as the reference, but validation can be done any way that makes sense. If the output is incorrect, an AssertionError should be raised.
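
If an implementation's floating-point results can legitimately differ from the reference by small rounding errors (for example, a GPU implementation that accumulates in a different order), a tolerance-based check is usually more appropriate. A minimal sketch, assuming the same NumPy reference function:

def validator(input_args, input_kwargs, impl_output):
    expected = numpy_zero_suppression(*input_args, **input_kwargs)
    # Allow small floating-point differences instead of requiring exact equality
    np.testing.assert_allclose(expected, impl_output, rtol=1e-6, atol=0.0)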

Marking Implementation Source Code

The output HTML from running the benchmark includes the source code of the implementation. Code formatting is done with pygments, and the terms NOTE and SPEEDTIP in the comments are highlighted to stand out.

Since the implementation might depend on imports and helper functions, by default the benchmark runner will snapshot the entire Python file containing the main implementation function for the HTML output.

For short benchmarks, it might be more convenient to put more than one implementation into the same file. In that case, special comments can be used to tell the runner what section of code to capture. For example, in this file:

import numpy as np

def input_generator():
    for dtype in [np.int16, np.float32, np.float64]:
        for size in [100, 1000, 10000, 50000]:
            name = np.dtype(dtype).name
            input_array = np.random.normal(loc=0.0, scale=5.0, size=size)
            # add a pulse train
            input_array += 50 * np.clip(np.cos(np.linspace(0.0, 1.0, num=size)*np.pi*10), 0, 1.0)
            input_array = input_array.astype(dtype)
            yield dict(category=('%s' % name,), x=size, input_args=(input_array, 8.0), input_kwargs={})

#### BEGIN: numpy
import numpy as np

def numpy_zero_suppression(values, threshold):
    result = np.zeros_like(values)
    selector = np.abs(values) >= threshold
    result[selector] = values[selector]
    return result
#### END: numpy

The implementation named numpy will only show the code between #### BEGIN: numpy and #### END: numpy in the HTML rendering of the example.
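
A Numba implementation living in the same file would be marked the same way. The following is only a hypothetical sketch of what the numba_single_thread_ufunc implementation might look like (it is not the repository's actual code), using numba.vectorize to build a single-threaded ufunc:

#### BEGIN: numba_single_thread_ufunc
from numba import vectorize

@vectorize(['int16(int16, float64)',
            'float32(float32, float64)',
            'float64(float64, float64)'])
def numba_zero_suppression(value, threshold):
    # Elementwise: keep samples at or above the threshold magnitude, zero the rest
    if abs(value) >= threshold:
        return value
    return 0
#### END: numba_single_thread_ufunc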

numba-examples's People

Contributors

esc, gmarkall, grlee77, reazulhoque, seibert, sklam, stuartarchibald


numba-examples's Issues

Error executing Benchmark

root@547a227b1517:~/numba-examples# numba_bench -o results -r gpu
Scanning /root/numba-examples for benchmarks
Writing results to /root/numba-examples/results
/usr/local/lib/python3.6/dist-packages/numba_bench-0.1-py3.6.egg/numba_bench/benchmark.py:54: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  config = yaml.load(f)

  Running Histogram [/root/numba-examples/examples/density_estimation/histogram]
    numpy: bins10, float32 - 1000 => 3 reps, 1000 iter per rep, 169.493665 usec per call
    numba: bins10, float32 - 1000 => 3 reps, 10000 iter per rep, 15.088746 usec per call
    numba_gpu: bins10, float32 - 1000 => 3 reps, 100 iter per rep, 1593.281750 usec per call
    numpy: bins10, float32 - 10000 => 3 reps, 1000 iter per rep, 265.618970 usec per call
    numba: bins10, float32 - 10000 => 3 reps, 1000 iter per rep, 141.332854 usec per call
    numba_gpu: bins10, float32 - 10000 => 3 reps, 100 iter per rep, 1603.700330 usec per call
    numpy: bins10, float32 - 100000 => 3 reps, 100 iter per rep, 1179.389010 usec per call
    numba: bins10, float32 - 100000 => 3 reps, 100 iter per rep, 1402.002540 usec per call
    numba_gpu: bins10, float32 - 100000 => 3 reps, 100 iter per rep, 1681.287160 usec per call
    numpy: bins10, float32 - 300000 => 3 reps, 100 iter per rep, 3271.321700 usec per call
    numba: bins10, float32 - 300000 => 3 reps, 100 iter per rep, 4213.909610 usec per call
    numba_gpu: bins10, float32 - 300000 => 3 reps, 100 iter per rep, 1949.394360 usec per call
    numpy: bins10, float32 - 3000000 => 3 reps, 10 iter per rep, 32000.128600 usec per call
    numba: bins10, float32 - 3000000 => 3 reps, 10 iter per rep, 42029.814400 usec per call
    numba_gpu: bins10, float32 - 3000000 => 3 reps, 100 iter per rep, 4975.962050 usec per call
    numpy: bins10, float64 - 1000 => 3 reps, 1000 iter per rep, 160.448296 usec per call
    numba: bins10, float64 - 1000 => 3 reps, 10000 iter per rep, 14.967072 usec per call
    numba_gpu: bins10, float64 - 1000 => 3 reps, 100 iter per rep, 1591.028610 usec per call
    numpy: bins10, float64 - 10000 => 3 reps, 1000 iter per rep, 273.850549 usec per call
    numba: bins10, float64 - 10000 => 3 reps, 1000 iter per rep, 137.559821 usec per call
    numba_gpu: bins10, float64 - 10000 => 3 reps, 100 iter per rep, 1585.167370 usec per call
    numpy: bins10, float64 - 100000 => 3 reps, 100 iter per rep, 1402.316260 usec per call
    numba: bins10, float64 - 100000 => 3 reps, 100 iter per rep, 1365.159980 usec per call
    numba_gpu: bins10, float64 - 100000 => 3 reps, 100 iter per rep, 1778.616570 usec per call
    numpy: bins10, float64 - 300000 => 3 reps, 100 iter per rep, 4086.320090 usec per call
    numba: bins10, float64 - 300000 => 3 reps, 100 iter per rep, 4103.344970 usec per call
    numba_gpu: bins10, float64 - 300000 => 3 reps, 100 iter per rep, 2086.206040 usec per call
    numpy: bins10, float64 - 3000000 => 3 reps, 10 iter per rep, 37877.584500 usec per call
    numba: bins10, float64 - 3000000 => 3 reps, 10 iter per rep, 40958.335600 usec per call
    numba_gpu: bins10, float64 - 3000000 => 3 reps, 100 iter per rep, 6885.126960 usec per call
    numpy: bins1000, float32 - 1000 => 3 reps, 1000 iter per rep, 180.907137 usec per call
    numba: bins1000, float32 - 1000 => 3 reps, 10000 iter per rep, 16.114160 usec per call
    numba_gpu: bins1000, float32 - 1000 => 3 reps, 100 iter per rep, 1586.353680 usec per call
    numpy: bins1000, float32 - 10000 => 3 reps, 1000 iter per rep, 275.862535 usec per call
    numba: bins1000, float32 - 10000 => 3 reps, 1000 iter per rep, 142.604908 usec per call
    numba_gpu: bins1000, float32 - 10000 => 3 reps, 100 iter per rep, 1589.610960 usec per call
    numpy: bins1000, float32 - 100000 => 3 reps, 100 iter per rep, 1223.783610 usec per call
    numba: bins1000, float32 - 100000 => 3 reps, 100 iter per rep, 1404.859190 usec per call
    numba_gpu: bins1000, float32 - 100000 => 3 reps, 100 iter per rep, 1684.227960 usec per call
    numpy: bins1000, float32 - 300000 => 3 reps, 100 iter per rep, 3347.087800 usec per call
    numba: bins1000, float32 - 300000Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/numba_bench-0.1-py3.6.egg/numba_bench/benchmark.py", line 184, in _run_and_validate_results
    self.validator(input_args, input_kwargs, actual_results)
  File "/root/numba-examples/examples/density_estimation/histogram/impl.py", line 73, in validator
    np.testing.assert_array_equal(expected_hist, actual_hist)
  File "/usr/local/lib/python3.6/dist-packages/numpy/testing/_private/utils.py", line 918, in assert_array_equal
    verbose=verbose, header='Arrays are not equal')
  File "/usr/local/lib/python3.6/dist-packages/numpy/testing/_private/utils.py", line 841, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Arrays are not equal

Mismatch: 0.4%
Max absolute difference: 1
Max relative difference: 0.00301205
 x: array([   1,    0,    0,    0,    0,    0,    0,    0,    0,    1,    0,
          0,    0,    1,    0,    0,    0,    0,    0,    0,    1,    0,
          1,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,...
 y: array([   1,    0,    0,    0,    0,    0,    0,    0,    0,    1,    0,
          0,    0,    1,    0,    0,    0,    0,    0,    0,    1,    0,
          1,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,...

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/numba_bench", line 4, in <module>
    __import__('pkg_resources').run_script('numba-bench==0.1', 'numba_bench')
  File "/usr/local/lib/python3.6/dist-packages/pkg_resources/__init__.py", line 666, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/local/lib/python3.6/dist-packages/pkg_resources/__init__.py", line 1462, in run_script
    exec(code, namespace, namespace)
  File "/usr/local/lib/python3.6/dist-packages/numba_bench-0.1-py3.6.egg/EGG-INFO/scripts/numba_bench", line 7, in <module>
    sys.exit(main(sys.argv))
  File "/usr/local/lib/python3.6/dist-packages/numba_bench-0.1-py3.6.egg/numba_bench/main.py", line 62, in main
    verify_only=args.verify_only)
  File "/usr/local/lib/python3.6/dist-packages/numba_bench-0.1-py3.6.egg/numba_bench/benchmark.py", line 290, in discover_and_run_benchmarks
    results = benchmark.run_benchmark(verify_only=verify_only)
  File "/usr/local/lib/python3.6/dist-packages/numba_bench-0.1-py3.6.egg/numba_bench/benchmark.py", line 229, in run_benchmark
    self._run_and_validate_results(input_dict, impl_dict)
  File "/usr/local/lib/python3.6/dist-packages/numba_bench-0.1-py3.6.egg/numba_bench/benchmark.py", line 186, in _run_and_validate_results
    self._raise_benchmark_error('Implementation %s failed validation on input %s' % (impl_dict['name'], input_dict['x']))
  File "/usr/local/lib/python3.6/dist-packages/numba_bench-0.1-py3.6.egg/numba_bench/benchmark.py", line 59, in _raise_benchmark_error
    raise BenchmarkError(self.benchmark_dir, message)
numba_bench.benchmark.BenchmarkError: [/root/numba-examples/examples/density_estimation/histogram]: Implementation numba failed validation on input 300000

Running on:

root@547a227b1517:~/numba-examples# numba -s
System info:
--------------------------------------------------------------------------------
__Time Stamp__
2020-01-20 18:47:17.782535

__Hardware Information__
Machine                                       : x86_64
CPU Name                                      : ivybridge
Number of accessible CPU cores                : 4
Listed accessible CPUs cores                  : 0-3
CFS restrictions                              : None
CPU Features                                  : 
64bit aes avx cmov cx16 f16c fsgsbase mmx pclmul popcnt rdrnd sahf sse sse2 sse3
sse4.1 sse4.2 ssse3 xsave xsaveopt

__OS Information__
Platform                                      : Linux-5.0.0-38-generic-x86_64-with-Ubuntu-18.04-bionic
Release                                       : 5.0.0-38-generic
System Name                                   : Linux
Version                                       : #41-Ubuntu SMP Tue Dec 3 00:27:35 UTC 2019
OS specific info                              : Ubuntu18.04bionic
glibc info                                    : glibc 2.25

__Python Information__
Python Compiler                               : GCC 8.3.0
Python Implementation                         : CPython
Python Version                                : 3.6.8
Python Locale                                 : en_US UTF-8

__LLVM information__
LLVM version                                  : 8.0.0

__CUDA Information__
Found 1 CUDA devices
id 0      b'GeForce GTX 760'                              [SUPPORTED]
                      compute capability: 3.0
                           pci device id: 0
                              pci bus id: 1
Summary:
        1/1 devices are supported
CUDA driver version                           : 10010
CUDA libraries:
Finding cublas from System
        named  libcublas.so.10.0.130
        trying to open library...       ok
Finding cusparse from System
        named  libcusparse.so.10.0.130
        trying to open library...       ok
Finding cufft from System
        named  libcufft.so.10.0.145
        trying to open library...       ok
Finding curand from System
        named  libcurand.so.10.0.130
        trying to open library...       ok
Finding nvvm from System
        named  libnvvm.so.3.3.0
        trying to open library...       ok
Finding libdevice from System
        searching for compute_20...     ok
        searching for compute_30...     ok
        searching for compute_35...     ok
        searching for compute_50...     ok

__ROC Information__
ROC available                                 : False
Error initialising ROC due to                 : No ROC toolchains found.
No HSA Agents found, encountered exception when searching:
Error at driver init: 
NUMBA_HSA_DRIVER /opt/rocm/lib/libhsa-runtime64.so is not a valid file path.  Note it must be a filepath of the .so/.dll/.dylib or the driver:

__SVML Information__
SVML state, config.USING_SVML                 : False
SVML library found and loaded                 : False
llvmlite using SVML patched LLVM              : True
SVML operational                              : False

__Threading Layer Information__
TBB Threading layer available                 : True
OpenMP Threading layer available              : False
+--> Disabled due to                          : Unknown import problem.
Workqueue Threading layer available           : True

__Numba Environment Variable Information__
None set.

__Conda Information__
Conda not present/not working.
Error was [Errno 2] No such file or directory: 'conda': 'conda'

--------------------------------------------------------------------------------
If requested, please copy and paste the information between
the dashed (----) lines, or from a given specific section as
appropriate.

=============================================================
IMPORTANT: Please ensure that you are happy with sharing the
contents of the information present, any information that you
wish to keep private you should remove before sharing.
=============================================================

ResolutionError

Hello. I followed the instructions and got this error:
pkg_resources.ResolutionError: Script 'scripts/numba_bench' not found in metadata

Has anyone else encountered this error?

Add implementations of KDTree / BallTree

Hello,

thanks for exposing and maintaining this library of Numba examples.

I have a full-Numba implementation of both KDTrees and BallTrees, and wondered whether this would be valuable to add in this repo.
My code is here, and I am willing to enrich it (add a few extra necessary methods) in order to match the minimal functionalities required by most use cases / or match the implementations standards that may reign in this repo.

Let me know,

setup.py looking for setuptools module

I followed the directions in README.md but it fails at the step "python setup.py install" with error:

python setup.py install
Traceback (most recent call last):
File "/home/ericmedlock/Documents/python-work/numba/numba-examples/setup.py", line 1, in
from setuptools import setup
ModuleNotFoundError: No module named 'setuptools'
