cldrive - Run arbitrary OpenCL kernels

cldrive is a tool for running arbitrary OpenCL kernels to record their runtimes and outputs. It reads OpenCL kernels from input files and, for each kernel, generates random inputs of a given size, runs the kernel, and records its execution time and outputs. It was developed as part of my work on Deep Learning benchmark synthesis, and has been used in the following publications:

  1. Cummins, C., Petoumenos, P., Wang, Z., & Leather, H. (2017). Synthesizing Benchmarks for Predictive Modeling. CGO. IEEE.
  2. Cummins, C., Petoumenos, P., Wang, Z., & Leather, H. (2017). End-to-end Deep Learning of Optimization Heuristics. PACT. IEEE.
  3. Ben-Nun, T., Jakobovits, A. S., & Hoefler, T. (2018). Neural Code Comprehension: A Learnable Representation of Code Semantics. NeurIPS.
  4. Cummins, C., Petoumenos, P., Murray, A., & Leather, H. (2018). Compiler Fuzzing through Deep Learning. ISSTA.
  5. Goens, A., Brauckmann, A., Ertel, S., Cummins, C., Leather, H., & Castrillon, J. (2019). A Case Study on Machine Learning for Synthesizing Benchmarks. MAPL.
  6. Cummins, C. (2020). Deep Learning for Compilers. University of Edinburgh.

Build

See INSTALL.md for instructions on setting up the build environment.

Then build cldrive using:

$ bazel build -c opt //gpu/cldrive --incompatible_restrict_string_escapes=false

This will build an optimized cldrive binary and print its path.

Usage

$ cldrive --srcs=<opencl_sources> --envs=<opencl_devices>

Where <opencl_sources> is a comma-separated list of absolute paths to OpenCL source files, and <opencl_devices> is a comma-separated list of fully-qualified OpenCL device names. To list the available device names, use --clinfo. Use --help to see the full list of options.

Example

For example, given a file:

$ cat kernel.cl
kernel void my_kernel(global int* a, global int* b) {
    int tid = get_global_id(0);
    a[tid] += 1;
    b[tid] = a[tid] * 2;
}

and available OpenCL devices:

$ cldrive --clinfo
GPU|NVIDIA|GeForce_GTX_1080|396.37|1.2
CPU|Intel|Intel_Xeon_CPU_E5-2620_v4_@_2.10GHz|1.2.0.25|2.0

To run the kernel five times on each device, using 4096 work items divided into work groups of 1024:

$ bazel-bin/gpu/cldrive/cldrive --srcs=kernel.cl --num_runs=5 --gsize=4096 --lsize=1024 --envs='GPU|NVIDIA|GeForce_GTX_1080|396.37|1.2','CPU|Intel|Intel_Xeon_CPU_E5-2620_v4_@_2.10GHz|1.2.0.25|2.0'

OpenCL Device, Kernel Name, Global Size, Local Size, Transferred Bytes, Runtime (ns)
I 2019-02-26 09:54:10 [gpu/cldrive/libcldrive.cc:59] clBuildProgram() with options '-cl-kernel-arg-info' completed in 1851 ms
GPU|NVIDIA|GeForce_GTX_1080|396.37|1.2, my_kernel, 4096, 1024, 65536, 113344
GPU|NVIDIA|GeForce_GTX_1080|396.37|1.2, my_kernel, 4096, 1024, 65536, 57984
GPU|NVIDIA|GeForce_GTX_1080|396.37|1.2, my_kernel, 4096, 1024, 65536, 64096
GPU|NVIDIA|GeForce_GTX_1080|396.37|1.2, my_kernel, 4096, 1024, 65536, 73696
GPU|NVIDIA|GeForce_GTX_1080|396.37|1.2, my_kernel, 4096, 1024, 65536, 73632
I 2019-02-26 09:54:11 [gpu/cldrive/libcldrive.cc:59] clBuildProgram() with options '-cl-kernel-arg-info' completed in 76 ms
CPU|Intel|Intel_Xeon_CPU_E5-2620_v4_@_2.10GHz|1.2.0.25|2.0, my_kernel, 4096, 1024, 65536, 105440
CPU|Intel|Intel_Xeon_CPU_E5-2620_v4_@_2.10GHz|1.2.0.25|2.0, my_kernel, 4096, 1024, 65536, 55936
CPU|Intel|Intel_Xeon_CPU_E5-2620_v4_@_2.10GHz|1.2.0.25|2.0, my_kernel, 4096, 1024, 65536, 63296
CPU|Intel|Intel_Xeon_CPU_E5-2620_v4_@_2.10GHz|1.2.0.25|2.0, my_kernel, 4096, 1024, 65536, 56192
CPU|Intel|Intel_Xeon_CPU_E5-2620_v4_@_2.10GHz|1.2.0.25|2.0, my_kernel, 4096, 1024, 65536, 55680
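
The CSV rows written to stdout can be summarized with any standard CSV reader. Below is a minimal Python sketch, assuming the stdout of the command above has been redirected to a file named results.csv (a hypothetical name; the stderr log lines are not part of the file), which averages the recorded runtimes per device:

# Sketch: summarize cldrive CSV output. Assumes the CSV rows printed to
# stdout were redirected to "results.csv" (a hypothetical filename).
import csv
from collections import defaultdict

runtimes_ns = defaultdict(list)
with open("results.csv", newline="") as f:
    # skipinitialspace strips the space that follows each comma.
    reader = csv.DictReader(f, skipinitialspace=True)
    for row in reader:
        runtimes_ns[row["OpenCL Device"]].append(int(row["Runtime (ns)"]))

for device, values in runtimes_ns.items():
    mean_ns = sum(values) / len(values)
    print(f"{device}: {mean_ns:.0f} ns mean over {len(values)} runs")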

By default, cldrive prints a CSV summary of kernel stats and runtimes to stdout, and logging information to stderr. The raw information produced by cldrive is described by the protocol buffers in //gpu/cldrive/proto:cldrive.proto. To print cldrive.Instances protos to stdout instead, use --output_format=pbtxt for text-format protos or --output_format=pb for binary protos.
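
For structured output, the text-format protos can be read back with the standard protobuf runtime. Below is a minimal Python sketch, assuming Python bindings have been generated from //gpu/cldrive/proto:cldrive.proto and are importable as cldrive_pb2 (an assumed module name; cldrive does not ship these bindings):

# Sketch: parse --output_format=pbtxt output saved to "instances.pbtxt"
# (hypothetical filename) into a cldrive.Instances message.
from google.protobuf import text_format

import cldrive_pb2  # assumed to be generated from cldrive.proto

with open("instances.pbtxt") as f:
    instances = text_format.Parse(f.read(), cldrive_pb2.Instances())

print(instances)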

License

Copyright 2016-2020 Chris Cummins [email protected].

Released under the terms of the GPLv3 license. See LICENSE for details.
