
Comments (3)

benvanwerkhoven commented on June 12, 2024

Hi @vesuppi! Good question, since this isn't documented very well. For Python applications, there is PythonKernel in kernel_tuner.kernelbuilder. This example shows the simplest way to use it:
https://github.com/KernelTuner/kernel_tuner/blob/master/examples/cuda/python_kernel.py

One option is to directly specify which configuration you want with the 'params=' option of PythonKernel. For example, you could call get_best_config from kernel_tuner.util and pass the result as the params.
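To make the "pick the best configuration and pass it as params" step concrete, here is a minimal sketch in plain Python. It is not Kernel Tuner's implementation: pick_best is a hypothetical stand-in for get_best_config, to_params is a hypothetical helper, and the timings are made-up sample values. It assumes the tuning results are a list of dicts that hold the tunable parameter values next to the measured 'time'.

```python
# Made-up sample results: one dict per benchmarked configuration.
# Real results come from tune_kernel; these numbers are only illustrative.
results = [
    {"block_size_x": 64, "time": 0.052},
    {"block_size_x": 128, "time": 0.041},
    {"block_size_x": 256, "time": 0.047},
]
tune_params = {"block_size_x": [64, 128, 256]}


def pick_best(results, objective="time"):
    """Hypothetical stand-in for get_best_config: lowest objective wins."""
    return min(results, key=lambda r: r[objective])


def to_params(config, tune_params):
    """Hypothetical helper: keep only the tunable parameters, drop measurements."""
    return {name: config[name] for name in tune_params}


best = pick_best(results)
params = to_params(best, tune_params)
print(params)  # {'block_size_x': 128}
```

Filtering down to the tunable parameter names is a design choice of this sketch, so that measurement fields such as 'time' are not handed on as if they were kernel parameters.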

Probably a better way is to let Kernel Tuner figure out which configuration to use, based on tune_kernel results that have been stored to a file. In that case, use the 'results_file=' option of PythonKernel and point it to a "results file" written with the store_results function in kernel_tuner.integration.

This may seem like an additional step, but it lets you tune once, store the results, and then run the application many times with the same tuning results. The kernel configuration to compile is selected based on the GPU available at run time and the specified problem size.
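The tune-once, reuse-many-times workflow could look roughly like the sketch below. It assumes a CUDA-capable GPU with Kernel Tuner and NumPy installed, and the argument order for store_results and PythonKernel reflects my reading of the library, so please check it against the version you have. The file name, the helper functions, and the tunable values are hypothetical.

```python
KERNEL_NAME = "vector_add"
KERNEL_STRING = """
__global__ void vector_add(float *c, float *a, float *b, int n) {
    int i = blockIdx.x * block_size_x + threadIdx.x;
    if (i < n) {
        c[i] = a[i] + b[i];
    }
}
"""
TUNE_PARAMS = {"block_size_x": [32, 64, 128, 256, 512]}  # illustrative values
RESULTS_FILE = "vector_add_results.json"  # hypothetical file name


def make_args(size):
    """Build the kernel argument list (c, a, b, n) from NumPy arrays."""
    import numpy as np
    a = np.random.randn(size).astype(np.float32)
    b = np.random.randn(size).astype(np.float32)
    c = np.zeros_like(a)
    return [c, a, b, np.int32(size)]


def tune_once(size=1_000_000):
    """Tune the kernel and write the results file (run this once)."""
    from kernel_tuner import tune_kernel
    from kernel_tuner.integration import store_results
    args = make_args(size)
    results, env = tune_kernel(KERNEL_NAME, KERNEL_STRING, size, args, TUNE_PARAMS)
    # Assumed argument order; verify against your Kernel Tuner version.
    store_results(RESULTS_FILE, KERNEL_NAME, KERNEL_STRING, TUNE_PARAMS,
                  size, results, env)


def run_tuned(size=1_000_000):
    """Build the kernel from the stored results and run it (run this many times)."""
    from kernel_tuner.kernelbuilder import PythonKernel
    args = make_args(size)
    add = PythonKernel(KERNEL_NAME, KERNEL_STRING, size, args,
                       results_file=RESULTS_FILE)
    return add(*args)
```

Calling tune_once() a single time and run_tuned() in the application afterwards mirrors the split described above. Neither function is invoked here, because both need a GPU.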


vesuppi commented on June 12, 2024

I see, thank you very much for the detailed explanation! I was able to get the best config using

results, env = tune_kernel("vector_add", kernel_string, N, (c, a, b, torch.tensor(N)), tune_params)
best_config = util.get_best_config(results, 'time')

I haven't tried PythonKernel yet, but I will! Another side question: if we want to tune the size of the thread block, do the parameters have to be named "block_size_x" and "block_size_y"? Thanks!


benvanwerkhoven commented on June 12, 2024

You can indeed use other names for the thread block dimensions. You can specify the names of these using the block_size_names= option of tune_kernel. This optional argument takes a list of strings with the names for the x, y, and z thread block dimensions.

This test illustrates how to use this option:
https://github.com/KernelTuner/kernel_tuner/blob/master/test/test_runners.py#L65
The example kernel in this test uses 'block_dim_x' instead of 'block_size_x', but you can change it to anything you like.
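A minimal sketch of the same idea, loosely modeled on that test, is shown below. It assumes a CUDA-capable GPU with Kernel Tuner and NumPy installed. The tunable values and the function name are hypothetical, and grid_div_x is passed explicitly so that the grid size is divided by the renamed parameter.

```python
# The kernel uses block_dim_x where the default name would be block_size_x.
KERNEL_STRING = """
__global__ void vector_add(float *c, float *a, float *b, int n) {
    int i = blockIdx.x * block_dim_x + threadIdx.x;
    if (i < n) {
        c[i] = a[i] + b[i];
    }
}
"""
TUNE_PARAMS = {"block_dim_x": [128, 192, 256, 320, 384]}  # illustrative values

# Names for the x, y, and z thread block dimensions, in that order.
# Only x is renamed here; y and z keep their default names.
BLOCK_SIZE_NAMES = ["block_dim_x", "block_size_y", "block_size_z"]


def tune_with_custom_names(size=1_000_000):
    """Tune the kernel with a renamed thread block dimension. Needs a GPU."""
    import numpy as np
    from kernel_tuner import tune_kernel
    a = np.random.randn(size).astype(np.float32)
    b = np.random.randn(size).astype(np.float32)
    c = np.zeros_like(a)
    args = [c, a, b, np.int32(size)]
    return tune_kernel("vector_add", KERNEL_STRING, size, args, TUNE_PARAMS,
                       grid_div_x=["block_dim_x"],
                       block_size_names=BLOCK_SIZE_NAMES)
```

The function is only defined here, not called, since it needs a GPU. Calling tune_with_custom_names() returns the same (results, env) pair as any other tune_kernel call.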

