Once I do tune_kernel("my_kernel", ...) , how do I get


How to get the compiled function which can be called later?

vesuppi commented on June 12, 2024

How to get the compiled function which can be called later?


Comments (3)

benvanwerkhoven commented on June 12, 2024

Hi @vesuppi! Good question, as this isn't really documented all that well! For Python applications, we have the PythonKernel from kernel_tuner.kernelbuilder. This example shows the simplest way to use it:
https://github.com/KernelTuner/kernel_tuner/blob/master/examples/cuda/python_kernel.py

One option is to directly specify which configuration you want with the 'params=' option of PythonKernel. For example, you could call get_best_config from kernel_tuner.util on the tuning results and pass the configuration it returns as the params.
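A minimal sketch of this first option, assuming a CUDA-capable GPU and a simple vector_add kernel. The kernel source, the problem size, and the helpers extract_params and tune_and_build are hypothetical names made up for illustration. Filtering the best configuration down to the tunable parameters is a precaution on my part (the record returned by get_best_config also carries measurements such as 'time'), not something the thread requires:

```python
# Hypothetical vector_add kernel; Kernel Tuner inserts block_size_x as a define.
kernel_string = """
__global__ void vector_add(float *c, const float *a, const float *b, int n) {
    int i = blockIdx.x * block_size_x + threadIdx.x;
    if (i < n) {
        c[i] = a[i] + b[i];
    }
}
"""

tune_params = {"block_size_x": [128, 256, 512]}


def extract_params(best_config, tune_params):
    """Keep only the tunable parameters from a tuning result record."""
    return {name: best_config[name] for name in tune_params}


def tune_and_build(size=1_000_000):
    """Tune once, then build a callable kernel for the best configuration."""
    # Imported here so the module can be loaded on a machine without a GPU.
    import numpy as np
    from kernel_tuner import tune_kernel
    from kernel_tuner.kernelbuilder import PythonKernel
    from kernel_tuner.util import get_best_config

    n = np.int32(size)
    a = np.random.randn(size).astype(np.float32)
    b = np.random.randn(size).astype(np.float32)
    c = np.zeros_like(a)
    args = [c, a, b, n]

    results, env = tune_kernel("vector_add", kernel_string, size, args, tune_params)
    best = get_best_config(results, "time")  # lower time is better
    params = extract_params(best, tune_params)

    kernel = PythonKernel("vector_add", kernel_string, size, args, params=params)
    return kernel, args


if __name__ == "__main__":
    vector_add, (c, a, b, n) = tune_and_build()
    # The compiled kernel is called like a regular Python function,
    # as in the linked python_kernel.py example.
    vector_add(c, a, b, n)
```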

A probably better way to do this is to let Kernel Tuner figure out which configuration to use based on the tuning results of tune_kernel that have been stored to a file. In that case you have to use the 'results_file=' option of PythonKernel and point to a "results file" written using the store_results function in kernel_tuner.integration.

This may seem like an additional step, but this enables you to only tune once, store the results, and then run the application many times reusing the same tuning results. The selection for which kernel configuration to compile is made based on the GPU available at run time and the specified problem size.
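The tune-once, run-many workflow could look roughly like the sketch below. The file name, the kernel source, and the helpers needs_tuning and build_kernel are hypothetical. The argument order used for store_results is my understanding of kernel_tuner.integration and should be checked against the installed version:

```python
import os

RESULTS_FILE = "vector_add_results.json"  # hypothetical file name

# Hypothetical vector_add kernel; Kernel Tuner inserts block_size_x as a define.
kernel_string = """
__global__ void vector_add(float *c, const float *a, const float *b, int n) {
    int i = blockIdx.x * block_size_x + threadIdx.x;
    if (i < n) {
        c[i] = a[i] + b[i];
    }
}
"""

tune_params = {"block_size_x": [128, 256, 512]}


def needs_tuning(results_file):
    """Tuning is only needed when no results file has been written yet."""
    return not os.path.exists(results_file)


def build_kernel(size, args, results_file=RESULTS_FILE):
    """Return a callable kernel, tuning first only if no results are stored."""
    # Imported here so the module can be loaded on a machine without a GPU.
    from kernel_tuner import tune_kernel
    from kernel_tuner.integration import store_results
    from kernel_tuner.kernelbuilder import PythonKernel

    if needs_tuning(results_file):
        results, env = tune_kernel("vector_add", kernel_string, size, args, tune_params)
        # Assumed argument order: file, kernel name, source, tunable
        # parameters, problem size, results, environment.
        store_results(results_file, "vector_add", kernel_string,
                      tune_params, size, results, env)

    # Kernel Tuner picks the configuration from the stored results.
    return PythonKernel("vector_add", kernel_string, size, args,
                        results_file=results_file)
```

Later runs of the application skip the tuning branch and go straight to constructing the PythonKernel from the stored file.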


vesuppi commented on June 12, 2024

I see, thank you very much for the detailed explanation! I was able to get the best config using

results, env = tune_kernel("vector_add", kernel_string, N, (c, a, b, torch.tensor(N)), tune_params)
best_config = util.get_best_config(results, 'time')

Haven't tried PythonKernel yet, but will do! Another side question: if we want to tune the size of the thread block, do the parameters have to be named "block_size_x" and "block_size_y"? Thanks!


benvanwerkhoven commented on June 12, 2024

You can indeed use other names for the thread block dimensions. You can specify the names of these using the block_size_names= option of tune_kernel. This optional argument takes a list of strings with the names for the x, y, and z thread block dimensions.

This test illustrates how to use this option:
https://github.com/KernelTuner/kernel_tuner/blob/master/test/test_runners.py#L65
The example kernel in this test uses 'block_dim_x' instead of 'block_size_x', but you can change it to anything you like.
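As a rough illustration of the block_size_names= option, the sketch below tunes a kernel whose thread block dimensions are called block_dim_x, block_dim_y, and block_dim_z. The kernel source, the parameter values, and the tune helper are assumptions for illustration and are not copied from the linked test:

```python
# Hypothetical kernel that uses block_dim_x instead of block_size_x.
kernel_string = """
__global__ void vector_add(float *c, const float *a, const float *b, int n) {
    int i = blockIdx.x * block_dim_x + threadIdx.x;
    if (i < n) {
        c[i] = a[i] + b[i];
    }
}
"""

# Names for the x, y, and z thread block dimensions, in that order.
block_size_names = ["block_dim_x", "block_dim_y", "block_dim_z"]

tune_params = {
    "block_dim_x": [128, 256, 512],
    "block_dim_y": [1],
    "block_dim_z": [1],
}


def tune(size=1_000_000):
    """Tune the kernel using the custom thread block dimension names."""
    # Imported here so the module can be loaded on a machine without a GPU.
    import numpy as np
    from kernel_tuner import tune_kernel

    n = np.int32(size)
    a = np.random.randn(size).astype(np.float32)
    b = np.random.randn(size).astype(np.float32)
    c = np.zeros_like(a)

    return tune_kernel("vector_add", kernel_string, size, [c, a, b, n],
                       tune_params, block_size_names=block_size_names)
```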

