I created two workloads by profiling a different kernel each time and tried to compare the performance counters between the two kernels using the following command:
omniperf analyze -p workloads/vcopy_vecCopy/mi200/ -p workloads/vcopy_vecCopy_nocheck/mi200/
This fails with the following error after printing the "System Info" panel (I have replaced the full path to my omniperf install with /path/to/omniperf; the stack trace is otherwise unchanged):
Traceback (most recent call last):
File "/path/to/omniperf/1.0.6/bin/omniperf", line 663, in <module>
main()
File "/path/to/omniperf/1.0.6/bin/omniperf", line 643, in main
analyze(args)
File "/path/to/omniperf/1.0.6/bin/omniperf_analyze/omniperf_analyze.py", line 250, in analyze
run_cli(args, runs)
File "/path/to/omniperf/1.0.6/bin/omniperf_analyze/omniperf_analyze.py", line 199, in run_cli
tty.show_all(
File "/path/to/omniperf/1.0.6/bin/omniperf_analyze/utils/tty.py", line 108, in show_all
base_df[header].astype("double"),
File "/path/to/omniperf/python-libs/pandas/core/generic.py", line 6240, in astype
new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
File "/path/to/omniperf/python-libs/pandas/core/internals/managers.py", line 450, in astype
return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
File "/path/to/omniperf/python-libs/pandas/core/internals/managers.py", line 352, in apply
applied = getattr(b, f)(**kwargs)
File "/path/to/omniperf/python-libs/pandas/core/internals/blocks.py", line 526, in astype
new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
File "/path/to/omniperf/python-libs/pandas/core/dtypes/astype.py", line 299, in astype_array_safe
new_values = astype_array(values, dtype, copy=copy)
File "/path/to/omniperf/python-libs/pandas/core/dtypes/astype.py", line 230, in astype_array
values = astype_nansafe(values, dtype, copy=copy)
File "/path/to/omniperf/python-libs/pandas/core/dtypes/astype.py", line 170, in astype_nansafe
return arr.astype(dtype, copy=True)
ValueError: could not convert string to float: ''
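The last frame suggests that the column being cast by `base_df[header].astype("double")` contains at least one empty string, possibly for a metric that has no value in one of the two workloads (this is a guess, not something confirmed in the omniperf source). The sketch below reproduces the same class of failure using only the Python standard library and shows one possible way to tolerate empty cells. The helper `to_float_or_nan` and the sample values are hypothetical and not part of omniperf.

```python
import math


def to_float_or_nan(value):
    """Convert a cell to float, mapping empty or whitespace-only strings to NaN.

    Hypothetical helper for illustration; not part of omniperf.
    """
    if isinstance(value, str) and value.strip() == "":
        return math.nan
    return float(value)


# Hypothetical column values, with one empty cell.
cells = ["1.5", "", "3"]

# A plain float() cast fails on the empty string with a ValueError.
try:
    [float(c) for c in cells]
    plain_cast_failed = False
except ValueError:
    plain_cast_failed = True

# The tolerant conversion keeps the numeric cells and marks the empty one as NaN.
converted = [to_float_or_nan(c) for c in cells]
```

In pandas, `pandas.to_numeric(series, errors="coerce")` gives comparable behaviour for a whole column, though whether that fits how tty.py uses the column is untested here.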
omniperf v1.0.6 was installed from source, following the instructions in this repo's documentation, without any trouble.
To reproduce, I took the vcopy.cpp example from this repo and added a new kernel called vecCopy_nocheck, which is the same as vecCopy except that it omits the array bounds check. I also added a call to launch this kernel. My updates can be seen in the following diff:
$ git diff
diff --git a/sample/vcopy.cpp b/sample/vcopy.cpp
index 0eed487..565d8c0 100644
--- a/sample/vcopy.cpp
+++ b/sample/vcopy.cpp
@@ -18,6 +18,12 @@ __global__ void vecCopy(double *a, double *b, double *c, int n,int stride)
c[id] = a[id];
}
}
+__global__ void vecCopy_nocheck(double *a, double *b, double *c, int n,int stride)
+{
+ // Get our global thread ID
+ int id = blockIdx.x*blockDim.x+threadIdx.x;
+ c[id] = a[id];
+}
void usage()
{
@@ -114,6 +120,7 @@ int main( int argc, char* argv[] )
printf("Launching the kernel on the GPU\n");
// Execute the kernel
hipLaunchKernelGGL(vecCopy, dim3(gridSize), dim3(blockSize), 0, 0, d_a, d_b, d_c, n,stride);
+ hipLaunchKernelGGL(vecCopy_nocheck, dim3(gridSize), dim3(blockSize), 0, 0, d_a, d_b, d_c, n,stride);
hipDeviceSynchronize( );
printf("Finished executing kernel\n");
// Copy array back to host
I then compiled the sample, profiled each kernel into its own workload, and ran the analysis:
hipcc -O3 -o vcopy vcopy.cpp
omniperf profile --device 0 -k vecCopy -n vcopy_vecCopy -- ./vcopy 102400 256 0
omniperf profile --device 0 -k vecCopy_nocheck -n vcopy_vecCopy_nocheck -- ./vcopy 102400 256 0
omniperf analyze -p workloads/vcopy_vecCopy/mi200/ -p workloads/vcopy_vecCopy_nocheck/mi200/