This package has been renamed to slang-torch and this repository is archived.
Please visit https://github.com/shader-slang/slang-torch for latest development of this package.
Superseded by github.com/shader-slang/slang-torch
License: MIT License
Would it be useful for the Slang framework to support computing determinants of matrices larger than 4x4? In the current code base the determinant function is specialized only up to 4x4 matrices.
// Compute determinant function
[AutoPyBindCUDA]
[CUDAKernel]
[Differentiable]
void compute_determinant(DiffTensorView mat, DiffTensorView output) {
    matrix<float, N, N> mat_s;
    [MaxIters(N)]
    for (int i = 0; i < N; i++) {
        [MaxIters(N)]
        for (int j = 0; j < N; j++) {
            mat_s[i][j] = mat[i, j];
        }
    }
    output[0] = determinant(mat_s);
}
N is defined as a static compile-time constant. I get the following error message when trying to load this function. Essentially, the Slang determinant function is not accessible in the generated .cu file. Is there a way around this, other than writing my own determinant function in the .slang file?
RuntimeError: Error building extension '_slangpy_bezier_compute_44136fa355b3678a':
[1/3] /usr/local/cuda-11.7/bin/nvcc -DTORCH_EXTENSION_NAME=_slangpy_bezier_compute_44136fa355b3678a -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -I/home/ubuntu/slang-python/examples/bezier2d -isystem /home/ubuntu/anaconda3/envs/gaussian_splatting/lib/python3.8/site-packages/torch/include -isystem /home/ubuntu/anaconda3/envs/gaussian_splatting/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /home/ubuntu/anaconda3/envs/gaussian_splatting/lib/python3.8/site-packages/torch/include/TH -isystem /home/ubuntu/anaconda3/envs/gaussian_splatting/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda-11.7/include -isystem /home/ubuntu/anaconda3/envs/gaussian_splatting/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -std=c++17 -c /home/ubuntu/slang-python/examples/bezier2d/.slangpy_cache/bezier_compute/44136fa355b3678a/bezier_compute_cuda.cu -o bezier_compute_cuda.cuda.o
FAILED: bezier_compute_cuda.cuda.o
/home/ubuntu/slang-python/examples/bezier2d/.slangpy_cache/bezier_compute/44136fa355b3678a/bezier_compute_cuda.cu(2372): warning #550-D: variable "oldVal_0" was set but never used
/home/ubuntu/slang-python/examples/bezier2d/.slangpy_cache/bezier_compute/44136fa355b3678a/bezier_compute_cuda.cu(2379): warning #550-D: variable "oldVal_1" was set but never used
/home/ubuntu/slang-python/examples/bezier2d/.slangpy_cache/bezier_compute/44136fa355b3678a/bezier_compute_cuda.cu(3036): error: identifier "determinant" is undefined
/home/ubuntu/slang-python/examples/bezier2d/.slangpy_cache/bezier_compute/44136fa355b3678a/bezier_compute_cuda.cu(3517): error: identifier "determinant" is undefined
2 errors detected in the compilation of "/home/ubuntu/slang-python/examples/bezier2d/.slangpy_cache/bezier_compute/44136fa355b3678a/bezier_compute_cuda.cu".
[2/3] c++ -MMD -MF bezier_compute.o.d -DTORCH_EXTENSION_NAME=_slangpy_bezier_compute_44136fa355b3678a ... -std=c++17 -c /home/ubuntu/slang-python/examples/bezier2d/.slangpy_cache/bezier_compute/44136fa355b3678a/bezier_compute.cpp -o bezier_compute.o
/home/ubuntu/slang-python/examples/bezier2d/.slangpy_cache/bezier_compute/44136fa355b3678a/bezier_compute.cpp:8398:9: warning: #pragma once in main file
ninja: build stopped: subcommand failed.
(The Python traceback shows the failure propagating from slangpy.loadModule('bezier_compute.slang') through torch.utils.cpp_extension, which raises CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.)
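Until a generic determinant is available on the CUDA target, one workaround is to write the determinant yourself (in the .slang file or on the host side). The standard O(N^3) approach for an arbitrary NxN matrix is Gaussian elimination with partial pivoting; here is a plain-Python sketch of the algorithm (a hypothetical helper, not part of the slangpy API):

```python
def det(a):
    """Determinant of an N x N matrix (list of row lists) via Gaussian
    elimination with partial pivoting. Works for any N, unlike the
    4x4-specialized builtin."""
    m = [list(row) for row in a]  # work on a copy
    n = len(m)
    d = 1.0
    for col in range(n):
        # Partial pivoting: pick the row with the largest entry in this column.
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[piv][col] == 0:
            return 0.0  # singular matrix
        if piv != col:
            m[col], m[piv] = m[piv], m[col]
            d = -d  # a row swap flips the sign of the determinant
        d *= m[col][col]
        # Eliminate entries below the pivot.
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n):
                m[r][k] -= f * m[col][k]
    return d
```

The same elimination loop translates directly to Slang with fixed loop bounds ([MaxIters(N)]) once N is a compile-time constant.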
Windows 11
Python 3.12.2 (official installer)
Slangpy 1.1.22
When attempting to run the example square.slang kernel, I get a compilation error:
LINK : fatal error LNK1104: cannot open file 'python312.lib'
ninja: build stopped: subcommand failed.
How can I point your build system to my python312.lib at ~\AppData\Local\Programs\Python\Python312\libs? I've set a LIB environment variable, but that doesn't seem to remedy it.
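For reference, the folder where a given interpreter expects its import library can be queried from Python itself; this sketch assumes the standard Windows installer layout, where pythonXY.lib lives in the "libs" subfolder of the install base, and is worth comparing against the path the linker is failing on:

```python
import os
import sysconfig

# Base directory of this Python installation; on Windows the installer
# places python312.lib under the "libs" subfolder of this directory.
base = sysconfig.get_config_var("installed_base")
libs_dir = os.path.join(base, "libs")
print(libs_dir)
```

If this path differs from the one the LIB variable points at (e.g. the build is picking up a different interpreter), that mismatch would explain LNK1104.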
Hi, I wrote some code in Slang which seems to execute the forward pass correctly. But when I execute the backward pass, GPU utilization goes to 100% and the system hangs, especially when I try to print the derivative value. I am using slangpy as the Python interface to call the Slang kernel file (shown below).
static const int N = NUM_CTRL_PTS;
static const int c = DIM;
static const int N1 = c * (N - 1);

__generic<let C : int>
struct MatrixG : IDifferentiable
{
    float vals[C];
}
int nCi(int n, int i) {
    if (i > n) return 0;
    if (i == 0 || i == n) return 1;
    if (i > n - i) i = n - i;
    int result = 1;
    for (int k = 1; k <= i; ++k) {
        result *= n - k + 1;
        result /= k;
    }
    return result;
}

int fact(int n) {
    int result = 1;
    for (int i = 1; i <= n; ++i) {
        result *= i;
    }
    return result;
}
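The two helpers above are a multiplicative binomial coefficient C(n, i) and a plain factorial; a direct Python translation makes them easy to sanity-check on the CPU:

```python
def nCi(n, i):
    """Binomial coefficient C(n, i), computed multiplicatively to keep
    intermediate values small (mirrors the Slang helper)."""
    if i > n:
        return 0
    if i == 0 or i == n:
        return 1
    if i > n - i:
        i = n - i  # exploit symmetry C(n, i) == C(n, n - i)
    result = 1
    for k in range(1, i + 1):
        # The running product stays an exact integer at every step.
        result = result * (n - k + 1) // k
    return result

def fact(n):
    """n! by iterated multiplication (mirrors the Slang helper)."""
    result = 1
    for i in range(1, n + 1):
        result *= i
    return result
```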
[CudaDeviceExport]
[Differentiable]
matrix<float, N, c> compute_coeffs_device(DiffTensorView control_pts) {
    // Compute the coefficients a_i of t^i for the Bezier polynomial \sum a_i . t^i
    matrix<float, N, c> coeffs;
    [MaxIters(c)]
    for (int k = 0; k < c; k++) {
        [MaxIters(N)]
        for (int j = 0; j < N; j++) {
            int nCj = fact(N - 1) / fact(N - 1 - j); // degree of the polynomial is N - 1
            float sum = 0;
            [MaxIters(N)]
            for (int i = 0; i < N; i++) {
                if (i <= j) {
                    sum += pow(-1, i + j) * control_pts[i, k] / (fact(i) * fact(j - i));
                }
            }
            coeffs[j][k] = nCj * sum;
        }
    }
    return coeffs;
}
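compute_coeffs_device implements the standard Bernstein-to-power-basis conversion, a_j = n!/(n-j)! * sum_{i<=j} (-1)^(i+j) P_i / (i!(j-i)!). The formula can be sanity-checked in plain Python (1D case) against direct Bernstein evaluation:

```python
from math import comb, factorial

def power_coeffs(pts):
    """Coefficients a_j so that B(t) = sum_j a_j * t**j, for scalar
    control points pts of a degree n = len(pts) - 1 Bezier curve."""
    n = len(pts) - 1
    return [factorial(n) / factorial(n - j)
            * sum((-1) ** (i + j) * pts[i] / (factorial(i) * factorial(j - i))
                  for i in range(j + 1))
            for j in range(n + 1)]

def bernstein_eval(pts, t):
    """Direct Bernstein-form evaluation of the same curve."""
    n = len(pts) - 1
    return sum(comb(n, i) * (1 - t) ** (n - i) * t ** i * pts[i]
               for i in range(n + 1))
```

Evaluating the power-basis polynomial at any t should agree with the Bernstein form up to floating-point error, and a_0 should equal the first control point.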
[CudaDeviceExport]
[Differentiable]
MatrixG<N1 * N1> assemble_matrix_sdf(matrix<float, N, c> coeffs) {
    // Creates the matrix whose determinant is evaluated to get the SDF
    // coeffs: Tensor (N, c)
    MatrixG<N1 * N1> mat;
    // Initialize to zero
    [MaxIters(N1 * N1)]
    for (int i = 0; i < N1 * N1; i++)
        mat.vals[i] = 0.0;
    [MaxIters(N - 1)]
    for (int i = 0; i < N - 1; i++)
        [MaxIters(N)]
        for (int j = 0; j < N; j++)
            [MaxIters(c)]
            for (int k = 0; k < c; k++)
            {
                mat.vals[(k * (N - 1) + i) * N1 + j + i] = coeffs[j][k];
            }
    return mat;
}
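assemble_matrix_sdf packs the coefficients into a banded, resultant-style N1 x N1 matrix: coefficient j of dimension k lands at row k*(N-1)+i, column j+i, for each shift i. A plain-Python mirror of the same indexing makes the layout easy to inspect:

```python
def assemble_matrix_sdf(coeffs):
    """coeffs: N rows x c columns (list of lists). Returns the N1 x N1
    matrix with the same flat-index placement as the Slang version:
    mat[(k*(N-1)+i)*N1 + j+i] = coeffs[j][k]."""
    N = len(coeffs)
    c = len(coeffs[0])
    N1 = c * (N - 1)
    mat = [[0.0] * N1 for _ in range(N1)]
    for i in range(N - 1):        # shift of the coefficient row
        for j in range(N):        # polynomial degree index
            for k in range(c):    # coordinate dimension
                mat[k * (N - 1) + i][j + i] = coeffs[j][k]
    return mat
```

For N = 3, c = 2 this produces a 4x4 matrix whose first two rows are the x-coefficients shifted by 0 and 1 columns, and whose last two rows are the y-coefficients shifted the same way.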
[AutoPyBindCUDA]
[CUDAKernel]
[Differentiable]
void bezier2D(DiffTensorView t, DiffTensorView control_pts, DiffTensorView output)
{
    // t (tensor Mx1): parameters between 0 and 1 to traverse the Bezier curve
    // control_pts (Nx2): N control points of the 2D Bezier curve (degree N - 1)
    // Get the 'global' index of this thread.
    uint3 tIdx = cudaThreadIdx() + cudaBlockIdx() * cudaBlockDim();
    // If the thread index is beyond the input size, exit early.
    if (tIdx.x >= t.size(0))
        return;
    [MaxIters(N)]
    for (int i = 0; i <= N - 1; i++)
    {
        output[tIdx.x, 0] = output[tIdx.x, 0] + nCi(N - 1, i) * pow((1 - t[tIdx.x]), (N - 1 - i)) * pow(t[tIdx.x], i) * control_pts[i, 0];
        output[tIdx.x, 1] = output[tIdx.x, 1] + nCi(N - 1, i) * pow((1 - t[tIdx.x]), (N - 1 - i)) * pow(t[tIdx.x], i) * control_pts[i, 1];
    }
}
[AutoPyBindCUDA]
[CUDAKernel]
[Differentiable]
void compute_coeffs(DiffTensorView control_pts, DiffTensorView output) {
    // Compute the coefficients a_i of t^i for the Bezier polynomial \sum a_i . t^i
    matrix<float, N, c> coeffs = compute_coeffs_device(control_pts);
    for (int i = 0; i < N; i++)
        for (int j = 0; j < c; j++)
            output[i, j] = coeffs[i][j];
}
[AutoPyBindCUDA]
[CUDAKernel]
[Differentiable]
void bezier2DSDF(DiffTensorView xy, DiffTensorView control_pts, DiffTensorView output) {
    // xy (M, c): points at which the SDF is evaluated
    // coeffs (N, c): power-basis coefficients of the curve
    // output (M, N1, N1): one matrix per query point
    // Each thread computes the SDF matrix for one xy coordinate, whose
    // determinant is then taken by the function above. (Maybe this should be
    // only [Differentiable], not [AutoPyBindCUDA].)
    uint3 tIdx = cudaThreadIdx() + cudaBlockIdx() * cudaBlockDim();
    matrix<float, N, c> coeffs = compute_coeffs_device(control_pts);
    int M = xy.size(0); // xy is shaped (M, 2)
    if (tIdx.x >= M) {
        return;
    }
    float coord[c];
    for (int i = 0; i < c; i++)
        coord[i] = xy[tIdx.x, i];
    for (int i = 0; i < c; i++)
        coeffs[0][i] -= coord[i];
    MatrixG<N1 * N1> mat = assemble_matrix_sdf(coeffs);
    for (int i = 0; i < N1; i++)
        for (int j = 0; j < N1; j++)
            output[tIdx.x, i, j] = mat.vals[i * N1 + j];
}
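assemble_matrix_sdf is not shown in full here, but the overall scheme, a matrix per query point whose determinant gives the implicit function value, matches resultant-based implicitization. A generic Sylvester-matrix sketch in plain Python (helper names hypothetical, not the module's API):

```python
def sylvester(p, q):
    # Sylvester matrix of two polynomials given as coefficient lists
    # (highest degree first); its determinant is the resultant, which
    # vanishes exactly when p and q share a root.
    m, n = len(p) - 1, len(q) - 1
    size = m + n
    rows = []
    for i in range(n):
        rows.append([0] * i + p + [0] * (size - m - 1 - i))
    for i in range(m):
        rows.append([0] * i + q + [0] * (size - n - 1 - i))
    return rows

def det(mat):
    # Laplace expansion; fine for the small matrices involved here.
    if len(mat) == 1:
        return mat[0][0]
    return sum((-1) ** j * mat[0][j] * det([row[:j] + row[j + 1:] for row in mat[1:]])
               for j in range(len(mat)))
```

For instance, the resultant of t - 1 and t - 2 is -1 (no common root), while t^2 - 3t + 2 and t - 1 share the root t = 1 and give 0.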
In the Python code:
import torch
import slangpy

N = 6
c = 2
m = slangpy.loadModule('bezier.slang', defines={"NUM_CTRL_PTS": N, "DIM": c})

num_pts = 1000
t = torch.linspace(0.0, 1, num_pts, dtype=torch.float).cuda()
control_pts = 1 * torch.rand((N, 2), dtype=torch.float).cuda()
control_pts[-1] = control_pts[0]

## Computing SDF for line function
num_points = 1000  # for example, 1000 points along each axis
# Generate evenly spaced points between 0 and 1
px = torch.linspace(0.2, 1.0, num_points)
py = torch.linspace(0.5, 1.0, num_points)
# Create the meshgrid
x, y = torch.meshgrid(px, py.flip(dims=[0]), indexing='ij')
xy = torch.stack([x, y], dim=-1).view(-1, 2).cuda()
sdf_mats = torch.zeros(xy.shape[0], c * (N - 1), c * (N - 1)).cuda()

import time
num_iters = 10
t = time.time()
for i in range(num_iters):
    m.bezier2DSDF(xy=xy, control_pts=control_pts, output=sdf_mats).launchRaw(blockSize=(1024, 1, 1), gridSize=(1024, 1, 1))
sdf_N = torch.linalg.det(sdf_mats)
sdf_N = torch.sign(sdf_N) * torch.sqrt(torch.abs(sdf_N))
print((time.time() - t) / num_iters)
## Computing Gradients
xy_grad = torch.zeros_like(xy).cuda()
control_pts_grad = torch.zeros_like(control_pts).cuda()
output_grad = torch.zeros_like(sdf_mats).cuda()
m.bezier2DSDF.bwd(xy=(xy, xy_grad), control_pts=(control_pts, control_pts_grad), output=(sdf_mats, output_grad)).launchRaw(blockSize=(1,1,1), gridSize=(1,1,1))
print(control_pts_grad) # This is where things hang.
When I call the .bwd function, it takes extremely long and returns all 0s and NaNs, though the forward pass works correctly. My guess is that something is wrong with how the derivatives are being chained through the MatrixG data structure.
Hello!
By mistake, I added [Differentiable] to the square sample but didn't change TensorView to DiffTensorView. The result was a square.fwd() that simply calculates square(). I was wondering if there is ever a valid use case for doing this, and if not, whether the compiler could catch this mistake? Otherwise, if I also pass a single input and single output by mistake, there are no errors, but the result is definitely not the forward gradient :)
In detail, this shader...
[AutoPyBindCUDA]
[CUDAKernel]
[Differentiable]
void square(TensorView<float> input, TensorView<float> output)
{
    // ...
    output[dispatchIdx.x] = input[dispatchIdx.x] * input[dispatchIdx.x];
}
... compiles without error, but fwd is not the gradient:
__device__ void s_fwd_square_0(TensorView input_2, TensorView output_2)
{
    uint _S14 = (((blockIdx)) * ((blockDim)) + ((threadIdx))).x;
    uint _S15 = ((input_2).sizes[(0U)]);
    if(_S14 >= _S15)
    {
        return;
    }
    float _S16 = ((input_2).load<float>((_S14)));
    float _S17 = ((input_2).load<float>((_S14)));
    (output_2).store<float>((_S14), (_S16 * _S17)); // ?!
    return;
}

extern "C" {
__global__ void __kernel__square_fwd_diff(TensorView _S18, TensorView _S19)
{
    s_fwd_square_0(_S18, _S19);
    return;
}
}
So in this Python code, forward ends up the same as the squared numbers:
inputs = torch.tensor( (1, 2, 3, 4, 5, 6, 7, 8, 9, 10), dtype=torch.float).cuda()
squared = torch.zeros_like(inputs).cuda()
forward = torch.ones_like(inputs).cuda()
m.square(input=inputs, output=squared).launchRaw(blockSize=(32, 1, 1), gridSize=(64,1,1))
m.square.fwd(input=inputs, output=forward).launchRaw(blockSize=(32, 1, 1), gridSize=(64,1,1))
print(inputs)
print(squared)
print(forward)
Output:
tensor([ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10.], device='cuda:0')
tensor([ 1., 4., 9., 16., 25., 36., 49., 64., 81., 100.], device='cuda:0')
tensor([ 1., 4., 9., 16., 25., 36., 49., 64., 81., 100.], device='cuda:0') # oh noes
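For comparison, with the correct pairing, forward mode should propagate a tangent of 2*x. A minimal dual-number sketch (plain Python, just to illustrate what square.fwd is expected to compute):

```python
class Dual:
    # A dual number (value, tangent) is the textbook model of a
    # forward-mode derivative pass.
    def __init__(self, val, tan):
        self.val, self.tan = val, tan

    def __mul__(self, other):
        # Product rule: (u * v)' = u' * v + u * v'
        return Dual(self.val * other.val,
                    self.tan * other.val + self.val * other.tan)

def square(x):
    return x * x

y = square(Dual(3.0, 1.0))
# y.val is the primal 9.0; y.tan is the forward gradient 6.0 (i.e. 2 * x)
```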
Everything is fine with the correct signature in the shader...
[AutoPyBindCUDA]
[CUDAKernel]
[Differentiable]
void square(DiffTensorView<float> input, DiffTensorView<float> output)
{
    uint3 dispatchIdx = cudaBlockIdx() * cudaBlockDim() + cudaThreadIdx();
    if (dispatchIdx.x >= input.size(0))
        return;
    output[dispatchIdx.x] = input[dispatchIdx.x] * input[dispatchIdx.x];
}
... and called as ...
inputs = torch.tensor( (1, 2, 3, 4, 5, 6, 7, 8, 9, 10), dtype=torch.float).cuda()
input_grad = torch.ones_like(inputs).cuda()
squared = torch.zeros_like(inputs).cuda()
forward = torch.ones_like(inputs).cuda()
m.square(input=inputs, output=squared).launchRaw(blockSize=(32, 1, 1), gridSize=(64,1,1))
m.square.fwd(input=(inputs, input_grad), output=(squared,forward)).launchRaw(blockSize=(32, 1, 1), gridSize=(64,1,1))
So I was wondering if this user error could be caught? Perhaps it would also be useful to add a fwd example to the docs to make it very clear that, just like bwd, it expects a pair?
Thanks!
bert
(Edit: rewrote for clarity)
Windows 11
Python 3.12.2 with installer
Slangpy 1.1.22
I am running into an issue where slangpy's cache frequently pulls in stale code, for example emitting warnings for code I deleted since the last run. Could it be that the mechanism used to detect changes to the .slang source files is faulty?
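For what it's worth, hashing the source contents is one robust way to detect changes; a sketch of such a cache-key scheme (purely illustrative, not slangpy's actual mechanism):

```python
import hashlib

def cache_key(paths, defines):
    # Hash the source bytes plus the defines so that any edit, or a change
    # in defines, yields a new key and forces a rebuild.
    h = hashlib.sha256()
    for p in sorted(paths):
        with open(p, "rb") as f:
            h.update(f.read())
    h.update(repr(sorted(defines.items())).encode())
    return h.hexdigest()[:16]
```

A mtime-based check, by contrast, can miss edits made within the filesystem's timestamp resolution or after restoring an older file.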
I am getting an error when running python mlp_image_fit.py (see below).
To recreate, build the Docker image:
docker build -t slang-python .
with this Dockerfile:
FROM nvcr.io/nvidia/pytorch:23.10-py3
WORKDIR /
RUN git clone https://github.com/shader-slang/slang-python.git
WORKDIR /slang-python
RUN apt-get update
RUN apt-get install -y python3.10-venv
COPY build-package.sh .
RUN bash ./build-package.sh
WORKDIR /slang-python/dist
RUN pip install *.whl
WORKDIR /slang-python/
RUN apt-get install -y ffmpeg
and this build-package.sh file (in the same directory):
wget https://github.com/shader-slang/slang/releases/download/v2023.4.9/slang-2023.4.9-linux-x86_64.zip
LINUX64ZIP=slang-2023.4.9-linux-x86_64.zip
mkdir -p ./tmp
echo "extracting $LINUX64ZIP"
unzip -n $LINUX64ZIP -d ./tmp
mkdir -p ./slangpy/bin/
cp ./tmp/bin/linux-x64/release/libslang.so ./slangpy/bin/libslang.so
cp ./tmp/bin/linux-x64/release/slangc ./slangpy/bin/slangc
chmod +x ./slangpy/bin/slangc
echo "content of ./slangpy/bin/:"
ls ./slangpy/bin/
rm $LINUX64ZIP
rm -rf ./tmp/
python -m pip install --upgrade pip
pip install build hatchling
python -m build
you will get the following errors:
/slang-python/examples/inline-mlp-example/cuda_matmul_prelude.cuh(25): error: name followed by "::" must be a class or namespace name [repeated 3x]
/slang-python/examples/inline-mlp-example/cuda_matmul_prelude.cuh(25): error: identifier "a_frag" is undefined
nvcuda::wmma::fragment<nvcuda::wmma::matrix_a, M, N, K, nvcuda::wmma::precision::tf32, nvcuda::wmma::row_major> a_frag;
/slang-python/examples/inline-mlp-example/cuda_matmul_prelude.cuh(26): error: name followed by "::" must be a class or namespace name [repeated 3x]
/slang-python/examples/inline-mlp-example/cuda_matmul_prelude.cuh(26): error: identifier "b_frag" is undefined
nvcuda::wmma::fragment<nvcuda::wmma::matrix_b, M, N, K, nvcuda::wmma::precision::tf32, nvcuda::wmma::col_major> b_frag;
/slang-python/examples/inline-mlp-example/cuda_matmul_prelude.cuh(27): error: name followed by "::" must be a class or namespace name
/slang-python/examples/inline-mlp-example/cuda_matmul_prelude.cuh(27): error: type name is not allowed
/slang-python/examples/inline-mlp-example/cuda_matmul_prelude.cuh(27): error: identifier "acc_frag" is undefined
nvcuda::wmma::fragment<nvcuda::wmma::accumulator, M, N, K, float> acc_frag;
/slang-python/examples/inline-mlp-example/cuda_matmul_prelude.cuh(45): error: name followed by "::" must be a class or namespace name
nvcuda::wmma::fill_fragment(acc_frag, 0.0f);
/slang-python/examples/inline-mlp-example/cuda_matmul_prelude.cuh(54): error: name followed by "::" must be a class or namespace name
nvcuda::wmma::load_matrix_sync(a_frag, in + tk + ti * lda, lda);
/slang-python/examples/inline-mlp-example/cuda_matmul_prelude.cuh(55): error: name followed by "::" must be a class or namespace name
nvcuda::wmma::load_matrix_sync(b_frag, wt + tk + tj * ldb, ldb);
/slang-python/examples/inline-mlp-example/cuda_matmul_prelude.cuh(59): error: name followed by "::" must be a class or namespace name
a_frag.x[t] = nvcuda::wmma::__float_to_tf32(a_frag.x[t]);
/slang-python/examples/inline-mlp-example/cuda_matmul_prelude.cuh(63): error: name followed by "::" must be a class or namespace name
b_frag.x[t] = nvcuda::wmma::__float_to_tf32(b_frag.x[t]);
/slang-python/examples/inline-mlp-example/cuda_matmul_prelude.cuh(66): error: name followed by "::" must be a class or namespace name
nvcuda::wmma::mma_sync(acc_frag, a_frag, b_frag, acc_frag);
/slang-python/examples/inline-mlp-example/cuda_matmul_prelude.cuh(70): error: name followed by "::" must be a class or namespace name [repeated 2x]
nvcuda::wmma::store_matrix_sync(out + tj + ti * O, acc_frag, O, nvcuda::wmma::mem_row_major);
19 errors detected in the compilation of "/slang-python/examples/inline-mlp-example/.slangpy_cache/image-model/1426d7b0fdc875a7/image-model_cuda.cu".
ninja: build stopped: subcommand failed.
Hello, I was wondering if the gradient_output in the BRDF example perhaps needs a zero_() in the learning inner loop (i.e. before calling m.brdf_loss.bwd)? Similar to calling optimizer.zero_grad() in PyTorch. Otherwise, wouldn't the code accumulate the gradient with each sample, while also immediately applying it? Apologies if this is intentional, a zero-fill is already implied somewhere, or I misunderstood :)
Thanks!
bert
PS Sorry, I don't have Jupyter set up to test a merge request.
for i in range(10000):
    L = random_hemi_vector()
    V = (0.0, 0.0, 1.0)
    input_params = (*L, *V)
    loss_output = torch.zeros((original_shape[0], original_shape[1], 1)).cuda()
    output_grad = torch.ones_like(loss_output).cuda()
    m.brdf(input=full_res_brdf,
           output=lighting_from_full_res_brdf,
           input_params=input_params).launchRaw(blockSize=block_size, gridSize=grid_size)
    gradient_output.zero_()  # ++++++++++++++++++++++
    m.brdf_loss.bwd(input=(half_res_brdf, gradient_output),
                    output=(loss_output, output_grad),
                    reference=lighting_from_full_res_brdf,
                    input_params=input_params).launchRaw(blockSize=block_size, gridSize=grid_size)
    # Clip gradients and suppress NaNs.
    gradient_output = torch.nan_to_num(gradient_output, 0.0)
    gradient_output = torch.clamp(gradient_output, -1.0, 1.0)
    half_res_brdf = torch.clip(half_res_brdf - 0.001 * gradient_output, 0.0001, 1.0)
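The accumulation concern can be seen in a scalar sketch: reusing a gradient buffer without zeroing it turns plain gradient descent into an undamped accumulation that never settles (illustrative code only, not the example's actual kernels):

```python
def descend(steps, lr, zero_each_step):
    # Minimize f(w) = (w - 2)**2, accumulating the gradient into a reused
    # buffer the way a .bwd() call accumulates into gradient_output.
    w, grad = 0.0, 0.0
    for _ in range(steps):
        if zero_each_step:
            grad = 0.0           # analogue of gradient_output.zero_()
        grad += 2.0 * (w - 2.0)  # the backward pass adds into the buffer
        w -= lr * grad
    return w
```

With zeroing, w converges to 2; without it, stale gradients keep being re-applied and the iterates overshoot instead of converging.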
Hi, here are my code and script; I can't pass gradients when using DiffTensorView in a struct.
struct Reservoir
{
    DiffTensorView<float> input0;
    DiffTensorView<float> input1;
};

[AutoPyBindCUDA]
[CUDAKernel]
[Differentiable]
void square(Reservoir input, DiffTensorView output)
{
    uint3 dispatchIdx = cudaThreadIdx() + cudaBlockIdx() * cudaBlockDim();
    if (dispatchIdx.x >= output.size(0) || dispatchIdx.y >= output.size(1)) return;
    output[dispatchIdx.x, dispatchIdx.y] = input.input0[dispatchIdx.x, dispatchIdx.y] * input.input0[dispatchIdx.x, dispatchIdx.y];
    output[dispatchIdx.x, dispatchIdx.y] += input.input1[dispatchIdx.x, dispatchIdx.y] * input.input1[dispatchIdx.x, dispatchIdx.y];
}
import torch
import slangpy

m = slangpy.loadModule("D:/codes/python/slang_test/test_struct/square.slang")

class MySquareFunc(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input):
        input['input0'] = input['input0'].contiguous()
        input['input1'] = input['input1'].contiguous()
        output = torch.zeros_like(input['input0'])
        output = output.contiguous()
        kernel_with_args = m.square(input=input, output=output)
        kernel_with_args.launchRaw(
            blockSize=(32, 32, 1),
            gridSize=((output.shape[0] + 31) // 32, (output.shape[1] + 31) // 32, 1))
        ctx.save_for_backward(input, output)
        return output

    @staticmethod
    def backward(ctx, grad_output):
        (input, output) = ctx.saved_tensors
        input_grad = torch.zeros_like(input)
        grad_output = grad_output.contiguous()
        # Note: When using DiffTensorView, grad_output gets 'consumed' during the reverse-mode.
        # If grad_output may be reused, consider calling grad_output = grad_output.clone()
        kernel_with_args = m.square.bwd(input=(input, input_grad), output=(output, grad_output))
        kernel_with_args.launchRaw(
            blockSize=(32, 32, 1),
            gridSize=((output.shape[0] + 31) // 32, (output.shape[1] + 31) // 32, 1))
        return input_grad

x = torch.tensor([[3.0, 4.0], [0.0, 1.0]], requires_grad=True, device='cuda')
y = torch.tensor([[5.0, 6.0], [7.0, 0.0]], requires_grad=True, device='cuda')
#print(f"X = {x}")
input = {'input0': x, 'input1': y}
y_pred = MySquareFunc.apply(input)
loss = y_pred.sum()
loss.backward()
print(f"dX = {x.grad.cpu()}")
print(f"dy = {y.grad.cpu()}")
The error "element 0 of tensors does not require grad and does not have a grad_fn" occurs at loss.backward(). I tried adding IDifferentiable to the struct, but then it doesn't compile.