
slang-python's Issues

[Feature Request]

Would it be useful for the Slang framework to support computing determinants of matrices larger than 4x4? In the current code base the determinant function is specialized only up to 4x4 matrices.

How do I use the determinant function from Slang in a slangpy kernel?

// Compute determinant function
[AutoPyBindCUDA]
[CUDAKernel]
[Differentiable]
void compute_determinant(DiffTensorView mat, DiffTensorView output) {
    matrix<float, N, N> mat_s;
    float det;
    [MaxIters(N)]
    for (int i = 0; i < N; i++) {
        [MaxIters(N)]
        for (int j = 0; j < N; j++) {
            mat_s[i][j] = mat[i, j];
        }
    }
    output[0] = determinant(mat_s);
}

N is correctly defined as a static compile-time constant.

I get the following error message when trying to load this function. Essentially, the Slang determinant function is not accessible in the generated .cu file. Is there a way around this, other than writing my own determinant function in the .slang file? (A sketch of one such fallback follows the error log below.)

RuntimeError: Error building extension '_slangpy_bezier_compute_44136fa355b3678a' (raised from slangpy.loadModule('bezier_compute.slang') via torch.utils.cpp_extension). Key excerpt from the nvcc output:

FAILED: bezier_compute_cuda.cuda.o
/home/ubuntu/slang-python/examples/bezier2d/.slangpy_cache/bezier_compute/44136fa355b3678a/bezier_compute_cuda.cu(2372): warning #550-D: variable "oldVal_0" was set but never used
/home/ubuntu/slang-python/examples/bezier2d/.slangpy_cache/bezier_compute/44136fa355b3678a/bezier_compute_cuda.cu(2379): warning #550-D: variable "oldVal_1" was set but never used
/home/ubuntu/slang-python/examples/bezier2d/.slangpy_cache/bezier_compute/44136fa355b3678a/bezier_compute_cuda.cu(3036): error: identifier "determinant" is undefined
/home/ubuntu/slang-python/examples/bezier2d/.slangpy_cache/bezier_compute/44136fa355b3678a/bezier_compute_cuda.cu(3517): error: identifier "determinant" is undefined
2 errors detected in the compilation of "/home/ubuntu/slang-python/examples/bezier2d/.slangpy_cache/bezier_compute/44136fa355b3678a/bezier_compute_cuda.cu".
ninja: build stopped: subcommand failed.
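
For reference, one possible fallback until a generic determinant exists (a rough sketch only, reusing the constructs already shown above and not verified with slangpy) is a hand-rolled NxN determinant in the .slang file via Gaussian elimination:

// Sketch: NxN determinant via Gaussian elimination without pivoting.
// Assumes non-zero pivots; N is the same compile-time constant as above.
[Differentiable]
float determinant_nxn(matrix<float, N, N> m)
{
    matrix<float, N, N> a = m;
    float det = 1.0;
    [MaxIters(N)]
    for (int k = 0; k < N; k++)
    {
        float pivot = a[k][k];
        det = det * pivot;
        [MaxIters(N)]
        for (int i = k + 1; i < N; i++)
        {
            float factor = a[i][k] / pivot;
            [MaxIters(N)]
            for (int j = k; j < N; j++)
                a[i][j] = a[i][j] - factor * a[k][j];
        }
    }
    // det is the product of the pivots of the upper-triangular factor.
    return det;
}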

Slang fails to find python312.lib on Windows

Windows 11
Python 3.12.2 with installer
Slangpy 1.1.22

When attempting to run the example square.slang kernel, I get a compilation error:

LINK : fatal error LNK1104: cannot open file 'python312.lib'
ninja: build stopped: subcommand failed.

How can I point your build system towards my python312.lib at ~\AppData\Local\Programs\Python\Python312\libs? I've set a LIB environment variable but that doesn't seem to remedy it.
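
One workaround that sometimes helps (a hedged sketch, not verified against slangpy's build pipeline) is to prepend the CPython libs directory to the LIB environment variable inside the same Python process, before loadModule triggers the torch extension build; the path below is the one from this report:

import os

# Make the MSVC linker aware of python312.lib before the extension build runs.
python_libs = os.path.expanduser(r"~\AppData\Local\Programs\Python\Python312\libs")
os.environ["LIB"] = python_libs + os.pathsep + os.environ.get("LIB", "")

import slangpy
m = slangpy.loadModule("square.slang")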

Backward derivative too slow / incorrect

Hi, I wrote some code in Slang which seems to execute the forward pass correctly. But when I execute the backward pass, GPU utilization goes to 100% and the system hangs, especially when I try to print the derivative value. I am using slangpy as the Python interface to call the Slang kernel file (shown below).

static const int N = NUM_CTRL_PTS;
static const int c = DIM;
static const int N1 = c * (N - 1);

__generic<let C : int> 
struct MatrixG : IDifferentiable
{
    float vals[C];
}

int nCi(int n, int i) {
    if (i > n) return 0;
    if (i == 0 || i == n) return 1;
    if (i > n - i) i = n - i;
    
    int result = 1;
    for (int k = 1; k <= i; ++k) {
        result *= n - k + 1;
        result /= k;
    }
    
    return result;
}

int fact(int n) {
    int result = 1;
    for (int i = 1; i <= n; ++i) {
        result *= i;
    }
    return result;
}

[CudaDeviceExport]
[Differentiable]
matrix<float, N, c> compute_coeffs_device(DiffTensorView control_pts) {
    // Compute the coefficients a_i for t^i, for bezier polynomial \sum a_i . t^i
    matrix<float, N, c> coeffs;
    [MaxIters(c)]
    for (int k = 0; k < c; k++) {
        [MaxIters(N)]
        for (int j = 0; j < N; j++) {
            int nCj = fact(N - 1) / fact(N - 1 - j); // degree of the polynomial is N-1
            float sum = 0;
            [MaxIters(N)]
            for (int i = 0; i < N; i++) {
                if (i <= j) {
                    sum += pow(-1, i + j) * control_pts[i, k] / (fact(i) * fact(j - i));
                }
            }
            coeffs[j][k] = nCj * sum;
        }
    }
    return coeffs;
}

[CudaDeviceExport]
[Differentiable]
MatrixG<N1 * N1> assemble_matrix_sdf(matrix<float, N, c> coeffs) {
    // Function to create the matrix whose determinant is to be evaluated to get the sdf
    // coeffs: Tensor (N,c)
    MatrixG<N1 * N1> mat;

    // Initializing
    [MaxIters(N1 * N1)]
    for (int i = 0; i < N1 * N1; i++)
        mat.vals[i] = 0.0;

    [MaxIters(N - 1)]
    for (int i = 0; i < N - 1; i++)
        [MaxIters(N - 1)]
        for (int j = 0; j < N; j++)
            [MaxIters(c)]
            for (int k = 0; k < c; k++)
            {
                mat.vals[(k * (N - 1) + i) * N1 + j + i] = coeffs[j][k];
            }
    return mat;
}

[AutoPyBindCUDA]
[CUDAKernel]
[Differentiable]
void bezier2D(DiffTensorView t, DiffTensorView control_pts, DiffTensorView output)
{
    // t (tensor Mx1) : indices between 0-1 to traverse across the Bezier curve
    // control_pts (Nx2): N - Degree of Bezier Curve 2D
    // Get the 'global' index of this thread.
    uint3 tIdx = cudaThreadIdx() + cudaBlockIdx() * cudaBlockDim();

    // If the thread index is beyond the input size, exit early.
    if (tIdx.x > t.size(0))
        return;
    [MaxIters(N - 1)]
    for (int i = 0; i <= N - 1; i++)
    {
        output[tIdx.x, 0] = output[tIdx.x, 0] + nCi(N - 1, i) * pow((1 - t[tIdx.x]), (N - 1 - i)) * pow(t[tIdx.x], i) * control_pts[i, 0];
        output[tIdx.x, 1] = output[tIdx.x, 1] + nCi(N - 1, i) * pow((1 - t[tIdx.x]), (N - 1 - i)) * pow(t[tIdx.x], i) * control_pts[i, 1];
    }
}


[AutoPyBindCUDA]
[CudaKernel]
[Differentiable]
void compute_coeffs(DiffTensorView control_pts, DiffTensorView output) {
    // Compute the coefficients a_i for t^i, for bezier polynomial \sum a_i . t^i
    matrix<float, N, c> coeffs = compute_coeffs_device(control_pts);
    for (int i = 0; i < N; i++)
        for (int j = 0; j < c; j++)
            output[i, j] = coeffs[i][j];
}


[AutoPyBindCUDA]
[CUDAKernel]
[Differentiable]
void bezier2DSDF(DiffTensorView xy, DiffTensorView control_pts, DiffTensorView output) {
    // xy - M,c
    // coeffs - N,c
    // output - M,N1,N1 - matrix for each point at which SDF is to be evaluated
    // coeffs - ,c
    // Each thread computes the SDF value for a given xy coordinate from the determinant function above. Maybe change it up to be just differentiable, and not AutoPyBindCUDA
    uint3 tIdx = cudaThreadIdx() + cudaBlockIdx() * cudaBlockDim();
    matrix<float, N, c> coeffs = compute_coeffs_device(control_pts);
    
    int M = xy.size(0); // xy - shaped M,2
    if (tIdx.x > M) {
        return;
    }

    float coord[c];
    for (int i = 0; i < c; i++)
        coord[i] = xy[tIdx.x, i];

    for (int i = 0; i < c; i++)
        coeffs[0][i] -= coord[i];

    MatrixG<N1 * N1> mat;
    mat = assemble_matrix_sdf(coeffs);
    for (int i = 0; i < N1; i++)
        for (int j = 0; j < N1; j++)
            output[tIdx.x, i, j] = mat.vals[i*N1 + j];
}

In the Python code:

import torch
import slangpy

N = 6
c = 2 
m = slangpy.loadModule('bezier.slang', defines={"NUM_CTRL_PTS": N, "DIM":c})
num_pts = 1000
t = torch.linspace(0.0, 1, num_pts, dtype=torch.float).cuda()

control_pts = 1*torch.rand((N,2),dtype=torch.float).cuda()
control_pts[-1] = control_pts[0]

## Computing SDF for line function 
num_points = 1000 # for example, 100 points along each axis

# Generate evenly spaced points between 0 and 1
px = torch.linspace(0.2, 1.0, num_points)
py = torch.linspace(0.5, 1.0, num_points)

# Create the meshgrid
x, y = torch.meshgrid(px, py.flip(dims=[0]), indexing='ij')  # 'i
xy = torch.stack( [x,y], dim=-1 ).view(-1,2).cuda()
sdf_mats = torch.zeros(xy.shape[0], c*(N-1), c*(N-1)).cuda()

import time 

num_iters = 10
t = time.time()
for i in range(num_iters):
	m.bezier2DSDF(xy=xy, control_pts=control_pts, output=sdf_mats).launchRaw(blockSize=(1024, 1, 1), gridSize=(1024, 1, 1))
	sdf_N = torch.linalg.det(sdf_mats)
	sdf_N = torch.sign(sdf_N) * torch.sqrt(torch.abs(sdf_N))
print((time.time() - t)/num_iters)

## Computing Gradients
xy_grad = torch.zeros_like(xy).cuda()
control_pts_grad = torch.zeros_like(control_pts).cuda()
output_grad = torch.zeros_like(sdf_mats).cuda()
m.bezier2DSDF.bwd(xy=(xy, xy_grad), control_pts=(control_pts, control_pts_grad), output=(sdf_mats, output_grad)).launchRaw(blockSize=(1,1,1), gridSize=(1,1,1))

print(control_pts_grad) # This is where things hang. 

When I call the .bwd function, it takes an extremely long time and returns all 0s and NaNs, but the forward pass works correctly. My guess is that something is wrong with how the derivatives are being chained through the MatrixG data structure.
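
One way to narrow this down (a hedged diagnostic sketch that continues the script above; sizes and tolerances are illustrative) is to compare the Slang backward pass against a central finite-difference estimate of d(sum(output))/d(control_pts) on a small batch of points, independent of how MatrixG is differentiated:

# Small batch so a single bwd/forward launch is cheap.
eps = 1e-3
xy_small = xy[:8].contiguous()
out = torch.zeros(xy_small.shape[0], c * (N - 1), c * (N - 1)).cuda()
out_grad = torch.ones_like(out)
ctrl_grad = torch.zeros_like(control_pts)
xy_grad_small = torch.zeros_like(xy_small)

# Analytic VJP from the Slang backward pass (output cotangent = ones).
m.bezier2DSDF.bwd(xy=(xy_small, xy_grad_small),
                  control_pts=(control_pts, ctrl_grad),
                  output=(out, out_grad)).launchRaw(blockSize=(32, 1, 1), gridSize=(1, 1, 1))

# Central finite differences of sum(output) w.r.t. each control point coordinate.
fd = torch.zeros_like(control_pts)
for i in range(N):
    for j in range(c):
        for sign in (1.0, -1.0):
            pts = control_pts.clone()
            pts[i, j] += sign * eps
            o = torch.zeros(xy_small.shape[0], c * (N - 1), c * (N - 1)).cuda()
            m.bezier2DSDF(xy=xy_small, control_pts=pts, output=o).launchRaw(
                blockSize=(32, 1, 1), gridSize=(1, 1, 1))
            fd[i, j] += sign * o.sum() / (2 * eps)

print((ctrl_grad - fd).abs().max())  # should be small if the backward pass is correct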

Using [Differentiable] without DiffTensorView

Hello!

By mistake, I added [Differentiable] to the square sample but I didn't change TensorView to DiffTensorView.

The result was a square.fwd() that simply calculates square(). I was wondering if there is ever a valid use case for doing this, and if not, if perhaps the compiler could catch this mistake? Because otherwise, if I pass a single input and single output, also by mistake, then there are no errors, but the result is definitely not the forward gradient :)

In detail, this shader...

[AutoPyBindCUDA]
[CUDAKernel]
[Differentiable]
void square(TensorView<float> input, TensorView<float> output)
{
// ...
	output[dispatchIdx.x] = input[dispatchIdx.x] * input[dispatchIdx.x];
}

... compiles without error, but fwd is not the gradient:

__device__ void s_fwd_square_0(TensorView input_2, TensorView output_2)
{
    uint _S14 = (((blockIdx)) * ((blockDim)) + ((threadIdx))).x;
    uint _S15 = ((input_2).sizes[(0U)]);
    if(_S14 >= _S15)
    {
        return;
    }
    float _S16 = ((input_2).load<float>((_S14)));
    float _S17 = ((input_2).load<float>((_S14)));
    (output_2).store<float>((_S14), (_S16 * _S17));   // ?!
    return;
}

extern "C" {
__global__ void __kernel__square_fwd_diff(TensorView _S18, TensorView _S19)
{
    s_fwd_square_0(_S18, _S19);
    return;
}
}

So in this Python code, forward ends up the same as the squared numbers:

inputs = torch.tensor( (1, 2, 3, 4, 5, 6, 7, 8, 9, 10), dtype=torch.float).cuda()

squared = torch.zeros_like(inputs).cuda()
forward = torch.ones_like(inputs).cuda()

m.square(input=inputs, output=squared).launchRaw(blockSize=(32, 1, 1), gridSize=(64,1,1))
m.square.fwd(input=inputs, output=forward).launchRaw(blockSize=(32, 1, 1), gridSize=(64,1,1))

print(inputs)
print(squared)
print(forward)

Output:

tensor([ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.], device='cuda:0')
tensor([  1.,   4.,   9.,  16.,  25.,  36.,  49.,  64.,  81., 100.], device='cuda:0')
tensor([  1.,   4.,   9.,  16.,  25.,  36.,  49.,  64.,  81., 100.], device='cuda:0')    # oh noes

Everything is fine with the correct signature in the shader...

[AutoPyBindCUDA]
[CUDAKernel]
[Differentiable]
void square(DiffTensorView<float> input, DiffTensorView<float> output)
{
	uint3 dispatchIdx = cudaBlockIdx() * cudaBlockDim() + cudaThreadIdx();
	if (dispatchIdx.x >= input.size(0))
		return;

	output[dispatchIdx.x] = input[dispatchIdx.x] * input[dispatchIdx.x];
}

... and called as ...

inputs = torch.tensor( (1, 2, 3, 4, 5, 6, 7, 8, 9, 10), dtype=torch.float).cuda()
input_grad = torch.ones_like(inputs).cuda()

squared = torch.zeros_like(inputs).cuda()
forward = torch.ones_like(inputs).cuda()

m.square(input=inputs, output=squared).launchRaw(blockSize=(32, 1, 1), gridSize=(64,1,1))
m.square.fwd(input=(inputs, input_grad), output=(squared,forward)).launchRaw(blockSize=(32, 1, 1), gridSize=(64,1,1))
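
For reference, with input_grad set to ones the derivative slot of the output pair should then hold 2*x (the forward-mode derivative of x squared, stated from the math rather than a verified run):

print(forward)
# expected: tensor([ 2.,  4.,  6.,  8., 10., 12., 14., 16., 18., 20.], device='cuda:0')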

So I was wondering if this user error could be caught? Perhaps it's also useful to add a fwd example to the docs to make it very clear that, just like bwd, it expects a pair?

Thanks!
bert

(Edit: rewrote for clarity)

Compile cache frequently stale

Windows 11
Python 3.12.2 with installer
Slangpy 1.1.22

I am running into an issue where slangpy's compile cache frequently pulls in stale code, for example throwing warnings for code I deleted since the last run.

Could it be that the mechanism used to detect changes to the .slang source files is faulty?
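
As a stopgap (a hedged suggestion rather than a fix for the change-detection logic), deleting the per-module .slangpy_cache directory next to the .slang file forces a clean rebuild; the folder name is taken from the build paths shown in the logs elsewhere on this page:

import os, shutil
import slangpy

# Remove slangpy's on-disk cache for this module before (re)loading it.
cache_dir = os.path.join(os.path.dirname(os.path.abspath("square.slang")), ".slangpy_cache")
if os.path.isdir(cache_dir):
    shutil.rmtree(cache_dir)

m = slangpy.loadModule("square.slang", verbose=True)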

cuda_matmul_prelude.cuh

I am getting an error when running python mlp_image_fit.py (see below).

To reproduce, build the Docker image:

docker build -t slang-python .

with Dockerfile:

FROM nvcr.io/nvidia/pytorch:23.10-py3
WORKDIR /
RUN git clone https://github.com/shader-slang/slang-python.git
WORKDIR /slang-python
RUN apt-get update
RUN apt-get install -y python3.10-venv
COPY build-package.sh .
RUN bash ./build-package.sh
WORKDIR /slang-python/dist
RUN pip install *.whl
WORKDIR /slang-python/
RUN apt-get install -y ffmpeg 

and with this build-package.sh file (in the same directory):

wget https://github.com/shader-slang/slang/releases/download/v2023.4.9/slang-2023.4.9-linux-x86_64.zip
LINUX64ZIP=slang-2023.4.9-linux-x86_64.zip
mkdir -p ./tmp
echo "extracting $LINUX64ZIP"
unzip -n $LINUX64ZIP -d ./tmp
mkdir -p ./slangpy/bin/
cp ./tmp/bin/linux-x64/release/libslang.so ./slangpy/bin/libslang.so
cp ./tmp/bin/linux-x64/release/slangc ./slangpy/bin/slangc
chmod +x ./slangpy/bin/slangc
echo "content of ./slangpy/bin/:"
ls ./slangpy/bin/
rm $LINUX64ZIP
rm -rf ./tmp/
python -m pip install --upgrade pip
pip install build hatchling
python -m build

You will get the following error:

/slang-python/examples/inline-mlp-example/cuda_matmul_prelude.cuh(25): error: name followed by "::" must be a class or namespace name
     nvcuda::wmma::fragment<nvcuda::wmma::matrix_a, M, N, K, nvcuda::wmma::precision::tf32, nvcuda::wmma::row_major> a_frag;
     ^

/slang-python/examples/inline-mlp-example/cuda_matmul_prelude.cuh(25): error: name followed by "::" must be a class or namespace name
     nvcuda::wmma::fragment<nvcuda::wmma::matrix_a, M, N, K, nvcuda::wmma::precision::tf32, nvcuda::wmma::row_major> a_frag;
                                                             ^

/slang-python/examples/inline-mlp-example/cuda_matmul_prelude.cuh(25): error: name followed by "::" must be a class or namespace name
     nvcuda::wmma::fragment<nvcuda::wmma::matrix_a, M, N, K, nvcuda::wmma::precision::tf32, nvcuda::wmma::row_major> a_frag;
                                                                                            ^

/slang-python/examples/inline-mlp-example/cuda_matmul_prelude.cuh(25): error: identifier "a_frag" is undefined
     nvcuda::wmma::fragment<nvcuda::wmma::matrix_a, M, N, K, nvcuda::wmma::precision::tf32, nvcuda::wmma::row_major> a_frag;
                                                                                                                     ^

/slang-python/examples/inline-mlp-example/cuda_matmul_prelude.cuh(26): error: name followed by "::" must be a class or namespace name
     nvcuda::wmma::fragment<nvcuda::wmma::matrix_b, M, N, K, nvcuda::wmma::precision::tf32, nvcuda::wmma::col_major> b_frag;
     ^

/slang-python/examples/inline-mlp-example/cuda_matmul_prelude.cuh(26): error: name followed by "::" must be a class or namespace name
     nvcuda::wmma::fragment<nvcuda::wmma::matrix_b, M, N, K, nvcuda::wmma::precision::tf32, nvcuda::wmma::col_major> b_frag;
                                                             ^

/slang-python/examples/inline-mlp-example/cuda_matmul_prelude.cuh(26): error: name followed by "::" must be a class or namespace name
     nvcuda::wmma::fragment<nvcuda::wmma::matrix_b, M, N, K, nvcuda::wmma::precision::tf32, nvcuda::wmma::col_major> b_frag;
                                                                                            ^

/slang-python/examples/inline-mlp-example/cuda_matmul_prelude.cuh(26): error: identifier "b_frag" is undefined
     nvcuda::wmma::fragment<nvcuda::wmma::matrix_b, M, N, K, nvcuda::wmma::precision::tf32, nvcuda::wmma::col_major> b_frag;
                                                                                                                     ^

/slang-python/examples/inline-mlp-example/cuda_matmul_prelude.cuh(27): error: name followed by "::" must be a class or namespace name
     nvcuda::wmma::fragment<nvcuda::wmma::accumulator, M, N, K, float> acc_frag;
     ^

/slang-python/examples/inline-mlp-example/cuda_matmul_prelude.cuh(27): error: type name is not allowed
     nvcuda::wmma::fragment<nvcuda::wmma::accumulator, M, N, K, float> acc_frag;
                                                                ^

/slang-python/examples/inline-mlp-example/cuda_matmul_prelude.cuh(27): error: identifier "acc_frag" is undefined
     nvcuda::wmma::fragment<nvcuda::wmma::accumulator, M, N, K, float> acc_frag;
                                                                       ^

/slang-python/examples/inline-mlp-example/cuda_matmul_prelude.cuh(45): error: name followed by "::" must be a class or namespace name
           nvcuda::wmma::fill_fragment(acc_frag, 0.0f);
           ^

/slang-python/examples/inline-mlp-example/cuda_matmul_prelude.cuh(54): error: name followed by "::" must be a class or namespace name
              nvcuda::wmma::load_matrix_sync(a_frag, in + tk + ti * lda, lda);
              ^

/slang-python/examples/inline-mlp-example/cuda_matmul_prelude.cuh(55): error: name followed by "::" must be a class or namespace name
              nvcuda::wmma::load_matrix_sync(b_frag, wt + tk + tj * ldb, ldb);
              ^

/slang-python/examples/inline-mlp-example/cuda_matmul_prelude.cuh(59): error: name followed by "::" must be a class or namespace name
                 a_frag.x[t] = nvcuda::wmma::__float_to_tf32(a_frag.x[t]);
                               ^

/slang-python/examples/inline-mlp-example/cuda_matmul_prelude.cuh(63): error: name followed by "::" must be a class or namespace name
                 b_frag.x[t] = nvcuda::wmma::__float_to_tf32(b_frag.x[t]);
                               ^

/slang-python/examples/inline-mlp-example/cuda_matmul_prelude.cuh(66): error: name followed by "::" must be a class or namespace name
              nvcuda::wmma::mma_sync(acc_frag, a_frag, b_frag, acc_frag);
              ^

/slang-python/examples/inline-mlp-example/cuda_matmul_prelude.cuh(70): error: name followed by "::" must be a class or namespace name
           nvcuda::wmma::store_matrix_sync(out + tj + ti * O, acc_frag, O, nvcuda::wmma::mem_row_major);
           ^

/slang-python/examples/inline-mlp-example/cuda_matmul_prelude.cuh(70): error: name followed by "::" must be a class or namespace name
           nvcuda::wmma::store_matrix_sync(out + tj + ti * O, acc_frag, O, nvcuda::wmma::mem_row_major);
                                                                           ^

19 errors detected in the compilation of "/slang-python/examples/inline-mlp-example/.slangpy_cache/image-model/1426d7b0fdc875a7/image-model_cuda.cu".
ninja: build stopped: subcommand failed.
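
For reference, a hedged guess at the cause: the nvcuda::wmma types live in CUDA's <mma.h> header, and the tf32 fragments additionally require compiling for sm_80 or newer (tf32 wmma only supports the m16n16k8 shape). A minimal standalone file with the same construct can be compiled with nvcc to check the toolchain independently of slangpy:

// wmma_check.cu -- compile with: nvcc -arch=sm_80 -c wmma_check.cu
#include <mma.h>

__global__ void wmma_check()
{
    const int M = 16, N = 16, K = 8;   // the fragment shape required for tf32
    nvcuda::wmma::fragment<nvcuda::wmma::matrix_a, M, N, K,
                           nvcuda::wmma::precision::tf32,
                           nvcuda::wmma::row_major> a_frag;
    nvcuda::wmma::fill_fragment(a_frag, 0.0f);
}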

Accumulating gradient_output in the BRDF example

Hello, I was wondering if the gradient_output in the BRDF example perhaps needs a zero_() in the inner learning loop (i.e., before calling m.brdf_loss.bwd), similar to calling optimizer.zero_grad() in PyTorch?

Otherwise, wouldn't the code accumulate the gradient with each sample, while also immediately applying it?

Apologies if this is intentional, or a zero-fill is already implied somewhere, or I misunderstood :)

Thanks!
bert

PS Sorry, I don't have Jupyter set up to test a merge request.

for i in range(10000):
    L = random_hemi_vector()
    V = (0.0, 0.0, 1.0)
    input_params = (*L, *V)
    loss_output = torch.zeros((original_shape[0], original_shape[1], 1)).cuda()
    output_grad = torch.ones_like(loss_output).cuda()
    m.brdf(input=full_res_brdf,
           output=lighting_from_full_res_brdf,
           input_params=input_params).launchRaw(blockSize=block_size, gridSize=grid_size)
    
    gradient_output.zero_()    # ++++++++++++++++++++++
    m.brdf_loss.bwd(input=(half_res_brdf, gradient_output),
                    output=(loss_output, output_grad),
                    reference=lighting_from_full_res_brdf,
                    input_params=input_params).launchRaw(blockSize=block_size, gridSize=grid_size)
    # Clip gradients and prevent NaN propagation.
    gradient_output = torch.nan_to_num(gradient_output, 0.0)
    gradient_output = torch.clamp(gradient_output, -1.0, 1.0)
    half_res_brdf = torch.clip(half_res_brdf - 0.001 * gradient_output, 0.0001, 1.0)

How to use DiffTensorView<float> in a struct

Hi, here are my code and script (the Slang kernel followed by the Python script). I can't pass gradients when using DiffTensorView in a struct.

struct Reservoir
{
    DiffTensorView<float> input0;
    DiffTensorView<float> input1;
};
[AutoPyBindCUDA]
[CUDAKernel]
[Differentiable]
void square(Reservoir input, DiffTensorView output)
{
    uint3 dispatchIdx = cudaThreadIdx() + cudaBlockIdx() * cudaBlockDim();
    if (dispatchIdx.x >= output.size(0) || dispatchIdx.y >= output.size(1)) return;

    output[dispatchIdx.x, dispatchIdx.y] = input.input0[dispatchIdx.x, dispatchIdx.y] * input.input0[dispatchIdx.x, dispatchIdx.y];
    output[dispatchIdx.x, dispatchIdx.y] += input.input1[dispatchIdx.x, dispatchIdx.y] * input.input1[dispatchIdx.x, dispatchIdx.y];

}
import torch
import slangpy

m = slangpy.loadModule("D:/codes/python/slang_test/test_struct/square.slang")

class MySquareFunc(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input):

        input['input0'] = input['input0'].contiguous()
        input['input1'] = input['input1'].contiguous()
        output = torch.zeros_like(input['input0'])
        output = output.contiguous()

        kernel_with_args = m.square(input=input, output=output)
        kernel_with_args.launchRaw(
            blockSize=(32, 32, 1),
            gridSize=((output.shape[0] + 31) // 32, (output.shape[1] + 31) // 32, 1))

        ctx.save_for_backward(input, output)

        return output

    @staticmethod
    def backward(ctx, grad_output):
        (input, output) = ctx.saved_tensors

        input_grad = torch.zeros_like(input)
        grad_output = grad_output.contiguous()
        
        # Note: When using DiffTensorView, grad_output gets 'consumed' during the reverse-mode.
        # If grad_output may be reused, consider calling grad_output = grad_output.clone()
        #
        kernel_with_args = m.square.bwd(input=(input, input_grad), output=(output, grad_output))
        kernel_with_args.launchRaw(
            blockSize=(32, 32, 1),
            gridSize=((output.shape[0] + 31) // 32, (output.shape[1] + 31) // 32, 1))
        
        return input_grad
    
x = torch.tensor([[3.0, 4.0],[0.0, 1.0]], requires_grad=True, device='cuda')
y = torch.tensor([[5.0, 6.0],[7.0, 0.0]], requires_grad=True, device='cuda')
#print(f"X = {x}")
input = {'input0': x, 'input1': y}
y_pred = MySquareFunc.apply(input)
loss = y_pred.sum()
loss.backward()
print(f"dX = {x.grad.cpu()}")
print(f"dy = {y.grad.cpu()}")

The error "element 0 of tensors does not require grad and does not have a grad_fn" occurs at loss.backward(). I tried adding IDifferentiable to the struct, but then it doesn't compile correctly.
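
One possible workaround (a hedged sketch continuing from the loadModule call above; passing struct fields as a dict and the (primal, grad) dict pairing in bwd are assumptions about slangpy's binding, not verified) is to let the autograd.Function take the two tensors as separate arguments so autograd can track them and save_for_backward only ever receives tensors, assembling the struct dict only at the kernel call:

import torch

class MySquareFunc2(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input0, input1):
        input0, input1 = input0.contiguous(), input1.contiguous()
        output = torch.zeros_like(input0)
        m.square(input={'input0': input0, 'input1': input1}, output=output).launchRaw(
            blockSize=(32, 32, 1),
            gridSize=((output.shape[0] + 31) // 32, (output.shape[1] + 31) // 32, 1))
        ctx.save_for_backward(input0, input1, output)
        return output

    @staticmethod
    def backward(ctx, grad_output):
        input0, input1, output = ctx.saved_tensors
        grad0, grad1 = torch.zeros_like(input0), torch.zeros_like(input1)
        grad_output = grad_output.contiguous()
        m.square.bwd(input=({'input0': input0, 'input1': input1},
                            {'input0': grad0, 'input1': grad1}),
                     output=(output, grad_output)).launchRaw(
            blockSize=(32, 32, 1),
            gridSize=((output.shape[0] + 31) // 32, (output.shape[1] + 31) // 32, 1))
        # One gradient returned per forward argument (ctx excluded).
        return grad0, grad1

x = torch.tensor([[3.0, 4.0], [0.0, 1.0]], requires_grad=True, device='cuda')
y = torch.tensor([[5.0, 6.0], [7.0, 0.0]], requires_grad=True, device='cuda')
y_pred = MySquareFunc2.apply(x, y)
y_pred.sum().backward()
print(x.grad.cpu())
print(y.grad.cpu())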
