Thanks for the nice work! I hit the following issue when I run …

ashawkey,torch-ngp

Comments (9)

ashawkey commented on August 16, 2024

What's your GPU hardware architecture?
Currently the code uses atomicAdd for __half, which is only available on GPUs with architecture >= 70 (compute capability 7.0).
A temporary workaround is to comment out that function and its call site, and to make sure level_dim is even (a minimum architecture of 60 is still needed for __half2).
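To see which side of these thresholds a given GPU falls on, a small check like the one below may help. It is only a sketch: the helper names `supported_features` and `local_capability` are made up for illustration, the thresholds are the ones stated above (7.0 for `atomicAdd` on `__half`, 6.0 for `__half2`), and the PyTorch query is skipped if PyTorch or a CUDA device is not available.

```python
# Minimum compute capability (major, minor) for each feature, taken from the
# thresholds mentioned in this thread. The keys are illustrative labels only.
MIN_CAPABILITY = {
    "atomicAdd(__half2)": (6, 0),
    "atomicAdd(__half)": (7, 0),
}


def supported_features(capability):
    """Map each feature to True/False for a (major, minor) capability."""
    cap = tuple(capability)
    return {name: cap >= need for name, need in MIN_CAPABILITY.items()}


def local_capability():
    """Return the capability of GPU 0 via PyTorch, or None if unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)


if __name__ == "__main__":
    cap = local_capability()
    if cap is None:
        print("No CUDA device visible to PyTorch")
    else:
        print(cap, supported_features(cap))
```

For example, `supported_features((6, 1))` reports `__half2` atomics as available and `__half` atomics as unavailable, which matches the workaround described above.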

from torch-ngp.

aoliao12138 commented on August 16, 2024

I ran into a similar error.

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py", line 1717, in _run_ninja_build
    subprocess.run(
  File "/usr/lib/python3.8/subprocess.py", line 512, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "train_nerf.py", line 3, in <module>
    from nerf.network import NeRFNetwork
  File "/data/new_disk70/wangla/tmp/torch-ngp/nerf/network.py", line 9, in <module>
    from encoding import get_encoder
  File "/data/new_disk70/wangla/tmp/torch-ngp/encoding.py", line 6, in <module>
    from hashencoder import HashEncoder
  File "/data/new_disk70/wangla/tmp/torch-ngp/hashencoder/__init__.py", line 1, in <module>
    from .hashgrid import HashEncoder
  File "/data/new_disk70/wangla/tmp/torch-ngp/hashencoder/hashgrid.py", line 8, in <module>
    from .backend import _backend
  File "/data/new_disk70/wangla/tmp/torch-ngp/hashencoder/backend.py", line 6, in <module>
    _backend = load(name='_hash_encoder',
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py", line 1124, in load
    return _jit_compile(
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py", line 1337, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py", line 1449, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py", line 1733, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension '_hash_encoder': [1/3] :/usr/local/cuda-11.3/bin/nvcc  -DTORCH_EXTENSION_NAME=_hash_encoder -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /usr/local/lib/python3.8/dist-packages/torch/include -isystem /usr/local/lib/python3.8/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.8/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.8/dist-packages/torch/include/THC -isystem :/usr/local/cuda-11.3/include -isystem /usr/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -std=c++14 -c /data/new_disk70/wangla/tmp/torch-ngp/hashencoder/src/hashencoder.cu -o hashencoder.cuda.o 
FAILED: hashencoder.cuda.o 
:/usr/local/cuda-11.3/bin/nvcc  -DTORCH_EXTENSION_NAME=_hash_encoder -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /usr/local/lib/python3.8/dist-packages/torch/include -isystem /usr/local/lib/python3.8/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.8/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.8/dist-packages/torch/include/THC -isystem :/usr/local/cuda-11.3/include -isystem /usr/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -std=c++14 -c /data/new_disk70/wangla/tmp/torch-ngp/hashencoder/src/hashencoder.cu -o hashencoder.cuda.o 
/bin/sh: 1: :/usr/local/cuda-11.3/bin/nvcc: not found
[2/3] c++ -MMD -MF bindings.o.d -DTORCH_EXTENSION_NAME=_hash_encoder -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /usr/local/lib/python3.8/dist-packages/torch/include -isystem /usr/local/lib/python3.8/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.8/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.8/dist-packages/torch/include/THC -isystem :/usr/local/cuda-11.3/include -isystem /usr/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -c /data/new_disk70/wangla/tmp/torch-ngp/hashencoder/src/bindings.cpp -o bindings.o 
ninja: build stopped: subcommand failed.

Even if I comment out those two lines, the same error still occurs.

More info:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Mar_21_19:15:46_PDT_2021
Cuda compilation tools, release 11.3, V11.3.58
Build cuda_11.3.r11.3/compiler.29745058_0

Python 3.8.5 (default, Jul 28 2020, 12:59:40) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version__
'1.10.2+cu113'

I am using an RTX 3090.

ashawkey commented on August 16, 2024

@aoliao12138 The error message says /bin/sh: 1: :/usr/local/cuda-11.3/bin/nvcc: not found. Have you added the CUDA bin directory to your PATH? (e.g., export PATH="/usr/local/cuda/bin:$PATH")
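The path in that message begins with a colon (`:/usr/local/cuda-11.3/bin/nvcc`), and the same prefix shows up in `-isystem :/usr/local/cuda-11.3/include`. One plausible reading, not confirmed in this thread, is that `CUDA_HOME` (or `CUDA_PATH`) was set with a stray leading `:`. The sketch below checks for that and for `nvcc` being on `PATH`; the function name `cuda_env_problems` is hypothetical, and it assumes a POSIX-style `:` separator.

```python
import os
import shutil


def cuda_env_problems(env):
    """List suspicious CUDA-related settings in an environment mapping."""
    problems = []
    # A value like ":/usr/local/cuda-11.3" would produce a broken nvcc path.
    for var in ("CUDA_HOME", "CUDA_PATH"):
        value = env.get(var)
        if value and (value.startswith(":") or value.endswith(":")):
            problems.append(f"{var}={value!r} has a stray ':'")
    # shutil.which searches the given PATH string for an executable nvcc.
    if shutil.which("nvcc", path=env.get("PATH", "")) is None:
        problems.append("nvcc is not on PATH")
    return problems


if __name__ == "__main__":
    for problem in cuda_env_problems(os.environ):
        print(problem)
```

If the script prints nothing, neither of the two problems was detected in the current environment.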

wangjksjtu commented on August 16, 2024

@ashawkey Thank you for the prompt reply!
My GPU is a GTX 1080 Ti, so the architecture is 61. Commenting out that atomicAdd function seems to work for me. However, the following errors now appear when compiling the fully fused network:

  File "train_nerf.py", line 4, in <module>
    from nerf.network_ff import NeRFNetwork as NeRFNetwork_FF
  File "/home/wangjk/programs/torch-ngp/nerf/network_ff.py", line 10, in <module>
    from ffmlp import FFMLP
  File "/home/wangjk/programs/torch-ngp/ffmlp/__init__.py", line 1, in <module>
    from .ffmlp import FFMLP
  File "/home/wangjk/programs/torch-ngp/ffmlp/ffmlp.py", line 10, in <module>
    from .backend import _backend
  File "/home/wangjk/programs/torch-ngp/ffmlp/backend.py", line 16, in <module>
    sources=[os.path.join(_src_path, 'src', f) for f in [
  File "/home/wangjk/anaconda3/envs/torch-ngp/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1136, in load
    keep_intermediates=keep_intermediates)
  File "/home/wangjk/anaconda3/envs/torch-ngp/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1347, in _jit_compile
    is_standalone=is_standalone)
  File "/home/wangjk/anaconda3/envs/torch-ngp/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1452, in _write_ninja_file_and_build_library
    error_prefix=f"Error building extension '{name}'")
  File "/home/wangjk/anaconda3/envs/torch-ngp/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1733, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension '_ffmlp': [1/2] /home/wangjk/anaconda3/envs/torch-ngp/bin/nvcc  -DTORCH_EXTENSION_NAME=_ffmlp -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/wangjk/programs/torch-ngp/ffmlp/include -isystem /home/wangjk/anaconda3/envs/torch-ngp/lib/python3.7/site-packages/torch/include -isystem /home/wangjk/anaconda3/envs/torch-ngp/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /home/wangjk/anaconda3/envs/torch-ngp/lib/python3.7/site-packages/torch/include/TH -isystem /home/wangjk/anaconda3/envs/torch-ngp/lib/python3.7/site-packages/torch/include/THC -isystem /home/wangjk/anaconda3/envs/torch-ngp/include -isystem /home/wangjk/anaconda3/envs/torch-ngp/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_61,code=compute_61 -gencode=arch=compute_61,code=sm_61 --compiler-options '-fPIC' -O3 -Xcompiler=-mf16c -Xcompiler=-Wno-float-conversion -Xcompiler=-fno-strict-aliasing --extended-lambda --expt-relaxed-constexpr -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -std=c++14 -c /home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu -o ffmlp.cuda.o 
FAILED: ffmlp.cuda.o 
/home/wangjk/anaconda3/envs/torch-ngp/bin/nvcc  -DTORCH_EXTENSION_NAME=_ffmlp -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/wangjk/programs/torch-ngp/ffmlp/include -isystem /home/wangjk/anaconda3/envs/torch-ngp/lib/python3.7/site-packages/torch/include -isystem /home/wangjk/anaconda3/envs/torch-ngp/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /home/wangjk/anaconda3/envs/torch-ngp/lib/python3.7/site-packages/torch/include/TH -isystem /home/wangjk/anaconda3/envs/torch-ngp/lib/python3.7/site-packages/torch/include/THC -isystem /home/wangjk/anaconda3/envs/torch-ngp/include -isystem /home/wangjk/anaconda3/envs/torch-ngp/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_61,code=compute_61 -gencode=arch=compute_61,code=sm_61 --compiler-options '-fPIC' -O3 -Xcompiler=-mf16c -Xcompiler=-Wno-float-conversion -Xcompiler=-fno-strict-aliasing --extended-lambda --expt-relaxed-constexpr -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -std=c++14 -c /home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu -o ffmlp.cuda.o 
/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(243): error: explicit type is missing ("int" assumed)

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(243): error: expected a ")"

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(344): error: explicit type is missing ("int" assumed)

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(344): error: expected a ")"

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(577): error: name followed by "::" must be a class or namespace name

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(396): error: identifier "output_layout" is undefined
          detected during instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(396): error: name followed by "::" must be a class or namespace name
          detected during instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(60): error: name must be a namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(64): error: identifier "wmma" is undefined
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here
....
....
85 errors detected in the compilation of "/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu".
ninja: build stopped: subcommand failed.
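The `identifier "wmma" is undefined` and `name must be a namespace name` errors are most likely a consequence of the `compute_61` target in the nvcc command above: the tensor-core WMMA API (`nvcuda::wmma`) is only declared for compute capability 7.0 and newer. A rough way to spot this in a build log is sketched below; the helper names are made up for illustration.

```python
import re

# Tensor-core WMMA (nvcuda::wmma) needs compute capability 7.0 or newer.
WMMA_MIN_ARCH = 70


def gencode_archs(nvcc_command):
    """Extract the virtual architectures (e.g. 61, 86) from -gencode flags."""
    found = re.findall(r"arch=compute_(\d+)", nvcc_command)
    return sorted({int(arch) for arch in found})


def archs_without_wmma(nvcc_command):
    """Return the targeted architectures that are too old for WMMA."""
    return [arch for arch in gencode_archs(nvcc_command) if arch < WMMA_MIN_ARCH]


if __name__ == "__main__":
    command = (
        "nvcc -gencode=arch=compute_61,code=compute_61 "
        "-gencode=arch=compute_61,code=sm_61 -c ffmlp.cu -o ffmlp.cuda.o"
    )
    print(archs_without_wmma(command))
```

A non-empty result means at least one target architecture in the command cannot compile code that relies on WMMA.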

Full log here:

Traceback (most recent call last):
  File "train_nerf.py", line 4, in <module>
    from nerf.network_ff import NeRFNetwork as NeRFNetwork_FF
  File "/home/wangjk/programs/torch-ngp/nerf/network_ff.py", line 10, in <module>
    from ffmlp import FFMLP
  File "/home/wangjk/programs/torch-ngp/ffmlp/__init__.py", line 1, in <module>
    from .ffmlp import FFMLP
  File "/home/wangjk/programs/torch-ngp/ffmlp/ffmlp.py", line 10, in <module>
    from .backend import _backend
  File "/home/wangjk/programs/torch-ngp/ffmlp/backend.py", line 16, in <module>
    sources=[os.path.join(_src_path, 'src', f) for f in [
  File "/home/wangjk/anaconda3/envs/torch-ngp/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1136, in load
    keep_intermediates=keep_intermediates)
  File "/home/wangjk/anaconda3/envs/torch-ngp/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1347, in _jit_compile
    is_standalone=is_standalone)
  File "/home/wangjk/anaconda3/envs/torch-ngp/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1452, in _write_ninja_file_and_build_library
    error_prefix=f"Error building extension '{name}'")
  File "/home/wangjk/anaconda3/envs/torch-ngp/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1733, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension '_ffmlp': [1/2] /home/wangjk/anaconda3/envs/torch-ngp/bin/nvcc  -DTORCH_EXTENSION_NAME=_ffmlp -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/wangjk/programs/torch-ngp/ffmlp/include -isystem /home/wangjk/anaconda3/envs/torch-ngp/lib/python3.7/site-packages/torch/include -isystem /home/wangjk/anaconda3/envs/torch-ngp/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /home/wangjk/anaconda3/envs/torch-ngp/lib/python3.7/site-packages/torch/include/TH -isystem /home/wangjk/anaconda3/envs/torch-ngp/lib/python3.7/site-packages/torch/include/THC -isystem /home/wangjk/anaconda3/envs/torch-ngp/include -isystem /home/wangjk/anaconda3/envs/torch-ngp/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_61,code=compute_61 -gencode=arch=compute_61,code=sm_61 --compiler-options '-fPIC' -O3 -Xcompiler=-mf16c -Xcompiler=-Wno-float-conversion -Xcompiler=-fno-strict-aliasing --extended-lambda --expt-relaxed-constexpr -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -std=c++14 -c /home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu -o ffmlp.cuda.o 
FAILED: ffmlp.cuda.o 
/home/wangjk/anaconda3/envs/torch-ngp/bin/nvcc  -DTORCH_EXTENSION_NAME=_ffmlp -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/wangjk/programs/torch-ngp/ffmlp/include -isystem /home/wangjk/anaconda3/envs/torch-ngp/lib/python3.7/site-packages/torch/include -isystem /home/wangjk/anaconda3/envs/torch-ngp/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /home/wangjk/anaconda3/envs/torch-ngp/lib/python3.7/site-packages/torch/include/TH -isystem /home/wangjk/anaconda3/envs/torch-ngp/lib/python3.7/site-packages/torch/include/THC -isystem /home/wangjk/anaconda3/envs/torch-ngp/include -isystem /home/wangjk/anaconda3/envs/torch-ngp/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_61,code=compute_61 -gencode=arch=compute_61,code=sm_61 --compiler-options '-fPIC' -O3 -Xcompiler=-mf16c -Xcompiler=-Wno-float-conversion -Xcompiler=-fno-strict-aliasing --extended-lambda --expt-relaxed-constexpr -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -std=c++14 -c /home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu -o ffmlp.cuda.o 
/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(243): error: explicit type is missing ("int" assumed)

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(243): error: expected a ")"

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(344): error: explicit type is missing ("int" assumed)

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(344): error: expected a ")"

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(577): error: name followed by "::" must be a class or namespace name

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(396): error: identifier "output_layout" is undefined
          detected during instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(396): error: name followed by "::" must be a class or namespace name
          detected during instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(60): error: name must be a namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(64): error: identifier "wmma" is undefined
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(64): error: too few arguments for alias template "std::conditional_t"
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(64): error: expected a ";"
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(67): error: name followed by "::" must be a class or namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(67): error: type name is not allowed
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(67): error: name followed by "::" must be a class or namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(67): error: identifier "act_frag" is undefined
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(68): error: name followed by "::" must be a class or namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(68): error: type name is not allowed
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(68): error: type name is not allowed
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(68): error: identifier "weights_frag" is undefined
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(69): error: name followed by "::" must be a class or namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(69): error: type name is not allowed
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(69): error: identifier "result_frag" is undefined
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(88): error: name followed by "::" must be a class or namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(90): error: name followed by "::" must be a class or namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(96): error: name followed by "::" must be a class or namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(101): error: name followed by "::" must be a class or namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(102): error: name followed by "::" must be a class or namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(108): error: name followed by "::" must be a class or namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(119): error: name followed by "::" must be a class or namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(119): error: name followed by "::" must be a class or namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(165): error: name must be a namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(168): error: name followed by "::" must be a class or namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(168): error: type name is not allowed
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(168): error: name followed by "::" must be a class or namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(168): error: identifier "act_frag" is undefined
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(169): error: name followed by "::" must be a class or namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(169): error: type name is not allowed
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(169): error: name followed by "::" must be a class or namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(169): error: identifier "weights_frag" is undefined
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(170): error: name followed by "::" must be a class or namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(170): error: type name is not allowed
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(170): error: identifier "result_frag" is undefined
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(213): error: name followed by "::" must be a class or namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(217): error: name followed by "::" must be a class or namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(218): error: name followed by "::" must be a class or namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(219): error: name followed by "::" must be a class or namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(229): error: name followed by "::" must be a class or namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(229): error: name followed by "::" must be a class or namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(252): error: name must be a namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(255): error: name followed by "::" must be a class or namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(255): error: type name is not allowed
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(255): error: name followed by "::" must be a class or namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(255): error: identifier "act_frag" is undefined
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(256): error: name followed by "::" must be a class or namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(256): error: type name is not allowed
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(256): error: name followed by "::" must be a class or namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(256): error: identifier "weights_frag" is undefined
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(257): error: name followed by "::" must be a class or namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(257): error: type name is not allowed
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(257): error: identifier "result_frag" is undefined
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(280): error: name followed by "::" must be a class or namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(284): error: name followed by "::" must be a class or namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(289): error: name followed by "::" must be a class or namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(290): error: name followed by "::" must be a class or namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(295): error: identifier "output_layout" is undefined
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(295): error: name followed by "::" must be a class or namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(296): error: name followed by "::" must be a class or namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(299): error: name followed by "::" must be a class or namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(616): error: name followed by "::" must be a class or namespace name

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(618): error: name followed by "::" must be a class or namespace name

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(618): error: expected an identifier

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(618): error: "threads" has already been declared in the current scope

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(618): error: "shmem_size" has already been declared in the current scope

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(618): error: expected an identifier

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(619): error: parameter "activation" is not a type name

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(620): error: parameter "grad" is not a type name

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(621): error: variable "weights_second" is not a type name

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(622): error: parameter "backward_buffer" is not a type name

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(623): error: parameter "forward_buffer" is not a type name

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(624): error: parameter "grad_inputs" is not a type name

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(625): error: variable "weights_first" is not a type name

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(626): error: parameter "B" is not a type name

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(627): error: parameter "output_dim" is not a type name

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(628): error: parameter "num_layers" is not a type name

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(628): error: expected a ")"

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(597): warning: variable "weights_first" was declared but never referenced

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(598): warning: variable "weights_second" was declared but never referenced

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(606): warning: variable "threads" was declared but never referenced
          detected during instantiation of "void ffmlp_backward_cuda<WIDTH>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, const __half *, __half *, __half *) [with WIDTH=16U]" 
(832): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(606): warning: variable "threads" was declared but never referenced
          detected during instantiation of "void ffmlp_backward_cuda<WIDTH>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, const __half *, __half *, __half *) [with WIDTH=32U]" 
(833): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(606): warning: variable "threads" was declared but never referenced
          detected during instantiation of "void ffmlp_backward_cuda<WIDTH>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, const __half *, __half *, __half *) [with WIDTH=64U]" 
(834): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(606): warning: variable "threads" was declared but never referenced
          detected during instantiation of "void ffmlp_backward_cuda<WIDTH>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, const __half *, __half *, __half *) [with WIDTH=128U]" 
(835): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(606): warning: variable "threads" was declared but never referenced
          detected during instantiation of "void ffmlp_backward_cuda<WIDTH>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, const __half *, __half *, __half *) [with WIDTH=256U]" 
(836): here

85 errors detected in the compilation of "/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu".
ninja: build stopped: subcommand failed.


ashawkey commented on August 16, 2024

@wangjksjtu ffmlp uses cutlass, which also requires architecture >= 70 (here). Maybe you could remove the import and its usage in the Python script to avoid using ffmlp, as it doesn't help very much now.
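As a rough illustration of the suggestion above, one could check the GPU's compute capability before deciding whether to import ffmlp at all. This is only a sketch: `MIN_FFMLP_ARCH`, `supports_ffmlp` and `detect_capability` are hypothetical names that do not exist in torch-ngp, and the (7, 0) threshold is simply the "architecture >= 70" figure quoted in this thread. The only real library call used is `torch.cuda.get_device_capability()`.

```python
# Minimal sketch: decide whether the ffmlp extension is worth importing.
# All names below are illustrative and are NOT part of torch-ngp.

# "architecture >= 70" from the comment above, written as (major, minor).
MIN_FFMLP_ARCH = (7, 0)


def supports_ffmlp(capability):
    """Return True if a (major, minor) compute capability meets the minimum."""
    return tuple(capability) >= MIN_FFMLP_ARCH


def detect_capability():
    """Return the current GPU's (major, minor) capability, or None without CUDA."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability()


if __name__ == "__main__":
    cap = detect_capability()
    if cap is not None and supports_ffmlp(cap):
        print(f"compute capability {cap}: ffmlp should be usable")
    else:
        print(f"compute capability {cap}: skip ffmlp and use a plain PyTorch MLP")
```

With a guard like this, a script could fall back to a regular PyTorch MLP on older cards instead of triggering the failing build at import time.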


wangjksjtu commented on August 16, 2024

Yeah, that is what I am doing now! However, I cannot obtain decent performance. Any thoughts? See issue #5.


ashawkey commented on August 16, 2024

@wangjksjtu thanks for spotting the bug, I have fixed it!


aoliao12138 commented on August 16, 2024

@ashawkey Thanks for your reply! I solved it.


ashawkey commented on August 16, 2024

Closed for now.


Compilation issue - RuntimeError: Error building extension '_hash_encoder' about torch-ngp HOT 9 CLOSED
