xuchen-ethz / fast-snarf
License: MIT License
Hello,
Nice work!
I am trying to integrate fast-snarf into another project that requires torch>=1.11, because I need some newer torch functionality. I see that you pinned the torch version to 1.10.0. When I use a higher version, filter.cu fails to build and load, with the following error messages:
filter_cuda = load(name='filter',
... sources=[f'{cuda_dir}/filter/filter.cpp',
... f'{cuda_dir}/filter/filter.cu'])
Traceback (most recent call last):
File "/opt/conda/envs/avatar/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1893, in _run_ninja_build
subprocess.run(
File "/opt/conda/envs/avatar/lib/python3.8/subprocess.py", line 516, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/conda/envs/avatar/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1284, in load
return _jit_compile(
File "/opt/conda/envs/avatar/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1509, in _jit_compile
_write_ninja_file_and_build_library(
File "/opt/conda/envs/avatar/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1624, in _write_ninja_file_and_build_library
_run_ninja_build(
File "/opt/conda/envs/avatar/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1909, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error building extension 'filter': [1/3] /usr/local/cuda-11.3/bin/nvcc -DTORCH_EXTENSION_NAME=filter -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /opt/conda/envs/avatar/lib/python3.8/site-packages/torch/include -isystem /opt/conda/envs/avatar/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/envs/avatar/lib/python3.8/site-packages/torch/include/TH -isystem /opt/conda/envs/avatar/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda-11.3/include -isystem /opt/conda/envs/avatar/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -std=c++17 -c root/avatar/lib/cuda/filter/filter.cu -o filter.cuda.o
FAILED: filter.cuda.o
/usr/local/cuda-11.3/bin/nvcc -DTORCH_EXTENSION_NAME=filter -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /opt/conda/envs/avatar/lib/python3.8/site-packages/torch/include -isystem /opt/conda/envs/avatar/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/envs/avatar/lib/python3.8/site-packages/torch/include/TH -isystem /opt/conda/envs/avatar/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda-11.3/include -isystem /opt/conda/envs/avatar/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -std=c++17 -c root/avatar/lib/cuda/filter/filter.cu -o filter.cuda.o
root/avatar/lib/cuda/filter/filter.cu(72): error: incomplete class type "at::Tensor" is not allowed
root/avatar/lib/cuda/filter/filter.cu(73): error: incomplete class type "at::Tensor" is not allowed
root/avatar/lib/cuda/filter/filter.cu(77): error: incomplete class type "at::Tensor" is not allowed
root/avatar/lib/cuda/filter/filter.cu(77): error: incomplete class type "at::Tensor" is not allowed
root/avatar/lib/cuda/filter/filter.cu(77): error: type name is not allowed
root/avatar/lib/cuda/filter/filter.cu(77): error: argument list for class template "at::RestrictPtrTraits" is missing
root/avatar/lib/cuda/filter/filter.cu(77): error: expected an expression
root/avatar/lib/cuda/filter/filter.cu(77): error: incomplete class type "at::Tensor" is not allowed
root/avatar/lib/cuda/filter/filter.cu(77): error: type name is not allowed
root/avatar/lib/cuda/filter/filter.cu(77): error: argument list for class template "at::RestrictPtrTraits" is missing
root/avatar/lib/cuda/filter/filter.cu(77): error: expected an expression
root/avatar/lib/cuda/filter/filter.cu(77): error: incomplete class type "at::Tensor" is not allowed
root/avatar/lib/cuda/filter/filter.cu(77): error: type name is not allowed
root/avatar/lib/cuda/filter/filter.cu(77): error: argument list for class template "at::RestrictPtrTraits" is missing
root/avatar/lib/cuda/filter/filter.cu(77): error: expected an expression
root/avatar/lib/cuda/filter/filter.cu(77): error: no instance of function template "filter" matches the argument list
argument types are: (int, <error-type>, int, <error-type>, <error-type>, int, <error-type>, <error-type>, int, <error-type>)
root/avatar/lib/cuda/filter/filter.cu(77): error: incomplete class type "at::Tensor" is not allowed
root/avatar/lib/cuda/filter/filter.cu(77): error: type name is not allowed
root/avatar/lib/cuda/filter/filter.cu(77): error: argument list for class template "at::RestrictPtrTraits" is missing
root/avatar/lib/cuda/filter/filter.cu(77): error: expected an expression
root/avatar/lib/cuda/filter/filter.cu(77): error: incomplete class type "at::Tensor" is not allowed
root/avatar/lib/cuda/filter/filter.cu(77): error: type name is not allowed
root/avatar/lib/cuda/filter/filter.cu(77): error: argument list for class template "at::RestrictPtrTraits" is missing
root/avatar/lib/cuda/filter/filter.cu(77): error: expected an expression
root/avatar/lib/cuda/filter/filter.cu(77): error: incomplete class type "at::Tensor" is not allowed
root/avatar/lib/cuda/filter/filter.cu(77): error: type name is not allowed
root/avatar/lib/cuda/filter/filter.cu(77): error: argument list for class template "at::RestrictPtrTraits" is missing
root/avatar/lib/cuda/filter/filter.cu(77): error: expected an expression
root/avatar/lib/cuda/filter/filter.cu(77): error: no instance of function template "filter" matches the argument list
argument types are: (int, <error-type>, int, <error-type>, <error-type>, int, <error-type>, <error-type>, int, <error-type>)
root/avatar/lib/cuda/filter/filter.cu(77): error: incomplete class type "at::Tensor" is not allowed
root/avatar/lib/cuda/filter/filter.cu(77): error: type name is not allowed
root/avatar/lib/cuda/filter/filter.cu(77): error: argument list for class template "at::RestrictPtrTraits" is missing
root/avatar/lib/cuda/filter/filter.cu(77): error: expected an expression
root/avatar/lib/cuda/filter/filter.cu(77): error: incomplete class type "at::Tensor" is not allowed
root/avatar/lib/cuda/filter/filter.cu(77): error: type name is not allowed
root/avatar/lib/cuda/filter/filter.cu(77): error: argument list for class template "at::RestrictPtrTraits" is missing
root/avatar/lib/cuda/filter/filter.cu(77): error: expected an expression
root/avatar/lib/cuda/filter/filter.cu(77): error: incomplete class type "at::Tensor" is not allowed
root/avatar/lib/cuda/filter/filter.cu(77): error: type name is not allowed
root/avatar/lib/cuda/filter/filter.cu(77): error: argument list for class template "at::RestrictPtrTraits" is missing
root/avatar/lib/cuda/filter/filter.cu(77): error: expected an expression
root/avatar/lib/cuda/filter/filter.cu(77): error: no instance of function template "filter" matches the argument list
argument types are: (int, <error-type>, int, <error-type>, <error-type>, int, <error-type>, <error-type>, int, <error-type>)
42 errors detected in the compilation of "root/avatar/lib/cuda/filter/filter.cu".
[2/3] c++ -MMD -MF filter.o.d -DTORCH_EXTENSION_NAME=filter -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /opt/conda/envs/avatar/lib/python3.8/site-packages/torch/include -isystem /opt/conda/envs/avatar/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/envs/avatar/lib/python3.8/site-packages/torch/include/TH -isystem /opt/conda/envs/avatar/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda-11.3/include -isystem /opt/conda/envs/avatar/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -c root/avatar/lib/cuda/filter/filter.cpp -o filter.o
ninja: build stopped: subcommand failed.
Do you know if anything in the CUDA code needs to be modified to make it compatible with torch 1.11.0?
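The log above reports 42 errors, but they all come from three source lines of filter.cu (72, 73 and 77) and repeat a handful of messages. In C++, "incomplete class type" generally means that only a forward declaration of the type was visible at that point, so one thing to check (an untested assumption, not a confirmed fix) is whether filter.cu includes a header that fully defines at::Tensor, such as torch/extension.h. The sketch below only condenses a log of this shape into distinct (line, message) pairs to make the pattern easier to read; the function name `summarize_nvcc_errors` and the shortened sample log are illustrative, not part of fast-snarf.

```python
import re
from collections import Counter

# Matches nvcc diagnostics such as:
#   path/to/filter.cu(77): error: type name is not allowed
NVCC_ERROR = re.compile(r"^(?P<file>.+?)\((?P<line>\d+)\): error: (?P<msg>.+)$")


def summarize_nvcc_errors(log_text):
    """Count identical (source line, message) pairs in an nvcc build log."""
    counts = Counter()
    for raw in log_text.splitlines():
        match = NVCC_ERROR.match(raw.strip())
        if match:
            counts[(int(match["line"]), match["msg"])] += 1
    # Sort by source line, then by message, for a stable report.
    return sorted(counts.items())


# Shortened excerpt of the log above, used only as sample input.
sample_log = """\
root/avatar/lib/cuda/filter/filter.cu(72): error: incomplete class type "at::Tensor" is not allowed
root/avatar/lib/cuda/filter/filter.cu(73): error: incomplete class type "at::Tensor" is not allowed
root/avatar/lib/cuda/filter/filter.cu(77): error: incomplete class type "at::Tensor" is not allowed
root/avatar/lib/cuda/filter/filter.cu(77): error: type name is not allowed
root/avatar/lib/cuda/filter/filter.cu(77): error: type name is not allowed
"""

for (line_no, message), n in summarize_nvcc_errors(sample_log):
    print(f"line {line_no}: {message} (x{n})")
```

Since the later errors on line 77 ("type name is not allowed", "expected an expression", "no instance of function template") follow from the first one, it is usually enough to resolve the earliest distinct message and rebuild.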
Hi @xuchen-ethz!
Thanks so much for releasing the code!
I quickly tested this using the command python train.py subject=50002 and saw a really great speed-up!
Unfortunately, the loss did not converge to a reasonable value. Do you have any idea what could have gone wrong?
Appreciate your help!
Dr. @xuchen-ethz
Do you plan to open-source the applications mentioned in "More application (e.g. learning from images) will be announced later."?
Looking forward to the progress.
Hi, thanks for your great work.
I am trying to integrate this module into my own project. However, when I tried to use multiple GPUs for training, the code reported the following error:
precompute_cuda.precompute(self.lbs_voxel_final, tfs, voxel_d, voxel_J, self.offset, self.scale)
RuntimeError: CUDA error: an illegal memory access was encountered
I also checked train.py in this repo and found that it reports an error as well when I try to use gpu > 2:
RuntimeError: Cowardly refusing to serialize non-leaf tensor which requires_grad, since autograd does not support crossing process boundaries. If you just want to transfer the data, call detach() on the tensor before serializing (e.g., putting it on the queue).
Has anyone run into this error before?
Please help me. I have been having issues running it. I have tried different operating systems, CUDA versions, etc., but to no avail.
ImportError: Encountered error:
/.../.../.cache/torch_extensions/py38_cu116/filter/filter.so: cannot open shared object file: No such file or directory when loading module 'lib.model.fast_snarf.ForwardDeformer'
The contents of the folder are:
test@test:~/.cache/torch_extensions/py38_cu116/filter$ ls
build.ninja filter.o
The source of the error is below; I believe it is where the .cu kernels are compiled:
/.../.../anaconda3/envs/fast_snarf/lib/python3.8/site-packages/hydra/_internal/utils.py
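The folder listing above contains build.ninja and filter.o but no filter.so, which suggests the JIT build stopped before the shared library was linked, so the import then fails on the missing file. A minimal sketch for detecting and clearing such an incomplete build directory follows; the helper names `is_build_complete` and `clear_stale_build` are hypothetical, and the demo runs on a temporary directory that mimics the listing rather than on a real cache.

```python
import shutil
import tempfile
from pathlib import Path


def is_build_complete(build_dir, name):
    """True if the compiled shared library <name>.so exists in build_dir."""
    return (Path(build_dir) / f"{name}.so").is_file()


def clear_stale_build(build_dir, name):
    """Remove build_dir if it exists but lacks <name>.so.

    Returns True when the directory was removed, so that the next JIT
    load starts from a clean build instead of reusing partial output.
    """
    build_dir = Path(build_dir)
    if build_dir.is_dir() and not is_build_complete(build_dir, name):
        shutil.rmtree(build_dir)
        return True
    return False


# Demo on a throwaway directory with the same contents as the listing above.
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp) / "filter"
    demo_dir.mkdir()
    (demo_dir / "build.ninja").touch()
    (demo_dir / "filter.o").touch()
    removed = clear_stale_build(demo_dir, "filter")
    still_there = demo_dir.exists()

print(removed, still_there)
```

On the reporter's machine the call would be clear_stale_build(Path.home() / ".cache/torch_extensions/py38_cu116/filter", "filter"). Clearing the directory only forces a fresh build; if compilation fails again, passing verbose=True to torch.utils.cpp_extension.load prints the compiler output that explains why filter.so was never produced.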
Hello, thank you for your great work. I ran your demo and found that the output has strange white lines, as the video shows. Do you know the reason?
To make the environment work on an RTX 3090, I updated cudatoolkit to 11.3 and then installed the following packages:
PyTorch version: 1.11.0
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A
OS: Ubuntu 18.04.6 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: Could not collect
CMake version: version 3.10.2
Libc version: glibc-2.27
Python version: 3.8.16 (default, Jan 17 2023, 23:13:24) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-4.15.0-191-generic-x86_64-with-glibc2.17
Is CUDA available: True
CUDA runtime version: 11.3.58
GPU models and configuration:
GPU 0: NVIDIA GeForce RTX 3090
GPU 1: NVIDIA GeForce RTX 3090
GPU 2: NVIDIA GeForce RTX 3090
Nvidia driver version: 465.19.01
cuDNN version: Probably one of the following:
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn.so.8
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn_adv_train.so.8
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn_ops_train.so.8
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] numpy==1.23.5
[pip3] pytorch-lightning==1.5.0
[pip3] pytorch3d==0.7.2
[pip3] torch==1.11.0
[pip3] torchmetrics==0.11.1
[pip3] torchvision==0.12.0
[conda] blas 1.0 mkl http://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
[conda] cudatoolkit 11.3.1 ha36c431_9 nvidia
[conda] ffmpeg 4.3 hf484d3e_0 pytorch
[conda] mkl 2021.4.0 h06a4308_640 http://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
[conda] mkl-service 2.4.0 py38h7f8727e_0 http://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
[conda] mkl_fft 1.3.1 py38hd3c417c_0 http://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
[conda] mkl_random 1.2.2 py38h51133e4_0 http://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
[conda] numpy 1.23.5 py38h14f4228_0 http://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
[conda] numpy-base 1.23.5 py38h31eccc5_0 http://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
[conda] pytorch 1.11.0 py3.8_cuda11.3_cudnn8.2.0_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch
[conda] pytorch-lightning 1.5.0 pypi_0 pypi
[conda] pytorch-mutex 1.0 cuda https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch
[conda] pytorch3d 0.7.2 pypi_0 pypi
[conda] torchmetrics 0.11.1 pypi_0 pypi
[conda] torchvision 0.12.0 py38_cu113 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch
Hey, I was integrating Fast-Snarf with another project. However, I am not able to free up the gradients even though I am detaching the tensors of broyden cuda. Are you familiar with this error?
RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.
Can this code be used for SMPL-X deformation? Since the code here is CUDA, which I'm not familiar with, I'm not sure whether it can be used with other init_bones.
Also, I wonder how the init_bones are selected. Any advice?
https://github.com/xuchen-ethz/fast-snarf/blob/1f8361c04717e2ca9246e8044b692c4ef04ff89f/lib/model/fast_snarf.py#L38C12-L38C12
Hi @xuchen-ethz
Thanks for releasing the code!
I was wondering whether the filter is capable of handling background points.
I understand it primarily filters diverged and converged points based on the threshold.
Does that mean it also picks up points that don't contribute to the surface and masks them as well (to prevent them from following the skeleton while deforming)?
If not, is there a way to do this?
Thanks!
Thank you for providing the nice code.
Under what conditions is the training considered complete? (Number of epochs? BCELoss threshold?)
I am not familiar with PyTorch Lightning ...
Hi Xu,
Thanks for publishing your code!
I have run into some really weird results when following the demo instructions:
Results were obtained by building from environment.yml. Tested on:
CUDA-10.2
RTX 2080 Ti
Driver version 515.43.04
I am pretty sure something has gone wrong. Could you please share your thoughts?