xuchen-ethz / fast-snarf

License: MIT License

Languages: Python 74.04%, Shell 0.24%, C++ 2.94%, Cuda 14.99%, C 1.72%, Cython 6.06%

fast-snarf's Issues

Error loading filter.cu with torch 1.11

Hello,
Nice work!

I am trying to integrate fast-snarf into another project that requires torch>=1.11 because I need some newer torch features. I see that you pinned torch to version 1.10.0. With a newer version, filter.cu fails to compile when it is loaded, producing the following error messages:



filter_cuda = load(name='filter',
...                    sources=[f'{cuda_dir}/filter/filter.cpp',
...                             f'{cuda_dir}/filter/filter.cu'])
Traceback (most recent call last):
  File "/opt/conda/envs/avatar/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1893, in _run_ninja_build
    subprocess.run(
  File "/opt/conda/envs/avatar/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/conda/envs/avatar/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1284, in load
    return _jit_compile(
  File "/opt/conda/envs/avatar/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1509, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/opt/conda/envs/avatar/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1624, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "/opt/conda/envs/avatar/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1909, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'filter': [1/3] /usr/local/cuda-11.3/bin/nvcc  -DTORCH_EXTENSION_NAME=filter -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /opt/conda/envs/avatar/lib/python3.8/site-packages/torch/include -isystem /opt/conda/envs/avatar/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/envs/avatar/lib/python3.8/site-packages/torch/include/TH -isystem /opt/conda/envs/avatar/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda-11.3/include -isystem /opt/conda/envs/avatar/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -std=c++17 -c root/avatar/lib/cuda/filter/filter.cu -o filter.cuda.o 
FAILED: filter.cuda.o 
/usr/local/cuda-11.3/bin/nvcc  -DTORCH_EXTENSION_NAME=filter -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /opt/conda/envs/avatar/lib/python3.8/site-packages/torch/include -isystem /opt/conda/envs/avatar/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/envs/avatar/lib/python3.8/site-packages/torch/include/TH -isystem /opt/conda/envs/avatar/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda-11.3/include -isystem /opt/conda/envs/avatar/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -std=c++17 -c root/avatar/lib/cuda/filter/filter.cu -o filter.cuda.o 
root/avatar/lib/cuda/filter/filter.cu(72): error: incomplete class type "at::Tensor" is not allowed

root/avatar/lib/cuda/filter/filter.cu(73): error: incomplete class type "at::Tensor" is not allowed

root/avatar/lib/cuda/filter/filter.cu(77): error: incomplete class type "at::Tensor" is not allowed

root/avatar/lib/cuda/filter/filter.cu(77): error: incomplete class type "at::Tensor" is not allowed

root/avatar/lib/cuda/filter/filter.cu(77): error: type name is not allowed

root/avatar/lib/cuda/filter/filter.cu(77): error: argument list for class template "at::RestrictPtrTraits" is missing

root/avatar/lib/cuda/filter/filter.cu(77): error: expected an expression

root/avatar/lib/cuda/filter/filter.cu(77): error: incomplete class type "at::Tensor" is not allowed

root/avatar/lib/cuda/filter/filter.cu(77): error: type name is not allowed

root/avatar/lib/cuda/filter/filter.cu(77): error: argument list for class template "at::RestrictPtrTraits" is missing

root/avatar/lib/cuda/filter/filter.cu(77): error: expected an expression

root/avatar/lib/cuda/filter/filter.cu(77): error: incomplete class type "at::Tensor" is not allowed

root/avatar/lib/cuda/filter/filter.cu(77): error: type name is not allowed

root/avatar/lib/cuda/filter/filter.cu(77): error: argument list for class template "at::RestrictPtrTraits" is missing

root/avatar/lib/cuda/filter/filter.cu(77): error: expected an expression

root/avatar/lib/cuda/filter/filter.cu(77): error: no instance of function template "filter" matches the argument list
            argument types are: (int, <error-type>, int, <error-type>, <error-type>, int, <error-type>, <error-type>, int, <error-type>)

root/avatar/lib/cuda/filter/filter.cu(77): error: incomplete class type "at::Tensor" is not allowed

root/avatar/lib/cuda/filter/filter.cu(77): error: type name is not allowed

root/avatar/lib/cuda/filter/filter.cu(77): error: argument list for class template "at::RestrictPtrTraits" is missing

root/avatar/lib/cuda/filter/filter.cu(77): error: expected an expression

root/avatar/lib/cuda/filter/filter.cu(77): error: incomplete class type "at::Tensor" is not allowed

root/avatar/lib/cuda/filter/filter.cu(77): error: type name is not allowed

root/avatar/lib/cuda/filter/filter.cu(77): error: argument list for class template "at::RestrictPtrTraits" is missing

root/avatar/lib/cuda/filter/filter.cu(77): error: expected an expression

root/avatar/lib/cuda/filter/filter.cu(77): error: incomplete class type "at::Tensor" is not allowed

root/avatar/lib/cuda/filter/filter.cu(77): error: type name is not allowed

root/avatar/lib/cuda/filter/filter.cu(77): error: argument list for class template "at::RestrictPtrTraits" is missing

root/avatar/lib/cuda/filter/filter.cu(77): error: expected an expression

root/avatar/lib/cuda/filter/filter.cu(77): error: no instance of function template "filter" matches the argument list
            argument types are: (int, <error-type>, int, <error-type>, <error-type>, int, <error-type>, <error-type>, int, <error-type>)

root/avatar/lib/cuda/filter/filter.cu(77): error: incomplete class type "at::Tensor" is not allowed

root/avatar/lib/cuda/filter/filter.cu(77): error: type name is not allowed

root/avatar/lib/cuda/filter/filter.cu(77): error: argument list for class template "at::RestrictPtrTraits" is missing

root/avatar/lib/cuda/filter/filter.cu(77): error: expected an expression

root/avatar/lib/cuda/filter/filter.cu(77): error: incomplete class type "at::Tensor" is not allowed

root/avatar/lib/cuda/filter/filter.cu(77): error: type name is not allowed

root/avatar/lib/cuda/filter/filter.cu(77): error: argument list for class template "at::RestrictPtrTraits" is missing

root/avatar/lib/cuda/filter/filter.cu(77): error: expected an expression

root/avatar/lib/cuda/filter/filter.cu(77): error: incomplete class type "at::Tensor" is not allowed

root/avatar/lib/cuda/filter/filter.cu(77): error: type name is not allowed

root/avatar/lib/cuda/filter/filter.cu(77): error: argument list for class template "at::RestrictPtrTraits" is missing

root/avatar/lib/cuda/filter/filter.cu(77): error: expected an expression

root/avatar/lib/cuda/filter/filter.cu(77): error: no instance of function template "filter" matches the argument list
            argument types are: (int, <error-type>, int, <error-type>, <error-type>, int, <error-type>, <error-type>, int, <error-type>)

42 errors detected in the compilation of "root/avatar/lib/cuda/filter/filter.cu".
[2/3] c++ -MMD -MF filter.o.d -DTORCH_EXTENSION_NAME=filter -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /opt/conda/envs/avatar/lib/python3.8/site-packages/torch/include -isystem /opt/conda/envs/avatar/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/envs/avatar/lib/python3.8/site-packages/torch/include/TH -isystem /opt/conda/envs/avatar/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda-11.3/include -isystem /opt/conda/envs/avatar/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -c root/avatar/lib/cuda/filter/filter.cpp -o filter.o 
ninja: build stopped: subcommand failed.

Do you know whether anything in the CUDA code needs to be modified to make it compatible with torch 1.11.0?
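A likely cause (an assumption, not confirmed in this thread) is that, from around torch 1.11, some ATen headers only forward-declare `at::Tensor`, so a `.cu` file that does not include a full header sees an incomplete type. The usual workaround is to add `#include <ATen/ATen.h>` or `#include <torch/extension.h>` at the top of `filter.cu`. The helper `ensure_full_tensor_header` below is a hypothetical sketch of that text patch; it has not been tested against fast-snarf's actual `filter.cu`.

```python
import re

# Headers assumed to provide the complete at::Tensor definition.
FULL_TENSOR_HEADERS = ("torch/extension.h", "ATen/ATen.h")

# Matches the header name in lines like `#include <foo.h>` or `#include "foo.h"`.
INCLUDE_PATTERN = re.compile(r'^\s*#\s*include\s*[<"]([^>"]+)[>"]', re.MULTILINE)


def ensure_full_tensor_header(cu_source: str, header: str = "ATen/ATen.h") -> str:
    """Return cu_source with `#include <header>` prepended if no full Tensor header is included.

    Hypothetical helper: it only edits source text and does not compile anything.
    """
    included = set(INCLUDE_PATTERN.findall(cu_source))
    if any(h in included for h in FULL_TENSOR_HEADERS):
        return cu_source  # a full header is already present; leave the file unchanged
    return f"#include <{header}>\n" + cu_source
```

After patching the file, the cached build directory under `~/.cache/torch_extensions` may need to be removed so that `load()` recompiles instead of reusing stale outputs.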

Training won't converge.

Hi @xuchen-ethz!

Thanks so much for releasing the code!

I quickly tested it with the command python train.py subject=50002 and saw a great speed-up!

Unfortunately, the loss did not converge to a reasonable value. Do you have any idea what could have gone wrong?

Appreciate your help!

About the learning-from-images application

Dr. @xuchen-ethz
Do you plan to open-source the work behind "More application (e.g. learning from images) will be announced later."?
Looking forward to the progress.

Multi-GPU training fails

Hi, thanks for your great work.

I am trying to integrate this module into my own project. However, when I use multiple GPUs for training, the code reports the following error:

precompute_cuda.precompute(self.lbs_voxel_final, tfs, voxel_d, voxel_J, self.offset, self.scale)

RuntimeError: CUDA error: an illegal memory access was encountered

I also checked this repo's own train.py, and it reports an error as well when I set gpu > 2:

RuntimeError: Cowardly refusing to serialize non-leaf tensor which requires_grad, since autograd does not support crossing process boundaries. If you just want to transfer the data, call detach() on the tensor before serializing (e.g., putting it on the queue).

Has anyone run into this error before?
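The second error message itself says that a non-leaf tensor requiring grad is being serialized across a process boundary and suggests calling `detach()` first. One way to follow that advice for nested results (dicts, lists, tuples) is a recursive helper like the hypothetical `detach_nested` below; it is duck-typed so that it runs without torch, and which fast-snarf outputs actually cross processes is an assumption to check in your own setup.

```python
from collections.abc import Mapping


def detach_nested(obj):
    """Recursively call .detach() on every tensor-like object in a nested structure.

    Anything with a callable `detach` attribute (e.g. torch.Tensor) is detached.
    Mappings are rebuilt as plain dicts, lists/tuples (including namedtuples) keep
    their type, and all other values are returned unchanged.
    """
    if callable(getattr(obj, "detach", None)):
        return obj.detach()
    if isinstance(obj, Mapping):
        return {k: detach_nested(v) for k, v in obj.items()}
    if isinstance(obj, tuple) and hasattr(obj, "_fields"):
        # namedtuple: fields must be passed positionally
        return type(obj)(*(detach_nested(v) for v in obj))
    if isinstance(obj, (list, tuple)):
        return type(obj)(detach_nested(v) for v in obj)
    return obj
```

If training is launched through a spawn-based multi-process strategy, switching to a non-spawn one (e.g. Lightning's `ddp`) may avoid the pickling step altogether; this is an untested suggestion. The `illegal memory access` in `precompute_cuda` is a separate problem and is not addressed by this helper.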

filter.so from the cuda folder isn't generated

Please help me. I have been having issues running it. I have tried different operating systems and CUDA versions, but to no avail.

ImportError: Encountered error: /.../.../.cache/torch_extensions/py38_cu116/filter/filter.so: cannot open shared object file: No such file or directory when loading module 'lib.model.fast_snarf.ForwardDeformer'

The content of the folder is

test@test:~/.cache/torch_extensions/py38_cu116/filter$ ls
build.ninja  filter.o

I believe the error originates below, where the .cu kernels are compiled:
/.../.../anaconda3/envs/fast_snarf/lib/python3.8/site-packages/hydra/_internal/utils.py
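In the listing above, `filter.o` (the object compiled from `filter.cpp`) exists but `filter.cuda.o` and `filter.so` do not, which suggests the `nvcc` step for `filter.cu` failed, as in the torch 1.11 report earlier on this page, so the shared library was never linked. Passing `verbose=True` to `torch.utils.cpp_extension.load` prints the full compiler output. The stdlib-only helpers below are a hypothetical sketch for listing missing artifacts and clearing a stale build directory so the next load rebuilds; the artifact names are taken from the ninja log on this page and are assumed to follow the same pattern for other extensions.

```python
import shutil
from pathlib import Path


def missing_build_artifacts(build_dir, name="filter"):
    """Return the expected JIT-build outputs that are absent from build_dir.

    Expected names follow the ninja log on this page:
    <name>.o (C++ source), <name>.cuda.o (CUDA source), <name>.so (linked module).
    """
    expected = [f"{name}.o", f"{name}.cuda.o", f"{name}.so"]
    build_dir = Path(build_dir)
    return [f for f in expected if not (build_dir / f).exists()]


def reset_build_dir(build_dir):
    """Delete a stale build directory so the next load() call rebuilds from scratch."""
    shutil.rmtree(build_dir, ignore_errors=True)
```

For the case above, `missing_build_artifacts(Path.home() / ".cache/torch_extensions/py38_cu116/filter")` would be expected to report `filter.cuda.o` and `filter.so`.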

White lines in demo output on RTX 3090

Hello, thank you for your great work. I ran your demo and the output has strange white lines, as the video shows. Do you know the reason?

aist.mp4

To make the environment work on an RTX 3090, I updated cudatoolkit to 11.3 and then installed the following packages (environment report below):

PyTorch version: 1.11.0
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.6 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: Could not collect
CMake version: version 3.10.2
Libc version: glibc-2.27

Python version: 3.8.16 (default, Jan 17 2023, 23:13:24)  [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-4.15.0-191-generic-x86_64-with-glibc2.17
Is CUDA available: True
CUDA runtime version: 11.3.58
GPU models and configuration: 
GPU 0: NVIDIA GeForce RTX 3090
GPU 1: NVIDIA GeForce RTX 3090
GPU 2: NVIDIA GeForce RTX 3090

Nvidia driver version: 465.19.01
cuDNN version: Probably one of the following:
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn.so.8
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn_adv_train.so.8
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn_ops_train.so.8
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.23.5
[pip3] pytorch-lightning==1.5.0
[pip3] pytorch3d==0.7.2
[pip3] torch==1.11.0
[pip3] torchmetrics==0.11.1
[pip3] torchvision==0.12.0
[conda] blas                      1.0                         mkl    http://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
[conda] cudatoolkit               11.3.1               ha36c431_9    nvidia
[conda] ffmpeg                    4.3                  hf484d3e_0    pytorch
[conda] mkl                       2021.4.0           h06a4308_640    http://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
[conda] mkl-service               2.4.0            py38h7f8727e_0    http://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
[conda] mkl_fft                   1.3.1            py38hd3c417c_0    http://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
[conda] mkl_random                1.2.2            py38h51133e4_0    http://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
[conda] numpy                     1.23.5           py38h14f4228_0    http://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
[conda] numpy-base                1.23.5           py38h31eccc5_0    http://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
[conda] pytorch                   1.11.0          py3.8_cuda11.3_cudnn8.2.0_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch
[conda] pytorch-lightning         1.5.0                    pypi_0    pypi
[conda] pytorch-mutex             1.0                        cuda    https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch
[conda] pytorch3d                 0.7.2                    pypi_0    pypi
[conda] torchmetrics              0.11.1                   pypi_0    pypi
[conda] torchvision               0.12.0               py38_cu113    https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch

Unable to free the intermediate values in the graph.

Hey, I was integrating Fast-SNARF with another project. However, I get the error below even though I am detaching the tensors returned by the Broyden CUDA solver. Are you familiar with this error?

RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.

Can this code be used for SMPL-X?

Can this code be used for SMPL-X deformation? Since the core is CUDA code, which I'm not familiar with, I'm not sure whether it works with other init_bones.
Also, how are the init_bones selected? Any advice?
https://github.com/xuchen-ethz/fast-snarf/blob/1f8361c04717e2ca9246e8044b692c4ef04ff89f/lib/model/fast_snarf.py#L38C12-L38C12

background points and transforms

Hi @xuchen-ethz

Thanks for releasing the code!

I was wondering whether the filter can handle background points.

I understand it primarily separates converged and diverged points based on a threshold.
Does it also pick out points that don't contribute to the surface and mask those as well (to prevent them from following the skeleton during deformation)?

Is there a way to do the above?

Thanks!
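The threshold-based filtering described in the question, combined with an extra background mask, can be sketched in pure Python as below. The function names, the `threshold` value, and the choice of an axis-aligned bounding box as the background test are all hypothetical assumptions for illustration; this is not fast-snarf's actual filter kernel.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def converged_mask(residual_norms: Sequence[float], threshold: float) -> List[bool]:
    """True where the root-finding residual is below the convergence threshold."""
    return [r < threshold for r in residual_norms]


def inside_box_mask(points: Sequence[Point], lo: Point, hi: Point) -> List[bool]:
    """Hypothetical background test: True for points inside the axis-aligned box [lo, hi]."""
    return [all(l <= c <= h for c, l, h in zip(p, lo, hi)) for p in points]


def valid_mask(residual_norms, points, threshold, lo, hi) -> List[bool]:
    """Keep a point only if it converged AND is not classified as background."""
    conv = converged_mask(residual_norms, threshold)
    fg = inside_box_mask(points, lo, hi)
    return [c and f for c, f in zip(conv, fg)]
```

Points with `False` in the combined mask would then be excluded downstream instead of being deformed with the skeleton; how fast-snarf consumes its own mask is not covered here.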

Training task completion condition

Thank you for providing such nice code.
Under what condition does training finish? (A number of epochs? A BCELoss threshold?)
I am not familiar with PyTorch Lightning.
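For context on the Lightning side: `Trainer.fit()` returns once `max_epochs` or `max_steps` is reached, or earlier if a callback such as `EarlyStopping` requests a stop; in Lightning 1.5, if neither limit is set, `max_epochs` defaults to 1000. Which limits fast-snarf's config actually sets is not stated on this page. The function below is a simplified pure-Python sketch of those two stopping rules (a hard epoch cap plus patience-based early stopping on a monitored loss), not Lightning's implementation.

```python
from typing import Optional, Sequence


def stop_epoch(val_losses: Sequence[float], max_epochs: int,
               patience: Optional[int] = None, min_delta: float = 0.0) -> int:
    """Return how many epochs run before training stops.

    Stops after `max_epochs`, or earlier if `patience` is set and the monitored
    loss has not improved by more than `min_delta` for `patience` consecutive
    epochs. `val_losses` holds one loss value per epoch.
    """
    best = float("inf")
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best - min_delta:
            best, bad_epochs = loss, 0  # improvement: reset the patience counter
        else:
            bad_epochs += 1
        if patience is not None and bad_epochs >= patience:
            return epoch
    return min(len(val_losses), max_epochs)
```

With `losses = [1.0, 0.5, 0.4, 0.41, 0.42, 0.43, 0.3]`, `stop_epoch(losses, max_epochs=5)` stops at epoch 5, while adding `patience=3` with a larger cap stops at epoch 6, after three epochs without improvement.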

Weird demo results

Hi Xu,

Thanks for publishing your code!

I ran into some really weird results after following the demo instructions:

aist.mp4

The results were obtained with an environment built from environment.yml. Tested on:

CUDA-10.2
RTX 2080 Ti
Driver version 515.43.04

I am pretty sure something has gone wrong. Could you please share your thoughts?