Comments (3)
I have a similar situation. The first two stages run normally, but every time I run stage 3 the same error comes up, no matter how many times I retry:
jaxlib.xla_extension.XlaRuntimeError: INTERNAL: stream did not block host until done; was already in an error state
2022-10-01 05:44:29.136328: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:1047] could not synchronize on CUDA context: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered :: *** Begin stack trace ***
Strangely, stages 1 and 2 do not take much time, but stage 3 takes very long to run.
I also have a similar situation.
root@container-c42911ad3c-daf18020:~/mobilenerf# python stage3.py && shutdown
train images: (100, 800, 800, 3) c2w: (100, 4, 4) hwf: (3,)
test images: (200, 800, 800, 3) c2w: (200, 4, 4) hwf: (3,)
Number of quad faces: 137472
Removing invisible triangles
100%|████████████████████| 100/100 [53:41<00:00, 32.22s/it]
Removing invisible triangles
100%|████████████████████| 300/300 [2:40:52<00:00, 32.17s/it]
Number of quad faces: 89443
Testing
 46%|█████████▍          | 93/200 [50:11<57:10, 32.06s/it]
2022-09-24 19:30:13.127196: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:1163] failed to enqueue async memcpy from device to host: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered; host dst: 0x559810d57c40; GPU src: 0x7f9e77657600; size: 32768=0x8000
2022-09-24 19:30:13.127242: E external/org_tensorflow/tensorflow/stream_executor/stream.cc:344] Error recording event in stream: Error recording CUDA event: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered; not marking stream as bad, as the Event object may be at fault. Monitor for further errors.
2022-09-24 19:30:13.127256: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:618] unable to add host callback: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
 46%|█████████▍          | 93/200 [50:44<58:22, 32.73s/it]
Traceback (most recent call last):
  File "stage3.py", line 2222, in <module>
    out = render_loop(camera_ray_batch(p, hwf), vars, point_UV_grid, texture_alpha, texture_features, test_batch_size)
  File "stage3.py", line 2071, in render_loop
    outs = [render_test([x[i:i+chunk] for x in rays], vars, uv, alp, feat)
  File "stage3.py", line 2071, in <listcomp>
    outs = [render_test([x[i:i+chunk] for x in rays], vars, uv, alp, feat)
  File "stage3.py", line 2055, in render_test
    selected_uv = numpy.array(selected_uv)
  File "/root/miniconda3/lib/python3.8/site-packages/jax/_src/device_array.py", line 264, in __array__
    return np.asarray(self._value, dtype=dtype)
  File "/root/miniconda3/lib/python3.8/site-packages/jax/interpreters/pxla.py", line 674, in _sda_value
    npy_value[self.indices[i]] = self.device_buffers[i].to_py()
jaxlib.xla_extension.XlaRuntimeError: INTERNAL: stream did not block host until done; was already in an error state
2022-09-24 19:30:13.284806: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:1047] could not synchronize on CUDA context: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered :: *** Begin stack trace ***
_PyGC_CollectNoFail
PyImport_Cleanup
Py_FinalizeEx
Py_RunMain
Py_BytesMain
__libc_start_main
*** End stack trace ***
2022-09-24 19:30:13.285110: F external/org_tensorflow/tensorflow/compiler/xla/service/gpu/gpu_executable.cc:284] Check failed: pair.first->SynchronizeAllActivity()
Aborted (core dumped)
I fixed this issue by switching to Ubuntu Linux. Now I have a new problem when trying to run the real360 data: not enough memory.
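For the out-of-memory part, one thing that may be worth trying is changing how JAX reserves GPU memory. This is only a sketch and not a confirmed fix for real360. `XLA_PYTHON_CLIENT_PREALLOCATE` and `XLA_PYTHON_CLIENT_MEM_FRACTION` are environment variables documented by JAX; the helper name `configure_jax_gpu_memory` is hypothetical and is not part of the MobileNeRF scripts. The variables have to be set before `jax` is imported.

```python
import os


def configure_jax_gpu_memory(mem_fraction=None):
    """Hypothetical helper: set JAX GPU allocator variables.

    Must be called before `import jax`, otherwise it has no effect.
    - mem_fraction=None: disable up-front preallocation, so memory is
      allocated on demand.
    - mem_fraction in (0, 1]: keep preallocation, but cap it at that
      fraction of total GPU memory.
    """
    if mem_fraction is None:
        settings = {"XLA_PYTHON_CLIENT_PREALLOCATE": "false"}
    else:
        if not 0.0 < mem_fraction <= 1.0:
            raise ValueError("mem_fraction must be in (0, 1]")
        settings = {
            "XLA_PYTHON_CLIENT_PREALLOCATE": "true",
            "XLA_PYTHON_CLIENT_MEM_FRACTION": f"{mem_fraction:.2f}",
        }
    os.environ.update(settings)
    return settings


# Example: allocate on demand instead of reserving memory up front.
on_demand = configure_jax_gpu_memory()

# Example: preallocate, but only half of the GPU memory.
capped = configure_jax_gpu_memory(mem_fraction=0.5)
```

This only changes how memory is reserved. If the scene genuinely needs more memory than the GPU has, a smaller batch size is probably still required.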