
Comments (3)

JunhuaLiu0 commented on August 16, 2024

I have a similar situation. The first two stages run normally, but every time I run stage 3 (I have tried several times) the following error appears and seems to repeat endlessly:
jaxlib.xla_extension.XlaRuntimeError: INTERNAL: stream did not block host until done; was already in an error state
2022-10-01 05:44:29.136328: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:1047] could not synchronize on CUDA context: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered :: *** Begin stack trace ***

Strangely, stages 1 and 2 do not take much time, but stage 3 takes very long to run.

from jax3d.

Wushuangpin commented on August 16, 2024

I also have a similar situation.
root@container-c42911ad3c-daf18020:~/mobilenerf# python stage3.py && shutdown
train images: (100, 800, 800, 3) c2w: (100, 4, 4) hwf: (3,)
test images: (200, 800, 800, 3) c2w: (200, 4, 4) hwf: (3,)
Number of quad faces: 137472
Removing invisible triangles
100%|████████████████████| 100/100 [53:41<00:00, 32.22s/it]
Removing invisible triangles
100%|████████████████████| 300/300 [2:40:52<00:00, 32.17s/it]
Number of quad faces: 89443
Testing
 46%|█████████▍          | 93/200 [50:11<57:10, 32.06s/it]
2022-09-24 19:30:13.127196: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:1163] failed to enqueue async memcpy from device to host: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered; host dst: 0x559810d57c40; GPU src: 0x7f9e77657600; size: 32768=0x8000
2022-09-24 19:30:13.127242: E external/org_tensorflow/tensorflow/stream_executor/stream.cc:344] Error recording event in stream: Error recording CUDA event: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered; not marking stream as bad, as the Event object may be at fault. Monitor for further errors.
2022-09-24 19:30:13.127256: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:618] unable to add host callback: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
 46%|█████████▍          | 93/200 [50:44<58:22, 32.73s/it]
Traceback (most recent call last):
  File "stage3.py", line 2222, in <module>
    out = render_loop(camera_ray_batch(p, hwf), vars, point_UV_grid, texture_alpha, texture_features, test_batch_size)
  File "stage3.py", line 2071, in render_loop
    outs = [render_test([x[i:i+chunk] for x in rays], vars, uv, alp, feat)
  File "stage3.py", line 2071, in <listcomp>
    outs = [render_test([x[i:i+chunk] for x in rays], vars, uv, alp, feat)
  File "stage3.py", line 2055, in render_test
    selected_uv = numpy.array(selected_uv)
  File "/root/miniconda3/lib/python3.8/site-packages/jax/_src/device_array.py", line 264, in __array__
    return np.asarray(self._value, dtype=dtype)
  File "/root/miniconda3/lib/python3.8/site-packages/jax/interpreters/pxla.py", line 674, in _sda_value
    npy_value[self.indices[i]] = self.device_buffers[i].to_py()
jaxlib.xla_extension.XlaRuntimeError: INTERNAL: stream did not block host until done; was already in an error state
2022-09-24 19:30:13.284806: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:1047] could not synchronize on CUDA context: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered :: *** Begin stack trace ***
    _PyGC_CollectNoFail
    PyImport_Cleanup
    Py_FinalizeEx
    Py_RunMain
    Py_BytesMain
    __libc_start_main
*** End stack trace ***
2022-09-24 19:30:13.285110: F external/org_tensorflow/tensorflow/compiler/xla/service/gpu/gpu_executable.cc:284] Check failed: pair.first->SynchronizeAllActivity()
Aborted (core dumped)
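The traceback shows the exception surfacing at a device-to-host copy (`numpy.array(selected_uv)`) inside a chunked loop. Because JAX dispatches GPU work asynchronously, the Python line that raises is not necessarily the operation that faulted. One way to narrow this down might be a loop that copies each chunk's result to the host straight away and reports which slice of rays failed. The sketch below is only an illustration, not the code in `stage3.py`: `render_in_chunks` and `render_fn` are hypothetical names, and `render_fn` stands in for whatever renders one chunk.

```python
import numpy as np


def render_in_chunks(render_fn, rays, chunk):
    """Apply render_fn to slices of rays and report which slice fails.

    render_fn is a hypothetical stand-in for the per-chunk renderer.
    rays is a list of arrays that share the same leading dimension.
    """
    n = rays[0].shape[0]
    outs = []
    for start in range(0, n, chunk):
        stop = min(start + chunk, n)
        try:
            out = render_fn([x[start:stop] for x in rays])
            # np.asarray copies a device array to the host, so a pending
            # device-side error is raised here instead of in a later chunk.
            outs.append(np.asarray(out))
        except Exception as err:
            raise RuntimeError(
                f"chunk [{start}:{stop}] of {n} rays failed"
            ) from err
    return np.concatenate(outs, axis=0)
```

After a CUDA illegal-address error the process usually cannot continue, so this helps identify the failing slice (and lets you try a smaller `chunk`) rather than recover from the error.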


cyz2727327 commented on August 16, 2024

I fixed this issue by switching to Ubuntu Linux. Now I have a new problem when trying to run the real360 data: not enough memory.
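The comment does not say whether GPU or host memory ran out. If it is GPU memory, one thing that may be worth trying is changing how JAX allocates it. By default JAX preallocates a large share of GPU memory at startup; the documented environment variables `XLA_PYTHON_CLIENT_PREALLOCATE` and `XLA_PYTHON_CLIENT_MEM_FRACTION` control this. The helper below is a hypothetical sketch (the name `configure_jax_gpu_memory` is not part of JAX or of this repository), and it only has an effect if it is called before `import jax`.

```python
import os


def configure_jax_gpu_memory(preallocate=False, mem_fraction=None):
    """Set JAX GPU allocator environment variables (hypothetical helper).

    Must run before `import jax`; JAX reads these variables at startup.
    """
    # "false" makes JAX allocate GPU memory on demand instead of up front.
    os.environ["XLA_PYTHON_CLIENT_PREALLOCATE"] = "true" if preallocate else "false"
    if mem_fraction is not None:
        if not 0.0 < mem_fraction <= 1.0:
            raise ValueError("mem_fraction must be in (0, 1]")
        # Fraction of total GPU memory that JAX preallocates when
        # preallocation is enabled.
        os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"] = f"{mem_fraction:.2f}"


# Usage: call this at the very top of the script, before importing jax.
configure_jax_gpu_memory(preallocate=False)
```

Whether this is enough for the real360 scenes is not established here; reducing the batch or chunk size is another option to try.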
