mrphys / tensorflow-nufft
Fast, Native Non-Uniform Fast Fourier Transform for TensorFlow
Home Page: https://mrphys.github.io/tensorflow-nufft/
License: Apache License 2.0
I observe NaNs coming up in my computations.
I have stopped my runs with conditional breakpoints, repeated the specific functions (tfft.nufft, tfft.spread, etc.), and observed the NaNs recurring once or twice. However, if I repeat the computations, I usually don't get NaNs again.
For now, I am writing my own wrapper to carry out recomputation when this occurs, but I feel this could be something bigger, like a missing __syncthreads or cudaDeviceSynchronize.
Sadly, given how random this issue is, I don't have a repro. I will see if I can set one up. For now, this is more of a check whether you have faced this issue too, in which case, do we have any idea?
I would love to debug deeper; is there a way to set up gdb or cuda-gdb for this?
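The recomputation wrapper mentioned above could look something like the following minimal sketch. Everything here is illustrative (the name `retry_on_nan` and the retry policy are made up, and NumPy stands in for TensorFlow); a real version wrapping `tfft.nufft` would instead check the real and imaginary parts with `tf.math.is_nan` and `tf.reduce_any`:

```python
import numpy as np

def retry_on_nan(fn, max_retries=3):
    """Hypothetical workaround: re-run `fn` if its result contains NaNs."""
    def wrapper(*args, **kwargs):
        for _ in range(max_retries):
            result = fn(*args, **kwargs)
            # For complex arrays, np.isnan flags NaN in either component.
            if not np.any(np.isnan(result)):
                return result
        raise RuntimeError(f"result still contains NaN after {max_retries} attempts")
    return wrapper
```

This only papers over the symptom; if the root cause really is a missing synchronization, the retried call can fail again nondeterministically.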
The following code reproduces the problem:

```python
import numpy as np
import tensorflow as tf
from tensorflow_nufft.python.ops import nufft_ops

with tf.device('/cpu:0'):
  rank = 2
  num_points = 20000
  grid_shape = [128] * rank
  batch_size = 100

  rng = tf.random.Generator.from_seed(10)
  points = rng.uniform([batch_size, num_points, rank],
                       minval=-np.pi, maxval=np.pi)
  source = tf.complex(tf.ones([batch_size, num_points]),
                      tf.zeros([batch_size, num_points]))

  @tf.function
  def parallel_nufft_adjoint(source, points):
    def nufft_adjoint(inputs):
      src, pts = inputs
      return nufft_ops.nufft(src, pts, grid_shape=grid_shape,
                             transform_type='type_1',
                             fft_direction='backward')
    return tf.map_fn(nufft_adjoint, [source, points],
                     parallel_iterations=4,
                     fn_output_signature=tf.TensorSpec(grid_shape, tf.complex64))

  result = parallel_nufft_adjoint(source, points)
```
Even after a lot of effort, we still seem to face NaN issues, and sadly this seems to be very erratic and random. I don't have a repro test at all...
From what I see, I think this issue is only present in graph mode. @jmontalt, does that hint at any possible cause? Are there separate code paths for eager mode and graph mode?
I will use this issue to track progress on what I observe.
I sometimes get the following error (again, quite random, and hence quite hard to debug):
Failed to associate cuFFT plan with CUDA stream: 1
Mostly arising from:
tensorflow-nufft/tensorflow_nufft/cc/kernels/nufft_plan.cu.cc
Lines 1887 to 1890 in 0210901
It seems cuFFT returns CUFFT_INVALID_PLAN, which is error code 1. I am a bit clueless about how to start on this one. I will, as usual, try to get a repro soon. @jmontalt, any idea if you have faced this, and any pointers on where to start?
Hello,
Thank you for this package!
We had our own version of the NUFFT, based on Kaiser-Bessel interpolation, here: https://github.com/zaccharieramzi/tfkbnufft
However, this package, being based on CUDA and cuFINUFFT, is more efficient in terms of speed and memory.
That said, I feel that the gradient with respect to the trajectory is wrong. I remember debugging this exact same issue in tfkbnufft.
As we can see here, we had to add conjugates of dx and dy.
Equivalently, in these lines:
tensorflow-nufft/tensorflow_nufft/python/ops/nufft_ops.py
Lines 118 to 131 in 20d4fce
must be modified to:

```python
if transform_type == 'type_2':
  grad_points = nufft(tf.expand_dims(source, -(rank + 1)) * grid_points,
                      tf.expand_dims(points, -3),
                      transform_type='type_2',
                      fft_direction=fft_direction,
                      tol=tol) * tf.expand_dims(tf.math.conj(grad), -2) * imag_unit
if transform_type == 'type_1':
  grad_points = nufft(tf.expand_dims(tf.math.conj(grad), -(rank + 1)) * grid_points,
                      tf.expand_dims(points, -3),
                      transform_type='type_2',
                      fft_direction=fft_direction,
                      tol=tol) * tf.expand_dims(source, -2) * imag_unit
```
You can possibly use this method for testing:
https://github.com/zaccharieramzi/tfkbnufft/blob/master/tfkbnufft/tests/ndft_test.py
I can make a PR if needed.
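One library-independent way to sanity-check the trajectory gradient, in the spirit of the linked ndft_test.py, is to compare the analytic derivative of an explicit 1-D type-2 NDFT against finite differences. This is only a sketch under the assumed convention f_j = sum_k c_k exp(-i k x_j); the signs flip for the opposite FFT direction:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8                                   # grid size
k = np.arange(-n // 2, n // 2)          # integer frequencies
c = rng.standard_normal(n) + 1j * rng.standard_normal(n)
x = rng.uniform(-np.pi, np.pi, size=4)  # nonuniform sample points

def ndft(x):
    # Explicit (dense) type-2 NDFT: f_j = sum_k c_k exp(-i k x_j).
    return np.exp(-1j * np.outer(x, k)) @ c

# Analytic derivative of f_j with respect to its own sample location x_j.
grad = (np.exp(-1j * np.outer(x, k)) * (-1j * k)) @ c

# Central finite difference (f_j depends only on x_j, so a joint
# perturbation of all points still gives the per-point derivative).
eps = 1e-6
fd = (ndft(x + eps) - ndft(x - eps)) / (2 * eps)

assert np.allclose(grad, fd, atol=1e-4)
```

The same comparison, run through the NUFFT op's registered gradient instead of the analytic formula, would expose a missing conjugate immediately, since a conjugation error changes the sign of the imaginary part rather than introducing a small numerical deviation.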
Currently, max_batch_size is calculated heuristically. While this is a great idea, I think it would also be good to give the user the freedom to set it explicitly, for better control over the trade-off between speed and GPU memory.
I can handle this PR if we agree it's needed, but not at the moment.
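A user override could simply take precedence over the heuristic. The following is a hypothetical sketch; the function name, parameters, and the memory model here are all invented for illustration and are not part of the library:

```python
def resolve_max_batch_size(user_value=None, *,
                           free_memory_bytes=0, bytes_per_batch=8):
    """Hypothetical: prefer an explicit user setting over the heuristic.

    `bytes_per_batch` stands in for the memory footprint of processing
    one batch element; the real heuristic is internal to the library.
    """
    if user_value is not None:
        if user_value < 1:
            raise ValueError("max_batch_size must be >= 1")
        return user_value
    # Heuristic fallback: fit as many batch elements as memory allows.
    return max(1, free_memory_bytes // bytes_per_batch)
```

Exposing the knob this way keeps the default behavior unchanged for existing users while letting memory-constrained users cap the batch size.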
I feel it would help a lot to have scaling coded up inside the NUFFT operator, similar to https://github.com/chaithyagr/tfkbnufft/blob/da8de17bc5cb738d11150662d0876bec9efb54d8/tfkbnufft/nufft/fft_functions.py#L159 and https://github.com/chaithyagr/tfkbnufft/blob/da8de17bc5cb738d11150662d0876bec9efb54d8/tfkbnufft/nufft/fft_functions.py#L197, as it would help make sure that the op and the adjoint op match scales in the 'ortho' case.
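For reference, matching scales in the 'ortho' case means both the operator and its adjoint carry the same 1/sqrt(N) factor, so the adjoint identity &lt;A x, y&gt; = &lt;x, A^H y&gt; holds exactly. A small NumPy sketch using the uniform FFT as a stand-in for the NUFFT:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 128
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
y = rng.standard_normal(n) + 1j * rng.standard_normal(n)

fwd = np.fft.fft(x, norm='ortho')    # forward op, scaled by 1/sqrt(n)
adj = np.fft.ifft(y, norm='ortho')   # adjoint op, same 1/sqrt(n) scaling

# With matching scales the operator is unitary, so <A x, y> == <x, A^H y>.
assert np.isclose(np.vdot(fwd, y), np.vdot(x, adj))
```

With mismatched conventions (e.g. an unscaled forward and a 1/N-scaled adjoint), the two inner products differ by a constant factor, which is exactly the kind of discrepancy that built-in scaling would prevent.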
Hi everyone,
First of all, thanks for the nice package. Now to my two points:
Thanks and regards,
Stefan
I am trying to install the package via pip as mentioned. I checked that my TensorFlow version is 2.10, which is supported. However, I keep getting the error message: "ERROR: Could not find a version that satisfies the requirement tensorflow-nufft (from versions: none). ERROR: No matching distribution found for tensorflow-nufft"
Currently the custom op does not implement the shape inference function.
Hi,
This isn't really an issue; I just have two clarification questions:
Thanks for your help.
This would enable the NUFFT op to be used when the grid shape is not known statically.
tfft.util.estimate_density, the utility to estimate density compensation weights for arbitrary trajectories, produces inaccurate results.
This needs to be investigated. Perhaps the NUFFT kernel is not suitable for the density estimation algorithm?
I'm working in a daskhub environment without CUDA, and when I try to import the library I get the error NotFoundError: libcudart.so.11.0: cannot open shared object file: No such file or directory. I'd rather not have to install libcudart in this environment; is there a way to perform the

```python
_nufft_ops = tf.load_op_library(
    tf.compat.v1.resource_loader.get_path_to_datafile('_nufft_ops.so'))
```

operation without attempting to import the GPU version of the code, just a CPU implementation?