tsc2017 / frechet-inception-distance
CPU/GPU/TPU implementation of the Fréchet Inception Distance
2019-04-19 21:16:23.177552: E tensorflow/stream_executor/cuda/cuda_blas.cc:654] failed to run cuBLAS routine cublasDgemm_v2: CUBLAS_STATUS_EXECUTION_FAILED
Traceback (most recent call last):
File "/home/iie/.conda/envs/env36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1322, in _do_call
return fn(*args)
File "/home/iie/.conda/envs/env36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1307, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/home/iie/.conda/envs/env36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed : a.shape=(2048, 2048), b.shape=(2048, 2048), m=2048, n=2048, k=2048
[[Node: MatMul_6 = MatMul[T=DT_DOUBLE, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Svd_1:1, Diag_1)]]
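The failing op is a float64 (`T=DT_DOUBLE`) 2048x2048 matrix multiply on GPU:0 whose inputs come from an SVD (`Svd_1`, `Diag_1`), so it appears to be the covariance square-root step of the distance rather than the Inception forward pass. One possible workaround, sketched here and not taken from this repository, is to finish the distance on the CPU with NumPy once the activations are extracted. The function names are hypothetical, and the sketch assumes the activations are `[N, D]` arrays (D = 2048 for pool3 features). It uses the identity Tr(sqrt(S1 S2)) = Tr(sqrt(sqrt(S1) S2 sqrt(S1))), which only needs symmetric eigendecompositions.

```python
import numpy as np


def _sqrtm_psd(mat):
    """Square root of a symmetric positive semi-definite matrix via eigh."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    eigvals = np.clip(eigvals, 0.0, None)  # drop tiny negative round-off
    return (eigvecs * np.sqrt(eigvals)) @ eigvecs.T


def frechet_distance(act1, act2):
    """Hypothetical helper: FID between two activation sets of shape [N, D]."""
    act1 = np.asarray(act1, dtype=np.float64)
    act2 = np.asarray(act2, dtype=np.float64)
    mu1, mu2 = act1.mean(axis=0), act2.mean(axis=0)
    sigma1 = np.cov(act1, rowvar=False)
    sigma2 = np.cov(act2, rowvar=False)
    # sqrt(S1) S2 sqrt(S1) is similar to S1 S2 but symmetric PSD, so its
    # eigenvalues are real and non-negative up to round-off.
    root1 = _sqrtm_psd(sigma1)
    middle = root1 @ sigma2 @ root1
    middle = (middle + middle.T) / 2.0  # enforce exact symmetry
    tr_covmean = np.sqrt(np.clip(np.linalg.eigvalsh(middle), 0.0, None)).sum()
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_covmean)
```

Hiding the GPUs from TensorFlow (for example `CUDA_VISIBLE_DEVICES=""`) is another way to keep the whole computation on the CPU, at the cost of a slower Inception pass.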
I tried to calculate the FID between two batches of images, and the result is NaN. I tried many times, but it is always NaN. Can you help me? Thanks so much!
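A NaN FID often traces back to the activations rather than to the formula. Before computing the distance, a few cheap checks can narrow it down. The helper below is a hypothetical sketch, not part of fid.py: it assumes the activations are NumPy arrays of shape `[N, D]`, and it flags too few samples (a covariance needs at least 2, and is rank-deficient whenever N <= D), non-finite values, and all-zero rows that may indicate batches that were never filled in.

```python
import numpy as np


def check_fid_inputs(act1, act2):
    """Hypothetical helper: list problems that commonly lead to a NaN FID."""
    problems = []
    for name, act in (("act1", act1), ("act2", act2)):
        act = np.asarray(act)
        if act.ndim != 2:
            problems.append(f"{name}: expected shape [N, D], got {act.shape}")
            continue
        n, d = act.shape
        if n < 2:
            problems.append(f"{name}: need at least 2 samples for a covariance, got {n}")
        elif n <= d:
            # Sample covariance has rank at most n - 1 < d.
            problems.append(f"{name}: only {n} samples for {d} features; covariance is rank-deficient")
        if not np.all(np.isfinite(act)):
            problems.append(f"{name}: contains NaN or Inf values")
        if n > 0 and np.any(~act.any(axis=1)):
            problems.append(f"{name}: has all-zero rows (were some batches never computed?)")
    return problems
```

An empty list does not guarantee a finite FID, but a non-empty one usually points at the input pipeline rather than at the distance computation.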
Error:
InvalidArgumentError Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py in _do_call(self, fn, *args)
1364 try:
-> 1365 return fn(*args)
1366 except errors.OpError as e:
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py in _run_fn(feed_dict, fetch_list, target_list, options, run_metadata)
1349 return self._call_tf_sessionrun(options, feed_dict, fetch_list,
-> 1350 target_list, run_metadata)
1351
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py in _call_tf_sessionrun(self, options, feed_dict, fetch_list, target_list, run_metadata)
1442 fetch_list, target_list,
-> 1443 run_metadata)
1444
InvalidArgumentError: No OpKernel was registered to support Op 'TPUReplicateMetadata' used by {{node TPUReplicateMetadata}}with these attrs: [topology="", step_marker_location="STEP_MARK_AT_ENTRY", allow_soft_placement=false, num_cores_per_replica=1, use_tpu=true, num_replicas=8, computation_shape=[], host_compute_core=[], device_assignment=[], padding_map=[], _tpu_replicate="cluster"]
Registered devices: [CPU, XLA_CPU]
Registered kernels:
<no registered kernels>
[[TPUReplicateMetadata]]
During handling of the above exception, another exception occurred:
InvalidArgumentError Traceback (most recent call last)
<ipython-input-5-df8550f415a5> in <module>()
----> 1 get_fid(images1, images2, session, strategy)
<ipython-input-2-d9e7a5cdebce> in get_fid(images1, images2, session, strategy)
75 print('Calculating FID with %i images from each distribution' % (images1.shape[0]))
76 start_time = time.time()
---> 77 act1 = get_inception_activations(images1, session, strategy)
78 act2 = get_inception_activations(images2, session, strategy)
79 fid = activations2distance(act1, act2, session)
<ipython-input-2-d9e7a5cdebce> in get_inception_activations(inps, session, strategy)
57 for i in range(n_batches):
58 inp = inps[i * BATCH_SIZE : (i + 1) * BATCH_SIZE] / 255. * 2 - 1
---> 59 act[i * BATCH_SIZE : i * BATCH_SIZE + min(BATCH_SIZE, inp.shape[0])] = session.run(activations[0], feed_dict = {inception_images[0]: inp})
60 return act
61
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py in run(self, fetches, feed_dict, options, run_metadata)
954 try:
955 result = self._run(None, fetches, feed_dict, options_ptr,
--> 956 run_metadata_ptr)
957 if run_metadata:
958 proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
1178 if final_fetches or final_targets or (handle and feed_dict_tensor):
1179 results = self._do_run(handle, final_targets, final_fetches,
-> 1180 feed_dict_tensor, options, run_metadata)
1181 else:
1182 results = []
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
1357 if handle is None:
1358 return self._do_call(_run_fn, feeds, fetches, targets, options,
-> 1359 run_metadata)
1360 else:
1361 return self._do_call(_prun_fn, handle, feeds, fetches)
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py in _do_call(self, fn, *args)
1382 '\nsession_config.graph_options.rewrite_options.'
1383 'disable_meta_optimizer = True')
-> 1384 raise type(e)(node_def, op, message)
1385
1386 def _extend_graph(self):
InvalidArgumentError: No OpKernel was registered to support Op 'TPUReplicateMetadata' used by node TPUReplicateMetadata (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1748) with these attrs: [topology="", step_marker_location="STEP_MARK_AT_ENTRY", allow_soft_placement=false, num_cores_per_replica=1, use_tpu=true, num_replicas=8, computation_shape=[], host_compute_core=[], device_assignment=[], padding_map=[], _tpu_replicate="cluster"]
Registered devices: [CPU, XLA_CPU]
Registered kernels:
<no registered kernels>
[[TPUReplicateMetadata]]
I was able to reproduce this issue on Colab. This code is from fid_tpu_tf1.py and was run on TF1; I also ran fid_tpu.py on TF2 and got the same error.
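The error lists only `[CPU, XLA_CPU]` as registered devices, which suggests the session passed to `get_fid` is a local session that is not attached to a TPU worker, so the TPU ops in the graph have no kernel to run on. On the older Colab TPU runtime the worker address was exposed through the `COLAB_TPU_ADDR` environment variable; that is an assumption about the runtime and may not hold for newer Colab TPU VMs. A minimal sketch with hypothetical helper names: build the gRPC target and open a TF1-style session on it instead of a plain `Session()`.

```python
import os


def colab_tpu_target(environ=None):
    """Hypothetical helper: gRPC target of the Colab TPU worker, or raise if none is visible."""
    environ = os.environ if environ is None else environ
    addr = environ.get("COLAB_TPU_ADDR")  # assumed Colab convention, see lead-in
    if not addr:
        raise RuntimeError("COLAB_TPU_ADDR is not set: this runtime does not appear to have a TPU attached")
    return "grpc://" + addr


def make_tpu_session():
    """Hypothetical helper: open a TF1-style session against the TPU worker."""
    import tensorflow as tf  # imported lazily so colab_tpu_target() works without TF

    return tf.compat.v1.Session(colab_tpu_target())
```

If `colab_tpu_target()` raises, the notebook is most likely running on a CPU or GPU runtime, and the TPU version of the script cannot run there at all.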
I ran fid.py, but the following error occurs:
ssh://[email protected]:22/home/wangchy/anaconda3/bin/python3 -u /home/wangchy/.pycharm_helpers/pydev/pydevd.py --multiproc --qt-support=auto --client 0.0.0.0 --port 38755 --file /home/wangchy/wcy/FAGFSR/metrics/fid.py
pydev debugger: process 65872 is connecting
Connected to pydev debugger (build 193.5662.61)
pydev debugger: warning: trying to add breakpoint to file that does not exist: /home/wangchy/wcy/FAGFSR/model/fishsrnet.py (will have no effect)
2020-10-20 20:23:46.970816: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
2020-10-20 20:23:46.970864: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
WARNING:tensorflow:From /home/wangchy/anaconda3/lib/python3.7/site-packages/tensorflow/python/compat/v2_compat.py:96: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
WARNING:tensorflow:From /home/wangchy/anaconda3/lib/python3.7/site-packages/tensorflow_gan/python/estimator/tpu_gan_estimator.py:42: The name tf.estimator.tpu.TPUEstimator is deprecated. Please use tf.compat.v1.estimator.tpu.TPUEstimator instead.
2020-10-20 20:23:55.176133: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2394330000 Hz
2020-10-20 20:23:55.180586: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55dcd0e64260 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-10-20 20:23:55.180654: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-10-20 20:23:55.184841: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-10-20 20:23:55.910091: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-10-20 20:23:56.005441: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-10-20 20:23:56.007801: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55dcd0f38cd0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-10-20 20:23:56.007864: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): GeForce GTX 1080 Ti, Compute Capability 6.1
2020-10-20 20:23:56.007880: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (1): GeForce GTX 1080 Ti, Compute Capability 6.1
2020-10-20 20:23:56.007899: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (2): GeForce GTX 1080 Ti, Compute Capability 6.1
2020-10-20 20:23:56.007927: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (3): GeForce GTX 1080 Ti, Compute Capability 6.1
2020-10-20 20:23:56.014048: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:02:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.62GHz coreCount: 28 deviceMemorySize: 10.91GiB deviceMemoryBandwidth: 451.17GiB/s
2020-10-20 20:23:56.016255: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 1 with properties:
pciBusID: 0000:03:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.62GHz coreCount: 28 deviceMemorySize: 10.92GiB deviceMemoryBandwidth: 451.17GiB/s
2020-10-20 20:23:56.016421: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-10-20 20:23:56.018552: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 2 with properties:
pciBusID: 0000:83:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.62GHz coreCount: 28 deviceMemorySize: 10.92GiB deviceMemoryBandwidth: 451.17GiB/s
2020-10-20 20:23:56.018714: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-10-20 20:23:56.020875: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 3 with properties:
pciBusID: 0000:84:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.62GHz coreCount: 28 deviceMemorySize: 10.92GiB deviceMemoryBandwidth: 451.17GiB/s
2020-10-20 20:23:56.021192: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
2020-10-20 20:23:56.021402: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcublas.so.10'; dlerror: libcublas.so.10: cannot open shared object file: No such file or directory
2020-10-20 20:23:56.021620: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcufft.so.10'; dlerror: libcufft.so.10: cannot open shared object file: No such file or directory
2020-10-20 20:23:56.021832: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcurand.so.10'; dlerror: libcurand.so.10: cannot open shared object file: No such file or directory
2020-10-20 20:23:56.022043: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcusolver.so.10'; dlerror: libcusolver.so.10: cannot open shared object file: No such file or directory
2020-10-20 20:23:56.022256: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcusparse.so.10'; dlerror: libcusparse.so.10: cannot open shared object file: No such file or directory
2020-10-20 20:23:56.028004: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2020-10-20 20:23:56.028032: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1753] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2020-10-20 20:23:56.028223: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-10-20 20:23:56.028244: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263] 0 1 2 3
2020-10-20 20:23:56.028255: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0: N Y N N
2020-10-20 20:23:56.028262: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 1: Y N N N
2020-10-20 20:23:56.028277: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 2: N N N Y
2020-10-20 20:23:56.028295: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 3: N N Y N
############################
over
1
2
Traceback (most recent call last):
File "/home/wangchy/anaconda3/lib/python3.7/contextlib.py", line 130, in __exit__
self.gen.throw(type, value, traceback)
File "/home/wangchy/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 4200, in name_scope
yield "" if new_stack is None else new_stack + "/"
File "/home/wangchy/anaconda3/lib/python3.7/site-packages/tensorflow/python/ops/map_fn.py", line 499, in map_fn
maximum_iterations=n)
File "/home/wangchy/anaconda3/lib/python3.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2774, in while_loop
return_same_structure)
File "/home/wangchy/anaconda3/lib/python3.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2256, in BuildLoop
pred, body, original_loop_vars, loop_vars, shape_invariants)
File "/home/wangchy/anaconda3/lib/python3.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2181, in _BuildLoop
body_result = body(*packed_vars_for_body)
File "/home/wangchy/anaconda3/lib/python3.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2726, in <lambda>
body = lambda i, lv: (i + 1, orig_body(*lv))
File "/home/wangchy/anaconda3/lib/python3.7/site-packages/tensorflow/python/ops/map_fn.py", line 483, in compute
result_value = autographed_fn(elems_value)
File "/home/wangchy/anaconda3/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 258, in wrapper
raise e.ag_error_metadata.to_exception(e)
tensorflow.python.autograph.impl.api.StagingError: in user code:
/home/wangchy/anaconda3/lib/python3.7/site-packages/tensorflow_gan/python/eval/inception_metrics.py:94 _classifier_fn *
output = tfhub.load(tfhub_module)(images)
/home/wangchy/anaconda3/lib/python3.7/site-packages/tensorflow_hub/module_v2.py:101 load *
module_path = resolve(handle)
/home/wangchy/anaconda3/lib/python3.7/site-packages/tensorflow_hub/module_v2.py:53 resolve *
return registry.resolver(handle)
/home/wangchy/anaconda3/lib/python3.7/site-packages/tensorflow_hub/registry.py:44 __call__ *
return impl(*args, **kwargs)
/home/wangchy/anaconda3/lib/python3.7/site-packages/tensorflow_hub/compressed_module_resolver.py:83 download *
response = self._call_urlopen(request)
/home/wangchy/anaconda3/lib/python3.7/site-packages/tensorflow_hub/resolver.py:418 atomic_download *
download_fn(handle, tmp_dir)
/home/wangchy/anaconda3/lib/python3.7/site-packages/tensorflow_hub/compressed_module_resolver.py:96 _call_urlopen *
return url.urlopen(request)
/home/wangchy/anaconda3/lib/python3.7/urllib/request.py:222 urlopen **
return opener.open(url, data, timeout)
/home/wangchy/anaconda3/lib/python3.7/urllib/request.py:525 open
response = self._open(req, data)
/home/wangchy/anaconda3/lib/python3.7/urllib/request.py:543 _open
'_open', req)
/home/wangchy/anaconda3/lib/python3.7/urllib/request.py:503 _call_chain
result = func(*args)
/home/wangchy/anaconda3/lib/python3.7/urllib/request.py:1360 https_open
context=self._context, check_hostname=self._check_hostname)
/home/wangchy/anaconda3/lib/python3.7/urllib/request.py:1319 do_open
raise URLError(err)
URLError: <urlopen error [Errno 110] Connection timed out>
Process finished with exit code 1
What can I do to address this problem? Thank you very much.
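The last frames show tensorflow_gan's `_classifier_fn` calling `tfhub.load(...)`, which tries to download the Inception module over HTTPS and times out (`Errno 110`). So the failure is network access from that server, not the FID computation itself. Two common workarounds: route the download through a proxy (`urllib` honors the `https_proxy` / `HTTPS_PROXY` environment variables), or populate a TF Hub cache once on a machine with internet access and reuse it via `TFHUB_CACHE_DIR`. The sketch below assumes the second route; the cache path is up to you, and the module handle is an assumption that should be checked against the constant in your installed `inception_metrics.py`.

```python
import os

# Assumed handle; confirm against the Inception module constant in your
# tensorflow_gan version's inception_metrics.py.
INCEPTION_HANDLE = "https://tfhub.dev/tensorflow/tfgan/eval/inception/1"


def configure_tfhub_cache(cache_dir):
    """Point TF Hub at a persistent cache; call before tensorflow_hub resolves any module."""
    os.makedirs(cache_dir, exist_ok=True)
    os.environ["TFHUB_CACHE_DIR"] = cache_dir
    return cache_dir


def compressed_download_url(handle):
    """URL of the .tar.gz archive that TF Hub's compressed resolver fetches for a handle."""
    return handle + "?tf-hub-format=compressed"
```

Running fid.py once with the same `TFHUB_CACHE_DIR` on a machine that can reach the internet fills the cache; copying that directory to the offline server and setting the variable there should let the resolver find the module without downloading it.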
Is there a way to calculate the Fréchet Inception Distance between a dataset and just one sample (instead of a set of samples) that produces consistent results?
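FID as defined needs a covariance for both sets, and one sample has none: `np.cov` on a single row returns NaN because it divides by N - 1 = 0. One consistent alternative, which is an interpretation and not something this repository implements, is to treat the sample as a point mass (covariance 0). The squared 2-Wasserstein (Fréchet) distance between N(mu, Sigma) and a point x then reduces to ||mu - x||^2 + Tr(Sigma). Because Tr(Sigma) is fixed for a given dataset, this ranks samples exactly like the squared distance to the dataset's mean feature, and its values are not comparable to set-versus-set FID. Function name is hypothetical.

```python
import numpy as np


def point_mass_frechet(dataset_acts, sample_act):
    """Squared W2 distance between N(mu, Sigma) fitted to dataset_acts [N, D] and a point mass at sample_act [D]."""
    acts = np.asarray(dataset_acts, dtype=np.float64)
    x = np.asarray(sample_act, dtype=np.float64).reshape(-1)
    mu = acts.mean(axis=0)
    sigma = np.cov(acts, rowvar=False)
    diff = mu - x
    # With Sigma_2 = 0 the trace term Tr(S1 + S2 - 2 sqrt(S1 S2)) collapses to Tr(S1).
    return float(diff @ diff + np.trace(sigma))
```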
How can I use fid.py with the MNIST dataset? What do I need to change in fid.py? Can you help me? Thanks so much!
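The traceback earlier in this page shows the batches being rescaled with `inps[...] / 255. * 2 - 1`, so the script expects pixel values in [0, 255], and Inception expects three colour channels, while MNIST is single-channel 28x28. A minimal preprocessing sketch (hypothetical helper) replicates the gray channel and rescales. It assumes the channel-first `[N, 3, H, W]` layout; if your version of fid.py takes channel-last arrays, move the channel axis accordingly. It does not resize: if your fid.py does not resize to Inception's input size internally, that step would also be needed.

```python
import numpy as np


def mnist_to_fid_input(images):
    """Convert grayscale [N, H, W] or [N, 1, H, W] images to float32 [N, 3, H, W] in [0, 255].

    Integer input is assumed to already be in [0, 255]; float input is assumed to be in [0, 1].
    """
    x = np.asarray(images)
    if x.ndim == 4 and x.shape[1] == 1:
        x = x[:, 0]  # drop the singleton channel axis
    if x.ndim != 3:
        raise ValueError(f"expected [N, H, W] or [N, 1, H, W], got {x.shape}")
    scale = 255.0 if np.issubdtype(x.dtype, np.floating) else 1.0
    x = x.astype(np.float32) * scale
    return np.repeat(x[:, None, :, :], 3, axis=1)  # gray -> RGB by copying the channel
```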
Hi, I just wanted to let you know I found a minor bug when using datasets that are smaller than the batch size. When this happens, the integer division evaluates to zero and no batch is computed whatsoever. With no values computed, no FID is given, resulting in a NaN output.
The fix is pretty simple and is basically:
n_batches = max(1, inps.shape[0] // BATCH_SIZE)
At line 52.
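`max(1, n // BATCH_SIZE)` handles datasets smaller than one batch, but with floor division any images past the last full batch are still never processed when N is not a multiple of `BATCH_SIZE`. Ceiling division covers both cases. A sketch of the idea as a standalone helper (the name is hypothetical, not from fid.py):

```python
def batch_slices(n_items, batch_size):
    """Yield (start, stop) pairs covering all n_items, including a final partial batch."""
    n_batches = (n_items + batch_size - 1) // batch_size  # ceiling division; 0 only if n_items == 0
    for i in range(n_batches):
        start = i * batch_size
        yield start, min(start + batch_size, n_items)
```

Inside `get_inception_activations`, the loop could then iterate over `batch_slices(inps.shape[0], BATCH_SIZE)` and write each result to `act[start:stop]`.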