tsc2017 / frechet-inception-distance

CPU/GPU/TPU implementation of the Fréchet Inception Distance

Python 100.00%

gan generative-model distance-measures deep-learning frechet-distance

frechet-inception-distance's Issues

E tensorflow/stream_executor/cuda/cuda_blas.cc:654] failed to run cuBLAS routine cublasDgemm_v2: CUBLAS_STATUS_EXECUTION_FAILED

Hi!

When I run the program, I get this error:

2019-04-19 21:16:23.177552: E tensorflow/stream_executor/cuda/cuda_blas.cc:654] failed to run cuBLAS routine cublasDgemm_v2: CUBLAS_STATUS_EXECUTION_FAILED
Traceback (most recent call last):
  File "/home/iie/.conda/envs/env36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1322, in _do_call
    return fn(*args)
  File "/home/iie/.conda/envs/env36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1307, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/iie/.conda/envs/env36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed : a.shape=(2048, 2048), b.shape=(2048, 2048), m=2048, n=2048, k=2048
	 [[Node: MatMul_6 = MatMul[T=DT_DOUBLE, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Svd_1:1, Diag_1)]]

What should I do?

@tsc2017
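
The failing node is a double-precision 2048 x 2048 MatMul fed by an Svd op, which suggests the error is raised in the matrix square root step of the FID rather than in the Inception forward pass. "Blas GEMM launch failed" is often reported when the GPU is out of memory or already in use by another process, though the log above does not confirm that. One possible workaround, sketched here under the assumption that the Inception activations have already been fetched as NumPy arrays of shape (N, D), is to compute the distance on the CPU. The function name frechet_distance is hypothetical and not part of the repository.

```python
import numpy as np

def frechet_distance(act1, act2):
    """Frechet distance between Gaussians fitted to two activation sets (CPU only)."""
    act1 = np.asarray(act1, dtype=np.float64)
    act2 = np.asarray(act2, dtype=np.float64)
    mu1, mu2 = act1.mean(axis=0), act2.mean(axis=0)
    sigma1 = np.cov(act1, rowvar=False)
    sigma2 = np.cov(act2, rowvar=False)
    # Tr(sqrt(sigma1 @ sigma2)) equals the sum of the square roots of the
    # eigenvalues of the product; clip tiny negative values caused by round-off.
    eigvals = np.linalg.eigvals(sigma1 @ sigma2)
    sqrt_trace = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * sqrt_trace)

rng = np.random.default_rng(0)
a = rng.normal(size=(50, 4))   # stand-in for Inception activations
b = a + 3.0                    # same covariance, mean shifted by 3 in each dimension
print(frechet_distance(a, a), frechet_distance(a, b))
```

This only moves the final statistics step off the GPU; it does not diagnose why the cuBLAS call failed.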

Result is NaN

I am trying to calculate the FID between two batches of images, and the result is NaN. I have tried many times, but it is always NaN. Can you help me? Thanks so much!

FID score evaluation throwing error on TPU

Error:

InvalidArgumentError                      Traceback (most recent call last)

/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py in _do_call(self, fn, *args)
   1364     try:
-> 1365       return fn(*args)
   1366     except errors.OpError as e:

8 frames

/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py in _run_fn(feed_dict, fetch_list, target_list, options, run_metadata)
   1349       return self._call_tf_sessionrun(options, feed_dict, fetch_list,
-> 1350                                       target_list, run_metadata)
   1351 

/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py in _call_tf_sessionrun(self, options, feed_dict, fetch_list, target_list, run_metadata)
   1442                                             fetch_list, target_list,
-> 1443                                             run_metadata)
   1444 

InvalidArgumentError: No OpKernel was registered to support Op 'TPUReplicateMetadata' used by {{node TPUReplicateMetadata}}with these attrs: [topology="", step_marker_location="STEP_MARK_AT_ENTRY", allow_soft_placement=false, num_cores_per_replica=1, use_tpu=true, num_replicas=8, computation_shape=[], host_compute_core=[], device_assignment=[], padding_map=[], _tpu_replicate="cluster"]
Registered devices: [CPU, XLA_CPU]
Registered kernels:
  <no registered kernels>

	 [[TPUReplicateMetadata]]


During handling of the above exception, another exception occurred:

InvalidArgumentError                      Traceback (most recent call last)

<ipython-input-5-df8550f415a5> in <module>()
----> 1 get_fid(images1, images2, session, strategy)

<ipython-input-2-d9e7a5cdebce> in get_fid(images1, images2, session, strategy)
     75     print('Calculating FID with %i images from each distribution' % (images1.shape[0]))
     76     start_time = time.time()
---> 77     act1 = get_inception_activations(images1, session, strategy)
     78     act2 = get_inception_activations(images2, session, strategy)
     79     fid = activations2distance(act1, act2, session)

<ipython-input-2-d9e7a5cdebce> in get_inception_activations(inps, session, strategy)
     57     for i in range(n_batches):
     58         inp = inps[i * BATCH_SIZE : (i + 1) * BATCH_SIZE] / 255. * 2 - 1
---> 59         act[i * BATCH_SIZE : i * BATCH_SIZE + min(BATCH_SIZE, inp.shape[0])] = session.run(activations[0], feed_dict = {inception_images[0]: inp})
     60     return act
     61 

/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py in run(self, fetches, feed_dict, options, run_metadata)
    954     try:
    955       result = self._run(None, fetches, feed_dict, options_ptr,
--> 956                          run_metadata_ptr)
    957       if run_metadata:
    958         proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
   1178     if final_fetches or final_targets or (handle and feed_dict_tensor):
   1179       results = self._do_run(handle, final_targets, final_fetches,
-> 1180                              feed_dict_tensor, options, run_metadata)
   1181     else:
   1182       results = []

/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
   1357     if handle is None:
   1358       return self._do_call(_run_fn, feeds, fetches, targets, options,
-> 1359                            run_metadata)
   1360     else:
   1361       return self._do_call(_prun_fn, handle, feeds, fetches)

/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py in _do_call(self, fn, *args)
   1382                     '\nsession_config.graph_options.rewrite_options.'
   1383                     'disable_meta_optimizer = True')
-> 1384       raise type(e)(node_def, op, message)
   1385 
   1386   def _extend_graph(self):

InvalidArgumentError: No OpKernel was registered to support Op 'TPUReplicateMetadata' used by node TPUReplicateMetadata (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1748) with these attrs: [topology="", step_marker_location="STEP_MARK_AT_ENTRY", allow_soft_placement=false, num_cores_per_replica=1, use_tpu=true, num_replicas=8, computation_shape=[], host_compute_core=[], device_assignment=[], padding_map=[], _tpu_replicate="cluster"]
Registered devices: [CPU, XLA_CPU]
Registered kernels:
  <no registered kernels>

	 [[TPUReplicateMetadata]]

I was able to reproduce this issue on Colab. The code above is from fid_tpu_tf1.py, run on TF1. I also ran fid_tpu.py on TF2 and got the same error.
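
The line "Registered devices: [CPU, XLA_CPU]" indicates that the session running the graph sees no TPU device, which is what one would expect if the session was created without a TPU target. A minimal sketch of resolving the target, assuming an older Colab TPU runtime that exposes the worker address in the COLAB_TPU_ADDR environment variable (the helper name tpu_target is hypothetical, and newer runtimes may not set this variable):

```python
import os

def tpu_target(environ=None):
    """Return a gRPC session target for the Colab TPU worker, or None if absent."""
    environ = os.environ if environ is None else environ
    addr = environ.get("COLAB_TPU_ADDR")
    return "grpc://" + addr if addr else None

target = tpu_target()
if target is None:
    print("No TPU address found; a session created now would only see CPU devices.")
else:
    # With TF1, the target would be passed when the session is created,
    # e.g. tf.Session(target), before running the TPU graph.
    print("TPU target:", target)
```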

Error in running fid.py

I ran fid.py, but the following error occurred:

ssh://[email protected]:22/home/wangchy/anaconda3/bin/python3 -u /home/wangchy/.pycharm_helpers/pydev/pydevd.py --multiproc --qt-support=auto --client 0.0.0.0 --port 38755 --file /home/wangchy/wcy/FAGFSR/metrics/fid.py
pydev debugger: process 65872 is connecting

Connected to pydev debugger (build 193.5662.61)
pydev debugger: warning: trying to add breakpoint to file that does not exist: /home/wangchy/wcy/FAGFSR/model/fishsrnet.py (will have no effect)
2020-10-20 20:23:46.970816: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
2020-10-20 20:23:46.970864: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
WARNING:tensorflow:From /home/wangchy/anaconda3/lib/python3.7/site-packages/tensorflow/python/compat/v2_compat.py:96: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
WARNING:tensorflow:From /home/wangchy/anaconda3/lib/python3.7/site-packages/tensorflow_gan/python/estimator/tpu_gan_estimator.py:42: The name tf.estimator.tpu.TPUEstimator is deprecated. Please use tf.compat.v1.estimator.tpu.TPUEstimator instead.

2020-10-20 20:23:55.176133: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2394330000 Hz
2020-10-20 20:23:55.180586: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55dcd0e64260 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-10-20 20:23:55.180654: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-10-20 20:23:55.184841: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-10-20 20:23:55.910091: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-10-20 20:23:56.005441: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-10-20 20:23:56.007801: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55dcd0f38cd0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-10-20 20:23:56.007864: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce GTX 1080 Ti, Compute Capability 6.1
2020-10-20 20:23:56.007880: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (1): GeForce GTX 1080 Ti, Compute Capability 6.1
2020-10-20 20:23:56.007899: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (2): GeForce GTX 1080 Ti, Compute Capability 6.1
2020-10-20 20:23:56.007927: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (3): GeForce GTX 1080 Ti, Compute Capability 6.1
2020-10-20 20:23:56.014048: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: 
pciBusID: 0000:02:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.62GHz coreCount: 28 deviceMemorySize: 10.91GiB deviceMemoryBandwidth: 451.17GiB/s
2020-10-20 20:23:56.016255: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 1 with properties: 
pciBusID: 0000:03:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.62GHz coreCount: 28 deviceMemorySize: 10.92GiB deviceMemoryBandwidth: 451.17GiB/s
2020-10-20 20:23:56.016421: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-10-20 20:23:56.018552: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 2 with properties: 
pciBusID: 0000:83:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.62GHz coreCount: 28 deviceMemorySize: 10.92GiB deviceMemoryBandwidth: 451.17GiB/s
2020-10-20 20:23:56.018714: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-10-20 20:23:56.020875: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 3 with properties: 
pciBusID: 0000:84:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.62GHz coreCount: 28 deviceMemorySize: 10.92GiB deviceMemoryBandwidth: 451.17GiB/s
2020-10-20 20:23:56.021192: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
2020-10-20 20:23:56.021402: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcublas.so.10'; dlerror: libcublas.so.10: cannot open shared object file: No such file or directory
2020-10-20 20:23:56.021620: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcufft.so.10'; dlerror: libcufft.so.10: cannot open shared object file: No such file or directory
2020-10-20 20:23:56.021832: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcurand.so.10'; dlerror: libcurand.so.10: cannot open shared object file: No such file or directory
2020-10-20 20:23:56.022043: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcusolver.so.10'; dlerror: libcusolver.so.10: cannot open shared object file: No such file or directory
2020-10-20 20:23:56.022256: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcusparse.so.10'; dlerror: libcusparse.so.10: cannot open shared object file: No such file or directory
2020-10-20 20:23:56.028004: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2020-10-20 20:23:56.028032: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1753] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2020-10-20 20:23:56.028223: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-10-20 20:23:56.028244: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]      0 1 2 3 
2020-10-20 20:23:56.028255: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0:   N Y N N 
2020-10-20 20:23:56.028262: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 1:   Y N N N 
2020-10-20 20:23:56.028277: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 2:   N N N Y 
2020-10-20 20:23:56.028295: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 3:   N N Y N 
############################
over
1
2
Traceback (most recent call last):
  File "/home/wangchy/anaconda3/lib/python3.7/contextlib.py", line 130, in __exit__
    self.gen.throw(type, value, traceback)
  File "/home/wangchy/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 4200, in name_scope
    yield "" if new_stack is None else new_stack + "/"
  File "/home/wangchy/anaconda3/lib/python3.7/site-packages/tensorflow/python/ops/map_fn.py", line 499, in map_fn
    maximum_iterations=n)
  File "/home/wangchy/anaconda3/lib/python3.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2774, in while_loop
    return_same_structure)
  File "/home/wangchy/anaconda3/lib/python3.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2256, in BuildLoop
    pred, body, original_loop_vars, loop_vars, shape_invariants)
  File "/home/wangchy/anaconda3/lib/python3.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2181, in _BuildLoop
    body_result = body(*packed_vars_for_body)
  File "/home/wangchy/anaconda3/lib/python3.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2726, in <lambda>
    body = lambda i, lv: (i + 1, orig_body(*lv))
  File "/home/wangchy/anaconda3/lib/python3.7/site-packages/tensorflow/python/ops/map_fn.py", line 483, in compute
    result_value = autographed_fn(elems_value)
  File "/home/wangchy/anaconda3/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 258, in wrapper
    raise e.ag_error_metadata.to_exception(e)
tensorflow.python.autograph.impl.api.StagingError: in user code:

    /home/wangchy/anaconda3/lib/python3.7/site-packages/tensorflow_gan/python/eval/inception_metrics.py:94 _classifier_fn  *
        output = tfhub.load(tfhub_module)(images)
    /home/wangchy/anaconda3/lib/python3.7/site-packages/tensorflow_hub/module_v2.py:101 load  *
        module_path = resolve(handle)
    /home/wangchy/anaconda3/lib/python3.7/site-packages/tensorflow_hub/module_v2.py:53 resolve  *
        return registry.resolver(handle)
    /home/wangchy/anaconda3/lib/python3.7/site-packages/tensorflow_hub/registry.py:44 __call__  *
        return impl(*args, **kwargs)
    /home/wangchy/anaconda3/lib/python3.7/site-packages/tensorflow_hub/compressed_module_resolver.py:83 download  *
        response = self._call_urlopen(request)
    /home/wangchy/anaconda3/lib/python3.7/site-packages/tensorflow_hub/resolver.py:418 atomic_download  *
        download_fn(handle, tmp_dir)
    /home/wangchy/anaconda3/lib/python3.7/site-packages/tensorflow_hub/compressed_module_resolver.py:96 _call_urlopen  *
        return url.urlopen(request)
    /home/wangchy/anaconda3/lib/python3.7/urllib/request.py:222 urlopen  **
        return opener.open(url, data, timeout)
    /home/wangchy/anaconda3/lib/python3.7/urllib/request.py:525 open
        response = self._open(req, data)
    /home/wangchy/anaconda3/lib/python3.7/urllib/request.py:543 _open
        '_open', req)
    /home/wangchy/anaconda3/lib/python3.7/urllib/request.py:503 _call_chain
        result = func(*args)
    /home/wangchy/anaconda3/lib/python3.7/urllib/request.py:1360 https_open
        context=self._context, check_hostname=self._check_hostname)
    /home/wangchy/anaconda3/lib/python3.7/urllib/request.py:1319 do_open
        raise URLError(err)

    URLError: <urlopen error [Errno 110] Connection timed out>


Process finished with exit code 1

What can I do to address this problem? Thank you very much.
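
The traceback ends in tensorflow_hub's resolver timing out while downloading the Inception module, so the failure appears to be a network problem rather than a bug in fid.py. tensorflow_hub reads the TFHUB_CACHE_DIR environment variable to decide where downloaded modules are stored, so pointing it at a directory that persists between runs means the download only has to succeed once. A minimal sketch, where the helper name and the example directory are assumptions for illustration:

```python
import os
import tempfile

def use_hub_cache(cache_dir):
    """Create cache_dir if needed and point tensorflow_hub at it."""
    cache_dir = os.path.abspath(os.path.expanduser(cache_dir))
    os.makedirs(cache_dir, exist_ok=True)
    # Must be set before tensorflow_hub first resolves a module.
    os.environ["TFHUB_CACHE_DIR"] = cache_dir
    return cache_dir

# Hypothetical location; pick a directory that persists on your machine.
cache = use_hub_cache(os.path.join(tempfile.gettempdir(), "tfhub_modules"))
print("TF-Hub modules will be cached in", cache)
```

If the machine sits behind a proxy, urllib also honours the https_proxy environment variable, which may be enough to let the first download through.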

FID between a dataset and a single sample

Is there a way to calculate the Fréchet Inception Distance between a dataset and a single sample (instead of a set of samples) that produces consistent results?
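
The FID compares the mean and covariance of two sets of Inception activations, and a covariance cannot be estimated from a single sample, so a one-image FID is not well defined. The sketch below makes that constraint explicit. As a stand-in, it also computes the squared distance between one sample's activation and the dataset mean. That quantity is only the mean term of the Fréchet distance, not an FID, and the function names are hypothetical.

```python
import numpy as np

def gaussian_stats(activations):
    """Mean and covariance of an (N, D) activation array; requires N >= 2."""
    acts = np.asarray(activations, dtype=np.float64)
    if acts.ndim != 2 or acts.shape[0] < 2:
        raise ValueError("need a 2-D array with at least two samples")
    return acts.mean(axis=0), np.cov(acts, rowvar=False)

def squared_distance_to_mean(sample_activation, dataset_activations):
    """Squared Euclidean distance from one activation vector to the dataset mean."""
    mu, _ = gaussian_stats(dataset_activations)
    diff = np.asarray(sample_activation, dtype=np.float64) - mu
    return float(diff @ diff)

rng = np.random.default_rng(0)
dataset = rng.normal(size=(100, 8))  # stand-in for Inception activations
sample = dataset[0]
print(squared_distance_to_mean(sample, dataset))
```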

Calculate FID for the MNIST dataset

How can I use fid.py with the MNIST dataset? What do I need to change in fid.py? Can you help me? Thanks so much!
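
MNIST images are 28 x 28 grayscale, whereas Inception expects three-channel input. The traceback in the TPU issue above shows the script rescaling inputs with `/ 255. * 2 - 1`, so pixel values in [0, 255] appear to be expected. A minimal sketch of the conversion, assuming the script wants arrays of shape (N, 3, H, W); the channels-first layout is an assumption that should be checked against fid.py, and the function name is hypothetical:

```python
import numpy as np

def mnist_to_rgb(images):
    """Turn (N, H, W) grayscale images into (N, 3, H, W) arrays in [0, 255]."""
    imgs = np.asarray(images, dtype=np.float32)
    if imgs.ndim != 3:
        raise ValueError("expected an array of shape (N, H, W)")
    if imgs.max() <= 1.0:
        imgs = imgs * 255.0  # inputs were normalised to [0, 1]
    # Copy the single gray channel into three identical channels.
    return np.repeat(imgs[:, np.newaxis, :, :], 3, axis=1)

# Random stand-in for a batch of MNIST digits.
fake = np.random.default_rng(0).integers(0, 256, size=(16, 28, 28)).astype(np.uint8)
rgb = mnist_to_rgb(fake)
print(rgb.shape)
```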

Issue on small datasets

Hi, I just wanted to let you know that I found a minor bug that occurs when a dataset is smaller than the batch size. In that case the integer division evaluates to zero, so no batch is processed at all. With no activations computed, the FID comes out as NaN.

The fix is simple:
n_batches = max(1, inps.shape[0] // BATCH_SIZE)

at line 52.
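
Clamping the batch count to at least one avoids the zero-batch case, but floor division still skips a trailing partial batch whenever the dataset is larger than the batch size and not a multiple of it. Whether the script handles that remainder elsewhere is not shown here. A sketch of an alternative using ceiling division, with hypothetical helper names:

```python
def count_batches(n_samples, batch_size):
    """Number of batches needed to cover n_samples, including a partial last batch."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return -(-n_samples // batch_size)  # ceiling division on integers

def batch_slices(n_samples, batch_size):
    """Yield one slice per batch; the last slice may be shorter than batch_size."""
    for i in range(count_batches(n_samples, batch_size)):
        start = i * batch_size
        yield slice(start, min(start + batch_size, n_samples))

# Compare floor division with ceiling division for a batch size of 64.
for n in (10, 64, 130):
    print(n, n // 64, count_batches(n, 64))
```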
