caicloud / tensorflow-tutorial

Example TensorFlow codes and Caicloud TensorFlow as a Service dev environment.

Shell 0.03% Python 0.82% Jupyter Notebook 99.15% Dockerfile 0.01%

clever

tensorflow-tutorial's Issues

Chapter 7, input file queue: running match_filenames_once raises an error

Error message:

tensorflow.python.framework.errors_impl.FailedPreconditionError: Attempting to use uninitialized value matching_filenames
	 [[Node: _send_matching_filenames_0 = _Send[T=DT_STRING, client_terminated=true, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=7488469813197262420, tensor_name="matching_filenames:0", _device="/job:localhost/replica:0/task:0/cpu:0"](matching_filenames)]]

Is something wrong with my environment?
I am on a Mac and installed TensorFlow with virtualenv.

Can feed_dict be omitted?

bottleneck_values = sess.run(bottleneck_tensor, {image_data_tensor: image_data})

Shouldn't the braces at the end be written as feed_dict={...}?
What does this form mean?

The code is located at:
https://github.com/caicloud/tensorflow-tutorial/blob/master/Deep_Learning_with_TensorFlow/1.0.0/Chapter06/2.%20%E8%BF%81%E7%A7%BB%E5%AD%A6%E4%B9%A0.ipynb
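
For context, `Session.run` in TensorFlow 1.x takes `feed_dict` as its second parameter, so the dictionary can be passed positionally. The sketch below does not call TensorFlow; it uses a hypothetical stand-in function `run` with the same first two parameters, only to show that both call styles bind the same argument.

```python
# Hypothetical stand-in that mirrors the first two parameters of
# tf.Session.run(fetches, feed_dict=None, ...); it is not TensorFlow code.
def run(fetches, feed_dict=None):
    # Return what was received so the two call styles can be compared.
    return fetches, feed_dict


feeds = {"image_data_tensor": b"raw jpeg bytes"}

positional = run("bottleneck_tensor", feeds)          # second positional argument
keyword = run("bottleneck_tensor", feed_dict=feeds)   # explicit keyword argument

print(positional == keyword)
```

Writing `feed_dict=` explicitly is arguably easier to read, but the positional form quoted in the question should behave the same way.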

Chapter 9, section 9.22: question about node information

train_writer.add_run_metadata(run_metadata,'step%03d',%i)

Where is train_writer defined?

Error:
train_writer.add_run_metadata(run_metadata,'step%03d',%i)
^
SyntaxError: invalid syntax
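
The SyntaxError points at the comma in front of `%`: the `%` formatting operator has to sit directly between the format string and its value. A minimal sketch of building the tag, with sample step numbers chosen purely for illustration:

```python
# Build the per-step tag with the % formatting operator.
# Writing 'step%03d', % i is a syntax error because of the comma before %.
for i in (0, 7, 123):
    tag = 'step%03d' % i   # zero-padded to three digits
    print(tag)

tags = ['step%03d' % i for i in (0, 7, 123)]
```

The resulting string would then be passed as the tag argument, for example `train_writer.add_run_metadata(run_metadata, 'step%03d' % i)`. This assumes `train_writer` is a summary writer created earlier in the script, which the quoted snippet does not show.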

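Does all of the project's code have to run on Python 2.7? Is there a version for Python 3.5?

Code problem in the TensorFlow transfer-learning example of section 6.5.2: how can it be fixed?

The TensorFlow transfer-learning example in Chapter 6, section 6.5.2 fails with: empty range for randrange(). The error is raised by label_index = random.randrange(n_classes) inside the get_random_cached_bottlenecks function. My code is exactly the same as in the book, and I don't know what is going wrong.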
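
`random.randrange(n)` raises `ValueError` when `n` is 0, so this error suggests that `n_classes` is 0, meaning no class folders were found. One possible cause (an assumption, not confirmed by the issue) is that the input data path does not point at the directory holding the per-class subfolders. The sketch below reproduces the failure with a hypothetical helper `count_class_dirs` and an empty temporary directory.

```python
import os
import random
import tempfile


def count_class_dirs(input_dir):
    """Count subdirectories of input_dir; each one is treated as a class (hypothetical helper)."""
    return sum(
        1 for name in os.listdir(input_dir)
        if os.path.isdir(os.path.join(input_dir, name))
    )


with tempfile.TemporaryDirectory() as empty_dir:
    n_classes = count_class_dirs(empty_dir)   # 0: there are no class folders here
    try:
        random.randrange(n_classes)
        error_message = None
    except ValueError as exc:
        error_message = str(exc)
    print(n_classes, error_message)
```

Printing `n_classes` and the data directory before training starts is a cheap way to check whether the dataset was actually found.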

reader.py for the recurrent neural networks in Chapter 8 raises an error

Environment: Win10 x64 + TensorFlow 1.0.0 + Python 3.6
File with the error: Deep_Learning_with_TensorFlow/1.0.0/Chapter08/reader.py
Error output:

28 def _read_words(filename):
     29   with tf.gfile.GFile(filename, "r") as f:
---> 30     return f.read().decode("utf-8").replace("\n", "<eos>").split()
     31 
     32 

AttributeError: 'str' object has no attribute 'decode'

Original code

def _read_words(filename):
  with tf.gfile.GFile(filename, "r") as f:
    return f.read().decode("utf-8").replace("\n", "<eos>").split()

Suggested change

def _read_words(filename):
  with tf.gfile.GFile(filename, "r") as f:
    return f.read().replace("\n", "<eos>").split()
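
In Python 3, reading a file opened in text mode returns `str`, which has no `decode` method, while binary mode returns `bytes`. A version-tolerant sketch follows; `to_words` is a hypothetical helper standing in for the body of `_read_words`, and the sample text is made up.

```python
def to_words(raw):
    """Split file contents into tokens, marking line ends with <eos>.

    Accepts either bytes (binary mode) or str (text mode); hypothetical helper.
    """
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")
    return raw.replace("\n", "<eos>").split()


# Tokens are separated by spaces, including around the newline.
print(to_words(" a b \n c d \n"))
print(to_words(b" a b \n c d \n"))
```

Checking the type before decoding keeps the same function usable whether the caller opened the file with "r" or "rb".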

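Section 6.5.2 transfer learning: question about image sizes

The images in the provided flower_photos dataset are not all the same size. Why does training still work with images of different sizes? Is this handled somewhere in the code?

Deep_Learning_with_TensorFlow/1.0.0/Chapter03/1. 图，张量及会话.ipynb cannot be opened

It shows an error.

Question about the moving average in mnist_train.py from Chapter 5

Line 23, "global_step = tf.Variable(0, trainable=False)", defines the variable global_step.
Line 26, "variable_averages = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY, global_step)", defines the moving-average object.
In principle, global_step should then increase by 1 on every training step.
However, I could not find any code after this that increments global_step on each training step. How does this work?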
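
In TensorFlow 1.x, `Optimizer.minimize(loss, global_step=...)` increments the variable passed as `global_step` each time the training op runs, so no explicit increment is needed if the script creates its training op that way (an assumption here, since that line is not quoted in the question). The plain-Python sketch below only illustrates why the step counter matters to the moving average: when a step count is supplied, the documented decay is `min(decay, (1 + num_updates) / (10 + num_updates))`. The function names are hypothetical.

```python
def effective_decay(decay, num_updates):
    """Decay actually applied when a step counter is supplied."""
    return min(decay, (1.0 + num_updates) / (10.0 + num_updates))


def ema_update(shadow, value, decay, num_updates):
    """One moving-average update: shadow = d * shadow + (1 - d) * value."""
    d = effective_decay(decay, num_updates)
    return d * shadow + (1.0 - d) * value


shadow = 0.0
for step in range(3):   # step plays the role of global_step
    shadow = ema_update(shadow, 10.0, 0.99, step)
    print(step, effective_decay(0.99, step), shadow)
```

Early in training the small effective decay lets the shadow value follow the variable quickly; as the step count grows, the decay approaches the configured value.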

Chapter 6 transfer learning: problem reading the inception-v3 model

with gfile.FastGFile(os.path.join(MODEL_DIR, MODEL_FILE), 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

The code above is where the error is raised when reading the inception-v3 model. In Sublime I get the following error:
tensorflow.python.framework.errors_impl.NotFoundError: NewRandomAccessFile failed to Create/Open: 代码资源/Deep_Learning_with_TensorFlow/datasets/inception_dec_2015\tensorflow_inception_graph.pb : ϵͳ\udcd5Ҳ\udcbb\udcb5\udcbdָ\udcb6\udca8\udcb5\udcc4·\udcbe\udcb6\udca1\udca3

In Spyder, however, I get this error instead:
UnicodeEncodeError: 'utf-8' codec can't encode character '\udcd5' in position 2463: surrogates not allowed

Is this a character-encoding problem? How can I solve it?

OS: Windows 7
Python: 3.5.2
TensorFlow: 1.0.1

Chapter 10: multi-GPU parallel training fails partway through

# Define the input queue and return it
min_after_dequeue = 10000
capacity = min_after_dequeue + 3 * BATCH_SIZE
return tf.train.shuffle_batch([retyped_image, label], batch_size=BATCH_SIZE, capacity=capacity, min_after_dequeue=min_after_dequeue)  # <-- this line raises the error

The log says:
`Caused by op 'shuffle_batch', defined at:
File "C:\Users\weizhen\workspace\TextUtil\TFMULTIGPU_.py", line 203, in
tf.app.run()
File "C:\Users\weizhen\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\platform\app.py", line 43, in run
sys.exit(main(sys.argv[:1] + flags_passthrough))
File "C:\Users\weizhen\workspace\TextUtil\TFMULTIGPU_.py", line 100, in main
x, y_ = get_input()
File "C:\Users\weizhen\workspace\TextUtil\TFMULTIGPU_.py", line 59, in get_input
min_after_dequeue=min_after_dequeue)
File "C:\Users\weizhen\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\training\input.py", line 917, in shuffle_batch
dequeued = queue.dequeue_many(batch_size, name=name)
File "C:\Users\weizhen\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\ops\data_flow_ops.py", line 458, in dequeue_many
self._queue_ref, n=n, component_types=self._dtypes, name=name)
File "C:\Users\weizhen\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\ops\gen_data_flow_ops.py", line 1099, in _queue_dequeue_many
timeout_ms=timeout_ms, name=name)
File "C:\Users\weizhen\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 759, in apply_op
op_def=op_def)
File "C:\Users\weizhen\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\framework\ops.py", line 2240, in create_op
original_op=self._default_original_op, op_def=op_def)
File "C:\Users\weizhen\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\framework\ops.py", line 1128, in init
self._traceback = _extract_stack()

OutOfRangeError (see above for traceback): RandomShuffleQueue '_2_shuffle_batch/random_shuffle_queue' is closed and has insufficient elements (requested 100, current size 0)
[[Node: shuffle_batch = QueueDequeueMany[_class=["loc:@shuffle_batch/random_shuffle_queue"], component_types=[DT_FLOAT, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](shuffle_batch/random_shuffle_queue, shuffle_batch/n)]]

E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_gpu_executor.cc:652] Deallocating stream with pending work
`

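Ideas and questions about combining multi-GPU and distributed parallelism

Is it possible to combine multi-GPU training with distributed training? That is, each node in the cluster has several GPUs, the computation on each node uses the multi-GPU code, and the resulting parameters are then submitted to the parameter server for aggregation using the distributed code. I found that having each task manage only one GPU is inefficient. Can one task manage several GPUs? Could you give some advice, or point to any reference material?

Some code in Deep_Learning_with_TensorFlow/1.0.0/Chapter10/2. 多GPU并行.py has not been updated to version 1.0.0

The following updates are needed:

tf.nn.sparse_softmax_cross_entropy_with_logits(y, y_) -> tf.nn.sparse_softmax_cross_entropy_with_logits(logits=y, labels=y_)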

tf.histogram_summary() -> tf.summary.histogram()

tf.merge_all_summaries() -> tf.summary.merge_all()

tf.train.SummaryWriter() -> tf.summary.FileWriter()

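Problem with the book's code

First of all, thank you to the author. I have learned a great deal from reading this book.
I ran into the following problem when running the book's code:
Page 195 of the book, section 7.3.4, the input data processing framework (the code in the corresponding chapter of the GitHub repository differs from the book):
features = tf.parse_single_example(
    serializerd_example,
    features={
        'image': tf.FixedLenFeature([], tf.string),
        'label': tf.FixedLenFeature([], tf.int64),
        'height': tf.FixedLenFeature([], tf.int64),
        'width': tf.FixedLenFeature([], tf.int64),
        'channels': tf.FixedLenFeature([], tf.int64)
    })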

image, label = features['image'], features['label']
height, width = features['height'], features['width']
channels = features['channels']
decoded_image = tf.decode_raw(image, tf.uint8)
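decoded_image.set_shape([height, width, channels])  # (1)

At (1), height, width and channels are read from the TFRecords file, so they are Tensors.
Running the code produces this error:
TypeError: int() argument must be a string or a number, not 'Tensor'

Even if I replace height, width and channels with literal numbers,
decoded_image.set_shape([224, 224, 3])
it still fails:
ValueError: Shapes (?,) and (224, 224, 3) are not compatible

After some experimenting, the following works, but tf.reshape is still needed to restore the image shape:
decoded_image.set_shape([224 * 224 * 3])

So my questions are:

How should decoded_image.set_shape be used? Is it better to set the image shape directly with tf.reshape?
height, width, channels and the other values read from the TFRecords file are all Tensors, so their values are not available before the Session runs. Does that mean storing them in the TFRecords file is pointless, since the shape still has to be specified by hand before reshape can restore the image?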

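After installing the Docker TF 0.12.0 image by following the book, Jupyter asks for a password before it will open.

If the password should not be made public, please send it to [email protected]. Thanks.

Questions about the Chapter 8 RNN program

Question 1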

embedding = tf.get_variable("embedding", [VOCAB_SIZE, HIDDEN_SIZE])
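# Convert the original batch_size * num_steps word IDs into word vectors.
# After the conversion the input layer has shape batch_size * num_steps * HIDDEN_SIZE.
inputs = tf.nn.embedding_lookup(embedding, self.input_data)

Looking at these two lines, embedding is only defined as a VOCAB_SIZE * HIDDEN_SIZE matrix and is never initialized, yet embedding_lookup turns the input into word vectors. How does this actually work?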
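
As a rough picture of what happens: `tf.get_variable` only declares the variable. Its values are filled in by an initializer when the initialization op runs (the enclosing variable scope's initializer, if one is set), and they are then trained with the rest of the model. The lookup itself is just row selection. The plain-Python sketch below imitates that behaviour with tiny, made-up sizes; it is not TensorFlow code.

```python
import random

VOCAB_SIZE, HIDDEN_SIZE = 5, 3   # tiny sizes chosen for illustration

rng = random.Random(0)
# Stand-in for the embedding variable: one row of HIDDEN_SIZE numbers per word ID,
# filled with small random values, as a uniform initializer would do.
embedding = [[rng.uniform(-0.05, 0.05) for _ in range(HIDDEN_SIZE)]
             for _ in range(VOCAB_SIZE)]


def embedding_lookup(table, ids):
    """Replace every word ID in a [batch, num_steps] list by its row of the table."""
    return [[table[word_id] for word_id in sequence] for sequence in ids]


input_data = [[0, 2, 4], [1, 1, 3]]   # batch_size = 2, num_steps = 3
inputs = embedding_lookup(embedding, input_data)
print(len(inputs), len(inputs[0]), len(inputs[0][0]))
```

The word vectors start out random and only become meaningful as training updates the rows of the table.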

Question 2

def main(_):
    # Load the raw data
    train_data, valid_data, test_data, _ = reader.ptb_raw_data("/Users/xxx/work5/tensorflow/data/ptb_dataset/simple-examples/data")
    print(len(train_data))

    # Define the initializer
    initializer = tf.random_uniform_initializer(-0.05, 0.05)
    # Define the neural network model used for training
    with tf.variable_scope("language_model", reuse=None, initializer=initializer):
        train_model = PTBModel(True, TRAIN_BATCH_SIZE, TRAIN_NUM_STEP)

    # Define the neural network model used for evaluation
    with tf.variable_scope("language_model", reuse=True, initializer=initializer):
        eval_model = PTBModel(False, EVAL_BATCH_SIZE, EVAL_NUM_STEP)

    with tf.Session() as sess:
        tf.initialize_all_variables().run()

        # Train the model on the training data
        for i in range(NUM_EPOCH):
            print("In iteration: %d" % (i + 1))
            # Train the neural network model on the training data
            run_epoch(sess, train_model, train_data, train_model.train_op, True)

            # Evaluate the model on the validation data
            valid_perplexity = run_epoch(sess, eval_model, valid_data, tf.no_op(), False)
            print("Epoch: %d validation perplexity: %.3f" % (i + 1, valid_perplexity))

        # Finally, test the model on the test data
        test_perplexity = run_epoch(sess, eval_model, test_data, tf.no_op(), False)

        print("Test perplexity: %.3f" % test_perplexity)

One more question about this code: when evaluating on the test data with test_perplexity = run_epoch(sess, eval_model, test_data, tf.no_op(), False),
why is eval_model, the model used on the validation set, used instead of the trained model train_model?

The code above mainly follows pages 217-221 of your book.
I would be grateful if you could clear this up when you have time. Thank you very much.

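Chapter 5: starting mnist_train.py and mnist_eval.py at the same time raises an error

Chapter 5 introduces saving training results, and mnist_eval uses the saved results to evaluate on the test set.
However, if I start both scripts at the same time, an error is raised. Starting either one alone works fine. Is this because my computer has only one GPU, so only one can run at a time?
Thanks

How do I define my own activation function in TensorFlow?

I recently ran into a problem: I don't know how to define a custom activation function in TensorFlow, for example log. How does differentiation for backpropagation work in TensorFlow? Does the framework differentiate automatically, or do I have to write the derivative code myself and then connect it layer by layer? Any guidance would be appreciated, thanks!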

something wrong with the last part of code

There is something wrong with the last part of the code.
I ran the Jupyter notebook code, but it fails on my laptop.
What is wrong with it?
Thanks for your kind help for a beginner :)
The output is shown below; I cannot tell what went wrong:

w1: [[-0.81131822 1.48459876 0.06532937]
[-2.4427042 0.0992484 0.59122431]]
w2: [[-0.81131822]
[ 1.48459876]
[ 0.06532937]]

InternalError Traceback (most recent call last)
in ()
13 start = (i * batch_size) % 128
14 end = (i * batch_size) % 128 + batch_size
---> 15 sess.run(train_step, feed_dict={x: X[start:end], y_: Y[start:end]})
16 if i % 1000 == 0:
17 total_cross_entropy = sess.run(cross_entropy, feed_dict={x: X, y_: Y})

/home/brucelau/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.pyc in run(self, fetches, feed_dict, options, run_metadata)
765 try:
766 result = self._run(None, fetches, feed_dict, options_ptr,
--> 767 run_metadata_ptr)
768 if run_metadata:
769 proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

/home/brucelau/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.pyc in _run(self, handle, fetches, feed_dict, options, run_metadata)
963 if final_fetches or final_targets:
964 results = self._do_run(handle, final_targets, final_fetches,
--> 965 feed_dict_string, options, run_metadata)
966 else:
967 results = []

/home/brucelau/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.pyc in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
1013 if handle is None:
1014 return self._do_call(_run_fn, self._session, feed_dict, fetch_list,
-> 1015 target_list, options, run_metadata)
1016 else:
1017 return self._do_call(_prun_fn, self._session, handle, feed_dict,

/home/brucelau/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.pyc in _do_call(self, fn, *args)
1033 except KeyError:
1034 pass
-> 1035 raise type(e)(node_def, op, message)
1036
1037 def _extend_graph(self):

InternalError: Blas SGEMM launch failed : a.shape=(8, 2), b.shape=(2, 3), m=8, n=3, k=2
[[Node: MatMul_2 = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](_recv_x-input_1_0/_11, Variable_2/read)]]

Caused by op u'MatMul_2', defined at:
File "/home/brucelau/anaconda2/lib/python2.7/runpy.py", line 174, in _run_module_as_main
"main", fname, loader, pkg_name)
File "/home/brucelau/anaconda2/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/home/brucelau/anaconda2/lib/python2.7/site-packages/ipykernel/main.py", line 3, in
app.launch_new_instance()
File "/home/brucelau/anaconda2/lib/python2.7/site-packages/traitlets/config/application.py", line 658, in launch_instance
app.start()
File "/home/brucelau/anaconda2/lib/python2.7/site-packages/ipykernel/kernelapp.py", line 474, in start
ioloop.IOLoop.instance().start()
File "/home/brucelau/anaconda2/lib/python2.7/site-packages/zmq/eventloop/ioloop.py", line 177, in start
super(ZMQIOLoop, self).start()
File "/home/brucelau/anaconda2/lib/python2.7/site-packages/tornado/ioloop.py", line 887, in start
handler_func(fd_obj, events)
File "/home/brucelau/anaconda2/lib/python2.7/site-packages/tornado/stack_context.py", line 275, in null_wrapper
return fn(*args, **kwargs)
File "/home/brucelau/anaconda2/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py", line 440, in _handle_events
self._handle_recv()
File "/home/brucelau/anaconda2/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py", line 472, in _handle_recv
self._run_callback(callback, msg)
File "/home/brucelau/anaconda2/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py", line 414, in _run_callback
callback(*args, **kwargs)
File "/home/brucelau/anaconda2/lib/python2.7/site-packages/tornado/stack_context.py", line 275, in null_wrapper
return fn(*args, **kwargs)
File "/home/brucelau/anaconda2/lib/python2.7/site-packages/ipykernel/kernelbase.py", line 276, in dispatcher
return self.dispatch_shell(stream, msg)
File "/home/brucelau/anaconda2/lib/python2.7/site-packages/ipykernel/kernelbase.py", line 228, in dispatch_shell
handler(stream, idents, msg)
File "/home/brucelau/anaconda2/lib/python2.7/site-packages/ipykernel/kernelbase.py", line 390, in execute_request
user_expressions, allow_stdin)
File "/home/brucelau/anaconda2/lib/python2.7/site-packages/ipykernel/ipkernel.py", line 196, in do_execute
res = shell.run_cell(code, store_history=store_history, silent=silent)
File "/home/brucelau/anaconda2/lib/python2.7/site-packages/ipykernel/zmqshell.py", line 501, in run_cell
return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
File "/home/brucelau/anaconda2/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2717, in run_cell
interactivity=interactivity, compiler=compiler, result=result)
File "/home/brucelau/anaconda2/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2821, in run_ast_nodes
if self.run_code(code, result):
File "/home/brucelau/anaconda2/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2881, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "", line 1, in
a = tf.matmul(x, w1)
File "/home/brucelau/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/math_ops.py", line 1855, in matmul
a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
File "/home/brucelau/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/gen_math_ops.py", line 1454, in _mat_mul
transpose_b=transpose_b, name=name)
File "/home/brucelau/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 763, in apply_op
op_def=op_def)
File "/home/brucelau/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2395, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/home/brucelau/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1264, in init
self._traceback = _extract_stack()

InternalError (see above for traceback): Blas SGEMM launch failed : a.shape=(8, 2), b.shape=(2, 3), m=8, n=3, k=2
[[Node: MatMul_2 = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](_recv_x-input_1_0/_11, Variable_2/read)]]

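Chapter 9: TensorBoard visualization question

How can the graphs on the TensorBoard web page be saved at high quality for use in a paper?
P.S. Screenshots do not come out very sharp.

Section 8.4.2: confusion about predicting the sine function

Hello! In the sine-function prediction example, why is HIDDEN_SIZE set to 30?

I read up on HIDDEN_SIZE and found that this parameter reflects the learning capacity of the network and can be set from experience. But why does the program stop working when I change HIDDEN_SIZE to another value (for example 20 or 40)? (It fails with Invalid Argument Error: Assign requires shapes of both tensors to match.) If it can be set from experience, it should behave like BATCH_SIZE: any value in a reasonable range should not stop the program from running, right?

In the PTB program of section 8.4.1, HIDDEN_SIZE is the dimension of the word vectors, which is easy to understand. Here, however, the value is 30, and I really cannot work out why. Please advise!

The transfer-learning example of section 6.5.2 gives very different test results on two platforms. What explains this?

I tested this transfer-learning demo on both Ubuntu 16.04 and Windows 7. On Windows 7, without a GPU, the test accuracy was 95.1%. On Ubuntu, with one GPU, it was 91.9%. With the same dataset and the same code, the gap remained this large over repeated runs. How can this be explained?

The mnist example is missing import_model.sh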

https://github.com/caicloud/tensorflow-tutorial/tree/master/caicloud.tensorflow/caicloud/clever/examples/mnist

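This file does not exist in this directory.

Also, could you provide an example showing how to start a service locally from the exported model?

Chapter 8: implementing a language model with a recurrent neural network

    # Define the output list. The LSTM outputs at each time step are collected here first,
    # then passed through a fully connected layer to get the final output.
    outputs = []
    # state stores the LSTM state for each batch; it is initialized to zero
    state = self.initial_state
    with tf.variable_scope("RNN"):
        for time_step in range(num_steps):
            if time_step > 0: tf.get_variable_scope().reuse_variables()
            # Take the input for the current time step and pass it to the LSTM
            cell_output, state = cell(inputs[:, time_step, :], state)  # <-- this line raises the error
            # Append the current output to the output list
            outputs.append(cell_output)
    
    # Flatten the output list into shape [batch, hidden_size * num_steps], then
    # reshape it into shape [batch * num_steps, hidden_size]
    output = tf.reshape(tf.concat(1, outputs), [-1, HIDDEN_SIZE])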

Traceback (most recent call last):
File "C:\Users\weizhen\workspace\TextUtil\LanguageModel.py", line 159, in
tf.app.run()
File "C:\Users\weizhen\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\platform\app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "C:\Users\weizhen\workspace\TextUtil\LanguageModel.py", line 135, in main
train_model = PTBModel(True, TRAIN_BATCH_SIZE, TRAIN_NUM_STEP)
File "C:\Users\weizhen\workspace\TextUtil\LanguageModel.py", line 69, in init
cell_output, state = cell(inputs[:, time_step, :], state)
File "C:\Users\weizhen\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\contrib\rnn\python\ops\core_rnn_cell_impl.py", line 953, in call
cur_inp, new_state = cell(cur_inp, cur_state)
File "C:\Users\weizhen\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\contrib\rnn\python\ops\core_rnn_cell_impl.py", line 713, in call
output, new_state = self._cell(inputs, state, scope)
File "C:\Users\weizhen\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\contrib\rnn\python\ops\core_rnn_cell_impl.py", line 235, in call
with _checked_scope(self, scope or "basic_lstm_cell", reuse=self._reuse):
File "C:\Users\weizhen\AppData\Local\Programs\Python\Python35\lib\contextlib.py", line 59, in enter
return next(self.gen)
File "C:\Users\weizhen\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\contrib\rnn\python\ops\core_rnn_cell_impl.py", line 77, in _checked_scope
type(cell).name))
ValueError: Attempt to reuse RNNCell <tensorflow.contrib.rnn.python.ops.core_rnn_cell_impl.BasicLSTMCell object at 0x00000215F1A63B00> with a different variable scope than its first use. First use of cell was with scope 'language_model/RNN/multi_rnn_cell/cell_0/basic_lstm_cell', this attempt is with scope 'language_model/RNN/multi_rnn_cell/cell_1/basic_lstm_cell'. Please create a new instance of the cell if you would like it to use a different set of weights. If before you were using: MultiRNNCell([BasicLSTMCell(...)] * num_layers), change to: MultiRNNCell([BasicLSTMCell(...) for _ in range(num_layers)]). If before you were using the same cell instance as both the forward and reverse cell of a bidirectional RNN, simply create two instances (one for forward, one for reverse). In May 2017, we will start transitioning this cell's behavior to use existing stored weights, if any, when it is called with scope=None (which can lead to silent model degradation, so this error will remain until then.)

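In section 6.4.1, does the LeNet-5 implementation have one fewer fully connected layer than the original model?

code in 1.0.0/chapter09/ 2. 改造后的mnist_train.ipynb cannot be opened

The notebook "2. 改造后的mnist_train.ipynb" in 1.0.0 chapter09 cannot be opened. Maybe something is wrong with the file.

Chapter 6 transfer-learning example: question

In section 6.5.2, why can images of different sizes all be turned into a 2048-dimensional feature vector by the loaded inception-v3 model? I would like to understand the underlying mechanism.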

def get_or_create_bottleneck(sess, image_lists, label_name, index, category, jpeg_data_tensor, bottleneck_tensor):
    ......
    if not os.path.exists(bottleneck_path):

        image_path = get_image_path(image_lists, INPUT_DATA, label_name, index, category)

        image_data = gfile.FastGFile(image_path, 'rb').read() 
       # I tried this: the image_data read here has a different size for different images.

        bottleneck_values = run_bottleneck_on_image(sess, image_data, jpeg_data_tensor, bottleneck_tensor)

        ......

    return bottleneck_values

def run_bottleneck_on_image(sess, image_data, image_data_tensor, bottleneck_tensor):

    bottleneck_values = sess.run(bottleneck_tensor, {image_data_tensor: image_data})
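    # So image_data here also varies in size. How exactly does the inception-v3 model handle that? From what I found online, the inception-v3 input should have a fixed size of 229*229*3.
    bottleneck_values = np.squeeze(bottleneck_values)
    return bottleneck_values

@caicloud I would appreciate your guidance.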

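8.4.1: question about num_step in PTBModel

I am working from the 1.0 version of the code and have a small question.

Page 218 says: truncation is not needed at test time, so the test data can be treated as one very long sequence.

Following that reasoning, the num_step used for testing should be a fairly large number to match the idea of a 'very long sequence'.
But the book actually uses NUM_STEP=1, which does not seem consistent with a long sequence.
How should this statement and this parameter be understood?

As I currently understand it, NUM_STEP=1 means the model is given a sequence of length 1 each time, which amounts to predicting a word without any preceding context, so the predictions might all end up being the most frequent word. In my experiments, changing the training NUM_STEP to 1 does change the results.

So I would like to know why num_step=1 can be used at test time without affecting the results.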
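
One way to see why a chunk length of 1 need not lose context at evaluation time: if the final state of each chunk is fed back in as the initial state of the next, the forward computation is the same for any chunk length. The toy scalar RNN below, with hypothetical weights `w` and `u`, sketches this under that assumption. During training the chunk length still matters, because backpropagation is truncated at chunk boundaries.

```python
import math


def rnn_step(state, x, w=0.5, u=0.8):
    """One step of a toy scalar RNN (hypothetical weights w and u)."""
    return math.tanh(w * x + u * state)


def run_chunks(sequence, num_steps):
    """Process the sequence in chunks of num_steps, carrying the state across chunks."""
    state = 0.0
    for start in range(0, len(sequence), num_steps):
        for x in sequence[start:start + num_steps]:
            state = rnn_step(state, x)
        # The final state of this chunk is the initial state of the next one.
    return state


sequence = [0.1, -0.3, 0.7, 0.2, -0.5, 0.9]
print(run_chunks(sequence, 1) == run_chunks(sequence, len(sequence)))
```

The comparison holds exactly because both calls apply the same operations in the same order; only the grouping into chunks differs.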

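Chapter 10, section 3: question about multi-GPU parallelism

The text says this section trains the network in synchronous mode. According to the previous section, if I understand correctly, in synchronous mode all GPUs read the parameter values at the same time, the updates or gradients from all GPUs are averaged, and then the parameters are updated. The GPUs should not run in any particular order here.

Below is a code fragment from section 10.3:

        # Run the optimization of the neural network on different GPUs.
        for i in range(N_GPU):
            # Pin the optimization to one GPU.
            with tf.device('/gpu:%d' % i):
                with tf.name_scope('GPU_%d' % i) as scope:
                    cur_loss = get_loss(x, y_, regularizer, scope, reuse_variables)
                    
                    # After variables are declared the first time, set the reuse flag to True
                    # so that the different GPUs update the same set of parameters. Note that
                    # tf.name_scope does not affect the namespace of tf.get_variable.
                    reuse_variables = True
                    
                    # Compute the gradients of all variables on the current GPU.
                    grads = opt.compute_gradients(cur_loss)
                    tower_grads.append(grads)

A for loop is used here to invoke the GPUs. A for loop runs one iteration after another, so this seems to call the first GPU, then the second once the first has finished, then the third, and so on, one at a time. But shouldn't synchronous training have no such ordering? Shouldn't all 4 GPUs be invoked at the same time?

Also, it looks as if every GPU uses the same input x.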
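
A point that may help here: in graph-mode TensorFlow, a Python for loop of this kind only adds operations to the graph, one tower per GPU. Nothing is computed until `sess.run`, at which point the runtime is free to execute independent operations on different devices concurrently. The synchronous step is the averaging of per-tower gradients before a single update. The plain-Python sketch below illustrates only that averaging, with made-up numbers and variable names; it is not the book's implementation.

```python
def average_gradients(tower_grads):
    """Average gradients across towers.

    tower_grads is a list with one entry per GPU; each entry is a list of
    (gradient, variable_name) pairs in the same variable order.
    """
    averaged = []
    for pairs in zip(*tower_grads):
        grads = [g for g, _ in pairs]
        name = pairs[0][1]
        averaged.append((sum(grads) / len(grads), name))
    return averaged


tower_grads = [
    [(0.25, "w"), (1.0, "b")],   # gradients computed on GPU 0
    [(0.75, "w"), (3.0, "b")],   # gradients computed on GPU 1
]
result = average_gradients(tower_grads)
print(result)
```

Because every tower contributes to the same averaged update, the order in which the towers were added to the graph does not affect the result.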

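Could you provide the code for TensorFlow 1.1.0?

Handwritten digit recognition with TensorFlow fails with raise IOError, 'Not a gzipped file'

1. git clone https://github.com/martin-gorner/tensorflow-mnist-tutorial.git to download the handwritten digit recognition code and the code that downloads the training data it depends on
2. Run the handwritten digit training example
python mnist_1.0_softmax.py

After searching Google for the keywords, I seemed to find a solution, as follows:

So I followed it and downloaded the four files.
However, I don't know where to put these four files, and the solution does not say.
I tried putting the four files directly in the folder created by git clone and ran python mnist_1.0_softmax.py again,
but the same error occurred.
Where should these four data files be placed?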

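2.2.1: problem installing with Docker

The command
$ docker run -it -p 8888:8888 -p 6006:6006 \ cargo.caicloud.io/tensorflow tensorflow:0.12.0
fails on Ubuntu 14.04 with Docker 17.09.0-ce:
"docker: invalid reference format.
See 'docker run --help'."

After changing '-p' to '-P' and adding 'sudo' in front, it reports:
"docker: Error response from daemon: pull access denied for 8888, repository does not exist or may require 'docker login'.
See 'docker run --help'."
What is the problem, and how can it be solved?

Chapter 6, section 6.4.1: the classic convolutional network model trains to a much worse result than the fully connected network in earlier chapters.

With the classic LeNet-5 convolutional model from 6.4.1, the error rate is still only at the 0.x level after tens of thousands of training steps,
while the earlier fully connected model reached 0.0x long before. Is there something wrong with the model?
The book says: "The convolutional neural network given above can reach 99.4% accuracy, compared with the best 98.4% in Chapter 5...", which is confusing.

5.2.1: I cannot find the source code from pages 96-100 of the book; everything I find has been changed heavily. Please help.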

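8.4.1: the example code in the book may be wrong

Section 8.4.1 contains an operation that reshapes a tensor.
output = tf.reshape(tf.concat(outputs, 1), [-1, HIDDEN_SIZE])
This is the correct code.
But the book has
output = tf.reshape(tf.concat(1, outputs), [-1, HIDDEN_SIZE])
which causes an error. If I am mistaken, please correct me.
Thanks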
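
For background, the argument order of `tf.concat` changed in TensorFlow 1.0 to `tf.concat(values, axis)`; earlier releases took the dimension first. Which line works therefore depends on the installed version. The shape manipulation itself can be sketched in plain Python with two hypothetical helpers and made-up numbers:

```python
def concat_axis1(outputs):
    """Concatenate a list of [batch, hidden] matrices along axis 1."""
    batch = len(outputs[0])
    return [[v for step in outputs for v in step[b]] for b in range(batch)]


def reshape_rows(matrix, width):
    """Flatten row-major, then cut into rows of the given width (like reshape [-1, width])."""
    flat = [v for row in matrix for v in row]
    return [flat[i:i + width] for i in range(0, len(flat), width)]


HIDDEN_SIZE = 2
# Two time steps, batch of 2: outputs[t][b] is the output of sample b at step t.
outputs = [
    [[1, 1], [2, 2]],       # t = 0
    [[10, 10], [20, 20]],   # t = 1
]
output = reshape_rows(concat_axis1(outputs), HIDDEN_SIZE)
print(output)
```

Each row of the result is one (sample, time step) pair, ordered by sample first and then by time step.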

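In the Chapter 8 language-model code using a recurrent neural network, can model.initial_state really be used in feed_dict?

In Chapter 8, 3. 使用循环神经网络实现语言模型.ipynb, reading the code we can see that a class is defined first. Inside the class, self.initial_state = cell.zero_state(batch_size, tf.float32) is defined, then the value is assigned with state = self.initial_state, which is used in cell_output, state = cell(inputs[:, time_step, :], state), and afterwards self.final_state = state. But the session.run call is cost, state, _ = session.run([model.cost, model.final_state, train_op],{model.input_data: x, model.targets: y, model.initial_state: state}), so the feed_dict is {model.input_data: x, model.targets: y, model.initial_state: state}. So my question is: can we really feed a value for model.initial_state: state? There is no tf.placeholder here.

Chapter 6 transfer learning: how can the whole model be saved after training, and the saved model then used to predict new images?

I extracted the following code from the main function to train and save the model: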
`def train(bottleneck_tensor, jpeg_data_tensor):
image_lists = create_image_lists(TEST_PERCENTAGE, VALIDATION_PERCENTAGE)
n_classes = len(image_lists.keys())

# Define the inputs of the new neural network
bottleneck_input = tf.placeholder(tf.float32, [None, BOTTLENECK_TENSOR_SIZE],
                                  name='BottleneckInputPlaceholder')
ground_truth_input = tf.placeholder(tf.float32, [None, n_classes], name='GroundTruthInput')

logits=last_layer(n_classes,bottleneck_input)
final_tensor = tf.nn.softmax(logits)

# Define the cross-entropy loss function.
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=ground_truth_input)
cross_entropy_mean = tf.reduce_mean(cross_entropy)
train_step = tf.train.GradientDescentOptimizer(LEARNING_RATE).minimize(cross_entropy_mean)

# Compute the accuracy.
with tf.name_scope('evaluation'):
    correct_prediction = tf.equal(tf.argmax(final_tensor, 1), tf.argmax(ground_truth_input, 1))
    evaluation_step = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

saver = tf.train.Saver(tf.global_variables(), write_version=tf.train.SaverDef.V1)

with tf.Session() as sess:
    init = tf.global_variables_initializer()
    sess.run(init)
    print("Retraining the model")
    for i in range(STEPS):
        train_bottlenecks, train_ground_truth = get_random_cached_bottlenecks(
            sess, n_classes, image_lists, BATCH, 'training', jpeg_data_tensor, bottleneck_tensor)
        sess.run(train_step,
                 feed_dict={bottleneck_input: train_bottlenecks, ground_truth_input: train_ground_truth})
        # Measure accuracy on the validation data
        if i % 100 == 0 or i + 1 == STEPS:
            validation_bottlenecks, validation_ground_truth = get_random_cached_bottlenecks(
                sess, n_classes, image_lists, BATCH, 'validation', jpeg_data_tensor, bottleneck_tensor)
            validation_accuracy = sess.run(evaluation_step, feed_dict={
                bottleneck_input: validation_bottlenecks, ground_truth_input: validation_ground_truth})
            print('Step %d: Validation accuracy on random sampled %d examples = %.1f%%' %(i, BATCH, validation_accuracy * 100))
    print('Beginning Test')
    # Measure accuracy on the final test data.
    test_bottlenecks, test_ground_truth = get_tst_bottlenecks(
        sess, image_lists, n_classes, jpeg_data_tensor, bottleneck_tensor)
    test_accuracy = sess.run(evaluation_step, feed_dict={
        bottleneck_input: test_bottlenecks, ground_truth_input: test_ground_truth})
    print('Final test accuracy = %.1f%%' % (test_accuracy * 100))

    saver.save(sess, 'F:/_pythonWS/imageClassifier/ckpt/imagesClassFilter.ckpt')`

I then use the following code to try to restore the model and predict the classes of the test images again:
`def image_Classfier(bottleneck_tensor, jpeg_data_tensor):

image_lists = create_image_lists(TEST_PERCENTAGE, VALIDATION_PERCENTAGE)
n_classes = len(image_lists.keys())

# Define the inputs of the new neural network
bottleneck_input = tf.placeholder(tf.float32, [None, BOTTLENECK_TENSOR_SIZE],
                                  name='BottleneckInputPlaceholder')
ground_truth_input = tf.placeholder(tf.float32, [None, n_classes], name='GroundTruthInput')

logits=last_layer(n_classes,bottleneck_input)
final_tensor = tf.nn.softmax(logits)

# Compute the accuracy.
with tf.name_scope('evaluation'):
    correct_prediction = tf.equal(tf.argmax(final_tensor, 1), tf.argmax(ground_truth_input, 1))
    evaluation_step = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

saver = tf.train.Saver(tf.global_variables(), write_version=tf.train.SaverDef.V1)
with tf.Session() as sess:
    # init = tf.global_variables_initializer()
    # sess.run(init)
    if os.path.exists('F:/_pythonWS/imageClassifier/ckpt/imagesClassFilter.ckpt'):
        saver.restore(sess, 'F:/_pythonWS/imageClassifier/ckpt/imagesClassFilter.ckpt')
        print("ckpt file already exist!")

    # Measure accuracy on the final test data.
    test_bottlenecks, test_ground_truth = get_tst_bottlenecks(
        sess, image_lists, n_classes, jpeg_data_tensor, bottleneck_tensor)
    test_accuracy = sess.run(evaluation_step, feed_dict={
        bottleneck_input: test_bottlenecks, ground_truth_input: test_ground_truth})
    print('Final test accuracy = %.1f%%' % (test_accuracy * 100))

    with tf.name_scope('kind'):
        # image_kind=image_lists.keys()[tf.arg_max(final_tensor,1)]
        image_order_step = tf.arg_max(final_tensor, 1)
    label_name_list = list(image_lists.keys())
    for label_index, label_name in enumerate(label_name_list):
        category = 'testing'
        for index, unused_base_name in enumerate(image_lists[label_name][category]):
            bottlenecks = []
            ground_truths = []
            print("Actual %s:" % label_name)
            # print(unused_base_name)
            bottleneck = get_or_create_bottleneck(sess, image_lists, label_name, index, category,
                                                  jpeg_data_tensor, bottleneck_tensor)
            ground_truth = np.zeros(n_classes, dtype=np.float32)
            ground_truth[label_index] = 1.0
            bottlenecks.append(bottleneck)
            ground_truths.append(ground_truth)
            image_kind = sess.run(image_order_step, feed_dict={
                bottleneck_input: bottlenecks, ground_truth_input: ground_truths})
            image_kind_order = int(image_kind[0])
            print("Predicted %s:" % label_name_list[image_kind_order])`

`def main():
    graph_def = load_inception_v3()
    bottleneck_tensor, jpeg_data_tensor = tf.import_graph_def(
        graph_def, return_elements=[BOTTLENECK_TENSOR_NAME, JPEG_DATA_TENSOR_NAME])
    train(bottleneck_tensor, jpeg_data_tensor)
    image_Classfier(bottleneck_tensor, jpeg_data_tensor)

if __name__ == '__main__':
    main()`
But it failed: the saved model does not seem to be restored correctly. What did I do wrong, and how should this be done?

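Question about section 5.2.1 of the TensorFlow book

Section 4.4.3 of the book covers the moving-average technique. The code on page 99 that uses this technique is as follows:

This code looks odd to me. tf.control_dependencies is used to express control dependencies, and the intent should be to make sure the train_op operation has finished before variables_averages_op starts. After looking up what control_dependencies does, I think it should be written as
with tf.control_dependencies([train_op]):
sess.run(variables_averages_op)

to match the logic of the moving average. I am also confused by the tf.no_op and tf.group mentioned in the book's comments and code. Could you please explain them? Moving averages seem quite important.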

Chapter 9, 2. 改造后的mnist_train.ipynb raises ValueError when run

ValueError: Variable layer1/weights already exists, disallowed. Did you mean to set reuse=True in VarScope? Originally defined at:...
I don't know how to fix this. Should reuse be added in inference?

第三章的完整神经网络样例程序结果与预期不符

在代码中，模型训练结束之后，进行验证时，结果与预期相差很大：
print(sess.run(y, feed_dict={x: [[0.8, 0.8], [0.1, 0.1], [0.4, 0.4], [0.5, 0.5]]}))
[[ 20.08403206]
[ 2.51050401]
[ 10.04201603]
[ 12.55251884]]
This does not match the 0/1 labels expected from the training set, where the label is defined by x1+x2<1.
Suggested patch:
w2 = tf.Variable(tf.random_normal([3, 1], stddev=1, seed=1))
+b = tf.Variable(tf.random_normal([1, 1], stddev=1, seed=1))

x = tf.placeholder(tf.float32, (None, 2), 'x-input')
y_ = tf.placeholder(tf.float32, (None, 1), 'y-input')

a = tf.matmul(x, w1)
y = tf.matmul(a, w2)
+y = tf.sigmoid(y + b)

-cross_entropy = -tf.reduce_mean(y_ * tf.log(tf.clip_by_value(y, 1e-10, 1.0)))
+cross_entropy = tf.reduce_mean(tf.square(y_ - y))
train_step = tf.train.AdamOptimizer(0.001).minimize(cross_entropy)
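One possible explanation for the large outputs, shown as a plain-Python sketch: the original loss -mean(y_ * log(clip(y, 1e-10, 1.0))) contributes nothing for samples whose label is 0 and stops changing once y reaches 1 for samples whose label is 1, so nothing pulls y back into [0, 1]. The helper names below are hypothetical, and the sample values are only illustrative.

```python
import math


def clipped_positive_only_loss(labels, outputs):
    """The original loss: only label-1 samples contribute, and y is clipped at 1."""
    total = 0.0
    for y_true, y_pred in zip(labels, outputs):
        clipped = min(max(y_pred, 1e-10), 1.0)
        total += y_true * math.log(clipped)
    return -total / len(labels)


def mean_squared_error(labels, outputs):
    """The patched loss: every sample contributes, whatever its label."""
    return sum((t - p) ** 2 for t, p in zip(labels, outputs)) / len(labels)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


labels = [1.0, 0.0]
unbounded_outputs = [20.0, 12.5]  # magnitudes similar to the printed results

# The original loss is already at its minimum even though the label-0 sample is wrong.
loss_original = clipped_positive_only_loss(labels, unbounded_outputs)

# With a sigmoid output and squared error, the wrong label-0 sample is penalised.
loss_patched = mean_squared_error(labels, [sigmoid(v) for v in unbounded_outputs])
```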

Chapter 9: the monitoring and visualization code raises an error. Version 1.2.0

Extracting /MNIST_data\train-images-idx3-ubyte.gz
Extracting /MNIST_data\train-labels-idx1-ubyte.gz
Extracting /MNIST_data\t10k-images-idx3-ubyte.gz
Extracting /MNIST_data\t10k-labels-idx1-ubyte.gz

ValueError Traceback (most recent call last)
in ()
1 if __name__ == '__main__':
----> 2 main()

in main()
18
19 with tf.name_scope('train'):
---> 20 train_step = tf.train.AdamOptimizer(0.001).minimize(cross_entropy)
21
22 with tf.name_scope('accuracy'):

D:\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\training\optimizer.py in minimize(self, loss, global_step, var_list, gate_gradients, aggregation_method, colocate_gradients_with_ops, name, grad_loss)
320 "No gradients provided for any variable, check your graph for ops"
321 " that do not support gradients, between variables %s and loss %s." %
--> 322 ([str(v) for _, v in grads_and_vars], loss))
323
324 return self.apply_gradients(grads_and_vars, global_step=global_step,

ValueError: No gradients provided for any variable, check your graph for ops that do not support gradients, between variables ["<tf.Variable 'layer1/weights/Variable:0' shape=(784, 500) dtype=float32_ref>", "<tf.Variable 'layer1/biases/Variable:0' shape=(500,) dtype=float32_ref>", "<tf.Variable 'layer2/weights/Variable:0' shape=(500, 10) dtype=float32_ref>", "<tf.Variable 'layer2/biases/Variable:0' shape=(10,) dtype=float32_ref>", "<tf.Variable 'layer1_1/weights/Variable:0' shape=(784, 500) dtype=float32_ref>", "<tf.Variable 'layer1_1/biases/Variable:0' shape=(500,) dtype=float32_ref>", "<tf.Variable 'layer2_1/weights/Variable:0' shape=(500, 10) dtype=float32_ref>", "<tf.Variable 'layer2_1/biases/Variable:0' shape=(10,) dtype=float32_ref>", "<tf.Variable 'layer1_2/weights/Variable:0' shape=(784, 500) dtype=float32_ref>", "<tf.Variable 'layer1_2/biases/Variable:0' shape=(500,) dtype=float32_ref>", "<tf.Variable 'layer2_2/weights/Variable:0' shape=(500, 10) dtype=float32_ref>", "<tf.Variable 'layer2_2/biases/Variable:0' shape=(10,) dtype=float32_ref>"] and loss Tensor("cross_entropy_2/Mean:0", shape=(), dtype=float32).

Problem with the code in Section 5.5

Running the mnist_train.py example from Section 5.5 raises the following error:

runfile('C:/Users/Chen Yanxi/deeplearning_pythoncode/2017.7/mnist_train_55.py', wdir='C:/Users/Chen Yanxi/deeplearning_pythoncode/2017.7')
Reloaded modules: mnist_inference_55, mnist_train_55
Extracting /tmp/data\train-images-idx3-ubyte.gz
Extracting /tmp/data\train-labels-idx1-ubyte.gz
Extracting /tmp/data\t10k-images-idx3-ubyte.gz
Extracting /tmp/data\t10k-labels-idx1-ubyte.gz
Traceback (most recent call last):

File "", line 1, in
runfile('C:/Users/Chen Yanxi/deeplearning_pythoncode/2017.7/mnist_train_55.py', wdir='C:/Users/Chen Yanxi/deeplearning_pythoncode/2017.7')

File "C:\ProgramData\Anaconda2\envs\num2\lib\site-packages\spyder\utils\site\sitecustomize.py", line 880, in runfile
execfile(filename, namespace)

File "C:\ProgramData\Anaconda2\envs\num2\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)

File "C:/Users/Chen Yanxi/deeplearning_pythoncode/2017.7/mnist_train_55.py", line 84, in
tf.app.run()

File "C:\ProgramData\Anaconda2\envs\num2\lib\site-packages\tensorflow\python\platform\app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))

File "C:/Users/Chen Yanxi/deeplearning_pythoncode/2017.7/mnist_train_55.py", line 81, in main
train(mnist)

File "C:/Users/Chen Yanxi/deeplearning_pythoncode/2017.7/mnist_train_55.py", line 43, in train
variables_averages_op = variable_averages.apply(tf.trainable_variables())

File "C:\ProgramData\Anaconda2\envs\num2\lib\site-packages\tensorflow\python\training\moving_averages.py", line 367, in apply
colocate_with_primary=True)

File "C:\ProgramData\Anaconda2\envs\num2\lib\site-packages\tensorflow\python\training\slot_creator.py", line 113, in create_slot
return _create_slot_var(primary, val, "", validate_shape, None, None)

File "C:\ProgramData\Anaconda2\envs\num2\lib\site-packages\tensorflow\python\training\slot_creator.py", line 66, in _create_slot_var
validate_shape=validate_shape)

File "C:\ProgramData\Anaconda2\envs\num2\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 1065, in get_variable
use_resource=use_resource, custom_getter=custom_getter)

File "C:\ProgramData\Anaconda2\envs\num2\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 962, in get_variable
use_resource=use_resource, custom_getter=custom_getter)

File "C:\ProgramData\Anaconda2\envs\num2\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 367, in get_variable
validate_shape=validate_shape, use_resource=use_resource)

File "C:\ProgramData\Anaconda2\envs\num2\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 352, in _true_getter
use_resource=use_resource)

File "C:\ProgramData\Anaconda2\envs\num2\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 664, in _get_single_variable
name, "".join(traceback.format_list(tb))))

ValueError: Variable layer1/weights/ExponentialMovingAverage/ already exists, disallowed. Did you mean to set reuse=True in VarScope? Originally defined at:

File "C:\ProgramData\Anaconda2\envs\num2\lib\site-packages\tensorflow\python\framework\ops.py", line 1269, in __init__
self._traceback = _extract_stack()
File "C:\ProgramData\Anaconda2\envs\num2\lib\site-packages\tensorflow\python\framework\ops.py", line 2506, in create_op
original_op=self._default_original_op, op_def=op_def)
File "C:\ProgramData\Anaconda2\envs\num2\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 767, in apply_op
op_def=op_def)

How should I change the variable name settings? Thank you.

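Section 8.2: a question about the LSTM structure

The LSTM cell diagram in Figure 8-6 shows that both the input gate and the output gate take the previous time step's output as an input. Yet in the code at the top of page 208, why does the TensorFlow LSTM not take the previous output as an input? Only the current input and the previous state are passed in; the previous output is never passed explicitly.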
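A sketch that may help with this question: as far as I know, the state handed to TensorFlow's LSTM cells is a pair (c, h) (an LSTMStateTuple when state_is_tuple=True), where h is the previous step's output. So the previous output does reach the gates, carried inside the state. The scalar plain-Python cell below illustrates this; `lstm_step` and the weights are made-up placeholders, not TensorFlow's implementation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, state, w):
    """One scalar LSTM step. `state` is (c_prev, h_prev); h_prev is the last output."""
    c_prev, h_prev = state

    def gate_input(name):
        # Every gate sees both the current input and the previous output.
        w_x, w_h, b = w[name]
        return w_x * x + w_h * h_prev + b

    i = sigmoid(gate_input("input"))
    f = sigmoid(gate_input("forget"))
    o = sigmoid(gate_input("output"))
    candidate = math.tanh(gate_input("candidate"))

    c = f * c_prev + i * candidate
    h = o * math.tanh(c)
    # The output h is returned twice: as the step output and inside the new state.
    return h, (c, h)


# Hypothetical weights (w_x, w_h, bias) shared by all gates, for illustration only.
weights = {name: (0.5, 0.5, 0.0)
           for name in ("input", "forget", "output", "candidate")}
state = (0.0, 0.0)
outputs = []
for x in [1.0, -1.0, 0.5]:
    out, state = lstm_step(x, state, weights)
    outputs.append(out)
```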

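Chapter 6 transfer learning question: how do I get the name of the bottleneck tensor?

I want to do transfer learning with my own model. How can I get the name of the bottleneck tensor?
In the training code of my original model, I added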
Tensor_name=tf.Tensor.name()
print"Tensor_name:",Tensor_name
which raises:
Tensor_name=tf.Tensor.name()
TypeError: 'property' object is not callable

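Original code from the book:

`# Name of the tensor that holds the bottleneck-layer result in the Inception-v3 model.
# In the Inception-v3 model released by Google, this tensor is named 'pool_3/_reshape:0'.
# When training a model, the name of a tensor can be obtained through tensor.name.
BOTTLENECK_TENSOR_NAME = 'pool_3/_reshape:0'

# Name of the tensor for the image input.
JPEG_DATA_TENSOR_NAME = 'DecodeJpeg/contents:0'`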
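The TypeError reported above comes from calling name on the class (tf.Tensor.name()) instead of reading it from a tensor instance. Because name is a property, it is read without parentheses. The plain-Python sketch below reproduces the same behaviour with a hypothetical `FakeTensor` class.

```python
class FakeTensor:
    """Hypothetical stand-in for a tensor object that exposes a `name` property."""

    def __init__(self, name):
        self._name = name

    @property
    def name(self):
        return self._name


bottleneck = FakeTensor("pool_3/_reshape:0")

# Correct: read the property from an instance, with no parentheses.
instance_name = bottleneck.name

# Incorrect: the attribute looked up on the class is the property object itself.
try:
    FakeTensor.name()
    class_call_error = None
except TypeError as err:
    class_call_error = str(err)
```

In your own model this would mean printing the name attribute of the specific tensor whose output you want to treat as the bottleneck, for example the tensor feeding the final classification layer.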

Exception when using tf.train.shuffle_batch

I am running locally with the caicloud.clever.tensorflow library. The same code without the caicloud.clever.tensorflow library worked.
W tensorflow/core/framework/op_kernel.cc:993] Invalid argument: You must feed a value for placeholder tensor 'image' with dtype float and shape [12,224,224,3]
[[Node: image = Placeholder[dtype=DT_FLOAT, shape=[12,224,224,3], _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
Traceback (most recent call last):
File "./src/train.py", line 256, in
distTfRunner.run(train_fn)
File "/usr/local/lib/python2.7/site-packages/caicloud/clever/tensorflow/dist_base.py", line 250, in run
should_stop = train_fn(sess, step)
File "./src/train.py", line 166, in train_fn
images_array, labels_array = session.run([_train_images, _train_labels])
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 767, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 965, in _run
feed_dict_string, options, run_metadata)
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1015, in _do_run
target_list, options, run_metadata)
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1035, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.OutOfRangeError: RandomShuffleQueue '_2_shuffle_batch/random_shuffle_queue' is closed and has insufficient elements (requested 12, current size 1)
[[Node: shuffle_batch = QueueDequeueManyV2[component_types=[DT_FLOAT, DT_FLOAT], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](shuffle_batch/random_shuffle_queue, shuffle_batch/n)]]

Caused by op u'shuffle_batch', defined at:
File "./src/train.py", line 256, in
distTfRunner.run(train_fn)
File "/usr/local/lib/python2.7/site-packages/caicloud/clever/tensorflow/dist_base.py", line 208, in run
model_fn_handler = self._call_model_fn()
File "/usr/local/lib/python2.7/site-packages/caicloud/clever/tensorflow/dist_base.py", line 169, in _call_model_fn
model_fn_handler = self._model_fn(False, 1)
File "./src/train.py", line 83, in model_fn
_train_images, _train_labels = inputs(FLAGS.batch, FLAGS.train, FLAGS.train_labels)
File "/Users/xieanping/sourcecode/github/segnet-caicloud/src/inputs.py", line 29, in inputs
min_after_dequeue=500)
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/input.py", line 1165, in shuffle_batch
name=name)
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/input.py", line 739, in _shuffle_batch
dequeued = queue.dequeue_many(batch_size, name=name)
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/data_flow_ops.py", line 458, in dequeue_many
self._queue_ref, n=n, component_types=self._dtypes, name=name)
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/gen_data_flow_ops.py", line 1310, in _queue_dequeue_many_v2
timeout_ms=timeout_ms, name=name)
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 763, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2327, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1226, in __init__
self._traceback = _extract_stack()

OutOfRangeError (see above for traceback): RandomShuffleQueue '_2_shuffle_batch/random_shuffle_queue' is closed and has insufficient elements (requested 12, current size 1)
[[Node: shuffle_batch = QueueDequeueManyV2[component_types=[DT_FLOAT, DT_FLOAT], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](shuffle_batch/random_shuffle_queue, shuffle_batch/n)]]

A question about the Chapter 5 code

Python version: 3.6
TensorFlow version: 1.0.0
Running "5. MNIST最佳实践/mnist_train.py" raises an error:

/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/bin/python3.6 "/Users/AlfredCai/PycharmProjects/tensorflow/src/Deep_Learning_with_TensorFlow/tensorflow-tutorial/Deep_Learning_with_TensorFlow/1.0.0/Chapter05/5. MNIST最佳实践/mnist_train.py"
Extracting ../../../datasets/MNIST_data/train-images-idx3-ubyte.gz
Extracting ../../../datasets/MNIST_data/train-labels-idx1-ubyte.gz
Extracting ../../../datasets/MNIST_data/t10k-images-idx3-ubyte.gz
Extracting ../../../datasets/MNIST_data/t10k-labels-idx1-ubyte.gz
Traceback (most recent call last):
  File "/Users/AlfredCai/PycharmProjects/tensorflow/src/Deep_Learning_with_TensorFlow/tensorflow-tutorial/Deep_Learning_with_TensorFlow/1.0.0/Chapter05/5. MNIST最佳实践/mnist_train.py", line 56, in <module>
    tf.app.run()
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 44, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "/Users/AlfredCai/PycharmProjects/tensorflow/src/Deep_Learning_with_TensorFlow/tensorflow-tutorial/Deep_Learning_with_TensorFlow/1.0.0/Chapter05/5. MNIST最佳实践/mnist_train.py", line 52, in main
    train(mnist)
  File "/Users/AlfredCai/PycharmProjects/tensorflow/src/Deep_Learning_with_TensorFlow/tensorflow-tutorial/Deep_Learning_with_TensorFlow/1.0.0/Chapter05/5. MNIST最佳实践/mnist_train.py", line 26, in train
    cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(y, tf.argmax(y_, 1))
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 1684, in sparse_softmax_cross_entropy_with_logits
    labels, logits)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 1533, in _ensure_xent_args
    "named arguments (labels=..., logits=..., ...)" % name)
ValueError: Only call `sparse_softmax_cross_entropy_with_logits` with named arguments (labels=..., logits=..., ...)

Process finished with exit code 1

Changing line 26 of the source to cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=y, labels=tf.argmax(y_, 1)) makes it run successfully.

Section 8.4.1, natural language modeling: a question about reuse_variables

outputs = []
state = self.initial_state
with tf.variable_scope("RNN"):
    for time_step in range(num_steps):
        if time_step > 0: tf.get_variable_scope().reuse_variables()
        cell_output, state = cell(inputs[:, time_step, :], state)
        outputs.append(cell_output)

What exactly does the line if time_step > 0: tf.get_variable_scope().reuse_variables() reuse? Is this line really required?
I printed all the trainable variables in my code. The result is below (True/False is the value of is_training):
trainable name: language_model/embedding:0 True
trainable name: language_model/RNN/multi_rnn_cell/cell_0/basic_lstm_cell/kernel:0 True
trainable name: language_model/RNN/multi_rnn_cell/cell_0/basic_lstm_cell/bias:0 True
trainable name: language_model/weight:0 True
trainable name: language_model/bias:0 True

trainable name: language_model/embedding:0 False
trainable name: language_model/RNN/multi_rnn_cell/cell_0/basic_lstm_cell/kernel:0 False
trainable name: language_model/RNN/multi_rnn_cell/cell_0/basic_lstm_cell/bias:0 False
trainable name: language_model/weight:0 False
trainable name: language_model/bias:0 False

As shown, all variables live under variable_scope("language_model") (defined in the main function), and that scope uses reuse=True when is_training=False, so Train, Valid and Test all share one set of variables. variable_scope("RNN") is nested inside variable_scope("language_model") and therefore inherits reuse=True. Why then also write
if time_step > 0: tf.get_variable_scope().reuse_variables()?

Note: only one line of my code differs from the book. I changed it after finding the example program on the official site. The change is:
cell = tf.nn.rnn_cell.MultiRNNCell([lstm_cell for _ in range(NUM_EPOCH)])
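I would also like to ask what the difference is between the book's version and the one above. Thank you in advance; I hope to get your answer.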
tensorflow rnn_example ptb_word_lm.py
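On the difference between ways of building the cell list, one point can be checked in plain Python without TensorFlow: [cell] * n and [cell for _ in range(n)] both yield n references to the same object, and distinct objects are only produced when the constructor is called inside the comprehension. Whether a single cell object may be shared across layers depends on the TensorFlow version, which is an assumption to verify against your installed release. `FakeCell` and `make_cell` below are hypothetical stand-ins for the LSTM cell and its constructor call. Note also that the length of the list sets the number of stacked layers, so range(NUM_EPOCH) may not be what was intended.

```python
class FakeCell:
    """Hypothetical stand-in for an LSTM cell object."""


def make_cell():
    """Hypothetical factory; in real code this would construct a new LSTM cell."""
    return FakeCell()


num_layers = 2
cell = make_cell()

repeated = [cell] * num_layers                           # same object, twice
comprehension_same = [cell for _ in range(num_layers)]   # still the same object
comprehension_new = [make_cell() for _ in range(num_layers)]  # two distinct objects
```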

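Chapter 6 transfer learning: how to read PNG images

I use the loaded Inception-v3 model to compute feature vectors for images. JPG images work fine, but PNG images fail. Reading a PNG image with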
image_data = gfile.FastGFile(image_path, 'rb').read()
fails at
bottleneck_values = sess.run(bottleneck_tensor, {image_data_tensor: image_data})

when computing the feature vector. The console error is:

......
Not a JPEG file: starts with 0x89 0x50
......
InvalidArgumentError (see above for traceback): Invalid JPEG data, size 19839
     [[Node: import/DecodeJpeg = DecodeJpeg[acceptable_fraction=1, channels=3, dct_method="", fancy_upscaling=true, ratio=1, try_recover_truncated=false, _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_import/DecodeJpeg/contents_0)]]

My guess is that the line that reads the image only supports JPG. How should I change it to support PNG?
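The bytes 0x89 0x50 in the message match the start of the PNG file signature, and the failing node is import/DecodeJpeg, so the file read itself appears to work; it is the JPEG decoder inside the imported graph that rejects the data. One possible workaround, stated as an assumption rather than the book's method, is to detect the format first and re-encode PNG files as JPEG with an image library before feeding them. The sketch below covers only the detection step; `sniff_image_format` is a hypothetical helper.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"   # first 8 bytes of every PNG file
JPEG_SOI = b"\xff\xd8"                  # JPEG start-of-image marker


def sniff_image_format(data):
    """Guess the image format from the leading bytes of the file contents."""
    if data[:8] == PNG_SIGNATURE:
        return "png"
    if data[:2] == JPEG_SOI:
        return "jpeg"
    return "unknown"


# Synthetic headers used only to exercise the helper.
png_header = PNG_SIGNATURE + b"\x00" * 8
jpeg_header = b"\xff\xd8\xff\xe0" + b"\x00" * 8
```

A caller could skip or convert any file for which the helper does not return "jpeg" before running the bottleneck computation.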

Chapter 7 image preprocessing: file read error in the complete example

image_raw_data = tf.gfile.FastGFile("../../datasets/cat.jpg", "r").read()
This is the line in the code that reads the image. On my machine it raises UnicodeDecodeError: 'utf-8' codec can't decode byte.

Changing
image_raw_data = tf.gfile.FastGFile("../../datasets/cat.jpg", "r").read()
to
image_raw_data = tf.gfile.FastGFile("../../datasets/cat.jpg", "rb").read()
makes the error go away. I am new to Python; I looked it up and rb reads the file in binary mode.
Should the code be changed to rb?
Thanks
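The behaviour described here can be reproduced with Python's built-in open, used below as a stand-in for tf.gfile.FastGFile (assuming the two treat the mode string similarly, which the reported error suggests). Under Python 3, mode "r" decodes the file as text, and JPEG data is not valid UTF-8, whereas "rb" returns the raw bytes that an image decoder expects. The byte string is a synthetic JPEG-like header, not a real image.

```python
import os
import tempfile

# Synthetic bytes that begin like a JPEG file; 0xff is not a valid UTF-8 start byte.
jpeg_like_bytes = b"\xff\xd8\xff\xe0\x00\x10JFIF"

with tempfile.NamedTemporaryFile(suffix=".jpg", delete=False) as tmp:
    tmp.write(jpeg_like_bytes)
    path = tmp.name

try:
    # Binary mode returns the bytes unchanged.
    with open(path, "rb") as f:
        binary_data = f.read()

    # Text mode tries to decode the bytes and fails.
    try:
        with open(path, "r", encoding="utf-8") as f:
            f.read()
        text_mode_failed = False
    except UnicodeDecodeError:
        text_mode_failed = True
finally:
    os.remove(path)
```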

Cannot pull Caicloud's TensorFlow Docker image

docker pull cargo.caicloud.io/tensorflow/tensorflow:0.12.0 fails with:
docker: Error response from daemon: Get https://cargo.caicloud.io/v1/_ping: dial tcp: lookup cargo.caicloud.io on 210.22.70.3:53: server misbehaving