naturomics / CapsNet-Tensorflow
A TensorFlow implementation of CapsNet (Capsule Network) from the paper Dynamic Routing Between Capsules
License: Apache License 2.0
I'm trying to read through the definition of the class CapsLayer. Does num_outputs actually correspond to the number of capsules? From what I understand from the following code, it looks like the number of capsules is actually stored in vec_len.
capsules = []
for i in range(self.vec_len):
    # each capsule i: [batch_size, 6, 6, 32]
    ...
    capsules.append(caps_i)
Sorry to bother you about the documentation; I just want a better understanding of how capsules work. Thanks for sharing your work, by the way.
CapsNet-Tensorflow/capsLayer.py
Line 78 in 4be551a
I think this line is not preserving the following in the paper:
"Each primary capsule output sees the outputs of all 256 × 81 Conv1 units whose receptive fields overlap with the location of the center of the capsule."
i.e. we should ensure that the first capsule after the view corresponds to the pixel [0,0] of the first 8 filters, and the second with [0,1] and so on.
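One way to check an ordering concern like this is to tag every element with its spatial position and see where it ends up after the reshape. A minimal NumPy sketch, assuming a channels-last PrimaryCaps conv output of shape [batch, 6, 6, 256] whose 256 channels are grouped as 32 capsule types of 8 dimensions; the helper name to_primary_caps is hypothetical and not taken from the repository:

```python
import numpy as np

def to_primary_caps(conv_out, vec_len=8):
    """Reshape a [batch, H, W, C] conv output into [batch, H*W*(C//vec_len), vec_len]."""
    batch = conv_out.shape[0]
    return conv_out.reshape(batch, -1, vec_len)

# Tag every element with its flat spatial index (row * 6 + col).
rows, cols = np.meshgrid(np.arange(6), np.arange(6), indexing="ij")
pos = (rows * 6 + cols)[None, :, :, None]
conv_out = np.broadcast_to(pos, (1, 6, 6, 256)).copy()

caps = to_primary_caps(conv_out)
# Every capsule's 8 components come from one spatial location.
assert (caps.min(axis=-1) == caps.max(axis=-1)).all()
print(caps.shape)  # (1, 1152, 8)
```

With this layout, capsules 0 to 31 all sit at pixel [0, 0] (one per capsule type), and capsule 32 is the first one at pixel [0, 1]. Whether a given commit of capsLayer.py produces the same layout depends on the tensor order it reshapes from.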
When I run the code on Windows 10 with a GTX 960M, I get this error:
InternalError (see above for traceback): Dst tensor is not initialized.
Some blogs say this error is caused by running out of GPU memory, but I have not been able to fix it. I hope someone can help.
Thanks for the nice code. I have a question regarding CapsNet. Would it be possible to add layers (like a conv-caps layer after the first primary layer, or a fully connected caps layer with 20 capsules before the digit caps layer)?
I tried it myself and am getting terrible results! I can't understand why this is happening. Do you have any idea?
Hi Huadong,
I've been running successful tests of CapsNets with PyTorch and would like to compare notes with you. Maybe we can take our discussion offline?
My email is: firstname.lastname[@]gmail.com
Let me know!
Tarry
I am confused about why you apply softmax to v_length here, since I have not found this operation in Hinton's paper or in Figure 1.
In contrast, it seems that CapsNet allows multi-label classification, which means it is not necessary to apply softmax to v_length, according to Section 3 of Hinton's paper ("To allow for multiple digits, we use a separate margin ...").
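For reference, the margin loss from Section 3 of the paper works directly on capsule lengths and treats each class independently, which is what allows several digits to be present at once. A minimal NumPy sketch, assuming lengths already lie in [0, 1] and using the paper's constants (m+ = 0.9, m- = 0.1, lambda = 0.5); the function name and the averaging over the batch are assumptions made for illustration:

```python
import numpy as np

def margin_loss(v_length, labels, m_plus=0.9, m_minus=0.1, lam=0.5):
    """Per-class margin loss, summed over classes and averaged over the batch.

    v_length: [batch, num_classes] capsule lengths
    labels:   [batch, num_classes] 0/1 targets (several 1s per row are allowed)
    """
    present = labels * np.maximum(0.0, m_plus - v_length) ** 2
    absent = lam * (1.0 - labels) * np.maximum(0.0, v_length - m_minus) ** 2
    return float(np.mean(np.sum(present + absent, axis=1)))

# A confident, correct prediction incurs no loss.
print(margin_loss(np.array([[0.95, 0.05]]), np.array([[1.0, 0.0]])))  # 0.0
```

No softmax appears in this loss, so each length can approach 1 independently of the others.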
Hello! I'm currently working on a project where I'd like to experiment with capsules in lieu of CNNs for Deep Q-Learning. Great work on releasing this implementation! While working with this code I ran into issues when using more capsule layers than just the ones in the CapsNet architecture. For instance, I was wondering if it is possible to use multiple convolutional capsule layers with routing and to change their output sizes. I've tried to tweak the code to do this, but I keep running into size issues and fear I might break the logic of the implementation. Any tips greatly appreciated!
Hi,
in CapsLayer.py, consider the current code. One sees that if cfg.iter_routing == 1, that b_IJ never gets updated. Surely that is not the intent? Shouldn't b_IJ be updated at every iteration of the routing? Thanks.
Gordon
if r_iter == cfg.iter_routing - 1:
    # line 5:
    # weighting u_hat with c_IJ, element-wise in the last two dims
    # => [batch_size, 1152, 10, 16, 1]
    s_J = tf.multiply(c_IJ, u_hat)
    # then sum in the second dim, resulting in [batch_size, 1, 10, 16, 1]
    s_J = tf.reduce_sum(s_J, axis=1, keep_dims=True)
    assert s_J.get_shape() == [cfg.batch_size, 1, 10, 16, 1]
    # line 6:
    # squash using Eq.1,
    v_J = squash(s_J)
    assert v_J.get_shape() == [cfg.batch_size, 1, 10, 16, 1]
elif r_iter < cfg.iter_routing - 1:  # Inner iterations, do not apply backpropagation
    s_J = tf.multiply(c_IJ, u_hat_stopped)
    s_J = tf.reduce_sum(s_J, axis=1, keep_dims=True)
    v_J = squash(s_J)  # <<<<<<<< MISSING UPDATE of B_IJ?
    # line 7:
    # reshape & tile v_j from [batch_size ,1, 10, 16, 1] to [batch_size, 1152, 10, 16, 1]
    # then matmul in the last two dim: [16, 1].T x [16, 1] => [1, 1], reduce mean in the
    # batch_size dim, resulting in [1, 1152, 10, 1, 1]
    v_J_tiled = tf.tile(v_J, [1, 1152, 1, 1, 1])
    u_produce_v = tf.matmul(u_hat_stopped, v_J_tiled, transpose_a=True)
    assert u_produce_v.get_shape() == [cfg.batch_size, 1152, 10, 1, 1]
    # b_IJ += tf.reduce_sum(u_produce_v, axis=0, keep_dims=True)
    b_IJ += u_produce_v  # <<<<<< PERHAPS THIS LINE SHOULD BE OUTSIDE THE r_iter LOOP?
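To make the question about when b_IJ is updated concrete, here is a minimal NumPy sketch of routing-by-agreement with the step numbers of Procedure 1 in the paper. It is not the repository's code: the names route, squash and softmax are illustrative, the logits are kept per example, and gradient stopping is left out because NumPy has no autograd.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    sq = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq / (1.0 + sq)) * s / np.sqrt(sq + eps)

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def route(u_hat, num_iters=3):
    """u_hat: [batch, num_in, num_out, dim] prediction vectors -> [batch, num_out, dim]."""
    batch, num_in, num_out, _ = u_hat.shape
    b = np.zeros((batch, num_in, num_out))            # routing logits
    for r in range(num_iters):
        c = softmax(b, axis=2)                        # line 4: softmax over output capsules
        s = np.sum(c[..., None] * u_hat, axis=1)      # line 5: weighted sum over inputs
        v = squash(s)                                 # line 6
        if r < num_iters - 1:
            # line 7: agreement update; skipped on the last pass
            # because b is never read again after it
            b = b + np.sum(u_hat * v[:, None, :, :], axis=-1)
    return v
```

In this sketch, with num_iters=1 the logits are never updated and the output is simply the squashed uniform average of the predictions. Updating b after the final pass would not change the returned v, since the coupling coefficients are computed before the update.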
Hi,
If I want to load an image and get its softmax score, how should I write the script?
I've been trying for several hours. I'm a beginner in TensorFlow and it's kind of difficult for me.
with tf.Graph().as_default():
    image = tf.cast(image_arr, tf.float32)
    image = tf.image.per_image_standardization(image)
    image = tf.reshape(image, [1, 28, 28, 1])
    # x = tf.placeholder(tf.float32, shape=[1, 28, 28, 1])
    feature = CapsNet.test_net(image)
    logits = tf.nn.softmax(feature)
    # saver = tf.train.Saver()
    aaa = 1
    with tf.Session() as sess:
        saver = tf.train.import_meta_graph('./logdir/model_epoch_0048_step_23899.meta')
        saver.restore(sess, './logdir/model_epoch_0048_step_23899')
        print(image.shape)
        test_result = sess.run(aaa, image)
I think there are some mistakes in the comments. I found two.
L61
Reshape the input into [batch_size, 1, 1152, 8, 1]
--->
Reshape the input into [batch_size, 1152, 1, 8, 1]
L77
input: A Tensor with [batch_size, 1, num_caps_l=1152, length(u_i)=8, 1]
--->
input: A Tensor with [batch_size, num_caps_l=1152, 1, length(u_i)=8, 1]
I think the order should be changed.
Thanks
My TensorFlow version is 1.2.1. The following code in capsNet.py:
argmax_idx = tf.argmax(self.softmax_v, axis=1, output_type=tf.int32)
should be changed to:
argmax_idx = tf.to_int32(tf.argmax(self.softmax_v, axis=1))
for version 1.2.1.
Hi, I think these two lines may conflict: the shape of b does not match the comments.
shape confusing
The line here is confusing as well.
Please check it.
In the distributed version, this line uses reduce_mean to calculate the norm. Is that correct?
I think we should create a WeChat group for anyone interested in this subject. My WeChat ID is bn31201. I hope you will add me so we can discuss this in more depth.
Could you please test and post the result on Fashion-MNIST? https://github.com/zalandoresearch/fashion-mnist
It shares the same size & format as mnist, should be straightforward to integrate.
Hey everyone, I'm really wondering: is this architecture really better than the original CNN? Are there any impressive results achieved with this architecture?
Evaluating/Testing the trained model using python main.py --is_training=False
gives the following error
ValueError: Can't load save_path when it is None.
Hi, the uploaded code is incomplete. Did you upload the wrong version?
CapsNet is said to perform better on rotated images, but I trained the network with the original images and tested the model with rotated images, and the test accuracy was 10%, which is depressing.
tf.contrib.layers.conv2d applies a ReLU activation by default. As I read the paper, the PrimaryCaps convolution does not include a ReLU activation before the neurons are grouped into capsules and squashed. Or did I miss something from the paper?
CapsNet-Tensorflow/capsLayer.py
Line 59 in 894c79c
Hi, good job!
I have a small question: in the squash function, you keep the dims of the norm ('vec_squared_norm') the same as those of the 'vector'. Why not collapse its dims to [batch_size, 1]?
From where I stand, the norm should be a scalar.
e.g.:
x= [a,b,c,d]
||x||^2 = norm(x)^2 = (|a|^2+|b|^2+|c|^2+|d|^2)
thus x --> norm(x)^2 : [batch_size, 1, num_caps, vec_len, 1] --> [batch_size, 1] ?
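A small NumPy sketch may help with the shapes here. It assumes the [batch_size, 1, num_caps, vec_len, 1] layout quoted above and returns the squared norm alongside the squashed vectors purely for inspection; it is an illustration, not the repository's squash function:

```python
import numpy as np

def squash(vector, axis=-2, eps=1e-9):
    # Squared norm over the vec_len axis only; keepdims leaves a size-1 axis
    # so the result broadcasts back against `vector`.
    sq_norm = np.sum(np.square(vector), axis=axis, keepdims=True)
    scale = sq_norm / (1.0 + sq_norm) / np.sqrt(sq_norm + eps)
    return scale * vector, sq_norm

x = np.random.default_rng(0).normal(size=(4, 1, 10, 16, 1))
out, sq_norm = squash(x)
print(out.shape)      # (4, 1, 10, 16, 1)
print(sq_norm.shape)  # (4, 1, 10, 1, 1)
```

The norm is a scalar per capsule, so there is one value for each of the 10 capsules of each example. Collapsing to [batch_size, 1] would instead merge all capsules of an example into a single norm.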
First, thanks for your answer on Zhihu as well as the implementation on GitHub; they helped me a lot in understanding the original paper.
I would like to share my doubt about the lines just below Figure 2 of the original paper, which say "each capsule in the [6,6] grid is sharing their weights with each other". By my understanding, this means the capsule outputs (vectors ui) within a [6,6] grid share the same Wij, so just 32 W matrices should be updated by Adam. But in your implementation I can't find any code that handles this weight-sharing mechanism.
Besides, I think the shape of Wij should be [16,8], as ui is a [1,8] or [8,1] vector; the shape in the code seems to conflict with Eq. 2. Although it looks like a minor problem, I am pointing it out so that I can be corrected if I have misunderstood the paper or your implementation.
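The two readings of the weight sharing can be compared by shape and parameter count. The sketch below is only an illustration in NumPy with random values: reading (a) shares one [16, 8] matrix per capsule type and digit capsule across the 6x6 grid, reading (b) gives every one of the 1152 primary capsules its own matrix. It does not settle which reading the paper intends.

```python
import numpy as np

rng = np.random.default_rng(0)
grid, types, in_dim, out_caps, out_dim = 6 * 6, 32, 8, 10, 16

u = rng.normal(size=(grid, types, in_dim))   # primary capsules of one example

# (a) one [16, 8] matrix per (capsule type, digit capsule), shared over the grid
W_shared = rng.normal(size=(types, out_caps, out_dim, in_dim))
u_hat_shared = np.einsum("tjoi,gti->gtjo", W_shared, u)

# (b) one [16, 8] matrix per (primary capsule, digit capsule), no sharing
W_full = rng.normal(size=(grid, types, out_caps, out_dim, in_dim))
u_hat_full = np.einsum("gtjoi,gti->gtjo", W_full, u)

print(u_hat_shared.shape, u_hat_full.shape)  # (36, 32, 10, 16) (36, 32, 10, 16)
print(W_shared.size, W_full.size)            # 40960 1474560
```

Both variants multiply a [16, 8] matrix by an 8-dimensional u_i, as in Eq. 2, and produce prediction vectors of the same shape; they differ only in how many matrices are learned.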
Hello, I want to test the model on data I generated myself, but I cannot see how to do it.
In the capsNet __init__() else branch, how can the label (a placeholder with shape (batch_size,)) be reshaped to (batch_size, 10, 1)?
Thanks for writing the code so shortly after the article was released. I'm trying to change the structure such that the capsule network can be trained for any image (x, y, z), but I am having trouble restructuring the code. Can you help me identify which lines need to be modified? I am guessing all lines with ... 28, 28, 1) -> ... 32, 32, 3) for CIFAR 10. But I am still not able to make it work.
Thank you again 👍
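One thing that changes with the input size is the number of primary capsules, which appears as the hard-coded 1152 in several shapes quoted in these issues. A short sketch of the arithmetic, assuming the paper's two 9x9 VALID convolutions (stride 1, then stride 2) and 32 capsule types; the helper names are made up for illustration:

```python
def conv_out(size, kernel, stride):
    """Output length of a VALID (no padding) convolution along one axis."""
    return (size - kernel) // stride + 1

def num_primary_caps(height, width, caps_types=32):
    h = conv_out(conv_out(height, 9, 1), 9, 2)
    w = conv_out(conv_out(width, 9, 1), 9, 2)
    return h * w * caps_types

print(num_primary_caps(28, 28))  # 1152 (MNIST)
print(num_primary_caps(32, 32))  # 2048 (CIFAR-10 sized input)
```

So for 32x32 inputs, every place that assumes 1152 capsules (and the 784-unit decoder output) would presumably need to change along with the input placeholder shape.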
The inference timing results are given in dist_version/README.md. How did you get those test results?
In the paper, each capsule in the [6 × 6] grid shares its weights with the others. Does your code miss this point?
First of all, thank you for this wonderful implementation. Not only does it work like a charm, I am learning a lot about how to use Tensorflow effectively 👍
I trained the code with all defaults on the MNIST dataset, which returned an accuracy of 99.49%. That's great!
I am now trying to classify some of my own handwritten MNIST-style digits. I have created 15,000 samples, black and white digits, with the same dimensions as MNIST. I created a small function to feed my data into main.py, and eventually got things working.
My problem is that I get a test-accuracy of ~9%, which equates to random guessing on the 10 classes!
For this reason, I would like to get the predicted labels back for each of the images, so that I can try to debug. Is there an easy way to do this? Could you please provide any hints?
Any help would be much appreciated!
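In CapsNet the predicted class is the digit capsule with the longest output vector, so predicted labels can be recovered from the DigitCaps output once it has been fetched as an array. A minimal NumPy sketch, assuming the output has been squeezed to [batch, 10, 16]; predict_labels is a hypothetical helper, not a function in the repository:

```python
import numpy as np

def predict_labels(digit_caps):
    """digit_caps: [batch, 10, 16] -> (labels [batch], lengths [batch, 10])."""
    lengths = np.linalg.norm(digit_caps, axis=-1)
    return np.argmax(lengths, axis=1), lengths

# Two fake examples whose longest capsules are 4 and 9.
caps = np.zeros((2, 10, 16))
caps[0, 4, :] = 0.2
caps[1, 9, 0] = 0.9
labels, lengths = predict_labels(caps)
print(labels)  # [4 9]
```

Comparing these labels with the true ones per class can show whether the errors are spread evenly or concentrated on a few digits.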
on this step
global_step = sess.run(capsNet.global_step)
https://github.com/naturomics/CapsNet-Tensorflow/blob/master/capsLayer.py#L151
# then matmul in the last two dim: [16, 1].T x [16, 1] => [1, 1], reduce mean in the
# batch_size dim, resulting in [1, 1152, 10, 1, 1]
v_J_tiled = tf.tile(v_J, [1, 1152, 1, 1, 1])
u_produce_v = tf.matmul(u_hat, v_J_tiled, transpose_a=True)
assert u_produce_v.get_shape() == [cfg.batch_size, 1152, 10, 1, 1]
b_IJ += tf.reduce_sum(u_produce_v, axis=0, keep_dims=True)
Why would you need to average b across the batch dimension? I don't see why that would be good, since it would make the model batch-size dependent. If this is mentioned in the paper or another source, could you point out where and send a link? It would be appreciated.
When I set the batch size to 1, there is a value error when building the graph. I think it is caused by this line of code.
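The batch dependence can be shown with a few lines of NumPy. The sketch below uses random stand-ins for u_hat and v_J and a hypothetical agreement helper; it compares keeping the agreement per example with summing it over the batch axis as in the snippet above:

```python
import numpy as np

def agreement(u_hat, v):
    """Dot product of u_hat_ij with v_j -> [batch, in_caps, out_caps]."""
    return np.einsum("bijd,bjd->bij", u_hat, v)

rng = np.random.default_rng(0)
u_hat = rng.normal(size=(4, 1152, 10, 16))
v = rng.normal(size=(4, 10, 16))

a = agreement(u_hat, v)
shared = a.sum(axis=0, keepdims=True)   # [1, 1152, 10], one set of logits per batch

# Perturb only example 3 and recompute.
u_hat_changed = u_hat.copy()
u_hat_changed[3] += 1.0
a2 = agreement(u_hat_changed, v)

print(np.allclose(a2[0], a[0]))                             # True
print(np.allclose(a2.sum(axis=0, keepdims=True), shared))   # False
```

With per-example logits, example 0 is unaffected by a change in example 3; with the batch-summed version, every example's routing sees the change.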
Hi,
Sorry to interrupt you again. I feel excited seeing your work proceed. I noticed that you apply the squashing operation in the PrimaryCaps layer, and I don't see the reason. The paper uses squashing during the routing process, but there is no routing between Conv1 and PrimaryCaps. So I wonder whether it is reasonable to put the squashing operation in the PrimaryCaps layer. Looking forward to your reply! Thanks in advance.
Hello author, I would like to compare GPU performance. Could you tell me how long training took for you with the default configuration? I am currently training on a Tesla GPU, and once training finishes I will let you know how long it took me :)
I noticed when reading your code that there is an inconsistency between your code and the original paper by Hinton. When you run the decoder, the input is only the masked correct capsule. This does not follow what Hinton did in the paper, where the remaining capsules are masked to 0 and all of the capsules are passed to the next layer. This way the decoder can tell from the position which digit it is trying to reconstruct. The specific error is in this line: self.masked_v = tf.matmul(tf.squeeze(self.caps2), tf.reshape(self.Y, (-1, 10, 1)), transpose_a=True). Therefore the first layer of the decoder should take an input of size 160, not 16.
If squashing is done per capsule, then why are the input dimensions to it 32, 1152, 8, 1, where 32 is the batch size? Shouldn't it be 32, 668, 32, 1?
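The difference between the two masking styles can be sketched in NumPy. This assumes DigitCaps output of shape [batch, 10, 16] and integer labels; both function names are made up for illustration and neither is the repository's code:

```python
import numpy as np

def mask_keep_position(caps, labels, num_classes=10):
    """Zero every capsule except the labelled one, then flatten to [batch, 160]."""
    one_hot = np.eye(num_classes)[labels]                # [batch, 10]
    return (caps * one_hot[:, :, None]).reshape(len(caps), -1)

def mask_select(caps, labels):
    """Keep only the labelled capsule: [batch, 16]; the class position is lost."""
    return caps[np.arange(len(caps)), labels]

caps = np.arange(2 * 10 * 16, dtype=float).reshape(2, 10, 16) + 1.0
labels = np.array([3, 7])
print(mask_keep_position(caps, labels).shape)  # (2, 160)
print(mask_select(caps, labels).shape)         # (2, 16)
```

In the first variant the surviving 16 values sit at offset 16 * label, so the decoder input encodes which class was selected; in the second variant that information is gone.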
Line 45 in 1e06680
Why is num_outputs set here when it will not be used?
For the input feature map (batch_size, 20, 20, 256), the conv of version 1 does 256x32x9x9 for each point in the feature map, then concatenates every 8 output feature maps, while the conv of version 2 does 256x(32x8)x9x9 for each point. That is to say, in version 1 the result at each point of the input feature map is affected by only 32 kernels, but in version 2 it is affected by 32*8 kernels.
Forgive me if I got this wrong, but it seems like the b_IJ are shared between all examples within a single batch (see reduce_sum and the shape).
I didn't see any mention of batches in the paper, so I have assumed that there is a separate set of b_IJ weights for every batch. Why do you think that it's better to share those variables?
Edit:
I've corrected the statement "b_IJ are shared between all batches" to "b_IJ are shared between all examples within a single batch", which is what I originally meant.
Hello sir,
I am following the Capsule Network paper and your implementation.
I have a quick question about the valid padding in the conv2 you used to get the output for the Primary Caps. As I understand it, after the 1st conv layer the size of the output is (batchsize,20,20,256). So if conv2 has 256 9x9 kernels with stride 2, then by the formula the output size should satisfy (20-9+2*p)/2+1 = 6. However, this equation cannot be solved for a valid padding p, so I would like to ask how exactly valid padding works in this situation to give an output of (batchsize,6,6,256).
Thanks !
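The usual output-size formula includes a floor, and VALID padding means p = 0, so the equation does not need to hold exactly. As a worked instance for input size n = 20, kernel k = 9, stride s = 2:

```latex
\left\lfloor \frac{n + 2p - k}{s} \right\rfloor + 1
= \left\lfloor \frac{20 + 2 \cdot 0 - 9}{2} \right\rfloor + 1
= \lfloor 5.5 \rfloor + 1
= 6
```

In other words, the kernel is placed at offsets 0, 2, 4, 6, 8 and 10, and the last input column (index 19) is simply never the start of a window.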
Hi,
Thanks for your contribution. I think the b_ij defined in your code probably does not match the paper.
Your code is:
self.b_ij = tf.get_variable('b_ij', shape=(1, 1152, 1, 1))
...
c_i = tf.nn.softmax(self.b_ij, dim=1)
But in fact it should be
b_i = tf.get_variable('b_i', shape=(1, 1152, 16, 1))
...
c_i = tf.nn.softmax(b_i, dim=2)
If I have misunderstood your code, please ignore me. Thanks~
To the owner and all other visitors:
I do not mean to be offensive, but I decided to speak out my understanding of this routing algorithm as I have not seen any correct implementation so far yet.
The correct implementation of the routing algorithm should be treated something like the dynamic RNN in TensorFlow. In other words, if you implement it in a static way, and if you do 3 iterations, the two caps layers are actually 6 such layers. The primary layer performs line 4 and output to the digits layer, and then the digits layer performs line 5, 6, and 7 with b_ij updated, and then loop back to the primary layer again. This will need to use tf.while_loop if you use a dynamic way.
What confuses me or stops me from implementing myself is I am not sure how the weights and biases associated with the conv units are updated, as I assume other than the weights and biases associated with the capsules, each individual conv unit inside still carries its own parameters. Maybe I missed this by reading the paper.
Feel free to correct me if you believe I am wrong. Thanks.
How can we use your code with another RGB dataset?
Suppose the dataset is structured like this: it contains some sub-folders, and each sub-folder represents one class.
Class A:
0001.jpg 1
0002.jpg 1
Class B:
0001.jpg 2
0002.jpg 2
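For a folder-per-class layout like this, the first step is usually to turn the directory tree into parallel lists of file paths and integer labels. A minimal standard-library sketch; list_image_folder is a hypothetical helper, labels are assigned 0-based in sorted folder order (so Class A is 0 rather than 1), and decoding or resizing the images is left out:

```python
from pathlib import Path

def list_image_folder(root, extensions=(".jpg", ".jpeg", ".png")):
    """Return (paths, labels, class_names) for a root/<class_name>/<image> layout."""
    root = Path(root)
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    paths, labels = [], []
    for label, name in enumerate(class_names):
        for img in sorted((root / name).iterdir()):
            if img.suffix.lower() in extensions:
                paths.append(str(img))
                labels.append(label)
    return paths, labels, class_names
```

The network itself would still need its input shape, capsule count and decoder size adapted to the new image dimensions and number of classes.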
In your code capsNet.py, you add self.decoded to tf.summary.image as "recon_img", but self.X = input_image/255, and your code has
orgin = tf.reshape(self.X, shape=(cfg.batch_size, -1))
squared = tf.square(self.decoded - orgin)
self.reconstruction_err = tf.reduce_mean(squared)
so self.decoded is not the reconstructed image; you need to multiply it by 255, right?
Hi, nice work! But I got an error on my local machine:
python train.py
Traceback (most recent call last):
File "train.py", line 11, in <module>
capsNet = CapsNet(is_training=cfg.is_training)
File "/home/joffrey/projects/CapsNet-Tensorflow/capsNet.py", line 16, in __init__
self.loss()
File "/home/joffrey/projects/CapsNet-Tensorflow/capsNet.py", line 84, in loss
squared = tf.square(self.decoded - orgin)
File "/home/joffrey/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/math_ops.py", line 865, in binary_op_wrapper
return func(x, y, name=name)
File "/home/joffrey/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/gen_math_ops.py", line 2629, in _sub
result = _op_def_lib.apply_op("Sub", x=x, y=y, name=name)
File "/home/joffrey/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
op_def=op_def)
File "/home/joffrey/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2632, in create_op
set_shapes_for_outputs(ret)
File "/home/joffrey/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1911, in set_shapes_for_outputs
shapes = shape_func(op)
File "/home/joffrey/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1861, in call_with_requiring
return call_cpp_shape_fn(op, require_shape_fn=True)
File "/home/joffrey/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/common_shapes.py", line 595, in call_cpp_shape_fn
require_shape_fn)
File "/home/joffrey/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/common_shapes.py", line 659, in _call_cpp_shape_fn_impl
raise ValueError(err.message)
ValueError: Dimensions must be equal, but are 16 and 128 for 'sub_3' (op: 'Sub') with input shapes: [128,10,16,784], [128,784].
My env version:
Python 2.7.13 |Anaconda custom (64-bit)| (default, Dec 20 2016, 23:09:15)
tensorflow==1.3.0rc2
tensorflow-gpu==1.3.0
Hi!
I'm a student interested in Speech Synthesis with neural networks.
I suppose this CapsNet might improve the quality of synthesized speech,
so I try to apply this great program to the other program to generate artificial speech with neural network.
I would like to ask whether this CapsNet could replace other popular neural networks like CNN.
Thank you for answering.
CapsNet is said to perform better for scaled images but i trained the network with original images and tested the network with the scaled images to find out the test accuracy to be only 10%... #CapsBoringNet
In 'capsLayer.py', the 'fully_connected' function uses 'CapseLayer' class to build a fully connected layer. It returns 'layer.apply(inputs)'. However, I did not find the 'apply' method definition in the class. Are you going to define it? Or it is just my problem that I did not find the definition? Could you please tell me where it is defined?
Hi,
Thanks for your great job. I found that b_IJ is updated in the order of J in your code.
In CapsConv
for j in range(self.num_outputs):
    with tf.variable_scope('caps_' + str(j)):
        caps_j, b_IJ = capsule(input, b_IJ, j)
        capsules.append(caps_j)
In capsule
c_IJ = tf.nn.softmax(b_IJ, dim=2)
In your case, b_I(J+1) is not independent of b_IJ, which means the order matters in the routing process. But in my opinion, all b_IJ should be updated in parallel. Thanks in advance for your reply!