naturomics / capsnet-tensorflow

A TensorFlow implementation of CapsNet (Capsules Net) from the paper "Dynamic Routing Between Capsules"

License: Apache License 2.0

Python 98.18% R 1.82%

capsnet tensorflow capsule capsule-network routing-algorithm dynamic-routing

capsnet-tensorflow's People

mornydew gitter-badger mutual-ai harshanavkis huaxinxiao ml-lab wanjinchang 443582555 allensmile lakehui jafei0912 luojiaji djoffrey jdc08161063 colingogo zengjianping leezqcst sunxingxingtf barbecacov xiaozhenboy tanbendong stevenlol wyn314 gogozhaoya caibing1872 sfidea duxuhao napolun279 xiaoliang008 liyuanyaun ricefryegg zhuwenxiao shaowenwei cnn-gan cuhk-pjs benjamesbabala yangbuaa kimisissi felicia126 hudaduchao shibei00 gjtjx huazhelei collector-m johndpope sar-gupta jiefloyd lucko515 dhingratul iamtpb tarrysingh hwangtamu migueleichler debarko nitinreddy3 bingo619 xufabing praveenmunagapati hsm207 zgsxwsdxg heavyflavor falaktheoptimist samithaj shubhampachori12110095 alibaheri xiaodongyichuan v-italy geekrick88 13331151 noahfl anthonymcqueen21 prafull7 cove9988 liuhycv rosssong roryshively samlanka szupzp cedrickchee iqbal-chowdhury jihobak redeipirati shravankumar147 pangye papercoming longchuan1985 codeaudit msalvaris limberc zhilangtaosha hefeiq mpmoturi coderx7 fulkast kenyangzq shiva1387 cbhust8025 jianghuairong quxiaofeng qlaboratory

capsnet-tensorflow's Issues

Question about num_outputs

I'm trying to read through the definition of the class CapsLayer. Does num_outputs actually correspond to the number of capsules? From the following code, it looks like the number of capsules is actually stored in vec_len.

capsules = []
for i in range(self.vec_len):
    # each capsule i: [batch_size, 6, 6, 32]
    ...
    capsules.append(caps_i)

Sorry to bother you about the documentation; I'm just trying to get a better understanding of how capsules work. Thanks for sharing your work, by the way.

Reshape is correct?

CapsNet-Tensorflow/capsLayer.py

Line 78 in 4be551a

capsules = tf.reshape(capsules, (cfg.batch_size, -1, self.vec_len, 1))

I think this line does not preserve the following statement from the paper:

"Each primary capsule output sees the outputs of all 256 × 81 Conv1 units whose receptive fields overlap with the location of the center of the capsule."

i.e. we should ensure that the first capsule after the reshape corresponds to pixel [0,0] of the first 8 filters, the second to pixel [0,1], and so on.
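A small NumPy sketch can make the grouping explicit. It assumes the conv output has shape [batch_size, 6, 6, 256] in NHWC layout and vec_len = 8 (both inferred from the shapes quoted on this page, not checked against the repository), and it fills the array with flat indices instead of real activations. np.reshape and tf.reshape both use row-major order, so the grouping should carry over.

```python
import numpy as np

batch_size, height, width, channels, vec_len = 2, 6, 6, 256, 8

# Synthetic conv output in NHWC layout: every element holds its own flat index,
# so we can see where each value ends up after the reshape.
conv_out = np.arange(batch_size * height * width * channels).reshape(
    batch_size, height, width, channels)

# Same reshape as in the issue, with NumPy standing in for tf.reshape.
capsules = conv_out.reshape(batch_size, -1, vec_len, 1)

# Capsule 0 is built from channels 0..7 at pixel [0, 0].
first = capsules[0, 0, :, 0]
# Capsule 1 is built from channels 8..15 at the same pixel [0, 0].
second = capsules[0, 1, :, 0]
# Capsule 32 (= channels // vec_len) is the first capsule at pixel [0, 1].
next_pixel = capsules[0, channels // vec_len, :, 0]

print(capsules.shape)
```

Under these assumptions every capsule takes 8 consecutive channels at a single pixel, and the capsule index runs over channel groups first and pixels second.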

Running on a GTX960M gives an error [InternalError (see above for traceback): Dst tensor is not initialized.] (out of memory)

When running on Windows 10 with a GTX960M, I get this error:

InternalError (see above for traceback): Dst tensor is not initialized.

Some blogs say this error is caused by a lack of GPU memory, but I cannot fix the problem. I hope someone can help me.

Adding layers

Thanks for the nice code. I have a question regarding CapsNet. Would it be possible to add layers (like a conv-caps layer after the first primary layer, or a fully connected caps layer with 20 capsules before the digit caps layer)?
I tried it myself and am getting terrible results! I can't understand why this is happening. Do you have any idea?

Note to Huadong

Hi Huadong,

I've been running successful tests of CapsNets with PyTorch and would like to compare notes with you. Maybe we can take our discussion offline?
My email is: firstname.lastname[@]gmail.com

Let me know!

Tarry

confused about softmax(v_length)

I am confused about why you apply softmax to v_length here, since I have not found this operation in Hinton's paper or in its Figure 1.
On the contrary, CapsNet seems to allow multi-label classification, which means it is not necessary to apply softmax to v_length, according to Section 3 of Hinton's paper ("To allow for multiple digits, we use a separate margin").
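For reference, here is a minimal plain-Python sketch of the per-class margin loss from the paper, using the constants given there (m+ = 0.9, m- = 0.1, lambda = 0.5). The function name and the example lengths are made up for illustration; this is not the repository's loss code.

```python
def margin_loss(lengths, targets, m_plus=0.9, m_minus=0.1, lam=0.5):
    """Sum over classes of the per-class margin loss for one example.

    lengths: capsule lengths ||v_k||, one per class, each in [0, 1).
    targets: 0/1 indicators T_k; more than one entry may be 1.
    """
    total = 0.0
    for length, t in zip(lengths, targets):
        present = t * max(0.0, m_plus - length) ** 2
        absent = lam * (1 - t) * max(0.0, length - m_minus) ** 2
        total += present + absent
    return total


# Two classes present at once: both long capsules are accepted without penalty.
two_digit_loss = margin_loss([0.95, 0.92, 0.05], [1, 1, 0])
# A long capsule for an absent class is penalized through the "absent" term.
wrong_loss = margin_loss([0.95, 0.92, 0.60], [1, 1, 0])
```

Because each class is scored on its own capsule length, nothing forces the lengths to sum to 1, which is the multi-label property the question refers to.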

Extending the Capslayers

Hello! I'm currently working on a project where I'd like to experiment with capsules in lieu of CNNs for Deep-Q Learning. Great work on releasing this implementation! While working with this code I ran into issues with using more capsule layers than just the ones in the CapsNet architecture. For instance, I was wondering if it is possible to use multiple convolutional capsule layers with routing and to change their output sizes. I've tried to tweak the code to do this, but I keep running into size issues and fear I might break the logical implementation. Any tips greatly appreciated!

b_IJ update

Hi,
In capsLayer.py, consider the current code below. If cfg.iter_routing == 1, b_IJ never gets updated. Surely that is not the intent? Shouldn't b_IJ be updated at every iteration of the routing? Thanks.

Gordon

if r_iter == cfg.iter_routing - 1:
    # line 5:
    # weighting u_hat with c_IJ, element-wise in the last two dims
    # => [batch_size, 1152, 10, 16, 1]
    s_J = tf.multiply(c_IJ, u_hat)
    # then sum in the second dim, resulting in [batch_size, 1, 10, 16, 1]
    s_J = tf.reduce_sum(s_J, axis=1, keep_dims=True)
    assert s_J.get_shape() == [cfg.batch_size, 1, 10, 16, 1]

    # line 6:
    # squash using Eq.1,
    v_J = squash(s_J)
    assert v_J.get_shape() == [cfg.batch_size, 1, 10, 16, 1]
elif r_iter < cfg.iter_routing - 1:  # Inner iterations, do not apply backpropagation
    s_J = tf.multiply(c_IJ, u_hat_stopped)
    s_J = tf.reduce_sum(s_J, axis=1, keep_dims=True)
    v_J = squash(s_J)         # <<<<<<<< MISSING UPDATE of B_IJ?

    # line 7:
    # reshape & tile v_j from [batch_size ,1, 10, 16, 1] to [batch_size, 1152, 10, 16, 1]
    # then matmul in the last two dims: [16, 1].T x [16, 1] => [1, 1], reduce mean in the
    # batch_size dim, resulting in [1, 1152, 10, 1, 1]
    v_J_tiled = tf.tile(v_J, [1, 1152, 1, 1, 1])
    u_produce_v = tf.matmul(u_hat_stopped, v_J_tiled, transpose_a=True)
    assert u_produce_v.get_shape() == [cfg.batch_size, 1152, 10, 1, 1]

    # b_IJ += tf.reduce_sum(u_produce_v, axis=0, keep_dims=True)
    b_IJ += u_produce_v   # <<<<<< PERHAPS THIS LINE SHOULD BE OUTSIDE THE r_iter LOOP?

How to test a new image?

Hi,

If I want to load an image and get its softmax score, how should I write the script?
I've been trying for several hours. I'm a beginner in TensorFlow and it's kind of difficult for me.

    with tf.Graph().as_default():
        image = tf.cast(image_arr, tf.float32)
        image = tf.image.per_image_standardization(image)
        image = tf.reshape(image, [1,28, 28, 1])
        #x = tf.placeholder(tf.float32,shape = [1,28, 28, 1])
        feature=CapsNet.test_net(image)
        logits = tf.nn.softmax(feature)
        #saver = tf.train.Saver()
        aaa=1
        with tf.Session() as sess:
            saver = tf.train.import_meta_graph('./logdir/model_epoch_0048_step_23899.meta')
            saver.restore(sess, './logdir/model_epoch_0048_step_23899')
            print(image.shape)
            test_result = sess.run(aaa,image)

Mistakes in the comments

I think there are some mistakes in the code comments. I found two.

L61:
Reshape the input into [batch_size, 1, 1152, 8, 1]
--->
Reshape the input into [batch_size, 1152, 1, 8, 1]

L77:
input: A Tensor with [batch_size, 1, num_caps_l=1152, length(u_i)=8, 1]
--->
input: A Tensor with [batch_size, num_caps_l=1152, 1, length(u_i)=8, 1]

I think the order should be changed.

Thanks

about tf.argmax() function

My TensorFlow version is 1.2.1. The following code in capsNet.py:

argmax_idx = tf.argmax(self.softmax_v, axis=1, output_type=tf.int32)

should be changed to the following for version 1.2.1:

argmax_idx = tf.to_int32(tf.argmax(self.softmax_v, axis=1))

Comment is different from your code

Hi, I think these two lines may conflict: the shape of b does not match the comments.

The line referenced here is confusing as well.

Please check it.

Norm with reduce_mean

In the distributed version, this line uses reduce_mean to calculate the norm. Is that correct?

What's the current accuracy?

Dear man

I think we should set up a WeChat group for people interested in this subject. My WeChat ID is bn31201. I hope you will add me so we can discuss in more depth.

Try Fashion-MNIST

Could you please test and post the result on Fashion-MNIST? https://github.com/zalandoresearch/fashion-mnist

It shares the same image size and format as MNIST, so it should be straightforward to integrate.

a question

Hey everyone, I'm really wondering: is this architecture actually better than an ordinary CNN? Has it achieved any impressive results?

Cannot evaluate the model when using python main.py --is_training=False

Evaluating/testing the trained model using python main.py --is_training=False gives the following error:
ValueError: Can't load save_path when it is None.

Did you upload the wrong version?

Hi, the uploaded code is incomplete. Did you upload the wrong version?

Only 10% test accuracy on rotated images!!!!

CapsNet is said to perform better on rotated images, but I trained the network with the original images and tested the model with rotated images, and the test accuracy was 10%, which is so depressing.

Relu activation in PrimaryCap?

tf.contrib.layers.conv2d applies a ReLU activation, but the PrimaryCaps convolution does not include a ReLU activation before the neurons are grouped into capsules and squashed. Or did I miss something in the paper?

CapsNet-Tensorflow/capsLayer.py

Line 59 in 894c79c

capsules = tf.contrib.layers.conv2d(input, self.num_outputs * self.vec_len,

Will the norm in the squash function be a scalar?

Hi, good job!
I have a small question: in the squash function, you keep the dims of the norm ('vec_squared_norm') the same as those of 'vector'. Why not collapse its dims to [batch_size, 1]?
From where I stand, the norm should be a scalar.
e.g.:
x= [a,b,c,d]
||x||^2 = norm(x)^2 = (|a|^2+|b|^2+|c|^2+|d|^2)

thus x --> norm(x)^2 : [batch_size, 1, num_caps, vec_len, 1] --> [batch_size, 1] ?
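A minimal NumPy sketch of a squash function may help here. It is not the repository's implementation: the function name, the eps term, and the toy input are assumptions. The point it illustrates is that each capsule has its own scalar norm, and keeping the reduced axis with size 1 lets that scalar broadcast against the capsule's vector.

```python
import numpy as np


def squash(vectors, axis=-2, eps=1e-9):
    """Squash along `axis`, keeping that axis so the scale broadcasts."""
    sq_norm = np.sum(np.square(vectors), axis=axis, keepdims=True)
    scale = sq_norm / (1.0 + sq_norm) / np.sqrt(sq_norm + eps)
    return scale * vectors


# One example with a single capsule (3, 4): shape [batch, 1, num_caps, vec_len, 1].
caps = np.array([3.0, 4.0]).reshape(1, 1, 1, 2, 1)

# The squared norm is one scalar per capsule, stored with a size-1 vec_len axis.
norm_shape = np.sum(np.square(caps), axis=-2, keepdims=True).shape

out = squash(caps)
out_len = np.sqrt(np.sum(out ** 2, axis=-2)).item()
```

With several capsules per example the norm cannot collapse to [batch_size, 1], because there is one norm per capsule rather than one per example.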

Questions about the weight matrix Wij between ui and vj

First, thanks for your answer on Zhihu as well as the implementation on GitHub; they helped me a lot in understanding the original paper.

I would like to share my doubt about the lines just below Figure 2 of the original paper, which say "each capsule in the [6,6] grid is sharing their weights with each other". As I understand it, this means the capsule outputs (vectors ui) within a [6,6] grid share the same Wij, so only 32 matrices W should be updated by Adam. But in your implementation, I can't find any code that handles this weight-sharing mechanism.

Besides, I think the shape of Wij should be [16,8], since ui is a [1,8] or [8,1] vector; otherwise it obviously conflicts with Eq. 2. Although this may look unimportant, I point it out so that I can be corrected if I have misunderstood the paper or your implementation.
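One way to sanity-check the weight-sharing question is to count parameters under both readings. The sketch below assumes the layer sizes discussed on this page and in the paper as I understand it (Conv1: 9x9, 256 channels; PrimaryCaps: 9x9, 32x8 channels; one [16, 8] matrix Wij per capsule pair; a 512-1024-784 decoder fed with 160 values). The "shared" variant is a hypothetical alternative with one Wij per capsule channel. If I recall the paper correctly, it reports about 8.2M parameters, or 6.8M without the reconstruction subnetwork.

```python
def conv_params(in_ch, out_ch, kernel):
    """Weights plus biases of a square 2-D convolution."""
    return in_ch * out_ch * kernel * kernel + out_ch


def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


conv1 = conv_params(1, 256, 9)
primary_caps = conv_params(256, 32 * 8, 9)

# One [16, 8] matrix W_ij per (input capsule, output capsule) pair.
w_unshared = 1152 * 10 * 16 * 8
# Hypothetical alternative: one W_ij per (capsule channel, output capsule) pair.
w_shared = 32 * 10 * 16 * 8

decoder = (dense_params(160, 512) + dense_params(512, 1024)
           + dense_params(1024, 784))

total_unshared = conv1 + primary_caps + w_unshared + decoder
total_shared = conv1 + primary_caps + w_shared + decoder
print(total_unshared, total_shared)
```

Under these assumptions the unshared count comes out at roughly 8.2M (6.8M without the decoder), while the shared variant stays below 6.8M even with the decoder. This suggests that the sharing mentioned in the paper refers to the convolution kernels of PrimaryCaps rather than to Wij, but that is an inference, not something stated in this thread.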

Does this version have no test or predict function?

Hello, I want to test data that I generated myself, but I cannot see how to do it.

reshape question

In the else branch of capsNet's __init__(), how can the label (a placeholder with shape (batch_size,)) be reshaped to (batch_size, 10, 1)?
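A tensor of shape (batch_size,) holds only batch_size values, so it cannot be reshaped to (batch_size, 10, 1) directly. The usual intermediate step is one-hot encoding. The NumPy sketch below shows that step with a hypothetical helper name; whether the repository does this at that point is not visible from the issue.

```python
import numpy as np


def one_hot_column(labels, num_classes=10):
    """Turn integer labels of shape (batch,) into shape (batch, num_classes, 1)."""
    labels = np.asarray(labels)
    one_hot = np.eye(num_classes, dtype=np.float32)[labels]  # (batch, num_classes)
    return one_hot.reshape(-1, num_classes, 1)


y = one_hot_column([3, 0, 9])
print(y.shape)
```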

Training on different input dimensions than MNIST

Thanks for writing the code so shortly after the article was released. I'm trying to change the structure so that the capsule network can be trained on any image shape (x, y, z), but I am having trouble restructuring the code. Can you help me identify which lines need to be modified? I am guessing all lines with ... 28, 28, 1) -> ... 32, 32, 3) for CIFAR-10, but I am still not able to make it work.

Thank you again 👍
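Besides the input shape itself, the number of primary capsules depends on the image size, and the value 1152 appears as a literal in the routing code quoted elsewhere on this page. A small sketch of that dependency follows, assuming two 9x9 VALID convolutions with stride 1 and stride 2 and 32 capsule channels; the helper names are hypothetical.

```python
def valid_conv_size(size, kernel, stride):
    """Output length of a VALID (no padding) convolution along one axis."""
    return (size - kernel) // stride + 1


def num_primary_caps(height, width, caps_channels=32):
    """Number of primary capsules for a given input size.

    Assumes Conv1 is 9x9 with stride 1 and PrimaryCaps is 9x9 with stride 2,
    both with VALID padding.
    """
    h = valid_conv_size(valid_conv_size(height, 9, 1), 9, 2)
    w = valid_conv_size(valid_conv_size(width, 9, 1), 9, 2)
    return h * w * caps_channels


mnist_caps = num_primary_caps(28, 28)   # 6 * 6 * 32
cifar_caps = num_primary_caps(32, 32)   # 8 * 8 * 32
print(mnist_caps, cifar_caps)
```

So for 32x32 inputs, any place that relies on 1152 capsules would need the new count as well, in addition to the 28, 28, 1 shapes.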

Timing test of the inference process

The timing results for the inference process are given in dist_version/README.md. How did you obtain these test results?

share weights in 6x6x8 grids

In the paper, each capsule in the [6 × 6] grid is sharing its weights with the others. Does your code miss this point?

Getting test prediction labels per image

First of all, thank you for this wonderful implementation. Not only does it work like a charm, I am learning a lot about how to use Tensorflow effectively 👍

I trained the code with all default settings on the MNIST dataset, which returned an accuracy of 99.49%. That's great!

I am now trying to classify some of my own handwritten MNIST digits. I have created 15,000 samples, black and white digits, with the same dimensions as MNIST. I created a small function to feed my data into main.py, and eventually got things working.

My problem is that I get a test-accuracy of ~9%, which equates to random guessing on the 10 classes!

For this reason, I would like to get the predicted labels back for each of the images, so that I can try to debug. Is there an easy way to do this? Could you please provide any hints?

Any help would be much appreciated!
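As a starting point for debugging, the predicted label is the index of the longest DigitCaps vector. The NumPy sketch below assumes you can already fetch the DigitCaps output as an array of shape [batch, num_classes, vec_len]; how to obtain it from the trained graph depends on the repository version and is not shown. The function name and the toy array are made up.

```python
import numpy as np


def predict_labels(digit_caps):
    """Predicted class per image from a DigitCaps output.

    digit_caps: array of shape [batch, num_classes, vec_len].
    Returns (labels, lengths); lengths has shape [batch, num_classes].
    """
    lengths = np.sqrt(np.sum(np.square(digit_caps), axis=-1))
    return np.argmax(lengths, axis=1), lengths


# Hypothetical DigitCaps output for two images, three classes, two dimensions.
caps = np.array([[[0.1, 0.0], [0.6, 0.7], [0.0, 0.2]],
                 [[0.9, 0.1], [0.1, 0.1], [0.3, 0.0]]])
labels, lengths = predict_labels(caps)
print(labels)
```

Printing the full `lengths` array next to the true labels can also show whether the network is confidently wrong or simply undecided, which often hints at a preprocessing mismatch such as inverted colors or a different value range.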

Question: Why does it throw an attribute-not-found error when we run with training=False?

It happens at this step:
global_step = sess.run(capsNet.global_step)

Why average b_ij across examples?

https://github.com/naturomics/CapsNet-Tensorflow/blob/master/capsLayer.py#L151

            # then matmul in the last two dims: [16, 1].T x [16, 1] => [1, 1], reduce mean in the
            # batch_size dim, resulting in [1, 1152, 10, 1, 1]
            v_J_tiled = tf.tile(v_J, [1, 1152, 1, 1, 1])
            u_produce_v = tf.matmul(u_hat, v_J_tiled, transpose_a=True)
            assert u_produce_v.get_shape() == [cfg.batch_size, 1152, 10, 1, 1]
            b_IJ += tf.reduce_sum(u_produce_v, axis=0, keep_dims=True)

Why would you need to average b across the batch dimension? I don't see why that would be good, since it makes the model batch-size dependent. If this is mentioned in the paper or another source, could you point out where and send a link? Much appreciated.
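For comparison, here is a NumPy sketch of routing by agreement in which the logits b keep a leading batch dimension, so no example influences another. It is one reading of the routing procedure in the paper, not the repository's code; the shapes, the function names, and the eps term are assumptions.

```python
import numpy as np


def squash(s, axis=-1, eps=1e-9):
    """Shrink vectors along `axis` to a length in [0, 1)."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)


def routing(u_hat, num_iters=3):
    """Routing by agreement with one set of logits per example.

    u_hat: prediction vectors, shape [batch, num_in, num_out, dim].
    Returns the output capsules v, shape [batch, num_out, dim].
    """
    batch, num_in, num_out, _ = u_hat.shape
    b = np.zeros((batch, num_in, num_out))      # logits, not shared across the batch
    for _ in range(num_iters):
        # Softmax over the output capsules j for each input capsule i.
        e = np.exp(b - b.max(axis=2, keepdims=True))
        c = e / e.sum(axis=2, keepdims=True)
        s = np.sum(c[..., None] * u_hat, axis=1)    # [batch, num_out, dim]
        v = squash(s)
        # Agreement u_hat . v, updated for every (i, j) pair at once.
        b = b + np.sum(u_hat * v[:, None, :, :], axis=-1)
    return v


rng = np.random.RandomState(0)
u_hat = rng.normal(size=(4, 12, 10, 16))
v_batch = routing(u_hat)
v_single = routing(u_hat[:1])   # the first example routed on its own
```

In this version the result for an example is the same whether it is routed alone or inside a larger batch, which is the batch-size independence the question asks about.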

Set batch = 1 error

When I set the batch size to 1, there is a ValueError when building the graph. I think it is caused by this line of code.

Should we squash in the PrimaryCaps layer?

Hi,
Sorry to interrupt you again. I feel excited to see your work progressing. I noticed that you apply the squashing operation in the PrimaryCaps layer, and I don't see the reason for it. The paper uses squashing during the routing process, but there is no routing between Conv1 and PrimaryCaps. So I wonder whether it is reasonable to apply squashing in the PrimaryCaps layer. Looking forward to your reply! Thanks in advance.

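Dear author, can you tell me what device you use and how long training takes with the default config?

Hello author, I would like to compare GPU performance. Could you tell me how long training takes for you with the default configuration? I am currently training on a Tesla GPU, and when training finishes I will tell you how long it took me.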

Inconsistency with the Paper

While reading your code, I noticed an inconsistency between your code and the original paper by Hinton. When you run the decoder, its input is only the correct capsule, selected by masking. This does not follow the paper, where the remaining capsules are masked to 0 and all of the capsules are passed to the next layer. That way the decoder can tell from the position which digit it is trying to reconstruct. The specific line is self.masked_v = tf.matmul(tf.squeeze(self.caps2), tf.reshape(self.Y, (-1, 10, 1)), transpose_a=True). Therefore the first layer of the decoder should take an input of size 160, not 16.
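The masking described above can be sketched in NumPy as follows. The helper name and the toy input are hypothetical, and the sketch only illustrates the shapes involved; it is not a patch for capsNet.py.

```python
import numpy as np


def mask_capsules(digit_caps, labels):
    """Zero out every capsule except the labelled one, then flatten.

    digit_caps: [batch, num_classes, vec_len]; labels: integer class ids.
    Returns an array of shape [batch, num_classes * vec_len].
    """
    batch, num_classes, vec_len = digit_caps.shape
    mask = np.eye(num_classes)[np.asarray(labels)]      # [batch, num_classes]
    masked = digit_caps * mask[:, :, None]
    return masked.reshape(batch, num_classes * vec_len)


caps = np.ones((2, 10, 16))
decoder_in = mask_capsules(caps, [3, 7])
print(decoder_in.shape)
```

The surviving 16 values sit at a class-dependent offset inside the 160-value vector, which is how the position carries the class identity to the decoder.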

Is your squashing input dimensions correct?

If squashing is done per capsule, then why are its input dimensions 32, 1152, 8, 1, where 32 is the batch size? Shouldn't they be 32, 668, 32, 1?

TypeError: reduce_sum() got an unexpected keyword argument 'keepdims'

Why is num_outputs mandatory?

CapsNet-Tensorflow/capsNet.py

Line 45 in 1e06680

digitCaps = CapsLayer(num_outputs=10, vec_len=16, with_routing=True, layer_type='FC')

Why is num_outputs set here when it will not be used?

Why is CapsLayer version 2 equivalent to version 1?

For the input feature map (batch_size, 20, 20, 256), the conv of version 1 applies 256x32x9x9 weights at each point of the feature map and then concatenates the 8 output feature maps. The conv of version 2 applies 256x(32x8)x9x9 weights at each point. That is to say, in version 1 the result at each point of the input feature map is affected by only 32 kernels, but in version 2 it is affected by 32*8 kernels.
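A reduced NumPy sketch may help with this comparison. It replaces the 9x9 convolution by a matrix product at a single pixel (effectively a 1x1 kernel), which is a simplification, and all names are made up. It shows that concatenating the outputs of 8 separate layers gives the same numbers as one wider layer whose kernel is the concatenation of the 8 kernels.

```python
import numpy as np

rng = np.random.RandomState(0)
in_ch, caps_channels, vec_len = 256, 32, 8

# Input activations at one pixel; a 1x1 "convolution" there is a matrix product.
x = rng.normal(size=(in_ch,))

# Version 1: vec_len separate layers with 32 outputs each, results concatenated.
kernels = [rng.normal(size=(in_ch, caps_channels)) for _ in range(vec_len)]
separate = np.concatenate([x @ k for k in kernels])

# Version 2: a single layer with 32 * 8 outputs, built from the same kernels.
merged_kernel = np.concatenate(kernels, axis=1)
merged = x @ merged_kernel
```

In both versions each output channel depends only on its own kernel, so the two differ at most in how the 256 output channels are ordered and grouped into capsules, not in how many kernels influence a given output.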

Why is b_IJ shared between the examples of a single batch?

Forgive me if I got this wrong but it seems like the b_IJ are shared between all examples within a single batch (see reduce_sum and the shape).

I didn't see any mention of the batches in the paper, so I have assumed that there is a separate set of b_IJ weights for every batch. Why do you think that it's better to share those variables?

Edit:
I've corrected the statement:

b_IJ are shared between all batches

to:

b_IJ are shared between all examples within a single batch

which is what I originally meant.

Valid padding in CapNets

Hello sir,
I am following the Capsule Network paper and your implementation.
I have a quick question about the VALID padding in the conv2 you use to produce the output for PrimaryCaps. As I understand it, after the first conv layer the output size is (batchsize,20,20,256). If conv2 has 256 kernels of size 9x9 with stride 2, the output size should satisfy (20-9+2*p)/2+1 = 6. However, this equation has no integer solution for p, so I would like to ask how exactly VALID padding works in this situation to give an output of (batchsize,6,6,256).
Thanks!
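With VALID padding there is no padding at all (p = 0) and the division is rounded down, so the window simply stops before it would run past the edge. A small sketch of that arithmetic, with a hypothetical helper name:

```python
def valid_conv_output(size, kernel, stride):
    """Output length and unused trailing pixels for VALID padding."""
    out = (size - kernel) // stride + 1      # floor((size - kernel) / stride) + 1
    used = (out - 1) * stride + kernel       # pixels covered by the last window
    return out, size - used


out, leftover = valid_conv_output(20, 9, 2)
print(out, leftover)
```

For a 20-pixel axis, a 9-pixel kernel and stride 2, the six windows start at positions 0, 2, 4, 6, 8 and 10, and the last row and column of the input are not covered by any window.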

Is your b_ij wrong?

Hi,
Thanks for your contribution. I think the b_ij defined in your code probably does not match the paper.
Your code is:

self.b_ij = tf.get_variable('b_ij', shape=(1, 1152, 1, 1))
...
c_i = tf.nn.softmax(self.b_ij, dim=1)

But in fact it should be

b_i = tf.get_variable('b_i', shape=(1, 1152, 16, 1))
...
c_i = tf.nn.softmax(b_i, dim=2)

If I have misunderstood your code, please ignore me. Thanks~

Routing algorithm

To the owner and all other visitors:

I do not mean to be offensive, but I decided to share my understanding of this routing algorithm, as I have not seen any correct implementation so far.

The correct implementation of the routing algorithm should be treated something like the dynamic RNN in TensorFlow. In other words, if you implement it in a static way, and if you do 3 iterations, the two caps layers are actually 6 such layers. The primary layer performs line 4 and output to the digits layer, and then the digits layer performs line 5, 6, and 7 with b_ij updated, and then loop back to the primary layer again. This will need to use tf.while_loop if you use a dynamic way.

What confuses me, and stops me from implementing it myself, is that I am not sure how the weights and biases associated with the conv units are updated. I assume that, in addition to the weights and biases associated with the capsules, each individual conv unit inside still carries its own parameters. Maybe I missed this when reading the paper.

Feel free to correct me if you believe I am wrong. Thanks.

RGB dataset (224*224)

How can we use your code with another RGB dataset?
Suppose the dataset is structured as follows: it contains several sub-folders, and each sub-folder represents one class.

Class A:
0001.jpg 1
0002.jpg 1
Class B:
0001.jpg 2
0002.jpg 2
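A folder-per-class layout like the one above can be turned into (path, label) pairs with the standard library alone. The sketch below covers only that listing step; the function name and the accepted extensions are assumptions, and decoding the images, resizing them, and feeding them to the network are left out. Class indices start at 0 here, whereas the example above uses 1 and 2.

```python
from pathlib import Path


def list_labelled_images(root, extensions=(".jpg", ".jpeg", ".png")):
    """Return (class_names, samples) for a folder-per-class dataset.

    samples is a list of (image_path, class_index) pairs; class indices
    follow the alphabetical order of the sub-folder names.
    """
    root = Path(root)
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    samples = []
    for index, name in enumerate(class_names):
        for path in sorted((root / name).iterdir()):
            if path.suffix.lower() in extensions:
                samples.append((path, index))
    return class_names, samples
```

Calling `list_labelled_images("path/to/dataset")` on a real directory returns the class names and the labelled file list. Note that 224x224 RGB inputs would also change the number of primary capsules and the memory needed for routing.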

something about the Summary

In capsNet.py, you add self.decoded to tf.summary.image as recon_img, but self.X = input_image/255, and your code has:

orgin = tf.reshape(self.X, shape=(cfg.batch_size, -1))
squared = tf.square(self.decoded - orgin)
self.reconstruction_err = tf.reduce_mean(squared)

So self.decoded is not the reconstructed image on the original scale. You need to multiply it by 255, right?

ValueError: Dimensions must be equal, but are 16 and 128 for 'sub_3' (op: 'Sub') with input shapes: [128,10,16,784], [128,784].

Hi, nice work! But I got an error on my local computer:

python train.py
Traceback (most recent call last):
  File "train.py", line 11, in <module>
    capsNet = CapsNet(is_training=cfg.is_training)
  File "/home/joffrey/projects/CapsNet-Tensorflow/capsNet.py", line 16, in __init__
    self.loss()
  File "/home/joffrey/projects/CapsNet-Tensorflow/capsNet.py", line 84, in loss
    squared = tf.square(self.decoded - orgin)
  File "/home/joffrey/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/math_ops.py", line 865, in binary_op_wrapper
    return func(x, y, name=name)
  File "/home/joffrey/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/gen_math_ops.py", line 2629, in _sub
    result = _op_def_lib.apply_op("Sub", x=x, y=y, name=name)
  File "/home/joffrey/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "/home/joffrey/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2632, in create_op
    set_shapes_for_outputs(ret)
  File "/home/joffrey/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1911, in set_shapes_for_outputs
    shapes = shape_func(op)
  File "/home/joffrey/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1861, in call_with_requiring
    return call_cpp_shape_fn(op, require_shape_fn=True)
  File "/home/joffrey/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/common_shapes.py", line 595, in call_cpp_shape_fn
    require_shape_fn)
  File "/home/joffrey/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/common_shapes.py", line 659, in _call_cpp_shape_fn_impl
    raise ValueError(err.message)
ValueError: Dimensions must be equal, but are 16 and 128 for 'sub_3' (op: 'Sub') with input shapes: [128,10,16,784], [128,784].

My env version:

Python 2.7.13 |Anaconda custom (64-bit)| (default, Dec 20 2016, 23:09:15)

tensorflow==1.3.0rc2
tensorflow-gpu==1.3.0

[Question] Could the CapsNet unit be applied to other, more complex architectures?

Hi!

I'm a student interested in Speech Synthesis with neural networks.
I suppose CapsNet might improve the quality of synthesized speech,
so I am trying to apply this great program to another program that generates artificial speech with a neural network.

I would like to ask whether CapsNet could replace other popular neural networks such as CNNs.

Thank you for answering.

Only 10% accuracy for scaled images!!!!!

CapsNet is said to perform better on scaled images, but I trained the network with the original images and tested it with scaled images, and the test accuracy was only 10%. #CapsBoringNet

'apply' method is not defined in 'capsLayer.py'

In 'capsLayer.py', the 'fully_connected' function uses the 'CapsLayer' class to build a fully connected layer. It returns 'layer.apply(inputs)'. However, I did not find a definition of the 'apply' method in the class. Are you going to define it? Or is it just that I failed to find the definition? Could you please tell me where it is defined?

Improper loop of b_IJ

Hi,
Thanks for your great work. I found that b_IJ is updated in the order of J in your code.
In CapsConv:

            for j in range(self.num_outputs):
                with tf.variable_scope('caps_' + str(j)):
                    caps_j, b_IJ = capsule(input, b_IJ, j)
                    capsules.append(caps_j)

In capsule
c_IJ = tf.nn.softmax(b_IJ, dim=2)

In your case, b_I(J+1) is not independent of b_IJ, which means the order matters in the routing process. But in my opinion, all b_IJ should be updated in parallel. Thanks in advance for your reply!