kevinzakka / recurrent-visual-attention

A PyTorch Implementation of "Recurrent Models of Visual Attention"

License: MIT License

Python 100.00%

attention pytorch ram recurrent-attention-model recurrent-models

recurrent-visual-attention's Issues

Log probs calculation is wrong

When calculating the log probability of the sample, the code currently doesn't take into account that a non-linearity has been applied.

Specifically:
https://github.com/kevinzakka/recurrent-visual-attention/blob/master/model.py#L109

This assumes an untransformed normal distribution. But the sampled variable, l_t, has been transformed: https://github.com/kevinzakka/recurrent-visual-attention/blob/master/modules.py#L350

The easy solution is to calculate the log probs prior to applying the non-linearity, and therefore to make the location_network return the log_probs and l_t (mu is no longer needed).

This probably hasn't had much of an effect: if you're in the linear region of tanh it's fine. However, it is theoretically incorrect.
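
A minimal sketch of the suggested fix, written in plain Python rather than PyTorch so that it runs standalone. It assumes the location is produced as l_t = tanh(u) with u drawn from Normal(mu, std). The function names and the values of mu and std are illustrative and are not taken from the repository.

```python
import math
import random

def normal_log_prob(x, mu, std):
    """Log-density of Normal(mu, std) evaluated at x."""
    return -((x - mu) ** 2) / (2 * std ** 2) - math.log(std * math.sqrt(2 * math.pi))

def sample_location(mu, std, rng):
    """Sample u ~ Normal(mu, std), score it, then squash it.

    The log-prob is taken on the pre-activation u, i.e. before tanh.
    """
    u = rng.gauss(mu, std)
    log_pi = normal_log_prob(u, mu, std)
    l_t = math.tanh(u)
    return l_t, log_pi

def tanh_normal_log_prob(l_t, mu, std):
    """Exact log-density of l_t = tanh(u), via change of variables."""
    u = math.atanh(l_t)
    return normal_log_prob(u, mu, std) - math.log(1.0 - l_t ** 2)

rng = random.Random(0)  # fixed seed, for reproducibility only
l_t, log_pi = sample_location(mu=0.9, std=0.17, rng=rng)
naive = normal_log_prob(l_t, 0.9, 0.17)       # treats l_t as untransformed
exact = tanh_normal_log_prob(l_t, 0.9, 0.17)  # accounts for the tanh
```

The exact density differs from the pre-activation log-prob only by the term log(1 - l_t^2), which does not depend on mu. Under these assumptions, scoring the sample before the non-linearity therefore gives the same gradient with respect to mu as the exact density, whereas the naive version evaluates the normal density at the wrong point.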

Performance

Have you run your code? How is the performance?

Why can the value of the loss function be a negative number?

Why can the value of the loss function be a negative number? What does it mean when train_loss or val_loss is < 0?

Epoch: 3/70 - LR: 0.000300
2.7s - loss: -0.948 - acc: 100.000: 100%|██████████| 196/196 [00:02<00:00, 73.88it/s]
train loss: 1.554 - train acc: 76.020 - val loss: 2.431 - val acc: 70.833
0%| | 0/196 [00:00<?, ?it/s]
Epoch: 4/70 - LR: 0.000300
2.6s - loss: -1.604 - acc: 100.000: 100%|██████████| 196/196 [00:02<00:00, 75.53it/s]
train loss: 1.571 - train acc: 76.020 - val loss: 1.200 - val acc: 70.833
0%| | 0/196 [00:00<?, ?it/s]
Epoch: 5/70 - LR: 0.000300
2.6s - loss: -1.605 - acc: 100.000: 100%|██████████| 196/196 [00:02<00:00, 76.86it/s]
train loss: 0.381 - train acc: 76.531 - val loss: 0.502 - val acc: 70.833

Epoch: 6/70 - LR: 0.000300
2.5s - loss: -0.820 - acc: 100.000: 100%|██████████| 196/196 [00:02<00:00, 77.22it/s]
train loss: -0.091 - train acc: 76.020 - val loss: 0.235 - val acc: 70.833

Epoch: 7/70 - LR: 0.000300
2.5s - loss: 0.282 - acc: 50.000: 100%|██████████| 196/196 [00:02<00:00, 77.19it/s]
train loss: -0.178 - train acc: 76.020 - val loss: -0.037 - val acc: 72.917 [*]

Epoch: 8/70 - LR: 0.000300
2.5s - loss: 1.670 - acc: 50.000: 100%|██████████| 196/196 [00:02<00:00, 79.95it/s]
train loss: -0.814 - train acc: 82.653 - val loss: 1.127 - val acc: 70.833
0%| | 0/196 [00:00<?, ?it/s]
Epoch: 9/70 - LR: 0.000300

Run time error with GPU

python main.py --use_gpu 1

RuntimeError: Expected object of type Variable[torch.cuda.FloatTensor] but found type Variable[torch.FloatTensor] for argument #1 'mat1'

accuracy calculation bit confusing

How do you calculate the final accuracy? If you have 8 steps with one glimpse, do you only consider last step as final prediction for accuracy calculations or do you average predictions from all steps given each step can have different predictions?

pytorch version

Thank you for this repo!
Which version of pytorch are you using exactly?

import torch
print(torch.__version__)

After running this code, mine is 0.2.0_4. There are many errors when I try to run your code, including ones involving variable shape, like

recurrent-visual-attention/trainer.py

Line 220 in 3828ad9

self.batch_size = x.shape[0]

and

recurrent-visual-attention/model.py

Line 6 in 3828ad9

from torch.distributions import Normal

Caveat in last commit

99c4cbe#diff-40d9c2c37e955447b1175a32afab171fL353
This is not an unnecessary detach.
As it is used in
log_pi = Normal(mu, self.std).log_prob(l_t)
which is then used in
loss_reinforce = torch.sum(-log_pi*adjusted_reward, dim=1)
which means when minimizing reinforce loss, you are altering your location network through both mu and l_t (and yes, log_pi is differentiable w.r.t both mu and l_t). However, l_t is just mu+noise and we only want the gradient to flow through mu.
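
A small numerical check of this point, as a plain-Python sketch rather than the repository's code. It assumes l_t = mu + noise with a fixed std; the function names and values are illustrative. The derivative of log_pi with respect to mu is estimated by central finite differences, once with l_t held fixed (the detached case) and once with l_t moving together with mu.

```python
import math

def log_prob(l_t, mu, std):
    """Log-density of Normal(mu, std) evaluated at l_t."""
    return -((l_t - mu) ** 2) / (2 * std ** 2) - math.log(std * math.sqrt(2 * math.pi))

def grad_wrt_mu(mu, noise, std, detach, h=1e-5):
    """Central finite difference of log_pi with respect to mu.

    detach=True  : l_t is held fixed while mu is perturbed (like l_t.detach()).
    detach=False : l_t = mu + noise moves together with mu.
    """
    l_fixed = mu + noise

    def f(m):
        l_t = l_fixed if detach else m + noise
        return log_prob(l_t, m, std)

    return (f(mu + h) - f(mu - h)) / (2 * h)

g_detached = grad_wrt_mu(mu=0.3, noise=0.1, std=0.2, detach=True)
g_attached = grad_wrt_mu(mu=0.3, noise=0.1, std=0.2, detach=False)
print(g_detached)  # close to noise / std**2 = 2.5
print(g_attached)  # close to 0: the two gradient paths cancel
```

Under these assumptions, leaving l_t attached does more than add an unwanted path: since l_t - mu is just the noise, the contributions through mu and through l_t cancel, and the location network would receive essentially no REINFORCE signal.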

Performance on Test data

The test results were very disappointing. It seems that Recurrent Models of Visual Attention lacks the ability to locate targets that is mentioned in the paper.

I have created several datasets from MNIST, including versions where the object size in the original data is changed to 20x20, 14x14, etc. The model, which was trained for 101 epochs and reached a train acc of 71.654% and a val acc of 94.867%, is then used to test the new datasets.

The results are as follows:
Test on size_28x28 (without changing the data shape, to verify the feasibility of data shape change operation)
[] Test Acc: 1933/2000 (96.00% - 4.00%)
[] Test Acc: 3858/4000 (96.00% - 4.00%)
[] Test Acc: 5808/6000 (96.00% - 4.00%)
[] Test Acc: 7776/8000 (97.00% - 3.00%)
[*] Test Acc: 9749/10000 (97.00% - 3.00%)

Test on size_20x20:
[] Test Acc: 390/2000 (19.00% - 81.00%)
[] Test Acc: 761/4000 (19.00% - 81.00%)
[] Test Acc: 1107/6000 (18.00% - 82.00%)
[] Test Acc: 1469/8000 (18.00% - 82.00%)
[*] Test Acc: 1829/10000 (18.00% - 82.00%)

Test on size_14x14:
[] Test Acc: 257/2000 (12.00% - 88.00%)
[] Test Acc: 502/4000 (12.00% - 88.00%)
[] Test Acc: 744/6000 (12.00% - 88.00%)
[] Test Acc: 956/8000 (11.00% - 89.00%)
[*] Test Acc: 1167/10000 (11.00% - 89.00%)

The above results are basically the same as those I've seen with other deep learning (CNNs) models...

Restoring optimizer state in load_checkpoint?

In load_checkpoint function, shouldn't you also load the optimizer state?

The comment mistake

The comment in module.py (line 21) says "x" is a 4D tensor of shape (B, H, W, C),
but it's actually a 4D tensor of shape (B, C, H, W).
This mistake appears in many places.

Why does validation have to repeat the data 10 times?

Hello,
Great work. Can you tell me why the data is repeated 10 times during validation?

License

This repository does not have a license file yet. Addition of a license to this repo would really help my research. Thanks!
This website helps you find a right one: https://choosealicense.com/

Can this model support multiple labels per image?

Hi, @clvcooke @kevinzakka @malashinroman

Can this model support multiple labels per image?

I need a classification model for images which have multiple classes.

I wonder ...

Thanks in advance.

Best,
@bemoregt.

Location embeddings

The location embeddings within the glimpse network are generated as a 128-dimensional vector by passing in two inputs to an NN, the x, and y coordinates. Could someone kindly explain the rationale behind this decision?

Recurrent Attention Model using Transformer?

Hi, @clvcooke @kevinzakka @malashinroman

Is there a Visual Recurrent Attention Model using a Transformer yet?

Is that possible?

I wonder ...

Thanks.

Best,

@bemoregt.

BUG?

https://github.com/kevinzakka/recurrent-visual-attention/blob/master/model.py#L110

Should this line be deleted? Because log_pi is a vector of length (B,) in the last line, we don't need to sum over dim=1.

How to use our own dataset with this library?

I am trying to understand the code for my own data. How do I provide this data to the repo in PyTorch data format?

Is the M parameter necessary, as it is not mentioned in the paper? (the number of Monte Carlo samples during validation & test)

In this implementation, there is an M parameter in validation and test mode that duplicates the input. The same input instance is processed by the RAM model multiple times and the predictions are averaged.

When I remove this part so that there is no duplication of instances and no averaging (just as in train mode), the performance seems to drop hugely. This should indicate that the convergence of performance is in fact much slower.

Also, the use of multiple duplicates of an input instance, referred to as Monte Carlo sampling in your code, does not seem to be a necessary part of the paper. The paper didn't mention it (correct me if I am wrong).

Is this a design choice in order to augment the test-time performance? @kevinzakka
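
For reference, the averaging described above can be sketched in plain Python as follows. This is an illustration under assumptions, not the repository's code: the rows are assumed to be laid out as M consecutive blocks of B rows (the layout produced by repeating a batch M times along the first dimension), and the helper names and numbers are made up.

```python
import math

def average_over_samples(log_probas, M):
    """Average per-class log-probabilities over M stochastic passes.

    log_probas has M * B rows laid out as M consecutive blocks of B rows.
    Returns B rows, each the element-wise mean over the M passes.
    """
    total = len(log_probas)
    assert total % M == 0, "row count must be a multiple of M"
    B = total // M
    C = len(log_probas[0])
    averaged = []
    for b in range(B):
        rows = [log_probas[m * B + b] for m in range(M)]
        averaged.append([sum(r[c] for r in rows) / M for c in range(C)])
    return averaged

def predict(averaged):
    """Index of the largest averaged log-probability in each row."""
    return [max(range(len(row)), key=row.__getitem__) for row in averaged]

# Two inputs (B=2), three classes, M=2 passes
log_probas = [
    [math.log(0.6), math.log(0.3), math.log(0.1)],  # pass 0, input 0
    [math.log(0.2), math.log(0.5), math.log(0.3)],  # pass 0, input 1
    [math.log(0.5), math.log(0.4), math.log(0.1)],  # pass 1, input 0
    [math.log(0.1), math.log(0.7), math.log(0.2)],  # pass 1, input 1
]
preds = predict(average_over_samples(log_probas, M=2))
```

Because the glimpse locations are sampled, each pass can yield a different prediction for the same input; averaging reduces that variance at the cost of M forward passes per input.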

How do negative numbers affect gradient descent?

The loss may be a negative number in the model. The reason is that the reinforce loss is often negative, since the larger the reward, the better. But I am very confused about how negative numbers affect gradient descent.

I also notice that the hybrid loss tends toward zero eventually. How can the loss increase with gradient descent?
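
A small plain-Python illustration of how a REINFORCE term can come out negative. The standard deviation, reward, and baseline below are assumed values chosen for the example, and the helper names are hypothetical.

```python
import math

def normal_log_prob(x, mu, std):
    """Log-density of Normal(mu, std) evaluated at x."""
    return -((x - mu) ** 2) / (2 * std ** 2) - math.log(std * math.sqrt(2 * math.pi))

def reinforce_term(log_pi, reward, baseline):
    """Single-sample REINFORCE surrogate: -log_pi * (reward - baseline)."""
    return -log_pi * (reward - baseline)

std = 0.17  # assumed policy standard deviation
log_pi = normal_log_prob(0.0, 0.0, std)  # density exceeds 1 here, so log_pi > 0
loss = reinforce_term(log_pi, reward=1.0, baseline=0.5)
print(log_pi > 0, loss < 0)  # True True
```

A continuous density can exceed 1 when std is small, so log_pi can be positive, and the adjusted reward can have either sign. The surrogate is a quantity whose gradient matches the policy gradient, not a distance that must approach zero, so its sign carries no special meaning. Gradient descent only uses the gradient, which is the same whether the current value is positive or negative.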

How can I load my images into this model?

Hi, @kevinzakka

How can I load my images into this model?

My image dataset has an animal/cats and animal/dogs folder structure,

and 480x480 color images.

How can I do this?

Thanks in advance.

from @bemoregt.

Start with random initial location

Does anyone know how I can start with random initial coordinates for the first square patch?
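
One possible approach, sketched in plain Python. It assumes that glimpse locations are expressed in normalized coordinates in [-1, 1] with (0, 0) at the image center; the helper names and the 28-pixel image size are illustrative and not taken from the repository.

```python
import random

def random_initial_location(rng):
    """Sample (x, y) uniformly from [-1, 1] x [-1, 1]."""
    return (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))

def to_pixel(coord, size):
    """Map a normalized coordinate in [-1, 1] to a pixel index in [0, size - 1]."""
    index = int(0.5 * (coord + 1.0) * size)
    return min(max(index, 0), size - 1)

rng = random.Random(42)  # fixed seed, for reproducibility only
x, y = random_initial_location(rng)
center = (to_pixel(x, 28), to_pixel(y, 28))  # patch center on a 28x28 image
```

In a batched PyTorch setting, the same idea would amount to drawing a (batch_size, 2) tensor uniformly from [-1, 1] at the start of each episode.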

formula of loss_reinforce

recurrent-visual-attention/trainer.py

Line 389 in b659b6f

loss_reinforce = torch.mean(-log_pi*adjusted_reward)

According to the paper's formula, the gradient is summed over samples and time steps but only averaged over samples. So I think it's more appropriate to calculate loss_reinforce as

loss_reinforce = torch.sum(-log_pi*adjusted_reward, dim=1)                                    
loss_reinforce = torch.mean(loss_reinforce)

Though it's just a matter of a scalar factor and should be absorbed by a self-adjusting optimizer...
What do you think?
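
The two reductions can be compared with a short plain-Python sketch. The values below are made up, and nested lists stand in for a (B, T) tensor of per-step terms -log_pi * adjusted_reward.

```python
def mean_over_all(values):
    """Mean over every (sample, time step) entry, like torch.mean(values)."""
    flat = [v for row in values for v in row]
    return sum(flat) / len(flat)

def sum_time_then_mean(values):
    """Sum over time steps, then mean over samples."""
    per_sample = [sum(row) for row in values]
    return sum(per_sample) / len(per_sample)

# B = 2 samples, T = 3 time steps
values = [[0.5, -0.25, 1.0],
          [0.75, 0.5, -0.5]]
T = len(values[0])
a = mean_over_all(values)
b = sum_time_then_mean(values)
```

Here b equals T * a, so the two forms differ only by the constant factor T (the number of time steps). With plain SGD this rescales the reinforce gradient, and it also changes the weight of the reinforce term relative to the other terms of the hybrid loss.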

Dimension is not matching

Currently I am working on building a RAM with PyTorch.
I found your code and am following it now.

https://github.com/kevinzakka/recurrent-visual-attention/blob/master/modules.py#L164
In this line, phi is a 4D tensor if the given image is color, otherwise 3D.
Either way, it should be a 2D tensor to apply a linear operation.

Isn't it a typo, or a missing reshape?

Performance on CPU vs GPU

Hi,
Just wondering if anyone has encountered the same problem: it looks like the code is faster on CPU than on GPU. On my CPU (i7) it only takes around 80s per epoch, but on GPU (a P100) it takes around 180s.

Anyone with the same problem?

Performance not as claimed

I checked out the recent commit that changed the optimizer. You claim that
"With the Adam optimizer, paper accuracy can be reached in 30 epochs."
But when I run python main.py --is_train 1, the performance isn't as good as claimed.
Here's the log of my run. Can you confirm this? (run based on commit 99c4cbe)

Epoch: 1/200 - LR: 0.000300
159.4s - loss: 0.484 - acc: 75.000: 100%|████████████████████████| 54000/54000 [02:39<00:00, 338.75it/s]
train loss: 1.646 - train acc: 44.728 - val loss: 0.898 - val acc: 73.050 [*]

Epoch: 2/200 - LR: 0.000300
136.2s - loss: 1.691 - acc: 62.500: 100%|████████████████████████| 54000/54000 [02:16<00:00, 396.43it/s]
train loss: 0.928 - train acc: 69.515 - val loss: 0.667 - val acc: 80.483 [*]

Epoch: 3/200 - LR: 0.000300
178.1s - loss: 1.099 - acc: 75.000: 100%|████████████████████████| 54000/54000 [02:58<00:00, 303.14it/s]
train loss: 0.754 - train acc: 77.141 - val loss: 0.255 - val acc: 89.717 [*]

Epoch: 4/200 - LR: 0.000300
164.8s - loss: 0.124 - acc: 87.500: 100%|████████████████████████| 54000/54000 [02:44<00:00, 327.69it/s]
train loss: 0.711 - train acc: 79.198 - val loss: 0.448 - val acc: 88.617

Epoch: 5/200 - LR: 0.000300
164.7s - loss: 1.203 - acc: 75.000: 100%|████████████████████████| 54000/54000 [02:44<00:00, 327.76it/s]
train loss: 0.711 - train acc: 79.774 - val loss: 0.218 - val acc: 91.150 [*]

Epoch: 6/200 - LR: 0.000300
152.3s - loss: 1.857 - acc: 62.500: 100%|████████████████████████| 54000/54000 [02:32<00:00, 354.63it/s]
train loss: 0.690 - train acc: 80.306 - val loss: 0.074 - val acc: 92.233 [*]

Epoch: 7/200 - LR: 0.000300
167.9s - loss: 0.470 - acc: 87.500: 100%|████████████████████████| 54000/54000 [02:47<00:00, 321.62it/s]
train loss: 0.644 - train acc: 81.596 - val loss: 0.187 - val acc: 92.150

Epoch: 8/200 - LR: 0.000300
160.9s - loss: 0.292 - acc: 87.500: 100%|████████████████████████| 54000/54000 [02:40<00:00, 335.51it/s]
train loss: 0.605 - train acc: 82.700 - val loss: 0.176 - val acc: 92.700 [*]

Epoch: 9/200 - LR: 0.000300
137.5s - loss: 1.000 - acc: 75.000: 100%|████████████████████████| 54000/54000 [02:17<00:00, 392.73it/s]
train loss: 0.590 - train acc: 83.144 - val loss: 0.179 - val acc: 93.017 [*]

Epoch: 10/200 - LR: 0.000300
157.0s - loss: 1.242 - acc: 81.250: 100%|████████████████████████| 54000/54000 [02:36<00:00, 407.21it/s]
train loss: 0.567 - train acc: 84.050 - val loss: 0.200 - val acc: 93.133 [*]

Epoch: 11/200 - LR: 0.000300
160.3s - loss: 1.275 - acc: 75.000: 100%|████████████████████████| 54000/54000 [02:40<00:00, 345.11it/s]
train loss: 0.544 - train acc: 84.524 - val loss: 0.182 - val acc: 95.033 [*]

Epoch: 12/200 - LR: 0.000300
173.2s - loss: 1.563 - acc: 75.000: 100%|████████████████████████| 54000/54000 [02:53<00:00, 242.64it/s]
train loss: 0.536 - train acc: 84.783 - val loss: 0.192 - val acc: 94.417

Epoch: 13/200 - LR: 0.000300
155.8s - loss: 0.343 - acc: 93.750: 100%|████████████████████████| 54000/54000 [02:35<00:00, 346.46it/s]
train loss: 0.525 - train acc: 85.424 - val loss: 0.127 - val acc: 95.217 [*]

Epoch: 14/200 - LR: 0.000300
159.2s - loss: 0.462 - acc: 87.500: 100%|████████████████████████| 54000/54000 [02:39<00:00, 339.10it/s]
train loss: 0.530 - train acc: 85.400 - val loss: 0.139 - val acc: 94.967

Epoch: 15/200 - LR: 0.000300
162.9s - loss: 0.065 - acc: 93.750: 100%|████████████████████████| 54000/54000 [02:42<00:00, 331.49it/s]
train loss: 0.525 - train acc: 85.461 - val loss: 0.110 - val acc: 94.983

Epoch: 16/200 - LR: 0.000300
173.3s - loss: 0.422 - acc: 87.500: 100%|████████████████████████| 54000/54000 [02:53<00:00, 345.51it/s]
train loss: 0.553 - train acc: 84.639 - val loss: 0.208 - val acc: 94.400

Epoch: 17/200 - LR: 0.000300
140.1s - loss: 0.626 - acc: 75.000: 100%|████████████████████████| 54000/54000 [02:20<00:00, 385.36it/s]
train loss: 0.555 - train acc: 84.563 - val loss: 0.196 - val acc: 95.383 [*]

Epoch: 18/200 - LR: 0.000300
153.3s - loss: 1.402 - acc: 68.750: 100%|████████████████████████| 54000/54000 [02:33<00:00, 309.71it/s]
train loss: 0.546 - train acc: 84.311 - val loss: 0.113 - val acc: 96.317 [*]

Epoch: 19/200 - LR: 0.000300
156.3s - loss: 0.039 - acc: 93.750: 100%|████████████████████████| 54000/54000 [02:36<00:00, 345.51it/s]
train loss: 0.553 - train acc: 84.543 - val loss: 0.188 - val acc: 95.433

Epoch: 20/200 - LR: 0.000300
182.3s - loss: 0.912 - acc: 68.750: 100%|████████████████████████| 54000/54000 [03:02<00:00, 231.90it/s]
train loss: 0.564 - train acc: 84.020 - val loss: 0.213 - val acc: 94.800

Epoch: 21/200 - LR: 0.000300
156.9s - loss: 0.433 - acc: 87.500: 100%|████████████████████████| 54000/54000 [02:36<00:00, 354.48it/s]
train loss: 0.589 - train acc: 83.404 - val loss: 0.145 - val acc: 94.850

Epoch: 22/200 - LR: 0.000300
171.1s - loss: 0.564 - acc: 87.500: 100%|████████████████████████| 54000/54000 [02:51<00:00, 243.82it/s]
train loss: 0.590 - train acc: 83.189 - val loss: 0.168 - val acc: 95.500

Epoch: 23/200 - LR: 0.000300
184.0s - loss: -0.073 - acc: 93.750: 100%|███████████████████████| 54000/54000 [03:04<00:00, 293.42it/s]
train loss: 0.620 - train acc: 82.443 - val loss: 0.057 - val acc: 94.850

Epoch: 24/200 - LR: 0.000300
195.0s - loss: 0.498 - acc: 68.750: 100%|████████████████████████| 54000/54000 [03:14<00:00, 215.77it/s]
train loss: 0.627 - train acc: 82.209 - val loss: 0.121 - val acc: 94.933

Epoch: 25/200 - LR: 0.000300
157.3s - loss: 0.568 - acc: 87.500: 100%|████████████████████████| 54000/54000 [02:37<00:00, 281.71it/s]
train loss: 0.618 - train acc: 82.150 - val loss: 0.133 - val acc: 95.017

Epoch: 26/200 - LR: 0.000300
150.3s - loss: -0.639 - acc: 100.000: 100%|██████████████████████| 54000/54000 [02:30<00:00, 291.28it/s]
train loss: 0.613 - train acc: 81.933 - val loss: 0.168 - val acc: 94.017

Epoch: 27/200 - LR: 0.000300
163.9s - loss: 0.304 - acc: 87.500: 100%|████████████████████████| 54000/54000 [02:43<00:00, 329.53it/s]
train loss: 0.627 - train acc: 81.819 - val loss: 0.144 - val acc: 95.933

Epoch: 28/200 - LR: 0.000300
153.0s - loss: 0.380 - acc: 87.500: 100%|████████████████████████| 54000/54000 [02:32<00:00, 306.57it/s]
train loss: 0.603 - train acc: 82.076 - val loss: 0.057 - val acc: 95.633

Epoch: 29/200 - LR: 0.000300
172.2s - loss: -0.071 - acc: 93.750: 100%|███████████████████████| 54000/54000 [02:52<00:00, 313.52it/s]
train loss: 0.623 - train acc: 82.124 - val loss: 0.114 - val acc: 96.017

Epoch: 30/200 - LR: 0.000300
143.8s - loss: 0.675 - acc: 81.250: 100%|████████████████████████| 54000/54000 [02:23<00:00, 329.06it/s]
train loss: 0.636 - train acc: 81.933 - val loss: 0.185 - val acc: 95.717

Epoch: 31/200 - LR: 0.000300
166.7s - loss: 1.192 - acc: 75.000: 100%|████████████████████████| 54000/54000 [02:46<00:00, 307.40it/s]
train loss: 0.611 - train acc: 82.126 - val loss: 0.173 - val acc: 96.133

Epoch: 32/200 - LR: 0.000300
153.9s - loss: 1.137 - acc: 75.000: 100%|████████████████████████| 54000/54000 [02:33<00:00, 350.93it/s]
train loss: 0.581 - train acc: 82.957 - val loss: 0.143 - val acc: 95.783

Epoch: 33/200 - LR: 0.000300
191.4s - loss: 0.138 - acc: 93.750: 100%|████████████████████████| 54000/54000 [03:11<00:00, 282.08it/s]
train loss: 0.593 - train acc: 82.683 - val loss: 0.259 - val acc: 95.650

Epoch: 34/200 - LR: 0.000300
150.6s - loss: 0.535 - acc: 81.250: 100%|████████████████████████| 54000/54000 [02:30<00:00, 363.96it/s]
train loss: 0.642 - train acc: 81.769 - val loss: 0.246 - val acc: 96.000

Epoch: 35/200 - LR: 0.000300
106.5s - loss: 0.685 - acc: 78.125:  63%|███████████████▏        | 34048/54000 [01:46<00:58, 342.94it/s
171.6s - loss: 0.586 - acc: 81.250: 100%|████████████████████████| 54000/54000 [02:51<00:00, 259.16it/s]
train loss: 0.621 - train acc: 82.106 - val loss: 0.211 - val acc: 95.900

Epoch: 36/200 - LR: 0.000300
174.5s - loss: 0.722 - acc: 75.000: 100%|████████████████████████| 54000/54000 [02:54<00:00, 309.45it/s]
train loss: 0.615 - train acc: 82.026 - val loss: 0.167 - val acc: 96.000

Epoch: 37/200 - LR: 0.000300
168.1s - loss: 0.512 - acc: 81.250: 100%|████████████████████████| 54000/54000 [02:48<00:00, 321.16it/s]
train loss: 0.608 - train acc: 82.265 - val loss: 0.152 - val acc: 96.317

Epoch: 38/200 - LR: 0.000300
155.8s - loss: 0.390 - acc: 87.500: 100%|████████████████████████| 54000/54000 [02:35<00:00, 305.99it/s]
train loss: 0.626 - train acc: 81.854 - val loss: 0.173 - val acc: 96.550 [*]

Epoch: 39/200 - LR: 0.000300
154.8s - loss: 0.108 - acc: 93.750: 100%|████████████████████████| 54000/54000 [02:34<00:00, 348.91it/s]
train loss: 0.634 - train acc: 81.515 - val loss: 0.220 - val acc: 96.183

Epoch: 40/200 - LR: 0.000300
159.1s - loss: -0.091 - acc: 100.000: 100%|██████████████████████| 54000/54000 [02:39<00:00, 339.46it/s]
train loss: 0.618 - train acc: 81.963 - val loss: 0.243 - val acc: 95.600

Issue with RGB data

I've trained the RAM implementation on various 3 channel images and plotted the glimpses extracted by the network on a random batch at various epochs. The bounding box does not seem to move around the input image to explore different locations (see videos below). Any idea why glimpses seem to be stuck on the top left side of the images when using RGB images but seem to move around with the grayscale MNIST? Have you encountered such behaviour when trained on other data?

SVHN:
epoch 12
epoch 24

CIFAR10

MNIST

@clvcooke @kevinzakka @malashinroman

pycharm step debug freeze, bug fixed

python3 main.py --use_gpu False --is_train True

#kwargs = {}
    if config.use_gpu:
        torch.cuda.manual_seed(config.random_seed)
        kwargs = {"num_workers": 1, "pin_memory": True}
    else:
        kwargs = {}

    # instantiate data loaders
   '''if config.is_train:
        dloader = data_loader.get_train_valid_loader(config.data_dir,
                                                    config.batch_size,
                                                    config.random_seed,
                                                    config.valid_size,
                                                    config.shuffle,
                                                    config.show_sample,
                                                    **kwargs)
    else:
        dloader = data_loader.get_test_loader(config.data_dir,
                                              config.batch_size,
                                              **kwargs)'''

    if config.is_train:
        dloader = data_loader.get_train_valid_loader(config.data_dir,
                                                    config.batch_size,
                                                    config.random_seed,
                                                    config.valid_size,
                                                    config.shuffle,
                                                    config.show_sample,
                                                    kwargs)
    else:
        dloader = data_loader.get_test_loader(config.data_dir,
                                              config.batch_size,
                                              kwargs)

~~~~~~~~~~~~~~~~~~~~~data_loader.py~~~~~~~~~~~~~~~
'''def get_train_valid_loader(
    data_dir,
    batch_size,
    random_seed,
    valid_size=0.1,
    shuffle=True,
    show_sample=False,
    num_workers=4,
    pin_memory=False,
):'''
def get_train_valid_loader(data_dir,
                            batch_size,
                            random_seed,
                            valid_size,
                            shuffle,
                            show_sample,
                            kwargs):

'''train_loader = torch.utils.data.DataLoader(dataset,
                                                batch_size=batch_size,
                                                sampler=train_sampler,
                                                num_workers=num_workers,
                                                pin_memory=pin_memory)

    valid_loader = torch.utils.data.DataLoader(dataset,
                                                batch_size=batch_size,
                                                sampler=valid_sampler,
                                                num_workers=num_workers,
                                                pin_memory=pin_memory)'''

    train_loader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, sampler=train_sampler, **kwargs)
    valid_loader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, sampler=valid_sampler, **kwargs)

    # visualize some images
    if show_sample:
        '''sample_loader = torch.utils.data.DataLoader(dataset,
                                                    batch_size=9,
                                                    shuffle=shuffle,
                                                    num_workers=num_workers,
                                                    pin_memory=pin_memory)'''
        sample_loader = torch.utils.data.DataLoader(dataset, batch_size=9, shuffle=shuffle, **kwargs)
        data_iter = iter(sample_loader)
        images, labels = next(data_iter)
        X = images.numpy()
        X = np.transpose(X, [0, 2, 3, 1])
        plot_images(X, labels)

    return (train_loader, valid_loader)

'''def get_test_loader(data_dir, batch_size, num_workers=4, pin_memory=False):'''
def get_test_loader(data_dir, batch_size, kwargs):
    """Test dataloader.

    If using CUDA, num_workers should be set to 1 and pin_memory to True.

    Args:
        data_dir: path directory to the dataset.
        batch_size: how many samples per batch to load.
        num_workers: number of subprocesses to use when loading the dataset.
        pin_memory: whether to copy tensors into CUDA pinned memory. Set it to
            True if using GPU.
    """
    # define transforms
    normalize = transforms.Normalize((0.1307,), (0.3081,))
    trans = transforms.Compose([transforms.ToTensor(), normalize])

    # load dataset
    dataset = datasets.MNIST(data_dir, train=False, download=True, transform=trans)

    '''data_loader = torch.utils.data.DataLoader(
        dataset,
        batch_size=batch_size,
        shuffle=False,
        num_workers=num_workers,
        pin_memory=pin_memory,
    )'''
    data_loader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=False, **kwargs)

    return data_loader

Issues faced while using my own dataset

Hello I want to use recurrent visual attention with my own dataset so I have a custom dataloader which looks like below. I have run the code with MNIST without any trouble but with my own dataset I am facing issues.

from __future__ import print_function, division #ds
import numpy as np
from utils import plot_images

import os #ds
import pandas as pd #ds
from skimage import io, transform #ds
import torch
from torchvision import datasets
from torch.utils.data import Dataset, DataLoader #ds
from torchvision import transforms
from torchvision import utils #ds
from torch.utils.data.sampler import SubsetRandomSampler


class CDataset(Dataset):


    def __init__(self, csv_file, root_dir, transform=None):
        """
        Args:
            csv_file (string): Path to the csv file with annotations.
            root_dir (string): Directory with all the images.
            transform (callable, optional): Optional transform to be applied
                on a sample.
        """
        self.frame = pd.read_csv(csv_file)
        self.root_dir = root_dir
        self.transform = transform

    def __len__(self):
        return len(self.frame)

    def __getitem__(self, idx):
        img_name = os.path.join(self.root_dir,
                                self.frame.iloc[idx, 0]+'.jpg')
        image = io.imread(img_name)
#       image = image.transpose((2, 0, 1))
        labels = np.array(self.frame.iloc[idx, 1])#.as_matrix() #ds
        #landmarks = landmarks.astype('float').reshape(-1, 2)
        #print(image.shape)
        #print(img_name,labels)
        sample = {'image': image, 'labels': labels}

        if self.transform:
            sample = self.transform(sample)

        return sample

class ToTensor(object):
    """Convert ndarrays in sample to Tensors."""


    def __call__(self, sample):
        image, labels = sample['image'], sample['labels']
        #print(image)
        #print(labels)
        # swap color axis because
        # numpy image: H x W x C
        # torch image: C X H X W
        image = image.transpose((2, 0, 1))
        #print(image.shape)
        #print((torch.from_numpy(image)))
        #print((torch.from_numpy(labels)))
        return {'image': torch.from_numpy(image),
                'labels': torch.from_numpy(labels)}


def get_train_valid_loader(data_dir,
                           batch_size,
                           random_seed,
                           #valid_size=0.1, #ds
                           #shuffle=True,
                           show_sample=False,
                           num_workers=4,
                           pin_memory=False):
    """
    Utility function for loading and returning train and valid
    multi-process iterators over the MNIST dataset. A sample
    9x9 grid of the images can be optionally displayed.

    If using CUDA, num_workers should be set to 1 and pin_memory to True.

    Args
    ----
    - data_dir: path directory to the dataset.
    - batch_size: how many samples per batch to load.
    - random_seed: fix seed for reproducibility.
    - #ds valid_size: percentage split of the training set used for
      the validation set. Should be a float in the range [0, 1].
      In the paper, this number is set to 0.1.
    - shuffle: whether to shuffle the train/validation indices.
    - show_sample: plot 9x9 sample grid of the dataset.
    - num_workers: number of subprocesses to use when loading the dataset.
    - pin_memory: whether to copy tensors into CUDA pinned memory. Set it to
      True if using GPU.

    Returns
    -------
    - train_loader: training set iterator.
    - valid_loader: validation set iterator.
    """
    #ds
    #error_msg = "[!] valid_size should be in the range [0, 1]."
    #assert ((valid_size >= 0) and (valid_size <= 1)), error_msg
    #ds

    # define transforms
    #normalize = transforms.Normalize((0.1307,), (0.3081,))
    trans = transforms.Compose([
        ToTensor(), #normalize,
    ])

    # load train dataset
    #train_dataset = datasets.MNIST(
    #    data_dir, train=True, download=True, transform=trans
    #)


    train_dataset = CDataset(csv_file='/home/Desktop/6June17/util/train.csv',
                                    root_dir='/home/caffe/data/images/',transform=trans)

    # load validation dataset
    #valid_dataset = datasets.MNIST( #ds
    #    data_dir, train=True, download=True, transform=trans #ds
    #)

    valid_dataset = CDataset(csv_file='/home/Desktop/6June17/util/eval.csv',
                                    root_dir='/home/caffe/data/images/',transform=trans)

    num_train = len(train_dataset) 
    train_indices = list(range(num_train)) 
    #ds split = int(np.floor(valid_size * num_train))

    num_valid = len(valid_dataset) #ds
    valid_indices = list(range(num_valid)) #ds

    #if shuffle:
    #    np.random.seed(random_seed)
    #    np.random.shuffle(indices)

    #ds train_idx, valid_idx = indices[split:], indices[:split]
    train_idx = train_indices #ds
    valid_idx = valid_indices #ds

    train_sampler = SubsetRandomSampler(train_idx)
    valid_sampler = SubsetRandomSampler(valid_idx)

    train_loader = torch.utils.data.DataLoader(
        train_dataset, batch_size=batch_size, sampler=train_sampler,
        num_workers=num_workers, pin_memory=pin_memory,
    )

    print(train_loader)

    valid_loader = torch.utils.data.DataLoader(
        valid_dataset, batch_size=batch_size, sampler=valid_sampler,
        num_workers=num_workers, pin_memory=pin_memory,
    )

    # visualize some images
    if show_sample:
        sample_loader = torch.utils.data.DataLoader(
            train_dataset, batch_size=9, #shuffle=shuffle,
            num_workers=num_workers, pin_memory=pin_memory
        )
        data_iter = iter(sample_loader)
        # CDataset yields dicts, so index the batch by key
        batch = next(data_iter)
        images, labels = batch['image'], batch['labels']
        X = images.numpy()
        X = np.transpose(X, [0, 2, 3, 1])
        plot_images(X, labels)

    return (train_loader, valid_loader)


def get_test_loader(data_dir,
                    batch_size,
                    num_workers=4,
                    pin_memory=False):
    """
    Utility function for loading and returning a multi-process
    test iterator over the MNIST dataset.

    If using CUDA, num_workers should be set to 1 and pin_memory to True.

    Args
    ----
    - data_dir: path directory to the dataset.
    - batch_size: how many samples per batch to load.
    - num_workers: number of subprocesses to use when loading the dataset.
    - pin_memory: whether to copy tensors into CUDA pinned memory. Set it to
      True if using GPU.

    Returns
    -------
    - data_loader: test set iterator.
    """
    # define transforms
    #normalize = transforms.Normalize((0.1307,), (0.3081,))
    trans = transforms.Compose([
        ToTensor(), #normalize,
    ])

    # load dataset
    #dataset = datasets.MNIST(
    #    data_dir, train=False, download=True, transform=trans
    #)

    test_dataset = CDataset(csv_file='/home/Desktop/6June17/util/test.csv',
                                    root_dir='/home/caffe/data/images/',transform=trans)

    test_loader = torch.utils.data.DataLoader(
        test_dataset, batch_size=batch_size, shuffle=False,
        num_workers=num_workers, pin_memory=pin_memory,
    )

    return test_loader


#for i_batch, sample_batched in enumerate(dataloader):
#    print(i_batch, sample_batched['image'].size(),
#          sample_batched['landmarks'].size())

#    # observe 4th batch and stop.
#    if i_batch == 3:
#        plt.figure()
#        show_landmarks_batch(sample_batched)
#        plt.axis('off')
#        plt.ioff()
#        plt.show()
#        break

The other main change I made is disabling the validation-size and shuffle parameters, since I am using pre-existing train, validation and test splits that I have already shuffled.

My last change is in the train_one_epoch(self, epoch) function of trainer.py, where the loader is iterated. I changed this part because x and y were previously being returned as the strings "image" and "labels" (the keys of the Python dictionary) rather than the batched values.

for i, batch in enumerate(self.train_loader):
    x, y = batch["image"], batch["labels"]

But now I get issues that I can not figure out:

Without the GPU, I get this error:

[*] Train on 64034 samples, validate on 18951 samples
Epoch: 1/200 - LR: 0.000300
<torch.utils.data.dataloader.DataLoader object at 0x7fe065fd4f60>
  0%|          | 0/64034 [00:00<?, ?it/s]
/home/duygu/recurrent-visual-attention-master/modules.py:106: UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
  from_x, to_x = from_x.data[0], to_x.data[0]
/home/duygu/recurrent-visual-attention-master/modules.py:107: UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
  from_y, to_y = from_y.data[0], to_y.data[0]

Traceback (most recent call last):
  File "main.py", line 49, in <module>
    main(config)
  File "main.py", line 40, in main
    trainer.train()
  File "/home/duygu/recurrent-visual-attention-master/trainer.py", line 168, in train
    train_loss, train_acc = self.train_one_epoch(epoch)
  File "/home/duygu/recurrent-visual-attention-master/trainer.py", line 252, in train_one_epoch
    h_t, l_t, b_t, p = self.model(x, l_t, h_t)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/duygu/recurrent-visual-attention-master/model.py", line 101, in forward
    g_t = self.sensor(x, l_t_prev)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/duygu/recurrent-visual-attention-master/modules.py", line 214, in forward
    phi_out = F.relu(self.fc1(phi))
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/linear.py", line 55, in forward
    return F.linear(input, self.weight, self.bias)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/functional.py", line 992, in linear
    return torch.addmm(bias, input, weight.t())
RuntimeError: Expected object of type torch.FloatTensor but found type torch.ByteTensor for argument #4 'mat1'

Also, are there any modifications we can make to use the GPU? (Frankly, I chose this implementation thinking the GPU was supported, so I am a little discouraged by other comments saying it is not.) I could potentially try them out. But of course, the most crucial part is to get a running example and to make sure I am not doing anything wrong, which is difficult to track down as I am new to PyTorch.

Thanks.

Performance is not good when using my dataset.

Hi, @kevinzakka

I fed in my own data in MNIST format (256x256 grayscale images, 5000 images per class).

But the performance is not good.

What am I doing wrong?

Epoch: 196/500 - LR: 0.000300
0.8s - loss: 1.055 - acc: 100.000: 100%|█████████| 217/217 [00:00<00:00, 267.85it/s]
train loss: 0.834 - train acc: 62.212 - val loss: 1.192 - val acc: 54.167

Epoch: 197/500 - LR: 0.000300
0.8s - loss: -0.885 - acc: 100.000: 100%|████████| 217/217 [00:00<00:00, 273.63it/s]
train loss: 0.568 - train acc: 60.369 - val loss: 0.844 - val acc: 54.167

Epoch: 198/500 - LR: 0.000300
0.8s - loss: 0.780 - acc: 100.000: 100%|█████████| 217/217 [00:00<00:00, 270.30it/s]
train loss: 0.565 - train acc: 57.604 - val loss: 1.076 - val acc: 50.000

Epoch: 199/500 - LR: 0.000300
0.8s - loss: 3.553 - acc: 0.000: 100%|███████████| 217/217 [00:00<00:00, 271.82it/s]
train loss: 0.678 - train acc: 58.525 - val loss: 0.533 - val acc: 58.333

Epoch: 200/500 - LR: 0.000300
0.8s - loss: 0.116 - acc: 100.000: 100%|█████████| 217/217 [00:00<00:00, 272.74it/s]
train loss: 0.651 - train acc: 58.986 - val loss: 1.418 - val acc: 45.833

Epoch: 201/500 - LR: 0.000300
0.8s - loss: 5.108 - acc: 0.000: 100%|███████████| 217/217 [00:00<00:00, 275.17it/s]
train loss: 0.779 - train acc: 63.594 - val loss: 0.921 - val acc: 62.500

Epoch: 202/500 - LR: 0.000300
0.8s - loss: 1.587 - acc: 0.000: 100%|███████████| 217/217 [00:00<00:00, 270.84it/s]
train loss: 0.830 - train acc: 58.525 - val loss: 0.746 - val acc: 58.333
[!] No improvement in a while, stopping training.

Thanks.

from @bemoregt.

convert torch.tensor to numpy

Nice code, but

I think gradient backpropagation is broken at the point where you convert a torch.tensor to numpy.
Also, use_gpu does not work.

Detaching l_t

At the moment the location tensor l_t is never detached from the computational graph, even though it is both produced by and 'consumed' by trainable modules. As far as I understand the code, this lets the gradients 'backpropagate through time' in a way that the authors of RAM did not intend: the gradients that originate in the action_network and reach the fc2 layer inside the glimpse network travel back to the previous timestep's location_network and alter its weights, and only stop once they reach the detached RNN memory vector h_t. As far as I understand, the authors intended the location_network to be trained using reinforcement learning only.

This could be a bug or it could be an accidental improvement to the network; either way, please let me know if my understanding is correct, as I am still learning PyTorch and my project relies heavily on your code :)

Question on how to train Location network with detach function

Hi, thanks for your work and for releasing the code. I have one question about training the location network with the REINFORCE algorithm. If I understand correctly,
the following part of modules.py is the implementation of REINFORCE:

# compute mean
feat = F.relu(self.fc(h_t.detach())) 
mu = torch.tanh(self.fc_lt(feat))

# reparametrization trick
l_t = torch.distributions.Normal(mu, self.std).rsample() 
l_t = l_t.detach()
log_pi = Normal(mu, self.std).log_prob(l_t)

and for calculating the loss_reinforce and reward, the relevant part is the following

# calculate reward
predicted = torch.max(log_probas, 1)[1]
R = (predicted.detach() == y).float()
R = R.unsqueeze(1).repeat(1, self.num_glimpses)

...
...

# compute reinforce loss
# summed over timesteps and averaged across batch
adjusted_reward = R - baselines.detach()
loss_reinforce = torch.sum(-log_pi * adjusted_reward, dim=1) # gradient ascent (negative)
loss_reinforce = torch.mean(loss_reinforce, dim=0)

My question is: how are the parameters of the fully connected layers updated if all the related tensors are detached?
I read some examples of REINFORCE implementations, such as
the PyTorch documentation and the official PyTorch REINFORCE example.
However, I still cannot figure out how the detach function works here.
I also saw the similar issues #29 and #20.

Any help would be appreciated and thanks for your time!
Best wishes
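
A small sketch may help with the detach question above. As far as the quoted snippet shows, `l_t.detach()` only cuts the path that runs through the sample itself; `log_pi` still depends on `mu` directly, so the REINFORCE loss can still push gradients into `fc_lt` and `fc` (while `h_t.detach()` keeps them out of the RNN). The pure-Python check below holds the sample fixed and confirms that the log-probability of a Gaussian still has a non-zero derivative with respect to its mean. The function names and the numbers are made up for illustration and are not taken from the repository.

```python
import math


def log_prob(l, mu, std):
    """Log density of Normal(mu, std) evaluated at the sample l."""
    return (-((l - mu) ** 2) / (2 * std ** 2)
            - math.log(std) - 0.5 * math.log(2 * math.pi))


def grad_log_prob_wrt_mu(l, mu, std):
    """Analytic d/dmu of log_prob, with the sample l held constant."""
    return (l - mu) / std ** 2


# Illustrative values only (assumptions, not the repository's settings).
mu, std = 0.2, 0.17
l = 0.35  # the sampled location, treated as a constant ("detached")

# Central finite difference in mu, keeping l fixed.
eps = 1e-6
numeric = (log_prob(l, mu + eps, std) - log_prob(l, mu - eps, std)) / (2 * eps)
analytic = grad_log_prob_wrt_mu(l, mu, std)

print(numeric, analytic)  # the two values should agree closely
```

The gradient is zero only when the sample happens to equal the mean, so in general the layers that produce `mu` do receive a learning signal, scaled by the adjusted reward.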

is detach() required?

There seems to be a detach() in location_network() while obtaining mu from h_t. Same thing for the baseline or value estimation. Is this required? If yes, then essentially, the log_prob loss is not training the RNN, but only the fc layer for mu computation.
Is this correct?

nan loss when locations are provided

I am providing the locations of the important landmarks for the image classification problem I am solving. But when these 2 coordinates are passed for each image, the loss becomes nan instead of decreasing, and so does the accuracy. Am I expected to normalise the pixel locations?
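
On the normalisation question: the location network quoted elsewhere on this page squashes its mean with tanh, which suggests the model works with locations in [-1, 1] rather than raw pixel indices. Under that assumption, raw pixel coordinates would fall far outside the expected range, which is one plausible (unconfirmed) source of the nan. Below is a minimal sketch of the mapping in both directions; `normalize_coord` and `denormalize_coord` are hypothetical helper names, not functions from the repository.

```python
def normalize_coord(pixel, size):
    """Map a pixel coordinate in [0, size] to the range [-1, 1]."""
    return 2.0 * pixel / size - 1.0


def denormalize_coord(coord, size):
    """Map a coordinate in [-1, 1] back to a pixel index in [0, size]."""
    return int(0.5 * (coord + 1.0) * size)


# Example for a 256x256 image with a landmark at pixel (x=64, y=192).
size = 256
landmark = (64, 192)
normalized = tuple(normalize_coord(p, size) for p in landmark)
restored = tuple(denormalize_coord(c, size) for c in normalized)
print(normalized, restored)
```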

segmentation fault while running

When running main.py, the program segfaults within the first 3 iterations, at the line "accs.update(acc.data[0], x.size()[0])".

I tried to increase the hidden unit size from 128 to 256, but I am getting the following error.

Epoch: 1/1500 - LR: 0.000300
0%| | 0/11044 [00:04<?, ?it/s]
Traceback (most recent call last):
  File "main.py", line 49, in <module>
    main(config)
  File "main.py", line 41, in main
    trainer.train()
  File "C:\Users\apatil\pyram - exp 2\pytorch_ram\trainer.py", line 164, in train
    train_loss, train_acc = self.train_one_epoch(epoch)
  File "C:\Users\apatil\pyram - exp 2\pytorch_ram\trainer.py", line 241, in train_one_epoch
    h_t, l_t, b_t, p = self.model(x, l_t, h_t)
  File "C:\tools\Anaconda3\envs\pytorch_ram\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\apatil\pyram - exp 2\pytorch_ram\model.py", line 80, in forward
    h_t = self.rnn(g_t, h_t_prev)
  File "C:\tools\Anaconda3\envs\pytorch_ram\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\apatil\pyram - exp 2\pytorch_ram\modules.py", line 226, in forward
    h1 = self.i2h(g_t)
  File "C:\tools\Anaconda3\envs\pytorch_ram\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\tools\Anaconda3\envs\pytorch_ram\lib\site-packages\torch\nn\modules\linear.py", line 93, in forward
    return F.linear(input, self.weight, self.bias)
  File "C:\tools\Anaconda3\envs\pytorch_ram\lib\site-packages\torch\nn\functional.py", line 1690, in linear
    ret = torch.addmm(bias, input, weight.t())
RuntimeError: mat1 and mat2 shapes cannot be multiplied (128x512 and 256x256)

(pytorch_ram) C:\Users\apatil\pyram - exp 2\pytorch_ram>
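
Reading the error above: the input to i2h has 512 features (128x512, with 128 presumably the batch size), while the layer expects 256. One plausible explanation, assuming the glimpse vector g_t has glimpse-hidden plus location-hidden features and that only those two sizes were doubled, is that the core network's input size was left at 256. The check below is a hypothetical helper for catching this before building the model; the parameter names are assumptions and may not match the config flags.

```python
def core_input_mismatch(glimpse_hidden, loc_hidden, hidden_size):
    """Return None if the sizes agree, else a message describing the mismatch.

    Assumes g_t has glimpse_hidden + loc_hidden features and that the core
    network takes hidden_size input features.
    """
    g_t_size = glimpse_hidden + loc_hidden
    if g_t_size == hidden_size:
        return None
    return (f"g_t has {g_t_size} features but the core network "
            f"expects {hidden_size}")


print(core_input_mismatch(128, 128, 256))  # sizes agree
print(core_input_mismatch(256, 256, 256))  # reproduces the 512 vs 256 clash
print(core_input_mismatch(256, 256, 512))  # sizes agree again
```

If that assumption holds, raising the core hidden size to 512 along with the other two would make the shapes line up.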

Using the code for 3D data

Hi @kevinzakka, thank you for making your well written code public. I am trying to use DRAM for the 3D data (videos). Can you advise me if your code can be extended to 3D data?

IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number

I am using PyTorch 1.0 with Python 3.7.
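
As the error message itself says, the fix is to call `.item()` on the 0-dim tensor instead of indexing it with `[0]`. The sketch below shows the pattern with a hypothetical helper, `to_python_number`, and a stand-in class so that it runs without PyTorch installed; neither name comes from the repository.

```python
def to_python_number(value):
    """Return a plain Python number from a 0-dim tensor-like object.

    Anything exposing a callable .item() (as PyTorch tensors do) is
    unwrapped; plain Python numbers are returned unchanged.
    """
    item = getattr(value, "item", None)
    return item() if callable(item) else value


class FakeScalarTensor:
    """Stand-in for a 0-dim tensor so this sketch runs without PyTorch."""

    def __init__(self, number):
        self._number = number

    def item(self):
        return self._number


acc = FakeScalarTensor(87.5)
print(to_python_number(acc))  # unwrapped via .item()
print(to_python_number(3))    # already a Python number, returned as-is
```

In the training loop this corresponds to replacing expressions such as `acc.data[0]` with `acc.item()`.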

(Possibly) faster retina

recurrent-visual-attention/modules.py

Line 10 in b659b6f

class retina(object):

Just FYI, I am refactoring your code and found that the retina network can be made a little faster by padding the whole batch with enough zeros and then extracting the patches directly. You can check a working version here.

P.S. I didn't do much profiling; I just timed the first epoch several times (it was about 1.3 times faster).
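
To illustrate the pad-then-slice idea, here is a toy sketch on a single 2D image stored as nested lists. It is not the linked implementation; `pad_image` and `extract_patch` are hypothetical names, and a real version would operate on a whole batch tensor. Padding once up front means every patch, even one centred on a border pixel, can be cut out with plain slicing and no bounds checks.

```python
def pad_image(image, pad):
    """Zero-pad a 2D image (list of rows) by `pad` pixels on every side."""
    width = len(image[0])
    blank = [0] * (width + 2 * pad)
    rows = [[0] * pad + list(row) + [0] * pad for row in image]
    return ([blank[:] for _ in range(pad)]
            + rows
            + [blank[:] for _ in range(pad)])


def extract_patch(padded, pad, center_row, center_col, size):
    """Cut a size x size patch centred on (center_row, center_col).

    The centre is given in coordinates of the unpadded image.
    Requires pad >= size // 2 so the slice never leaves the padded image.
    """
    top = center_row - size // 2 + pad
    left = center_col - size // 2 + pad
    return [row[left:left + size] for row in padded[top:top + size]]


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
padded = pad_image(image, pad=1)
corner = extract_patch(padded, pad=1, center_row=0, center_col=0, size=3)
center = extract_patch(padded, pad=1, center_row=1, center_col=1, size=3)
print(corner)
print(center)
```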

Test trained model

How can I load a trained model and test it with an input image?

RuntimeError: size mismatch, m1: [32 x 192], m2: [64 x 128] at /pytorch/aten/src/TH/generic/THTensorMath.c:2033

Hello, I am trying to use this with my custom dataset, which I load through a dataloader (see #18). Even after casting my image input to Float32, which gets rid of that error, I get a tensor size mismatch while training the network.

Traceback (most recent call last):
  File "main.py", line 49, in <module>
    main(config)
  File "main.py", line 40, in main
    trainer.train()
  File "/home/duygu/recurrent-visual-attention-master/trainer.py", line 168, in train
    train_loss, train_acc = self.train_one_epoch(epoch)
  File "/home/duygu/recurrent-visual-attention-master/trainer.py", line 252, in train_one_epoch
    h_t, l_t, b_t, p = self.model(x, l_t, h_t)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/duygu/recurrent-visual-attention-master/model.py", line 101, in forward
    g_t = self.sensor(x, l_t_prev)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/duygu/recurrent-visual-attention-master/modules.py", line 214, in forward
    phi_out = F.relu(self.fc1(phi))
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/linear.py", line 55, in forward
    return F.linear(input, self.weight, self.bias)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/functional.py", line 992, in linear
    return torch.addmm(bias, input, weight.t())
RuntimeError: size mismatch, m1: [32 x 192], m2: [64 x 128] at /pytorch/aten/src/TH/generic/THTensorMath.c:2033

I can not figure out what goes wrong. Is it about patches or weights? Any insights could be really helpful. Thanks.
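
A note on the numbers in that error: the input has 192 features per sample while the first glimpse layer expects 64, and 192 = 3 x 64. That pattern would be consistent with 3-channel (RGB) images being fed to a model configured for 1 channel, though this is a guess from the shapes alone. The sketch below assumes the flattened glimpse size is patches x patch size squared x channels; the function and parameter names are hypothetical.

```python
def glimpse_feature_size(patch_size, num_patches, num_channels):
    """Number of features in a flattened glimpse.

    Assumes each glimpse is num_patches square patches of side patch_size,
    with num_channels channels, flattened into a single vector.
    """
    return num_patches * patch_size * patch_size * num_channels


# An 8x8 patch, a single scale, grayscale vs RGB input.
print(glimpse_feature_size(8, 1, 1))  # what the layer was built for
print(glimpse_feature_size(8, 1, 3))  # what an RGB image would produce
```

If this is the cause, either converting the images to grayscale or setting the model's channel count to 3 should make the two sizes agree.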

gradient detach

Hello, I read your code several days ago and am having some trouble with it. I don't understand the .detach calls in your policy network, and I don't know how to implement this. Can you give me some pointers?

Can this code be used as an "online learning model"?

Hi, @kevinzakka

Can this code be used as an "online learning model"?

I mean that, at inference time, the model is retrained repeatedly using additional data,
so that its accuracy keeps getting better.

Is it possible?
