kevinzakka / recurrent-visual-attention
A PyTorch Implementation of "Recurrent Models of Visual Attention"
License: MIT License
When calculating the log probability of the sample, the code currently doesn't take into account that a non-linearity has been applied.
Specifically:
https://github.com/kevinzakka/recurrent-visual-attention/blob/master/model.py#L109
assumes an untransformed normal distribution, but the sampled variable, l_t, has been transformed: https://github.com/kevinzakka/recurrent-visual-attention/blob/master/modules.py#L350
The easy solution is to calculate the log probs prior to applying the non-linearity, so that the location_network
returns log_probs and l_t (mu is no longer needed).
This probably hasn't had much of an effect: if you're in the linear region of tanh,
it's fine. However, it is theoretically incorrect.
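A minimal sketch of the fix proposed above, not the repository's actual code. The function name `sample_location` and the value of `std` are hypothetical; the idea is only to score the sample under the Normal it was drawn from and apply tanh afterwards.

```python
import torch
from torch.distributions import Normal


def sample_location(mu, std):
    """Hypothetical helper: return (log_pi, l_t) for a batch of means.

    The log-probability is evaluated on the pre-tanh sample, so it matches
    the Normal distribution the sample was actually drawn from.
    """
    dist = Normal(mu, std)
    raw = dist.sample()                     # drawn without gradient tracking
    log_pi = dist.log_prob(raw).sum(dim=1)  # sum over the (x, y) coordinates
    l_t = torch.tanh(raw)                   # squash into [-1, 1] afterwards
    return log_pi, l_t


mu = torch.zeros(4, 2, requires_grad=True)  # stand-in for the location head output
log_pi, l_t = sample_location(mu, std=0.17)  # std value assumed for illustration
```

With this arrangement `log_pi` still depends on `mu`, so REINFORCE can update the location head, while `l_t` itself carries no gradient.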
Have you run your code? How is the performance?
Why can the value of the loss function be negative? What does it mean when the train loss or val loss is < 0?
Epoch: 3/70 - LR: 0.000300
2.7s - loss: -0.948 - acc: 100.000: 100%|██████████| 196/196 [00:02<00:00, 73.88it/s]
train loss: 1.554 - train acc: 76.020 - val loss: 2.431 - val acc: 70.833
0%| | 0/196 [00:00<?, ?it/s]
Epoch: 4/70 - LR: 0.000300
2.6s - loss: -1.604 - acc: 100.000: 100%|██████████| 196/196 [00:02<00:00, 75.53it/s]
train loss: 1.571 - train acc: 76.020 - val loss: 1.200 - val acc: 70.833
0%| | 0/196 [00:00<?, ?it/s]
Epoch: 5/70 - LR: 0.000300
2.6s - loss: -1.605 - acc: 100.000: 100%|██████████| 196/196 [00:02<00:00, 76.86it/s]
train loss: 0.381 - train acc: 76.531 - val loss: 0.502 - val acc: 70.833
Epoch: 6/70 - LR: 0.000300
2.5s - loss: -0.820 - acc: 100.000: 100%|██████████| 196/196 [00:02<00:00, 77.22it/s]
train loss: -0.091 - train acc: 76.020 - val loss: 0.235 - val acc: 70.833
Epoch: 7/70 - LR: 0.000300
2.5s - loss: 0.282 - acc: 50.000: 100%|██████████| 196/196 [00:02<00:00, 77.19it/s]
train loss: -0.178 - train acc: 76.020 - val loss: -0.037 - val acc: 72.917 [*]
Epoch: 8/70 - LR: 0.000300
2.5s - loss: 1.670 - acc: 50.000: 100%|██████████| 196/196 [00:02<00:00, 79.95it/s]
train loss: -0.814 - train acc: 82.653 - val loss: 1.127 - val acc: 70.833
0%| | 0/196 [00:00<?, ?it/s]
Epoch: 9/70 - LR: 0.000300
python main.py --use_gpu 1
RuntimeError: Expected object of type Variable[torch.cuda.FloatTensor] but found type Variable[torch.FloatTensor] for argument #1 'mat1'
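This kind of error usually means a CPU tensor met CUDA weights (or the reverse). A small sketch of the usual remedy, assuming a PyTorch version that has `torch.device` (0.4 or later); the `nn.Linear` here is only a stand-in for the RAM model.

```python
import torch
import torch.nn as nn

# Pick one device and use it for everything.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(8, 4).to(device)   # stand-in for the RAM model
x = torch.randn(2, 8).to(device)     # every input must live on the same device
h_0 = torch.zeros(2, 4).to(device)   # so must tensors created inside the training loop

out = model(x) + h_0
```

Tensors created on the fly (initial hidden state, initial location) are the usual culprits, since they default to the CPU.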
How do you calculate the final accuracy? If you have 8 steps with one glimpse, do you only consider last step as final prediction for accuracy calculations or do you average predictions from all steps given each step can have different predictions?
Thank you for this repo!
Which version of pytorch are you using exactly?
import torch
print(torch.__version__)
After running this code, mine is 0.2.0_4. There are many errors when I try to run your code, including variable shape errors like
recurrent-visual-attention/trainer.py
Line 220 in 3828ad9
recurrent-visual-attention/model.py
Line 6 in 3828ad9
99c4cbe#diff-40d9c2c37e955447b1175a32afab171fL353
This is not an unnecessary detach.
As it is used in
log_pi = Normal(mu, self.std).log_prob(l_t)
which is then used in
loss_reinforce = torch.sum(-log_pi*adjusted_reward, dim=1)
which means when minimizing reinforce loss, you are altering your location network through both mu and l_t (and yes, log_pi is differentiable w.r.t both mu and l_t). However, l_t is just mu+noise and we only want the gradient to flow through mu.
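A toy check of the point above, under the simplifying assumption that l_t = mu + noise with no further transform (the numbers are arbitrary). Without the detach, the two gradient paths cancel and mu gets no learning signal at all; with it, the gradient is the usual (l_t - mu) / std**2.

```python
import torch
from torch.distributions import Normal

std = 0.5
mu = torch.tensor([0.2], requires_grad=True)
noise = torch.tensor([0.1])

# Without detach: l_t moves with mu, so log_prob depends only on the noise
# and the gradient w.r.t. mu cancels to zero.
l_t = mu + noise
grad_attached, = torch.autograd.grad(Normal(mu, std).log_prob(l_t).sum(), mu)

# With detach: l_t is a constant, so the gradient is (l_t - mu) / std**2.
l_t = (mu + noise).detach()
grad_detached, = torch.autograd.grad(Normal(mu, std).log_prob(l_t).sum(), mu)
```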
The test results were very disappointing. It seems that this implementation of Recurrent Models of Visual Attention lacks the ability to locate targets that is mentioned in the paper.
I created several datasets from MNIST, for example by shrinking the digits in the original data to 20x20, 14x14, etc. The model, trained for 101 epochs and reaching 71.654% train accuracy and 94.867% validation accuracy, was then tested on the new datasets.
The results are as follows:
Test on size_28x28 (without changing the data shape, to verify the feasibility of data shape change operation)
[] Test Acc: 1933/2000 (96.00% - 4.00%)
[] Test Acc: 3858/4000 (96.00% - 4.00%)
[] Test Acc: 5808/6000 (96.00% - 4.00%)
[] Test Acc: 7776/8000 (97.00% - 3.00%)
[*] Test Acc: 9749/10000 (97.00% - 3.00%)
Test on size_20x20:
[] Test Acc: 390/2000 (19.00% - 81.00%)
[] Test Acc: 761/4000 (19.00% - 81.00%)
[] Test Acc: 1107/6000 (18.00% - 82.00%)
[] Test Acc: 1469/8000 (18.00% - 82.00%)
[*] Test Acc: 1829/10000 (18.00% - 82.00%)
Test on size_14x14:
[] Test Acc: 257/2000 (12.00% - 88.00%)
[] Test Acc: 502/4000 (12.00% - 88.00%)
[] Test Acc: 744/6000 (12.00% - 88.00%)
[] Test Acc: 956/8000 (11.00% - 89.00%)
[*] Test Acc: 1167/10000 (11.00% - 89.00%)
The above results are basically the same as those I've seen with other deep learning (CNNs) models...
In the load_checkpoint function, shouldn't you also load the optimizer state?
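For reference, a generic sketch of saving and restoring both state dicts. The dictionary keys `model_state` and `optim_state` and the temporary path are assumptions for illustration, not necessarily what the repository's trainer uses.

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(4, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)

# Take one step so Adam has moment estimates worth saving.
model(torch.randn(8, 4)).sum().backward()
optimizer.step()

path = os.path.join(tempfile.mkdtemp(), "ckpt.pth")
torch.save(
    {"model_state": model.state_dict(), "optim_state": optimizer.state_dict()},
    path,
)

# Restore both the weights and the optimizer state.
ckpt = torch.load(path)
restored_model = nn.Linear(4, 2)
restored_optimizer = torch.optim.Adam(restored_model.parameters(), lr=3e-4)
restored_model.load_state_dict(ckpt["model_state"])
restored_optimizer.load_state_dict(ckpt["optim_state"])
```

Skipping the optimizer state only matters when resuming training; for pure evaluation the model weights are enough.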
The comment in module.py (line 21) says "x" is a 4D tensor of shape (B, H, W, C),
but it's actually a 4D tensor of shape (B, C, H, W).
This mistake appears in many places.
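To make the two layouts concrete, a short sketch with an MNIST-sized batch (the sizes are assumed for illustration):

```python
import torch

x = torch.randn(32, 1, 28, 28)   # PyTorch image batch: (B, C, H, W)
B, C, H, W = x.shape

# Channels-last view, e.g. for plotting with matplotlib: (B, H, W, C)
x_nhwc = x.permute(0, 2, 3, 1)
```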
Hello,
Great work. Can you tell me why the input is repeated 10 times during validation?
This repository does not have a license file yet. Addition of a license to this repo would really help my research. Thanks!
This website helps you find the right one: https://choosealicense.com/
Hi, @clvcooke @kevinzakka @malashinroman
Can this model support multiple labels per image?
I need a classification model for images that contain multiple classes.
I wonder ...
Thanks in advance.
Best,
@bemoregt.
The location embeddings within the glimpse network are generated as a 128-dimensional vector by passing in two inputs to an NN, the x, and y coordinates. Could someone kindly explain the rationale behind this decision?
Hi, @clvcooke @kevinzakka @malashinroman
Is there not yet a Recurrent Visual Attention Model that uses a Transformer?
Is that possible?
I wonder ...
Thanks.
Best,
https://github.com/kevinzakka/recurrent-visual-attention/blob/master/model.py#L110
Should this line be deleted? Since log_pi is already a vector of shape (B,) after the previous line, we don't need to sum over dim=1.
I am trying to understand the code in order to use it with my own data. How do I provide this data to the repo in PyTorch data format?
In this implementation, there is an M parameter in validation and test mode that duplicates the input. The same input instance is processed by the RAM model multiple times and the predictions are averaged.
When I remove this part so that there is no duplication and no averaging (just as in train mode), the performance drops significantly. This suggests that performance actually converges much more slowly.
Also, the use of multiple duplicates of each input instance, referred to as Monte Carlo sampling in your code, does not seem to be a necessary part of the paper. The paper didn't mention it (correct me if I am wrong).
Is this a design choice to boost test-time performance? @kevinzakka
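For readers unfamiliar with the trick, a rough sketch of the test-time averaging being described. `stochastic_model` is a hypothetical stand-in for the full glimpse loop, and M = 10 is just an example value.

```python
import torch
import torch.nn.functional as F

M, B, num_classes = 10, 4, 10
x = torch.randn(B, 1, 28, 28)


def stochastic_model(inputs):
    """Hypothetical stand-in for the glimpse loop; returns log-probabilities."""
    logits = torch.randn(inputs.shape[0], num_classes)
    return F.log_softmax(logits, dim=1)


x_rep = x.repeat(M, 1, 1, 1)                    # (M * B, 1, 28, 28)
log_probas = stochastic_model(x_rep)            # (M * B, num_classes)
log_probas = log_probas.view(M, B, -1).mean(0)  # average over the M samples
pred = log_probas.argmax(dim=1)                 # one prediction per image
```

Because the glimpse locations are sampled, each of the M copies follows a different trajectory, and averaging reduces the variance of the final prediction. Setting M = 1 recovers the single-sample behaviour used in training.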
The loss can be negative in this model. The reason is that the reinforce loss is often negative, since a larger reward is better. But I am confused about how negative values affect gradient descent.
I also notice that the hybrid loss tends toward zero eventually. How can the loss increase under gradient descent?
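A small numeric illustration of how the REINFORCE term can go below zero. The value of `std` and the reward are assumptions for the example. With a small std, the Normal density near its mean exceeds 1, so its log is positive, and a positive adjusted reward then makes the term negative.

```python
import torch
from torch.distributions import Normal

std = 0.17  # assumed value, for illustration only
mu = torch.zeros(1, 2)
l_t = torch.zeros(1, 2)                      # a sample that landed on the mean

log_pi = Normal(mu, std).log_prob(l_t)       # log-density, positive here
adjusted_reward = torch.full((1, 2), 0.5)    # reward minus baseline, assumed positive

loss_reinforce = torch.sum(-log_pi * adjusted_reward, dim=1).mean()
```

Gradient descent only follows the gradient of the objective, so the sign of the loss value itself is not a problem. The REINFORCE term is a surrogate whose value is not expected to decrease monotonically the way a cross-entropy loss does.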
Hi, @kevinzakka
How can I load my images into this model?
My image dataset has an animal/cats and animal/dogs folder structure,
with 480x480 color images.
How can I do this?
Thanks in advance.
from @bemoregt.
Does anyone know how I can start with random initial coordinates for the first square patch?
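One possible way, assuming locations are normalized to [-1, 1] in both axes (the variable name `l_0` is hypothetical):

```python
import torch

batch_size = 4
# One random (x, y) starting location per image, uniform in [-1, 1].
l_0 = torch.empty(batch_size, 2).uniform_(-1, 1)
```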
recurrent-visual-attention/trainer.py
Line 389 in b659b6f
According to the paper's formula, the gradient is summed over samples and time steps but only averaged over samples. So I think it's more appropriate to calculate loss_reinforce as
loss_reinforce = torch.sum(-log_pi*adjusted_reward, dim=1)
loss_reinforce = torch.mean(loss_reinforce)
Though it's just a matter of a scalar factor and should be absorbed by an adaptive optimizer...
What do you think?
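A quick check of the "just a scalar" remark, assuming the alternative being compared is a plain mean over both batch and time (random tensors stand in for the real quantities):

```python
import torch

B, T = 4, 6                                  # batch size, number of glimpses
log_pi = torch.randn(B, T)
adjusted_reward = torch.randn(B, T)

per_step = -log_pi * adjusted_reward
sum_then_mean = per_step.sum(dim=1).mean()   # sum over time, average over batch
mean_over_all = per_step.mean()              # average over both batch and time
```

The two differ exactly by the factor T, so they give the same gradient direction with a different effective learning rate.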
I am currently working on building a RAM with PyTorch.
I found your code and am following it now.
https://github.com/kevinzakka/recurrent-visual-attention/blob/master/modules.py#L164
In this line, phi is a 4D tensor if the given image is color, otherwise 3D.
Either way, it should be a 2D tensor for the linear operation to apply.
Is this a typo, or is a reshape missing?
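If a reshape is indeed what is missing, it would look roughly like this. The shapes and the layer size 128 are assumed for illustration and may not match modules.py.

```python
import torch
import torch.nn as nn

B, C, g = 4, 3, 8                    # batch, channels, glimpse patch size
phi = torch.randn(B, C, g, g)        # hypothetical extracted patches (4D)

phi_flat = phi.view(B, -1)           # (B, C * g * g), ready for nn.Linear
fc1 = nn.Linear(C * g * g, 128)
out = fc1(phi_flat)
```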
Hi,
Just wonder if anyone encounters the same problem - it looks like the code is faster on cpu than on gpu. On my cpu (i7) it only takes around 80s per epoch but on gpu (a P100) it takes around 180s.
Anyone with the same problem?
I checked out the recent commit that changed the optimizer. You claim that
"With the Adam optimizer, paper accuracy can be reached in 30 epochs."
But when I run python main.py --is_train 1, the performance isn't as good as claimed.
Here's the log of my run. Can you confirm this? (Run based on commit 99c4cbe.)
Epoch: 1/200 - LR: 0.000300
159.4s - loss: 0.484 - acc: 75.000: 100%|████████████████████████| 54000/54000 [02:39<00:00, 338.75it/s]
train loss: 1.646 - train acc: 44.728 - val loss: 0.898 - val acc: 73.050 [*]
Epoch: 2/200 - LR: 0.000300
136.2s - loss: 1.691 - acc: 62.500: 100%|████████████████████████| 54000/54000 [02:16<00:00, 396.43it/s]
train loss: 0.928 - train acc: 69.515 - val loss: 0.667 - val acc: 80.483 [*]
Epoch: 3/200 - LR: 0.000300
178.1s - loss: 1.099 - acc: 75.000: 100%|████████████████████████| 54000/54000 [02:58<00:00, 303.14it/s]
train loss: 0.754 - train acc: 77.141 - val loss: 0.255 - val acc: 89.717 [*]
Epoch: 4/200 - LR: 0.000300
164.8s - loss: 0.124 - acc: 87.500: 100%|████████████████████████| 54000/54000 [02:44<00:00, 327.69it/s]
train loss: 0.711 - train acc: 79.198 - val loss: 0.448 - val acc: 88.617
Epoch: 5/200 - LR: 0.000300
164.7s - loss: 1.203 - acc: 75.000: 100%|████████████████████████| 54000/54000 [02:44<00:00, 327.76it/s]
train loss: 0.711 - train acc: 79.774 - val loss: 0.218 - val acc: 91.150 [*]
Epoch: 6/200 - LR: 0.000300
152.3s - loss: 1.857 - acc: 62.500: 100%|████████████████████████| 54000/54000 [02:32<00:00, 354.63it/s]
train loss: 0.690 - train acc: 80.306 - val loss: 0.074 - val acc: 92.233 [*]
Epoch: 7/200 - LR: 0.000300
167.9s - loss: 0.470 - acc: 87.500: 100%|████████████████████████| 54000/54000 [02:47<00:00, 321.62it/s]
train loss: 0.644 - train acc: 81.596 - val loss: 0.187 - val acc: 92.150
Epoch: 8/200 - LR: 0.000300
160.9s - loss: 0.292 - acc: 87.500: 100%|████████████████████████| 54000/54000 [02:40<00:00, 335.51it/s]
train loss: 0.605 - train acc: 82.700 - val loss: 0.176 - val acc: 92.700 [*]
Epoch: 9/200 - LR: 0.000300
137.5s - loss: 1.000 - acc: 75.000: 100%|████████████████████████| 54000/54000 [02:17<00:00, 392.73it/s]
train loss: 0.590 - train acc: 83.144 - val loss: 0.179 - val acc: 93.017 [*]
Epoch: 10/200 - LR: 0.000300
157.0s - loss: 1.242 - acc: 81.250: 100%|████████████████████████| 54000/54000 [02:36<00:00, 407.21it/s]
train loss: 0.567 - train acc: 84.050 - val loss: 0.200 - val acc: 93.133 [*]
Epoch: 11/200 - LR: 0.000300
160.3s - loss: 1.275 - acc: 75.000: 100%|████████████████████████| 54000/54000 [02:40<00:00, 345.11it/s]
train loss: 0.544 - train acc: 84.524 - val loss: 0.182 - val acc: 95.033 [*]
Epoch: 12/200 - LR: 0.000300
173.2s - loss: 1.563 - acc: 75.000: 100%|████████████████████████| 54000/54000 [02:53<00:00, 242.64it/s]
train loss: 0.536 - train acc: 84.783 - val loss: 0.192 - val acc: 94.417
Epoch: 13/200 - LR: 0.000300
155.8s - loss: 0.343 - acc: 93.750: 100%|████████████████████████| 54000/54000 [02:35<00:00, 346.46it/s]
train loss: 0.525 - train acc: 85.424 - val loss: 0.127 - val acc: 95.217 [*]
Epoch: 14/200 - LR: 0.000300
159.2s - loss: 0.462 - acc: 87.500: 100%|████████████████████████| 54000/54000 [02:39<00:00, 339.10it/s]
train loss: 0.530 - train acc: 85.400 - val loss: 0.139 - val acc: 94.967
Epoch: 15/200 - LR: 0.000300
162.9s - loss: 0.065 - acc: 93.750: 100%|████████████████████████| 54000/54000 [02:42<00:00, 331.49it/s]
train loss: 0.525 - train acc: 85.461 - val loss: 0.110 - val acc: 94.983
Epoch: 16/200 - LR: 0.000300
173.3s - loss: 0.422 - acc: 87.500: 100%|████████████████████████| 54000/54000 [02:53<00:00, 345.51it/s]
train loss: 0.553 - train acc: 84.639 - val loss: 0.208 - val acc: 94.400
Epoch: 17/200 - LR: 0.000300
140.1s - loss: 0.626 - acc: 75.000: 100%|████████████████████████| 54000/54000 [02:20<00:00, 385.36it/s]
train loss: 0.555 - train acc: 84.563 - val loss: 0.196 - val acc: 95.383 [*]
Epoch: 18/200 - LR: 0.000300
153.3s - loss: 1.402 - acc: 68.750: 100%|████████████████████████| 54000/54000 [02:33<00:00, 309.71it/s]
train loss: 0.546 - train acc: 84.311 - val loss: 0.113 - val acc: 96.317 [*]
Epoch: 19/200 - LR: 0.000300
156.3s - loss: 0.039 - acc: 93.750: 100%|████████████████████████| 54000/54000 [02:36<00:00, 345.51it/s]
train loss: 0.553 - train acc: 84.543 - val loss: 0.188 - val acc: 95.433
Epoch: 20/200 - LR: 0.000300
182.3s - loss: 0.912 - acc: 68.750: 100%|████████████████████████| 54000/54000 [03:02<00:00, 231.90it/s]
train loss: 0.564 - train acc: 84.020 - val loss: 0.213 - val acc: 94.800
Epoch: 21/200 - LR: 0.000300
156.9s - loss: 0.433 - acc: 87.500: 100%|████████████████████████| 54000/54000 [02:36<00:00, 354.48it/s]
train loss: 0.589 - train acc: 83.404 - val loss: 0.145 - val acc: 94.850
Epoch: 22/200 - LR: 0.000300
171.1s - loss: 0.564 - acc: 87.500: 100%|████████████████████████| 54000/54000 [02:51<00:00, 243.82it/s]
train loss: 0.590 - train acc: 83.189 - val loss: 0.168 - val acc: 95.500
Epoch: 23/200 - LR: 0.000300
184.0s - loss: -0.073 - acc: 93.750: 100%|███████████████████████| 54000/54000 [03:04<00:00, 293.42it/s]
train loss: 0.620 - train acc: 82.443 - val loss: 0.057 - val acc: 94.850
Epoch: 24/200 - LR: 0.000300
195.0s - loss: 0.498 - acc: 68.750: 100%|████████████████████████| 54000/54000 [03:14<00:00, 215.77it/s]
train loss: 0.627 - train acc: 82.209 - val loss: 0.121 - val acc: 94.933
Epoch: 25/200 - LR: 0.000300
157.3s - loss: 0.568 - acc: 87.500: 100%|████████████████████████| 54000/54000 [02:37<00:00, 281.71it/s]
train loss: 0.618 - train acc: 82.150 - val loss: 0.133 - val acc: 95.017
Epoch: 26/200 - LR: 0.000300
150.3s - loss: -0.639 - acc: 100.000: 100%|██████████████████████| 54000/54000 [02:30<00:00, 291.28it/s]
train loss: 0.613 - train acc: 81.933 - val loss: 0.168 - val acc: 94.017
Epoch: 27/200 - LR: 0.000300
163.9s - loss: 0.304 - acc: 87.500: 100%|████████████████████████| 54000/54000 [02:43<00:00, 329.53it/s]
train loss: 0.627 - train acc: 81.819 - val loss: 0.144 - val acc: 95.933
Epoch: 28/200 - LR: 0.000300
153.0s - loss: 0.380 - acc: 87.500: 100%|████████████████████████| 54000/54000 [02:32<00:00, 306.57it/s]
train loss: 0.603 - train acc: 82.076 - val loss: 0.057 - val acc: 95.633
Epoch: 29/200 - LR: 0.000300
172.2s - loss: -0.071 - acc: 93.750: 100%|███████████████████████| 54000/54000 [02:52<00:00, 313.52it/s]
train loss: 0.623 - train acc: 82.124 - val loss: 0.114 - val acc: 96.017
Epoch: 30/200 - LR: 0.000300
143.8s - loss: 0.675 - acc: 81.250: 100%|████████████████████████| 54000/54000 [02:23<00:00, 329.06it/s]
train loss: 0.636 - train acc: 81.933 - val loss: 0.185 - val acc: 95.717
Epoch: 31/200 - LR: 0.000300
166.7s - loss: 1.192 - acc: 75.000: 100%|████████████████████████| 54000/54000 [02:46<00:00, 307.40it/s]
train loss: 0.611 - train acc: 82.126 - val loss: 0.173 - val acc: 96.133
Epoch: 32/200 - LR: 0.000300
153.9s - loss: 1.137 - acc: 75.000: 100%|████████████████████████| 54000/54000 [02:33<00:00, 350.93it/s]
train loss: 0.581 - train acc: 82.957 - val loss: 0.143 - val acc: 95.783
Epoch: 33/200 - LR: 0.000300
191.4s - loss: 0.138 - acc: 93.750: 100%|████████████████████████| 54000/54000 [03:11<00:00, 282.08it/s]
train loss: 0.593 - train acc: 82.683 - val loss: 0.259 - val acc: 95.650
Epoch: 34/200 - LR: 0.000300
150.6s - loss: 0.535 - acc: 81.250: 100%|████████████████████████| 54000/54000 [02:30<00:00, 363.96it/s]
train loss: 0.642 - train acc: 81.769 - val loss: 0.246 - val acc: 96.000
Epoch: 35/200 - LR: 0.000300
171.6s - loss: 0.586 - acc: 81.250: 100%|████████████████████████| 54000/54000 [02:51<00:00, 259.16it/s]
train loss: 0.621 - train acc: 82.106 - val loss: 0.211 - val acc: 95.900
Epoch: 36/200 - LR: 0.000300
174.5s - loss: 0.722 - acc: 75.000: 100%|████████████████████████| 54000/54000 [02:54<00:00, 309.45it/s]
train loss: 0.615 - train acc: 82.026 - val loss: 0.167 - val acc: 96.000
Epoch: 37/200 - LR: 0.000300
168.1s - loss: 0.512 - acc: 81.250: 100%|████████████████████████| 54000/54000 [02:48<00:00, 321.16it/s]
train loss: 0.608 - train acc: 82.265 - val loss: 0.152 - val acc: 96.317
Epoch: 38/200 - LR: 0.000300
155.8s - loss: 0.390 - acc: 87.500: 100%|████████████████████████| 54000/54000 [02:35<00:00, 305.99it/s]
train loss: 0.626 - train acc: 81.854 - val loss: 0.173 - val acc: 96.550 [*]
Epoch: 39/200 - LR: 0.000300
154.8s - loss: 0.108 - acc: 93.750: 100%|████████████████████████| 54000/54000 [02:34<00:00, 348.91it/s]
train loss: 0.634 - train acc: 81.515 - val loss: 0.220 - val acc: 96.183
Epoch: 40/200 - LR: 0.000300
159.1s - loss: -0.091 - acc: 100.000: 100%|██████████████████████| 54000/54000 [02:39<00:00, 339.46it/s]
train loss: 0.618 - train acc: 81.963 - val loss: 0.243 - val acc: 95.600
I've trained the RAM implementation on various 3 channel images and plotted the glimpses extracted by the network on a random batch at various epochs. The bounding box does not seem to move around the input image to explore different locations (see videos below). Any idea why glimpses seem to be stuck on the top left side of the images when using RGB images but seem to move around with the grayscale MNIST? Have you encountered such behaviour when trained on other data?
python3 main.py --use_gpu False --is_train True
#kwargs = {}
if config.use_gpu:
    torch.cuda.manual_seed(config.random_seed)
    kwargs = {"num_workers": 1, "pin_memory": True}
else:
    kwargs = {}
# instantiate data loaders
'''if config.is_train:
    dloader = data_loader.get_train_valid_loader(config.data_dir,
                                                 config.batch_size,
                                                 config.random_seed,
                                                 config.valid_size,
                                                 config.shuffle,
                                                 config.show_sample,
                                                 **kwargs)
else:
    dloader = data_loader.get_test_loader(config.data_dir,
                                          config.batch_size,
                                          **kwargs)'''
if config.is_train:
    dloader = data_loader.get_train_valid_loader(config.data_dir,
                                                 config.batch_size,
                                                 config.random_seed,
                                                 config.valid_size,
                                                 config.shuffle,
                                                 config.show_sample,
                                                 kwargs)
else:
    dloader = data_loader.get_test_loader(config.data_dir,
                                          config.batch_size,
                                          kwargs)
~~~~~~~~~~~~~~~~~~~~~data_loader.py~~~~~~~~~~~~~~~
'''def get_train_valid_loader(
    data_dir,
    batch_size,
    random_seed,
    valid_size=0.1,
    shuffle=True,
    show_sample=False,
    num_workers=4,
    pin_memory=False,
):'''
def get_train_valid_loader(data_dir,
                           batch_size,
                           random_seed,
                           valid_size,
                           shuffle,
                           show_sample,
                           kwargs):
    '''train_loader = torch.utils.data.DataLoader(dataset,
                                               batch_size=batch_size,
                                               sampler=train_sampler,
                                               num_workers=num_workers,
                                               pin_memory=pin_memory)
    valid_loader = torch.utils.data.DataLoader(dataset,
                                               batch_size=batch_size,
                                               sampler=valid_sampler,
                                               num_workers=num_workers,
                                               pin_memory=pin_memory)'''
    train_loader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, sampler=train_sampler, **kwargs)
    valid_loader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, sampler=valid_sampler, **kwargs)
    # visualize some images
    if show_sample:
        '''sample_loader = torch.utils.data.DataLoader(dataset,
                                                    batch_size=9,
                                                    shuffle=shuffle,
                                                    num_workers=num_workers,
                                                    pin_memory=pin_memory)'''
        sample_loader = torch.utils.data.DataLoader(dataset, batch_size=9, shuffle=shuffle, **kwargs)
        data_iter = iter(sample_loader)
        images, labels = data_iter.next()
        X = images.numpy()
        X = np.transpose(X, [0, 2, 3, 1])
        plot_images(X, labels)
    return (train_loader, valid_loader)
'''def get_test_loader(data_dir, batch_size, num_workers=4, pin_memory=False):'''
def get_test_loader(data_dir, batch_size, kwargs):
    """Test dataloader.
    If using CUDA, num_workers should be set to 1 and pin_memory to True.
    Args:
        data_dir: path directory to the dataset.
        batch_size: how many samples per batch to load.
        num_workers: number of subprocesses to use when loading the dataset.
        pin_memory: whether to copy tensors into CUDA pinned memory. Set it to
            True if using GPU.
    """
    # define transforms
    normalize = transforms.Normalize((0.1307,), (0.3081,))
    trans = transforms.Compose([transforms.ToTensor(), normalize])
    # load dataset
    dataset = datasets.MNIST(data_dir, train=False, download=True, transform=trans)
    '''data_loader = torch.utils.data.DataLoader(
        dataset,
        batch_size=batch_size,
        shuffle=False,
        num_workers=num_workers,
        pin_memory=pin_memory,
    )'''
    data_loader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=False, **kwargs)
    return data_loader
Hello I want to use recurrent visual attention with my own dataset so I have a custom dataloader which looks like below. I have run the code with MNIST without any trouble but with my own dataset I am facing issues.
from __future__ import print_function, division #ds
import numpy as np
from utils import plot_images
import os #ds
import pandas as pd #ds
from skimage import io, transform #ds
import torch
from torchvision import datasets
from torch.utils.data import Dataset, DataLoader #ds
from torchvision import transforms
from torchvision import utils #ds
from torch.utils.data.sampler import SubsetRandomSampler
class CDataset(Dataset):
    def __init__(self, csv_file, root_dir, transform=None):
        """
        Args:
            csv_file (string): Path to the csv file with annotations.
            root_dir (string): Directory with all the images.
            transform (callable, optional): Optional transform to be applied
                on a sample.
        """
        self.frame = pd.read_csv(csv_file)
        self.root_dir = root_dir
        self.transform = transform
    def __len__(self):
        return len(self.frame)
    def __getitem__(self, idx):
        img_name = os.path.join(self.root_dir,
                                self.frame.iloc[idx, 0]+'.jpg')
        image = io.imread(img_name)
        # image = image.transpose((2, 0, 1))
        labels = np.array(self.frame.iloc[idx, 1])#.as_matrix() #ds
        #landmarks = landmarks.astype('float').reshape(-1, 2)
        #print(image.shape)
        #print(img_name,labels)
        sample = {'image': image, 'labels': labels}
        if self.transform:
            sample = self.transform(sample)
        return sample
class ToTensor(object):
    """Convert ndarrays in sample to Tensors."""
    def __call__(self, sample):
        image, labels = sample['image'], sample['labels']
        #print(image)
        #print(labels)
        # swap color axis because
        # numpy image: H x W x C
        # torch image: C X H X W
        image = image.transpose((2, 0, 1))
        #print(image.shape)
        #print((torch.from_numpy(image)))
        #print((torch.from_numpy(labels)))
        return {'image': torch.from_numpy(image),
                'labels': torch.from_numpy(labels)}
def get_train_valid_loader(data_dir,
                           batch_size,
                           random_seed,
                           #valid_size=0.1, #ds
                           #shuffle=True,
                           show_sample=False,
                           num_workers=4,
                           pin_memory=False):
    """
    Utility function for loading and returning train and valid
    multi-process iterators over the MNIST dataset. A sample
    9x9 grid of the images can be optionally displayed.
    If using CUDA, num_workers should be set to 1 and pin_memory to True.
    Args
    ----
    - data_dir: path directory to the dataset.
    - batch_size: how many samples per batch to load.
    - random_seed: fix seed for reproducibility.
    - #ds valid_size: percentage split of the training set used for
      the validation set. Should be a float in the range [0, 1].
      In the paper, this number is set to 0.1.
    - shuffle: whether to shuffle the train/validation indices.
    - show_sample: plot 9x9 sample grid of the dataset.
    - num_workers: number of subprocesses to use when loading the dataset.
    - pin_memory: whether to copy tensors into CUDA pinned memory. Set it to
      True if using GPU.
    Returns
    -------
    - train_loader: training set iterator.
    - valid_loader: validation set iterator.
    """
    #ds
    #error_msg = "[!] valid_size should be in the range [0, 1]."
    #assert ((valid_size >= 0) and (valid_size <= 1)), error_msg
    #ds
    # define transforms
    #normalize = transforms.Normalize((0.1307,), (0.3081,))
    trans = transforms.Compose([
        ToTensor(), #normalize,
    ])
    # load train dataset
    #train_dataset = datasets.MNIST(
    #    data_dir, train=True, download=True, transform=trans
    #)
    train_dataset = CDataset(csv_file='/home/Desktop/6June17/util/train.csv',
                             root_dir='/home/caffe/data/images/',transform=trans)
    # load validation dataset
    #valid_dataset = datasets.MNIST( #ds
    #    data_dir, train=True, download=True, transform=trans #ds
    #)
    valid_dataset = CDataset(csv_file='/home/Desktop/6June17/util/eval.csv',
                             root_dir='/home/caffe/data/images/',transform=trans)
    num_train = len(train_dataset)
    train_indices = list(range(num_train))
    #ds split = int(np.floor(valid_size * num_train))
    num_valid = len(valid_dataset) #ds
    valid_indices = list(range(num_valid)) #ds
    #if shuffle:
    #    np.random.seed(random_seed)
    #    np.random.shuffle(indices)
    #ds train_idx, valid_idx = indices[split:], indices[:split]
    train_idx = train_indices #ds
    valid_idx = valid_indices #ds
    train_sampler = SubsetRandomSampler(train_idx)
    valid_sampler = SubsetRandomSampler(valid_idx)
    train_loader = torch.utils.data.DataLoader(
        train_dataset, batch_size=batch_size, sampler=train_sampler,
        num_workers=num_workers, pin_memory=pin_memory,
    )
    print(train_loader)
    valid_loader = torch.utils.data.DataLoader(
        valid_dataset, batch_size=batch_size, sampler=valid_sampler,
        num_workers=num_workers, pin_memory=pin_memory,
    )
    # visualize some images
    if show_sample:
        sample_loader = torch.utils.data.DataLoader(
            dataset, batch_size=9, #shuffle=shuffle,
            num_workers=num_workers, pin_memory=pin_memory
        )
        data_iter = iter(sample_loader)
        images, labels = data_iter.next()
        X = images.numpy()
        X = np.transpose(X, [0, 2, 3, 1])
        plot_images(X, labels)
    return (train_loader, valid_loader)
def get_test_loader(data_dir,
                    batch_size,
                    num_workers=4,
                    pin_memory=False):
    """
    Utility function for loading and returning a multi-process
    test iterator over the MNIST dataset.
    If using CUDA, num_workers should be set to 1 and pin_memory to True.
    Args
    ----
    - data_dir: path directory to the dataset.
    - batch_size: how many samples per batch to load.
    - num_workers: number of subprocesses to use when loading the dataset.
    - pin_memory: whether to copy tensors into CUDA pinned memory. Set it to
      True if using GPU.
    Returns
    -------
    - data_loader: test set iterator.
    """
    # define transforms
    #normalize = transforms.Normalize((0.1307,), (0.3081,))
    trans = transforms.Compose([
        ToTensor(), #normalize,
    ])
    # load dataset
    #dataset = datasets.MNIST(
    #    data_dir, train=False, download=True, transform=trans
    #)
    test_dataset = CDataset(csv_file='/home/Desktop/6June17/util/test.csv',
                            root_dir='/home/caffe/data/images/',transform=trans)
    test_loader = torch.utils.data.DataLoader(
        test_dataset, batch_size=batch_size, shuffle=False,
        num_workers=num_workers, pin_memory=pin_memory,
    )
    return test_loader
#for i_batch, sample_batched in enumerate(dataloader):
# print(i_batch, sample_batched['image'].size(),
# sample_batched['landmarks'].size())
# # observe 4th batch and stop.
# if i_batch == 3:
# plt.figure()
# show_landmarks_batch(sample_batched)
# plt.axis('off')
# plt.ioff()
# plt.show()
# break
The other main change I made is removing the validation size and shuffle parameters (I am using a pre-existing train, validation and test split, and I have already shuffled these splits).
My last change is in the iteration inside the train_one_epoch(self, epoch) function of trainer.py. I changed this part because x, y were previously returned as the strings "image" and "labels" (the keys of the Python dictionary) rather than the batched values.
for i, batch in enumerate(self.train_loader):
    x, y = batch["image"], batch["labels"]
But now I get issues that I can not figure out:
Without the GPU, I get this error:
[*] Train on 64034 samples, validate on 18951 samples
Epoch: 1/200 - LR: 0.000300
<torch.utils.data.dataloader.DataLoader object at 0x7fe065fd4f60>
0%| | 0/64034 [00:00<?, ?it/s]/home/duygu/recurrent-visual-attention-master/modules.py:106: UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
from_x, to_x = from_x.data[0], to_x.data[0]
/home/duygu/recurrent-visual-attention-master/modules.py:107: UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
from_y, to_y = from_y.data[0], to_y.data[0]
Traceback (most recent call last):
File "main.py", line 49, in <module>
main(config)
File "main.py", line 40, in main
trainer.train()
File "/home/duygu/recurrent-visual-attention-master/trainer.py", line 168, in train
train_loss, train_acc = self.train_one_epoch(epoch)
File "/home/duygu/recurrent-visual-attention-master/trainer.py", line 252, in train_one_epoch
h_t, l_t, b_t, p = self.model(x, l_t, h_t)
File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 491, in __call__
result = self.forward(*input, **kwargs)
File "/home/duygu/recurrent-visual-attention-master/model.py", line 101, in forward
g_t = self.sensor(x, l_t_prev)
File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 491, in __call__
result = self.forward(*input, **kwargs)
File "/home/duygu/recurrent-visual-attention-master/modules.py", line 214, in forward
phi_out = F.relu(self.fc1(phi))
File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 491, in __call__
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/linear.py", line 55, in forward
return F.linear(input, self.weight, self.bias)
File "/usr/local/lib/python3.5/dist-packages/torch/nn/functional.py", line 992, in linear
return torch.addmm(bias, input, weight.t())
RuntimeError: Expected object of type torch.FloatTensor but found type torch.ByteTensor for argument #4 'mat1'
Also, are there any modifications we can make to use the GPU? (Frankly, I chose this implementation thinking the GPU is supported, so I am a little discouraged by other comments saying it is not.) I could potentially try it out. But of course, the most crucial part is to have a running example and to ensure I am not doing anything wrong (which is difficult to track, as I am new to PyTorch).
Thanks.
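The ByteTensor error in the traceback above is what happens when a uint8 image array goes straight into a float nn.Linear layer. A minimal sketch of the conversion step, using a random array in place of an image read from disk (scaling to [0, 1] is an assumption, mirroring what torchvision's own ToTensor does):

```python
import numpy as np
import torch

# Stand-in for an image read from disk: H x W x C, dtype uint8.
image = np.random.randint(0, 256, size=(32, 32, 3), dtype=np.uint8)

tensor = torch.from_numpy(image.transpose((2, 0, 1)))  # C x H x W, still uint8 (ByteTensor)
tensor = tensor.float().div(255.0)                     # FloatTensor in [0, 1]
```

In a custom transform, this would go in place of the bare torch.from_numpy(image) call.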
Hi, @kevinzakka
I fed in my own data in MNIST format (256x256, grayscale images, 5000 images/class),
but the performance is not good.
What am I doing wrong?
Epoch: 196/500 - LR: 0.000300
0.8s - loss: 1.055 - acc: 100.000: 100%|█████████| 217/217 [00:00<00:00, 267.85it/s]
train loss: 0.834 - train acc: 62.212 - val loss: 1.192 - val acc: 54.167
Epoch: 197/500 - LR: 0.000300
0.8s - loss: -0.885 - acc: 100.000: 100%|████████| 217/217 [00:00<00:00, 273.63it/s]
train loss: 0.568 - train acc: 60.369 - val loss: 0.844 - val acc: 54.167
Epoch: 198/500 - LR: 0.000300
0.8s - loss: 0.780 - acc: 100.000: 100%|█████████| 217/217 [00:00<00:00, 270.30it/s]
train loss: 0.565 - train acc: 57.604 - val loss: 1.076 - val acc: 50.000
Epoch: 199/500 - LR: 0.000300
0.8s - loss: 3.553 - acc: 0.000: 100%|███████████| 217/217 [00:00<00:00, 271.82it/s]
train loss: 0.678 - train acc: 58.525 - val loss: 0.533 - val acc: 58.333
Epoch: 200/500 - LR: 0.000300
0.8s - loss: 0.116 - acc: 100.000: 100%|█████████| 217/217 [00:00<00:00, 272.74it/s]
train loss: 0.651 - train acc: 58.986 - val loss: 1.418 - val acc: 45.833
Epoch: 201/500 - LR: 0.000300
0.8s - loss: 5.108 - acc: 0.000: 100%|███████████| 217/217 [00:00<00:00, 275.17it/s]
train loss: 0.779 - train acc: 63.594 - val loss: 0.921 - val acc: 62.500
Epoch: 202/500 - LR: 0.000300
0.8s - loss: 1.587 - acc: 0.000: 100%|███████████| 217/217 [00:00<00:00, 270.84it/s]
train loss: 0.830 - train acc: 58.525 - val loss: 0.746 - val acc: 58.333
[!] No improvement in a while, stopping training.
Thanks.
from @bemoregt.
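One factor that may be worth checking here is how much of a 256x256 image a single glimpse can see. The sketch below is only a back-of-the-envelope calculation: the function name is hypothetical, and the values used (an 8-pixel patch, one patch per glimpse, scale factor 1) are assumptions, not settings confirmed by the report above.

```python
def glimpse_coverage(image_size, patch_size, num_patches, glimpse_scale):
    """Fraction of a square image covered by the largest patch of one glimpse.

    Assumes patch k has side patch_size * glimpse_scale**k (k = 0 .. num_patches-1).
    """
    largest = patch_size * glimpse_scale ** (num_patches - 1)
    largest = min(largest, image_size)  # a patch cannot see more than the image
    return (largest * largest) / (image_size * image_size)


# Assumed settings: 8x8 patch, a single patch per glimpse, no rescaling.
mnist_fraction = glimpse_coverage(28, 8, 1, 1)
custom_fraction = glimpse_coverage(256, 8, 1, 1)

print(f"28x28 image:   {mnist_fraction:.4%} of the pixels per glimpse")
print(f"256x256 image: {custom_fraction:.4%} of the pixels per glimpse")
```

Under these assumptions a glimpse covers roughly 8% of a 28x28 image but about 0.1% of a 256x256 one, so a larger patch size, more patches per glimpse, or downscaled inputs might be worth trying.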
Nice code. But
At the moment the location tensor l_t is never detached from the computational graph, even though it is both produced by and 'consumed' by trainable modules. As far as I understand the code, this lets gradients 'backpropagate through time' in a way that the authors of RAM did not intend: gradients that originate in the action_network and reach the fc2 layer inside the glimpse network would travel back to the previous timestep's location_network and alter its weights, stopping only once they reach the detached RNN memory vector h_t. As far as I understand, the authors intended the location_network to be trained only with reinforcement learning.
This could be a bug or an accidental improvement to the network; either way, please let me know whether my understanding is correct, as I am still learning PyTorch and my project relies heavily on your code :)
Hi, thanks for your work and for releasing the code. I have one question about training the location network with the REINFORCE algorithm. If I understand correctly,
the following part of modules.py is the implementation of REINFORCE:
# compute mean
feat = F.relu(self.fc(h_t.detach()))
mu = torch.tanh(self.fc_lt(feat))
# reparametrization trick
l_t = torch.distributions.Normal(mu, self.std).rsample()
l_t = l_t.detach()
log_pi = Normal(mu, self.std).log_prob(l_t)
and for calculating the loss_reinforce and reward, the relevant part is the following
# calculate reward
predicted = torch.max(log_probas, 1)[1]
R = (predicted.detach() == y).float()
R = R.unsqueeze(1).repeat(1, self.num_glimpses)
...
...
# compute reinforce loss
# summed over timesteps and averaged across batch
adjusted_reward = R - baselines.detach()
loss_reinforce = torch.sum(-log_pi * adjusted_reward, dim=1) # gradient ascent (negative)
loss_reinforce = torch.mean(loss_reinforce, dim=0)
My question is: how are the parameters of the fully connected layers updated if we detach all the related parameters?
I have read some examples of REINFORCE implementations, such as
the PyTorch documentation and the official PyTorch REINFORCE example;
however, I still cannot figure out how the detach function works here.
I also saw two similar issues, #29 and #20.
Any help would be appreciated, and thanks for your time!
Best wishes
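A note on the question above: in the quoted snippet, detach() is applied to h_t and to the sample l_t, but not to mu. log_pi is therefore still a differentiable function of mu, and through mu of the weights in fc and fc_lt, so the REINFORCE loss can update those layers while the sample itself is treated as a constant. The sketch below illustrates this in plain Python without PyTorch: it holds a sample fixed and checks that the Gaussian log-probability still has a non-zero gradient with respect to mu. The function names and the numbers are illustrative assumptions.

```python
import math


def log_prob(l, mu, std):
    """Log density of Normal(mu, std) evaluated at l."""
    var = std * std
    return -((l - mu) ** 2) / (2 * var) - math.log(std) - 0.5 * math.log(2 * math.pi)


def grad_mu_analytic(l, mu, std):
    """d/dmu of log_prob with the sample l held constant."""
    return (l - mu) / (std * std)


def grad_mu_numeric(l, mu, std, eps=1e-6):
    """Central finite difference in mu, again with l held constant."""
    return (log_prob(l, mu + eps, std) - log_prob(l, mu - eps, std)) / (2 * eps)


l_sample = 0.35       # plays the role of the detached sample l_t (a plain number)
mu, std = 0.10, 0.17  # illustrative values only

analytic = grad_mu_analytic(l_sample, mu, std)
numeric = grad_mu_numeric(l_sample, mu, std)
print(analytic, numeric)  # both non-zero and in close agreement
```

Because the gradient with respect to mu is non-zero, the layers that produce mu still receive an update signal. The h_t.detach() call, in contrast, stops this signal from flowing further back into the recurrent core.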
There seems to be a detach() in location_network() when obtaining mu from h_t, and the same for the baseline (value) estimation. Is this required? If so, the log_prob loss is essentially not training the RNN, only the fc layers used to compute mu.
Is this correct?
I am providing the locations of important landmarks for the image classification problem I am solving, but when these two coordinates are passed for each image, the loss becomes NaN (along with the accuracy) instead of decreasing. Is it expected that the pixel locations be normalised?
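For context, the location network quoted earlier on this page computes mu with tanh, so the locations the model produces lie roughly in [-1, 1] rather than in pixel units. The following is a minimal sketch of converting between pixel coordinates and that range. The function names are hypothetical, and the convention used (-1 maps to pixel 0, +1 maps to the image size) is an assumption that should be checked against the retina code before relying on it.

```python
def normalize(pixel, size):
    """Map a pixel coordinate in [0, size] to [-1, 1] (assumed convention)."""
    return 2.0 * pixel / size - 1.0


def denormalize(coord, size):
    """Map a coordinate in [-1, 1] back to pixel units in [0, size]."""
    return 0.5 * (coord + 1.0) * size


# A landmark at pixel (64, 192) in a 256x256 image.
landmark_px = (64, 192)
landmark_norm = tuple(normalize(p, 256) for p in landmark_px)
print(landmark_norm)                                     # (-0.5, 0.5)
print(tuple(denormalize(c, 256) for c in landmark_norm)) # (64.0, 192.0)
```

Passing raw pixel values where the model expects values in [-1, 1] could place glimpses far outside the image, which is one plausible source of unstable training.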
When running main.py, the program gives a segmentation fault within the first 3 iterations, at the line "accs.update(acc.data[0], x.size()[0])":
Epoch: 1/1500 - LR: 0.000300
0%| | 0/11044 [00:04<?, ?it/s]
Traceback (most recent call last):
File "main.py", line 49, in <module>
main(config)
File "main.py", line 41, in main
trainer.train()
File "C:\Users\apatil\pyram - exp 2\pytorch_ram\trainer.py", line 164, in train
train_loss, train_acc = self.train_one_epoch(epoch)
File "C:\Users\apatil\pyram - exp 2\pytorch_ram\trainer.py", line 241, in train_one_epoch
h_t, l_t, b_t, p = self.model(x, l_t, h_t)
File "C:\tools\Anaconda3\envs\pytorch_ram\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\Users\apatil\pyram - exp 2\pytorch_ram\model.py", line 80, in forward
h_t = self.rnn(g_t, h_t_prev)
File "C:\tools\Anaconda3\envs\pytorch_ram\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\Users\apatil\pyram - exp 2\pytorch_ram\modules.py", line 226, in forward
h1 = self.i2h(g_t)
File "C:\tools\Anaconda3\envs\pytorch_ram\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\tools\Anaconda3\envs\pytorch_ram\lib\site-packages\torch\nn\modules\linear.py", line 93, in forward
return F.linear(input, self.weight, self.bias)
File "C:\tools\Anaconda3\envs\pytorch_ram\lib\site-packages\torch\nn\functional.py", line 1690, in linear
ret = torch.addmm(bias, input, weight.t())
RuntimeError: mat1 and mat2 shapes cannot be multiplied (128x512 and 256x256)
(pytorch_ram) C:\Users\apatil\pyram - exp 2\pytorch_ram>
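Reading the last frames of this traceback: F.linear multiplies the input (128x512) by the transposed weight (256x256), so the i2h layer was built for 256 input features but received a g_t with 512. The sketch below is a small consistency check for that situation. It assumes that the glimpse feature g_t has width glimpse_hidden + loc_hidden; that assumption, the function name, and the specific values are all hypothetical, since the configuration used in the report is not shown.

```python
def check_core_input(glimpse_hidden, loc_hidden, rnn_input_size):
    """Compare the assumed width of g_t with the input size the core layer expects."""
    g_t_width = glimpse_hidden + loc_hidden
    if g_t_width != rnn_input_size:
        return f"mismatch: g_t has {g_t_width} features, core expects {rnn_input_size}"
    return "ok"


# A consistent configuration: 128 + 128 = 256.
status_consistent = check_core_input(128, 128, 256)

# One hypothetical way to end up with 512 features against a 256-wide core.
status_reported = check_core_input(256, 256, 256)

print(status_consistent)
print(status_reported)
```

If the hidden sizes of the glimpse network were changed, the core network's input size would need to change with them.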
Hi @kevinzakka, thank you for making your well-written code public. I am trying to use DRAM on 3D data (videos). Can you advise whether your code can be extended to 3D data?
use pytorch 1.0 python3.7
recurrent-visual-attention/modules.py
Line 10 in b659b6f
Just FYI, I am refactoring your code and found that the retina network can be made a little faster by padding the whole batch with enough zeros and then extracting the patches directly. You can check a working version here.
P.S. I didn't do much profiling; I just timed the first epoch several times (it is about 1.3 times faster).
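The padding idea described above can be illustrated on a single image with plain Python lists. This is only a sketch of the technique, not the linked working version: the function names are hypothetical, and a real implementation would operate on batched tensors.

```python
def pad_image(img, pad):
    """Surround a 2D list with `pad` rows and columns of zeros on every side."""
    h, w = len(img), len(img[0])
    out = [[0] * (w + 2 * pad) for _ in range(h + 2 * pad)]
    for r in range(h):
        for c in range(w):
            out[r + pad][c + pad] = img[r][c]
    return out


def extract_patch(padded, pad, center_row, center_col, size):
    """Slice a size x size patch around a centre given in unpadded coordinates.

    Requires pad >= size // 2, so the slice never leaves the padded image
    and no per-patch boundary handling is needed.
    """
    half = size // 2
    top = center_row + pad - half
    left = center_col + pad - half
    return [row[left:left + size] for row in padded[top:top + size]]


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
padded = pad_image(image, pad=1)

corner_patch = extract_patch(padded, 1, 0, 0, 2)  # overlaps the zero border
inner_patch = extract_patch(padded, 1, 1, 1, 2)   # lies fully inside the image
print(corner_patch)  # [[0, 0], [0, 1]]
print(inner_patch)   # [[1, 2], [4, 5]]
```

The image is padded once, after which every patch is a plain slice, which is where the reported speed-up would plausibly come from.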
How can I load a trained model and test it with an input image?
Hello, I am trying to use this with my custom dataset. I am using a dataloader (see #18). Even after casting my image input to float32 to get rid of that error, I get a tensor size mismatch while training the network.
Traceback (most recent call last):
File "main.py", line 49, in <module>
main(config)
File "main.py", line 40, in main
trainer.train()
File "/home/duygu/recurrent-visual-attention-master/trainer.py", line 168, in train
train_loss, train_acc = self.train_one_epoch(epoch)
File "/home/duygu/recurrent-visual-attention-master/trainer.py", line 252, in train_one_epoch
h_t, l_t, b_t, p = self.model(x, l_t, h_t)
File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 491, in __call__
result = self.forward(*input, **kwargs)
File "/home/duygu/recurrent-visual-attention-master/model.py", line 101, in forward
g_t = self.sensor(x, l_t_prev)
File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 491, in __call__
result = self.forward(*input, **kwargs)
File "/home/duygu/recurrent-visual-attention-master/modules.py", line 214, in forward
phi_out = F.relu(self.fc1(phi))
File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 491, in __call__
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/linear.py", line 55, in forward
return F.linear(input, self.weight, self.bias)
File "/usr/local/lib/python3.5/dist-packages/torch/nn/functional.py", line 992, in linear
return torch.addmm(bias, input, weight.t())
RuntimeError: size mismatch, m1: [32 x 192], m2: [64 x 128] at /pytorch/aten/src/TH/generic/THTensorMath.c:2033
I cannot figure out what is going wrong. Is it about the patches or the weights? Any insights would be really helpful. Thanks.
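One reading that is consistent with the numbers in this error: m1 is the batch of flattened glimpses (32 x 192) and m2 is the transposed fc1 weight (64 x 128), so fc1 was built for 64 input features but received 192. The arithmetic below shows that 64 matches a single 8x8 patch with one channel, while 192 matches the same patch with three channels. The formula (patches x patch area x channels) and the 8x8 patch size are assumptions about how the glimpse input is flattened, and the function name is hypothetical.

```python
def glimpse_input_features(num_patches, patch_size, num_channels):
    """Length of the flattened glimpse vector under the assumed layout."""
    return num_patches * patch_size * patch_size * num_channels


features_model_expects = glimpse_input_features(1, 8, 1)  # grayscale setting
features_data_provides = glimpse_input_features(1, 8, 3)  # RGB images

print(features_model_expects)  # 64
print(features_data_provides)  # 192
```

If this reading is right, either converting the images to a single channel or constructing the model with three input channels should make the two sizes agree.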
Hello, I read your code several days ago and have some trouble with it. I don't understand the use of .detach in your policy network, and I don't know how it works. Can you provide some ideas?
Hi, @kevinzakka
Can this code be used as an "online learning" model?
I mean that, at inference time, the model is repeatedly retrained on additional data,
so that accuracy keeps improving.
Is that possible?