
non-local_pytorch's People

Contributors

alexhex7


non-local_pytorch's Issues

Visualization of Nonlocal Map

Hi, @AlexHex7. I am confused about how to implement visualization of the non-local map.

For the non-local map of the Net.nl_2 layer, whose size is (49, 9), the code you released first reshapes each row of nl_map to (3, 3) and converts it to a (56, 56, 3) heat map matching the resized input image. It then directly adds the image and the heat_map with a weighted sum to obtain img_add as the attention visualization for the current query rect [(y0, x0), (y1, x1)].

Why can each row of nl_map represent the attention of the current query?
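
For reference, a minimal sketch of the pipeline described above, assuming nl_map is a (49, 9) NumPy array whose row i holds the attention weights of query position i over the 3x3 sub-sampled key positions, and image is the (56, 56, 3) uint8 resized input (names are illustrative, not the repo's exact variables):

import cv2
import numpy as np

def visualize_query(nl_map, query_idx, image):
    row = nl_map[query_idx].reshape(3, 3)                     # attention over key positions
    row = (row - row.min()) / (row.max() - row.min() + 1e-8)  # min-max normalize for display
    heat = cv2.resize((row * 255).astype(np.uint8), (56, 56))
    heat = cv2.applyColorMap(heat, cv2.COLORMAP_JET)          # (56, 56, 3) heat map
    return cv2.addWeighted(image, 0.5, heat, 0.5, 0)          # weighted-sum overlay (img_add)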

Works on small-scale data, ineffective on large-scale data

I use non-local for re-ID, attaching a non-local layer after each of the first three layers of ResNet-50. When training with 1000 IDs, accuracy improves by about one point and convergence is faster than before, but when I train with 5000 IDs the result is more than one point lower than the baseline. Are there some parameters I need to tune? Please advise.
PS: I did not encounter the zero W weights that other users reported; my W looks normal.

Computing Cost of Non-local block

Hi! Is this module computationally heavy? When I add it to my network, I always get CUDA out-of-memory errors; when I remove it, training runs properly. I don't know why. Thanks!
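
For reference, the block materializes a pairwise affinity matrix of shape (N, N) with N = T*H*W (or H*W in the 2D case), so memory grows quadratically with feature-map size. A back-of-the-envelope estimate, assuming float32 and illustrative sizes:

def attention_matrix_bytes(batch, h, w, t=1, bytes_per_el=4):
    # Rough size of the (N, N) affinity matrix a non-local block materializes.
    n = t * h * w
    return batch * n * n * bytes_per_el

# e.g. a 2D block on a (8, C, 64, 64) feature map:
print(attention_matrix_bytes(8, 64, 64) / 1024**3, "GiB")  # ~0.5 GiB for f alone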

multi-gpu problem

When I run with a single GPU, everything is OK. When I use multiple GPUs, something goes wrong like this:

File "/data2/gjt/pyenv/versions/anaconda3-5.2.0/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in call
result = self.forward(*input, **kwargs)
File "/data2/gjt/pyenv/versions/anaconda3-5.2.0/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 114, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/data2/gjt/pyenv/versions/anaconda3-5.2.0/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 124, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/data2/gjt/pyenv/versions/anaconda3-5.2.0/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 65, in parallel_apply
raise output
File "/data2/gjt/pyenv/versions/anaconda3-5.2.0/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 41, in _worker
output = module(*input, **kwargs)
File "/data2/gjt/pyenv/versions/anaconda3-5.2.0/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in call
result = self.forward(*input, **kwargs)
File "/data3/hooks/retinanet/model.py", line 274, in forward
x3 = self.layer3(x2)
File "/data2/gjt/pyenv/versions/anaconda3-5.2.0/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in call
result = self.forward(*input, **kwargs)
File "/data2/gjt/pyenv/versions/anaconda3-5.2.0/lib/python3.6/site-packages/torch/nn/modules/container.py", line 91, in forward
input = module(input)
File "/data2/gjt/pyenv/versions/anaconda3-5.2.0/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in call
result = self.forward(*input, **kwargs)
File "/data3/hooks/retinanet/non_local.py", line 94, in forward
output = self.operation_function(x)
File "/data3/hooks/retinanet/non_local.py", line 101, in _embedded_gaussian
g_x = self.g(x).view(batch_size, self.inter_channels, -1)
File "/data2/gjt/pyenv/versions/anaconda3-5.2.0/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in call
result = self.forward(*input, **kwargs)
File "/data2/gjt/pyenv/versions/anaconda3-5.2.0/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 301, in forward
self.padding, self.dilation, self.groups)
RuntimeError: Expected tensor for argument #1 'input' to have the same device as tensor for argument #2 'weight'; but device 1 does not equal 0 (while checking arguments for cudnn_convolution)
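
One common cause of this device mismatch under nn.DataParallel is caching a bound method in __init__ (the traceback shows output = self.operation_function(x)): each replica then calls the method bound to the original module, whose conv weights live on GPU 0, while the input sits on GPU 1. This is an assumption about this particular traceback, but dispatching inside forward avoids the pattern. A sketch, assuming the constructor saves its mode string as self.mode instead of caching self.operation_function:

def forward(self, x):
    # Dispatch by mode string inside forward, so each DataParallel replica
    # calls its own (correctly placed) submodules rather than the original's.
    if self.mode == 'embedded_gaussian':
        output = self._embedded_gaussian(x)
    elif self.mode == 'dot_product':
        output = self._dot_product(x)
    else:
        output = self._gaussian(x)
    return output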

Why init the parameters within self.W as zero?

            nn.init.constant(self.W[1].bias, 0)
        else:
            self.W = conv_nd(in_channels=self.inter_channels, out_channels=self.in_channels,
                             kernel_size=1, stride=1, padding=0)
            nn.init.constant(self.W.weight, 0)
            nn.init.constant(self.W.bias, 0)

I just cannot figure out why the weights and biases within self.W are initialized to zero.
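
The usual motivation (from the non-local paper) is that the block is residual, z = W_z(y) + x, so zero-initializing the last layer makes the whole block an identity mapping at the start of training and lets it be inserted into a pretrained network without disturbing it. A minimal check of that property, assuming the import path from this repo's layout:

import torch
from lib.non_local_embedded_gaussian import NONLocalBlock2D  # path assumed from this repo

block = NONLocalBlock2D(in_channels=32)
x = torch.randn(2, 32, 16, 16)
with torch.no_grad():
    z = block(x)
print(torch.allclose(z, x))  # True: the zero-initialized W makes the block an identity at init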

Visualization

Awesome library. I was able to train it against my own dataset after making some modifications. Are there any plans to include (or do you have something personal written up for) attention masks or other visualizations, such as the one in https://arxiv.org/pdf/1805.08318.pdf? I am trying to understand the non-local dependencies the model is forming.

Adding Non-Local to an object detection model

Hello, I would like to ask: if this method is added to an object detection model, will the detection accuracy for small objects improve?

Also, where is it usually inserted, and how?

Thanks!

About the bias of 3d convolutions in the attention block

Hello, I noticed you did not set bias=False in the 1x1 3D convolution layers, which implement

phi(x) = W_phi * x + B_phi
g(x) = W_g * x + B_g
theta(x) = W_theta * x + B_theta

I have read some materials and papers; none of them mention whether there are bias terms like B_phi, B_g, and B_theta.
I tried my implementation with bias=True, just like you did, and it did improve the performance.

I just want to ask where the idea of setting bias=True (the default) came from, in case I missed something in my reading.

question about Initialization

Thanks for providing us with this wonderful repository. I have a question, though: how should the parameters in g, theta, and phi be initialized for training? Thanks!

batch norm initialization

In non_local.py, lines 50-51:
nn.init.constant(self.W[1].weight, 0)
Why is the weight in the batch norm layer initialized to zero?
I worry that the parameters will compute the same gradients during backpropagation and undergo exactly the same updates...
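
For what it's worth, a quick sanity check that zero-initializing the BatchNorm scale does not cause symmetric updates: gamma is a per-channel scalar, and its gradient depends on each channel's normalized activations, so different channels receive different gradients even when they all start at zero. A minimal sketch:

import torch
import torch.nn as nn

bn = nn.BatchNorm2d(4)
nn.init.constant_(bn.weight, 0)   # gamma = 0, as in the repo
nn.init.constant_(bn.bias, 0)     # beta  = 0

x = torch.randn(2, 4, 8, 8)
loss = (bn(x) * torch.randn(2, 4, 8, 8)).sum()  # arbitrary downstream loss
loss.backward()
print(bn.weight.grad)             # per-channel gradients, generally all different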

Non-local model trained on Kinetics

Hi, I have trained a non-local ResNet-50 TSM model on the Kinetics dataset using your code here: https://github.com/MIT-HAN-LAB/temporal-shift-module (I have stated in my code that the implementation is from your GitHub repo). It achieves a good accuracy of 75.6% on Kinetics, which is even higher than the Non-local ResNet-50 I3D reported in the paper.

Since you do not include a License file in the repo, I want to make sure you are OK with my using your code. You could also consider adding a reference to my repo to show the results on Kinetics, if that helps :).

problem of function W initialization

I think you want to initialize self.W to zero so that the residual path does not affect the pre-trained model, but I cannot figure out why you initialize self.W[1] rather than self.W[0] when using a BN layer.


code problem

When subsample is set to True, why doesn't self.theta get a max-pool layer as well?
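
For context, only phi and g are sub-sampled because they produce the keys and values; theta produces the queries, and pooling it would shrink the number of output positions so the block could no longer be residual. A shape sketch of the 2D case with a pool factor of 2 (names are illustrative, not the repo's exact variables):

import torch

B, C, H, W = 2, 16, 8, 8
N, M = H * W, (H // 2) * (W // 2)          # full vs. sub-sampled position counts

theta = torch.randn(B, N, C)               # queries: one per input position
phi = torch.randn(B, C, M)                 # keys:   sub-sampled
g = torch.randn(B, M, C)                   # values: sub-sampled

attn = torch.softmax(theta @ phi, dim=-1)  # (B, N, M): each query attends to M keys
y = attn @ g                               # (B, N, C): still N output positions
print(y.shape)                             # torch.Size([2, 64, 16])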

When running step 3, $ CUDA_VISIBLE_DEVICES=0,1 python nl_map_save.py raises the following error. How should I fix it?

Traceback (most recent call last):
File "nl_map_save.py", line 20, in <module>
net.load_state_dict(torch.load('weights/net.pth'))
File "/home/user/anaconda3/envs/TEST2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1045, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for Network:
Missing key(s) in state_dict: "conv_1.0.weight", "conv_1.0.bias", "conv_1.1.weight", "conv_1.1.bias", "conv_1.1.running_mean", "conv_1.1.running_var", "nl_1.g.0.weight", "nl_1.g.0.bias", "nl_1.W.0.weight", "nl_1.W.0.bias", "nl_1.W.1.weight", "nl_1.W.1.bias", "nl_1.W.1.running_mean", "nl_1.W.1.running_var", "nl_1.theta.weight", "nl_1.theta.bias", "nl_1.phi.0.weight", "nl_1.phi.0.bias", "conv_2.0.weight", "conv_2.0.bias", "conv_2.1.weight", "conv_2.1.bias", "conv_2.1.running_mean", "conv_2.1.running_var", "nl_2.g.0.weight", "nl_2.g.0.bias", "nl_2.W.0.weight", "nl_2.W.0.bias", "nl_2.W.1.weight", "nl_2.W.1.bias", "nl_2.W.1.running_mean", "nl_2.W.1.running_var", "nl_2.theta.weight", "nl_2.theta.bias", "nl_2.phi.0.weight", "nl_2.phi.0.bias", "conv_3.0.weight", "conv_3.0.bias", "conv_3.1.weight", "conv_3.1.bias", "conv_3.1.running_mean", "conv_3.1.running_var", "fc.0.weight", "fc.0.bias", "fc.3.weight", "fc.3.bias".
Unexpected key(s) in state_dict: "module.conv_1.0.weight", "module.conv_1.0.bias", "module.conv_1.1.weight", "module.conv_1.1.bias", "module.conv_1.1.running_mean", "module.conv_1.1.running_var", "module.conv_1.1.num_batches_tracked", "module.nl_1.g.0.weight", "module.nl_1.g.0.bias", "module.nl_1.W.0.weight", "module.nl_1.W.0.bias", "module.nl_1.W.1.weight", "module.nl_1.W.1.bias", "module.nl_1.W.1.running_mean", "module.nl_1.W.1.running_var", "module.nl_1.W.1.num_batches_tracked", "module.nl_1.theta.weight", "module.nl_1.theta.bias", "module.nl_1.phi.0.weight", "module.nl_1.phi.0.bias", "module.conv_2.0.weight", "module.conv_2.0.bias", "module.conv_2.1.weight", "module.conv_2.1.bias", "module.conv_2.1.running_mean", "module.conv_2.1.running_var", "module.conv_2.1.num_batches_tracked", "module.nl_2.g.0.weight", "module.nl_2.g.0.bias", "module.nl_2.W.0.weight", "module.nl_2.W.0.bias", "module.nl_2.W.1.weight", "module.nl_2.W.1.bias", "module.nl_2.W.1.running_mean", "module.nl_2.W.1.running_var", "module.nl_2.W.1.num_batches_tracked", "module.nl_2.theta.weight", "module.nl_2.theta.bias", "module.nl_2.phi.0.weight", "module.nl_2.phi.0.bias", "module.conv_3.0.weight", "module.conv_3.0.bias", "module.conv_3.1.weight", "module.conv_3.1.bias", "module.conv_3.1.running_mean", "module.conv_3.1.running_var", "module.conv_3.1.num_batches_tracked", "module.fc.0.weight", "module.fc.0.bias", "module.fc.3.weight", "module.fc.3.bias".
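
This is the usual nn.DataParallel checkpoint mismatch: the weights were saved from a wrapped model, so every key carries a module. prefix (the checkpoint also contains num_batches_tracked buffers the loading model lacks, a PyTorch version difference). A common fix, sketched under those assumptions, is to strip the prefix before loading:

import torch

state = torch.load('weights/net.pth')
state = {k[len('module.'):] if k.startswith('module.') else k: v
         for k, v in state.items()}
net.load_state_dict(state, strict=False)  # strict=False tolerates the extra
                                          # num_batches_tracked buffers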

net.train() again in demo_MNIST_train.py file

Hi, I started learning PyTorch a few months ago,
so I might be asking about basic stuff.

In the [Non-local_pytorch/demo_MNIST_train.py](url) file,
net.train() appears on line 50, within the training loop.
After the testing loop, net.train() is called once more on line 70.
I know that switching the network back into training mode after the testing loop is important,
but why is the net.train() on line 70 necessary when it would be called at the beginning of the training loop anyway?

Great job

The code is clear and easy to follow. Thanks to the author for open-sourcing it; great job!

Should γ be 1 and β be 0?

If you want the output of the non-local block to equal the input, I think γ and β should be 1 and 0 respectively, rather than both 0. Then output = γ * input + β = input. What do you think?
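
A small check of the two initializations, assuming the block computes z = BN(conv(y)) + x as in this repo: with γ = β = 0 the BN output is zero regardless of the conv, so z = x and the whole block is an identity at init; γ = 1, β = 0 would instead pass the randomly initialized residual branch through.

import torch
import torch.nn as nn

conv = nn.Conv2d(8, 8, kernel_size=1)
bn = nn.BatchNorm2d(8)
nn.init.constant_(bn.weight, 0)   # gamma = 0
nn.init.constant_(bn.bias, 0)     # beta  = 0

x = torch.randn(2, 8, 4, 4)
z = bn(conv(x)) + x               # residual form of the non-local block's output
print(torch.allclose(z, x))       # True: the branch contributes nothing at init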

Understanding the heat map

Hello, I would also like to ask how to interpret the heat map. I have read the paper but still don't quite understand it. I hope you can explain.

About GPU memory usage

If non-local is applied to a low-level feature map, CUDA out of memory occurs. Is this due to the amount of memory required to compute the attention matrix?
Looking forward to your reply.

AttributeError: 'NoneType' object has no attribute 'fill_'

I want to embed this non-local code in my framework, but when I run my code, the following error occurs:

File "/home/Global/models/non_local_embeded_gaussian.py", line 102, in init
super(NONLocalBlock2D, self).init(in_channels,
File "/home/Global/models/non_local_embeded_gaussian.py", line 54, in init
nn.init.constant_(self.W[1].weight, 0)
File "/home/anaconda3/envs/torch/lib/python3.8/site-packages/torch/nn/init.py", line 176, in constant_
return no_grad_fill(tensor, val)
File "/home/anaconda3/envs/torch/lib/python3.8/site-packages/torch/nn/init.py", line 59, in no_grad_fill
return tensor.fill_(val)
AttributeError: 'NoneType' object has no attribute 'fill_'

In my environment: Python 3.8, torch 1.8.0, CUDA 11.2.
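
One likely cause, given the traceback, is that self.W[1] (the normalization layer) was constructed without affine parameters, so its .weight is None. This is an assumption about the failure mode, not a confirmed repo bug, but guarding the zero initialization inside the block's __init__ would sidestep it:

if self.W[1].weight is not None:          # affine=False norm layers have no weight/bias
    nn.init.constant_(self.W[1].weight, 0)
if self.W[1].bias is not None:
    nn.init.constant_(self.W[1].bias, 0)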

bias=False before BatchNorm layer

Thanks very much for your contribution.

In this line:

I think the conv layer should have bias=False, since it comes before the batch norm layer.

Please tell me if I am missing something.
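
For reference, a conv followed by BatchNorm can safely drop its bias: BN subtracts the per-channel mean and re-adds its own β, which makes the conv bias redundant. A minimal sketch of the pattern, with illustrative channel sizes:

import torch.nn as nn

in_channels, inter_channels = 64, 32  # illustrative sizes
W = nn.Sequential(
    nn.Conv2d(inter_channels, in_channels, kernel_size=1, bias=False),  # bias is redundant here
    nn.BatchNorm2d(in_channels),      # BN's own beta plays the role of the bias
)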

implement in other network

So if I want to add a non-local block to my own experiment,
can I just insert NONLocalBlock2D() somewhere in the network, without needing to change any other parameters?
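
That is the intended usage: the block is residual and shape-preserving, so it can be dropped between layers as long as in_channels matches the incoming feature map. A minimal sketch (the import path is assumed from this repo's layout):

import torch.nn as nn
from lib.non_local_embedded_gaussian import NONLocalBlock2D  # path assumed

class MyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            NONLocalBlock2D(in_channels=64),   # output shape == input shape
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        )

    def forward(self, x):
        return self.features(x)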

About the effectiveness of this code

Hello, I would like to ask whether you have tested this code on any dataset. Does the final result actually work?

Cuda Out of memory problem

if torch.cuda.is_available():
    net.cuda()
should be
if torch.cuda.is_available():
    net.cuda(cfg.cuda_num) ?

When I run demo_MNIST.py, it shows:
RuntimeError: cuda runtime error (2) : out of memory at
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1524577177097/work/aten/src/THC/generic/THCStorage.cu line=58 error=2 : out of memory
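
If the intent of the suggestion above is to pin everything to one GPU, a hedged variant (cfg.cuda_num is taken from the snippet above; net is the model being trained) that also keeps the inputs on the same device:

import torch

device = torch.device('cuda:%d' % cfg.cuda_num if torch.cuda.is_available() else 'cpu')
net = net.to(device)
# inputs and labels must be moved too, e.g. img = img.to(device)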

How should I use the non-local block in the segmentation task?

I tried to add the non-local block to a segmentation task as follows, but there seems to be no improvement. Can you give me some advice? Thank you!
(layer1): Sequential(
  (0): NONLocalBlock2D(
    (g): Sequential(
      (0): Conv2d(128, 64, kernel_size=(1, 1), stride=(1, 1))
      (1): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), padding=0, dilation=1, ceil_mode=False)
    )
    (W): Sequential(
      (0): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1))
      (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (theta): Conv2d(128, 64, kernel_size=(1, 1), stride=(1, 1))
    (phi): Sequential(
      (0): Conv2d(128, 64, kernel_size=(1, 1), stride=(1, 1))
      (1): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), padding=0, dilation=1, ceil_mode=False)
    )
  )
  (1): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(128, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
      (downsample): Sequential(
        (0): Conv2d(128, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): Bottleneck(
      (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
    )
    (2): Bottleneck(
      (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
    )
  )
)

Heat map visualization with the dot-product version

Hello, in the dot-product version of non_local I apply a softmax to f:
f = torch.matmul(theta_x, phi_x)
N = f.size(-1)
f_div_C = f / N
sft = nn.Softmax(dim=2)
f_div_C = sft(f_div_C)
Do I still need to normalize NL_MAP for visualization? If so, how should I normalize it?
Right now my heat map shows only a single box, with none of the usual color variation. What could be the reason?
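
One common approach, assuming nl_map holds the (N, N) attention for one sample and h, w are the key-map dimensions: min-max normalize each query's row independently before applying a color map, since dot-product attention rows are often nearly uniform and render as a single flat color otherwise. A sketch:

row = nl_map[query_idx]                                     # attention of one query position
row = (row - row.min()) / (row.max() - row.min() + 1e-8)    # per-row min-max for display
heat = (row.reshape(h, w) * 255).byte().cpu().numpy()       # ready for cv2.applyColorMap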

What is the pattern of Nonlocal map?

When using the non-local block as self-attention, what pattern should the map have? Many results of my experiments are vertical or heterogeneous, and the non-local blocks don't work as expected.

accuracy of the test

Hi, thank you for your code; I think it is great. But I have a problem: when I delete the non-local module, the test accuracy goes higher. I'll test again. Have you ever encountered that?

Parameter initialization problem

Hello, I trained following your method with W initialized to zero, but after training completes W is still zero, in which case the non-local block has no effect at all. I hope you can explain. Thank you.

Resnet version

Hi,
Did you try adding the non-local block into a ResNet model?
