
non-local_pytorch's People

Contributors

alexhex7


non-local_pytorch's Issues

Visualization of Nonlocal Map

Hi, @AlexHex7. I am confused about how to implement visualization of the non-local map.

For the non-local map of the Net.nl_2 layer, whose size is (49, 9), the code you released first reshapes each row of nl_map to (3, 3) and converts it to a (56, 56, 3) heat map matching the resized input image. It then directly adds the image and the heat_map with a weighted sum to obtain img_add as the attention visualization for the current query rect [(y0, x0), (y1, x1)].

Why can each row of nl_map represent the attention of the current query?
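
For reference, a minimal sketch of the pipeline described above, assuming nl_map is a (49, 9) NumPy array whose row i holds the attention weights of query position i over the 3x3 sub-sampled key positions, and image is the (56, 56, 3) uint8 resized input (names are illustrative, not the repo's exact variables):

import cv2
import numpy as np

def visualize_query(nl_map, query_idx, image):
    row = nl_map[query_idx].reshape(3, 3)                     # attention over key positions
    row = (row - row.min()) / (row.max() - row.min() + 1e-8)  # min-max normalize for display
    heat = cv2.resize((row * 255).astype(np.uint8), (56, 56))
    heat = cv2.applyColorMap(heat, cv2.COLORMAP_JET)          # (56, 56, 3) heat map
    return cv2.addWeighted(image, 0.5, heat, 0.5, 0)          # weighted-sum overlay (img_add)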

Works on small-scale data, ineffective on large-scale data

I use non-local for re-ID, attaching a non-local layer after each of the first three layers of ResNet-50. When training with 1000 IDs, accuracy improves by about one point and convergence is faster than before, but when I train with 5000 IDs the result is more than one point lower than the baseline. Are there some parameters I need to tune? Please advise.
PS: I did not encounter the zero W weights that other users reported; my W looks normal.

Computing Cost of Non-local block

Hi! Is this module computationally heavy? When I add it to my network, I always get CUDA out-of-memory errors; when I remove it, training runs properly. I don't know why. Thanks!
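
For reference, the block materializes a pairwise affinity matrix of shape (N, N) with N = T*H*W (or H*W in the 2D case), so memory grows quadratically with feature-map size. A back-of-the-envelope estimate, assuming float32 and illustrative sizes:

def attention_matrix_bytes(batch, h, w, t=1, bytes_per_el=4):
    # Rough size of the (N, N) affinity matrix a non-local block materializes.
    n = t * h * w
    return batch * n * n * bytes_per_el

# e.g. a 2D block on a (8, C, 64, 64) feature map:
print(attention_matrix_bytes(8, 64, 64) / 1024**3, "GiB")  # ~0.5 GiB for f alone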

multi-gpu problem

When I run with a single GPU, everything is OK. When I use multiple GPUs, something goes wrong like this:

File "/data2/gjt/pyenv/versions/anaconda3-5.2.0/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in call
result = self.forward(*input, **kwargs)
File "/data2/gjt/pyenv/versions/anaconda3-5.2.0/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 114, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/data2/gjt/pyenv/versions/anaconda3-5.2.0/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 124, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/data2/gjt/pyenv/versions/anaconda3-5.2.0/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 65, in parallel_apply
raise output
File "/data2/gjt/pyenv/versions/anaconda3-5.2.0/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 41, in _worker
output = module(*input, **kwargs)
File "/data2/gjt/pyenv/versions/anaconda3-5.2.0/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in call
result = self.forward(*input, **kwargs)
File "/data3/hooks/retinanet/model.py", line 274, in forward
x3 = self.layer3(x2)
File "/data2/gjt/pyenv/versions/anaconda3-5.2.0/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in call
result = self.forward(*input, **kwargs)
File "/data2/gjt/pyenv/versions/anaconda3-5.2.0/lib/python3.6/site-packages/torch/nn/modules/container.py", line 91, in forward
input = module(input)
File "/data2/gjt/pyenv/versions/anaconda3-5.2.0/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in call
result = self.forward(*input, **kwargs)
File "/data3/hooks/retinanet/non_local.py", line 94, in forward
output = self.operation_function(x)
File "/data3/hooks/retinanet/non_local.py", line 101, in _embedded_gaussian
g_x = self.g(x).view(batch_size, self.inter_channels, -1)
File "/data2/gjt/pyenv/versions/anaconda3-5.2.0/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in call
result = self.forward(*input, **kwargs)
File "/data2/gjt/pyenv/versions/anaconda3-5.2.0/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 301, in forward
self.padding, self.dilation, self.groups)
RuntimeError: Expected tensor for argument #1 'input' to have the same device as tensor for argument #2 'weight'; but device 1 does not equal 0 (while checking arguments for cudnn_convolution)
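
One common cause of this device mismatch under nn.DataParallel is caching a bound method in __init__ (the traceback shows output = self.operation_function(x)): each replica then calls the method bound to the original module, whose conv weights live on GPU 0, while the input sits on GPU 1. This is an assumption about this particular traceback, but dispatching inside forward avoids the pattern. A sketch, assuming the constructor saves its mode string as self.mode instead of caching self.operation_function:

def forward(self, x):
    # Dispatch by mode string inside forward, so each DataParallel replica
    # calls its own (correctly placed) submodules rather than the original's.
    if self.mode == 'embedded_gaussian':
        output = self._embedded_gaussian(x)
    elif self.mode == 'dot_product':
        output = self._dot_product(x)
    else:
        output = self._gaussian(x)
    return output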

Why init the parameters within self.W as zero?

            nn.init.constant(self.W[1].bias, 0)
        else:
            self.W = conv_nd(in_channels=self.inter_channels, out_channels=self.in_channels,
                             kernel_size=1, stride=1, padding=0)
            nn.init.constant(self.W.weight, 0)
            nn.init.constant(self.W.bias, 0)

I just cannot figure out why the weights and biases within self.W are initialized to zero.
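
The usual motivation (from the non-local paper) is that the block is residual, z = W_z(y) + x, so zero-initializing the last layer makes the whole block an identity mapping at the start of training and lets it be inserted into a pretrained network without disturbing it. A minimal check of that property, assuming the import path from this repo's layout:

import torch
from lib.non_local_embedded_gaussian import NONLocalBlock2D  # path assumed from this repo

block = NONLocalBlock2D(in_channels=32)
x = torch.randn(2, 32, 16, 16)
with torch.no_grad():
    z = block(x)
print(torch.allclose(z, x))  # True: the zero-initialized W makes the block an identity at init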

Visualization

Awesome library. I was able to train it against my own dataset after making some modifications. Are there any plans to include (or do you have something personal written up for) attention masks or other visualizations, such as the one in https://arxiv.org/pdf/1805.08318.pdf? I am trying to understand the non-local dependencies the model is forming.

Adding Non-Local to an object detection model

Hello, I would like to ask: if this method is added to an object detection model, will the detection accuracy for small objects improve?

Also, where is it usually inserted, and how?

Thanks!

About the bias of 3d convolutions in the attention block

Hello, I noticed you did not set bias=False in the 1x1 3D convolution layers, which implement

phi(x) = W_phi * x + B_phi
g(x) = W_g * x + B_g
theta(x) = W_theta * x + B_theta

I have read some materials and papers; none of them mention whether there are bias terms like B_phi, B_g, and B_theta.
I tried my implementation with bias=True, just like you did, and it did improve the performance.

I just want to ask where the idea of setting bias=True (the default) came from, in case I missed something in my reading.

question about Initialization

Thanks for providing us with this wonderful repository. I have a question, though: how should the parameters in g, theta, and phi be initialized for training? Thanks!

batch norm initialization

In non_local.py, lines 50-51:
nn.init.constant(self.W[1].weight, 0)
Why is the weight in the batch norm layer initialized to zero?
I worry that the parameters will compute the same gradients during backpropagation and undergo exactly the same updates...
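
For what it's worth, a quick sanity check that zero-initializing the BatchNorm scale does not cause symmetric updates: gamma is a per-channel scalar, and its gradient depends on each channel's normalized activations, so different channels receive different gradients even when they all start at zero. A minimal sketch:

import torch
import torch.nn as nn

bn = nn.BatchNorm2d(4)
nn.init.constant_(bn.weight, 0)   # gamma = 0, as in the repo
nn.init.constant_(bn.bias, 0)     # beta  = 0

x = torch.randn(2, 4, 8, 8)
loss = (bn(x) * torch.randn(2, 4, 8, 8)).sum()  # arbitrary downstream loss
loss.backward()
print(bn.weight.grad)             # per-channel gradients, generally all different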

Non-local model trained on Kinetics

Hi, I have trained a non-local ResNet-50 TSM model on the Kinetics dataset using your code here: https://github.com/MIT-HAN-LAB/temporal-shift-module (I have stated in my code that the implementation is from your GitHub repo). It achieves a good accuracy of 75.6% on Kinetics, which is even higher than the Non-local ResNet-50 I3D reported in the paper.

Since you do not include a License file in the repo, I want to make sure you are OK with my using your code. You could also consider adding a reference to my repo to show the results on Kinetics, if that helps :).

problem of function W initialization

I think you want to initialize self.W to zero so that the residual path does not affect the pre-trained model, but I cannot figure out why you initialize self.W[1] rather than self.W[0] when using a BN layer.


code problem

When subsample is set to True, why doesn't self.theta get a max-pool layer as well?
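
For context, only phi and g are sub-sampled because they produce the keys and values; theta produces the queries, and pooling it would shrink the number of output positions so the block could no longer be residual. A shape sketch of the 2D case with a pool factor of 2 (names are illustrative, not the repo's exact variables):

import torch

B, C, H, W = 2, 16, 8, 8
N, M = H * W, (H // 2) * (W // 2)          # full vs. sub-sampled position counts

theta = torch.randn(B, N, C)               # queries: one per input position
phi = torch.randn(B, C, M)                 # keys:   sub-sampled
g = torch.randn(B, M, C)                   # values: sub-sampled

attn = torch.softmax(theta @ phi, dim=-1)  # (B, N, M): each query attends to M keys
y = attn @ g                               # (B, N, C): still N output positions
print(y.shape)                             # torch.Size([2, 64, 16])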

When running step 3, $ CUDA_VISIBLE_DEVICES=0,1 python nl_map_save.py raises the following error. How should I fix it?

Traceback (most recent call last):
File "nl_map_save.py", line 20, in <module>
net.load_state_dict(torch.load('weights/net.pth'))
File "/home/user/anaconda3/envs/TEST2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1045, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for Network:
Missing key(s) in state_dict: "conv_1.0.weight", "conv_1.0.bias", "conv_1.1.weight", "conv_1.1.bias", "conv_1.1.running_mean", "conv_1.1.running_var", "nl_1.g.0.weight", "nl_1.g.0.bias", "nl_1.W.0.weight", "nl_1.W.0.bias", "nl_1.W.1.weight", "nl_1.W.1.bias", "nl_1.W.1.running_mean", "nl_1.W.1.running_var", "nl_1.theta.weight", "nl_1.theta.bias", "nl_1.phi.0.weight", "nl_1.phi.0.bias", "conv_2.0.weight", "conv_2.0.bias", "conv_2.1.weight", "conv_2.1.bias", "conv_2.1.running_mean", "conv_2.1.running_var", "nl_2.g.0.weight", "nl_2.g.0.bias", "nl_2.W.0.weight", "nl_2.W.0.bias", "nl_2.W.1.weight", "nl_2.W.1.bias", "nl_2.W.1.running_mean", "nl_2.W.1.running_var", "nl_2.theta.weight", "nl_2.theta.bias", "nl_2.phi.0.weight", "nl_2.phi.0.bias", "conv_3.0.weight", "conv_3.0.bias", "conv_3.1.weight", "conv_3.1.bias", "conv_3.1.running_mean", "conv_3.1.running_var", "fc.0.weight", "fc.0.bias", "fc.3.weight", "fc.3.bias".
Unexpected key(s) in state_dict: "module.conv_1.0.weight", "module.conv_1.0.bias", "module.conv_1.1.weight", "module.conv_1.1.bias", "module.conv_1.1.running_mean", "module.conv_1.1.running_var", "module.conv_1.1.num_batches_tracked", "module.nl_1.g.0.weight", "module.nl_1.g.0.bias", "module.nl_1.W.0.weight", "module.nl_1.W.0.bias", "module.nl_1.W.1.weight", "module.nl_1.W.1.bias", "module.nl_1.W.1.running_mean", "module.nl_1.W.1.running_var", "module.nl_1.W.1.num_batches_tracked", "module.nl_1.theta.weight", "module.nl_1.theta.bias", "module.nl_1.phi.0.weight", "module.nl_1.phi.0.bias", "module.conv_2.0.weight", "module.conv_2.0.bias", "module.conv_2.1.weight", "module.conv_2.1.bias", "module.conv_2.1.running_mean", "module.conv_2.1.running_var", "module.conv_2.1.num_batches_tracked", "module.nl_2.g.0.weight", "module.nl_2.g.0.bias", "module.nl_2.W.0.weight", "module.nl_2.W.0.bias", "module.nl_2.W.1.weight", "module.nl_2.W.1.bias", "module.nl_2.W.1.running_mean", "module.nl_2.W.1.running_var", "module.nl_2.W.1.num_batches_tracked", "module.nl_2.theta.weight", "module.nl_2.theta.bias", "module.nl_2.phi.0.weight", "module.nl_2.phi.0.bias", "module.conv_3.0.weight", "module.conv_3.0.bias", "module.conv_3.1.weight", "module.conv_3.1.bias", "module.conv_3.1.running_mean", "module.conv_3.1.running_var", "module.conv_3.1.num_batches_tracked", "module.fc.0.weight", "module.fc.0.bias", "module.fc.3.weight", "module.fc.3.bias".
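
This is the usual nn.DataParallel checkpoint mismatch: the weights were saved from a wrapped model, so every key carries a module. prefix (the checkpoint also contains num_batches_tracked buffers the loading model lacks, a PyTorch version difference). A common fix, sketched under those assumptions, is to strip the prefix before loading:

import torch

state = torch.load('weights/net.pth')
state = {k[len('module.'):] if k.startswith('module.') else k: v
         for k, v in state.items()}
net.load_state_dict(state, strict=False)  # strict=False tolerates the extra
                                          # num_batches_tracked buffers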

net.train() again in demo_MNIST_train.py file

Hi, I started learning PyTorch a few months ago,
so I might be asking about basic stuff.

In the [Non-local_pytorch/demo_MNIST_train.py](url) file,
net.train() appears on line 50, within the training loop.
After the testing loop, net.train() is called once more on line 70.
I know that switching the network back into training mode after the testing loop is important,
but why is the net.train() on line 70 necessary when it would be called at the beginning of the training loop anyway?

Great job

The code is clear and easy to follow. Thanks to the author for open-sourcing it; great job!

Should γ be 1 and β be 0?

If you want the output of the non-local block to equal the input, I think γ and β should be 1 and 0 respectively, rather than both 0. Then output = γ * input + β = input. What do you think?
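
A small check of the two initializations, assuming the block computes z = BN(conv(y)) + x as in this repo: with γ = β = 0 the BN output is zero regardless of the conv, so z = x and the whole block is an identity at init; γ = 1, β = 0 would instead pass the randomly initialized residual branch through.

import torch
import torch.nn as nn

conv = nn.Conv2d(8, 8, kernel_size=1)
bn = nn.BatchNorm2d(8)
nn.init.constant_(bn.weight, 0)   # gamma = 0
nn.init.constant_(bn.bias, 0)     # beta  = 0

x = torch.randn(2, 8, 4, 4)
z = bn(conv(x)) + x               # residual form of the non-local block's output
print(torch.allclose(z, x))       # True: the branch contributes nothing at init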

Understanding the heat map

Hello, I would also like to ask how to interpret the heat map. I have read the paper but still don't quite understand it. I hope you can explain.

About GPU memory usage

If non-local is applied to a low-level feature map, CUDA out of memory occurs. Is this due to the amount of memory required to compute the attention matrix?
Looking forward to your reply.

AttributeError: 'NoneType' object has no attribute 'fill_'

I want to embed this non-local code in my framework, but when I run my code, the following error occurs:

File "/home/Global/models/non_local_embeded_gaussian.py", line 102, in init
super(NONLocalBlock2D, self).init(in_channels,
File "/home/Global/models/non_local_embeded_gaussian.py", line 54, in init
nn.init.constant_(self.W[1].weight, 0)
File "/home/anaconda3/envs/torch/lib/python3.8/site-packages/torch/nn/init.py", line 176, in constant_
return no_grad_fill(tensor, val)
File "/home/anaconda3/envs/torch/lib/python3.8/site-packages/torch/nn/init.py", line 59, in no_grad_fill
return tensor.fill_(val)
AttributeError: 'NoneType' object has no attribute 'fill_'

In my environment: Python 3.8, torch 1.8.0, CUDA 11.2.
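
One likely cause, given the traceback, is that self.W[1] (the normalization layer) was constructed without affine parameters, so its .weight is None. This is an assumption about the failure mode, not a confirmed repo bug, but guarding the zero initialization inside the block's __init__ would sidestep it:

if self.W[1].weight is not None:          # affine=False norm layers have no weight/bias
    nn.init.constant_(self.W[1].weight, 0)
if self.W[1].bias is not None:
    nn.init.constant_(self.W[1].bias, 0)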

bias=False before BatchNorm layer

Thanks very much for your contribution.

In this line:

I think the conv layer should have bias=False, since it comes before the batch norm layer.

Please tell me if I am missing something.
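
For reference, a conv followed by BatchNorm can safely drop its bias: BN subtracts the per-channel mean and re-adds its own β, which makes the conv bias redundant. A minimal sketch of the pattern, with illustrative channel sizes:

import torch.nn as nn

in_channels, inter_channels = 64, 32  # illustrative sizes
W = nn.Sequential(
    nn.Conv2d(inter_channels, in_channels, kernel_size=1, bias=False),  # bias is redundant here
    nn.BatchNorm2d(in_channels),      # BN's own beta plays the role of the bias
)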

implement in other network

So if I want to add a non-local block to my own experiment,
can I just insert NONLocalBlock2D() somewhere in the network, without needing to change any other parameters?
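
That is the intended usage: the block is residual and shape-preserving, so it can be dropped between layers as long as in_channels matches the incoming feature map. A minimal sketch (the import path is assumed from this repo's layout):

import torch.nn as nn
from lib.non_local_embedded_gaussian import NONLocalBlock2D  # path assumed

class MyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            NONLocalBlock2D(in_channels=64),   # output shape == input shape
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        )

    def forward(self, x):
        return self.features(x)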

About the effectiveness of this code

Hello, I would like to ask whether you have tested this code on any dataset. Does the final result actually work?

Cuda Out of memory problem

if torch.cuda.is_available():
    net.cuda()
should be
if torch.cuda.is_available():
    net.cuda(cfg.cuda_num) ?

When I run demo_MNIST.py, it shows:
RuntimeError: cuda runtime error (2) : out of memory at
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1524577177097/work/aten/src/THC/generic/THCStorage.cu line=58 error=2 : out of memory
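
If the intent of the suggestion above is to pin everything to one GPU, a hedged variant (cfg.cuda_num is taken from the snippet above; net is the model being trained) that also keeps the inputs on the same device:

import torch

device = torch.device('cuda:%d' % cfg.cuda_num if torch.cuda.is_available() else 'cpu')
net = net.to(device)
# inputs and labels must be moved too, e.g. img = img.to(device)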

How should I use the non-local block in the segmentation task?

I tried to add the non-local block to a segmentation task as follows, but there seems to be no improvement. Can you give me some advice? Thank you!
(layer1): Sequential(
  (0): NONLocalBlock2D(
    (g): Sequential(
      (0): Conv2d(128, 64, kernel_size=(1, 1), stride=(1, 1))
      (1): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), padding=0, dilation=1, ceil_mode=False)
    )
    (W): Sequential(
      (0): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1))
      (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (theta): Conv2d(128, 64, kernel_size=(1, 1), stride=(1, 1))
    (phi): Sequential(
      (0): Conv2d(128, 64, kernel_size=(1, 1), stride=(1, 1))
      (1): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), padding=0, dilation=1, ceil_mode=False)
    )
  )
  (1): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(128, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
      (downsample): Sequential(
        (0): Conv2d(128, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): Bottleneck(
      (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
    )
    (2): Bottleneck(
      (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
    )
  )
)

Heat map visualization with the dot-product version

Hello, in the dot-product version of non_local I apply a softmax to f:
f = torch.matmul(theta_x, phi_x)
N = f.size(-1)
f_div_C = f / N
sft = nn.Softmax(dim=2)
f_div_C = sft(f_div_C)
Do I still need to normalize NL_MAP for visualization? If so, how should I normalize it?
Right now my heat map shows only a single box, with none of the usual color variation. What could be the reason?
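
One common approach, assuming nl_map holds the (N, N) attention for one sample and h, w are the key-map dimensions: min-max normalize each query's row independently before applying a color map, since dot-product attention rows are often nearly uniform and render as a single flat color otherwise. A sketch:

row = nl_map[query_idx]                                     # attention of one query position
row = (row - row.min()) / (row.max() - row.min() + 1e-8)    # per-row min-max for display
heat = (row.reshape(h, w) * 255).byte().cpu().numpy()       # ready for cv2.applyColorMap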

What is the pattern of Nonlocal map?

When using the non-local block as self-attention, what pattern should the map have? Many results of my experiments are vertical or heterogeneous, and the non-local blocks don't work as expected.

accuracy of the test

Hi, thank you for your code; I think it is great. But I have a problem: when I delete the non-local module, the test accuracy goes higher. I'll test again. Have you ever encountered that?

Parameter initialization problem

Hello, I trained following your method with W initialized to zero, but after training completes W is still zero, in which case the non-local block has no effect at all. I hope you can explain. Thank you.

Resnet version

Hi,
Did you try adding the non-local block into a ResNet model?
