Comments (8)

FlotingDream commented on August 14, 2024

😬hiii

  1. There is no pooling operation in MHDLSA, since pooling would lose information for SR tasks.
  2. To keep the parameter count comparable to MHSA, the squeeze factor is 4.
  3. The kernel size is 7, compared to the window size of 8 in MHSA.
  4. The ffn_expansion_factor is 2, following SwinIR.
  5. The kernel size of TLC is set to 48 (the training patch size) for the x4 upscale factor.
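
Gathered in one place, those settings look roughly like this (a sketch only; the dict and its key names are illustrative, not the official archs file):

    # Hypothetical summary of the settings above, not official DLGSANet code.
    mhdlsa_settings = {
        "pooling": None,             # 1. no pooling: it would lose information for SR
        "squeeze_factor": 4,         # 2. keeps parameters comparable to MHSA
        "kernel_size": 7,            # 3. vs. the window size of 8 in MHSA
        "ffn_expansion_factor": 2,   # 4. following SwinIR
        "tlc_kernel_size": 48,       # 5. the training patch size, for x4 upscaling
    }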

Tips: We reconstructed https://github.com/Atten4Vis/DemystifyLocalViT/blob/master/models/dwnet.py into MHDLSA, as described in the paper. We are grateful for the excellent work at https://github.com/Atten4Vis/DemystifyLocalViT/

🙏: We apologize for not extending our work to an ONNX model for mobile devices.

TODO: We will soon release the archs file with more details about the network architecture.

We admit that our work advanced significantly because we worked out several details during the network structure design phase. We hope, however, that this information helps you apply the methods we proposed.

Thanks for your quick reply.

  1. With no pooling, don't the dimensions mismatch? How does that work? Does the Reshape() in Eq. 1 collapse H*W to 1? How? And what does G mean? Maybe I just missed this part of the implementation, thanks.
    It seems I missed the line "x denotes the pixel index", so now I have a clue, thanks. So it is IDynamic, used with G heads? Maybe a code example for this part: "The detailed network of the dynamic weight generation is shown in Figure 2. Similar to the multi-head self-attention methods [21, 23, 31], we divide the number of feature channels into G heads and learn separate dynamic weights in parallel." (A toy shape check follows below.)
    2-5. Fine.
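
To make that G-head split concrete, a toy shape check with purely illustrative sizes (not code from either repo):

    import torch

    B, C, H, W = 1, 60, 48, 48               # illustrative sizes
    G, k = 10, 7                              # G heads, kernel size k
    x = torch.randn(B, C, H, W)
    x_heads = x.view(B, G, C // G, H, W)      # each head owns C // G channels
    weights = torch.randn(B, G, k, k, H, W)   # separate per-pixel kernels per head
    print(x_heads.shape, weights.shape)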

The ONNX request was just a way to get more details of the arch. If the official archs will be released soon, ONNX is not needed.

Thanks for your good work!

FlotingDream commented on August 14, 2024

One more thing: what is group_channels set to?

FlotingDream commented on August 14, 2024

When setting inhomogeneous=True, heads is set to 6 (matching SwinIR) in the DWBlock: https://github.com/Atten4Vis/DemystifyLocalViT/blob/master/models/dwnet.py

Maybe the paper and code at https://github.com/Atten4Vis/DemystifyLocalViT/ will help you understand the key designs of the inhomogeneous depth-wise convolution and our proposed MHDLSA.

import torch.nn as nn
# DynamicDWConv and IDynamicDWConv are defined in the linked dwnet.py.

class DWBlock(nn.Module):

    def __init__(self, dim, window_size, dynamic=False, inhomogeneous=False, heads=None):
        super().__init__()
        self.dim = dim
        self.window_size = window_size  # Wh, Ww
        self.dynamic = dynamic
        self.inhomogeneous = inhomogeneous
        self.heads = heads

        # Remove the BatchNorm, following the findings of EDSR.
        # Remove the redundant activation.

        # pw-linear
        self.conv0 = nn.Conv2d(dim, dim, 1, bias=False)
        # self.bn0 = nn.BatchNorm2d(dim)

        if dynamic and not inhomogeneous:
            self.conv = DynamicDWConv(dim, kernel_size=window_size, stride=1, padding=window_size // 2, groups=dim)
        elif dynamic and inhomogeneous:
            # `heads` is forwarded as IDynamicDWConv's `group_channels` argument
            self.conv = IDynamicDWConv(dim, window_size, heads)
        else:
            self.conv = nn.Conv2d(dim, dim, kernel_size=window_size, stride=1, padding=window_size // 2, groups=dim)

        # self.bn = nn.BatchNorm2d(dim)
        # self.relu = nn.ReLU(inplace=True)

        # pw-linear
        self.conv2 = nn.Conv2d(dim, dim, 1, bias=False)
        self.bn2 = nn.BatchNorm2d(dim)
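
For concreteness, a minimal instantiation under these settings (a sketch; dim=60 is purely illustrative, and DWBlock plus IDynamicDWConv from the linked dwnet.py are assumed to be in scope):

    block = DWBlock(dim=60, window_size=7, dynamic=True, inhomogeneous=True, heads=6)
    # DWBlock forwards `heads` as IDynamicDWConv's `group_channels`, so here
    # group_channels = 6 and the weight generator produces 60 // 6 = 10 groups,
    # each with its own per-pixel 7x7 kernel shared across 6 channels.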

That's clear, thanks!

NeonLeexiang commented on August 14, 2024

Literally, the brilliant idynamic work from Atten4Vis is the foundation of our proposed MHDLSA. We devoted a lot of effort to reconstructing DemystifyLocalViT, though, because it was designed for high-level computer vision tasks, whereas we propose MHDLSA for SR tasks. As pointed out in the paper, to produce a more relevant attention map for the dynamic convolution, we remove the redundant normalization layer and non-linear activation and add a depth-wise convolution (L262-L270 in Section 3.1 of the manuscript).


import torch.nn as nn
# _idynamic_cuda is the custom CUDA operator shipped with DemystifyLocalViT.

class IDynamicDWConv(nn.Module):

    def __init__(self,
                 channels,
                 kernel_size,
                 group_channels):
        super(IDynamicDWConv, self).__init__()
        self.kernel_size = kernel_size
        self.channels = channels
        reduction_ratio = 4
        self.group_channels = group_channels
        self.groups = self.channels // self.group_channels
        self.conv1 = nn.Sequential(
            nn.Conv2d(channels, channels // reduction_ratio, 1),
            # nn.BatchNorm2d(channels // reduction_ratio),
            # nn.ReLU()
            # As mentioned above: remove the redundant normalization and
            # activation, and add a depth-wise convolution instead.
            nn.Conv2d(channels // reduction_ratio, channels // reduction_ratio, self.kernel_size, padding=self.kernel_size // 2, groups=channels // reduction_ratio),
        )
        self.conv2 = nn.Sequential(
            nn.Conv2d(channels // reduction_ratio, kernel_size ** 2 * self.groups, 1)
        )

    def forward(self, x):
        # Generate one kernel_size x kernel_size kernel per channel group per
        # pixel, then apply it with the custom CUDA operator.
        weight = self.conv2(self.conv1(x))
        b, c, h, w = weight.shape
        weight = weight.view(b, self.groups, self.kernel_size, self.kernel_size, h, w)
        out = _idynamic_cuda(x, weight, stride=1, padding=(self.kernel_size - 1) // 2)
        return out
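
Note: _idynamic_cuda above is the custom CUDA operator from DemystifyLocalViT. For readers without that extension, a pure-PyTorch reference built on F.unfold should match it up to numerical tolerance when stride=1; this is a slow sketch, and idynamic_ref is our name, not an API from either repo:

    import torch
    import torch.nn.functional as F

    def idynamic_ref(x, weight, stride=1, padding=3):
        # x: (B, C, H, W); weight: (B, G, k, k, H, W) holds one k*k kernel per
        # channel group per pixel, shared by the C // G channels of the group.
        # Assumes stride=1 and padding=(k - 1) // 2 so the spatial size is kept.
        b, c, h, w = x.shape
        g, k = weight.shape[1], weight.shape[2]
        # Gather the k*k neighborhood of every pixel: (B, C*k*k, H*W)
        patches = F.unfold(x, kernel_size=k, stride=stride, padding=padding)
        patches = patches.view(b, g, c // g, k * k, h, w)
        kernels = weight.reshape(b, g, 1, k * k, h, w)
        out = (patches * kernels).sum(dim=3)  # weighted sum over each window
        return out.reshape(b, c, h, w)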

FlotingDream commented on August 14, 2024

That's clear, thanks!

