Comments (8)

FlotingDream commented on August 14, 2024

😬hiii

  1. There is no pooling operation in MHDLSA, since pooling would lose information for SR tasks.
  2. To keep the parameter count comparable to MHSA, the squeeze factor is 4.
  3. The kernel size is 7, compared to the window size of 8 in MHSA.
  4. The ffn_expansion_factor is 2, following SwinIR.
  5. The kernel size of TLC is set to 48 (the training patch size) for the x4 upscale factor.
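
Gathered in one place, those settings look roughly like this (a sketch only; the dict and its key names are illustrative, not the official archs file):

    # Hypothetical summary of the settings above, not official DLGSANet code.
    mhdlsa_settings = {
        "pooling": None,             # 1. no pooling: it would lose information for SR
        "squeeze_factor": 4,         # 2. keeps parameters comparable to MHSA
        "kernel_size": 7,            # 3. vs. the window size of 8 in MHSA
        "ffn_expansion_factor": 2,   # 4. following SwinIR
        "tlc_kernel_size": 48,       # 5. the training patch size, for x4 upscaling
    }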

Tips: We reconstructed https://github.com/Atten4Vis/DemystifyLocalViT/blob/master/models/dwnet.py into MHDLSA, as described in the paper. We are grateful for the excellent work at https://github.com/Atten4Vis/DemystifyLocalViT/

🙏: We apologize for not extending our work to an ONNX model for mobile devices.

TODO: We will soon release the archs file with more details about the network architecture.

We admit that our work advanced significantly because we worked out several details during the network structure design phase. We hope, however, that this information helps you apply the methods we proposed.

Thanks for your quick reply.

  1. With no pooling, don't the dimensions mismatch? How does that work? Does the Reshape() in Eq. 1 collapse H*W to 1? How? And what does G mean? Maybe I just missed this part of the implementation, thanks.
    It seems I missed the line "x denotes the pixel index", so now I have a clue, thanks. So it is IDynamic, used with G heads? Maybe a code example for this part: "The detailed network of the dynamic weight generation is shown in Figure 2. Similar to the multi-head self-attention methods [21, 23, 31], we divide the number of feature channels into G heads and learn separate dynamic weights in parallel." (A toy shape check follows below.)
    2-5. Fine.
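
To make that G-head split concrete, a toy shape check with purely illustrative sizes (not code from either repo):

    import torch

    B, C, H, W = 1, 60, 48, 48               # illustrative sizes
    G, k = 10, 7                              # G heads, kernel size k
    x = torch.randn(B, C, H, W)
    x_heads = x.view(B, G, C // G, H, W)      # each head owns C // G channels
    weights = torch.randn(B, G, k, k, H, W)   # separate per-pixel kernels per head
    print(x_heads.shape, weights.shape)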

The ONNX request was just a way to get more details of the arch. If the official archs will be released soon, ONNX is not needed.

Thanks for your good work!

FlotingDream commented on August 14, 2024

One more thing: what is group_channels set to?

FlotingDream commented on August 14, 2024

When setting inhomogeneous=True, heads is set to 6 (matching SwinIR) in the DWBlock: https://github.com/Atten4Vis/DemystifyLocalViT/blob/master/models/dwnet.py

Maybe the paper and code at https://github.com/Atten4Vis/DemystifyLocalViT/ will help you understand the key designs of the inhomogeneous depth-wise convolution and our proposed MHDLSA.

import torch.nn as nn
# DynamicDWConv and IDynamicDWConv are defined in the linked dwnet.py.

class DWBlock(nn.Module):

    def __init__(self, dim, window_size, dynamic=False, inhomogeneous=False, heads=None):
        super().__init__()
        self.dim = dim
        self.window_size = window_size  # Wh, Ww
        self.dynamic = dynamic
        self.inhomogeneous = inhomogeneous
        self.heads = heads

        # Remove the BatchNorm, following the findings of EDSR.
        # Remove the redundant activation.

        # pw-linear
        self.conv0 = nn.Conv2d(dim, dim, 1, bias=False)
        # self.bn0 = nn.BatchNorm2d(dim)

        if dynamic and not inhomogeneous:
            self.conv = DynamicDWConv(dim, kernel_size=window_size, stride=1, padding=window_size // 2, groups=dim)
        elif dynamic and inhomogeneous:
            # `heads` is forwarded as IDynamicDWConv's `group_channels` argument
            self.conv = IDynamicDWConv(dim, window_size, heads)
        else:
            self.conv = nn.Conv2d(dim, dim, kernel_size=window_size, stride=1, padding=window_size // 2, groups=dim)

        # self.bn = nn.BatchNorm2d(dim)
        # self.relu = nn.ReLU(inplace=True)

        # pw-linear
        self.conv2 = nn.Conv2d(dim, dim, 1, bias=False)
        self.bn2 = nn.BatchNorm2d(dim)
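
For concreteness, a minimal instantiation under these settings (a sketch; dim=60 is purely illustrative, and DWBlock plus IDynamicDWConv from the linked dwnet.py are assumed to be in scope):

    block = DWBlock(dim=60, window_size=7, dynamic=True, inhomogeneous=True, heads=6)
    # DWBlock forwards `heads` as IDynamicDWConv's `group_channels`, so here
    # group_channels = 6 and the weight generator produces 60 // 6 = 10 groups,
    # each with its own per-pixel 7x7 kernel shared across 6 channels.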

That's clear, thanks!

NeonLeexiang commented on August 14, 2024

Literally, the brilliant idynamic work from Atten4Vis is the foundation of our proposed MHDLSA. We devoted a lot of effort to reconstructing DemystifyLocalViT, though, because it was designed for high-level computer vision tasks, whereas we propose MHDLSA for SR tasks. As pointed out in the paper, to produce a more relevant attention map for the dynamic convolution, we remove the redundant normalization layer and non-linear activation and add a depth-wise convolution (L262-L270 in Section 3.1 of the manuscript).


import torch.nn as nn
# _idynamic_cuda is the custom CUDA operator shipped with DemystifyLocalViT.

class IDynamicDWConv(nn.Module):

    def __init__(self,
                 channels,
                 kernel_size,
                 group_channels):
        super(IDynamicDWConv, self).__init__()
        self.kernel_size = kernel_size
        self.channels = channels
        reduction_ratio = 4
        self.group_channels = group_channels
        self.groups = self.channels // self.group_channels
        self.conv1 = nn.Sequential(
            nn.Conv2d(channels, channels // reduction_ratio, 1),
            # nn.BatchNorm2d(channels // reduction_ratio),
            # nn.ReLU()
            # As mentioned above: remove the redundant normalization and
            # activation, and add a depth-wise convolution instead.
            nn.Conv2d(channels // reduction_ratio, channels // reduction_ratio, self.kernel_size, padding=self.kernel_size // 2, groups=channels // reduction_ratio),
        )
        self.conv2 = nn.Sequential(
            nn.Conv2d(channels // reduction_ratio, kernel_size ** 2 * self.groups, 1)
        )

    def forward(self, x):
        # Generate one kernel_size x kernel_size kernel per channel group per
        # pixel, then apply it with the custom CUDA operator.
        weight = self.conv2(self.conv1(x))
        b, c, h, w = weight.shape
        weight = weight.view(b, self.groups, self.kernel_size, self.kernel_size, h, w)
        out = _idynamic_cuda(x, weight, stride=1, padding=(self.kernel_size - 1) // 2)
        return out
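
Note: _idynamic_cuda above is the custom CUDA operator from DemystifyLocalViT. For readers without that extension, a pure-PyTorch reference built on F.unfold should match it up to numerical tolerance when stride=1; this is a slow sketch, and idynamic_ref is our name, not an API from either repo:

    import torch
    import torch.nn.functional as F

    def idynamic_ref(x, weight, stride=1, padding=3):
        # x: (B, C, H, W); weight: (B, G, k, k, H, W) holds one k*k kernel per
        # channel group per pixel, shared by the C // G channels of the group.
        # Assumes stride=1 and padding=(k - 1) // 2 so the spatial size is kept.
        b, c, h, w = x.shape
        g, k = weight.shape[1], weight.shape[2]
        # Gather the k*k neighborhood of every pixel: (B, C*k*k, H*W)
        patches = F.unfold(x, kernel_size=k, stride=stride, padding=padding)
        patches = patches.view(b, g, c // g, k * k, h, w)
        kernels = weight.reshape(b, g, 1, k * k, h, w)
        out = (patches * kernels).sum(dim=3)  # weighted sum over each window
        return out.reshape(b, c, h, w)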

FlotingDream commented on August 14, 2024

That's clear, thanks!

