jugghm / orepa_cvpr2022
CVPR 2022 "Online Convolutional Re-parameterization"
License: Apache License 2.0
Hello, I'm very confused about the accuracy of ResNet34. Specifically, I have trained ResNet34 many times, but its accuracy is about 74.40. I found that both this paper and RepVGG report an accuracy of about 74.13. I confirm that my setting is identical to RepVGG's, including the number of devices. Could you help me, please?
Hi, thanks for making your code public. It is really great work!
I ran your code, and I think there is a minor error in it.
On line 217 of train.py:
lr_scheduler = WarmupCosineAnnealingLR(optimizer=optimizer, T_cosine_max=args.epochs * IMAGENET_TRAINSET_SIZE // args.batch_size // ngpus_per_node, warmup=args.epochs/24)
I think this warms up the learning rate for only 5 steps, not 5 epochs, because T_cosine_max is counted in optimizer steps while the warmup argument is given in epochs. To warm up the learning rate for 5 epochs, 'args.epochs/24' should be 'args.epochs*len(train_loader)/24'.
Therefore, I modified line 217 as follows:
lr_scheduler = WarmupCosineAnnealingLR(optimizer=optimizer, T_cosine_max=args.epochs * len(train_loader), warmup=args.epochs * len(train_loader) / 24)
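As a sanity check, the step-vs-epoch distinction can be sketched with a minimal warmup-cosine schedule (a hypothetical re-implementation for illustration, not the repo's WarmupCosineAnnealingLR):

```python
# Minimal sketch: linear warmup followed by cosine annealing, with all
# horizons counted in optimizer steps. (My own toy function, not the
# repo's WarmupCosineAnnealingLR.)
import math

def warmup_cosine_lr(step, base_lr, warmup_steps, total_steps):
    """Return the lr at a given optimizer step."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps  # linear ramp
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * base_lr * (1 + math.cos(math.pi * progress))

epochs, steps_per_epoch, base_lr = 120, 1000, 0.1
total = epochs * steps_per_epoch

# warmup = args.epochs/24 -> 5: the lr already peaks after 5 steps.
print(warmup_cosine_lr(4, base_lr, warmup_steps=5, total_steps=total))
# warmup = args.epochs*len(train_loader)/24 -> 5000: peak after 5 epochs.
print(warmup_cosine_lr(4, base_lr, warmup_steps=5000, total_steps=total))
```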
Thank you!
Hello, could you open up access to the pretrained weights on your Google Drive? Currently one has to request permission before downloading. Thanks!
Hi author, the paper mentions replacing BN with a scaling layer, but why does the source code still use BN?
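For context, the reason a channel-wise scaling layer is attractive here is that, unlike BN (whose statistics depend on the batch), it is linear in its input and so folds exactly into the preceding conv. A minimal sketch (my own example, not this repo's code):

```python
# Sketch (my own example): a per-channel scale after a conv equals a
# single conv whose output-channel weights are pre-scaled.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
W = torch.randn(8, 4, 3, 3)       # conv weight (out=8, in=4, 3x3)
gamma = torch.randn(8)            # per-output-channel scale
x = torch.randn(1, 4, 16, 16)

y_scaled = gamma.view(1, -1, 1, 1) * F.conv2d(x, W, padding=1)
y_folded = F.conv2d(x, gamma.view(-1, 1, 1, 1) * W, padding=1)
print((y_scaled - y_folded).abs().max())  # tiny (float32 rounding only)
```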
Hi,
When I was reproducing OREPA_LargeConvBase, I found some results that were different from what I expected. Could you tell me the reason for this result?
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.nn.init as init
import math

# Three stacked 3x3 conv weights, 64 -> 128 -> 128 -> 64
weight = nn.Parameter(torch.Tensor(128, 64, 3, 3))
weight1 = nn.Parameter(torch.Tensor(128, 128, 3, 3))
weight2 = nn.Parameter(torch.Tensor(64, 128, 3, 3))
init.kaiming_uniform_(weight, a=math.sqrt(5))
init.kaiming_uniform_(weight1, a=math.sqrt(5))
init.kaiming_uniform_(weight2, a=math.sqrt(5))

# Squeeze the three 3x3 kernels into one 7x7 kernel,
# following the weight_gen logic of OREPA_LargeConvBase
rep_weight = weight.transpose(0, 1)
rep_weight = F.conv2d(rep_weight, weight1, groups=1, padding=2)
rep_weight = F.conv2d(rep_weight, weight2, groups=1, padding=2)
rep_weight = rep_weight.transpose(0, 1)

# Compare the squeezed conv against the original three-conv sequence
data = torch.randn((1, 64, 1080, 1920)) * 255
conv_result = F.conv2d(F.conv2d(F.conv2d(data, weight=weight, padding=1), weight=weight1, padding=1), weight=weight2, padding=1)
rep_result = F.conv2d(input=data, weight=rep_weight, bias=None, stride=1, padding=3)

diff = torch.abs(rep_result - conv_result)
print(f"max diff: {diff.max()}")
print(f"median diff: {diff.median()}")
print(f"mean diff: {diff.mean()}")
# max diff: 365.46533203125
# median diff: 44.0756950378418
# mean diff: 52.17301559448242
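For comparison, here is a case where squeezing is exact. This is a sketch of my own using the standard 1x1-then-kxk merging rule from structural re-parameterization, not code from this repo: when the first layer is 1x1 with no padding, there is no intermediate zero padding, whereas stacked kxk convs with per-layer padding already differ at the image borders.

```python
# My own sketch: a 1x1 conv followed by a 3x3 conv merges exactly into
# a single 3x3 conv, because no zero padding sits between the layers.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
k1 = torch.randn(8, 4, 1, 1)   # 1x1 conv: (out=8, in=4)
k2 = torch.randn(6, 8, 3, 3)   # 3x3 conv: (out=6, in=8)

# Merged kernel: mix k2's input channels by k1 -> shape (6, 4, 3, 3)
merged = F.conv2d(k2, k1.permute(1, 0, 2, 3))

x = torch.randn(1, 4, 16, 16)
seq = F.conv2d(F.conv2d(x, k1), k2, padding=1)
rep = F.conv2d(x, merged, padding=1)
print((seq - rep).abs().max())  # tiny (float32 rounding only)
```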
Hi,
Thank you for your great work.
Where is the implementation code for the "linear deep stem" method?
Is it in "OREPA_LargeConvBase"?
Hi, I'm wondering if you've run into any issues with numerical stability, or know what the cause may be.
With normal RepVGG, I get differences as high as 4e-4 when comparing outputs before and after switching to deploy mode. After changing the first conv to OREPA_LargeConv, I get errors as high as 2e-3. After changing the 1x1 conv in the RepVGG block to OREPA_1x1, I get differences as high as 0.1.
It seems numerical stability makes it challenging to use identity + OREPA_1x1 + OREPA_3x3 blocks in a RepVGG-style model. Any thoughts about why?
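The before/after comparison above can be packaged as a small helper. This is a sketch of my own, assuming only that re-parameterizable modules follow the RepVGG convention of exposing a switch_to_deploy() method; casting the model and input to float64 first is one way to tell a genuine merging bug from ordinary float32 rounding.

```python
# Sketch: measure the max output gap between a model before and after
# switching to deploy mode. Assumes the RepVGG-style convention that
# mergeable modules expose `switch_to_deploy()`.
import copy
import torch

def deploy_gap(model, x):
    model = model.eval()
    deployed = copy.deepcopy(model)
    for m in deployed.modules():
        if hasattr(m, 'switch_to_deploy'):
            m.switch_to_deploy()
    with torch.no_grad():
        return (model(x) - deployed(x)).abs().max().item()
```

Calling deploy_gap(model.double(), x.double()) and comparing against the float32 gap is one way to see how much of the 0.1 discrepancy is accumulation error versus an actual mismatch in the merged weights.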
Hi, thanks for your great work!
I tried to reproduce the visualization of branch-level similarity of OREPA blocks, but unexpected results emerged.
Could you share details about it?
Hello, I'm impressed by your block squeezing and block linearization ideas. However, the memory behavior of BN as implemented by PyTorch is somewhat odd: it is not in-place, and it allocates an output buffer the size of the input tensor, thus doubling the memory consumption. Please refer to https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/cudnn/BatchNorm.cpp. So I think your comparison is not entirely fair.
Proposition 1. For a single-branch linear mapping, re-parameterizing part or all of it with an over-two-layer multi-branch topology causes the entire end-to-end weight matrix to be optimized differently. If one layer of the mapping is re-parameterized into up-to-one-layer multi-branch topologies, the optimization remains unchanged.
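The two cases can be illustrated with scalar linear maps (a toy sketch of my own, not code from the paper or this repo): splitting a weight into a one-layer sum of branches hands each branch the same end-to-end gradient, while a two-layer product makes the gradients depend on the current branch values.

```python
# Toy check of Proposition 1 with scalar linear maps (my own sketch).
import torch

x, target = torch.tensor([2.0]), torch.tensor([3.0])

def grads(params, forward):
    # squared error, then gradient w.r.t. each trainable branch
    (forward(x) - target).pow(2).sum().backward()
    return [p.grad.item() for p in params]

# Baseline: a single weight w = 0.5.
w = torch.tensor([0.5], requires_grad=True)
g_single = grads([w], lambda x: w * x)

# Up-to-one-layer multi-branch: w = a + b. Each branch receives the
# end-to-end gradient, so the sum follows the same trajectory as w
# (the split only rescales the effective learning rate).
a = torch.tensor([0.2], requires_grad=True)
b = torch.tensor([0.3], requires_grad=True)
g_branch = grads([a, b], lambda x: a * x + b * x)

# Over-two-layer topology: w = u * v. The gradients now depend
# multiplicatively on the current values of u and v, so the
# end-to-end weight is optimized differently.
u = torch.tensor([0.5], requires_grad=True)
v = torch.tensor([1.0], requires_grad=True)
g_prod = grads([u, v], lambda x: u * (v * x))

print(g_single, g_branch, g_prod)
```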
Hello:
I see in the paper that when OREPA is combined with RepVGG, OREPA is added directly on top of the conv_3x3 branch, rather than replacing all three branches (conv_3x3 / conv_1x1 / identity) with the OREPA form. Is this because training with all three branches replaced by OREPA did not work well?
Hi author, I really like the idea of this paper, and I am now trying to extend it to 3D. However, I don't quite understand the role of self.fre_init in the OREPA module. Could you explain it a bit more? Also, can the prior_tensor be made 3D?