jugghm / orepa_cvpr2022
CVPR 2022 "Online Convolutional Re-parameterization"
License: Apache License 2.0
Hello, I'm very confused about the accuracy of ResNet34. Specifically, I have trained ResNet34 many times, but its accuracy is about 74.40. I found that both this paper and RepVGG report an accuracy of about 74.13. I confirm that my setting is identical to RepVGG's, including the number of devices. Could you help me, please?
Hi, thanks for making your code public. It is really great work!
I ran your code, and I think there is a minor error in it.
On line 217 of train.py:
lr_scheduler = WarmupCosineAnnealingLR(optimizer=optimizer, T_cosine_max=args.epochs * IMAGENET_TRAINSET_SIZE // args.batch_size // ngpus_per_node, warmup=args.epochs/24)
I think this warms up the learning rate for only 5 steps, not 5 epochs, because T_cosine_max is counted in optimizer steps while the warmup argument is given in epochs. To warm up the learning rate for 5 epochs, 'args.epochs/24' should be 'args.epochs*len(train_loader)/24'.
Therefore, I modified line 217 as follows:
lr_scheduler = WarmupCosineAnnealingLR(optimizer=optimizer, T_cosine_max=args.epochs * len(train_loader), warmup=args.epochs * len(train_loader) / 24)
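As a sanity check, the step-vs-epoch distinction can be sketched with a minimal warmup-cosine schedule (a hypothetical re-implementation for illustration, not the repo's WarmupCosineAnnealingLR):

```python
# Minimal sketch: linear warmup followed by cosine annealing, with all
# horizons counted in optimizer steps. (My own toy function, not the
# repo's WarmupCosineAnnealingLR.)
import math

def warmup_cosine_lr(step, base_lr, warmup_steps, total_steps):
    """Return the lr at a given optimizer step."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps  # linear ramp
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * base_lr * (1 + math.cos(math.pi * progress))

epochs, steps_per_epoch, base_lr = 120, 1000, 0.1
total = epochs * steps_per_epoch

# warmup = args.epochs/24 -> 5: the lr already peaks after 5 steps.
print(warmup_cosine_lr(4, base_lr, warmup_steps=5, total_steps=total))
# warmup = args.epochs*len(train_loader)/24 -> 5000: peak after 5 epochs.
print(warmup_cosine_lr(4, base_lr, warmup_steps=5000, total_steps=total))
```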
Thank you!
Hello, could you open up access to the pretrained weights on your Google Drive? Currently one has to request permission before downloading. Thanks!
Hi author, the paper mentions replacing BN with a scaling layer, but why does the source code still use BN?
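For context, the reason a channel-wise scaling layer is attractive here is that, unlike BN (whose statistics depend on the batch), it is linear in its input and so folds exactly into the preceding conv. A minimal sketch (my own example, not this repo's code):

```python
# Sketch (my own example): a per-channel scale after a conv equals a
# single conv whose output-channel weights are pre-scaled.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
W = torch.randn(8, 4, 3, 3)       # conv weight (out=8, in=4, 3x3)
gamma = torch.randn(8)            # per-output-channel scale
x = torch.randn(1, 4, 16, 16)

y_scaled = gamma.view(1, -1, 1, 1) * F.conv2d(x, W, padding=1)
y_folded = F.conv2d(x, gamma.view(-1, 1, 1, 1) * W, padding=1)
print((y_scaled - y_folded).abs().max())  # tiny (float32 rounding only)
```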
Hi,
When I was reproducing OREPA_LargeConvBase, I found some results that were different from what I expected. Could you tell me the reason for this result?
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.nn.init as init
import math

# Three stacked 3x3 conv weights, 64 -> 128 -> 128 -> 64
weight = nn.Parameter(torch.Tensor(128, 64, 3, 3))
weight1 = nn.Parameter(torch.Tensor(128, 128, 3, 3))
weight2 = nn.Parameter(torch.Tensor(64, 128, 3, 3))
init.kaiming_uniform_(weight, a=math.sqrt(5))
init.kaiming_uniform_(weight1, a=math.sqrt(5))
init.kaiming_uniform_(weight2, a=math.sqrt(5))

# Squeeze the three 3x3 kernels into one 7x7 kernel,
# following the weight_gen logic of OREPA_LargeConvBase
rep_weight = weight.transpose(0, 1)
rep_weight = F.conv2d(rep_weight, weight1, groups=1, padding=2)
rep_weight = F.conv2d(rep_weight, weight2, groups=1, padding=2)
rep_weight = rep_weight.transpose(0, 1)

# Compare the squeezed conv against the original three-conv sequence
data = torch.randn((1, 64, 1080, 1920)) * 255
conv_result = F.conv2d(F.conv2d(F.conv2d(data, weight=weight, padding=1), weight=weight1, padding=1), weight=weight2, padding=1)
rep_result = F.conv2d(input=data, weight=rep_weight, bias=None, stride=1, padding=3)

diff = torch.abs(rep_result - conv_result)
print(f"max diff: {diff.max()}")
print(f"median diff: {diff.median()}")
print(f"mean diff: {diff.mean()}")
# max diff: 365.46533203125
# median diff: 44.0756950378418
# mean diff: 52.17301559448242
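For comparison, here is a case where squeezing is exact. This is a sketch of my own using the standard 1x1-then-kxk merging rule from structural re-parameterization, not code from this repo: when the first layer is 1x1 with no padding, there is no intermediate zero padding, whereas stacked kxk convs with per-layer padding already differ at the image borders.

```python
# My own sketch: a 1x1 conv followed by a 3x3 conv merges exactly into
# a single 3x3 conv, because no zero padding sits between the layers.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
k1 = torch.randn(8, 4, 1, 1)   # 1x1 conv: (out=8, in=4)
k2 = torch.randn(6, 8, 3, 3)   # 3x3 conv: (out=6, in=8)

# Merged kernel: mix k2's input channels by k1 -> shape (6, 4, 3, 3)
merged = F.conv2d(k2, k1.permute(1, 0, 2, 3))

x = torch.randn(1, 4, 16, 16)
seq = F.conv2d(F.conv2d(x, k1), k2, padding=1)
rep = F.conv2d(x, merged, padding=1)
print((seq - rep).abs().max())  # tiny (float32 rounding only)
```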
Hi,
Thank you for your great work.
Where is the implementation code for the "linear deep stem" method?
Is it in "OREPA_LargeConvBase"?
Hi, I'm wondering if you've run into any issues with numerical stability, or know what the cause may be.
With normal RepVGG, I get differences as high as 4e-4 when comparing outputs before and after switching to deploy mode. After changing the first conv to OREPA_LargeConv, I get errors as high as 2e-3. After changing the 1x1 conv in the RepVGG block to OREPA_1x1, I get differences as high as 0.1.
It seems numerical stability makes it challenging to use identity + OREPA_1x1 + OREPA_3x3 blocks in a RepVGG-style model. Any thoughts about why?
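The before/after comparison above can be packaged as a small helper. This is a sketch of my own, assuming only that re-parameterizable modules follow the RepVGG convention of exposing a switch_to_deploy() method; casting the model and input to float64 first is one way to tell a genuine merging bug from ordinary float32 rounding.

```python
# Sketch: measure the max output gap between a model before and after
# switching to deploy mode. Assumes the RepVGG-style convention that
# mergeable modules expose `switch_to_deploy()`.
import copy
import torch

def deploy_gap(model, x):
    model = model.eval()
    deployed = copy.deepcopy(model)
    for m in deployed.modules():
        if hasattr(m, 'switch_to_deploy'):
            m.switch_to_deploy()
    with torch.no_grad():
        return (model(x) - deployed(x)).abs().max().item()
```

Calling deploy_gap(model.double(), x.double()) and comparing against the float32 gap is one way to see how much of the 0.1 discrepancy is accumulation error versus an actual mismatch in the merged weights.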
Hi, thanks for your great work!
I tried to reproduce the visualization of branch-level similarity of OREPA blocks, but unexpected results emerged.
Could you share details about it?
Hello, I'm impressed by your block squeezing and block linearization ideas. However, the memory behavior of BN as implemented by PyTorch is somewhat odd: it is not in-place, and it allocates an output buffer the size of the input tensor, thus doubling the memory consumption. Please refer to https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/cudnn/BatchNorm.cpp. So I think your comparison is not entirely fair.
Proposition 1. For a single-branch linear mapping, re-parameterizing part or all of it with an over-two-layer multi-branch topology causes the entire end-to-end weight matrix to be optimized differently. If one layer of the mapping is re-parameterized into up-to-one-layer multi-branch topologies, the optimization remains unchanged.
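The two cases can be illustrated with scalar linear maps (a toy sketch of my own, not code from the paper or this repo): splitting a weight into a one-layer sum of branches hands each branch the same end-to-end gradient, while a two-layer product makes the gradients depend on the current branch values.

```python
# Toy check of Proposition 1 with scalar linear maps (my own sketch).
import torch

x, target = torch.tensor([2.0]), torch.tensor([3.0])

def grads(params, forward):
    # squared error, then gradient w.r.t. each trainable branch
    (forward(x) - target).pow(2).sum().backward()
    return [p.grad.item() for p in params]

# Baseline: a single weight w = 0.5.
w = torch.tensor([0.5], requires_grad=True)
g_single = grads([w], lambda x: w * x)

# Up-to-one-layer multi-branch: w = a + b. Each branch receives the
# end-to-end gradient, so the sum follows the same trajectory as w
# (the split only rescales the effective learning rate).
a = torch.tensor([0.2], requires_grad=True)
b = torch.tensor([0.3], requires_grad=True)
g_branch = grads([a, b], lambda x: a * x + b * x)

# Over-two-layer topology: w = u * v. The gradients now depend
# multiplicatively on the current values of u and v, so the
# end-to-end weight is optimized differently.
u = torch.tensor([0.5], requires_grad=True)
v = torch.tensor([1.0], requires_grad=True)
g_prod = grads([u, v], lambda x: u * (v * x))

print(g_single, g_branch, g_prod)
```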
Hello:
I see in the paper that when OREPA is combined with RepVGG, OREPA is added directly on top of the conv_3x3 branch, rather than replacing all three branches (conv_3x3 / conv_1x1 / identity) with the OREPA form. Is this because training with all three branches replaced by OREPA did not work well?
Hi author, I really like the idea of this paper, and I am now trying to extend it to 3D. However, I don't quite understand the role of self.fre_init in the OREPA module. Could you explain it a bit more? Also, can the prior_tensor be made 3D?