
momentumnet's People

Contributors

michaelsdr, pierreablin


momentumnet's Issues

Figure 9 Experiments code

Is there any code for the experiments in Figure 9 of the paper?
I want to compare the three models shown in Figure 9.

Error in "drop_in_replacement_tutorial.py" ?

In this example setting gamma=0.0 lead to mnet1 and net having two different outputs.

However, everything is fine with resnet18.

(it seems that it apply the residual connection of ResBlock after TransformMemory)
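
For reference, here is a minimal check of the expected gamma=0.0 equivalence, written against torchvision's resnet18 (where things are reported to work). The deep copy and the use of plain backprop are our own precautions, not taken from the tutorial:

import copy

import torch
from torchvision.models import resnet18
from momentumnet import transform_to_momentumnet

torch.manual_seed(0)
net = resnet18(pretrained=False).eval()
# Deep-copy before transforming, in case the transform mutates the model
# in place (an assumption on our side, not verified against the library).
# use_backprop=True because a gamma=0.0 network is not invertible.
mnet = transform_to_momentumnet(copy.deepcopy(net), gamma=0.0,
                                use_backprop=True).eval()

x = torch.randn(2, 3, 224, 224)
with torch.no_grad():
    # With gamma=0.0 the momentum net should reduce to the original ResNet,
    # so this difference is expected to be ~0.
    print((net(x) - mnet(x)).abs().max())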

Meaning of Figure 1

(1) I don't understand the relationship between Figure 1 and the paper. (2) What are the left and right sides of Figure 1 meant to convey? Can the right side express more complex dynamical behavior?
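
For context, the contrast Figure 1 draws can be summarized by the paper's two forward rules. The sketch below is our paraphrase of those equations, not code from this repo, assuming (as the question suggests) that the right panel shows the momentum network; its extra velocity state is what lets it represent richer dynamics, e.g. oscillations:

def resnet_step(x, f):
    # First-order residual update: x_{t+1} = x_t + f(x_t)
    return x + f(x)

def momentum_step(x, v, f, gamma=0.9):
    # Momentum update: the velocity v is part of the state.
    v = gamma * v + (1 - gamma) * f(x)  # v_{t+1} = gamma * v_t + (1 - gamma) * f(x_t)
    return x + v, v                     # x_{t+1} = x_t + v_{t+1}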

Bug in PyTorch 1.10

The following code raises an error with torch 1.10 but not with 1.9:

import torch
from torch import nn
from momentumnet import MomentumNet

init_function = nn.Tanh()
functions = [nn.Linear(3, 3), ]

net = MomentumNet(functions, init_function=init_function, init_speed=True,
                  gamma=0.9, use_backprop=False)

x = torch.randn(1, 3)

output = net(x).sum()
output.backward()

Changing init_function to something with parameters, like

import torch
from torch import nn
from momentumnet import MomentumNet

init_function = nn.Linear(3, 3)
functions = [nn.Linear(3, 3), ]

net = MomentumNet(functions, init_function=init_function, init_speed=True,
                  gamma=0.9, use_backprop=False)

x = torch.randn(1, 3)

output = net(x).sum()
output.backward()

works fine.

No idea what causes this. (The only visible difference is that nn.Tanh() has no trainable parameters, while nn.Linear(3, 3) does.)

Add method for computing inverse

Hi, thanks so much for your work!

Is it possible to add a method for computing the inverse of a layer's activations? Both the iResNet and iRevNet repos have this method. Adding this would really help me experiment with your model more.

Thanks!
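
For a single momentum step, an inverse follows directly from the paper's forward rule. The sketch below is our illustration of that inversion, not an existing method in this repo (note it requires gamma > 0):

def inverse_momentum_step(x_next, v_next, f, gamma=0.9):
    # Forward rule: v' = gamma * v + (1 - gamma) * f(x); x' = x + v'
    x = x_next - v_next                        # invert x' = x + v'
    v = (v_next - (1 - gamma) * f(x)) / gamma  # invert the velocity update
    return x, v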

Query about Momentum DenseNets

Dear Michael,

Thank you for making your work publicly available. I was wondering about your opinion on including momentum in DenseNets. More specifically, do you think that adding momentum to layers that are connected to every other layer in the network would yield similar memory gains?

Missing baseline?

Thanks for your interesting work!

The Reformer uses RevNet in a clever way. They double the dimension of x so that, for x1, x2 = split(x), both x1 and x2 have the same dimension as the original x. This gives their invertible architecture the "same parameters" as the initial architecture. Let's call this ReformerRevNet.
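
For concreteness, here is a sketch of the standard reversible coupling being described (our illustration; "ReformerRevNet" is the name coined above, not an official model):

import torch
from torch import nn

def revnet_step(x1, x2, f, g):
    # Standard RevNet coupling: each half is updated from the other,
    # so the step can be inverted without storing activations.
    y1 = x1 + f(x2)
    y2 = x2 + g(y1)
    return y1, y2

# "Doubling the dimension": both halves start as copies of x (one way to do
# it), so each half has the same size as the original input.
x = torch.randn(4, 16)
f, g = nn.Linear(16, 16), nn.Linear(16, 16)
y1, y2 = revnet_step(x, x.clone(), f, g)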

Question 0. In Table 2, RevNet differs from MomentumNet only in the "same parameters" row. I don't see why ReformerRevNet and MomentumNet would differ anywhere in Table 2.

Question 1. Is there any reason this ReformerRevNet baseline was not included?

Apologies for any misunderstanding.

Large memory consumption

I tested the memory consumption of ResNet-101 and the corresponding momentum version using the script below. However, the difference in memory consumption between the two models is only a few hundred MB.

PyTorch: 1.9
CUDA toolkit: 11.1
GPU: GTX 1080 Ti
Python: 3.7
OS: Ubuntu 18.04

import torch
from momentumnet import transform_to_momentumnet
from torchvision.models import resnet101

resnet = resnet101(pretrained=True).to(0)
mresnet101 = transform_to_momentumnet(resnet, gamma=0.99998, use_backprop=False)
# mresnet101 = resnet101(pretrained=True).to(0)  # baseline for comparison
optim = torch.optim.Adam(mresnet101.parameters(), lr=1e-4)

for i in range(100):
    x = torch.randn(70, 3, 224, 224).cuda()
    y = mresnet101(x)
    y = (y ** 2).mean()
    optim.zero_grad()
    y.backward()
    optim.step()
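
For what it's worth, peak GPU memory can be measured explicitly with torch.cuda's memory statistics (our addition; the original report does not show the measurement code):

# Continues the script above: measure peak memory of one forward/backward pass.
torch.cuda.reset_peak_memory_stats(0)
x = torch.randn(70, 3, 224, 224).cuda()
loss = (mresnet101(x) ** 2).mean()
loss.backward()
print(f"peak allocated: {torch.cuda.max_memory_allocated(0) / 2**20:.0f} MiB")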
