dmlc / mxnet-memonger Goto Github PK

Sublinear memory optimization for deep learning, reduce GPU memory cost to train deeper nets

License: Apache License 2.0

Python 100.00%

mxnet-memonger's Introduction

Distributed Machine Learning Common Codebase

DMLC-Core is the backbone library to support all DMLC projects, offers the bricks to build efficient and scalable distributed machine learning libraries.

Developer Channel

What's New

Note on Parameter Module for Machine Learning

Known Issues

RecordIO format is not portable across different processor endians. So it is not possible to save RecordIO file on a x86 machine and then load it on a SPARC machine, because x86 is little endian while SPARC is big endian.

Contributing

Contributing to dmlc-core is welcomed! dmlc-core follows google's C style guide. If you are interested in contributing, take a look at feature wishlist and open a new issue if you like to add something.

DMLC-Core uses C++11 standard. Ensure that your C++ compiler supports C++11.
Try to introduce minimum dependency when possible

CheckList before submit code

Type make lint and fix all the style problems.
Type make doc and fix all the warnings.

NOTE

deps:

libcurl4-openssl-dev

mxnet-memonger's People

Contributors

Stargazers

Watchers

Forkers

chenglongchen baiyancheng20 wanjinchang hitluobin bygreencn gzzgz yanqingmen nieshaoshuai cysu leliaonvidia roborthuan arsenluca my777777 thomasdic2000 tsingjinyun esraaragaa robindong leotam codeaudit hyuen xuchaoxiong hao-zi mike07026 hectorsun pangye angzz absorbguo scapeqin hongdayu ixhorse feitiandemiaomi afcarl xiaotaochen irvingshu engineerfh 501177639 wjsxlb2017 pgsrv melodylail trantorrepository gm19900510 ddeeppnneett jianweilin lizonghang clhne pengxin99 sowenkin govan111 happyfee zhen-jia googol-lab chidaidai liuxiaoxuanpku amr-devman isabella232 qingshanyinyin shuo-ouyang phelpsou fantasyrain ahoython joancafom

mxnet-memonger's Issues

Dose mxnet memonger work for mxnet gluon?

Description

I want to reduce gpu memory costing when using gluon, I tryed MXNet memonger but it did not work for me, After that I setting os.environ['MXNET_BACKWARD_DO_MIRROR'] = '1', But it not work for me too.

Environment info (Required)

----------Python Info----------
('Version      :', '2.7.5')
('Compiler     :', 'GCC 4.8.5 20150623 (Red Hat 4.8.5-11)')
('Build        :', ('default', 'Nov  6 2016 00:28:07'))
('Arch         :', ('64bit', 'ELF'))
------------Pip Info-----------
('Version      :', '9.0.1')
('Directory    :', '/usr/lib/python2.7/site-packages/pip')
----------MXNet Info-----------
('Version      :', '1.2.0')
('Directory    :', '/home/yinghuang/incubator-mxnet-newest/python/mxnet')
Hashtag not found. Not installed from pre-built package.
----------System Info----------
('Platform     :', 'Linux-3.10.0-327.22.2.el7.x86_64-x86_64-with-centos-7.2.1511-Core')
('system       :', 'Linux')
('node         :', 'gz-open-gpu-c117')
('release      :', '3.10.0-327.22.2.el7.x86_64')
('version      :', '#1 SMP Thu Jun 23 17:05:11 UTC 2016')
----------Hardware Info----------
('machine      :', 'x86_64')
('processor    :', 'x86_64')
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                32
On-line CPU(s) list:   0-31
Thread(s) per core:    2
Core(s) per socket:    8
座：                 2
NUMA 节点：         2
厂商 ID：           GenuineIntel
CPU 系列：          6
型号：              62
型号名称：        Genuine Intel(R) CPU  @ 2.80GHz
步进：              2
CPU MHz：             1706.250
BogoMIPS：            5617.25
虚拟化：           VT-x
L1d 缓存：          32K
L1i 缓存：          32K
L2 缓存：           256K
L3 缓存：           25600K
NUMA 节点0 CPU：    0-7,16-23
NUMA 节点1 CPU：    8-15,24-31
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0234 sec, LOAD: 1.4669 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0253 sec, LOAD: 0.4839 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.3580 sec, LOAD: 2.5292 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0229 sec, LOAD: 0.8054 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.7395 sec, LOAD: 1.3768 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.0230 sec, LOAD: 3.4856 sec.

question definition

I am using python 2 and newest mxnet gluon 3D convolution to do video action recognition. I want gpu memory to be effiently used, 1) so I try [memonger] to optimize the sym the hybridblock generated, but with no effect after hard working. 2) Then I try to save memory by only setting the environment variable MXNET_BACKWARD_DO_MIRROR to be '1', the memory not reduced too. Can you show me what I have missed for memory saving. Thank you very much.

Steps to reproduce

here is my code of my network


import sys
import os


mxnet_path = os.path.expanduser('~') + '/incubator-mxnet-newest'
sys.path.insert(0, os.path.abspath(os.path.join(mxnet_path, "python")))
import mxnet as mx
from mxnet.gluon import nn
from mxnet import nd
from mxnet.gluon.block import _flatten,_regroup

import memonger


def bn_relu_conv(ks, nout, stride, pad,  name=None):
    layer = nn.HybridSequential()
    layer.add(nn.BatchNorm())
    layer.add(nn.Activation('relu'))
    layer.add(nn.Conv3D(channels=nout, kernel_size=ks, padding=pad, strides=stride))
    return layer

def bn_relu_block(growth_rate):
    layer = nn.HybridSequential()
    layer.add(bn_relu_conv(1, nout=growth_rate, stride=1, pad=0))
    layer.add(bn_relu_conv(3, nout=growth_rate, stride=1, pad=1))
    return layer

def conv_act_layer(channels, kernel=(1,1,1) , pad=(0,0,0), stride=(1,1,1), act_type="relu", use_batchnorm=False):
    layer = nn.HybridSequential()
    layer.add(nn.Conv3D(channels=channels, kernel_size=kernel, padding=pad, strides=stride))
    layer.add(nn.BatchNorm())
    layer.add(nn.Activation(act_type))
    return layer

def transition(channels):
    transition_layer = nn.HybridSequential()
    transition_layer.add(bn_relu_conv(ks=1, nout=channels, stride=1, pad=0))
    transition_layer.add(nn.MaxPool3D(pool_size=2, strides=2))
    return transition_layer

def transition_w_o_pooling(channels):
    layer = bn_relu_conv(ks=1, nout=channels, stride=1, pad=0)
    return layer


class DsodBlock(nn.HybridBlock):
    def __init__(self, layers, growth_rate, use_memonger=False, **kwargs):
        super(DsodBlock, self).__init__(**kwargs)
        self.use_memonger = use_memonger
        self.net = nn.HybridSequential()
        for i in range(layers):
            lay = bn_relu_block(growth_rate)
            self.net.add(lay)

    def hybrid_forward(self, F, x):
        for idx, layer in enumerate(self.net):
            out = layer(x)
            x = F.concat(x, out, dim=1)
            if self.use_memonger and (idx % 2 == 0):
                #print("use memonger true")
                x._set_attr(mirror_stage='True')

        return x

class DenseNet(nn.HybridBlock):
    def __init__(self, net_def, num_classes, growth_rate, use_memonger=False, batch_size=32, input_depth=16, input_size=112, **kwargs):
        super(DenseNet, self).__init__(**kwargs)
        channels = 128
        self.use_memonger = use_memonger
        self.batch_size = batch_size
        self.input_depth = input_depth
        self.input_size = input_size

        #assert self.use_memonger
        with self.name_scope():
            self.features = nn.HybridSequential(prefix='')
            self.features.add(conv_act_layer(64, kernel=3, pad=1, stride=2, act_type="relu", use_batchnorm=True))
            self.features.add(conv_act_layer(64, kernel=3, pad=1, stride=1, act_type="relu", use_batchnorm=True))
            self.features.add(conv_act_layer(128, kernel=3, pad=1, stride=1, act_type="relu", use_batchnorm=True))

            self.features.add(nn.MaxPool3D(pool_size=2, strides=2, padding=1))

            for i,(dense_layers,transition_fun) in enumerate(net_def):
                self.features.add(DsodBlock(layers=dense_layers, growth_rate=growth_rate, use_memonger=use_memonger))
                channels += growth_rate*dense_layers
                self.features.add(transition_fun(channels))

            self.features.add(nn.BatchNorm())
            self.features.add(nn.Activation('relu'))
            self.features.add(nn.GlobalAvgPool3D())
            self.features.add(nn.Flatten())

            self.output = nn.Dense(num_classes, in_units=channels)

    def _get_graph(self, *args):
        #assert False
        if not self._cached_graph:
            args, self._in_format = _flatten(args)
            if len(args) > 1:
                inputs = [mx.symbol.var('data%d'%i) for i in range(len(args))]
            else:
                inputs = [mx.symbol.var('data')]
            grouped_inputs = _regroup(inputs, self._in_format)[0]

            params = {i: j.var() for i, j in self._reg_params.items()}
            with self.name_scope():
                out = self.hybrid_forward(mx.symbol, *grouped_inputs, **params)  # pylint: disable=no-value-for-parameter
            out, self._out_format = _flatten(out)

            assert len(out) == 1
            if self.use_memonger:
                assert len(inputs) == 1
                out = memonger.search_plan(out[0], data=(self.batch_size, 3, self.input_depth, self.input_size, self.input_size))
                out = [out]

            self._cached_graph = inputs, out[0] #mx.symbol.Group(out)

        return self._cached_graph


    def hybrid_forward(self, F, x):
        x = self.features(x)
        x = self.output(x)
        return x


def dsod_net(net_def, num_classes, growth_rate=64):
    growth_rate = growth_rate
    channels = 128
    net = nn.HybridSequential()
    with net.name_scope():
        ## dsod backbone
        net.add(conv_act_layer(64, kernel=3, pad=1, stride=2, act_type="relu", use_batchnorm=True))
        net.add(conv_act_layer(64, kernel=3, pad=1, stride=1, act_type="relu", use_batchnorm=True))
        net.add(conv_act_layer(128, kernel=3, pad=1, stride=1, act_type="relu", use_batchnorm=True))

        net.add(nn.MaxPool3D(pool_size=2, strides=2, padding=1))

        for i,(dense_layers,transition_fun) in enumerate(net_def):
            net.add(DsodBlock(layers=dense_layers, growth_rate=growth_rate))
            channels += growth_rate*dense_layers
            net.add(transition_fun(channels))

        classifier = nn.HybridSequential()
        classifier.add(nn.BatchNorm())
        classifier.add(nn.Activation('relu'))
        classifier.add(nn.GlobalAvgPool3D())
        classifier.add(nn.Flatten())
        classifier.add(nn.Dense(num_classes))

        net.add(classifier)

    return net

def dsod_net_v2(net_def, num_classes, growth_rate=64, use_memonger=False, **kwargs):
    net = DenseNet(net_def, num_classes, growth_rate, use_memonger, **kwargs)
    return net

def get_net(net_depth, num_classes, hybridize=True, growth_rate=64, **kwargs):
    densenet_spec = {30:[(6,transition), (8,transition), (8,transition_w_o_pooling), (8,transition_w_o_pooling)],
                     22:[(6,transition), (8,transition), (8,transition_w_o_pooling)]}
    net_def =  densenet_spec[net_depth]
    #net = dsod_net(net_def, num_classes, growth_rate)
    net = dsod_net_v2(net_def, num_classes, growth_rate, False, **kwargs)

    if hybridize:
        net.hybridize()
    return net


if __name__ == '__main__':
    dsod = get_net(22, 101, True, 32)
    #print dsod
    dsod.initialize(ctx=mx.gpu(4))
    x = mx.nd.ones((32,3,16,112,112), ctx=mx.gpu(4))
    res = dsod(x)
    #print res.shape
    #print res

By using the memonger, the network could not be finetuned?

When I use the memonger, and I use the function of load_checkpoint(), but in this line
assert sys.tojson() == network.tojson()
is failed. This means that the structure of the network using the memonger is not match with the pre-trained network. So I do not know what is the reason?

it seems doesn't work on googlenet

# my example_googlenet.py as follows:

"""References:

Szegedy, Christian, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir
Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. "Going deeper
with convolutions." arXiv preprint arXiv:1409.4842 (2014).

"""
import sys
sys.path.append('../../python/')
import math
import mxnet as mx
import memonger

def Conv(data, num_filter, kernel=(1, 1), stride=(1, 1), pad=(0, 0), name=None, suffix=''):
conv = mx.sym.Convolution(data=data, num_filter=num_filter, kernel=kernel, stride=stride, pad=pad, no_bias=True, name='%s%s_conv2d' %(name, suffix))
bn = mx.sym.BatchNorm(data=conv, name='%s%s_batchnorm' %(name, suffix), fix_gamma=True)
act = mx.sym.Activation(data=bn, act_type='relu', name='%s%s_relu' %(name, suffix))
return act

def Inception7A(data,
num_1x1,
num_3x3_red, num_3x3_1, num_3x3_2,
num_5x5_red, num_5x5,
pool, proj,
name):
tower_1x1 = Conv(data, num_1x1, name=('%s_conv' % name))
tower_5x5 = Conv(data, num_5x5_red, name=('%s_tower' % name), suffix='_conv')
tower_5x5 = Conv(tower_5x5, num_5x5, kernel=(5, 5), pad=(2, 2), name=('%s_tower' % name), suffix='_conv_1')
tower_3x3 = Conv(data, num_3x3_red, name=('%s_tower_1' % name), suffix='_conv')
tower_3x3 = Conv(tower_3x3, num_3x3_1, kernel=(3, 3), pad=(1, 1), name=('%s_tower_1' % name), suffix='_conv_1')
tower_3x3 = Conv(tower_3x3, num_3x3_2, kernel=(3, 3), pad=(1, 1), name=('%s_tower_1' % name), suffix='conv_2')
pooling = mx.sym.Pooling(data=data, kernel=(3, 3), stride=(1, 1), pad=(1, 1), pool_type=pool, name=('%s_pool%s_pool' % (pool, name)))
cproj = Conv(pooling, proj, name=('%s_tower_2' % name), suffix='conv')
concat = mx.sym.Concat(*[tower_1x1, tower_5x5, tower_3x3, cproj], name='ch_concat%s_chconcat' % name)
return concat

First Downsample

def Inception7B(data,
num_3x3,
num_d3x3_red, num_d3x3_1, num_d3x3_2,
pool,
name):
tower_3x3 = Conv(data, num_3x3, kernel=(3, 3), pad=(0, 0), stride=(2, 2), name=('%s_conv' % name))
tower_d3x3 = Conv(data, num_d3x3_red, name=('%s_tower' % name), suffix='_conv')
tower_d3x3 = Conv(tower_d3x3, num_d3x3_1, kernel=(3, 3), pad=(1, 1), stride=(1, 1), name=('%s_tower' % name), suffix='conv_1')
tower_d3x3 = Conv(tower_d3x3, num_d3x3_2, kernel=(3, 3), pad=(0, 0), stride=(2, 2), name=('%s_tower' % name), suffix='conv_2')
pooling = mx.symbol.Pooling(data=data, kernel=(3, 3), stride=(2, 2), pad=(0,0), pool_type="max", name=('max_pool%s_pool' % name))
concat = mx.sym.Concat(*[tower_3x3, tower_d3x3, pooling], name='ch_concat%s_chconcat' % name)
return concat

def Inception7C(data,
num_1x1,
num_d7_red, num_d7_1, num_d7_2,
num_q7_red, num_q7_1, num_q7_2, num_q7_3, num_q7_4,
pool, proj,
name):
tower_1x1 = Conv(data=data, num_filter=num_1x1, kernel=(1, 1), name=('%s_conv' % name))
tower_d7 = Conv(data=data, num_filter=num_d7_red, name=('%s_tower' % name), suffix='_conv')
tower_d7 = Conv(data=tower_d7, num_filter=num_d7_1, kernel=(1, 7), pad=(0, 3), name=('%s_tower' % name), suffix='_conv_1')
tower_d7 = Conv(data=tower_d7, num_filter=num_d7_2, kernel=(7, 1), pad=(3, 0), name=('%s_tower' % name), suffix='_conv_2')
tower_q7 = Conv(data=data, num_filter=num_q7_red, name=('%s_tower_1' % name), suffix='_conv')
tower_q7 = Conv(data=tower_q7, num_filter=num_q7_1, kernel=(7, 1), pad=(3, 0), name=('%s_tower_1' % name), suffix='_conv_1')
tower_q7 = Conv(data=tower_q7, num_filter=num_q7_2, kernel=(1, 7), pad=(0, 3), name=('%s_tower_1' % name), suffix='_conv_2')
tower_q7 = Conv(data=tower_q7, num_filter=num_q7_3, kernel=(7, 1), pad=(3, 0), name=('%s_tower_1' % name), suffix='_conv_3')
tower_q7 = Conv(data=tower_q7, num_filter=num_q7_4, kernel=(1, 7), pad=(0, 3), name=('%s_tower_1' % name), suffix='conv_4')
pooling = mx.sym.Pooling(data=data, kernel=(3, 3), stride=(1, 1), pad=(1, 1), pool_type=pool, name=('%s_pool%s_pool' % (pool, name)))
cproj = Conv(data=pooling, num_filter=proj, kernel=(1, 1), name=('%s_tower_2' % name), suffix='conv')
# concat
concat = mx.sym.Concat(*[tower_1x1, tower_d7, tower_q7, cproj], name='ch_concat%s_chconcat' % name)
return concat

def Inception7D(data,
num_3x3_red, num_3x3,
num_d7_3x3_red, num_d7_1, num_d7_2, num_d7_3x3,
pool,
name):
tower_3x3 = Conv(data=data, num_filter=num_3x3_red, name=('%s_tower' % name), suffix='_conv')
tower_3x3 = Conv(data=tower_3x3, num_filter=num_3x3, kernel=(3, 3), pad=(0,0), stride=(2, 2), name=('%s_tower' % name), suffix='_conv_1')
tower_d7_3x3 = Conv(data=data, num_filter=num_d7_3x3_red, name=('%s_tower_1' % name), suffix='_conv')
tower_d7_3x3 = Conv(data=tower_d7_3x3, num_filter=num_d7_1, kernel=(1, 7), pad=(0, 3), name=('%s_tower_1' % name), suffix='_conv_1')
tower_d7_3x3 = Conv(data=tower_d7_3x3, num_filter=num_d7_2, kernel=(7, 1), pad=(3, 0), name=('%s_tower_1' % name), suffix='conv_2')
tower_d7_3x3 = Conv(data=tower_d7_3x3, num_filter=num_d7_3x3, kernel=(3, 3), stride=(2, 2), name=('%s_tower_1' % name), suffix='conv_3')
pooling = mx.sym.Pooling(data=data, kernel=(3, 3), stride=(2, 2), pool_type=pool, name=('%s_pool%s_pool' % (pool, name)))
# concat
concat = mx.sym.Concat(*[tower_3x3, tower_d7_3x3, pooling], name='ch_concat%s_chconcat' % name)
return concat

def Inception7E(data,
num_1x1,
num_d3_red, num_d3_1, num_d3_2,
num_3x3_d3_red, num_3x3, num_3x3_d3_1, num_3x3_d3_2,
pool, proj,
name):
tower_1x1 = Conv(data=data, num_filter=num_1x1, kernel=(1, 1), name=('%s_conv' % name))
tower_d3 = Conv(data=data, num_filter=num_d3_red, name=('%s_tower' % name), suffix='_conv')
tower_d3_a = Conv(data=tower_d3, num_filter=num_d3_1, kernel=(1, 3), pad=(0, 1), name=('%s_tower' % name), suffix='_mixed_conv')
tower_d3_b = Conv(data=tower_d3, num_filter=num_d3_2, kernel=(3, 1), pad=(1, 0), name=('%s_tower' % name), suffix='_mixed_conv_1')
tower_3x3_d3 = Conv(data=data, num_filter=num_3x3_d3_red, name=('%s_tower_1' % name), suffix='_conv')
tower_3x3_d3 = Conv(data=tower_3x3_d3, num_filter=num_3x3, kernel=(3, 3), pad=(1, 1), name=('%s_tower_1' % name), suffix='_conv_1')
tower_3x3_d3_a = Conv(data=tower_3x3_d3, num_filter=num_3x3_d3_1, kernel=(1, 3), pad=(0, 1), name=('%s_tower_1' % name), suffix='_mixed_conv')
tower_3x3_d3_b = Conv(data=tower_3x3_d3, num_filter=num_3x3_d3_2, kernel=(3, 1), pad=(1, 0), name=('%s_tower_1' % name), suffix='mixed_conv_1')
pooling = mx.sym.Pooling(data=data, kernel=(3, 3), stride=(1, 1), pad=(1, 1), pool_type=pool, name=('%s_pool%s_pool' % (pool, name)))
cproj = Conv(data=pooling, num_filter=proj, kernel=(1, 1), name=('%s_tower_2' % name), suffix='conv')
# concat
concat = mx.sym.Concat(*[tower_1x1, tower_d3_a, tower_d3_b, tower_3x3_d3_a, tower_3x3_d3_b, cproj], name='ch_concat%s_chconcat' % name)
return concat

In[49]:

def get_symbol(num_classes=1000, **kwargs):
data = mx.symbol.Variable(name="data")
# stage 1
conv = Conv(data, 32, kernel=(3, 3), stride=(2, 2), name="conv")
conv_1 = Conv(conv, 32, kernel=(3, 3), name="conv_1")
conv_2 = Conv(conv_1, 64, kernel=(3, 3), pad=(1, 1), name="conv_2")
pool = mx.sym.Pooling(data=conv_2, kernel=(3, 3), stride=(2, 2), pool_type="max", name="pool")
# stage 2
conv_3 = Conv(pool, 80, kernel=(1, 1), name="conv_3")
conv_4 = Conv(conv_3, 192, kernel=(3, 3), name="conv_4")
pool1 = mx.sym.Pooling(data=conv_4, kernel=(3, 3), stride=(2, 2), pool_type="max", name="pool1")
# stage 3
in3a = Inception7A(pool1, 64,
64, 96, 96,
48, 64,
"avg", 32, "mixed")
in3b = Inception7A(in3a, 64,
64, 96, 96,
48, 64,
"avg", 64, "mixed_1")
in3c = Inception7A(in3b, 64,
64, 96, 96,
48, 64,
"avg", 64, "mixed_2")
in3d = Inception7B(in3c, 384,
64, 96, 96,
"max", "mixed_3")
# stage 4
in4a = Inception7C(in3d, 192,
128, 128, 192,
128, 128, 128, 128, 192,
"avg", 192, "mixed_4")
in4b = Inception7C(in4a, 192,
160, 160, 192,
160, 160, 160, 160, 192,
"avg", 192, "mixed_5")
in4c = Inception7C(in4b, 192,
160, 160, 192,
160, 160, 160, 160, 192,
"avg", 192, "mixed_6")
in4d = Inception7C(in4c, 192,
192, 192, 192,
192, 192, 192, 192, 192,
"avg", 192, "mixed_7")
in4e = Inception7D(in4d, 192, 320,
192, 192, 192, 192,
"max", "mixed_8")
# stage 5
in5a = Inception7E(in4e, 320,
384, 384, 384,
448, 384, 384, 384,
"avg", 192, "mixed_9")
in5b = Inception7E(in5a, 320,
384, 384, 384,
448, 384, 384, 384,
"max", 192, "mixed_10")
# pool
pool = mx.sym.Pooling(data=in5b, kernel=(8, 8), stride=(1, 1), pool_type="avg", name="global_pool")
flatten = mx.sym.Flatten(data=pool, name="flatten")
fc1 = mx.symbol.FullyConnected(data=flatten, num_hidden=num_classes, name='fc1')
softmax = mx.symbol.SoftmaxOutput(data=fc1, name='softmax')
return softmax

batch_size = 32
image_shape = [3, 224, 224]
num_classes = 1000

net = get_symbol(num_classes)
print net
dshape = (32, 3, 227, 227)

net_mem_planned = memonger.search_plan(net, data=dshape)
old_cost = memonger.get_cost(net, data=dshape)
new_cost = memonger.get_cost(net_mem_planned, data=dshape)

print('Old feature map cost=%d MB' % old_cost)
print('New feature map cost=%d MB' % new_cost)

errors below:

/home/user/anaconda2/bin/python /home/user/workspace/mxnet-std/example/mxnet-memonger/example_googlenet.py

[11:30:47] /home/user/workspace/mxnet-std/dmlc-core/include/dmlc/logging.h:304: [11:30:47] src/operator/./pooling-inl.h:196: Check failed: param_.kernel[0] <= dshape[2] + 2 * param_.pad[0] kernel size (8) exceeds input (5 padded to 5)

Stack trace returned 10 entries:
[bt] (0) /home/user/workspace/mxnet-std/python/mxnet/../../lib/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7f738b405c2c]
[bt] (1) /home/user/workspace/mxnet-std/python/mxnet/../../lib/libmxnet.so(ZNK5mxnet2op11PoolingProp10InferShapeEPSt6vectorIN4nnvm6TShapeESaIS4_EES7_S7+0x93b) [0x7f738c35610b]
[bt] (2) /home/user/workspace/mxnet-std/python/mxnet/../../lib/libmxnet.so(+0x149f387) [0x7f738c181387]
[bt] (3) /home/user/workspace/mxnet-std/python/mxnet/../../lib/libmxnet.so(+0x266f6a1) [0x7f738d3516a1]
[bt] (4) /home/user/workspace/mxnet-std/python/mxnet/../../lib/libmxnet.so(+0x2670eb2) [0x7f738d352eb2]
[bt] (5) /home/user/workspace/mxnet-std/python/mxnet/../../lib/libmxnet.so(+0x26717f6) [0x7f738d3537f6]
[bt] (6) /home/user/workspace/mxnet-std/python/mxnet/../../lib/libmxnet.so(_ZN4nnvm11ApplyPassesENS_5GraphERKSt6vectorISsSaISsEE+0x518) [0x7f738d33c6c8]
[bt] (7) /home/user/workspace/mxnet-std/python/mxnet/../../lib/libmxnet.so(_ZN4nnvm9ApplyPassENS_5GraphERKSs+0x8e) [0x7f738c015f5e]
[bt] (8) /home/user/workspace/mxnet-std/python/mxnet/../../lib/libmxnet.so(_ZN4nnvm4pass10InferShapeENS_5GraphESt6vectorINS_6TShapeESaIS3_EESs+0x24e) [0x7f738c0187ae]
[bt] (9) /home/user/workspace/mxnet-std/python/mxnet/../../lib/libmxnet.so(MXSymbolInferShape+0x156f) [0x7f738c01119f]

Traceback (most recent call last):
File "/home/user/workspace/mxnet-std/example/mxnet-memonger/example_googlenet.py", line 187, in
net_mem_planned = memonger.search_plan(net, data=dshape)
File "/home/user/workspace/mxnet-std/example/mxnet-memonger/memonger.py", line 140, in search_plan
sym = make_mirror_plan(sym, threshold=threshold, plan_info=info, **kwargs)
File "/home/user/workspace/mxnet-std/example/mxnet-memonger/memonger.py", line 62, in make_mirror_plan
_, out_shapes, _ = internals.infer_shape(**kwargs)
File "../../python/mxnet/symbol.py", line 932, in infer_shape
res = self._infer_shape_impl(False, *args, **kwargs)
File "../../python/mxnet/symbol.py", line 1058, in infer_shape_impl
ctypes.byref(complete)))
File "../../python/mxnet/base.py", line 85, in check_call
raise MXNetError(py_str(LIB.MXGetLastError()))
mxnet.base.MXNetError: Error in operator global_pool: [11:30:47] src/operator/./pooling-inl.h:196: Check failed: param.kernel[0] <= dshape[2] + 2 * param.pad[0] kernel size (8) exceeds input (5 padded to 5)

infer_shape error. Arguments:
data: (32, 3, 227, 227)

Process finished with exit code 1

I really appreciate it for who can solve this

A activation should be applied to sum of residual and shortcut in the resnetv1 example

how to install memonger

is memonger installed by default with mxnet ?

i can't import

>>> import mxnet
>>> import memonger
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named memonger

What does the "memonger" mean?

Is the "memonger" short for something?
memo + nger?

How to use on DDP

I used it on DDP ,and an error came out, I dont know if it can be used on DDP.
Error maybe occurred in loss.backward(). Thank you !

RuntimeError: Expected to mark a variable ready only once.
This error is caused by use of a module parameter outside the forward function.
The return value of the forward function is inspected by the distributed data parallel
wrapper to figure out if any of the module's parameters went unused.
If this is the case, it knows they won't receive gradients in a backward pass.
If any of those parameters are then used outside forward, this error condition is triggered.
You can disable unused parameter detection by passing the keyword argument
find_unused_parameters=False to torch.nn.parallel.DistributedDataParallel.

cost in example

Hi,
I tried runing both memonger examples using mxnet, however I got same cost for old and new feature map for both examples. Did I miss any parameters in the example code? I saw the threshold in search_plan, but it is set to zero in the function. So I'm wondering how to use this example? Also, is there any example based on the symbol_*.py files? I tried to use memonger in train_cifar10.py. It crashed on line63 creating shape dict with input out_shapes not iteratable.

If I feed the planned net to mx.FeedForward() it would report:

 [23:14:29] src/operator/./pooling-inl.h:166: Check failed: dshape.ndim() >= 3U (0 vs. 3) Pooling: Input data should be  3D in (batch, channel, x) Or 4D in (batch, channel, y, x)  Or 5D in (batch, channel, d, y, x)

while everything works well if I don't use memonger.

further information:

infer_shape error. Arguments:
Traceback (most recent call last):
  File "train_densenet.py", line 240, in <module>
    main()
  File "train_densenet.py", line 161, in main
    net_planned = memonger.search_plan(symbol)
  File "/home/taineleau/densenet.mxnet/memonger.py", line 140, in search_plan
    sym = make_mirror_plan(sym, threshold=threshold, plan_info=info, **kwargs)
  File "/home/taineleau/densenet.mxnet/memonger.py", line 62, in make_mirror_plan
    _, out_shapes, _ = internals.infer_shape(**kwargs)
  File "/home/taineleau/.local/lib/python2.7/site-packages/mxnet-0.10.1-py2.7.egg/mxnet/symbol.py", line 878, in infer_shape
    res = self._infer_shape_impl(False, *args, **kwargs)
  File "/home/taineleau/.local/lib/python2.7/site-packages/mxnet-0.10.1-py2.7.egg/mxnet/symbol.py", line 1004, in _infer_shape_impl
    ctypes.byref(complete)))
  File "/home/taineleau/.local/lib/python2.7/site-packages/mxnet-0.10.1-py2.7.egg/mxnet/base.py", line 85, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: Error in operator TBstage1_pool1: [23:14:29] src/operator/./pooling-inl.h:166: Check failed: dshape.ndim() >= 3U (0 vs. 3) Pooling: Input data should be  3D in (batch, channel, x) Or 4D in (batch, channel, y, x)  Or 5D in (batch, channel, d, y, x)