hpi-xnor / bmxnet

(New version is out: https://github.com/hpi-xnor/BMXNet-v2) BMXNet: An Open-Source Binary Neural Network Implementation Based on MXNet

License: Apache License 2.0

CMake 0.52% Makefile 0.33% R 2.12% C++ 35.10% Python 32.16% Java 0.13% Shell 1.43% Jupyter Notebook 8.70% Cuda 5.27% C 0.89% MATLAB 0.23% Scala 5.96% Batchfile 0.09% Perl 7.00% Perl 6 0.01% Groovy 0.01% Rebol 0.01% Dockerfile 0.08%
binary-neural-networks xnor-convolutions deep-learning mxnet

bmxnet's Introduction

xnor enhanced neural nets // Hasso Plattner Institute

A fork of the deep learning framework mxnet to study and implement quantization and binarization in neural networks.

Our current efforts are focused on binarizing the inputs and weights of convolutional layers, enabling the use of performant bit operations instead of expensive matrix multiplications as described in:

News

  • Dec 06, 2018 - BMXNet-v2

  • Dec 22, 2017 - MXNet v1.0.0 and cuDNN

    • We are updating the underlying MXNet to version 1.0.0, see changes and release notes here.
    • cuDNN is now supported in the training of binary networks, speeding up the training process by about 2x

Setup

We use cmake to build the project. Make sure to install all the dependencies described here.

Adjust settings in cmake (build-type Release or Debug, configure CUDA, OpenBLAS or Atlas, OpenCV, OpenMP etc.)

$ git clone --recursive https://github.com/hpi-xnor/mxnet.git # remember to include the --recursive
$ cd mxnet
$ mkdir -p build/Release && cd build/Release
$ cmake ../../ # if any error occurs, use ccmake or cmake-gui to adjust the cmake config.
$ ccmake . # or GUI cmake
$ make -j `nproc`

Build the MXNet Python binding

Step 1 Install prerequisites: python, setuptools, python-pip and numpy.

$ sudo apt-get install -y python-dev python-setuptools python-numpy python-pip

Step 2 Install the MXNet Python binding.

$ cd <mxnet-root>/python
$ pip install --upgrade pip
$ pip install -e .

If your MXNet Python binding still does not work, you can add the location of the library to your LD_LIBRARY_PATH and the MXNet python folder to your PYTHONPATH:

$ export LD_LIBRARY_PATH=<mxnet-root>/build/Release:$LD_LIBRARY_PATH
$ export PYTHONPATH=<mxnet-root>/python:$PYTHONPATH

Docker

There is a simple Dockerfile that you can use to ease the setup process. Once the container is running, you will find mxnet at /mxnet and the build folder at /mxnet/release. (Be warned, though: CUDA will not work inside the container, so training can be quite slow.)

$ cd <mxnet-root>/smd_hpi/tools/docker
$ docker build -t mxnet .
$ docker run -t -i mxnet

You will probably also want to mount a folder to share files (e.g. trained models) with the container (-v <absolute local path>:/shared).

Usage

Our main contributions are drop-in replacements for the Convolution, FullyConnected and Activation layers of mxnet, called QConvolution, QFullyConnected and QActivation.

These can be used when specifying a model. They extend the parameters of the corresponding original mxnet layer with act_bit (for activations) and weight_bit (for weights).

Quantization

Set the parameters act_bit and weight_bit to a value between 1 and 32 to quantize the activations and weights to that bit width.

Quantization to bit widths from 2 to 31 is available mainly for research purposes. It brings no speed or memory gain (rather the opposite, since there are extra conversion steps), because the quantized values are still stored in full-precision float variables.
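As a rough illustration of why 2 to 31 bit quantization saves no memory, here is a minimal Python sketch. It assumes a DoReFa-style uniform quantizer, round((2^k - 1) * x) / (2^k - 1), applied to inputs already scaled to [0, 1]. The helper name quantize_k is hypothetical, and the sketch is not BMXNet's actual kernel. The quantized results still occupy 4-byte float32 slots.

```python
from array import array


def quantize_k(x: float, k: int) -> float:
    """Uniformly quantize x to k bits, assuming a DoReFa-style quantizer on [0, 1]."""
    if not 2 <= k <= 31:
        raise ValueError("this sketch covers the 2..31 bit range only")
    steps = (1 << k) - 1             # 2^k - 1 intervals between 0 and 1
    x = min(max(x, 0.0), 1.0)        # clip into the quantizer's input range
    return round(x * steps) / steps  # note: Python's round() sends ties to even


activations = [0.05, 0.234669849, 0.5, 0.91]
q2 = array("f", (quantize_k(a, 2) for a in activations))  # 2-bit values
q32 = array("f", activations)                             # unquantized values

# Both containers spend 4 bytes per value: quantizing to 2 bits does not
# shrink memory while the values are still stored as float32.
print(q2.itemsize, q32.itemsize)  # 4 4
print(list(q2))                   # only multiples of 1/3 (up to float32 rounding)
```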

Binarization

To binarize weights and activations, first set weight_bit=1 and act_bit=1. Then train your network (you can use CUDA/cuDNN). The resulting .params file will contain binary weights, but it still stores each weight as a full float.

To convert your trained and saved network, call the model converter with your .params file:

$ <mxnet-root>/build/Release/smd_hpi/tools/model_converter mnist-0001.params

This will generate a new .params and .json file with the prefix binarized_. The converted model uses only 1 bit of runtime memory and storage for each weight in the convolutional layers.
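To illustrate why 1-bit weights save memory and allow bit operations in place of multiplications, here is a minimal Python sketch. The helpers sign, pack_signs and xnor_dot are hypothetical and not part of BMXNet. The sketch assumes values are binarized with a sign function that maps +1 to bit 1 and -1 to bit 0 (treating 0 as +1), and it packs up to 64 of them into one 64-bit word. For two ±1 vectors of length n, the dot product equals 2 * popcount(XNOR(a, b)) - n. BMXNet's real bit layout and kernels may differ.

```python
def sign(v: float) -> int:
    """Binarize to +1/-1; this sketch maps 0 to +1."""
    return 1 if v >= 0 else -1


def pack_signs(values) -> int:
    """Pack up to 64 values into one 64-bit word: bit i is 1 for +1, 0 for -1."""
    if len(values) > 64:
        raise ValueError("at most 64 values per word")
    word = 0
    for i, v in enumerate(values):
        if sign(v) == 1:
            word |= 1 << i
    return word


def xnor_dot(a_word: int, b_word: int, n: int) -> int:
    """Dot product of two packed {-1, +1} vectors of length n."""
    mask = (1 << n) - 1                # ignore bits beyond the n used slots
    agree = ~(a_word ^ b_word) & mask  # XNOR: 1 where the signs match
    matches = bin(agree).count("1")    # popcount
    return 2 * matches - n             # (+1 per match) + (-1 per mismatch)


w = [0.3, -1.2, 0.0, -0.01, 2.5]
x = [-0.7, -0.4, 0.9, 0.2, 1.1]
reference = sum(sign(a) * sign(b) for a, b in zip(w, x))
print(xnor_dot(pack_signs(w), pack_signs(x), len(w)), reference)  # 1 1
```

With this layout, 64 float32 weights (256 bytes) fit into a single 8-byte word.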

We have example python scripts to train and validate resnet18 (cifar10, imagenet) and lenet (mnist) neural networks with binarized layers.

There are example applications running on iOS and Android that can utilize binarized networks. Find them in the following repos:

Have a look at our source, tools and examples to find out more.

Citing BMXNet

Please cite BMXNet in your publications if it helps your research work:

@inproceedings{bmxnet,
 author = {Yang, Haojin and Fritzsche, Martin and Bartz, Christian and Meinel, Christoph},
 title = {BMXNet: An Open-Source Binary Neural Network Implementation Based on MXNet},
 booktitle = {Proceedings of the 2017 ACM on Multimedia Conference},
 series = {MM '17},
 year = {2017},
 isbn = {978-1-4503-4906-2},
 location = {Mountain View, California, USA},
 pages = {1209--1212},
 numpages = {4},
 url = {http://doi.acm.org/10.1145/3123266.3129393},
 doi = {10.1145/3123266.3129393},
 acmid = {3129393},
 publisher = {ACM},
 address = {New York, NY, USA},
 keywords = {binary neural networks, computer vision, machine learning, open source},
} 

Reference

bmxnet's People

Contributors

antinucleon, cjolivier01, eric-haibin-lin, hetong007, hjk41, hotpxl, jermainewang, kevinthesun, ldpe2g, mavenlin, mli, mtin, piiswrong, pluskid, ptrendx, qiaohaijun, roshrini, sandeep-krishnamurthy, sneakerkg, sxjscience, szha, terrytangyuan, thirdwing, tornadomeet, tqchen, vchuravy, winstywang, yajiedesign, yanqingmen, yzhliu

bmxnet's Issues

A question about the implementation

I find that all layers only store quantized weights.
This is different from DoReFa-Net, which also stores the original full-precision weights and updates them.
Will this affect the result?
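A minimal Python sketch of why keeping full-precision ("latent") weights can matter: small gradient steps accumulate in the latent weight until its sign flips, whereas a weight that is re-binarized after every update never moves. The sketch assumes plain SGD and a straight-through estimator with the usual |w| <= 1 gate. All names are hypothetical, and it does not show how BMXNet itself stores or updates weights.

```python
def sign(w: float) -> float:
    return 1.0 if w >= 0 else -1.0


def step_latent(w_latent: float, grad: float, lr: float) -> float:
    """Update the full-precision weight; the forward pass would use sign(w_latent).

    Straight-through estimator: the gradient w.r.t. sign(w) is passed through
    unchanged while |w| <= 1 and zeroed outside that range.
    """
    g = grad if abs(w_latent) <= 1.0 else 0.0
    return w_latent - lr * g


def step_binary_only(w_binary: float, grad: float, lr: float) -> float:
    """Hypothetical alternative: keep only the binary weight, re-binarize after each step."""
    return sign(w_binary - lr * grad)


lr, grad = 0.1, 0.3        # a small gradient that consistently pushes the weight down
w_latent = 0.05
w_binary = sign(w_latent)  # starts at +1.0
for _ in range(3):
    w_latent = step_latent(w_latent, grad, lr)
    w_binary = step_binary_only(w_binary, grad, lr)

print(sign(w_latent), w_binary)  # -1.0 1.0
```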

Quantization method

I am confused about why the function uses 2^k-1 instead of 2^k.

Take 2.34669849e-01 as an example:

  1. using 2^k-1:
    with weight_bit=2 the value is 0.3333333, but 0.33 cannot be represented with two bits
  2. using 2^k:
    with weight_bit=2 the value is 0.25, which can be represented with two bits (0.01 in binary)

So why not use 2^k? Did I make a mistake? Thanks.
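Here is a worked example of the arithmetic in the question, written as a Python sketch. It assumes inputs already lie in [0, 1] and uses the quantizer round(x * d) / d with d = 2^k - 1 or d = 2^k. The helper quantize is hypothetical. What gets stored is the integer code, not the scaled value, so 1/3 is represented by the 2-bit code 1. With d = 2^k, an input of 1.0 maps to code 4, which no longer fits in 2 bits.

```python
def quantize(x: float, denom: int):
    """Round x in [0, 1] onto {0, 1/denom, ..., 1}; return (integer code, value)."""
    code = round(x * denom)
    return code, code / denom


k = 2
x = 2.34669849e-01
print(quantize(x, 2**k - 1))  # (1, 0.3333333333333333): the stored code is 1, i.e. 0b01
print(quantize(x, 2**k))      # (1, 0.25)

# Which integer codes can occur over the whole input range [0, 1]?
grid = [i / 1000 for i in range(1001)]
codes_minus_one = {quantize(v, 2**k - 1)[0] for v in grid}
codes_pow = {quantize(v, 2**k)[0] for v in grid}
print(sorted(codes_minus_one))  # [0, 1, 2, 3]: 4 codes, fits in 2 bits
print(sorted(codes_pow))        # [0, 1, 2, 3, 4]: 5 codes, needs 3 bits
```

This suggests why 2^k - 1 is a natural denominator for k-bit codes over the closed range [0, 1]; a 2^k denominator would instead require clipping inputs strictly below 1.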

Support act_bit/weight_bit values that are not powers of two.

Currently, act_bit and weight_bit parameters may only be powers of two (1,2,4,8,16,32). Apart from the fact that other values will not evenly divide 64, there is no reason you could not support other values. Indeed, those interested in modeling possible ASIC/FPGA architectures aren't going to want to be limited to powers of two.

Obviously, this would require a little bit of work to figure out the bit-packing scheme and mask out unused bits.
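A minimal Python sketch of one possible bit-packing scheme for widths that do not divide 64: each 64-bit word holds floor(64 / k) codes, and the leftover high bits stay zero. The helpers pack and unpack are hypothetical and not part of BMXNet. In an XNOR/popcount kernel, those unused bits (and any unused slots in the last word) would also have to be masked out before counting.

```python
WORD_BITS = 64


def pack(codes, k):
    """Pack unsigned k-bit codes into 64-bit words, floor(64 / k) codes per word."""
    per_word = WORD_BITS // k
    mask = (1 << k) - 1
    words = []
    for start in range(0, len(codes), per_word):
        word = 0
        for slot, c in enumerate(codes[start:start + per_word]):
            if not 0 <= c <= mask:
                raise ValueError(f"code {c} does not fit in {k} bits")
            word |= c << (slot * k)  # slot 0 occupies the lowest k bits
        words.append(word)           # the top 64 - per_word * k bits stay 0
    return words


def unpack(words, k, count):
    """Inverse of pack(): read back the first `count` codes."""
    per_word = WORD_BITS // k
    mask = (1 << k) - 1
    codes = []
    for word in words:
        for slot in range(per_word):
            if len(codes) == count:
                return codes
            codes.append((word >> (slot * k)) & mask)
    return codes


codes = [i % 8 for i in range(50)]  # 50 three-bit codes
words = pack(codes, 3)              # 21 codes per word, 1 bit per word unused
print(len(words), unpack(words, 3, len(codes)) == codes)  # 3 True
```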

Core dump during fine-tuning with bin-resnet18

Environment info

Operating System: Ubuntu 16.04

Compiler: gcc, g++ 5.4.0

Package used (Python/R/Scala/Julia): Python

MXNet version: BMXNet built from master. USE_CUDA=0, USE_CUDNN=0, USE_DIST_KVSTORE=0

If you are using python package, please provide

Python version and distribution: 3.6 Anaconda

Error Message:

Core dump in ipython/jupyter notebook. Running with NaiveEngine gives me

./tensor_blob.h:198: Check failed: mshadow::DataType<DType>::kFlag == type_flag_ TBlob.get_with_shape: data type do not match specified type.Expected: 1 v.s. given 0

which seems to be a known issue, but I am not sure whether it was resolved in the MXNet 0.10.1 version that BMXNet is based on!

Minimum reproducible example

/smd_hpi/examples/binary-imagenet1k/fine-tune.py works for a regular pretrained ResNet downloaded from the model zoo, but I get the core dump when using bin-resnet18 in smd_hpi/binary_models/binarized_imagenet-resnet18-64bit-0040

Looking at the git history, it seems that you haven't tried fine-tuning, and the paper only mentions training. How can I correctly load the already converted (binarized) weights in smd_hpi/binary_models/ into mx.module and send a batch of data to mod.forward(batch)?

Upgrade to MXNet 0.12.1 and beyond

Hi

Thanks for providing this great tool on top of MXNet. Do you have any plans to upgrade to the latest version of MXNet (now 0.12.1, with 1.0.0 to come), for example to use the features of Gluon? Or, if someone wants to do so, are there any recommendations aside from watching out for merge conflicts?

How to integrate a trained binary model in iOS?

Hi,

Thanks for your great work! I am trying to retrain binary ResNet-18 on ImageNet. I can convert the weights to binary storage with the model converter. However, when I run that model on iOS, it does not work. I also tried to run amalgamate_mxnet_mac.sh, but it fails with the error: "fatal error: src/ndarray/autograd.cc: No such file or directory". I run everything on Ubuntu Linux (except the iOS app, which runs on macOS). Could you describe, step by step, how to build everything for iOS?

Thanks,
Hai

Build failing

Description

Build failing

Environment info (Required)

anaconda3, python2.7

Build info (Required if built from source)

Compiler (gcc/clang/mingw/visual studio): gcc

MXNet commit hash:
3b9348d

Build config:
(Paste the content of config.mk, or the build command.)

Error Message:

[ 93%] Linking CXX executable dmlc_unit_tests
CMakeFiles/dmlc_unit_tests.dir/unittest_json.cc.o: In function `testing::AssertionResult testing::internal::CmpHelperEQ<std::vector<int, std::allocator<int> >, std::vector<int, std::allocator<int> > >(char const*, char const*, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&)':
unittest_json.cc:(.text._ZN7testing8internal11CmpHelperEQISt6vectorIiSaIiEES4_EENS_15AssertionResultEPKcS7_RKT_RKT0_[_ZN7testing8internal11CmpHelperEQISt6vectorIiSaIiEES4_EENS_15AssertionResultEPKcS7_RKT_RKT0_]+0xd5): undefined reference to `testing::internal::EqFailure(char const*, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)'
CMakeFiles/dmlc_unit_tests.dir/unittest_json.cc.o: In function `std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > testing::PrintToString<std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&)':
unittest_json.cc:(.text._ZN7testing13PrintToStringISt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaIS7_EEEES7_RKT_[_ZN7testing13PrintToStringISt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaIS7_EEEES7_RKT_]+0x9c): undefined reference to `testing::internal::PrintStringTo(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::ostream*)'
CMakeFiles/dmlc_unit_tests.dir/unittest_json.cc.o: In function `testing::AssertionResult testing::internal::CmpHelperEQ<std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >(char const*, char const*, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&)':
unittest_json.cc:(.text._ZN7testing8internal11CmpHelperEQISt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaIS8_EESA_EENS_15AssertionResultEPKcSD_RKT_RKT0_[_ZN7testing8internal11CmpHelperEQISt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaIS8_EESA_EENS_15AssertionResultEPKcSD_RKT_RKT0_]+0x8c): undefined reference to `testing::internal::EqFailure(char const*, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)'
CMakeFiles/dmlc_unit_tests.dir/unittest_json.cc.o: In function `testing::AssertionResult testing::internal::CmpHelperEQ<std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > >, std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > >(char const*, char const*, std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > const&, std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > const&)':
unittest_json.cc:(.text._ZN7testing8internal11CmpHelperEQISt6vectorIS2_IiSaIiEESaIS4_EES6_EENS_15AssertionResultEPKcS9_RKT_RKT0_[_ZN7testing8internal11CmpHelperEQISt6vectorIS2_IiSaIiEESaIS4_EES6_EENS_15AssertionResultEPKcS9_RKT_RKT0_]+0x9c): undefined reference to `testing::internal::EqFailure(char const*, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)'
CMakeFiles/dmlc_unit_tests.dir/unittest_json.cc.o: In function `std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > testing::PrintToString<std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> > > >(std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> > > const&)':
unittest_json.cc:(.text._ZN7testing13PrintToStringISt3mapINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEiSt4lessIS7_ESaISt4pairIKS7_iEEEEES7_RKT_[_ZN7testing13PrintToStringISt3mapINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEiSt4lessIS7_ESaISt4pairIKS7_iEEEEES7_RKT_]+0xcb): undefined reference to `testing::internal::PrintStringTo(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::ostream*)'
CMakeFiles/dmlc_unit_tests.dir/unittest_json.cc.o: In function `testing::AssertionResult testing::internal::CmpHelperEQ<std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> > >, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> > > >(char const*, char const*, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> > > const&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> > > const&)':
unittest_json.cc:(.text._ZN7testing8internal11CmpHelperEQISt3mapINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEiSt4lessIS8_ESaISt4pairIKS8_iEEESF_EENS_15AssertionResultEPKcSI_RKT_RKT0_[_ZN7testing8internal11CmpHelperEQISt3mapINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEiSt4lessIS8_ESaISt4pairIKS8_iEEESF_EENS_15AssertionResultEPKcSI_RKT_RKT0_]+0x70): undefined reference to `testing::internal::EqFailure(char const*, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)'
CMakeFiles/dmlc_unit_tests.dir/unittest_json.cc.o: In function `std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > testing::PrintToString<std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> > > >(std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> > > const&)':
unittest_json.cc:(.text._ZN7testing13PrintToStringISt13unordered_mapINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEiSt4hashIS7_ESt8equal_toIS7_ESaISt4pairIKS7_iEEEEES7_RKT_[_ZN7testing13PrintToStringISt13unordered_mapINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEiSt4hashIS7_ESt8equal_toIS7_ESaISt4pairIKS7_iEEEEES7_RKT_]+0xc2): undefined reference to `testing::internal::PrintStringTo(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::ostream*)'
CMakeFiles/dmlc_unit_tests.dir/unittest_json.cc.o: In function `void json::TestSaveLoad<std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> > > >(std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> > >)':
unittest_json.cc:(.text._ZN4json12TestSaveLoadISt13unordered_mapINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEiSt4hashIS7_ESt8equal_toIS7_ESaISt4pairIKS7_iEEEEEvT_[_ZN4json12TestSaveLoadISt13unordered_mapINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEiSt4hashIS7_ESt8equal_toIS7_ESaISt4pairIKS7_iEEEEEvT_]+0x71c): undefined reference to `testing::internal::EqFailure(char const*, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)'
CMakeFiles/dmlc_unit_tests.dir/unittest_json.cc.o: In function `std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > testing::PrintToString<std::__cxx11::list<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >(std::__cxx11::list<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&)':
unittest_json.cc:(.text._ZN7testing13PrintToStringINSt7__cxx114listINS1_12basic_stringIcSt11char_traitsIcESaIcEEESaIS7_EEEEES7_RKT_[_ZN7testing13PrintToStringINSt7__cxx114listINS1_12basic_stringIcSt11char_traitsIcESaIcEEESaIS7_EEEEES7_RKT_]+0x9d): undefined reference to `testing::internal::PrintStringTo(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::ostream*)'
CMakeFiles/dmlc_unit_tests.dir/unittest_json.cc.o: In function `testing::AssertionResult testing::internal::CmpHelperEQ<std::__cxx11::list<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::__cxx11::list<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >(char const*, char const*, std::__cxx11::list<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::__cxx11::list<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&)':
unittest_json.cc:(.text._ZN7testing8internal11CmpHelperEQINSt7__cxx114listINS2_12basic_stringIcSt11char_traitsIcESaIcEEESaIS8_EEESA_EENS_15AssertionResultEPKcSD_RKT_RKT0_[_ZN7testing8internal11CmpHelperEQINSt7__cxx114listINS2_12basic_stringIcSt11char_traitsIcESaIcEEESaIS8_EEESA_EENS_15AssertionResultEPKcSD_RKT_RKT0_]+0x91): undefined reference to `testing::internal::EqFailure(char const*, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)'
CMakeFiles/dmlc_unit_tests.dir/unittest_json.cc.o: In function `testing::AssertionResult testing::internal::CmpHelperEQ<std::__cxx11::list<int, std::allocator<int> >, std::__cxx11::list<int, std::allocator<int> > >(char const*, char const*, std::__cxx11::list<int, std::allocator<int> > const&, std::__cxx11::list<int, std::allocator<int> > const&)':
unittest_json.cc:(.text._ZN7testing8internal11CmpHelperEQINSt7__cxx114listIiSaIiEEES5_EENS_15AssertionResultEPKcS8_RKT_RKT0_[_ZN7testing8internal11CmpHelperEQINSt7__cxx114listIiSaIiEEES5_EENS_15AssertionResultEPKcS8_RKT_RKT0_]+0x8f): undefined reference to `testing::internal::EqFailure(char const*, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)'
CMakeFiles/dmlc_unit_tests.dir/unittest_json.cc.o: In function `testing::AssertionResult testing::internal::CmpHelperEQ<std::__cxx11::list<json::MyClass, std::allocator<json::MyClass> >, std::__cxx11::list<json::MyClass, std::allocator<json::MyClass> > >(char const*, char const*, std::__cxx11::list<json::MyClass, std::allocator<json::MyClass> > const&, std::__cxx11::list<json::MyClass, std::allocator<json::MyClass> > const&)':
unittest_json.cc:(.text._ZN7testing8internal11CmpHelperEQINSt7__cxx114listIN4json7MyClassESaIS5_EEES7_EENS_15AssertionResultEPKcSA_RKT_RKT0_[_ZN7testing8internal11CmpHelperEQINSt7__cxx114listIN4json7MyClassESaIS5_EEES7_EENS_15AssertionResultEPKcSA_RKT_RKT0_]+0x8f): undefined reference to `testing::internal::EqFailure(char const*, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)'
CMakeFiles/dmlc_unit_tests.dir/unittest_logging.cc.o: In function `Logging_basics_Test::TestBody()':
unittest_logging.cc:(.text+0x7c1): undefined reference to `testing::internal::FormatFileLocation[abi:cxx11](char const*, int)'
CMakeFiles/dmlc_unit_tests.dir/unittest_serializer.cc.o: In function `void TestSaveLoad<std::vector<int, std::allocator<int> > >(std::vector<int, std::allocator<int> >)':
unittest_serializer.cc:(.text._Z12TestSaveLoadISt6vectorIiSaIiEEEvT_[_Z12TestSaveLoadISt6vectorIiSaIiEEEvT_]+0x298): undefined reference to `testing::internal::EqFailure(char const*, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)'
CMakeFiles/dmlc_unit_tests.dir/unittest_serializer.cc.o: In function `std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > testing::PrintToString<std::map<int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<int>, std::allocator<std::pair<int const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > >(std::map<int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<int>, std::allocator<std::pair<int const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&)':
unittest_serializer.cc:(.text._ZN7testing13PrintToStringISt3mapIiNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt4lessIiESaISt4pairIKiS7_EEEEES7_RKT_[_ZN7testing13PrintToStringISt3mapIiNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt4lessIiESaISt4pairIKiS7_EEEEES7_RKT_]+0xea): undefined reference to `testing::internal::PrintStringTo(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::ostream*)'
CMakeFiles/dmlc_unit_tests.dir/unittest_serializer.cc.o: In function `testing::AssertionResult testing::internal::CmpHelperEQ<std::map<int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<int>, std::allocator<std::pair<int const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >, std::map<int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<int>, std::allocator<std::pair<int const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > >(char const*, char const*, std::map<int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<int>, std::allocator<std::pair<int const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&, std::map<int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<int>, std::allocator<std::pair<int const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&)':
unittest_serializer.cc:(.text._ZN7testing8internal11CmpHelperEQISt3mapIiNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt4lessIiESaISt4pairIKiS8_EEESF_EENS_15AssertionResultEPKcSI_RKT_RKT0_[_ZN7testing8internal11CmpHelperEQISt3mapIiNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt4lessIiESaISt4pairIKiS8_EEESF_EENS_15AssertionResultEPKcSI_RKT_RKT0_]+0x70): undefined reference to `testing::internal::EqFailure(char const*, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)'
CMakeFiles/dmlc_unit_tests.dir/unittest_serializer.cc.o: In function `std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > testing::PrintToString<std::unordered_multimap<int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<int>, std::equal_to<int>, std::allocator<std::pair<int const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > >(std::unordered_multimap<int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<int>, std::equal_to<int>, std::allocator<std::pair<int const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&)':
unittest_serializer.cc:(.text._ZN7testing13PrintToStringISt18unordered_multimapIiNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt4hashIiESt8equal_toIiESaISt4pairIKiS7_EEEEES7_RKT_[_ZN7testing13PrintToStringISt18unordered_multimapIiNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt4hashIiESt8equal_toIiESaISt4pairIKiS7_EEEEES7_RKT_]+0xe1): undefined reference to `testing::internal::PrintStringTo(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::ostream*)'
CMakeFiles/dmlc_unit_tests.dir/unittest_serializer.cc.o: In function `testing::AssertionResult testing::internal::CmpHelperEQFailure<std::unordered_multimap<int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<int>, std::equal_to<int>, std::allocator<std::pair<int const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >, std::unordered_multimap<int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<int>, std::equal_to<int>, std::allocator<std::pair<int const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > >(char const*, char const*, std::unordered_multimap<int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<int>, std::equal_to<int>, std::allocator<std::pair<int const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&, std::unordered_multimap<int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<int>, std::equal_to<int>, std::allocator<std::pair<int const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&)':
unittest_serializer.cc:(.text._ZN7testing8internal18CmpHelperEQFailureISt18unordered_multimapIiNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt4hashIiESt8equal_toIiESaISt4pairIKiS8_EEESH_EENS_15AssertionResultEPKcSK_RKT_RKT0_[_ZN7testing8internal18CmpHelperEQFailureISt18unordered_multimapIiNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt4hashIiESt8equal_toIiESaISt4pairIKiS8_EEESH_EENS_15AssertionResultEPKcSK_RKT_RKT0_]+0x5b): undefined reference to `testing::internal::EqFailure(char const*, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)'
CMakeFiles/dmlc_unit_tests.dir/unittest_serializer.cc.o: In function `std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > testing::PrintToString<std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >(std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&)':
unittest_serializer.cc:(.text._ZN7testing13PrintToStringISt3setINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt4lessIS7_ESaIS7_EEEES7_RKT_[_ZN7testing13PrintToStringISt3setINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt4lessIS7_ESaIS7_EEEES7_RKT_]+0x9d): undefined reference to `testing::internal::PrintStringTo(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::ostream*)'
CMakeFiles/dmlc_unit_tests.dir/unittest_serializer.cc.o: In function `testing::AssertionResult testing::internal::CmpHelperEQ<std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >(char const*, char const*, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&)':
unittest_serializer.cc:(.text._ZN7testing8internal11CmpHelperEQISt3setINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt4lessIS8_ESaIS8_EESC_EENS_15AssertionResultEPKcSF_RKT_RKT0_[_ZN7testing8internal11CmpHelperEQISt3setINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt4lessIS8_ESaIS8_EESC_EENS_15AssertionResultEPKcSF_RKT_RKT0_]+0x70): undefined reference to `testing::internal::EqFailure(char const*, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)'
CMakeFiles/dmlc_unit_tests.dir/unittest_serializer.cc.o: In function `std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > testing::PrintToString<std::unordered_set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >(std::unordered_set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&)':
unittest_serializer.cc:(.text._ZN7testing13PrintToStringISt13unordered_setINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt4hashIS7_ESt8equal_toIS7_ESaIS7_EEEES7_RKT_[_ZN7testing13PrintToStringISt13unordered_setINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt4hashIS7_ESt8equal_toIS7_ESaIS7_EEEES7_RKT_]+0x95): undefined reference to `testing::internal::PrintStringTo(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::ostream*)'
CMakeFiles/dmlc_unit_tests.dir/unittest_serializer.cc.o: In function `void TestSaveLoad<std::__cxx11::list<int, std::allocator<int> > >(std::__cxx11::list<int, std::allocator<int> >)':
unittest_serializer.cc:(.text._Z12TestSaveLoadINSt7__cxx114listIiSaIiEEEEvT_[_Z12TestSaveLoadINSt7__cxx114listIiSaIiEEEEvT_]+0x2d0): undefined reference to `testing::internal::EqFailure(char const*, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)'
CMakeFiles/dmlc_unit_tests.dir/unittest_serializer.cc.o: In function `testing::AssertionResult testing::internal::CmpHelperEQ<std::__cxx11::list<MyClass, std::allocator<MyClass> >, std::__cxx11::list<MyClass, std::allocator<MyClass> > >(char const*, char const*, std::__cxx11::list<MyClass, std::allocator<MyClass> > const&, std::__cxx11::list<MyClass, std::allocator<MyClass> > const&)':
unittest_serializer.cc:(.text._ZN7testing8internal11CmpHelperEQINSt7__cxx114listI7MyClassSaIS4_EEES6_EENS_15AssertionResultEPKcS9_RKT_RKT0_[_ZN7testing8internal11CmpHelperEQINSt7__cxx114listI7MyClassSaIS4_EEES6_EENS_15AssertionResultEPKcS9_RKT_RKT0_]+0x91): undefined reference to `testing::internal::EqFailure(char const*, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)'
CMakeFiles/dmlc_unit_tests.dir/unittest_serializer.cc.o: In function `testing::AssertionResult testing::internal::CmpHelperEQ<std::__cxx11::list<Param, std::allocator<Param> >, std::__cxx11::list<Param, std::allocator<Param> > >(char const*, char const*, std::__cxx11::list<Param, std::allocator<Param> > const&, std::__cxx11::list<Param, std::allocator<Param> > const&)':
unittest_serializer.cc:(.text._ZN7testing8internal11CmpHelperEQINSt7__cxx114listI5ParamSaIS4_EEES6_EENS_15AssertionResultEPKcS9_RKT_RKT0_[_ZN7testing8internal11CmpHelperEQINSt7__cxx114listI5ParamSaIS4_EEES6_EENS_15AssertionResultEPKcS9_RKT_RKT0_]+0x88): undefined reference to `testing::internal::EqFailure(char const*, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)'
CMakeFiles/dmlc_unit_tests.dir/unittest_serializer.cc.o: In function `void TestSaveLoad<std::unordered_set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >(std::unordered_set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >)':
unittest_serializer.cc:(.text._Z12TestSaveLoadISt13unordered_setINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt4hashIS6_ESt8equal_toIS6_ESaIS6_EEEvT_[_Z12TestSaveLoadISt13unordered_setINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt4hashIS6_ESt8equal_toIS6_ESaIS6_EEEvT_]+0x353): undefined reference to `testing::internal::EqFailure(char const*, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)'
CMakeFiles/dmlc_unit_tests.dir/unittest_main.cc.o: In function `main':
unittest_main.cc:(.text.startup+0x15): undefined reference to `testing::FLAGS_gtest_death_test_style[abi:cxx11]'
CMakeFiles/dmlc_unit_tests.dir/unittest_any.cc.o: In function `Any_json_Test::TestBody()':
unittest_any.cc:(.text+0x59ca): undefined reference to `testing::internal::EqFailure(char const*, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)'
collect2: error: ld returned 1 exit status
dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/build.make:330: recipe for target 'dmlc-core/test/unittest/dmlc_unit_tests' failed
make[2]: *** [dmlc-core/test/unittest/dmlc_unit_tests] Error 1
CMakeFiles/Makefile2:1178: recipe for target 'dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/all' failed
make[1]: *** [dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/all] Error 2
Makefile:138: recipe for target 'all' failed
make: *** [all] Error 2

Order of QActivation, QConvolution layers

@yanghaojin

  1. Could you quickly elaborate on why the QActivation (as referenced in the code snippet of the paper) is placed in front of the QConvolution/QFullyConnected layers?
    For example, why is there another activation layer after the binarized one:
      ba2 = mx.symbol.QActivation(...)
      fc1 = mx.symbol.QFullyConnected(...)
      bn3 = mx.sym.BatchNorm(...)
      tanh3 = mx.sym.Activation(...)
  2. Could one use mx.symbol.LeakyReLU, or would you suggest implementing activation functions like PReLU/Swish (as supported by the Gluon API) for binary networks in the underlying C/C++ source code?
  3. For a project, we're especially interested in running inference in C++. Are both c_predict_api.h and mxnet-cpp/MxNetCpp.h (as used in https://github.com/apache/incubator-mxnet/blob/master/cpp-package/example/feature_extract/feature_extract.cpp) compatible with BMXNet?
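Question 1 touches on why the inputs are binarized before QFullyConnected. Under the usual XNOR-Net/BNN formulation (an assumption here, not a description of BMXNet's internals), once both inputs and weights are ±1 their dot product can be computed with XNOR and popcount on packed bits instead of floating-point multiply-adds. Note that the result is an integer in [-n, n], i.e. it is no longer binary. The sketch below only demonstrates this arithmetic identity; `binarize`, `pack_bits` and `xnor_popcount_dot` are hypothetical helper names.

```python
import numpy as np

def binarize(x):
    """Sign binarization; 0 is mapped to +1 (a common BNN convention, assumed here)."""
    return np.where(np.asarray(x) >= 0, 1, -1)

def pack_bits(signs):
    """Pack a vector of ±1 values into a Python int: +1 -> bit 1, -1 -> bit 0."""
    packed = 0
    for i, s in enumerate(signs):
        if s > 0:
            packed |= 1 << i
    return packed

def xnor_popcount_dot(a_bits, w_bits, n):
    """Dot product of two packed ±1 vectors of length n via XNOR + popcount."""
    mask = (1 << n) - 1
    agree = bin(~(a_bits ^ w_bits) & mask).count("1")  # popcount(XNOR)
    # Agreeing positions contribute +1, disagreeing ones -1:
    # agree - (n - agree) = 2 * agree - n
    return 2 * agree - n

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.standard_normal(64)  # full-precision input activations
    w = rng.standard_normal(64)  # full-precision weights
    a_b, w_b = binarize(a), binarize(w)
    fast = xnor_popcount_dot(pack_bits(a_b), pack_bits(w_b), 64)
    assert fast == int(np.dot(a_b, w_b))
    print(fast)
```

Packing 64 signs into one machine word is what lets a single XOR plus a popcount instruction replace 64 multiply-adds in an optimized kernel; the pure-Python version here trades that speed for readability.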

Plans to compare performance under more configurations

Thanks for your great work implementing all these algorithms and integrating them into MXNet.
I'm curious about performance in more convolution configurations. For instance, 3×3 convolutions are common in CNNs, and the Winograd fast convolution algorithm can be used to compute them.
As far as I know, Intel MKL, NNPACK and cuDNN (starting from v5) use fast convolution algorithms instead of im2col+GEMM for most 3×3 convolution configurations. Do you have any plans to compare performance in these situations?
Thanks a lot!
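For context on the Winograd approach mentioned above: it trades multiplications for additions on small filters. As an illustration (the textbook minimal construction, not code taken from BMXNet, MKL, NNPACK or cuDNN), the 1-D case F(2,3) computes two outputs of a 3-tap filter with 4 multiplications instead of 6; 3×3 convolutions apply the same idea along both axes. The function names are hypothetical.

```python
def winograd_f23(d, g):
    """Two outputs of a 1-D, 3-tap correlation using Winograd's minimal F(2,3).

    d: 4 input samples, g: 3 filter taps. Uses 4 multiplications on the data
    side; the filter transform can be precomputed once per filter.
    """
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # Filter transform (precomputable)
    u1 = g0
    u2 = (g0 + g1 + g2) / 2
    u3 = (g0 - g1 + g2) / 2
    u4 = g2
    # Four element-wise products
    m1 = (d0 - d2) * u1
    m2 = (d1 + d2) * u2
    m3 = (d2 - d1) * u3
    m4 = (d1 - d3) * u4
    # Output transform
    return [m1 + m2 + m3, m2 - m3 - m4]

def direct_f23(d, g):
    """Reference: direct sliding-window dot products (6 multiplications)."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]

if __name__ == "__main__":
    d = [1.0, 2.0, 3.0, 4.0]
    g = [0.5, -1.0, 2.0]
    print(winograd_f23(d, g), direct_f23(d, g))  # both [4.5, 6.0]
```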

QFullyConnected mutates weights in forward mode

It appears that QFullyConnected will quantize the weights into the input weight tensor even when not in training mode. It is a bit surprising to have the operator mutate the weight tensor like this. If this is not a bug, this behavior should be clearly documented in the operator description.

>>> W = mx.random.normal(shape=(4,64))
>>> W
[[-2.11374566e-01 -7.62305379e-01  5.35988867e-01 -4.48779255e-01
  -1.31812811e-01 -9.39604759e-01  8.35931599e-01  3.33142400e-01
   1.88610386e-02 -4.82557148e-01  7.96305165e-02  2.28180361e+00
  -6.15945280e-01 -6.84607267e-01 -1.09104884e+00  1.61278319e+00
  -4.62019652e-01  1.15526748e+00  1.00387812e+00  6.51997179e-02
  -2.08390355e-01 -5.01749277e-01 -8.90954554e-01  1.93811685e-01
  -3.69189644e+00  1.33110189e+00 -9.12502468e-01  6.57643005e-02
  -1.09751368e+00 -9.91342485e-01 -1.21290708e+00  6.61847472e-01
  -2.20562726e-01  1.52223051e-01  6.54029310e-01 -4.36110109e-01
   6.78317189e-01 -4.90361512e-01 -1.13644254e+00 -1.15610786e-01
  -1.22058713e+00  5.92948437e-01  1.15824485e+00  8.71689692e-02
  -1.06366360e+00  7.94529617e-01 -1.97111309e+00  4.99654144e-01
   7.78103471e-01 -9.06336457e-02  1.36469460e+00  9.52839136e-01
   7.28555679e-01  2.49940425e-01 -3.67091447e-01  2.34669849e-01
   1.23725939e+00  7.70155713e-02  7.63777673e-01 -2.70560622e-01
  -3.04230303e-02 -5.69541216e-01 -4.35389206e-02 -2.02609086e+00]
 [ 1.19611490e+00 -4.55334902e-01  1.75488353e-01 -1.21917176e+00
  -2.98362315e-01 -1.93958059e-01  1.80431500e-01 -1.58335018e+00
  -1.61724344e-01  1.60257757e-01 -3.08117604e+00 -1.37699589e-01
   2.87654519e-01 -1.49461657e-01 -3.96128535e-01  2.14600182e+00
   4.24181908e-01  3.94673020e-01 -1.84842292e-02 -1.17970586e+00
   9.18654054e-02  8.21183503e-01 -2.83561778e+00  1.59463704e-01
  -6.14835680e-01 -1.63099396e+00 -8.21941197e-02  1.87127218e-02
   1.70377719e+00 -2.62416095e-01  1.14750612e+00 -7.83303559e-01
   6.05888128e-01  6.09731436e-01 -2.25910731e-02 -9.14791644e-01
   1.02548385e+00 -3.56592703e+00  1.29791510e+00  4.42981362e-01
  -7.46885777e-01  1.02512610e+00 -7.97469497e-01 -3.27157199e-01
   6.98440671e-01 -8.62959862e-01 -9.37188506e-01  1.27880239e+00
  -2.33837748e+00 -3.82108897e-01 -6.23956919e-02 -8.48336697e-01
  -9.68048036e-01 -2.98008025e-01  9.47782397e-02  4.11213666e-01
   4.17784423e-01  9.08401981e-02  2.04132140e-01  1.24544680e+00
   5.85648179e-01  6.69055283e-01 -1.39358103e+00  5.04939497e-01]
 [ 9.20787096e-01  9.13565159e-01  1.52436423e+00 -7.06564724e-01
  -4.66956079e-01  3.56256664e-02 -4.71516877e-01  4.01355475e-01
   5.14568210e-01 -8.81631017e-01 -4.48225707e-01 -1.55657268e+00
  -1.13136508e-01 -1.88967620e-03 -1.17206562e+00 -5.11925995e-01
  -1.65847576e+00 -3.38403374e-01  1.68761730e+00 -1.71251976e+00
  -1.30054665e+00  1.02668285e-01 -2.58739978e-01 -6.71934068e-01
   1.46498546e-01  3.35435748e-01  4.68158603e-01 -3.10511351e-01
  -1.41961992e+00 -5.00294864e-01  9.75775719e-01 -2.83480972e-01
  -1.56842291e-01  8.74613285e-01  8.50444660e-02 -1.82479694e-01
  -4.73392665e-01 -9.15907085e-01 -8.06360245e-01 -6.43816411e-01
  -6.91942811e-01  7.48873591e-01 -7.36202061e-01  7.20680177e-01
   8.20632339e-01  1.83446443e+00 -1.45658314e+00 -6.36922061e-01
   1.22709394e+00  8.55946958e-01  9.65574801e-01  5.68778694e-01
  -1.02208860e-01  1.36076117e+00  3.91971320e-01  3.41300428e-01
   3.70879382e-01 -1.07574785e+00  1.05239189e+00  8.15406501e-01
   1.07894875e-01 -3.64720911e-01  2.12204620e-01  9.17427897e-01]
 [ 1.15112793e+00 -2.00505897e-01  9.29222584e-01 -7.11516738e-02
  -8.05326998e-01  1.32869601e+00 -9.25439358e-01 -6.03633940e-01
  -2.48306438e-01  3.89059186e-01  9.18562055e-01 -3.78619999e-01
   1.00211866e-01  7.20045030e-01 -4.44365352e-01 -2.64862776e+00
  -1.18471313e+00  1.16577756e+00 -6.09033763e-01  9.64892924e-01
   1.43267602e-01  1.88822067e+00 -2.35196084e-01 -2.37704784e-01
  -1.39442050e+00 -2.20630479e+00 -2.18164459e-01  1.50160953e-01
  -7.75259554e-01  6.50879443e-01 -8.46705019e-01  1.04838349e-01
  -7.26454630e-02 -7.21233130e-01 -9.52106655e-01  1.59448719e+00
  -9.63124096e-01 -1.21563292e+00 -6.99505329e-01 -1.20860569e-01
  -2.78758675e-01  7.75578797e-01 -4.66849864e-01 -6.78790927e-01
   1.25006175e+00 -2.72246242e-01 -1.13920772e+00  1.05596157e-02
   8.58640492e-01 -3.42171431e-01  1.21449947e+00  2.70008862e-01
  -1.82649934e+00  4.53750230e-02 -6.52859628e-01  3.11093211e-01
  -4.11078960e-01 -1.70676017e+00 -3.61594371e-02  2.44527057e-01
   2.20263505e+00 -9.06375766e-01 -1.25763461e-01  4.25077640e-02]]
<NDArray 4x64 @cpu(0)>
>>> mx.ndarray.QFullyConnected(data=mx.ndarray.ones((1,64)), weight=W,
...                            num_hidden=4, act_bit=2, weight_bit=2)
[[-2.6666646 -3.3333306  2.0000014 -5.333333 ]]
<NDArray 1x4 @cpu(0)>
>>> W
[[-0.3333333  -0.3333333   0.33333337 -0.3333333  -0.3333333  -1.
   1.          0.33333337  0.33333337 -0.3333333   0.33333337  1.
  -0.3333333  -0.3333333  -1.          1.         -0.3333333   1.
   1.          0.33333337 -0.3333333  -0.3333333  -1.          0.33333337
  -1.          1.         -1.          0.33333337 -1.         -1.
  -1.          0.33333337 -0.3333333   0.33333337  0.33333337 -0.3333333
   0.33333337 -0.3333333  -1.         -0.3333333  -1.          0.33333337
   1.          0.33333337 -1.          0.33333337 -1.          0.33333337
   0.33333337 -0.3333333   1.          1.          0.33333337  0.33333337
  -0.3333333   0.33333337  1.          0.33333337  0.33333337 -0.3333333
  -0.3333333  -0.3333333  -0.3333333  -1.        ]
 [ 1.         -0.3333333   0.33333337 -1.         -0.3333333  -0.3333333
   0.33333337 -1.         -0.3333333   0.33333337 -1.         -0.3333333
   0.33333337 -0.3333333  -0.3333333   1.          0.33333337  0.33333337
  -0.3333333  -1.          0.33333337  1.         -1.          0.33333337
  -0.3333333  -1.         -0.3333333   0.33333337  1.         -0.3333333
   1.         -0.3333333   0.33333337  0.33333337 -0.3333333  -1.
   1.         -1.          1.          0.33333337 -0.3333333   1.
  -0.3333333  -0.3333333   0.33333337 -1.         -1.          1.
  -1.         -0.3333333  -0.3333333  -1.         -1.         -0.3333333
   0.33333337  0.33333337  0.33333337  0.33333337  0.33333337  1.
   0.33333337  0.33333337 -1.          0.33333337]
 [ 1.          1.          1.         -0.3333333  -0.3333333   0.33333337
  -0.3333333   0.33333337  0.33333337 -1.         -0.3333333  -1.
  -0.3333333  -0.3333333  -1.         -0.3333333  -1.         -0.3333333
   1.         -1.         -1.          0.33333337 -0.3333333  -0.3333333
   0.33333337  0.33333337  0.33333337 -0.3333333  -1.         -0.3333333
   1.         -0.3333333  -0.3333333   1.          0.33333337 -0.3333333
  -0.3333333  -1.         -1.         -0.3333333  -0.3333333   0.33333337
  -0.3333333   0.33333337  1.          1.         -1.         -0.3333333
   1.          1.          1.          0.33333337 -0.3333333   1.
   0.33333337  0.33333337  0.33333337 -1.          1.          1.
   0.33333337 -0.3333333   0.33333337  1.        ]
 [ 1.         -0.3333333   1.         -0.3333333  -1.          1.
  -1.         -0.3333333  -0.3333333   0.33333337  1.         -0.3333333
   0.33333337  0.33333337 -0.3333333  -1.         -1.          1.
  -0.3333333   1.          0.33333337  1.         -0.3333333  -0.3333333
  -1.         -1.         -0.3333333   0.33333337 -0.3333333   0.33333337
  -1.          0.33333337 -0.3333333  -0.3333333  -1.          1.
  -1.         -1.         -0.3333333  -0.3333333  -0.3333333   0.33333337
  -0.3333333  -0.3333333   1.         -0.3333333  -1.          0.33333337
   1.         -0.3333333   1.          0.33333337 -1.          0.33333337
  -0.3333333   0.33333337 -0.3333333  -1.         -0.3333333   0.33333337
   1.         -1.         -0.3333333   0.33333337]]
<NDArray 4x64 @cpu(0)>
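The mutated values above all lie on the four levels {-1, -1/3, 1/3, 1}. For the entries spot-checked, they agree with a DoReFa-Net-style k-bit weight quantizer: normalize tanh(W) by its largest absolute value into [0, 1], round uniformly to 2^k - 1 steps, and map back to [-1, 1]. The NumPy sketch below is a reconstruction under that assumption, not BMXNet's source, and `dorefa_quantize_weights` is a hypothetical name. Unlike the reported operator, it returns a new array and leaves its input untouched. Until the in-place behavior is documented or changed, passing a copy to the operator (for example `weight=W.copy()`) should keep the original NDArray intact.

```python
import numpy as np

def dorefa_quantize_weights(w, bits):
    """Hypothetical DoReFa-style k-bit weight quantizer (sketch, not BMXNet code).

    Maps weights onto 2**bits evenly spaced levels in [-1, 1] and returns a
    new array; the input is not modified.
    """
    steps = 2 ** bits - 1                # e.g. 3 steps -> 4 levels for 2 bits
    t = np.tanh(np.asarray(w, dtype=np.float64))
    scale = np.max(np.abs(t)) or 1.0     # guard against an all-zero tensor
    x = t / (2.0 * scale) + 0.5          # normalize into [0, 1]
    q = np.round(x * steps) / steps      # uniform k-bit rounding in [0, 1]
    return 2.0 * q - 1.0                 # map back to [-1, 1]

if __name__ == "__main__":
    # A few entries from the first row of W in the transcript (incl. its max |w|)
    W = np.array([[-3.69189644, -0.211374566, -0.939604759,
                   0.835931599, 0.0188610386, 2.28180361]])
    print(dorefa_quantize_weights(W, bits=2))
    print(W)  # unchanged
```

Because the largest-magnitude weight sets the normalization, the quantized value of any single entry depends on the whole tensor; the example therefore includes -3.69189644, the largest |w| in the transcript's W.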

less forward speed-up when batch size is larger

Hi, first of all, thanks for the great work!

I used benchmark_score.py to evaluate the forward latency of Resnet-18 and Resnet-18-binary.

  1. Although Resnet-18-binary is about 1.5x faster than Resnet-18 at batch size 1 on the CPU, the speed-up shrinks as the batch size grows. At batch size 32, both models have almost the same latency. Do you know why that happens?

  2. The GPU performance of Resnet-18-binary is much worse than that of the floating-point model. I understand that your optimization focused on the CPU rather than the GPU, but I expected the binary model to perform at least similarly to the FP model on the GPU. Why is it so much worse?


Here are my results:

INFO:root:network: resnet-18-binary
INFO:root:device: gpu(0)
INFO:root:batch size 1, image/sec: 16.735898
INFO:root:batch size 2, image/sec: 25.027532
INFO:root:batch size 4, image/sec: 33.737085
INFO:root:batch size 8, image/sec: 41.273390
INFO:root:batch size 16, image/sec: 47.007433
INFO:root:batch size 32, image/sec: 50.493328

INFO:root:device: cpu(0)
INFO:root:batch size 1, image/sec: 6.693615
INFO:root:batch size 2, image/sec: 8.799900
INFO:root:batch size 4, image/sec: 11.307120
INFO:root:batch size 8, image/sec: 12.709365
INFO:root:batch size 16, image/sec: 12.371296
INFO:root:batch size 32, image/sec: 13.402594


INFO:root:network: resnet-18
INFO:root:device: gpu(0)
INFO:root:batch size 1, image/sec: 130.296734
INFO:root:batch size 2, image/sec: 192.971986
INFO:root:batch size 4, image/sec: 271.567828
INFO:root:batch size 8, image/sec: 338.648713
INFO:root:batch size 16, image/sec: 461.010049
INFO:root:batch size 32, image/sec: 486.325190

INFO:root:device: cpu(0)
INFO:root:batch size 1, image/sec: 4.363451
INFO:root:batch size 2, image/sec: 6.357484
INFO:root:batch size 4, image/sec: 8.384733
INFO:root:batch size 8, image/sec: 10.529395
INFO:root:batch size 16, image/sec: 11.955591
INFO:root:batch size 32, image/sec: 13.027583
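To make the trend in question 1 concrete, the short script below recomputes binary-to-float throughput ratios from the logged numbers above (no new measurements). On the CPU the ratio falls from about 1.53x at batch size 1 to about 1.03x at batch size 32; on the GPU the binary model reaches only about 10 to 13% of the float model's throughput. The dictionary and function names are illustrative.

```python
# Throughput (images/sec) copied from the benchmark_score.py logs above.
binary = {
    "gpu": {1: 16.735898, 2: 25.027532, 4: 33.737085,
            8: 41.273390, 16: 47.007433, 32: 50.493328},
    "cpu": {1: 6.693615, 2: 8.799900, 4: 11.307120,
            8: 12.709365, 16: 12.371296, 32: 13.402594},
}
fp32 = {
    "gpu": {1: 130.296734, 2: 192.971986, 4: 271.567828,
            8: 338.648713, 16: 461.010049, 32: 486.325190},
    "cpu": {1: 4.363451, 2: 6.357484, 4: 8.384733,
            8: 10.529395, 16: 11.955591, 32: 13.027583},
}

def speedups(device):
    """Binary / full-precision throughput per batch size (>1 means binary is faster)."""
    return {bs: binary[device][bs] / fp32[device][bs] for bs in sorted(binary[device])}

if __name__ == "__main__":
    for device in ("cpu", "gpu"):
        for bs, ratio in speedups(device).items():
            print(f"{device} batch {bs:2d}: {ratio:.2f}x")
```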

Errors encountered during compilation

System: Ubuntu 16.04, CUDA 8.0
I followed these steps:
$ git clone --recursive https://github.com/hpi-xnor/mxnet.git # remember to include the --recursive
$ mkdir build/Release && cd build/Release
$ cmake ../../ # if any error occurs, apply ccmake or cmake-gui to adjust the cmake config.
$ make -j 12

I encountered the following errors. How can I solve them?

[ 98%] Linking CXX static library libmxnet.a
[ 98%] Built target mxnet_static
[ 98%] Building CXX object example/image-classification/predict-cpp/CMakeFiles/image-classification-predict.dir/image-classification-predict.cc.o
[ 99%] Linking CXX shared library libmxnet.so
[100%] Building CXX object smd_hpi/tools/model-converter/CMakeFiles/model-converter.dir/main.cpp.o
[100%] Linking CXX executable image-classification-predict
[100%] Linking CXX executable model-converter
/home/sfzhou/BMXNet/build/Release/libmxnet.a(ndarray.cc.o): In function mxnet::Imdecode(mxnet::NDArray*, mxnet::NDArray, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, char*)': ndarray.cc:(.text+0xd14f): undefined reference to cv::String::allocate(unsigned long)'
ndarray.cc:(.text+0xd1a0): undefined reference to cv::error(int, cv::String const&, char const*, char const*, int)' ndarray.cc:(.text+0xd1a8): undefined reference to cv::String::deallocate()'
ndarray.cc:(.text+0xf1ed): undefined reference to cv::String::deallocate()' /home/sfzhou/BMXNet/build/Release/libmxnet.a(ndarray.cc.o): In function cvflann::anyimpl::big_any_policycv::String::static_delete(void**)':
ndarray.cc:(.text._ZN7cvflann7anyimpl14big_any_policyIN2cv6StringEE13static_deleteEPPv[_ZN7cvflann7anyimpl14big_any_policyIN2cv6StringEE13static_deleteEPPv]+0x15): undefined reference to cv::String::deallocate()' /home/sfzhou/BMXNet/build/Release/libmxnet.a(ndarray.cc.o): In function cvflann::anyimpl::big_any_policycv::String::move(void* const*, void**)':
ndarray.cc:(.text.ZN7cvflann7anyimpl14big_any_policyIN2cv6StringEE4moveEPKPvPS5[ZN7cvflann7anyimpl14big_any_policyIN2cv6StringEE4moveEPKPvPS5]+0x10): undefined reference to cv::String::deallocate()' ndarray.cc:(.text._ZN7cvflann7anyimpl14big_any_policyIN2cv6StringEE4moveEPKPvPS5_[_ZN7cvflann7anyimpl14big_any_policyIN2cv6StringEE4moveEPKPvPS5_]+0x24): undefined reference to cv::String::deallocate()'
/home/sfzhou/BMXNet/build/Release/libmxnet.a(image_io.cc.o): In function cv::Mat::Mat(int, int, int, void*, unsigned long) [clone .constprop.608]': image_io.cc:(.text+0x22d): undefined reference to cv::String::allocate(unsigned long)'
image_io.cc:(.text+0x27e): undefined reference to cv::error(int, cv::String const&, char const*, char const*, int)' image_io.cc:(.text+0x286): undefined reference to cv::String::deallocate()'
image_io.cc:(.text+0x2b0): undefined reference to cv::String::deallocate()' /home/sfzhou/BMXNet/build/Release/libmxnet.a(iter_image_recordio.cc.o): In function mxnet::io::ImageRecordIOParser::ParseNext(std::vector<mxnet::io::InstVector, std::allocator<mxnet::io::InstVector > >) [clone ._omp_fn.2]':
iter_image_recordio.cc:(.text._ZN5mxnet2io19ImageRecordIOParserIfE9ParseNextEPSt6vectorINS0_10InstVectorIfEESaIS5_EE._omp_fn.2[_ZN5mxnet2io19ImageRecordIOParserIfE9ParseNextEPSt6vectorINS0_10InstVectorIfEESaIS5_EE]+0x13f9): undefined reference to cv::String::allocate(unsigned long)' iter_image_recordio.cc:(.text._ZN5mxnet2io19ImageRecordIOParserIfE9ParseNextEPSt6vectorINS0_10InstVectorIfEESaIS5_EE._omp_fn.2[_ZN5mxnet2io19ImageRecordIOParserIfE9ParseNextEPSt6vectorINS0_10InstVectorIfEESaIS5_EE]+0x144a): undefined reference to cv::error(int, cv::String const&, char const*, char const*, int)'
iter_image_recordio.cc:(.text._ZN5mxnet2io19ImageRecordIOParserIfE9ParseNextEPSt6vectorINS0_10InstVectorIfEESaIS5_EE._omp_fn.2[_ZN5mxnet2io19ImageRecordIOParserIfE9ParseNextEPSt6vectorINS0_10InstVectorIfEESaIS5_EE]+0x1454): undefined reference to cv::String::deallocate()' iter_image_recordio.cc:(.text._ZN5mxnet2io19ImageRecordIOParserIfE9ParseNextEPSt6vectorINS0_10InstVectorIfEESaIS5_EE._omp_fn.2[_ZN5mxnet2io19ImageRecordIOParserIfE9ParseNextEPSt6vectorINS0_10InstVectorIfEESaIS5_EE]+0x150f): undefined reference to cv::String::deallocate()'
/home/sfzhou/BMXNet/build/Release/libmxnet.a(iter_image_recordio.cc.o): In function mxnet::io::ImageRecordIOParser<unsigned char>::ParseNext(std::vector<mxnet::io::InstVector<unsigned char>, std::allocator<mxnet::io::InstVector<unsigned char> > >*) [clone ._omp_fn.5]': iter_image_recordio.cc:(.text._ZN5mxnet2io19ImageRecordIOParserIhE9ParseNextEPSt6vectorINS0_10InstVectorIhEESaIS5_EE._omp_fn.5[_ZN5mxnet2io19ImageRecordIOParserIhE9ParseNextEPSt6vectorINS0_10InstVectorIhEESaIS5_EE]+0x13da): undefined reference to cv::String::allocate(unsigned long)'
iter_image_recordio.cc:(.text._ZN5mxnet2io19ImageRecordIOParserIhE9ParseNextEPSt6vectorINS0_10InstVectorIhEESaIS5_EE._omp_fn.5[_ZN5mxnet2io19ImageRecordIOParserIhE9ParseNextEPSt6vectorINS0_10InstVectorIhEESaIS5_EE]+0x142b): undefined reference to cv::error(int, cv::String const&, char const*, char const*, int)' iter_image_recordio.cc:(.text._ZN5mxnet2io19ImageRecordIOParserIhE9ParseNextEPSt6vectorINS0_10InstVectorIhEESaIS5_EE._omp_fn.5[_ZN5mxnet2io19ImageRecordIOParserIhE9ParseNextEPSt6vectorINS0_10InstVectorIhEESaIS5_EE]+0x1435): undefined reference to cv::String::deallocate()'
iter_image_recordio.cc:(.text._ZN5mxnet2io19ImageRecordIOParserIhE9ParseNextEPSt6vectorINS0_10InstVectorIhEESaIS5_EE._omp_fn.5[_ZN5mxnet2io19ImageRecordIOParserIhE9ParseNextEPSt6vectorINS0_10InstVectorIhEESaIS5_EE]+0x14bd): undefined reference to cv::String::deallocate()' /home/sfzhou/BMXNet/build/Release/libmxnet.a(iter_image_recordio_2.cc.o): In function mxnet::io::ImageRecordIOParser2::ParseChunk(float*, float*, unsigned int, dmlc::InputSplit::Blob*) [clone ._omp_fn.2]':
iter_image_recordio_2.cc:(.text+0x2cea): undefined reference to cv::String::allocate(unsigned long)' iter_image_recordio_2.cc:(.text+0x2d31): undefined reference to cv::error(int, cv::String const&, char const*, char const*, int)'
iter_image_recordio_2.cc:(.text+0x2d3b): undefined reference to cv::String::deallocate()' iter_image_recordio_2.cc:(.text+0x3423): undefined reference to cv::String::deallocate()'
/home/sfzhou/BMXNet/build/Release/libmxnet.a(iter_image_recordio_2.cc.o): In function mxnet::io::ImageRecordIOParser2<unsigned char>::ParseChunk(unsigned char*, float*, unsigned int, dmlc::InputSplit::Blob*) [clone ._omp_fn.7]': iter_image_recordio_2.cc:(.text._ZN5mxnet2io20ImageRecordIOParser2IhE10ParseChunkEPhPfjPN4dmlc10InputSplit4BlobE._omp_fn.7[_ZN5mxnet2io20ImageRecordIOParser2IhE10ParseChunkEPhPfjPN4dmlc10InputSplit4BlobE]+0xf08): undefined reference to cv::String::allocate(unsigned long)'
iter_image_recordio_2.cc:(.text._ZN5mxnet2io20ImageRecordIOParser2IhE10ParseChunkEPhPfjPN4dmlc10InputSplit4BlobE._omp_fn.7[_ZN5mxnet2io20ImageRecordIOParser2IhE10ParseChunkEPhPfjPN4dmlc10InputSplit4BlobE]+0xf59): undefined reference to cv::error(int, cv::String const&, char const*, char const*, int)' iter_image_recordio_2.cc:(.text._ZN5mxnet2io20ImageRecordIOParser2IhE10ParseChunkEPhPfjPN4dmlc10InputSplit4BlobE._omp_fn.7[_ZN5mxnet2io20ImageRecordIOParser2IhE10ParseChunkEPhPfjPN4dmlc10InputSplit4BlobE]+0xf63): undefined reference to cv::String::deallocate()'
iter_image_recordio_2.cc:(.text._ZN5mxnet2io20ImageRecordIOParser2IhE10ParseChunkEPhPfjPN4dmlc10InputSplit4BlobE._omp_fn.7[_ZN5mxnet2io20ImageRecordIOParser2IhE10ParseChunkEPhPfjPN4dmlc10InputSplit4BlobE]+0x18cb): undefined reference to cv::String::deallocate()' /home/sfzhou/BMXNet/build/Release/libmxnet.a(iter_image_det_recordio.cc.o): In function mxnet::io::ImageDetRecordIOParser::ParseNext(std::vector<mxnet::io::InstVector, std::allocator<mxnet::io::InstVector > >) [clone ._omp_fn.3]':
iter_image_det_recordio.cc:(.text._ZN5mxnet2io22ImageDetRecordIOParserIfE9ParseNextEPSt6vectorINS0_10InstVectorIfEESaIS5_EE._omp_fn.3[_ZN5mxnet2io22ImageDetRecordIOParserIfE9ParseNextEPSt6vectorINS0_10InstVectorIfEESaIS5_EE]+0x1cc2): undefined reference to cv::String::allocate(unsigned long)' iter_image_det_recordio.cc:(.text._ZN5mxnet2io22ImageDetRecordIOParserIfE9ParseNextEPSt6vectorINS0_10InstVectorIfEESaIS5_EE._omp_fn.3[_ZN5mxnet2io22ImageDetRecordIOParserIfE9ParseNextEPSt6vectorINS0_10InstVectorIfEESaIS5_EE]+0x1d09): undefined reference to cv::error(int, cv::String const&, char const*, char const*, int)'
iter_image_det_recordio.cc:(.text._ZN5mxnet2io22ImageDetRecordIOParserIfE9ParseNextEPSt6vectorINS0_10InstVectorIfEESaIS5_EE._omp_fn.3[_ZN5mxnet2io22ImageDetRecordIOParserIfE9ParseNextEPSt6vectorINS0_10InstVectorIfEESaIS5_EE]+0x1d11): undefined reference to cv::String::deallocate()' iter_image_det_recordio.cc:(.text._ZN5mxnet2io22ImageDetRecordIOParserIfE9ParseNextEPSt6vectorINS0_10InstVectorIfEESaIS5_EE._omp_fn.3[_ZN5mxnet2io22ImageDetRecordIOParserIfE9ParseNextEPSt6vectorINS0_10InstVectorIfEESaIS5_EE]+0x2029): undefined reference to cv::String::deallocate()'
collect2: error: ld returned 1 exit status
CMakeFiles/mxnet.dir/build.make:84: recipe for target 'libmxnet.so' failed
make[2]: *** [libmxnet.so] Error 1
CMakeFiles/Makefile2:137: recipe for target 'CMakeFiles/mxnet.dir/all' failed
make[1]: *** [CMakeFiles/mxnet.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
CMakeFiles/image-classification-predict.dir/image-classification-predict.cc.o: In function GetImageFile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, float*, int, cv::Size_<int>, float const*)': image-classification-predict.cc:(.text+0x182): undefined reference to cv::String::allocate(unsigned long)'
image-classification-predict.cc:(.text+0x1b6): undefined reference to cv::imread(cv::String const&, int)' image-classification-predict.cc:(.text+0x1be): undefined reference to cv::String::deallocate()'
image-classification-predict.cc:(.text+0xeb7): undefined reference to cv::String::deallocate()' CMakeFiles/image-classification-predict.dir/image-classification-predict.cc.o: In function cvflann::anyimpl::big_any_policycv::String::move(void* const*, void**)':
image-classification-predict.cc:(.text.ZN7cvflann7anyimpl14big_any_policyIN2cv6StringEE4moveEPKPvPS5[ZN7cvflann7anyimpl14big_any_policyIN2cv6StringEE4moveEPKPvPS5]+0x10): undefined reference to cv::String::deallocate()' image-classification-predict.cc:(.text._ZN7cvflann7anyimpl14big_any_policyIN2cv6StringEE4moveEPKPvPS5_[_ZN7cvflann7anyimpl14big_any_policyIN2cv6StringEE4moveEPKPvPS5_]+0x24): undefined reference to cv::String::deallocate()'
CMakeFiles/image-classification-predict.dir/image-classification-predict.cc.o: In function cvflann::anyimpl::big_any_policy<cv::String>::static_delete(void**)': image-classification-predict.cc:(.text._ZN7cvflann7anyimpl14big_any_policyIN2cv6StringEE13static_deleteEPPv[_ZN7cvflann7anyimpl14big_any_policyIN2cv6StringEE13static_deleteEPPv]+0x15): undefined reference to cv::String::deallocate()'
../../../libmxnet.a(ndarray.cc.o): In function mxnet::Imdecode(mxnet::NDArray*, mxnet::NDArray, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, char*)': ndarray.cc:(.text+0xd14f): undefined reference to cv::String::allocate(unsigned long)'
ndarray.cc:(.text+0xd1a0): undefined reference to cv::error(int, cv::String const&, char const*, char const*, int)' ndarray.cc:(.text+0xd1a8): undefined reference to cv::String::deallocate()'
ndarray.cc:(.text+0xf1ed): undefined reference to cv::String::deallocate()' ../../../libmxnet.a(image_io.cc.o): In function cv::Mat::Mat(int, int, int, void*, unsigned long) [clone .constprop.608]':
image_io.cc:(.text+0x22d): undefined reference to cv::String::allocate(unsigned long)' image_io.cc:(.text+0x27e): undefined reference to cv::error(int, cv::String const&, char const*, char const*, int)'
image_io.cc:(.text+0x286): undefined reference to cv::String::deallocate()' image_io.cc:(.text+0x2b0): undefined reference to cv::String::deallocate()'
../../../libmxnet.a(iter_image_recordio.cc.o): In function mxnet::io::ImageRecordIOParser<float>::ParseNext(std::vector<mxnet::io::InstVector<float>, std::allocator<mxnet::io::InstVector<float> > >*) [clone ._omp_fn.2]': iter_image_recordio.cc:(.text._ZN5mxnet2io19ImageRecordIOParserIfE9ParseNextEPSt6vectorINS0_10InstVectorIfEESaIS5_EE._omp_fn.2[_ZN5mxnet2io19ImageRecordIOParserIfE9ParseNextEPSt6vectorINS0_10InstVectorIfEESaIS5_EE]+0x13f9): undefined reference to cv::String::allocate(unsigned long)'
iter_image_recordio.cc:(.text._ZN5mxnet2io19ImageRecordIOParserIfE9ParseNextEPSt6vectorINS0_10InstVectorIfEESaIS5_EE._omp_fn.2[_ZN5mxnet2io19ImageRecordIOParserIfE9ParseNextEPSt6vectorINS0_10InstVectorIfEESaIS5_EE]+0x144a): undefined reference to cv::error(int, cv::String const&, char const*, char const*, int)' iter_image_recordio.cc:(.text._ZN5mxnet2io19ImageRecordIOParserIfE9ParseNextEPSt6vectorINS0_10InstVectorIfEESaIS5_EE._omp_fn.2[_ZN5mxnet2io19ImageRecordIOParserIfE9ParseNextEPSt6vectorINS0_10InstVectorIfEESaIS5_EE]+0x1454): undefined reference to cv::String::deallocate()'
iter_image_recordio.cc:(.text._ZN5mxnet2io19ImageRecordIOParserIfE9ParseNextEPSt6vectorINS0_10InstVectorIfEESaIS5_EE._omp_fn.2[_ZN5mxnet2io19ImageRecordIOParserIfE9ParseNextEPSt6vectorINS0_10InstVectorIfEESaIS5_EE]+0x150f): undefined reference to cv::String::deallocate()' ../../../libmxnet.a(iter_image_recordio.cc.o): In function mxnet::io::ImageRecordIOParser::ParseNext(std::vector<mxnet::io::InstVector, std::allocator<mxnet::io::InstVector > >) [clone ._omp_fn.5]':
iter_image_recordio.cc:(.text._ZN5mxnet2io19ImageRecordIOParserIhE9ParseNextEPSt6vectorINS0_10InstVectorIhEESaIS5_EE._omp_fn.5[_ZN5mxnet2io19ImageRecordIOParserIhE9ParseNextEPSt6vectorINS0_10InstVectorIhEESaIS5_EE]+0x13da): undefined reference to cv::String::allocate(unsigned long)' iter_image_recordio.cc:(.text._ZN5mxnet2io19ImageRecordIOParserIhE9ParseNextEPSt6vectorINS0_10InstVectorIhEESaIS5_EE._omp_fn.5[_ZN5mxnet2io19ImageRecordIOParserIhE9ParseNextEPSt6vectorINS0_10InstVectorIhEESaIS5_EE]+0x142b): undefined reference to cv::error(int, cv::String const&, char const
, char const*, int)'
iter_image_recordio.cc:(.text._ZN5mxnet2io19ImageRecordIOParserIhE9ParseNextEPSt6vectorINS0_10InstVectorIhEESaIS5_EE._omp_fn.5[_ZN5mxnet2io19ImageRecordIOParserIhE9ParseNextEPSt6vectorINS0_10InstVectorIhEESaIS5_EE]+0x1435): undefined reference to cv::String::deallocate()' iter_image_recordio.cc:(.text._ZN5mxnet2io19ImageRecordIOParserIhE9ParseNextEPSt6vectorINS0_10InstVectorIhEESaIS5_EE._omp_fn.5[_ZN5mxnet2io19ImageRecordIOParserIhE9ParseNextEPSt6vectorINS0_10InstVectorIhEESaIS5_EE]+0x14bd): undefined reference to cv::String::deallocate()'
../../../libmxnet.a(iter_image_recordio_2.cc.o): In function mxnet::io::ImageRecordIOParser2<float>::ParseChunk(float*, float*, unsigned int, dmlc::InputSplit::Blob*) [clone ._omp_fn.2]': iter_image_recordio_2.cc:(.text+0x2cea): undefined reference to cv::String::allocate(unsigned long)'
iter_image_recordio_2.cc:(.text+0x2d31): undefined reference to cv::error(int, cv::String const&, char const*, char const*, int)' iter_image_recordio_2.cc:(.text+0x2d3b): undefined reference to cv::String::deallocate()'
iter_image_recordio_2.cc:(.text+0x3423): undefined reference to cv::String::deallocate()' ../../../libmxnet.a(iter_image_recordio_2.cc.o): In function mxnet::io::ImageRecordIOParser2::ParseChunk(unsigned char*, float*, unsigned int, dmlc::InputSplit::Blob*) [clone ._omp_fn.7]':
iter_image_recordio_2.cc:(.text._ZN5mxnet2io20ImageRecordIOParser2IhE10ParseChunkEPhPfjPN4dmlc10InputSplit4BlobE._omp_fn.7[_ZN5mxnet2io20ImageRecordIOParser2IhE10ParseChunkEPhPfjPN4dmlc10InputSplit4BlobE]+0xf08): undefined reference to cv::String::allocate(unsigned long)' iter_image_recordio_2.cc:(.text._ZN5mxnet2io20ImageRecordIOParser2IhE10ParseChunkEPhPfjPN4dmlc10InputSplit4BlobE._omp_fn.7[_ZN5mxnet2io20ImageRecordIOParser2IhE10ParseChunkEPhPfjPN4dmlc10InputSplit4BlobE]+0xf59): undefined reference to cv::error(int, cv::String const&, char const*, char const*, int)'
iter_image_recordio_2.cc:(.text._ZN5mxnet2io20ImageRecordIOParser2IhE10ParseChunkEPhPfjPN4dmlc10InputSplit4BlobE._omp_fn.7[_ZN5mxnet2io20ImageRecordIOParser2IhE10ParseChunkEPhPfjPN4dmlc10InputSplit4BlobE]+0xf63): undefined reference to cv::String::deallocate()' iter_image_recordio_2.cc:(.text._ZN5mxnet2io20ImageRecordIOParser2IhE10ParseChunkEPhPfjPN4dmlc10InputSplit4BlobE._omp_fn.7[_ZN5mxnet2io20ImageRecordIOParser2IhE10ParseChunkEPhPfjPN4dmlc10InputSplit4BlobE]+0x18cb): undefined reference to cv::String::deallocate()'
../../../libmxnet.a(iter_image_det_recordio.cc.o): In function mxnet::io::ImageDetRecordIOParser<float>::ParseNext(std::vector<mxnet::io::InstVector<float>, std::allocator<mxnet::io::InstVector<float> > >*) [clone ._omp_fn.3]': iter_image_det_recordio.cc:(.text._ZN5mxnet2io22ImageDetRecordIOParserIfE9ParseNextEPSt6vectorINS0_10InstVectorIfEESaIS5_EE._omp_fn.3[_ZN5mxnet2io22ImageDetRecordIOParserIfE9ParseNextEPSt6vectorINS0_10InstVectorIfEESaIS5_EE]+0x1cc2): undefined reference to cv::String::allocate(unsigned long)'
iter_image_det_recordio.cc:(.text._ZN5mxnet2io22ImageDetRecordIOParserIfE9ParseNextEPSt6vectorINS0_10InstVectorIfEESaIS5_EE._omp_fn.3[_ZN5mxnet2io22ImageDetRecordIOParserIfE9ParseNextEPSt6vectorINS0_10InstVectorIfEESaIS5_EE]+0x1d09): undefined reference to cv::error(int, cv::String const&, char const*, char const*, int)' iter_image_det_recordio.cc:(.text._ZN5mxnet2io22ImageDetRecordIOParserIfE9ParseNextEPSt6vectorINS0_10InstVectorIfEESaIS5_EE._omp_fn.3[_ZN5mxnet2io22ImageDetRecordIOParserIfE9ParseNextEPSt6vectorINS0_10InstVectorIfEESaIS5_EE]+0x1d11): undefined reference to cv::String::deallocate()'
iter_image_det_recordio.cc:(.text._ZN5mxnet2io22ImageDetRecordIOParserIfE9ParseNextEPSt6vectorINS0_10InstVectorIfEESaIS5_EE._omp_fn.3[_ZN5mxnet2io22ImageDetRecordIOParserIfE9ParseNextEPSt6vectorINS0_10InstVectorIfEESaIS5_EE]+0x2029): undefined reference to cv::String::deallocate()' collect2: error: ld returned 1 exit status example/image-classification/predict-cpp/CMakeFiles/image-classification-predict.dir/build.make:110: recipe for target 'example/image-classification/predict-cpp/image-classification-predict' failed make[2]: *** [example/image-classification/predict-cpp/image-classification-predict] Error 1 CMakeFiles/Makefile2:1283: recipe for target 'example/image-classification/predict-cpp/CMakeFiles/image-classification-predict.dir/all' failed make[1]: *** [example/image-classification/predict-cpp/CMakeFiles/image-classification-predict.dir/all] Error 2 ../../../libmxnet.a(ndarray.cc.o): In function mxnet::Imdecode(mxnet::NDArray*, mxnet::NDArray, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, char*)':
ndarray.cc:(.text+0xd14f): undefined reference to cv::String::allocate(unsigned long)' ndarray.cc:(.text+0xd1a0): undefined reference to cv::error(int, cv::String const&, char const*, char const*, int)'
ndarray.cc:(.text+0xd1a8): undefined reference to cv::String::deallocate()' ndarray.cc:(.text+0xf1ed): undefined reference to cv::String::deallocate()'
../../../libmxnet.a(ndarray.cc.o): In function cvflann::anyimpl::big_any_policy<cv::String>::static_delete(void**)': ndarray.cc:(.text._ZN7cvflann7anyimpl14big_any_policyIN2cv6StringEE13static_deleteEPPv[_ZN7cvflann7anyimpl14big_any_policyIN2cv6StringEE13static_deleteEPPv]+0x15): undefined reference to cv::String::deallocate()'
../../../libmxnet.a(ndarray.cc.o): In function cvflann::anyimpl::big_any_policy<cv::String>::move(void* const*, void**)': ndarray.cc:(.text._ZN7cvflann7anyimpl14big_any_policyIN2cv6StringEE4moveEPKPvPS5_[_ZN7cvflann7anyimpl14big_any_policyIN2cv6StringEE4moveEPKPvPS5_]+0x10): undefined reference to cv::String::deallocate()'
ndarray.cc:(.text.ZN7cvflann7anyimpl14big_any_policyIN2cv6StringEE4moveEPKPvPS5[ZN7cvflann7anyimpl14big_any_policyIN2cv6StringEE4moveEPKPvPS5]+0x24): undefined reference to cv::String::deallocate()' ../../../libmxnet.a(image_io.cc.o): In function cv::Mat::Mat(int, int, int, void*, unsigned long) [clone .constprop.608]':
image_io.cc:(.text+0x22d): undefined reference to cv::String::allocate(unsigned long)' image_io.cc:(.text+0x27e): undefined reference to cv::error(int, cv::String const&, char const*, char const*, int)'
image_io.cc:(.text+0x286): undefined reference to cv::String::deallocate()' image_io.cc:(.text+0x2b0): undefined reference to cv::String::deallocate()'
../../../libmxnet.a(iter_image_recordio.cc.o): In function mxnet::io::ImageRecordIOParser<float>::ParseNext(std::vector<mxnet::io::InstVector<float>, std::allocator<mxnet::io::InstVector<float> > >*) [clone ._omp_fn.2]': iter_image_recordio.cc:(.text._ZN5mxnet2io19ImageRecordIOParserIfE9ParseNextEPSt6vectorINS0_10InstVectorIfEESaIS5_EE._omp_fn.2[_ZN5mxnet2io19ImageRecordIOParserIfE9ParseNextEPSt6vectorINS0_10InstVectorIfEESaIS5_EE]+0x13f9): undefined reference to cv::String::allocate(unsigned long)'
iter_image_recordio.cc:(.text._ZN5mxnet2io19ImageRecordIOParserIfE9ParseNextEPSt6vectorINS0_10InstVectorIfEESaIS5_EE._omp_fn.2[_ZN5mxnet2io19ImageRecordIOParserIfE9ParseNextEPSt6vectorINS0_10InstVectorIfEESaIS5_EE]+0x144a): undefined reference to cv::error(int, cv::String const&, char const*, char const*, int)' iter_image_recordio.cc:(.text._ZN5mxnet2io19ImageRecordIOParserIfE9ParseNextEPSt6vectorINS0_10InstVectorIfEESaIS5_EE._omp_fn.2[_ZN5mxnet2io19ImageRecordIOParserIfE9ParseNextEPSt6vectorINS0_10InstVectorIfEESaIS5_EE]+0x1454): undefined reference to cv::String::deallocate()'
iter_image_recordio.cc:(.text._ZN5mxnet2io19ImageRecordIOParserIfE9ParseNextEPSt6vectorINS0_10InstVectorIfEESaIS5_EE._omp_fn.2[_ZN5mxnet2io19ImageRecordIOParserIfE9ParseNextEPSt6vectorINS0_10InstVectorIfEESaIS5_EE]+0x150f): undefined reference to cv::String::deallocate()' ../../../libmxnet.a(iter_image_recordio.cc.o): In function mxnet::io::ImageRecordIOParser::ParseNext(std::vector<mxnet::io::InstVector, std::allocator<mxnet::io::InstVector > >) [clone ._omp_fn.5]':
iter_image_recordio.cc:(.text._ZN5mxnet2io19ImageRecordIOParserIhE9ParseNextEPSt6vectorINS0_10InstVectorIhEESaIS5_EE._omp_fn.5[_ZN5mxnet2io19ImageRecordIOParserIhE9ParseNextEPSt6vectorINS0_10InstVectorIhEESaIS5_EE]+0x13da): undefined reference to cv::String::allocate(unsigned long)' iter_image_recordio.cc:(.text._ZN5mxnet2io19ImageRecordIOParserIhE9ParseNextEPSt6vectorINS0_10InstVectorIhEESaIS5_EE._omp_fn.5[_ZN5mxnet2io19ImageRecordIOParserIhE9ParseNextEPSt6vectorINS0_10InstVectorIhEESaIS5_EE]+0x142b): undefined reference to cv::error(int, cv::String const&, char const
, char const*, int)'
iter_image_recordio.cc:(.text._ZN5mxnet2io19ImageRecordIOParserIhE9ParseNextEPSt6vectorINS0_10InstVectorIhEESaIS5_EE._omp_fn.5[_ZN5mxnet2io19ImageRecordIOParserIhE9ParseNextEPSt6vectorINS0_10InstVectorIhEESaIS5_EE]+0x1435): undefined reference to cv::String::deallocate()' iter_image_recordio.cc:(.text._ZN5mxnet2io19ImageRecordIOParserIhE9ParseNextEPSt6vectorINS0_10InstVectorIhEESaIS5_EE._omp_fn.5[_ZN5mxnet2io19ImageRecordIOParserIhE9ParseNextEPSt6vectorINS0_10InstVectorIhEESaIS5_EE]+0x14bd): undefined reference to cv::String::deallocate()'
../../../libmxnet.a(iter_image_recordio_2.cc.o): In function mxnet::io::ImageRecordIOParser2<float>::ParseChunk(float*, float*, unsigned int, dmlc::InputSplit::Blob*) [clone ._omp_fn.2]': iter_image_recordio_2.cc:(.text+0x2cea): undefined reference to cv::String::allocate(unsigned long)'
iter_image_recordio_2.cc:(.text+0x2d31): undefined reference to cv::error(int, cv::String const&, char const*, char const*, int)' iter_image_recordio_2.cc:(.text+0x2d3b): undefined reference to cv::String::deallocate()'
iter_image_recordio_2.cc:(.text+0x3423): undefined reference to cv::String::deallocate()' ../../../libmxnet.a(iter_image_recordio_2.cc.o): In function mxnet::io::ImageRecordIOParser2::ParseChunk(unsigned char*, float*, unsigned int, dmlc::InputSplit::Blob*) [clone ._omp_fn.7]':
iter_image_recordio_2.cc:(.text._ZN5mxnet2io20ImageRecordIOParser2IhE10ParseChunkEPhPfjPN4dmlc10InputSplit4BlobE._omp_fn.7[_ZN5mxnet2io20ImageRecordIOParser2IhE10ParseChunkEPhPfjPN4dmlc10InputSplit4BlobE]+0xf08): undefined reference to cv::String::allocate(unsigned long)' iter_image_recordio_2.cc:(.text._ZN5mxnet2io20ImageRecordIOParser2IhE10ParseChunkEPhPfjPN4dmlc10InputSplit4BlobE._omp_fn.7[_ZN5mxnet2io20ImageRecordIOParser2IhE10ParseChunkEPhPfjPN4dmlc10InputSplit4BlobE]+0xf59): undefined reference to cv::error(int, cv::String const&, char const*, char const*, int)'
iter_image_recordio_2.cc:(.text._ZN5mxnet2io20ImageRecordIOParser2IhE10ParseChunkEPhPfjPN4dmlc10InputSplit4BlobE._omp_fn.7[_ZN5mxnet2io20ImageRecordIOParser2IhE10ParseChunkEPhPfjPN4dmlc10InputSplit4BlobE]+0xf63): undefined reference to cv::String::deallocate()' iter_image_recordio_2.cc:(.text._ZN5mxnet2io20ImageRecordIOParser2IhE10ParseChunkEPhPfjPN4dmlc10InputSplit4BlobE._omp_fn.7[_ZN5mxnet2io20ImageRecordIOParser2IhE10ParseChunkEPhPfjPN4dmlc10InputSplit4BlobE]+0x18cb): undefined reference to cv::String::deallocate()'
../../../libmxnet.a(iter_image_det_recordio.cc.o): In function mxnet::io::ImageDetRecordIOParser<float>::ParseNext(std::vector<mxnet::io::InstVector<float>, std::allocator<mxnet::io::InstVector<float> > >*) [clone ._omp_fn.3]': iter_image_det_recordio.cc:(.text._ZN5mxnet2io22ImageDetRecordIOParserIfE9ParseNextEPSt6vectorINS0_10InstVectorIfEESaIS5_EE._omp_fn.3[_ZN5mxnet2io22ImageDetRecordIOParserIfE9ParseNextEPSt6vectorINS0_10InstVectorIfEESaIS5_EE]+0x1cc2): undefined reference to cv::String::allocate(unsigned long)'
iter_image_det_recordio.cc:(.text._ZN5mxnet2io22ImageDetRecordIOParserIfE9ParseNextEPSt6vectorINS0_10InstVectorIfEESaIS5_EE._omp_fn.3[_ZN5mxnet2io22ImageDetRecordIOParserIfE9ParseNextEPSt6vectorINS0_10InstVectorIfEESaIS5_EE]+0x1d09): undefined reference to cv::error(int, cv::String const&, char const*, char const*, int)' iter_image_det_recordio.cc:(.text._ZN5mxnet2io22ImageDetRecordIOParserIfE9ParseNextEPSt6vectorINS0_10InstVectorIfEESaIS5_EE._omp_fn.3[_ZN5mxnet2io22ImageDetRecordIOParserIfE9ParseNextEPSt6vectorINS0_10InstVectorIfEESaIS5_EE]+0x1d11): undefined reference to cv::String::deallocate()'
iter_image_det_recordio.cc:(.text._ZN5mxnet2io22ImageDetRecordIOParserIfE9ParseNextEPSt6vectorINS0_10InstVectorIfEESaIS5_EE._omp_fn.3[_ZN5mxnet2io22ImageDetRecordIOParserIfE9ParseNextEPSt6vectorINS0_10InstVectorIfEESaIS5_EE]+0x2029): undefined reference to `cv::String::deallocate()'
collect2: error: ld returned 1 exit status
smd_hpi/tools/model-converter/CMakeFiles/model-converter.dir/build.make:110: recipe for target 'smd_hpi/tools/model-converter/model-converter' failed
make[2]: *** [smd_hpi/tools/model-converter/model-converter] Error 1
CMakeFiles/Makefile2:1339: recipe for target 'smd_hpi/tools/model-converter/CMakeFiles/model-converter.dir/all' failed
make[1]: *** [smd_hpi/tools/model-converter/CMakeFiles/model-converter.dir/all] Error 2
Makefile:138: recipe for target 'all' failed
make: *** [all] Error 2

QConvolution mutates weights when running forward

As with #30, QConvolution mutates its input weights when running forward, but only in training mode. The weights should be changed only by the weight update performed by the optimizer; weight quantization should instead be performed in temporary storage.

Unlike QFullyConnected, QConvolution does not even perform weight quantization when not in training mode! This means that given the same input weights, it will produce different outputs depending on whether training mode is enabled. Perhaps this was intended as an optimization, but it really is a bad idea to change the semantics of an operator based on the training state unless you have a very good reason (as in BatchNorm). Since the number of weights will typically be a small fraction of the output volume of a mini-batch, it probably isn't worth dropping the quantization step when not training.

Really I think you should try to eliminate all calls to ctx.is_train in the implementation.
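The pattern suggested above can be illustrated with a minimal pure-Python sketch: quantize into temporary storage and leave the caller's weights untouched. The 1-bit sign quantizer and all function names here are hypothetical illustrations, not BMXNet's implementation. Note that `forward_with_temp` quantizes unconditionally, so its output does not depend on a training flag.

```python
def sign_quantize(weights):
    # Hypothetical 1-bit weight quantizer: +1 for w >= 0, -1 otherwise.
    return [1.0 if w >= 0 else -1.0 for w in weights]


def forward_in_place(weights, x):
    # Problematic pattern: overwrites the caller's weights, so the optimizer's
    # next update starts from +1/-1 instead of the real-valued weights.
    weights[:] = sign_quantize(weights)
    return sum(w * v for w, v in zip(weights, x))


def forward_with_temp(weights, x):
    # Suggested pattern: quantize into a temporary list and keep weights intact.
    # No is_train check, so training and inference compute the same thing.
    qweights = sign_quantize(weights)
    return sum(w * v for w, v in zip(qweights, x))


w = [0.25, -0.5, 0.1]
x = [1.0, 2.0, 3.0]
print(forward_with_temp(w, x), w)  # 2.0 [0.25, -0.5, 0.1]
print(forward_in_place(w, x), w)   # 2.0 [1.0, -1.0, 1.0]
```

Both calls produce the same output, but only the in-place version silently replaces the stored weights. The optimizer then loses the real-valued weights it needs to accumulate small updates.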

Is the amalgamation method supported by BMXNet?

I haven't tried it yet, but deploying original MXNet models to mobile devices (say, Android) with the amalgamation method is error-prone. Is there anything that needs extra attention when porting BMXNet models?

segv on QFullyConnected with one bit and one or two outputs.

>>> mx.ndarray.QFullyConnected(data=mx.ndarray.ones((1,64)), weight=mx.ndarray.ones((2,64)), 
                                            num_hidden=2, act_bit=1, weight_bit=1)
Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)

This is a Mac OSX build with USE_CUDA=OFF and USE_OPENMP=OFF

cudnn_off=False has some problems with mx.mon.Monitor

Hi,
I use cudnn_off=False, but the output is NaN when I use mx.mon.Monitor during training (left-hand side of the figure below). If I use cudnn_off=True, the output values are more reasonable (right-hand side of the figure).
Could you please help me? Thanks a lot!

conv2_act1 = mx.sym.QActivation(data=batch1_3, act_bit=1, backward_only=True, name="conv2_act1")
conv2_1 = mx.sym.QConvolution(
      data=conv2_act1,pad=(1, 1), kernel=(3,3), num_filter=128,  act_bit=1, weight_bit=1,cudnn_off=True, name="conv2_1")

[Figure: mx.mon.Monitor output with cudnn_off=False (left) and cudnn_off=True (right)]

Running SSD with binary models

Hi,

I'm wondering whether it's possible to run the SSD example with binary models, and if so, how one would do it.

Thank you.

CUDA: an illegal memory access was encountered when training on GPU

I want to train binary/8-bit models on ImageNet using train_imagenet.py under BMXNet/smd_hpi/examples/binary-imagenet1k. I made a few changes inside parser.set_defaults(), as follows:

parser.set_defaults(
    # network
    network          = 'resnet-binary',  # changed from 'resnet'
    num_layers       = 18,

    # data
    num_classes      = 1000,
    num_examples     = 1281167,
    image_shape      = '3,224,224',
    min_random_scale = 1,  # if input image has min size k, suggest to use
                           # 256.0/x, e.g. 0.533 for 480
    # train
    num_epochs       = 60,
    lr_step_epochs   = '20,40',
    lr               = 0.001,
    lr_factor        = 0.1,
    batch_size       = 32,
    optimizer        = 'sgd',
    disp_batches     = 10,
    top_k            = 5,
    act_bit          = 8,  # added this line; act_bit=1 or 32 also fails on GPU
)

I used this command to train:
python train_imagenet.py --gpus '0' --data-train imagenet-train.rec --data-val imagenet-val.rec --batch-size 32

On the GPU, training fails with a CUDA illegal memory access error; on the CPU (--gpu '') training proceeds fine. Could you suggest what the problem might be? I tried on two different Ubuntu machines with the same result. Thank you!

Environment info

Ubuntu 16.04
cuda-8.0
cudnn6
Python 2.7.12
GTX 1080 Ti
Latest BMXNet code at commit c6624e

Error Message:

INFO:root:start with arguments Namespace(act_bit=8, batch_size=32, benchmark=0, data_nthreads=4, data_train='/home/local/ANT/luxial/DATA/ImageNet/imagenet-train.rec', data_val='/home/local/ANT/luxial/DATA/ImageNet/imagenet-val.rec', disp_batches=10, gpus='0', image_shape='3,224,224', kv_store='device', load_epoch=None, log_file='train.log', lr=0.001, lr_factor=0.1, lr_step_epochs='20,40', max_random_aspect_ratio=0.25, max_random_h=36, max_random_l=50, max_random_rotate_angle=10, max_random_s=50, max_random_scale=1, max_random_shear_ratio=0.1, min_random_scale=1, model_prefix='./myImageNetModels/resnet-18-8bit', mom=0.9, monitor=0, network='resnet-binary', num_classes=1000, num_epochs=60, num_examples=1281167, num_layers=18, optimizer='sgd', pad_size=0, pretrained=None, random_crop=1, random_mirror=1, rgb_mean='123.68,116.779,103.939', test_io=0, top_k=5, wd=0.0001)
[19:47:51] /home/local/ANT/luxial/BMXNet/src/io/iter_image_recordio_2.cc:135: ImageRecordIOParser2: /home/local/ANT/luxial/DATA/ImageNet/imagenet-train.rec, use 3 threads for decoding..
[19:47:53] /home/local/ANT/luxial/BMXNet/src/io/iter_image_recordio_2.cc:135: ImageRecordIOParser2: /home/local/ANT/luxial/DATA/ImageNet/imagenet-val.rec, use 3 threads for decoding..
[19:47:54] /home/local/ANT/luxial/BMXNet/src/operator/././cudnn_algoreg-inl.h:65: Running performance tests to find the best convolution algorithm, this can take a while... (setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
[19:47:55] /home/local/ANT/luxial/BMXNet/dmlc-core/include/dmlc/./logging.h:308: [19:47:55] /home/local/ANT/luxial/BMXNet/mshadow/mshadow/./stream_gpu-inl.h:55: Check failed: e == cudaSuccess CUDA: an illegal memory access was encountered

Stack trace returned 9 entries:
[bt] (0) /home/local/ANT/luxial/BMXNet/build/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x39) [0x7f236fa98cd9]
[bt] (1) /home/local/ANT/luxial/BMXNet/build/libmxnet.so(_ZN7mshadow6StreamINS_3gpuEE4WaitEv+0xd8) [0x7f236fabef98]
[bt] (2) /home/local/ANT/luxial/BMXNet/build/libmxnet.so(+0x6b39af) [0x7f236faa59af]
[bt] (3) /home/local/ANT/luxial/BMXNet/build/libmxnet.so(_ZN5mxnet6engine14ThreadedEngine15ExecuteOprBlockENS_10RunContextEPNS0_8OprBlockE+0x93) [0x7f2370a8bce3]
[bt] (4) /home/local/ANT/luxial/BMXNet/build/libmxnet.so(ZNSt17_Function_handlerIFvSt10shared_ptrIN5mxnet6engine10ThreadPool11SimpleEventEEEZZNS2_23ThreadedEnginePerDevice13PushToExecuteEPNS2_8OprBlockEbENKUlvE1_clEvEUlS5_E_E9_M_invokeERKSt9_Any_dataOS5+0x123) [0x7f2370a95693]
[bt] (5) /home/local/ANT/luxial/BMXNet/build/libmxnet.so(_ZNSt6thread5_ImplISt12_Bind_simpleIFSt8functionIFvSt10shared_ptrIN5mxnet6engine10ThreadPool11SimpleEventEEEES8_EEE6_M_runEv+0x4e) [0x7f2370a8e3ee]
[bt] (6) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7f234b12cc80]
[bt] (7) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f239036c6ba]
[bt] (8) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f23900a23dd]

[19:47:55] /home/local/ANT/luxial/BMXNet/dmlc-core/include/dmlc/./logging.h:308: [19:47:55] /home/local/ANT/luxial/BMXNet/src/engine/./threaded_engine.h:329: [19:47:55] /home/local/ANT/luxial/BMXNet/mshadow/mshadow/./stream_gpu-inl.h:55: Check failed: e == cudaSuccess CUDA: an illegal memory access was encountered

Stack trace returned 9 entries:
[bt] (0) /home/local/ANT/luxial/BMXNet/build/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x39) [0x7f236fa98cd9]
[bt] (1) /home/local/ANT/luxial/BMXNet/build/libmxnet.so(_ZN7mshadow6StreamINS_3gpuEE4WaitEv+0xd8) [0x7f236fabef98]
[bt] (2) /home/local/ANT/luxial/BMXNet/build/libmxnet.so(+0x6b39af) [0x7f236faa59af]
[bt] (3) /home/local/ANT/luxial/BMXNet/build/libmxnet.so(_ZN5mxnet6engine14ThreadedEngine15ExecuteOprBlockENS_10RunContextEPNS0_8OprBlockE+0x93) [0x7f2370a8bce3]
[bt] (4) /home/local/ANT/luxial/BMXNet/build/libmxnet.so(ZNSt17_Function_handlerIFvSt10shared_ptrIN5mxnet6engine10ThreadPool11SimpleEventEEEZZNS2_23ThreadedEnginePerDevice13PushToExecuteEPNS2_8OprBlockEbENKUlvE1_clEvEUlS5_E_E9_M_invokeERKSt9_Any_dataOS5+0x123) [0x7f2370a95693]
[bt] (5) /home/local/ANT/luxial/BMXNet/build/libmxnet.so(_ZNSt6thread5_ImplISt12_Bind_simpleIFSt8functionIFvSt10shared_ptrIN5mxnet6engine10ThreadPool11SimpleEventEEEES8_EEE6_M_runEv+0x4e) [0x7f2370a8e3ee]
[bt] (6) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7f234b12cc80]
[bt] (7) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f239036c6ba]
[bt] (8) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f23900a23dd]

An fatal error occurred in asynchronous engine operation. If you do not know what caused this error, you can try set environment variable MXNET_ENGINE_TYPE to NaiveEngine and run with debugger (i.e. gdb). This will force all operations to be synchronous and backtrace will give you the series of calls that lead to this error. Remember to set MXNET_ENGINE_TYPE back to empty after debugging.

Stack trace returned 7 entries:
[bt] (0) /home/local/ANT/luxial/BMXNet/build/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x39) [0x7f236fa98cd9]
[bt] (1) /home/local/ANT/luxial/BMXNet/build/libmxnet.so(_ZN5mxnet6engine14ThreadedEngine15ExecuteOprBlockENS_10RunContextEPNS0_8OprBlockE+0x36b) [0x7f2370a8bfbb]
[bt] (2) /home/local/ANT/luxial/BMXNet/build/libmxnet.so(ZNSt17_Function_handlerIFvSt10shared_ptrIN5mxnet6engine10ThreadPool11SimpleEventEEEZZNS2_23ThreadedEnginePerDevice13PushToExecuteEPNS2_8OprBlockEbENKUlvE1_clEvEUlS5_E_E9_M_invokeERKSt9_Any_dataOS5+0x123) [0x7f2370a95693]
[bt] (3) /home/local/ANT/luxial/BMXNet/build/libmxnet.so(_ZNSt6thread5_ImplISt12_Bind_simpleIFSt8functionIFvSt10shared_ptrIN5mxnet6engine10ThreadPool11SimpleEventEEEES8_EEE6_M_runEv+0x4e) [0x7f2370a8e3ee]
[bt] (4) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7f234b12cc80]
[bt] (5) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f239036c6ba]
[bt] (6) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f23900a23dd]

terminate called after throwing an instance of 'dmlc::Error'
what(): [19:47:55] /home/local/ANT/luxial/BMXNet/src/engine/./threaded_engine.h:329: [19:47:55] /home/local/ANT/luxial/BMXNet/mshadow/mshadow/./stream_gpu-inl.h:55: Check failed: e == cudaSuccess CUDA: an illegal memory access was encountered

Stack trace returned 9 entries:
[bt] (0) /home/local/ANT/luxial/BMXNet/build/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x39) [0x7f236fa98cd9]
[bt] (1) /home/local/ANT/luxial/BMXNet/build/libmxnet.so(_ZN7mshadow6StreamINS_3gpuEE4WaitEv+0xd8) [0x7f236fabef98]
[bt] (2) /home/local/ANT/luxial/BMXNet/build/libmxnet.so(+0x6b39af) [0x7f236faa59af]
[bt] (3) /home/local/ANT/luxial/BMXNet/build/libmxnet.so(_ZN5mxnet6engine14ThreadedEngine15ExecuteOprBlockENS_10RunContextEPNS0_8OprBlockE+0x93) [0x7f2370a8bce3]
[bt] (4) /home/local/ANT/luxial/BMXNet/build/libmxnet.so(ZNSt17_Function_handlerIFvSt10shared_ptrIN5mxnet6engine10ThreadPool11SimpleEventEEEZZNS2_23ThreadedEnginePerDevice13PushToExecuteEPNS2_8OprBlockEbENKUlvE1_clEvEUlS5_E_E9_M_invokeERKSt9_Any_dataOS5+0x123) [0x7f2370a95693]
[bt] (5) /home/local/ANT/luxial/BMXNet/build/libmxnet.so(_ZNSt6thread5_ImplISt12_Bind_simpleIFSt8functionIFvSt10shared_ptrIN5mxnet6engine10ThreadPool11SimpleEventEEEES8_EEE6_M_runEv+0x4e) [0x7f2370a8e3ee]
[bt] (6) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7f234b12cc80]
[bt] (7) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f239036c6ba]
[bt] (8) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f23900a23dd]

An fatal error occurred in asynchronous engine operation. If you do not know what caused this error, you can try set environment variable MXNET_ENGINE_TYPE to NaiveEngine and run with debugger (i.e. gdb). This will force all operations to be synchronous and backtrace will give you the series of calls that lead to this error. Remember to set MXNET_ENGINE_TYPE back to empty after debugging.

Stack trace returned 7 entries:
[bt] (0) /home/local/ANT/luxial/BMXNet/build/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x39) [0x7f236fa98cd9]
[bt] (1) /home/local/ANT/luxial/BMXNet/build/libmxnet.so(_ZN5mxnet6engine14ThreadedEngine15ExecuteOprBlockENS_10RunContextEPNS0_8OprBlockE+0x36b) [0x7f2370a8bfbb]
[bt] (2) /home/local/ANT/luxial/BMXNet/build/libmxnet.so(ZNSt17_Function_handlerIFvSt10shared_ptrIN5mxnet6engine10ThreadPool11SimpleEventEEEZZNS2_23ThreadedEnginePerDevice13PushToExecuteEPNS2_8OprBlockEbENKUlvE1_clEvEUlS5_E_E9_M_invokeERKSt9_Any_dataOS5+0x123) [0x7f2370a95693]
[bt] (3) /home/local/ANT/luxial/BMXNet/build/libmxnet.so(_ZNSt6thread5_ImplISt12_Bind_simpleIFSt8functionIFvSt10shared_ptrIN5mxnet6engine10ThreadPool11SimpleEventEEEES8_EEE6_M_runEv+0x4e) [0x7f2370a8e3ee]
[bt] (4) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7f234b12cc80]
[bt] (5) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f239036c6ba]
[bt] (6) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f23900a23dd]

P.S.

  1. I can train a standard network on the GPU with the same script (e.g. network = 'resnet'); only the quantized version, which uses QActivation, fails.
  2. Evaluating the pretrained networks BMXNet/smd_hpi/binary_models/binarized_imagenet-resnet18-64bit and binarized_imagenet-resnet18-64bit-1st-stage-fullprecision behaves the same way: it only works on the CPU.
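A first debugging step, following the suggestion in the error message, could be to rerun the failing command with a synchronous engine. The sketch below only prepares the environment and argument list, and the helper name is hypothetical.

- `MXNET_ENGINE_TYPE=NaiveEngine` comes from the log above.
- `CUDA_LAUNCH_BLOCKING=1` is a standard CUDA runtime variable. It is an addition not mentioned in the thread. It makes kernel launches synchronous, so the error should surface at the offending launch rather than at a later synchronization point such as `mshadow::Stream<mshadow::gpu>::Wait()` in the backtrace.

```python
import os
import shlex


def debug_launch(cmd, extra_env=None):
    """Hypothetical helper: build (env, argv) for a synchronous debug rerun."""
    env = dict(os.environ)  # copy, so the current process is not modified
    env["MXNET_ENGINE_TYPE"] = "NaiveEngine"  # suggested by the MXNet error message
    env["CUDA_LAUNCH_BLOCKING"] = "1"         # report CUDA errors at the failing launch
    if extra_env:
        env.update(extra_env)
    return env, shlex.split(cmd)


env, argv = debug_launch(
    "python train_imagenet.py --gpus '0' --data-train imagenet-train.rec "
    "--data-val imagenet-val.rec --batch-size 32"
)
```

To actually run it, pass both values to `subprocess.run(argv, env=env)`, optionally under gdb as the log suggests. Expect it to be much slower than a normal run.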

gluon api

It would be nice if you provided gluon HybridBlock implementations of the Q* functions along the lines of the existing gluon.nn blocks (i.e. QDense, QConv1D, QConv2D, QConv3D).

Problems using Q-series API functions with Faster R-CNN in mxnet/examples/rcnn

Hi guys,

I can already use the Q-series API functions to train and test on the MNIST dataset.
Now I want to test the pretrained model from smd_hpi/examples/binary-imagenet1k with the Faster R-CNN code in mxnet/examples/rcnn. I set up the environment for the rcnn requirements using the script mxnet/examples/rcnn/script/additional_deps.sh (its content is listed below), and Faster R-CNN runs properly.
However, my original Q-series API code no longer runs, and I get the error below:
AttributeError: 'module' object has no attribute 'QActivation'

Sorry, I'm not familiar with Linux, so I don't know how to solve this problem.
Thanks a lot.

Best Regards,
Peng

error message:
[13:35:09] /home/jacky4323/BMXNet_New/mxnet/src/operator/././cudnn_algoreg-inl.h:107: Running performance tests to find the best convolution algorithm, this can take a while... (setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
[13:35:29] /home/jacky4323/BMXNet_New/mxnet/dmlc-core/include/dmlc/logging.h:308: [13:35:29] /home/jacky4323/BMXNet_New/mxnet/src/operator/contrib/proposal.cu:495: Check failed: error == cudaSuccess (7 vs. 0) too many resources requested for launch

Stack trace returned 10 entries:
[bt] (0) /home/jacky4323/BMXNet_New/mxnet/build/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7f49e2020e9c]
[bt] (1) /home/jacky4323/BMXNet_New/mxnet/build/libmxnet.so(ZN5mxnet2op13ProposalGPUOpIN7mshadow3gpuEE7ForwardERKNS_9OpContextERKSt6vectorINS_5TBlobESaIS9_EERKS8_INS_9OpReqTypeESaISE_EESD_SD+0x12b9) [0x7f49e4e1f2c9]
[bt] (2) /home/jacky4323/BMXNet_New/mxnet/build/libmxnet.so(ZN5mxnet2op13OperatorState7ForwardERKNS_9OpContextERKSt6vectorINS_5TBlobESaIS6_EERKS5_INS_9OpReqTypeESaISB_EESA+0x36d) [0x7f49e228c4ed]
[bt] (3) /home/jacky4323/BMXNet_New/mxnet/build/libmxnet.so(_ZN5mxnet4exec23StatefulComputeExecutor3RunENS_10RunContextEb+0x69) [0x7f49e2193e69]
[bt] (4) /home/jacky4323/BMXNet_New/mxnet/build/libmxnet.so(+0x992210) [0x7f49e2158210]
[bt] (5) /home/jacky4323/BMXNet_New/mxnet/build/libmxnet.so(_ZN5mxnet6engine14ThreadedEngine15ExecuteOprBlockENS_10RunContextEPNS0_8OprBlockE+0x93) [0x7f49e204fa83]
[bt] (6) /home/jacky4323/BMXNet_New/mxnet/build/libmxnet.so(_ZN5mxnet6engine23ThreadedEnginePerDevice9GPUWorkerILN4dmlc19ConcurrentQueueTypeE0EEEvNS_7ContextEbPNS1_17ThreadWorkerBlockIXT_EEESt10shared_ptrINS0_10ThreadPool11SimpleEventEE+0x10b) [0x7f49e205889b]
[bt] (7) /home/jacky4323/BMXNet_New/mxnet/build/libmxnet.so(ZNSt17_Function_handlerIFvSt10shared_ptrIN5mxnet6engine10ThreadPool11SimpleEventEEEZZNS2_23ThreadedEnginePerDevice13PushToExecuteEPNS2_8OprBlockEbENKUlvE1_clEvEUlS5_E_E9_M_invokeERKSt9_Any_dataOS5+0x63) [0x7f49e2058ac3]
[bt] (8) /home/jacky4323/BMXNet_New/mxnet/build/libmxnet.so(_ZNSt6thread5_ImplISt12_Bind_simpleIFSt8functionIFvSt10shared_ptrIN5mxnet6engine10ThreadPool11SimpleEventEEEES8_EEE6_M_runEv+0x4a) [0x7f49e205222a]
[bt] (9) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7f4a60b89c80]

[13:35:29] /home/jacky4323/BMXNet_New/mxnet/dmlc-core/include/dmlc/logging.h:308: [13:35:29] /home/jacky4323/BMXNet_New/mxnet/src/engine/./threaded_engine.h:359: [13:35:29] /home/jacky4323/BMXNet_New/mxnet/src/operator/contrib/proposal.cu:495: Check failed: error == cudaSuccess (7 vs. 0) too many resources requested for launch

The content of additional_deps.sh is shown below:

#!/usr/bin/env bash

# install additional deps
sudo apt install python-pip python-dev unzip python-matplotlib
sudo pip install cython scikit-image easydict

# install a forked MXNet
pushd ../../
cp make/config.mk ./
echo "USE_CUDA=1" >>config.mk
echo "USE_CUDA_PATH=/usr/local/cuda" >>config.mk
echo "USE_CUDNN=1" >>config.mk
make -j$(nproc)
pushd python
python setup.py install --user
popd
popd

# build cython extension
make

AttributeError: 'module' object has no attribute 'QActivation'

I am having the issue shown in the title; it seems that the files in smd_hpi/src are not being compiled.
I am using Ubuntu 16.04, and I don't see the corresponding .o files under build/.
I followed the steps in the Setup section, with the following changes:

  • make/config.mk: USE_MKL2017 = 1, USE_MKL2017_EXPERIMENTAL = 1, USE_BLAS = openblas (note that USE_CUDA is 0, as in the original)
  • the main CMakeLists.txt: changed the USE_CUDA default on line 14 to OFF; otherwise compilation fails.

mxnet error: redeclaration of 'kNone' when setting USE_CPP_PACKAGE = 1

Hi, I followed your setup instructions and could build the project successfully.
However, when I set USE_CPP_PACKAGE = 1, the compiler reports a redeclaration of 'kNone'.

The redeclaration looks like this:
enum class QConvolutionScalingMode {
kNone = 0,
kBackward = 1,
kForward = 2,
kNone = 3
};

I tried deleting one 'kNone' on line 6866 of 'mxnet-cpp/op.h', but the file seems to be generated by a wrapper script, so editing it by hand has no effect.
Any help would be greatly appreciated.
Thank you.

Some things to document

The documentation for the various Q* operators omits many details. Some points that should be described include:

  • QActivation in 1-bit mode quantizes to the values +1/-1, but in higher-bit modes quantizes to range [0,1].

  • QFullyConnected quantizes inputs according to QActivation, and quantizes weights to range [-1,1] prior to computing matrix-vector product.

  • QFullyConnected in the case of 1-bit activation and weights actually returns a scaled and offset version of the matrix product so that the output values are in the range [0,#inputs]. For other bit settings it computes a regular matrix product with outputs in range [-#inputs,+#inputs].

I assume similar considerations apply to QConvolution.
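The behaviour described in the list above can be captured in a small NumPy reference model, which may help when writing that documentation. This is a sketch under assumptions, not BMXNet code: the function names are hypothetical, the multi-bit activation path is assumed to clip to [0, 1] before rounding (the issue only states the output range), and the 1-bit path is assumed to map sign(0) to +1. It checks the QFullyConnected point: for ±1 inputs and weights, the scaled and offset product (W·x + n)/2 lands in [0, n].

```python
import numpy as np


def qactivation_ref(x, act_bit):
    """Hypothetical reference for the QActivation behaviour described above."""
    if act_bit == 1:
        # 1-bit mode: binarize to +1/-1 (sign(0) mapped to +1 by assumption)
        return np.where(x >= 0, 1.0, -1.0)
    # multi-bit mode: assumed clip to [0, 1], then uniform k-bit rounding
    n = float(2 ** act_bit - 1)
    return np.round(np.clip(x, 0.0, 1.0) * n) / n


def qfc_binary_ref(x, w):
    """1-bit QFullyConnected as described: (W.x + n) / 2, with range [0, n]."""
    xb = qactivation_ref(x, 1)
    wb = qactivation_ref(w, 1)
    n = x.shape[-1]
    return (wb @ xb + n) / 2.0


rng = np.random.default_rng(0)
x = rng.normal(size=128)
w = rng.normal(size=(16, 128))

y = qfc_binary_ref(x, w)
print(y.min() >= 0, y.max() <= 128)  # → True True
```

Note that (W·x + n)/2 is simply the number of positions where the signs of weight and input agree, which is why it can never leave [0, n].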

Where do you implement XNOR networks?

Hi @yanghaojin

Thanks for your great work. In the paper, you state that BMXNet uses the XNOR network algorithm. Could you show me where you implemented it for convolution layers? In addition, could you tell me which binary ImageNet models use XNOR networks for binarization?

Thanks,
Hai
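Where the kernel lives in the code base is for the authors to answer. As background, the XNOR-network trick replaces a dot product of ±1 vectors with bit operations: encode +1 as bit 1 and -1 as bit 0, then dot(a, b) = 2 * popcount(XNOR(a, b)) - n. The sketch below illustrates that identity in plain Python, with arbitrary-precision integers standing in for machine words; the helper names are hypothetical and it says nothing about how BMXNet lays out its words.

```python
import random


def pack_bits(signs):
    """Pack a list of +1/-1 values into one int: bit i is 1 where signs[i] == +1."""
    word = 0
    for i, s in enumerate(signs):
        if s > 0:
            word |= 1 << i
    return word


def xnor_dot(a_word, b_word, n):
    """Dot product of two packed +-1 vectors of length n via XNOR and popcount."""
    mask = (1 << n) - 1
    xnor = ~(a_word ^ b_word) & mask   # 1 where the signs agree
    matches = bin(xnor).count("1")      # popcount
    return 2 * matches - n


random.seed(0)
n = 64
a = [random.choice((-1, 1)) for _ in range(n)]
b = [random.choice((-1, 1)) for _ in range(n)]

reference = sum(x * y for x, y in zip(a, b))
print(xnor_dot(pack_bits(a), pack_bits(b), n) == reference)  # → True
```

A real kernel would do the same with fixed-width words (e.g. 32 or 64 bits) and a hardware popcount instruction, processing many weights per operation.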

Resnet-binary not working on 32 bit system

I used the resnet-binary symbol to train on 224x224 images, converted the model and ran inference on multiple systems. The model works fine on a 64-bit system but gives the same prediction score for all tested samples on a 32-bit system. What are the differences between 64-bit and 32-bit here, and is there anything I can do to fix this?
Thanks in advance!

Plan to merge to upstream?

Many thanks for the great work! Do you have any plans to merge your code into upstream dmlc/mxnet?

Error while doing inference on binarized model.

Hi authors,
I have successfully trained a model with binarized weights, and inference works before the model is binarized. However, when I convert the model to the binarized format, which is much smaller, an error occurs. Can you help me? The log is shown below (in GPU mode):

Traceback (most recent call last):
  File "benchmark.py", line 55, in <module>
    model.forward(db, is_train=False)
  File "/root/codebase/BMXNet/python/mxnet/module/module.py", line 609, in forward
    self._exec_group.forward(data_batch, is_train)
  File "/root/codebase/BMXNet/python/mxnet/module/executor_group.py", line 416, in forward
    exec_.forward(is_train=is_train)
  File "/root/codebase/BMXNet/python/mxnet/executor.py", line 150, in forward
    ctypes.c_int(int(is_train))))
  File "/root/codebase/BMXNet/python/mxnet/base.py", line 146, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [21:14:12] /root/codebase/BMXNet/include/mxnet/./tensor_blob.h:217: Check failed: mshadow::DataType<DType>::kFlag == type_flag_ TBlob.get_with_shape: data type do not match specified type.Expected: 1 v.s. given 0

And in CPU mode:

Traceback (most recent call last):
  File "benchmark.py", line 55, in <module>
    model.forward(db, is_train=False)
  File "/root/codebase/BMXNet/python/mxnet/module/module.py", line 609, in forward
    self._exec_group.forward(data_batch, is_train)
  File "/root/codebase/BMXNet/python/mxnet/module/executor_group.py", line 416, in forward
    exec_.forward(is_train=is_train)
  File "/root/codebase/BMXNet/python/mxnet/executor.py", line 150, in forward
    ctypes.c_int(int(is_train))))
  File "/root/codebase/BMXNet/python/mxnet/base.py", line 146, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [21:15:01] /root/codebase/BMXNet/mshadow/mshadow/./././dot_engine-inl.h:763: Check failed: dst.size(0) == sleft[0] && dst.size(1) == sright[1] && sleft[1] == sright[0] dot-gemm: matrix shape mismatch

Question about the speed of binary_mnist

Hey guys, I trained a binary model and a normal model (LeNet) on my own dataset by adapting your code, and everything went fine. But when I measure speed with the classify method in train_val.py, LeNet takes less time to classify the images than BinaryLeNet. Do you know why? The binary model should be much faster than the normal one.

Low-bit width gradients

Is there any plan to implement the stochastic quantization for gradients as described in the DoReFa-Net paper?
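For reference, the gradient quantizer in the DoReFa-Net paper scales the gradient by its maximum magnitude, maps it to [0, 1], adds uniform noise of up to half a quantization step, rounds to k bits and maps back. Below is a minimal NumPy sketch of that scheme, modelled on the fg function in tensorpack's DoReFa-Net example. It is not BMXNet code; normalising per sample (over all axes except the first) follows that example, and the zero-gradient guard is an addition of this sketch.

```python
import numpy as np


def quantize(x, k):
    """Uniform k-bit rounding of values in [0, 1]."""
    n = float(2 ** k - 1)
    return np.round(x * n) / n


def quantize_grad(g, k, rng):
    """DoReFa-style stochastic k-bit gradient quantization (sketch).

    g is assumed to have a leading batch axis; each sample is scaled by its
    own maximum absolute value, as in tensorpack's DoReFa-Net example.
    """
    axes = tuple(range(1, g.ndim))
    maxx = np.max(np.abs(g), axis=axes, keepdims=True)
    maxx = np.where(maxx == 0, 1.0, maxx)         # guard: all-zero samples
    n = float(2 ** k - 1)
    x = g / maxx * 0.5 + 0.5                       # map to [0, 1]
    x = x + rng.uniform(-0.5 / n, 0.5 / n, size=g.shape)  # noise term N(k)
    x = np.clip(x, 0.0, 1.0)
    x = quantize(x, k) - 0.5
    return x * maxx * 2


rng = np.random.default_rng(0)
g = rng.normal(size=(4, 10))
gq = quantize_grad(g, 4, rng)

# The added noise makes the rounding unbiased: averaged over many draws,
# the quantized gradient approaches the original one.
mean = np.mean([quantize_grad(g, 4, rng) for _ in range(2000)], axis=0)
print(np.max(np.abs(mean - g)))
```

The noise is what distinguishes this from the deterministic activation and weight quantizers: each value is rounded up or down with probability proportional to its distance from the two neighbouring levels.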

Could not turn off USE_CUDA

There is a CMake error while compiling on a Raspberry Pi:
CMake Error at mshadow/cmake/mshadow.cmake:46 (message):
-- CUDA is disabled.
Call Stack (most recent call first):
CMakeLists.txt:126 (include)

Operating System: Raspbian

I changed the following in CMakeLists.txt and reran cmake, but it didn't work:

mxnet_option(USE_CUDA "Build with CUDA support" OFF)
mxnet_option(USE_CUDNN "Build with cudnn support" OFF)

When I print the variable USE_CUDA with MESSAGE during the cmake run, it is still ON.

Can you tell me where the right place to set USE_CUDA to OFF is?

Issues with word-based binary weight format

Currently, QFullyConnected and QConvolution when using the binary weight format
(binarized_weights_only=True) store weights packed into machine words. There are
two issues with this:

  1. It is difficult to share the same weights between architectures with different word sizes,
    because the operator requires the size of the weights to be in units of machine words.
    There seems to be no reason why a model that works on a 64-bit machine should not
    work without modification on a 32-bit machine.

  2. The format inherently assumes a particular byte order, so weights will not load
    correctly when moving between big-endian and little-endian machines. This may only
    matter when loading weights from disk.

It seems bad to have a weight format that is not portable across machine architectures.

Alternatives would be:

  • use bytes
  • use int32 regardless of the machine word size
  • allow multiple integer types, as long as the bit size times the number of items equals the number of bits in a row
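A byte-oriented format like the first alternative can be sketched with NumPy's packbits/unpackbits: a packed row is a sequence of uint8 values whose bit order is fixed by the format rather than by the machine, so it round-trips identically on 32-bit and 64-bit, big-endian and little-endian hosts. The sketch also shows the problem with word-based storage: the same 8 bytes read as one 64-bit word give different integers depending on byte order. The function names and the MSB-first bit order are assumptions for illustration, not the BMXNet format.

```python
import numpy as np


def pack_row(signs):
    """Pack a row of +1/-1 weights into bytes, MSB first (+1 -> 1, -1 -> 0)."""
    bits = (np.asarray(signs) > 0).astype(np.uint8)
    return np.packbits(bits, bitorder="big")   # last byte padded with zeros


def unpack_row(packed, n):
    """Recover the first n +1/-1 weights from a packed byte row."""
    bits = np.unpackbits(packed, bitorder="big")[:n]
    return np.where(bits == 1, 1, -1).astype(np.int8)


rng = np.random.default_rng(0)
row = rng.choice([-1, 1], size=100)   # 100 is not a multiple of 32 or 64

packed = pack_row(row)
print(packed.dtype, packed.size)       # → uint8 13
assert np.array_equal(unpack_row(packed, row.size), row)

# Word-based storage depends on byte order: the same bytes, two integers.
word_bytes = bytes([1, 0, 0, 0, 0, 0, 0, 0])
little = int(np.frombuffer(word_bytes, dtype="<u8")[0])
big = int(np.frombuffer(word_bytes, dtype=">u8")[0])
print(little, big)                     # → 1 72057594037927936
```

Because the byte stream has no notion of a machine word, a kernel on either architecture could reinterpret it as 32-bit or 64-bit words after loading, as long as it handles the padding in the last word.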

Cannot Build with USE_DIST_KVSTORE=ON

Environment info

Operating System: Ubuntu 16.04
gcc, g++ 5.4.0
protoc 3.5.0

I built protobuf from source (https://github.com/google/protobuf/tree/master/src)
and set all the required environment variables:

PROTOBUF_INCLUDE_DIR="/usr/include/"
PROTOBUF_LIBRARY="/usr/lib/libprotobuf.so"
PROTOBUF_LIBRARY_DEBUG="/usr/lib/libprotobuf.so"
PROTOBUF_LITE_LIBRARY="/usr/lib/libprotobuf-lite.so"
PROTOBUF_LITE_LIBRARY_DEBUG="/usr/lib/libprotobuf-lite.so"
PROTOBUF_PROTOC_EXECUTABLE="/usr/bin/protoc"
PROTOBUF_PROTOC_LIBRARY="/usr/lib/libprotoc.so"
PROTOBUF_PROTOC_LIBRARY_DEBUG="/usr/lib/libprotoc.so"

Error Message:

You have called ADD_LIBRARY for library mxnet without any source files. This typically indicates a
problem with your CMakeLists.txt file

CMake Error at CMakeLists.txt:471 (target_link_libraries):
The "debug" argument must be followed by a library.

Minimum reproducible example

USE_CUDA=0, USE_OPENCV=1 and USE_DIST_KVSTORE=1
When I remove that line from CMakeLists.txt, the tests fail!
Any idea what might be causing the problem?

This was also reported here with no solution!

forward speed problem

I'm trying to benchmark resnet-18 and resnet-binary-18 using benchmark_score.py in the binary-imagenet1k folder.
The results are:
INFO:root:network: resnet-binary-18
INFO:root:device: gpu(0)
INFO:root:batch size 1, image/sec: 9.004076
INFO:root:batch size 8, image/sec: 12.565606
INFO:root:device: cpu(0)
INFO:root:batch size 1, image/sec: 0.661994
INFO:root:batch size 8, image/sec: 0.813232
INFO:root:network: resnet-18
INFO:root:device: gpu(0)
INFO:root:batch size 1, image/sec: 36.849257
INFO:root:batch size 8, image/sec: 48.341315
INFO:root:device: cpu(0)
INFO:root:batch size 1, image/sec: 0.552637
INFO:root:batch size 8, image/sec: 0.770370

It seems that the binary ResNet is only a little faster than the vanilla ResNet on CPU. However, in Figure 1 of the paper, binarized input and xnor_64_omp are 10 times faster than Cblas.
Is this the normal speed of the binary ResNet, or have I done something wrong? I also found that only 1 thread is used when testing on CPU; is OpenMP not working?

I'm using an Ubuntu 14.04 machine with OpenBLAS and OpenMP installed.
CPU: Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
GPU: Titan X (Maxwell)
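Putting the logged numbers side by side makes the gap concrete. The short script below only divides the image/sec figures from the log above (values copied verbatim); it measures nothing new.

```python
# image/sec values copied from the benchmark_score.py log above
results = {
    ("gpu", 1): {"resnet-binary-18": 9.004076, "resnet-18": 36.849257},
    ("gpu", 8): {"resnet-binary-18": 12.565606, "resnet-18": 48.341315},
    ("cpu", 1): {"resnet-binary-18": 0.661994, "resnet-18": 0.552637},
    ("cpu", 8): {"resnet-binary-18": 0.813232, "resnet-18": 0.770370},
}

for (device, batch), r in results.items():
    ratio = r["resnet-binary-18"] / r["resnet-18"]
    print(f"{device} batch {batch}: binary / full-precision throughput = {ratio:.2f}x")
```

In this log the binary network reaches about 1.20x (batch 1) and 1.06x (batch 8) the throughput of resnet-18 on CPU, and roughly a quarter of it (0.24x and 0.26x) on GPU.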

The same model is much slower than on MXNet

Hi,

Thanks for such good work. I used the same network structure on both MXNet and BMXNet. However, training runs at 400 imgs/sec on MXNet but only 30 imgs/sec on BMXNet. I checked the GPU, and it is being used. I use a Titan Xp and CUDA 8. Do you know how to solve this issue?

support for ndarray api

Description

Support new Q* operators in mxnet.ndarray module.

Details

The new operators are visible in the mxnet.ndarray module, but when you try to invoke them, the input argument is not handled correctly. For example:

>>> import mxnet as mx
>>> x = mx.random.normal(shape=(1,3))
>>> y = mx.ndarray.QActivation(x, act_bit=1)
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "<string>", line 37, in QActivation
  File "/Users/cbarber/ws/bmxnet/python/mxnet/_ctypes/ndarray.py", line 92, in _imperative_invoke
    ctypes.byref(out_stypes)))
  File "/Users/cbarber/ws/bmxnet/python/mxnet/base.py", line 146, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
MXNetError: Invalid Parameter format for act_bit expect int (non-negative) but value='
[[-0.011113   -0.43647185  0.53347087]]
<NDArray 1x3 @cpu(0)>', in operator QActivation(name="", act_bit="
[[-0.011113   -0.43647185  0.53347087]]
<NDArray 1x3 @cpu(0)>")

Environment info (Required)

Built on a MacBook Pro with USE_CUDA and USE_OPENMP turned off.
Python 2.7

Built from source as of commit e4fc910.

Note that I had to change the library type from MODULE to SHARED in the root CMakeLists.txt file for the cmake build to succeed.

tanh-based multi-bit weight quantization function is not idempotent

Currently when weight_bit > 1, QFullyConnected and QConvolution quantize weights by first squashing them using tanh:

        /*
         * Quantization for dataflow.
         * bit width > 1:
         *  Forward: w_i = 2 * quantize_k-bit(tanh(x_i)/2*max(|tanh(x_i)|) + 1/2) - 1
         */

This is the quantization scheme described in the DoReFa-Net paper. However, one big problem with this function is that it is not idempotent: if you apply it multiple times, you may get a different answer than by applying it just once. That is, Q(Q(W)) != Q(W). This is not typically what you want in a quantization function. Normally you would expect a quantizer to be a no-op when applied to inputs that are already set to quantization targets.

While I can see that you would want to implement the DoReFa-Net function, I think it would make sense to make this behavior optional and also implement a simpler clipping based quantizer that would not suffer from this problem.
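The non-idempotence is easy to reproduce numerically. The sketch below implements the tanh-based formula from the comment above, 2 * quantize_k(tanh(w) / (2 * max|tanh(w)|) + 1/2) - 1, for k = 4 bits and applies it to a vector whose entries are already valid 4-bit targets (-1, 7/15, 1): the middle entry moves on every application. A clipping-based quantizer (clip to [-1, 1], then round to the same 2^k - 1 steps), which is one possible form of the simpler alternative suggested here, leaves such inputs unchanged. Both function names are hypothetical.

```python
import numpy as np


def quantize_k(x, k):
    """Round values in [0, 1] to the k-bit levels {0, 1/n, ..., 1}, n = 2**k - 1."""
    n = float(2 ** k - 1)
    return np.round(x * n) / n


def q_tanh(w, k):
    """DoReFa-style weight quantizer (tanh squashing, then k-bit rounding)."""
    t = np.tanh(w)
    x = t / (2.0 * np.max(np.abs(t))) + 0.5   # assumes w is not all zeros
    return 2.0 * quantize_k(x, k) - 1.0


def q_clip(w, k):
    """Clipping-based alternative: clip to [-1, 1], then round to the same grid."""
    x = (np.clip(w, -1.0, 1.0) + 1.0) / 2.0
    return 2.0 * quantize_k(x, k) - 1.0


k = 4
w = np.array([-1.0, 7 / 15, 1.0])   # already on the 4-bit grid

once = q_tanh(w, k)
twice = q_tanh(once, k)
# middle entry drifts: 7/15 -> 9/15 -> 11/15
print(np.allclose(once, [-1, 9 / 15, 1]), np.allclose(twice, [-1, 11 / 15, 1]))  # → True True

# the clipping quantizer is a no-op on grid values and idempotent
print(np.allclose(q_clip(w, k), w), np.allclose(q_clip(q_clip(w, k), k), q_clip(w, k)))  # → True True
```

The drift comes from re-squashing: once the largest weight is ±1, the second pass divides by tanh(1) instead of 1, and tanh(w)/tanh(1) exceeds w for 0 < w < 1 by more than half a quantization step once k reaches 4.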

QActivation backward_only problems

Hi guys,

May I ask some questions?

Question 1: Are all Q-series layers implemented in C++?
Question 2: QActivation layers with backward_only=True can backpropagate the quantization gradient. Does this mean that you did not implement the gradient computation of the quantization function in QConvolution and QFullyConnected?

Thanks a lot.

Best Regards,
Peng

No difference in binarized and full-precision LeNet model size

I tried your binarized MNIST LeNet example and trained both a binarized model and a normal model. The resulting .params files are 4.6 MB (binarized) and 4.6 MB (normal), instead of the 206 kB and 4.6 MB reported in the paper. Your pretrained binarized model really does come in at 206 kB.

Am I missing something? Is there a compression step?
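Whether BMXNet has a separate conversion step is for the authors to confirm, but a rough size estimate shows where a figure around 206 kB could come from. The layer sizes below follow the get_binary_lenet() symbol quoted in a later issue on this page: 28x28 MNIST input, a full-precision first Convolution (64 filters, 5x5) and last FullyConnected (10 units), a binarized QConvolution (64 filters, 5x5) and a binarized QFullyConnected (1024 -> 1000). Storing biases and four BatchNorm values per channel (gamma, beta, moving mean, moving variance) as float32, and ignoring file-format overhead, are assumptions of this sketch.

```python
# Rough .params size estimate for the BinaryLeNet layout (assumptions in the text).
FLOAT_BYTES = 4

layers = {
    # name: (weight count, extra float32 values: bias + BatchNorm parameters/stats)
    "conv1 (full precision)": (64 * 1 * 5 * 5, 64 + 4 * 64),
    "qconv2 (binarized)":     (64 * 64 * 5 * 5, 64 + 4 * 64),
    "qfc1 (binarized)":       (1024 * 1000, 1000 + 4 * 1000),
    "fc2 (full precision)":   (1000 * 10, 10),
}

float_model = 0
binary_model = 0
for name, (weights, extras) in layers.items():
    float_model += (weights + extras) * FLOAT_BYTES
    if "binarized" in name:
        # 1 bit per weight, extras still stored as float32
        binary_model += weights // 8 + extras * FLOAT_BYTES
    else:
        binary_model += (weights + extras) * FLOAT_BYTES

print(f"all float32: {float_model / 1e6:.2f} MB")                           # → all float32: 4.57 MB
print(f"1-bit weights in binarized layers: {binary_model / 1e3:.1f} kB")    # → 1-bit weights in binarized layers: 209.8 kB
```

Under these assumptions the all-float32 total matches the 4.6 MB observed, while the 1-bit estimate is in the same range as the 206 kB in the paper. A binarized model that still saves at 4.6 MB therefore suggests its binarized layers are being written as float32.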

image-classification-predict

I use smd_hpi/examples/binary-imagenet1k/predict-cpp/image-classification-predict and a pretrained binary model to classify an image.
When I use binarized_imagenet-resnet18-64bit-0040.params,
the log is:
model/Inception/Inception-BN-symbol.json ... 25049 bytes
model/Inception/Inception-BN-0126.params ... 3553824 bytes
model/Inception/mean_224.nd ... 602188 bytes
Segmentation fault
So there is a segmentation fault.

When I use binarized_imagenet-resnet18-64bit-1st-stage-fullprecision-0038.params,
the log is the same whichever image I classify:
Accuracy[988] = 0.00000000
Accuracy[989] = 0.00000000
Accuracy[990] = 0.00000000
Accuracy[991] = 0.00000000
Accuracy[992] = 0.00000000
Accuracy[993] = 0.00000000
Accuracy[994] = 0.00000000
Accuracy[995] = 0.00000000
Accuracy[996] = 0.00000000
Accuracy[997] = 0.00000000
Accuracy[998] = 0.00000000
Accuracy[999] = 0.00000000
Best Result: [ tennis ball] id = 852, accuracy = 1.00000000

Any help would be appreciated!

Any plan to store parameters as 8-bit unsigned char?

Quantization with bit widths from 2 to 31 bits is available mainly for scientific purposes. There is no speed or memory gain (rather the opposite, since there are conversion steps), as the quantized values are still stored in full-precision float variables.

Storing all bit widths from 2 to 31 in files may be difficult. Is there any plan to support 8-bit parameter storage?
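For bit widths up to 8, quantized values could in principle be stored as integer level indices in a uint8 array and mapped back to floats on load, at 1 byte per value instead of 4. A minimal sketch, assuming the values lie on the grid {-1, -1 + 2/n, ..., 1} with n = 2^k - 1 (the [-1, 1] range the weight quantizer produces); the function names are hypothetical and this is not an existing BMXNet feature.

```python
import numpy as np


def to_uint8_codes(wq, k):
    """Map k-bit quantized weights in [-1, 1] (k <= 8) to integer codes 0..2**k - 1."""
    assert 1 <= k <= 8, "codes must fit in one byte"
    n = 2 ** k - 1
    return np.round((wq + 1.0) / 2.0 * n).astype(np.uint8)


def from_uint8_codes(codes, k):
    """Inverse mapping back to float32 values on the same grid."""
    n = 2 ** k - 1
    return (2.0 * codes.astype(np.float32) / n - 1.0).astype(np.float32)


k = 4
n = 2 ** k - 1
rng = np.random.default_rng(0)
wq = 2.0 * rng.integers(0, n + 1, size=1000) / n - 1.0   # synthetic 4-bit weights

codes = to_uint8_codes(wq, k)
restored = from_uint8_codes(codes, k)
print(codes.nbytes, wq.astype(np.float32).nbytes)  # → 1000 4000
print(np.allclose(restored, wq, atol=1e-6))         # → True
```

One byte per value is a simple compromise: it wastes bits for k < 8, but avoids the bit-level packing that arbitrary widths from 2 to 31 bits would need.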

Runtime Error after modifying binary_mnist.py

Environment info

Operating System: macOS
Package used (Python/R/Scala/Julia): Python

Error Message:

[16:20:03] /Users/zamyers/mxnet/mxnet/nnvm/include/dmlc/logging.h:300: [16:20:03] /Users/zamyers/mxnet/mxnet/smd_hpi/src/./q_convolution-inl.h:149: Check failed: data.shape_[1] % mxnet::op::xnor_cpu::BITS_PER_BINARY_WORD == 0 (1 vs. 0) input channel currently have to be multiple of 64 but are: 1

Stack trace returned 18 entries:
[bt] (0) 0 libmxnet.dylib 0x000000010e275511 _ZN4dmlc15LogMessageFatalD2Ev + 49
[bt] (1) 1 libmxnet.dylib 0x000000010e2639b5 _ZN4dmlc15LogMessageFatalD1Ev + 21
[bt] (2) 2 libmxnet.dylib 0x000000010fcd8ae7 ZN5mxnet2op14QConvolutionOpIN7mshadow3cpuEfE7ForwardERKNS_9OpContextERKNSt3__16vectorINS_5TBlobENS8_9allocatorISA_EEEERKNS9_INS_9OpReqTypeENSB_ISG_EEEESF_SF + 2951
[bt] (3) 3 libmxnet.dylib 0x000000010e391304 _ZN5mxnet4exec17ForwardOpExecutor3RunENS_10RunContextE + 116
[bt] (4) 4 libmxnet.dylib 0x000000010e3de1e9 _ZZN5mxnet4exec13GraphExecutor13InitCachedOpsEvENK3$_3clENS_10RunContextENS_6engine18CallbackOnCompleteE + 185
[bt] (5) 5 libmxnet.dylib 0x000000010e3de11f _ZNSt3__128__invoke_void_return_wrapperIvE6__callIJRZN5mxnet4exec13GraphExecutor13InitCachedOpsEvE3$3NS3_10RunContextENS3_6engine18CallbackOnCompleteEEEEvDpOT + 175
[bt] (6) 6 libmxnet.dylib 0x000000010e3dde99 _ZNSt3__110__function6__funcIZN5mxnet4exec13GraphExecutor13InitCachedOpsEvE3$3NS_9allocatorIS5_EEFvNS2_10RunContextENS2_6engine18CallbackOnCompleteEEEclEOS8_OSA + 73
[bt] (7) 7 libmxnet.dylib 0x000000010e346aa3 ZNKSt3__18functionIFvN5mxnet10RunContextENS1_6engine18CallbackOnCompleteEEEclES2_S4 + 179
[bt] (8) 8 libmxnet.dylib 0x000000010e37187f _ZN5mxnet6engine14ThreadedEngine15ExecuteOprBlockENS_10RunContextEPNS0_8OprBlockE + 1199
[bt] (9) 9 libmxnet.dylib 0x000000010e37aa0e _ZN5mxnet6engine23ThreadedEnginePerDevice9CPUWorkerILN4dmlc19ConcurrentQueueTypeE0EEEvPNS1_17ThreadWorkerBlockIXT_EEE + 94
[bt] (10) 10 libmxnet.dylib 0x000000010e37a99f _ZZZN5mxnet6engine23ThreadedEnginePerDevice13PushToExecuteEPNS0_8OprBlockEbENKUlvE_clEvENKUlvE_clEv + 31
[bt] (11) 11 libmxnet.dylib 0x000000010e37a96d ZNSt3__128__invoke_void_return_wrapperIvE6__callIJRZZN5mxnet6engine23ThreadedEnginePerDevice13PushToExecuteEPNS4_8OprBlockEbENKUlvE_clEvEUlvE_EEEvDpOT + 45
[bt] (12) 12 libmxnet.dylib 0x000000010e37a889 _ZNSt3__110__function6__funcIZZN5mxnet6engine23ThreadedEnginePerDevice13PushToExecuteEPNS3_8OprBlockEbENKUlvE_clEvEUlvE_NS_9allocatorIS8_EEFvvEEclEv + 41
[bt] (13) 13 libmxnet.dylib 0x000000010e373dae _ZNKSt3__18functionIFvvEEclEv + 126
[bt] (14) 14 libmxnet.dylib 0x000000010e3738f2 ZNSt3__114__thread_proxyINS_5tupleIJNS_8functionIFvvEEEEEEEEPvS6 + 402
[bt] (15) 15 libsystem_pthread.dylib 0x00007fffcad9793b _pthread_body + 180
[bt] (16) 16 libsystem_pthread.dylib 0x00007fffcad97887 _pthread_body + 0
[bt] (17) 17 libsystem_pthread.dylib 0x00007fffcad9708d thread_start + 13

[16:20:03] /Users/zamyers/mxnet/mxnet/nnvm/include/dmlc/logging.h:300: [16:20:03] /Users/zamyers/mxnet/mxnet/src/engine/./threaded_engine.h:336: [16:20:03] /Users/zamyers/mxnet/mxnet/smd_hpi/src/./q_convolution-inl.h:149: Check failed: data.shape_[1] % mxnet::op::xnor_cpu::BITS_PER_BINARY_WORD == 0 (1 vs. 0) input channel currently have to be multiple of 64 but are: 1

Minimum reproducible example

Replaced

"""
def get_binary_lenet():
    data = mx.symbol.Variable('data')

    # first conv layer
    conv1 = mx.sym.Convolution(data=data, kernel=(5,5), num_filter=64)
    tanh1 = mx.sym.Activation(data=conv1, act_type="tanh")
    pool1 = mx.sym.Pooling(data=tanh1, pool_type="max", kernel=(2,2), stride=(2,2))
    bn1 = mx.sym.BatchNorm(data=pool1)

    # second conv layer
    ba1 = mx.sym.QActivation(data=bn1, act_bit=BITA, backward_only=True)
    conv2 = mx.sym.QConvolution(data=ba1, kernel=(5,5), num_filter=64, act_bit=BITW)
    bn2 = mx.sym.BatchNorm(data=conv2)
    pool2 = mx.sym.Pooling(data=bn2, pool_type="max", kernel=(2,2), stride=(2,2))

    # first fullc layer
    flatten = mx.sym.Flatten(data=pool2)
    ba2 = mx.sym.QActivation(data=flatten, act_bit=BITA, backward_only=True)
    fc1 = mx.symbol.QFullyConnected(data=ba2, num_hidden=1000, act_bit=BITW)
    #fc1 = mx.sym.Custom(data=fc1, op_type='debug')
    bn3 = mx.sym.BatchNorm(data=fc1)
    tanh3 = mx.sym.Activation(data=bn3, act_type="tanh")

    # second fullc
    fc2 = mx.sym.FullyConnected(data=tanh3, num_hidden=10)
    # softmax loss
    lenet = mx.sym.SoftmaxOutput(data=fc2, name='softmax')

    print('using quantized lenet with bitwidth %d (weights), %d (activations) and %d (gradients)' % (BITW, BITA, BITG))
    return lenet

"""

with

"""
def get_binary_lenet():
    data = mx.symbol.Variable('data')

    # first conv layer
    ba0 = mx.sym.QActivation(data=data, act_bit=BITA, backward_only=True)
    conv0 = mx.sym.QConvolution(data=ba0, kernel=(5,5), num_filter=64, act_bit=BITW)
    bn0 = mx.sym.BatchNorm(data=conv0)
    pool0 = mx.sym.Pooling(data=bn0, pool_type="max", kernel=(2,2), stride=(2,2))

    # second conv layer
    ba1 = mx.sym.QActivation(data=pool0, act_bit=BITA, backward_only=True)
    conv2 = mx.sym.QConvolution(data=ba1, kernel=(5,5), num_filter=64, act_bit=BITW)
    bn2 = mx.sym.BatchNorm(data=conv2)
    pool2 = mx.sym.Pooling(data=bn2, pool_type="max", kernel=(2,2), stride=(2,2))

    # first fullc layer
    flatten = mx.sym.Flatten(data=pool2)
    ba2 = mx.sym.QActivation(data=flatten, act_bit=BITA, backward_only=True)
    fc1 = mx.symbol.QFullyConnected(data=ba2, num_hidden=1000, act_bit=BITW)
    #fc1 = mx.sym.Custom(data=fc1, op_type='debug')
    bn3 = mx.sym.BatchNorm(data=fc1)
    tanh3 = mx.sym.Activation(data=bn3, act_type="tanh")

    # second fullc
    fc2 = mx.sym.QFullyConnected(data=tanh3, num_hidden=10, act_bit=BITW)
    # softmax loss
    lenet = mx.sym.SoftmaxOutput(data=fc2, name='softmax')

    print('using quantized lenet with bitwidth %d (weights), %d (activations) and %d (gradients)' % (BITW, BITA, BITG))
    return lenet

"""

Steps to reproduce

  1. Quantized the first layer of the network by duplicating the middle binary convolutional layer (code provided above)
  2. Ran BMXNet without CUDA/GPU enabled
  3. The error occurs

What have you tried to solve it?

  1. Replacing QConvolution with a regular Convolution eliminates the error, but then the network no longer trains, even with the activation quantization set to 32 bits.

Bad indentation in resnet-binary.py

Bad code is frustrating for people who want to reproduce your results. There is clearly incorrect indentation
on line 170 here, so python train_cifar10.py --network resnet-binary cannot be executed as instructed here!

Even a small mistake damages the credibility of the results!

is there any bit-packing implementation?

I just have a quick question:
Did you implement any bit-packing after the 1-bit quantization, and if so, could you please point me to the source code?
Thanks in advance.

incorrect application of tanh-based weight quantization

For multi-bit weight quantization you have implemented the tanh-based squashing function described in the DoReFa-Net paper. However, instead of incorporating its derivative in the weight updates, you simply apply the squashing and quantization in place and ignore the derivative of the squashing operation entirely.

For comparison, here is the DoReFa-Net quantization code. Note how it replaces the gradient of the quantization rounding with identity but does not modify the gradient of the squashing operations:

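    import tensorflow as tf  # TensorFlow 1.x graph-mode API


    def get_dorefa(bitW, bitA):
        # adapted from tensorpack's DoReFa-Net example; the gradient quantizer is omitted
        G = tf.get_default_graph()

        def quantize(x, k):
            n = float(2**k - 1)
            # round() in the forward pass, identity gradient in the backward pass
            with G.gradient_override_map({"Round": "Identity"}):
                return tf.round(x * n) / n

        def fw(x):
            if bitW == 32:
                return x
            if bitW == 1:   # BWN
                with G.gradient_override_map({"Sign": "Identity"}):
                    E = tf.stop_gradient(tf.reduce_mean(tf.abs(x)))
                    return tf.sign(x / E) * E
            # tanh squashing and max normalisation: their gradients are NOT overridden
            x = tf.tanh(x)
            x = x / tf.reduce_max(tf.abs(x)) * 0.5 + 0.5
            return 2 * quantize(x, bitW) - 1

        def fa(x):
            if bitA == 32:
                return x
            return quantize(x, bitA)

        return fw, fa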
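To make the difference concrete, here is a NumPy sketch of the gradient that the TensorFlow code above produces for the multi-bit weight path (bitW > 1), written out by hand. Only the rounding step is replaced by the identity (straight-through estimator); the derivatives of tanh and of the max normalisation are kept, because the code above does not stop gradients through them. With rounding treated as identity, 2*y - 1 simplifies to tanh(w) / max|tanh(w)|, which is what the backward pass differentiates. The function names are hypothetical, and the sketch assumes |tanh(w)| has a unique maximum.

```python
import numpy as np


def quantize(x, k):
    n = float(2 ** k - 1)
    return np.round(x * n) / n


def fw_forward(w, k):
    """Multi-bit DoReFa weight quantizer (forward pass)."""
    t = np.tanh(w)
    m = np.max(np.abs(t))
    return 2.0 * quantize(t / (2.0 * m) + 0.5, k) - 1.0


def fw_backward(w, grad_out):
    """dL/dw given dL/d(fw(w)), with round() treated as identity.

    The surrogate being differentiated is tanh(w) / m, where m = max|tanh(w)|.
    """
    t = np.tanh(w)
    dt = 1.0 - t ** 2                        # derivative of tanh
    a = np.argmax(np.abs(t))                 # element that defines m
    m = np.abs(t.flat[a])
    grad = grad_out * dt / m                 # path through tanh(w_i) / m
    # path through m: d(t_j / m)/dm = -t_j / m**2, and dm/dw_a = sign(t_a) * dt_a
    grad.flat[a] -= np.sum(grad_out * t) / m ** 2 * np.sign(t.flat[a]) * dt.flat[a]
    return grad


w = np.array([0.3, -1.2, 0.7, 2.0])
g = np.array([1.0, -0.5, 0.25, 2.0])
print(fw_forward(w, 2))
print(fw_backward(w, g))
```

Ignoring the squashing derivative, as described in this issue, would roughly correspond to returning grad_out unchanged instead, which over-weights large-magnitude weights whose tanh derivative is close to zero.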
