aovoc / nnieqat-pytorch

A quantization-aware training tool for NNIE on PyTorch.

License: MIT License

Languages: Makefile 9.97%, Python 80.69%, Dockerfile 0.78%, C++ 1.69%, CUDA 6.86%

Topics: nnie, nnieqat-pytorch, pytorch, quantized-training

nnieqat-pytorch's Introduction

nnieqat-pytorch

nnieqat is a quantization-aware training package for the Neural Network Inference Engine (NNIE) on PyTorch. It uses the HiSilicon quantization library to quantize module weights and activations in fake fp32 format, meaning values are quantized and then dequantized but stay fp32 tensors.
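To make "fake fp32" concrete: a fake-quantized tensor is rounded onto an integer grid and immediately mapped back to floating point, so training runs in fp32 but sees the rounding error the int8 target will introduce. The sketch below swaps in plain symmetric, per-tensor uniform int8 quantization as an assumption; it is not the HiSVP GFPQ algorithm, whose format is defined by the HiSilicon library, and `fake_quantize` is a hypothetical name.

```python
from typing import List, Sequence


def fake_quantize(values: Sequence[float], num_bits: int = 8) -> List[float]:
    """Quantize to a symmetric signed-integer grid, then dequantize back to float.

    Hypothetical helper using a generic uniform scheme for illustration;
    the HiSVP GFPQ library used by nnieqat has its own format.
    """
    qmax = 2 ** (num_bits - 1) - 1            # 127 for 8 bits
    max_abs = max((abs(v) for v in values), default=0.0)
    if max_abs == 0.0:
        return [0.0 for _ in values]          # all zeros: nothing to scale
    scale = max_abs / qmax                    # float distance between integer levels
    result = []
    for v in values:
        q = round(v / scale)                  # nearest integer level
        q = max(-qmax, min(qmax, q))          # clamp to the representable range
        result.append(q * scale)              # dequantize: still a float, but on the int8 grid
    return result


if __name__ == "__main__":
    weights = [0.0, 0.3, -1.27, 0.0051]
    print(fake_quantize(weights))
```

The result is still a list of floats, but every value is an integer multiple of `scale`, which is what lets the training loop account for quantization error.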

Table of Contents
- Installation
- Usage
- Code Examples
- Results
- Todo
- Reference

Installation

Supported Platforms: Linux
Accelerators and GPUs: NVIDIA GPUs via CUDA driver 10.1 or 10.2.
Dependencies:
- python >= 3.5, < 4
- llvmlite >= 0.31.0
- pytorch >= 1.5
- numba >= 0.42.0
- numpy >= 1.18.1
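Several of the issues below trace back to environment mismatches, for example CUDA 10.0 instead of 10.1/10.2. As a hedged pre-flight check, the hypothetical helper below compares version strings against the requirements listed above. Feeding it `torch.__version__` and `torch.version.cuda` is an assumption about how you would call it; it is not part of nnieqat.

```python
from typing import List, Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn strings like '1.6.0+cu102' or '10.2' into a comparable tuple of ints."""
    core = text.split("+")[0]                  # drop local build tags such as +cu102
    parts = []
    for piece in core.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break                          # stop at suffixes like 'a0' or 'rc1'
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def check_environment(torch_version: str, cuda_version: Optional[str]) -> List[str]:
    """Return a list of problems; an empty list means the README requirements are met.

    Minimums come from the README list; raise the PyTorch minimum to 1.6
    if your code path needs torch.nn.Hardswish.
    """
    problems = []
    if parse_version(torch_version) < (1, 5):
        problems.append(f"pytorch {torch_version} < 1.5")
    if cuda_version is None:
        problems.append("no CUDA build detected")
    elif parse_version(cuda_version)[:2] not in {(10, 1), (10, 2)}:
        problems.append(f"CUDA {cuda_version} is not 10.1 or 10.2")
    return problems
```

Usage, assuming PyTorch is installed: `print(check_environment(torch.__version__, torch.version.cuda))`.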
Install nnieqat via PyPI:
```
$ pip install nnieqat
```
Install nnieqat in Docker (an easy way to avoid environment problems):
```
$ cd docker
$ docker build -t nnieqat-image .
```

Install nnieqat from the repository:
```
$ git clone https://github.com/aovoc/nnieqat-pytorch
$ cd nnieqat-pytorch
$ make install
```

Usage

Add the quantization hook.

Weights and data (activations) are quantized and dequantized with the HiSVP GFPQ library during the forward() pass.

```python
from nnieqat import quant_dequant_weight, unquant_weight, merge_freeze_bn, register_quantization_hook
...
...
    register_quantization_hook(model)
...
```
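`register_quantization_hook` is nnieqat's own function; the toy below only illustrates the general idea of a forward pre-hook, using a hypothetical `Layer` class in plain Python instead of `torch.nn.Module`. Its hook signature is simplified and differs from PyTorch's `register_forward_pre_hook`, and whether nnieqat uses exactly that mechanism is an assumption. The hook substitutes fake-quantized weights just before the forward computation while the layer keeps its float weights as the master copy.

```python
from typing import Callable, List

WeightHook = Callable[[List[float]], List[float]]


def fake_quantize(values: List[float], num_bits: int = 8) -> List[float]:
    """Symmetric per-tensor quantize -> dequantize (generic scheme, not GFPQ)."""
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = max((abs(v) for v in values), default=0.0)
    if max_abs == 0.0:
        return list(values)
    scale = max_abs / qmax
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


class Layer:
    """Hypothetical stand-in for a weighted module: output_i = weight_i * x."""

    def __init__(self, weights: List[float]) -> None:
        self.weights = list(weights)            # float master copy, never overwritten by hooks
        self._pre_hooks: List[WeightHook] = []

    def register_forward_pre_hook(self, hook: WeightHook) -> None:
        self._pre_hooks.append(hook)

    def __call__(self, x: float) -> List[float]:
        w = self.weights
        for hook in self._pre_hooks:
            w = hook(w)                         # each hook may substitute the weights used in this pass
        return [wi * x for wi in w]


def quantization_hook(weights: List[float]) -> List[float]:
    """Return fake-quantized copies of the weights for the forward pass."""
    return fake_quantize(weights)
```

After registering `quantization_hook`, a small weight such as 0.0031 contributes exactly zero to the output, while `layer.weights` still holds 0.0031 for the optimizer.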

Merge BN weights into conv and freeze BN

Fine-tuning from a well-trained model is recommended; in that case, call merge_freeze_bn at the beginning. Otherwise, call it after a few epochs of training.

```python
from nnieqat import quant_dequant_weight, unquant_weight, merge_freeze_bn, register_quantization_hook
...
...
    model.train()
    model = merge_freeze_bn(model)  # switches BN to eval() mode during training
...
```
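Folding BN into the preceding convolution relies on a standard identity: in eval mode BN computes gamma * (z - mean) / sqrt(var + eps) + beta, which is affine in the conv output z = w*x + b, so it can be absorbed per output channel as w' = w * gamma / sqrt(var + eps) and b' = (b - mean) * gamma / sqrt(var + eps) + beta. The identity only holds while BN uses fixed running statistics, which is presumably why merge_freeze_bn puts BN into eval() mode. The plain-Python sketch below checks the identity for one output channel; the helper names are hypothetical and nnieqat's implementation may differ in detail.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class BatchNormStats:
    gamma: float      # learned scale
    beta: float       # learned shift
    mean: float       # running mean (frozen in eval mode)
    var: float        # running variance (frozen in eval mode)
    eps: float = 1e-5


def fold_bn(weights: List[float], bias: float, bn: BatchNormStats) -> Tuple[List[float], float]:
    """Fold one output channel's BN into that channel's conv weights and bias."""
    factor = bn.gamma / math.sqrt(bn.var + bn.eps)
    new_weights = [w * factor for w in weights]
    new_bias = (bias - bn.mean) * factor + bn.beta
    return new_weights, new_bias


def conv_channel(weights: List[float], bias: float, inputs: List[float]) -> float:
    """One conv output value: dot product over the receptive field plus bias."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


def bn_eval(z: float, bn: BatchNormStats) -> float:
    """BatchNorm in eval mode, using the running statistics."""
    return bn.gamma * (z - bn.mean) / math.sqrt(bn.var + bn.eps) + bn.beta
```

`conv_channel(*fold_bn(w, b, bn), x)` matches `bn_eval(conv_channel(w, b, x), bn)` up to floating-point error, so the BN layer can be dropped after folding.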

Unquantize weights before updating them

```python
from nnieqat import quant_dequant_weight, unquant_weight, merge_freeze_bn, register_quantization_hook
...
...
    model.apply(unquant_weight)  # use the original (float) weights while updating
    optimizer.step()
...
```
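Why restore the original weights before `optimizer.step()`? Quantized weights sit on a coarse grid, so an update smaller than half a quantization step is rounded away on the next quantize-dequantize and never accumulates. The toy below (plain Python, one scalar weight, a hypothetical fixed step size) contrasts updating a float master copy with updating the quantized value directly. It illustrates the general master-weight idea, not nnieqat internals.

```python
STEP = 0.01  # hypothetical fixed quantization step for this toy


def quant_dequant(w: float, step: float = STEP) -> float:
    """Round a weight to the nearest multiple of `step` (fake quantization)."""
    return round(w / step) * step


def train(num_iters: int, grad: float, lr: float, update_master: bool) -> float:
    """Run SGD on a single weight and return the quantized weight used at inference."""
    w = 0.5
    for _ in range(num_iters):
        wq = quant_dequant(w)          # the forward pass would use the quantized weight
        if update_master:
            w = w - lr * grad          # like unquant_weight + optimizer.step(): update the float copy
        else:
            w = wq - lr * grad         # update the quantized value; it is re-rounded next iteration
    return quant_dequant(w)
```

With `lr * grad = 0.001`, well below half of `STEP`, `train(100, 1.0, 0.001, update_master=False)` stays stuck at 0.5, while the master-copy variant moves to 0.4.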

Dump the model with quantized weights

```python
from nnieqat import quant_dequant_weight, unquant_weight, merge_freeze_bn, register_quantization_hook
...
...
    model.apply(quant_dequant_weight)
    save_checkpoint(...)
    model.apply(unquant_weight)
...
```
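The dump pattern temporarily replaces the weights with their quantize-dequantize values, saves, then restores the float weights so training can continue. A context manager makes the restore step hard to forget, even if saving fails. This is a hypothetical helper over a plain dict of weight lists standing in for a state dict, with a toy rounding function standing in for `quant_dequant_weight`; with nnieqat itself you would keep the `model.apply(...)` calls as written.

```python
import copy
from contextlib import contextmanager
from typing import Callable, Dict, Iterator, List

Weights = Dict[str, List[float]]


@contextmanager
def quantized_weights(weights: Weights,
                      quant_fn: Callable[[List[float]], List[float]]) -> Iterator[Weights]:
    """Temporarily replace every tensor in `weights` with quant_fn(tensor); restore on exit."""
    originals = copy.deepcopy(weights)         # float master copies
    try:
        for name in list(weights):
            weights[name] = quant_fn(weights[name])
        yield weights
    finally:
        weights.clear()
        weights.update(originals)              # restore even if the save step raised


def round_to_step(tensor: List[float], step: float = 0.25) -> List[float]:
    """Hypothetical stand-in for quant_dequant_weight: round to a fixed grid."""
    return [round(v / step) * step for v in tensor]
```

Inside `with quantized_weights(state, round_to_step):` you would call your own save routine; after the block, `state` holds the float weights again.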

Use EMA with caution (not recommended).

Code Examples

CIFAR-10 quantization-aware training example (nnieqat added to pytorch_cifar10_tutorial):
```
python test/test_cifar10.py
```
ImageNet quantization fine-tuning example (nnieqat added to pytorh_imagenet_main.py):
```
python test/test_imagenet.py --pretrained path_to_imagenet_dataset
```

Results

ImageNet

```
python test/test_imagenet.py /data/imgnet/ --arch squeezenet1_1  --lr 0.001 --pretrained --epoch 10   # nnie_lr_e-3_ft
python pytorh_imagenet_main.py /data/imgnet/ --arch squeezenet1_1  --lr 0.0001 --pretrained --epoch 10  # lr_e-4_ft
python test/test_imagenet.py /data/imgnet/ --arch squeezenet1_1  --lr 0.0001 --pretrained --epoch 10  # nnie_lr_e-4_ft
```

Fine-tuning results (squeezenet1_1):

| Model | trt_fp32 | trt_int8 | nnie |
|---|---|---|---|
| torchvision | 0.56992 | 0.56424 | 0.56026 |
| nnie_lr_e-3_ft | 0.56600 | 0.56328 | 0.56612 |
| lr_e-4_ft | 0.57884 | 0.57502 | 0.57542 |
| nnie_lr_e-4_ft | 0.57834 | 0.57524 | 0.57730 |
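One way to read the table is the gap between each row's TensorRT fp32 accuracy and its NNIE accuracy. The snippet below only recomputes those gaps from the numbers above; it adds no new measurements.

```python
# Accuracies copied from the fine-tuning table: (trt_fp32, trt_int8, nnie)
RESULTS = {
    "torchvision": (0.56992, 0.56424, 0.56026),
    "nnie_lr_e-3_ft": (0.56600, 0.56328, 0.56612),
    "lr_e-4_ft": (0.57884, 0.57502, 0.57542),
    "nnie_lr_e-4_ft": (0.57834, 0.57524, 0.57730),
}


def nnie_drop(row: str) -> float:
    """Accuracy lost going from TensorRT fp32 to NNIE for one row (negative = gain)."""
    fp32, _int8, nnie = RESULTS[row]
    return round(fp32 - nnie, 5)


for name in RESULTS:
    print(f"{name:16s} fp32 -> nnie drop: {nnie_drop(name):+.5f}")
```

By this measure the two nnie-aware fine-tuning runs lose at most 0.00104 between fp32 and NNIE, compared with 0.00966 for the unmodified torchvision weights and 0.00342 for plain fine-tuning.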

COCO

Network: simplified YOLOv5s

Trained for 300 epochs; Hi3559 test results:

Fine-tuned for 20 epochs; Hi3559 test results:

Todo

Generate quantized model directly.

Reference

HiSVP Quantization Library User Guide (HiSVP 量化库使用指南)

Quantizing deep convolutional networks for efficient inference: A whitepaper

8-bit Inference with TensorRT

Distilling the Knowledge in a Neural Network


nnieqat-pytorch's Issues

Why does the loss increase so much after adding nnieqat?

Thank you for sharing. I added nnieqat to my segmentation training task, which had already converged, but the loss grows after some batches and the segmentation stops working. What could cause this?

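Hello! How do I use nnieqat in my own project?

Hello!
Thank you very much for sharing your work on nnieqat.
I am currently porting the YOLOv3 object detection algorithm to the HiSilicon 3519 chip. The problem is that accuracy drops significantly after int8 quantization, so I would like to follow your work to quantize the model.
I'm not sure I am applying the method described in your work correctly, so let me briefly state my understanding (using v5 as an example):
1) Fine-tune a trained PyTorch YOLOv5 model with the nnieqat quantization library, which yields an int8-quantized model file
2) Associate the corresponding layers of the PyTorch model and the converted Caffe model
3) Write the quantization information into the gfpq_param_file of the nnie_mapper cfg
4) How do I then convert it to a wk model file?

Thanks!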

merge_freeze_bn bug?

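Are lines 188-190 of quantize.py necessary? When testing mobilenet_v2, I found that the results were only correct after commenting out these three lines. Could you take a look?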

import test() in __init__ will occupy GPU memory

You can delete it here: https://github.com/aovoc/nnieqat-pytorch/blob/master/nnieqat/__init__.py#L12

libcublas.so.10 not found

Environment

Ubuntu 16.04
Miniconda installed Python3.7.3
nnieqat-pytorch, installed via git clone
NVIDIA RTX 2080, Driver Version 418.67, Driver CUDA Version 10.1
CUDA 10.0, located in /usr/local/cuda-10.0

The files in /usr/local/cuda-10.0/lib64 are:

libaccinj64.so                libcufftw.so             libnppc.so.10.0.130     libnppig.so.10.0.130   libnvblas.so.10.0.130
libaccinj64.so.10.0           libcufftw.so.10.0        libnppc_static.a        libnppig_static.a      libnvgraph.so
libaccinj64.so.10.0.130       libcufftw.so.10.0.145    libnppial.so            libnppim.so            libnvgraph.so.10.0
libcublas.so                  libcufftw_static.a       libnppial.so.10.0       libnppim.so.10.0       libnvgraph.so.10.0.130
libcublas.so.10               libcuinj64.so            libnppial.so.10.0.130   libnppim.so.10.0.130   libnvgraph_static.a
libcublas.so.10.0             libcuinj64.so.10.0       libnppial_static.a      libnppim_static.a      libnvjpeg.so
libcublas.so.10.0.130         libcuinj64.so.10.0.130   libnppicc.so            libnppist.so           libnvjpeg.so.10.0
libcublas_static.a            libculibos.a             libnppicc.so.10.0       libnppist.so.10.0      libnvjpeg.so.10.0.130
libcudadevrt.a                libcurand.so             libnppicc.so.10.0.130   libnppist.so.10.0.130  libnvjpeg.so.10.0.318
libcudart.so                  libcurand.so.10.0        libnppicc_static.a      libnppist_static.a     libnvjpeg_static.a
libcudart.so.10               libcurand.so.10.0.130    libnppicom.so           libnppisu.so           libnvrtc-builtins.so
libcudart.so.10.0             libcurand_static.a       libnppicom.so.10.0      libnppisu.so.10.0      libnvrtc-builtins.so.10.0
libcudart.so.10.0.130         libcusolver.so           libnppicom.so.10.0.130  libnppisu.so.10.0.130  libnvrtc-builtins.so.10.0.130
libcudart.so.10.1             libcusolver.so.10.0      libnppicom_static.a     libnppisu_static.a     libnvrtc.so
libcudart_static.a            libcusolver.so.10.0.130  libnppidei.so           libnppitc.so           libnvrtc.so.10.0
libcudnn.so                   libcusolver_static.a     libnppidei.so.10.0      libnppitc.so.10.0      libnvrtc.so.10.0.130
libcudnn.so.7                 libcusparse.so           libnppidei.so.10.0.130  libnppitc.so.10.0.130  libnvToolsExt.so
libcudnn.so.7.5.0             libcusparse.so.10.0      libnppidei_static.a     libnppitc_static.a     libnvToolsExt.so.1
libcudnn_static.a             libcusparse.so.10.0.130  libnppif.so             libnpps.so             libnvToolsExt.so.1.0.0
libcufft.so                   libcusparse_static.a     libnppif.so.10.0        libnpps.so.10.0        libOpenCL.so
libcufft.so.10.0              liblapack_static.a       libnppif.so.10.0.130    libnpps.so.10.0.130    libOpenCL.so.1
libcufft.so.10.0.145          libmetis_static.a        libnppif_static.a       libnpps_static.a       libOpenCL.so.1.1
libcufft_static.a             libnppc.so               libnppig.so             libnvblas.so           stubs
libcufft_static_nocallback.a  libnppc.so.10.0          libnppig.so.10.0        libnvblas.so.10.0

To reproduce

Build and install:

```
git clone https://github.com/aovoc/nnieqat-pytorch
cd nnieqat-pytorch
python setup.py build
python setup.py install
```

Write a test.py file containing only one line:

```python
from nnieqat.modules import convert_layers
```

```
python test.py
```
Got this error message:

(base) zz% python test.py
Error: Please import nniepat before torch modules.
Traceback (most recent call last):
  File "test.py", line 4, in <module>
    from nnieqat.modules import convert_layers
  File "/home/zz/soft/miniconda3/lib/python3.7/site-packages/nnieqat-0.1.0b0-py3.7.egg/nnieqat/modules/__init__.py", line 4, in <module>
    from .linear import Linear, Bilinear
  File "/home/zz/soft/miniconda3/lib/python3.7/site-packages/nnieqat-0.1.0b0-py3.7.egg/nnieqat/modules/linear.py", line 1, in <module>
    import nnieqat.gpu.quantize as Q
  File "/home/zz/soft/miniconda3/lib/python3.7/site-packages/nnieqat-0.1.0b0-py3.7.egg/nnieqat/gpu/__init__.py", line 5, in <module>
    from .quantize import QuantAndDeQuantGPU, test, quant_weight, \
  File "/home/zz/soft/miniconda3/lib/python3.7/site-packages/nnieqat-0.1.0b0-py3.7.egg/nnieqat/gpu/quantize.py", line 69, in <module>
    _QUANT_HANDLE = QuantAndDeQuantGPU()
  File "/home/zz/soft/miniconda3/lib/python3.7/site-packages/nnieqat-0.1.0b0-py3.7.egg/nnieqat/gpu/quantize.py", line 32, in __init__
    self._libquant = ctypes.cdll.LoadLibrary(libquant_path)
  File "/home/zz/soft/miniconda3/lib/python3.7/ctypes/__init__.py", line 434, in LoadLibrary
    return self._dlltype(name)
  File "/home/zz/soft/miniconda3/lib/python3.7/ctypes/__init__.py", line 356, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /usr/local/cuda-10.0/lib64/libcublas.so.10: version `libcublas.so.10' not found (required by /home/zz/soft/miniconda3/lib/python3.7/site-packages/nnieqat-0.1.0b0-py3.7.egg/nnieqat/gpu/lib/libgfpq_gpu.so)

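Problems with YOLOv5 quantization-aware training

(1) Hello, following the README, I added quantization to the YOLOv5 source code for training, but a problem occurs: after three epochs of training, mAP drops sharply, as shown in the figure below:

(2) Loading the model from epoch 2 works on the GPU, but loading it on the CPU raises the error below. Even when running GPU detection with the epoch-2 model, nothing is detected and the returned values are all NaN:
Running ckpt = torch.load(w, map_location=map_location) raises:
Error at driver init:
[100] Call to cuInit results in CUDA_ERROR_NO_DEVICE:
(3) Could you open-source your YOLOv5 quantization-aware training code? Any advice would be greatly appreciated. Thanks!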

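Quantization-aware training for detection networks

Thanks to the author for open-sourcing this project. I have a question:

QAT for classification networks is fairly straightforward, but I didn't see anything on QAT for object detection. How do I apply quantization-aware training to object detection?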

Can QAT of GRU models be supported?

Does it support the upsampling layer?

```python
__all__ = [
    'Linear', 'Bilinear', 'Conv1d', 'Conv2d', 'Conv3d', 'ConvTranspose1d',
    'ConvTranspose2d', 'ConvTranspose3d', 'AvgPool1d', 'AvgPool2d',
    'AvgPool3d', 'MaxPool1d', 'MaxPool2d', 'MaxPool3d', 'MaxUnpool1d',
    'MaxUnpool2d', 'MaxUnpool3d', 'FractionalMaxPool2d', 'LPPool1d',
    'LPPool2d', 'AdaptiveMaxPool1d', 'AdaptiveMaxPool2d', 'AdaptiveMaxPool3d',
    'AdaptiveAvgPool1d', 'AdaptiveAvgPool2d', 'AdaptiveAvgPool3d'
]
```
Are these all the layer types in nnieqat/modules/__init__.py? If I have any other unsupported layer, how do I use the code?

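Parameter export question

Thanks for your work. I have a question: after training and saving the model, I want to export the network parameters for deployment on an embedded chip or FPGA, but the exported parameters are not INT8; they are still floating point. How should I do this?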

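torch.nn.Hardswish error

Hello! When running the test_cifar10.py example, I got an error at this line in quantize.py: if quant_activation and isinstance(module, (torch.nn.ReLU, torch.nn.Hardswish, torch.nn.ELU)): saying that torch.nn.Hardswish is not defined. Checking the PyTorch website, I found that versions before 1.6 do not document torch.nn.Hardswish. After upgrading PyTorch from 1.5 to 1.6, it works. Could you check whether the version requirements need updating? Thanks!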
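One way to keep an activation check like the one quoted in this issue working on PyTorch 1.5, where `torch.nn.Hardswish` does not exist, is to build the tuple of activation types from whatever the installed `torch.nn` provides. The sketch takes the namespace as a parameter so it can be tried without PyTorch; passing `torch.nn` is the intended use, and the helper name is hypothetical. Upgrading to 1.6, as the reporter did, remains the simpler fix.

```python
from typing import Any, Iterable, Tuple

# Activation class names checked by the isinstance() line quoted in the issue.
ACTIVATION_NAMES = ("ReLU", "Hardswish", "ELU")


def available_activation_types(nn: Any,
                               names: Iterable[str] = ACTIVATION_NAMES) -> Tuple[type, ...]:
    """Collect the activation classes that exist in `nn`, skipping missing ones.

    Intended to be called with torch.nn; on PyTorch 1.5 'Hardswish' is absent
    and is left out instead of raising AttributeError.
    """
    return tuple(getattr(nn, name) for name in names if hasattr(nn, name))
```

The quoted condition would then read `isinstance(module, available_activation_types(torch.nn))`.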

Can PReLU be quantized?

In the "_quantizing_activation" function, only ReLU, ELU and Hardswish are handled. Can PReLU be added here?

undefined symbol: _ZN6caffe26detail37_typeMetaDataInstance_preallocated_10E

OS: Ubuntu 18.04
Python version: 3.7.9
CUDA version: 10.2
pytorch: 1.6.0
numpy: 1.19.1
llvmlite: 0.34.0
Below is the output of make install, which looks normal.
(torch_env) guide@guide:~/Desktop/ErnestLi/Packages/nnieqat-pytorch-master$ make install
python setup.py install
make[1]: Entering directory '/home/guide/Desktop/ErnestLi/Packages/nnieqat-pytorch-master'
make[1]: Nothing to be done for 'default'.
make[1]: Leaving directory '/home/guide/Desktop/ErnestLi/Packages/nnieqat-pytorch-master'
running install
running bdist_egg
running egg_info
creating nnieqat.egg-info
writing nnieqat.egg-info/PKG-INFO
writing dependency_links to nnieqat.egg-info/dependency_links.txt
writing requirements to nnieqat.egg-info/requires.txt
writing top-level names to nnieqat.egg-info/top_level.txt
writing manifest file 'nnieqat.egg-info/SOURCES.txt'
reading manifest file 'nnieqat.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'nnieqat.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
creating build
creating build/lib.linux-x86_64-3.7
creating build/lib.linux-x86_64-3.7/nnieqat
copying nnieqat/quantize.py -> build/lib.linux-x86_64-3.7/nnieqat
copying nnieqat/__init__.py -> build/lib.linux-x86_64-3.7/nnieqat
running build_ext
building 'quant_impl' extension
creating /home/guide/Desktop/ErnestLi/Packages/nnieqat-pytorch-master/build/temp.linux-x86_64-3.7
creating /home/guide/Desktop/ErnestLi/Packages/nnieqat-pytorch-master/build/temp.linux-x86_64-3.7/src
Emitting ninja build file /home/guide/Desktop/ErnestLi/Packages/nnieqat-pytorch-master/build/temp.linux-x86_64-3.7/build.ninja...
Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/1] c++ -MMD -MF /home/guide/Desktop/ErnestLi/Packages/nnieqat-pytorch-master/build/temp.linux-x86_64-3.7/src/fake_quantize.o.d -pthread -B /home/guide/anaconda3/envs/torch_env/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/guide/anaconda3/envs/torch_env/lib/python3.7/site-packages/torch/include -I/home/guide/anaconda3/envs/torch_env/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/guide/anaconda3/envs/torch_env/lib/python3.7/site-packages/torch/include/TH -I/home/guide/anaconda3/envs/torch_env/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda-10.2/include -I/home/guide/anaconda3/envs/torch_env/include/python3.7m -c -c /home/guide/Desktop/ErnestLi/Packages/nnieqat-pytorch-master/src/fake_quantize.cpp -o /home/guide/Desktop/ErnestLi/Packages/nnieqat-pytorch-master/build/temp.linux-x86_64-3.7/src/fake_quantize.o -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=quant_impl -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
In file included from /home/guide/anaconda3/envs/torch_env/lib/python3.7/site-packages/torch/include/ATen/Parallel.h:149:0,
from /home/guide/anaconda3/envs/torch_env/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/utils.h:3,
from /home/guide/anaconda3/envs/torch_env/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:5,
from /home/guide/anaconda3/envs/torch_env/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/nn.h:3,
from /home/guide/anaconda3/envs/torch_env/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/all.h:7,
from /home/guide/anaconda3/envs/torch_env/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/torch.h:3,
from /home/guide/Desktop/ErnestLi/Packages/nnieqat-pytorch-master/src/fake_quantize.h:9,
from /home/guide/Desktop/ErnestLi/Packages/nnieqat-pytorch-master/src/fake_quantize.cpp:1:
/home/guide/anaconda3/envs/torch_env/lib/python3.7/site-packages/torch/include/ATen/ParallelOpenMP.h:84:0: warning: ignoring #pragma omp parallel [-Wunknown-pragmas]
#pragma omp parallel for if ((end - begin) >= grain_size)

g++ -pthread -shared -B /home/guide/anaconda3/envs/torch_env/compiler_compat -L/home/guide/anaconda3/envs/torch_env/lib -Wl,-rpath=/home/guide/anaconda3/envs/torch_env/lib -Wl,--no-as-needed -Wl,--sysroot=/ /home/guide/Desktop/ErnestLi/Packages/nnieqat-pytorch-master/build/temp.linux-x86_64-3.7/./src/fake_quantize.o -Lobj -L/home/guide/anaconda3/envs/torch_env/lib/python3.7/site-packages/torch/lib -L/usr/local/cuda-10.2/lib64 -lquant_impl -lc10 -ltorch -ltorch_cpu -ltorch_python -lcudart -lc10_cuda -ltorch_cuda -o build/lib.linux-x86_64-3.7/quant_impl.cpython-37m-x86_64-linux-gnu.so
creating build/bdist.linux-x86_64
creating build/bdist.linux-x86_64/egg
creating build/bdist.linux-x86_64/egg/nnieqat
copying build/lib.linux-x86_64-3.7/nnieqat/quantize.py -> build/bdist.linux-x86_64/egg/nnieqat
copying build/lib.linux-x86_64-3.7/nnieqat/__init__.py -> build/bdist.linux-x86_64/egg/nnieqat
copying build/lib.linux-x86_64-3.7/quant_impl.cpython-37m-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/egg
byte-compiling build/bdist.linux-x86_64/egg/nnieqat/quantize.py to quantize.cpython-37.pyc
byte-compiling build/bdist.linux-x86_64/egg/nnieqat/__init__.py to __init__.cpython-37.pyc
creating stub loader for quant_impl.cpython-37m-x86_64-linux-gnu.so
byte-compiling build/bdist.linux-x86_64/egg/quant_impl.py to quant_impl.cpython-37.pyc
creating build/bdist.linux-x86_64/egg/EGG-INFO
copying nnieqat.egg-info/PKG-INFO -> build/bdist.linux-x86_64/egg/EGG-INFO
copying nnieqat.egg-info/SOURCES.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying nnieqat.egg-info/dependency_links.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying nnieqat.egg-info/requires.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying nnieqat.egg-info/top_level.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
writing build/bdist.linux-x86_64/egg/EGG-INFO/native_libs.txt
zip_safe flag not set; analyzing archive contents...
__pycache__.quant_impl.cpython-37: module references __file__
nnieqat.__pycache__.quantize.cpython-37: module references __file__
creating dist
creating 'dist/nnieqat-0.1.0-py3.7-linux-x86_64.egg' and adding 'build/bdist.linux-x86_64/egg' to it
removing 'build/bdist.linux-x86_64/egg' (and everything under it)
Processing nnieqat-0.1.0-py3.7-linux-x86_64.egg
removing '/home/guide/anaconda3/envs/torch_env/lib/python3.7/site-packages/nnieqat-0.1.0-py3.7-linux-x86_64.egg' (and everything under it)
creating /home/guide/anaconda3/envs/torch_env/lib/python3.7/site-packages/nnieqat-0.1.0-py3.7-linux-x86_64.egg
Extracting nnieqat-0.1.0-py3.7-linux-x86_64.egg to /home/guide/anaconda3/envs/torch_env/lib/python3.7/site-packages
nnieqat 0.1.0 is already the active version in easy-install.pth

Installed /home/guide/anaconda3/envs/torch_env/lib/python3.7/site-packages/nnieqat-0.1.0-py3.7-linux-x86_64.egg
Processing dependencies for nnieqat==0.1.0
Searching for numpy==1.19.1
Best match: numpy 1.19.1
Adding numpy 1.19.1 to easy-install.pth file
Installing f2py script to /home/guide/anaconda3/envs/torch_env/bin
Installing f2py3 script to /home/guide/anaconda3/envs/torch_env/bin
Installing f2py3.7 script to /home/guide/anaconda3/envs/torch_env/bin

Using /home/guide/anaconda3/envs/torch_env/lib/python3.7/site-packages
Searching for numba==0.51.2
Best match: numba 0.51.2
Processing numba-0.51.2-py3.7-linux-x86_64.egg
numba 0.51.2 is already the active version in easy-install.pth
Installing numba script to /home/guide/anaconda3/envs/torch_env/bin
Installing pycc script to /home/guide/anaconda3/envs/torch_env/bin

Using /home/guide/anaconda3/envs/torch_env/lib/python3.7/site-packages/numba-0.51.2-py3.7-linux-x86_64.egg
Searching for torch==1.6.0
Best match: torch 1.6.0
Adding torch 1.6.0 to easy-install.pth file
Installing convert-caffe2-to-onnx script to /home/guide/anaconda3/envs/torch_env/bin
Installing convert-onnx-to-caffe2 script to /home/guide/anaconda3/envs/torch_env/bin

Using /home/guide/anaconda3/envs/torch_env/lib/python3.7/site-packages
Searching for setuptools==50.3.0.post20201006
Best match: setuptools 50.3.0.post20201006
Adding setuptools 50.3.0.post20201006 to easy-install.pth file
Installing easy_install script to /home/guide/anaconda3/envs/torch_env/bin

Using /home/guide/anaconda3/envs/torch_env/lib/python3.7/site-packages
Searching for llvmlite==0.34.0
Best match: llvmlite 0.34.0
Adding llvmlite 0.34.0 to easy-install.pth file

Using /home/guide/anaconda3/envs/torch_env/lib/python3.7/site-packages
Searching for future==0.18.2
Best match: future 0.18.2
Processing future-0.18.2-py3.7.egg
future 0.18.2 is already the active version in easy-install.pth
Installing futurize script to /home/guide/anaconda3/envs/torch_env/bin
Installing pasteurize script to /home/guide/anaconda3/envs/torch_env/bin

Using /home/guide/anaconda3/envs/torch_env/lib/python3.7/site-packages/future-0.18.2-py3.7.egg
Finished processing dependencies for nnieqat==0.1.0
But when I enter the Python environment and run import nnieqat, I get this error:

import nnieqat
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/guide/Desktop/ErnestLi/Packages/nnieqat-pytorch-master/nnieqat/__init__.py", line 5, in <module>
from .quantize import quant_dequant_weight, unquant_weight, freeze_bn,
File "/home/guide/Desktop/ErnestLi/Packages/nnieqat-pytorch-master/nnieqat/quantize.py", line 12, in <module>
from quant_impl import fake_quantize
ImportError: /home/guide/anaconda3/envs/torch_env/lib/python3.7/site-packages/nnieqat-0.1.0-py3.7-linux-x86_64.egg/quant_impl.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _ZN6caffe26detail37_typeMetaDataInstance_preallocated_10E

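Very low scores for small objects after quantization

Hello, I trained YOLOv3 with darknet and converted it to NNIE. Running the NNIE sample with the classes changed to two (person, head), large person objects are predicted normally, but small head objects get very low scores, many only 0.0?, and I had to set the threshold to 0.01 to get these boxes out. After converting from darknet to Caffe, the scores are above 0.3 and everything is normal.
Is this caused by NNIE's quantization scheme? Would it improve after training with your code? Thanks.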

Never mind

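Freezing BN layers in YOLOv5 quantization-aware training

Hello, in YOLOv5 quantization-aware training, as soon as I freeze the BN layers nothing is detected; without freezing BN, objects are detected but accuracy is very low. The original code works fine. Could you explain this? I modified the code following the README and compiled the nnieqat library myself. If convenient, could you share your YOLOv5 quantization-aware training code? Thank you very much!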

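Question: will quantization of other layers be supported later?

Thanks for sharing!
While debugging, I found that quantize.py only implements quantization for Convolution layers; Deconvolution, DepthwiseConv, InnerProduct and other layers are not implemented. If I add a BN layer after a Deconvolution layer, training fails at conv_module = _fuse_conv_bn(conv_module, child) because conv_module=None is passed in. Changing the condition to if isinstance(child, (torch.nn.BatchNorm1d, torch.nn.BatchNorm2d, torch.nn.BatchNorm3d)) and (conv_module is not None): filters this case out, but it only treats the symptom, not the root cause. So I'm curious: will other layers be implemented later?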
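The None-guard described in this issue amounts to fusing a BatchNorm only when the layer immediately before it is a convolution the fuser supports, and leaving every other BatchNorm unfused. A plain-Python sketch of that pairing logic over a flat list of layer-type names follows; the names and `plan_bn_fusion` are hypothetical, it does not reproduce nnieqat's quantize.py, and it does not add fusion support for Deconvolution or other layers.

```python
from typing import List, Tuple

FUSABLE = {"Conv1d", "Conv2d", "Conv3d"}                 # layers the fuser can fold BN into
BATCH_NORMS = {"BatchNorm1d", "BatchNorm2d", "BatchNorm3d"}


def plan_bn_fusion(layers: List[str]) -> Tuple[List[Tuple[int, int]], List[int]]:
    """Return (conv_index, bn_index) pairs to fuse and the indices of BNs left unfused."""
    pairs: List[Tuple[int, int]] = []
    skipped: List[int] = []
    prev_fusable = None                                  # index of the directly preceding fusable conv
    for i, kind in enumerate(layers):
        if kind in BATCH_NORMS:
            if prev_fusable is not None:
                pairs.append((prev_fusable, i))
            else:
                skipped.append(i)                        # e.g. BN after ConvTranspose2d: nothing to fold into
            prev_fusable = None                          # a conv absorbs at most one BN
        elif kind in FUSABLE:
            prev_fusable = i
        else:
            prev_fusable = None                          # any other layer breaks conv -> BN adjacency
    return pairs, skipped
```

Leaving unpaired BNs in place keeps training correct; supporting them properly would still require a fusion rule for each additional layer type.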

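GPU memory OOM after adding quantization-aware training

Hello, after adding nnieqat quantization-aware training, GPU memory is not released after each training epoch; usage keeps growing until it runs out of memory (OOM).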

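Deploying the model after quantization-aware training

After quantization-aware training, I deploy the model on a HiSilicon platform. When converting from PyTorch to Caffe, compared with the original model, a Cast layer appears after the convolution activation, and Caffe does not support it. Is there a way to solve this?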

How long does inference take on NNIE?

I'd like to know what this model's inference time is on NNIE.

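Can this quantization workflow be applied to deployment with other int8 quantization tools?

Hello, is this library's quantization workflow only suited to NNIE deployment, or can it also be used to train models deployed with other frameworks, such as MNN?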

About testing

When testing, is it enough to test the model the same way as a float model?

Error when I run "from nnieqat import quant_dequant_weight"

Hi, I got an error when I ran "from nnieqat import quant_dequant_weight".

The error message is "ImportError: cannot import name 'quant_dequant_weight'".

When I run "import nnieqat", it works.

Can you give me some help?

Thanks a lot!

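How do I convert with the nnie_mapper tool after quantization-aware training?

Hello, I used this framework to fine-tune a YOLOv3 model. How should the trained weights be used with nnie_mapper? I saw your reply in a Zhihu comment saying "you can read and associate the corresponding layers of the PyTorch model and the converted Caffe model, then write the quantization information into the gfpq_param_file of the nnie_mapper cfg", but I don't see a gfpq_param_file parameter in the nnie_mapper parameter table. Could you describe the deployment process in more detail?
Thanks!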

from nnieqat.modules import convert_layers error

OSError: /usr/local/cuda-10.0/lib64/libcublas.so.10: version `libcublas.so.10' not found (required by /home/xxx/anaconda3/envs/xxx_py36/lib/python3.6/site-packages/nnieqat/gpu/lib/libgfpq_gpu.so)

How do I solve this?

Quantization of Yolo and SSD

Hi,
I have a few questions about quantizing YOLOv3, SSD and similar networks.
1. How can we use this project for these state-of-the-art methods?
2. Are you planning to add examples for these networks in the near future?
3. Do we need to train the model with your proposed quantization method, or can we also quantize pre-trained models, for example a pre-trained YOLOv3?