
intel / neural-compressor


SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime

Home Page: https://intel.github.io/neural-compressor/

License: Apache License 2.0

Python 98.71% Dockerfile 0.02% Shell 1.23% Roff 0.01% C++ 0.02%
low-precision pruning sparsity auto-tuning knowledge-distillation quantization quantization-aware-training post-training-quantization smoothquant large-language-models

neural-compressor's Introduction

Intel® Neural Compressor

An open-source Python library supporting popular model compression techniques on all mainstream deep learning frameworks (TensorFlow, PyTorch, ONNX Runtime, and MXNet)


Architecture   |   Workflow   |   LLMs Recipes   |   Results   |   Documentation


Intel® Neural Compressor aims to provide popular model compression techniques such as quantization, pruning (sparsity), distillation, and neural architecture search on mainstream frameworks such as TensorFlow, PyTorch, ONNX Runtime, and MXNet, as well as Intel extensions such as Intel Extension for TensorFlow and Intel Extension for PyTorch. In particular, the tool provides the key features, typical examples, and open collaborations described below:

What's New

Installation

Install from PyPI

pip install neural-compressor

Note: Further installation methods can be found in the Installation Guide. Check out our FAQ for more details.
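
A quick way to sanity-check the installation (my sketch, not part of the guide):

# Verify that the package imports and report its version.
# Assumes neural_compressor exposes __version__ at the top level.
import neural_compressor
print(neural_compressor.__version__)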

Getting Started

Setting up the environment:

pip install "neural-compressor>=2.3" "transformers>=4.34.0" torch torchvision

After successfully installing these packages, try your first quantization program.

Weight-Only Quantization (LLMs)

The following example code demonstrates Weight-Only Quantization on LLMs. It supports Intel CPU, Intel Gaudi2 AI Accelerator, and Nvidia GPU; the best available device is selected automatically.

To try this on Intel Gaudi2, a Docker image with the Gaudi Software Stack is recommended; please refer to the following script for environment setup. More details can be found in the Gaudi Guide.

# Run a container with an interactive shell
docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.14.0/ubuntu22.04/habanalabs/pytorch-installer-2.1.1:latest

# Install the optimum-habana
pip install --upgrade-strategy eager optimum[habana]

# Install INC/auto_round
pip install neural-compressor auto_round

Run the example:

from transformers import AutoModel, AutoTokenizer

from neural_compressor.config import PostTrainingQuantConfig
from neural_compressor.quantization import fit
from neural_compressor.adaptor.torch_utils.auto_round import get_dataloader

model_name = "EleutherAI/gpt-neo-125m"
float_model = AutoModel.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
dataloader = get_dataloader(tokenizer, seqlen=2048)

woq_conf = PostTrainingQuantConfig(
    approach="weight_only",
    op_type_dict={
        ".*": {  # match all ops
            "weight": {
                "dtype": "int",
                "bits": 4,
                "algorithm": "AUTOROUND",
            },
        }
    },
)
quantized_model = fit(model=float_model, conf=woq_conf, calib_dataloader=dataloader)

Note:

To try INT4 model inference, please directly use Intel Extension for Transformers, which leverages Intel Neural Compressor for model quantization.
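
The fit call above returns a quantized model object which, to the best of my knowledge, can be persisted with its save method (a minimal sketch; the output directory is a hypothetical path):

# Hypothetical output directory; save() writes the quantized model artifacts there.
quantized_model.save("./saved_results")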

Static Quantization (Non-LLMs)

from torchvision import models

from neural_compressor.config import PostTrainingQuantConfig
from neural_compressor.data import DataLoader, Datasets
from neural_compressor.quantization import fit

float_model = models.resnet18()
dataset = Datasets("pytorch")["dummy"](shape=(1, 3, 224, 224))
calib_dataloader = DataLoader(framework="pytorch", dataset=dataset)
static_quant_conf = PostTrainingQuantConfig()
quantized_model = fit(model=float_model, conf=static_quant_conf, calib_dataloader=calib_dataloader)
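
Once a model is quantized, its performance can be measured with the benchmark module. Below is a minimal sketch assuming the 2.x benchmark API; the BenchmarkConfig parameters are illustrative, not prescriptive:

from neural_compressor.benchmark import fit as benchmark_fit
from neural_compressor.config import BenchmarkConfig

# Illustrative settings: 10 warmup runs, 100 measured iterations, one instance.
bench_conf = BenchmarkConfig(warmup=10, iteration=100, cores_per_instance=4, num_of_instance=1)
benchmark_fit(model=quantized_model, config=bench_conf, b_dataloader=calib_dataloader)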

Documentation

Overview
    Architecture   |   Workflow   |   APIs   |   LLMs Recipes   |   Examples
PyTorch Extension APIs
    Overview   |   Static Quantization   |   Dynamic Quantization   |   Smooth Quantization   |   Weight-Only Quantization   |   MX Quantization   |   Mixed Precision
Tensorflow Extension APIs
    Overview   |   Static Quantization   |   Smooth Quantization
Other Modules
    Auto Tune   |   Benchmark

Note:
Starting from the 3.0 release, we recommend using the 3.X API. Training-time compression techniques such as QAT, pruning, and distillation are currently only available in the 2.X API.
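
For reference, a minimal 3.X-style PyTorch flow might look like the sketch below (my approximation based on the 3.X prepare/convert API; the model variable is assumed to be a torch.nn.Module, and RTN is just an example algorithm):

from neural_compressor.torch.quantization import RTNConfig, prepare, convert

# Assumed 3.X flow: attach a quantization config, then convert the model.
quant_config = RTNConfig()            # weight-only round-to-nearest, as an example
model = prepare(model, quant_config)  # model: a torch.nn.Module defined elsewhere
model = convert(model)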

Selected Publications/Events

Note: View Full Publication List.

Additional Content

Communication

  • GitHub Issues: mainly for bug reports, new feature requests, questions, etc.
  • Email: feel free to raise interesting research ideas on model compression techniques by email for collaboration.
  • Discord Channel: join the Discord channel for more flexible technical discussion.
  • WeChat group: scan the QR code to join the technical discussion.

neural-compressor's People

Contributors

airmeng, aradys, bmyrcha, changwangss, chendali-intel, chensuyue, chuanqi129, clarkchin08, dependabot[bot], eason9393, ftian1, guomingz, kaihui-intel, kaikaiyao, lvliang-intel, mengniwang95, penghuicheng, pengxin99, spycsh, tybulewicz, vincyzhang, violetch24, xin3he, xinyuye-intel, xuehaosun, yiliu30, yiyangcai, yuwenzho, zehao-intel, zhiwei35


neural-compressor's Issues

Custom dataset for multiple inputs

Hi,
I want to construct a custom dataset to be used by the common dataloader. My model has two inputs and one output. From the documentation in the repo, I only know that Dataset.__getitem__() can be written in the following fashion:

class Dataset(object):
    def __init__(self, args):
        # init code here

    def __getitem__(self, idx):
        # use idx to get data and label
        return data, label

    def __len__(self):
        return len

where the objects returned by __getitem__() are the data and label, respectively.

I tried to pass my inputs in a tuple:

    def __getitem__(self, idx):
        return (data0, data1), label

a list:

    def __getitem__(self, idx):
        return [data0, data1], label

and even a dictionary:

    def __getitem__(self, idx):
        return {'input0': data0, 'input1': data1}, label

but none of them work. What is the proper way of passing multiple inputs in the __getitem__ function?

Thank you.

How to set the yaml conf when running PTQ on a TensorFlow 2.x saved_model?

I have a saved_model that I want to quantize with INC's PTQ. When I use examples/tensorflow/image_recognition/SavedModel/quantization/ptq to test it, this error occurs:

2022-03-23 17:08:39 [INFO] Generate a fake evaluation function.
Traceback (most recent call last):
  File "main.py", line 59, in <module>
    evaluate_opt_graph.run()
  File "main.py", line 48, in run
    q_model = quantizer.fit()
  File "/root/anaconda3/envs/tf2/lib/python3.7/site-packages/neural_compressor/experimental/quantization.py", line 212, in __call__
    return super(Quantization, self).__call__()
  File "/root/anaconda3/envs/tf2/lib/python3.7/site-packages/neural_compressor/experimental/component.py", line 214, in __call__
    self.pre_process()
  File "/root/anaconda3/envs/tf2/lib/python3.7/site-packages/neural_compressor/experimental/quantization.py", line 121, in pre_process
    self._create_calib_dataloader(cfg)
  File "/root/anaconda3/envs/tf2/lib/python3.7/site-packages/neural_compressor/experimental/quantization.py", line 112, in _create_calib_dataloader
    self._calib_dataloader = create_dataloader(self.framework, calib_dataloader_cfg)
  File "/root/anaconda3/envs/tf2/lib/python3.7/site-packages/neural_compressor/utils/create_obj_from_config.py", line 96, in create_dataloader
    copy.deepcopy(dataloader_cfg['filter']),)
  File "/root/anaconda3/envs/tf2/lib/python3.7/site-packages/neural_compressor/utils/create_obj_from_config.py", line 79, in create_dataset
    transform=preprocess, filter=filter)
  File "/root/anaconda3/envs/tf2/lib/python3.7/site-packages/neural_compressor/experimental/data/datasets/dataset.py", line 764, in __new__
    raise ValueError('Found no files in --root matching: {}'.format(glob_pattern))
ValueError: Found no files in --root matching: ./data/*-*-of-*

my yaml conf:

model:                                               # mandatory. used to specify model specific information.
  name: origin_model
  framework: tensorflow                              # mandatory. supported values are tensorflow, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.
  inputs: dense_input, sparse_ids_input, sparse_wgt_input, seq_50_input
  outputs: dense


quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
  calibration:
    sampling_size: 20000                              # optional. default value is 100. used to set how many samples should be used in calibration.
    dataloader:
      batch_size: 10
      dataset:
         TFRecordDataset:
           root: ./data
  model_wise:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
    activation:
      algorithm: minmax
  op_wise: {
    'import/dnn/hiddenlayer_0/MatMul': {
      'activation':  {'dtype': ['uint8'], 'algorithm': ['minmax'], 'scheme':['asym']},
    }
  }

tuning:
  accuracy_criterion:
    relative:  0.01                                  # optional. default value is relative, other value is absolute. this example allows relative accuracy loss: 1%.
  exit_policy:
    timeout: 0                                       # optional. tuning timeout (seconds). default value is 0 which means early stop. combine with max_trials field to decide when to exit.
    max_trials: 100                                  # optional. max tune times. default value is 100. combine with timeout field to decide when to exit.
  random_seed: 9527                                  # optional. random seed for deterministic tuning.


Does INC not support torch.bfloat16 during the pruning process

Hi, when I use magnitude pruning on models with torch.bfloat16 tensors, it fails:

TypeError: Got unsupported ScalarType BFloat16

I find this error is due to the unsupported operation when converting torch.bfloat16 to a numpy array. Why does INC not try to support the bfloat16 operation? Is there any special consideration?
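For context, the failure can be reproduced outside INC, since NumPy has no native bfloat16 dtype; a common workaround (my sketch, not from the issue) is to upcast before converting:

import torch

t = torch.randn(4, dtype=torch.bfloat16)
# t.numpy() would raise: TypeError: Got unsupported ScalarType BFloat16
arr = t.to(torch.float32).numpy()  # upcast to float32 first, then convert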

[MXNET] Activation in yaml and calibration algorithm

Hi, I would like to contribute and improve MXNet integration and add MXNet 1.9 support, but one thing is not clear to me.

I would like to ask what exactly the "activation" of an operator means in "capabilities", especially its "algorithm" attribute. I thought this meant the operator's input (as stated here), and that the "algorithm" attribute determines whether minmax or kl will be used when calibrating that operator's input tensors, but I believe this is not what is happening in the MXNet adaptor.
https://github.com/intel/lpot/blob/v1.3.1/lpot/adaptor/mxnet.py#L581-L592
Here, calib_minmax_layers and calib_kl_layers are filled with the names of the operators' outputs, not their inputs. Is this intentional?

Thanks

[Tensorflow] ops not quantized

Framework: Tensorflow 2.6.0
LPOT: 1.6.0

When I printed out the tune_cfg() in strategy.py

### op_cfgs ###
('model/dense_5/Tensordot/MatMul', 'matmul')
{'weight': {'dtype': 'int8', 'scheme': 'sym', 'granularity': 'per_tensor', 'algorithm': 'minmax', 'bit': 7.0}, 'activation': {'dtype': 'uint8', 'scheme': 'asym', 'granularity': 'per_tensor', 'algorithm': 'minmax'}, 'pattern': {'sequence': 'MatMul,BiasAdd', 'precision': 'int8'}}
('model/dense_5/Tensordot/concat_1', 'concat')
{'activation': {'dtype': 'uint8', 'algorithm': 'minmax', 'scheme': 'sym', 'granularity': 'per_tensor'}}
('model/dense_4/Tensordot/MatMul', 'matmul')
{'weight': {'dtype': 'int8', 'scheme': 'sym', 'granularity': 'per_tensor', 'algorithm': 'minmax', 'bit': 7.0}, 'activation': {'dtype': 'uint8', 'scheme': 'asym', 'granularity': 'per_tensor', 'algorithm': 'minmax'}, 'pattern': {'sequence': 'MatMul,BiasAdd', 'precision': 'int8'}}
('model/dense_4/Tensordot/concat_1', 'concat')
{'activation': {'dtype': 'uint8', 'algorithm': 'minmax', 'scheme': 'sym', 'granularity': 'per_tensor'}}
('model/dense_3/Tensordot/MatMul', 'matmul')
{'weight': {'dtype': 'int8', 'scheme': 'sym', 'granularity': 'per_tensor', 'algorithm': 'minmax', 'bit': 7.0}, 'activation': {'dtype': 'uint8', 'scheme': 'asym', 'granularity': 'per_tensor', 'algorithm': 'minmax'}, 'pattern': {'sequence': 'MatMul,BiasAdd', 'precision': 'int8'}}
('model/dense_3/Tensordot/concat_1', 'concat')
{'activation': {'dtype': 'uint8', 'algorithm': 'minmax', 'scheme': 'sym', 'granularity': 'per_tensor'}}
('model/dense_1/Tensordot/MatMul', 'matmul')
{'weight': {'dtype': 'int8', 'scheme': 'sym', 'granularity': 'per_tensor', 'algorithm': 'minmax', 'bit': 7.0}, 'activation': {'dtype': 'uint8', 'scheme': 'asym', 'granularity': 'per_tensor', 'algorithm': 'minmax'}, 'pattern': {'sequence': 'MatMul,BiasAdd', 'precision': 'int8'}}
('model/dense_1/Tensordot/concat_1', 'concat')
{'activation': {'dtype': 'uint8', 'algorithm': 'minmax', 'scheme': 'sym', 'granularity': 'per_tensor'}}
('model/dense_2/Tensordot/MatMul', 'matmul')
{'weight': {'dtype': 'int8', 'scheme': 'sym', 'granularity': 'per_tensor', 'algorithm': 'minmax', 'bit': 7.0}, 'activation': {'dtype': 'uint8', 'scheme': 'asym', 'granularity': 'per_tensor', 'algorithm': 'minmax'}, 'pattern': {'sequence': 'MatMul,BiasAdd', 'precision': 'int8'}}
('model/dense_2/Tensordot/concat_1', 'concat')
{'activation': {'dtype': 'uint8', 'algorithm': 'minmax', 'scheme': 'sym', 'granularity': 'per_tensor'}}
('model/dense/Tensordot/MatMul', 'matmul')
{'weight': {'dtype': 'int8', 'scheme': 'sym', 'granularity': 'per_tensor', 'algorithm': 'minmax', 'bit': 7.0}, 'activation': {'dtype': 'uint8', 'scheme': 'asym', 'granularity': 'per_tensor', 'algorithm': 'minmax'}, 'pattern': {'sequence': 'MatMul,BiasAdd', 'precision': 'int8'}}
('model/LSTM_2/PartitionedCall/while/body/_23/while/MatMul', 'matmul')
{'weight': {'dtype': 'int8', 'scheme': 'sym', 'granularity': 'per_tensor', 'algorithm': 'minmax', 'bit': 7.0}, 'activation': {'dtype': 'uint8', 'scheme': 'asym', 'granularity': 'per_tensor', 'algorithm': 'minmax'}, 'pattern': {'sequence': 'MatMul', 'precision': 'int8'}}
('model/dense/Tensordot/concat_1', 'concat')
{'activation': {'dtype': 'uint8', 'algorithm': 'minmax', 'scheme': 'sym', 'granularity': 'per_tensor'}}
('model/LSTM_2/PartitionedCall/while/body/_23/while/MatMul_1', 'matmul')
{'weight': {'dtype': 'int8', 'scheme': 'sym', 'granularity': 'per_tensor', 'algorithm': 'minmax', 'bit': 7.0}, 'activation': {'dtype': 'uint8', 'scheme': 'asym', 'granularity': 'per_tensor', 'algorithm': 'minmax'}, 'pattern': {'sequence': 'MatMul', 'precision': 'int8'}}
('model/LSTM_1/PartitionedCall/while/body/_83/while/MatMul', 'matmul')
{'weight': {'dtype': 'int8', 'scheme': 'sym', 'granularity': 'per_tensor', 'algorithm': 'minmax', 'bit': 7.0}, 'activation': {'dtype': 'uint8', 'scheme': 'asym', 'granularity': 'per_tensor', 'algorithm': 'minmax'}, 'pattern': {'sequence': 'MatMul', 'precision': 'int8'}}
('model/LSTM_1/PartitionedCall/while/body/_83/while/MatMul_1', 'matmul')
{'weight': {'dtype': 'int8', 'scheme': 'sym', 'granularity': 'per_tensor', 'algorithm': 'minmax', 'bit': 7.0}, 'activation': {'dtype': 'uint8', 'scheme': 'asym', 'granularity': 'per_tensor', 'algorithm': 'minmax'}, 'pattern': {'sequence': 'MatMul', 'precision': 'int8'}}
('model/52/Conv2D', 'conv2d')
{'weight': {'dtype': 'int8', 'scheme': 'sym', 'granularity': 'per_channel', 'algorithm': 'minmax', 'bit': 7.0}, 'activation': {'dtype': 'uint8', 'scheme': 'sym', 'granularity': 'per_tensor', 'algorithm': 'minmax'}, 'pattern': {'sequence': 'Conv2D', 'precision': 'int8'}}
('model/51/Conv2D', 'conv2d')
{'weight': {'dtype': 'int8', 'scheme': 'sym', 'granularity': 'per_channel', 'algorithm': 'minmax', 'bit': 7.0}, 'activation': {'dtype': 'uint8', 'scheme': 'sym', 'granularity': 'per_tensor', 'algorithm': 'minmax'}, 'pattern': {'sequence': 'Conv2D', 'precision': 'int8'}}
('model/42/Conv2D', 'conv2d')
{'weight': {'dtype': 'int8', 'scheme': 'sym', 'granularity': 'per_channel', 'algorithm': 'minmax', 'bit': 7.0}, 'activation': {'dtype': 'uint8', 'scheme': 'sym', 'granularity': 'per_tensor', 'algorithm': 'minmax'}, 'pattern': {'sequence': 'Conv2D', 'precision': 'int8'}}
('model/41/Conv2D', 'conv2d')
{'weight': {'dtype': 'int8', 'scheme': 'sym', 'granularity': 'per_channel', 'algorithm': 'minmax', 'bit': 7.0}, 'activation': {'dtype': 'uint8', 'scheme': 'sym', 'granularity': 'per_tensor', 'algorithm': 'minmax'}, 'pattern': {'sequence': 'Conv2D', 'precision': 'int8'}}
('model/32/Conv2D', 'conv2d')
{'weight': {'dtype': 'int8', 'scheme': 'sym', 'granularity': 'per_channel', 'algorithm': 'minmax', 'bit': 7.0}, 'activation': {'dtype': 'uint8', 'scheme': 'sym', 'granularity': 'per_tensor', 'algorithm': 'minmax'}, 'pattern': {'sequence': 'Conv2D', 'precision': 'int8'}}
('model/31/Conv2D', 'conv2d')
{'weight': {'dtype': 'int8', 'scheme': 'sym', 'granularity': 'per_channel', 'algorithm': 'minmax', 'bit': 7.0}, 'activation': {'dtype': 'uint8', 'scheme': 'sym', 'granularity': 'per_tensor', 'algorithm': 'minmax'}, 'pattern': {'sequence': 'Conv2D', 'precision': 'int8'}}
('model/2/Conv2D', 'conv2d')
{'weight': {'dtype': 'int8', 'scheme': 'sym', 'granularity': 'per_channel', 'algorithm': 'minmax', 'bit': 7.0}, 'activation': {'dtype': 'uint8', 'scheme': 'sym', 'granularity': 'per_tensor', 'algorithm': 'minmax'}, 'pattern': {'sequence': 'Conv2D', 'precision': 'int8'}}
('model/1/Conv2D', 'conv2d')
{'weight': {'dtype': 'int8', 'scheme': 'sym', 'granularity': 'per_channel', 'algorithm': 'minmax', 'bit': 7.0}, 'activation': {'dtype': 'uint8', 'scheme': 'sym', 'granularity': 'per_tensor', 'algorithm': 'minmax'}, 'pattern': {'sequence': 'Conv2D', 'precision': 'int8'}}


### dispatched_op_names ###
['model/dense_5/Tensordot/MatMul', 'model/dense_5/Tensordot/concat_1', 'model/dense_4/Tensordot/MatMul', 'model/dense_4/Tensordot/concat_1', 'model/dense_3/Tensordot/MatMul', 'model/dense_3/Tensordot/concat_1', 'model/dense_1/Tensordot/MatMul', 'model/dense_1/Tensordot/concat_1', 'model/dense_2/Tensordot/MatMul', 'model/dense_2/Tensordot/concat_1', 'model/dense/Tensordot/MatMul', 'model/LSTM_2/PartitionedCall/while/body/_23/while/MatMul', 'model/dense/Tensordot/concat_1', 'model/LSTM_2/PartitionedCall/while/body/_23/while/MatMul_1', 'model/LSTM_1/PartitionedCall/while/body/_83/while/MatMul', 'model/LSTM_1/PartitionedCall/while/body/_83/while/MatMul_1', 'model/52/Conv2D', 'model/51/Conv2D', 'model/42/Conv2D', 'model/41/Conv2D', 'model/32/Conv2D', 'model/31/Conv2D', 'model/2/Conv2D', 'model/1/Conv2D']
### invalid_op_names ###
[]



2021-08-27 08:43:41 [WARNING] Found possible input node names: ['input_noisy', 'input_noisy_norm'], output node names: ['outputMask'].
2021-08-27 08:43:53 [WARNING] Found possible input node names: ['input_noisy', 'input_noisy_norm'], output node names: ['outputMask'].
2021-08-27 08:44:01.108141: I tensorflow/core/grappler/devices.cc:66] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0
2021-08-27 08:44:01.108428: I tensorflow/core/grappler/clusters/single_machine.cc:357] Starting new session
2021-08-27 08:44:01.156499: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:1137] Optimization results for grappler item: graph_to_optimize
  function_optimizer: Graph size after: 974 nodes (370), 1237 edges (530), time = 17.966ms.
  function_optimizer: function_optimizer did nothing. time = 0.805ms.

2021-08-27 08:44:02.385658: I tensorflow/core/grappler/devices.cc:66] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0
2021-08-27 08:44:02.385886: I tensorflow/core/grappler/clusters/single_machine.cc:357] Starting new session
2021-08-27 08:44:02.657347: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:1137] Optimization results for grappler item: tf_graph
  constant_folding: Graph size after: 782 nodes (-96), 919 edges (-118), time = 150.555ms.
  constant_folding: Graph size after: 782 nodes (0), 919 edges (0), time = 37.266ms.

2021-08-27 08:44:05 [INFO] Pass Quantization elapsed time: 2325.7 ms
2021-08-27 08:44:38 [INFO] Pass QuantizedRNNConverter elapsed time: 57.53 ms
2021-08-27 08:44:39 [INFO] Pass StripUnusedNodesOptimizer elapsed time: 168.84 ms
2021-08-27 08:44:39 [INFO] Pass RemoveTrainingNodesOptimizer elapsed time: 57.83 ms
2021-08-27 08:44:39 [INFO] Pass FoldBatchNormNodesOptimizer elapsed time: 57.06 ms
2021-08-27 08:44:39 [INFO] Pass MetaOpOptimizer elapsed time: 54.55 ms
2021-08-27 08:44:39 [WARNING] Node name unused_control_flow_input_20 specified in yaml doesn't exist in the model.
2021-08-27 08:44:39 [WARNING] Found possible input node names: ['input_noisy', 'input_noisy_norm'], output node names: ['outputMask'].
2021-08-27 08:44:41 [INFO] Pass PostCseOptimizer elapsed time: 1593.45 ms
2021-08-27 08:44:41 [INFO] |********Mixed Precision Statistics*******|
2021-08-27 08:44:41 [INFO] +---------------+---------+-------+-------+
2021-08-27 08:44:41 [INFO] |    Op Type    |  Total  |  INT8 |  FP32 |
2021-08-27 08:44:41 [INFO] +---------------+---------+-------+-------+
2021-08-27 08:44:41 [INFO] |     Conv2D    |    8    |   0   |   8   |
2021-08-27 08:44:41 [INFO] |     MatMul    |    10   |   6   |   4   |
2021-08-27 08:44:41 [INFO] |    ConcatV2   |    6    |   0   |   6   |
2021-08-27 08:44:41 [INFO] |   QuantizeV2  |    6    |   6   |   0   |
2021-08-27 08:44:41 [INFO] |   Dequantize  |    1    |   1   |   0   |
2021-08-27 08:44:41 [INFO] +---------------+---------+-------+-------+
2021-08-27 08:44:41 [INFO] Pass quantize model elapsed time: 73892.89 ms
2021-08-27 08:44:41 [INFO] Start to evaluate the TensorFlow model.
2021-08-27 08:46:07 [INFO] Tune 1 result is: [accuracy: 0.3118, duration (seconds): 86.5451], Best tune result is: [accuracy: 0.3118, duration (seconds): 86.5451]

At first, Conv2D and MatMul seem to be set to quantize to int8, but in the mixed precision statistics they are still in fp32 format. My main focus is to speed up the Conv2D computation, but I cannot find the reason why it stays unquantized.
Is this because the pattern is unmatched?
Originally, my convolutional layer is paired with a leaky ReLU; I also tried using ReLU, or no activation at all, but it just won't quantize Conv2D.

Please find my model link here

[keras example] AttributeError: 'NoneType' object has no attribute 'items'

Environment: Google Colab, Tensorflow 1.15.0

I was trying out one of the examples (examples/tensorflow/keras), and when I was running the benchmark session, I got the following error:

Traceback (most recent call last):
  File "main.py", line 76, in <module>
    evaluate_opt_graph.run()
  File "main.py", line 63, in run
    for mode, result in results.items():
AttributeError: 'NoneType' object has no attribute 'items'

So I opened the file main.py to inspect the code:

results = evaluator()
for mode, result in results.items():

It seems that results is a NoneType object, which caused the error.


On another note, I wanted to ask about how to set up the config for quantization:

  1. How do I specify the type of quantization, e.g. full integer quantization or 16x8 integer quantization? I see the current method seems to employ the full integer method. So I opened the sample config file to try to edit the method, but ended up seeing the following parameters:

     model_wise:
       activation:
         algorithm: minmax

     I'm not sure how this connects to the quantization method; any hints/help would be greatly appreciated.
  2. Just to be sure, the benchmark statistic FP32 baseline is: [0.1135, 138.3854]; the list refers to accuracy and loss respectively, right?

Thanks in advance. 😃

Examples for Pruning with Lpot

Hello,

I am working on building a front-end focused platform that can handle pruning for the user. I wanted to know if there are any examples in the repository for pruning or if someone could provide an example from their own experiments. I appreciate all your help.

Thank you,

Will

python_fx example problem

Running the script from https://github.com/intel/neural-compressor/tree/master/examples/pytorch/nlp/huggingface_models/question-answering/optimization_pipeline/prune_once_for_all/fx:

For stage 1:

python run_qa_no_trainer_pruneOFA.py --dataset_name squad \
    --model_name_or_path Intel/bert-base-uncased-sparse-90-unstructured-pruneofa \
    --teacher_model_name_or_path csarron/bert-base-uncased-squad-v1 \
    --do_prune --do_distillation --max_seq_length 384 --batch_size 12 \
    --learning_rate 1.5e-4 --do_eval --num_train_epochs 8 \
    --output_dir ./ --loss_weights 0 1 \
    --temperature 2 --seed 5143 --pad_to_max_length --run_teacher_logits

There are some errors during the run (screenshot attached as inc_quan_report).

custom data loader and metric not working for quantization of TF saved model

I am using the helloworld tf_example2 as a baseline to create a custom dataset loader from local image folders, with the same classification accuracy metric. But during quantization, I see a message saying "[INFO] Neither evaluation function nor metric is defined. Generate a quantized model with default quantization configuration. [INFO] Generate a fake evaluation function."
Here is the code snippet for the Dataset class constructor. The MyMetric class definition and member functions are the same as in the example.

class Dataset(object):
    def __init__(self, image_dir):
        # (train_images, train_labels), (test_images,
        #            test_labels) = keras.datasets.fashion_mnist.load_data()
        datagen = keras.preprocessing.image.ImageDataGenerator()
        train_generator = datagen.flow_from_directory(
            image_dir,
            target_size=(300, 300),
            color_mode='grayscale',
            batch_size=32,
            class_mode='categorical'
        )
        x = np.concatenate([train_generator.next()[0] for i in range(train_generator.__len__())])
        y = np.concatenate([train_generator.next()[1] for i in range(train_generator.__len__())])

        self.test_images = x / 255.0
        self.labels = y

and here is the output during quantization.

Found 10731 images belonging to 3 classes.
WARNING:tensorflow:SavedModel saved prior to TF 2.5 detected when loading Keras model. Please ensure that you are saving the model with model.save() or tf.keras.models.save_model(), NOT tf.saved_model.save(). To confirm, there should be a file named "keras_metadata.pb" in the SavedModel directory.
2022-02-23 04:54:53 [WARNING] SavedModel saved prior to TF 2.5 detected when loading Keras model. Please ensure that you are saving the model with model.save() or tf.keras.models.save_model(), NOT tf.saved_model.save(). To confirm, there should be a file named "keras_metadata.pb" in the SavedModel directory.
2022-02-23 04:54:54.146247: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-02-23 04:54:54.148159: I tensorflow/core/common_runtime/process_util.cc:146] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
WARNING:tensorflow:SavedModel saved prior to TF 2.5 detected when loading Keras model. Please ensure that you are saving the model with model.save() or tf.keras.models.save_model(), NOT tf.saved_model.save(). To confirm, there should be a file named "keras_metadata.pb" in the SavedModel directory.
2022-02-23 04:56:00 [WARNING] SavedModel saved prior to TF 2.5 detected when loading Keras model. Please ensure that you are saving the model with model.save() or tf.keras.models.save_model(), NOT tf.saved_model.save(). To confirm, there should be a file named "keras_metadata.pb" in the SavedModel directory.
WARNING:tensorflow:SavedModel saved prior to TF 2.5 detected when loading Keras model. Please ensure that you are saving the model with model.save() or tf.keras.models.save_model(), NOT tf.saved_model.save(). To confirm, there should be a file named "keras_metadata.pb" in the SavedModel directory.
2022-02-23 04:57:14 [WARNING] SavedModel saved prior to TF 2.5 detected when loading Keras model. Please ensure that you are saving the model with model.save() or tf.keras.models.save_model(), NOT tf.saved_model.save(). To confirm, there should be a file named "keras_metadata.pb" in the SavedModel directory.
2022-02-23 04:58:12.338695: I tensorflow/core/grappler/devices.cc:75] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0 (Note: TensorFlow was not compiled with CUDA or ROCm support)
2022-02-23 04:58:12.338923: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
2022-02-23 04:58:12.362167: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:1149] Optimization results for grappler item: graph_to_optimize
function_optimizer: function_optimizer did nothing. time = 0.021ms.
function_optimizer: function_optimizer did nothing. time = 0.001ms.

2022-02-23 04:58:17.228588: I tensorflow/core/grappler/devices.cc:75] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0 (Note: TensorFlow was not compiled with CUDA or ROCm support)
2022-02-23 04:58:17.228792: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
2022-02-23 04:58:17.444025: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:1149] Optimization results for grappler item: tf_graph
constant_folding: Graph size after: 1118 nodes (-612), 1787 edges (-612), time = 80.122ms.
constant_folding: Graph size after: 1118 nodes (0), 1787 edges (0), time = 64.691ms.

WARNING:tensorflow:SavedModel saved prior to TF 2.5 detected when loading Keras model. Please ensure that you are saving the model with model.save() or tf.keras.models.save_model(), NOT tf.saved_model.save(). To confirm, there should be a file named "keras_metadata.pb" in the SavedModel directory.
2022-02-23 04:58:18 [WARNING] SavedModel saved prior to TF 2.5 detected when loading Keras model. Please ensure that you are saving the model with model.save() or tf.keras.models.save_model(), NOT tf.saved_model.save(). To confirm, there should be a file named "keras_metadata.pb" in the SavedModel directory.
WARNING:tensorflow:SavedModel saved prior to TF 2.5 detected when loading Keras model. Please ensure that you are saving the model with model.save() or tf.keras.models.save_model(), NOT tf.saved_model.save(). To confirm, there should be a file named "keras_metadata.pb" in the SavedModel directory.
2022-02-23 04:59:14 [WARNING] SavedModel saved prior to TF 2.5 detected when loading Keras model. Please ensure that you are saving the model with model.save() or tf.keras.models.save_model(), NOT tf.saved_model.save(). To confirm, there should be a file named "keras_metadata.pb" in the SavedModel directory.
2022-02-23 05:00:09 [INFO] ConvertLayoutOptimizer elapsed time: 2.27 ms
2022-02-23 05:00:11.809891: I tensorflow/core/grappler/devices.cc:75] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0 (Note: TensorFlow was not compiled with CUDA or ROCm support)
2022-02-23 05:00:11.810752: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
2022-02-23 05:00:11.940530: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:1149] Optimization results for grappler item: graph_to_optimize
model_pruner: Graph size after: 1115 nodes (-3), 1784 edges (-3), time = 18.656ms.
shape_optimizer: shape_optimizer did nothing. time = 1.215ms.
dependency_optimizer: Graph size after: 1114 nodes (-1), 1171 edges (-613), time = 18.54ms.
debug_stripper: debug_stripper did nothing. time = 1.425ms.
loop_optimizer: Graph size after: 1114 nodes (0), 1171 edges (0), time = 9.038ms.
model_pruner: Graph size after: 1114 nodes (0), 1171 edges (0), time = 10.288ms.
shape_optimizer: shape_optimizer did nothing. time = 1.118ms.
dependency_optimizer: Graph size after: 1114 nodes (0), 1171 edges (0), time = 13.384ms.
debug_stripper: debug_stripper did nothing. time = 1.266ms.

2022-02-23 05:00:11 [INFO] Pass GrapplerOptimizer elapsed time: 1987.65 ms
2022-02-23 05:00:12 [INFO] Pass SwitchOptimizer elapsed time: 242.62 ms
2022-02-23 05:00:12 [INFO] Pass RemoveTrainingNodesOptimizer elapsed time: 244.49 ms
2022-02-23 05:00:12 [INFO] Pass SplitSharedInputOptimizer elapsed time: 31.42 ms
2022-02-23 05:00:12 [INFO] Pass GraphFoldConstantOptimizer elapsed time: 220.24 ms
2022-02-23 05:00:12 [INFO] Pass FuseColumnWiseMulOptimizer elapsed time: 243.88 ms
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/neural_compressor/adaptor/tf_utils/util.py:317: extract_sub_graph (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.compat.v1.graph_util.extract_sub_graph
2022-02-23 05:00:13 [WARNING] From /usr/local/lib/python3.8/dist-packages/neural_compressor/adaptor/tf_utils/util.py:317: extract_sub_graph (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.compat.v1.graph_util.extract_sub_graph
2022-02-23 05:00:13 [INFO] Pass StripUnusedNodesOptimizer elapsed time: 912.96 ms
2022-02-23 05:00:14 [INFO] Pass GraphCseOptimizer elapsed time: 249.38 ms
2022-02-23 05:00:14 [INFO] Pass FoldBatchNormNodesOptimizer elapsed time: 781.72 ms
2022-02-23 05:00:15 [INFO] Pass UpdateEnterOptimizer elapsed time: 105.81 ms
2022-02-23 05:00:15 [INFO] Pass ConvertLeakyReluOptimizer elapsed time: 126.88 ms
2022-02-23 05:00:15 [INFO] Pass ConvertAddToBiasAddOptimizer elapsed time: 128.69 ms
2022-02-23 05:00:15 [INFO] Pass FuseTransposeReshapeOptimizer elapsed time: 131.34 ms
2022-02-23 05:00:15 [INFO] Pass FuseConvWithMathOptimizer elapsed time: 129.63 ms
2022-02-23 05:00:15 [INFO] Pass ExpandDimsOptimizer elapsed time: 125.77 ms
2022-02-23 05:00:15 [INFO] Pass InjectDummyBiasAddOptimizer elapsed time: 171.51 ms
2022-02-23 05:00:16 [INFO] Pass MoveSqueezeAfterReluOptimizer elapsed time: 123.17 ms
2022-02-23 05:00:17 [INFO] Pass Pre Optimization elapsed time: 119182.96 ms
2022-02-23 05:00:18 [INFO] Neither evaluation function nor metric is defined. Generate a quantized model with default quantization configuration.
2022-02-23 05:00:18 [INFO] Generate a fake evaluation function.

2022-02-23 05:00:18 [INFO] Get FP32 model baseline.
2022-02-23 05:00:18 [INFO] Save tuning history to /workspaces/Neural_Compressor/nc_workspace/2022-02-23_04-51-36/./history.snapshot.
2022-02-23 05:00:18 [INFO] FP32 baseline is: [Accuracy: 1.0000, Duration (seconds): 0.0001]
WARNING:tensorflow:SavedModel saved prior to TF 2.5 detected when loading Keras model. Please ensure that you are saving the model with model.save() or tf.keras.models.save_model(), NOT tf.saved_model.save(). To confirm, there should be a file named "keras_metadata.pb" in the SavedModel directory.
2022-02-23 05:00:19 [WARNING] SavedModel saved prior to TF 2.5 detected when loading Keras model. Please ensure that you are saving the model with model.save() or tf.keras.models.save_model(), NOT tf.saved_model.save(). To confirm, there should be a file named "keras_metadata.pb" in the SavedModel directory.
WARNING:tensorflow:SavedModel saved prior to TF 2.5 detected when loading Keras model. Please ensure that you are saving the model with model.save() or tf.keras.models.save_model(), NOT tf.saved_model.save(). To confirm, there should be a file named "keras_metadata.pb" in the SavedModel directory.
2022-02-23 05:01:14 [WARNING] SavedModel saved prior to TF 2.5 detected when loading Keras model. Please ensure that you are saving the model with model.save() or tf.keras.models.save_model(), NOT tf.saved_model.save(). To confirm, there should be a file named "keras_metadata.pb" in the SavedModel directory.
WARNING:tensorflow:SavedModel saved prior to TF 2.5 detected when loading Keras model. Please ensure that you are saving the model with model.save() or tf.keras.models.save_model(), NOT tf.saved_model.save(). To confirm, there should be a file named "keras_metadata.pb" in the SavedModel directory.
2022-02-23 05:02:09 [WARNING] SavedModel saved prior to TF 2.5 detected when loading Keras model. Please ensure that you are saving the model with model.save() or tf.keras.models.save_model(), NOT tf.saved_model.save(). To confirm, there should be a file named "keras_metadata.pb" in the SavedModel directory.
WARNING:tensorflow:SavedModel saved prior to TF 2.5 detected when loading Keras model. Please ensure that you are saving the model with model.save() or tf.keras.models.save_model(), NOT tf.saved_model.save(). To confirm, there should be a file named "keras_metadata.pb" in the SavedModel directory.
2022-02-23 05:03:05 [WARNING] SavedModel saved prior to TF 2.5 detected when loading Keras model. Please ensure that you are saving the model with model.save() or tf.keras.models.save_model(), NOT tf.saved_model.save(). To confirm, there should be a file named "keras_metadata.pb" in the SavedModel directory.
2022-02-23 05:04:00 [WARNING] Found possible input node names: ['InputLayer'], output node names: ['dense_3'].
WARNING:tensorflow:SavedModel saved prior to TF 2.5 detected when loading Keras model. Please ensure that you are saving the model with model.save() or tf.keras.models.save_model(), NOT tf.saved_model.save(). To confirm, there should be a file named "keras_metadata.pb" in the SavedModel directory.
2022-02-23 05:04:01 [WARNING] SavedModel saved prior to TF 2.5 detected when loading Keras model. Please ensure that you are saving the model with model.save() or tf.keras.models.save_model(), NOT tf.saved_model.save(). To confirm, there should be a file named "keras_metadata.pb" in the SavedModel directory.
WARNING:tensorflow:SavedModel saved prior to TF 2.5 detected when loading Keras model. Please ensure that you are saving the model with model.save() or tf.keras.models.save_model(), NOT tf.saved_model.save(). To confirm, there should be a file named "keras_metadata.pb" in the SavedModel directory.
2022-02-23 05:04:57 [WARNING] SavedModel saved prior to TF 2.5 detected when loading Keras model. Please ensure that you are saving the model with model.save() or tf.keras.models.save_model(), NOT tf.saved_model.save(). To confirm, there should be a file named "keras_metadata.pb" in the SavedModel directory.
WARNING:tensorflow:SavedModel saved prior to TF 2.5 detected when loading Keras model. Please ensure that you are saving the model with model.save() or tf.keras.models.save_model(), NOT tf.saved_model.save(). To confirm, there should be a file named "keras_metadata.pb" in the SavedModel directory.
2022-02-23 05:05:52 [WARNING] SavedModel saved prior to TF 2.5 detected when loading Keras model. Please ensure that you are saving the model with model.save() or tf.keras.models.save_model(), NOT tf.saved_model.save(). To confirm, there should be a file named "keras_metadata.pb" in the SavedModel directory.
WARNING:tensorflow:SavedModel saved prior to TF 2.5 detected when loading Keras model. Please ensure that you are saving the model with model.save() or tf.keras.models.save_model(), NOT tf.saved_model.save(). To confirm, there should be a file named "keras_metadata.pb" in the SavedModel directory.
2022-02-23 05:06:47 [WARNING] SavedModel saved prior to TF 2.5 detected when loading Keras model. Please ensure that you are saving the model with model.save() or tf.keras.models.save_model(), NOT tf.saved_model.save(). To confirm, there should be a file named "keras_metadata.pb" in the SavedModel directory.
2022-02-23 05:07:43 [WARNING] Found possible input node names: ['InputLayer'], output node names: ['dense_3'].
WARNING:tensorflow:SavedModel saved prior to TF 2.5 detected when loading Keras model. Please ensure that you are saving the model with model.save() or tf.keras.models.save_model(), NOT tf.saved_model.save(). To confirm, there should be a file named "keras_metadata.pb" in the SavedModel directory.
2022-02-23 05:07:44 [WARNING] SavedModel saved prior to TF 2.5 detected when loading Keras model. Please ensure that you are saving the model with model.save() or tf.keras.models.save_model(), NOT tf.saved_model.save(). To confirm, there should be a file named "keras_metadata.pb" in the SavedModel directory.
2022-02-23 05:08:41.231079: I tensorflow/core/grappler/devices.cc:75] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0 (Note: TensorFlow was not compiled with CUDA or ROCm support)
2022-02-23 05:08:41.231264: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
2022-02-23 05:08:41.254906: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:1149] Optimization results for grappler item: graph_to_optimize
function_optimizer: function_optimizer did nothing. time = 0.013ms.
function_optimizer: function_optimizer did nothing. time = 0.001ms.

2022-02-23 05:08:46.726701: I tensorflow/core/grappler/devices.cc:75] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0 (Note: TensorFlow was not compiled with CUDA or ROCm support)
2022-02-23 05:08:46.726891: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
2022-02-23 05:08:46.895748: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:1149] Optimization results for grappler item: tf_graph
constant_folding: Graph size after: 1118 nodes (-612), 1787 edges (-612), time = 63.491ms.
constant_folding: Graph size after: 1118 nodes (0), 1787 edges (0), time = 53.424ms.

2022-02-23 05:09:21 [INFO] Pass Quantization elapsed time: 33568.48 ms
2022-02-23 05:10:22 [INFO] Pass QuantizedRNNConverter elapsed time: 76.27 ms
2022-02-23 05:10:26 [INFO] Pass StripUnusedNodesOptimizer elapsed time: 143.55 ms
2022-02-23 05:10:26 [INFO] Pass RemoveTrainingNodesOptimizer elapsed time: 57.56 ms
2022-02-23 05:10:26 [INFO] Pass FoldBatchNormNodesOptimizer elapsed time: 63.12 ms
2022-02-23 05:10:26 [INFO] Pass MetaOpOptimizer elapsed time: 28.79 ms
2022-02-23 05:10:27 [INFO] Pass PostCseOptimizer elapsed time: 403.61 ms
2022-02-23 05:10:28 [INFO] |********Mixed Precision Statistics*******|
2022-02-23 05:10:28 [INFO] +---------------+---------+-------+-------+
2022-02-23 05:10:28 [INFO] |    Op Type    |  Total  |  INT8 |  FP32 |
2022-02-23 05:10:28 [INFO] +---------------+---------+-------+-------+
2022-02-23 05:10:28 [INFO] |     Conv2D    |   120   |  120  |   0   |
2022-02-23 05:10:28 [INFO] |     MatMul    |    4    |   4   |   0   |
2022-02-23 05:10:28 [INFO] |    ConcatV2   |    58   |   0   |   58  |
2022-02-23 05:10:28 [INFO] |    MaxPool    |    1    |   1   |   0   |
2022-02-23 05:10:28 [INFO] |    AvgPool    |    3    |   3   |   0   |
2022-02-23 05:10:28 [INFO] |   QuantizeV2  |    64   |   64  |   0   |
2022-02-23 05:10:28 [INFO] |   Dequantize  |    64   |   64  |   0   |
2022-02-23 05:10:28 [INFO] +---------------+---------+-------+-------+
2022-02-23 05:10:28 [INFO] Pass quantize model elapsed time: 609884.64 ms
2022-02-23 05:10:28 [INFO] Tune 1 result is: [Accuracy (None|fp32): 1.0000|1.0000, Duration (seconds) (None|fp32): 0.0000|0.0001], Best tune result is: [Accuracy: 1.0000, Duration (seconds): 0.0000]
2022-02-23 05:10:28 [INFO] Save tuning history to /workspaces/Neural_Compressor/nc_workspace/2022-02-23_04-51-36/./history.snapshot.
2022-02-23 05:10:29 [INFO] Specified timeout or max trials is reached! Found a quantized model which meet accuracy goal. Exit.
2022-02-23 05:10:29 [INFO] Save deploy yaml to /workspaces/Neural_Compressor/nc_workspace/2022-02-23_04-51-36/deploy.yaml
2022-02-23 05:10:29.169157: I tensorflow/core/common_runtime/process_util.cc:146] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/tensorflow/python/saved_model/signature_def_utils_impl.py:207: build_tensor_info (from tensorflow.python.saved_model.utils_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.utils.build_tensor_info or tf.compat.v1.saved_model.build_tensor_info.
2022-02-23 05:10:29 [WARNING] From /usr/local/lib/python3.8/dist-packages/tensorflow/python/saved_model/signature_def_utils_impl.py:207: build_tensor_info (from tensorflow.python.saved_model.utils_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.utils.build_tensor_info or tf.compat.v1.saved_model.build_tensor_info.
INFO:tensorflow:No assets to save.
2022-02-23 05:10:29 [INFO] No assets to save.
INFO:tensorflow:No assets to write.
2022-02-23 05:10:29 [INFO] No assets to write.
INFO:tensorflow:SavedModel written to: /workspaces/Neural_Compressor/models/int8/saved_model.pb
2022-02-23 05:10:30 [INFO] SavedModel written to: /workspaces/Neural_Compressor/models/int8/saved_model.pb
2022-02-23 05:10:30 [INFO] Save quantized model to /workspaces/Neural_Compressor/models/int8.

[Tensorflow] IndexError: list index (0) out of range

Environment: Google Colaboratory
Tensorflow (official): 2.6.0
Neural Compressor: 1.7.2

Hi, I was quantizing a pre-trained Keras model (also in Tensorflow 2.6) with the post-training quantization method when the following error occurred:

2021-10-28 08:33:25 [INFO] Pass Quantization elapsed time: 9205.52 ms
2021-10-28 08:33:42 [INFO] Pass QuantizedRNNConverter elapsed time: 300.62 ms
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/neural_compressor/adaptor/tensorflow.py", line 462, in quantize
    data_loader=data_loader).convert()
  File "/usr/local/lib/python3.7/dist-packages/neural_compressor/adaptor/tf_utils/graph_converter.py", line 249, in convert
    post_cse_graph_def = PostCseOptimizer(model.graph_def).do_transformation()
AttributeError: 'NoneType' object has no attribute 'graph_def'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/neural_compressor/adaptor/tf_utils/graph_converter.py", line 571, in quantize
    self._fuse_requantize_with_fused_quantized_node()
  File "/usr/local/lib/python3.7/dist-packages/neural_compressor/adaptor/tf_utils/graph_converter.py", line 709, in _fuse_requantize_with_fused_quantized_node
    self._tmp_graph_def).do_transformation()
  File "/usr/local/lib/python3.7/dist-packages/neural_compressor/adaptor/tf_utils/graph_rewriter/int8/fuse_matmul_requantize.py", line 108, in do_transformation
    min_input_value = (min_input_node.attr['value'].tensor.float_val)[0]
IndexError: list index (0) out of range
2021-10-28 08:33:43 [ERROR] Fail to quantize graph due to list index (0) out of range.
2021-10-28 08:33:43 [ERROR] Unexpected exception AttributeError("'NoneType' object has no attribute 'graph_def'") happened during tuning.
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/neural_compressor/adaptor/tensorflow.py", line 462, in quantize
    data_loader=data_loader).convert()
  File "/usr/local/lib/python3.7/dist-packages/neural_compressor/adaptor/tf_utils/graph_converter.py", line 249, in convert
    post_cse_graph_def = PostCseOptimizer(model.graph_def).do_transformation()
AttributeError: 'NoneType' object has no attribute 'graph_def'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/neural_compressor/experimental/quantization.py", line 151, in execute
    self.strategy.traverse()
  File "/usr/local/lib/python3.7/dist-packages/neural_compressor/strategy/strategy.py", line 333, in traverse
    tune_cfg, self.model, self.calib_dataloader, self.q_func)
  File "/usr/local/lib/python3.7/dist-packages/neural_compressor/utils/utility.py", line 240, in fi
    res = func(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/neural_compressor/adaptor/tensorflow.py", line 475, in quantize
    data_loader=data_loader).convert()
  File "/usr/local/lib/python3.7/dist-packages/neural_compressor/adaptor/tf_utils/graph_converter.py", line 249, in convert
    post_cse_graph_def = PostCseOptimizer(model.graph_def).do_transformation()
AttributeError: 'NoneType' object has no attribute 'graph_def'
2021-10-28 08:33:43 [ERROR] Specified timeout or max trials is reached! Not found any quantized model which meet accuracy goal. Exit.
Traceback (most recent call last):
  File "main.py", line 115, in <module>
    evaluate_opt_graph.run()
  File "main.py", line 100, in run
    q_model.save(self.args.output_graph) 
AttributeError: 'NoneType' object has no attribute 'save'

The complete log is attached below.
error_log.txt

If helpful, please find the quantization config, pre-trained model and script here (please request access via email)

Thanks!

what should be included in Model()

Hi, I'd like to ask what I should pass to common.Model, specifically what type of object?
quantizer.model = common.Model('../models/saved_model')
There aren't many comments in the source code.

[BUG] Issue found when tuning ssd_mobilenet_V1

Summary:

The ConcatV2 node is not quantized in ssd-mobilenet-V1, but this is not updated in the tune_history config.

Description:

We tried to tune ssd_mobilenet_V1 using the model from https://github.com/intel/neural-compressor/tree/master/examples/tensorflow/object_detection#ssd_mobilenet_v1.
In the tuning config, we set the relative drop to be 0.003 as the following shows:

accuracy_criterion:
    relative:  0.003

The script we used to tune the model is:
bash run_tuning.sh --config=ssd_mobilenet_v1.yaml --input_model=./frozen_inference_graph.pb --output_model=./tensorflow-ssd_mobilenet_v1-tune.pb
The ssd_mobilenet_v1.yaml file used in the script is as attached:
ssd_mobilenet_v1.yaml.txt
We can see in the Mixed Precision statistics of the fully quantized model that ConcatV2 layers are all in FP32.

|**********Mixed Precision Statistics*********|
+-----------------------+-------+------+------+
|        Op Type        | Total | INT8 | FP32 |
+-----------------------+-------+------+------+
|         Conv2D        |   34  |  34  |  0   |
| DepthwiseConv2dNative |   13  |  13  |  0   |
|        ConcatV2       |  106  |  0   | 106  |
|       QuantizeV2      |   1   |  1   |  0   |
|       Dequantize      |   18  |  18  |  0   |
|          Cast         |   5   |  0   |  5   |
+-----------------------+-------+------+------+

This is because according to lpot/adaptor/tf_utils/quantize_graph/quantize_graph_concatv2.py, line 64, Concat can only be quantized when the inputs are Dequantize nodes.

    def _quantizable_concat(self, node):
        deq_type = []
        for input_node_name in node.input[:node.attr['N'].i]:
            node_name = helper.node_name_from_input(input_node_name)
            if self.node_name_mapping[node_name].node.op != "Dequantize":
                return False

In our experiment, the fully quantized model gave 22.90 mAP, which is about 0.9% lower than the 23.13 mAP baseline.
Then the tuner starts to revert layers from int8 back to fp32, because we set the target to be 0.3%. Supposedly, the ConcatV2 layers should be ignored because these layers are already fp32. But in the _find_tuning_history function in lpot/strategy/strategy.py:

    def _find_tuning_history(self, tune_cfg):
        """check if the specified tune_cfg is evaluated or not on same yaml config.

        Args:
            tune_cfg (dict): The tune_cfg to check if evaluated before.

        Returns:
            tuning_history or None: The tuning history containing evaluated tune_cfg.
        """
        for tuning_history in self.tuning_history:
            # only check if a tune_cfg is evaluated under same yam config, excluding
            # some fields in tuning section of yaml, such as tensorboard, snapshot, resume.
            if self._same_yaml(tuning_history['cfg'], self.cfg):
                for history in tuning_history['history']:
                    if history and history['tune_cfg'] == tune_cfg:
                        return tuning_history

        return None

The Concat nodes in the history config file are still in int8, so the tuner would evaluate the same config again and again; this would be rather time-consuming because there are 106 ConcatV2 nodes in ssd-mobilenet-V1. The detailed log file is as attached:
log.txt

examples/helloworld/tf_example2 seems to quantize on the test dataset

class Dataset(object):
    def __init__(self):
        (train_images, train_labels), (test_images,
                   test_labels) = keras.datasets.fashion_mnist.load_data()
        self.test_images = test_images.astype(np.float32) / 255.0
        self.labels = test_labels

    def __getitem__(self, index):
        return self.test_images[index], self.labels[index]

    def __len__(self):
        return len(self.test_images)

This code will return the test dataset only, so calibration runs on test data.
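
A minimal alternative sketch (my suggestion, not from the example) that calibrates on the training split instead:

import numpy as np
from tensorflow import keras

class CalibDataset(object):
    def __init__(self):
        # Use the training split for calibration; keep the test split for evaluation.
        (train_images, train_labels), _ = keras.datasets.fashion_mnist.load_data()
        self.images = train_images.astype(np.float32) / 255.0
        self.labels = train_labels

    def __getitem__(self, index):
        return self.images[index], self.labels[index]

    def __len__(self):
        return len(self.images)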

[Tensorflow] Question: PTQ and QAT

Hi, may I ask some questions based on my understanding of the source code please:

1. Conv2D

As far as I know, in post-training quantization, Conv2D supports both Conv2DBiasAddRelu and Conv2DBiasAddLeakyRelu through FuseNodeStartWithConv2d.apply_conv_biasadd_relu_fusion(...). However, the key difference is that with Leaky ReLU, quantized values cannot be directly passed to next quantized Conv2D due to the positive inputs constraint, so QuantizedConv2DWithBiasAndRelu will first dequantize to pass through Leaky ReLU, and then quantize again into next QuantizedConv2DWithBiasAndRelu.

So, if I have a quantization-aware trained model with the Conv2DBiasAddLeakyRelu pattern, is it also converted to a quantized model in the same manner? That is, regardless of the quantization method, in order to pass through Leaky ReLU, the predecessor node must first dequantize and the successor node must add a quantize input layer, is that correct?

2. LSTM

I noticed the following lines:

# FIXME We only quantize the MatMul op which second input node type is const. This is a
# workaround for RNN model like LTSM.
if weight_node.op != 'Const':
    self.output_graph = self.input_graph
    return []

Does this mean quantization for LSTM is currently not supported?

Thanks!

local variable 'tmp_model' referenced before assignment

I am using version 1.8.0 and I got the following error. Version 1.7.3 is fine.

Traceback (most recent call last):
  File "/home/zhentaoc/PycharmProjects/chronos-benchmark/forecaster/forecaster_tcn_altran_quant.py", line 148, in <module>
    quant_model = quantizer()
  File "/home/zhentaoc/anaconda3/envs/test/lib/python3.7/site-packages/neural_compressor/experimental/quantization.py", line 212, in __call__
    return super(Quantization, self).__call__()
  File "/home/zhentaoc/anaconda3/envs/test/lib/python3.7/site-packages/neural_compressor/experimental/component.py", line 208, in __call__
    self.pre_process()
  File "/home/zhentaoc/anaconda3/envs/test/lib/python3.7/site-packages/neural_compressor/experimental/quantization.py", line 144, in pre_process
    self.hooks)
  File "/home/zhentaoc/anaconda3/envs/test/lib/python3.7/site-packages/neural_compressor/strategy/basic.py", line 86, in __init__
    q_hooks)
  File "/home/zhentaoc/anaconda3/envs/test/lib/python3.7/site-packages/neural_compressor/strategy/strategy.py", line 189, in __init__
    self.capability = self.adaptor.query_fw_capability(model)
  File "/home/zhentaoc/anaconda3/envs/test/lib/python3.7/site-packages/neural_compressor/utils/utility.py", line 240, in fi
    res = func(*args, **kwargs)
  File "/home/zhentaoc/anaconda3/envs/test/lib/python3.7/site-packages/neural_compressor/adaptor/pytorch.py", line 806, in query_fw_capability
    tmp_model = self.fuse_fx_model(model)
  File "/home/zhentaoc/anaconda3/envs/test/lib/python3.7/site-packages/neural_compressor/adaptor/pytorch.py", line 844, in fuse_fx_model
    graph_module = GraphModule(tmp_model, tracer.trace(tmp_model))
UnboundLocalError: local variable 'tmp_model' referenced before assignment

[ONNX] Fails to quantize dynamic input model

Environment: Google Colab
LPOT Ver: 1.5.1
ONNX Ver: 1.10.1
ONNXRuntime Ver: 1.8.1

I originally built the model as a Keras saved model, then used tf2onnx to convert it to ONNX format.

My model originally has two inputs of static shapes:

(64, 60, 257)
(64, 257, 60, 1)

which can be successfully quantized with LPOT.

But when I set some dimensions to be dynamic as follows:

(-1, -1, 257)
(-1, 257, -1, 1)

and use it to run quantization with LPOT, the program ends with the following error:

2021-08-05 09:48:08.018023: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-08-05 09:48:10 [INFO] Getting FP32 model baseline...
tcmalloc: large alloc 1073741824 bytes == 0x5599ab11e000 @  0x7fa8269b5b6b 0x7fa8269d5379 0x7fa75458534c 0x7fa7545822f4 0x7fa75453c7d1 0x7fa7545417b2 0x7fa7543b0b9f 0x7fa75456af3a 0x7fa75456c904 0x7fa754559d64 0x7fa75461e108 0x7fa7545294e7 0x7fa754535190 0x7fa75486e3c8 0x7fa75486eb75 0x7fa7548bd1e9 0x7fa7548e6b38 0x55991200cbf8 0x5599120806f2 0x55991207ac35 0x55991200d73a 0x55991207bd67 0x55991207b235 0x55991200d73a 0x55991207bd67 0x55991207b235 0x55991200d73a 0x55991207bb0e 0x55991200d65a 0x55991207bd67 0x55991200d65a
tcmalloc: large alloc 2147483648 bytes == 0x5599eb11e000 @  0x7fa8269b5b6b 0x7fa8269d5379 0x7fa75458534c 0x7fa7545822f4 0x7fa7545713a8 0x7fa7545290d7 0x7fa754535190 0x7fa75486e3c8 0x7fa75486eb75 0x7fa7548bd1e9 0x7fa7548e6b38 0x55991200cbf8 0x5599120806f2 0x55991207ac35 0x55991200d73a 0x55991207bd67 0x55991207b235 0x55991200d73a 0x55991207bd67 0x55991207b235 0x55991200d73a 0x55991207bb0e 0x55991200d65a 0x55991207bd67 0x55991200d65a 0x55991207bd67 0x55991200d65a 0x55991207bd67 0x55991200db99 0x559912050e79 0x55991200c7b2
2021-08-05 09:51:48 [INFO] Save tuning history to /content/drive/My Drive/Work/model_quantization/lpot_files/onnx/lpot_workspace/2021-08-05_09-47-51/./history.snapshot
2021-08-05 09:51:48 [INFO] FP32 baseline is: [accuracy: 0.2009, duration (seconds): 217.5703]
tcmalloc: large alloc 2147483648 bytes == 0x5599eb11e000 @  0x7fa8269b5b6b 0x7fa8269d5379 0x7fa75458534c 0x7fa7545822f4 0x7fa75453c7d1 0x7fa7545417b2 0x7fa7543b0b9f 0x7fa75456af3a 0x7fa75456c904 0x7fa754559c68 0x7fa7545e65ef 0x7fa7545294e7 0x7fa754535190 0x7fa75486e3c8 0x7fa75486eb75 0x7fa7548bd1e9 0x7fa7548e6b38 0x55991200cbf8 0x5599120806f2 0x55991207ac35 0x55991200d73a 0x55991207bd67 0x55991207b235 0x55991200d73a 0x55991207bd67 0x55991207ac35 0x55991200d73a 0x55991207bd67 0x55991207b235 0x55991200d73a 0x55991207bd67
tcmalloc: large alloc 4294967296 bytes == 0x559a6b11e000 @  0x7fa8269b5b6b 0x7fa8269d5379 0x7fa75458534c 0x7fa7545822f4 0x7fa75453c7d1 0x7fa7545417b2 0x7fa7543b0b9f 0x7fa75456af3a 0x7fa75456c904 0x7fa754559d64 0x7fa754642684 0x7fa75464a70b 0x7fa7545294e7 0x7fa754535190 0x7fa75486e3c8 0x7fa75486eb75 0x7fa7548bd1e9 0x7fa7548e6b38 0x55991200cbf8 0x5599120806f2 0x55991207ac35 0x55991200d73a 0x55991207bd67 0x55991207b235 0x55991200d73a 0x55991207bd67 0x55991207ac35 0x55991200d73a 0x55991207bd67 0x55991207b235 0x55991200d73a
Traceback (most recent call last):
  File "main.py", line 164, in <module>
    q_model = quantize() 
  File "/usr/local/lib/python3.7/dist-packages/lpot/experimental/quantization.py", line 177, in __call__
    self.strategy.traverse()
  File "/usr/local/lib/python3.7/dist-packages/lpot/strategy/strategy.py", line 310, in traverse
    tune_cfg, self.model, self.calib_dataloader, self.q_func)
  File "/usr/local/lib/python3.7/dist-packages/lpot/utils/utility.py", line 200, in fi
    res = func(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/lpot/adaptor/onnxrt.py", line 106, in quantize
    quantizer.quantize_model()
  File "/usr/local/lib/python3.7/dist-packages/lpot/adaptor/ox_utils/onnx_quantizer.py", line 204, in quantize_model
    op_quantizer.quantize()
  File "/usr/local/lib/python3.7/dist-packages/lpot/adaptor/ox_utils/operators/split.py", line 32, in quantize
    self.quantizer.quantize_inputs(node, [0])
  File "/usr/local/lib/python3.7/dist-packages/lpot/adaptor/ox_utils/onnx_quantizer.py", line 759, in quantize_inputs
    self.config[node.name]['activation']['dtype'])
  File "/usr/local/lib/python3.7/dist-packages/lpot/adaptor/ox_utils/onnx_quantizer.py", line 528, in _get_quantize_input_nodes
    of nodes to be quantized are required.".format(input_name))
ValueError: Quantization parameters are not specified for param StatefulPartitionedCall/model/32/Conv2D/SpaceToBatchND_reshape__132:0. In static mode quantization params for inputs and outputs of nodes to be quantized are required.

By comparing the static-shape and dynamic-shape models, I also found that the number of operations in the dynamic model is much higher, as the conversion adds a lot of reshape/resize-related nodes to deal with the unknown dimensions. The node that triggers the error, StatefulPartitionedCall/model/32/Conv2D/SpaceToBatchND_reshape__132:0, seems to be one of those added for dynamic input support.

Timeout issue when quantizing BERT large squad model

I have been following these steps to quantize the BERT large SQuAD model - https://github.com/intel/lpot/tree/master/examples/tensorflow/nlp/bert_large_squad.

While running the below command for quantization:
python tune_squad.py --config=./bert.yaml --input_model=./bert_fp32.pb --output_model=./int8.pb --tune

I see a timeout/crash with the following message:
[ERROR] Specified timeout or max trials is reached! Not found any quantized model which meet accuracy goal.

Error log - bert_quantization_error.txt
yaml config file I am using - bert.yaml.txt

PyTorch PTQ does not save model file by default

Post-training quantization for PyTorch saves the state dict (a collections.OrderedDict) instead of the full model file.
This prevents conversion to other frameworks like ONNX after quantization.
Is there a way to force saving the full model file?
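
For reference, here is a sketch of the 1.x save/reload round trip as I understand it (q_model is the object returned by the quantizer and fp32_model the original FP32 module; both are assumed from context):

from neural_compressor.utils.pytorch import load

# save() writes best_configure.yaml plus best_model_weights.pt (the state
# dict) into the directory, not a pickled nn.Module.
q_model.save("./saved_results")

# Later: rebuild the quantized module from the FP32 model definition plus
# the saved directory, then export/convert from this reloaded module.
int8_model = load("./saved_results", fp32_model)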

AssertionError: Framework is not detected correctly from model format.

When I use INC to quantize my TensorFlow 2.x saved_model, the following error occurs:

from neural_compressor.experimental import Quantization, common
quantizer = Quantization()
quantizer.model = "/data/home/dcn3_bt_2/models_new/origin_model"
q_model = quantizer.fit()

output:

2022-03-22 12:39:30 [WARNING] Force convert framework model to neural_compressor model.
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-36-ab40d8ad3e38> in <module>
      1 from neural_compressor.experimental import Quantization, common
      2 quantizer = Quantization()
----> 3 quantizer.model = "/data/home/dcn3_bt_2/models_new/origin_model"
      4 q_model = quantizer.fit()

~/anaconda3/lib/python3.8/site-packages/neural_compressor/experimental/component.py in model(self, user_model)
    358         if not isinstance(user_model, BaseModel):
    359             logger.warning("Force convert framework model to neural_compressor model.")
--> 360             self._model = Model(user_model)
    361         else:
    362             self._model = user_model

~/anaconda3/lib/python3.8/site-packages/neural_compressor/experimental/common/model.py in __new__(cls, root, **kwargs)
     39         """
     40         backend = get_backend()
---> 41         framework = get_model_fwk_name(root)
     42 
     43         if backend == 'engine':

~/anaconda3/lib/python3.8/site-packages/neural_compressor/model/model.py in get_model_fwk_name(model)
    199         if fwk_name != 'NA':
    200             break
--> 201     assert fwk_name != 'NA', 'Framework is not detected correctly from model format.'
    202 
    203     return fwk_name

AssertionError: Framework is not detected correctly from model format.
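
In case it helps: framework detection works off the model path contents, so a quick sanity check (my own sketch, not an INC API) is whether the directory actually has a SavedModel layout:

import os

model_dir = "/data/home/dcn3_bt_2/models_new/origin_model"
print(os.listdir(model_dir))
# TensorFlow auto-detection expects a SavedModel layout here:
# a saved_model.pb file plus a variables/ sub-directory.
assert "saved_model.pb" in os.listdir(model_dir), "not a SavedModel directory"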

Specify the saved model signature key

I'm trying to use the quantizer with a saved model and I'm running into a KeyError for the signature serving_default. How do I specify a different signature key?

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-64-46277b78cb73> in <module>
      1 quantizer.metric = common.Metric(metric_cls=Accuracy, name="BERT_metric")
----> 2 q_model = quantizer()

/usr/local/lib/python3.8/dist-packages/neural_compressor/experimental/quantization.py in __call__(self)
    210 
    211         """
--> 212         return super(Quantization, self).__call__()
    213 
    214     def dataset(self, dataset_type, *args, **kwargs):

/usr/local/lib/python3.8/dist-packages/neural_compressor/experimental/component.py in __call__(self)
    204 
    205     def __call__(self):
--> 206         self.pre_process()
    207         results = self.execute()
    208         self.post_process()

/usr/local/lib/python3.8/dist-packages/neural_compressor/experimental/quantization.py in pre_process(self)
    134                 _resume = pickle.load(f).__dict__
    135 
--> 136         self.strategy = STRATEGIES[strategy](
    137             self._model,
    138             self.conf,

/usr/local/lib/python3.8/dist-packages/neural_compressor/strategy/basic.py in __init__(self, model, conf, q_dataloader, q_func, eval_dataloader, eval_func, dicts, q_hooks)
     74     def __init__(self, model, conf, q_dataloader, q_func=None,
     75                  eval_dataloader=None, eval_func=None, dicts=None, q_hooks=None):
---> 76         super(
     77             BasicTuneStrategy,
     78             self).__init__(

/usr/local/lib/python3.8/dist-packages/neural_compressor/strategy/strategy.py in __init__(self, model, conf, q_dataloader, q_func, eval_dataloader, eval_func, resume, q_hooks)
    183         self.objective = OBJECTIVES[objective](self.cfg.tuning.accuracy_criterion)
    184 
--> 185         self.capability = self.adaptor.query_fw_capability(model)
    186         self.graph_optimization_mode = bool('graph_optimization' in self.cfg)
    187 

/usr/local/lib/python3.8/dist-packages/neural_compressor/adaptor/tensorflow.py in query_fw_capability(self, model)
    569         from .tf_utils.graph_rewriter.generic.pre_optimize import PreOptimization
    570 
--> 571         self.pre_optimizer_handle = PreOptimization(model, self.optimization)
    572 
    573         self.pre_optimized_model = self.pre_optimizer_handle.get_optimized_model()

/usr/local/lib/python3.8/dist-packages/neural_compressor/adaptor/tf_utils/graph_rewriter/generic/pre_optimize.py in __init__(self, model, optimization)
     47 
     48         self.analyzer = GraphAnalyzer()
---> 49         self.analyzer.graph = model.graph_def
     50         self.analyzer.parse_graph()
     51         self._tmp_graph_def = None

/usr/local/lib/python3.8/dist-packages/neural_compressor/model/model.py in graph_def(self)
    675     @property
    676     def graph_def(self):
--> 677         return self.graph.as_graph_def()
    678 
    679     @property

/usr/local/lib/python3.8/dist-packages/neural_compressor/model/model.py in graph(self)
    692     @property
    693     def graph(self):
--> 694         return self.sess.graph
    695 
    696     @graph_def.setter

/usr/local/lib/python3.8/dist-packages/neural_compressor/model/model.py in sess(self)
    687     def sess(self):
    688         if self._sess is None:
--> 689             self._load_sess(self._model, **self.kwargs)
    690         return self._sess
    691 

/usr/local/lib/python3.8/dist-packages/neural_compressor/model/model.py in _load_sess(self, model, **kwargs)
    711             kwargs.update({'name': self.name})
    712         # assert self.model_type, 'model type not set....'
--> 713         output_sess = SESSIONS[self.model_type](model,
    714                                                 self._input_tensor_names, \
    715                                                 self._output_tensor_names,

/usr/local/lib/python3.8/dist-packages/neural_compressor/model/model.py in saved_model_session(model, input_tensor_names, output_tensor_names, **kwargs)
    566         from tensorflow.core.protobuf import meta_graph_pb2
    567         _saved_model = load.load(model, [tag_constants.SERVING])
--> 568         func = _saved_model.signatures[signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY]
    569         frozen_func = convert_variables_to_constants_v2(func)
    570         grappler_meta_graph_def = saver.export_meta_graph(

/usr/local/lib/python3.8/dist-packages/tensorflow/python/saved_model/signature_serialization.py in __getitem__(self, key)
    245 
    246   def __getitem__(self, key):
--> 247     return self._signatures[key]
    248 
    249   def __iter__(self):

KeyError: 'serving_default'

This is with neural-compressor version 1.7.
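
One workaround to consider (a sketch using plain TensorFlow APIs; 'my_signature' and the paths are placeholders) is to re-export the SavedModel so that the signature you want becomes serving_default:

import tensorflow as tf

loaded = tf.saved_model.load("path/to/saved_model")   # placeholder path
print(list(loaded.signatures.keys()))                 # see which keys exist

sig = loaded.signatures["my_signature"]               # placeholder key
tf.saved_model.save(loaded, "path/to/resaved_model",
                    signatures={"serving_default": sig})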

AttributeError: 'list' object has no attribute 'shape'

LPOT Version: 1.4
Installation: source
Environment: Google Colab
Framework: ONNX (as specified in requirements.txt in the example)

I used a common dataloader to load my own dataset in main.py (using numpy arrays to store the data), as below:

    tuneDataset = Dataset('tune')
    evalDataset = Dataset('eval')
    
    quantize = Quantization(args.config)
    quantize.model = common.Model(model)
    quantize.calib_dataloader = common.DataLoader(tuneDataset, batch_size=64, last_batch='drop')
    quantize.eval_dataloader = common.DataLoader(evalDataset, batch_size=64, last_batch='drop')
    q_model = quantize()
    q_model.save(args.output_model)

At the line q_model = quantize(), while getting the FP32 model baseline, the following error occurred:

==========================================
Traceback (most recent call last):
  File "main.py", line 112, in <module>
    q_model = quantize()
  File "/usr/local/lib/python3.7/dist-packages/lpot-1.4.1-py3.7.egg/lpot/experimental/quantization.py", line 176, in __call__
    self.strategy.traverse()
  File "/usr/local/lib/python3.7/dist-packages/lpot-1.4.1-py3.7.egg/lpot/strategy/strategy.py", line 286, in traverse
    self.baseline = self._evaluate(self.model)
  File "/usr/local/lib/python3.7/dist-packages/lpot-1.4.1-py3.7.egg/lpot/strategy/strategy.py", line 424, in _evaluate
    val = self.objective.evaluate(eval_func, model)
  File "/usr/local/lib/python3.7/dist-packages/lpot-1.4.1-py3.7.egg/lpot/objective.py", line 213, in evaluate
    acc = eval_func(model)
  File "/usr/local/lib/python3.7/dist-packages/lpot-1.4.1-py3.7.egg/lpot/utils/create_obj_from_config.py", line 131, in eval_func
    tensorboard, fp32_baseline)
  File "/usr/local/lib/python3.7/dist-packages/lpot-1.4.1-py3.7.egg/lpot/adaptor/onnxrt.py", line 419, in evaluate
    metric.update(predictions, labels)
  File "/usr/local/lib/python3.7/dist-packages/lpot-1.4.1-py3.7.egg/lpot/experimental/metric/metric.py", line 491, in update
    preds, labels = _shape_validate(preds, labels)
  File "/usr/local/lib/python3.7/dist-packages/lpot-1.4.1-py3.7.egg/lpot/experimental/metric/metric.py", line 268, in _shape_validate
    of labels {} does not match shape of predictions {}'.format(labels.shape, preds.shape)
AttributeError: 'list' object has no attribute 'shape'

Supported cases for common capabilities and common patterns in TensorFlow.yaml

Hi,

I have a couple of questions to help me understand the quantization flow and the supported configurations in the TensorFlow.yaml file provided.

dnnl_verbose,exec,cpu,reorder,jit:uni,undef,src_f32::blocked:acdb:f0 dst_s8::blocked:acdb:f0,oscale:0:0.840726;,,1x224x224x3,0.032959
dnnl_verbose,exec,cpu,convolution,jit_int8:avx512_core,forward_training,src_s8::blocked:acdb:f0 wei_s8:p:blocked:ABcd4b16a4b:f3 bia_f32::blocked:a:f0 dst_u8::blocked:acdb:f0,oscale:2;post_ops:'eltwise_relu;';,alg:convolution_direct,mb1_ic3oc64_ih224oh112kh7sh2dh0ph3_iw224ow112kw7sw2dw0pw3,0.184082
dnnl_verbose,exec,cpu,pooling,jit_int:avx512_core,forward_inference,src_u8::blocked:acdb:f0 dst_u8::blocked:acdb:f0 ws_undef::undef::f0,,alg:pooling_max,mb1ic64_ih112oh56kh3dh0sh2ph0_iw112ow56kw3dw0sw2pw0,0.0258789
dnnl_verbose,exec,cpu,convolution,jit_int8_1x1:avx512_core,forward_training,src_u8::blocked:acdb:f0 wei_s8::blocked:ABcd4b16a4b:f0 bia_s32::blocked:a:f0 dst_s8::blocked:acdb:f0,oscale:2;,alg:convolution_direct,mb1_ic64oc256_ih56oh56kh1sh1dh0ph0_iw56ow56kw1sw1dw0pw0,0.0419922
dnnl_verbose,exec,cpu,convolution,jit_int8_1x1:avx512_core,forward_training,src_u8::blocked:acdb:f0 wei_s8::blocked:ABcd4b16a4b:f0 bia_s32::blocked:a:f0 dst_u8::blocked:acdb:f0,oscale:2;post_ops:'eltwise_relu;';,alg:convolution_direct,mb1_ic64oc64_ih56oh56kh1sh1dh0ph0_iw56ow56kw1sw1dw0pw0,0.0249023

How to use the quantised model for inference

I was able to quantise and save a Hugging Face model with my custom data. How do I use the model for inference? Could anybody help with how to load and use the quantised model with something like the Trainer / pipeline APIs?

I have two files in the directory to which I saved my quantised model, best_configure.yaml and best_model_weights.pt. How do I use these files for inference?
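
For what it's worth, in INC 1.x these two files are typically consumed by the utility loader. A minimal sketch (the model name and directory are illustrative, and the FP32 architecture must match the one that was quantized):

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from neural_compressor.utils.pytorch import load

model_name = "distilbert-base-uncased"  # illustrative: the original FP32 model
fp32_model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# The directory holds best_configure.yaml and best_model_weights.pt.
int8_model = load("./quantized_model_dir", fp32_model)
int8_model.eval()

inputs = tokenizer("This movie was great!", return_tensors="pt")
with torch.no_grad():
    outputs = int8_model(**inputs)
print(outputs)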

[Tensorflow] Fail to quantize graph due to list index (0) out of range.

Environment: Google Colab
LPOT: 1.6
Tensorflow: official 2.6.0

Please find my model here

I am trying to quantize a Keras model, but in the middle of quantization it prints the following error:

...
2021-08-30 01:27:42 [INFO] Unknown fusion pattern Conv2DRelu.
2021-08-30 01:27:42 [INFO] Unknown fusion pattern Conv2DRelu.
2021-08-30 01:27:42 [INFO] Unknown fusion pattern Conv2DRelu.
2021-08-30 01:27:43 [INFO] Unknown fusion pattern Conv2DRelu.
2021-08-30 01:27:44 [INFO] Pass Quantization elapsed time: 2318.58 ms
2021-08-30 01:28:14 [INFO] Pass QuantizedRNNConverter elapsed time: 55.02 ms
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/lpot/adaptor/tf_utils/graph_converter.py", line 572, in quantize
    self._fuse_requantize_with_fused_quantized_node()
  File "/usr/local/lib/python3.7/dist-packages/lpot/adaptor/tf_utils/graph_converter.py", line 705, in _fuse_requantize_with_fused_quantized_node
    self._tmp_graph_def).do_transformation()
  File "/usr/local/lib/python3.7/dist-packages/lpot/adaptor/tf_utils/graph_rewriter/int8/fuse_matmul_requantize.py", line 106, in do_transformation
    min_input_value = (min_input_node.attr['value'].tensor.float_val)[0]
IndexError: list index (0) out of range
2021-08-30 01:28:14 [ERROR] Fail to quantize graph due to list index (0) out of range.
2021-08-30 01:28:14 [ERROR] Unexpected exception AttributeError("'NoneType' object has no attribute 'graph_def'") happened during tuning.
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/lpot/experimental/quantization.py", line 140, in execute
    self.strategy.traverse()
  File "/usr/local/lib/python3.7/dist-packages/lpot/strategy/strategy.py", line 329, in traverse
    tune_cfg, self.model, self.calib_dataloader, self.q_func)
  File "/usr/local/lib/python3.7/dist-packages/lpot/utils/utility.py", line 201, in fi
    res = func(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/lpot/adaptor/tensorflow.py", line 350, in quantize
    data_loader=data_loader).convert()
  File "/usr/local/lib/python3.7/dist-packages/lpot/adaptor/tf_utils/graph_converter.py", line 249, in convert
    post_cse_graph_def = PostCseOptimizer(model.graph_def).do_transformation()
AttributeError: 'NoneType' object has no attribute 'graph_def'
2021-08-30 01:28:14 [ERROR] Specified timeout or max trials is reached! Not found any quantized model which meet accuracy goal. Exit.
Traceback (most recent call last):
  File "main.py", line 119, in <module>
    evaluate_opt_graph.run()
  File "main.py", line 104, in run
    q_model.save(self.args.output_graph) 
AttributeError: 'NoneType' object has no attribute 'save'

Inside lpot/adaptor/tf_utils/graph_rewriter/int8/fuse_matmul_requantize.py, I printed out the node that caused the error:

### max_filter_node ###
name: "model/dense_4/Tensordot/ReadVariableOp_max"
op: "Const"
attr {
  key: "dtype"
  value {
    type: DT_FLOAT
  }
}
attr {
  key: "value"
  value {
    tensor {
      dtype: DT_FLOAT
      tensor_shape {
      }
      float_val: 17.38440704345703
    }
  }
}

### max_input_node ###
name: "model/dense_4/Tensordot/MatMul_eightbit_max_model/dense_4/Tensordot/Reshape/frozen_max_only"
op: "Const"
attr {
  key: "dtype"
  value {
    type: DT_FLOAT
  }
}
attr {
  key: "value"
  value {
    tensor {
      dtype: DT_FLOAT
      tensor_shape {
      }
      float_val: 49.27424240112305
    }
  }
}

### min_input_node ###
name: "model/dense_4/Tensordot/MatMul_eightbit_min_model/dense_4/Tensordot/Reshape"
op: "Min"
input: "model/dense_4/Tensordot/MatMul_eightbit_reshape_model/dense_4/Tensordot/Reshape"
input: "model/dense_4/Tensordot/MatMul_eightbit_reduction_dims"
attr {
  key: "T"
  value {
    type: DT_FLOAT
  }
}
attr {
  key: "Tidx"
  value {
    type: DT_INT32
  }
}
attr {
  key: "keep_dims"
  value {
    b: false
  }
}

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/lpot/adaptor/tf_utils/graph_converter.py", line 572, in quantize
    self._fuse_requantize_with_fused_quantized_node()
  File "/usr/local/lib/python3.7/dist-packages/lpot/adaptor/tf_utils/graph_converter.py", line 705, in _fuse_requantize_with_fused_quantized_node
    self._tmp_graph_def).do_transformation()
  File "/usr/local/lib/python3.7/dist-packages/lpot/adaptor/tf_utils/graph_rewriter/int8/fuse_matmul_requantize.py", line 114, in do_transformation
    min_input_value = (min_input_node.attr['value'].tensor.float_val)[0]
IndexError: list index (0) out of range

It looks like the min input node here is a runtime Min op computed over the Reshape output rather than a frozen Const, so its value tensor has no float_val to index.
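
A defensive variant of the failing lines (a sketch on my part, assuming it sits in the same per-match loop of do_transformation) would skip the fusion instead of crashing when these inputs are not frozen Consts:

float_vals = min_input_node.attr['value'].tensor.float_val
if min_input_node.op != 'Const' or len(float_vals) == 0:
    # Runtime Min/Max over the input rather than a frozen Const:
    # skip this fusion candidate instead of raising IndexError.
    continue
min_input_value = float_vals[0]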

Missing classes in the coco label map

Please add the rest of the classes in the coco label map: https://github.com/intel/neural-compressor/blob/master/neural_compressor/experimental/metric/coco_label_map.py

The missing classes are breaking any code that tries to parse a coco dataset.

You can use the following list as a reference.
https://gist.github.com/iitzco/3b2ee634a12f154be6e840308abfcab5

To reproduce the error, just try to compile a model using something like:
COCORaw:
  root: datasets/COCO
  img_dir: train2017
  anno_dir: annotations/instances_train2017.json

You'll get a KeyError exception for the first missing class the code finds in your annotation .json. If you edit coco_label_map.py and replace the dict with the one referenced above, it solves the problem.

Traceback (most recent call last):
  File "run.py", line 4, in <module>
    quantized_model = quantizer.fit()
  File "/home/ec2-user/anaconda3/envs/amazonei_tensorflow_p36/lib/python3.6/site-packages/neural_compressor/experimental/quantization.py", line 212, in __call__
    return super(Quantization, self).__call__()
  File "/home/ec2-user/anaconda3/envs/amazonei_tensorflow_p36/lib/python3.6/site-packages/neural_compressor/experimental/component.py", line 214, in __call__
    self.pre_process()
  File "/home/ec2-user/anaconda3/envs/amazonei_tensorflow_p36/lib/python3.6/site-packages/neural_compressor/experimental/quantization.py", line 120, in pre_process
    self._create_eval_dataloader(cfg)
  File "/home/ec2-user/anaconda3/envs/amazonei_tensorflow_p36/lib/python3.6/site-packages/neural_compressor/experimental/quantization.py", line 83, in _create_eval_dataloader
    eval_dataloader_cfg)
  File "/home/ec2-user/anaconda3/envs/amazonei_tensorflow_p36/lib/python3.6/site-packages/neural_compressor/utils/create_obj_from_config.py", line 96, in create_dataloader
    copy.deepcopy(dataloader_cfg['filter']),)
  File "/home/ec2-user/anaconda3/envs/amazonei_tensorflow_p36/lib/python3.6/site-packages/neural_compressor/utils/create_obj_from_config.py", line 79, in create_dataset
    transform=preprocess, filter=filter)
  File "/home/ec2-user/anaconda3/envs/amazonei_tensorflow_p36/lib/python3.6/site-packages/neural_compressor/experimental/data/datasets/coco_dataset.py", line 186, in __init__
    labels.append(category_map[ann['category_id']].encode('utf8'))
KeyError: 12  <---- complaining about class 12
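
Until the map is fixed upstream, one stop-gap (a sketch; the path matches the yaml above) is to generate the category map straight from the annotation file, which by construction covers every class that appears:

import json

with open("datasets/COCO/annotations/instances_train2017.json") as f:
    coco = json.load(f)

# Maps every category id that can appear in the annotations to its name,
# so no annotation can raise KeyError the way the hard-coded map does.
category_map = {cat["id"]: cat["name"] for cat in coco["categories"]}
print(len(category_map), "classes loaded")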

Quantization AssertionError "inputs len must equal with input_tensor"

I'm following the TensorFlow BERT MRPC example to run Neural Compressor with a saved model that I exported after fine-tuning BERT from the Intel Model Zoo on the IMDB movie review sentiment analysis dataset. The training task for this was "cola" instead of "mrpc", but I still used run_classifier.py to train the model.
I used the same Dataset class definition and collate_fn from the example and my yaml has:

model:
  name: bert
  framework: tensorflow
  inputs: input_file, batch_size
  outputs: Cast_147:0, loss/Mean:0, loss/Neg:0, loss/Cast:0

My python code looks like this:

from neural_compressor.metric import METRICS
class Accuracy(object):
    def __init__(self):
        self.metric = METRICS('tensorflow')['Accuracy']()
          
    # it's ugly that the label is in the iterator
    def update(self, preds, label):
        logits, labels = preds
        self.metric.update(logits, labels)

    def reset(self):
        self.metric.reset()

    def result(self):
        return self.metric.result()

# Using run_classifier from the Intel model zoo
from run_classifier import file_based_input_fn_builder

eval_file = os.path.join(output_dir, "eval.tf_record")
estimator_input_fn = file_based_input_fn_builder(
          input_file=eval_file,
          seq_length=max_seq_length,
          is_training=False,
          drop_remainder=False)

quantizer.model = common.Model(os.path.join(output_dir, "frozen"), input_fn=estimator_input_fn)
quantizer.calib_dataloader = common.DataLoader(dataset, collate_fn=collate_fn)
quantizer.eval_dataloader = common.DataLoader(dataset, collate_fn=collate_fn)

quantizer.metric = common.Metric(metric_cls=Accuracy, name="bert_metric")
q_model = quantizer()

This is failing with the following error:

2021-10-11 20:32:11 [INFO] Start to evaluate the TensorFlow model.
2021-10-11 20:32:11 [ERROR] Unexpected exception AssertionError('inputs len must equal with input_tensor') happened during tuning.
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/neural_compressor/experimental/quantization.py", line 151, in execute
    self.strategy.traverse()
  File "/usr/local/lib/python3.8/dist-packages/neural_compressor/strategy/strategy.py", line 307, in traverse
    self.baseline = self._evaluate(self.model)
  File "/usr/local/lib/python3.8/dist-packages/neural_compressor/strategy/strategy.py", line 446, in _evaluate
    val = self.objective.evaluate(eval_func, model)
  File "/usr/local/lib/python3.8/dist-packages/neural_compressor/objective.py", line 213, in evaluate
    acc = eval_func(model)
  File "/usr/local/lib/python3.8/dist-packages/neural_compressor/utils/create_obj_from_config.py", line 132, in eval_func
    return adaptor.evaluate(model, dataloader, postprocess,
  File "/usr/local/lib/python3.8/dist-packages/neural_compressor/adaptor/tensorflow.py", line 291, in evaluate
    assert len(input_tensor) == len(inputs), \
AssertionError: inputs len must equal with input_tensor

The input_tensor for my model looks like this, so I'm assuming its length is 4. What is the other inputs list being checked in the error above, and why does it have a different length than the input tensor?

[<tf.Tensor 'input_mask:0' shape=(None, 128) dtype=int32>,
 <tf.Tensor 'input_ids:0' shape=(None, 128) dtype=int32>,
 <tf.Tensor 'label_ids:0' shape=(None,) dtype=int32>,
 <tf.Tensor 'segment_ids:0' shape=(None, 128) dtype=int32>]

Any suggestions on how to resolve this issue?
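
For reference, a quick diagnostic (a sketch; the frozen-graph path is illustrative) that compares the yaml inputs against the placeholders actually present in the graph:

import tensorflow as tf

yaml_inputs = ["input_file", "batch_size"]  # from the yaml 'inputs:' line above

graph_def = tf.compat.v1.GraphDef()
with tf.io.gfile.GFile("frozen/frozen_graph.pb", "rb") as f:  # illustrative path
    graph_def.ParseFromString(f.read())

placeholders = [n.name for n in graph_def.node if n.op == "Placeholder"]
print("yaml inputs:       ", yaml_inputs)
print("graph placeholders:", placeholders)
# The adaptor asserts len(input_tensor) == len(inputs), where 'inputs' is one
# batch from the dataloader, so each placeholder needs a matching entry in
# every sample the dataloader yields.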

list index out of range in ResNet50 example

Environment:
Windows 10
LPOT master
ResNet50 ONNX model IRv11
ONNX 1.8.1
ORT 1.6.0

Running the example in examples/onnxrt/image_recognition/resnet50:
run_tuning.bat resnet50.onnx resnet50_v1_5.yaml test

yields the error:
IndexError: list index out of range at line 143 (get_intermediate_outputs) in onnx_calibrate.py;
the intermediate_outputs list is empty.

Traceback (most recent call last):
  File "main.py", line 85, in <module>
    q_model = quantize(model)
  File "C:\Users\kkoyanag\AppData\Local\Programs\Python\Python36\lib\site-packages\lpot-1.1-py3.6.egg\lpot\quantization.py", line 219, in __call__
    self.strategy.traverse()
  File "C:\Users\kkoyanag\AppData\Local\Programs\Python\Python36\lib\site-packages\lpot-1.1-py3.6.egg\lpot\strategy\strategy.py", line 284, in traverse
    tune_cfg, self.model, self.calib_dataloader, self.q_func)
  File "C:\Users\kkoyanag\AppData\Local\Programs\Python\Python36\lib\site-packages\lpot-1.1-py3.6.egg\lpot\adaptor\onnxrt.py", line 82, in quantize
    quantize_params = self._get_quantize_params(model, dataLoader, q_config, iterations)
  File "C:\Users\kkoyanag\AppData\Local\Programs\Python\Python36\lib\site-packages\lpot-1.1-py3.6.egg\lpot\adaptor\onnxrt.py", line 102, in _get_quantize_params
    q_config["nodes_exclude"], q_config["nodes_include"], iterations=iterations)
  File "C:\Users\kkoyanag\AppData\Local\Programs\Python\Python36\lib\site-packages\lpot-1.1-py3.6.egg\lpot\adaptor\ox_utils\onnx_calibrate.py", line 295, in calibrate
    dict_for_quantization = calibrater.get_intermediate_outputs()
  File "C:\Users\kkoyanag\AppData\Local\Programs\Python\Python36\lib\site-packages\lpot-1.1-py3.6.egg\lpot\adaptor\ox_utils\onnx_calibrate.py", line 143, in get_intermediate_outputs
    range(len(intermediate_outputs[0]))]
IndexError: list index out of range

DenseNet transfer learning, custom dataset quantization: [ERROR] Unexpected exception InvalidArgumentError() happened during tuning

I've trained a multiclass classifier with TF using transfer learning and have been trying to use Neural Compressor in a dev container environment. First, I had to modify build_imagenet_data.py for grayscale images to obtain a custom tfrecord file. Then I modified densenet121.yaml for the tfrecord location, height, width, scale and mean_value.
I come across an InvalidArgumentError() during tuning which I am not able to overcome.

root@docker-desktop:/workspaces/Neural_Compressor/neural-compressor/examples/tensorflow/image_recognition/tensorflow_models/quantization/ptq# bash run_tuning.sh --config=densenet121.yaml --input_model="/workspaces/Neural_Compressor/models/tensorflow/model" --output_model=./nc_densenet121

+ main --config=densenet121.yaml --input_model=/workspaces/Neural_Compressor/models/tensorflow/model --output_model=./nc_densenet121
+ init_params --config=densenet121.yaml --input_model=/workspaces/Neural_Compressor/models/tensorflow/model --output_model=./nc_densenet121
+ for var in "$@"
+ case $var in
++ echo --config=densenet121.yaml
++ cut -f2 -d=
+ config=densenet121.yaml
+ for var in "$@"
+ case $var in
++ echo --input_model=/workspaces/Neural_Compressor/models/tensorflow/model
++ cut -f2 -d=
+ input_model=/workspaces/Neural_Compressor/models/tensorflow/model
+ for var in "$@"
+ case $var in
++ echo --output_model=./nc_densenet121
++ cut -f2 -d=
+ output_model=./nc_densenet121
+ run_tuning
+ python main.py --input-graph /workspaces/Neural_Compressor/models/tensorflow/model --output-graph ./nc_densenet121 --config densenet121.yaml --tune
    2022-02-14 18:47:44.290292: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA
    To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
    2022-02-14 18:47:44.292744: I tensorflow/core/common_runtime/process_util.cc:146] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
    2022-02-14 18:48:34 [WARNING] Output tensor names should not be empty.
    2022-02-14 18:48:34 [WARNING] Input tensor names should not be empty.
    2022-02-14 18:49:02.538820: I tensorflow/core/grappler/devices.cc:75] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0 (Note: TensorFlow was not compiled with CUDA or ROCm support)
    2022-02-14 18:49:02.539043: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
    2022-02-14 18:49:02.564178: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:1149] Optimization results for grappler item: graph_to_optimize
    function_optimizer: function_optimizer did nothing. time = 0.019ms.
    function_optimizer: function_optimizer did nothing. time = 0.001ms.

2022-02-14 18:49:04.277543: I tensorflow/core/grappler/devices.cc:75] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0 (Note: TensorFlow was not compiled with CUDA or ROCm support)
2022-02-14 18:49:04.277786: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
2022-02-14 18:49:04.524204: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:1149] Optimization results for grappler item: tf_graph
constant_folding: Graph size after: 1118 nodes (-612), 1787 edges (-612), time = 85.762ms.
constant_folding: Graph size after: 1118 nodes (0), 1787 edges (0), time = 70.253ms.

2022-02-14 18:49:57 [INFO] ConvertLayoutOptimizer elapsed time: 1.95 ms
2022-02-14 18:49:58.696739: I tensorflow/core/grappler/devices.cc:75] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0 (Note: TensorFlow was not compiled with CUDA or ROCm support)
2022-02-14 18:49:58.697680: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
2022-02-14 18:49:58.829495: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:1149] Optimization results for grappler item: graph_to_optimize
model_pruner: Graph size after: 1115 nodes (-3), 1784 edges (-3), time = 19.002ms.
shape_optimizer: shape_optimizer did nothing. time = 1.244ms.
dependency_optimizer: Graph size after: 1114 nodes (-1), 1171 edges (-613), time = 19.209ms.
debug_stripper: debug_stripper did nothing. time = 1.636ms.
loop_optimizer: Graph size after: 1114 nodes (0), 1171 edges (0), time = 9.353ms.
model_pruner: Graph size after: 1114 nodes (0), 1171 edges (0), time = 10.482ms.
shape_optimizer: shape_optimizer did nothing. time = 1.148ms.
dependency_optimizer: Graph size after: 1114 nodes (0), 1171 edges (0), time = 13.312ms.
debug_stripper: debug_stripper did nothing. time = 1.285ms.

2022-02-14 18:49:58 [INFO] Pass GrapplerOptimizer elapsed time: 1625.51 ms
2022-02-14 18:49:59 [INFO] Pass SwitchOptimizer elapsed time: 227.41 ms
2022-02-14 18:49:59 [INFO] Pass RemoveTrainingNodesOptimizer elapsed time: 229.28 ms
2022-02-14 18:49:59 [INFO] Pass SplitSharedInputOptimizer elapsed time: 16.05 ms
2022-02-14 18:49:59 [INFO] Pass GraphFoldConstantOptimizer elapsed time: 215.11 ms
2022-02-14 18:49:59 [INFO] Pass FuseColumnWiseMulOptimizer elapsed time: 227.62 ms
2022-02-14 18:50:00 [INFO] Pass StripUnusedNodesOptimizer elapsed time: 892.47 ms
2022-02-14 18:50:00 [INFO] Pass GraphCseOptimizer elapsed time: 225.92 ms
2022-02-14 18:50:01 [INFO] Pass FoldBatchNormNodesOptimizer elapsed time: 718.75 ms
2022-02-14 18:50:01 [INFO] Pass UpdateEnterOptimizer elapsed time: 103.94 ms
2022-02-14 18:50:01 [INFO] Pass ConvertLeakyReluOptimizer elapsed time: 113.15 ms
2022-02-14 18:50:01 [INFO] Pass ConvertAddToBiasAddOptimizer elapsed time: 114.01 ms
2022-02-14 18:50:02 [INFO] Pass FuseTransposeReshapeOptimizer elapsed time: 116.84 ms
2022-02-14 18:50:02 [INFO] Pass FuseConvWithMathOptimizer elapsed time: 115.22 ms
2022-02-14 18:50:02 [INFO] Pass ExpandDimsOptimizer elapsed time: 112.96 ms
2022-02-14 18:50:02 [INFO] Pass InjectDummyBiasAddOptimizer elapsed time: 128.91 ms
2022-02-14 18:50:02 [INFO] Pass MoveSqueezeAfterReluOptimizer elapsed time: 112.66 ms
2022-02-14 18:50:03 [INFO] Pass Pre Optimization elapsed time: 58651.82 ms
2022-02-14 18:50:03 [INFO] Get FP32 model baseline.
2022-02-14 18:50:03 [INFO] Start to evaluate the TensorFlow model.
2022-02-14 18:50:04 [WARNING] Fail to forward with batch size=32, set to 1 now.
2022-02-14 18:50:04 [ERROR] Unexpected exception InvalidArgumentError() happened during tuning.
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/neural_compressor/experimental/quantization.py", line 151, in execute
    self.strategy.traverse()
  File "/usr/local/lib/python3.8/dist-packages/neural_compressor/strategy/strategy.py", line 323, in traverse
    self.baseline = self._evaluate(self.model)
  File "/usr/local/lib/python3.8/dist-packages/neural_compressor/strategy/strategy.py", line 501, in _evaluate
    val = self.multi_objective.evaluate(eval_func, model)
  File "/usr/local/lib/python3.8/dist-packages/neural_compressor/objective.py", line 222, in evaluate
    acc = eval_func(model)
  File "/usr/local/lib/python3.8/dist-packages/neural_compressor/utils/create_obj_from_config.py", line 132, in eval_func
    return adaptor.evaluate(model, dataloader, postprocess,
  File "/usr/local/lib/python3.8/dist-packages/neural_compressor/utils/utility.py", line 240, in fi
    res = func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/neural_compressor/adaptor/tensorflow.py", line 379, in evaluate
    results = eval_func(dataloader)
  File "/usr/local/lib/python3.8/dist-packages/neural_compressor/adaptor/tensorflow.py", line 326, in eval_func
    for idx, (inputs, labels) in enumerate(dataloader):
  File "/usr/local/lib/python3.8/dist-packages/neural_compressor/experimental/data/dataloaders/tensorflow_dataloader.py", line 88, in _generate_dataloader
    for iter_tensors in dataset:
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/data/ops/iterator_ops.py", line 800, in __next__
    return self._next_internal()
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/data/ops/iterator_ops.py", line 783, in _next_internal
    ret = gen_dataset_ops.iterator_get_next(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/gen_dataset_ops.py", line 2845, in iterator_get_next
    _ops.raise_from_not_ok_status(e, name)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/ops.py", line 7107, in raise_from_not_ok_status
    raise core._status_to_exception(e) from None  # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.InvalidArgumentError: assertion failed: [offset_width must be >= 0.]
	 [[{{node crop_to_bounding_box/Assert/Assert}}]] [Op:IteratorGetNext]
2022-02-14 18:50:04 [ERROR] Specified timeout or max trials is reached! Not found any quantized model which meet accuracy goal. Exit.
Traceback (most recent call last):
  File "main.py", line 69, in <module>
    evaluate_opt_graph.run()
  File "main.py", line 58, in run
    q_model.save(self.args.output_graph)
AttributeError: 'NoneType' object has no attribute 'save'

Using Graph_Optimization, an error occurs when saving.

python3
>>> from neural_compressor.experimental import Graph_Optimization
>>> graph_optimizer = Graph_Optimization()
>>> graph_optimizer.precisions = 'fp32'
>>> graph_optimizer.model = "origin_model"
2022-03-24 11:42:56 [WARNING] Force convert framework model to neural_compressor model.
2022-03-24 11:42:58.097985: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-03-24 11:42:58.098912: I tensorflow/core/common_runtime/process_util.cc:146] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
>>> optimized_model = graph_optimizer()
2022-03-24 11:43:09 [WARNING] Output tensor names should not be empty.
2022-03-24 11:43:09 [WARNING] Input tensor names should not be empty.
2022-03-24 11:43:12.321838: I tensorflow/core/grappler/devices.cc:75] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0 (Note: TensorFlow was not compiled with CUDA or ROCm support)
2022-03-24 11:43:12.322074: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
2022-03-24 11:43:12.357096: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:1149] Optimization results for grappler item: graph_to_optimize
  function_optimizer: Graph size after: 925 nodes (545), 1259 edges (725), time = 19.193ms.
  function_optimizer: function_optimizer did nothing. time = 0.627ms.

2022-03-24 11:43:13.417728: I tensorflow/core/grappler/devices.cc:75] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0 (Note: TensorFlow was not compiled with CUDA or ROCm support)
2022-03-24 11:43:13.417871: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
2022-03-24 11:43:13.722849: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:1149] Optimization results for grappler item: tf_graph
  constant_folding: Graph size after: 835 nodes (-82), 1112 edges (-132), time = 137.46ms.
  constant_folding: Graph size after: 835 nodes (0), 1112 edges (0), time = 78.661ms.

2022-03-24 11:43:19 [INFO] ConvertLayoutOptimizer elapsed time: 0.49 ms
2022-03-24 11:43:21.230239: I tensorflow/core/grappler/devices.cc:75] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0 (Note: TensorFlow was not compiled with CUDA or ROCm support)
2022-03-24 11:43:21.230607: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
2022-03-24 11:43:21.428514: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:1149] Optimization results for grappler item: graph_to_optimize
  model_pruner: Graph size after: 679 nodes (-156), 956 edges (-156), time = 47.762ms.
  shape_optimizer: shape_optimizer did nothing. time = 0.704ms.
  dependency_optimizer: Graph size after: 553 nodes (-126), 713 edges (-243), time = 15.518ms.
  debug_stripper: debug_stripper did nothing. time = 4.271ms.
  loop_optimizer: Graph size after: 553 nodes (0), 713 edges (0), time = 18.663ms.
  model_pruner: Graph size after: 553 nodes (0), 713 edges (0), time = 10.513ms.
  shape_optimizer: shape_optimizer did nothing. time = 0.474ms.
  dependency_optimizer: Graph size after: 553 nodes (0), 711 edges (-2), time = 11.468ms.
  debug_stripper: debug_stripper did nothing. time = 0.519ms.

2022-03-24 11:43:21 [INFO] Pass GrapplerOptimizer elapsed time: 1854.36 ms
2022-03-24 11:43:21 [INFO] Pass SwitchOptimizer elapsed time: 223.62 ms
2022-03-24 11:43:21 [INFO] Pass RemoveTrainingNodesOptimizer elapsed time: 223.07 ms
2022-03-24 11:43:22 [INFO] Pass SplitSharedInputOptimizer elapsed time: 370.06 ms
2022-03-24 11:43:22 [INFO] Pass GraphFoldConstantOptimizer elapsed time: 292.84 ms
2022-03-24 11:43:22 [INFO] Pass FuseColumnWiseMulOptimizer elapsed time: 296.89 ms
2022-03-24 11:43:23 [WARNING] From /root/anaconda3/envs/tf2/lib/python3.7/site-packages/neural_compressor/adaptor/tf_utils/util.py:318: extract_sub_graph (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.graph_util.extract_sub_graph`
2022-03-24 11:43:24 [INFO] Pass StripUnusedNodesOptimizer elapsed time: 1180.0 ms
2022-03-24 11:43:24 [INFO] Pass GraphCseOptimizer elapsed time: 324.92 ms
2022-03-24 11:43:24 [INFO] Pass FoldBatchNormNodesOptimizer elapsed time: 293.02 ms
2022-03-24 11:43:25 [INFO] Pass UpdateEnterOptimizer elapsed time: 290.38 ms
2022-03-24 11:43:25 [INFO] Pass ConvertLeakyReluOptimizer elapsed time: 292.11 ms
2022-03-24 11:43:25 [INFO] Pass ConvertAddToBiasAddOptimizer elapsed time: 295.49 ms
2022-03-24 11:43:25 [INFO] Pass FuseTransposeReshapeOptimizer elapsed time: 292.92 ms
2022-03-24 11:43:26 [INFO] Pass FuseConvWithMathOptimizer elapsed time: 292.74 ms
2022-03-24 11:43:26 [INFO] Pass ExpandDimsOptimizer elapsed time: 294.19 ms
2022-03-24 11:43:26 [INFO] Pass InjectDummyBiasAddOptimizer elapsed time: 300.0 ms
2022-03-24 11:43:27 [INFO] Pass MoveSqueezeAfterReluOptimizer elapsed time: 294.78 ms
2022-03-24 11:43:28 [INFO] Pass Pre Optimization elapsed time: 14510.94 ms
2022-03-24 11:43:29 [CRITICAL] Please set environment variable TF_ENABLE_ONEDNN_OPTS=1 when Tensorflow 2.6.x installed.
2022-03-24 11:43:40 [WARNING] Found possible input node names: ['dense_input', 'seq_50_input', 'sparse_ids_input', 'sparse_wgt_input'], output node names: ['dense'].
2022-03-24 11:43:51 [WARNING] Found possible input node names: ['dense_input', 'seq_50_input', 'sparse_ids_input', 'sparse_wgt_input'], output node names: ['dense'].
2022-03-24 11:43:54.987955: I tensorflow/core/grappler/devices.cc:75] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0 (Note: TensorFlow was not compiled with CUDA or ROCm support)
2022-03-24 11:43:54.988129: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
2022-03-24 11:43:55.023096: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:1149] Optimization results for grappler item: graph_to_optimize
  function_optimizer: Graph size after: 925 nodes (545), 1259 edges (725), time = 19.211ms.
  function_optimizer: function_optimizer did nothing. time = 0.639ms.

2022-03-24 11:43:56.109952: I tensorflow/core/grappler/devices.cc:75] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0 (Note: TensorFlow was not compiled with CUDA or ROCm support)
2022-03-24 11:43:56.110097: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
2022-03-24 11:43:56.419930: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:1149] Optimization results for grappler item: tf_graph
  constant_folding: Graph size after: 835 nodes (-82), 1112 edges (-132), time = 140.983ms.
  constant_folding: Graph size after: 835 nodes (0), 1112 edges (0), time = 80.166ms.


2022-03-24 11:44:04 [INFO] Pass PostCseOptimizer elapsed time: 8093.01 ms
2022-03-24 11:44:05 [INFO] Unexpected exception KeyError('model/embeddings_sparse_layer/StatefulPartitionedCall/StatefulPartitionedCall/feature_embedding/embeddings_sparse_layer/embedding_2/embeddings/Regularizer/Square/ReadVariableOp') happened during tuning.
2022-03-24 11:44:05 [INFO] Specified timeout or max trials is reached! Not found any converted model which meet accuracy goal. Exit.
2022-03-24 11:44:05 [INFO] Graph optimization is done. Please invoke model.save() to save optimized model to disk.
>>> 
>>> optimized_model.save('./fp32_optimized_model/')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'save'
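
The save() failure is only a downstream symptom: the KeyError above made the optimizer return None. A defensive usage sketch:

optimized_model = graph_optimizer()
if optimized_model is None:
    # The real failure is the exception logged during graph optimization.
    raise RuntimeError("Graph optimization did not produce a model; "
                       "see the KeyError logged above.")
optimized_model.save('./fp32_optimized_model/')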

InternalError: Missing 0-th output from node model/layer_1/Conv2D_eightbit_requantize (defined at <ipython-input-6-2bddd853d111>:2)

Case 1

Framework: Tensorflow 2.5.0, Intel-Tensorflow 2.5.0
Environment: Google Colab

I have a successfully quantized model that is to be run for inference without using the LPOT API, so I wrote the following inference code:

with tf.compat.v1.Session() as sess:
    tf.compat.v1.saved_model.loader.load(sess, ['serve'], model)
    output = sess.graph.get_tensor_by_name(output_tensor_name)
    predictions = sess.run(output, {input_tensor_name: x})
    mse = tf.reduce_mean(tf.keras.losses.mean_squared_error(y, predictions))
    print(mse.eval())

When running the line predictions = sess.run(output, {input_tensor_name: x}):

---------------------------------------------------------------------------
InternalError                             Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1374     try:
-> 1375       return fn(*args)
   1376     except errors.OpError as e:

7 frames
/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py in _run_fn(feed_dict, fetch_list, target_list, options, run_metadata)
   1359       return self._call_tf_sessionrun(options, feed_dict, fetch_list,
-> 1360                                       target_list, run_metadata)
   1361 

/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py in _call_tf_sessionrun(self, options, feed_dict, fetch_list, target_list, run_metadata)
   1452                                             fetch_list, target_list,
-> 1453                                             run_metadata)
   1454 

InternalError: Missing 0-th output from {{node model/layer_1/Conv2D_eightbit_requantize}}

During handling of the above exception, another exception occurred:

InternalError                             Traceback (most recent call last)
<ipython-input-6-2bddd853d111> in <module>()
      2     tf.compat.v1.saved_model.loader.load(sess, ['serve'], model)
      3     output = sess.graph.get_tensor_by_name(output_tensor_name)
----> 4     predictions = sess.run(output, {input_tensor_name: x[:64]}) # 64, 257, 60, 1
      5     mse = tf.reduce_mean(tf.keras.losses.mean_squared_error(y[:64], predictions))
      6     print(mse.eval())

/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py in run(self, fetches, feed_dict, options, run_metadata)
    966     try:
    967       result = self._run(None, fetches, feed_dict, options_ptr,
--> 968                          run_metadata_ptr)
    969       if run_metadata:
    970         proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
   1189     if final_fetches or final_targets or (handle and feed_dict_tensor):
   1190       results = self._do_run(handle, final_targets, final_fetches,
-> 1191                              feed_dict_tensor, options, run_metadata)
   1192     else:
   1193       results = []

/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
   1367     if handle is None:
   1368       return self._do_call(_run_fn, feeds, fetches, targets, options,
-> 1369                            run_metadata)
   1370     else:
   1371       return self._do_call(_prun_fn, handle, feeds, fetches)

/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1392                     '\nsession_config.graph_options.rewrite_options.'
   1393                     'disable_meta_optimizer = True')
-> 1394       raise type(e)(node_def, op, message)
   1395 
   1396   def _extend_graph(self):

InternalError: Missing 0-th output from node model/layer_1/Conv2D_eightbit_requantize (defined at <ipython-input-6-2bddd853d111>:2) 

This error happens with or without Intel-Tensorflow 2.5.0 installed, and it is not resolved even when os.environ['TF_ENABLE_ONEDNN_OPTS'] = '1' is set explicitly.
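
One subtlety worth ruling out when setting the flag from Python (general TensorFlow behavior, not specific to this bug): the variable must be in the environment before tensorflow is imported, or it has no effect:

import os

# Must happen before the first `import tensorflow`, since TF reads the
# flag when it initializes its kernels.
os.environ['TF_ENABLE_ONEDNN_OPTS'] = '1'

import tensorflow as tf  # noqa: E402  (import after env var on purpose)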

On the other hand, when I run the same code in VS Code with Python 3.6.8 64-bit (base: Conda), it returns the same error message as in Case 2.

Case 2

Framework: Tensorflow 2.4.0, Intel-Tensorflow 2.4.0
Environment: Google Colab

This case works well and prints out the MSE loss of the predictions. But when I uninstall Intel-Tensorflow 2.4.0 and run it with official TensorFlow, the same line as in Case 1 (predictions = sess.run(output, {input_tensor_name: x})) fails:

---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1374     try:
-> 1375       return fn(*args)
   1376     except errors.OpError as e:

7 frames
/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py in _run_fn(feed_dict, fetch_list, target_list, options, run_metadata)
   1357       # Ensure any changes to the graph are reflected in the runtime.
-> 1358       self._extend_graph()
   1359       return self._call_tf_sessionrun(options, feed_dict, fetch_list,

/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py in _extend_graph(self)
   1397     with self._graph._session_run_lock():  # pylint: disable=protected-access
-> 1398       tf_session.ExtendSession(self._session)
   1399 

InvalidArgumentError: No OpKernel was registered to support Op 'QuantizedMatMulWithBiasAndDequantize' used by {{node model/dense/Tensordot/MatMul_eightbit_requantize}} with these attrs: [input_quant_mode="MIN_FIRST", T1=DT_QUINT8, Toutput=DT_FLOAT, T2=DT_QINT8, Tbias=DT_QINT32, transpose_a=false, transpose_b=false]
Registered devices: [CPU]
Registered kernels:
  <no registered kernels>

	 [[model/dense/Tensordot/MatMul_eightbit_requantize]]

During handling of the above exception, another exception occurred:

InvalidArgumentError                      Traceback (most recent call last)
<ipython-input-6-2bddd853d111> in <module>()
      2     tf.compat.v1.saved_model.loader.load(sess, ['serve'], model)
      3     output = sess.graph.get_tensor_by_name(output_tensor_name)
----> 4     predictions = sess.run(output, {input_tensor_name: x[:64]}) # 64, 257, 60, 1
      5     mse = tf.reduce_mean(tf.keras.losses.mean_squared_error(y[:64], predictions))
      6     print(mse.eval())

/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py in run(self, fetches, feed_dict, options, run_metadata)
    966     try:
    967       result = self._run(None, fetches, feed_dict, options_ptr,
--> 968                          run_metadata_ptr)
    969       if run_metadata:
    970         proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
   1189     if final_fetches or final_targets or (handle and feed_dict_tensor):
   1190       results = self._do_run(handle, final_targets, final_fetches,
-> 1191                              feed_dict_tensor, options, run_metadata)
   1192     else:
   1193       results = []

/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
   1367     if handle is None:
   1368       return self._do_call(_run_fn, feeds, fetches, targets, options,
-> 1369                            run_metadata)
   1370     else:
   1371       return self._do_call(_prun_fn, handle, feeds, fetches)

/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1392                     '\nsession_config.graph_options.rewrite_options.'
   1393                     'disable_meta_optimizer = True')
-> 1394       raise type(e)(node_def, op, message)
   1395 
   1396   def _extend_graph(self):

InvalidArgumentError: No OpKernel was registered to support Op 'QuantizedMatMulWithBiasAndDequantize' used by node model/dense/Tensordot/MatMul_eightbit_requantize (defined at <ipython-input-6-2bddd853d111>:2)  with these attrs: [input_quant_mode="MIN_FIRST", T1=DT_QUINT8, Toutput=DT_FLOAT, T2=DT_QINT8, Tbias=DT_QINT32, transpose_a=false, transpose_b=false]
Registered devices: [CPU]
Registered kernels:
  <no registered kernels>

	 [[model/dense/Tensordot/MatMul_eightbit_requantize]]

The error persists even with os.environ['TF_ENABLE_ONEDNN_OPTS'] = '1' set explicitly.

I believe both cases are caused by the same type of error, i.e. No OpKernel was registered to support Op ...

I was given to understand that with official TensorFlow v2.5 installed and the environment variable TF_ENABLE_ONEDNN_OPTS=1 set (reference), the quantized model is supposed to run with oneDNN support. But that doesn't seem to be the case in either v2.4 or v2.5.

Not sure if this is the right place to post this issue, but I have nowhere else to report the problem as Intel-Tensorflow doesn't allow issue reporting and Tensorflow developers usually ignore issues dependent on other packages. Any hint is greatly appreciated, thank you.

Compatibility of LPOT/Intel-Tensorflow with ML.NET

Tensorflow/Intel-Tensorflow Ver: 2.4.0

I was able to successfully output a quantized model from my TF 2.4 Keras model using your package, but I couldn't load the model under the native TensorFlow framework without installing Intel-Tensorflow (see the following error traceback). As far as I know, the op 'QuantizedMatMulWithBiasAndDequantize' is deprecated by TensorFlow, which could possibly be the cause of the error.
Anyhow, my ultimate goal is to load my frozen pb model into the ML.NET framework. I know ML.NET supports loading TensorFlow pb models, but due to this error I'm afraid that won't work. So I was wondering: is there any way to load the model in ML.NET? For example, how is Intel-Tensorflow supported on ML.NET? Or is there some other way to do so?

The line of error:

     20 with tf.compat.v1.Session(graph=graph) as sess:
     21     output = graph.get_tensor_by_name(output_tensor_name)
---> 22     predictions = sess.run(output, {input_tensor_name: x})
     23     mse = tf.reduce_mean(tf.keras.losses.mean_squared_error(y, predictions))
     24     print(mse.eval())

InvalidArgumentError: No OpKernel was registered to support Op 'QuantizedMatMulWithBiasAndDequantize'


InvalidArgumentError                      Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1374     try:
-> 1375       return fn(*args)
   1376     except errors.OpError as e:

8 frames
/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py in _run_fn(feed_dict, fetch_list, target_list, options, run_metadata)
   1357       # Ensure any changes to the graph are reflected in the runtime.
-> 1358       self._extend_graph()
   1359       return self._call_tf_sessionrun(options, feed_dict, fetch_list,

/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py in _extend_graph(self)
   1397     with self._graph._session_run_lock():  # pylint: disable=protected-access
-> 1398       tf_session.ExtendSession(self._session)
   1399

InvalidArgumentError: No OpKernel was registered to support Op 'QuantizedMatMulWithBiasAndDequantize' used by {{node model/dense/Tensordot/MatMul_eightbit_requantize}} with these attrs: [input_quant_mode="MIN_FIRST", Toutput=DT_FLOAT, T1=DT_QUINT8, T2=DT_QINT8, Tbias=DT_QINT32, transpose_a=false, transpose_b=false]
Registered devices: [CPU]
Registered kernels:
  <no registered kernels>

	 [[model/dense/Tensordot/MatMul_eightbit_requantize]]

During handling of the above exception, another exception occurred:

InvalidArgumentError                      Traceback (most recent call last)
in ()
----> 1 loss = inference(path_to_pb='./model.pb', input_tensor_name='x:0', output_tensor_name='Identity:0', x=x[:64], y=y[:64])

in inference(path_to_pb, input_tensor_name, output_tensor_name, x, y)
     20 with tf.compat.v1.Session(graph=graph) as sess:
     21     output = graph.get_tensor_by_name(output_tensor_name)
---> 22     predictions = sess.run(output, {input_tensor_name: x})
     23     mse = tf.reduce_mean(tf.keras.losses.mean_squared_error(y, predictions))
     24     print(mse.eval())

/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py in run(self, fetches, feed_dict, options, run_metadata)
    966     try:
    967       result = self._run(None, fetches, feed_dict, options_ptr,
--> 968                          run_metadata_ptr)
    969       if run_metadata:
    970         proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
   1189     if final_fetches or final_targets or (handle and feed_dict_tensor):
   1190       results = self._do_run(handle, final_targets, final_fetches,
-> 1191                              feed_dict_tensor, options, run_metadata)
   1192     else:
   1193       results = []

/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
   1367     if handle is None:
   1368       return self._do_call(_run_fn, feeds, fetches, targets, options,
-> 1369                            run_metadata)
   1370     else:
   1371       return self._do_call(_prun_fn, handle, feeds, fetches)

/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1392                     '\nsession_config.graph_options.rewrite_options.'
   1393                     'disable_meta_optimizer = True')
-> 1394       raise type(e)(node_def, op, message)
   1395
   1396   def _extend_graph(self):

InvalidArgumentError: No OpKernel was registered to support Op 'QuantizedMatMulWithBiasAndDequantize' used by node model/dense/Tensordot/MatMul_eightbit_requantize (defined at :24) with these attrs: [input_quant_mode="MIN_FIRST", Toutput=DT_FLOAT, T1=DT_QUINT8, T2=DT_QINT8, Tbias=DT_QINT32, transpose_a=false, transpose_b=false]
Registered devices: [CPU]
Registered kernels:
  <no registered kernels>

	 [[model/dense/Tensordot/MatMul_eightbit_requantize]]

Error while converting lstm graph

I am quantizing a model that takes multiple input images based on time distribution (a ConvLSTM-based graph). Is there a way to skip a layer if it is not supported? (See the sketch after the traceback below.)

2021-01-04 16:10:53 [INFO] Start to run model quantization...
| Mixed Precision Statistics |
| INT8 Conv2D: 30 |
| INT8 MaxPool: 4 |
| INT8 ConcatV2: 4 |
| Overall: INT8 100.00% (38/38) BF16 0.00% (0/38) |
| ************************************************** |
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/lpot-1.0-py3.7.egg/lpot/adaptor/tf_utils/graph_converter.py", line 505, in quantize
    self._quantize_graph()
  File "/usr/local/lib/python3.7/dist-packages/lpot-1.0-py3.7.egg/lpot/adaptor/tf_utils/graph_converter.py", line 577, in _quantize_graph
    self._tmp_graph_def = intel_quantizer.do_transform()
  File "/usr/local/lib/python3.7/dist-packages/lpot-1.0-py3.7.egg/lpot/adaptor/tf_utils/quantize_graph/quantize_graph_for_intel_cpu.py", line 107, in do_transform
    self.input_graph = worker.apply_the_transform()
  File "/usr/local/lib/python3.7/dist-packages/lpot-1.0-py3.7.egg/lpot/adaptor/tf_utils/quantize_graph/quantize_graph_conv.py", line 356, in apply_the_transform
    self.fusion_mapping[fusion_name]()
  File "/usr/local/lib/python3.7/dist-packages/lpot-1.0-py3.7.egg/lpot/adaptor/tf_utils/quantize_graph/quantize_graph_conv.py", line 214, in apply_conv_biasadd_fusion
    matched_node.node.op, self.node_name_mapping[weight_name].node,
KeyError: 'conv_lst_m2d/while/split:3'
2021-01-04 16:10:55 [ERROR] Failed to quantize graph due to: 'conv_lst_m2d/while/split:3'
Traceback (most recent call last):
  File "inference_Sequential.py", line 272, in <module>
    q_model = quantizer('./save/tf_model.pb', q_dataloader = dataloader)
  File "/usr/local/lib/python3.7/dist-packages/lpot-1.0-py3.7.egg/lpot/quantization.py", line 221, in __call__
    self.strategy.traverse()
  File "/usr/local/lib/python3.7/dist-packages/lpot-1.0-py3.7.egg/lpot/strategy/strategy.py", line 266, in traverse
    assert self.last_qmodel
AssertionError
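
One thing I want to try (a sketch only, assuming the LPOT 1.x yaml schema's op_wise section can pin individual ops to FP32; the op name is copied from the KeyError above) is excluding the unsupported ConvLSTM op from quantization:

quantization:
  op_wise:
    'conv_lst_m2d/while/split':
      activation:
        dtype: ['fp32']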

`io.UnsupportedOperation: fileno` when running from Jupyter notebook

I'm trying to run the neural compressor from a Jupyter notebook like:

quantizer.model = common.Model(os.path.join(output_dir, "frozen"))
quantizer.calib_dataloader = common.DataLoader(dataset, batch_size)
quantizer.eval_dataloader = common.DataLoader(dataset, batch_size)
quantizer.metric = common.Metric(metric_cls=Accuracy)

q_model = quantizer()

While running this, I'm getting following error:

2021-10-19 22:20:14 [INFO] Pass Quantization elapsed time: 5277.63 ms
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/neural_compressor/adaptor/tf_utils/graph_converter.py", line 558, in quantize
    with CaptureOutputToFile(tmp_dump_file):
  File "/usr/local/lib/python3.8/dist-packages/neural_compressor/utils/utility.py", line 343, in __init__
    self.orig_stream_fileno = stream.fileno()
io.UnsupportedOperation: fileno

I think this is due to the way the notebook redirects the stdout and stderr streams so that their output can be displayed inline. The same script runs fine outside of the notebook (just using python).

Is there a way to work around this so the neural compressor quantization can run from a notebook?
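
In the meantime, the workaround I'm using is to launch the quantization as a separate process from the notebook, so that stdout is a real OS-level stream with a valid fileno (quantize_script.py is a stand-in for the quantizer code above):

import subprocess

# Run the same quantization code as a plain Python process; its stdout has a
# real file descriptor, so CaptureOutputToFile's stream.fileno() call succeeds.
subprocess.run(["python", "quantize_script.py"], check=True)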

[ERROR] Unexpected exception InternalError() happened during turing!

Framework: Tensorflow 2.5.0, Intel-Tensorflow 2.5.0
Environment: Google Colab
Model format: saved model (pre-trained under TF 2.5.0)

  1. I encountered the following error

[ERROR] Unexpected exception InternalError() happened during turing!

several times, and I don't know what the cause is, since the source of the error is not printed.

  2. For the input/output node suggestion

Found possible input node names: ['input_noisy'], output node names: ['dense']

I already set my inputs/outputs to the suggested ones in my config yaml, but the warning keeps showing, as if the config file were being ignored.

  3. The following warning keeps showing:

Please set environment variable TF_ENABLE_MKL_NATIVE_FORMAT=0 when Tensorflow 2.5.0 installed.

even after I specifically set it via

!export TF_ENABLE_MKL_NATIVE_FORMAT=0

(see the note on this right after the list).

  4. I believe there is a typo in the error message; it's supposed to say 'tuning', no? 😃
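
Note on item 3: my understanding (an assumption on my part) is that !export runs in a throwaway subshell in Colab and therefore can never change the notebook kernel's environment, so the variable should instead be set in-process before importing tensorflow. A minimal sketch:

import os

# Set in the notebook process itself; `!export ...` only affects a child shell
# that exits immediately, so the kernel never sees the variable.
os.environ["TF_ENABLE_MKL_NATIVE_FORMAT"] = "0"

import tensorflow as tf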

Traceback

2021-07-19 02:52:17 [INFO] Start to run model quantization...
2021-07-19 02:52:17 [INFO] |Mixed Precision Statistics|
2021-07-19 02:52:17 [INFO] |INT8 MatMul: 3 |
2021-07-19 02:52:17 [INFO] |INT8 Conv2D: 5 |
2021-07-19 02:52:17 [INFO] |Overall: INT8 100.00% (8/8) BF16 0.00% (0/8) |
2021-07-19 02:52:17 [INFO] |**************************************************|
2021-07-19 02:52:18 [WARNING] Please set environment variable TF_ENABLE_MKL_NATIVE_FORMAT=0 when Tensorflow 2.5.0 installed.
2021-07-19 02:52:23 [WARNING] Found possible input node names: ['input_noisy'], output node names: ['dense']
2021-07-19 02:52:23 [WARNING] Found possible input node names: ['input_noisy'], output node names: ['dense']
2021-07-19 02:52:26 [INFO] loading session....
2021-07-19 02:52:29.701188: I tensorflow/core/grappler/devices.cc:78] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0 (Note: TensorFlow was not compiled with CUDA or ROCm support)
2021-07-19 02:52:29.701454: I tensorflow/core/grappler/clusters/single_machine.cc:357] Starting new session
2021-07-19 02:52:29.717618: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:1144] Optimization results for grappler item: graph_to_optimize
function_optimizer: Graph size after: 270 nodes (185), 364 edges (264), time = 4.316ms.
function_optimizer: function_optimizer did nothing. time = 0.183ms.

2021-07-19 02:52:30.057810: I tensorflow/core/grappler/devices.cc:78] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0 (Note: TensorFlow was not compiled with CUDA or ROCm support)
2021-07-19 02:52:30.058012: I tensorflow/core/grappler/clusters/single_machine.cc:357] Starting new session
2021-07-19 02:52:30.190360: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:1144] Optimization results for grappler item: tf_graph
constant_folding: Graph size after: 185 nodes (-42), 218 edges (-46), time = 80.757ms.
constant_folding: Graph size after: 185 nodes (0), 218 edges (0), time = 26.106ms.

2021-07-19 02:52:30 [INFO] Unknown match MatMul
2021-07-19 02:52:30 [INFO] Unknown match MatMul
2021-07-19 02:52:30.842607: I tensorflow/core/common_runtime/process_util.cc:146] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
2021-07-19 02:52:32 [INFO] Pass Quantization elapsed time: 1876.04 ms
2021-07-19 02:53:01 [INFO] Pass QuantizedRNNConverter elapsed time: 100.07 ms
2021-07-19 02:53:02 [INFO] Pass StripUnusedNodesOptimizer elapsed time: 300.89 ms
2021-07-19 02:53:02 [INFO] Pass RemoveTrainingNodesOptimizer elapsed time: 99.77 ms
2021-07-19 02:53:02 [INFO] Pass FoldBatchNormNodesOptimizer elapsed time: 100.88 ms
2021-07-19 02:53:02 [INFO] Pass MetaOpOptimizer elapsed time: 99.0 ms
2021-07-19 02:53:02 [INFO] Node name unused_control_flow_input_23 doesn't exist in the model, please check the yaml.
2021-07-19 02:53:02 [WARNING] Found possible input node names: ['input_noisy'], output node names: ['dense']
2021-07-19 02:53:05 [INFO] Pass PostCseOptimizer elapsed time: 3148.93 ms
2021-07-19 02:53:05 [INFO] Pass quantize model elapsed time: 47932.33 ms
2021-07-19 02:53:05 [INFO] Start to evaluate Tensorflow model...
2021-07-19 02:53:06 [ERROR] Unexpected exception InternalError() happened during turing!
2021-07-19 02:53:06 [ERROR] Specified timeout or max trials is reached! Not found any quantized model which meet accuracy goal. Exit...

Traceback (most recent call last):
  File "main.py", line 94, in <module>
    evaluate_opt_graph.run()
  File "main.py", line 79, in run
    q_model.save(self.args.output_graph)
AttributeError: 'NoneType' object has no attribute 'save'

[ONNX] AttributeError: 'tuple' object has no attribute 'replace'

LPOT Version: 1.5.1
ONNX Version: 1.7
ONNXRuntime Version: 1.6.0
Environment: Google Colab

Configuration:

quantization:
  calibration:
    sampling_size: 500
  model_wise:
    weight:
      granularity: per_channel
      scheme: sym
      algorithm: minmax
    activation:
      granularity: per_tensor
      scheme: sym
      algorithm: minmax

evaluation:
  accuracy:
    metric:
      MSE: {}  
  performance:                               
    warmup: 30
    iteration: 300

tuning:
  objective: performance
  random_seed: 9527
  strategy:
    name: mse
  accuracy_criterion:
    higher_is_better: False
    relative: 0.8
  exit_policy:
    performance_only: False
    max_trials: 20
    timeout: 100000

When quantizing my ONNX model, the first tuning trial seems fine and outputs the accuracy and latency, but in trial 2 the following error happens:

Traceback (most recent call last):
  File "main.py", line 154, in <module>
    q_model = quantize()
  File "/usr/local/lib/python3.7/dist-packages/lpot/experimental/quantization.py", line 178, in __call__
    self.strategy.traverse()
  File "/usr/local/lib/python3.7/dist-packages/lpot/strategy/strategy.py", line 296, in traverse
    for tune_cfg in self.next_tune_cfg():
  File "/usr/local/lib/python3.7/dist-packages/lpot/strategy/mse.py", line 145, in next_tune_cfg
    self.model, self.calib_dataloader, op_lists, [1])
  File "/usr/local/lib/python3.7/dist-packages/lpot/adaptor/onnxrt.py", line 200, in inspect_tensor
    weight=(inspect_type!='activation'))
  File "/usr/local/lib/python3.7/dist-packages/lpot/adaptor/ox_utils/onnxrt_mid.py", line 417, in dump_tensor
    self.white_nodes = [node.replace('_quant', '') for node in self.white_nodes]
  File "/usr/local/lib/python3.7/dist-packages/lpot/adaptor/ox_utils/onnxrt_mid.py", line 417, in <listcomp>
    self.white_nodes = [node.replace('_quant', '') for node in self.white_nodes]
AttributeError: 'tuple' object has no attribute 'replace'
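
As a temporary workaround I am considering switching the tuning strategy, on the assumption that the crash is specific to the mse strategy's inspect_tensor path (a sketch of the changed tuning section only):

tuning:
  strategy:
    name: basic  # assumption: the basic strategy does not call inspect_tensor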

[Tensorflow] tensorflow.python.framework.errors_impl.FailedPreconditionError: Could not find variable 52/kernel. This could mean that the variable has been deleted. In TF1, it can also mean the variable is uninitialized.

Version: LPOT 1.5, Tensorflow 2.5, Intel-Tensorflow 2.5
Env: Google Colab

I was using a Keras saved model for quantization, and the following error occurs:

2021-07-23 03:49:12 [WARNING] There is no quantizable op type!!!
2021-07-23 03:49:12 [INFO] Getting FP32 model baseline...
2021-07-23 03:49:12 [INFO] Start to evaluate Tensorflow model...
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py", line 1375, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py", line 1360, in _run_fn
    target_list, run_metadata)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py", line 1453, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.FailedPreconditionError: Could not find variable 52/kernel. This could mean that the variable has been deleted. In TF1, it can also mean the variable is uninitialized. Debug info: container=localhost, status=Not found: Container localhost does not exist. (Could not find resource: localhost/52/kernel)
	 [[{{node model/52/Conv2D/ReadVariableOp}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main.py", line 110, in <module>
    evaluate_opt_graph.run()
  File "main.py", line 94, in run
    q_model = quantizer()
  File "/usr/local/lib/python3.7/dist-packages/lpot/experimental/quantization.py", line 177, in __call__
    self.strategy.traverse()
  File "/usr/local/lib/python3.7/dist-packages/lpot/strategy/strategy.py", line 286, in traverse
    self.baseline = self._evaluate(self.model)
  File "/usr/local/lib/python3.7/dist-packages/lpot/strategy/strategy.py", line 424, in _evaluate
    val = self.objective.evaluate(eval_func, model)
  File "/usr/local/lib/python3.7/dist-packages/lpot/objective.py", line 213, in evaluate
    acc = eval_func(model)
  File "/usr/local/lib/python3.7/dist-packages/lpot/utils/create_obj_from_config.py", line 131, in eval_func
    tensorboard, fp32_baseline)
  File "/usr/local/lib/python3.7/dist-packages/lpot/adaptor/tensorflow.py", line 210, in evaluate
    predictions = model.sess.run(output_tensor, feed_dict)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py", line 968, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py", line 1191, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py", line 1369, in _do_run
    run_metadata)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py", line 1394, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.FailedPreconditionError: Could not find variable 52/kernel. This could mean that the variable has been deleted. In TF1, it can also mean the variable is uninitialized. Debug info: container=localhost, status=Not found: Container localhost does not exist. (Could not find resource: localhost/52/kernel)
	 [[{{node model/52/Conv2D/ReadVariableOp}}]]

Also, I don't know why the system prints [WARNING] There is no quantizable op type!!!, because my model contains Conv2D and MatMul operations, which are clearly quantizable.
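
For reference, a workaround sketch I am testing: freezing the Keras SavedModel's variables into constants before handing the graph to LPOT, so that no ReadVariableOp is left to look up 52/kernel at session time (paths are placeholders; assumes a single-input model and TF 2.x's convert_variables_to_constants_v2):

import tensorflow as tf
from tensorflow.python.framework.convert_to_constants import convert_variables_to_constants_v2

model = tf.keras.models.load_model("./saved_model")  # placeholder path
concrete = tf.function(model).get_concrete_function(
    tf.TensorSpec(model.inputs[0].shape, model.inputs[0].dtype))
frozen = convert_variables_to_constants_v2(concrete)

# Write a frozen GraphDef that can be passed to the quantizer as a pb file.
tf.io.write_graph(frozen.graph.as_graph_def(), ".", "frozen_model.pb", as_text=False)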

List out dependencies that are required to use neural-compressor

When running pip install neural-compressor, I ran into the following error:

  unable to execute 'x86_64-linux-gnu-gcc': No such file or directory
  error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
  ----------------------------------------
  ERROR: Failed building wheel for pycocotools

Through trial and error and googling the error, I found that I first needed to install build-essential and python3-dev. After that, the pip install of neural-compressor succeeded, but something still seems wrong, because I get the following error when trying the imports from the pruning doc.

>>> from neural_compressor.experimental import Pruning, common
2021-10-06 21:52:52 [INFO] generated new fontManager
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/dist-packages/neural_compressor/__init__.py", line 18, in <module>
    from .quantization import Quantization
  File "/usr/local/lib/python3.8/dist-packages/neural_compressor/quantization.py", line 20, in <module>
    from .data import DATALOADERS, DATASETS
  File "/usr/local/lib/python3.8/dist-packages/neural_compressor/data/__init__.py", line 19, in <module>
    from .dataloaders import DataLoader
  File "/usr/local/lib/python3.8/dist-packages/neural_compressor/data/dataloaders/__init__.py", line 18, in <module>
    from .dataloader import DataLoader
  File "/usr/local/lib/python3.8/dist-packages/neural_compressor/data/dataloaders/dataloader.py", line 18, in <module>
    from neural_compressor.experimental.data.dataloaders import DATALOADERS
  File "/usr/local/lib/python3.8/dist-packages/neural_compressor/experimental/__init__.py", line 18, in <module>
    from .component import Component
  File "/usr/local/lib/python3.8/dist-packages/neural_compressor/experimental/component.py", line 17, in <module>
    from ..conf.config import Conf
  File "/usr/local/lib/python3.8/dist-packages/neural_compressor/conf/config.py", line 21, in <module>
    from ..strategy import STRATEGIES
  File "/usr/local/lib/python3.8/dist-packages/neural_compressor/strategy/__init__.py", line 18, in <module>
    from .strategy import STRATEGIES
  File "/usr/local/lib/python3.8/dist-packages/neural_compressor/strategy/strategy.py", line 30, in <module>
    from ..utils.create_obj_from_config import create_eval_func, create_train_func
  File "/usr/local/lib/python3.8/dist-packages/neural_compressor/utils/create_obj_from_config.py", line 19, in <module>
    from neural_compressor.experimental.data import DATASETS, TRANSFORMS, FILTERS, DATALOADERS
  File "/usr/local/lib/python3.8/dist-packages/neural_compressor/experimental/data/__init__.py", line 20, in <module>
    from .transforms import TRANSFORMS, BaseTransform, transform_registry
  File "/usr/local/lib/python3.8/dist-packages/neural_compressor/experimental/data/transforms/__init__.py", line 18, in <module>
    from .transform import TRANSFORMS, BaseTransform, transform_registry
  File "/usr/local/lib/python3.8/dist-packages/neural_compressor/experimental/data/transforms/transform.py", line 316, in <module>
    'nearest': cv2.INTER_NEAREST,
  File "/usr/local/lib/python3.8/dist-packages/neural_compressor/utils/utility.py", line 72, in __getattr__
    top_level_module = __import__(self.module_name)
  File "/usr/local/lib/python3.8/dist-packages/cv2/__init__.py", line 5, in <module>
    from .cv2 import *
ImportError: libGL.so.1: cannot open shared object file: No such file or directory

Can you list out all the dependencies that are required to install and run the neural compressor? I'm running this inside the intel/intel-optimized-tensorflow:2.5.0-jupyter container.
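
My best guess at the full list so far, for a Debian/Ubuntu container (a sketch; libgl1-mesa-glx is the package that ships the libGL.so.1 that opencv-python tries to load, and libglib2.0-0 is another shared library cv2 commonly needs):

apt-get update && apt-get install -y build-essential python3-dev libgl1-mesa-glx libglib2.0-0
pip install neural-compressor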

[Tensorflow] ValueError: Cannot feed value of shape (1,) for Tensor 'Inputs/InputFrame:0', which has shape '(1, 257, None, 1)'

Environment: Google Colab, Tensorflow 2.2.0, Intel Tensorflow 2.2.0

I used a custom dataset to store the data, and common.DataLoader to load it.
quantizer.calib_dataloader = common.DataLoader(dataset, batch_size=1)
Later, I input a pre-trained pb model for quantization,
and the model has an input node of shape (1, 257, None, 1).

Upon running the quantization, the following error occurred:

Traceback (most recent call last):
  File "main.py", line 134, in <module>
    evaluate_opt_graph.run()
  File "main.py", line 116, in run
    q_model = quantizer()
  File "/usr/local/lib/python3.7/dist-packages/lpot/experimental/quantization.py", line 170, in __call__
    self.strategy.traverse()
  File "/usr/local/lib/python3.7/dist-packages/lpot/strategy/strategy.py", line 287, in traverse
    self.baseline = self._evaluate(self.model)
  File "/usr/local/lib/python3.7/dist-packages/lpot/strategy/strategy.py", line 425, in _evaluate
    val = self.objective.evaluate(eval_func, model)
  File "/usr/local/lib/python3.7/dist-packages/lpot/objective.py", line 213, in evaluate
    acc = eval_func(model)
  File "/usr/local/lib/python3.7/dist-packages/lpot/utils/create_obj_from_config.py", line 127, in eval_func
    tensorboard, fp32_baseline)
  File "/usr/local/lib/python3.7/dist-packages/lpot/adaptor/tensorflow.py", line 211, in evaluate
    predictions = model.sess.run(output_tensor, feed_dict)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py", line 958, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py", line 1157, in _run
    (np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (1,) for Tensor 'Inputs/InputFrame:0', which has shape '(1, 257, None, 1)'

I'm not sure why this error occurred, though I suspect it's due to the None dimension.
Is there a workaround that doesn't require fixing the None dimension? Thanks.
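
In case it helps diagnose: a sketch of how my custom dataset is shaped (names are placeholders). My working theory is that if __getitem__ returns anything other than a fixed-shape float array per sample, the default batching produces a shape-(1,) object array, which would match the error above:

import numpy as np

class SpectrogramDataset:  # placeholder name for my custom dataset
    def __init__(self, frames, labels):
        self.frames, self.labels = frames, labels

    def __getitem__(self, idx):
        # Each sample must be a dense float array of shape (257, T, 1);
        # batch_size=1 should then stack it to (1, 257, T, 1).
        x = np.asarray(self.frames[idx], dtype=np.float32).reshape(257, -1, 1)
        return x, self.labels[idx]

    def __len__(self):
        return len(self.frames)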

[Stock Tensorflow ResNet50_v1_5] calibration step has an error

I followed the steps mentioned in the examples README.md to reproduce the resnet50_v1_5 results, but the calibration step fails to proceed.

Can you kindly document the requirements for the calibration tf_records? The doc says they are optional in resnet50_v1_5.yaml, but they are not; none of the options there can be removed. I am providing a few validation tf_records for calibration and have set the number of samples to 100.


bash run_tuning.sh --config=resnet50_v1_5.yaml  --input_model=./resnet50_v1.pb --output_model=./lpot_resnet50_v15.pb
+ main --config=resnet50_v1_5.yaml --input_model=./resnet50_v1.pb --output_model=./lpot_resnet50_v15.pb
+ init_params --config=resnet50_v1_5.yaml --input_model=./resnet50_v1.pb --output_model=./lpot_resnet50_v15.pb
+ for var in "$@"
+ case $var in
++ echo --config=resnet50_v1_5.yaml
++ cut -f2 -d=
+ config=resnet50_v1_5.yaml
+ for var in "$@"
+ case $var in
++ echo --input_model=./resnet50_v1.pb
++ cut -f2 -d=
+ input_model=./resnet50_v1.pb
+ for var in "$@"
+ case $var in
++ echo --output_model=./lpot_resnet50_v15.pb
++ cut -f2 -d=
+ output_model=./lpot_resnet50_v15.pb
+ run_tuning
+ python main.py --input-graph ./resnet50_v1.pb --output-graph ./lpot_resnet50_v15.pb --config resnet50_v1_5.yaml --tune
2021-09-23 19:51:46 [WARNING] Input tensor names should not be empty.
2021-09-23 19:51:46.650656: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-09-23 19:51:46.716141: I tensorflow/core/common_runtime/process_util.cc:146] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
2021-09-23 19:51:47 [WARNING] Found possible input node names: ['input_tensor'], output node names: ['ArgMax', 'softmax_tensor'].
2021-09-23 19:51:48 [INFO] ConvertLayoutOptimizer elapsed time: 0.8 ms
2021-09-23 19:51:48.890494: I tensorflow/core/grappler/devices.cc:78] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0 (Note: TensorFlow was not compiled with CUDA or ROCm support)
2021-09-23 19:51:48.890959: I tensorflow/core/grappler/clusters/single_machine.cc:356] Starting new session
2021-09-23 19:51:48.914173: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 2500000000 Hz
2021-09-23 19:51:49.375314: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:928] Optimization results for grappler item: graph_to_optimize
  model_pruner: Graph size after: 460 nodes (-278), 475 edges (-277), time = 103.298ms.
  shape_optimizer: shape_optimizer did nothing. time = 0.647ms.
  dependency_optimizer: Graph size after: 460 nodes (0), 475 edges (0), time = 24.819ms.
  debug_stripper: debug_stripper did nothing. time = 0.501ms.
  loop_optimizer: Graph size after: 460 nodes (0), 475 edges (0), time = 23.821ms.
  model_pruner: Graph size after: 460 nodes (0), 475 edges (0), time = 24.285ms.
  shape_optimizer: shape_optimizer did nothing. time = 0.462ms.
  dependency_optimizer: Graph size after: 460 nodes (0), 475 edges (0), time = 25.048ms.
  debug_stripper: debug_stripper did nothing. time = 0.608ms.

2021-09-23 19:51:49 [INFO] Pass GrapplerOptimizer elapsed time: 1042.24 ms
2021-09-23 19:51:49 [INFO] Pass RemoveTrainingNodesOptimizer elapsed time: 10.31 ms
2021-09-23 19:51:49 [INFO] Pass SplitSharedInputOptimizer elapsed time: 4.2 ms
2021-09-23 19:51:49 [INFO] Pass GraphFoldConstantOptimizer elapsed time: 3.15 ms
2021-09-23 19:51:49 [INFO] Pass FuseColumnWiseMulOptimizer elapsed time: 5.37 ms
2021-09-23 19:51:49 [INFO] Pass StripUnusedNodesOptimizer elapsed time: 14.49 ms
2021-09-23 19:51:49 [INFO] Pass GraphCseOptimizer elapsed time: 4.5 ms
2021-09-23 19:51:49 [INFO] Pass FoldBatchNormNodesOptimizer elapsed time: 352.35 ms
2021-09-23 19:51:49 [INFO] Pass UpdateEnterOptimizer elapsed time: 1.52 ms
2021-09-23 19:51:49 [INFO] Pass ConvertLeakyReluOptimizer elapsed time: 2.75 ms
2021-09-23 19:51:49 [INFO] Pass InjectDummyBiasAddOptimizer elapsed time: 2.79 ms
2021-09-23 19:51:49 [INFO] Pass ConvertAddToBiasAddOptimizer elapsed time: 2.78 ms
2021-09-23 19:51:49 [INFO] Pass FuseTransposeReshapeOptimizer elapsed time: 3.03 ms
2021-09-23 19:51:49 [INFO] Pass FuseConvWithMathOptimizer elapsed time: 2.96 ms
2021-09-23 19:51:50 [INFO] Pass Pre Optimization elapsed time: 2207.32 ms
2021-09-23 19:51:50 [INFO] Get FP32 model baseline.
2021-09-23 19:51:51 [INFO] Start to evaluate the TensorFlow model.
2021-09-23 19:51:51.116307: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:196] None of the MLIR optimization passes are enabled (registered 0 passes)
2021-09-23 19:51:51.188329: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2021-09-23 19:51:51 [WARNING] Sample num during evaluation is 0.
2021-09-23 19:51:51 [INFO] Save tuning history to /mnt/storage/snkashya/BDL_lpot/neural-compressor/examples/tensorflow/image_recognition/lpot_workspace/2021-09-23_19-51-46/./history.snapshot.
2021-09-23 19:51:51 [INFO] FP32 baseline is: [accuracy: 0.0000, duration (seconds): 0.6577]
2021-09-23 19:51:52 [WARNING] Found possible input node names: ['input_tensor'], output node names: ['softmax_tensor'].
2021-09-23 19:51:52 [WARNING] Found possible input node names: ['input_tensor'], output node names: ['softmax_tensor'].
2021-09-23 19:51:54.882834: I tensorflow/core/common_runtime/process_util.cc:146] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
2021-09-23 19:51:55 [INFO] Pass Quantization elapsed time: 1038.32 ms
Traceback (most recent call last):
  File "/home/snkashya/.conda/envs/lpot/lib/python3.7/site-packages/lpot/adaptor/tf_utils/graph_converter.py", line 561, in quantize
    self._calibration_data = Helper.gen_valid_sampling_log(tmp_dump_file)
  File "/home/snkashya/.conda/envs/lpot/lib/python3.7/site-packages/lpot/adaptor/tf_utils/graph_rewriter/graph_util.py", line 860, in gen_valid_sampling_log
    first_line = valid_data[0].rsplit(':')[0]
IndexError: list index out of range
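
For completeness, the calibration section of my resnet50_v1_5.yaml (the root path is a placeholder; this follows the ImageRecord dataset pattern from the example yamls, if I am reading them correctly):

quantization:
  calibration:
    sampling_size: 100
    dataloader:
      batch_size: 10
      dataset:
        ImageRecord:
          root: /path/to/calibration/tf_records  # placeholder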


[Tensorflow QAT] AttributeError: 'NoneType' object has no attribute 'graph_def'

Environment: Google Colab
LPOT Version: 1.6
Tensorflow Version: Official 2.6.0 (with environment variables set as below)
TF_ENABLE_ONEDNN_OPTS=1
TF_ENABLE_MKL_NATIVE_FORMAT=0

I basically followed the qat example provided here.
I used a pretrained model, annotated it so that only Conv2D layers are quantized, ran model.fit() on the annotated model for several epochs, and saved the model.
After that, I use LPOT ModelConversion to convert the model, and the following error occurs:

2021-09-10 03:07:43 [INFO] Pass Quantization elapsed time: 7581.68 ms
2021-09-10 03:07:44 [INFO] Pass FreezeFakeQuantOpOptimizer elapsed time: 283.8 ms
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/lpot/adaptor/tf_utils/graph_converter.py", line 534, in quantize
    self._fuse_requantize_with_fused_quantized_node()
  File "/usr/local/lib/python3.7/dist-packages/lpot/adaptor/tf_utils/graph_converter.py", line 698, in _fuse_requantize_with_fused_quantized_node
    self.device).do_transformation()
  File "/usr/local/lib/python3.7/dist-packages/lpot/adaptor/tf_utils/graph_rewriter/int8/fuse_conv_requantize.py", line 47, in __init__
    self.graph_info = self.graph_analyzer.parse_graph()
  File "/usr/local/lib/python3.7/dist-packages/lpot/adaptor/tf_utils/graph_rewriter/graph_util.py", line 611, in parse_graph
    each_input)].outputs.append(node_name)
KeyError: 'model_3/quant_31/StatefulPartitionedCall/StatefulPartitionedCall/MovingAvgQuantize/FakeQuantWithMinMaxVars'
2021-09-10 03:07:44 [ERROR] Fail to quantize graph due to 'model_3/quant_31/StatefulPartitionedCall/StatefulPartitionedCall/MovingAvgQuantize/FakeQuantWithMinMaxVars'.
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-3-515087c4513a> in <module>()
      4 conversion.destination = 'default'
      5 conversion.model = common.Model('./q_aware_model')
----> 6 q_model = conversion()
      7 q_model.save('quantized_model')

2 frames
/usr/local/lib/python3.7/dist-packages/lpot/experimental/model_conversion.py in __call__(self)
     94 
     95         self.adaptor = FRAMEWORKS[self.framework](framework_specific_info)
---> 96         q_model = self.adaptor.convert(self._model, self._source, self._destination)
     97 
     98         # when eval_func is None but metric or _eval_dataloader is set by yaml or code,

/usr/local/lib/python3.7/dist-packages/lpot/adaptor/tensorflow.py in convert(self, model, source, destination)
    814                                    fake_quant=True)
    815 
--> 816         return converter.convert()
    817 
    818     @dump_elapsed_time("Pass recover model")

/usr/local/lib/python3.7/dist-packages/lpot/adaptor/tf_utils/graph_converter.py in convert(self)
    247         if len(self.bf16_ops) > 0:
    248             model = self.bf16_convert()
--> 249         post_cse_graph_def = PostCseOptimizer(model.graph_def).do_transformation()
    250         post_cse_graph_def.library.CopyFrom(self.model.graph_def.library)
    251         model.graph_def = post_cse_graph_def

AttributeError: 'NoneType' object has no attribute 'graph_def'

My original code (simplified):

model = tf.keras.models.load_model('model')

import tensorflow_model_optimization as tfmot

def apply_quantization_to_Conv2D(layer):
  if isinstance(layer, tf.keras.layers.Conv2D):
    return tfmot.quantization.keras.quantize_annotate_layer(layer)
  return layer

annotated_model = tf.keras.models.clone_model(model, clone_function=apply_quantization_to_Conv2D)

q_aware_model = tfmot.quantization.keras.quantize_apply(annotated_model)
q_aware_model.summary()

q_aware_model.compile(optimizer='adam', loss='mse')
q_aware_model.fit(x=[X_q, X_norm_q], y=y_q, 
                  batch_size=64,
                  epochs=45)
q_aware_model.save('./q_aware_model')

from lpot.experimental import ModelConversion, common
conversion = ModelConversion()
conversion.source = 'QAT'
conversion.destination = 'default'
conversion.model = common.Model('./q_aware_model')
q_model = conversion()
q_model.save('quantized_model')

Please find model here.
Thanks!

Import Error: libGL.so.1: cannot open shared object file: No such file or directory

Environment: Docker
OS: ubuntu 16.04 LTS 64-bit
Processor: Intel(R) Core i7-7700 CPU @ 3.60 GHz * 8
Tensorflow version: 2.6.0
TF_ENABLE_ONEDNN_OPTS=1

Hi, I am currently trying to run Tensorflow 2.6 quantization with Neural Compressor in a Docker environment. Basically, I derived my image from tensorflow/tensorflow:2.6.0-gpu-jupyter and used pip install neural-compressor to install it. However, when I try to import neural_compressor with the command

$ python
>>> import neural_compressor

the following error occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/dist-packages/neural_compressor/__init__.py", line 18, in <module>
    from .quantization import Quantization
  File "/usr/local/lib/python3.6/dist-packages/neural_compressor/quantization.py", line 20, in <module>
    from .data import DATALOADERS, DATASETS
  File "/usr/local/lib/python3.6/dist-packages/neural_compressor/data/__init__.py", line 19, in <module>
    from .dataloaders import DataLoader
  File "/usr/local/lib/python3.6/dist-packages/neural_compressor/data/dataloaders/__init__.py", line 18, in <module>
    from .dataloader import DataLoader
  File "/usr/local/lib/python3.6/dist-packages/neural_compressor/data/dataloaders/dataloader.py", line 18, in <module>
    from neural_compressor.experimental.data.dataloaders import DATALOADERS
  File "/usr/local/lib/python3.6/dist-packages/neural_compressor/experimental/__init__.py", line 18, in <module>
    from .component import Component
  File "/usr/local/lib/python3.6/dist-packages/neural_compressor/experimental/component.py", line 17, in <module>
    from ..conf.config import Conf
  File "/usr/local/lib/python3.6/dist-packages/neural_compressor/conf/config.py", line 21, in <module>
    from ..strategy import STRATEGIES
  File "/usr/local/lib/python3.6/dist-packages/neural_compressor/strategy/__init__.py", line 18, in <module>
    from .strategy import STRATEGIES
  File "/usr/local/lib/python3.6/dist-packages/neural_compressor/strategy/strategy.py", line 30, in <module>
    from ..utils.create_obj_from_config import create_eval_func, create_train_func
  File "/usr/local/lib/python3.6/dist-packages/neural_compressor/utils/create_obj_from_config.py", line 19, in <module>
    from neural_compressor.experimental.data import DATASETS, TRANSFORMS, FILTERS, DATALOADERS
  File "/usr/local/lib/python3.6/dist-packages/neural_compressor/experimental/data/__init__.py", line 20, in <module>
    from .transforms import TRANSFORMS, BaseTransform, transform_registry
  File "/usr/local/lib/python3.6/dist-packages/neural_compressor/experimental/data/transforms/__init__.py", line 18, in <module>
    from .transform import TRANSFORMS, BaseTransform, transform_registry
  File "/usr/local/lib/python3.6/dist-packages/neural_compressor/experimental/data/transforms/transform.py", line 316, in <module>
    'nearest': cv2.INTER_NEAREST,
  File "/usr/local/lib/python3.6/dist-packages/neural_compressor/utils/utility.py", line 72, in __getattr__
    top_level_module = __import__(self.module_name)
  File "/usr/local/lib/python3.6/dist-packages/cv2/__init__.py", line 180, in <module>
    bootstrap()
  File "/usr/local/lib/python3.6/dist-packages/cv2/__init__.py", line 152, in bootstrap
    native_module = importlib.import_module("cv2")
  File "/usr/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
ImportError: libGL.so.1: cannot open shared object file: No such file or directory

I have attached my Dockerfile (please note that neural_compressor is installed after creating the image, and the container is committed successfully):
Dockerfile.txt
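
A sketch of the Dockerfile line I plan to try, on the assumption that the missing libGL.so.1 is the OpenGL library opencv-python links against (package names are for Debian/Ubuntu):

# Install the shared libraries cv2 needs before `pip install neural-compressor`.
RUN apt-get update && apt-get install -y --no-install-recommends libgl1-mesa-glx libglib2.0-0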

Thanks!
