
Comments (12)

jiaxiang-wu commented on July 19, 2024

@Ariel-JUAN
You can load the checkpoint file, mark input and output tensors with tf.add_to_collection, and save the graph to checkpoint files, so that the model conversion script can recognize them.
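A minimal sketch of that workflow, assuming a toy network and made-up paths/tensor names (it uses the TF 1.x-style graph API via tf.compat.v1 so it also runs under TF 2.x; adapt the tensor names to your own graph):

```python
import os
import tempfile

import tensorflow.compat.v1 as tf  # TF 1.x-style graph API

tf.disable_eager_execution()

ckpt_path = os.path.join(tempfile.mkdtemp(), 'model.ckpt')  # stand-in path

# Save a toy checkpoint that lacks the input/output collections.
with tf.Graph().as_default():
    net_input = tf.placeholder(tf.float32, [None, 4], name='net_input')
    weights = tf.Variable(tf.ones([4, 2]), name='weights')
    net_output = tf.identity(tf.matmul(net_input, weights), name='net_output')
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        tf.train.Saver().save(sess, ckpt_path)

# Reload it, look the input/output tensors up by name, add them to the
# collections the conversion script expects, and re-save the checkpoint.
with tf.Graph().as_default() as graph:
    saver = tf.train.import_meta_graph(ckpt_path + '.meta')
    tf.add_to_collection('images_final', graph.get_tensor_by_name('net_input:0'))
    tf.add_to_collection('logits_final', graph.get_tensor_by_name('net_output:0'))
    with tf.Session() as sess:
        saver.restore(sess, ckpt_path)
        tf.train.Saver().save(sess, ckpt_path)
```

Collections are serialized into the *.meta file, so after the re-save any script that imports the meta graph can find the tensors by collection name.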

from pocketflow.

jiaxiang-wu commented on July 19, 2024

As far as I know, you cannot use post-training quantization to convert a model with 32-bit floating-point weights into a counterpart with 8-bit fixed-point weights in the *.pb format. Post-training quantization is only provided in the TF-Lite module, via tf.contrib.lite.TocoConverter.
You can instead quantize a model with our UniformQuantTFLearner using quantization-aware training, and then export the resulting model to a *.pb file for deployment.
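For reference, a toy end-to-end sketch of that TF-Lite path (the graph and tensor names are made up; the converter class is tf.contrib.lite.TocoConverter in the TF 1.x line discussed here and was later renamed tf.lite.TFLiteConverter, so the sketch uses the compat API):

```python
import os
import tempfile

import tensorflow
import tensorflow.compat.v1 as tf  # TF 1.x-style graph API

tf.disable_eager_execution()

# Freeze a toy graph to a *.pb file.
with tf.Graph().as_default() as graph:
    net_input = tf.placeholder(tf.float32, [1, 4], name='net_input')
    weights = tf.Variable(tf.ones([4, 2]), name='weights')
    net_output = tf.identity(tf.matmul(net_input, weights), name='net_output')
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        frozen = tf.graph_util.convert_variables_to_constants(
            sess, graph.as_graph_def(), ['net_output'])

pb_path = os.path.join(tempfile.mkdtemp(), 'model.pb')
with open(pb_path, 'wb') as f:
    f.write(frozen.SerializeToString())

# Convert the frozen graph; the weight quantization happens here, inside
# TF-Lite, not in the *.pb format itself.
converter = tf.lite.TFLiteConverter.from_frozen_graph(
    pb_path, input_arrays=['net_input'], output_arrays=['net_output'])
converter.optimizations = [tensorflow.lite.Optimize.DEFAULT]
tflite_model = converter.convert()
```

The result is a quantized *.tflite flatbuffer, which is why the 8-bit model only exists on the TF-Lite side of the pipeline.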

BTW, are you sure that an 8-bit quantized model brings acceleration on a GPU? We have only tested it on mobile devices, where TF-Lite provides special optimizations for low-precision operations.


dhingratul commented on July 19, 2024

The only way to find out is to benchmark on the target hardware.


jiaxiang-wu commented on July 19, 2024

Sorry, due to limited resources we currently do not have time to run this benchmark. Could you please benchmark it yourself?


Ariel-JUAN commented on July 19, 2024

@jiaxiang-wu
Hi, I used python tools/conversion/export_quant_tflite_model.py --model_dir ./models --input_coll inputs --output_coll outputs --quantize True to do the quantization, but I am not sure what inputs and outputs mean. What should I put there?


jiaxiang-wu commented on July 19, 2024

The input_coll and output_coll arguments are used to locate the input and output tensors in the original / compressed model. For instance, in DisChnPrunedLearner, we mark the input and output tensors with:

# add input & output tensors to certain collections
tf.add_to_collection('images_final', images)
tf.add_to_collection('logits_final', logits)

In this case, we use --input_coll images_final --output_coll logits_final (which are the default values of these two arguments) to locate the input and output tensors.

https://github.com/Tencent/PocketFlow/blob/master/learners/discr_channel_pruning/learner.py#L328
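In other words, the flags name the collections that the script reads back with tf.get_collection. A toy sketch of both sides of that handshake (the network and names are stand-ins; the real script's internals may differ):

```python
import tensorflow.compat.v1 as tf  # TF 1.x-style graph API

tf.disable_eager_execution()

with tf.Graph().as_default():
    # Stand-ins for the real network's input and output tensors.
    images = tf.placeholder(tf.float32, [None, 32, 32, 3], name='images')
    logits = tf.layers.dense(tf.layers.flatten(images), 10, name='logits')

    # What the learner does at training time:
    tf.add_to_collection('images_final', images)
    tf.add_to_collection('logits_final', logits)

    # What --input_coll / --output_coll resolve to inside the script:
    net_input = tf.get_collection('images_final')[0]
    net_output = tf.get_collection('logits_final')[0]
```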


Ariel-JUAN commented on July 19, 2024

@jiaxiang-wu
Hi, thanks for replying.
I have a model that doesn't mark its input and output tensors with the tf.add_to_collection operation. Does that mean I can't get the quantized model?
Can I specify the input and output tensors directly, without using tf.get_collection?


Ariel-JUAN commented on July 19, 2024

@jiaxiang-wu
Hi, I did as you told me and it worked~
But I am confused again: why does export_pb_tflite_model process the *.pb file twice, so that I end up with both a model_original.pb and a model_transformed.pb?
My original model.ckpt.data is about 255M, and the meta file is about 255M. The model_original.pb is about 255M, but the model_transformed.pb is about 400M.
Is this normal? I thought the quantized model should be smaller than it used to be.


jiaxiang-wu commented on July 19, 2024

Which model conversion script are you using? And, which model compression method (or "learner") are you using to obtain the compressed model?


Ariel-JUAN commented on July 19, 2024

@jiaxiang-wu
The command is python tools/conversion/export_pb_tflite_models.py --model_dir ./models --input_coll input --output_coll output.


jiaxiang-wu commented on July 19, 2024

This model conversion script converts channel-pruned models, rather than quantized models, to TF-Lite models. If the compressed model you provided is not channel-pruned, then the model_transformed.pb and model_transformed.tflite models may be larger, due to newly inserted 1x1 convolutional layers.
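A back-of-envelope sketch of why that can happen (pure arithmetic; the exact graph transform PocketFlow applies may differ): a k x k convolution that keeps only some of its input channels can be realized as a 1x1 "channel selection" convolution followed by the slimmer k x k convolution, so if nothing was actually pruned the extra 1x1 layer only adds parameters.

```python
def conv_params(c_in, c_out, k):
    """Parameter count of a k x k convolution (biases ignored)."""
    return k * k * c_in * c_out

def transformed_params(c_in, c_kept, c_out, k):
    """1x1 channel-selection conv followed by the pruned k x k conv."""
    return conv_params(c_in, c_kept, 1) + conv_params(c_kept, c_out, k)

base = conv_params(256, 256, 3)              # 589,824 parameters
half = transformed_params(256, 128, 256, 3)  # 327,680: pruning half the channels wins
none = transformed_params(256, 256, 256, 3)  # 655,360: nothing pruned, pure overhead
```

With no channels pruned, the transformed layer pair is strictly larger than the original layer, which is consistent with model_transformed.pb growing from 255M to about 400M.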


Ariel-JUAN commented on July 19, 2024

@jiaxiang-wu
Thanks.

