Comments (7)
I get it. Thank you!
Q1. When I print X_in_quantized in Python, I found that it is still a float tensor with shape [1, 1, 1024]. But after converting the TF model into a tflite model, it becomes a bitpacked tensor. Can you give me an overview of how this is done?
This is correct. During training everything is float so that you can compute gradients. Once you convert the model into a tflite model for inference, it becomes a bitpacked tensor. What exactly is your question about this? The conversion between the models happens in the MLIR converter. In the tflite model there is an operator, LceQuantize, which takes a float or int8 tensor as input and outputs the bitpacked tensor. At inference time it does that by checking if (x[i] < 0) for each input value, and then sets the i-th bit to 0 or 1.
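For intuition, here is a minimal NumPy sketch of that bitpacking step (illustrative only: the real kernel is C++ inside LCE, and the exact bit order here is an assumption):

import numpy as np

def bitpack(x):
    # Set bit i iff x[i] < 0, mirroring the LceQuantize check described above
    bits = (np.asarray(x).ravel() < 0).astype(np.uint8)
    bits = np.pad(bits, (0, -len(bits) % 32))  # pad to a multiple of 32 values
    # 8 booleans per byte (LSB first), then 4 bytes viewed as one int32 word;
    # the .view assumes a little-endian host
    return np.packbits(bits, bitorder="little").view(np.int32)

print(bitpack([0.5, -1.0, 2.0, -3.0]))  # bits 1 and 3 set -> [10]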
Q2. Is there any possibility to add a custom operation, e.g., custom_converter(), to convert the tf.float32 input tensor to a tf.int32 bitpacked input tensor?

X_in = Input(shape=(1, 1, 32), batch_size=1024, dtype=tf.float32)
X_in_quantized = custom_converter()(X_in)

If the answer is yes, how can I add a custom operation and build it?
If you want the tf.int32 type during training, I'm not sure if that's possible. I also don't understand what the purpose of that would be: the TF ops don't understand this bitpacked format, so it would be a bit useless.

If you want the tf.int32 type during inference in TFLite and you want to remove the int8 or float32 tensor that comes before LceQuantize, then you can maybe do that by extending the function strip_lcedequantize_ops from larq_compute_engine.mlir.python.util.
@Tombana, thanks for your suggestion! I have successfully removed the float32 input tensor by extending the strip_lcedequantize_ops function.
I think the MLIR converter cannot directly create .tflite files with boolean input or output tensors; only float32 and int8 are supported. Even if booleans were supported, every bool would still take up 1 byte instead of 1 bit.
The LCE converter does have a utility function for getting bitpacked boolean output tensors: first you create a regular tflite file that ends with lq.quantizers.SteSign() as its output. Then you convert it as normal and get a float32 output tensor (you can verify that in Netron). Then you can call the following utility function:
from larq_compute_engine.mlir.python.util import strip_lcedequantize_ops
strip_lcedequantize_ops(tflite_file_bytes)
That should result in a tflite file that has an int32 output tensor (again, it's a good idea to verify it in Netron). It does not actually use 32-bit integers though: these numbers represent bitpacked booleans, where every integer contains 32 booleans.
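To inspect that output on the Python side, here is a small NumPy sketch that unpacks such an int32 word back into booleans (the LSB-first bit order is an assumption; the authoritative packing lives in the LCE kernels):

import numpy as np

def unpack_bitpacked(words, num_values):
    # View each int32 word as 4 bytes, then expand to bits, LSB first
    bytes_ = np.asarray(words, dtype=np.int32).view(np.uint8)
    bits = np.unpackbits(bytes_, bitorder="little")
    return bits[:num_values].astype(bool)  # 32 booleans per int32 word

print(unpack_bitpacked([10], 4))  # -> [False  True False  True]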
Thanks for your reply. It helps me a lot.
Besides, how can I use an int8 input in the LCE benchmark? When I directly modify the dtype of the input tensor, it raises a TypeError:
X_in = Input(shape=(1, 1, 1024,), batch_size=1024, dtype=tf.int8)
TypeError: Value passed to parameter 'x' has DataType int8 not in list of allowed values: bfloat16, float16, float32, float64, int32, int64, complex64, complex128
For int8 tflite files you have to do either int8 quantization-aware training or use post-training quantization. The tensor in TensorFlow/Keras stays float32, and during conversion it becomes int8.
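For example, here is a minimal post-training quantization sketch using the stock TFLite converter (the representative_dataset below is placeholder calibration data; adjust the shape to your model's input). LCE's convert_keras_model exposes similar inference_input_type / inference_output_type options, if I remember its signature correctly, so the same idea should carry over:

import numpy as np
import tensorflow as tf

def representative_dataset():
    # Placeholder calibration samples; use real data and your model's input shape
    for _ in range(100):
        yield [np.random.rand(1, 1, 1, 1024).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)  # `model` is your Keras model
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.inference_input_type = tf.int8   # the tflite input tensor becomes int8
converter.inference_output_type = tf.int8
tflite_model = converter.convert()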
Hi @Tombana, sorry for reopening this issue. After learning more about Larq Compute Engine, I have two new questions.
As discussed above,
X_in = Input(shape=(1, 1, 1024), batch_size=1024, dtype=tf.float32)
X_in_quantized = lq.quantizers.SteSign()(X_in)
I want X_in_quantized as the input.
Q1. When I print X_in_quantized in Python, I found that it is still a float tensor with shape [1, 1, 1024]. But after converting the TF model into a tflite model, it becomes a bitpacked tensor. Can you give me an overview of how this is done?
Q2. Is there any possibility to add a custom operation, e.g., custom_converter(), to convert the tf.float32 input tensor to a tf.int32 bitpacked input tensor?

X_in = Input(shape=(1, 1, 32), batch_size=1024, dtype=tf.float32)
X_in_quantized = custom_converter()(X_in)

If the answer is yes, how can I add a custom operation and build it?
I'm not familiar with TFLite and MLIR, so I'd appreciate it if you could go into more detail.
Thank you!