Moad Computer, the actionable insights company

Actionable Insights blog




Post-training dynamic range quantization

4/30/2023


Dr. Rahul Remanan

CEO, Moad Computer

Described in this article are two example Python functions that handle post-training dynamic range quantization in TensorFlow. Finally, there is a prototype inference API that makes quantized model deployments easier by keeping them plug-and-play with the TensorFlow Keras inference API.

The first function converts the floating point model to a dynamic range quantized model; the second runs inference with the quantized model.

This reduces both the memory footprint and the inference time of a model: dynamic range quantization can achieve a 4x reduction in model size, along with a 2x to 3x speed-up in inference performance.
Post-training dynamic range model quantization
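A minimal sketch of such a conversion function, using the standard `tf.lite.TFLiteConverter` API (the function name `quantize_dynamic_range` is illustrative, not from the original article):

```python
import tensorflow as tf

def quantize_dynamic_range(keras_model):
    """Convert a floating point Keras model to a dynamic range
    quantized TFLite model, returned as a flatbuffer byte string."""
    converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)
    # Optimize.DEFAULT with no representative dataset selects dynamic
    # range quantization: the weights are quantized to 8-bit integers.
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    return converter.convert()
```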
Inference using post-training dynamic range quantized model
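The inference side can be sketched with `tf.lite.Interpreter`, which runs a TFLite flatbuffer directly (the name `predict_tflite` is illustrative):

```python
import numpy as np
import tensorflow as tf

def predict_tflite(tflite_model, input_data):
    """Run inference with a dynamic range quantized TFLite model.

    `tflite_model` is the flatbuffer byte string produced by the
    converter; `input_data` is a NumPy array with a batch dimension
    matching the model's input shape.
    """
    interpreter = tf.lite.Interpreter(model_content=tflite_model)
    interpreter.allocate_tensors()
    input_details = interpreter.get_input_details()[0]
    output_details = interpreter.get_output_details()[0]
    interpreter.set_tensor(input_details["index"],
                           input_data.astype(input_details["dtype"]))
    interpreter.invoke()
    # The output tensor is returned in floating point.
    return interpreter.get_tensor(output_details["index"])
```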
Here, the floating point model weights are statically quantized to 8-bit precision integers. To reduce inference time further, the outputs of activation functions are also quantized to 8 bits, dynamically, based on their range.

All computations on the weights and activations are therefore performed with 8-bit precision integers. Unlike full integer quantization, the user does not have to provide a representative dataset to calibrate the quantized model.

It should be noted that the outputs are still stored in floating point. The trade-off for this is a slightly smaller speed-up of the dynamic range operations, compared to full fixed-point computation where weights, activations and outputs are all stored as integers.
A simple inference API for the quantized model
The key challenge in making quantized models production-ready is adapting the inference pipeline to the slightly different prediction steps a quantized model requires.
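One way to bridge that gap is a thin wrapper class. The sketch below of such a TFLiteModel() class exposes a Keras-style predict() method; the constructor arguments and the per-sample batching loop are illustrative assumptions, not the article's exact implementation:

```python
import numpy as np
import tensorflow as tf

class TFLiteModel:
    """Wraps a quantized TFLite model behind a Keras-like predict()."""

    def __init__(self, model_content=None, model_path=None):
        self.interpreter = tf.lite.Interpreter(model_path=model_path,
                                               model_content=model_content)
        self.interpreter.allocate_tensors()
        self.input_details = self.interpreter.get_input_details()[0]
        self.output_details = self.interpreter.get_output_details()[0]

    def predict(self, x):
        # Feed samples one at a time (the converted model has a fixed
        # batch dimension of 1) and stack the results, mimicking
        # tf.keras.Model.predict() on a batched input.
        outputs = []
        for sample in np.asarray(x):
            batch = np.expand_dims(sample, 0).astype(
                self.input_details["dtype"])
            self.interpreter.set_tensor(self.input_details["index"], batch)
            self.interpreter.invoke()
            outputs.append(
                self.interpreter.get_tensor(self.output_details["index"])[0])
        return np.stack(outputs)
```

With a wrapper like this, existing code that calls `model.predict(batch)` can switch from the floating point model to the quantized one without modification.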

An inference API such as the simple prototype built around the TFLiteModel() class outlined above achieves code compatibility with the TensorFlow Keras inference API.

Writing such compatible, plug-and-play APIs for regular deep neural network models, especially in inference pipelines, makes production deployments of quantized models much simpler.
References:
1. TensorFlow documentation on post-training quantization
2. TensorFlow post-training dynamic range quantization example using MNIST
3. Model inference API in TensorFlow Keras