google-vision-wrapper

A Tiny Python Wrapper 🐍 for Google Vision API. 👁

Introduction

Google Vision API is a service provided by Google Cloud, with the intent of giving to the vast public the possibility to access complex deep learning models, trained with huge datasets, without the inconvenience of collecting a proper dataset and training a model from scratch.

All pythonist computer vision ethusiasts (as I am), knows that there are computer vision libraries that comes already loaded with potent pre-trained models. An example is dlib with Haar Cascades, HOG for face recognition and Caffe models for object detection. But sometimes you need an API that is accessible from the web (i.e. if you want to deploy a web app or a lightweight service). Then you have two possibilities:

You deploy this already trained model on a web app in the cloud, with the incovinence of hosting a potential huge model on physical space, paying a lot of attention to subtle cloud Pricing and billings.
Rely on a third party service, already shipped with all you need, with less attention to costs since this kind of service is really cheap.

For some kind of projects I prefer relying on the second option. Google Vision API is a perfect solution in this context. There is a list of tasks that the API can accomplish on an image:

Optical character recognition
Detect crop hints
Detect faces
Detect image properties
Detect labels
Detect landmarks
Detect logos
Detect multiple objects
Detect explicit content
Detect web entities and pages

For more information I strongly suggest to read the Google Official User Guide to the API.

Even if the guide is exhaustive and the API is straightforward to use, I found that it requires a bit of manipulation to setup and to retrieve the information from the server response in a useable way (a Pandas DataFrame or an array for plotting). This is why I started creating a wrapper around the API, that handles the request under-the-hood and which interface is more compact than the original one.

I started for a small project of aesthetics medicine I'm working on and I though that maybe someone else could find it useful. I am therefore extending my original tiny class for face landmark recognition, with all the features provided by Google Vision API, with the possibility of retrieving the desired information in 3-lines code, in array format or as a Pandas DataFrame.

Installation

From Pypi:

pip install google-vision-wrapper

Cloning the Github repository:

git clone https://github.com/gcgrossi/google-vision-wrapper

Before you start: Setup your GCP

Before starting, it is mandatory to correctly setup a Google Cloud Project, authorise the Google Vision API and generate a .json API key file. Be sure to have fulfilled all the steps in the Before you Begin Guide before moving on.

Classes and Methods

1. Class Initialization

class GVisionAPI():  
  def __init__(self,keyfile=None):
   '''
   set credentials as environmental variable and 
   instantiate a Google ImageAnnotator Client
   see https://cloud.google.com/vision/docs/before-you-begin
   and https://cloud.google.com/vision/docs/detecting-faces 
   for more info
   '''

Description:

Main Class Constructor. set credentials as environmental variable and instantiate a Google ImageAnnotator Client. See 1 and 2 for more info.

Parameters:

keyfile: path to the .json auth file returned by google vision API. Refer to here to know how to get it

Usage:

from gvision import GVisionAPI

authfile = 'path_to_authfile'
gvision = GVisionAPI(authfile)

2. Perform a Request

def perform_request(self,img=None,request_type=None):
    '''
    given and imput image in either numpy array or bytes format
    chek type and perform conversion nparray->bytes (if needed).
    Provides the bytes content to the Google client and perform
    an API request based on the "request_type" parameter.
    Response can be accessed using the self.response attribute.

    Parameters:
    - img : input imange of type numpy.ndarray or bytes
    - request_type : a string in ['face detection','landmark detection','logo detection',
                                  'object detection','label detection','image properties',
                                  'text detection','handwriting detection','web detection']
      representing the type of request to perform
    '''

Description:

Given and imput image in either numpy array or bytes format checks type and perform conversion nparray->bytes (if needed). Provides the bytes content to the Google client and performs an API request based on the "request_type" parameter. Response can be accessed using the self.response attribute. All the possible options can printed with the method request_options() and retrieved as a list from the request_types attribute.

Parameters:

img : input imange of type numpy.ndarray or bytes
request_type : a string representing the type of request to perform. Possible values: ['face detection','landmark detection','logo detection','object detection','label detection','image properties','text detection','handwriting detection','web detection'].

Usage:

import cv2

# replace 'path_to_image' with what you want
img   = cv2.imread('path_to_image')

#print the possible options via method
gvision.request_options()
# or via attribute
print(gvision.request_types)

#perform request
gvision.perform_request('face detection')
print(gvision.response)

3. Obtain Data as list

    def face_landmarks(self):
        '''
        Loop on the face annotations and, for each face,
        append a 2-tuple list with (x,y) coordinates of detected points.
        append also a list with the corresponding point names.
        return the two lists created
        '''

    def head(self):
        '''
        Loop on the face annotations and, for each face,
        append a 2-tuple list with (x,y) coordinates of head bounding box.
        append also a list with the corresponding point names.
        return the two lists created
        '''
        
    def face(self):
        '''
        Loop on the face annotations and, for each face,
        append a 2-tuple list with (x,y) coordinates of face bounding box.
        append also a list with the corresponding point names.
        return the two lists created
        '''
        
    def angles(self):
        '''
        Loops on the face annotations and, for each face,
        appends a list with the rotation angles on the 3 axis.
        appends also a list with the corresponding angle names.
        return the two lists created
        '''     
    
    def objects(self):
        '''
        Loops on the detected objects. For each,
        appends a list with name, confidence and
        (x,y) 2-tuples with the bounding box coordinates.
        Appends also a list with the corresponding headers.
        Returns the two lists created.
        '''
        
    def colors(self):
        '''
        Loops on the detected colors. For each,
        appends a list with: 
        - (r,g,b) 3-tuple with values of the color channels.
        - pixel fraction.
        - score
        Appends also a list with the corresponding headers.
        Returns the two lists created.
        '''
        
    def crop_hints(self):
        '''
        Loops on the crop_hints. For each,
        appends a list with: 
        - (x,y) 2-tuple with coordinates of the cropped image.
        - confidence.
        - importance fraction.
        Appends also a list with the corresponding headers.
        Returns the two lists created.
        '''
        
    def logos(self):
        '''
        Loops on the detected logos. For each,
        appends a list with name, confidence and
        (x,y) 2-tuples with the bounding box coordinates.
        Appends also a list with the corresponding headers.
        Returns the two lists created.
        '''
    
    def texts(self):
        '''
        Loops on the detected texts. For each,
        appends a list with description, language and
        (x,y) 2-tuples with the bounding box coordinates.
        Appends also a list with the corresponding headers.
        Returns the two lists created.
        '''
        
    def pages(self):
        '''
        Loops on the detected pages. For each,
        appends a list with language, confidence, 
        height and width.
        Appends also a list with the corresponding headers.
        Returns the two lists created.
        ''' 
        
    def blocks(self):
        '''
        Loops on the blocks in the detected pages. For each,
        appends a list with description, language, confidence, block type 
        and (x,y) 2-tuples with the bounding box coordinates.
        Appends also a list with the corresponding headers.
        Returns the two lists created.
        '''
    
    def paragraphs(self):
        '''
        Loops on the paragraphs in the blocks in the detected pages. For each,
        appends a list with description, language, confidence
        and (x,y) 2-tuples with the bounding box coordinates.
        Appends also a list with the corresponding headers.
        Returns the two lists created.
        '''
    
    def words(self):
        '''
        Loops on the words in the paragraphs 
        in the blocks in the detected pages. For each,
        appends a list with language
        and (x,y) 2-tuples with the bounding box coordinates.
        Appends also a list with the corresponding headers.
        Returns the two lists created.
        '''
    
    def symbols(self):
        '''
        Loops on the symbols in the words, in the paragraphs,
        in the blocks, in the detected pages. For each,
        appends a list with text, language
        and (x,y) 2-tuples with the bounding box coordinates.
        Appends also a list with the corresponding headers.
        Returns the two lists created.
        '''
     
    def landmarks(self):
        '''
        Loops on the detected landmarks. For each,
        appends a list with: 
        - name
        - confidence
        - (x,y) 2-tuples with coordinates of the bounding box
        - 2-tuple with latitude and logitude.
        Appends also a list with the corresponding headers.
        Returns the two lists created.
        '''
        
    def labels(self):
        '''
        Loops on the detected labels. For each,
        appends a list with name, 
        confidence and topicality.
        Appends also a list with the corresponding headers.
        Returns the two lists created.
        '''
        
    def web_entities(self):
        '''
        Loops on the detected web entities. For each,
        appends a list with description and score.
        Appends also a list with the corresponding headers.
        Returns the two lists created.
        '''
    
    def matching_images(self):
        '''
        Loops on the detected pages with matching images. 
        For each, appends a list with the page title and url,
        plus the fully or partially matched images found (url).
        Appends also a list with the corresponding headers.
        Returns the two lists created.
        '''
        
    def similar_images(self):
        '''
        Loops on the suggested similar images. For each,
        appends a list with the url of that image.
        Appends also a list with the corresponding headers.
        Returns the two lists created.
        '''

Description:

Response data in form of list. For more detailed information regarding the headers meaning or type of information please refer to the corresponding guides:

N.B.: to each request type correspond different information that can be retrieved. I.e. gvision.perform_request('face detection') must me used to retrieve face_landmarks, face, head, angles information. If another type of request has been performed the API will throw a "key" error. Refere to the section Usage below for a full explanation.

Return types:

headers : list.
data : list

Usage:

# to each type of request corresponds 
# different information that can be retrieved

gvision.perform_request('face detection')
headers,   data   = gvision.face_landmarks()
h_headers, h_data = gvision.head()
f_headers, f_data = gvision.face()
a_headers, a_data = gvision.angles()

gvision.perform_request('object detection')
headers, data = gvision.objects()

gvision.perform_request('landmark detection')
headers, data = gvision.landmarks()

gvision.perform_request('label detection')
headers, data = gvision.labels()

gvision.perform_request('logo detection')
headers, data = gvision.logos()

gvision.perform_request('text detection')
# work also with:
#gvision.perform_request('handwriting detection')

headers,    data    = gvision.texts()
p_headers,  p_data  = gvision.pages()
b_headers,  b_data  = gvision.blocks()
pr_headers, pr_data = gvision.paragraphs()
w_headers,  w_data  = gvision.words()
s_headers,  s_data  = gvision.symbols()

gvision.perform_request('image properties')
headers,    data    = gvision.colors()
ch_headers, ch_data = gvision.crop_hints()

gvision.perform_request('web detection')
headers,   data   = gvision.web_entities()
m_headers, m_data = gvision.matching_images()
s_headers, s_data = gvision.similar_images()

4. Obtain Data as a Pandas DataFrame

def to_df(self,option=None,name='image'):
    '''
    Parameters:
    - option: a string in ['face landmarks','face','head','angles',
                          'objects','landmarks','logos','labels',
                          'colors', 'crop hints','texts','pages',
                          'blocks','paragraphs','words','symbols',
                          'web entitites','matching images',
                          'similar images']
      precifing the type of information to dump in the DataFrame
    - name: (optional) the name of the image used in the request. 
      default is 'image'.

    Returns:
    a DataFrame with information for the specific option.
    '''

Description:

Dump the information specific to the option parameter into a pandas DataFrame.

N.B.: Each option retrieves information via the output lists of the functions defined in Section 3. The same rule regarding the type of request and the information available is applicable to this method.

Parameters:

option : a string representing the type of information to retrieve. Possible values: ['face landmarks','face','head','angles','objects','landmarks','logos','labels','colors', 'crop hints','texts','pages','blocks','paragraphs','words','symbols','web entitites','matching images','similar images']. All the possible options can printed with the method df_options() and retrieved as a list from the df_types attribute.
name : a value representing the name or the id of the processed image that will be appended to each row of the returned DataFrame.

Usage:

#print the possible options via method
gvision.df_options()
# or via attribute
print(gvision.df_types)

# dump to DataFrame
gvision.perform_request('object detection')
df = gvision.to_df('objects')

terragord7 / google-vision-wrapper Goto Github PK