Code Monkey home page Code Monkey logo

data-middleware's Introduction

DataMidWare - A Data Middleware Python Library

Motivation:

  • In this data-driven-age, every small and large organization is taking their business decisions based on the data insights.
  • Transformation of raw data into usable cleaned data is crucial to take data-driven-decisions.
  • Data cleaning and preparation takes valuable time away from real data science work – model building and deployment
  • Bulk time is spent on data preparation (According to Anaconda survey-2020, on average 45% of time is spent on data preparation, 21% of time spent on visualization)

Why DataMidWare?

DataMidWare accelerates data preparation, analysis, and visualization tasks under a unified and single framework to overcome efficiency gap

Introduction:

DataMidWare is a data middleware python library which accelerates data preparation, analysis, and visualization tasks by integrating different technologies, software, and libraries using its APIs.

The DataMidWare performs the following tasks:

  • Provides direct interactions with database, data preparation & processing tool, algorithm library, and data visualization tool using its APIs.
  • Imports and parses raw data (csv, json, txt, xml) from different sources and load to database (MySQL, NoSQL).
  • Produces clean data from raw format and stores into database for analysis
  • Performs SQL queries using its APIs
  • Performs data analysis on the data stored in database
  • Provides direct visualization of csv, json data, and data stored in database
  • Exports data in different format (csv, json) from database.

Architectural Overview:

Alt Text

Package Structure Overview:

datamidware/                                     Top-level package
        __init__.py
        pyalgo/
             __init__.py
        pydp/
             __init__.py
        pydb/
             __init__.py
        pyviz/
             __init__.py
        pydm/
             __init__.py
             create_mysql_db.py
             file2db.py
             db2file.py
             csv2mysql.py
             json2mysql.py
             mysql2csv.py
             mysql_query.py
settings/
        __init__.py
        write_config.py
        config.ini

tests/
        __init__.py
        test_data/
        test_result/
        test_pyviz.py
        test_search.py
        test_sort.py
        test_pydb.py
        test_pydp.py
        test_pydm
        test_mysql_query.py
  • pyalgo: Algorithm library
    • Search algorithms
    • Sort algorithms
  • pydb: Database connection and SQL query library
  • pydp: Data Processing library
  • pydm: Middleware library
  • pyviz: Data Visualization library
  • settings: Contains mysqlconfig.py file
  • tests: Contains test modules

APIs: (Reference)

  1. file2db.file2db(host, user, password, filename, db_name, tb_name, file_type="csv", db_type="mysql", key=None) :

    Imports raw structured/semi-structured data (csv, json) into database (MySQL, NoSQL).
    
    Parameters:
        host: host name
        user: user name
        password: password
        filename: filename to send to database
        file_type: file type (csv, json), str
        db_type: database type (mysql, nosql), str
        tb_name: name of the table where data will be stored
    
    Import statement: from datamidware.pydm import file2db
    
  2. db2file.db2file(host, user, password, file_path, db_name, tb_name, file_type="file_type", db_type="db_type") :

    Exports table data as csv/json format from the database
    
    parameters:
        host: host name
        user: user name
        password: password
        file_path: file path to export data from database
        file_type: file type (csv, json)
        db_type: database type (mysql, nosql)
        db_name: database name from where data is exported
        tb_name: name of the table from where data will be exported
    
    Import statement: from datamidware.pydm import db2file
    
  3. create_mysql_db.create_mysql_db(host, user, password, db_name) :

    Creates a new MySQL database
    
    parameters:
        host: host name
        user: user name
        password: password
        db_name: database name to be created
    
    Import statement: from datamidware.pydm import create_mysql_db
    
  4. csv2mysql.csv2mysql(host, user, password, filename, db_name, tb_name) :

    Imports csv file into mysql database
    
    Parameters:
        host: host name
        user: user name
        password: password
        filename: filename to send to database
        db_name: name of the database -- if database already exists, import data
                   in the existing database, if not exists, create new database and import
                   data.
        tb_name: name of the table -- if table already exists, add data
                   in the existing table, if not exists, create new table and import
                   data.
    
    Import statement: from datamidware.pydm import csv2mysql
    
  5. json2mysql.json2mysql(host, user, password, filename, db_name, tb_name, key=None) :

    Imports json file and converts json file into pandas DataFrame.
    Sends DataFrame to mysql database table
    
    Parameters:
        host: host name
        user: user name
        password: password
        filename: filename to send to database
        db_name: name of the database -- if database already exists, import data
                       in the existing database, if not exists, create new database and import
                       data.
        tb_name: name of the table -- if table already exists, add data
                       in the existing table, if not exists, create new table and import
                       data.
        key: json key name to create mysql table
    
    Import statement: from datamidware.pydm import json2mysql
    
  6. mysql2csv.mysql2csv(host, user, password, file_path, db_name, tb_name) :

    Exports csv file from mysql database table
    
    Parameters:
        host: host name
        user: user name
        password: password
        file_path: file path to save csv file
        db_name: name of the database from where data will be exported
        tb_name: name of the table from where data will be exported
    
    Import statement: from datamidware.pydm import mysql2csv
    
  7. db2viz.db2viz(host, user, password, db_name, tb_name, kind=None, x=None, y=None, …):

    Visualize the Database table data
    
    Parameters:
        host: host name
        user: user name
        password: password
        db_name: database name
        tb_name: table name
        kind: plot kind (bar, horizontal bar, hist,..)
        file_path: file path to save figure
        x: x data; list or array-like
        y: y data; list or array-like
        title: title of the figure
        label: x-label, y-label
    
    Import statement: from datamidware.pydm import db2viz
    
  8. Class mysql_query.MySQLDatabase(): - Database connection class. - Performs mysql queries. - For example:

    mysql_query.MySQLDatabase.select(tb_name, row_count="all")
    
            Execute SQL query: SELECT * FROM table.
            Selecting all(or one if row_count="one") rows from the table.
    
            Parameters:
                query: SQL query to select rows: SELECT * FROM <table>
                row_count: "all" or "one" row. default "all".
                return: list of rows selected.
    
    
    mysql_query.MySQLDatabase.drop_column(tb_name, col_name)
    
            Drop a column in a table.
    
            Execute SQL query:
    
                "ALTER TABLE <table name>
                    DROP COLUMN <column name>"
    
            Parameters:
                tb_name: The name of the table to modify
                col_name: The name of the column to delete from the table.
                return: number of rows affected after modification
    
    mysql_query.MySQLDatabase.rename_column(tb_name, old_name, new_name, col_def, col_pos=None)
            Rename a column in a table.
    
            Execute SQL query:
    
                "ALTER TABLE <table name>
                    CHANGE COLUMN <old name> <new name>
                    column_definition
                    [ FIRST | AFTER column_name ]"
    
            Parameters:
                tb_name: The name of the table to modify
                old_name: The column name to rename
                new_name: The new name for the column
                col_def: The data type and definition of the column (NULL or NOT NULL, etc).
                            You must specify the column definition when renaming the column,
                            even if it does not change.
                col_pos: Optional. It tells MySQL where in the table to position the column,
                            if you wish to change its position.
                return: number of rows affected after modification
    

Code Examples:

datamidware

Deployment and Execution Reference

Environment setup using using PyCharm Reference

  • IDE: PyCharm Installation and setup -
  • Install MySQL server 8.0.22 (using Homebrew on Mac Mojave >= 10.14.4)
  • Install MySQL Workbench 8.0.21 (using Homebrew on Mac Mojave >= 10.14.4)
  • Create new project in virtual environment in PyCharm -
  • Programming language: Python (version 3.7)
  • Clone package from Github –
  • Install Database Navigator -
  • To install dependencies, install requirements.txt using pip -

Environment setup using Anaconda Reference

Configuration file & testing:

  • To successfully run the test, update mysqlconfig.py file following the below steps:
    • go to -> settings/
    • update mysqlconfig.py with database connection credentials

To test all the modules, run following command from the top-level directory

$ pytest -v tests

OR

To test individual module, run following unittest commands from the top-level directory

$ pytest -v tests/test_pydm.py
$ pytest -v tests/test_pyviz.py
$ pytest -v tests/test_mysql_query.py
$ python3 -m unittest tests/test_search.py
$ python3 -m unittest tests/test_sort.py

User Guide

  • Project environment setup and execution instructions is available in the project_setup.pdf file - docs/userguide/project_setup.pdf
  • APIs information is available in the api.pdf file - docs/userguide/api.pdf

Future Work

  • Extend all the libraries – (data processing, algorithms, and data visualization library)
  • Integrate Exploratory and Inferential data analysis library
  • Integrate Machine Learning algorithms library
  • Integrate Web Visualization
  • Integrate with other Databases (MSSQL)
  • Integrate with Splunk

data-middleware's People

Contributors

jagritig avatar

Watchers

James Cloos avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.