CP4D-notebook-datastage

This project demonstrates the integration between a notebook and DataStage in the IBM CP4D environment. IBM CP4D is IBM Cloud Pak for Data, an open, extensible data platform that provides a data fabric to make all data available for AI and analytics on any cloud.

Description

A notebook in an Analytics Project validates the files, and DataStage jobs in a Data Transformation Project transform them. The implemented functions can be explored in the notebook. The basic functionality is listed below:

  1. Connect to IBM COS (s3) and DB2
  2. Take the data from the IBM COS landing bucket
  3. Perform checks and validations
  4. Insert the status into DB2
  5. Save the data in the IBM COS processed bucket
  6. Call DataStage REST APIs for data transformation
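
Steps 2 to 5 above can be sketched as follows. This is a minimal illustration, not the project's actual code: plain dictionaries and a list stand in for the IBM COS buckets and the DB2 audit table, and the names `validate_delimited_text` and `process_file`, the specific checks (header match and per-row field count), and the rule that only files passing validation reach the processed bucket are all assumptions.

```python
import csv
import io


def validate_delimited_text(text, expected_columns, delimiter=","):
    """Check the header and per-row field counts; return a status record."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader, None)
    if header is None:
        return {"status": "FAILED", "reason": "file is empty", "row_count": 0}
    if [name.strip() for name in header] != list(expected_columns):
        return {"status": "FAILED", "reason": "unexpected header", "row_count": 0}
    row_count = 0
    for line_number, row in enumerate(reader, start=2):
        if len(row) != len(header):
            return {
                "status": "FAILED",
                "reason": f"line {line_number} has {len(row)} fields",
                "row_count": row_count,
            }
        row_count += 1
    return {"status": "PASSED", "reason": "", "row_count": row_count}


def process_file(name, landing, processed, audit_log, expected_columns):
    """Run steps 2-5 against in-memory stand-ins (hypothetical, not COS/DB2)."""
    text = landing[name]                                      # step 2: read from landing
    result = validate_delimited_text(text, expected_columns)  # step 3: validate
    audit_log.append({"file": name, **result})                # step 4: record the status
    if result["status"] == "PASSED":
        processed[name] = text                                # step 5: save to processed
    return result


# Hypothetical sample data standing in for a file in the landing bucket.
landing = {"sales.csv": "id,amount\n1,10\n2,20\n"}
processed, audit_log = {}, []
outcome = process_file("sales.csv", landing, processed, audit_log, ["id", "amount"])
print(outcome["status"], outcome["row_count"])
```

In a real notebook the dictionaries would be replaced by the COS and DB2 connections held by the Analytics Project; keeping the validation logic separate from the storage calls makes it easy to test on its own.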

Advantages

  1. Functionality can be customized through notebooks.
  2. Transformations over millions of rows are handled by the DataStage tool integrated into the IBM CP4D environment.
  3. There is no need to lift and shift data between environments based on its usage.

Environment Details

This implementation runs on IBM CP4D, which runs on top of an OpenShift cluster on IBM Cloud. IBM CP4D allows three kinds of projects:

  1. Analytics Project: Notebooks, AutoAI, data connections, dashboards, Federated Learning environment
  2. Data Transformation Project: DataStage ETL
  3. Data Quality Project: Data Cleansing and Data Matching, Business Terms, Rules

The conf directory contains sample_configuration_file.txt, the configuration file that holds the user-provided input parameters.

It holds the client name, file name, target table name, DataStage job name, Data Transformation Project name, and the username and password of the user on this IBM CP4D cluster hosted on OpenShift, separated by the "|" (pipe) symbol. Users store their files along with this configuration file in the IBM COS bucket. The notebook reads the parameters from this configuration file and triggers the functionality. The packages required to build this project come pre-installed in the Analytics notebook.
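
As a rough illustration of reading such a file, the sketch below parses one pipe-delimited line. It assumes the seven parameters sit on a single line in the order listed above; that ordering, the names `JobConfig` and `parse_config_line`, and the sample values are hypothetical and should be checked against sample_configuration_file.txt.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class JobConfig:
    """One record of the pipe-delimited configuration file (field order is assumed)."""
    client_name: str
    file_name: str
    target_table: str
    datastage_job: str
    transformation_project: str
    username: str
    password: str


def parse_config_line(line: str) -> JobConfig:
    """Split a "|"-separated line into a JobConfig, rejecting malformed input."""
    parts = [part.strip() for part in line.strip().split("|")]
    expected = len(fields(JobConfig))
    if len(parts) != expected:
        raise ValueError(f"expected {expected} fields, got {len(parts)}")
    if any(not part for part in parts):
        raise ValueError("configuration fields must not be empty")
    return JobConfig(*parts)


# Hypothetical sample line; none of these values come from the project.
sample = "acme|sales.csv|SALES_TARGET|sales_job|sales_project|analyst|not-a-real-password"
config = parse_config_line(sample)
print(config.client_name, config.datastage_job)
```

Failing fast on a wrong field count keeps a malformed configuration from triggering a DataStage job with shifted parameters.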

Working Details

The Analytics notebook works like a Jupyter Notebook. When a user receives a request from a client, they put the parameters in the configuration file and execute the code stored in the notebook. The notebook code then runs the steps listed in the Description section.

The Analytics Project holds the data connectors for IBM COS and DB2.

Highlights

  1. IBM COS and DB2 connectors.
  2. Loading and storing data in s3 buckets.
  3. Integration between Analytics and Data Transformation Project using APIs.
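
The API-based integration in the last item could look roughly like the sketch below, which only builds the HTTP requests and does not send them. The endpoint paths (`/icp4d-api/v1/authorize` for a bearer token and `/v2/jobs/{job_id}/runs` for starting a job run) are assumptions based on the Cloud Pak for Data platform and jobs APIs; the original project may call different DataStage endpoints, so verify them against the API documentation for your cluster version. The function names, host, and IDs are hypothetical, and the project and job IDs are assumed to have been looked up from their names beforehand.

```python
import json
import urllib.parse
import urllib.request


def build_authorize_request(host, username, password):
    """Build (but do not send) a token request; the path is an assumption."""
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    return urllib.request.Request(
        url=f"https://{host}/icp4d-api/v1/authorize",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_job_run_request(host, token, project_id, job_id):
    """Build (but do not send) a job-run request; the path is an assumption."""
    query = urllib.parse.urlencode({"project_id": project_id})
    job = urllib.parse.quote(job_id, safe="")
    return urllib.request.Request(
        url=f"https://{host}/v2/jobs/{job}/runs?{query}",
        data=json.dumps({"job_run": {}}).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


# Hypothetical host, credentials, and IDs, used only to show the request shape.
auth = build_authorize_request("cpd.example.com", "analyst", "secret")
run = build_job_run_request("cpd.example.com", "tok", "proj-1", "job-1")
print(auth.get_method(), auth.full_url)
print(run.get_method(), run.full_url)
```

In the notebook, each request would be sent with `urllib.request.urlopen` (or an HTTP client of your choice) and the token read from the authorization response before starting the job run.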

Demo Screenshots

The output of the functionality can be seen in the Analytics notebook, as shown below: [Screenshot: IBM Jupyter Notebook]

IBM COS bucket structure: [Screenshot: IBM COS]

Count metrics in AUDIT_TABLE for a file: [Screenshot: AUDIT_TABLE]

File status in AUDIT_TABLE_STATUS: [Screenshot: AUDIT_TABLE_STATUS]

Contributors

anshita1saxena
