
Inventory-Data-Warehouse

This repository provides background on inventory transaction concepts, details about the snowflake schema to support inventory transaction cycles, and a data dictionary about the table design for the data warehouse. It also includes ETL process to integrate data using Pentaho Data Integration tool.

Basic ETL Process:

ETL stands for Extract, Transform and Load. An ETL tool extracts data from different RDBMS source systems, transforms it by applying calculations, concatenations, and other operations, and then loads it into the Data Warehouse system. The data is loaded into the DW system in the form of dimension and fact tables.
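The three phases can be sketched in a few lines. This is a minimal illustration, not part of the project itself: the source rows are hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Extract: rows pulled from a hypothetical source system (product, qty, unit price).
source_rows = [("widget", 3, 2.50), ("gadget", 2, 4.00)]

# Transform: standardize product names and compute a line total.
transformed = [(name.upper(), qty * price) for name, qty, price in source_rows]

# Load: insert the result into a warehouse-style fact table.
dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE inventory_fact (product TEXT, line_total REAL)")
dw.executemany("INSERT INTO inventory_fact VALUES (?, ?)", transformed)

top = dw.execute(
    "SELECT product, line_total FROM inventory_fact ORDER BY line_total DESC"
).fetchone()
print(top)  # ('GADGET', 8.0)
```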

Extract

A staging area is required during an ETL load, for several reasons.

The source systems may be available only for a specific window of time, which can be shorter than the total data-load time. A staging area lets you extract the data from the source system and hold it before the time slot ends.

A staging area is also needed when you want to bring data from multiple sources together, or to join two or more systems. For example, you cannot run a single SQL query that joins two tables from two physically different databases.

Extraction time slots for different systems vary with their time zones and operational hours.

Data extracted from source systems can be reused in multiple data warehouse systems, operational data stores, and so on.

ETL allows you to perform complex transformations, which require extra space to store intermediate data.
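The "two physically different databases" case above can be made concrete. In this sketch two in-memory SQLite connections play the role of separate source systems; their extracts are copied into a single staging database, where the join that was impossible across systems becomes a plain SQL query. All table names here are illustrative.

```python
import sqlite3

# Two physically separate "source" databases that cannot be joined directly.
orders_db = sqlite3.connect(":memory:")
orders_db.execute("CREATE TABLE orders (order_id INTEGER, product_id INTEGER, qty INTEGER)")
orders_db.execute("INSERT INTO orders VALUES (1, 10, 3)")

products_db = sqlite3.connect(":memory:")
products_db.execute("CREATE TABLE products (product_id INTEGER, name TEXT)")
products_db.execute("INSERT INTO products VALUES (10, 'Widget')")

# Staging area: copy extracts from both systems into one database...
staging = sqlite3.connect(":memory:")
staging.execute("CREATE TABLE stg_orders (order_id INTEGER, product_id INTEGER, qty INTEGER)")
staging.execute("CREATE TABLE stg_products (product_id INTEGER, name TEXT)")
staging.executemany("INSERT INTO stg_orders VALUES (?, ?, ?)",
                    orders_db.execute("SELECT * FROM orders").fetchall())
staging.executemany("INSERT INTO stg_products VALUES (?, ?)",
                    products_db.execute("SELECT * FROM products").fetchall())

# ...where a single SQL join is now possible.
row = staging.execute(
    "SELECT o.order_id, p.name, o.qty "
    "FROM stg_orders o JOIN stg_products p ON o.product_id = p.product_id"
).fetchone()
print(row)  # (1, 'Widget', 3)
```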

Transform

In the transformation step, you apply a set of functions to the extracted data to prepare it for the target system. Data that does not require any transformation is known as direct move or pass-through data.

You can apply different transformations to the data extracted from the source system. For example, you can perform customized calculations: if you want sum-of-sales revenue and it is not stored in the database, you can apply a SUM formula during transformation and load the result.

Similarly, if a table stores the first name and the last name in separate columns, you can concatenate them before loading.
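Both transformations mentioned above — a derived SUM and a name concatenation — are one-liners over the extracted rows. The column layout here is assumed for illustration:

```python
# Hypothetical extracted rows: (first_name, last_name, sale_amount).
rows = [("Ada", "Lovelace", 100.0), ("Grace", "Hopper", 250.0)]

# Customized calculation: sum-of-sales revenue, not stored in the source.
total_revenue = sum(amount for _, _, amount in rows)

# Concatenation: combine first and last name before loading.
full_names = [f"{first} {last}" for first, last, _ in rows]

print(total_revenue)  # 350.0
print(full_names)     # ['Ada Lovelace', 'Grace Hopper']
```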

Load

During the Load phase, data is loaded into the end target, which can be a flat file or a Data Warehouse system.
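The two load targets named above can be sketched side by side: a CSV flat file (written to an in-memory buffer here) and a warehouse fact table (in-memory SQLite standing in for the real warehouse). Table and column names are illustrative.

```python
import csv
import io
import sqlite3

# Transformed rows ready for loading.
rows = [("Ada Lovelace", 100.0), ("Grace Hopper", 250.0)]

# Load target 1: a flat file (CSV in an in-memory buffer).
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["customer", "revenue"])
writer.writerows(rows)
flat_file = buf.getvalue()

# Load target 2: a Data Warehouse fact table.
dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE sales_fact (customer TEXT, revenue REAL)")
dw.executemany("INSERT INTO sales_fact VALUES (?, ?)", rows)
count = dw.execute("SELECT COUNT(*) FROM sales_fact").fetchone()[0]
print(count)  # 2
```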

Pentaho Data Integration (PDI)

Pentaho Data Integration (PDI) is part of the Pentaho open source Business Intelligence suite. The suite includes software for all aspects of supporting business decision making: data warehouse management utilities, data integration and analysis tools, software for managers, and data mining tools.

Pentaho Data Integration is well known for its ease of use and quick learning curve. PDI implements a metadata-driven approach, which means that development is based on specifying what to do, not how to do it.

Pentaho lets administrators and ETL developers create their own data manipulation jobs with a user-friendly graphical designer, without writing a single line of code.

PDI uses a common, shared repository which enables remote ETL execution, facilitates teamwork, and simplifies the development process.

PDI components

There are a few development tools for implementing ETL processes in Pentaho:

1. Spoon - a data modeling and development tool for ETL developers. Use it to create transformations (elementary data flows) and jobs (execution sequences of transformations and other jobs).
2. Pan - executes transformations modeled in Spoon.
3. Kitchen - executes jobs designed in Spoon.
4. Carte - a simple web server used for running and monitoring data integration tasks.

Steps to follow along this repository:

1> Read InventoryDW_project.doc file. This file provides background on inventory transaction concepts, details about the snowflake schema to support inventory transaction cycles, and a data dictionary about the table design for the data warehouse.

2> Read PentahoSetupExercise.docx file. This file gives a basic idea of the Pentaho Data Integration Tool and how to use it to create transformations, with step-by-step instructions.

3> Read ETLWithPentaho.doc file. This file provides ETL process experience with data integration tasks using Pentaho Data Integration Tool. There are tasks to perform some data cleaning operations on two data sources and then load the data into the fact table of the inventory data warehouse.

4> Follow DataIntegration.docx file. This file gives a snapshot of this project in Pentaho Data Integration Tool with both Excel and Access data sources.

Contributors

aafreen29 (author of the upstream aafreen29/inventory-data-warehouse, from which this project is forked)
