
helthcare-system's Introduction

HEALTHCARE SYSTEM BIG DATA ANALYTICS


Requirement

A health care insurance company is struggling to grow its revenue and understand its customers. It wants to use the big data ecosystem to analyze competitors' data collected from a variety of sources, namely web scraping and third-party providers. The analysis will help the company track customers' behavior and condition so that it can tailor insurance offers to them and calculate royalties for customers who bought policies in the past, which in turn should increase revenue.

The goal of the project

The goal of the project is to build data pipelines for the health care insurance company. By analyzing customer behavior, the pipelines help the company form business strategies that increase revenue and send the appropriate offers and royalties to customers.

Major Components

  • Apache Spark
  • Hadoop

Environment

  • Linux (Ubuntu 18.04)
  • Hadoop 2.7.2
  • Spark 2.0.2
  • Sqoop 1.4.7
  • Python 3

STEPS:

DATASET CREATION

A data set (or dataset) is a collection of data. In the case of tabular data, a data set corresponds to one or more database tables, where every column of a table represents a particular variable, and each row corresponds to a given record of the data set in question. The data set lists values for each of the variables, such as height and weight of an object, for each member of the data set. Each value is known as a datum. Data sets can also consist of a collection of documents or files.

DATA CLEANING

Data cleaning is the process of fixing or removing incorrect, corrupted, incorrectly formatted, duplicate, or incomplete data within a dataset. When combining multiple data sources, there are many opportunities for data to be duplicated or mislabeled. If data is incorrect, outcomes and algorithms are unreliable, even though they may look correct. There is no one absolute way to prescribe the exact steps in the data cleaning process because the processes will vary from dataset to dataset.
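As a rough illustration of the cleaning step, the sketch below trims whitespace, drops rows with missing required fields, and removes duplicates using only the Python standard library. The column names (`customer_id`, `name`, `policy`) and the sample records are hypothetical, since the README does not describe the actual schema; the project's own cleaning rules may differ.

```python
def clean_rows(rows, required_fields):
    """Trim whitespace, drop incomplete rows, and remove exact duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        # Normalize: strip surrounding whitespace from every string value.
        normalized = {
            key: value.strip() if isinstance(value, str) else value
            for key, value in row.items()
        }
        # Drop rows where a required field is missing or empty.
        if any(normalized.get(field) in (None, "") for field in required_fields):
            continue
        # Drop exact duplicates, keeping the first occurrence.
        fingerprint = tuple(sorted(normalized.items()))
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        cleaned.append(normalized)
    return cleaned


# Hypothetical sample records; the real column names are not given in this README.
raw = [
    {"customer_id": "101", "name": " Asha ", "policy": "Gold"},
    {"customer_id": "101", "name": "Asha", "policy": "Gold"},    # duplicate after trimming
    {"customer_id": "102", "name": "Ravi", "policy": ""},        # incomplete
    {"customer_id": "103", "name": "Meera", "policy": "Silver"},
]
cleaned = clean_rows(raw, required_fields=["customer_id", "policy"])
print(cleaned)
```

Normalizing before deduplicating matters here: the first two records only become identical once the stray whitespace is removed.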

DATABASE CREATION

A database is an organized collection of structured information, or data, typically stored electronically in a computer system. A database is usually controlled by a database management system (DBMS). Together, the data and the DBMS, along with the applications associated with them, are referred to as a database system, often shortened to just "database".

LOADING DATA TO DATABASE

Data loading refers to the "load" component of ETL. After data is retrieved and combined from multiple sources (extracted), cleaned, and formatted (transformed), it is then loaded into a storage system, such as a cloud data warehouse, or relational database.
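The database creation and loading steps can be sketched as follows. This is a minimal example, not the project's actual setup: an in-memory SQLite database stands in for whatever relational database the project uses, and the `customers` table, its columns, and the sample records are all hypothetical.

```python
import sqlite3

# Hypothetical cleaned records; column names are illustrative only.
records = [
    (101, "Asha", "Gold"),
    (103, "Meera", "Silver"),
]

# An in-memory SQLite database stands in for the project's actual RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        policy      TEXT NOT NULL
    )
    """
)

# Parameterized bulk insert: the "load" step of ETL.
# Using the connection as a context manager commits on success.
with conn:
    conn.executemany(
        "INSERT INTO customers (customer_id, name, policy) VALUES (?, ?, ?)",
        records,
    )

row_count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
print(row_count)
```

Parameterized inserts keep the data separate from the SQL text, which avoids quoting problems when values contain commas or apostrophes.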

DATA TRANSFER TO HDFS USING SQOOP

Apache Sqoop is a command-line application for transferring data between relational databases and Hadoop. In this step, Sqoop moves the data loaded into the database over to HDFS.
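A Sqoop transfer is typically started with a `sqoop import` command. The sketch below only assembles such a command in Python so it can be inspected or handed to `subprocess.run`; it does not execute anything. The JDBC URL (which assumes MySQL), user name, password file, table name, and HDFS target directory are hypothetical placeholders, not values taken from this project.

```python
import shlex


def build_sqoop_import(jdbc_url, username, password_file, table, target_dir, mappers=1):
    """Build the argument list for a `sqoop import` of one table into HDFS."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "--password-file", password_file,
        "--table", table,
        "--target-dir", target_dir,
        "--num-mappers", str(mappers),
    ]


# All values below are hypothetical placeholders.
args = build_sqoop_import(
    jdbc_url="jdbc:mysql://localhost:3306/healthcare",
    username="etl_user",
    password_file="/user/etl/.sqoop_password",
    table="customers",
    target_dir="/data/healthcare/customers",
)

# Render a shell-safe string for logging or copy-paste.
command = " ".join(shlex.quote(a) for a in args)
print(command)
```

Passing the password through `--password-file` rather than `--password` keeps it out of the shell history and the process list.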

HIVE

Apache Hive is data warehouse software for querying and analyzing large datasets. It works on data stored in the Hadoop Distributed File System (HDFS), applying a table structure to it so that it can be processed and analyzed to surface data-driven patterns and trends. This makes Hive well suited to organizations whose data volumes keep growing.
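One common way to expose files in HDFS to Hive is an external table that points at the directory holding them. The sketch below generates such a HiveQL statement from Python; it is an assumption about how the Hive step could look, not the project's actual DDL. The table name, columns, and HDFS location are hypothetical, and the comma delimiter assumes comma-separated text files.

```python
def build_external_table_ddl(table, columns, location, delimiter=","):
    """Return a HiveQL CREATE EXTERNAL TABLE statement for delimited text files."""
    column_sql = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n"
        f"  {column_sql}\n"
        f")\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"
        f"STORED AS TEXTFILE\n"
        f"LOCATION '{location}'"
    )


# Hypothetical schema and HDFS path.
ddl = build_external_table_ddl(
    table="customers",
    columns=[("customer_id", "INT"), ("name", "STRING"), ("policy", "STRING")],
    location="/data/healthcare/customers",
)
print(ddl)
```

An external table leaves the underlying files in place, so dropping the table in Hive does not delete the data in HDFS.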

Spark SQL

Spark SQL is a Spark module for structured data processing. It provides a programming abstraction called DataFrames and can also act as a distributed SQL query engine. According to the Spark project, it lets unmodified Hive queries run up to 100x faster on existing deployments and data; actual speedups depend on the workload.
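To show how a Hive table might be queried through Spark SQL, here is a minimal PySpark sketch. The `customers` table, its `policy` column, and the application name are hypothetical. The `pyspark` import sits inside `run_on_spark` so that the query builder can be tried without a Spark installation; running the query itself needs a Spark deployment with Hive support.

```python
def top_policies_query(table, limit=5):
    """Return a SQL string counting customers per policy (hypothetical schema)."""
    return (
        f"SELECT policy, COUNT(*) AS customer_count "
        f"FROM {table} GROUP BY policy ORDER BY customer_count DESC LIMIT {int(limit)}"
    )


def run_on_spark(query):
    """Run the query through Spark SQL with Hive support and return a DataFrame."""
    # Imported here so the query builder works without a Spark installation.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("healthcare-analytics")  # hypothetical application name
        .enableHiveSupport()
        .getOrCreate()
    )
    return spark.sql(query)


query = top_policies_query("customers")
print(query)
# On a cluster where the Hive table exists: run_on_spark(query).show()
```

`spark.sql` returns a DataFrame, so the result can be filtered or joined further with the DataFrame API before it is shown or written out.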

DATA VISUALIZATION

Data visualization is the representation of data through use of common graphics, such as charts, plots, infographics, and even animations. These visual displays of information communicate complex data relationships and data-driven insights in a way that is easy to understand.

Contributors


  • Tejash 💻
  • Abhay 💻
  • Arjit 💻
  • Prity 💻
  • Mohit 💻
  • Vikalp 💻
  • Utkarsh 💻
  • Karan 💻
  • Piyush 💻
  • Sumedh 💻
  • Shivam 💻
  • Aayush 💻
  • Pardeep 💻
  • Rutwick 💻
  • Madhu 💻
  • Khushboo 💻
  • Yuvraj 💻
  • Harshvardhan 💻
  • Aditya 💻
  • Ujjwal 💻

License

This repository is licensed under the Apache License 2.0. See License for more details.
