
ds-eda-project's Introduction

EDA Project - Data Analysis & Visualisation

This project is centered around exploratory data analysis techniques and presentation of results to a client.

The data

  • The King County Housing dataset contains information about home sales in King County (USA).
  • The data was accessed via DBeaver and different tables were joined via SQL.
  • The initial dataset contains more than 21,000 rows and more than 20 columns.
  • I added further restrictions to the client's characteristics so that the data could be filtered accordingly.
  • The description of the column names can be found in the column_names.md file.

The client

Note: The client is made up (any resemblance to real people is purely coincidental). Assumptions about the answers the client would give to specific questions are therefore also made up (e.g. How do you define a rich neighborhood? -> take the zip codes with the most houses in the upper 10 percent).
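The "rich neighborhood" rule from the note can be sketched roughly as follows. This is only an illustration on made-up rows: the column names `price` and `zipcode`, the choice of price as the ranking variable, and the helper `rich_zipcodes` are assumptions, not taken from the project's code.

```python
import pandas as pd

# Made-up sample rows; `price` and `zipcode` are assumed column names.
houses = pd.DataFrame({
    "zipcode": [98001, 98001, 98004, 98004, 98004,
                98039, 98039, 98039, 98039, 98103],
    "price": [250_000, 300_000, 900_000, 1_200_000, 400_000,
              2_000_000, 1_500_000, 1_800_000, 350_000, 500_000],
})


def rich_zipcodes(df, quantile=0.9, top_n=1):
    """Return the zip codes with the most houses at or above the price quantile."""
    threshold = df["price"].quantile(quantile)
    expensive = df[df["price"] >= threshold]
    # Count expensive houses per zip code, most frequent first.
    counts = expensive["zipcode"].value_counts()
    return counts.head(top_n).index.tolist()


print(rich_zipcodes(houses))
```

Changing `quantile` or `top_n` shows how sensitive the definition is to the chosen cut-off.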

name | type | characteristics
William Rodriguez | Buyer | 2 people, country (best timing & non-renovated) & city house (fast & central location), wants two houses



The project goal is to generate and answer hypotheses concerning the overall data, as well as to provide recommendations for the client.



EDA process

The process is documented in the EDA-file.


Step 0: Research Questions

  • What do I want to find out?

    -> Generate Hypotheses


Step 1: Data Understanding / Exploring

  • Dataframe shape
  • head, tail, columns
  • dtypes
  • describe
  • info, etc.
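The checks listed above map onto a handful of pandas calls. The following is a minimal sketch on made-up rows; the column names and the helper `overview` are assumptions for illustration and do not come from the EDA file.

```python
import pandas as pd

# Made-up sample rows; the column names are assumptions, not the project's schema.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "price": [221900.0, 538000.0, 180000.0, None],
    "zipcode": [98178, 98125, 98028, 98136],
})


def overview(frame):
    """Collect the basic facts usually checked first in an EDA."""
    return {
        "shape": frame.shape,
        "columns": list(frame.columns),
        "dtypes": frame.dtypes.astype(str).to_dict(),
        "missing": frame.isna().sum().to_dict(),
    }


summary = overview(df)
print(summary["shape"])   # rows and columns
print(df.head(2))         # first rows
print(df.tail(2))         # last rows
print(df.describe())      # summary statistics for numeric columns
df.info()                 # dtypes and non-null counts
```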

Step 2: Data Cleaning

  • Remove unnecessary columns
  • Identify duplicated rows/columns
  • Rename columns
  • Convert integers into datetime objects
  • Check/handle missing values
  • Look for and handle possible errors
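One possible shape for these cleaning steps is sketched below. It assumes the sale date is stored as an integer in `YYYYMMDD` form and uses hypothetical column names (`unused`, `date`, `sale_date`); the actual columns and the handling of missing values in the project may differ.

```python
import pandas as pd

# Made-up raw rows; every column name here is an assumption.
raw = pd.DataFrame({
    "id": [1, 2, 2, 3],
    "date": [20140502, 20141209, 20141209, 20150225],
    "price": [221900.0, 538000.0, 538000.0, None],
    "unused": ["a", "b", "b", "c"],
})


def clean(df):
    """Apply the cleaning steps in order and return a new DataFrame."""
    out = df.drop(columns=["unused"])               # remove unnecessary columns
    out = out.drop_duplicates()                     # drop fully duplicated rows
    out = out.rename(columns={"date": "sale_date"})  # rename columns
    # Convert integers such as 20140502 into datetime objects.
    out["sale_date"] = pd.to_datetime(out["sale_date"].astype(str), format="%Y%m%d")
    # Handle missing values; here rows without a price are simply dropped.
    out = out.dropna(subset=["price"])
    return out.reset_index(drop=True)


cleaned = clean(raw)
print(cleaned)
```

Dropping rows with a missing price is only one option; imputing or flagging them may suit other questions better.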

Step 3: Data Preparation

  • Filter houses according to the client's needs and wishes
  • Make decisions about which houses to recommend to the client
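Filtering for the client's two houses could look roughly like this. The criteria are assumptions made for the sketch: "central" is approximated by a hypothetical set of city zip codes (`CITY_ZIPCODES`), and "non-renovated" by `yr_renovated == 0`. The project's own restrictions are not shown in this README.

```python
import pandas as pd

# Made-up sample rows; column names and values are assumptions.
houses = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "zipcode": [98101, 98014, 98102, 98019],
    "yr_renovated": [0, 0, 2005, 1999],
    "price": [650_000, 320_000, 700_000, 410_000],
})

# Hypothetical set of "central" zip codes, chosen only for this example.
CITY_ZIPCODES = {98101, 98102}

is_city = houses["zipcode"].isin(CITY_ZIPCODES)

# City house: central location.
city_houses = houses[is_city]

# Country house: outside the city zip codes and never renovated.
country_houses = houses[~is_city & (houses["yr_renovated"] == 0)]

print(city_houses["id"].tolist())
print(country_houses["id"].tolist())
```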

Step 4: Feature Understanding & Relationships - Data Visualisation

  • Map locations
  • Boxplots
  • Scatterplots
  • Heatmap
  • Pairplots
  • (Multi-)Lineplots
  • Compare Top Houses
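As an example of what sits behind one of these plots, a heatmap is typically drawn from a correlation matrix. The sketch below computes that matrix with pandas on made-up numbers; the column names are assumptions, and the plotting step itself (for instance with seaborn's `heatmap`) is left out so the example depends only on pandas.

```python
import pandas as pd

# Made-up numeric features; column names are assumptions.
features = pd.DataFrame({
    "price": [200_000, 300_000, 400_000, 500_000],
    "sqft_living": [1000, 1500, 2000, 2500],
    "bedrooms": [2, 3, 3, 4],
})

# Pairwise Pearson correlations; this matrix is what a heatmap visualises.
corr = features.corr()

# Feature most strongly correlated with price (excluding price itself).
strongest = corr["price"].drop("price").idxmax()

print(corr.round(2))
print(strongest)
```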

Step 5: Answer questions about the data

  • When is the best time to buy the country house?
  • Show top three country and city houses
  • Insights and recommendations for client
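The timing question can be approached by comparing prices per calendar month. The following is a sketch under assumptions: the column names `sale_date` and `price`, the use of the median as the price measure, and the helper `cheapest_month` are all illustrative choices rather than the project's method.

```python
import pandas as pd

# Made-up sales; `sale_date` and `price` are assumed column names.
sales = pd.DataFrame({
    "sale_date": pd.to_datetime([
        "2014-05-02", "2014-05-20", "2014-12-09", "2014-12-15", "2015-02-25",
    ]),
    "price": [400_000, 420_000, 350_000, 330_000, 380_000],
})


def cheapest_month(df):
    """Return the calendar month with the lowest median price, plus all medians."""
    monthly = df.groupby(df["sale_date"].dt.month)["price"].median()
    return int(monthly.idxmin()), monthly


best_month, monthly_median = cheapest_month(sales)
print(best_month)
print(monthly_median)
```

With real data, a month with very few sales can produce a misleading median, so checking the number of sales per month alongside the result is worthwhile.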

Presentation

The final presentation slides are located in the presentation folder.

Requirements

The requirements can be found in the 'requirements.txt' file.

  • pyenv
  • python==3.11.3
