Code Monkey home page Code Monkey logo

nashville-housing-sql-data-cleaning's Introduction

Housing Data Cleaning Project

This project entailed the meticulous cleaning of a dataset detailing the Nashville housing market. The source of the data was an .xlsx file which was subsequently imported into SSMS and housed in a table named 'NashvilleHousing'.

Initial Steps:

  • Review of Raw Data: Started with a thorough examination of the raw data contained within the 'NashvilleHousing' table.

Cleaning Procedures:

  • 'SaleDate' Column: Utilized the CONVERT function to format dates, removing the time component.
  • 'PropertyAddress' Column: Employed the ISNULL function to fill null entries using values from rows sharing the same Parcel ID.
  • Column Division - 'PropertyAddress': Split into two new columns to separate Address and City using the SUBSTRING function.
  • Column Division - 'OwnerAddress': Split into three new columns for Address, City, and State using the PARSENAME function.
  • 'SoldAsVacant' Column: Transformed 'Y' and 'N' to 'Yes' and 'No' with a CASE statement.
  • Removing Duplicates: Implemented the ROW_NUMBER function within a CTE to identify and remove duplicates.
  • Useless Columns Removal: Deleted obsolete columns post-manipulation and those lacking valuable information, such as 'TaxDistrict'.

Finalizing:

  • Final Check: Conducted a last review to ensure the integrity and cleanliness of the updated table.

Results:

The original dataset comprising 56,477 rows was condensed to a clean, analysis-ready table of 56,373 rows. The cleaning process resulted in the elimination of merely 104 rows, which constitutes roughly 0.002% of the total data, ensuring a dataset primed for future analysis.

Nashville-Housing-Data-Cleaning

I applied a variety of SQL techniques, such as Aggregate Functions, Joins, Window Functions, CTEs, and Views to refine the dataset.

This project followed a step-by-step tutorial: SQL Tutorial

Technologies Used:

  • SQL/SSMS
  • Excel

Cleaning Process Overview:

  1. Download the dataset.
  2. Load the data into SSMS.
  3. Execute the cleaning operations.

nashville-housing-sql-data-cleaning's People

Contributors

joeyh820 avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.