codingdojodata-enrichment-hypothesis-testing-codealong's Introduction

Mock Belt Exam Revisited - For Class

  • 05/05/05

Original Instructions

Data Enrichment Mock Exam

API results:

https://drive.google.com/file/d/10iWPhZtId0R9RCiVculSozCwldG-V3eH/view?usp=sharing

  1. Read in the JSON file.
  2. Separate the records into 4 tables, each a pandas DataFrame.
  3. Transform: in this case, remove dollar signs from the funded amount in the financials records and convert it to a numeric datatype.
  4. Create a database with SQLAlchemy and add the tables to the database.

  1. Perform a hypothesis test to determine whether there is a significant difference in the funded amount between groups that are all male and groups with at least one female.
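As a rough illustration of the comparison described above, the sketch below runs Welch's two-sample t-test on synthetic data. The column names `has_female` and `funded_amount`, the group sizes, and the generated values are all assumptions made for the example; they are not taken from the exam dataset, and the real analysis may call for a different test once its assumptions are checked.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic stand-in data (hypothetical columns and values, not the exam data)
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "has_female": np.repeat([False, True], 200),
    "funded_amount": np.concatenate([
        rng.normal(800, 300, 200),   # groups that are all male
        rng.normal(850, 300, 200),   # groups with at least one female
    ]),
})

# Split the funded amounts into the two groups being compared
all_male = df.loc[~df["has_female"], "funded_amount"]
with_female = df.loc[df["has_female"], "funded_amount"]

# Welch's t-test: a two-sample t-test that does not assume equal variances
result = stats.ttest_ind(with_female, all_male, equal_var=False)

alpha = 0.05
print(f"t = {result.statistic:.3f}, p = {result.pvalue:.4f}")
print("Reject H0" if result.pvalue < alpha else "Fail to reject H0")
```

Setting `equal_var=False` is a conservative default here; if the groups pass an equal-variance check, the standard t-test would also be reasonable.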

Follow-Up Hypothesis to Test (if there's time)

  • If there is time, perform an additional hypothesis test to determine if there is a significant difference in the funded amount for different sectors.
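For the follow-up question, comparing more than two groups is often done with a one-way ANOVA. The sketch below uses synthetic data; the sector names, the `sector` and `funded_amount` column names, and the generated values are hypothetical and only show the shape of the call.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic stand-in data (hypothetical sectors and values, not the exam data)
rng = np.random.default_rng(0)
sectors = ["Agriculture", "Food", "Retail"]
df = pd.DataFrame({
    "sector": np.repeat(sectors, 100),
    "funded_amount": np.concatenate(
        [rng.normal(mean, 250, 100) for mean in (700, 800, 900)]
    ),
})

# Collect one array of funded amounts per sector
groups = [g["funded_amount"].to_numpy() for _, g in df.groupby("sector")]

# One-way ANOVA: tests whether all group means are equal
result = stats.f_oneway(*groups)
print(f"F = {result.statistic:.3f}, p = {result.pvalue:.4f}")
```

A significant ANOVA result only says that at least one sector differs; a post-hoc pairwise comparison would be needed to find which ones.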

ETL of JSON File

import json
import pandas as pd
import numpy as np
import seaborn as sns
from scipy import stats


import pymysql
pymysql.install_as_MySQLdb()

from sqlalchemy import create_engine
from sqlalchemy_utils import create_database, database_exists

Extract

## Loading json file
with open('Mock_Crowdsourcing_API_Results.json') as f:
    results = json.load(f)
results.keys()
## explore each key 
type(results['meta'])
## display meta
results['meta']
## display data
type(results['data'])
## preview the dictionary
# results['data']
## preview just the keys
results['data'].keys()
## what does the crowd key look like?
# results['data']['crowd']
## checking single entry of crowd
results['data']['crowd'][0]
## making crowd a dataframe
crowd = pd.DataFrame(results['data']['crowd'])
crowd
## making demographics a dataframe
demo = pd.DataFrame(results['data']['demographics'])
demo
## making financials a dataframe
financials = pd.DataFrame(results['data']['financials'])
financials
## making use a dataframe
use = pd.DataFrame(results['data']['use'])
use
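The four `pd.DataFrame(...)` calls above follow the same pattern, so they could also be written as a single dictionary comprehension. The sketch below uses a tiny made-up payload that mimics the `meta`/`data` layout; the record fields (`id`, `country`, `sector`, and so on) are invented for illustration and may not match the real file.

```python
import json
import pandas as pd

# Hypothetical miniature payload with the same top-level layout as the API results
raw = json.dumps({
    "meta": "placeholder",
    "data": {
        "crowd": [{"id": 1, "posted_time": "2014-01-01"}],
        "demographics": [{"id": 1, "country": "Kenya"}],
        "financials": [{"id": 1, "funded_amount": "$300.0"}],
        "use": [{"id": 1, "sector": "Food"}],
    },
})

results = json.loads(raw)

# Build one DataFrame per key under 'data'
tables = {name: pd.DataFrame(records) for name, records in results["data"].items()}

# Quick check of what was created
for name, df in tables.items():
    print(name, df.shape)
```

Keeping the tables in a dictionary makes it easy to loop over them later, for example when writing each one to the database.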

Transform

## fixing funded amount column
## regex=False treats '$' as a literal character, not the regex end-of-string anchor
financials['funded_amount'] = financials['funded_amount'].str.replace('$','', regex=False)
financials['funded_amount'] = pd.to_numeric(financials['funded_amount'])
financials
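A small self-contained sketch of the same cleaning step is shown below. It assumes, purely for illustration, that some values might also contain thousands separators or be missing; whether the real `funded_amount` column has either is not stated in the source.

```python
import pandas as pd

# Hypothetical sample rows (values invented for illustration)
financials = pd.DataFrame({
    "id": [1, 2, 3],
    "funded_amount": ["$300.0", "$1,250.0", None],
})

# Strip literal '$' and ',' characters; regex=False avoids treating '$' as an anchor
cleaned = (
    financials["funded_amount"]
    .str.replace("$", "", regex=False)
    .str.replace(",", "", regex=False)
)

# errors='coerce' turns anything that still cannot be parsed into NaN
financials["funded_amount"] = pd.to_numeric(cleaned, errors="coerce")
print(financials.dtypes)
```

Using `errors="coerce"` hides parsing failures as missing values, so it is worth counting the NaNs afterwards to see whether anything unexpected was dropped.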

Load

## loading mysql credentials
with open('/Users/codingdojo/.secret/mysql.json') as f:
    login = json.load(f)
login.keys()
## creating connection to database with sqlalchemy
connection_str  = f"mysql+pymysql://{login['user']}:{login['password']}@localhost/mock-belt-exam"
engine = create_engine(connection_str)
## Check if database exists, if not, create it
if not database_exists(connection_str):
    create_database(connection_str)
else: 
    print('The database already exists.')
## saving dataframes to database
financials.to_sql('financials', engine, index=False, if_exists = 'replace')
use.to_sql('use', engine, index=False, if_exists = 'replace')
demo.to_sql('demographics', engine, index=False, if_exists = 'replace')
crowd.to_sql('crowd',engine, index=False, if_exists = 'replace')
## checking if tables created
q= '''SHOW TABLES;'''
pd.read_sql(q,engine)
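To try the load step without a MySQL server, the same `to_sql` / `read_sql` round trip can be sketched against an in-memory SQLite database. This swaps in Python's built-in `sqlite3` module in place of the MySQL connection used above, so the table-listing query differs from `SHOW TABLES`; the sample rows are made up.

```python
import sqlite3
import pandas as pd

# Hypothetical sample table (values invented for illustration)
financials = pd.DataFrame({"id": [1, 2], "funded_amount": [300.0, 1250.0]})

# In-memory SQLite database: nothing is written to disk
conn = sqlite3.connect(":memory:")

# Same to_sql call pattern as above, with a sqlite3 connection instead of an engine
financials.to_sql("financials", conn, index=False, if_exists="replace")

# SQLite equivalent of SHOW TABLES
tables = pd.read_sql("SELECT name FROM sqlite_master WHERE type='table';", conn)

# Read the table back to confirm the round trip
round_trip = pd.read_sql("SELECT * FROM financials;", conn)
conn.close()

print(tables)
print(round_trip)
```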

Hypothesis Testing

Follow the Guide: Choosing the Right Hypothesis Test from the LP.

1. State the Hypothesis & Null Hypothesis

  • $H_0$ (Null Hypothesis):
  • $H_A$ (Alternative Hypothesis):
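One possible way to write the pair of hypotheses for the first test, assuming a two-sided comparison of mean funded amounts (the subscripts are labels chosen here, not notation from the course material):

```latex
\[
\begin{aligned}
H_0 &: \mu_{\text{all male}} = \mu_{\text{at least one female}} \\
H_A &: \mu_{\text{all male}} \neq \mu_{\text{at least one female}}
\end{aligned}
\]
```

Here $\mu$ denotes the mean funded amount of each group; a significance level such as $\alpha = 0.05$ would normally be fixed before running the test.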

Contributors

  • jirvingphd
