
ds_project_f2020

Team 4 Project for CSE 40647

Predicting Congressional Party Flips with Binary Classification

Patrick Soga, Connor Delaney, Luke Siela, and Brian Cariddi


The purpose of this project is to predict whether the party for a given congressional district's representative will change during an election using the demographic data of that district. Below is a description of the file structure and what each file/folder means.

data

This contains all project data. original_feature_data and original_label_data contain the original raw data from E. Scott Adler's personal site and MIT's Election Data Science Lab, respectively.

flipped_data contains the data labeled by whether each congressional district flipped parties, along with the feature data concatenated from the demographic data in original_feature_data.

scaled_merged_features_and_flipped_labels.csv is the main cleaned and scaled data file that has all integrated feature and label data. Use this for training/testing models.

1978_case_study.csv contains cleaned and scaled data for the case study for 1978.

data_cleaning_and_scaling

This contains all scripts for cleaning and scaling the data. concat_demographics.py combines all demographic feature data for each congressional district and merges them into a single file.

data_reduction_house.py removes unnecessary columns from the election outcome label data (candidate name, whether the candidate was a write-in, etc.) and then collects the label data from all years of interest into a single file.

get_flip_labels.py does further processing on the collected election label data and generates further features such as prev_party and win_ratio.
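As an illustration of the kind of processing described above, here is a minimal pure-Python sketch of deriving a flip label from per-district election winners. It is not the code in get_flip_labels.py. The row layout (district, year, party, votes, total_votes), the flipped key, and the definition of win_ratio as the winner's share of the total vote are all assumptions made for this example.

```python
from collections import defaultdict


def add_flip_labels(results):
    """Attach prev_party, win_ratio, and flipped to each winner row.

    `results` is a list of dicts, one per district and election year,
    describing the winning candidate. The key names are assumptions.
    The first election seen for a district has no previous party, so
    it is dropped from the output.
    """
    by_district = defaultdict(list)
    for row in results:
        by_district[row["district"]].append(row)

    labeled = []
    for rows in by_district.values():
        rows.sort(key=lambda r: r["year"])
        prev_party = None
        for row in rows:
            if prev_party is not None:
                labeled.append({
                    **row,
                    "prev_party": prev_party,
                    # Assumed definition: winner's share of all votes cast.
                    "win_ratio": row["votes"] / row["total_votes"],
                    # 1 if the winning party differs from the previous winner.
                    "flipped": int(row["party"] != prev_party),
                })
            prev_party = row["party"]
    return labeled


if __name__ == "__main__":
    demo = [
        {"district": "IN-02", "year": 1976, "party": "D", "votes": 60, "total_votes": 100},
        {"district": "IN-02", "year": 1978, "party": "R", "votes": 55, "total_votes": 100},
    ]
    for labeled_row in add_flip_labels(demo):
        print(labeled_row)
```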

merge_on_id.py takes the accumulated feature and label data and joins them based on a custom ID.
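The join described above can be sketched in a few lines of plain Python. The README does not say how the custom ID is built, so the "<year>_<state>_<district>" format produced by the hypothetical make_id helper below is purely an assumption, as are the column names in the demo rows.

```python
def make_id(year, state, district):
    """Build a hypothetical join key such as '1978_IN_02'."""
    return f"{year}_{state}_{int(district):02d}"


def merge_on_id(features, labels):
    """Inner-join two lists of dicts on their shared 'id' key.

    Rows that appear in only one of the two inputs are dropped.
    """
    labels_by_id = {row["id"]: row for row in labels}
    merged = []
    for feat in features:
        label = labels_by_id.get(feat["id"])
        if label is not None:
            merged.append({**feat, **label})
    return merged


if __name__ == "__main__":
    demo_features = [{"id": make_id(1978, "IN", 2), "median_income": 1.2}]
    demo_labels = [{"id": make_id(1978, "IN", 2), "flipped": 1}]
    print(merge_on_id(demo_features, demo_labels))
```

An inner join is used here so that every merged row has both features and a label; whether the project's script keeps or drops unmatched rows is not stated.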

scaling.py processes that integrated data, standardizing and normalizing features that need to be scaled.
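For readers unfamiliar with the two terms, the following sketch shows one common meaning of each: standardizing as a z-score and normalizing as min-max rescaling to [0, 1]. Which features scaling.py treats which way, and the exact formulas it uses, are not stated in this README, so treat this as an assumption-laden illustration rather than the script's behavior.

```python
from statistics import mean, pstdev


def standardize(values):
    """Z-score: subtract the mean and divide by the population std. dev."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def min_max_normalize(values):
    """Rescale values linearly so the minimum is 0 and the maximum is 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


if __name__ == "__main__":
    print(standardize([1, 2, 3]))
    print(min_max_normalize([10, 20, 30]))
```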

modeling

This contains all the scripts that run and tune relevant models. all_models.py instantiates various models and trains them. Depending on the parameters given, it oversamples, undersamples, or leaves unchanged the training data for the models. All results are saved in the output folder.
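The three options for the training data can be sketched as simple random resampling. This is a generic illustration, not the code in all_models.py: the resample function, its mode parameter, and the flipped label key are hypothetical names, and the project may use a different resampling method.

```python
import random


def resample(rows, label_key="flipped", mode="over", seed=0):
    """Balance a binary-labeled list of dicts by random resampling.

    mode="over"  duplicates minority rows (sampling with replacement),
    mode="under" keeps a random subset of majority rows,
    mode="none"  returns the rows unchanged.
    """
    if mode == "none":
        return list(rows)
    if mode not in ("over", "under"):
        raise ValueError(f"unknown mode: {mode!r}")

    rng = random.Random(seed)
    pos = [r for r in rows if r[label_key] == 1]
    neg = [r for r in rows if r[label_key] == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    if not minority:
        raise ValueError("cannot balance: one class has no rows")

    if mode == "over":
        extra = rng.choices(minority, k=len(majority) - len(minority))
        balanced = majority + minority + extra
    else:
        balanced = minority + rng.sample(majority, k=len(minority))
    rng.shuffle(balanced)
    return balanced


if __name__ == "__main__":
    demo = [{"flipped": 1}] * 2 + [{"flipped": 0}] * 8
    print(len(resample(demo, mode="over")), len(resample(demo, mode="under")))
```

Resampling is applied to the training data only, as described above, so that test scores still reflect the real class balance.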

model_optimization.py contains a general function that iterates over a numerical range of values for the hyperparameter of interest and plots the accuracy/area under the ROC curve against the hyperparameter value. Edit the script manually to tune different classifiers.
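The sweep described above follows a simple pattern: train and score once per candidate value, then inspect the scores. The sketch below shows that pattern without plotting. The names sweep_hyperparameter and train_and_score are hypothetical, and the toy scoring function stands in for training a real classifier.

```python
def sweep_hyperparameter(values, train_and_score):
    """Score a model once per candidate hyperparameter value.

    `train_and_score` is any callable that takes a hyperparameter value
    and returns a score such as accuracy or ROC AUC. Returns the list of
    (value, score) pairs and the value with the highest score.
    """
    results = [(value, train_and_score(value)) for value in values]
    best_value, _ = max(results, key=lambda pair: pair[1])
    return results, best_value


def toy_score(n_estimators):
    """Stand-in for a real training run: peaks at n_estimators == 50."""
    return 1.0 - abs(n_estimators - 50) / 100


if __name__ == "__main__":
    pairs, best = sweep_hyperparameter(range(10, 101, 10), toy_score)
    print(pairs)
    print("best value:", best)
```

The (value, score) pairs can be passed to any plotting library to draw the score against the hyperparameter value.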

case_study.py trains an AdaBoost model using oversampled training data and then predicts whether the districts in the 1978 case study data flip or do not flip. It prints the indices of the data objects in the 1978 case study csv for the user to inspect.

feature_interpretation.py is very similar to all_models.py, except that it performs a hold-out analysis of the features to investigate each feature's impact on accuracy. All results are saved in the output folder; each result pairs a feature with the model's accuracy after that feature is removed.
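A hold-out analysis of this kind can be sketched as a loop that removes one feature at a time and re-evaluates the model. The code below is a generic outline, not the contents of feature_interpretation.py. The evaluate callable, the feature names, and the toy weights are all hypothetical.

```python
def leave_one_feature_out(feature_names, evaluate):
    """Pair each feature with the model accuracy obtained without it.

    `evaluate` is any callable that takes the list of features to keep,
    trains a model on them, and returns its accuracy. Results are sorted
    so the features whose removal hurts accuracy most come first.
    """
    results = []
    for name in feature_names:
        kept = [f for f in feature_names if f != name]
        results.append((name, evaluate(kept)))
    return sorted(results, key=lambda pair: pair[1])


# Toy stand-in for a real train/test run: each hypothetical feature
# contributes a fixed amount to accuracy.
TOY_WEIGHTS = {"median_income": 0.2, "pct_urban": 0.1, "median_age": 0.0}


def toy_evaluate(kept_features):
    return 0.5 + sum(TOY_WEIGHTS[f] for f in kept_features)


if __name__ == "__main__":
    for feature, accuracy in leave_one_feature_out(list(TOY_WEIGHTS), toy_evaluate):
        print(f"{feature}: accuracy without it = {accuracy:.2f}")
```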

output

This contains all model results output by the scripts in modeling. The file names do not match the names the scripts will use when saving new files, because output currently holds our group's results, which the user can verify. correlations contains the correlation matrix for the features with respect to the flip label.
