Application for Data Science Intern at Vox

Here are some samples of recent work that demonstrate proficiency in areas potentially relevant to the role, based on your description of the job. I contextualize them below, and am happy to answer questions or elaborate further if you like.

Data Cleaning and Manipulation

Importing, cleaning and analyzing data is a perennial focus of the MIDS program, and every class touches on it in some way. I have included a project that uses Python/Pandas, but I can share a statistical analysis with a substantial EDA component using R if you would like to see it (the job description doesn't mention R). Obtaining, cleaning, manipulating and joining data was also a big part of the PASSNYC machine learning project, which is included in this repo.

This was a culminating project to demonstrate proficiency in data import and analysis using Python/Pandas. We chose to see whether there was any interesting relationship between the introduction of bike sharing and the issuance of parking tickets in New York. It turned out there wasn't, but it was still a fun project that involved some tricky data cleanup and the development of a geocoding pipeline. This was a group project, and my primary responsibility was the final section, which examined the intersection of the two datasets.

Machine Learning

The job description makes only passing reference to Machine Learning, but given that I learned about the opening from my Machine Learning instructor who works at Vox (Hi Amit), I would be remiss not to include this project. I should also note that I'm scheduled to take a Natural Language Processing class starting in September, which is one of the most ambitious machine learning courses offered by the program.

PASSNYC is an organization dedicated to increasing minority enrollment at New York's elite specialized high schools. They partnered with Kaggle for a Data For Good competition to see if they could improve their targeting of and engagement with underrepresented middle schools within the five boroughs. Although we were disqualified from entry because we had a Google employee on our team (Google owns Kaggle), we elected to attempt the competition regardless and simply passed along the results to PASSNYC to review in case they found anything useful.

It was an ambitious group project, covering substantial data manipulation (both Pandas and NumPy) as well as incorporating and analyzing four classifiers - K-Nearest Neighbors, Random Forests, Logistic Regression and Neural Nets. My specific focus was Logistic Regression, and I'm particularly proud of the post-hoc analysis it involved (especially the heatmaps). However, we all worked together and approved all aspects, so I wasn't working in isolation.

SQL

I have been using SQL for years as a web designer--the Theatre Bay Area website alone had over 30 tables with thousands of records--but that was predominantly to hold content for JSP or PHP webpages, not for data analysis. One of the projects for Data Engineering involved using SQL with Google BigQuery and Jupyter Notebook to analyze SF Bikeshare data, and I've included that here.

This project was rather open-ended, but I used it to push the limits of what I could accomplish with SQL. It would probably be more appropriate to do some of these queries in Pandas, as they get quite complex, but it gave me a reason to explore some of the more esoteric SQL features, like CASE expressions and window functions with PARTITION BY. I'm particularly proud of using SQL to track a subset of bikes over the course of a day in order to determine how far they stray from their original dock.
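
For readers unfamiliar with those two SQL features, here is a minimal sketch of the general idea. It is not the query from the project. It assumes a hypothetical `trips` table (`bike_id`, `start_time`, `start_station`, `end_station`), uses Python's built-in `sqlite3` module in place of BigQuery, and reports only whether a bike ended a trip at its first dock of the day rather than a distance. Window functions require SQLite 3.25 or later.

```python
import sqlite3


def track_bikes(trips):
    """Label each trip 'home' or 'away' relative to the bike's first dock.

    trips: iterable of (bike_id, start_time, start_station, end_station).
    The table and column names are hypothetical, chosen for illustration.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE trips ("
        "bike_id INTEGER, start_time TEXT, "
        "start_station TEXT, end_station TEXT)"
    )
    conn.executemany("INSERT INTO trips VALUES (?, ?, ?, ?)", trips)

    query = """
    WITH ordered AS (
        SELECT bike_id, start_time, end_station,
               -- PARTITION BY restarts the window for each bike, so
               -- FIRST_VALUE yields that bike's earliest start station.
               FIRST_VALUE(start_station) OVER (
                   PARTITION BY bike_id ORDER BY start_time
               ) AS first_dock
        FROM trips
    )
    SELECT bike_id, start_time, end_station,
           -- CASE turns the comparison into a readable label.
           CASE WHEN end_station = first_dock THEN 'home' ELSE 'away' END
    FROM ordered
    ORDER BY bike_id, start_time
    """
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


# Made-up sample data: bike 1 leaves dock A and later returns to it.
sample_trips = [
    (1, "08:00", "A", "B"),
    (1, "09:00", "B", "A"),
    (2, "08:30", "C", "D"),
]

for row in track_bikes(sample_trips):
    print(row)
```

Measuring how far a bike strays, as the project did, would additionally need station coordinates and a distance calculation, which this sketch leaves out.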

Python

The job description makes some reference to Python, and although this isn't a data-focused project, I thought it might be helpful to provide a more substantive example of my ability with object-oriented programming, rather than the notebook-based functional programming that makes up the bulk of my Python portfolio.

This is a fun little Twitter-based visualization/game whose primary purpose is to get a sense of the "temperature" of the Twitterverse on a given topic. It is a bit of a project to install (it has a few dependencies), but there is a link to a YouTube video of the game in practice that might be the most efficient way to see the code in action.

I've included a copy of my resume here as well. Thank you for your time and attention.
