house_expenditures's Introduction

Title: Read Me
Author: Eric and Ryan
Date: 1/17/2017
Output: html_document

Slack: #propublica

Project Description: This ProPublica repository is part of Data for Democracy. Our purpose is to work collaboratively through analytic processes that support the journalism at ProPublica. Contributors are currently focused on cleaning the house expenditures dataset. We are always open to ideas for working with this dataset to make it more useful to ProPublica. Please contact @ryanes or @eric_bickel on Slack with any suggestions or questions.

Analysis Workflow

Reading, cleaning, and analyzing data should be done in a reproducible notebook format when possible. When submitting pull requests, please submit them from a fork of the repository and on a separate branch. Data for Democracy has an awesome set of instructions for how to do this if you need them.

Organizing Work

If contributors are working on projects other than updating the files in the main directory, they are encouraged to keep their work in a folder that is named in a way that describes the folder's contents. Some examples might be ml_model_R or alternate_cleaning_python. This should make it easier for new contributors to follow what is happening and make judgements about how to organize their contributions.

Loading and Cleaning Datasets

For each analysis, data needs to be loaded and cleaned into a format that is usable for the current analysis and for future analyses.

After data has been cleaned, the resulting dataset should be written as a CSV. The CSV should be made available on data.world.
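
A minimal sketch of the clean-then-write step, using only Python's standard library. The column names (`payee`, `amount`), the sample rows, and the output filename are assumptions made for illustration; they are not the actual schema of the house expenditures dataset.

```python
import csv
import io

# Hypothetical raw input; the real dataset's columns and values may differ.
RAW = """payee,amount
  ACME SUPPLY ,"$1,200.50"
Office Depot,-35.00
,12.00
"""


def clean_rows(raw_text):
    """Trim whitespace, parse amounts as floats, and drop rows with no payee."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(raw_text)):
        payee = (row["payee"] or "").strip()
        if not payee:
            continue  # skip rows that have no payee
        amount = float(row["amount"].replace("$", "").replace(",", ""))
        cleaned.append({"payee": payee, "amount": amount})
    return cleaned


def write_csv(rows, path):
    """Write cleaned rows to a CSV file with a header line."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["payee", "amount"])
        writer.writeheader()
        writer.writerows(rows)


rows = clean_rows(RAW)
write_csv(rows, "house_expenditures_clean.csv")  # hypothetical filename
```

Keeping the cleaning logic in a function separate from the file write makes it easier to rerun the same cleaning in a notebook and to upload the resulting file to data.world afterwards.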

Exploratory Analysis

Team members working in exploratory analysis work up general statistics, distributions of important variables, and hypotheses based on initial exploration of covariation. If this analysis is in a notebook that is different from the cleaning script, there should be documentation of which scripts need to be run in order to reproduce the analysis results.
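
As a rough illustration of the "general statistics" step, the sketch below computes overall and per-group summaries with Python's standard library. The field names (`office`, `amount`) and the sample records are hypothetical; in practice the records would come from the cleaned CSV.

```python
import statistics
from collections import defaultdict

# Hypothetical cleaned records, used only to make the example self-contained.
records = [
    {"office": "A", "amount": 100.0},
    {"office": "A", "amount": 300.0},
    {"office": "B", "amount": 50.0},
    {"office": "B", "amount": 150.0},
    {"office": "B", "amount": 400.0},
]


def summarize(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }


def summarize_by(rows, key, value):
    """Group rows by `key` and summarize the `value` field within each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return {group: summarize(vals) for group, vals in groups.items()}


overall = summarize([r["amount"] for r in records])
by_office = summarize_by(records, "office", "amount")
```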

When an analysis job is complete, open a pull request against the GitHub repo so that collaborators on the project or a committee of assigned editors can review and edit it.

Modeling

Team members use modeling techniques to test the hypotheses generated in the exploratory analysis phase and to quantify relationships between variables in the data. Team members may also be working to test specific hypotheses generated by ProPublica.

Algorithms used in the modeling should be vetted through open discussions with the team and through pull requests, and final model specification should be a collaborative effort using any individual findings from the discussion. The project readme should outline these specifications, and the final modeling code should be pushed to the GitHub repo.
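
The workflow does not name a particular model. Purely as an illustration of quantifying a relationship between two variables, here is a sketch of an ordinary least squares line fit written in plain Python. The variable names (`staff_count`, `total_spend`) and the numbers are made up for the example.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: spending that grows linearly with staff count.
staff_count = [1, 2, 3, 4]
total_spend = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(staff_count, total_spend)
```

A fit like this is only a starting point for discussion; the final model specification is meant to be agreed on by the team.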

Reporting

Team members detail the findings in a reproducible report that can be immediately used by ProPublica. All sources and data used should be linked in the report, and the project readme should contain all background on methodology and links to data and code.

house_expenditures's People

Contributors

restrellado, supermdat, josiahparry, ehbick01
