openelections / docs

Documentation for The OpenElections project

Home Page: http://docs.openelections.net/

Ruby 0.33% HTML 66.21% CSS 29.74% JavaScript 3.72%

docs's People

ghing redgemstate evz samgassel jonrobinson2 jslap lexifdev benlk todrobbins wtadler jotasprout capesepias dlu

docs's Issues

Section describing types of GitHub repos and what they are for

And how to contribute to them.

Explain how we approach Parties and standardization

Principles:

Standardization across states for parties that field presidential candidates
Standardization within a state for other parties
Don't standardize within a state if there is any ambiguity
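The three principles can be read as a lookup order. Here is a minimal Python sketch under stated assumptions: the tables, the placeholder state `xx`, the party names and the codes (`DEM`, `REP`, `LRF`) are all hypothetical, and `standardize_party` is not a function from the project. Presidential-level parties are checked first because they are standardized across states. Within-state mappings apply only when the label is not flagged as ambiguous in that state.

```python
from typing import Dict, Optional, Set

# Illustrative tables only: "xx" is a placeholder state and the codes are made up.
# Parties that field presidential candidates share one code across all states.
NATIONAL_PARTIES: Dict[str, str] = {
    "democratic": "DEM",
    "republican": "REP",
}

# Other parties are standardized only within a single state.
STATE_PARTIES: Dict[str, Dict[str, str]] = {
    "xx": {"local reform": "LRF"},
}

# Labels that are ambiguous within a state are left unstandardized.
AMBIGUOUS: Dict[str, Set[str]] = {
    "xx": {"independent"},
}


def standardize_party(state: str, raw: str) -> Optional[str]:
    """Return a standardized party code, or None to keep the raw label as-is."""
    key = " ".join(raw.lower().split())  # normalize case and whitespace
    if key in NATIONAL_PARTIES:
        return NATIONAL_PARTIES[key]
    if key in AMBIGUOUS.get(state, set()):
        return None  # any ambiguity: don't standardize
    return STATE_PARTIES.get(state, {}).get(key)
```

Returning `None` means "leave the original value alone," so an unmapped or ambiguous label never gets silently merged into another party.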

Document fields in baked results

I realized when we were talking about differences between rows in baked RawResults because of different structures of the input files that we didn't have any documentation for the fields in the baked results. They're obvious for anyone who's looked at the models, but less so for someone just consuming the data.

Writing this will be fast and straightforward, but my one question is where it should live: on www.openelections.net, or on docs.openelections.net with a link from the main site. I'm inclined toward putting it with the docs because that seems more maintainable, but I'm open to other opinions.

Add state-specific documentation

Build state pages with details on the acquisition and conversion process.

Add docs for url_paths.csv

Quick start docs

At the beginning of the Contributing Code guide and the openelections-core README, there should be a short section that quickly walks through all the pipeline steps. This should include command-line examples for a working state.

Add Pre-Processing Docs to ETL

Explain the role of pre-processed data in the process.

"Cookbook" for non-obvious loading/data situations

guide to result files

I've tried to find a doc about the result files. Other than granularity, what's the data relationship between 20181106__pa__general__county.csv, 20181106__pa__general__precinct.csv and the counties folder?
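A guide to the result files could start by explaining what each part of a filename means. This is a minimal Python sketch that splits a name into its components. It assumes the `__`-separated convention visible in these names and in the archive-standardization examples: date, state, election type, an optional `special` marker, then office or granularity parts. The `GRANULARITIES` set and the returned keys are illustrative assumptions. The sketch does not describe the `counties` folder.

```python
from datetime import datetime
from pathlib import Path
from typing import List, Optional

# Assumption: county and precinct are the only granularity suffixes in use.
GRANULARITIES = {"county", "precinct"}


def parse_result_filename(name: str) -> dict:
    """Split a results filename on '__' into its named components."""
    parts: List[str] = Path(name).stem.split("__")
    if len(parts) < 3:
        raise ValueError(f"unexpected filename: {name}")
    raw_date, state, election_type, *rest = parts
    granularity: Optional[str] = None
    extra: List[str] = []  # anything else, e.g. office and district parts
    for part in rest:
        if part in GRANULARITIES:
            granularity = part
        elif part != "special":
            extra.append(part)
    return {
        "date": datetime.strptime(raw_date, "%Y%m%d").date(),
        "state": state,
        "election_type": election_type,
        "special": "special" in rest,
        "granularity": granularity,
        "extra": extra,
    }
```

Under this reading, `20181106__pa__general__county.csv` and `20181106__pa__general__precinct.csv` describe the same election and differ only in reporting level.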

Add docs for dealing with dynamic attributes in Loader

General principles: slugify the original attribute name unless it is obscure; in that case, change it to a meaningful slug and document the change in a comment at that point in the loader.
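The principle could be sketched in Python as a slugify function plus a small override table. This assumes dynamic attributes are named after source column headers. The `VTD_CD` column and the `attribute_name` helper are hypothetical and are not taken from any real loader. Each override carries the comment the principle asks for.

```python
import re
import unicodedata


def slugify(name: str) -> str:
    """Lowercase, strip accents, and join runs of letters/digits with underscores."""
    text = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    text = re.sub(r"[^a-z0-9]+", "_", text.lower())
    return text.strip("_")


# Hypothetical overrides for obscure source attribute names.
OVERRIDES = {
    # "VTD_CD" (hypothetical column) holds the voting tabulation district code.
    "VTD_CD": "voting_district",
}


def attribute_name(original: str) -> str:
    """Use a documented override when one exists, else the slugified original."""
    return OVERRIDES.get(original, slugify(original))
```

Keeping the overrides next to their explanations means anyone reading the loader can trace a renamed attribute back to the source column.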

Contributor docs need to better explain contract, steps to implement the modules

Via @konklone:

Docs should tell contributors in which order they should implement the different methods on the different classes
Docs should tell users the contract for each method

Add anchor links to headings

There have been a few times when I've wanted to link to a specific section of the docs, particularly in the Contributing Code page.

It looks like, in most cases, Jekyll is already giving these elements an ID, so http://ben.balter.com/2014/03/13/pages-anchor-links/ seems like the cleanest and least intrusive way to handle this.
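The linked post describes its own approach. As an alternative that is not taken from that post, a build step could post-process the generated HTML. This is a minimal Python sketch that assumes headings are rendered with `id` as their first attribute, for example `<h2 id="load">`. The `header-link` class and the `#` marker are illustrative choices.

```python
import re

# Matches an opening h2-h6 tag whose first attribute is an id, e.g. <h2 id="load">.
HEADING_RE = re.compile(r'<(h[2-6])\s+id="([^"]+)"([^>]*)>')


def add_anchor_links(html: str) -> str:
    """Prepend a self-link inside every heading that already carries an id."""
    def link(match: "re.Match") -> str:
        tag, heading_id, rest = match.groups()
        return (f'<{tag} id="{heading_id}"{rest}>'
                f'<a class="header-link" href="#{heading_id}">#</a>')
    return HEADING_RE.sub(link, html)
```

Headings without an `id` are left untouched, so this only links sections Jekyll has already given an ID.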

@openelections/owners, let me know if it's ok to spend a tiny amount of time making this change this week.

Improving volunteer pages

This builds off a conversation I had with @dwillis about getting more volunteers. Currently, the Get Involved page and the pages it links to do not directly tell volunteers where help is needed within OpenElections or where those tasks fit into the overall project. As it stands, volunteers have to read (pretty much) everything about the project before they even get a sense of where they can help.

In order to attract more volunteers, we need to push for more outreach, but without a volunteer-friendly resource that boils down what's important for them to know, the outreach is likely to fall flat.

To that end, we should reformat the Get Involved page to follow this or a similar outline:

Get Involved

What is OpenElections?

Helping OpenElections without programming

Metadata Collection (link to current metadata collection page)
Converting raw results to data
Recruiting more volunteers
Github tutorials and other documentation tasks

Contributing code to OpenElections

Writing scrapers to convert HTML or PDF results to CSVs.
Writing code to implement a state's processing pipeline (or parts of it).
Addressing issues on the core repository.

Reaching the community

Google Groups
Github
Email
Twitter

Each task should link out to a page that gives clear instructions and resources:

Task page

Task description

What are the inputs/outputs?
Why is this important?

How to complete it

For well-defined tasks, list out as many discrete steps as possible
For unclear tasks, list out a preliminary attempt at what the process would look like

Helpful resources

Who has done this before?
Videos, tutorials, blog posts
As always, the Google Group and other forms of communication

By the end, we will have one place to send potential volunteers.

Load/Transform steps need updating

The Load section in the Contributing Code docs reflects the old methodology, from before we moved to RawResult:

The load.py file is responsible for taking the raw results and putting them into a MongoDB database. The goal is to keep the data loader simple, and defer any data cleaning and transformations to downstream steps in the process. That said, this is where the process begins to place results data into our defined models, including Candidate, Result and Contest.

The above paragraph makes it sound like data normalization should take place during the load process. This is no longer the case. Loader scripts should only insert data as RawResult instances.

Normalization of RawResult into Contest/Candidate/Result records should always be one of the first (usually the first) transform step. We should mention this under Transform section.
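A minimal Python sketch of this split between loading and transforming, using in-memory stand-ins. `RawResult` here is a plain dataclass, not the project's model. `Store`, `TRANSFORMS`, and the `office`/`candidate`/`votes` field names are hypothetical. To keep the sketch short, Contest, Candidate and Result are collapsed into one dict per row.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class RawResult:
    """Stand-in for the RawResult model: source fields kept exactly as loaded."""
    fields: Dict[str, str]


@dataclass
class Store:
    """Hypothetical in-memory stand-in for the database."""
    raw_results: List[RawResult] = field(default_factory=list)
    results: List[dict] = field(default_factory=list)


def load(rows: List[Dict[str, str]], store: Store) -> None:
    # Load step: insert rows as RawResult instances, with no normalization.
    for row in rows:
        store.raw_results.append(RawResult(fields=dict(row)))


def normalize_raw_results(store: Store) -> None:
    # First transform: turn RawResult records into contest/candidate/result data.
    for raw in store.raw_results:
        store.results.append({
            "contest": raw.fields["office"].strip(),
            "candidate": raw.fields["candidate"].strip(),
            "votes": int(raw.fields["votes"]),
        })


# Later transforms depend on normalized records, so normalization runs first.
TRANSFORMS: List[Callable[[Store], None]] = [normalize_raw_results]


def run_transforms(store: Store) -> None:
    for transform in TRANSFORMS:
        transform(store)
```

After `load`, the raw values (including stray whitespace) are preserved. Cleanup only happens once `run_transforms` executes the normalization step.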

Cookbook for data processing patterns

Add comments here as we come up with patterns for solving data problems.

For example, proxying out to other loader classes.
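A minimal Python sketch of the proxying pattern: one loader picks a specialized loader for each file and delegates to it. All class names here are hypothetical and do not match the actual openelections-core loader API. The example also assumes that the reporting level is the last `__`-separated part of the filename.

```python
from pathlib import Path
from typing import Dict, List


class BaseLoader:
    """Hypothetical base class; not the actual openelections-core loader API."""
    def load(self, path: str) -> List[dict]:
        raise NotImplementedError


class CountyLoader(BaseLoader):
    def load(self, path: str) -> List[dict]:
        # A real loader would parse the file; this just records what it was given.
        return [{"source": path, "reporting_level": "county"}]


class PrecinctLoader(BaseLoader):
    def load(self, path: str) -> List[dict]:
        return [{"source": path, "reporting_level": "precinct"}]


class ProxyLoader(BaseLoader):
    """Pick a specialized loader for each file and delegate the work to it."""
    def __init__(self) -> None:
        self.loaders: Dict[str, BaseLoader] = {
            "county": CountyLoader(),
            "precinct": PrecinctLoader(),
        }

    def load(self, path: str) -> List[dict]:
        # Assumes the reporting level is the last '__'-separated part of the name.
        level = Path(path).stem.rsplit("__", 1)[-1]
        loader = self.loaders.get(level)
        if loader is None:
            raise ValueError(f"no loader registered for {path!r}")
        return loader.load(path)
```

Each file-format quirk stays in its own small class, while the rest of the pipeline only interacts with the proxy.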

Document bake.election_file, publish in contributing code guide

The guide references the bake.state_file command when it should reference the bake.election_file command, since that's what we're currently using to create the downloadable results.

Also, there's no documentation for the publish command.

Fix/clarify example for special election file naming conventions

In http://docs.openelections.net/archive-standardization/, the example for the special election seems a bit wonky:

Or a single file for a special general election for a single office:

20071211__oh__general__special__house__5.csv

It seems to me like the office name should follow our standardized naming conventions and the district should be part of that office name. That is, it should not be separated from the rest of the office by a '__'. I believe the example should be:

20071211__oh__general__special__us_house_of_representatives_5.csv
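A minimal Python sketch of the proposed convention, in which the district stays inside the office slug and does not become a separate `__` component. The function names are hypothetical. The slug rule (drop periods, then join words with single underscores) is an assumption chosen so that it reproduces the example filename.

```python
import re


def office_slug(office: str, district: str = "") -> str:
    """Slugify the office name, keeping the district inside the same component."""
    name = f"{office} {district}".strip().replace(".", "")
    return re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")


def special_election_filename(election_date: str, state: str, election_type: str,
                              office: str, district: str = "") -> str:
    # '__' separates components; the district is joined to the office with '_'.
    parts = [election_date, state.lower(), election_type, "special",
             office_slug(office, district)]
    return "__".join(parts) + ".csv"
```

For example, `special_election_filename("20071211", "OH", "general", "U.S. House of Representatives", "5")` produces the corrected name proposed above.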

Labelling conventions?

What standards should we apply to labels in our data repos as we begin ramping up on metrics? For example, should we create a standard label for identifying issues related to missing precinct-level data, such as in Arkansas?