openelections-data-ia

Election results for Iowa.

These are machine-readable results converted from the PDF results published at http://sos.iowa.gov/elections/results/.

Methodology

Parsing script

Files were converted to text using the pdftotext command and the text output was passed through a filter to convert the raw text to CSV:

pdftotext -layout pdf/20000606__ia__primary__county.pdf - | ./bin/parse_2000_primary.py > 20000606__ia__primary__county.csv

There are parsing scripts for many of the results files; each has a descriptive name or a docstring. These scripts aren't very DRY, because working out what changed between vintages of the files and abstracting it out seemed a low priority.
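For readers new to the pdftotext-plus-filter approach, here is a minimal sketch of what such a filter can look like. It is not one of the scripts in bin/; it assumes a hypothetical layout in which each result line holds a candidate name and a vote count separated by two or more spaces, and it silently skips every other line.

```python
#!/usr/bin/env python
"""Hypothetical stdin-to-CSV filter in the spirit of bin/parse_*.py.

Assumes each result line from `pdftotext -layout` looks like
"CANDIDATE NAME      1,234" (a name, 2+ spaces, then a vote count).
"""
import csv
import re
import sys

# A name, at least two spaces, then a vote count that may contain commas.
RESULT_LINE = re.compile(r"^\s*(?P<candidate>\S.*?)\s{2,}(?P<votes>\d[\d,]*)\s*$")


def parse_lines(lines):
    """Yield (candidate, votes) tuples for lines matching RESULT_LINE."""
    for line in lines:
        match = RESULT_LINE.match(line)
        if match:
            votes = int(match.group("votes").replace(",", ""))
            yield match.group("candidate"), votes


def main():
    writer = csv.writer(sys.stdout)
    writer.writerow(["candidate", "votes"])
    for candidate, votes in parse_lines(sys.stdin):
        writer.writerow([candidate, votes])


if __name__ == "__main__":
    main()
```

It slots into the same pipeline as the real scripts, e.g. `pdftotext -layout some_results.pdf - | python example_filter.py > out.csv` (both file names are placeholders). Real PDFs usually need per-vintage rules for headers, offices and districts, which is why the repository keeps separate scripts.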

Manual preprocessing

2006 General

pdftotext couldn't extract the text from the PDF file for the county-level 2006-11-07 general election results. I used Adobe Acrobat Pro 9 to extract the text from the file and saved it to txt/20061107__ia__general__county.orig.txt. Some of the text was transposed and the spacing made it difficult to parse, so I had to manually clean it up using vim and LibreOffice Calc. The cleaned text file is saved in txt/20061107__ia__general__county.txt.

2013 Special Election, State Senate District 13 Warren County

This was an image PDF. I used pdftoppm, ImageMagick and Tesseract to extract text from the PDF. These steps are performed in bin/ocr_2013_special_ss_13_precinct_warren.
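As a rough illustration of how the three tools chain together, the sketch below rasterizes a PDF, cleans each page image and runs OCR on it. It is not the contents of bin/ocr_2013_special_ss_13_precinct_warren; the 300 dpi resolution, the grayscale/normalize cleanup and the output file names are assumptions chosen for the example.

```python
"""Hypothetical pdftoppm -> ImageMagick -> Tesseract pipeline for an image PDF.

Resolution, cleanup options and file names are illustrative assumptions,
not the settings used by the repository's OCR scripts.
"""
import glob
import subprocess
import sys


def rasterize_cmd(pdf_path, prefix, dpi=300):
    # pdftoppm writes one PNG per page, named <prefix>-<page>.png.
    return ["pdftoppm", "-r", str(dpi), "-png", pdf_path, prefix]


def clean_cmd(src_png, dst_png):
    # Grayscale plus contrast stretch (ImageMagick 7 prefers "magick").
    return ["convert", src_png, "-colorspace", "Gray", "-normalize", dst_png]


def ocr_cmd(png_path, out_base):
    # Tesseract writes the recognized text to <out_base>.txt.
    return ["tesseract", png_path, out_base]


def page_images(prefix):
    # pdftoppm zero-pads page numbers, so a plain sort keeps page order.
    return sorted(glob.glob(glob.escape(prefix) + "-*.png"))


def run(pdf_path, prefix):
    subprocess.run(rasterize_cmd(pdf_path, prefix), check=True)
    for n, page in enumerate(page_images(prefix), start=1):
        cleaned = f"{prefix}_clean_{n}.png"  # does not match "<prefix>-*.png"
        subprocess.run(clean_cmd(page, cleaned), check=True)
        subprocess.run(ocr_cmd(cleaned, f"{prefix}_page_{n}"), check=True)


if __name__ == "__main__":
    run(sys.argv[1], sys.argv[2])
```

Even with preprocessing, OCR output for tables of numbers needs checking by hand, which is the manual correction step described next.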

I then copied and pasted the text into LibreOffice Calc and manually corrected some of the values that were incorrectly recognized by Tesseract. This data mirrors the layout of the PDF file and is saved in input/20131119__ia__special__general__warren__state_senate__13__precinct.csv.

Finally, I used the script bin/reshape_2013_special_precinct_ss_13_warren.py to reshape the data from the raw format to our more standard format, with one row per candidate (or pseudo-candidate) result.
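The reshape step turns a wide table (one row per precinct, one column per candidate) into long rows. Below is a minimal sketch of that idea, assuming a hypothetical raw layout with a `precinct` column followed by one vote column per candidate or pseudo-candidate; the real layout handled by bin/reshape_2013_special_precinct_ss_13_warren.py may differ.

```python
"""Hypothetical wide-to-long reshape: one row per precinct in, one row per
candidate (or pseudo-candidate such as "Under Votes") result out.

Assumes the raw CSV has a "precinct" column plus one column per candidate.
"""
import csv
import sys

OUT_FIELDS = ["county", "precinct", "office", "district", "party", "candidate", "votes"]


def reshape(lines, county, office, district, parties=None):
    """Yield one dict per (precinct, candidate) pair."""
    parties = parties or {}
    reader = csv.DictReader(lines)
    candidates = [name for name in reader.fieldnames if name != "precinct"]
    for row in reader:
        for candidate in candidates:
            yield {
                "county": county,
                "precinct": row["precinct"],
                "office": office,
                "district": district,
                "party": parties.get(candidate, ""),
                "candidate": candidate,
                "votes": int(row[candidate]),
            }


def write_long(rows, out=sys.stdout):
    writer = csv.DictWriter(out, fieldnames=OUT_FIELDS)
    writer.writeheader()
    writer.writerows(rows)
```

Usage might look like `write_long(reshape(open(raw_path, newline=""), "Warren", "State Senate", "13"))`, where `raw_path` is a placeholder for the input CSV.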

2014 Special Election, State House District 25

These were image PDFs. I used pdftoppm, ImageMagick and Tesseract to extract text from the PDFs. These steps are performed in bin/ocr_2014_special_sh_25_precinct_(warren|madison).

I then copied and pasted the text into LibreOffice Calc and manually corrected some of the values that were incorrectly recognized by Tesseract. This data mirrors the layout of the PDF files and is saved in 20140107__ia__special__general__(madison|warren)__state_house__25__precinct.csv.

Finally, I used the script bin/reshape_2014_special_precinct_sh_25.py to reshape the data from the raw format to our more standard format, with one row per candidate (or pseudo-candidate) result.

Manual entry

Some files had a small number of results and were in a format that was difficult to parse. These were entered manually.

This process was used for the following elections:

  • 2000-01-04 Special Election, State Representative, District 53
  • 2002-01-22 Special Election, State House, District 28
  • 2002-02-19 Special Election, State Senate, District 39
  • 2002-03-12 Special Election, State Senate, District 10

Contributors

dwillis, ghing, keithly, ljanek, palewire, ra-data, warwickmm

Issues

Weird values for party in 2018 precinct data

For the 2018 general election precinct data, there are some weird values in the party column: "Kevin," "illegible last name," "blank," and "Bob Rasmussen." Just wanted to flag this; I'm not sure whether it was a result of the coding process or of the format the raw IA data came in.
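A quick way to find values like these across many files is to count party labels outside an expected set. The sketch below assumes a CSV with a `party` column; `KNOWN_PARTIES` is a hypothetical starting list and should be extended to the labels that legitimately appear in the Iowa files.

```python
"""Hypothetical check for unexpected values in a results CSV's party column."""
import csv
from collections import Counter

# Illustrative assumption: an empty party plus a few common labels are valid.
KNOWN_PARTIES = {"", "Democratic", "Republican", "Libertarian"}


def unexpected_parties(lines, known=KNOWN_PARTIES):
    """Return a Counter of party values that are not in `known`."""
    counts = Counter()
    for row in csv.DictReader(lines):
        party = (row.get("party") or "").strip()
        if party not in known:
            counts[party] += 1
    return counts
```

Running it over each 2018 precinct file (opened with `newline=""`) gives a short list of odd labels to trace back to the source PDFs.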

Convert 2015 specials PDF results

Available here. Using Tabula, create CSV files that look like this one. For races that have both county and precinct-level results, create separate files for both.

  • State Representative District 4, Jan. 4, 2015 (filename would be 20150104__ia__special__general__precinct.csv for precinct results)
  • State Representative District 23, Feb. 10, 2015 (filename would be 20150210__ia__special__general__precinct.csv for precinct results)
  • State Representative District 5, Nov. 3, 2015 (filename would be 20151103__ia__special__general__precinct.csv for precinct results)
  • State Representative District 21, Dec. 8, 2015 (filename would be 20151208__ia__special__general__precinct.csv for precinct results)
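The filenames in the list follow one pattern, so they can be generated rather than typed. A small sketch: the precinct names match the list above, while the `county` variant for county-level files is an assumption based on the same convention.

```python
"""Hypothetical helper for the special-election file names in this issue."""
from datetime import date


def special_filename(election_date, level):
    """Build e.g. 20150104__ia__special__general__precinct.csv."""
    if level not in ("precinct", "county"):
        raise ValueError(f"unexpected reporting level: {level!r}")
    return f"{election_date:%Y%m%d}__ia__special__general__{level}.csv"


if __name__ == "__main__":
    for d in (date(2015, 1, 4), date(2015, 2, 10), date(2015, 11, 3), date(2015, 12, 8)):
        print(special_filename(d, "precinct"))
```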

Incorrect office entries in 2012 files

2012/20121106__ia__general__allamakee__precinct.csv has some incorrect entries in the office column:

Allamakee,PCT 8 - IA/UC/NA/NA CITY,President/Vice President,,,Under Votes,1.0,2.0,3.0
Allamakee,PCT 8 - IA/UC/NA/NA CITY,President/Vice President,,,Over Votes,0.0,0.0,0.0
Allamakee,PCT 9 - WK 1, Mitt RomneyPaul RyanRepublican,,Republican,Mitt Romney Paul Ryan,175.0,196.0,371.0
Allamakee,PCT 9 - WK 1, Mitt RomneyPaul RyanRepublican,,Democratic,Barack Obama Joe Biden,216.0,171.0,387.0
Allamakee,PCT 9 - WK 1, Mitt RomneyPaul RyanRepublican,,Constitution,Virgil Goode James Clymer,0.0,1.0,1.0

"Mitt RomneyPaul RyanRepublican" should be "President/Vice President".
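One way to patch this is a lookup of known-bad office values applied while re-reading the file. The sketch assumes, from the sample rows above, that office is the third column (index 2); the `OFFICE_FIXES` mapping is illustrative and would grow as other bad values are found.

```python
"""Hypothetical fix-up for known-bad values in the office column."""
import csv
import sys

# Known-bad office value (after stripping whitespace) -> corrected value.
OFFICE_FIXES = {"Mitt RomneyPaul RyanRepublican": "President/Vice President"}


def fix_offices(lines, fixes=OFFICE_FIXES, office_col=2):
    """Yield parsed CSV rows with known-bad office values replaced."""
    for row in csv.reader(lines):
        if len(row) > office_col:
            fixed = fixes.get(row[office_col].strip())
            if fixed is not None:
                row[office_col] = fixed
        yield row


if __name__ == "__main__":
    csv.writer(sys.stdout).writerows(fix_offices(sys.stdin))
```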

Incorrect district entries in 2020 files

Many of the 2020 files have some suspect district values. For example, 2020/20201103__ia__general__adair__precinct.csv has column headers:

county,precinct,office,district,party,candidate,votes,election_day,absentee

However, there are rows like the following, where the district column holds a fragment of the ballot question's text:

Adair,1 NW,Question 1, and propose amendment or amendments to same?,,Yes,186,106,80
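Rows like this can be found mechanically by flagging district values that are neither empty nor a number. That rule is an assumption about what valid districts look like in these files; the sketch reports the CSV line number, office and offending value for each hit.

```python
"""Hypothetical scan for suspect values in the district column."""
import csv
import re

# Assumption: a valid district is empty or all digits.
VALID_DISTRICT = re.compile(r"^\d*$")


def suspect_districts(lines):
    """Yield (line_number, office, district) for suspect district values."""
    reader = csv.DictReader(lines)
    for row in reader:
        district = (row.get("district") or "").strip()
        if not VALID_DISTRICT.match(district):
            yield reader.line_num, row["office"], district
```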

Convert PDF results from 2010 primary

  • 20100608__ia__primary__county.pdf - Candidate names are in an annoying diagonal orientation. We could manually enter these, or try to convert to image, rotate, and OCR.
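The rotate-and-OCR option could start from a page image (for example one produced with pdftoppm) and look roughly like the sketch below. The rotation angle, file names and the choice of ImageMagick's `-rotate` are assumptions; the right angle depends on how the names are slanted in the PDF, and cropping to the header row would likely improve results.

```python
"""Hypothetical rotate-then-OCR step for diagonally printed candidate names."""
import subprocess
import sys


def rotate_and_ocr_cmds(png_path, angle_degrees, out_base):
    """Return the ImageMagick and Tesseract commands for one page image."""
    rotated = f"{out_base}_rotated.png"
    return [
        # ImageMagick rotates clockwise for positive angles.
        ["convert", png_path, "-rotate", str(angle_degrees), rotated],
        # Tesseract writes the recognized text to <out_base>.txt.
        ["tesseract", rotated, out_base],
    ]


def run(png_path, angle_degrees, out_base):
    for cmd in rotate_and_ocr_cmds(png_path, angle_degrees, out_base):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(sys.argv[1], float(sys.argv[2]), sys.argv[3])
```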