
lobbying_federal_domestic's People

Contributors

apendleton, boblannon, zmaril

lobbying_federal_domestic's Issues

Transform step for incomes is making a mistake

If you compare the original copy of this document to the transformed data, there appears to be a mistake in how the expense-related fields are calculated. Specifically, the expense_less_than_five_thousand field is false, implying that the expense is 5000 or more, while the expense_amount field is null. In the original document, neither box was checked and the expenses field was left blank, which seems to imply that there were no expenses (rather than an unknown expense greater than 5000).
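One way to keep the "nothing was filled in" case distinct is to make the flag three-valued, so that an unchecked form yields null rather than false. The sketch below is only an illustration of that idea: the function name and its three inputs are hypothetical, and the real transform step may receive the form data in a different shape.

```python
from typing import Optional


def transform_expense(less_than_5k_checked: bool,
                      five_k_or_more_checked: bool,
                      amount_text: str) -> dict:
    """Map raw expense inputs to the two transformed fields.

    All three parameters are assumed inputs, not the project's real ones.
    """
    # Parse the amount only if something was actually entered.
    cleaned = amount_text.strip().replace("$", "").replace(",", "")
    amount: Optional[float] = float(cleaned) if cleaned else None

    # Only a checked box decides the flag; no checkbox means "unknown".
    if less_than_5k_checked:
        flag: Optional[bool] = True
    elif five_k_or_more_checked:
        flag = False
    else:
        flag = None

    return {
        "expense_less_than_five_thousand": flag,
        "expense_amount": amount,
    }
```

With this shape, a form with neither box checked and a blank amount produces null for both fields, so downstream code can tell "not reported" apart from "5000 or more".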

Scraping not getting certain parts of the issues on reports

The scraper seems to miss every line after the first when scraping the specific issues from lobbying reports. View this form and then check the JSON you have locally to see that the line HR 4310, National Defense Authorization Act, provisions relating to tanks and communications just doesn't show up. My local file path for this file is /home/zmaril/data/sopr_html/2013/Q4/f7c1c383-cbf0-4233-b094-e1324b78d042.json.

Standardization of common form attributes

There are several fields common to both the reports and registrations forms that are expressed differently in each. Giving these fields the same name in both forms would make the files much easier to understand and program against. Apart from the following three fields, everything else matches up as well as it can and is easy to work with:

Registration                      Report
registrant.registrant_house_id    client_registrant_house_id
registrant.registrant_senate_id   client_registrant_senate_id
registration_type.amendment       report_is_amendment
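Until the field names are aligned at the source, a small lookup table can present both form types under one set of names. This is a minimal sketch: the common names on the left (registrant_house_id, registrant_senate_id, is_amendment) are hypothetical, and it assumes the dotted names above are paths into nested objects while the report names are top-level keys.

```python
# Hypothetical common name -> where to find it in each form type.
COMMON_FIELDS = {
    "registrant_house_id": {
        "registration": "registrant.registrant_house_id",
        "report": "client_registrant_house_id",
    },
    "registrant_senate_id": {
        "registration": "registrant.registrant_senate_id",
        "report": "client_registrant_senate_id",
    },
    "is_amendment": {
        "registration": "registration_type.amendment",
        "report": "report_is_amendment",
    },
}


def get_path(record, dotted_path):
    """Follow a dotted path through nested dicts; None if any key is missing."""
    current = record
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def common_attributes(record, form_type):
    """Return the shared fields of a form under the common names."""
    return {
        name: get_path(record, paths[form_type])
        for name, paths in COMMON_FIELDS.items()
    }
```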

sopr xml document id: validity and scope?

SOPR LD forms have a "Filing ID"

<Filing ID="B0584458-99B9-4BDC-89B9-0C2098B0E28C" Year="2000" Received="2000-11-01T00:00:00" Amount="40000" Type="MID-YEAR REPORT" Period="Mid-Year (Jan 1 - Jun 30)">

things to investigate:

  • are they unique at any scope?
  • what scope (yearly, quarterly, by filer)?
  • do they appear to have any semantic information?
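The first two questions can be checked empirically by counting how often each ID occurs within a given scope. The sketch below assumes each XML document contains Filing elements somewhere under a root element (the wrapper element name is not taken from the issue) and uses attributes such as Year or Period as the scope key; the function names are made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def filing_id_counts(xml_texts, scope_attrs=()):
    """Count Filing IDs, keyed by (scope values..., ID).

    An empty scope_attrs checks global uniqueness; ("Year",) checks
    uniqueness within each year, and so on.
    """
    counts = Counter()
    for text in xml_texts:
        root = ET.fromstring(text)
        for element in root.iter():
            if local_name(element.tag) != "Filing":
                continue
            scope = tuple(element.get(attr) for attr in scope_attrs)
            counts[scope + (element.get("ID"),)] += 1
    return counts


def duplicates(counts):
    """Keep only the keys that occur more than once."""
    return {key: n for key, n in counts.items() if n > 1}
```

If duplicates() is empty with no scope, the IDs are unique across everything that was loaded; if it is non-empty globally but empty for ("Year",), the IDs are only unique per year.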

`houses_and_agencies` parser

Write a simple parser for the houses_and_agencies field on lobbying activities in reports. The fields come prepopulated and so we know all the possible leaves of the parse tree that could exist. Should be straightforward to do.
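Because the set of possible values is known in advance, the parser can match against that vocabulary instead of splitting blindly on a delimiter. The sketch below assumes the field is a single string of known names joined by ", " and that a name may itself contain a comma; the vocabulary shown is a placeholder, not the real list of prepopulated values.

```python
# Placeholder vocabulary; the real list would come from the form's options.
KNOWN_ENTITIES = [
    "U.S. SENATE",
    "U.S. HOUSE OF REPRESENTATIVES",
    "Defense - Dept of (DOD)",
]


def parse_houses_and_agencies(value, known=KNOWN_ENTITIES, separator=", "):
    """Split a houses_and_agencies string into known entity names.

    Tries the longest names first so that a name which is a prefix of
    another name does not match too early.
    """
    by_length = sorted(known, key=len, reverse=True)
    remaining = value.strip()
    found = []
    while remaining:
        for name in by_length:
            if not remaining.startswith(name):
                continue
            rest = remaining[len(name):]
            # A match must end at the string end or at a separator.
            if rest == "" or rest.startswith(separator):
                found.append(name)
                remaining = rest[len(separator):] if rest else ""
                break
        else:
            raise ValueError("unrecognized entity at: %r" % remaining)
    return found
```

Raising on unrecognized text, rather than skipping it, makes gaps in the vocabulary visible as soon as they occur.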

values for LobbyistGovPositionIndicator

What seems to be the source? What are the possible values? Why does it seem to be okay to have LobbyistCoveredGovPositionIndicator="COVERED" while also reporting OfficialPosition="N/A"?

eg:

<Filing ID="B0584458-99B9-4BDC-89B9-0C2098B0E28C" Year="2000" Received="2000-11-01T00:00:00" Amount="40000" Type="MID-YEAR REPORT" Period="Mid-Year (Jan 1 - Jun 30)">

(snip)

  <Lobbyists>
    <Lobbyist xmlns="" LobbyistName="EPPENBERGER, RYAN A" LobbyistCoveredGovPositionIndicator="COVERED" OfficialPosition="N/A"/>
    <Lobbyist xmlns="" LobbyistName="EPPENBERGER, RYAN A" LobbyistCoveredGovPositionIndicator="COVERED" OfficialPosition="N/A"/>
    <Lobbyist xmlns="" LobbyistName="DELMONTAGNE, REGIS J" LobbyistCoveredGovPositionIndicator="COVERED" OfficialPosition="N/A"/>
    <Lobbyist xmlns="" LobbyistName="NUZZACO, MARK J" LobbyistCoveredGovPositionIndicator="COVERED" OfficialPosition="N/A"/>
  </Lobbyists>
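To see which values actually occur, and how often the indicator and the position disagree, the attribute pairs can be tallied across filings. This is a rough sketch that takes an XML string containing Lobbyist elements like the ones above; the function name is made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def position_indicator_pairs(xml_text):
    """Count (indicator, official position) pairs over all Lobbyist elements."""
    root = ET.fromstring(xml_text)
    pairs = Counter()
    for element in root.iter():
        # Compare on the local name so a default namespace does not matter.
        if element.tag.rsplit("}", 1)[-1] != "Lobbyist":
            continue
        pairs[(
            element.get("LobbyistCoveredGovPositionIndicator"),
            element.get("OfficialPosition"),
        )] += 1
    return pairs
```

Summing these counters over a whole year of filings would list every value the indicator takes and show whether the COVERED with N/A combination is common or an outlier.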

There are no registrations for relationships established prior to 1999--are there alternatives?

Registrations for client-registrant relationships prior to 1999 are not available through SOPR.

Below is the firm and client we were looking to find a filing for.

{
    "_id" : ObjectId("5359516b6e955232f6d25b32"),
    "Received" : ISODate("2000-08-14T00:00:00.000Z"),
    "Period" : "Mid-Year (Jan 1 - Jun 30)",
    "Registrant" : {
        "RegistrantPPBCountry" : "USA",
        "RegistrantID" : "41454",
        "RegistrantCountry" : "USA",
        "RegistrantName" : "Williams and Jensen, PLLC",
        "GeneralDescription" : "Lobbying law firm",
        "Address" : "701 8th Street, NW\r\nSuite 500\r\nWashington, DC 20001"
    },

determine how to treat bad encoding in original documents

example:

xmlParseCharRef: invalid xmlChar value 16, line 277, column 44
/home/blannon/dev/influence-usa/lobbying-federal-domestic/data/original/house_xml/LD2/2013/Q4/300616796.xml

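This was a &#x10; in the LOBBYINGDISCLOSURE2/alis/ali_info/specific_issues/description field. The reference is hexadecimal, so it denotes character 16 (DLE), which XML 1.0 does not allow. It was either meant to be a linefeed (&#10; or &#xA;) or really is a DLE char.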
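One possible treatment is to rewrite invalid numeric character references before handing the text to the XML parser. The sketch below follows the character ranges that XML 1.0 allows and replaces anything else with a space; the function names and the choice of a space (rather than a newline, or dropping the character) are assumptions, not a decision the project has made.

```python
import re

# Matches numeric character references such as &#16; or &#x10;.
CHAR_REF = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")


def is_valid_xml_char(code_point):
    """True if the code point is allowed by the XML 1.0 Char production."""
    return (
        code_point in (0x9, 0xA, 0xD)
        or 0x20 <= code_point <= 0xD7FF
        or 0xE000 <= code_point <= 0xFFFD
        or 0x10000 <= code_point <= 0x10FFFF
    )


def replace_invalid_char_refs(xml_text, replacement=" "):
    """Replace character references that XML 1.0 forbids; keep valid ones."""
    def fix(match):
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        if is_valid_xml_char(code_point):
            return match.group(0)
        return replacement
    return CHAR_REF.sub(fix, xml_text)
```

Logging each replacement alongside the file path would keep a record of which original documents were altered, in case the intended character can be determined later.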

Search Limitations

Are there general search options, or do searches have to be assigned to specific fields (for example, "issue area")?
