Comments (15)
Hi. I'm not seeing that. Can you provide the command line args you're using and also an example filename? Thanks!
from congress.
So, out of many examples, one is data/108/votes/2004/h405/data.json. If I search the document, one of the legislator IDs is "0000000". I scraped it last night using ./run votes --congress=108 --session=2004 --force.
Yeah, I see it. Run ./run votes --vote_id=h405-108.2004 and then look at data/108/votes/2004/h405. One of the voters is:
{
"display_name": "Butterfield",
"id": "0000000",
"party": "D",
"state": "NC"
}
The zeros appear in the original data:
http://clerk.house.gov/evs/2004/roll405.xml
Something to report to the Clerk, I think.
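To spot affected files locally, a short script can scan a scraped data.json for the all-zeros ID. This is a minimal sketch, not part of the congress scraper: it assumes the file keeps voters under a top-level "votes" object that maps each vote option (such as "Aye" or "No") to a list of voter records shaped like the one above, and the function name find_placeholder_ids is hypothetical.

```python
import json
import sys

PLACEHOLDER_ID = "0000000"  # the bogus ID seen in the Clerk's roll405.xml


def find_placeholder_ids(vote):
    """Return (option, voter) pairs whose ID is the all-zeros placeholder.

    Assumes vote["votes"] maps each vote option to a list of voter dicts.
    """
    bad = []
    for option, voters in vote.get("votes", {}).items():
        for voter in voters:
            # Skip anything that isn't a voter dict, just in case.
            if isinstance(voter, dict) and voter.get("id") == PLACEHOLDER_ID:
                bad.append((option, voter))
    return bad


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_zeros.py data/108/votes/2004/h405/data.json
    with open(sys.argv[1]) as f:
        vote = json.load(f)
    for option, voter in find_placeholder_ids(vote):
        print(option, voter["display_name"], voter["state"], voter["id"])
```

Running it against data/108/votes/2004/h405/data.json (the script name find_zeros.py is arbitrary) should list the Butterfield entry for as long as the Clerk's source still carries the zeros.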
Aha. I hadn't scraped that far back. On GovTrack I used to fall back to matching by name/state (so, FWIW, the data is complete there: http://www.govtrack.us/data/us/108/rolls/h2004-405.xml).
Okay. I'll report it to the Clerk.
(Replying by email to Joshua Tauberer's comment of Feb 28, 2013, 9:07 AM.)
I committed a check for 0000000. Maybe we want to make it an error condition?
OK, I've made 0000000 an error condition. I also made a failure to parse the legis-num an error condition, and added "MOTION" to the list of values it may accept.
(The ticket can stay open 'til the data's fixed.)
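The two error conditions described above could look roughly like the following. This is an illustrative sketch, not the code that was committed: the exception class, the helper names, the legis-num format (a bill-type prefix followed by a number, such as "H R 4520") and the prefix-to-type mapping are all assumptions; only "MOTION" as an accepted value comes from this thread.

```python
import re

PLACEHOLDER_ID = "0000000"

# Hypothetical whitelist of legis-num values that are not bill numbers.
# "MOTION" comes from this thread; extend it with whatever else your data contains.
NON_BILL_LEGIS_NUMS = {"MOTION"}

# Assumed legis-num format: bill-type prefix, a space, then digits, e.g. "H R 4520".
LEGIS_NUM_RE = re.compile(
    r"^(H R|H RES|H J RES|H CON RES|S|S RES|S J RES|S CON RES) (\d+)$"
)
BILL_TYPES = {
    "H R": "hr", "H RES": "hres", "H J RES": "hjres", "H CON RES": "hconres",
    "S": "s", "S RES": "sres", "S J RES": "sjres", "S CON RES": "sconres",
}


class VoteParseError(ValueError):
    """Raised when a roll call can't be converted cleanly."""


def check_voter_id(voter_id, display_name):
    # Treat the all-zeros placeholder as a hard error instead of saving it.
    if voter_id == PLACEHOLDER_ID:
        raise VoteParseError(f"placeholder ID {voter_id} for voter {display_name!r}")
    return voter_id


def parse_legis_num(legis_num):
    """Return (bill_type, number), or None for an accepted non-bill value."""
    value = " ".join(legis_num.split()).upper()  # normalize whitespace and case
    if value in NON_BILL_LEGIS_NUMS:
        return None
    m = LEGIS_NUM_RE.match(value)
    if not m:
        # Anything unrecognized is an error rather than being passed through.
        raise VoteParseError(f"could not parse legis-num {legis_num!r}")
    return BILL_TYPES[m.group(1)], int(m.group(2))
```

Raising instead of logging means a bad roll call stops the run, which makes problems like the Butterfield IDs impossible to miss.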
This hasn't been fixed yet. I've confirmed (myself, and with the Clerk) that it affects only this one person, and only during a specific time frame: votes No. 405 through No. 544, that is, from the first vote he took after his special election until the end of that Congress.
Given that, should I simply hardcode a fix in the scraper for that value for that time?
I vote yes.
Yeah, I think it can only reduce the amount of error in the scraper's output, even in the long run. I'll do this.
But if they make the same error with a different member, then we won't be able to catch it.
Well, we'd catch it the same way we caught this one. And right now, this is causing a big swath of invalid data. It seems unlikely to happen for another member, especially since we now know the cause: he was elected in a special election partway through the session. So as long as we apply the fix only to House votes 405 through 544 in 2004, the only way it can fail us is if the same problem develops for someone else during that exact period. So the worst case is that we're in the same situation we're in now, and the best (and most likely) case is that it's all fixed.
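A fix scoped that narrowly might look like the sketch below. It is hypothetical: the function name and its parameters are invented for illustration, the session is assumed to be the string "2004" as in the data path, and the bioguide ID B001251 for G. K. Butterfield should be verified against a legislator roster before relying on it.

```python
PLACEHOLDER_ID = "0000000"
BUTTERFIELD_BIOGUIDE_ID = "B001251"  # assumed; verify against legislators data


def patch_known_bad_id(chamber, congress, session, roll_number, voter):
    """Hypothetical hardcoded fix for House votes 405-544 of 2004 (108th Congress).

    Returns a corrected copy of the voter record, or the original record
    unchanged if the known-bad case doesn't apply.
    """
    if (
        chamber == "h"
        and congress == 108
        and session == "2004"
        and 405 <= roll_number <= 544
        and voter.get("id") == PLACEHOLDER_ID
        and voter.get("display_name") == "Butterfield"
        and voter.get("state") == "NC"
    ):
        return dict(voter, id=BUTTERFIELD_BIOGUIDE_ID)
    return voter
```

Returning a copy keeps the originally parsed record intact, and because every condition must match, a zero ID for anyone else, or outside votes 405 to 544, still falls through to the error check.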
Did they give any indication that they would fix the issue upstream?
Yes, but only "at some point".
This still isn't fixed upstream, btw.
I've replaced the previous fix with a more generic name lookup in 08f4025. (Through the 107th Congress there were no bioguide IDs listed for anyone!)
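The idea of a generic name lookup can be sketched as follows: index the members serving at the time by (last name, state) and accept a match only when it is unique. This is not the code from commit 08f4025. The roster shape (dicts with bioguide, last_name and state keys) and both function names are hypothetical, and Clerk display names may take other forms (for instance a first initial to tell apart members from the same state) that this sketch deliberately refuses to guess at.

```python
import re
from collections import defaultdict


def build_name_index(members):
    """Index a (hypothetical) roster of dicts by (lowercased last name, state)."""
    index = defaultdict(list)
    for m in members:
        index[(m["last_name"].lower(), m["state"])].append(m["bioguide"])
    return index


def lookup_bioguide(index, display_name, state):
    """Return a bioguide ID only when exactly one member matches, else None."""
    # Display names may carry a disambiguating suffix such as "Rogers (AL)".
    last = re.sub(r"\s*\(.*\)$", "", display_name).strip().lower()
    matches = index.get((last, state), [])
    if len(matches) != 1:
        # Zero or several candidates: refuse to guess.
        return None
    return matches[0]
```

Returning None on ambiguity lets the caller treat an unresolved name as an error, in keeping with the stricter checks added earlier in this thread.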