opengovernment-local's Introduction

OpenGovernment is a Ruby on Rails application for aggregating and presenting open government data.

Overview

This project powers OpenGovernment.org and was started by the Participatory Politics Foundation.

We hope you'll get involved! Read our Contributors' Guide for details.

  • Mailing list: Join our developer list.
  • IRC: Find us on chat.freenode.net in channel #opengovernment.
  • Development Roadmap: December 2012.
  • Project management & bug tracker: to come, will be updated Dec. 2012. Previously on Pivotal Tracker & Lighthouse.

Visit our Wiki for full installation instructions.

opengovernment-local's People

Contributors

walter

Forkers

janbenson

opengovernment-local's Issues

pa-philadelphia María Quiñones-Sánchez page contains unicode chars

Error when Billy scrapes this page:

02:05:18 INFO scrapelib: GET -
http://philadelphiacitycouncil.net/council-members/councilwoman-maria-d-quinones-sanchez-7th-district/councilwoman-maria-d-quinones-sanchez-contact/
Traceback (most recent call last):
File "/u/apps/virtualenvs/billy/src/billy/billy/ext/ansistrm.py", line 56, in emit
stream.write(message)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf1' in position 45:
ordinal not in range(128)
Logged from file __init__.py, line 177

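The obvious non-ASCII characters I can see are in the council member's name, María Quiñones-Sánchez (í, ñ, á). Indeed, the character reported in the error, u'\xf1', is Unicode code point U+00F1 (ñ).

Not certain, but I'm wondering if Billy might be constrained to the ASCII character set: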

billy$ find . -type f -print | xargs grep ascii | egrep -v git
./billy/importers/bills.py:            r.encode('ascii', 'replace') for r in remaining]))
./billy/importers/committees.py:            logger.debug("No matches for %s" % member['name'].encode('ascii',
./billy/web/api/emitters.py:                           ensure_ascii=False)
./billy/web/api/emitters.py:        return obj.encode("ascii", "replace")
./billy/scrape/bills.py:        return filename.encode('ascii', 'replace')
./billy/scrape/legislators.py:        return filename.encode('ascii', 'replace')
./billy/scrape/legislators.py:        return filename.encode('ascii', 'replace')
./billy/utils/fulltext.py:        text = text.encode('ascii', 'ignore')
./billy/utils/fulltext.py:        text = text.decode('utf8', 'ignore').encode('ascii', 'ignore')
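For what it's worth, the traceback points at the log handler's stream.write() rather than at the scraper itself, so the failure may be about the output stream's encoding. Below is a minimal Python 3 sketch (the traceback above is from Python 2.7) of writing to an ASCII-only stream without raising. The helper name safe_write is hypothetical and not part of Billy.

```python
import io


def safe_write(stream, message):
    """Write message to stream, replacing characters the stream cannot encode."""
    # Hypothetical helper, not part of Billy or its ansistrm handler.
    encoding = getattr(stream, "encoding", None) or "ascii"
    # Round-trip through the stream's own encoding so that unencodable
    # characters become '?' instead of raising UnicodeEncodeError.
    stream.write(message.encode(encoding, "replace").decode(encoding))


# An ASCII-only text stream, similar to a terminal running with LANG=C.
raw = io.BytesIO()
ascii_stream = io.TextIOWrapper(raw, encoding="ascii")

safe_write(ascii_stream, "save person María Quiñones-Sánchez")
ascii_stream.flush()
print(raw.getvalue())
```

The non-ASCII letters are replaced with '?' in the log output only; the scraped data itself is untouched.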

pa-philadelphia Brian J. O'Neill contact page contains multiple paragraphs

http://philadelphiacitycouncil.net/council-members/councilman-brian-j-oneill-10th-district-minority-leader/councilman-brian-j-oneill-contact/

Scraper results:

18:39:07 INFO scrapelib: GET - http://philadelphiacitycouncil.net/council-members/councilman-brian-j-oneill-10th-district-minority-leader/councilman-brian-j-oneill-contact/
18:39:09 WARNING billy: Skipped paragraphs:
<p><strong>City Hall, Room 562</strong><br>
Philadelphia, PA 19107<br>
T:(215) 686-3422<br>
F:(215) 686-1939<br>
Hours: 8:30AM-5:00PM</p>

<p><strong>Neighborhood Offices</strong></p>

<p><strong>Northeast Office</strong><br>
Bustleton &amp; Bowler, 2nd FL.<br>
Philadelphia, PA 19115<br>
T:(215) 685-0432<br>
F:(215) 685-0436<br>
Hours: 8:30AM-5:00PM</p>

<p><strong>FOP Heroes Hall</strong><br>
11630 Caroline Road<br>
Philadelphia, PA 19154<br>
T:(215) 437-9167<br>
F:(215) 437-9350<br>
Hours: 8:30AM-5:00PM</p>

<p><strong>Email Councilman O&#8217;Neill:</strong></p>

<p><a href="http://[email protected]" target="_blank">Brian.O&#8217;[email protected]</a></p>
                                        &#13;

18:39:09 INFO billy: PART City Hall Office
18:39:09 WARNING billy: Skipped: City Hall Office
18:39:09 INFO billy: save person Brian J. O'Neill
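One possible approach, sketched below with only the standard library, is to treat every paragraph that opens with a bold office name as its own office instead of skipping it. This is an illustration against the markup logged above, not the existing scraper code. The function name parse_offices and the dict layout are assumptions.

```python
import html
import re


def parse_offices(fragment):
    """Return one dict per <p> block that starts with a bold office name."""
    offices = []
    for block in re.findall(r"<p>(.*?)</p>", fragment, flags=re.S):
        match = re.match(
            r"\s*<strong>(.*?)</strong>\s*<br\s*/?>(.*)", block, flags=re.S
        )
        if not match:
            # Heading-only paragraphs such as "Neighborhood Offices" have no body.
            continue
        name, body = match.groups()
        office = {
            "name": html.unescape(name.strip()),
            "address": [],
            "phone": None,
            "fax": None,
        }
        for line in re.split(r"<br\s*/?>", body):
            line = html.unescape(line.strip())
            if line.startswith("T:"):
                office["phone"] = line[2:].strip()
            elif line.startswith("F:"):
                office["fax"] = line[2:].strip()
            elif line and not line.startswith("Hours:"):
                office["address"].append(line)
        offices.append(office)
    return offices


sample = """<p><strong>City Hall, Room 562</strong><br>
Philadelphia, PA 19107<br>
T:(215) 686-3422<br>
F:(215) 686-1939<br>
Hours: 8:30AM-5:00PM</p>

<p><strong>Neighborhood Offices</strong></p>

<p><strong>Northeast Office</strong><br>
Bustleton &amp; Bowler, 2nd FL.<br>
Philadelphia, PA 19115<br>
T:(215) 685-0432<br>
F:(215) 685-0436<br>
Hours: 8:30AM-5:00PM</p>"""

offices = parse_offices(sample)
for office in offices:
    print(office)
```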

ca-san-jose scraper hangs then fails

Running ca-san-jose scraper:

19:37:14 INFO billy: save person Chuck Reed
19:37:14 INFO scrapelib: GET - http://www.sjdistrict1.com/
19:37:14 INFO billy: save person Pete Constant
19:37:14 INFO scrapelib: GET - http://sjdistrict2.com
Traceback (most recent call last):
  File "/u/apps/virtualenvs/billy/bin/billy-update", line 9, in <module>
    load_entry_point('billy==1.7.0', 'console_scripts', 'billy-update')()
  File "/u/apps/virtualenvs/billy/src/billy/billy/bin/update.py", line 393, in main
    run_record += _run_scraper(stype, args, metadata)
  File "/u/apps/virtualenvs/billy/src/billy/billy/bin/update.py", line 104, in _run_scraper
    scraper.scrape(time, chambers=chambers)
  File "/u/apps/opengovernment-local-staging/scrapers/ca-san-jose/legislators.py", line 92, in scrape
    councilmember_doc = lxml.html.fromstring(self.urlopen(url))
  File "/u/apps/virtualenvs/billy/lib/python2.7/site-packages/scrapelib/__init__.py", line 393, in urlopen
    raise HTTPError(resp)
scrapelib.HTTPError: 403 while retrieving http://sjdistrict2.com/
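A possible mitigation, sketched here under assumptions, is to catch the HTTP failure per council member so that one district site returning 403 does not abort the whole run. FetchError is a stand-in for scrapelib.HTTPError, and scrape_member_pages and fake_fetch are hypothetical names used only for this illustration.

```python
import logging

logger = logging.getLogger("scraper-sketch")


class FetchError(Exception):
    """Stand-in for an HTTP failure such as scrapelib.HTTPError."""


def scrape_member_pages(urls, fetch):
    """Fetch each URL, logging and skipping failures instead of aborting."""
    pages, failed = {}, []
    for url in urls:
        try:
            pages[url] = fetch(url)
        except FetchError as exc:
            logger.warning("Skipping %s: %s", url, exc)
            failed.append(url)
    return pages, failed


def fake_fetch(url):
    # Simulates the 403 from the traceback above for one district site.
    if "sjdistrict2" in url:
        raise FetchError("403 while retrieving %s" % url)
    return "<html>ok</html>"


pages, failed = scrape_member_pages(
    ["http://www.sjdistrict1.com/", "http://sjdistrict2.com"], fake_fetch
)
print("fetched:", list(pages))
print("failed:", failed)
```

Returning the failed URLs lets the caller decide whether to retry them or report them at the end of the run.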

Billy 1.7.0 incompatible with current scrapelib

We have not yet updated our scrapers to use the new Billy module loading introduced in this commit:
openstates/billy@ddcdb91#billy/scrape/__init__.py

As a workaround, we have downgraded to Billy 1.7.0. However, Billy 1.7.0 is not compatible with current scrapelib and produces these errors:

TypeError: __init__() got an unexpected keyword argument 'timeout'
TypeError: __init__() got an unexpected keyword argument 'cache_obj'
TypeError: __init__() got an unexpected keyword argument 'cache_write_only'

Best fix: update scrapers to conform to edge Billy.

Short-term workaround: fork Billy 1.7.0 and remove these lines from /src/billy/billy/scrape/__init__.py:

        kwargs['cache_obj'] = scrapelib.FileCache(settings.BILLY_CACHE_DIR)
        kwargs['timeout'] = settings.SCRAPELIB_TIMEOUT
            kwargs['cache_write_only'] = False
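An alternative to deleting the lines, sketched below, would be to pass only the keyword arguments that the receiving constructor still accepts. This is not how Billy or scrapelib do it. NewerScraper is a hypothetical stand-in class and supported_kwargs is a hypothetical helper.

```python
import inspect


def supported_kwargs(func, kwargs):
    """Return only the keyword arguments that func's signature accepts."""
    params = inspect.signature(func).parameters
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        # func takes **kwargs itself, so every key is accepted.
        return dict(kwargs)
    return {k: v for k, v in kwargs.items() if k in params}


class NewerScraper:
    """Hypothetical stand-in for a base class that dropped some arguments."""

    def __init__(self, requests_per_minute=60):
        self.requests_per_minute = requests_per_minute


kwargs = {"requests_per_minute": 30, "timeout": 10, "cache_write_only": False}

# Without filtering, NewerScraper(**kwargs) would raise the same kind of
# TypeError ("unexpected keyword argument 'timeout'") as shown above.
scraper = NewerScraper(**supported_kwargs(NewerScraper, kwargs))
print(scraper.requests_per_minute)
```

Note that silently dropping arguments also drops whatever behavior they configured, so this only makes sense as a stopgap.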

pa-philadelphia Curtis Jones Jr. contact page idiosyncrasies

http://philadelphiacitycouncil.net/council-members/councilman-curtis-jones-jr-4th-district/contact-councilman-curtis-jones-jr/

Scraper results:

18:39:17 INFO billy: PART City Hall, Room 404
18:39:17 INFO billy: PART , Philadelphia, PA
18:39:17 INFO billy: PART Email: [email protected]
18:39:17 INFO billy: PART (215) 686-3416, (215) 686-3417
18:39:17 INFO billy: PART Fax: (215) 686-1934
18:39:17 INFO billy: PART Local Office:
18:39:17 WARNING billy: Skipped: Local Office:
18:39:17 INFO billy: PART 5398 Wynnefield Avenue, Philadelphia PA
18:39:17 WARNING billy: Skipped: 5398 Wynnefield Avenue, Philadelphia PA
18:39:17 INFO billy: PART Phone: (215) 685-0293, (215) 685-0295
18:39:17 WARNING billy: Already have phone numbers for one office: Phone: (215) 685-0293, (215) 685-0295
18:39:17 INFO billy: save person Curtis Jones, Jr.
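A rough sketch of one way to keep the second office instead of skipping it: start a new office whenever a part looks like an "... Office:" label, and attach the following address and phone parts to it. The names group_parts and new_office, the default office name, and the dict layout are assumptions. The email value in the sample is a placeholder because the real one is obscured in the log above.

```python
import re

PHONE = re.compile(r"\(\d{3}\) \d{3}-\d{4}")


def new_office(name):
    return {"name": name, "address": [], "phones": [], "fax": None, "email": None}


def group_parts(parts, default_name="City Hall Office"):
    """Group contact-page text parts into offices, splitting at 'Office:' labels."""
    offices = [new_office(default_name)]
    for part in parts:
        current = offices[-1]
        numbers = PHONE.findall(part)
        if part.endswith("Office:"):
            # A label such as "Local Office:" opens a new office.
            offices.append(new_office(part.rstrip(":")))
        elif part.startswith("Fax:"):
            current["fax"] = numbers[0] if numbers else None
        elif part.startswith("Email:"):
            current["email"] = part[len("Email:"):].strip()
        elif numbers:
            current["phones"].extend(numbers)
        else:
            # Address fragments sometimes arrive with a leading ", ".
            current["address"].append(part.lstrip(", "))
    return offices


parts = [
    "City Hall, Room 404",
    ", Philadelphia, PA",
    "Email: member@example.org",  # placeholder address
    "(215) 686-3416, (215) 686-3417",
    "Fax: (215) 686-1934",
    "Local Office:",
    "5398 Wynnefield Avenue, Philadelphia PA",
    "Phone: (215) 685-0293, (215) 685-0295",
]

offices = group_parts(parts)
for office in offices:
    print(office)
```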

represent-boundaries v1.0.0 no longer available

Building the Python environment specified by requirements.txt now returns this error:

Obtaining represent-boundaries from git+http://github.com/rhymeswithcycle/represent-boundaries.git@v1.0.0#egg=represent-boundaries (from -r /u/apps/opengovernment-local-staging/requirements.txt (line 13))
Cloning http://github.com/rhymeswithcycle/represent-boundaries.git (to v1.0.0) to ./src/represent-boundaries
Could not find a tag or branch 'v1.0.0', assuming commit.
error: pathspec 'v1.0.0' did not match any file(s) known to git.
Complete output from command /usr/bin/git checkout -q v1.0.0:

Indeed, this repo shows no branch or tag by that name anymore; I suspect requirements.txt needs to be updated:
https://github.com/rhymeswithcycle/represent-boundaries
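To find which ref a requirement pins before editing requirements.txt, a small sketch like the following could pull the repository URL and ref out of a VCS requirement line. It assumes lines of the form "-e git+URL@REF#egg=NAME", with no "@" inside the URL itself. parse_vcs_requirement is a hypothetical helper, not part of pip.

```python
import re

# Assumed line shape: optional "-e", then git+URL, optional @REF, optional #egg=NAME.
VCS_REQUIREMENT = re.compile(
    r"^(?:-e\s+)?git\+(?P<url>[^@#\s]+)(?:@(?P<ref>[^#\s]+))?(?:#egg=(?P<egg>\S+))?$"
)


def parse_vcs_requirement(line):
    """Return {'url', 'ref', 'egg'} for a git requirement line, else None."""
    match = VCS_REQUIREMENT.match(line.strip())
    return match.groupdict() if match else None


line = (
    "-e git+http://github.com/rhymeswithcycle/represent-boundaries.git"
    "@v1.0.0#egg=represent-boundaries"
)
info = parse_vcs_requirement(line)
print(info)
# The extracted ref can then be compared by hand against the output of
# `git ls-remote --tags <url>` to see whether the tag still exists.
```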
