
census's Introduction

census

A simple wrapper for the United States Census Bureau's API.

Provides access to ACS and SF1 data sets.

Install

pip install census

You may also want to install a complementary library, us, which helps you figure out the FIPS codes for many geographies. We use it in the examples below.

pip install us

Usage

First, get yourself a Census API key.

from census import Census
from us import states

c = Census("MY_API_KEY")
c.acs5.get(('NAME', 'B25034_010E'),
          {'for': 'state:{}'.format(states.MD.fips)})

The call above will return the name of the geographic area and the number of homes that were built before 1939 for the state of Maryland. Helper methods have been created to simplify common geography calls:

c.acs5.state(('NAME', 'B25034_010E'), states.MD.fips)

Full details on geographies and the states module can be found below.

The get method is the core data access method on both the ACS and SF1 data sets. The first parameter is either a single string column or a tuple of columns. The second parameter is a geography dict with a for key and an optional in key. The for argument accepts a "*" wildcard character or Census.ALL. The wildcard is not valid for the in parameter.
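As a rough illustration of the dict described above, the sketch below assembles the for and in strings from their parts. build_geo is a hypothetical helper, not part of the census package; the space-separated form of the in string follows the tract example in the Examples section.

```python
def build_geo(for_level, for_value="*", **parents):
    """Build a dict with a 'for' key and an optional 'in' key.

    Hypothetical helper, not part of the census package. `parents` maps
    parent geography names (e.g. state, county) to their FIPS codes.
    """
    geo = {"for": "{}:{}".format(for_level, for_value)}
    if parents:
        # Parent geographies are space-separated inside a single 'in' string.
        geo["in"] = " ".join("{}:{}".format(k, v) for k, v in parents.items())
    return geo


all_states = build_geo("state")
ak_tracts = build_geo("tract", state="02", county="170")
```

Either dict could then be passed as the second argument to c.acs5.get or c.sf1.get.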

By default, the year for a dataset is the most recent year available. To access earlier data, pass a year parameter to the API call:

c.acs5.state(('NAME', 'B25034_010E'), states.MD.fips, year=2010)

The default year may also be set client-wide:

c = Census("MY_API_KEY", year=2010)

Detailed information about the API can be found at the Census Data API User Guide.

Datasets

For each dataset, the first year listed is the default.

Geographies

The API supports a wide range of geographic regions. The specification of these can be quite complicated so a number of convenience methods are provided. Refer to the Census API documentation for more geographies beyond the convenience methods.

Not all geographies are supported in all years. Calling a convenience method with a year that is not supported will raise census.UnsupportedYearException.
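One way to cope with unsupported years is to try several years in order and keep the first that works. The sketch below is only an illustration: first_supported, fake_fetch and FakeUnsupportedYear are hypothetical stand-ins so the code runs without an API key. With the real library you would pass a function wrapping a convenience method, together with census.UnsupportedYearException.

```python
def first_supported(fetch, years, unsupported_exc):
    """Try `fetch(year)` for each year in order and return (year, result).

    `fetch` and `unsupported_exc` are supplied by the caller.
    """
    for year in years:
        try:
            return year, fetch(year)
        except unsupported_exc:
            continue  # geography not offered for this year; try the next one
    raise ValueError("none of the requested years is supported")


# Stand-ins so the sketch runs without an API key or network access.
class FakeUnsupportedYear(Exception):
    pass


def fake_fetch(year):
    if year > 2015:
        raise FakeUnsupportedYear(year)
    return [{"NAME": "example", "year": year}]


year, rows = first_supported(fake_fetch, (2017, 2016, 2015), FakeUnsupportedYear)
```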

Geographic relationship files are provided on the Census developer site as a tool to help users compare the geographies from the 1990, 2000 and 2010 Censuses. From these files, data users may determine how geographies from one Census relate to those from the prior Census.

ACS5 Geographies

  • state(fields, state_fips)
  • state_county(fields, state_fips, county_fips)
  • state_county_blockgroup(fields, state_fips, county_fips, blockgroup)
  • state_county_subdivision(fields, state_fips, county_fips, subdiv_fips)
  • state_county_tract(fields, state_fips, county_fips, tract)
  • state_place(fields, state_fips, place)
  • state_congressional_district(fields, state_fips, congressional_district)
  • state_legislative_district_upper(fields, state_fips, legislative_district)
  • state_legislative_district_lower(fields, state_fips, legislative_district)
  • us(fields)
  • state_zipcode(fields, state_fips, zip5)

ACS1 Geographies

  • state(fields, state_fips)
  • state_congressional_district(fields, state_fips, district)
  • us(fields)

SF1 Geographies

  • state(fields, state_fips)
  • state_county(fields, state_fips, county_fips)
  • state_county_subdivision(fields, state_fips, county_fips, subdiv_fips)
  • state_county_tract(fields, state_fips, county_fips, tract)
  • state_place(fields, state_fips, place)
  • state_congressional_district(fields, state_fips, district)
  • state_msa(fields, state_fips, msa)
  • state_csa(fields, state_fips, csa)
  • state_district_place(fields, state_fips, district, place)
  • state_zipcode(fields, state_fips, zip5)

PL Geographies

  • state(fields, state_fips)
  • state_county(fields, state_fips, county_fips)
  • state_county_subdivision(fields, state_fips, county_fips, subdiv_fips)
  • state_county_tract(fields, state_fips, county_fips, tract)
  • state_county_blockgroup(fields, state_fips, county_fips, blockgroup)
  • state_place(fields, state_fips, place)
  • state_congressional_district(fields, state_fips, district)
  • state_legislative_district_upper(fields, state_fips, legislative_district)
  • state_legislative_district_lower(fields, state_fips, legislative_district)

States

This package previously had a census.states module, but now uses the us package.

>>> from us import states
>>> print(states.MD.fips)
24

Convert FIPS to state abbreviation using lookup():

>>> print(states.lookup('24').abbr)
MD

BYOS - Bring Your Own Session

If you'd prefer to use a custom configured requests.Session, you can pass it to the Census constructor:

import requests

s = requests.Session()
s.headers.update({'User-Agent': 'census-demo/0.0'})

c = Census("MY_API_KEY", session=s)

You can also replace the session used by a specific data set:

c.sf1.session = s

Examples

The geographic name for all census tracts for county 170 in Alaska:

c.sf1.get('NAME', geo={'for': 'tract:*',
                       'in': 'state:{} county:170'.format(states.AK.fips)})

The same call using the state_county_tract convenience method:

c.sf1.state_county_tract('NAME', states.AK.fips, '170', Census.ALL)

Total number of males age 5 - 9 for all states:

c.acs5.get('B01001_004E', {'for': 'state:*'})

The same call using the state convenience method:

c.acs5.state('B01001_004E', Census.ALL)

If you don't know the list of tables in a survey, try this:

c.acs5.tables()

census's People

Contributors

andytay, arturo-ramos, bahoo, bsweger, charlie-kramer, detinator10, dwbond, fgregg, hancush, javadocmd, jcarbaugh, jcgiuffrida, jeancochrane, jiffyclub, joehand, koshy1123, mattspence, maxghenis, mr-fuller, noahlee826, palewire, prha, rifferreinert, ronnie-llamado, rrizner7, ryanvmenezes, selik, sirwart, sweatercomeback, thegreatmagnet


census's Issues

error in fetch of 2010 SF1 data

I'm trying to write a routine that will pull the same data for the years 2010, 2000 and 1990. All of the data comes from the SF1 file. Here's how I wrote a query for 2010:

c.sf1.get(
    (
        'P005001', # total
        'P005003', # white
        'P005004', # black
        'P005006', # asian
        'P004003', # latino
    ),
    geo={'for':'place:53476', 'in':'state:06'},
    year=2010
)

which returns this:

[{'P005001': 7461.0,
  'P005003': 5754.0,
  'P005004': 36.0,
  'P005006': 152.0,
  'P004003': 1339.0,
  'state': '06',
  'place': '53476'}]

I know the above output is correct based on FactFinder.

I then tried to write a query to pull the same data points from the 2000 SF1. The variable names are different (here's what I'm looking at to refer to 2000 and 2010) so I had to tweak those slightly. But this query breaks:

c.sf1.get(
    (
        'P008001', # total
        'P008003', # white
        'P008004', # black
        'P008006', # asian
        'P008010', # latino
    ),
    geo={'for':'place:53476', 'in':'state:06'},
    year=2000
)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-50-98c9b83b2ca4> in <module>
      8     ),
      9     geo={'for':'place:53476', 'in':'state:06'},
---> 10     year=2000
     11 )

~/.virtualenvs/censusqueries/lib/python3.6/site-packages/census/core.py in get(self, *args, **kwargs)
    412 
    413     def get(self, *args, **kwargs):
--> 414         self._switch_endpoints(kwargs.get('year', self.default_year))
    415 
    416         return super(SF1Client, self).get(*args, **kwargs)

~/.virtualenvs/censusqueries/lib/python3.6/site-packages/census/core.py in _switch_endpoints(self, year)
    402             self.groups_url = 'https://api.census.gov/data/%s/dec/%s/groups.json'
    403         else:
--> 404             self.endpoint_url = super(ACSClient, self).endpoint_url
    405             self.definitions_url = super(ACSClient, self).definitions_url
    406             self.definition_url = super(ACSClient, self).definition_url

TypeError: super(type, obj): obj must be an instance or subtype of type

This appears to be a pure python error.

I've also tried doing this for 1990 (with different column names again) and the same thing happens.

Any help would be appreciated. Thanks.
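The TypeError in the traceback can be reproduced without the library. The sketch below assumes, as the error message suggests, that SF1Client is not a subclass of ACSClient; the class bodies are minimal stand-ins, not the package's code.

```python
class Client(object):
    endpoint_url = "base"


class ACSClient(Client):
    pass


class SF1Client(Client):
    def broken(self):
        # Names ACSClient, but self is an SF1Client: raises TypeError.
        return super(ACSClient, self).endpoint_url

    def fixed(self):
        # Naming the class the method lives in resolves normally.
        return super(SF1Client, self).endpoint_url


client = SF1Client()
try:
    client.broken()
    error = None
except TypeError as exc:
    error = str(exc)

value = client.fixed()
```

If that assumption holds, replacing ACSClient with SF1Client in the super() calls inside SF1Client._switch_endpoints would likely avoid the error.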

SyntaxError: invalid syntax

from census import states
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/census_api/src/census/census/states.py", line 59
FIPS = {v: k for (k, v) in STATES.items()}
^
SyntaxError: invalid syntax

I am using python 2.6.5 on Ubuntu 10.04
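Dict comprehensions were added in Python 2.7, which is why that line fails to parse on 2.6. A form that parses on both is sketched below; the STATES values here are a small illustrative subset, not the module's actual contents.

```python
# Illustrative subset; the real STATES mapping in the module is larger.
STATES = {"MD": "24", "AK": "02"}

# Python 2.7+ and 3.x: dict comprehension (the line that fails on 2.6).
FIPS = {v: k for (k, v) in STATES.items()}

# Equivalent form that also parses on Python 2.6.
FIPS_COMPAT = dict((v, k) for (k, v) in STATES.items())
```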

Census.ALL does not function

It appears that using Census.ALL instead of something of the form {'for': 'state:*'} does not work, even on the core example in the README:

>>> c.acs5.get('B01001_004E', Census.ALL)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/poulson/miniconda3/lib/python3.6/site-packages/census/core.py", line 298, in get
    return super(ACSClient, self).get(*args, **kwargs)
  File "/home/poulson/miniconda3/lib/python3.6/site-packages/census/core.py", line 155, in get
    merged_results = [merge(result) for result in zip(*all_results)]
  File "/home/poulson/miniconda3/lib/python3.6/site-packages/census/core.py", line 154, in <genexpr>
    for fifty_fields in chunks(fields, 50))
  File "/home/poulson/miniconda3/lib/python3.6/site-packages/census/core.py", line 64, in wrapper
    result = func(self, *args, **kwargs)
  File "/home/poulson/miniconda3/lib/python3.6/site-packages/census/core.py", line 170, in query
    'for': geo['for'],
TypeError: string indices must be integers

I installed the census API via pip: is it possible that this issue has already been fixed in master?
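The traceback is consistent with the wildcard being passed where the geography dict is expected. The sketch below reproduces that without the library; ALL is a stand-in for Census.ALL, assumed to be the "*" string the README equates it with, and describe_geo is a hypothetical function mimicking the geo['for'] lookup.

```python
ALL = "*"  # assumption: stands in for Census.ALL


def describe_geo(geo):
    """Mimic the `geo['for']` lookup shown in the traceback above."""
    return geo["for"]


# Passing the wildcard where the dict belongs fails with a TypeError.
try:
    describe_geo(ALL)
    failed = False
except TypeError:
    failed = True

# Using the wildcard as a value inside the dict works.
ok = describe_geo({"for": "state:{}".format(ALL)})
```

With the library, the equivalent README calls are c.acs5.get('B01001_004E', {'for': 'state:*'}) and c.acs5.state('B01001_004E', Census.ALL).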

What would be the best way to query and return all the variables available on a table?

Let's say I'm interested in exploring a topic I've never worked with before. Say, for example, Census data tracking languages spoken at home.

In this use case, even if I am a very skilled Python programmer (lol, remember this is a fantasy) and even if I have some experience with Census data, I am unlikely to know either the table id (in this case, B16001) or the fields on the table (e.g. that B16001_024E is the total number of people who speak Greek at home).

In a perfect world, I think this library would help the user find the table they're interested in, and then enumerate the fields within it.

I took a baby step in this direction with PR #61, which allows you to get a raw JSON list of all tables, but it's a long way from a solution. The good news is that all of the data we need to solve the problem is available in the Census API. https://api.census.gov/data/2016/acs/acs5/groups/B16001.json

If the crowd here agrees this is a worthy problem statement and use case, I'd like to start a discussion about how to solve it.
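As a starting point for that discussion, here is a minimal sketch of enumerating the fields in a group document. It assumes the JSON at the URL above has a top-level "variables" object keyed by variable name, with a "label" on each entry; list_fields is a hypothetical helper and the sample payload is hand-written, not real API output.

```python
import json


def list_fields(group_json):
    """Return sorted (variable, label) pairs from a decoded groups document.

    Assumes a top-level "variables" object keyed by variable name, each
    entry carrying a "label" string.
    """
    variables = group_json.get("variables", {})
    return sorted((name, info.get("label", "")) for name, info in variables.items())


# Tiny hand-written sample in the assumed shape; labels are placeholders.
sample = json.loads("""
{"variables": {
    "B16001_002E": {"label": "placeholder label B"},
    "B16001_001E": {"label": "placeholder label A"}
}}
""")

fields = list_fields(sample)
```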

Move to AIOHTTP (asyncio)

Would there be any interest in moving Census from Requests to python asyncio / AIOHTTP? I'm probably going to attempt to port this over so I can asynchronously dispatch multiple requests, unless there's a better way to optimize making a bunch of api requests.

Would it be worth me putting it on a branch for you guys to take a look at?
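One alternative that might not require porting away from Requests is a thread pool around the existing blocking calls. This is only a sketch: fetch_all and fake_fetch are hypothetical, and whether a single Census client or session can safely be shared across threads is not something established here.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(fetch_one, requests_args, max_workers=4):
    """Run `fetch_one(args)` for every item concurrently, preserving order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_one, requests_args))


def fake_fetch(state_fips):
    # Stand-in for a blocking API call such as c.acs5.state(...).
    return {"state": state_fips}


results = fetch_all(fake_fetch, ["24", "02", "42"])
```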

Stop supporting Python 2

Python 2 has been sunset effective 2020-01-01. It looks like supporting it here creates extra work, e.g. a test breaking #82.

Census Metadata function?

Is there any interest in a function to retrieve metadata (variables, names, labels, concepts, etc.) for datasets?
I've started working on something here.

You included a key with this request, however, it is not valid.

When I made an API request using the demo code, I received an APIKeyError. When I clicked the link in the email to activate the key, it redirected me to an error page saying "You've attempted to validate an unknown key. If it has been more than 48 hours since you submitted your request for this API key then the request has been removed from the system. Please request a new key and activate it within 48 hours." Has anyone run into the same issue and figured it out?

census.acs5_state does not work for years from 2010-2015

To whom it may concern:

I am trying to use census.acs5_state and census.acs5_state_place. They work for the years 2009 and 2016, but not for the years 2010 to 2015. Is there anything wrong with these years?

Thanks.

'census.acs5.get' can't retrieve data on county subdivision level for a certain county.

Dear datamade team,

I found out that c.acs5.state_county_subdivision can only retrieve data up to 2016, so I'm trying to retrieve data using c.acs5.get, as I can get 2017 this way. But I couldn't narrow the county subdivision results to just a specific county.

The code I used:

c.acs5.get(('NAME', 'B01001_001E', 'B01001_002E', 'B01001_003E', 'B01001_004E', 'B01001_005E', 
            'B01001_006E',   'B01001_026E', 'B01001A_001E', 'B01001B_001E'),
          {'for':'county subdivision:*', 'in':'county: 077', 'in':'state: 42'}, year = 2017)

But this will give me data for all county subdivisions in Pennsylvania while I only want the county subdivision data for Lehigh County, PA.

Is there any way to achieve that? Thanks.
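One thing worth noting about the call above: a Python dict literal keeps only the last value for a repeated key, so the county filter never reaches the API. The sketch below shows this and a combined in string modelled on the README's tract example; whether the combined form returns the expected 2017 data has not been verified here.

```python
# A dict literal keeps only the last value for a repeated key, so the
# county filter in the call above is silently discarded.
geo_as_written = {'for': 'county subdivision:*', 'in': 'county: 077', 'in': 'state: 42'}

# Both parents combined in one 'in' string, as in the README's tract example.
geo_combined = {'for': 'county subdivision:*', 'in': 'state:42 county:077'}
```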

Update year info in docs for ACS5

Currently the README has the default year for the ACS 5 Year Estimates as 2013 and that the latest year available is 2013. However, the ACS5Client supports 2014 and has it as the default.

These discrepancies should be resolved.

Parsing Variables Document (Client.fields)

In addition to issue #14, it looks like there are other issues parsing the variables documents in the Client.fields function. If I run it as is, the returned data is empty. It seems like the Census changed the format of the XML document.

I created a branch using the JSON variable documentation instead of XML. Let me know if I should open a PR for that. (edit: went ahead and opened PR #18)

But I couldn't quite figure out what the old code was trying to return when flat=False. Right now I return the variables as a nested object, which seemed right for flat=False.

Endpoint getting 404 on some years

Some queries are returning an HTML page (I believe from a 404 response) with the message "The requested resource is not available". Here is an example of a request that encounters this problem:

c = Census(my_api_key, year=2015)
results = c.acs5.state_county('C24010_041E', states.CA.fips, '037')
print(results)

But if I change the year to 2016, it works (returns results without any errors).

Hope this helps, let me know if there is any other info I can provide.

Raise unsupported geography if variable response returns null

If you request a variable that is not available at a specific geography, then the response returns null. I think this should cause the query to raise a CensusException.

Example: Median monthly housing costs is available at the tract level
http://api.census.gov/data/2014/acs5/?get=B25105_001E&for=tract:*&in=state:17%20county:001

but not at the block group level:

http://api.census.gov/data/2014/acs5/?get=B25105_001E&for=block%20group:*&in=state:17%20county:001%20tract:00100

If @jcarbaugh agrees, then I'll make a PR.
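A rough sketch of what such a check could look like follows. UnsupportedGeographyError and check_rows are hypothetical names (the proposal above is to raise CensusException), and the sample rows are made up for illustration.

```python
class UnsupportedGeographyError(Exception):
    """Hypothetical exception; the issue proposes raising CensusException."""


def check_rows(rows, fields):
    """Raise if any requested field came back as null (None) in any row."""
    for row in rows:
        missing = [f for f in fields if row.get(f) is None]
        if missing:
            raise UnsupportedGeographyError(
                "no data returned for {}".format(", ".join(missing))
            )
    return rows


# Made-up rows: a value at the tract level, null at the block group level.
tract_rows = [{"B25105_001E": 950.0, "tract": "000100"}]
blockgroup_rows = [{"B25105_001E": None, "block group": "1"}]

checked = check_rows(tract_rows, ["B25105_001E"])
try:
    check_rows(blockgroup_rows, ["B25105_001E"])
    raised = False
except UnsupportedGeographyError:
    raised = True
```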

acs5.state_county_blockgroup error

Exception Value: | error: unknown/unsupported geography heirarchy census\core.py in query, line 200
I'm not up to speed enough in Python to troubleshoot this, but the problem may be in the call: according to the Census API, tract is missing from that call:

https://api.census.gov/data/2017/acs/acs5/examples.html
state› county› tract› block group

Your command: state_county_tract(fields, state_fips, county_fips, tract)

Add install instructions

It would be great to have the installation instructions for the package, including installing from PyPI and building from source.

import in Client.__init__ doesn't make request visible to rest of class

I ran into an import-related problem using v 0.7 of census, which I document in https://github.com/rdhyee/working-open-data-2014/blob/postscript/notebooks/Day_XX_census_0.7_import_issue.ipynb

In summary: I think the problem comes from how the requests module is imported. The requests module is imported inside the Client.__init__ method to be subsequently used in the fields method. However, it doesn't seem requests is visible in that context, hence the exception. (My hypothesis is corroborated by Python import in init() - Stack Overflow.)
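The scoping behaviour behind this hypothesis can be shown in isolation. In the sketch below a standard-library module (shlex) stands in for requests so it runs anywhere; the Client class is a stand-in, not the package's code.

```python
class Client(object):
    def __init__(self):
        import shlex  # binds `shlex` only as a local name inside __init__
        self._shlex = shlex  # a reference kept on self stays reachable later

    def broken(self):
        # NameError: shlex was never bound at module level.
        return shlex.split("a b")

    def working(self):
        return self._shlex.split("a b")


client = Client()
try:
    client.broken()
    visible = True
except NameError:
    visible = False

parts = client.working()
```

The more usual remedy is simply to move the import to the top of the module.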

OriginDestinationDB changed

I've just been trying an old script of mine which imports python-census:
from census.origin_destination_db import OriginDestinationDB

But now it fails:
ModuleNotFoundError: No module named 'census.origin_destination_db'

Have you changed something here?

Can't Retrieve data at County level - Example please ?

I am trying to get population using ACS1 at a county level or a zip code level if possible.
ACS5 supports a method called zipcode which makes this possible as recommended here (#76). However, for my needs, I need this data through ACS1, so what I tried was :

   # Define a consensus class object to get population estimates 
   consensus_object = Census(api_key_census)

   # Get total population estimates 
   consensus_object.acs1.get(('NAME', 'B01003_001E'), {'for': 'county', 'in': 'county:36047, state:36'})

I tried to frame this similarly to #73. However, what I am unable to figure out is:

  1. Does ACS1 support this?
  2. If it does, can someone provide a working example?
  3. If there is no way to do this, can anyone suggest alternatives with this or a different package?

ACS5 2018 compatibility

2018 census data is now available.

I've edited my version of the module locally to get access.

But just need the bump in supported years:

328698d

Is it possible to query S###_C##_###E variables from ACS5?

I'm looking for household income distribution data from the S1901 table, e.g. S1901_C01_011E (households with income >$200k). I can get this from the API directly via https://api.census.gov/data/2018/acs/acs5/subject?get=S1901_C01_011E&for=us:*, but can't find a way to get it through the package.

For example, neither of these work:

c.acs5.get('S1901_C01_011E', {'for': 'state:*'})
c.acs5.get('S1901_011E', {'for': 'state:*'})

though the published example does:

c.acs5.get('B01001_004E', {'for': 'state:*'})

Is this possible?

Can this code please be documented?

I'm trying to figure out what each function does in order to use it. Some of the code seems to be written for both Python 2 and 3, so trying to follow the workflow can be confusing. Can this code be documented and have comments added so that it is readable?

Integer casting generates errors when API returns floats

Commit 179c9d5 is causing issues because the Census API is often returning floats for fields marked with predicateType 'int'.

It seems like the API is using the predicateType 'int' for all numerics. Fields that represent a numeric mean across a geo have this mismatch:

$ curl https://api.census.gov/data/2015/acs5/variables/B23020_002E.json
{
  "name": "B23020_002E",
  "label": "Mean usual hours--!!Male",
  "concept": "B23020.  Mean Hours Worked in the Past 12 Months for Workers 16 to 64 Years",
  "predicateType": "int",
  "group": "N/A",
  "limit": 0,
  "validValues": [
  ]
}

The Census endpoint is returning a value of '41.4' here.

Not sure what the right behavior should be here, because I'm not sure what the value of commit 179c9d5 is in the absence of consistent response handling.
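One possible behavior, sketched below, is to try an integer cast first and fall back to float when the string is not a whole number. cast_number is a hypothetical helper, not the library's code.

```python
def cast_number(raw):
    """Cast an API string to int when possible, otherwise to float.

    Returns None for null values. Hypothetical helper, not the library's code.
    """
    if raw is None:
        return None
    try:
        return int(raw)
    except ValueError:
        return float(raw)


mean_hours = cast_number("41.4")
count = cast_number("7461")
```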

get data for all block groups in a state

Hi, I want to be able to look at all the block groups in a state.

For example, to look at all the block groups in a county I would write:

c.acs5.state_county_blockgroup('NAME', states.AK.fips, '170', Census.ALL)

But if I try to replace the "county" argument with Census.ALL I get an error.

c.acs5.state_county_blockgroup('NAME', states.AK.fips, Census.ALL, Census.ALL)

Is there a way to do this?

AttributeError: type object 'Census' has no attribute 'ALL'

I tried running the Alaska example on the latest version (.3) using Python 2.7. Here's the example I used:

c.sf1.state_county_tract('NAME', states.AK.fips, '170', Census.ALL)

I received this error:

AttributeError: type object 'Census' has no attribute 'ALL'

Any thoughts?

Thanks!

support for state_place_tract, state_place_blockgroup, and state_place_block

I have extended the census class to implement methods for getting data on the tracts, blockgroups, and blocks that intersect with a census incorporated place.

https://github.com/datamade/census_area/blob/master/census_area/__init__.py

This uses the Census's TIGERweb REST APIs to get the geometry of a place and then find the tracts, block groups, or blocks within the place. Because their API limits the complexity of the target geometry used for the intersection, I also use shapely to calculate intersections locally.

Is this functionality that you would like in this library? Shapely is, unfortunately, a kind of big dependency.

Missing Block group data for 2016

When I run the code below,

c = Census("blhablahblah", year=2016)
cen = c.acs5.get(("NAME",
                  "B11001A_001E",
                  "B11001A_001M"),
                 {'for': 'tract:*',
                  'in': 'state:{} county:83'.format(states.PA.fips)})
cen_pd = pd.DataFrame(cen)
print(cen_pd.head())

I receive an empty dataframe:

Empty DataFrame
Columns: []
Index: []

But the package documentation at https://github.com/datamade/census mentions that you have 2016. Not sure where I am going wrong.
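One detail that may be worth checking: county FIPS codes are three digits, and the query above passes county:83 without a leading zero. Whether that is what causes the empty result here is not confirmed. The sketch below shows zero-padding with a hypothetical pad_fips helper.

```python
def pad_fips(code, width):
    """Left-pad a FIPS code with zeros to the given width."""
    return str(code).zfill(width)


county = pad_fips(83, 3)
geo = {'for': 'tract:*',
       'in': 'state:{} county:{}'.format(pad_fips(42, 2), county)}
```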

Documentation Cleanup (Zip)

Geographic elements like zipcode aren't named in an obvious way.
"zip code tabulation area" is an unwieldy way to reference zipcode.

release of .6

I'm hoping to integrate this api into a library which is used to get census data into Pandas dataframes. https://github.com/synthicity/popgen The current version of this library on pypi seems to have a few issues, but the version in github seems to be fine. Is there any way you guys could upload the latest version to pypi and call it a release? Thanks!

ACS5 2016 Block groups

I'm getting an unsupported year exception when trying to query acs5.state_county_blockgroup():

census.core.UnsupportedYearException: Geography is not available in 2016. Available years include (2015, 2014, 2013, 2012, 2011, 2010, 2009)

Support for 2016 release in ACS5 and ACS1 without breaking releases prior to 2015

The 2012-2016 5-year ACS data is available through the Census API.

The Census Bureau is testing some changes with an alternate endpoint. Switching to this endpoint for ACS1 has broken functionality for years 2014 and prior.

Can we add support for 2016 ACS5 without breaking functionality for 2014 and prior? Can the same fix be applied to the ACS1 issue?

Code Suffixes Ending in .5

Does anyone know if you can pull data for codes that end in .5 (example B25088_0.5)? It does not appear that you are able to pull data through the API call for codes that end with decimals, and I cannot find documentation that covers this situation.

Any insight would be greatly appreciated.

Update 2010 SF1 endpoint

I don't see it mentioned anywhere on the Census site, but I got an email that the 2010 SF1 endpoint is changing from /data/2010/sf1 to /data/2010/dec/sf1 and that the old endpoint will be removed August 30th. It looks like this could be addressed the same way as the _switch_endpoints method in ACSClient.

API returns value of -666666666

Some calls to acs5.state_county_tract() return the value -666666666. This seems like a semantically significant value, but I can't find documentation on what it means.

For a reproducible example, this call returns the value in question:

>>> c = Census(os.environ['CENSUS_API_KEY'], year=2016)
>>> c.acs5.state_county_tract('B19081_001E', 42, 101, '989100')

[{'B19081_001E': -666666666.0, 'state': '42', 'county': '101', 'tract': '989100'}]

Any thoughts on what this response means, and how I should handle it? So far I've just been converting them to nulls.
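For reference, the conversion to nulls described above can be sketched as follows. null_sentinels is a hypothetical helper, and the sketch takes no position on what the value means.

```python
SENTINEL = -666666666  # the value reported in this issue


def null_sentinels(rows, sentinel=SENTINEL):
    """Return copies of rows with the sentinel value replaced by None."""
    return [
        {key: (None if value == sentinel else value) for key, value in row.items()}
        for row in rows
    ]


rows = [{'B19081_001E': -666666666.0, 'state': '42', 'county': '101', 'tract': '989100'}]
cleaned = null_sentinels(rows)
```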
