Comments (23)
from coronavirus-data.
@armsp you can get those data via American Community Survey from https://data.census.gov/cedsci/
If you have very specific questions, you can ask folks on the census' slack: https://www.census.gov/data/developers.html
from coronavirus-data.
@armsp https://github.com/nychealth/coronavirus-data#geography-zip-codes-and-zctas
from coronavirus-data.
@StevenLi-DS I became curious about how well the ZCTAs in NYC map onto one particular Census geographic unit (the Public Use Microdata Area, or "PUMA") -- I have some experience with working with these data in the PUMS context -- and found that there are 28 ZCTAs that have a maximum PUMA share of <60%. In other words, no more than 60% of the population in each of these ZCTAs is located in a single PUMA.
(Table below shows the results of this analysis.)
zcta | max_pop | min_pop | sum_pop | n | max_prop | min_prop |
---|---|---|---|---|---|---|
10003 | 26184 | 2966 | 56024 | 4 | 0.467371 | 0.052942 |
10037 | 10247 | 7169 | 17416 | 2 | 0.588367 | 0.411633 |
10306 | 29235 | 26674 | 55909 | 2 | 0.522903 | 0.477097 |
10451 | 22027 | 2684 | 45713 | 3 | 0.481854 | 0.058714 |
10456 | 42529 | 3875 | 86547 | 3 | 0.491398 | 0.044773 |
10457 | 34042 | 6419 | 70496 | 3 | 0.482893 | 0.091055 |
10458 | 47412 | 7873 | 79492 | 3 | 0.596437 | 0.099041 |
10461 | 24412 | 4852 | 50502 | 3 | 0.483387 | 0.096075 |
10462 | 41523 | 561 | 75784 | 3 | 0.547913 | 0.007403 |
10467 | 39973 | 24805 | 97060 | 3 | 0.411838 | 0.255564 |
10468 | 36204 | 11626 | 76103 | 3 | 0.475724 | 0.152767 |
10469 | 32167 | 899 | 66631 | 4 | 0.482763 | 0.013492 |
11206 | 37871 | 20525 | 81677 | 3 | 0.463668 | 0.251295 |
11207 | 45892 | 720 | 93386 | 4 | 0.491423 | 0.00771 |
11210 | 29799 | 124 | 62008 | 4 | 0.480567 | 0.002 |
11216 | 27628 | 2707 | 54316 | 3 | 0.508653 | 0.049838 |
11217 | 19970 | 3075 | 35881 | 3 | 0.556562 | 0.0857 |
11221 | 40226 | 38669 | 78895 | 2 | 0.509868 | 0.490132 |
11223 | 30790 | 20033 | 78731 | 3 | 0.391079 | 0.254449 |
11226 | 52928 | 20038 | 101572 | 3 | 0.521089 | 0.197279 |
11233 | 31278 | 15530 | 67053 | 3 | 0.466467 | 0.231608 |
11235 | 40583 | 38549 | 79132 | 2 | 0.512852 | 0.487148 |
11238 | 26568 | 589 | 49262 | 4 | 0.53932 | 0.011956 |
11368 | 58003 | 51928 | 109931 | 2 | 0.527631 | 0.472369 |
11370 | 22082 | 6515 | 39688 | 3 | 0.55639 | 0.164155 |
11419 | 24223 | 22988 | 47211 | 2 | 0.51308 | 0.48692 |
11432 | 35383 | 25426 | 60809 | 2 | 0.581871 | 0.418129 |
11435 | 29563 | 24124 | 53687 | 2 | 0.550655 | 0.449345 |
Given that it would not make any sense to simply follow the population proportions in allocating ZCTA-level cases, I am wondering if you or anyone else has experience doing a more sophisticated allocation in this context. Hope this helps; happy to chat further.
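For anyone who wants to reproduce a table like the one above, here is a minimal sketch of the per-ZCTA summary. It assumes a crosswalk of `(zcta, puma, population)` rows, which is a hypothetical layout; the PUMA labels and the `99999` ZCTA in the example are placeholders, and only the two populations for ZCTA 10037 come from the table.

```python
from collections import defaultdict


def summarize_puma_shares(crosswalk):
    """Summarize how each ZCTA's population is split across PUMAs.

    crosswalk: iterable of (zcta, puma, population) rows. This layout is an
    assumption for illustration, not the format of any particular file.
    """
    pops = defaultdict(list)
    for zcta, _puma, population in crosswalk:
        pops[zcta].append(population)

    summary = {}
    for zcta, values in pops.items():
        total = sum(values)
        summary[zcta] = {
            "max_pop": max(values),
            "min_pop": min(values),
            "sum_pop": total,
            "n": len(values),                 # number of PUMAs the ZCTA touches
            "max_prop": max(values) / total,  # share in the largest PUMA
            "min_prop": min(values) / total,  # share in the smallest PUMA
        }
    return summary


def split_zctas(summary, threshold=0.60):
    """Return ZCTAs whose largest PUMA share is below the threshold."""
    return sorted(z for z, s in summary.items() if s["max_prop"] < threshold)


# ZCTA 10037 uses the two populations from the table; PUMA labels are placeholders.
rows = [
    ("10037", "PUMA_A", 10247),
    ("10037", "PUMA_B", 7169),
    ("99999", "PUMA_C", 900),  # made-up ZCTA that sits mostly in one PUMA
    ("99999", "PUMA_D", 100),
]
summary = summarize_puma_shares(rows)
flagged = split_zctas(summary)
```

With these rows only 10037 is flagged, since its largest PUMA holds about 59% of its population.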
from coronavirus-data.
Dear @igorgeyn. Thanks for the links you shared.
Some researchers, like me, might prefer census-tract-level data because many ACS variables, such as median household income, are available at that geographic level, and we can merge them with this COVID-19 dataset to better understand the relationship between prevalence and communities with different demographic characteristics. Also, NYC has more than two thousand census tracts, so you would end up with a much larger sample size than with either PUMAs or ZIP code areas.
Currently, I use the 2018 ACS 5-year estimates at the census-tract level and a spatial join to determine all the census tracts within each ZIP code area, and then aggregate. For total population, I sum the tract values; for other variables, I just take the mean.
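A minimal sketch of that aggregation step, assuming the spatial join has already tagged each tract with a ZIP code. The field names (`zip`, `population`, `median_income`) and the numbers are hypothetical. The population-weighted mean is shown only as a possible alternative to the plain mean; it is not part of the approach described above.

```python
from collections import defaultdict


def aggregate_tracts(tracts):
    """Aggregate tract-level records to ZIP code areas.

    tracts: list of dicts with hypothetical keys 'zip', 'population'
    and 'median_income'.
    """
    groups = defaultdict(list)
    for tract in tracts:
        groups[tract["zip"]].append(tract)

    result = {}
    for zip_code, members in groups.items():
        total_pop = sum(m["population"] for m in members)
        # Plain mean across tracts, as described in the comment above.
        simple_mean = sum(m["median_income"] for m in members) / len(members)
        # Alternative: weight each tract by its population (None if empty).
        weighted_mean = (
            sum(m["median_income"] * m["population"] for m in members) / total_pop
            if total_pop > 0
            else None
        )
        result[zip_code] = {
            "population": total_pop,
            "income_simple_mean": simple_mean,
            "income_weighted_mean": weighted_mean,
        }
    return result


# Made-up tracts that the spatial join placed in the same ZIP code area.
example = [
    {"zip": "ZIP_A", "population": 1000, "median_income": 50000},
    {"zip": "ZIP_A", "population": 3000, "median_income": 90000},
]
aggregated = aggregate_tracts(example)
```

In this example the plain mean is 70,000 and the weighted mean is 80,000, which shows how much the choice can matter when tracts differ a lot in size.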
One article from The Times mentioned:
The coronavirus has spread into virtually every corner of the city, and some wealthier neighborhoods have been overrun with cases, including some parts of Manhattan and Staten Island. But that may be because of the availability of testing in those areas. Nineteen of the 20 neighborhoods with the lowest percentage of positive tests have been in wealthy ZIP codes
Dr. Jessica Justman, an epidemiologist at Columbia University in Manhattan, said the numbers were most likely because many immigrants and low-income residents live with large families in small apartments and cannot isolate at home.
The data seems consistent with both statements:
from coronavirus-data.
@StevenLi-DS gotcha, that's helpful context. I think that's an interesting analysis, for sure. I am not as familiar with spatial joins between census tracts and ZIP codes/ZCTAs.
Are there cases there in which ZIP codes/ZCTAs fall into multiple census tracts, similar to the situation I am seeing with ZCTA-to-PUMA? If so, how are you distributing the ZCTA-level cases among the census tracts in those situations?
Your analysis is indeed interesting and, funny enough, I've actually been spending time thinking about that NYT article as well, albeit from a bit of a different angle. (Namely, I've been working on a regression model to capture the variables you have visualized above as well as a few others that get at the general 'lower-income-immigrants-in-a-cramped-house' notion.) All that to say: great minds think alike! :)
from coronavirus-data.
@igorgeyn Since most census tracts are smaller than a ZIP code area, a ZIP code area will contain multiple tracts. Indeed, you would expect some tracts to cross the boundary between two ZIP code areas. I did a spatial join using only the centroid of each tract, but you could also do it with each tract's polygon.
I believe this data will eventually be available on a census tract level so I won't try anything more complicated at this moment.
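To illustrate the centroid-based join, here is a small pure-Python sketch. It assumes simple polygons given as lists of planar `(x, y)` vertices without holes; the shapes and IDs are made up. Real boundary files would more likely be handled with a GIS library, so this only shows the idea.

```python
def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon given as [(x, y), ...]."""
    area2 = 0.0  # twice the signed area (shoelace formula)
    cx = cy = 0.0
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (3.0 * area2), cy / (3.0 * area2)


def point_in_polygon(point, ring):
    """Ray-casting test: count edge crossings to the right of the point."""
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        if (y0 > y) != (y1 > y):  # edge spans the horizontal line through y
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside


def assign_tracts_to_zips(tracts, zips):
    """Map each tract ID to the first ZIP area containing its centroid."""
    assignment = {}
    for tract_id, ring in tracts.items():
        centroid = polygon_centroid(ring)
        assignment[tract_id] = next(
            (z for z, zip_ring in zips.items() if point_in_polygon(centroid, zip_ring)),
            None,  # centroid falls outside every ZIP area
        )
    return assignment


# Made-up geometry: two adjacent square ZIP areas and two tracts.
zips = {
    "Z1": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "Z2": [(2, 0), (4, 0), (4, 2), (2, 2)],
}
tracts = {
    "T1": [(0.5, 0.5), (1.5, 0.5), (1.5, 1.5), (0.5, 1.5)],  # fully inside Z1
    "T2": [(1.5, 0.5), (3.5, 0.5), (3.5, 1.5), (1.5, 1.5)],  # straddles Z1 and Z2
}
assignment = assign_tracts_to_zips(tracts, zips)
```

Tract `T2` crosses the boundary but is assigned entirely to `Z2` because its centroid lies there. That is the simplification a centroid join makes compared with a polygon overlay.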
from coronavirus-data.
@StevenLi-DS
Some researchers like me might prefer census-tract level data is because many data in ACS is available on that geo level such as median household income
This data is available from the ACS by ZCTA as well, e.g.
from coronavirus-data.
We understand that census tracts are a more informative geography to evaluate neighborhood characteristics compared with ZIP codes. Unfortunately census tract assignment for each case requires a level of address cleaning that we are unable to perform daily at this time. We hope to provide census tract case counts in the future.
from coronavirus-data.
@mmontesanonyc Thanks for the information. I'm definitely looking forward to it!
from coronavirus-data.
Hi, @algorythmic. Thanks for the info. Indeed, I haven't worked with ZIP code data for a while, and it turns out the new data.census.gov has a bug: the ACS tables didn't show up when I was looking for them. I just confirmed it with the folks at the Census, and they are working on it. Right now you have to select the whole US to be able to select ACS tables.
@igorgeyn this should make life easier
from coronavirus-data.
@mmontesanonyc I know you guys must be very busy just getting the ZIP-code-level data cleaned before uploading it here. But do you know when (even an estimated time) the census-tract-level data will be available?
from coronavirus-data.
@igorgeyn as we talked about before, each ZCTA contains multiple tracts, so we need to know the specific tract of each case.
from coronavirus-data.
@mmontesanonyc are you able to advise what it means for a case to be in a tract? Is it the address of residence/origin for a patient who tests for COVID-19 or something else?
from coronavirus-data.
@mmontesanonyc indeed, I do believe that is what it should be, since I would like to link it to other demographic data.
from coronavirus-data.
@mmontesanonyc are you able to advise what it means for a case to be in a tract? Is it the address of residence/origin for a patient who tests for COVID-19 or something else?
Yes. Along the lines of our note in Readme about tests-by-zcta being by ZIP of residence, it would be patients' residence.
from coronavirus-data.
@StevenLi-DS Sir, could you please point me to the link(s) where you got the "Median Household Income" data that you then combined with the ZCTA or MODZCTA data in this repository?
I am also looking for "Median Household Income" by zip codes and "People per household" by zip codes data.
It would be really helpful if you could point it out.
from coronavirus-data.
@StevenLi-DS Thank you so much. I was wondering whether, since you have already done some work on "Median Income" as well as "Household Size", you could point me towards those data. I am exploring the website as we speak, but all I see are aggregate values. I am not seeing anything per ZIP code or a proper dataset.
from coronavirus-data.
@armsp you should ask the folks on the Census Slack; the variables are likely to be stored in different tables. There isn't an easy way to share them with you via a single link.
from coronavirus-data.
@StevenLi-DS I found a way to get the data.
The only thing I am stuck with is, here we have COVID cases per MODZCTA. But the census gives data per ZIP Code. Based on my understanding, MODZCTA combines a few Zip Codes together.
In such a case, how do I aggregate "Median Household Incomes" from multiple ZIP Codes to get the "Median Household Income" for a particular MODZCTA?
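One rough way to approach this, sketched under stated assumptions: a median cannot be recovered exactly from other medians, so the code below only approximates a MODZCTA value as the household-weighted mean of its member ZCTA medians. The crosswalk dictionary, the row layout, and all values are hypothetical. A more faithful estimate would need the binned household-income counts for each ZCTA.

```python
from collections import defaultdict


def approx_modzcta_income(zcta_rows, zcta_to_modzcta):
    """Approximate median household income per MODZCTA.

    zcta_rows: iterable of (zcta, households, median_income) tuples.
    zcta_to_modzcta: dict mapping each ZCTA to its MODZCTA.
    Both layouts are assumptions for illustration. The result is a
    household-weighted mean of medians, not a true median.
    """
    totals = defaultdict(lambda: [0.0, 0])  # [weighted income sum, households]
    for zcta, households, median_income in zcta_rows:
        modzcta = zcta_to_modzcta.get(zcta)
        # Skip ZCTAs without a mapping, a reported median, or any households.
        if modzcta is None or median_income is None or households <= 0:
            continue
        totals[modzcta][0] += median_income * households
        totals[modzcta][1] += households
    return {m: weighted / count for m, (weighted, count) in totals.items()}


# Made-up crosswalk and values: two ZCTAs roll up into one MODZCTA.
crosswalk = {"Z_A": "M_1", "Z_B": "M_1", "Z_C": "M_2"}
rows = [
    ("Z_A", 1000, 60000),
    ("Z_B", 3000, 100000),
    ("Z_C", 500, 75000),
    ("Z_D", 800, 50000),  # no MODZCTA mapping, so it is skipped
]
income = approx_modzcta_income(rows, crosswalk)
```

Here `M_1` comes out at 90,000, pulled towards the larger ZCTA, while `M_2` simply keeps its single ZCTA's value.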
from coronavirus-data.