Comments (23)

igorgeyn commented on May 10, 2024

from coronavirus-data.

stevenlis commented on May 10, 2024

@armsp you can get those data from the American Community Survey at https://data.census.gov/cedsci/

If you have very specific questions, you can ask the folks on the Census Bureau's Slack: https://www.census.gov/data/developers.html

stevenlis commented on May 10, 2024

@armsp https://github.com/nychealth/coronavirus-data#geography-zip-codes-and-zctas

igorgeyn commented on May 10, 2024

@StevenLi-DS I became curious about how well the ZCTAs in NYC map onto one particular Census geographic unit (the Public Use Microdata Area, or "PUMA") -- I have some experience working with these data in the PUMS context -- and found that there are 28 ZCTAs that have a maximum PUMA share of <60%. In other words, in each of these ZCTAs, less than 60% of the population is located in any single PUMA.

(Table below shows the results of this analysis.)

zcta   max_pop  min_pop  sum_pop  n  max_prop  min_prop
10003    26184     2966    56024  4  0.467371  0.052942
10037    10247     7169    17416  2  0.588367  0.411633
10306    29235    26674    55909  2  0.522903  0.477097
10451    22027     2684    45713  3  0.481854  0.058714
10456    42529     3875    86547  3  0.491398  0.044773
10457    34042     6419    70496  3  0.482893  0.091055
10458    47412     7873    79492  3  0.596437  0.099041
10461    24412     4852    50502  3  0.483387  0.096075
10462    41523      561    75784  3  0.547913  0.007403
10467    39973    24805    97060  3  0.411838  0.255564
10468    36204    11626    76103  3  0.475724  0.152767
10469    32167      899    66631  4  0.482763  0.013492
11206    37871    20525    81677  3  0.463668  0.251295
11207    45892      720    93386  4  0.491423  0.00771
11210    29799      124    62008  4  0.480567  0.002
11216    27628     2707    54316  3  0.508653  0.049838
11217    19970     3075    35881  3  0.556562  0.0857
11221    40226    38669    78895  2  0.509868  0.490132
11223    30790    20033    78731  3  0.391079  0.254449
11226    52928    20038   101572  3  0.521089  0.197279
11233    31278    15530    67053  3  0.466467  0.231608
11235    40583    38549    79132  2  0.512852  0.487148
11238    26568      589    49262  4  0.53932   0.011956
11368    58003    51928   109931  2  0.527631  0.472369
11370    22082     6515    39688  3  0.55639   0.164155
11419    24223    22988    47211  2  0.51308   0.48692
11432    35383    25426    60809  2  0.581871  0.418129
11435    29563    24124    53687  2  0.550655  0.449345
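For anyone who wants to reproduce a table like this, here is a minimal sketch of how the columns could be computed. It assumes the input is a list of (ZCTA, PUMA, population) overlap rows, which is my guess at the shape of the underlying data, not a description of the actual analysis. The PUMA labels and the ZCTA "99999" are made up; the populations for 10037 and 10306 are taken from the table.

```python
from collections import defaultdict

# Hypothetical overlap rows: (zcta, puma_label, population in the overlap).
# PUMA labels are placeholders; "99999" is an invented ZCTA for contrast.
overlaps = [
    ("10037", "puma_a", 10247),
    ("10037", "puma_b", 7169),
    ("10306", "puma_c", 29235),
    ("10306", "puma_d", 26674),
    ("99999", "puma_e", 900),
    ("99999", "puma_f", 100),
]

def puma_share_summary(rows):
    """Summarise, per ZCTA, how its population is split across PUMAs."""
    pops_by_zcta = defaultdict(list)
    for zcta, _puma, pop in rows:
        pops_by_zcta[zcta].append(pop)

    summary = {}
    for zcta, pops in pops_by_zcta.items():
        total = sum(pops)
        if total == 0:
            continue  # no population, so shares are undefined
        summary[zcta] = {
            "max_pop": max(pops),
            "min_pop": min(pops),
            "sum_pop": total,
            "n": len(pops),
            "max_prop": max(pops) / total,
            "min_prop": min(pops) / total,
        }
    return summary

def split_zctas(summary, threshold=0.6):
    """ZCTAs whose largest single-PUMA share is below the threshold."""
    return sorted(z for z, s in summary.items() if s["max_prop"] < threshold)

summary = puma_share_summary(overlaps)
flagged = split_zctas(summary)
print(flagged)
```

With these rows, 10037 and 10306 are flagged and the invented 99999 is not, since 90% of its population sits in one PUMA.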

Given that it would not make any sense to simply follow the population proportions in allocating ZCTA-level cases, I am wondering if you or anyone have experience doing a more sophisticated allocation in this context. Hope this helps; happy to chat further.

stevenlis commented on May 10, 2024

Dear @igorgeyn. Thanks for the links you shared.

Some researchers like me might prefer census-tract-level data because much of the ACS data, such as median household income, is available at that geographic level, and we can merge it with this COVID-19 dataset to better understand the relationship between its prevalence and communities with different demographic characteristics. Meanwhile, NYC has more than two thousand census tracts, so you would also end up with a much larger sample size than with either PUMAs or ZIP code areas.

Currently, I use the 2018 census-tract-level ACS 5-year estimates and a spatial join to determine all the census tracts within each ZIP code area, and then aggregate. For total population, I sum the tract values; for other variables I just take the mean.
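The aggregation step described above could look roughly like the sketch below. It assumes the spatial join has already produced one row per tract with its ZIP code area attached; the ZIP labels, populations, and income values are all hypothetical. Alongside the plain mean mentioned above, it also computes a population-weighted mean, which is an alternative I am adding for comparison, not something the original analysis used.

```python
from collections import defaultdict

# Hypothetical tract rows after the spatial join:
# (zip_area, tract_population, tract_median_household_income)
tracts = [
    ("10001", 1000, 50000.0),
    ("10001", 3000, 90000.0),
    ("10002", 2000, 40000.0),
]

def aggregate_by_zip(rows):
    """Sum population and average the other variable within each ZIP area."""
    groups = defaultdict(list)
    for zip_area, pop, value in rows:
        groups[zip_area].append((pop, value))

    result = {}
    for zip_area, members in groups.items():
        total_pop = sum(pop for pop, _ in members)
        # Plain mean: every tract counts equally, regardless of size.
        simple_mean = sum(value for _, value in members) / len(members)
        # Population-weighted mean: larger tracts count for more.
        weighted_mean = (
            sum(pop * value for pop, value in members) / total_pop
            if total_pop
            else None
        )
        result[zip_area] = {
            "population": total_pop,
            "simple_mean": simple_mean,
            "weighted_mean": weighted_mean,
        }
    return result

by_zip = aggregate_by_zip(tracts)
print(by_zip["10001"])
```

In this toy data the two tracts in "10001" differ a lot in size, so the plain mean (70000) and the weighted mean (80000) diverge. Note that neither is a true median for the ZIP code area when the variable being averaged is itself a median.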

One article from The Times mentioned:

The coronavirus has spread into virtually every corner of the city, and some wealthier neighborhoods have been overrun with cases, including some parts of Manhattan and Staten Island. But that may be because of the availability of testing in those areas. Nineteen of the 20 neighborhoods with the lowest percentage of positive tests have been in wealthy ZIP codes

Dr. Jessica Justman, an epidemiologist at Columbia University in Manhattan, said the numbers were most likely because many immigrants and low-income residents live with large families in small apartments and cannot isolate at home.

The data seems consistent with both statements:
[Figure: nyc-zipcode]

igorgeyn commented on May 10, 2024

@StevenLi-DS gotcha, that's helpful context. I think that's an interesting analysis, for sure. I am not as familiar with spatial joins between census tracts and ZIP codes/ZCTAs.

Are there cases in which ZIP codes/ZCTAs fall into multiple census tracts, similar to the situation I am seeing with ZCTA-to-PUMA? If so, how are you distributing the ZCTA-level cases among the census tracts in those situations?

Your analysis is indeed interesting and, funny enough, I've actually been spending time thinking about that NYT article as well, albeit from a bit of a different angle. (Namely, I've been working on a regression model to capture the variables you have visualized above as well as a few others that get at the general 'lower-income-immigrants-in-a-cramped-house' notion.) All that to say: great minds think alike! :)

stevenlis commented on May 10, 2024

@igorgeyn Most census tracts are smaller than a ZIP code area, so a ZIP code area will contain multiple tracts. Indeed, you would expect some tracts to cross the boundary between two ZIP code areas. I did a spatial join using only the centroid of each tract, but you could also do it with each tract's polygon.

I believe this data will eventually be available at the census tract level, so I won't try anything more complicated at this moment.
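To illustrate what a centroid-based join does with a tract that straddles a boundary, here is a small dependency-free sketch. It is not the code used for the analysis above (which presumably relied on a GIS library); the polygons, tract names, and ZIP labels are invented, and coordinates are treated as planar.

```python
def polygon_centroid(vertices):
    """Area-weighted centroid of a simple polygon (shoelace formula)."""
    twice_area = 0.0
    cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (3 * twice_area), cy / (3 * twice_area)

def point_in_polygon(point, vertices):
    """Ray-casting test: count edge crossings to the right of the point."""
    x, y = point
    inside = False
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        if (y0 > y) != (y1 > y):  # edge spans the horizontal line through y
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside

def assign_by_centroid(tract_polygons, zip_polygons):
    """Map each tract to the first ZIP area containing its centroid."""
    assignment = {}
    for tract, poly in tract_polygons.items():
        centroid = polygon_centroid(poly)
        assignment[tract] = next(
            (z for z, zp in zip_polygons.items() if point_in_polygon(centroid, zp)),
            None,
        )
    return assignment

# Hypothetical geometry: two adjacent square ZIP areas and three tracts.
zip_polygons = {
    "A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "B": [(2, 0), (4, 0), (4, 2), (2, 2)],
}
tract_polygons = {
    "t1": [(0.5, 0.5), (1.5, 0.5), (1.5, 1.5), (0.5, 1.5)],  # fully inside A
    "t2": [(1.5, 0), (3.5, 0), (3.5, 1), (1.5, 1)],          # straddles A and B
    "t3": [(3, 1), (4, 1), (4, 2), (3, 2)],                  # fully inside B
}

assignment = assign_by_centroid(tract_polygons, zip_polygons)
print(assignment)
```

Tract "t2" has a quarter of its area in "A", but its centroid falls in "B", so the whole tract is assigned to "B". That is the trade-off of the centroid approach: every tract gets exactly one ZIP area, at the cost of ignoring partial overlap.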

algorythmic commented on May 10, 2024

@StevenLi-DS

Some researchers like me might prefer census-tract-level data because much of the ACS data, such as median household income, is available at that geographic level

This data is available from the ACS by ZCTA as well, e.g.

https://censusreporter.org/data/table/?table=B19013&geo_ids=860|05000US36061,860|05000US36047,860|05000US36081,860|05000US36005,860|05000US36085

mmontesanonyc commented on May 10, 2024

We understand that census tracts are a more informative geography to evaluate neighborhood characteristics compared with ZIP codes. Unfortunately census tract assignment for each case requires a level of address cleaning that we are unable to perform daily at this time. We hope to provide census tract case counts in the future.

stevenlis commented on May 10, 2024

@mmontesanonyc Thanks for the information. I'm definitely looking forward to it!

stevenlis commented on May 10, 2024

Hi, @algorythmic. Thanks for the info. Indeed, I haven't worked with ZIP code data for a while, and it turns out the new data.census.gov has a bug: the ACS tables didn't show up when I was looking for them. I just confirmed it with the folks at the Census, and they are working on it. Right now you have to select the whole US to be able to select ACS tables.

@igorgeyn this should make life easier

stevenlis commented on May 10, 2024

@mmontesanonyc I know you must be very busy just getting the ZIP-code-level data cleaned before uploading it here. But do you know when (even a rough estimate) the census-tract-level data will be available?

stevenlis commented on May 10, 2024

@igorgeyn as we talked about before, each ZCTA would contain multiple tracts, so we need to know the specific tract of each case.

igorgeyn commented on May 10, 2024

@mmontesanonyc are you able to advise what it means for a case to be in a tract? Is it the address of residence/origin for a patient who tests for COVID-19 or something else?

stevenlis commented on May 10, 2024

@mmontesanonyc indeed, I do believe that is what it should be, since I would like to link it to other demographic data.

stevenlis commented on May 10, 2024

https://www.wusa9.com/article/news/health/coronavirus/why-zip-codes-are-an-imperfect-unit-to-measure-covi19-cases/65-12da1870-646c-4d6e-b88c-4f110582c1e3

mmontesanonyc commented on May 10, 2024

@mmontesanonyc are you able to advise what it means for a case to be in a tract? Is it the address of residence/origin for a patient who tests for COVID-19 or something else?

Yes. Along the lines of our note in the README about tests-by-zcta being by ZIP of residence, it would be the patient's residence.

armsp commented on May 10, 2024

@StevenLi-DS Sir, could you please point me to the link(s) where you got the "Median Household Income" data that you then combined with the ZCTA or MODZCTA data in this repository?
I am also looking for "Median Household Income" and "People per household" data by ZIP code.
It would be really helpful if you could point me to them.

armsp commented on May 10, 2024

@StevenLi-DS Thank you so much. Since you have already done some work on "Median Income" and "Household Size", I was wondering whether you could point me towards those data. I am exploring the website as we speak, but all I see are aggregate values. I am not seeing anything per ZIP code or a proper dataset.

stevenlis commented on May 10, 2024

@armsp you should ask the folks on the Census Slack; the variables are likely stored in different tables. There isn't an easy way to share them with you via a single link.

armsp commented on May 10, 2024

@StevenLi-DS I found a way to get the data.
The only thing I am stuck with is, here we have COVID cases per MODZCTA. But the census gives data per ZIP Code. Based on my understanding, MODZCTA combines a few Zip Codes together.
In such a case, how do I aggregate "Median Household Incomes" from multiple ZIP Codes to get the "Median Household Income" for a particular MODZCTA?
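One possible approach, offered as a sketch and not as an established answer from this thread: a median cannot be recovered exactly from other medians, so instead of averaging the ZIP-level medians you could pool household counts by income bracket across the ZIP codes in a MODZCTA and interpolate the median from the pooled distribution. The ACS publishes bracketed household income counts (table B19001, as far as I know), but the brackets and counts below are hypothetical and simplified, and linear interpolation within the bracket is an assumption.

```python
def pooled_median_from_bins(bin_edges, counts_per_area):
    """Estimate a median by pooling bracket counts and interpolating linearly.

    bin_edges: list of (lower, upper) income bounds, one per bracket.
    counts_per_area: one list of household counts per ZIP code, aligned
    with bin_edges.
    """
    pooled = [sum(column) for column in zip(*counts_per_area)]
    total = sum(pooled)
    if total == 0:
        return None

    half = total / 2
    cumulative = 0
    for (lower, upper), count in zip(bin_edges, pooled):
        if count and cumulative + count >= half:
            # Assume households are spread evenly within the bracket.
            fraction = (half - cumulative) / count
            return lower + fraction * (upper - lower)
        cumulative += count
    return None

# Hypothetical brackets (real ACS brackets are finer, and the top one is
# open-ended, which would need separate handling).
bin_edges = [(0, 25000), (25000, 50000), (50000, 100000), (100000, 200000)]

# Hypothetical household counts for two ZIP codes in the same MODZCTA.
zip_a = [10, 20, 50, 20]
zip_b = [30, 30, 30, 10]

modzcta_median = pooled_median_from_bins(bin_edges, [zip_a, zip_b])
print(modzcta_median)
```

Here the pooled counts are [40, 50, 80, 30], so the 100th of 200 households falls in the 50000 to 100000 bracket and the estimate is 56250. If only the medians are available, a household-weighted average of them is a rougher fallback, and it should be labeled as an approximation.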
