
NYC Coronavirus Disease 2019 (COVID-19) Data

This repository contains data on Coronavirus Disease 2019 (COVID-19) in New York City (NYC). The Health Department classifies the start of the COVID-19 outbreak in NYC as the date of the first laboratory-confirmed case, February 29, 2020.

You can view visualizations of these data on the Health Department’s COVID-19 Data webpage. Additional data related to COVID-19 are available via NYC Open Data.

Data are preliminary and subject to change. Information on this page will change as data and documentation are updated. Tables are updated either weekly on Thursday (at a 3-day lag or with data through the previous Saturday) or monthly (at a 14-day lag).


This Readme includes:

  • How to use this repository
  • Important changes (by date)
  • Key Technical notes
  • Contents

How to use this repository

This repository contains CSV (comma-separated values) files of data, and Readme files with important documentation of the data. If you are unfamiliar with GitHub, you may find these instructions helpful:

To download data, scroll up to the green button labelled "Code." Clicking this button and then selecting "Download ZIP" will start a download of a ZIP file of the entire contents of this repository.

Alternatively, you can download a single file. Click on the file you would like to download. Next, click the "Raw" button. Right-click and save the page as a CSV file.

For help understanding a file, consult the documentation provided in the Readme file for each folder of data. To find a Readme file, click on a folder name above and scroll down. Documentation is organized by file name, so you can scroll through the Readme, find the name of the file you are using, and read its documentation. Documentation that applies to all files is provided in the Key Technical Notes.

Questions and custom requests: We will try to answer questions about the data in this repository as we are able. If you have a question, please search the Issues to see if it has already been addressed. Please understand that we are responding to a pandemic and might not be able to address all questions in a timely manner. We are not able to accommodate custom data requests placed via GitHub.


Update on October 5, 2023

Due to changes in reporting requirements, vaccination data are incomplete. We will no longer be presenting data by vaccination status, and we will be discontinuing updates to our now-weekly-breakthrough file.

Update on June 1, 2023

Because the federal public health emergency for COVID-19 has ended, labs are no longer required to report negative SARS-CoV-2 test results. Multiple labs have stopped reporting these results, so we are no longer able to accurately calculate percent positivity and testing rates for COVID-19. The following tables will no longer be updated:

  • antibody-by-group.csv
  • antibody-by-modzcta.csv
  • antibody-by-week.csv
  • last7days-by-modzcta.csv
  • now-testing-by-age.csv
  • now-tests.csv
  • pp-by-modzcta.csv
  • percentpositive-by-modzcta.csv
  • testing-by-age.csv
  • testrate-by-modzcta.csv
  • tests.csv
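Percent positivity is the share of reported diagnostic tests that are positive. The minimal Python sketch below uses made-up weekly counts (not NYC data) and a hypothetical helper, `percent_positivity`, to illustrate why the metric breaks down when negative results stop being reported: the denominator shrinks while the numerator does not, so positivity is overstated.

```python
def percent_positivity(positive_tests: int, total_tests: int) -> float:
    """Percent of reported tests that were positive."""
    if total_tests == 0:
        raise ValueError("no tests reported")
    return 100 * positive_tests / total_tests


# Hypothetical counts for one week (illustrative only, not real data).
positives = 500
negatives = 9_500

complete = percent_positivity(positives, positives + negatives)
# Suppose half of the negative results are no longer reported.
incomplete = percent_positivity(positives, positives + negatives // 2)

print(f"all negatives reported:    {complete:.1f}%")    # 5.0%
print(f"half of negatives missing: {incomplete:.1f}%")  # 9.5%
```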

Update on May 11, 2023

The Health Department made several changes to this repository and the COVID-19 Data webpage on May 11, 2023. These include:

  • We are applying a revised COVID-19 death definition, and COVID-19 deaths will no longer be classified as confirmed or probable. Starting April 3, 2023, a death will be counted as a COVID-19 death if:
    • the death certificate lists COVID-19 or an equivalent term as the underlying or a contributing cause of death, or
    • a case investigation for a confirmed, probable, or suspect COVID-19 case determined that COVID-19 was the cause of death or contributed to the death.
  • In line with the death definition change, total deaths will be presented instead of confirmed and probable deaths in all graphs and tables.
  • We have revised the tables for weekly hospitalization and death rates by age and race/ethnicity to correctly categorize people with COVID-19 based on date of event (hospital admission or death). These files previously categorized people with COVID-19 based on date of COVID-19 diagnosis. Trends using the updated data remain the same.

Update on April 3, 2023

Starting the week of April 3, 2023, the Health Department will update data in this repository and on the COVID-19 Data webpage weekly on Thursdays.

Update on October 28, 2022

The Health Department uploaded historical probable deaths among NYC residents that were recently reported from other jurisdictions. These deaths occurred in 2020, 2021, and early 2022; COVID-19 was listed on the death certificate, but there was no corresponding positive laboratory test.

Update on September 14, 2021

  • Antibody-by-age, -by-boro, -by-poverty, and -by-sex data are now found in antibody-by-group.csv. The other files will no longer be updated.
  • by-age, by-boro, by-poverty, by-race, and by-sex will no longer be updated. Those data are now found in by-group.csv.
  • deaths-by-boro-age, deaths-by-underlying-conditions, and probable-confirmed-by-age, -by-location, -by-boro, -by-race, and -by-sex are no longer updated.

Important: Update on August 11, 2021

As indicated in commit notes from 8/11, there were technical issues with our data processing on 8/9 and 8/10. While these issues were being fixed, data updates were paused for those days. Data updated on 8/11 include backfill for days with no updates. Differences in counts in cumulative files from 8/8 to 8/11 reflect events that have happened over a broad recent time period and should not be interpreted as events that have happened since the previous update. As always, data are preliminary and subject to change, and dates are backfilled as additional data come in.

Important: Update on August 2, 2021

The Health Department made several changes to this repository and the COVID-19 Data webpage on August 2, 2021. These include:

  • Adding weekly case, hospitalization, and death rate files by race/ethnicity and age
  • Adding 7-day transmission rates for the city, boroughs, and UHF42 neighborhoods
  • Revising case, hospitalization, and death rates to reflect both confirmed and probable cases

Important: Update on June 10, 2021

The Health Department made several changes to this repository and the COVID-19 Data webpage on June 10, 2021. These include:

  • Adding data on SARS-CoV-2 variants, including the number and type of variants identified in NYC over time
    • Please see the technical notes for a description of SARS-CoV-2 variants and genomic sequencing in NYC
    • Data on SARS-CoV-2 variants will be updated weekly on Thursday (with data through the previous Saturday)

Important: Update on March 24, 2021

As indicated in commit notes from 3/19 and 3/20, there were technical issues in the data transmission from New York State to New York City. This resulted in counts that were lower than expected for several days. While this transmission error was being fixed, data updates were paused for 3/21, 3/22, and 3/23. Data updated on 3/24 include backfill for days with low counts and days with no updates. Differences in counts in cumulative files from 3/20 to 3/24 reflect events that have happened over a broad recent time period and should not be interpreted as events that have happened since the previous update. As always, data are preliminary and subject to change, and dates are backfilled as additional data come in.


Important: Changes on March 3, 2021

The Health Department made several changes to this repository and the COVID-19 Data webpage on March 3, 2021. These include:

  • Adding rates of hospitalizations and confirmed deaths by modified ZIP code tabulation areas
    • Case rates will be calculated for the most recent 28 days on a 14-day lag, and updated daily
  • Adding trends in rates of hospitalizations and confirmed deaths by multiple geographies
    • Case rates will be calculated for each month on a 14-day lag, and updated monthly

Important: Changes on December 7, 2020

The Health Department made several changes to this repository and the COVID-19 Data webpage on December 7, 2020. These include:

  • Changing the naming convention for the main categories of COVID-19 tests

    • Data referenced in this repository as "molecular tests" correspond to data previously labeled as "PCR tests." Please see the technical notes for a description of the different types of COVID-19 laboratory tests
  • Including probable COVID-19 cases in summary data and epi curves

    • Please see the technical notes for a description of the different case definitions for COVID-19
    • Cumulative data, such as case, hospitalization, and death rates, are for confirmed COVID-19 cases only
  • Updating all files by geography to reflect a revised geocoding process

    • The Health Department receives borough of residence and other address information as part of routine reportable disease surveillance. These geographic data require substantial cleaning prior to analysis and presentation, and the data in this repository reflect an updated geocoding process as of December 7, 2020. Therefore, counts and rates by geographies (i.e., borough, modified ZIP code tabulation area) may change slightly.

Important: Changes on November 9, 2020

In order to support an update to the Health Department’s COVID-19 Data webpage on November 9, 2020, changes were made to this repository, including revisions to some key files, filenames, and locations. These changes include:

  • Adding daily 7-day cumulative percent positivity by modified ZIP code tabulation areas
  • Adding test rate and percent positivity by age
  • Adding testing turnaround time
  • Using more granular categories for the display of data by age
  • Revising the output and presentation of data by borough
  • Reclassifying files as Latest, Trends, or Totals
  • Adding files with the now- prefix that truncate trend data to the last 90 days to display data with a focus on recent changes
  • Changing the organization of select data elements as outlined in the crosswalk below:
    • Prior file name(s) → New file name(s) (New file location):
    • boro.csv → by-boro.csv (Totals/)
    • case-hosp-death.csv → data-by-day.csv (Trends/)
    • tests-by-zcta → data-by-modzcta.csv (Totals/)
    • boro/boroughs-case-hosp-death.csv → data-by-day.csv (Trends/)
    • boroughs-by-age.csv, boroughs-by-race.csv, boroughs-by-sex.csv → group-data-by-boro.csv, group-case-by-boro.csv, group-hosp-by-boro.csv, group-deaths-by-boro.csv (Totals/)
    • deaths/probable-confirmed-dod.csv → data-by-day.csv (Trends/)
    • sydromic_data.csv → covid-like-illness.csv (Trends/)
    • recent-4-week-citywide.csv → similar data available in now-summary.csv (Latest/)
    • recent-4-week-by-modzcta.csv → similar data available in caserate-by-modzcta.csv, testrate-by-modzcta.csv, percentpositive-by-modzcta.csv (Trends/)


Key Technical Notes

Public health reporting

Reporting lag

Our data are updated either weekly on Thursday (at a 3-day lag or with data through the previous Saturday) or monthly (at a 14-day lag). For example, a 3-day lag means that the most recent data in the update are from three days before. These lags are due to standard delays (up to several days) in reporting a new test, case, hospitalization or death to the Health Department, and are a common limitation of surveillance data. Given the delay, our counts of what has happened in the most recent few days are artificially small. We delay publishing these data until more reports have come in and the data are more complete.
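The lag arithmetic can be sketched in a few lines of Python. This is an illustration, not the Health Department's pipeline; the function names are hypothetical, and the example date (Thursday, October 5, 2023) was chosen only because it falls on a Thursday.

```python
from datetime import date, timedelta


def cutoff_with_lag(publish_date: date, lag_days: int) -> date:
    """Most recent date of data included when publishing at a fixed lag."""
    return publish_date - timedelta(days=lag_days)


def previous_saturday(publish_date: date) -> date:
    """The Saturday strictly before publish_date (weekday(): Monday=0 ... Saturday=5)."""
    days_back = (publish_date.weekday() - 5) % 7 or 7
    return publish_date - timedelta(days=days_back)


thursday = date(2023, 10, 5)
print(cutoff_with_lag(thursday, 3))   # 2023-10-02 (3-day lag)
print(previous_saturday(thursday))    # 2023-09-30 (data through the previous Saturday)
print(cutoff_with_lag(thursday, 14))  # 2023-09-21 (14-day lag)
```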

Report date versus date of event

Due to lags common with surveillance data, we receive reports of events (diagnoses, hospitalizations and deaths) that happened on past days. We publish trend data (e.g., case-hosp-death.csv) using date of event (date of diagnosis, date of hospitalization or date of death), not date of report. This approach may differ from the data published by other state and local health departments.

Publishing data by date of event better reflects when things actually happened (e.g., when a person went to the doctor to get tested), as opposed to when the Health Department learned about them. We strongly discourage data users from using daily changes to cumulative files as trend data – this represents information by report date and is prone to misuse and misinterpretation.
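To make the distinction concrete, the Python sketch below uses four hypothetical case records, each with a date of diagnosis and a date the report reached the Health Department. Counting by event date and by report date gives different daily series, and subtracting two cumulative snapshots (the practice discouraged above) yields a count by report date, not a count of new diagnoses.

```python
from collections import Counter
from datetime import date

# Hypothetical case records: (date of diagnosis, date reported to the Health Department).
records = [
    (date(2021, 3, 1), date(2021, 3, 2)),
    (date(2021, 3, 1), date(2021, 3, 4)),
    (date(2021, 3, 2), date(2021, 3, 4)),
    (date(2021, 3, 3), date(2021, 3, 4)),
]

# Trend data as published here: counts by date of event.
by_event_date = Counter(event for event, _ in records)
# The same records counted by date of report.
by_report_date = Counter(report for _, report in records)


def cumulative_as_of(snapshot_day: date) -> int:
    """Cumulative count a file published on snapshot_day would show."""
    return sum(1 for _, report in records if report <= snapshot_day)


# Differencing two cumulative snapshots counts cases by *report* date.
new_in_update = cumulative_as_of(date(2021, 3, 4)) - cumulative_as_of(date(2021, 3, 3))

print(dict(by_event_date))
print(dict(by_report_date))
print(new_in_update)  # 3, although only one diagnosis occurred on 3/3
```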

Differences between City and State values

Generally, the NYC Health Department and the New York State Department of Health will not have matching numbers for the same metrics, though they report the same general trends. Some reasons for this include:

  • Different data sources for different metrics

  • Different analytical and informatics processes

  • Different uses of event date or report date (see above)

Types of disease surveillance

The Health Department conducts two main types of surveillance for COVID-19:

  • Syndromic surveillance

  • Reportable disease surveillance

Syndromic surveillance

We receive data from all 53 hospital emergency departments (EDs) in NYC about the types of illnesses people experience on a regular basis. This surveillance allows the Health Department to evaluate care-seeking trends at hospitals for influenza-like illness and pneumonia.

The information on each patient is evaluated for descriptions that resemble influenza-like illness or pneumonia, or that include the ICD-10-CM code for 2019 novel coronavirus disease (U07.1). Influenza-like illness is defined as mention of any of the following:

  • Fever and cough
  • Fever and sore throat
  • Fever and shortness of breath or difficulty breathing
  • Influenza

We exclude people who present with influenza-like illness but are subsequently assigned only an ICD-10-CM code for influenza.

Pneumonia is defined as mention or diagnosis of pneumonia. Since the signs and symptoms of COVID-19 overlap with these categories that the Health Department tracks routinely, we are able to identify unusual spikes in people seeking care at hospitals. We are using this as a proxy measure to observe COVID-19-like disease in the population.

  • Strengths: The data show real-time, population-level trends of people seeking health care for COVID-like disease

  • Limitations: The data do not represent patients with laboratory-confirmed COVID-19
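The classification rules above can be sketched as boolean logic in Python. This is a minimal sketch, assuming the free-text description of each ED visit has already been turned into symptom flags (that text-matching step is not shown); `EDVisit`, `is_influenza_like_illness`, and `is_covid_like` are hypothetical names. The sketch reads "an ICD-10-CM code for influenza" as any code in categories J09, J10, or J11, which is an assumption about how the exclusion is applied.

```python
from dataclasses import dataclass, field
from typing import List

# ICD-10-CM influenza codes fall under categories J09, J10, and J11.
INFLUENZA_ICD_PREFIXES = ("J09", "J10", "J11")
COVID_ICD_CODE = "U07.1"


@dataclass
class EDVisit:
    """Hypothetical flags extracted from one ED visit's description and codes."""
    fever: bool = False
    cough: bool = False
    sore_throat: bool = False
    shortness_of_breath: bool = False
    mentions_influenza: bool = False
    pneumonia: bool = False
    icd10_codes: List[str] = field(default_factory=list)


def is_influenza_like_illness(visit: EDVisit) -> bool:
    meets_definition = visit.mentions_influenza or (
        visit.fever and (visit.cough or visit.sore_throat or visit.shortness_of_breath)
    )
    # Exclude visits whose only assigned diagnosis codes are influenza codes.
    only_influenza_codes = bool(visit.icd10_codes) and all(
        code.startswith(INFLUENZA_ICD_PREFIXES) for code in visit.icd10_codes
    )
    return meets_definition and not only_influenza_codes


def is_covid_like(visit: EDVisit) -> bool:
    """ILI, pneumonia, or the COVID-19 diagnosis code."""
    return (
        is_influenza_like_illness(visit)
        or visit.pneumonia
        or COVID_ICD_CODE in visit.icd10_codes
    )
```

For example, `is_covid_like(EDVisit(fever=True, cough=True))` is `True`, while the same visit coded only as `["J10.1"]` is excluded from influenza-like illness.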

Reportable disease surveillance

The Health Department receives electronic laboratory reports for a number of infectious diseases, including COVID-19, as required by law in the NYC Health Code. Before May 11, 2023, when a specimen was collected from a patient for SARS-CoV-2 laboratory testing, the laboratory was required to report all results to the Health Department. As of May 11, 2023, labs are only required to submit positive diagnostic SARS-CoV-2 test results to the Health Department. Limited demographic information on the person being tested, including name, address, and date of birth, is reported to the Health Department.

Laboratory testing

Types of COVID-19 laboratory tests

The COVID-19 testing landscape is continually changing. Please see the Health Department's guidance on SARS-CoV-2 tests for up-to-date information on the use and interpretation of tests.

There are three main types of COVID-19 tests that are reported to the Health Department as part of reportable disease surveillance:

Diagnostic (viral) tests

  • Molecular tests: The primary test for COVID-19 infection is the molecular test, which includes the polymerase chain reaction (PCR) test. Molecular tests work through direct detection of the virus’s genetic material, and typically involve collecting a nasal swab. After specimen collection, molecular tests are generally processed in large laboratories, and consequently, the results may take a few days to be delivered. These tests are highly accurate, and recommended for diagnosing current COVID-19 infection.

  • Antigen tests: These tests detect proteins on the surface of the virus, called antigens, and typically involve collecting a nasal swab. Antigen test results are delivered quickly (turnaround can be as short as 15 minutes); however, these tests are not as accurate as molecular tests, especially when the likelihood of someone having a COVID-19 infection is low.

Serologic tests

  • Antibody tests: Exposure to COVID-19 can be detected by measuring antibodies, which can reflect a person’s immune response to the virus. Antibodies are proteins produced by the body’s immune system that can be found in the blood. People can test positive for antibodies specific to COVID-19 after they have been exposed, sometimes when they no longer test positive for the virus itself. Therefore, an antibody test will not be accurate for someone with active or recent infection, but can identify people who likely had a previous COVID-19 infection. It is important to note that the science around COVID-19 antibody tests is evolving rapidly and there is still much uncertainty about what individual and population level antibody test results mean for the epidemiology of COVID-19.
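One way to see why a less accurate test performs worse when infection is unlikely is through positive predictive value (PPV): the share of positive results that are true positives. The Python sketch below uses assumed sensitivity and specificity values chosen for illustration; they do not describe any particular antigen test. With the same test characteristics, PPV drops sharply when prevalence falls.

```python
def positive_predictive_value(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Probability that a person who tests positive is truly infected."""
    true_positive = sensitivity * prevalence
    false_positive = (1 - specificity) * (1 - prevalence)
    return true_positive / (true_positive + false_positive)


# Assumed, illustrative test characteristics (not for any specific product).
SENSITIVITY, SPECIFICITY = 0.80, 0.98

for prevalence in (0.20, 0.01):
    ppv = positive_predictive_value(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.0%}")
# prevalence 20%: PPV 91%
# prevalence 1%: PPV 29%
```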

Strengths: This standard reporting system allows for rapid and detailed information to be transmitted routinely to the Health Department.

Limitations:

  • Because of delays in reporting, the most recent data may be incomplete. Current data will be updated in the future as information on laboratory tests is reported to the Health Department.

  • Health Department recommendations for testing have changed throughout the COVID-19 outbreak. During the spring of 2020, the Health Department advised people with mild to moderate symptoms to stay at home and not seek testing to conserve testing supplies and personal protective equipment. Consequently, many cases in the community early in the outbreak were never diagnosed with a laboratory test and will not be included in these counts.

  • The testing landscape for COVID-19 is continually changing as new tests receive emergency use authorization from the Food and Drug Administration (FDA). For example, antigen testing started to become more widely available in NYC in October 2020. These changes in the types of tests should be considered when looking at trends across time.

  • Antibody testing started to become more widely available in NYC in April 2020. Cumulative data include all antibody tests with a specimen collection date after March 3, 2020, and data reported by week include tests conducted starting on April 5, 2020.

  • The Health Department consistently receives electronic reports only for COVID-19 tests that are conducted in laboratories. Point-of-care and at-home tests may be conducted in a setting outside of a clinical laboratory; these settings often do not have the infrastructure for electronically reporting directly to the Health Department. While providers and facilities are required to report results from point-of-care and at-home tests to the Health Department within 24 hours, these data are incomplete.

  • Most of the data in this repository include patients who reside in congregate facilities, such as correctional facilities and long-term care facilities. While data reported from these facilities may sometimes influence local trends, cases reported from these facilities do not necessarily represent community-based transmission. The only data that exclude patients in congregate facilities are in pp-by-modzcta.csv and last7days-by-modzcta.csv.

  • Because these data only provide information on people tested and not everyone who may have had COVID-19 in NYC, caution needs to be used when interpreting testing data. For example, people who are tested for antibodies may be more likely to test positive because people who were previously ill are preferentially seeking testing, in addition to the testing of persons with higher exposure (e.g., health care workers, first responders). Therefore, these data may not reflect antibody prevalence among all New Yorkers.

  • The growing number of screening programs further limits the generalizability of all testing data, because screening programs influence who is tested and how many people are tested over time. Examples of screening programs in NYC include employers screening their employees (e.g., hospitals) and long-term care facilities screening residents and employees.

  • These data are based on electronic laboratory reports, which often lack information on demographic and clinical characteristics of interest, such as race and ethnicity, co-occurring medical conditions, presence and onset date of COVID-19 symptoms, reason for testing, and occupation.

Counting COVID-19 cases, hospitalizations, and deaths

Case definitions for COVID-19

Surveillance case definitions for all notifiable conditions are developed at the national level by the Council of State and Territorial Epidemiologists (CSTE). These standard definitions support public health officials in classifying and counting infections consistently across different states and local jurisdictions. The criteria for reporting a person with COVID-19 infection (“case”) are based on laboratory test results and epidemiologic links, and include two classifications:

  • Confirmed COVID-19 case: A person is classified as a confirmed COVID-19 case if they test positive with a molecular test

  • Probable COVID-19 case: A person is classified as a probable COVID-19 case if they meet any of the following criteria with no positive molecular test on record: (a) test positive with an antigen test, (b) have symptoms and an exposure to a confirmed COVID-19 case, or (c) died and their cause of death is listed as COVID-19 or similar

The Centers for Disease Control and Prevention (CDC) considers most people to be protected from getting COVID-19 again for up to 90 days after testing positive for the virus. For consistency, as of June 9, 2021, a person who meets the definition of a confirmed or probable COVID-19 case more than 90 days after a previous positive test (based on the date of the first positive test) or a previous probable COVID-19 onset date is counted as a new case. Prior to June 9, 2021, new cases were counted ≥365 days after the first date of specimen collection or clinical diagnosis.
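The Python sketch below illustrates the new-case rule for one person. It is an interpretation, not the Health Department's code: it assumes a single sorted list of qualifying dates (positive test dates or probable onset dates) and measures each gap from the first date of the current case, as the "date of first positive test" wording suggests. `count_cases` and its `pre_june_2021` flag are hypothetical.

```python
from datetime import date
from typing import Iterable


def count_cases(qualifying_dates: Iterable[date], pre_june_2021: bool = False) -> int:
    """Count distinct COVID-19 cases for one person.

    Current rule: a new case requires more than 90 days since the first date of
    the previous case. Pre-June 9, 2021 rule: at least 365 days.
    """
    cases = 0
    case_start = None
    for d in sorted(qualifying_dates):
        if case_start is None:
            is_new = True
        else:
            gap = (d - case_start).days
            is_new = gap >= 365 if pre_june_2021 else gap > 90
        if is_new:
            cases += 1
            case_start = d  # later gaps are measured from this case's first date
    return cases


positives = [date(2021, 1, 1), date(2021, 3, 1), date(2021, 4, 15), date(2022, 1, 10)]
print(count_cases(positives))                      # 3
print(count_cases(positives, pre_june_2021=True))  # 2
```

Here March 1 is only 59 days after January 1, so it belongs to the first case; April 15 is 104 days later and starts a second case under the 90-day rule, but not under the 365-day rule.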

Case reporting

NYC COVID-19 data include people who live in NYC. Any person with a residence outside of NYC is not included.

Reporting on hospitalization status

The Health Department imports information on hospitalization status from a number of sources, including Regional Health Information Organizations, NYC public hospitals, non-public hospital systems, remote access to electronic health record systems, the Health Department’s electronic death registry system, and the Health Department's syndromic surveillance database that tracks daily hospital admissions from all 53 emergency departments across NYC. People who were hospitalized more than one time are only counted once.

Note that hospitalization information can be missing or incomplete from a number of facilities, which is a limitation for any analysis considering hospitalization status by geography (e.g., borough).

With the November 9, 2020 update, we revised the definition of a COVID-19 hospitalization and removed people who were hospitalized more than 14 days before or after their COVID-19 diagnosis from our count. Starting October 6, 2022, we revised the definition of a COVID-19 hospitalization to prospectively include those diagnosed 14 days before through 3 days after their hospitalization.
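The two hospitalization windows can be expressed as a check on the number of days between diagnosis and admission. The Python sketch below is illustrative only; the function name and the `current_definition` flag are hypothetical, and it assumes the windows are inclusive at both ends.

```python
from datetime import date


def counts_as_covid_hospitalization(diagnosis: date, admission: date,
                                    current_definition: bool = True) -> bool:
    """Whether a hospitalization is linked to a COVID-19 diagnosis.

    current_definition=True: diagnosis from 14 days before through 3 days after
    admission (prospective rule starting October 6, 2022).
    current_definition=False: admission within 14 days before or after diagnosis
    (rule introduced with the November 9, 2020 update).
    """
    offset = (diagnosis - admission).days  # negative = diagnosed before admission
    if current_definition:
        return -14 <= offset <= 3
    return abs(offset) <= 14


admitted = date(2022, 11, 10)
print(counts_as_covid_hospitalization(date(2022, 11, 14), admitted))         # False
print(counts_as_covid_hospitalization(date(2022, 11, 14), admitted, False))  # True
```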

Hospitalizations are among confirmed or probable cases of COVID-19.

Reporting on COVID-19 deaths

COVID-19 deaths are reported from March 11, 2020 as this was the first date of death for a patient with confirmed COVID-19.

Starting April 3, 2023, COVID-19 deaths are no longer classified as confirmed or probable. Deaths are counted as a COVID-19 death if:

  • the death certificate lists COVID-19 or an equivalent term as the underlying or a contributing cause of death, or

  • a case investigation for a confirmed, probable, or suspect COVID-19 case determined that COVID-19 was the cause of death or contributed to the death.

Prior to April 3, 2023, there were two classifications of COVID-19 deaths reported:

  • A death was classified as confirmed if the decedent was a NYC resident who had a positive molecular (PCR) test* for the virus that causes COVID-19 and did not die of external causes such as gunshot wounds or drug overdoses.

  • A death was classified as probable if the decedent was a NYC resident (or residency pending) who had no known positive molecular test for the virus that causes COVID-19 but the death certificate lists “COVID-19” or an equivalent as a cause of death.

Starting in June 2020, people who died more than 60 days after their COVID-19 diagnosis, and starting August 3, 2021, people who died more than 30 days after their COVID-19 diagnosis, were removed from the death count if they did not have "COVID" or a similar term listed on their death certificate. This was to address instances in which a person was diagnosed with COVID-19 and survived, but later died, likely of other causes.

Differences in death counts between NYC and New York State: Data on deaths reported by NYC are derived from the Health Department’s surveillance database and will differ from data reported by the New York State Department of Health. The State Department of Health reports data on deaths from:

  • The State Hospital Emergency Response Data System
  • Daily calls to hospitals and other facilities that are caring for patients, such as nursing homes

The NYC Health Department reports data on deaths that reflect both:

  • Positive tests for COVID-19 confirmed by laboratories
  • Confirmations of a person’s death from the City’s Office of the Chief Medical Examiner and the Health Department's Bureau of Vital Statistics, which is responsible for the registration, analysis and reporting of all deaths in NYC

Changes to reported data

The Health Department updates data for earlier dates after resolving testing and reporting delays. Reported data reflect what we know at the time of publishing on GitHub, not what occurred in real time. For example, we may find that a person who was originally reported to live in NYC no longer does. This person would be removed from our dataset after their address is updated, and our case count would decrease by one.

Rates vs. case counts

The Health Department is reporting rates of cases, hospitalizations, and deaths in addition to counts. We report rates to give clear comparisons between different groups — such as borough, sex, or age — with differently sized populations. For example, we may report that the rate of confirmed COVID-19 cases is 100 per 100,000 population in NYC. That means for every 100,000 people living in NYC, there are 100 people diagnosed with COVID-19.
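The rate calculation is a count scaled to a population of 100,000. The Python sketch below uses hypothetical counts and populations for two groups to show how a smaller group can have fewer cases but a higher rate; `rate_per_100k` is an illustrative name.

```python
def rate_per_100k(count: int, population: int) -> float:
    """Events per 100,000 people."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count * 100_000 / population


# Hypothetical groups (not real NYC figures).
larger_group = rate_per_100k(2_500, 1_250_000)
smaller_group = rate_per_100k(900, 300_000)
print(larger_group, smaller_group)  # 200.0 300.0
```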

Rates per 100,000 people

Rates for citywide, borough, ZIP code tabulation area, and demographic-specific categories were calculated using annual interpolated intercensal population estimates updated in 2020. These rates differ from previously reported rates based on the 2000 Census or previous versions of population estimates. The Health Department produced these population estimates based on estimates from the U.S. Census Bureau and NYC Department of City Planning.

Please note that population estimates were updated on November 9, 2020 to reflect annual population estimates for all New Yorkers as of July 1, 2019. These estimates are prior to the COVID-19 outbreak, and therefore, do not represent any changes to NYC’s population as a result of COVID-related migration.

Rates of cases, hospitalizations, and deaths for poverty and race/ethnicity groups were calculated using direct standardization for age at diagnosis, hospitalization, or death and weighting by the US 2000 standard population.
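Direct age standardization applies each group's age-specific rates to a fixed standard population and sums the weighted results, so groups with different age structures can be compared. The Python sketch below shows the technique with two hypothetical age bands and made-up weights; the actual US 2000 standard population uses many more age groups, and `age_adjusted_rate` is an illustrative name.

```python
def age_adjusted_rate(counts: dict, populations: dict, standard_weights: dict) -> float:
    """Directly age-standardized rate per 100,000.

    counts, populations: keyed by age group for the group being described.
    standard_weights: each age group's share of the standard population (sums to 1).
    """
    if abs(sum(standard_weights.values()) - 1.0) > 1e-9:
        raise ValueError("standard weights must sum to 1")
    return sum(
        weight * counts[age_group] * 100_000 / populations[age_group]
        for age_group, weight in standard_weights.items()
    )


# Hypothetical two-band weights and data (not the real US 2000 standard).
weights = {"0-44": 0.6, "45+": 0.4}
counts = {"0-44": 50, "45+": 200}
populations = {"0-44": 100_000, "45+": 50_000}

crude = sum(counts.values()) * 100_000 / sum(populations.values())
adjusted = age_adjusted_rate(counts, populations, weights)
print(f"crude {crude:.1f}, age-adjusted {adjusted:.1f}")  # crude 166.7, age-adjusted 190.0
```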

Demographic characteristics

Geography: ZIP codes and ZCTAs

We report information by geography using modified ZIP Code Tabulation Areas (MODZCTA). It can be challenging to map data that are reported by ZIP Code. A ZIP Code doesn’t actually refer to an area, but rather a collection of points that make up a mail delivery route. Furthermore, there are some buildings that have their own ZIP Code, and some non-residential areas with ZIP Codes.

To deal with the challenges of ZIP Codes, the Health Department uses ZCTAs, which represent ZIP Codes as geographic areas. Data reported by ZIP Code are often actually mapped by ZCTA. The ZCTA geography was developed by the U.S. Census Bureau.

The modified ZCTA (MODZCTA) geography combines census blocks with smaller populations to allow more stable estimates of population size for rate calculation.

Information by geography reflects people's MODZCTA of residence at the time of reporting, not the location of testing, diagnosis, or hospitalization.

Poverty groups

Neighborhood-level poverty groups were classified in a manner consistent with Health Department practices to describe and monitor inequities in health in NYC. Neighborhood poverty measures are defined as the percentage of people earning below the Federal Poverty Threshold (FPT) within a ZCTA, per the American Community Survey 2014-2018.

The standard cut-points for defining categories of neighborhood-level poverty in NYC are:

  • Low: <10% of residents in ZCTA living below the FPT
  • Medium: 10% to <20%
  • High: 20% to <30%
  • Very high: ≥30% of residents living below the FPT
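The cut-points above map directly to a small classification function. The Python sketch below is illustrative; `poverty_group` is a hypothetical name, and it takes the percentage of ZCTA residents living below the FPT as input.

```python
def poverty_group(pct_below_fpt: float) -> str:
    """Neighborhood poverty group from % of ZCTA residents below the FPT."""
    if not 0 <= pct_below_fpt <= 100:
        raise ValueError("percentage must be between 0 and 100")
    if pct_below_fpt < 10:
        return "Low"
    if pct_below_fpt < 20:
        return "Medium"
    if pct_below_fpt < 30:
        return "High"
    return "Very high"
```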

Age groups

The Health Department initially reported data for the following age groups: 0-17, 18-44, 45-64, 65-74, and 75+ years. As of November 9, 2020, we updated the age groups to: 0-4, 5-12, 13-17, 18-24, 25-34, 35-44, 45-54, 55-64, 65-74, and 75+ years to provide more detail and granularity on age groups, especially with regard to children and young adults. For data on deaths, age groups 0-4, 5-12, and 13-17 are collapsed into 0-17 years due to low death counts in this population and to ensure protection of privacy.
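The current age grouping, including the collapsed 0-17 group for deaths, can be sketched as a lookup. This Python example is illustrative; `AGE_GROUPS` and `age_group` are hypothetical names, and ages are assumed to be whole years.

```python
# (lowest age, highest age, label) for the age groups used since November 9, 2020.
AGE_GROUPS = [
    (0, 4, "0-4"), (5, 12, "5-12"), (13, 17, "13-17"), (18, 24, "18-24"),
    (25, 34, "25-34"), (35, 44, "35-44"), (45, 54, "45-54"),
    (55, 64, "55-64"), (65, 74, "65-74"),
]


def age_group(age: int, for_deaths: bool = False) -> str:
    """Age group label; children are collapsed into 0-17 for death data."""
    if age < 0:
        raise ValueError("age must be non-negative")
    if for_deaths and age <= 17:
        return "0-17"
    for low, high, label in AGE_GROUPS:
        if low <= age <= high:
            return label
    return "75+"
```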

Race and ethnicity

Race and ethnicity information is often missing in reportable disease surveillance, because it typically comes from electronic laboratory reports, which often lack these data. For the COVID-19 response, the Health Department has electronically imported aggregated data from partners such as hospitals, hospital systems, or Regional Health Information Organizations to improve the completeness of race/ethnicity data for people who are hospitalized. However, health records may also be missing race/ethnicity information. Additionally, the Health Department often investigates or imports race/ethnicity information for people who have died. This information is often incomplete or not immediately available, because it can take a few days for it to be entered into the electronic death registration system. Race/ethnicity information for decedents is typically collected by funeral directors from the decedent's next of kin.

The Health Department classifies race/ethnicity into the following mutually-exclusive categories: Asian/Pacific-Islander, Black/African-American, Hispanic/Latino, and White. Information on people identified as other categories, including Native American/Alaska Native or multi-racial, are not provided in files showing race/ethnicity data. The Hispanic/Latino category includes people of any race, and all other categories exclude those who identified as Hispanic/Latino.

Differences in health outcomes among racial and ethnic groups are due to long-term institutional and personal biases against people of color. There is no evidence that these health inequities are due to personal traits. Lasting racism and an inequitable distribution of resources needed for wellness cause these health inequities. These include quality jobs, housing, health care and food, among others. The greater impact of the COVID-19 pandemic on people of color shows how these inequities influence health outcomes.

Variants of the SARS-CoV-2 virus

Multiple variants of the SARS-CoV-2 virus have been characterized in the US and globally. These variants involve mutations to the SARS-CoV-2 virus, and might make COVID-19 easier to spread, more severe, or more likely to reinfect people who have either had COVID-19 before or who have been vaccinated.

Surveillance for variants

Variants can be detected through genomic sequencing, a process that involves analyzing the virus's genetic material. Sequencing is performed on specimens collected for COVID-19 molecular laboratory testing and determines which variant of the SARS-CoV-2 virus a particular person was infected with.

The City’s Public Health Laboratory (PHL) and Pandemic Response Laboratory (PRL) have been sequencing a subset of SARS-CoV-2 laboratory specimens to identify emerging variants in NYC. Since October 2020, the PHL has sequenced all laboratory specimens received that meet certain technical criteria (e.g., sufficient levels of virus in a sample). Starting in February 2021, the PRL sequenced randomly selected specimens that met certain technical criteria. As of January 2023, the PRL has closed and no longer sequences specimens.

Only a small proportion of all confirmed COVID-19 cases are now being sequenced citywide, so all findings related to variant data are based on a small subset of confirmed cases. Because patients whose specimens are sequenced are likely to differ from those whose specimens are not, findings may not be representative of all confirmed COVID-19 cases citywide and should be interpreted with caution. Additional specimens are being sequenced by the New York State Wadsworth Laboratory and by university, hospital, and private laboratories, and reported to the Health Department. These include samples from NYC Health & Hospitals emergency departments, the Office of the Chief Medical Examiner, and other sources, which may bias data toward more severe cases.

Laboratories have identified multiple variants that have emerged in NYC. The Health Department uses findings from the PHL and PRL, as well as from other laboratories reporting to the Health Department, in combination with epidemiologic surveillance systems, to better understand whether variants might affect:

  • Transmission: Whether these variants increase transmission, making it easier to spread COVID-19
  • Re-infection: Whether these variants are more likely to re-infect people who previously had COVID-19
  • Vaccine effectiveness: Whether these variants are more likely to infect fully vaccinated people. Infections are expected in a very small proportion of fully vaccinated people, regardless of the spread of variants.
  • Severity: Whether these variants are more likely to result in hospitalization or death

Variant classifications

The CDC classifies variants into the following three categories:

  • Variants of interest: Has specific genetic markers associated with changes to receptor binding, reduced neutralization by antibodies generated against previous infection or vaccination, reduced efficacy of treatments, potential diagnostic impact, or predicted increase in transmissibility or disease severity
  • Variants of concern: Evidence of an increase in transmissibility, more severe disease (e.g., increased hospitalizations or deaths), significant reduction in neutralization by antibodies generated during previous infection or vaccination, reduced effectiveness of treatments or vaccines, or diagnostic detection failure
  • Variants of high consequence: Clear evidence that prevention measures or medical countermeasures (MCMs) have significantly reduced effectiveness relative to previously circulating variants

Please see the CDC’s definitions of SARS-CoV-2 variants for more information. Details on the variants that have emerged in NYC and are being actively monitored are available on the “Variants” page of the Health Department’s COVID-19 Data webpage.

The Health Department is continuing to investigate the emergence of variants of concern and variants of interest in NYC, using a combination of laboratory and epidemiologic observations to characterize each variant. Some ongoing efforts include:

  • Monitoring the number of hospitalizations and deaths among patients whose sequenced specimens show infection with a variant
  • Identifying cases caused by variants where the person had a previous positive diagnostic test for COVID-19 more than 90 days earlier. These cases are investigated to determine whether they likely represent reinfection, and whether reinfection is more common among people who have been infected with one of the variants
  • Matching data on cases caused by variants with the NYC Citywide Immunization Registry to determine whether the person was fully immunized before testing positive for COVID-19
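The 90-day screening rule in the second bullet amounts to a simple date comparison. The helper below is hypothetical and assumes each case has a specimen date for the variant-positive test and, optionally, for an earlier positive diagnostic test. A flagged case is only a candidate for investigation, not a confirmed reinfection.

```python
from datetime import date
from typing import Optional

# "More than 90 days earlier", per the text above.
REINFECTION_WINDOW_DAYS = 90


def is_possible_reinfection(previous_positive: Optional[date], variant_positive: date) -> bool:
    """Hypothetical screen: flag a variant case for reinfection investigation.

    Returns True only when an earlier positive diagnostic test exists and was
    collected strictly more than 90 days before the variant-positive specimen.
    """
    if previous_positive is None:
        return False
    return (variant_positive - previous_positive).days > REINFECTION_WINDOW_DAYS
```

A gap of exactly 90 days is not flagged, because the rule says "more than 90 days".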

Repository contents

latest/

This folder contains files with data that focus on the most recent period of the outbreak. It includes daily 28-day counts and rates of hospitalizations and deaths by MODZCTA, and trend data that cover the most recent 90 days. See this folder’s Readme for a detailed description of its contents.
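For readers recomputing 28-day figures themselves, the sketch below sums a trailing 28-day window of daily counts and converts it to a rate. It assumes rates are expressed per 100,000 residents and that you supply the population denominator for a MODZCTA; the function name and denominator are illustrative assumptions, not the files' documented method.

```python
from typing import List, Tuple


def rolling_28_day(daily_counts: List[int], population: int) -> List[Tuple[int, float]]:
    """Return (28-day count, rate per 100,000) for each day with a full 28-day window.

    daily_counts is ordered oldest to newest; days without 28 days of history
    produce no output rather than a partial-window rate.
    """
    window = 28
    results: List[Tuple[int, float]] = []
    for end in range(window, len(daily_counts) + 1):
        total = sum(daily_counts[end - window:end])
        # Multiply before dividing to keep the arithmetic as exact as possible.
        results.append((total, total * 100_000 / population))
    return results
```

With 30 days of data the function returns three windows; with fewer than 28 days it returns an empty list.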

totals/

This folder contains files with cumulative totals since the start of the COVID-19 outbreak in NYC, which the Health Department defines as the diagnosis of the first confirmed COVID-19 case on February 29, 2020. The Health Department recommends against interpreting daily changes to these files as one day’s worth of data, due to the difference between date of event and date of report. See this folder’s Readme for a detailed description of its contents.
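To see why a change between two cumulative snapshots is not one day's worth of events, consider differencing two versions of a totals file. The sketch below uses hypothetical field names and returns what was newly reported between snapshots. Those events may have occurred on any earlier date, and revisions (such as de-duplication) may even make a difference negative.

```python
from typing import Dict


def newly_reported(prev_total: Dict[str, int], curr_total: Dict[str, int]) -> Dict[str, int]:
    """Difference two cumulative snapshots key by key.

    The result counts events reported between the two snapshots, not events
    that occurred on the later snapshot's date.
    """
    # dict key views support set union, so keys present in either snapshot are kept.
    return {
        key: curr_total.get(key, 0) - prev_total.get(key, 0)
        for key in curr_total.keys() | prev_total.keys()
    }
```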

trends/

This folder contains files with daily, weekly, and monthly data shown across time. Note that these trend data are published by date of event, not by date of report. The Health Department recommends against interpreting daily changes to these files as one day’s worth of data, due to the difference between date of event and date of report. See this folder’s Readme for a detailed description of its contents.
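Because trend files are indexed by date of event, daily rows can be rolled up into weeks without mixing in report dates. The minimal sketch below assumes weeks end on Saturday (matching the "data through the previous Saturday" convention above) and takes a hypothetical mapping of date to count. The most recent weeks may still be incomplete because of reporting lag.

```python
from collections import defaultdict
from datetime import date, timedelta
from typing import Dict


def week_ending_saturday(d: date) -> date:
    """Return the Saturday that closes the week containing d."""
    # date.weekday(): Monday=0 ... Saturday=5, Sunday=6
    return d + timedelta(days=(5 - d.weekday()) % 7)


def weekly_totals(daily: Dict[date, int]) -> Dict[date, int]:
    """Sum event-date counts into weeks ending on Saturday."""
    totals: Dict[date, int] = defaultdict(int)
    for day, count in daily.items():
        totals[week_ending_saturday(day)] += count
    return dict(totals)
```

February 29, 2020 (the date of the first confirmed case) was a Saturday, so it closes its own week, and March 1 starts the next one.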

variants/

This folder contains files with data on SARS-CoV-2 variants. It includes information on the number and type of SARS-CoV-2 variants identified in NYC, over time and by MODZCTA. All tables containing variant data are updated weekly on Thursday (with data through two previous Saturdays). These files are based on a small subset of all confirmed COVID-19 cases; findings may not be representative of all confirmed COVID-19 cases citywide, and should be interpreted with caution. See this folder’s Readme for a detailed description of its contents.
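Since variant files describe only sequenced specimens, any proportion computed from them is a share of sequenced specimens, not of all cases. The minimal sketch below uses hypothetical variant labels.

```python
from collections import Counter
from typing import Dict, Iterable


def variant_shares(sequenced_variants: Iterable[str]) -> Dict[str, float]:
    """Share of sequenced specimens per variant label (not a share of all cases)."""
    counts = Counter(sequenced_variants)
    total = sum(counts.values())
    if total == 0:
        # No sequenced specimens in this period: report nothing rather than divide by zero.
        return {}
    return {label: n / total for label, n in counts.items()}
```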

Geography-resources/

This folder contains additional resources for data provided at the MODZCTA level, including geographic files for MODZCTA. See this folder’s Readme for a detailed description of its contents.
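Counts reported at the ZCTA level sometimes need to be rolled up to MODZCTA before they can be joined to MODZCTA geographic files. The sketch below assumes a hypothetical crosswalk in which each ZCTA maps to exactly one MODZCTA; the codes are placeholders, and ZCTAs missing from the crosswalk are returned separately rather than silently dropped.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def aggregate_to_modzcta(
    zcta_counts: Dict[str, int],
    zcta_to_modzcta: Dict[str, str],
) -> Tuple[Dict[str, int], List[str]]:
    """Sum ZCTA-level counts into MODZCTA totals using a (hypothetical) crosswalk."""
    totals: Dict[str, int] = defaultdict(int)
    unmatched: List[str] = []
    for zcta, count in zcta_counts.items():
        modzcta = zcta_to_modzcta.get(zcta)
        if modzcta is None:
            # Keep track of ZCTAs the crosswalk does not cover.
            unmatched.append(zcta)
            continue
        totals[modzcta] += count
    return dict(totals), unmatched
```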

archive/

This folder contains files that are no longer updated.

coronavirus-data's People

Contributors

acharney2, awilson18doh, awtang, cchang21, cthompson2nyc, eabubakar, eluomanyc, eplumeng, grantpezeshki, hbpartonnyc, joekim23, jslutsker, kjohnson5nyc, lfirestein, meddynyc, mmontesanonyc, nyc-dohmh-mm, osamson1, pchan7, rmacd1, rrohrerdohmh

coronavirus-data's Issues

Citywide rate question

Is the citywide death rate found in the age and sex files based on confirmed deaths only, or both confirmed and probable?

Normalizing to 100,000 Population

I am concerned that the process of normalizing the data to a population of 100,000 may have been based on national coefficients rather than NYC-based coefficients. I am specifically concerned that the AGE-GROUP presentation here suggests that AGE does not really matter.

When I used coefficients based on NYC demographics, I came up with a different story. I have been especially concerned that the 65-74 age group ("Young Retired") has had a disproportionately high disease incidence from the onset.
Please see my further explanation & assessment here

The Readme documentation points to the CDC with regard to standardization per 100,000. My review of the CDC document found that it discusses national coefficients based on national demographics for a national survey.
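The question turns on the difference between age-specific rates and a single age-adjusted rate. Below is a minimal sketch of direct age standardization with hypothetical numbers and weights. Each age-specific rate per 100,000 uses only that group's own population, so it does not depend on any standard population; only the summary age-adjusted rate depends on whether national or NYC weights are chosen.

```python
from typing import Dict


def crude_rate(cases: Dict[str, int], population: Dict[str, int]) -> float:
    """All-ages rate per 100,000, with no age adjustment."""
    return sum(cases.values()) * 100_000 / sum(population.values())


def age_adjusted_rate(
    cases: Dict[str, int],
    population: Dict[str, int],
    standard_weights: Dict[str, float],
) -> float:
    """Direct standardization: weight each age-specific rate by the standard population's share."""
    total_weight = sum(standard_weights.values())
    return sum(
        (cases[age] * 100_000 / population[age]) * (standard_weights[age] / total_weight)
        for age in standard_weights
    )
```

With equal populations of 10,000 and 10 and 30 cases, the age-specific rates are 100 and 300 per 100,000 regardless of weights; the adjusted rate is 200 with equal weights and 150 when the younger group carries three times the weight.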

daily testing data (testing.csv)

Hi,

Thanks so much for making this available.

I was wondering whether the data for daily testing, hospitalizations, and deaths are still available in testing.csv. I'm not seeing this file in the updated repo, and we were interested in looking at these numbers.

Best,
Ashley

1. Patients transferred out of NYC and 2. Hospitalizations in new-to-NYC facilities

Thank you for managing this repository.

  1. How will DOHMH count patients (hospitalizations, deaths) transferred outside the city system to relieve pressure on the system (see https://www.syracuse.com/coronavirus/2020/04/new-york-city-hospitals-begin-transferring-patients-to-upstate-ny-report.html)?

  2. Will the data start reflecting cases treated on the USNS Comfort and at the Javits Center, and as of what date?

  3. Will NYC hospital discharge data start being published?

Current Hospitalizations

Is it possible to report current hospitalizations each day, as opposed to "ever hospitalized"? Current hospitalizations are what all other NY jurisdictions are reporting, apparently from HERDS.

Tendency

Hi,

I’d like to see the trend in positive cases, hospitalizations, and deaths by borough, such as a graph showing, day by day and by borough, the number of people getting sick and dying. Thank you

Date of first death

case-hosp-death.csv lists the first death (DEATH_COUNT: 1) on 3/11/20. News reports and statements from public officials suggest the first death occurred on 3/13/20. Which is correct? Thanks.

testing.csv file?

Hi,

Readme says that there should be a file called testing.csv, which is supposed to include "counts of New York City residents with specimens collected for SARS-CoV-2 testing by day, the subsets who tested positive as confirmed COVID-19 cases, were ever hospitalized, and who died."

Will that be posted soon? Also, last week the mayor was promising a count of people admitted to the ICU as well as those hospitalized. Will that be in that file?

Thanks,

by_age double count?

Your binning is off in the by_age table, so I'm not sure whether 75-year-olds are being counted twice or whether there is simply a typo in one of the bin labels:
65-75 years
75 and older years
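Overlapping labels like these are usually avoided by treating each bin as a half-open interval [low, high). The minimal sketch below assumes the 0-17, 18-44, 45-64, 65-74, and 75+ groups that appear in the Health Department's early PDF summaries; under it, a 75-year-old falls into exactly one bin.

```python
from typing import List, Optional, Tuple

# Half-open intervals [low, high); None means no upper bound.
AGE_BINS: List[Tuple[int, Optional[int], str]] = [
    (0, 18, "0-17"),
    (18, 45, "18-44"),
    (45, 65, "45-64"),
    (65, 75, "65-74"),
    (75, None, "75+"),
]


def age_bin(age: int) -> str:
    """Return the single bin label for a non-negative integer age."""
    for low, high, label in AGE_BINS:
        if age >= low and (high is None or age < high):
            return label
    raise ValueError(f"age out of range: {age}")
```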

Data Formatting

Hi All,

First off, thanks so much to the maintainers of this GitHub repo for all their efforts!

My question is whether the data tables could be updated to include:

  • historical daily numbers
  • demographic info?

As of now, the "summary" csv only contains totals, but there must be a data-set that includes the historical dailies somewhere, as this underlies this view. Would it be possible to get access to that data-set? I am trying to calculate the doubling rate as data evolves, which is not really possible without having access to this historical data.

In short, I'm looking to find a data-set that has all the info available in the pdf historics, e.g. for Total Cases, Deaths and Hospitalizations. As above, these datasets must exist because there are dynamic graphs that display these numbers available here.

It doesn't really look like it would be possible to rebuild this set with the summary data points or calculated data, such as case rate per hundred thousand.

Ideally, a table would look (something) like this, with one for each borough - the age demographics could be built into additional columns, but regardless the point is that this data would be very helpful to run deeper analysis.

Date Tests Run - Date Tests Run- Total Positive Tests - Date - Positive Tests Total Hospitalizations - Date Hospitalizations - Total Fatalities - Date Fatalities - Total
4/9 - Boro-Wide [x] [x] [x] [x] [x] [x] [x] [x]
4/9 - Age 0-17 M [x] [x] [x] [x] [x] [x] [x] [x]
4/9 - Age 18-44 M [x] [x] [x] [x] [x] [x] [x] [x]
4/9 - Age 45-64 M [x] [x] [x] [x] [x] [x] [x] [x]
4/9 - Age 65-74 M [x] [x] [x] [x] [x] [x] [x] [x]
4/9 - Age 75+ M [x] [x] [x] [x] [x] [x] [x] [x]
4/9 - Age 50+ F [x] [x] [x] [x] [x] [x] [x] [x]
4/9 - Age 0-17 F [x] [x] [x] [x] [x] [x] [x] [x]
4/9 - Age 18-44 F [x] [x] [x] [x] [x] [x] [x] [x]
4/9 - Age 45-64 F [x] [x] [x] [x] [x] [x] [x] [x]
4/9 - Age 65-74 F [x] [x] [x] [x] [x] [x] [x] [x]
4/9- Age 75+ F [x] [x] [x] [x] [x] [x] [x] [x]
4/9- Age 50+ F [x] [x] [x] [x] [x] [x] [x] [x]
4/8 - Boro-Wide [x] [x] [x] [x] [x] [x] [x] [x]
4/8 - Age 0-17 M [x] [x] [x] [x] [x] [x] [x] [x]
4/8 - Age 18-44 M [x] [x] [x] [x] [x] [x] [x] [x]
4/8 - Age 45-64 M [x] [x] [x] [x] [x] [x] [x] [x]
4/8 - Age 65-74 M [x] [x] [x] [x] [x] [x] [x] [x]
4/8 - Age 75+ M [x] [x] [x] [x] [x] [x] [x] [x]
4/8- Age 50+ F [x] [x] [x] [x] [x] [x] [x] [x]
4/8 - Age 0-17 F [x] [x] [x] [x] [x] [x] [x] [x]
4/8 - Age 18-44 F [x] [x] [x] [x] [x] [x] [x] [x]
4/8 - Age 45-64 F [x] [x] [x] [x] [x] [x] [x] [x]
4/8 - Age 65-74 F [x] [x] [x] [x] [x] [x] [x] [x]
4/8 - Age 75+ F [x] [x] [x] [x] [x] [x] [x] [x]
4/8- Age 50+ F [x] [x] [x] [x] [x] [x] [x] [x]

Does anyone have something like this?

Thank you again, everyone - be safe, be well and please wash your hands!

Publish borough level breakdown

Can the data be broken down to the borough level?

Interestingly, USAFacts is publishing confirmed cases and deaths down to the borough level, which neither the NYTimes COVID-19 repo nor the JHU COVID-19 repo does.

https://usafacts.org/visualizations/coronavirus-covid-19-spread-map/

Where do they get the borough breakdowns if not from NYC DOHMH, which they list as one of their sources?

With the USAFacts data, I was able to pull together this quick boro-level viz using Google Colab
https://colab.research.google.com/drive/17-g0CIG57q13S4p26r6NpCFTLSYA1mBn

tests-by-zcta.csv data issue

Data quality question on tests-by-zcta.csv (cumulative count of New York City residents by ZIP code of residence):

What does it mean if MODZCTA is NA? Does this just mean you don't know their ZIP code but they are still a New York resident? Can we assume that all of these people live in New York if we are mapping these ZIPs to geography?

NY city zcta shape files

How many ZIP code (ZCTA) areas are in New York City? Coronavirus data are posted for 177 ZCTA areas here. Does anyone have the shapefiles for these 177 ZIP codes, or know where to get them? I have two different shapefiles, one with 200 and the other with 214 ZIP code areas, but they lack some of the 177 ZCTA areas for which we have coronavirus data.

Today's (4/13) case/hospitalization/death count not properly updated

Hello...

Thanks much for posting this data. It's an important resource.

Today's (early) update of case-hosp-death.csv has an updated count of cases, but neither hospitalizations nor deaths are updated -- they match yesterday's numbers. Similarly, the death/hosp numbers on the official NYCDOHMH data page are not updated, even though cases are.

Thanks for your attention!

--Charles Seife

Raw data

Does anyone have the raw data grabs from the PDFs published for dates before March 31? This is what I have from the PDFs, but I wanted the historical breakdowns starting March 1.

Borough | Cases | Hospitalized | Deaths | DateTime
Bronx | 1071 | | | March 21, 2020 5:00PM
Brooklyn | 2484 | | | March 21, 2020 5:00PM
Manhattan | 1863 | | | March 21, 2020 5:00PM
Queens | 2254 | | | March 21, 2020 5:00PM
Staten Island | 437 | | | March 21, 2020 5:00PM
Unknown | 5 | | | March 21, 2020 5:00PM
Bronx | 1829 | | | March 23, 2020 9:30PM
Brooklyn | 3494 | | | March 23, 2020 9:30PM
Manhattan | 2572 | | | March 23, 2020 9:30PM
Queens | 3621 | | | March 23, 2020 9:30PM
Staten Island | 817 | | | March 23, 2020 9:30PM
Unknown | 6 | | | March 23, 2020 9:30PM
Bronx | 2328 | | | March 24, 2020 9:30PM
Brooklyn | 4237 | | | March 24, 2020 9:30PM
Manhattan | 2887 | | | March 24, 2020 9:30PM
Queens | 4364 | | | March 24, 2020 9:30PM
Staten Island | 953 | | | March 24, 2020 9:30PM
Unknown | 7 | | | March 24, 2020 9:30PM
Bronx | 3924 | 1071 | | March 26, 2020 9:30PM
Brooklyn | 5705 | 968 | | March 26, 2020 9:30PM
Manhattan | 3907 | 729 | | March 26, 2020 9:30PM
Queens | 7026 | 1659 | | March 26, 2020 9:30PM
Staten Island | 1276 | 283 | | March 26, 2020 9:30PM
Unknown | 35 | 10 | | March 26, 2020 9:30PM
Bronx | 4880 | 1200 | 79 | March 27, 2020 4:00PM
Brooklyn | 7091 | 989 | 82 | March 27, 2020 4:00PM
Manhattan | 4627 | 709 | 55 | March 27, 2020 4:00PM
Queens | 8529 | 1810 | 124 | March 27, 2020 4:00PM
Staten Island | 1534 | 321 | 26 | March 27, 2020 4:00PM
Unknown | 36 | 10 | | March 27, 2020 4:00PM
Bronx | 5352 | | 110 | March 28, 2020 10:00AM
Brooklyn | 7789 | | 118 | March 28, 2020 10:00AM
Manhattan | 5036 | | 79 | March 28, 2020 10:00AM
Queens | 9228 | | 174 | March 28, 2020 10:00AM
Staten Island | 1718 | | 36 | March 28, 2020 10:00AM
Unknown | 35 | | | March 28, 2020 10:00AM
Bronx | 6250 | 1729 | 153 | March 29, 2020 10:00AM
Brooklyn | 8887 | 1617 | 168 | March 29, 2020 10:00AM
Manhattan | 5582 | 1074 | 94 | March 29, 2020 10:00AM
Queens | 10737 | 2536 | 219 | March 29, 2020 10:00AM
Staten Island | 1984 | 444 | 43 | March 29, 2020 10:00AM
Unknown | 35 | 10 | 1 | March 29, 2020 10:00AM
Bronx | 6925 | 1880 | 215 | March 30, 2020 5:00PM
Brooklyn | 10171 | 1661 | 216 | March 30, 2020 5:00PM
Manhattan | 6060 | 1075 | 119 | March 30, 2020 5:00PM
Queens | 12756 | 2650 | 305 | March 30, 2020 5:00PM
Staten Island | 2140 | 465 | 58 | March 30, 2020 5:00PM
Unknown | 35 | 10 | 1 | March 30, 2020 5:00PM
Bronx | 7815 | 2056 | 262 | March 31, 2020 5:00PM
Brooklyn | 11160 | 1901 | 261 | March 31, 2020 5:00PM
Manhattan | 6538 | 1130 | 129 | March 31, 2020 5:00PM
Queens | 13869 | 2954 | 376 | March 31, 2020 5:00PM
Staten Island | 2354 | 499 | 67 | March 31, 2020 5:00PM
Unknown | 35 | 9 | 1 | March 31, 2020 5:00PM

NYCAgeRange | Cases | Hospitalized | Deaths | DateTime
0 to 17 | 323 | | | March 23, 2020 9:00AM
18 to 44 | 5704 | | | March 23, 2020 9:00AM
45 to 64 | 4069 | | | March 23, 2020 9:00AM
65 to 74 | 1308 | | | March 23, 2020 9:00AM
75+ | 930 | | | March 23, 2020 9:00AM
Unknown | 5 | | | March 23, 2020 9:00AM
0 to 17 | 374 | | | March 24, 2020 10:00AM
18 to 44 | 6786 | | | March 24, 2020 10:00AM
45 to 64 | 4906 | | | March 24, 2020 10:00AM
65 to 74 | 1591 | | | March 24, 2020 10:00AM
75+ | 1109 | | | March 24, 2020 10:00AM
Unknown | 10 | | | March 24, 2020 10:00AM
0 to 17 | 446 | | | March 25, 2020 10:00AM
18 to 44 | 8880 | | | March 25, 2020 10:00AM
45 to 64 | 6786 | | | March 25, 2020 10:00AM
65 to 74 | 2226 | | | March 25, 2020 10:00AM
75+ | 1633 | | | March 25, 2020 10:00AM
Unknown | 40 | | | March 25, 2020 10:00AM
0 to 17 | 495 | 46 | | March 26, 2020 5:00PM
18 to 44 | 10145 | 950 | | March 26, 2020 5:00PM
45 to 64 | 7869 | 1749 | | March 26, 2020 5:00PM
65 to 74 | 2627 | 946 | | March 26, 2020 5:00PM
75+ | 1935 | 1029 | | March 26, 2020 5:00PM
Unknown | 41 | 0 | | March 26, 2020 5:00PM
0 to 17 | 543 | 47 | 0 | March 27, 2020 4:00PM
18 to 44 | 11617 | 971 | 16 | March 27, 2020 4:00PM
45 to 64 | 9158 | 1886 | 78 | March 27, 2020 4:00PM
65 to 74 | 3034 | 1032 | 90 | March 27, 2020 4:00PM
75+ | 2286 | 1103 | 182 | March 27, 2020 4:00PM
Unknown | 67 | 0 | | March 27, 2020 4:00PM
0 to 17 | 573 | | 0 | March 28, 2020 9:00AM
18 to 44 | 12590 | | 22 | March 28, 2020 9:00AM
45 to 64 | 10019 | | 125 | March 28, 2020 9:00AM
65 to 74 | 3354 | | 120 | March 28, 2020 9:00AM
75+ | 2568 | | 249 | March 28, 2020 9:00AM
Unknown | 54 | | 1 | March 28, 2020 9:00AM
0 to 17 | 619 | 67 | 0 | March 29, 2020 10:00AM
18 to 44 | 14233 | 1459 | 33 | March 29, 2020 10:00AM
45 to 64 | 11577 | 2765 | 162 | March 29, 2020 10:00AM
65 to 74 | 3954 | 1499 | 159 | March 29, 2020 10:00AM
75+ | 3020 | 1620 | 324 | March 29, 2020 10:00AM
Unknown | 71 | 0 | | March 29, 2020 10:00AM
0 to 17 | 714 | 72 | 1 | March 30, 2020 5:00PM
18 to 44 | 16028 | 1448 | 54 | March 30, 2020 5:00PM
45 to 64 | 13344 | 2887 | 216 | March 30, 2020 5:00PM
65 to 74 | 4496 | 1612 | 215 | March 30, 2020 5:00PM
75+ | 3410 | 1722 | 428 | March 30, 2020 5:00PM
Unknown | 95 | 0 | | March 30, 2020 5:00PM
0 to 17 | 757 | 74 | 1 | March 31, 2020 5:00PM
18 to 44 | 17347 | 1532 | 67 | March 31, 2020 5:00PM
45 to 64 | 14689 | 3195 | 259 | March 31, 2020 5:00PM
65 to 74 | 5015 | 1826 | 255 | March 31, 2020 5:00PM
75+ | 3866 | 1922 | 514 | March 31, 2020 5:00PM
Unknown | 97 | 0 | | March 31, 2020 5:00PM

NYCAgeSex | Cases | Hospitalized | Deaths | DateTime
Female | 3318 | | | March 21, 2020 5:00PM
Male | 4787 | | | March 21, 2020 5:00PM
Unknown | 10 | | | March 21, 2020 5:00PM
Female | 5255 | | | March 23, 2020 9:00AM
Male | 7097 | | | March 23, 2020 9:00AM
Unknown | 17 | | | March 23, 2020 9:00AM
Female | 6374 | | | March 24, 2020 10:00AM
Male | 8379 | | | March 24, 2020 10:00AM
Unknown | 23 | | | March 24, 2020 10:00AM
Female | 8655 | | | March 25, 2020 10:00AM
Male | 11325 | | | March 25, 2020 10:00AM
Unknown | 31 | | | March 25, 2020 10:00AM
Female | 10124 | 1918 | | March 26, 2020 5:00PM
Male | 12948 | 2801 | | March 26, 2020 5:00PM
Unknown | 40 | 1 | | March 26, 2020 5:00PM
Female | 11250 | | 151 | March 27, 2020 10:00AM
Male | 14279 | | 215 | March 27, 2020 10:00AM
Unknown | 44 | | | March 27, 2020 10:00AM
Female | 12928 | | 206 | March 28, 2020 10:00AM
Male | 16192 | | 311 | March 28, 2020 10:00AM
Unknown | 38 | | | March 28, 2020 10:00AM
Female | 14837 | 3001 | 264 | March 29, 2020 10:00AM
Male | 18593 | 4408 | 414 | March 29, 2020 10:00AM
Unknown | 44 | 1 | | March 29, 2020 10:00AM
Female | 16920 | 3130 | 334 | March 30, 2020 5:00PM
Male | 21120 | 4610 | 569 | March 30, 2020 5:00PM
Unknown | 47 | 1 | 1 | March 30, 2020 5:00PM
Female | 18677 | 3455 | 401 | March 31, 2020 5:00PM
Male | 23043 | 5093 | 694 | March 31, 2020 5:00PM
Unknown | 51 | 1 | 1 | March 31, 2020 5:00PM

zip code data not updated

First off, thank you for providing this data. I started looking at the data just a few days ago, and I noticed that the tests-by-zcta.csv file has been the same for the past 3 days (with the total number of cases standing at 94,499 for the city). Is this data being updated?

Probable Hospitalizations?

The "probable deaths" that you started publishing on the 14th are currently increasing NYC's total death toll by about 50%. Could we also say that there are probable hospitalizations that would similarly increase the total hospitalization count by around 50%? Or are those already factored into the published figure?

Recovery Data

First I want to thank you for this great effort of providing this data. It is very helpful and critical.

Then I'm just curious: are there any data on recovered cases in NYC? Since the outbreak has been going on for about a month, have there been any recoveries?

Thanks.

zip data vs cases file

I've noticed that the total number of positives in the ZIP code file (132,001) does not match the total number of cases in the case-hosp-death file (129,786). Any reason for that? I would think they would be the same.

Revised Historical Data by Zipcode

Hi,

I was wondering whether there is a repository with revised daily data for each ZIP code. Because of reporting delays, the past files have stale numbers.

I am trying to plot the trend of new cases for each individual ZIP code.

Thanks

NYC ER Visits/Admits for ILI and Pneumonia

Hey all - based on the line graphs here https://www1.nyc.gov/assets/doh/downloads/pdf/imm/covid-19-syndromic-surveillance-04082020-1.pdf

I was able to roughly compile total numbers for each age group and approximate totals - you can view those here https://github.com/briankoral713/NYC-COVID19-Data/blob/master/ER%20Visits-Admits%20NYC.pdf

Remember that these are rough estimates based on the line graphs and don't reflect actual numbers. Overall visits are trending down; however, the bulk of ER visits and admits are among the elderly (65+), and that share has been steadily increasing.

The age group total population numbers are from https://www.baruch.cuny.edu/nycdata/population-geography/pop-demography.htm

All-cause mortality?

I'm combining these data with the NCHS mortality surveillance survey, but NCHS has only reported total all-cause mortality for NYC through 3/28. Could you add all-cause mortality to these data?

Discrepancy with Daily Data Summary: Total Cases

Hi, thanks so much for making all these data available. How is "diagnosis date" defined (i.e., from test administered vs results received)? I note that there is a considerable difference between the figures in case-hosp-death.csv and the daily differences between the nyc.gov Daily Data Summary: Total Cases pdfs. Wondering if this might be explained by case-hosp-death.csv using the testing date and Daily Data Summary: Total Cases pdfs using the results date. Any insight appreciated. Thanks again.

tests-by-zcta NA values

Why do the current and previous versions of tests-by-zcta.csv have NA values in the first row? Is this an unincorporated area of NY?

Hospital Cases Discharged

Can the summary file of new cases, hospitalizations, and deaths by date be enhanced to include, as a subset of the hospitalized column, a count of cases that recovered and were released?

So much fluctuation in cases, hospitalizations and deaths today

Hello - I track the numbers daily and keep them for review. The differences between the case-hosp-death file currently posted and yesterday's are quite substantial. For example, March 20 has hovered around 3,600-3,900 cases since I started tracking on March 31, but in today's file it is at 2,594. In total, the number of positive cases nets out to 3,420 more than yesterday, which seems directionally correct, but the day-by-day numbers are very different. Any ideas?

COVID-19 Deaths by NYC Zip Code

Is there any way to modify the tests-by-zcta file to include confirmed deaths by zip code? It would be helpful to know the confirmed death rate alongside the positive test rate. If not, where might I find this data (whether as a GitHub project or other form)? Thanks!

Thank you

As someone living in this city I very much appreciate having this data openly available. The landing page is also very well done. Thank you for doing this, it is very much appreciated.

Disparity in NYC deaths between NYC DOHMH, NY State, & Johns Hopkins

I had found that for the most part the three sources all more or less agreed with each other until around a week ago, where NYC total cases reported by NY State and Johns Hopkins began to grow at a faster rate than cases reported by NYC's DOHMH. However, starting yesterday evening I noticed that NY State and Johns Hopkins suddenly were reporting far more deaths than NYC, and the gap grew by this evening (2,738 vs 3,485). I do not believe ~750 people were lost in a gap of a day or a few hours. Does anyone know anything about whether they are using different standards in confirming the deaths as COVID-19 related, or have any other explanation?

Added to Open Source COVID-19

Thanks for your work to help the people in need! Your site has been added to the Open-Source-COVID-19 page, which collects open source projects related to COVID-19, including maps, data, news, api, analysis, medical and supply information, etc. Please share to anyone who might need the information in the list, or will possibly contribute to some of those projects. You are also welcome to recommend more projects.

http://open-source-covid-19.weileizeng.com/

Cheers!

Lack of metadata

There is currently a substantial lack of metadata regarding what the numbers in this database actually represent. For example, what is the denominator for DEATH_RATE in by-age.csv? Is it confirmed positive cases in that age bracket, or is it total population?

Raw Counts in a Choropleth Map

Publishing the raw counts in a choropleth map is bad practice, since the data isn't normalized. The choropleths should reflect a rate.

death counts off by 5?

thank you for your service

summary.csv reports 4260
case-hosp-death.csv reports 4255

$ date ; date --utc ; curl -s https://raw.githubusercontent.com/nychealth/coronavirus-data/master/summary.csv | grep Deaths ; echo ; curl -s https://raw.githubusercontent.com/nychealth/coronavirus-data/master/case-hosp-death.csv | dos2unix | cut -d, -f4 | grep -v -e "DEATH_COUNT" -e "^\s*$" | paste -sd+ - ; echo ; curl -s https://raw.githubusercontent.com/nychealth/coronavirus-data/master/case-hosp-death.csv | dos2unix | cut -d, -f4 | grep -v -e "DEATH_COUNT" -e "^\s*$" | paste -sd+ - | bc
Thu Apr  9 08:44:49 EDT 2020
Thu Apr  9 12:44:49 UTC 2020
Deaths:,4260

1+1+2+6+9+7+21+25+44+36+45+80+89+110+167+179+230+247+266+319+337+361+362+336+379+327+235+34

4255

(where's my low-priority button)
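For anyone reproducing this check, here is a minimal Python equivalent of the shell pipeline above. It reads the DEATH_COUNT column (as in the post) by name rather than by position, and the csv module handles CRLF line endings, so dos2unix is not needed. Fetching the file is left out so the sketch runs offline; pass it the CSV text however you download it.

```python
import csv
import io


def sum_column(csv_text: str, column: str = "DEATH_COUNT") -> int:
    """Sum an integer column, skipping blank cells (like the grep filter above)."""
    total = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        value = (row.get(column) or "").strip()
        if value:
            total += int(value)
    return total
```

Comparing `sum_column(text)` against the "Deaths" line of summary.csv reproduces the 4255 vs 4260 check without depending on column order.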

Age-specific ZCTA-level data?

Thank you all for providing this data. I'm wondering if we can obtain the zipcode-level data for age groups. The current zipcode data are all cases.

Neighborhood & Race/ethnicity data

Thanks for creating this platform and regularly sharing the updated counts.

I would love to see these data presented by ZIP code in raw form so that they can be linked to existing government datasets on neighborhood-level factors. Currently, I see only test and positive counts by ZCTA. Are you planning to release hospitalizations, deaths, and recoveries by ZCTA as well?

Also, is there any plan to release these data by race/ethnicity? For social epidemiologists who examine disparities, this would be crucial information for understanding spread and burden in communities of color.

Stay safe everyone.

Consider appending rows to datasets instead of overwriting the entire table

It would be useful if each update was just a new row or rows appended to each table so users of the data can track changes over time.

Also, using ISO8601 timestamps for the "as of" value would also be helpful for machines reading the dataset.

I'm curious, is there a data standard you are complying with that wants the data in the shape it's currently published in?

For summary.csv this would look like:

as_of,cases,total_hospitalized,deaths
2020-03-25T17:28,20011,3922,280
2020-03-26T17:30,23112,4712,364
2020-03-27T17:30,26697,5039,450
2020-03-30T17:30,38087,7741,914
2020-03-31T17:30,41771,8549,1096

Is there a weekly dataset available?

It looks like the data are cumulative. Is there a dataset available that shows the number of cases each week for the same attributes (ZIP code, race, etc.)?

Total Deaths to Total Hospitalizations Ratio

First off, thanks so much for this! This data really provides insight into what is actually going on here in the epicenter, so thanks for all of your work in keeping this up to date.

I know hospitalization data are more or less estimated from ER visits/admits. What's interesting is that when you take the ratio of total deaths to total hospitalizations as a percentage, it increases by almost exactly one percentage point every day.

...is this just coincidence? Or is there perhaps a standard time frame from when a patient is admitted to the hospital to when they either succumb or recover? (The complement of this ratio, 100% minus the ratio, would give a "survival rate" of sorts: the percentage of those hospitalized who did not die.)

Or does it have to do with fewer people being hospitalized vs. succumbing day to day? The 1% increase is just intriguing.

Thanks again!
-Brian
