Daily publication of cleaned and tidy Texas county-level Covid-19 statistics, as published by Texas DSHS.
Original data is sourced from https://www.dshs.state.tx.us/coronavirus/additionaldata/. Beware: the source files are messy Excel workbooks.
Tidy data can be accessed here:
- Daily Cases: https://raw.githubusercontent.com/nikolkj/Texas-Covid/master/daily-county-data/Texas-County-Cases.csv
- Daily Fatalities: https://raw.githubusercontent.com/nikolkj/Texas-Covid/master/daily-county-data/Texas-County-Deaths.csv
- Daily Tests: https://raw.githubusercontent.com/nikolkj/Texas-Covid/master/daily-county-data/Texas-County-Tests.csv
- Combined Daily File: https://raw.githubusercontent.com/nikolkj/Texas-Covid/master/daily-county-data/Texas-County-Main.csv
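The four links above share one naming pattern, so they can be built programmatically. The following is a minimal sketch, not part of the repository: `base_url`, `measures`, `urls`, and `read_measure()` are hypothetical names introduced here for illustration, and the helper assumes the `readr` package is installed.

```r
# Build the four raw-file URLs from their shared pattern.
base_url <- "https://raw.githubusercontent.com/nikolkj/Texas-Covid/master/daily-county-data/"
measures <- c("Cases", "Deaths", "Tests", "Main")
urls <- setNames(paste0(base_url, "Texas-County-", measures, ".csv"), measures)

# Hypothetical helper: download one of the files by its short name.
# Calling it requires network access and the readr package.
read_measure <- function(measure) {
  readr::read_csv(urls[[measure]], col_names = TRUE, progress = FALSE)
}

# Example call (commented out so the sketch runs offline):
# cases <- read_measure("Cases")
```

Keeping the URLs in a named vector makes it easy to loop over all files, for example with `lapply(names(urls), read_measure)`.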
Data has been cleaned and put in a long format for easy visualization and modeling.
All data-tables have the following fields:
- "County": Texas county name
- "Date": Date associated with observation, YYYY-MM-DD format.
- "DailyCount": Cumulative measure to date, as published by DSHS.
- "DailyDelta": Calculated daily measure ($x_{t} - x_{t-1}$), e.g. new cases for a given day.
- "LastUpdateDate": Date when the data was pulled.
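To make the "DailyDelta" definition concrete, here is a minimal sketch of how such a column could be derived from "DailyCount" with `dplyr`. The toy values are made up for illustration, and treating the first observation of each county as `NA` is an assumption, not a confirmed detail of how the published files are built.

```r
library(dplyr)

# Toy cumulative counts; the values are made up for illustration only.
toy <- tibble(
  County = rep(c("A", "B"), each = 4),
  Date = rep(as.Date("2020-05-01") + 0:3, times = 2),
  DailyCount = c(10, 12, 12, 17, 0, 3, 4, 9)
)

# Within each county, subtract the previous day's cumulative count
# (x_t - x_{t-1}). The first day of each county has no predecessor,
# so its delta is NA.
toy_delta <- toy %>%
  group_by(County) %>%
  arrange(Date, .by_group = TRUE) %>%
  mutate(DailyDelta = DailyCount - lag(DailyCount)) %>%
  ungroup()
```

Grouping by county before calling `lag()` matters: without it, the first day of one county would be differenced against the last day of the previous county.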
DSHS updates its data every day at around 9:30am CST; the tidy data is then updated at 10:30am CST.
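Because of this schedule, a file read early in the morning may still hold the previous day's pull. A rough way to check, assuming only the "LastUpdateDate" column described above, is sketched below; `is_stale()` is a hypothetical helper, not something provided by the repository.

```r
# Hypothetical helper: TRUE when the newest LastUpdateDate is older than `today`.
# Sys.Date() uses the local time zone, which may differ from CST.
is_stale <- function(dat, today = Sys.Date()) {
  max(dat$LastUpdateDate, na.rm = TRUE) < today
}

# Toy input with made-up dates, for illustration only.
example <- data.frame(LastUpdateDate = as.Date(c("2020-06-10", "2020-06-10")))
same_day <- is_stale(example, today = as.Date("2020-06-10"))
next_day <- is_stale(example, today = as.Date("2020-06-11"))
```

Here `same_day` is `FALSE` and `next_day` is `TRUE`.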
Read the data from the GitHub link.
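library(tidyverse) # provides read_csv(), the dplyr verbs, and ggplot2
library(knitr) # provides kable()
dat = read_csv(file = "https://raw.githubusercontent.com/nikolkj/Texas-Covid/master/daily-county-data/Texas-County-Cases.csv", col_names = TRUE, progress = FALSE)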
## Parsed with column specification:
## cols(
## County = col_character(),
## Date = col_date(format = ""),
## DailyCount = col_double(),
## DailyDelta = col_double(),
## LastUpdateDate = col_date(format = "")
## )
Examine a random sample of the data.
dat %>%
filter(Date > "2020-04-15", DailyCount > 100) %>%
sample_n(15) %>%
kable() %>% kableExtra::kable_styling(kable_input = ., bootstrap_options = c("striped", "hover"))
County | Date | DailyCount | DailyDelta | LastUpdateDate |
---|---|---|---|---|
Collin | 2020-05-21 | 1090 | 17 | 2020-06-10 |
Kaufman | 2020-05-10 | 116 | 0 | 2020-06-10 |
Grayson | 2020-06-03 | 350 | 8 | 2020-06-10 |
Hidalgo | 2020-05-07 | 359 | 6 | 2020-06-10 |
Montgomery | 2020-06-07 | 1064 | 0 | 2020-06-10 |
Hardin | 2020-05-23 | 136 | 11 | 2020-06-10 |
Potter | 2020-05-21 | 2196 | 3 | 2020-06-10 |
Bowie | 2020-06-05 | 301 | 5 | 2020-06-10 |
Randall | 2020-05-17 | 602 | 9 | 2020-06-10 |
Bell | 2020-05-15 | 242 | 5 | 2020-06-10 |
Taylor | 2020-05-02 | 327 | 8 | 2020-06-10 |
Harris | 2020-06-02 | 12664 | 388 | 2020-06-10 |
Hays | 2020-06-07 | 385 | 0 | 2020-06-10 |
Hardin | 2020-05-30 | 138 | 0 | 2020-06-10 |
Coryell | 2020-05-15 | 221 | 1 | 2020-06-10 |
Find the date on which new cases peaked in each county, then take the top 10 counties by peak size.
dat %>% group_by(County) %>%
filter(DailyDelta == max(DailyDelta, na.rm = TRUE)) %>%
rename(PeakDate = Date, PeakCases = DailyDelta) %>%
arrange(desc(PeakCases)) %>% head(n = 10) %>%
select(County, PeakDate, PeakCases) %>%
kable() %>% kableExtra::kable_styling(kable_input = ., bootstrap_options = c("striped", "hover"), full_width = FALSE, position = "left")
County | PeakDate | PeakCases |
---|---|---|
Harris | 2020-04-10 | 706 |
Potter | 2020-05-16 | 618 |
Walker | 2020-05-31 | 510 |
Tarrant | 2020-05-11 | 485 |
Dallas | 2020-05-22 | 369 |
Jones | 2020-05-28 | 222 |
El Paso | 2020-06-04 | 197 |
Bexar | 2020-05-31 | 189 |
Moore | 2020-06-02 | 149 |
Medina | 2020-06-06 | 138 |
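One caveat with filtering on `DailyDelta == max(DailyDelta)` is that a county reaching the same peak on two different days contributes two rows. The sketch below shows one possible way to keep a single row per county by preferring the earliest peak date. It uses made-up toy values, and the tie-breaking rule is an assumption chosen for illustration, not the approach used in this repository.

```r
library(dplyr)

# Toy daily deltas; county "A" hits its peak of 9 on two days.
toy <- tibble(
  County = c("A", "A", "A", "B", "B"),
  Date = as.Date("2020-05-01") + c(0, 1, 2, 0, 1),
  DailyDelta = c(5, 9, 9, NA, 4)
)

peaks <- toy %>%
  filter(!is.na(DailyDelta)) %>%
  group_by(County) %>%
  # Largest delta first; among ties, the earliest date first.
  arrange(desc(DailyDelta), Date, .by_group = TRUE) %>%
  slice(1) %>%
  ungroup() %>%
  rename(PeakDate = Date, PeakCases = DailyDelta) %>%
  arrange(desc(PeakCases))
```

With this toy input, `peaks` has exactly one row per county, and county "A" reports 2020-05-02 rather than both tied dates.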
dat %>%
filter(!is.na(DailyDelta),
County %in% c("Harris","Dallas","Bexar","Walker")) %>%
mutate(County = factor(County)) %>%
select(County, Date, DailyDelta) %>%
ggplot(data = ., mapping = aes(x = Date, y = DailyDelta, col = County)) +
geom_line() +
ggtitle("New Cases", subtitle = "For select counties") +
ylab("") + xlab("") +
scale_x_date(labels = scales::date_format(format = "%m/%d")) +
ggthemes::theme_fivethirtyeight()
dat %>%
filter(County %in% c("Harris","Dallas","Bexar","Walker"),
DailyCount > 0,
Date > "2020-03-15") %>%
mutate(County = factor(County)) %>%
select(County, Date, DailyCount) %>%
ggplot(data = ., mapping = aes(x = Date, y = DailyCount, col = County)) +
geom_line() +
ggtitle("Total Cases", subtitle = "For select counties") +
ylab("") + xlab("") +
scale_y_continuous(na.value = 0, trans = "log10", labels = scales::number_format(big.mark = ",", accuracy = 1)) +
scale_x_date(labels = scales::date_format(format = "%m/%d")) +
ggthemes::theme_fivethirtyeight()