Using CartoDB to analyze and map New York demographic and crime data
Some draft notes and documentation on using CartoDB to do some data visualization and GIS, including how to collect that data from the source, and how to use SQL to wrangle it for analytical and visualization purposes.
- The Maps section contains example maps, as well as the SQL needed to create their datasets.
- The Datasets section documents where on the Web I found the data as well as the actual file downloaded.
- The Join/lookup tables section shows the SQL for deriving the intermediary datasets needed to link the official datasets, e.g. how the pretracts_lookup is used to associate each Census tract to a NYPD precinct, which allows for the calculation of Census population by NYPD precinct.
Maps (so far)
Census total population by NYPD precinct
Dataset and SQL
You can see the underlying dataset here: nypd_precincts_and_2010_census_pop
The SQL to create it from my source datasets:
SELECT
precinct_id,
precinct_pop,
cartodb_id,
the_geom_webmercator
FROM nypd_precincts
INNER JOIN
(SELECT
precinct_id,
SUM(total_pop) AS precinct_pop
FROM
pretracts_lookup
INNER JOIN census_pop
ON
pretracts_lookup.tract_id = census_pop.census_tract_code
AND
pretracts_lookup.borocode = census_pop.dcp_borough_code
GROUP BY precinct_id)
AS tx
ON tx.precinct_id = nypd_precincts.precinct;
Datasets
- census_pop - contains 2010 Census population by race data
- census_tracts - contains the shapes of the 2010 Census tracts
- nypd_precincts
- nypd_crimedata - NYPD's historical crime statistics at the precinct level
Join/lookup tables
pretracts_lookup
Requires a spatial join of the Census tracts to each precinct -- not knowing a lot about PostGIS but also knowing how annoying it is to define containment/intersection of real-life boundaries in general, I opted to join the tracts to precincts on the basis of whether the tract's centroid was within a precinct's boundary.
SELECT
borocode,
ct2010 AS tract_id,
precinct AS precinct_id
FROM census_tracts
INNER JOIN nypd_precincts
ON ST_WITHIN(
ST_Centroid(census_tracts.the_geom),
nypd_precincts.the_geom);