Code Monkey home page Code Monkey logo

opendatadc's Introduction

DC Open Data

This repository has some code I'm using to work with DC's Open Data portal. Currently, this includes:

  1. R scripts used to pull data from the API, clean up some column names, and then push this data to a local postgres database
  2. a dbt project to clean up data in the database
  3. any SQL used to generate schemas, queries, etc. from the local postgres database

Motivation

Working with data from the API can be a bit annoying/time-consuming. Many of the datasets available have hundreds of thousands of records, and the API has a rate limit of 1000 records per request. My solution to this was to spin up a local postgres server, collect the data + do a bit of name cleanup in R, populate the postgres database, then use dbt to run some tests against the raw data + clean it up for analysis.

With this data now being available to me via SQL, it makes doing analysis quite a bit easier than having to tap the API every time I want a data source. In addition, using the offset in the API functions I wrote, I can append new data to the tables easily.

Example of Use

In the terminal, I can spin up my postgres dataset using brew:

brew services start postgresql@14

Now, instead of using the API directly, I can access information via SQL. This is quite powerful combined with R, since I can query the dataset directly and do things like plotting:

library(ggplot2)

source("utils.R")

## connect to the local postgresDB
conn <- connect_to_db()

## query the DB
conn |> 
  query_db(
    "
    SELECT 
      EXTRACT(YEAR FROM report_date)::Int AS report_year,
      COUNT(*)::INT AS n_crashes
    FROM dc_open_data_dev.raw__crashes
    WHERE report_date IS NOT NULL
      AND EXTRACT(YEAR FROM report_date) < 2023
        AND EXTRACT(YEAR FROM report_date) >= 2010
    GROUP BY EXTRACT(YEAR FROM report_date)
    ORDER BY report_year
    "
  ) |> 
## create a plot from the queried data
  ggplot() +
  aes(x = report_year, y = n_crashes) +
  geom_line(size = 1.5, color = "deepskyblue4") +
  labs(
    x = "\nYear Reported", 
    y = "Num. Reported Crashes", 
    title = "Crashes in Washington, D.C.", 
    subtitle = "sourced from DC's Open Data Portal"
  ) +
  scale_x_continuous(labels = round) +
  theme_minimal() +
  theme(
    panel.grid.major = element_blank(), 
    panel.grid.minor = element_blank(),
    panel.grid.major.y = element_line(color = "grey95", linetype = "dashed"),
    plot.title = element_text(size = 24, face = "bold"), 
    plot.subtitle = element_text(face = "italic")
  )

image

opendatadc's People

Contributors

mistermichaelll avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.