- Instructor(s):
- Mjumbe Poe, [email protected]
- Jingyi Li, [email protected]
- Schedule: Wednesdays, 10:15-1:15
- Room: Meyerson Hall, B2
- Office Hours:
Description | Schedule | Objectives | Format | Assignments | Grading | Academic Integrity
In this course you will learn how to collect, store, wrangle, and display cartographic data in a cloud-based setting. You will learn reproducible approaches for pulling spatial data from APIs, with emphasis on PostGIS, Airflow, and BigQuery; wrangling these data in Python and/or JavaScript; and visualizing them in platforms including Carto and Metabase. You will build your own APIs and develop your own custom web applications. This course is the second in a progression toward building web-based systems using geospatial data, and expands on the Fall course in JavaScript Programming for Planning.
There will be a strong emphasis on open-source tools, although we will also rely heavily on proprietary cloud-based infrastructure providers. Beyond the technologies used in class, we will work with large and sometimes messy datasets, deriving insights about how people inhabit, move around in, and affect their environments. These datasets are published by a variety of organizations, from the local to the national level, across governments, non-profits, and private corporations.
The class is divided into four modules:
- Spatial Analytics with Databases -- learn the basics of SQL and PostGIS for exploring datasets and answering questions of your data
- Scripting with Cloud Services -- building basic scripts with queries and interacting with web services/APIs programmatically
- Data Pipelining -- use Python or JavaScript and SQL to automate extracting, transforming, and loading data into a data warehouse
- Building Interfaces -- build a dashboard and APIs to answer operational questions using dynamic representations of data
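To make the pipelining module concrete, here is a minimal sketch of the extract-transform-load pattern it covers. This is an illustrative assumption, not course material: `sqlite3` stands in for a cloud warehouse such as BigQuery or a PostGIS database, and the schema, function names, and sample GeoJSON are all hypothetical.

```python
# Hypothetical ETL sketch: sqlite3 stands in for a cloud data warehouse,
# and the schema and sample data are illustrative only.
import json
import sqlite3


def extract(raw_geojson: str) -> list:
    """Extract: parse a GeoJSON FeatureCollection (here, from a string
    rather than a live web service)."""
    return json.loads(raw_geojson)["features"]


def transform(features: list) -> list:
    """Transform: flatten point features into (name, lon, lat) rows."""
    return [
        (f["properties"]["name"], *f["geometry"]["coordinates"])
        for f in features
    ]


def load(rows: list, conn: sqlite3.Connection) -> None:
    """Load: write the rows into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS places (name TEXT, lon REAL, lat REAL)"
    )
    conn.executemany("INSERT INTO places VALUES (?, ?, ?)", rows)
    conn.commit()


# A tiny sample payload, shaped like a GeoJSON API response.
raw = json.dumps({
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "properties": {"name": "Meyerson Hall"},
        "geometry": {"type": "Point", "coordinates": [-75.1927, 39.9522]},
    }],
})

conn = sqlite3.connect(":memory:")
load(transform(extract(raw)), conn)
count = conn.execute("SELECT COUNT(*) FROM places").fetchone()[0]
```

In class, each stage grows more realistic: extraction hits live APIs, transformation handles real geometries with PostGIS, and a scheduler such as Airflow runs the whole chain automatically.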
(Schedule is subject to change with the flow of the semester.)
W# | Date | Topic | Notes |
---|---|---|---|
1 | Jan 18 | Introduction | Slides & More |
2 | Jan 25 | Analytics: Spatial Databases & Querying Geospatial Data | Slides & More |
3 | Feb 1 | Analytics: Joins & More Geospatial SQL Operations | Slides & More |
4 | Feb 8 | Analytics: Efficient Queries | |
5 | Feb 15 | -(OVERFLOW -- We'll introduce BigQuery and some out-of-the-box visualization options here, time permitting)- | |
6 | Feb 22 | Scripting: Working with Data from Files and Web Services | |
7 | Mar 1 | Scripting: Warehousing Data | |
- | Mar 8 | -(SPRING BREAK)- | - |
8 | Mar 15 | Pipelines: Modeling and Transforming Geospatial Data | |
9 | Mar 22 | Pipelines: Implementing ETL in Cloud Services | |
10 | Mar 29 | Interfaces: Open Source Business Intelligence Tools | |
11 | Apr 5 | Interfaces: Rendering Data with Custom Applications (APIs and Templates) | |
12 | Apr 12 | | |
13 | Apr 19 | | |
14 | Apr 26 | | |
15 | May 3? | Final Project wrap-up | |
Students will learn how to use professional tools and cloud-based services to automate the process of preparing data for use in organizational decision making. By the end of this course students should be able to:
- Use SQL to answer questions about the data in a database
- Set up and use tools for exploring and visualizing data in a database
- Use web services to create beautiful and meaningful data products
- Use Python or JavaScript to automate the process of extracting, transforming, and loading data
- Do all of these things using professional software development tools and methods
- The course will be divided between lectures during the first half of class sessions, and exercises/labs in the second half.
- Lab sessions will be interactive, usually with a deliverable expected by the end of class that will count toward the participation portion of a student's grade.
- Students will have the option of attending the lecture and lab sessions in person in the classroom, or virtually through Zoom.
Some lectures will have accompanying assignments; others will have recommended readings and suggested exercises for additional practice. Labs will often include exercises intended to be completed in class or, in exceptional cases, soon after.
The final project will be the culmination of all of the skills learned in the class. Students will build an automatically updating data product, powered by a cloud-hosted data pipeline, that can be used to make operational decisions. Products are expected to address some socially relevant domain and to make use of multiple visualizations of data (static or interactive charts and maps, formatted statistics, templated prose, etc.).
- Assignments: 25%
- Participation: 25%
- Final Project Proposal: 10%
- Final Project: 40%
Some of the data we are using in this course is proprietary and cannot be openly disseminated. In these cases, students will be provided with access to private class repositories of datasets. Derivative insights based on these datasets can be openly shared, especially as part of final project work.
Students are expected to comply with and be familiar with Penn's Code of Academic Integrity.
When writing software, it is common to copy and paste small code snippets from online sources without citation. For larger samples, you are expected to cite the source in the code base. If you are uncertain, speak with your instructor for guidance.