cmaq-exposure-api's Introduction

cmaq-exposure-api

The CMAQ Exposure API is a RESTful data service, implemented with Swagger using the OpenAPI 2.0 specification, that provides environmental CMAQ exposure data for a given geocode (latitude, longitude) and date range.

TL;DR

cd cmaq-exposure-api
./run-cmaq-api.sh

Open http://localhost:5000/v1/ui/#/default in your browser when the script completes.
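Behind the UI is a plain HTTP interface; a values request URL can be composed as below. The /values path and the parameter names are assumptions drawn from the issue discussion in this document and should be checked against the served spec:

```python
from urllib.parse import urlencode

BASE = "http://localhost:5000/v1"  # local dev server started by run-cmaq-api.sh

def values_url(variable, lat, lon, start, end, **extra):
    """Compose a /values request URL; parameter names are illustrative."""
    params = {"exposure_type": variable, "latitude": lat, "longitude": lon,
              "start_date": start, "end_date": end}
    params.update(extra)
    return BASE + "/values?" + urlencode(params)

print(values_url("o3", 35.9132, -79.0558, "2011-01-01", "2011-01-07"))
```

The resulting URL can be pasted into a browser or fetched with curl once the server is running.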

Development Environment

Preliminary assumptions

  • Docker and docker-compose are available on the host

    • Generally executed using a bash script that performs docker or docker-compose calls
  • Python 3 is available on the host

    • Generally executed using virtualenv as follows:

       $ virtualenv -p /PATH_TO/python3 venv
       $ source venv/bin/activate
       (venv)$ pip install -r requirements.txt
       (venv)$ python SOMETHING.py [/PATH_TO/SOME_FILE]
      

Repository structure

The repository is divided into multiple sections based on the infrastructure, application, or task being addressed. Each section is briefly described here, with a more detailed overview in a README.md file at each primary directory level.

PostgreSQL 9.6 / PostGIS 2.3:

  • Docker-compose based development database
  • See README.md in postgres96/

Sample Data:

  • Initialization scripts for PostgreSQL cmaq database and tables
  • Representative CMAQ data in SQL format
  • See README.md in data-sample/

Data Tools:

  • pre-ingest: checks to validate CMAQ source data against the PostgreSQL database schema
  • ingest: scripts for reading the CMAQ source data into the PostgreSQL database
  • postgres-functions: indexes and function generation tools
  • post-ingest: scripts for updating the aggregate values of newly ingested data
  • See README.md in data-tools/

Server

  • Python3/Flask based API server
  • Docker implementation of the API server
  • See README.md in server/

Client

  • TODO

Swagger Editor

  • TODO

See INSTALL.md for full details.

About CMAQ / CMAS

CMAQ is an active open-source development project of the U.S. EPA that consists of a suite of programs for conducting air quality model simulations. CMAQ is supported and distributed by the Community Modeling and Analysis System (CMAS) Center.

CMAQ combines current knowledge in atmospheric science and air quality modeling with multi-processor computing techniques in an open-source framework to deliver fast, technically sound estimates of ozone, particulates, toxics, and acid deposition.

cmaq-exposure-api's People

Contributors

lstillwe, mjstealey

cmaq-exposure-api's Issues

provide bravado client example

Though standalone clients can be generated using Swagger Codegen, Python users find the bravado package useful for generating the API client on the fly.
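A minimal bravado sketch might look like the following. The spec URL assumes the server publishes its Swagger 2.0 spec at /v1/swagger.json, which should be verified against the running server; `make_client` is shown but not invoked here since it requires a live API:

```python
def spec_url(host="localhost", port=5000):
    """Location of the Swagger 2.0 spec served by the API (assumed path)."""
    return "http://%s:%d/v1/swagger.json" % (host, port)

def make_client(url=None):
    """Build a client on the fly from the live spec (requires `pip install bravado`)."""
    from bravado.client import SwaggerClient
    return SwaggerClient.from_url(url or spec_url())
```

With a server running, `make_client()` returns a client whose operations are named after the operationId fields in the spec.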

script for updating common_name attribute of exposure_list table

Create a script to update the common_name attribute for CMAQ exposure variables.

  • The update-cmaq-tables.py script populates the exposure_list table based on attributes discovered in the CMAQ source data files.
  • It does not, however, have the ability to update the common_name attribute, as that value will be defined manually by @arunacs for the variables of interest.

Example of existing exposure_list table:

 id |   type    |           description           | units | common_name |  utc_min_date_time  |  utc_max_date_time  |     resolution      | aggregation
----+-----------+---------------------------------+-------+-------------+---------------------+---------------------+---------------------+-------------
  1 | ald2      | 1000.0*ALD2[1]                  | ppbV  |             | 2011-01-01 01:00:00 | 2011-02-01 01:00:00 | hour;day;7day;14day | max;avg
  2 | aldx      | 1000.0*ALDX[1]                  | ppbV  |             | 2011-01-01 01:00:00 | 2011-02-01 01:00:00 | hour;day;7day;14day | max;avg
  3 | benzene   | 1000.0*BENZENE[1]               | ppbV  |             | 2011-01-01 01:00:00 | 2011-02-01 01:00:00 | hour;day;7day;14day | max;avg
  4 | co        | 1000.0*CO[1]                    | ppbV  |             | 2011-01-01 01:00:00 | 2011-02-01 01:00:00 | hour;day;7day;14day | max;avg
  5 | eth       | 1000.0*ETH[1]                   | ppbV  |             | 2011-01-01 01:00:00 | 2011-02-01 01:00:00 | hour;day;7day;14day | max;avg
  6 | etha      | 1000.0*ETHA[1]                  | ppbV  |             | 2011-01-01 01:00:00 | 2011-02-01 01:00:00 | hour;day;7day;14day | max;avg
  7 | form      | 1000.0*FORM[1]                  | ppbV  |             | 2011-01-01 01:00:00 | 2011-02-01 01:00:00 | hour;day;7day;14day | max;avg
  8 | h2o2      | 1000.0*H2O2[1]                  | ppbV  |             | 2011-01-01 01:00:00 | 2011-02-01 01:00:00 | hour;day;7day;14day | max;avg
  9 | hno3      | 1000.0*HNO3[1]                  | ppbV  |             | 2011-01-01 01:00:00 | 2011-02-01 01:00:00 | hour;day;7day;14day | max;avg
 10 | hno3_ugm3 | 1000.0*(HNO3[1]*2.1756*DENS[3]) | ug/m3 |             | 2011-01-01 01:00:00 | 2011-02-01 01:00:00 | hour;day;7day;14day | max;avg
(10 rows)
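A sketch of such a script: a hand-maintained mapping drives parameterized UPDATE statements against exposure_list. The common names below are illustrative placeholders (the real values come from @arunacs), and `apply_common_names` is defined but not executed since it needs a reachable cmaq database:

```python
# Illustrative mapping from exposure_list.type to a human-readable common_name.
COMMON_NAMES = {
    "co": "carbon monoxide",
    "benzene": "benzene",
    "form": "formaldehyde",
}

UPDATE_SQL = "UPDATE exposure_list SET common_name = %s WHERE type = %s"

def update_params(mapping):
    """Turn the mapping into (common_name, type) pairs for executemany."""
    return [(name, var_type) for var_type, name in mapping.items()]

def apply_common_names(mapping, dsn="dbname=cmaq"):
    """Apply the mapping with psycopg2 (requires a reachable cmaq database)."""
    import psycopg2
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.executemany(UPDATE_SQL, update_params(mapping))
```

Keeping the mapping in one dict makes the manual curation step a simple edit-and-rerun.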

quality metrics retrieval flag - default to false

Allow the user to choose whether to retrieve quality metrics for the CMAQ variables that have them.

  • Add a new parameter named include_quality_metric to the /values path, defaulting to false

Per discussion with @hyi re: hackathon experience with @arunacs

  • There was interest in retrieving CMAQ data in yearly increments for ozone; however, the existing quality metric implementation slows the query to the point of being ineffective at that scale.
  • Issue #18 has been created to address speeding up quality metrics queries
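In a Flask/connexion handler, the new flag could be surfaced as an optional boolean defaulting to false. The parameter name comes from this issue; the handler shape and the `parse_flag` helper are assumptions:

```python
def parse_flag(value, default=False):
    """Interpret a query-string boolean such as include_quality_metric."""
    if value is None:
        return default
    return str(value).strip().lower() in ("1", "true", "yes")

def get_values(latitude, longitude, include_quality_metric="false", **kwargs):
    """Sketch of a /values handler: quality metrics are skipped unless requested."""
    want_qm = parse_flag(include_quality_metric)
    # ... query CMAQ values; join in quality metrics only when want_qm is True ...
    return {"include_quality_metric": want_qm}
```

Defaulting to false means the slow quality-metric path is only taken when a caller opts in.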

refactor CMAQ ingest to be more efficient / faster

  • The existing implementation works well for small data sets, but takes weeks (or longer) to ingest data formatted at the 299 x 459 grid size

Example:

Processes started to ingest the 2011 CMAQ data:

25124 Sat Dec 23 20:24:15 2017 ./venv/bin/python ./ingest-cmaq-file.py /projects/datatrans/CMAQ/2011/raw/CCTM_CMAQ_v51_Release_Oct23_NoDust_
25268 Sat Dec 23 20:40:27 2017 ./venv/bin/python ./ingest-cmaq-file.py /projects/datatrans/CMAQ/2011/raw/CCTM_CMAQ_v51_Release_Oct23_NoDust_
25269 Sat Dec 23 20:40:27 2017 ./venv/bin/python ./ingest-cmaq-file.py /projects/datatrans/CMAQ/2011/raw/CCTM_CMAQ_v51_Release_Oct23_NoDust_
25270 Sat Dec 23 20:40:27 2017 ./venv/bin/python ./ingest-cmaq-file.py /projects/datatrans/CMAQ/2011/raw/CCTM_CMAQ_v51_Release_Oct23_NoDust_
25271 Sat Dec 23 20:40:27 2017 ./venv/bin/python ./ingest-cmaq-file.py /projects/datatrans/CMAQ/2011/raw/CCTM_CMAQ_v51_Release_Oct23_NoDust_
25272 Sat Dec 23 20:40:27 2017 ./venv/bin/python ./ingest-cmaq-file.py /projects/datatrans/CMAQ/2011/raw/CCTM_CMAQ_v51_Release_Oct23_NoDust_
25273 Sat Dec 23 20:40:27 2017 ./venv/bin/python ./ingest-cmaq-file.py /projects/datatrans/CMAQ/2011/raw/CCTM_CMAQ_v51_Release_Oct23_NoDust_
25274 Sat Dec 23 20:40:27 2017 ./venv/bin/python ./ingest-cmaq-file.py /projects/datatrans/CMAQ/2011/raw/CCTM_CMAQ_v51_Release_Oct23_NoDust_
25275 Sat Dec 23 20:40:27 2017 ./venv/bin/python ./ingest-cmaq-file.py /projects/datatrans/CMAQ/2011/raw/CCTM_CMAQ_v51_Release_Oct23_NoDust_
25276 Sat Dec 23 20:40:27 2017 ./venv/bin/python ./ingest-cmaq-file.py /projects/datatrans/CMAQ/2011/raw/CCTM_CMAQ_v51_Release_Oct23_NoDust_
25277 Sat Dec 23 20:40:27 2017 ./venv/bin/python ./ingest-cmaq-file.py /projects/datatrans/CMAQ/2011/raw/CCTM_CMAQ_v51_Release_Oct23_NoDust_
25278 Sat Dec 23 20:40:27 2017 ./venv/bin/python ./ingest-cmaq-file.py /projects/datatrans/CMAQ/2011/raw/CCTM_CMAQ_v51_Release_Oct23_NoDust_

As of 2018-01-16, the most populated set is January 2011, with 90 of 459 columns completed; the remaining months have between 21 and 24 of 459 columns completed.

psql (9.6.6)
Type "help" for help.

cmaq=# select utc_date_time::date, max(row), max(col) from exposure_data where utc_date_time::date >= '2011-01-01' group by utc_date_time::date order by utc_date_time::date;
 utc_date_time | max | max
---------------+-----+-----
 2011-01-01    | 299 |  90
 2011-01-02    | 299 |  90
 2011-01-03    | 299 |  90
 2011-01-04    | 299 |  90
 2011-01-05    | 299 |  90
 2011-01-06    | 299 |  90
 2011-01-07    | 299 |  90
 2011-01-08    | 299 |  90
 2011-01-09    | 299 |  90
 2011-01-10    | 299 |  90
 2011-01-11    | 299 |  90
 2011-01-12    | 299 |  90
 2011-01-13    | 299 |  90
 2011-01-14    | 299 |  90
 2011-01-15    | 299 |  90
 2011-01-16    | 299 |  90
 2011-01-17    | 299 |  90
 2011-01-18    | 299 |  90
 2011-01-19    | 299 |  90
 2011-01-20    | 299 |  90
 2011-01-21    | 299 |  90
 2011-01-22    | 299 |  90
 2011-01-23    | 299 |  90
 2011-01-24    | 299 |  90
 2011-01-25    | 299 |  90
 2011-01-26    | 299 |  90
 2011-01-27    | 299 |  90
 2011-01-28    | 299 |  90
 2011-01-29    | 299 |  90
 2011-01-30    | 299 |  90
 2011-01-31    | 299 |  90
 2011-02-01    | 299 |  90
 2011-02-02    | 299 |  24
 2011-02-03    | 299 |  24
 2011-02-04    | 299 |  24
 2011-02-05    | 299 |  24
 2011-02-06    | 299 |  24
 2011-02-07    | 299 |  24
...
 2011-12-22    | 299 |  21
 2011-12-23    | 299 |  21
 2011-12-24    | 299 |  21
 2011-12-25    | 299 |  21
 2011-12-26    | 299 |  21
 2011-12-27    | 299 |  21
 2011-12-28    | 299 |  21
 2011-12-29    | 299 |  21
 2011-12-30    | 299 |  21
 2011-12-31    | 299 |  21
 2012-01-01    | 299 |  21
(366 rows)
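One common way to make ingest dramatically faster is to stream rows through PostgreSQL's COPY protocol instead of issuing per-row INSERTs. The helper below serializes row tuples into an in-memory CSV buffer; `copy_rows` is shown but not executed, and the target column list is illustrative:

```python
import csv
import io

def rows_to_csv(rows):
    """Serialize an iterable of row tuples into a CSV buffer suitable for COPY."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    buf.seek(0)
    return buf

# Illustrative target columns; only utc_date_time, row, col are confirmed above.
COPY_SQL = ("COPY exposure_data (utc_date_time, row, col, o3) "
            "FROM STDIN WITH (FORMAT csv)")

def copy_rows(rows, dsn="dbname=cmaq"):
    """Bulk-load rows via COPY (requires a reachable cmaq database)."""
    import psycopg2
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.copy_expert(COPY_SQL, rows_to_csv(rows))
```

COPY moves the parsing work into the server and typically runs orders of magnitude faster than looped INSERTs.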

Use outer join to add quality metrics data instead of loop

Currently, the quality metrics data are queried separately and merged into the CMAQ values data via a utc_date_time comparison in a loop.

This is inefficient; it should instead be done as an outer join within the CMAQ values query itself.

Per discussion with @hyi regarding the hackathon experience with @arunacs: the existing implementation is too slow for queries over a year-long range.

generate NetCDF to PostgreSQL scripts

Bypass the intermediate CSV stage and go directly from the CMAQ NetCDF source files to PostgreSQL database tables.

  • option for batch operation
  • python based
  • should not attempt to modify schema (use pre-ingest scripts instead)
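A sketch of the direct path, assuming the netCDF4 package: read each variable as an array and hand fixed-size batches to the database writer. Only the batching helper is generic; `ingest_file` (file name, variable name, dimension order) is illustrative and not executed here:

```python
def batches(iterable, size):
    """Yield lists of at most `size` items, keeping each DB transaction small."""
    batch = []
    for item in iterable:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

def ingest_file(path, writer, batch_size=10000):
    """Stream one variable from a NetCDF file to `writer` (requires netCDF4)."""
    from netCDF4 import Dataset
    ds = Dataset(path)
    o3 = ds.variables["O3"][:]  # assumed (time, layer, row, col) layout
    cells = ((t, r, c, float(o3[t, 0, r, c]))
             for t in range(o3.shape[0])
             for r in range(o3.shape[2])
             for c in range(o3.shape[3]))
    for batch in batches(cells, batch_size):
        writer(batch)  # e.g. a COPY-based bulk loader
```

Because the schema is untouched, the pre-ingest scripts remain the single place where tables are defined.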

update server documentation

  • document server section under server/
  • add server refs to README.md at main level
  • add full workflow doc using existing references
    • local development
    • production deployment

generate PostgreSQL functions for pre-calculating exposure aggregates

The majority of queries will request daily, 7-day, or 14-day aggregates of the hourly CMAQ data.

  • create a function that looks for non-calculated aggregates and generates values for them
  • the function should scope the size of each transaction so that it is runnable on a modest system
  • PostgreSQL functions do not commit until the entire transaction completes; any new functions should keep this in mind
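The hour-to-day roll-up at the heart of such a function is a grouped query over the hourly rows. The sketch below uses an in-memory SQLite stand-in with a single `value` column for illustration; the real implementation would be a PostgreSQL function over exposure_data, committed in bounded chunks:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE exposure_data (utc_date_time TEXT, row INT, col INT, value REAL);
    INSERT INTO exposure_data VALUES
        ('2011-01-01 01:00:00', 1, 1, 30.0),
        ('2011-01-01 13:00:00', 1, 1, 40.0),
        ('2011-01-02 01:00:00', 1, 1, 20.0);
""")

# Pre-computed daily max/avg per grid cell -- the "day" aggregation with
# "max;avg" shown in the exposure_list table above.
daily = conn.execute("""
    SELECT date(utc_date_time) AS day, row, col,
           MAX(value) AS value_max, AVG(value) AS value_avg
    FROM exposure_data
    GROUP BY day, row, col
    ORDER BY day
""").fetchall()
print(daily)
```

The 7-day and 14-day aggregates follow the same pattern over wider date windows.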
