trace's Introduction

carbonplan / trace

working repo for carbonplan's climate trace project

This repository includes example Jupyter notebooks and other utilities for a collaborative project CarbonPlan is working on to track emissions from biomass losses in forests.

This project is a work in progress. Nothing here is final or complete.

We have completed the scripts and notebooks for delivery of version 0 of the dataset (carbonplan_trace/v0), which largely reproduces and extends work by Zarin et al. (2016) as hosted on the Global Forest Watch platform.

Input datasets include:

  • aboveground biomass for year 2000 (Zarin et al., 2016)
  • binary masks of tree cover loss year for 2001-2020 (Hansen et al., 2013)
  • Suomi NPP (VIIRS) Fire Masks for 2011-2020 (Schroeder et al., 2014)
  • country boundary shapefile from the Database of Global Administrative Areas (GADM) version 3.6. Note: Geopolitical boundaries that have changed over the period of record will be tagged to the static country designation as defined in GADM v3.6.

Some tips for reproducing this effort:

  • The scripts/aggregate_emissions.v0.py script can be run to reproduce both the 3 km global emissions raster dataset and the country-average estimates. As a warning, depending on the size of the machine you're running on, you might encounter memory issues when dealing with the 30m datasets. For that reason, we opted to process the 30m tiles in serial. If you are struggling, check that your cluster isn't getting overloaded.
  • We recommend using the sample notebook notebooks/blogpost_sample_notebook.ipynb as a starting point to introduce yourself to the structure of the data and how to work with a high resolution global product. We emphasize that the 3km product may be sufficient for some users; a minimal sketch for opening the coarse product follows this list.
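
The following is a minimal sketch, not the project's published recipe, for opening a coarse zarr store with xarray and poking at its structure. The store path and the "emissions"/"year" names are assumptions; substitute the actual v0 output location.

```python
import xarray as xr

# hypothetical location of the 3 km global emissions zarr store
store = "s3://example-bucket/carbonplan-trace/v0/emissions_3km.zarr"

# open lazily and inspect variables, coordinates, and chunking
ds = xr.open_zarr(store, consolidated=True)
print(ds)

# e.g. total emissions for a single year, summed over space
# (assumes an "emissions" variable with year/lat/lon dimensions)
annual_total = ds["emissions"].sel(year=2019).sum(dim=["lat", "lon"])
print(annual_total.compute())
```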

license

All the code in this repository is MIT-licensed. Where possible, the data used by this project is licensed under CC-BY-4.0. We include attribution and additional license information for third party datasets, and we request that you maintain that attribution if using this data.

about us

CarbonPlan is a nonprofit organization that uses data and science for climate action. We aim to improve the transparency and scientific integrity of climate solutions with open data and tools. Find out more at carbonplan.org or get in touch by opening an issue or sending us an email.

trace's People

Contributors

dependabot[bot], freeman-lab, katamartin, maxrjones, norlandrhagen, orianac, pre-commit-ci[bot], tcchiao

trace's Issues

consolidate tile utilities

@tcchiao and I have both been writing some useful utilities for parsing 10x10-degree tile IDs. We should consider consolidating these into carbonplan_trace.tiles.
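
As a point of reference, here is a rough sketch (not the existing utilities) of what such a parser might look like, assuming Hansen-style tile IDs like "50N_120W" that name the top-left corner of a 10x10-degree tile.

```python
import re

def parse_tile_id(tile_id: str) -> dict:
    """Return lat/lon bounds for a tile id like '50N_120W'.

    Assumes the id names the NW corner of a 10x10-degree tile.
    """
    match = re.match(r"^(\d+)([NS])_(\d+)([EW])$", tile_id)
    if match is None:
        raise ValueError(f"unrecognized tile id: {tile_id}")
    lat_str, ns, lon_str, ew = match.groups()
    lat = int(lat_str) * (1 if ns == "N" else -1)
    lon = int(lon_str) * (1 if ew == "E" else -1)
    return {"min_lat": lat - 10, "max_lat": lat, "min_lon": lon, "max_lon": lon + 10}

print(parse_tile_id("50N_120W"))
# {'min_lat': 40, 'max_lat': 50, 'min_lon': -120, 'max_lon': -110}
```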

30m data export

We're interested in testing a 30m web map. That'll require generating the pyramid starting from a much higher resolution version of the data (either 30m, or something slightly coarser). We can start experimenting with this and document any challenges here (or over on ndpyramid).
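
As a starting point for experimenting, a generic coarsening loop with plain xarray is sketched below; the actual pyramid for the web map would likely be built with ndpyramid. The store names, dimension names, and coarsening factors are assumptions.

```python
import xarray as xr

ds = xr.open_zarr("30m_biomass.zarr")  # hypothetical 30 m input store

# write progressively coarser levels; level 0 is the native resolution
for level, factor in enumerate([1, 2, 4, 8, 16]):
    if factor == 1:
        coarse = ds
    else:
        coarse = ds.coarsen(lat=factor, lon=factor, boundary="trim").mean()
    coarse.to_zarr(f"pyramid/level_{level}.zarr", mode="w")
```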

Running list of TODOs

For MVP for Washington:

  • basic form of training dataset:
    • All GLAS shots translated to biomass using one allometric equation (Cindy) [done]
    • Look up sampling strategy of GLAS and allometric equation assumptions wrt leaf conditions (Cindy/Ori) [done]
    • Calculate seasonal averages for each year from Landsat as a spatially continuous map for WA (Ori + Joe); relatedly, decide on the Landsat data structure (snap to a uniform Hansen 30m grid, annually)
    • Extract the Landsat variables to use into a tabular format (all raw bands)
  • set up ML model for training (Cindy)
    • random forest + XGBoost!
    • set up inference function (a minimal training/inference sketch follows this list)
  • Set up inference inputs
    • extract the same Landsat variables into tabular format for all of Washington
  • Plotting function from ML model output (altair)(Ori) (lat/lon/time)
    • spatial maps
    • time series
  • Set up validation dataset
    • Find 4 well-respected datasets
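
Here is a minimal sketch of the training/inference setup listed above, assuming a tabular training set with raw Landsat band columns and a GLAS-derived biomass column; the file path and column names are hypothetical. An xgboost.XGBRegressor could be swapped in for the random forest.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

df = pd.read_parquet("training_data_wa.parquet")  # hypothetical training table
features = ["band1", "band2", "band3", "band4", "band5", "band7"]  # raw Landsat bands

X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["biomass"], test_size=0.2, random_state=0
)

model = RandomForestRegressor(n_estimators=100, n_jobs=-1, random_state=0)
model.fit(X_train, y_train)
print("held-out R^2:", model.score(X_test, y_test))

def predict_biomass(model, table: pd.DataFrame) -> pd.Series:
    """Inference helper: predict biomass for a table of Landsat pixels."""
    return pd.Series(model.predict(table[features]), index=table.index)
```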

To expand to global:

  • Transforming the Harris et al. spreadsheet into Python
    • Mask of column 2 (ecoregion + NLCD) -> allometric equation
    • allometric equation = dictionary of functions (see the sketch after this list)
    • height metrics = another dictionary of functions [done]
    • parameter to indicate whether to preprocess (whether input is smoothed or raw)
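
The "dictionary of functions" pattern might look like the sketch below: one allometric equation per group, looked up by a key derived from the ecoregion/land-cover mask. The group names, height metrics, and coefficients are placeholders, not values from Harris et al.

```python
# placeholder allometric equations keyed by group; coefficients are made up
def eq_conifer(height_metrics):
    return 0.1 * height_metrics["h50"] ** 1.5

def eq_hardwood(height_metrics):
    return 0.2 * height_metrics["h75"] ** 1.3

ALLOMETRIC_EQUATIONS = {
    "conifer": eq_conifer,
    "hardwood": eq_hardwood,
}

def estimate_biomass(group: str, height_metrics: dict) -> float:
    """Look up the equation for this group and apply it to the height metrics."""
    return ALLOMETRIC_EQUATIONS[group](height_metrics)

print(estimate_biomass("conifer", {"h50": 20.0, "h75": 25.0}))
```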

Improvements by April:
GLAS/biomass:

  • apply GLAS filtering based on Harris et al. (Cindy) [done]
  • double check how GLAS elevation should be calculated from GLAH14 data
  • decide whether we should use smoothed or raw waveforms to make height metric calculations
  • Double check terrain calculations by reading Duncanson et al. more closely
  • potentially change the raw extracted GLAS data into the original variable names
  • interpolate between bins (currently at 15 cm intervals; see the sketch after this list)
  • double check that compression ratio does not change during the valid signal part (between sig beg and sig end)
  • Figure out which allometric equations can be used for leaf-off conditions. Allometric equations are trained predominantly on leaf-on conditions, so we should determine whether estimates for leaf-off conditions are valid. This is relevant for our reporting/updating interval. Proposal: update twice a year, after the end of the growing season in each hemisphere (September and March(?)).
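
For the bin-interpolation item, a simple linear interpolation onto a finer grid could look like the sketch below; the waveform and bin positions here are synthetic.

```python
import numpy as np

bin_spacing = 0.15  # meters, the native 15 cm interval noted above
distance = np.arange(0, 10, bin_spacing)      # synthetic bin positions
waveform = np.exp(-((distance - 5.0) ** 2))   # synthetic waveform values

fine_distance = np.arange(0, 10, 0.01)        # 1 cm target grid
fine_waveform = np.interp(fine_distance, distance, waveform)
```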

Landsat

  • Masking clouds (potentially via https://github.com/ubarsc/python-fmask, or using the *_BQA.TIF files in the Landsat archive)
  • Smoothing Landsat images using CCDC
  • Grabbing multiple Landsat pixels for each GLAS record? GLAS footprints are ~70 m in diameter and Landsat pixels are 30 m, so we could use ~4 Landsat pixels, or a bounding box of all overlapping Landsat pixels.
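
One way to sketch the multiple-pixels-per-shot idea is to average a small window of 30 m pixels centered on each GLAS footprint; the dataset, variable names, and window size below are assumptions.

```python
import numpy as np
import xarray as xr

landsat = xr.open_zarr("landsat_wa_seasonal.zarr")  # hypothetical seasonal composite

def sample_footprint(da: xr.DataArray, lat: float, lon: float, window: int = 3) -> float:
    """Mean of a window x window block of 30 m pixels centered on a GLAS shot."""
    ilat = int(np.argmin(np.abs(da.lat.values - lat)))
    ilon = int(np.argmin(np.abs(da.lon.values - lon)))
    half = window // 2
    block = da.isel(
        lat=slice(ilat - half, ilat + half + 1),
        lon=slice(ilon - half, ilon + half + 1),
    )
    return float(block.mean())

red_mean = sample_footprint(landsat["red"], lat=47.6, lon=-122.3)
```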

ML model

  • Training different model for each ecoregion
  • Incorporating a climate dataset into the training of the model (Others have used Worldclim, though we could use Terraclim)
  • out of sample validation
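
For the out-of-sample validation item, holding out whole ecoregions (rather than random rows) gives a better sense of transfer to unseen regions; a sketch using scikit-learn's GroupKFold is below, with hypothetical column names.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GroupKFold, cross_val_score

df = pd.read_parquet("training_data_global.parquet")  # hypothetical table
features = ["band1", "band2", "band3", "band4", "band5", "band7"]

# each fold holds out entire ecoregions
scores = cross_val_score(
    RandomForestRegressor(n_estimators=100, n_jobs=-1, random_state=0),
    df[features],
    df["biomass"],
    groups=df["ecoregion"],
    cv=GroupKFold(n_splits=5),
    scoring="r2",
)
print("per-fold R^2:", scores, "mean:", scores.mean())
```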

v0 and v1 data cleanup

We've been pushing hard on both v0 and v1 lately. As these pushes wrap up, we've likely left quite a bit of data on s3 and gcs that needs to be cleaned up.

@orianac and @tcchiao, can you each provide a short writeup below outlining the data we want to keep/delete from v0 and v1 respectively?

add methods docs to this repo

We currently have two draft google docs that we've been using to summarize the technical methods. We should move these documents to this repo in order to properly version control them and to document the methods deployed here.

Recreate V0 and do some QA/QC

This would be a three step process:

  • run the global_emission.v0.py script to recreate all emission tiles (zarr) (~200 tiles)
  • run the aggregate_emissions.v0.py script to create global coverage gridded (1 zarr file for sharing) and country-total timeseries (1 json file)
  • Visualization notebook (the gridded analysis is so nicely done through the website so this notebook might just focus on the country roll-ups)
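
For the country roll-up part of the visualization notebook, a minimal sketch might read the country-total JSON and plot a few countries with altair; the file name and record layout are assumptions.

```python
import altair as alt
import pandas as pd

# assumes records like {"iso3": "BRA", "year": 2001, "emissions": ...}
df = pd.read_json("country_rollups.json")
subset = df[df["iso3"].isin(["BRA", "IDN", "COD"])]

alt.Chart(subset).mark_line().encode(
    x="year:O",
    y="emissions:Q",
    color="iso3:N",
)
```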

data exports for v1 explainer

We had a very productive call today discussing the data needs for the v1 explainer. This issue documents the outputs of that call:

Figure 1 (global maps): a zarr group with a 2d "biomass" variable (coarsened to 0.5 deg) and 1d "lat" and "long" variables as coordinates (~700x1 and ~300x1)

Figure 5 (difference maps): a zarr group with a 3d "difference" variable (coarsened to 0.5 deg) holding the three difference maps we want to show, and 1d "lat" and "long" variables as coordinates (~700x1 and ~300x1)

GeoJSON for the "study domain"

R2 and other stats in a JSON
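
For the Figure 1 export, the coarsening step could be as simple as the sketch below; the input store and the coarsening factor (which depends on the native grid spacing) are assumptions.

```python
import xarray as xr

biomass = xr.open_zarr("v1_biomass.zarr")["biomass"]  # hypothetical input

# e.g. if the native grid is ~0.005 degrees, a factor of 100 gives ~0.5 degrees
coarse = biomass.coarsen(lat=100, lon=100, boundary="trim").mean()

coarse.to_dataset(name="biomass").to_zarr("explainer/fig1_biomass_0p5deg.zarr", mode="w")
```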

Alternatives and/or additions to Landsat collection 2 for biomass prediction

For the V1 data produced here, we've used Landsat 7 ETM Analysis-Ready-Data (Collection 2, Level 2). Along the way, we've had plenty of discussions about the potential to pull in alternate and/or additional datasets for biomass model training/inference. I'd like to use this issue to enumerate alternatives and discuss their potential inclusion in future work.

I'm specifically hoping to see some discussion of the following datasets (please add to this list):

pinging @tcchiao, @orianac, and @badgley
