phenology_forecasts's People

Contributors

sdtaylor

phenology_forecasts's Issues

updates to observation data

I'm making the switch to using the "Individual Phenometrics" data instead of the "Status and Intensity" data. The former summarizes things into first "yes" dates for all individual trees, which I was doing manually myself. This NPN summarized data also has better conflict flags which can be used to filter out most of the problematic group sites.

Data download for all prior data used in model building (2008 - 2017)
https://data.usanpn.org/observations?search=3dd370f197c6ae95e881f0e93cc56ae8

evaluation paper todo

Writing

  • confirm the 3 methodologies (climatology, climatology + current year temp, current year temp + temp forecasts) are consistently named and lettered (a, b, c) throughout methods, results, discussion, and supplement.
  • abstract
  • npn data cite
  • all software citations
  • acknowledgements (not in main paper, they go in a special submission box)
  • Moore funding statement in acknowledgements for bioRxiv version
  • ethan affiliations
  • do latex /refs so peerj will have an easier time. (though not technically needed till resubmission)

Logistics

required packages

NOAA CFSv2 forecasts are in GRIB2 format. Reading GRIB files in xarray requires pynio, which was only recently ported to python3 and requires a dev version. Here's how I got it to work.

From a fresh anaconda3 install

conda install -c ncar -c conda-forge pynio=dev python=3

Then download the latest xarray master and install with python setup.py install

In a few months this will hopefully just be

conda install xarray pynio

large bottleneck in np.exp

Profiling shows np.exp() to be taking quite a long time in the forecast runs.

Testing shows the following on serenity (ubuntu 16.04, conda install python3.6)

shawn@serenity:~$ python
Python 3.6.3 | packaged by conda-forge | (default, Nov  4 2017, 10:10:56) numpy 1.15
[GCC 4.8.2 20140120 (Red Hat 4.8.2-15)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import timeit
>>> timeit.timeit('np.exp(d)', setup='import numpy as np;d=np.ones((420,1405,620))',number=1)
27.85343541414477

A windows machine in the library runs the same command in 1.5 seconds.

This issue potentially points to a glibc issue.

do cutoff for minimum number of observations

I think I chose species based on having >1000 observations in the raw data, but after processing this boiled down to <10 usable observations for some species. Need to apply the cutoff to the processed observations instead.
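A minimal sketch of applying the cutoff after processing rather than on the raw data. The threshold value, the function name, and the row layout (dicts with `species` and `phenophase` keys) are assumptions for illustration, not what the repo currently does.

```python
from collections import Counter

# Hypothetical threshold; the issue does not settle on a final value.
MIN_USABLE_OBSERVATIONS = 30


def species_with_enough_observations(processed_rows, min_obs=MIN_USABLE_OBSERVATIONS):
    """Return the (species, phenophase) pairs that keep at least min_obs rows.

    processed_rows is an iterable of dicts with 'species' and 'phenophase'
    keys, one per observation that survived all filtering steps.
    """
    counts = Counter((row["species"], row["phenophase"]) for row in processed_rows)
    return sorted(key for key, n in counts.items() if n >= min_obs)


# Toy data: one well-sampled pair and one that collapsed during processing.
rows = (
    [{"species": "Acer saccharum", "phenophase": "flowers"}] * 40
    + [{"species": "Ginkgo biloba", "phenophase": "flowers"}] * 8
)
kept = species_with_enough_observations(rows)
print(kept)
```

Counting on the processed rows means the species list reflects what the models will actually be fit to.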

archive forecasts

from Ethan:
At this size we can store 1000 sets of forecasts in a single Zenodo archive, which is over a decade of forecasts. I'd recommend starting to automatically archive there, either by pushing forecasts to a directory in https://github.com/weecology/forecasts (you already have write access) or by us setting up a similar system for just the phenology forecasts. This would address Dietze's interest in downloading forecasts and you could (at some point in the future) add a feature to the website that can download forecasts for a selected species and forecast date.

consolidate R tools

various R scripts with their own helper functions are hanging out in different places, namely model_building/phenology and the map building R scripts

make the cfs stuff into a package

This seems like a good option to break out and clear a lot of code out in this repo

pyDownscaledCFSv2

-create a downscale model (or download a premade one)
option to downscale via PRISM or DayMet

-downloads cfs data
-converts them to netcdf format
-downscales via

need classes with generalized methods for
PRISM data
daymet data

temperature data

Weather forecasts are available from the NOAA CFSv2 model. A new run is released every 6 hours and forecasts 9 months out, with slight variations on output depending on the time of day.
These are deterministic forecasts, so to create an ensemble I'd need to use several days worth and combine them.

Downscaling requirements

This will involve taking the CFSv2 reanalysis and combining it with PRISM to get comparisons.

Potential steps

On each forecast day (i.e. 1st and 15th of the month)

  1. Download latest 10 (?) model runs.
  2. Also download latest operations re-analysis.
    (could also use the op. analysis that NPN uses)
  3. Extract North America.
  4. Downscale each one.
  5. combine time series of the obs. temp from some cutoff thru the spring/summer

End up with:
An nc file of 10 runs, each spanning from the cutoff to the end of the forecast
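A rough sketch of step 5 and the resulting ensemble, using plain lists in place of the downscaled grids. The function names and the toy numbers are made up for illustration; the real version would operate on arrays read from the nc files.

```python
from statistics import mean


def stitch_member(observed, forecast, cutoff_index):
    """Observed values before cutoff_index, forecast values from it onward.

    Both inputs are daily mean temperatures on the same daily time axis.
    """
    if len(observed) < cutoff_index:
        raise ValueError("observed series does not reach the cutoff")
    return list(observed[:cutoff_index]) + list(forecast[cutoff_index:])


def build_ensemble(observed, forecasts, cutoff_index):
    """One stitched series per forecast run, plus the per-day ensemble mean."""
    members = [stitch_member(observed, f, cutoff_index) for f in forecasts]
    ensemble_mean = [mean(day_values) for day_values in zip(*members)]
    return members, ensemble_mean


# Toy example: 3 observed days, then two forecast runs covering 5 days.
observed = [1.0, 2.0, 3.0]
forecasts = [
    [0.0, 0.0, 0.0, 4.0, 6.0],
    [0.0, 0.0, 0.0, 6.0, 8.0],
]
members, ensemble_mean = build_ensemble(observed, forecasts, cutoff_index=3)
print(ensemble_mean)
```

Every member shares the same observed history, so the spread between members only appears after the cutoff.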

scope

Goals

  • using species/phenophase specific phenology models, making forecasts several months out of the spring/summer season
  • do this using true weather forecasts from the CFSv2 model.
  • automatically repeat forecasts every 2 weeks or so

Needed

  • Automatic download, ingest, and downscaling of CFSv2 data. (probably the hardest part)
  • fitting of model to large spatial dataset and producing maps.
  • potentially combining observed data with forecast data (like forecast made on Feb 1)
  • potentially using phenology database compiled by brian stucky

data provenance

Use netcdf attributes to record some history in the files

recent forecast files

  • url of forecast file
  • downscaled using such and such method, observed data
  • date range of prism data, prism homepage url

plant forecast files

  • weather forecast note above
  • data from NPN.
  • using model XXX
  • weecology
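One possible way to assemble the attributes listed above, sketched with the standard library only. The attribute names, the function name, and the example values are assumptions; none of them are fixed by the repo.

```python
from datetime import datetime, timezone


def provenance_attrs(forecast_url, downscale_method, prism_start, prism_end):
    """Build a flat dict of string attributes describing a forecast file.

    netCDF attributes are simple values, so everything is kept as a string.
    """
    created = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "forecast_source_url": forecast_url,
        "downscale_method": downscale_method,
        "prism_date_range": f"{prism_start} to {prism_end}",
        "history": f"{created}: created by phenology_forecasts",
    }


attrs = provenance_attrs(
    forecast_url="https://example.org/cfsv2/forecast.grb2",  # placeholder URL
    downscale_method="placeholder description of the downscaling method",
    prism_start="2018-01-01",
    prism_end="2018-01-31",
)
for name, value in attrs.items():
    print(f"{name}: {value}")
```

With xarray the dict can be attached with `ds.attrs.update(attrs)` before calling `ds.to_netcdf(...)`.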

have all or nothing create for API client

Ran into an issue where forecast entries from the static image metadata file were duplicated. The duplicates made the API update error out partway through, which left an incomplete API update and website errors. Not sure where the duplicates came from, but a good guard regardless is having an all-or-nothing update for every forecast iteration inside api_client.py. There is likely something for this built into the django stuff.
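To illustrate the all-or-nothing idea, here is a sketch using sqlite3 from the standard library rather than the actual API. The table layout and function name are hypothetical. On the Django side, `django.db.transaction.atomic` provides the same commit-or-rollback behaviour and is probably the thing to use on the server.

```python
import sqlite3


def insert_forecast_batch(conn, issue_date, entries):
    """Insert every (species, phenophase) entry for issue_date, or none of them."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.executemany(
                "INSERT INTO forecast (issue_date, species, phenophase) "
                "VALUES (?, ?, ?)",
                [(issue_date, species, phenophase) for species, phenophase in entries],
            )
    except sqlite3.IntegrityError:
        return False
    return True


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE forecast ("
    "issue_date TEXT, species TEXT, phenophase TEXT, "
    "UNIQUE (issue_date, species, phenophase))"
)

# A clean batch goes in; a batch containing a duplicate leaves no partial rows.
ok = insert_forecast_batch(
    conn, "2018-04-05", [("acer_saccharum", "flowers"), ("acer_rubrum", "flowers")]
)
bad = insert_forecast_batch(
    conn, "2018-04-09", [("acer_saccharum", "flowers"), ("acer_saccharum", "flowers")]
)
row_count = conn.execute("SELECT COUNT(*) FROM forecast").fetchone()[0]
print(ok, bad, row_count)
```

The duplicate batch fails as a whole, so the site keeps showing the previous complete forecast instead of a half-updated one.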

account for timezone

A fairly important step that's easily overlooked. All the times in the CFS forecast are GMT. Need to convert things to their own timezone.

Or ... with daily mean temperature it mayyyy be fine.
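A small sketch of how much the daily mean can shift when 6-hourly GMT timesteps are grouped by local calendar day instead. It uses a fixed UTC offset (no daylight saving handling) and made-up temperatures, both assumptions for illustration only.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def daily_means(utc_times, temps, utc_offset_hours):
    """Group 6-hourly values by local calendar date and average each day."""
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    by_day = defaultdict(list)
    for t, temp in zip(utc_times, temps):
        by_day[t.astimezone(local_tz).date()].append(temp)
    return {day: sum(v) / len(v) for day, v in sorted(by_day.items())}


# Two days of 6-hourly timesteps starting 2018-04-01 00:00 GMT.
start = datetime(2018, 4, 1, 0, tzinfo=timezone.utc)
utc_times = [start + timedelta(hours=6 * i) for i in range(8)]
temps = [2.0, 1.0, 9.0, 8.0, 4.0, 3.0, 11.0, 10.0]

utc_means = daily_means(utc_times, temps, utc_offset_hours=0)
local_means = daily_means(utc_times, temps, utc_offset_hours=-8)
print(utc_means)
print(local_means)
```

In this toy case the mean for April 1 is 5.0 when grouped in GMT and 6.0 at UTC-8, so whether it is "fine" depends on how sensitive the phenology models are to a shift of that size.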

key forecast dates

Key issue dates where new things were implemented.

2018-01-05 - First full automated run
2018-01-20 - first time having issue_date and crs in attributes (easily fixable)
2018-01-23 - first time using the larger species set (66 instead of 44) by having a larger set of range masks

PRISM connection issue

Currently cannot connect to the PRISM ftp server. Running this on a compute node just hangs.

In [3]: from ftplib import FTP

In [4]: ftp_con = FTP(host='prism.nacse.org', user='anonymous',passwd='abc123')

In [5]: ftp_con.nlst('/')

BUT, running the same on the login node is fine.

In [1]: from ftplib import FTP                                                                                                                                           

In [2]: ftp_con = FTP(host='prism.nacse.org', user='anonymous',passwd='abc123')                                                                                          

In [3]: ftp_con.nlst('/')                                                                                                                                                
Out[3]: 
['/PRISM_datasets.pdf',
 '/normals_800m',
 '/normals_4km',
 '/daily',
 '/monthly',
 '/data_archive']
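This does not explain the difference between the nodes, but a timeout would at least turn the hang into an error that the automated run can report. A sketch, assuming a wrapper function named `list_ftp_root` (hypothetical); the `timeout` argument and `all_errors` are part of the standard `ftplib` module.

```python
from ftplib import FTP, all_errors


def list_ftp_root(host, timeout=30, ftp_factory=FTP):
    """Return the root listing, or None if the server cannot be reached in time.

    ftp_factory is only there so the function can be exercised without a
    network connection; by default it is ftplib.FTP.
    """
    try:
        ftp_con = ftp_factory(host=host, timeout=timeout)
        try:
            ftp_con.login()  # anonymous login
            return ftp_con.nlst("/")
        finally:
            ftp_con.close()
    except all_errors as err:
        # Covers socket timeouts and refused connections as well as FTP errors.
        print(f"FTP listing failed for {host}: {err!r}")
        return None
```

Calling `list_ftp_root('prism.nacse.org')` on the compute node should then return `None` after about 30 seconds instead of blocking forever.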

forecast archiving

figshare - unlimited data
zenodo - 50GB limit (but can ask for more), versioning (very important)

Current daily forecasts are ~31 MB compressed.

Infrastructure paper eco-apps final checklist

Highlighted notes from staff

  • Please provide the main manuscript in Word or PDF+LaTeX format. Word is preferred. Figures and appendices may remain in PDF format. See Items 1, 8, 9, and 10 in the Checklist for Authors below.

  • List the Running Head on the title page of the manuscript file, matching the entry in the corresponding field of the online submission form.

  • Please provide Figs. 1, 2, and 3 sized for PDF publication at no larger than portrait layout (maximum 6 inches wide x 8 inches high) or landscape layout (maximum 8.75 inches wide x 5.25 inches high). All text must be sized between 6 and 10 point when the image is sized for publication. For readability, we suggest using a text size hierarchy, sizing axis numbers between 6-7 point, axis labels between 8-9 point, panel labels that consist of words between 7-8 point, and panel labels that consist of a single letter at 10 point.

  • Remove line numbers from Appendix S1 to prepare it for posting online.

DEFAULT CHECKLIST FOR AUTHORS
How to Prepare Your Accepted Manuscript for Publication

  • Upload the main manuscript file and tables in Word or LaTeX. If your manuscript was prepared in LaTeX, please upload a ZIP of the LaTeX source files and include a PDF version.

  • Assemble the parts of the manuscript in the following order: Title page, Abstract, Key words, Text, Acknowledgments, Literature Citations, Data Availability Statement (if any), Tables (table legend, headings, data, footnotes), Figure Legends, Figures.

  • Completely double space all text in the manuscript, including table legends and footnotes. All pages should have 1 inch margins on all sides.

  • Prepare the manuscript using Times New Roman font in a 12-point size. Number the pages sequentially, beginning with the title page.

  • Check that the hierarchy of text headings is discernible.

  • Display equations should be formatted using MathType software (a trial version is available online).

  • Confirm that tables, figures, and Supporting Information are mentioned in the body of the manuscript in numeric order.

  • For Word submissions, tables must be provided in an editable format as true tables in the Word file, created using the "Insert Table" function, rather than using tabs, spaces, or embedded images. Tables cannot contain colors, shading or graphics. If such enhancements are needed, the information must be presented as a figure.

  • Provide each figure a single time, as a high-resolution image suitable for publication. Preferred file formats include TIF, EPS, PDF, or AI created using at least 600 dpi, while JPEG, PPT/PPTX, or DOC/DOCX are also acceptable if the resolution is sufficient. See Author Guidelines for more details. Ultimately we are looking for files that not only create crisp, clean prints, but also remain crisp and clear when the on-screen view is significantly zoomed in.

  • Prepare and submit any previously provided Supporting Information for online publication in Wiley Online Library. New material cannot be added at this stage. Supporting Information is not edited or typeset, and thus should be supplied in the format intended for posting online. For materials prepared in LaTeX, please supply a PDF only. Each appendix must be provided in a separate file. To avoid publication delays please review the naming conventions for this material and the file naming requirements prior to submission: https://esajournals.onlinelibrary.wiley.com/hub/journal/19395582/resources/author-guidelines-eap#Supporting_Information

  • Prepare and submit any necessary data to an approved repository. Additional material cannot be added to Supporting Information. See Ecological Application's Data Policy here: https://esajournals.onlinelibrary.wiley.com/hub/journal/19395582/resources/data-policy-eap

  • Ecological Applications is using Twitter (@ESAApplications) to publicize articles. In the Twitter section of ScholarOne's online submission form, please provide any individual, institution, or funder Twitter accounts you would like us to tag, along with any applicable Twitter hashtags. ESA staff will post a tweet shortly after your article appears online with a direct link to the article.

  • If authors wish to promote their paper at the time it is released online, be advised that ESA does not embargo papers and the Accepted Article is expected to publish online within a week of files being transmitted to the publisher. General information on ESA’s publicity and embargo policies can be found online: https://esajournals.onlinelibrary.wiley.com/hub/journal/19395582/resources/publicity-esa

updates to species_list.csv

Update the range_map_made column in misc_data_prep/create_species_masks.py

remove occurences_downloaded column

hindcasting notes

dates to hindcast from = jan 1 - June 30, every 4 days
45 hindcasts / year
~ 100 mb per species/phenophase (compressed) * 45 hindcasts = 4.5 GB * 27 spp = 121 GB
~ 30 min per species/phenophase = 14 hours * 45 hindcast dates = 630 total hours
(~ 2 days w/ 16 cores. but will likely need 25-30GB ram each)
(note the 30 min time was with ThermalTime model, a Uniforc model took 2 hours)
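The back-of-envelope numbers above can be reproduced as follows. This assumes the 27 refers to species/phenophase combinations, that 1 GB = 1000 MB, and that the work parallelizes perfectly across cores; the variable names are just for this sketch.

```python
import math

n_hindcast_dates = 45
n_species_phenophases = 27  # assumption: 27 species/phenophase combinations
mb_per_hindcast = 100  # compressed, per species/phenophase
minutes_per_model_run = 30  # ThermalTime model; a Uniforc model took ~2 hours
n_cores = 16

# Storage: 100 MB x 45 dates = 4.5 GB per species/phenophase, times 27.
storage_gb = mb_per_hindcast * n_hindcast_dates * n_species_phenophases / 1000

# Compute: 30 min x 27 = 13.5 h per hindcast date, rounded up to 14 h.
hours_per_date = math.ceil(minutes_per_model_run * n_species_phenophases / 60)
total_hours = hours_per_date * n_hindcast_dates

# Wall time assuming perfect scaling across cores.
wall_days = total_hours / n_cores / 24

print(f"{storage_gb} GB, {total_hours} core-hours, {wall_days:.1f} days")
```

If every model were a Uniforc model at 2 hours each, the compute estimate would be roughly four times larger.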

for each hindcast_date
  obtain current_weather_observation nc file (or just use the one already built)
  cutoff the current_weather_observation file to hindcast_date - 1 day
  get latest forecasts for hindcast_date
  make folder of "current forecasts"
  pass that folder to apply_phenology_forecasts, but this must use the bootstrap versions of all models
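The date bookkeeping for this loop could look like the sketch below. Only the dates are handled here; downloading forecasts and calling apply_phenology_forecasts are left out, and the function and key names are hypothetical.

```python
from datetime import date, timedelta


def hindcast_dates(year, step_days=4):
    """Yield hindcast dates from Jan 1 through June 30, every step_days days."""
    current, end = date(year, 1, 1), date(year, 6, 30)
    while current <= end:
        yield current
        current += timedelta(days=step_days)


# Observations are cut off the day before each hindcast date.
plan = [
    {"hindcast_date": d, "observation_cutoff": d - timedelta(days=1)}
    for d in hindcast_dates(2018)
]
print(len(plan), plan[0], plan[-1])
```

Stepping inclusively from Jan 1 to June 30 gives 46 dates for 2018, one more than the rough count of 45 used in the estimates above.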

infrastructure paper final todo

Manuscript Logistics

  • zenodo

  • confirm all references

  • npn data citation

  • all software refs in final methods paragraph.

  • get nice X's in prediction paragraph (latitude X longitude X time)
    Use $\times$

  • cited papers by white or taylor from the past 12 months.

  • portal forecasting paper (MEE)

  • pyphenology (JOSS)

  • npn-lter paper

  • portal data paper (PLOS BIO)

  • figures and figure legends, with references right after main text.
    potentially have to edit the latex template: https://rmarkdown.rstudio.com/authoring_bibliographies_and_citations.html

  • fiddle with figure font sizes. shoot for final widths from the instructions:

Figures should be drawn/submitted at their smallest practicable size (to fit a single column (82 mm), two-thirds page width (110 mm) or full page width (173 mm).

Writing Stuff

  • cite little dataset for range maps

  • include note on error boxplot/timeseries bias. Since we're only half thru spring.

  • confirm 100 million number for total calculations

  • make supp table of species/phenophases used.

  • confirm final tally of total species + total unique forecasts

  • confirm most of the steps in fig. 1 are represented in text.
    present in text: B, C, D, E-G, H, M, N, L, Q, O
    needed: A: NPN data
    J: latest PRISM data
    K: apply downscale model
    P: sync to site

  • write text for new fig 3 and 4

  • detail out uncertainty equation

  • equation showing the weighted average derived from climate ensemble in supplement

  • methods for anomalies?

different types of forecasts

peak flower forecasts - some papers have looked into this
community forecasts - like a mountain meadow, i'm very interested in this

project folder structure

Things are getting a bit crowded, so...

phenology_forecasts/
  tools/
    ...
  model_building/  
    phenology/  
      build_phenology_models.py
      download_species_observations.R
      download_species_observation_temperature.R
      phenology_observation_functions.R
    climate/
      download_historic_observations.py
      download_historic_reanalysis.py
      download_historic_forecasts.py
      build_downscaling_model.py
  automated_forecasting/
    climate/
      download_latest_forecasts.py
      download_latest_observations.py
    phenology/
      apply_phenology_models.py
    presentation/
      build_maps.R
  misc_data_prep/
    create_species_masks.py
    create_mask.py

USGS Tree ranges

There is some data processing for this done by hand.

wget https://www.fs.fed.us/nrs/atlas/littlefia/species_table.html
grep IV_Little species_table.html | cut -d"<" -f15 | cut -d"_" -f 3,4 > species_list_cut_output.csv

Go thru and x out the 8 intermediate characters on each line in vim

sort species_list_cut_output.csv | uniq > species_list.csv

go back thru and put in commas. (quicker than it sounds)

April 9th forecast for some forecasts shows incorrect range

Looks like there's something funny going on with at least the Sugar Maple forecasts which went from this on April 5th:

screenshot from 2018-04-10 14-49-50

To this on April 9th:

screenshot from 2018-04-10 14-50-07

I've checked about half a dozen other species and don't see anything comparable, which is weird.

which time in reanalysis to use?

The CFSv2 reanalysis has 6 hour timesteps, but between those timesteps it has hourly "forecasts" where the model is run with no assimilation. Since I'm just getting daily means I only want the primary timesteps at 6 hour intervals. But this description says the 00 forecast for some things is essentially invalid, and I'm not sure if that applies to the tmp2m that I'm using.

Text from the pdf

Important Note: The forecast at the first time step (f00) of 3 minutes constitutes a spin up of the model physics, and extreme care should be taken when using it as a proxy of any type of validation. IT IS NOT THE ANALYSIS.

The tmp2m files I'm using don't have this 3 minute initial timestep so for the time being I'm using the initial timestep.

add data integration/assimilation via model ensemble weights

From @ethanwhite. Instead of picking one model for each species/phenophase, use an ensemble with appropriate weights. As new observations come in adjust the weights somehow for the next forecasts.

This will essentially be observations in the southern/lower elev. range of species affecting the forecast for more northern/upper elev individuals.
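The issue leaves the weight update open ("somehow"). One simple option, sketched here purely as an illustration, is to scale each model's weight by the inverse of its squared error against a new observation and renormalize. The function names, the error floor, and the day-of-year numbers are all made up; ThermalTime and Uniforc are just stand-in model names.

```python
def update_weights(weights, predictions, observed_doy, floor=1.0):
    """Reweight models by inverse squared error against a new observation.

    floor keeps a model that happens to be exactly right from taking all
    of the weight.
    """
    scores = {
        name: weights[name] / max((predictions[name] - observed_doy) ** 2, floor)
        for name in weights
    }
    total = sum(scores.values())
    return {name: score / total for name, score in scores.items()}


def ensemble_prediction(weights, predictions):
    """Weighted average of the model predictions (day of year)."""
    return sum(weights[name] * predictions[name] for name in weights)


# Start with equal weights, then score both models on an early southern observation.
weights = {"thermal_time": 0.5, "uniforc": 0.5}
southern_predictions = {"thermal_time": 100.0, "uniforc": 104.0}
weights = update_weights(weights, southern_predictions, observed_doy=101.0)

# The updated weights then shape the forecast for a northern site.
northern_predictions = {"thermal_time": 130.0, "uniforc": 140.0}
northern_forecast = ensemble_prediction(weights, northern_predictions)
print(weights, northern_forecast)
```

Here the model that was closer in the south ends up with most of the weight in the north, which is the behaviour described above.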

species notes

some species in the Tree Atlas data have synonym issues

  • Alnus tenuifolia and Alnus rugosa -> A. incana

  • Cornus stolonifera -> C. sericea

  • Sambucus sp. -> S. nigra (this genus is a mess, see below)

Others are just neat plants that have the NPN data available but just need the range map

  • Fouquieria splendens
  • Taraxacum officinale
  • Vaccinium corymbosum
  • Ginkgo biloba
  • Clintonia borealis
  • Maianthemum canadense
  • Platanthera praeclara (western fringed prairie orchid) (from L. Biederman)
  • Larrea tridentata
  • Trillium grandiflorum
  • Trillium erectum
  • Trillium ovatum
  • Adenostoma fasciculatum

From Janet Prevey

  • Vaccinium membranaceum (black huckleberry)
    mean flowering GDD: 698, mean fruiting GDD: 953
  • Gaultheria shallon (salal)
    mean flowering GDD: 1113, mean fruiting GDD: 1790
  • Berberis aquifolium (Oregon grape)
    mean flowering GDD: 460, mean fruiting GDD: 1447
  • Corylus cornuta (Hazelnut)
    mean flowering GDD: 418, mean fruiting GDD: 1914
