sdtaylor / phenology_forecasts
The backend for http://phenology.naturecast.org
Home Page: http://phenology.naturecast.org
I'm making the switch to using the "Individual Phenometrics" data instead of the "Status and Intensity" data. The former summarizes things into first "yes" dates for all individual trees, which I had been computing manually myself. This NPN summarized data also has better conflict flags, which can be used to filter out most of the problematic group sites.
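A minimal sketch of that filtering step, using only the standard library. The column names (Individual_ID, First_Yes_Year, First_Yes_DOY, Observed_Status_Conflict_Flag) and the -9999 "no conflict" value are assumptions about the Individual Phenometrics export; check them against the header of an actual download before relying on this.

```python
import csv
import io

# Assumed export details; verify against the header of an actual download.
CONFLICT_COLUMN = "Observed_Status_Conflict_Flag"
NO_CONFLICT = "-9999"


def first_yes_records(csv_text):
    """(individual id, year, first "yes" day of year) for rows with no conflict flag."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row[CONFLICT_COLUMN] != NO_CONFLICT:
            continue  # skip rows flagged as having conflicting status reports
        records.append(
            (row["Individual_ID"], int(row["First_Yes_Year"]), int(row["First_Yes_DOY"]))
        )
    return records
```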
Data download for all prior data used in model building (2008-2017):
https://data.usanpn.org/observations?search=3dd370f197c6ae95e881f0e93cc56ae8
in case anyone asks, this was a public domain image from here https://openclipart.org/detail/110371/oak-leaf-silhouette
On running into bottlenecks when dealing with 100s of GB (several TB in the future) of weather forecast data:
Using chunks in NetCDF
https://www.unidata.ucar.edu/blogs/developer/entry/chunking_data_why_it_matters
Xarray discussion on aligning dask and netcdf chunks
pydata/xarray#1440
example of using apply_ufunc in downscaling observed and modelled arrays
https://groups.google.com/forum/#!topic/xarray/eyWr_ajTmL4
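Following the xarray discussion above, one way to line dask chunks up with the on-disk NetCDF chunks is to make every dask chunk an integer multiple of the disk chunk along each dimension, so no task reads a partial disk chunk. This is a hypothetical helper (aligned_dask_chunks is not part of xarray or dask) that greedily grows chunks under an element budget; the result could be turned into a dict keyed by dimension name and passed as the chunks argument to xarray.open_dataset.

```python
from math import prod


def aligned_dask_chunks(shape, disk_chunks, max_elements):
    """Grow chunks in whole multiples of the on-disk NetCDF chunk shape.

    Each dimension ends up either a multiple of its disk chunk size or
    equal to the full dimension length. If a single disk chunk already
    exceeds max_elements, the disk chunk shape is returned unchanged.
    """
    chunks = list(disk_chunks)
    grew = True
    while grew:
        grew = False
        for i, step in enumerate(disk_chunks):
            candidate = chunks.copy()
            candidate[i] = min(chunks[i] + step, shape[i])
            if candidate[i] != chunks[i] and prod(candidate) <= max_elements:
                chunks = candidate
                grew = True
    return tuple(chunks)


# Example: a (time, lat, lon) array stored in (1, 100, 100) disk chunks.
print(aligned_dask_chunks((420, 1405, 620), (1, 100, 100), 5_000_000))
```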
NOAA CFSv2 forecasts are in GRIB2 format. Reading GRIB files in xarray requires PyNIO, which was only recently ported to Python 3 and requires a dev version. Here's how I got it to work.
From a fresh Anaconda3 install:
conda install -c ncar -c conda-forge pynio=dev python=3
Then download the latest xarray master and install with python setup.py install
In a few months this will hopefully just be
conda install xarray pynio
Profiling shows np.exp() taking quite a long time in the forecast runs.
Testing shows the following on serenity (Ubuntu 16.04, conda-installed Python 3.6):
shawn@serenity:~$ python
Python 3.6.3 | packaged by conda-forge | (default, Nov 4 2017, 10:10:56) numpy 1.15
[GCC 4.8.2 20140120 (Red Hat 4.8.2-15)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import timeit
>>> timeit.timeit('np.exp(d)', setup='import numpy as np;d=np.ones((420,1405,620))',number=1)
27.85343541414477
A Windows machine in the library runs the same command in 1.5 seconds.
This potentially points to a glibc issue.
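To compare machines like this, it helps to record the C library version alongside the timing. A small sketch (the function name and the reduced default array shape are illustrative choices): platform.libc_ver() reports the glibc version on Linux and returns empty strings where glibc isn't found, such as on Windows. The default shape is roughly 1/1000 of the (420, 1405, 620) array above, which at float64 needs close to 3 GB.

```python
import platform
import timeit


def exp_benchmark(shape=(42, 140, 62), number=1):
    """Return (libc name, libc version, seconds for np.exp); seconds is None without numpy."""
    libc, version = platform.libc_ver()  # ('', '') where glibc isn't found, e.g. Windows
    try:
        import numpy as np  # optional, so the libc check still works without numpy
    except ImportError:
        return libc, version, None
    d = np.ones(shape)
    seconds = timeit.timeit(lambda: np.exp(d), number=number)
    return libc, version, seconds


print(exp_benchmark())
```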
I think I chose species based on having >1000 observations in the raw data, but after processing this boiled down to <10 usable observations for some species. Need to fix this.
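One way to fix this is to apply the threshold after processing rather than to the raw counts. A minimal sketch, assuming the processed data can be reduced to (species, is_usable) pairs; the function name is hypothetical and min_usable=30 is a placeholder, not a value from the original analysis.

```python
from collections import Counter


def species_with_enough_data(observations, min_usable=30):
    """Species with at least `min_usable` observations that survived processing."""
    usable_counts = Counter(species for species, is_usable in observations if is_usable)
    return sorted(sp for sp, n in usable_counts.items() if n >= min_usable)


obs = [("acer rubrum", True)] * 3 + [("quercus alba", True), ("quercus alba", False)]
print(species_with_enough_data(obs, min_usable=2))  # prints ['acer rubrum']
```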
from Ethan:
At this size we can store 1000 sets of forecasts in a single Zenodo archive, which is over a decade of forecasts. I'd recommend starting to automatically archive there, either by pushing forecasts to a directory in https://github.com/weecology/forecasts (you already have write access) or by us setting up a similar system for just the phenology forecasts. This would address Dietze's interest in downloading forecasts, and you could (at some point in the future) add a feature to the website that can download forecasts for a selected species and forecast date.
Various R scripts with their own helper functions are scattered in different places, namely model_building/phenology and the map-building R scripts.
conda-pack looks like a good solution here
This seems like a good candidate to break out into its own package, which would clear a lot of code out of this repo.
pyDownscaledCFSv2
- create a downscaling model (or download a premade one)
  - option to downscale via PRISM or DayMet
- download CFS data
- convert it to NetCDF format
- downscale via
Needs classes with generalized methods for:
- PRISM data
- DayMet data
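A minimal sketch of what the generalized classes could look like, using an abstract base class so PRISM and DayMet share the date-range logic. All class and method names here are hypothetical, and the URL patterns are placeholders on example.org, not the real PRISM or DayMet layouts.

```python
from abc import ABC, abstractmethod
from datetime import date, timedelta


class ObservedClimateSource(ABC):
    """Hypothetical shared interface for gridded observations used in downscaling."""

    name = "base"

    @abstractmethod
    def file_url(self, day, variable):
        """Location of one day of one variable; each data source defines its own."""

    def urls_for_range(self, start, end, variable):
        """Generalized method: one URL per day from start to end, inclusive."""
        n_days = (end - start).days + 1
        return [self.file_url(start + timedelta(days=i), variable) for i in range(n_days)]


class PRISMSource(ObservedClimateSource):
    name = "PRISM"

    def file_url(self, day, variable):
        # Placeholder pattern, not the real PRISM server layout.
        return f"https://prism.example.org/daily/{variable}/{day:%Y%m%d}"


class DaymetSource(ObservedClimateSource):
    name = "DayMet"

    def file_url(self, day, variable):
        # Placeholder pattern, not the real DayMet server layout.
        return f"https://daymet.example.org/{day:%Y}/{variable}/{day:%j}"


for source in (PRISMSource(), DaymetSource()):
    print(source.name, source.urls_for_range(date(2018, 1, 1), date(2018, 1, 2), "tmax"))
```

New data sources then only need to implement file_url; the range handling stays in one place.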
Weather forecasts are available from the NOAA CFSv2 model. A new run is released every 6 hours and forecasts 9 months out, with slight variations on output depending on the time of day.
These are deterministic forecasts, so to create an ensemble I'd need to use several days' worth of runs and combine them.
This will involve taking the CFSv2 reanalysis and combining it with PRISM to get comparisons.
On each forecast day (i.e. the 1st and 15th of the month)
End up with:
An nc file of 10 runs, each covering the cutoff date through the end of the forecast
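Since each run is deterministic, a lagged ensemble can be built from the most recent runs. This sketch lists the initialization times of the last n runs, assuming runs start at 00, 06, 12 and 18 UTC; the function name is hypothetical, and it ignores the delay between a run's start time and when its output is actually available for download.

```python
from datetime import datetime, timedelta


def latest_cfs_runs(now, n_runs=10, interval_hours=6):
    """Initialization times of the n most recent runs at or before `now` (UTC)."""
    # Floor `now` to the most recent 00/06/12/18 UTC cycle.
    latest = now.replace(hour=now.hour - now.hour % interval_hours,
                         minute=0, second=0, microsecond=0)
    return [latest - timedelta(hours=interval_hours * i) for i in range(n_runs)]


for run in latest_cfs_runs(datetime(2018, 1, 5, 14, 30), n_runs=4):
    print(run.strftime("%Y%m%d%H"))
# prints 2018010512, 2018010506, 2018010500, 2018010418 (one per line)
```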
too much variation in this between species. Fix the scale to something like 0 - 20+
lots of output to log
Use netcdf attributes to record some history in the files
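A minimal sketch of building up a history attribute, following the common NetCDF practice of one timestamped line per processing step. The helper name is hypothetical; with xarray the returned string would simply be assigned to ds.attrs["history"].

```python
from datetime import datetime, timezone


def append_history(existing, entry, when=None):
    """Return a history string with one new line appended; `when` should be in UTC."""
    if when is None:
        when = datetime.now(timezone.utc)
    line = f"{when:%Y-%m-%d %H:%M:%S} UTC: {entry}"
    return f"{existing}\n{line}" if existing else line


history = append_history("", "downscaled CFSv2 forecast",
                         datetime(2018, 1, 20, 12, 0, tzinfo=timezone.utc))
history = append_history(history, "added issue_date and crs attributes",
                         datetime(2018, 1, 20, 12, 5, tzinfo=timezone.utc))
print(history)
```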
Ran into an issue where forecast entries from the static image metadata file were duplicated. The duplicates made the API update error out, leaving it incomplete and causing website errors. Not sure where the duplicates came from, but a good guard regardless is an all-or-nothing update for every forecast iteration inside api_client.py. There is likely something for this built into the Django stuff.
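Django does provide django.db.transaction.atomic for this kind of all-or-nothing block. To illustrate the idea without a Django project, this sketch uses the standard library's sqlite3, whose connection context manager commits on success and rolls back on any exception. The table layout and function name are hypothetical, not the actual api_client.py schema.

```python
import sqlite3


def insert_forecast_entries(conn, entries):
    """Insert all entries or none of them.

    Exact duplicates within the batch are dropped first; any other failure
    (e.g. an entry that already exists) rolls back the whole batch.
    """
    unique_entries = list(dict.fromkeys(entries))  # de-duplicate, keep order
    with conn:  # commit on success, roll back if any statement raises
        conn.executemany(
            "INSERT INTO forecasts (species, phenophase, issue_date) VALUES (?, ?, ?)",
            unique_entries,
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE forecasts (species TEXT, phenophase INTEGER, issue_date TEXT, "
    "UNIQUE (species, phenophase, issue_date))"
)
```

A batch containing an entry that already exists in the table raises sqlite3.IntegrityError and leaves the table exactly as it was, so the website never sees a half-finished update.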
A fairly important step that's easily overlooked: all the times in the CFS forecast are GMT (UTC). Need to convert things to their own local time zone.
Or ... with daily mean temperature it mayyyy be fine.
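If local dates do matter, the conversion has to happen before averaging, since a 6-hourly UTC timestep can fall on a different local calendar day. A sketch with a fixed UTC offset (an assumption: it ignores daylight saving time, which zoneinfo would handle for real time zones); the function name is hypothetical and input datetimes must be timezone-aware.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def local_daily_means(records, utc_offset_hours):
    """Average (utc_datetime, value) pairs by local calendar date.

    Datetimes must be timezone-aware. A fixed offset is assumed, so
    daylight saving time is ignored.
    """
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    sums = defaultdict(float)
    counts = defaultdict(int)
    for when, value in records:
        local_date = when.astimezone(local_tz).date()
        sums[local_date] += value
        counts[local_date] += 1
    return {day: sums[day] / counts[day] for day in sorted(sums)}


utc = timezone.utc
records = [(datetime(2018, 1, 5, h, tzinfo=utc), float(h)) for h in (0, 6, 12, 18)]
records.append((datetime(2018, 1, 6, 0, tzinfo=utc), 24.0))
print(local_daily_means(records, utc_offset_hours=-5))
# The 00 UTC step on Jan 5 falls on Jan 4 local time (UTC-5).
```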
Key issue dates where new things were implemented.
2018-01-05 - First full automated run
2018-01-20 - first time having issue_date and crs in attributes (easily fixable)
2018-01-23 - first time using the larger species set (66 instead of 44) by having a larger set of range masks
Currently cannot connect to the PRISM FTP server. Running this on a compute node just hangs:
In [3]: from ftplib import FTP
In [4]: ftp_con = FTP(host='prism.nacse.org', user='anonymous',passwd='abc123')
In [5]: ftp_con.nlst('/')
BUT, running the same on the login node is fine.
In [1]: from ftplib import FTP
In [2]: ftp_con = FTP(host='prism.nacse.org', user='anonymous',passwd='abc123')
In [3]: ftp_con.nlst('/')
Out[3]:
['/PRISM_datasets.pdf',
'/normals_800m',
'/normals_4km',
'/daily',
'/monthly',
'/data_archive']
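One way to make this fail loudly instead of hanging is to set a socket timeout. A sketch using ftplib's timeout argument; the function name is hypothetical, and whether a timeout helps depends on why the compute node hangs (for example, outbound connections being silently dropped rather than refused).

```python
import ftplib


def list_ftp_root(host, port=21, timeout=30, user="anonymous", passwd="abc123"):
    """Return the root listing, or None if the connection fails or times out."""
    ftp = ftplib.FTP(timeout=timeout)  # applies to connect and later socket operations
    try:
        ftp.connect(host, port)
        ftp.login(user, passwd)
        return ftp.nlst("/")
    except ftplib.all_errors as err:
        print(f"FTP to {host}:{port} failed: {err!r}")
        return None
    finally:
        ftp.close()


# On a node without access this now returns None after at most `timeout` seconds per step:
# list_ftp_root("prism.nacse.org", timeout=30)
```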
figshare - unlimited data
zenodo - 50GB limit (but can ask for more), versioning (very important)
Current daily forecasts are ~31 MB compressed.
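A quick check of the storage numbers, using decimal units: at ~31 MB per forecast set, a 50 GB Zenodo archive holds about 1600 sets, comfortably above the 1000 quoted earlier. The function name is illustrative.

```python
def forecast_sets_per_archive(limit_gb=50, set_size_mb=31):
    """Number of whole forecast sets that fit under a size limit (1 GB = 1000 MB)."""
    return (limit_gb * 1000) // set_size_mb


print(forecast_sets_per_archive())  # prints 1612
```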
Highlighted notes from staff
Please provide the main manuscript in Word or PDF+LaTeX format. Word is preferred. Figures and appendices may remain in PDF format. See Items 1, 8, 9, and 10 in the Checklist for Authors below.
List the Running Head on the title page of the manuscript file, matching the entry in the corresponding field of the online submission form.
Please provide Figs. 1, 2, and 3 sized for PDF publication at no larger than portrait layout (maximum 6 inches wide x 8 inches high) or landscape layout (maximum 8.75 inches wide x 5.25 inches high). All text must be sized between 6 and 10 point when the image is sized for publication. For readability, we suggest using a text size hierarchy, sizing axis numbers between 6-7 point, axis labels between 8-9 point, panel labels that consist of words between 7-8 point, and panel labels that consist of a single letter at 10 point.
Remove line numbers from Appendix S1 to prepare it for posting online.
DEFAULT CHECKLIST FOR AUTHORS
How to Prepare Your Accepted Manuscript for Publication
Upload the main manuscript file and tables in Word or LaTeX. If your manuscript was prepared in LaTeX, please upload a ZIP of the LaTeX source files and include a PDF version.
Assemble the parts of the manuscript in the following order: Title page, Abstract, Key words, Text, Acknowledgments, Literature Citations, Data Availability Statement (if any), Tables (table legend, headings, data, footnotes), Figure Legends, Figures.
Completely double space all text in the manuscript, including table legends and footnotes. All pages should have 1 inch margins on all sides.
Prepare the manuscript using Times New Roman font in a 12-point size. Number the pages sequentially, beginning with the title page.
Check that the hierarchy of text headings is discernible.
Display equations should be formatted using MathType software (a trial version is available online).
Confirm that tables, figures, and Supporting Information are mentioned in the body of the manuscript in numeric order.
For Word submissions, tables must be provided in an editable format as true tables in the Word file, created using the "Insert Table" function, rather than using tabs, spaces, or embedded images. Tables cannot contain colors, shading or graphics. If such enhancements are needed, the information must be presented as a figure.
Provide each figure a single time, as a high-resolution image suitable for publication. Preferred file formats include TIF, EPS, PDF, or AI created using at least 600 dpi, while JPEG, PPT/PPTX, or DOC/DOCX are also acceptable if the resolution is sufficient. See Author Guidelines for more details. Ultimately we are looking for files that not only create crisp, clean prints, but also remain crisp and clear when the on-screen view is significantly zoomed in.
Prepare and submit any previously provided Supporting Information for online publication in Wiley Online Library. New material cannot be added at this stage. Supporting Information is not edited or typeset, and thus should be supplied in the format intended for posting online. For materials prepared in LaTeX, please supply a PDF only. Each appendix must be provided in a separate file. To avoid publication delays please review the naming conventions for this material and the file naming requirements prior to submission: https://esajournals.onlinelibrary.wiley.com/hub/journal/19395582/resources/author-guidelines-eap#Supporting_Information
Prepare and submit any necessary data to an approved repository. Additional material cannot be added to Supporting Information. See Ecological Applications' Data Policy here: https://esajournals.onlinelibrary.wiley.com/hub/journal/19395582/resources/data-policy-eap
Ecological Applications is using Twitter (@ESAApplications) to publicize articles. In the Twitter section of ScholarOne's online submission form, please provide any individual, institution, or funder Twitter accounts you would like us to tag, along with any applicable Twitter hashtags. ESA staff will post a tweet shortly after your article appears online with a direct link to the article.
If authors wish to promote their paper at the time it is released online, be advised that ESA does not embargo papers and the Accepted Article is expected to publish online within a week of files being transmitted to the publisher. General information on ESA’s publicity and embargo policies can be found online: https://esajournals.onlinelibrary.wiley.com/hub/journal/19395582/resources/publicity-esa
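A small helper for checking figure dimensions against the staff's sizing notes (maximum 6 x 8 inches portrait or 8.75 x 5.25 inches landscape) and for computing pixel dimensions at the 600 dpi minimum from the checklist. The function names are illustrative.

```python
def figure_pixels(width_in, height_in, dpi=600):
    """Pixel dimensions of a figure rendered at the given resolution."""
    return round(width_in * dpi), round(height_in * dpi)


def fits_size_limits(width_in, height_in):
    """True if the figure fits the portrait (6 x 8 in) or landscape (8.75 x 5.25 in) limits."""
    portrait = width_in <= 6 and height_in <= 8
    landscape = width_in <= 8.75 and height_in <= 5.25
    return portrait or landscape


print(figure_pixels(6, 8))     # prints (3600, 4800)
print(fits_size_limits(8, 6))  # prints False
```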
Update the range_map_made column in misc_data_prep/create_species_masks.py
Remove the occurences_downloaded column
This column name has stuck around for some reason; need to change it everywhere in one go.
dates to hindcast from = jan 1 - June 30, every 4 days
45 hindcasts / year
~100 MB per species/phenophase per hindcast (compressed) * 45 hindcasts = 4.5 GB; * 27 spp. (138 in an earlier estimate) = ~121 GB
~30 min per species/phenophase * 27 = ~14 hours per hindcast date * 45 hindcast dates = 630 total hours
(~2 days with 16 cores, but each will likely need 25-30 GB of RAM)
(Note: the 30 min time was with the ThermalTime model; a Uniforc model took 2 hours.)
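The estimates above can be reproduced with a few lines. Both functions are hypothetical helpers; the defaults (100 MB and 30 min per species/phenophase, 16 cores) come from the notes. Counting both endpoints, Jan 1 to June 30 every 4 days gives 46 dates in a non-leap year, in line with the ~45 estimate, and using 13.5 hours per date instead of the rounded 14 gives ~608 CPU hours rather than 630.

```python
from datetime import date, timedelta


def hindcast_dates(year, step_days=4):
    """Every `step_days` days from Jan 1 through June 30, both ends included."""
    day, last = date(year, 1, 1), date(year, 6, 30)
    dates = []
    while day <= last:
        dates.append(day)
        day += timedelta(days=step_days)
    return dates


def hindcast_budget(n_dates, n_species_phenophases, mb_each=100, hours_each=0.5, cores=16):
    """Return (storage in GB, total CPU hours, wall-clock hours on `cores` cores)."""
    storage_gb = n_dates * mb_each / 1000 * n_species_phenophases
    cpu_hours = n_dates * hours_each * n_species_phenophases
    return storage_gb, cpu_hours, cpu_hours / cores


print(len(hindcast_dates(2018)))  # prints 46
print(hindcast_budget(45, 27))    # prints (121.5, 607.5, 37.96875)
```

Passing hours_each=2 gives the corresponding budget for the Uniforc model.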
for each hindcast_date:
    obtain the current_weather_observation nc file (or just use the one already built)
    cut off the current_weather_observation file at hindcast_date - 1 day
    get the latest forecasts as of hindcast_date
    make a folder of "current forecasts"
    pass that folder to apply_phenology_forecasts, but this must use the bootstrap versions of all models
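The loop above as a Python sketch. The three callables are hypothetical stand-ins for the real steps (loading observations, fetching forecasts, and apply_phenology_forecasts, whose actual signature may differ); only the control flow and the cutoff at hindcast_date - 1 day are taken from the notes.

```python
from datetime import timedelta
from pathlib import Path


def run_hindcasts(hindcast_dates, load_observations, fetch_forecasts, apply_models, work_dir):
    """Hypothetical hindcast driver; the callables stand in for the real pipeline steps."""
    results = {}
    for hindcast_date in hindcast_dates:
        # Only observations through the day before the hindcast date may be used.
        observations = load_observations(hindcast_date - timedelta(days=1))
        forecast_dir = Path(work_dir) / f"current_forecasts_{hindcast_date:%Y%m%d}"
        forecast_dir.mkdir(parents=True, exist_ok=True)
        # Fill the folder with the latest forecasts issued as of hindcast_date.
        fetch_forecasts(hindcast_date, forecast_dir)
        # Apply the bootstrap versions of all models to this folder.
        results[hindcast_date] = apply_models(observations, forecast_dir, bootstrap=True)
    return results
```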
zenodo
confirm all references
npn data citation
all software refs in final methods paragraph.
get nice X's in prediction paragraph (latitude X longitude X time)
Use $\times$
cited papers by white or taylor from the past 12 months.
portal forecasting paper (MEE)
pyphenology (JOSS)
npn-lter paper
portal data paper (PLOS BIO)
figures and figure legends, with references right after main text.
potentially have to edit the latex template: https://rmarkdown.rstudio.com/authoring_bibliographies_and_citations.html
fiddle with figure font sizes. shoot for final widths from the instructions:
Figures should be drawn/submitted at their smallest practicable size (to fit a single column (82 mm), two-thirds page width (110 mm) or full page width (173 mm).
cite little dataset for range maps
include a note on the error boxplot/timeseries bias, since we're only halfway through spring.
confirm 100 million number for total calculations
make supp table of species/phenophases used.
confirm final tally of total species + total unique forecasts
confirm most of the steps in fig. 1 are represented in text.
present in text: B, C, D, E-G, H, M, N, L, Q, O
needed: A: NPN data
J: latest PRISM data
K: apply downscale model
P: sync to site
write text for new fig 3 and 4
detail out uncertainty equation
equation showing the weighted average derived from climate ensemble in supplement
methods for anomalies?
peak flower forecasts - some papers have looked into this
community forecasts - like a mountain meadow, I'm very interested in this
Looks like a nice package here https://github.com/JiaweiZhuang/xESMF, for the initial interpolation only
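For reference, on a regular latitude/longitude grid the bilinear method offered by regridders such as xESMF reduces, within each grid cell, to a weighted average of the four surrounding points. A minimal sketch of that per-cell formula (the function name and argument order are illustrative; xESMF itself builds regridding weights with ESMF and also handles curvilinear grids).

```python
def bilinear(x, y, x0, x1, y0, y1, q00, q10, q01, q11):
    """Interpolate inside the cell [x0, x1] x [y0, y1]; qij is the value at (x_i, y_j)."""
    tx = (x - x0) / (x1 - x0)  # fractional position along x
    ty = (y - y0) / (y1 - y0)  # fractional position along y
    return (q00 * (1 - tx) * (1 - ty) + q10 * tx * (1 - ty)
            + q01 * (1 - tx) * ty + q11 * tx * ty)


# Centre of a cell whose corners are 0, 10, 20, 30 gets their plain average.
print(bilinear(0.5, 0.5, 0, 1, 0, 1, 0, 10, 20, 30))  # prints 15.0
```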
for every species/phenophase give some status update on the model
Permalinks to a specific species/phenophase/issue date OR the latest issue date
Things are getting a bit crowded, so...
phenology_forecasts/
    tools/
        ...
    model_building/
        phenology/
            build_phenology_models.py
            download_species_observations.R
            download_species_observation_temperature.R
            phenology_observation_functions.R
        climate/
            download_historic_observations.py
            download_historic_reanalysis.py
            download_historic_forecasts.py
            build_downscaling_model.py
    automated_forecasting/
        climate/
            download_latest_forecasts.py
            download_latest_observations.py
        phenology/
            apply_phenology_models.py
        presentation/
            build_maps.R
    misc_data_prep/
        create_species_masks.py
        create_mask.py
it's in the current_season_observation.nc file
There is some data processing for this done by hand.
wget https://www.fs.fed.us/nrs/atlas/littlefia/species_table.html
grep IV_Little species_table.html | cut -d"<" -f15 | cut -d"_" -f 3,4 > species_list_cut_output.csv
Go through and x out the 8 intermediate characters on each line in vim
sort species_list_cut_output.csv | uniq > species_list.csv
Go back through and put in commas (quicker than it sounds).
The CFSv2 reanalysis has 6-hour timesteps, but between those timesteps it has hourly "forecasts" where the model is run with no assimilation. Since I'm just getting daily means, I only want to use the primary timesteps at 6-hour intervals. But this description says the 00 forecast for some things is essentially invalid, and I'm not sure if it applies to the tmp2m that I'm using.
Text from the pdf
Important Note: The forecast at the first time step (f00) of 3 minutes constitutes a spin up of the model physics, and extreme care should be taken when using it as a proxy of any type of validation. IT IS NOT THE ANALYSIS.
The tmp2m files I'm using don't have this 3 minute initial timestep so for the time being I'm using the initial timestep.
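If only the primary analysis times are needed, filtering on the hour is enough. A minimal sketch, assuming the times are already decoded into datetime objects in UTC; the function name is hypothetical.

```python
from datetime import datetime, timedelta


def primary_timesteps(times, interval_hours=6):
    """Keep times exactly on the 00/06/12/18 UTC cycle, dropping the hourly in-between steps."""
    return [t for t in times
            if t.hour % interval_hours == 0 and t.minute == 0 and t.second == 0]


hourly = [datetime(2018, 1, 5) + timedelta(hours=h) for h in range(13)]
print([t.hour for t in primary_timesteps(hourly)])  # prints [0, 6, 12]
```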
From @ethanwhite. Instead of picking one model for each species/phenophase, use an ensemble with appropriate weights. As new observations come in adjust the weights somehow for the next forecasts.
This will essentially be observations in the southern/lower elev. range of species affecting the forecast for more northern/upper elev individuals.
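A minimal sketch of one possible weighting scheme: weights proportional to the inverse mean squared error of each model against the observations received so far, recomputed as new observations come in. Inverse-MSE weighting is an assumption here, not a scheme settled on in these notes; the function names are hypothetical and the model names are borrowed from the hindcast notes.

```python
def inverse_mse_weights(errors_by_model, eps=1e-9):
    """Weights proportional to 1 / MSE of each model's past errors, summing to 1."""
    inverse = {
        model: 1.0 / (sum(e * e for e in errors) / len(errors) + eps)
        for model, errors in errors_by_model.items()
    }
    total = sum(inverse.values())
    return {model: w / total for model, w in inverse.items()}


def weighted_forecast(predictions, weights):
    """Weighted average of each model's predicted day of year."""
    return sum(weights[model] * pred for model, pred in predictions.items())


# Errors (predicted - observed, in days) from observations received so far.
weights = inverse_mse_weights({"ThermalTime": [1, -1], "Uniforc": [2, -2]})
print(weights)  # roughly {'ThermalTime': 0.8, 'Uniforc': 0.2}
print(weighted_forecast({"ThermalTime": 100, "Uniforc": 110}, weights))  # roughly 102
```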
some species in the Tree Atlas data have synonym issues
Alnus tenuifolia and Alnus rugosa -> A. incana
Cornus stolonifera -> C. sericea
Sambucus sp. -> S. nigra (this genus is a mess, see below)
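A minimal sketch of applying these synonyms before matching Tree Atlas names to NPN species. The dictionary holds only the mappings listed above; the names of the dictionary and functions are hypothetical.

```python
# Tree Atlas name -> accepted name, from the list above.
TREE_ATLAS_SYNONYMS = {
    "Alnus tenuifolia": "Alnus incana",
    "Alnus rugosa": "Alnus incana",
    "Cornus stolonifera": "Cornus sericea",
    "Sambucus sp.": "Sambucus nigra",
}


def accepted_name(name):
    """Return the accepted name, or the input unchanged if it has no synonym entry."""
    return TREE_ATLAS_SYNONYMS.get(name, name)


def unique_accepted_names(names):
    """Collapse synonyms so each accepted species appears once."""
    return sorted({accepted_name(n) for n in names})


print(unique_accepted_names(["Alnus rugosa", "Alnus tenuifolia", "Cornus stolonifera"]))
# prints ['Alnus incana', 'Cornus sericea']
```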
Others are just neat plants that have the NPN data available but just need the range map
From Janet Prevey
making selectable leaflet maps https://github.com/stefanocudini/leaflet-panel-layers
epic tutorial on Leaflet/JS and everything online mapping https://www.e-education.psu.edu/geog585/node/776
custom tile creation with gdal (and made very complex with AWS) https://hi.stamen.com/stamen-aws-lambda-tiler-blog-post-76fc1138a145, and the code https://github.com/hotosm/oam-dynamic-tiler
leaflet color scale https://gis.stackexchange.com/questions/193161/add-legend-to-leaflet-map