
sat-ml-training's People

Contributors: dependabot[bot], drewbo, geoyi, lillythomas, vincentsarago, wildintellect

sat-ml-training's Issues

Test Remote Meeting Tools

@lillythomas and @wildintellect should arrange a call in mid-September with ICIMOD to verify that MS Teams will work and has all the features we need:

  • Video presentation
  • Long-lived chat rooms for feedback and interaction
  • 1:1 calls with desktop sharing for troubleshooting

Remove non-public data from examples

Some of the examples use non-public data. In the short term this isn't an issue, since the Shared Drive is not public. Long term, when finalizing the examples, we will need to provide only public data, or instructions for how to obtain the needed data.

TODO:

  • Zindi data can be obtained from Zindi.org by anyone who signs up for an account
  • Crop Yield data can't be shared; we should probably delist that notebook from the public index and remove the data from the shared drive.

Corrections to Deep Learning

  • typo in the F1 formula (see the corrected definition below)
  • move Google Drive mounting to after the installs and imports, to try to fix a bug
  • the saved-model path needs /content prepended
  • don't reset the weights when reloading the model from file after training
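
For reference, the standard F1 definition the typo should be corrected to (a minimal sketch; the notebook's exact variable names may differ):

```python
# F1 is the harmonic mean of precision and recall.
def f1(precision: float, recall: float) -> float:
    return 2 * precision * recall / (precision + recall)
```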

From Slack discussion:

I think we should use an if/else statement for whether you want to use a saved model for predictions, dictated by a boolean variable. The default setting should be False.
Maybe an if statement based on the model existing as a variable; question: how do we make sure the weights are saved/reloaded before predictions?
If saved_model.pb and the variables folder are in the saved_model_path, that indicates the model saved out.
Maybe we can cross-compare the weights by printing them from the in-memory trained model and then from the loaded model.
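
A minimal sketch of the pattern discussed above, assuming a Keras model; the flag name, path, and `build_and_train_model()` helper are hypothetical stand-ins for the notebook's own code:

```python
import os

import numpy as np
import tensorflow as tf

USE_SAVED_MODEL = False  # hypothetical boolean flag; default False per the discussion
saved_model_path = "/content/drive/MyDrive/saved_model"  # hypothetical path

# saved_model.pb in the export directory indicates the model saved out correctly.
model_saved = os.path.exists(os.path.join(saved_model_path, "saved_model.pb"))

if USE_SAVED_MODEL and model_saved:
    model = tf.keras.models.load_model(saved_model_path)
else:
    model = build_and_train_model()  # placeholder for the notebook's training cells
    model.save(saved_model_path)

# Cross-compare weights: the in-memory model vs. the reloaded one.
reloaded = tf.keras.models.load_model(saved_model_path)
for w_mem, w_disk in zip(model.get_weights(), reloaded.get_weights()):
    assert np.allclose(w_mem, w_disk)
```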

Upgrade Fastpages

We probably won't figure this out; the automated upgrade is broken due to our customizations and permissions. The only thing this blocks is launching a local server while developing, so you have to push your notebook to GitHub first; then you can view it, launch it in Colab, or open a PR into the main branch to check it on the website.

Large Scale Inference & Cloud Scaling Session

@drewbo this session, "Large Scale Inference, Cloud Scaling" (lesson 6, on Oct 8, the final lesson before office hours), is up to you.

@Geoyi mentioned discussing some topics with you and @lillythomas. A few things we thought about in scrum that might be good to include, since they are not well covered elsewhere:

  • Demo how to connect to a GCP instance as a backend for Colab (see references 1 and 2); participants need not do this themselves, since it would require setting up GCP accounts/IAM/etc.
  • Enabling GPU (Lilly will briefly show how to enable it in Colab as part of lesson 4)
  • Using COGs as a data source instead of single TIFs on Google Drive (see the sketch after this list). The Sentinel-2 bucket for Africa that @vincentsarago mentioned is Requester Pays, so it wasn't easy to work into an example. Not sure if there is a good GCS bucket with COGs that could be used, or another source. @vincentsarago, maybe a standard notebook from previous talks you've done could be used.
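
A minimal sketch of the COG approach from the last bullet, reading a window over HTTP with rasterio; the bucket URL is a placeholder, not a confirmed public source:

```python
import rasterio
from rasterio.windows import Window

# Placeholder URL -- substitute a real public COG (e.g. from a GCS or AWS bucket).
cog_url = "https://storage.googleapis.com/example-bucket/scene.tif"

with rasterio.open(cog_url) as src:
    # Read a single 512x512 window instead of downloading the whole scene;
    # COGs make this cheap via HTTP range requests.
    window = Window(col_off=0, row_off=0, width=512, height=512)
    chunk = src.read(1, window=window)
    print(chunk.shape, src.crs)
```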

Lilly's SERVIR to-do list

Last week we polished most of the pre-read content, namely the ML guide.

Remaining to-do items:

  • synthesize and clean up the deep learning crop type mapping code
  • add the remaining markdown instructions to the deep learning crop type mapping notebook
  • write out interim data from the deep learning crop type mapping notebook, so that we can skip steps if a student needs to (see the sketch after this list)
  • write out a saved model from the deep learning crop type mapping notebook, for the same reason
  • implement the notebook in Google Colab
  • send out pre-read materials to students
  • support development of the remaining tutorial notebooks
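
A minimal sketch of the two write-out items above, assuming the notebook runs in Colab with Drive mounted; the paths, `features`, and `model` names are illustrative stand-ins for the notebook's own objects:

```python
import numpy as np
from google.colab import drive

drive.mount("/content/drive")

# Illustrative paths on the shared Drive.
interim_path = "/content/drive/MyDrive/sat-ml-training/interim/features.npy"
model_path = "/content/drive/MyDrive/sat-ml-training/saved_model"

# Write out interim arrays so a student can skip the preprocessing steps...
np.save(interim_path, features)  # `features` produced by earlier cells
# ...and resume later with:
features = np.load(interim_path)

# Write out the trained model so a student can skip training.
model.save(model_path)  # `model` produced by the training cells
```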

Getting big files into Colab without Download

While working with the Zindi data, we found it isn't really feasible for participants to download the data themselves and then upload it to their Google Drives; that could easily take hours. As a workaround we are putting all the data, unzipped, on a Google Shared Drive accessible to participants. This is a common problem for ML work in general.

However, it would be really awesome to use Colab to download the data directly to the cloud and unzip it there.

  • Downloading requires authentication to the Zindi platform, so using the requests library is a little more complicated (see the sketch after this list)
  • The links in the page are obfuscated: "Note: If you want to download the satellite data using a script, you can contact us for a permanent URL"
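
If Zindi does provide a permanent URL and token (per their note above), a hedged sketch of downloading and unzipping directly in Colab might look like this; the URL and header name are hypothetical:

```python
import zipfile

import requests

# Hypothetical permanent URL and auth token obtained from Zindi.
data_url = "https://zindi.africa/path/to/satellite_data.zip"
headers = {"auth_token": "YOUR_ZINDI_TOKEN"}

# Stream the download straight to the Colab VM's disk (cloud-to-cloud, so it's fast).
with requests.get(data_url, headers=headers, stream=True) as r:
    r.raise_for_status()
    with open("data.zip", "wb") as f:
        for chunk in r.iter_content(chunk_size=1 << 20):
            f.write(chunk)

# Unzip on the Colab VM rather than on a participant's laptop.
with zipfile.ZipFile("data.zip") as z:
    z.extractall("data")
```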

Alternatives:

  • Get the scenes from GEE or an AWS public bucket
  • Have the data ready to use (we plan to do so anyway)
  • There are some interesting Chrome plugins that save directly to Drive, but it's not clear which are safe to use.

This is a wishlist item for after all the other important work is done, but it's worth discussing.

Accounts Setup

Participants are going to need several different accounts. We need to figure out that list and ensure everyone has instructions for each account type they need ahead of time.

Yes

  • Google Colab (Does this use a generic google account?)
  • Google Account for use of Google Drive
  • Google Earth Engine Account

Maybe

  • Microsoft Teams for chat and video, with DevSeed folks added to the NASA or ICIMOD organization under a Team specific to the training
  • Google Cloud Platform - Roles linked to an ICIMOD account so they can spin up resources for the very last lesson on scaling?
  • SentinelHub - Are we getting any data via SentinelHub?

How to do "train_test_split" for image data set used in pyrasterframes?

Hi,
I found the code for supervised machine learning with pyrasterframes at https://rasterframes.io/supervised-learning.html. The author uses 12 .tiff files to train the machine learning model, but the training and testing sets are never split. I tried `x_training_data, x_test_data, y_training_data, y_test_data = train_test_split(x, y, test_size=0.3)` to split the data, but I am not sure which variables should be "x" and "y". Could you please suggest how to split the training and testing sets for the program at that link? Thanks!
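
In scikit-learn's train_test_split, X is the feature columns (the per-pixel band values in this tutorial) and y is the label column. Since pyrasterframes is built on Spark, the DataFrame can also be split directly with randomSplit. A minimal sketch, where `df` stands for the tutorial's assembled DataFrame and the column names are illustrative:

```python
from sklearn.model_selection import train_test_split

# Option 1: split the Spark DataFrame directly.
train_df, test_df = df.randomSplit([0.7, 0.3], seed=42)

# Option 2: convert to pandas and use scikit-learn.
pdf = df.toPandas()
X = pdf[["band_1", "band_2", "band_3"]]  # features: per-pixel band values (illustrative names)
y = pdf["target"]                        # label column (illustrative name)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
```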

Branding

  • Add DevSeed logo
  • Tweak style slightly

Organize Page Order

Create a page listing the materials in order for the HKH training.

This might be done using the sticky feature of fastpages, or tag + date ordering of the posts, although the newest post seems to show up first. We might also be able to include a hidden, hand-coded Markdown page with links in the order we want.
