
sat-ml-training's People

Contributors: dependabot[bot], drewbo, geoyi, lillythomas, vincentsarago, wildintellect

sat-ml-training's Issues

Test Remote Meeting Tools

@lillythomas and @wildintellect should arrange a call in mid-September with ICIMOD to verify that MS Teams will work and has all the features we need:

  • Video presentation
  • Long-lived chat rooms for feedback and interaction
  • 1:1 calls with desktop sharing for troubleshooting

Remove non-public data from examples

Some of the examples use non-public data. In the short term this isn't an issue, since the Shared Drive is not public. Long term, when finalizing the examples, we will need to provide only public data, or instructions for how to obtain the needed data.

TODO:

  • Zindi data can be obtained from Zindi.org by anyone who signs up for an account
  • Crop Yield data can't be shared; we should probably delist that notebook from the public index and remove the data from the shared drive.

Corrections to Deep Learning

  • typo in the F1 formula (see the corrected definition below)
  • move Google Drive mounting to after the installs and imports, to try to fix a bug
  • the saved-model path needs /content prepended
  • don't reset the weights when reloading the model from file after training
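
For reference, the standard F1 definition the typo should be corrected to (a minimal sketch; the notebook's exact variable names may differ):

```python
# F1 is the harmonic mean of precision and recall.
def f1(precision: float, recall: float) -> float:
    return 2 * precision * recall / (precision + recall)
```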

From Slack discussion:

I think we should use an if/else statement for whether you want to use a saved model for predictions, dictated by a boolean variable. The default setting should be False.
Maybe an if statement based on the model existing as a variable; question: how do we make sure the weights are saved/reloaded before predictions?
If saved_model.pb and the variables folder are in the saved_model_path, that indicates the model saved out.
Maybe we can cross-compare the weights by printing them from the in-memory trained model and then from the loaded model.
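
A minimal sketch of the pattern discussed above, assuming a Keras model; the flag name, path, and `build_and_train_model()` helper are hypothetical stand-ins for the notebook's own code:

```python
import os

import numpy as np
import tensorflow as tf

USE_SAVED_MODEL = False  # hypothetical boolean flag; default False per the discussion
saved_model_path = "/content/drive/MyDrive/saved_model"  # hypothetical path

# saved_model.pb in the export directory indicates the model saved out correctly.
model_saved = os.path.exists(os.path.join(saved_model_path, "saved_model.pb"))

if USE_SAVED_MODEL and model_saved:
    model = tf.keras.models.load_model(saved_model_path)
else:
    model = build_and_train_model()  # placeholder for the notebook's training cells
    model.save(saved_model_path)

# Cross-compare weights: the in-memory model vs. the reloaded one.
reloaded = tf.keras.models.load_model(saved_model_path)
for w_mem, w_disk in zip(model.get_weights(), reloaded.get_weights()):
    assert np.allclose(w_mem, w_disk)
```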

Upgrade Fastpages

We probably won't figure this out; the automated upgrade is broken due to our customizations and permissions. The only thing this blocks is launching a local server while developing, so you have to push your notebook to GitHub first; then you can view it, launch it in Colab, or open a PR into the main branch to check it on the website.

Large Scale Inference & Cloud Scaling Session

@drewbo this session, "Large Scale Inference, Cloud Scaling" (lesson 6, on Oct 8, the final lesson before office hours), is up to you.

@Geoyi mentioned discussing some topics with you and @lillythomas. A few things we thought about in scrum that might be good to include, since they are not well covered elsewhere:

  • Demo how to connect to a GCP instance as a backend for Colab (see references 1 and 2); participants need not do this themselves, since it would require setting up GCP accounts/IAM/etc.
  • Enabling GPU (Lilly will briefly show how to enable it in Colab as part of lesson 4)
  • Using COGs as a data source instead of single TIFs on Google Drive (see the sketch after this list). The Sentinel-2 bucket for Africa that @vincentsarago mentioned is Requester Pays, so it wasn't easy to work into an example. Not sure if there is a good GCS bucket with COGs that could be used, or another source. @vincentsarago, maybe a standard notebook from previous talks you've done could be used.
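
A minimal sketch of the COG approach from the last bullet, reading a window over HTTP with rasterio; the bucket URL is a placeholder, not a confirmed public source:

```python
import rasterio
from rasterio.windows import Window

# Placeholder URL -- substitute a real public COG (e.g. from a GCS or AWS bucket).
cog_url = "https://storage.googleapis.com/example-bucket/scene.tif"

with rasterio.open(cog_url) as src:
    # Read a single 512x512 window instead of downloading the whole scene;
    # COGs make this cheap via HTTP range requests.
    window = Window(col_off=0, row_off=0, width=512, height=512)
    chunk = src.read(1, window=window)
    print(chunk.shape, src.crs)
```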

Lilly's SERVIR to-do list

Last week we polished most of the pre-read content, namely the ML guide.

Remaining to-do items:

  • synthesize and clean up the deep learning crop type mapping code
  • add the remaining markdown instructions to the deep learning crop type mapping notebook
  • write out interim data from the deep learning crop type mapping notebook, so that we can skip steps if a student needs to (see the sketch after this list)
  • write out a saved model from the deep learning crop type mapping notebook, for the same reason
  • implement the notebook in Google Colab
  • send out pre-read materials to students
  • support development of the remaining tutorial notebooks
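
A minimal sketch of the two write-out items above, assuming the notebook runs in Colab with Drive mounted; the paths, `features`, and `model` names are illustrative stand-ins for the notebook's own objects:

```python
import numpy as np
from google.colab import drive

drive.mount("/content/drive")

# Illustrative paths on the shared Drive.
interim_path = "/content/drive/MyDrive/sat-ml-training/interim/features.npy"
model_path = "/content/drive/MyDrive/sat-ml-training/saved_model"

# Write out interim arrays so a student can skip the preprocessing steps...
np.save(interim_path, features)  # `features` produced by earlier cells
# ...and resume later with:
features = np.load(interim_path)

# Write out the trained model so a student can skip training.
model.save(model_path)  # `model` produced by the training cells
```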

Getting big files into Colab without Download

While working with the Zindi data, we found it isn't really feasible for participants to download the data themselves and then upload it to their Google Drives; that could easily take hours. As a workaround we are putting all the data, unzipped, on a Google Shared Drive accessible to participants. This is a common problem for ML work in general.

However, it would be really awesome to use Colab to download the data directly to the cloud and unzip it there.

  • Downloading requires authentication to the Zindi platform, so using the requests library is a little more complicated (see the sketch after this list)
  • The links in the page are obfuscated: "Note: If you want to download the satellite data using a script, you can contact us for a permanent URL"
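
If Zindi does provide a permanent URL and token (per their note above), a hedged sketch of downloading and unzipping directly in Colab might look like this; the URL and header name are hypothetical:

```python
import zipfile

import requests

# Hypothetical permanent URL and auth token obtained from Zindi.
data_url = "https://zindi.africa/path/to/satellite_data.zip"
headers = {"auth_token": "YOUR_ZINDI_TOKEN"}

# Stream the download straight to the Colab VM's disk (cloud-to-cloud, so it's fast).
with requests.get(data_url, headers=headers, stream=True) as r:
    r.raise_for_status()
    with open("data.zip", "wb") as f:
        for chunk in r.iter_content(chunk_size=1 << 20):
            f.write(chunk)

# Unzip on the Colab VM rather than on a participant's laptop.
with zipfile.ZipFile("data.zip") as z:
    z.extractall("data")
```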

Alternatives:

  • Get the scenes from GEE or an AWS public bucket
  • Have the data ready to use (we plan to do so anyway)
  • There are some interesting Chrome plugins that save directly to Drive, but it's not clear which are safe to use.

This is a wishlist item for after all the other important work is done, but it's worth discussing.

Accounts Setup

Participants are going to need several different accounts. We need to figure out that list and ensure everyone has instructions for each account type they need ahead of time.

Yes

  • Google Colab (Does this use a generic google account?)
  • Google Account for use of Google Drive
  • Google Earth Engine Account

Maybe

  • Microsoft Teams for chat and video, with DevSeed folks added to the NASA or ICIMOD organization under a Team specific to the training
  • Google Cloud Platform - Roles linked to an ICIMOD account so they can spin up resources for the very last lesson on scaling?
  • SentinelHub - Are we getting any data via SentinelHub?

How to do "train_test_split" for image data set used in pyrasterframes?

Hi,
I found the code for supervised machine learning with pyrasterframes at https://rasterframes.io/supervised-learning.html. The author uses 12 .tiff files to train the machine learning model, but the training and testing sets are never split. I tried `x_training_data, x_test_data, y_training_data, y_test_data = train_test_split(x, y, test_size=0.3)` to split the data, but I am not sure which variables should be "x" and "y". Could you please suggest how to split the training and testing sets for the program at that link? Thanks!
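
In scikit-learn's train_test_split, X is the feature columns (the per-pixel band values in this tutorial) and y is the label column. Since pyrasterframes is built on Spark, the DataFrame can also be split directly with randomSplit. A minimal sketch, where `df` stands for the tutorial's assembled DataFrame and the column names are illustrative:

```python
from sklearn.model_selection import train_test_split

# Option 1: split the Spark DataFrame directly.
train_df, test_df = df.randomSplit([0.7, 0.3], seed=42)

# Option 2: convert to pandas and use scikit-learn.
pdf = df.toPandas()
X = pdf[["band_1", "band_2", "band_3"]]  # features: per-pixel band values (illustrative names)
y = pdf["target"]                        # label column (illustrative name)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
```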

Branding

  • Add DevSeed logo
  • Tweak style slightly

Organize Page Order

Create a page listing the materials in order for the HKH training.

This might be done using the sticky feature of fastpages, or tag + date ordering of the posts, although the newest post seems to show up first. We might also be able to include a hidden, hand-coded Markdown page with links in the order we want.
