
cumulo's Introduction

CUMULO is a benchmark dataset for training and evaluating global cloud classification models. It merges two satellite products from the A-train constellation: the Moderate Resolution Imaging Spectroradiometer (MODIS) on the Aqua satellite, and the 2B-CLDCLASS-LIDAR product, which combines the CloudSat Cloud Profiling Radar (CPR) with the CALIPSO Cloud-Aerosol Lidar with Orthogonal Polarization (CALIOP).


Dataset

The dataset is hosted here. It contains over 300k annotated multispectral images at 1 km x 1 km resolution, with full daily coverage of the Earth for 2008, 2009 and 2016.

Download

Option 1: syncing with your Dropbox account

  1. add CUMULO to your Dropbox account
  2. use rclone to sync it to your machine

Option 2: direct download -- DEPRECATED!

  1. use one of these download scripts

File Format

Data is stored in Network Common Data Form (NetCDF) following this convention.

Each NetCDF file holds one swath of 1354x2030 pixels. There is one file every 5 minutes, named:

filename = AYYYYDDD.HHMM.nc

YYYY => year
DDD => day of the year, counted from 01.01.YYYY (001 = 1 January)
HH => hour of the day
MM => minute of the hour
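
The naming convention above maps directly onto Python's `strptime` directives (`%j` is the zero-padded day of the year). The following is a minimal sketch, not part of the CUMULO code base: the helper names `parse_swath_filename` and `swath_filename` are hypothetical, and the returned timestamps are naive because the README does not state a time zone.

```python
import os
from datetime import datetime

# Pattern for 'AYYYYDDD.HHMM.nc': %Y = year, %j = day of year,
# %H = hour, %M = minute. The leading 'A' and '.nc' are literals.
SWATH_PATTERN = "A%Y%j.%H%M.nc"


def parse_swath_filename(path):
    """Return the (naive) acquisition time encoded in a swath filename."""
    return datetime.strptime(os.path.basename(path), SWATH_PATTERN)


def swath_filename(timestamp):
    """Build the swath filename for a given datetime."""
    return timestamp.strftime(SWATH_PATTERN)


if __name__ == "__main__":
    print(parse_swath_filename("A2008001.0000.nc"))    # 2008-01-01 00:00:00
    print(swath_filename(datetime(2009, 2, 1, 13, 5)))  # A2009032.1305.nc
```

Parsing raises `ValueError` for names that do not follow the convention, which can be useful for filtering unrelated files out of a directory listing.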

File Content

To list the variables available in a NetCDF file, along with their descriptions, run:

ncdump -h netcdf/cumulo.nc
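
The same header information can also be read from Python. This is a rough sketch that assumes the third-party `netCDF4` package is installed (it is not listed among the dependencies in this README). The helper names `summarize_variables` and `summarize_file` are hypothetical, and no CUMULO variable names are assumed.

```python
def summarize_variables(variables):
    """Return one line per variable: name, dimension names and shape.

    `variables` is any mapping of name -> object exposing `.dimensions`
    and `.shape`, such as the `variables` dict of a netCDF4 Dataset.
    """
    lines = []
    for name, var in variables.items():
        dims = ", ".join(var.dimensions)
        lines.append(f"{name}({dims}) shape={tuple(var.shape)}")
    return lines


def summarize_file(path):
    """Open a NetCDF file read-only and summarize its variables."""
    # Imported lazily so the helper above works without netCDF4 installed.
    from netCDF4 import Dataset

    with Dataset(path, "r") as dataset:
        return summarize_variables(dataset.variables)


if __name__ == "__main__":
    import sys

    # Usage: python summarize.py <path-to-swath.nc>
    if len(sys.argv) > 1:
        for line in summarize_file(sys.argv[1]):
            print(line)
```

Unlike `ncdump -h`, this sketch prints only names, dimensions and shapes. Attributes such as descriptions could be added with the variable's `ncattrs()` and `getncattr()` methods.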

Source Code

  1. The script pipeline.py builds one CUMULO swath (as a NetCDF file) from the corresponding MODIS MYD02, MYD03, MYD06 and MYD35 files and the CloudSat CS_2B-CLDCLASS and/or CS_2B-CLDCLASS-LIDAR files.
python3 pipeline.py <save-dir> <myd02-filename>
  2. src/ contains the source code for extracting CUMULO's features, aligning them, and filling in missing values where possible.

Dependencies

pip install gcsfs
conda install -c conda-forge pyhdf  # the pip wheels were broken at the time of writing
pip install satpy
pip install "satpy[modis_l1b]"
pip install -r requirements.txt

Machine Learning Baselines

Examples for training models on CUMULO are provided here.

Cite

If you find this work useful, please cite the original paper:

@article{zantedeschi2019cumulo,
        title={Cumulo: A Dataset for Learning Cloud Classes},
        author={Zantedeschi, Valentina and Falasca, Fabrizio and Douglas, Alyson and Strange, Richard and Kusner, Matt J and Watson-Parris, Duncan},
        journal={Tackling Climate Change with Machine Learning Workshop, NeurIPS},
        year={2019}}

Acknowledgments

This work is the result of the 2019 ESA Frontier Development Lab Atmospheric Phenomena and Climate Variability challenge. We are grateful to all organisers, mentors and sponsors for providing us this opportunity. We thank Google Cloud for providing computing and storage resources to complete this work.
