
kaggle-plasticc's Introduction

Photometric light curves classification with machine learning

The Large Synoptic Survey Telescope will begin its survey in 2022 and produce terabytes of imaging data each night. To handle this massive influx of data, automated algorithms for classifying astronomical light curves are crucial. Here, we present a method for automated classification of photometric light curves for a range of astronomical objects. Our approach is based on gradient boosting of decision trees, feature extraction and selection, and augmentation. The solution was developed in the context of the Photometric LSST Astronomical Time Series Classification Challenge (PLAsTiCC) and achieved one of the top results in the challenge.

For more details, please refer to the paper.

If you are using the results and code of this work, please cite it as

@article{Gabruseva_2019,
  title={Photometric light curves classification with machine learning},
  author={T. Gabruseva and S. Zlobin and P. Wang},
  journal={JAI},
  year={2020}
}

Dataset

The training dataset consisted of simulated astronomical light curves modeled for a range of transients and periodic objects; see data. The dataset is available on the Kaggle platform.

The training dataset had 7848 light curves from 15 classes and was highly imbalanced.

Fig. 1. Examples of simulated light curves for each class in the passbands ugrizy. MJD is the Modified Julian Date, in days.

Metrics

The evaluation metric was provided in the challenge. The models were evaluated with weighted multi-class logarithmic loss. See evaluation here.
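
As an illustration of the metric named above, here is a minimal pure-Python sketch of a weighted multi-class logarithmic loss: the log-probability of the true class is averaged within each class, and the class averages are combined with per-class weights. The function name, the dictionary-based input format, and the weights passed in are assumptions made for this example; the official definition and weights are those on the competition's evaluation page.

```python
import math
from collections import defaultdict


def weighted_multiclass_log_loss(y_true, probs, class_weights, eps=1e-15):
    """Weighted multi-class log loss.

    y_true        -- list of true class labels, one per object
    probs         -- list of dicts {class label: predicted probability}
    class_weights -- dict {class label: weight}; hypothetical values,
                     not the official challenge weights
    """
    log_prob_sum = defaultdict(float)  # sum of ln(p_true) per class
    count = defaultdict(int)           # number of objects per class

    for label, p in zip(y_true, probs):
        # Renormalise so the predicted probabilities sum to one
        # (assumes at least one probability is positive).
        total = sum(p.values())
        p_true = p.get(label, 0.0) / total
        # Clip to avoid log(0) for confident wrong predictions.
        p_true = min(max(p_true, eps), 1.0 - eps)
        log_prob_sum[label] += math.log(p_true)
        count[label] += 1

    # Only classes that actually occur in y_true contribute.
    numerator = sum(
        class_weights[c] * log_prob_sum[c] / count[c] for c in count
    )
    denominator = sum(class_weights[c] for c in count)
    return -numerator / denominator


if __name__ == "__main__":
    labels = [90, 90, 42]
    predictions = [{90: 0.7, 42: 0.3}, {90: 0.6, 42: 0.4}, {90: 0.2, 42: 0.8}]
    print(weighted_multiclass_log_loss(labels, predictions, {90: 1.0, 42: 2.0}))
```

Averaging within each class before weighting keeps the rare classes of an imbalanced training set from being swamped by the frequent ones.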

Models

In this paper, we use LightGBM, a gradient-boosted decision tree library, through its Python interface, with 5-fold cross-validation stratified by class.
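
To make "stratified by class" concrete, the following sketch splits object indices into folds so that each fold holds roughly the same share of every class. It is a standard-library illustration only, not the code in this repository; the function name and the round-robin assignment are assumptions. In practice a library splitter such as scikit-learn's StratifiedKFold would typically be used, with one LightGBM classifier trained per fold.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, n_splits=5, seed=42):
    """Yield (train_idx, valid_idx) pairs with class proportions
    approximately preserved in every validation fold."""
    rng = random.Random(seed)

    # Group object indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Shuffle each class separately, then deal its indices
    # round-robin across the folds.
    folds = [[] for _ in range(n_splits)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            folds[position % n_splits].append(idx)

    # Each fold serves once as the validation set.
    for k in range(n_splits):
        valid_idx = sorted(folds[k])
        train_idx = sorted(
            i for j, fold in enumerate(folds) if j != k for i in fold
        )
        yield train_idx, valid_idx


if __name__ == "__main__":
    toy_labels = [0] * 10 + [1] * 5  # imbalanced toy labels
    for train_idx, valid_idx in stratified_k_fold(toy_labels):
        print(len(train_idx), len(valid_idx))
```

With a highly imbalanced training set, a plain random split can leave a rare class out of a validation fold entirely; stratifying avoids that whenever a class has at least as many objects as there are folds.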

We trained the LightGBM classifier on different sets of input features and selected the optimal feature set based on the average score over the 5 cross-validation folds. The hyperparameters used are listed in the paper.

Features

We calculated a variety of features from the light curves. The feature extractors used for the paper can be found in src/feature_extractors. The extracted features for the train and test sets are available for download as a Kaggle dataset: features.
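
As a rough idea of what a light-curve feature extractor does, the sketch below computes a few summary statistics of the flux in each passband for a single object. These are generic illustrative features, not the ones used in the paper (those are in src/feature_extractors); the function name, the tuple-based input format, and the feature names are assumptions for this example.

```python
import statistics
from collections import defaultdict


def extract_passband_features(observations):
    """Compute simple per-passband flux statistics for one object.

    observations -- iterable of (mjd, passband, flux) tuples
    Returns a flat dict {feature name: value}.
    """
    # Collect the flux measurements of each passband.
    flux_by_band = defaultdict(list)
    for _mjd, passband, flux in observations:
        flux_by_band[passband].append(flux)

    features = {}
    for band, fluxes in sorted(flux_by_band.items()):
        features[f"flux_mean_band{band}"] = statistics.fmean(fluxes)
        # Population standard deviation; 0.0 for a single observation.
        features[f"flux_std_band{band}"] = statistics.pstdev(fluxes)
        features[f"flux_amplitude_band{band}"] = max(fluxes) - min(fluxes)
    return features


if __name__ == "__main__":
    light_curve = [
        (59580.0, 0, 1.0),
        (59581.0, 0, 3.0),
        (59580.5, 1, 10.0),
    ]
    for name, value in extract_passband_features(light_curve).items():
        print(name, value)
```

Each object then becomes one row of a feature table, which is the kind of input a tree-based classifier such as LightGBM expects.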

How to install and run

Preparing the training data

To download the dataset from Kaggle, you need to have a Kaggle account, join the competition and accept its conditions, then get the Kaggle API token and copy it to the .kaggle directory. After that, you may run bash dataset_download.sh in the command line; this script downloads and unpacks the data.
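
Before running the download script, it can help to confirm that the API token is in place. The sketch below assumes the usual layout of the Kaggle API client, where the token is a JSON file named kaggle.json inside the .kaggle directory of your home folder and contains "username" and "key" fields. The helper name is hypothetical and is not part of this repository.

```python
import json
from pathlib import Path


def kaggle_token_problems(token_path=None):
    """Return a list of problems found with the Kaggle API token file.

    An empty list means the file exists and has the expected fields.
    The default location (~/.kaggle/kaggle.json) is an assumption
    based on the standard Kaggle API client layout.
    """
    if token_path is None:
        path = Path.home() / ".kaggle" / "kaggle.json"
    else:
        path = Path(token_path)

    if not path.is_file():
        return [f"token file not found: {path}"]

    try:
        token = json.loads(path.read_text())
    except json.JSONDecodeError:
        return [f"token file is not valid JSON: {path}"]

    if not isinstance(token, dict):
        return [f"token file does not contain a JSON object: {path}"]

    # Report every expected field that is missing or empty.
    return [
        f"missing or empty field '{field}' in {path}"
        for field in ("username", "key")
        if not token.get(field)
    ]


if __name__ == "__main__":
    problems = kaggle_token_problems()
    print("token looks fine" if not problems else "\n".join(problems))
```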

Prepare environment

  1. Install Anaconda.
  2. Use the create_env.sh bash script to set up the conda environment.

Reproducing the experiments

  1. Download the extracted features from Kaggle and place them in the input folder.
  2. Train the different LightGBM classifiers from the src/classifiers/ folder.
  3. Run prediction on the test data using the same classifiers.
