This project aims to classify the morphologies of distant galaxies using deep neural networks.
It is based on the Kaggle Galaxy Zoo Challenge.
The project assignment, as well as inspirational papers on the topic, is available in `doc/`.
To better understand the task to be learned, you can give it a go yourself! Try it here.
- (Optional) Install `poetry` if you don't have it already:

  ```shell
  make setup-poetry
  ```

- Install dependencies:

  ```shell
  poetry install
  ```
- To download the dataset, install Kaggle's API (you will need to set up your credentials first), then download the competition data:

  ```shell
  pip install --user kaggle
  kaggle competitions download -c galaxy-zoo-the-galaxy-challenge
  ```
- You're good to go!
```shell
poetry run python -m gzoo.app.make_labels <data_dir>
```

Required arguments:
- `<data_dir>`: specifies the location of the dataset directory containing the original regression labels file `training_solutions_rev1.csv`.
```shell
poetry run python -m gzoo.app.train -o config/train_classification.yaml
```

Script option:
- `-o`: specifies the `.yaml` config file to read options from. Every run config option should be listed in this file (the default file for this is `config/train_classification.yaml`), and every option in the `.yaml` file can be overloaded on the fly at the command line.

For instance, if you are fine with the values in the `.yaml` config file but just want to change the number of epochs, you can either change it in the config file or run directly:

```shell
poetry run python -m gzoo.app.train -o config/train_classification.yaml --epochs 50
```

This will use all config values from `config/train_classification.yaml` except the number of epochs, which will be set to `50`.
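The defaults-plus-overrides behavior described above can be sketched with plain `argparse`. This is an illustrative toy, not the project's actual config loader (`load_config` is a hypothetical helper): the real code presumably reads the defaults from the YAML file passed via `-o`, then lets CLI flags override them.

```python
import argparse

def load_config(defaults: dict, argv: list[str]) -> dict:
    """Merge command-line overrides onto config-file defaults (sketch)."""
    parser = argparse.ArgumentParser()
    for key, value in defaults.items():
        # Every config key becomes a CLI flag; the flag's type is inferred
        # from the default, so "--epochs 50" parses back to an int.
        parser.add_argument(f"--{key}", type=type(value), default=value)
    return vars(parser.parse_args(argv))

# Keep every default except the number of epochs:
config = load_config({"epochs": 90, "batch-size": 256, "lr": 3e-4}, ["--epochs", "50"])
# config["epochs"] is now 50; the other keys keep their defaults.
```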
Main run options:
- `--seed`: seed for initializing training (default: `None`)
- `--epochs`: total number of epochs (default: `90`)
- `--batch-size`: batch size (default: `256`)
- `--workers`: number of threads (default: `4`)
- `--model.arch`: model architecture to be used (default: `resnet18`)
- `--model.pretrained`: use a pre-trained model (default: `False`)
- `--optimizer.lr`: optimizer learning rate (default: `3.e-4` with Adam)
- `--optimizer.momentum`: optimizer momentum (default: `0.9`)
- `--optimizer.weight-decay`: optimizer weights regularization (L2) (default: `1.e-4`)
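For reference, a config file covering the options above might look like the following. This is an illustrative fragment only; the actual keys and nesting in `config/train_classification.yaml` may differ.

```yaml
# Illustrative fragment, assumed layout (dotted CLI flags as nested keys)
seed: null
epochs: 90
batch_size: 256
workers: 4
model:
  arch: resnet18
  pretrained: false
optimizer:
  lr: 3.e-4
  momentum: 0.9
  weight_decay: 1.e-4
```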
```shell
streamlit run gzoo/interface/web_app.py
```
```shell
poetry run python -m gzoo.app.predict -o config/predict.yaml
```

Config works the same as for `train.py`; the default config is at `config/predict.yaml`.
The dataset directory specified in the config must contain an `images_test_rev1` directory, which itself contains the images to predict, as well as the `all_ones_benchmark.csv` output template from the Kaggle project's data sources.
A 1-image example is provided, which you can run with:

```shell
poetry run python -m gzoo.app.predict -o config/predict.yaml --dataset example
```
Activate pre-commit hooks:

```shell
poetry run pre-commit install
```