This project aims to classify the morphologies of distant galaxies using deep neural networks.
It is based on the Kaggle Galaxy Zoo Challenge.
The project assignment, as well as inspirational papers on the topic, is available in `doc/`.
To better understand the task to be learned, you can give it a go yourself! Try it here.
- (Optional) Install `poetry` if you don't have it already:

  ```shell
  make setup-poetry
  ```

- Install dependencies:

  ```shell
  poetry install
  ```
- To download the dataset, install Kaggle's API (you will need to set up your credentials first), then download the competition data:

  ```shell
  pip install --user kaggle
  kaggle competitions download -c galaxy-zoo-the-galaxy-challenge
  ```
- You're good to go!
```shell
poetry run python -m gzoo.app.make_labels <data_dir>
```

Required arguments:
- `<data_dir>`: specifies the location of the dataset directory containing the original regression labels file `training_solutions_rev1.csv`.
```shell
poetry run python -m gzoo.app.train -o config/train_classification.yaml
```

Script option:
- `-o`: specifies the `.yaml` config file to read options from. Every run config option should be listed in this file (the default file for this is `config/train_classification.yaml`), and every option in the `.yaml` file can be overloaded on the fly at the command line.

For instance, if you are fine with the values in the `.yaml` config file but just want to change the number of epochs, you can either change it in the config file or run directly:

```shell
poetry run python -m gzoo.app.train -o config/train_classification.yaml --epochs 50
```

This will use all config values from `config/train_classification.yaml` except the number of epochs, which will be set to `50`.
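The defaults-plus-overrides behavior described above can be sketched with plain `argparse`. This is an illustrative toy, not the project's actual config loader (`load_config` is a hypothetical helper): the real code presumably reads the defaults from the YAML file passed via `-o`, then lets CLI flags override them.

```python
import argparse

def load_config(defaults: dict, argv: list[str]) -> dict:
    """Merge command-line overrides onto config-file defaults (sketch)."""
    parser = argparse.ArgumentParser()
    for key, value in defaults.items():
        # Every config key becomes a CLI flag; the flag's type is inferred
        # from the default, so "--epochs 50" parses back to an int.
        parser.add_argument(f"--{key}", type=type(value), default=value)
    return vars(parser.parse_args(argv))

# Keep every default except the number of epochs:
config = load_config({"epochs": 90, "batch-size": 256, "lr": 3e-4}, ["--epochs", "50"])
# config["epochs"] is now 50; the other keys keep their defaults.
```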
Main run options:
- `--seed`: seed for initializing training (default: `None`)
- `--epochs`: total number of epochs (default: `90`)
- `--batch-size`: batch size (default: `256`)
- `--workers`: number of threads (default: `4`)
- `--model.arch`: model architecture to be used (default: `resnet18`)
- `--model.pretrained`: use a pre-trained model (default: `False`)
- `--optimizer.lr`: optimizer learning rate (default: `3.e-4` with Adam)
- `--optimizer.momentum`: optimizer momentum (default: `0.9`)
- `--optimizer.weight-decay`: optimizer weights regularization (L2) (default: `1.e-4`)
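For reference, a config file covering the options above might look like the following. This is an illustrative fragment only; the actual keys and nesting in `config/train_classification.yaml` may differ.

```yaml
# Illustrative fragment, assumed layout (dotted CLI flags as nested keys)
seed: null
epochs: 90
batch_size: 256
workers: 4
model:
  arch: resnet18
  pretrained: false
optimizer:
  lr: 3.e-4
  momentum: 0.9
  weight_decay: 1.e-4
```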
```shell
streamlit run gzoo/interface/web_app.py
```
```shell
poetry run python -m gzoo.app.predict -o config/predict.yaml
```

Config works the same as for `train.py`; the default config is at `config/predict.yaml`.
The dataset directory specified in the config must contain an `images_test_rev1` directory, which itself contains the images to predict, as well as the `all_ones_benchmark.csv` output template from the Kaggle project's data sources.
A 1-image example is provided, which you can run with:

```shell
poetry run python -m gzoo.app.predict -o config/predict.yaml --dataset example
```
Activate pre-commit hooks:

```shell
poetry run pre-commit install
```