
AutoDLComp19

AutoDL Competition 2019

Installation

Activate a conda Python 3.5 environment, then run:

bash install/requirements_gcc.sh
pip install -r install/requirements.txt
bash install/requirements_torch_cuda100.sh
bash install/install_winner_cv.sh
bash install/install_winner_speech.sh
bash install/install_just.sh        # Optional command runner
bash install/install_precommit.sh   # Developer dependency

Usage

Running locally

To run the competition evaluation locally, run:

python -m src.competition.run_local_test \
    --dataset_dir DATASET_DIR \
    --code_dir src \
    --model_config_name CONFIG.yaml \
    --experiment_group EXPERIMENT_GROUP \
    --experiment_name EXPERIMENT_NAME \
    --time_budget 1200

CONFIG.yaml corresponds to one of the general configs in src/configs/. If this argument is omitted, src/configs/default.yaml is used.

You can use --time_budget_approx <LOWER_TIME> and --time_budget <ACTUAL_TIME> to simulate cutting a run with budget <ACTUAL_TIME> after <LOWER_TIME> seconds.

If you want to overwrite the output dir (for example, for repeated local testing), supply the --overwrite flag.

Do not run pre-commit hooks

To commit without running pre-commit hooks, use git commit --no-verify -m "<COMMIT MESSAGE>".

Generate a performance matrix

0. Create available_datasets.py dynamically (to be implemented)

1. Create the arguments for the HPO and run them on META as follows:

python submission/create_hpo_args.py --command_file_name ARGS_FILE_NAME #--> args file outputted
sbatch submission/meta_kakaobrain_optimized_per_dataset.sh # --> set ARGS_FILE parameter to newly created ARGS_FILE_NAME, set experiment_group to EXPERIMENT_DIR, set budgets

2. Generate incumbent configs

mkdir src/configs/INC_OUTPUT_DIR
python src/hpo/incumbents_to_config.py --output_dir src/configs/INC_OUTPUT_DIR --experiment_group_dir EXPERIMENT_DIR # --> .yaml configs outputted to EXPERIMENT_DIR

3. Evaluate configurations

python submission/create_datasets_x_configs_args.py --configs_path src/configs/INC_OUTPUT_DIR --command_file_name EVAL_ARGS_FILE_NAME # --> EVAL_ARGS_FILE_NAME stored in submission/
sbatch submission/meta_kakaobrain_datasets_x_configs.sh # --> set ARGS_FILE to EVAL_ARGS_FILE_NAME, set --experiment_group to EVALUATION_DIR_PATH, evaluations stored in EVALUATION_DIR_PATH

4. Once the evaluation directory has been generated, generate the pandas DataFrames and CSV files with the following command:

python src/hpo/performance_matrix_from_evaluation.py --experiment_group_dir EVALUATION_DIR_PATH
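The resulting performance matrix is essentially a datasets-x-configs table of scores. As an illustrative sketch (the column names `dataset`, `config`, and `score` are assumptions here, not the script's actual schema), such a matrix can be built from evaluation rows with a pandas pivot:

```python
import pandas as pd

# Hypothetical evaluation results; the real rows come from EVALUATION_DIR_PATH.
rows = [
    {"dataset": "cifar10", "config": "inc_a", "score": 0.71},
    {"dataset": "cifar10", "config": "inc_b", "score": 0.64},
    {"dataset": "emnist", "config": "inc_a", "score": 0.58},
    {"dataset": "emnist", "config": "inc_b", "score": 0.66},
]

df = pd.DataFrame(rows)
# One row per config, one column per dataset.
matrix = df.pivot(index="config", columns="dataset", values="score")
print(matrix)
```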

Pre-Computing meta features

To pre-compute meta features, run:

python -m src.meta_features.precompute_meta_features --dataset_path DATASET_PATH

The default output_path is src/meta_features/meta_features.yaml.
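The exact features the script extracts are defined in src/meta_features/; as a rough sketch (the feature names below are illustrative, not the script's actual output), dataset-level meta features are simple statistics such as:

```python
# Illustrative only: a toy labelled dataset as (sample, label) pairs.
toy_dataset = [([0.1, 0.4], 0), ([0.3, 0.2], 1), ([0.9, 0.5], 1), ([0.7, 0.8], 0)]

def compute_meta_features(dataset):
    """Compute a few simple dataset-level statistics."""
    labels = [label for _, label in dataset]
    classes = set(labels)
    return {
        "num_samples": len(dataset),
        "num_features": len(dataset[0][0]),
        "num_classes": len(classes),
        # Fraction of samples in the rarest class (1/num_classes == balanced).
        "min_class_fraction": min(labels.count(c) for c in classes) / len(labels),
    }

meta_features = compute_meta_features(toy_dataset)
```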

Making a submission

To create a submission .zip for the codalab platform, run:

python submission/codalab.py

This uses the settings in src/configs/default.yaml.

Project Structure

├── experiments/                           <<  Files generated during runtime
│
├── install/                               <<  Requirements and scripts for installation
│
├── src/                                   <<  Source code
│   └── winner_<TRACK>/                    <<  Winner code for <TRACK>
│   └── competition/                       <<  Competition source code
│       └── run_local_test.py              <<  Execute competition evaluation locally
│
├── submission/                            <<  Submission utilities
│    └── competition.py                    <<  Create codalab submission
│
└── justfile                               <<  Command runner file akin to Makefile

License

Apache 2.0

Contributors

arberzela, ashraaghav, dastoll, dependabot[bot], ferreirafabio, juliensiems, kakkaranupam, mlindauer, mzolfaghari, nierth

Issues

Train Shake-Shake

Train shake-shake models on PyTorch 1.0.1, save them and add the scripts for loading the checkpoints to the codebase.

Online-Optimization

Research and review more sophisticated online-optimization strategies (weight regularization, optimizers, lr-schedule, ...)

Create score percentage to minutes converter

Make an interactive plot to quickly look up the minutes or score percentage given score percentage or minutes plus an offset.
The inputs should be:

  • input type - percentage/minutes
  • t0 - the seconds at the beginning that are not counted
  • tmax - total amount of time available
  • t1 - lower bound of the region to convert
  • t2 - upper bound of the region to convert

This should return the time/percentage.

  • Find a way to include this as interactive graph into the Readme.md
  • Implement it
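A non-interactive sketch of the conversion itself, assuming an AutoDL-style logarithmic time normalization t̃(t) = log(1 + t/t0) / log(1 + tmax/t0) (this formula is an assumption about what "score percentage" means here, not taken from the issue):

```python
import math

def time_to_percentage(t, t0, tmax):
    # Normalized (log-scaled) position of wall-clock time t within [0, tmax].
    return math.log(1 + t / t0) / math.log(1 + tmax / t0)

def percentage_to_time(p, t0, tmax):
    # Inverse of time_to_percentage.
    return t0 * ((1 + tmax / t0) ** p - 1)

# Convert a region [t1, t2] (seconds) to percentages; t0 and tmax are examples.
t0, tmax = 60, 1200
region = [time_to_percentage(t, t0, tmax) for t in (300, 900)]
```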

Integrate mixed precision training

Add the option to use mixed precision training to the pipeline.

  • Add the nvidia apex lib to the environment.yaml for automatic install
  • Add the necessary files to the config.hjson the library needs to function
  • Add an option to the config.hjson to use/not use mixed precision training
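A sketch of what the config.hjson additions might look like (the key names are hypothetical; the actual keys depend on how the apex integration is wired up):

```hjson
{
  // Hypothetical keys: toggle and opt level for the nvidia apex amp library.
  use_mixed_precision: true
  amp_opt_level: "O1"  // O0 = full fp32, O1 = conservative mixed precision
}
```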

Optimize lr, weight decay and mixup interpolation factor of WideResNet with BOHB

As discussed yesterday, it would be great if we had a zoo/portfolio of pretrained models with different hyperparameters. Considering the time left, we need something that trains relatively quickly. You can use WideResNet since it is a shallow model and trains in less than 3h, achieving a test error of around 3% on CIFAR-10. Another reason I chose this model is that it is trained with a step learning rate schedule. Of course, cosine annealing might be a better choice for training a network, but if we want to fine-tune from these pretrained models and keep a decent accuracy at the very beginning, we need to keep in mind that increasing the learning rate after it has decayed moves you away from that good local minimum and the accuracy drops. On the other hand, if the pretrained models had a learning rate different from 0 at the end of their training, we can directly use that value when starting to fine-tune. Let's keep it like this for now; later we can experiment with cosine annealing with lr_min != 0 at the end of training.

At this point I would suggest running BOHB with min_budget=max_budget=200 epochs and 100 iterations. Optimize lr, weight decay and the mixup interpolation factor (I will send you the code, so you can find them in the arguments; then you just integrate this search space with the BOHB scripts you have already been running for DARTS). The ranges can be [0.01, 1] (log scale), [1e-5, 1e-3] (log scale) and [0, 1], respectively. Please make sure to save the model parameters and optimizer state after training has finished for each sampled configuration. In the end we will have 100 pretrained models on CIFAR-10 that we can use later.
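The proposed search space can be sketched in plain Python as follows (this stands in for the actual BOHB/ConfigSpace setup; the parameter names are illustrative):

```python
import math
import random

def sample_log_uniform(low, high):
    """Sample uniformly in log space between low and high."""
    return math.exp(random.uniform(math.log(low), math.log(high)))

def sample_configuration():
    # Ranges from the issue: lr and weight decay on a log scale, mixup uniform.
    return {
        "lr": sample_log_uniform(0.01, 1.0),
        "weight_decay": sample_log_uniform(1e-5, 1e-3),
        "mixup_alpha": random.uniform(0.0, 1.0),
    }

# One configuration per BOHB iteration; 100 iterations were proposed above.
configs = [sample_configuration() for _ in range(100)]
```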

Implement random initialization

Implement random initialization (and not loading from pretrained). We need this to determine online at which point we want to evaluate our model.

Improve DevOps

Since black had issues, include a different autoformatter and also include isort again.

Unbalanced data

Review and research how we should handle unbalanced data.
