syne-tune's Issues

Warning: Python 3.6

Hi, I receive this warning using the docker image:

PythonDeprecationWarning: Boto3 will no longer support Python 3.6 starting May 30, 2022. To continue receiving service updates, bug fixes, and security updates please upgrade to Python 3.7 or later. More information can be found here:

Since this is only a couple of weeks away, it might be a good idea to update the Dockerfile to Python 3.7 now.

INFO:root:Detected 2 GPUs on an EC2 m5d.12xlarge that has no GPU?


I'm running Syne Tune on the conda_python3 Jupyter kernel of a SageMaker-managed EC2 instance (ml.m5d.12xlarge notebook instance), that has no GPUs.
However, in the Syne Tune logs I see:

INFO:root:Detected 2 GPUs

and then, a few lines below:

DEBUG:root:Free GPUs: {0, 1}
DEBUG:root:Assigned GPU 0 to trial_id 0

But an m5d.12xlarge is not expected to have GPUs, right?
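For comparison, one defensive way to count GPUs is to ask nvidia-smi for its device list and to treat a missing or failing nvidia-smi as zero GPUs. This is a minimal sketch under that assumption, with hypothetical function names. It is not necessarily how Syne Tune detects GPUs.

```python
import shutil
import subprocess


def parse_nvidia_smi_list(output: str) -> int:
    """Count GPUs in the output of `nvidia-smi -L` (one "GPU <n>: ..." line per device)."""
    return sum(1 for line in output.splitlines() if line.strip().startswith("GPU "))


def count_gpus() -> int:
    """Return the number of GPUs visible via nvidia-smi, or 0 if it is absent or fails."""
    if shutil.which("nvidia-smi") is None:
        # No NVIDIA tooling on this host, e.g. a CPU-only m5d instance.
        return 0
    try:
        result = subprocess.run(
            ["nvidia-smi", "-L"], capture_output=True, text=True, timeout=10, check=True
        )
    except (subprocess.SubprocessError, OSError):
        # Non-zero exit code, timeout or launch failure: assume no usable GPU.
        return 0
    return parse_nvidia_smi_list(result.stdout)


print(count_gpus())
```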

Experiment Results Contain Random Rows

In my experiment, the results data frame contains multiple rows with trial id 1 whose content is identical to the next row; the only difference is the config. This causes problems because the best config is sometimes reported as trial id 1, showing a config that did not achieve the best performance.

See this example: the true performance of trial id 1 is 81% (row 4), but trial id 1 also shows up in row 10 with the highest accuracy.
I've added a simple example to reproduce this behavior.


from pathlib import Path

from sagemaker.pytorch import PyTorch

from syne_tune.backend import SageMakerBackend
from sagemaker import get_execution_role
from syne_tune.optimizer.baselines import RandomSearch
from syne_tune import Tuner
from syne_tune.config_space import randint
from syne_tune import StoppingCriterion
from syne_tune.optimizer.schedulers.fifo import FIFOScheduler

entry_point = Path('examples') / "training_scripts" / "height_example" / ""
assert entry_point.is_file(), 'File unknown'
mode = "min"
metric = "mean_loss"
instance_type = 'ml.c5.4xlarge'
instance_count = 1
instance_max_time = 999
n_workers = 20

config_space = {
    "steps": 1,
    "width": randint(0, 20),
    "height": randint(-100, 100),
}

backend = SageMakerBackend(

# Random search without stopping
scheduler = FIFOScheduler(

tuner = Tuner(
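To spot this symptom in a results table, one can check whether any trial id appears with more than one config. A minimal sketch, assuming the results are available as a list of row dicts (for a pandas DataFrame, `df.to_dict("records")`) and that hyperparameter columns carry a `config_` prefix; the prefix and the helper name are assumptions, not confirmed Syne Tune behavior.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def inconsistent_trials(rows: Iterable[dict], config_prefix: str = "config_") -> Set[int]:
    """Return trial ids whose rows disagree on at least one config column."""
    configs: Dict[int, Set[tuple]] = defaultdict(set)
    for row in rows:
        # Keep only the hyperparameter columns, sorted by name for a stable key.
        config = tuple(sorted((k, v) for k, v in row.items() if k.startswith(config_prefix)))
        configs[row["trial_id"]].add(config)
    return {trial_id for trial_id, seen in configs.items() if len(seen) > 1}
```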

[BUG] LocalBackend: Evaluation Failed!

Hi, I am using LocalBackend to train a couple of Hugging Face models on a sample dataset (still WIP).

However, I ran into the following errors:

INFO:syne_tune.optimizer.schedulers.hyperband:trial_id 1 starts (first milestone = 1)
INFO:root:running subprocess with command: /opt/conda/bin/python --model_type google/electra-base-discriminator --learning_rate 8.018154654725304e-05 --weight_decay 1.3591419560772573e-07 --dataset_path /DATA/jin/  --CUDA_VISIBLE_DEVICES 2 --train_batch_size 8 --valid_batch_size 8 --epochs 1 --output_dir output/ --eval_steps 100 --st_checkpoint_dir /root/syne-tune/test-hugging/1/checkpoints
INFO:syne_tune.tuner:(trial 1) - scheduled config {'model_type': 'google/electra-base-discriminator', 'learning_rate': 8.018154654725304e-05, 'weight_decay': 1.3591419560772573e-07, 'dataset_path': '/DATA/jin/, 'CUDA_VISIBLE_DEVICES': '2', 'train_batch_size': 8, 'valid_batch_size': 8, 'epochs': 1, 'output_dir': 'output/', 'eval_steps': 100}
INFO:syne_tune.tuner:Trial trial_id 1 was stopped independently of the scheduler.
INFO:syne_tune.optimizer.schedulers.fifo:trial_id 1: Evaluation failed!

Some of the debugging methods I have tried:

  1. Setting debug_mode=True in the Tuner did not surface the bug.
  2. I can run the exact subprocess command manually without running into any issue.

Any advice will be appreciated. Thank you!

Numeric and Log-Scale Choice

There is no equivalent of choice for numeric values. For example, in the FCNet blackbox the learning rate is defined as 'hp_init_lr': choice([0.0005, 0.001, 0.005, 0.01, 0.05, 0.1]). This prevents model-based approaches from encoding this hyperparameter correctly. It would be great to mark such hyperparameters as numeric and also indicate whether a log transform is needed.
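A minimal sketch of what a numeric choice could look like: the values are kept sorted and encoded into [0, 1], in log space when requested, so a model-based searcher sees their order and distances instead of unrelated categories. `NumericChoice` is hypothetical and not part of Syne Tune.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class NumericChoice:
    """Hypothetical finite set of numeric values, optionally on a log scale."""
    values: Sequence[float]
    log_scale: bool = False

    def __post_init__(self):
        self.values = sorted(self.values)
        if self.log_scale and self.values[0] <= 0:
            raise ValueError("log scale requires strictly positive values")

    def _transform(self, x: float) -> float:
        return math.log(x) if self.log_scale else x

    def encode(self, x: float) -> float:
        """Map a value to [0, 1], preserving order (and log distances if log_scale)."""
        lo, hi = self._transform(self.values[0]), self._transform(self.values[-1])
        if hi == lo:
            return 0.0
        return (self._transform(x) - lo) / (hi - lo)

    def decode(self, u: float) -> float:
        """Map a point in [0, 1] back to the nearest allowed value."""
        return min(self.values, key=lambda v: abs(self.encode(v) - u))


lr = NumericChoice([0.0005, 0.001, 0.005, 0.01, 0.05, 0.1], log_scale=True)
print([round(lr.encode(v), 3) for v in lr.values])
```

With `log_scale=True`, 0.001 and 0.01 end up the same distance apart as 0.01 and 0.1, which is usually what a learning-rate grid intends.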

Simulator results are ignored

When running (main branch) e.g.

python benchmarking/nursery/benchmark_automl/ --num_seeds 1 --method ASHA --benchmark fcnet-protein

I get the following warnings. Is this expected? Does anyone know what's going on?

WARNING:syne_tune.backend.simulator_backend.simulator_backend:The following trials reported results, but are not covered by trial_ids. These results will be ignored:
  trial_id 38: status = Stopped, num_results = 1
WARNING:syne_tune.backend.simulator_backend.simulator_backend:The following trials reported results, but are not covered by trial_ids. These results will be ignored:
  trial_id 44: status = Stopped, num_results = 1
WARNING:syne_tune.backend.simulator_backend.simulator_backend:The following trials reported results, but are not covered by trial_ids. These results will be ignored:
  trial_id 49: status = Stopped, num_results = 1
WARNING:syne_tune.backend.simulator_backend.simulator_backend:The following trials reported results, but are not covered by trial_ids. These results will be ignored:
  trial_id 77: status = Stopped, num_results = 1
WARNING:syne_tune.backend.simulator_backend.simulator_backend:The following trials reported results, but are not covered by trial_ids. These results will be ignored:
  trial_id 86: status = Stopped, num_results = 1
WARNING:syne_tune.backend.simulator_backend.simulator_backend:The following trials reported results, but are not covered by trial_ids. These results will be ignored:
  trial_id 113: status = Stopped, num_results = 1
WARNING:syne_tune.backend.simulator_backend.simulator_backend:The following trials reported results, but are not covered by trial_ids. These results will be ignored:
  trial_id 121: status = Stopped, num_results = 1
WARNING:syne_tune.backend.simulator_backend.simulator_backend:The following trials reported results, but are not covered by trial_ids. These results will be ignored:
  trial_id 142: status = Stopped, num_results = 1
WARNING:syne_tune.backend.simulator_backend.simulator_backend:The following trials reported results, but are not covered by trial_ids. These results will be ignored:
  trial_id 169: status = Stopped, num_results = 1
WARNING:syne_tune.backend.simulator_backend.simulator_backend:The following trials reported results, but are not covered by trial_ids. These results will be ignored:
  trial_id 186: status = Stopped, num_results = 1
WARNING:syne_tune.backend.simulator_backend.simulator_backend:The following trials reported results, but are not covered by trial_ids. These results will be ignored:
  trial_id 188: status = Stopped, num_results = 1
WARNING:syne_tune.backend.simulator_backend.simulator_backend:The following trials reported results, but are not covered by trial_ids. These results will be ignored:
  trial_id 229: status = Stopped, num_results = 1
WARNING:syne_tune.backend.simulator_backend.simulator_backend:The following trials reported results, but are not covered by trial_ids. These results will be ignored:
  trial_id 247: status = Stopped, num_results = 1
WARNING:syne_tune.backend.simulator_backend.simulator_backend:The following trials reported results, but are not covered by trial_ids. These results will be ignored:
  trial_id 252: status = Stopped, num_results = 1
WARNING:syne_tune.backend.simulator_backend.simulator_backend:The following trials reported results, but are not covered by trial_ids. These results will be ignored:
  trial_id 260: status = Stopped, num_results = 1
WARNING:syne_tune.backend.simulator_backend.simulator_backend:The following trials reported results, but are not covered by trial_ids. These results will be ignored:
  trial_id 255: status = Stopped, num_results = 1
WARNING:syne_tune.backend.simulator_backend.simulator_backend:The following trials reported results, but are not covered by trial_ids. These results will be ignored:
  trial_id 264: status = Stopped, num_results = 1
WARNING:syne_tune.backend.simulator_backend.simulator_backend:The following trials reported results, but are not covered by trial_ids. These results will be ignored:
  trial_id 297: status = Stopped, num_results = 1
WARNING:syne_tune.backend.simulator_backend.simulator_backend:The following trials reported results, but are not covered by trial_ids. These results will be ignored:
  trial_id 309: status = Stopped, num_results = 1
WARNING:syne_tune.backend.simulator_backend.simulator_backend:The following trials reported results, but are not covered by trial_ids. These results will be ignored:
  trial_id 314: status = Stopped, num_results = 1

Duplicate SM training job names with SagemakerBackend

Running python docs/tutorials/basics/scripts/ produces SM training jobs named None-0, None-1, etc., which do not depend on tuner_name.
Rerunning the example leads to duplicate SM training job names and hence failure of the script.
This is because tuner_name inside the SagemakerBackend object is only ever set to None in the constructor.

Cross-ref: #112 and points by mseeger in #113.
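One naming scheme that would avoid the collision is to include the tuner name (with a fallback when it is unset) plus a timestamp. A minimal sketch: `training_job_name` and the "syne-tune" fallback are hypothetical, not the SagemakerBackend's actual code. The 63-character limit and the allowed characters (alphanumerics and hyphens, starting and ending with an alphanumeric) follow SageMaker's constraints on training job names.

```python
import re
from datetime import datetime, timezone
from typing import Optional

MAX_JOB_NAME_LENGTH = 63  # SageMaker training job names are limited to 63 characters


def training_job_name(tuner_name: Optional[str], trial_id: int,
                      now: Optional[datetime] = None) -> str:
    """Build a job name that includes the tuner name, so reruns do not collide."""
    now = now or datetime.now(timezone.utc)
    # Replace disallowed characters with hyphens; fall back if nothing usable remains.
    base = re.sub(r"[^a-zA-Z0-9-]", "-", tuner_name or "").strip("-") or "syne-tune"
    suffix = f"-{now.strftime('%Y%m%d-%H%M%S')}-{trial_id}"
    # Truncate the base, never the suffix, so trial ids stay distinct.
    return base[: MAX_JOB_NAME_LENGTH - len(suffix)] + suffix
```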

ImportError for BotorchSearcher

Test (3.8) fails with:

____________ ERROR collecting tst/schedulers/ ____________
ImportError while importing test module '/home/runner/work/syne-tune/syne-tune/tst/schedulers/'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/importlib/ in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tst/schedulers/ in <module>
    from syne_tune.optimizer.schedulers.botorch.botorch_searcher import BotorchSearcher
syne_tune/optimizer/schedulers/botorch/ in <module>
    from botorch.models import SingleTaskGP
/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/site-packages/botorch/ in <module>
    from botorch import (
/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/site-packages/botorch/acquisition/ in <module>
    from botorch.acquisition.acquisition import (
/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/site-packages/botorch/acquisition/ in <module>
    from botorch.models.model import Model
/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/site-packages/botorch/models/ in <module>
    from botorch.models.approximate_gp import (
/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/site-packages/botorch/models/ in <module>
    from botorch.models.gpytorch import GPyTorchModel
/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/site-packages/botorch/models/ in <module>
    from botorch.acquisition.objective import PosteriorTransform
/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/site-packages/botorch/acquisition/ in <module>
    from botorch.models.model import Model
/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/site-packages/botorch/models/ in <module>
    from botorch.models.utils.assorted import fantasize as fantasize_flag
/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/site-packages/botorch/models/utils/ in <module>
    from botorch.models.utils.assorted import (
/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/site-packages/botorch/models/utils/ in <module>
    from gpytorch.utils.broadcasting import _mul_broadcast_shape
E   ImportError: cannot import name '_mul_broadcast_shape' from 'gpytorch.utils.broadcasting' (/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/site-packages/gpytorch/utils/

Container build fails

Hi, when running the container build script, it fails at the following:

Step 12/12 : RUN python -m pip install --no-cache-dir --upgrade -r /tmp/packages/requirements.txt
 ---> Running in 67f84184ab3e
ERROR: Extras after version '>=1.3ray[tune]'.
The command '/bin/sh -c python -m pip install --no-cache-dir --upgrade -r /tmp/packages/requirements.txt' returned a non-zero code: 1

Add plateau stopper

Add a stopping criterion that stops the HPO process if it hasn’t improved for N consecutive steps.
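A stand-alone sketch of the plateau logic, assuming that a "step" means one reported metric value and that an improvement must beat the best value so far by at least `min_delta`. `PlateauStopper` is hypothetical; how it would plug into Syne Tune's stopping criterion interface is not shown and would need checking.

```python
from typing import Optional


class PlateauStopper:
    """Hypothetical criterion: stop once the best metric value has not
    improved for `patience` consecutive results."""

    def __init__(self, patience: int, mode: str = "min", min_delta: float = 0.0):
        assert mode in ("min", "max")
        self.patience = patience
        self.mode = mode
        self.min_delta = min_delta
        self.best: Optional[float] = None
        self.steps_without_improvement = 0

    def _improves(self, value: float) -> bool:
        if self.best is None:
            return True
        if self.mode == "min":
            return value < self.best - self.min_delta
        return value > self.best + self.min_delta

    def update(self, value: float) -> bool:
        """Record a new metric value; return True if tuning should stop."""
        if self._improves(value):
            self.best = value
            self.steps_without_improvement = 0
        else:
            self.steps_without_improvement += 1
        return self.steps_without_improvement >= self.patience
```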

Gracefully deal with SageMaker Failures

A SageMaker training job failed for some random reason, which seems to break the tuner:

  File "/opt/conda/lib/python3.8/site-packages/syne_tune/", line 152, in run
    new_done_trial_statuses, new_results = self._process_new_results(
  File "/opt/conda/lib/python3.8/site-packages/syne_tune/", line 282, in _process_new_results
    done_trials_statuses = self._update_running_trials(trial_status_dict, new_results, callbacks=self.callbacks)
  File "/opt/conda/lib/python3.8/site-packages/syne_tune/", line 437, in _update_running_trials
    assert trial_id in self.last_seen_result_per_trial, \
AssertionError: trial 35 completed and no metrics got observed

It would be great to retry failed jobs, or at least ignore them and continue.
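Instead of an assertion, the tuner could decide per trial what to do when a job completes without any observed metrics. A minimal sketch of such a policy: resubmit up to `max_retries` times, then ignore the trial and keep tuning. `RetryPolicy` and its return values are hypothetical, not Syne Tune's API.

```python
import logging
from typing import Dict


class RetryPolicy:
    """Hypothetical policy for trials that complete without reporting metrics."""

    def __init__(self, max_retries: int = 2):
        self.max_retries = max_retries
        self.attempts: Dict[int, int] = {}

    def handle_completed(self, trial_id: int, has_metrics: bool) -> str:
        """Return "ok", "retry" or "ignore" for a trial that just completed."""
        if has_metrics:
            return "ok"
        attempts = self.attempts.get(trial_id, 0)
        if attempts < self.max_retries:
            self.attempts[trial_id] = attempts + 1
            logging.warning("trial %d completed without metrics, retrying (%d/%d)",
                            trial_id, attempts + 1, self.max_retries)
            return "retry"
        # Give up on this trial but let the tuning loop continue.
        logging.warning("trial %d completed without metrics, ignoring it", trial_id)
        return "ignore"
```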

Is there a tuner.best_config() API?

After an execution, I'd like to be able to programmatically get the best config, either from the tuner or from its data folder, e.g.:



tuning_experiment = load_experiment("experiment-xxxxxxxx")

Is there an API for this?
If not, I suggest adding it to the roadmap.
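Until such an API exists, the best config can be computed from the results table. A minimal sketch, assuming the results are available as row dicts (for a pandas DataFrame, `df.to_dict("records")`) and that hyperparameter columns carry a `config_` prefix; both are assumptions to check against your Syne Tune version, and `best_config` here is a hypothetical helper.

```python
from typing import Any, Dict, Iterable


def best_config(rows: Iterable[Dict[str, Any]], metric: str, mode: str = "min",
                config_prefix: str = "config_") -> Dict[str, Any]:
    """Return the hyperparameters of the row with the best metric value."""
    rows = [row for row in rows if row.get(metric) is not None]
    if not rows:
        raise ValueError(f"no rows report metric {metric!r}")
    pick = min if mode == "min" else max
    best_row = pick(rows, key=lambda row: row[metric])
    # Strip the column prefix so the result can be passed back as a config.
    return {k[len(config_prefix):]: v for k, v in best_row.items()
            if k.startswith(config_prefix)}
```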

How to set and get experiment name?


I saw in the blog post that one can load an experiment by name to access its metrics:

from syne_tune.experiments import load_experiment
tuning_experiment = load_experiment("train-cifar100-2021-11-05-15-22-27-531")

How do we find out, and set, the name of an experiment?
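The name in the blog example looks like a prefix followed by a timestamp with millisecond precision. Inferring that from one example is an assumption, and which argument (if any) sets the prefix in Syne Tune is not stated here. Under that assumption, a sketch for building and splitting names of that shape, e.g. to find the run you launched at a given time; both helpers are hypothetical.

```python
import re
from datetime import datetime
from typing import Optional, Tuple

# Assumed shape: "<prefix>-YYYY-MM-DD-HH-MM-SS-mmm"
NAME_PATTERN = re.compile(r"^(?P<prefix>.+)-(?P<stamp>\d{4}(?:-\d{2}){5}-\d{3})$")


def make_experiment_name(prefix: str, now: Optional[datetime] = None) -> str:
    now = now or datetime.now()
    # %f gives microseconds; dropping the last three digits keeps milliseconds.
    return f"{prefix}-{now.strftime('%Y-%m-%d-%H-%M-%S-%f')[:-3]}"


def split_experiment_name(name: str) -> Tuple[str, str]:
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unexpected experiment name: {name}")
    return match.group("prefix"), match.group("stamp")
```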

sp.(log)finrange throws an error when sample(size=1)

Caused by self._uniform_int.sample(spec, size=1, random_state) returning an int rather than an iterable.
This seems to be caused by this piece of code

def _sanitize_sample_result(items, domain: Domain):
    if len(items) > 1:
        return [domain.cast(x) for x in items]
    return domain.cast(items[0])

import syne_tune.search_space as sp
fr = sp.finrange(1, 2, 2)
> Out[4]: [1.0, 2.0]

> Traceback (most recent call last):
>   ...
>   File "/Users/awgol/code/syne-tune/syne_tune/", line 592, in sample
>     for x in self._uniform_int.sample(spec, size, random_state)]
> TypeError: 'int' object is not iterable
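The sanitizer collapses a one-element result to a scalar, so a caller that asked for size=1 and then iterates receives an int. A minimal sketch of the contract that avoids this: return a scalar only when no size is given, and a list otherwise, even for size=1. The functions are hypothetical stand-ins, not Syne Tune's internals.

```python
import random
from typing import List, Optional, Union


def finrange_values(lower: float, upper: float, num_values: int) -> List[float]:
    """The num_values equally spaced values from lower to upper (inclusive)."""
    if num_values == 1:
        return [float(lower)]
    step = (upper - lower) / (num_values - 1)
    return [lower + i * step for i in range(num_values)]


def sample_finrange(lower: float, upper: float, num_values: int,
                    size: Optional[int] = None,
                    rng: Optional[random.Random] = None) -> Union[float, List[float]]:
    """Scalar if size is None, otherwise always a list (also when size == 1)."""
    rng = rng or random.Random()
    values = finrange_values(lower, upper, num_values)
    n = 1 if size is None else size
    samples = [values[rng.randrange(len(values))] for _ in range(n)]
    return samples[0] if size is None else samples
```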

ExperimentResult plot warning

Hi, with recent versions of matplotlib, ExperimentResult.plot() gives me the warning:

WARNING:matplotlib.legend:No artists with labels found to put in legend. Note that artists whose label start with an underscore are ignored when legend() is called with no argument.

Config_space for full pipeline optimization

Hi, would it be possible to define a conditional config_space, so that the search can build a multi-step pipeline?
Something that prevents invalid pipelines from being activated by the algorithm + hyperparameter config variables.
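What is being asked for is conditional sampling: pick the algorithm first, then sample only the hyperparameters that belong to it. A plain-Python sketch of that idea with a hypothetical two-algorithm space; it does not use Syne Tune's config_space, and whether Syne Tune supports this natively is exactly the open question.

```python
import random
from typing import Any, Callable, Dict

# Hypothetical conditional space: each algorithm owns its hyperparameters.
PIPELINE_SPACE: Dict[str, Dict[str, Callable[[random.Random], Any]]] = {
    "svm": {
        "C": lambda r: 10 ** r.uniform(-3, 3),
        "kernel": lambda r: r.choice(["rbf", "linear"]),
    },
    "random_forest": {
        "n_estimators": lambda r: r.randint(10, 500),
        "max_depth": lambda r: r.randint(2, 20),
    },
}


def sample_pipeline(rng: random.Random) -> Dict[str, Any]:
    """Pick an algorithm, then sample only its hyperparameters, so invalid
    algorithm/hyperparameter combinations are never generated."""
    algorithm = rng.choice(sorted(PIPELINE_SPACE))
    config: Dict[str, Any] = {"algorithm": algorithm}
    for name, sampler in PIPELINE_SPACE[algorithm].items():
        config[f"{algorithm}:{name}"] = sampler(rng)
    return config


print(sample_pipeline(random.Random(0)))
```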

SageMaker ResourceLimitExceeded

Hi, I have a limit of 8 ml.g5.12xlarge instances, and although I set Tuner.n_workers = 5, I still got a ResourceLimitExceeded error. Is there a way to make sure that jobs are fully stopped when using SageMakerBackend before launching new ones?

Also, when using RemoteLauncher, in situations where the management instance does error out (for example due to ResourceLimitExceeded), is there a way to make sure the management instance sends a stop signal to all tuning jobs before exiting? Maybe something like:

    try:
        # manage tuning jobs
        ...
    except Exception:
        # raise error
        raise
    finally:
        # stop any trials still running
        ...
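The pattern described is a try/finally around the tuning loop: the finally clause runs whether the loop finishes or raises, and the original exception still propagates. A runnable sketch with a hypothetical FakeBackend standing in for the real SageMaker backend; the actual stop call in Syne Tune may differ.

```python
import logging
from typing import Optional, Set


class FakeBackend:
    """Hypothetical stand-in for a trial backend, for illustration only."""

    def __init__(self):
        self.running: Set[int] = set()

    def start(self, trial_id: int):
        self.running.add(trial_id)

    def stop_all(self):
        for trial_id in sorted(self.running):
            logging.info("stopping trial %d", trial_id)
        self.running.clear()


def run_tuning(backend: FakeBackend, fail_after: Optional[int] = None):
    try:
        # Manage tuning jobs.
        for trial_id in range(10):
            if trial_id == fail_after:
                raise RuntimeError("ResourceLimitExceeded")  # simulated failure
            backend.start(trial_id)
    finally:
        # Runs on success and on error, so no trial is left running.
        backend.stop_all()


backend = FakeBackend()
try:
    run_tuning(backend, fail_after=3)
except RuntimeError as err:
    print("tuning failed:", err)
print("still running:", backend.running)  # set(): cleanup ran despite the error
```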

How to set a custom tuner_path?

I'm launching long-running experiments on remote SageMaker jobs, and I'd like to set the tuner metadata path to /opt/ml/checkpoints (a local path on those transient VMs), so that the metadata is sent to S3 upon updates.

ModuleNotFoundError when running example_syne_tune_for_hf.ipynb notebook

When I run the example_syne_tune_for_hf.ipynb notebook, the first cell after the !pip install commands results in a ModuleNotFoundError: No module named 'syne_tune.config_space' error.


import matplotlib as mpl #$; mpl.use('pgf')
import os

%matplotlib inline
import matplotlib.pyplot as plt
import logging
from pathlib import Path

from syne_tune.backend.local_backend import LocalBackend
from syne_tune.tuner import Tuner
from syne_tune.search_space import uniform, loguniform, choice, randint
from syne_tune.stopping_criterion import StoppingCriterion
from syne_tune.optimizer.baselines import ASHA, MOBSTER, BayesianOptimization, RandomSearch, MOASHA
from syne_tune.constants import ST_WORKER_TIME
from syne_tune.backend.sagemaker_backend.instance_info import select_instance_type
from syne_tune.backend.sagemaker_backend.sagemaker_backend import SagemakerBackend
from syne_tune.backend.sagemaker_backend.sagemaker_utils import get_execution_role

    "cola": {'metric': 'matthews_correlation', 'mode': 'max'},
    "mnli": {'metric': 'accuracy', 'mode': 'max'},
    "mrpc": {'metric': 'f1', 'mode': 'max'},
    "qnli": {'metric': 'accuracy', 'mode': 'max'},
    "qqp": {'metric': 'f1', 'mode': 'max'},
    "rte": {'metric': 'accuracy', 'mode': 'max'},
    "sst2": {'metric': 'accuracy', 'mode': 'max'},
    "stsb": {'metric': 'spearmanr', 'mode': 'max'},
    "wnli": {'metric': 'accuracy', 'mode': 'max'},

Full Logs:

ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-5-ad89febf37d1> in <module>
     10 from syne_tune.backend.local_backend import LocalBackend
     11 from syne_tune.tuner import Tuner
---> 12 from syne_tune.config_space import uniform, loguniform, choice, randint
     13 from syne_tune.stopping_criterion import StoppingCriterion
     14 from syne_tune.optimizer.baselines import ASHA, MOBSTER, BayesianOptimization, RandomSearch, MOASHA

ModuleNotFoundError: No module named 'syne_tune.config_space'
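The cell shown imports syne_tune.search_space while the traceback points to syne_tune.config_space, which suggests the notebook and the installed package come from different versions. While that is sorted out, a notebook can try module names in order. A minimal sketch; `import_first_available` is a hypothetical helper built on importlib.

```python
import importlib
from types import ModuleType
from typing import Sequence


def import_first_available(module_names: Sequence[str]) -> ModuleType:
    """Import and return the first module in module_names that exists."""
    errors = []
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ModuleNotFoundError as err:
            errors.append(str(err))
    raise ModuleNotFoundError("none of the modules could be imported: " + "; ".join(errors))


# Usage in the notebook (depends on which Syne Tune version is installed):
# sp = import_first_available(["syne_tune.config_space", "syne_tune.search_space"])
```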

Failed trials have out of date metrics

Hi, I'm using SageMaker as a backend and remote launcher. I noticed that if a job errors out during training, the latest performance logs will not be captured.

For example, in my HPO experiment on the CIFAR-10 dataset, one trial (number 8) was reported in the Syne Tune results dataframe as achieving a validation accuracy of 0.8478 at epoch 22:


However my CloudWatch logs show that the validation accuracy actually reached 0.926 at epoch 60 before crashing:



Interestingly the job shows as Stopped rather than Failed in SageMaker console. Does Syne Tune notice an exception and stop the job before it exits with a failure?

Custom results directory

Dear creators, thank you again for your great work, and sorry if I'm being annoying with my suggestions/questions. Is it possible to change the home directory of different runs so that it is a custom path rather than ~/syne-tune? Thanks!
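One possible design for this request: resolve the results root from an explicit override, then from an environment variable, and only then fall back to ~/syne-tune. A minimal sketch; the environment variable name MY_SYNE_TUNE_ROOT and the helper are hypothetical, not existing Syne Tune settings.

```python
import os
from pathlib import Path
from typing import Optional

DEFAULT_ROOT = Path.home() / "syne-tune"


def results_root(env_var: str = "MY_SYNE_TUNE_ROOT", override: Optional[str] = None) -> Path:
    """Explicit override first, then the (hypothetical) environment variable,
    then the default ~/syne-tune."""
    if override:
        return Path(override).expanduser()
    from_env = os.environ.get(env_var)
    if from_env:
        return Path(from_env).expanduser()
    return DEFAULT_ROOT
```

On a transient SageMaker VM, pointing the variable at /opt/ml/checkpoints would place the experiment folders under the path that SageMaker syncs to S3.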

Tutorial: Multi-fidelity HNAS in Syne Tune

What. Longer step-by-step tutorial on how to run experiments with our async and sync multi-fidelity HPO methods, both using tabulated blackboxes and a real DNN tuning problem (Hugging Face?).
Why. The way in which variants of different algos are implemented and available in ST could be a real advantage, but is right now hidden and undocumented. A tutorial would be most accessible, and would clarify important concepts (sync/async).
Done. Tutorial tested with volunteer outside the team, feedback incorporated

A second part of the tutorial could be for developers: how to implement a new scheduler, or a variant of an existing one.

[Feature Request] Parallel Categories Plot

(Apologies for creating multiple recent GitHub issues, this is the last one, I promise!)

I took the DataFrame from my experiment results and used Plotly's plot to visualize hyperparameter interactions, dropping any features that only have one unique value. This is an interactive plot, and you can wrap it in a function that refreshes periodically when new data is available:


This has been super useful for me, so I thought that it may be useful to others as well if it were added as a plotting capability to the library. Although I'd understand if it's not desirable to add another dependency. Just thought I'd share!
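The preprocessing step described (dropping features that have only one unique value before plotting) can be sketched without any plotting dependency; the resulting column list could then be handed to Plotly. `varying_columns` is a hypothetical helper over row dicts. For a pandas DataFrame, `df.loc[:, df.nunique() > 1]` does much the same, except that `nunique` ignores NaN by default.

```python
from typing import Any, Dict, List


def varying_columns(rows: List[Dict[str, Any]]) -> List[str]:
    """Columns that take more than one distinct value across rows."""
    columns = {key for row in rows for key in row}
    # repr() makes values hashable and treats all NaNs as the same value.
    return sorted(c for c in columns if len({repr(row.get(c)) for row in rows}) > 1)
```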

Doc mismatch leads to ImportError: cannot import name 'report' ?


I'm using Syne Tune from a SageMaker-managed EC2 instance (notebook instance)

As indicated here, I'm using this code in my backend script:
from import report
which returns an ImportError: cannot import name 'report'

and when I look in I can see a Reporter but no report

This blog post however proposes a different code:

from import Reporter
report = Reporter()

Could the README be clarified?

(Note that I cannot check the version: import syne_tune; syne_tune.__version__ returns an AttributeError: module 'syne_tune' has no attribute '__version__'.)



ST already has sp.choice for categorical variables, and sp.finrange and sp.logfinrange for numerical values, but I feel that sometimes it is easier to manually specify the elements (as per sp.choice), but have them treated as numerical values by the GP-models and by the Blackbox-surrogate-models. Hence I'm wondering about implementing something like sp.number_choice, mostly for convenience, what do you think?

Grid search in syne-tune

Hey folks,
would you be interested in having grid search implemented in syne-tune? I've had a few offline discussions with some of you already, and it seems that you are not against adding grid search to syne-tune, but I want to keep a record of that here.

Additionally, would you have any pointers as to what would be the best way to add grid search to syne-tune?
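One simple starting point: when every hyperparameter has a finite set of values (a choice or a finite range), grid search is the Cartesian product of those value lists, which a searcher could hand out one config at a time until exhausted. A sketch with a hypothetical `grid` helper; continuous domains would first need discretizing, and wiring this into a Syne Tune searcher is not shown.

```python
import itertools
from typing import Any, Dict, Iterator, List


def grid(space: Dict[str, List[Any]]) -> Iterator[Dict[str, Any]]:
    """Yield every combination of the listed values, in a deterministic order."""
    names = sorted(space)
    for values in itertools.product(*(space[name] for name in names)):
        yield dict(zip(names, values))


for config in grid({"lr": [0.1, 0.01], "batch_size": [32, 64, 128]}):
    print(config)
```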

make QuantileBasedSurrogateSearcher import in baselines optional

Right now, importing syne_tune.optimizer.baselines fails when only the core dependencies are installed, because it imports QuantileBasedSurrogateSearcher, which in turn requires additional dependencies such as XGBoost or sklearn. I suggest making its import optional to avoid these exceptions.

Custom arguments packaging

Dear creators, thank you for your great work. Is there a way to specify a custom packaging of the input hyperparameters for our main script? E.g., in our project we do not pass hyperparameters directly, as in python3 --width 1, but through python3 --hyperparameters='{"width": 1}', to avoid adding a new argument to our parser, and the associated clutter, each time we want to change something. I have checked the FAQ but have not found anything related. Thank you for your input!
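If there is no built-in option, the training script itself can accept both forms. A sketch using only argparse and json: it reads a JSON --hyperparameters argument and also folds in individual "--name value" flags, the per-flag form described in the question. `parse_hyperparameters` is hypothetical; "--name=value" flags are not handled.

```python
import argparse
import json
from typing import Any, Dict, List, Optional


def _coerce(value: str) -> Any:
    """Turn "1" into 1 and "0.5" into 0.5; leave non-JSON strings as-is."""
    try:
        return json.loads(value)
    except json.JSONDecodeError:
        return value


def parse_hyperparameters(argv: Optional[List[str]] = None) -> Dict[str, Any]:
    parser = argparse.ArgumentParser(allow_abbrev=False)
    parser.add_argument("--hyperparameters", type=json.loads, default={})
    args, unknown = parser.parse_known_args(argv)
    config = dict(args.hyperparameters)
    # Fold in any "--name value" pairs passed individually.
    for flag, value in zip(unknown[::2], unknown[1::2]):
        config[flag.lstrip("-")] = _coerce(value)
    return config
```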

[Feature Request] Attach to SageMakerBackend logging

Hi, could you add a method to attach to the logs for the SageMakerBackend management estimator? For example, RemoteLauncher.logs so we can simply do remote.logs()?

Some customers can't access console to view CloudWatch logs, so this would be easier for them than fiddling with boto3.

AttributeError: 'NoneType' object has no attribute 'scheduler'


I launched a Syne Tune experiment a few hours ago (experiment-2022-01-11-10-57-17-491), then stopped it and launched another one.

While experiment-2022-01-11-10-57-17-491 was running I could see its chart using

from syne_tune.experiments import load_experiment

tuning_experiment = load_experiment("experiment-2022-01-11-10-57-17-491")

Now, when I do the same from the same machine, I get:

AttributeError                            Traceback (most recent call last)
<ipython-input-5-0c0bfae5f6de> in <module>
      1 # metric over time
----> 2 tuning_experiment.plot()

~/anaconda3/envs/python3/lib/python3.6/site-packages/syne_tune/ in plot(self, **plt_kwargs)
     51         import matplotlib.pyplot as plt
---> 53         scheduler = self.tuner.scheduler
     54         metric = self.metric_name()
     55         df = self.results

AttributeError: 'NoneType' object has no attribute 'scheduler'

What is wrong? Can the graph be accessed only while the tuner is running?

Issue with running No module named 'benchmarks'

When running python docs/tutorials/basics/scripts/ on the main branch, I get an error within the spawned SageMaker training jobs:

Traceback (most recent call last):
  File "", line 29, in <module>
    from benchmarks.checkpoint import resume_from_checkpointed_model, \
ModuleNotFoundError: No module named 'benchmarks'

I'm including the full log below.
I’m not certain if it’s due to my AWS environment setup (although I am generally able to run SageMaker training jobs) or an issue with the code, could you please have a look?

Best wishes,

Full log:

showing log of sagemaker job: traincode-report-withcheckpointing-2022-01-18-16-26-35-248-4
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
2022-01-18 16:34:35,020 sagemaker-training-toolkit INFO     Imported framework
2022-01-18 16:34:35,023 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2022-01-18 16:34:35,035 INFO     Block until all host DNS lookups succeed.
2022-01-18 16:34:36,465 INFO     Invoking user training script.
2022-01-18 16:34:37,061 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2022-01-18 16:34:37,076 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2022-01-18 16:34:37,090 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2022-01-18 16:34:37,103 sagemaker-training-toolkit INFO     Invoking user script
Training Env:
    "additional_framework_parameters": {},
    "channel_input_dirs": {},
    "current_host": "algo-1",
    "framework_module": "",
    "hosts": [
    "hyperparameters": {
        "batch_size": 126,
        "weight_decay": 0.7744002774231975,
        "st_checkpoint_dir": "/opt/ml/checkpoints",
        "st_instance_count": 1,
        "n_units_2": 322,
        "dataset_path": "./",
        "n_units_1": 107,
        "dropout_2": 0.20979101632756325,
        "dropout_1": 0.4715702331554363,
        "epochs": 81,
        "learning_rate": 0.0029903699075321814,
        "st_instance_type": "ml.m4.10xlarge"
    "input_config_dir": "/opt/ml/input/config",
    "input_data_config": {},
    "input_dir": "/opt/ml/input",
    "is_master": true,
    "job_name": "traincode-report-withcheckpointing-2022-01-18-16-26-35-248-4",
    "log_level": 20,
    "master_hostname": "algo-1",
    "model_dir": "/opt/ml/model",
    "module_dir": "s3://sagemaker-us-west-2-640549960621/traincode-report-withcheckpointing-2022-01-18-16-26-35-248-4/source/sourcedir.tar.gz",
    "module_name": "traincode_report_withcheckpointing",
    "network_interface_name": "eth0",
    "num_cpus": 40,
    "num_gpus": 0,
    "output_data_dir": "/opt/ml/output/data",
    "output_dir": "/opt/ml/output",
    "output_intermediate_dir": "/opt/ml/output/intermediate",
    "resource_config": {
        "current_host": "algo-1",
        "hosts": [
        "network_interface_name": "eth0"
    "user_entry_point": ""
Environment variables:
Invoking script with the following command:
/opt/conda/bin/python3.6 --batch_size 126 --dataset_path ./ --dropout_1 0.4715702331554363 --dropout_2 0.20979101632756325 --epochs 81 --learning_rate 0.0029903699075321814 --n_units_1 107 --n_units_2 322 --st_checkpoint_dir /opt/ml/checkpoints --st_instance_count 1 --st_instance_type ml.m4.10xlarge --weight_decay 0.7744002774231975
Traceback (most recent call last):
  File "", line 29, in <module>
    from benchmarks.checkpoint import resume_from_checkpointed_model, \
ModuleNotFoundError: No module named 'benchmarks'
2022-01-18 16:34:38,444 sagemaker-training-toolkit ERROR    ExecuteUserScriptError:
Command "/opt/conda/bin/python3.6 --batch_size 126 --dataset_path ./ --dropout_1 0.4715702331554363 --dropout_2 0.20979101632756325 --epochs 81 --learning_rate 0.0029903699075321814 --n_units_1 107 --n_units_2 322 --st_checkpoint_dir /opt/ml/checkpoints --st_instance_count 1 --st_instance_type ml.m4.10xlarge --weight_decay 0.7744002774231975"
Traceback (most recent call last):
  File "", line 29, in <module>
    from benchmarks.checkpoint import resume_from_checkpointed_model, \
ModuleNotFoundError: No module named 'benchmarks'

Promotion Logic Bug

There seems to be a problem with the Hyperband promotion logic.

How to reproduce:
Add type="promotion" to

Run python benchmarking/nursery/benchmark_automl/ --num_seeds 1 --method ASHA --benchmark lcbench-airlines

  File "/syne-tune/benchmarking/nursery/benchmark_automl/", line 209, in <module>
  File "/syne-tune/syne_tune/", line 240, in run
    raise e
  File "/syne-tune/syne_tune/", line 175, in run
    new_done_trial_statuses, new_results = self._process_new_results(
  File "/syne-tune/syne_tune/", line 345, in _process_new_results
    done_trials_statuses = self._update_running_trials(
  File "/syne-tune/syne_tune/", line 465, in _update_running_trials
    decision = self.scheduler.on_trial_result(trial=trial, result=result)
  File "/syne-tune/syne_tune/optimizer/schedulers/", line 779, in on_trial_result
    task_info = self.terminator.on_task_report(trial_id, result)
  File "/syne-tune/syne_tune/optimizer/schedulers/", line 1124, in on_task_report
    rung_sys.on_task_report(trial_id, result, skip_rungs=skip_rungs)
  File "/syne-tune/syne_tune/optimizer/schedulers/", line 221, in on_task_report
    assert resource == milestone, (
AssertionError: trial_id 1: resource = 4 > 3 milestone. Make sure to report time attributes covering all milestones
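The assertion says trial 1 reported resource 4 while the scheduler was waiting for a report at milestone 3, i.e. the reported resource levels jumped over a milestone. A sketch of that check, using a geometric ladder of the kind Hyperband-style schedulers use (grace period times powers of the reduction factor); the exact rung rule in Syne Tune may differ, and both helpers are hypothetical.

```python
from typing import List, Optional


def rung_levels(grace_period: int, max_t: int, reduction_factor: int) -> List[int]:
    """Milestones grace_period * reduction_factor**k strictly below max_t."""
    levels = []
    level = grace_period
    while level < max_t:
        levels.append(level)
        level *= reduction_factor
    return levels


def first_skipped_milestone(reported: List[int], milestones: List[int]) -> Optional[int]:
    """First milestone the reported resource levels jump over, or None."""
    seen = set(reported)
    highest = max(reported, default=0)
    for milestone in milestones:
        if milestone <= highest and milestone not in seen:
            return milestone
    return None


milestones = rung_levels(1, 81, 3)
print(milestones, first_skipped_milestone([1, 2, 4], milestones))
```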

Implement independent GP surrogate model and Hyper-Tune

What. Implement Hyper-Tune as extension of asynchronous Hyperband (ASHA)
Why. Very competitive method, according to the paper. We lack async HB methods that do a good job with bracket sampling
Done. Some unit tests, comparison with baselines
