
Comments (12)

geoalgo commented on June 16, 2024

Hi Andreas,

I agree this is a great suggestion; we will add an example to the docs.

To answer your question, the tuner is regularly checkpointed (if you keep the Tuner's default options), so you can resume a tuning run by loading its checkpoint:

from syne_tune.experiments import load_experiment
from syne_tune import StoppingCriterion

# Loads a previous experiment, sets `load_tuner` to True to deserialize the Tuner
tuning_experiment = load_experiment("plot-results-demo-2023-10-10-07-27-48-235", load_tuner=True)

# Update the stopping criterion to run the tuning for a few more trials
tuner = tuning_experiment.tuner
tuner.stop_criterion = StoppingCriterion(max_num_trials_started=100)
tuner.run()

See the screenshot below for the output of resuming a previous experiment:
[Screenshot 2023-10-24 at 17 26 53]

Note that the tuner is serialized with dill, so this should only be done if you trust the file.

from syne-tune.

amueller commented on June 16, 2024

Awesome, that's easy and makes sense, but also would be good to call out in the docs :)


amueller commented on June 16, 2024

One more note on restarting the tuner: I would add a recommendation to the docs to back up the tuner before restarting. I pressed Ctrl-C after reloading, and the file got corrupted / was emptied.
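A minimal sketch of such a backup, assuming the tuner state lives in a single serialized file inside the experiment directory (the file name `tuner.dill` is an assumption, based on the tuner being pickled with dill; adjust it to whatever your experiment directory actually contains):

```python
import shutil
from pathlib import Path


def backup_tuner_state(experiment_path: str) -> Path:
    """Copy the serialized tuner file aside, so that a crash or Ctrl-C
    while resuming cannot destroy the only copy of the tuner state.

    The file name ``tuner.dill`` is an assumption; check what your
    experiment directory actually contains.
    """
    src = Path(experiment_path) / "tuner.dill"
    dst = src.parent / (src.name + ".bak")
    shutil.copy2(src, dst)  # copy2 also preserves file metadata
    return dst
```

Restoring is the reverse copy; doing this before every `tuner.run()` on a reloaded tuner is cheap insurance.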


geoalgo commented on June 16, 2024

This is a good point. Right now, the example modifies some internals to update the configuration space, but ideally we would have clear setter methods, with tests, and a list of the properties that can be changed.

For now, I believe it makes sense to allow users to perform those modifications in case they want to experiment with something (for instance, your case of resuming tuning while changing the checkpoint-deletion option), and to add those features as soon as multiple users ask for them.

Regarding your change, I think it does the right thing but @mseeger would be the best person to confirm.


amueller commented on June 16, 2024

Btw, figuring out which aspects of a model can be changed at which stage of training is something scikit-learn hasn't even figured out yet; it's definitely not easy to solve in general.
I think this specific case is quite relevant, since it's on the natural progression path of a new user trying to make things work, and the fact that the hard drive is full probably means they have already invested quite a bit of compute that they'd like to reuse.


amueller commented on June 16, 2024

Oh I got one more follow-up: Let's say I want to re-use experiments with a different tuner. Is that also possible? Say I want to expand a parameter range, or maybe vary one more parameter. It seems silly to start from scratch then.


geoalgo commented on June 16, 2024

> Awesome, that's easy and makes sense, but also would be good to call out in the docs :)

Completely agree, I am planning to add a FAQ example (you are not the first person to ask :-)).


geoalgo commented on June 16, 2024

> Oh I got one more follow-up: Let's say I want to re-use experiments with a different tuner. Is that also possible? Say I want to expand a parameter range, or maybe vary one more parameter. It seems silly to start from scratch then.

Here the problem depends more on the scheduler you are using. Conceptually, it should work for everything that looks like random search (and ASHA), but it is not tested and I am not sure which schedulers would work in this mode.

There was this paper, https://arxiv.org/abs/2010.13117, that proposed some strategies for this problem, but I would say the problem is not well studied.

Edit: added the not :-)


amueller commented on June 16, 2024

Are you saying it's not a well-studied problem?

The general case could get arbitrarily complicated, I think, but I'm mostly interested in the case where the true function stays the same but the parameters of the search or the domain of the function change.
In these settings, it should be possible to re-use old data by filling in missing values in the search space. Say we fixed super_duper_option=True before and now we vary it: we would need to inform the scheduler that the previous points correspond to super_duper_option=True, since that option wasn't included in the old search space.

Re-using previous runs would lead to a different sampling bias but hopefully the acquisition function can compensate for that?

If the true function changes, say you change the dataset or model in some way, that seems much trickier, and I wasn't really asking about that.
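The back-filling described above can be sketched with plain dictionaries (a sketch only: `backfill_configs` and the parameter names are hypothetical, not Syne Tune API):

```python
def backfill_configs(old_configs, new_space_defaults):
    """Augment trial configs from a previous experiment with the values
    that were implicitly fixed back then, so each old trial becomes a
    valid point in an expanded search space.

    ``new_space_defaults`` maps each newly varied parameter to the value
    it implicitly had in the old experiment.
    """
    filled = []
    for config in old_configs:
        merged = dict(new_space_defaults)  # old implicit values first
        merged.update(config)              # explicitly recorded values win
        filled.append(merged)
    return filled


# The previous experiment only varied the learning rate...
old = [{"lr": 0.1}, {"lr": 0.01}]
# ...while super_duper_option was implicitly fixed to True.
new = backfill_configs(old, {"super_duper_option": True})
```

The sampling-bias concern from the comment above remains: the back-filled points all sit on one slice of the expanded space, so a model-based scheduler would see no signal at all for the newly varied parameter until fresh trials are run.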


amueller commented on June 16, 2024

One related thing that would be useful is knowing what can be changed, and how, once a tuner is loaded.
Let's say my tuner crashed because my hard drive was full, so now I want to restart it and enable automatic checkpoint removal. It's not entirely clear to me how to do that. I tried

import sys

from syne_tune.experiments import load_experiment

# Reload the crashed experiment, including the serialized Tuner
tuner = load_experiment(sys.argv[1], load_tuner=True).tuner
# Try to switch on checkpoint removal after the fact
tuner.trial_backend.delete_checkpoints = True
tuner.scheduler.early_checkpoint_removal_kwargs = {"max_num_checkpoints": 80}
tuner.scheduler._initialize_early_checkpoint_removal({"max_num_checkpoints": 80})
tuner.run()

but I don't think that had the desired effect?


mseeger commented on June 16, 2024

You are asking for a lot here. You can try adding tuner._initialize_early_checkpoint_removal() before tuner.run() above and see whether it works. Most likely, it will not. The feature is based on a callback whose state depends on what happened during the experiment. It would be very difficult to recreate this state from checkpoints written during an experiment in which the feature was not enabled.

What may work is to enable the removal feature up front, but with a large max_num_checkpoints. Maybe in that case, the callback can be amended (i.e., max_num_checkpoints can be lowered) when restarting.
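A toy illustration of why lowering the cap on restart is plausible while enabling removal retroactively is not (this is not the Syne Tune callback, just a stateless approximation of its pruning rule):

```python
def checkpoints_to_remove(trial_scores, max_num_checkpoints):
    """Given a mapping trial_id -> best metric so far (higher is better),
    return the trial ids whose checkpoints should be removed so that at
    most ``max_num_checkpoints`` remain.

    The decision here depends only on current scores and the cap, so
    re-running it with a smaller cap simply prunes further. The real
    callback additionally carries history accumulated during the run,
    which cannot be reconstructed if the feature was off.
    """
    ranked = sorted(trial_scores, key=trial_scores.get, reverse=True)
    return set(ranked[max_num_checkpoints:])


scores = {0: 0.91, 1: 0.85, 2: 0.97, 3: 0.40}
# Generous cap while the experiment runs: nothing is removed yet.
assert checkpoints_to_remove(scores, 80) == set()
# Lower the cap on restart: only the two best checkpoints survive.
assert checkpoints_to_remove(scores, 2) == {1, 3}
```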


amueller commented on June 16, 2024

Yeah, it makes sense that that's complex to support; it might not be worth the hassle. Another option would be to make your second suggestion the default. My issue, and maybe that of other new users, is that I didn't realize how quickly this would become a problem. Though your tutorial does point it out, so you could also just say it was user error.

