
Comments (5)

LarsHH commented on May 26, 2024

Hey Martin,
PBT initializes a whole population of, say, 20 members and trains each member for, say, one epoch. Let's call this the first generation. The top 80% of this first generation simply move on to the second generation. The bottom 20% are discarded and replaced by sampling members from the top 20% and perturbing their parameters. Then this second generation is trained for one epoch, the process repeats for the third generation, and so forth.

So the population itself doesn't actually grow. But it does evolve through the resampling. Furthermore, each population member is trained further and further (in terms of epochs).
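To make that loop concrete, here is a minimal, self-contained sketch of the generational scheme described above (this is not Sherpa code; the helper functions are placeholder stand-ins for the user's actual sampling, perturbation, and training code):

```python
import copy
import random

# Placeholder stand-ins for the user's actual sampling/perturbation/training code.
def sample_hyperparameters():
    return {"lr": 10 ** random.uniform(-4, -1)}

def perturb(params):
    return {"lr": params["lr"] * random.choice([0.8, 1.2])}

def train_one_epoch(params, weights):
    # Pretend training: return (validation score, updated weights).
    return random.random(), weights

POPULATION_SIZE = 20
NUM_GENERATIONS = 5

population = [{"params": sample_hyperparameters(), "weights": None}
              for _ in range(POPULATION_SIZE)]

for generation in range(1, NUM_GENERATIONS + 1):
    # Train every member of this generation for one epoch.
    for member in population:
        member["score"], member["weights"] = train_one_epoch(member["params"],
                                                             member["weights"])

    # Rank by score (higher is better here); the top 80% move on unchanged.
    population.sort(key=lambda m: m["score"], reverse=True)
    n_replace = POPULATION_SIZE // 5              # bottom 20%
    survivors = population[:POPULATION_SIZE - n_replace]
    parents = population[:n_replace]              # top 20% serve as parents

    # The bottom 20% are discarded and replaced by perturbed copies of parents,
    # which also inherit the parent's weights (PBT's exploit + explore steps).
    replacements = []
    for parent in random.choices(parents, k=n_replace):
        child = copy.deepcopy(parent)
        child["params"] = perturb(child["params"])
        replacements.append(child)

    population = survivors + replacements
```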

Now, for Sherpa there may be a little bit of confusion in terms of naming. For the sake of parallelization, Sherpa considers one trial to be one "job". So Sherpa-PBT initializes the population as 20 trials with randomly sampled hyperparameters and leaves it to the user to decide in their script how long to train each one (say, one epoch). After those 20 one-epoch trials have finished, it schedules the top 80% of them as new trials, indicating via the load_from field that the weights from a previously finished trial should be loaded. This corresponds to "continuing" the best 80%. Each of those new trials gets a new trial ID, because IDs have to be unique. You can, however, identify them by the fact that their generation field will be 2 and their load_from field will indicate the "parent" of the trial.

For the bottom 20%, the load_from fields will correspond to trials from the top 20% of the previous generation, and their trial.parameters will contain perturbed versions of those parents' parameters. So the user has to incorporate those perturbed parameters. For some Keras parameters this can be a bit tricky, and I think you had actually found a bug related to that in another issue.
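For the user-script side, here is a rough sketch of how one might consume those fields. Hedged: build_model and train_epoch are hypothetical user functions, the PopulationBasedTraining constructor may require additional arguments, and the exact location of the load_from field (shown here as a trial.parameters entry) may differ between Sherpa versions.

```python
import sherpa

parameters = [sherpa.Continuous(name='lr', range=[1e-4, 1e-1], scale='log')]
# Other constructor arguments may be required depending on the Sherpa version.
algorithm = sherpa.algorithms.PopulationBasedTraining(population_size=20)
study = sherpa.Study(parameters=parameters,
                     algorithm=algorithm,
                     lower_is_better=True)

for trial in study:
    model = build_model(trial.parameters)          # hypothetical user function

    # If this trial continues a member of a previous generation, load the parent's
    # weights. The exact key/attribute for load_from may differ in your version.
    load_from = trial.parameters.get('load_from', '')
    if load_from:
        model.load_weights('weights_{}.h5'.format(load_from))

    # Train for one epoch per generation and report the objective back to Sherpa.
    loss = train_epoch(model, trial.parameters)    # hypothetical user function
    study.add_observation(trial, objective=loss, iteration=1)

    # Save weights under this trial's ID so later generations can continue from it.
    model.save_weights('weights_{}.h5'.format(trial.id))
    study.finalize(trial)
```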

Let me know if this clarifies things at all. Will review the other issues now.

Best,
Lars


LarsHH commented on May 26, 2024

PopulationBasedTraining has the population_size argument. Since PBT only trains one population, the notion of max_num_trials doesn't really exist there. One could call population_size max_num_trials instead, but I think that could be confusing.
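In other words, population_size fixes the number of members per generation, while the total number of trials grows with the number of generations. A minimal sketch (num_generations is my assumption about the constructor; check your installed version):

```python
import sherpa

# population_size fixes how many members each generation has; the total number of
# trials Sherpa creates is roughly population_size * number of generations, so a
# single max_num_trials doesn't map cleanly onto PBT.
pbt = sherpa.algorithms.PopulationBasedTraining(population_size=20,
                                                num_generations=10)
```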


LarsHH commented on May 26, 2024

The asynchronous successive halving algorithm also doesn't really have a notion of a maximum number of trials. It does, however, have a max_finished_configs argument, which corresponds to putting a limit on how many trials are run to completion. That could be renamed max_num_trials, but I am not sure whether it would make things clearer or less clear, since it would only refer to finished trials and not to the many unfinished ones the algorithm explores along the way.
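As a hedged illustration (the class name SuccessiveHalving is my assumption about where ASHA lives in sherpa.algorithms; only max_finished_configs comes from the discussion above):

```python
import sherpa

# ASHA keeps launching configurations asynchronously; max_finished_configs caps how
# many configurations are run to completion, not how many are started along the way.
asha = sherpa.algorithms.SuccessiveHalving(max_finished_configs=50)
```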


LarsHH commented on May 26, 2024

Both algorithms are ready to use. I just haven't run and reproduced those plots in the documentation yet.


martsalz commented on May 26, 2024

> PopulationBasedTraining has the population_size argument. Since PBT only trains one population, the notion of max_num_trials doesn't really exist there. One could call population_size max_num_trials instead, but I think that could be confusing.

What do you mean by "Since PBT only trains one population"?

If I use PBT as shown below, I can see from the table which trials were performed in which generation and which earlier trial each trial is based on. How many generations are carried out in total, i.e. how often is this process repeated?

In my case I have specified population_size=10, yet the experiment performs more than 10 trials.

[Screenshot: results table showing trial IDs, generations, and load_from values]

Thanks.

