Comments (13)

mbollmann commented on September 4, 2024

Is anyone currently working on this?

I'd be very interested in it, and wondered how difficult it would be to implement. Since I'm not that deep into the code base yet, it would be nice to hear some thoughts or considerations from other contributors.

neubig commented on September 4, 2024

I think it's safe to say that nobody is working on this. In terms of difficulty, deciding on the design is probably the hardest part; once the design is fixed, the implementation shouldn't be too difficult. We'll need to think about what the YAML configuration file for an ensembled model would look like. I'll try to think about this a bit and post a draft, and @msperber might have some ideas as well.

msperber commented on September 4, 2024

I think that would be great, and now would also be a good time to do it, since the infrastructure has stabilized. We could aim for something like the following:

Training: train several models such as the one in 01_standard.yaml; we will end up with serialized models m1.mod / m1.mod.data, m2.mod / m2.mod.data, etc.

Test time config:

experiment: ..
  model: !EnsembleTranslator
    models:
    - !Load
      mod_file: m1.mod
      path: model # this should only load the 'model' part of the serialized experiment; also compatible with multi-tasking by specifying path: train.tasks.0.model instead
    - !Load
      mod_file: m2.mod
      path: model
  evaluate:
    .. # no changes needed here

We would need to address two limitations in the serialization process (I'll be happy to help with these):

  • Currently, the only way to load a previously trained model is the load/overwrite mechanism, which only allows loading a single complete experiment and then overwriting parts of it. Instead, we would like to load parts of several saved experiments (namely, their 'model' parts) and assemble them into one new experiment. The outlined !Load mechanism could do this, and would be a straightforward-to-implement extension of the current load/overwrite mechanism.
  • Currently, pretrained DyNet weights are loaded by first initializing the complete hierarchy, registering DyNet parameters where needed, and finally filling the allocated parameters with pretrained values if available. This requires that the order of initialization is the same when creating the hierarchy initially and when loading a saved model. Extending this requires a bit of thought: either we assume that a whole .data file corresponds to a whole !Load object, with everything below it still initialized in the same order, the drawback being that we can't use !Load to load partial models trained via multi-task training; or we could assign each component a named DyNet param subcollection and pass only that subcollection when initializing the component (see the sketch below).
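For the second option, named subcollections could look roughly like this. A minimal sketch, assuming DyNet 2.x; the component split and file name are purely illustrative:

import dynet as dy

# One root collection, with a named subcollection per component, so that
# loading no longer depends on a global initialization order.
root = dy.ParameterCollection()
encoder_pc = root.add_subcollection("encoder")
decoder_pc = root.add_subcollection("decoder")

# Each component registers parameters only in its own subcollection:
W = decoder_pc.add_parameters((512, 512))

# Pretrained values can then be saved and restored per component:
decoder_pc.save("m1.decoder.data")      # persist just the decoder weights
decoder_pc.populate("m1.decoder.data")  # later, fill them back in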

Addressing both of these at once may be too difficult, but as a starting point you could implement a dummy ensemble translator that looks the same at training and test time (so no new serialization features are needed) and simply replicates the same model several times, but then queries all of them at decoding time. Maybe something like this:

experiment:
  ..
  model: !DummyEnsembleTranslator
    models:
    - !DefaultTranslator
      _xnmt_id: main_model
      ..
    - !Ref {name: main_model}
  ..

The calc_loss method would simply delegate to the first model's calc_loss, so that we can "train" the ensemble, but the generate() method would be changed to perform ensembled decoding.
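In code, that split could look roughly like this. A minimal sketch, not xnmt's actual API: calc_log_probs_for_step and the search-strategy hook are assumed names for illustration only:

import numpy as np

class DummyEnsembleTranslator:
  def __init__(self, models):
    self.models = models  # here: the same model replicated via !Ref

  def calc_loss(self, src, trg):
    # "Training" the ensemble just trains the first (shared) model.
    return self.models[0].calc_loss(src, trg)

  def generate(self, src, search_strategy):
    # Decoding is where the ensembling happens: hand the search strategy
    # a scoring function that averages all models' log-probabilities.
    def score(states, prev_word):  # one decoder state per model
      log_probs = [m.calc_log_probs_for_step(s, prev_word)  # assumed API
                   for m, s in zip(self.models, states)]
      return np.mean(log_probs, axis=0)
    return search_strategy.generate_output(src, score)  # assumed API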

Later, another intermediate step might be checkpoint ensembling, since we already have the feature to save multiple checkpoints.

Finally, one more hint would be to check lamtram's implementation of ensembling since it also uses DyNet: https://github.com/neubig/lamtram/blob/master/src/lamtram/ensemble-decoder.cc

neubig commented on September 4, 2024

I looked over this and think it is a good design, and also a very good way to break the problem into pieces that can be tackled separately. @mbollmann, if you could do the first part of creating the !DummyEnsembleTranslator, then @msperber could probably help with the serialization stuff, as that part is slightly involved.

neubig commented on September 4, 2024

(P.S.: one thing though, I don't really like the naming of !Load, it should probably be a little more descriptive.)

mbollmann commented on September 4, 2024

Thanks for the outline! Sounds like a good plan to me as well, and I would really appreciate help with the serialization code then, as so far I only understood half of what @msperber said about its limitations :)

I will look into the dummy ensemble, should be a good start to figure out how to hook up the models.

mbollmann commented on September 4, 2024

So, some observations/questions:

  1. Should the ensemble translator ultimately work with both DefaultTranslator and TransformerTranslator models or even a mixture of the two? I don't really know anything about the latter, but I noticed they use quite different generate functions; also, most of the actual generation of the former takes place within some SearchStrategy object. So, I'm not sure if/how a one-size-fits-all approach could work without some major refactoring.

  2. Implementing generate for an ensemble of DefaultTranslators only, on the other hand, could be achieved by having an EnsembleDecoder, EnsembleAttender, etc. that wrap the individual models' components in a way that allows passing them to the SearchStrategy objects in the same way as before, which seems relatively simple to do (a rough sketch follows below).
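For instance, the wrapper idea might look like this. This is only a sketch; calc_log_probs is an assumed stand-in for whatever interface the SearchStrategy actually calls on a decoder:

import numpy as np

class EnsembleDecoder:
  # Wraps one decoder per model behind a single decoder-like interface.
  def __init__(self, decoders):
    self.decoders = decoders

  def calc_log_probs(self, states):
    # One state per wrapped decoder; averaging the log-probabilities lets
    # the SearchStrategy treat the ensemble like a single decoder.
    return np.mean([d.calc_log_probs(s)
                    for d, s in zip(self.decoders, states)], axis=0)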

What do you think? Am I way off here?

msperber commented on September 4, 2024

I would also say that the first goal should be ensembling only the DefaultTranslator. After implementing that, we'll probably have a better feel for whether the same approach would work for the TransformerTranslator.

Regarding (2), could you elaborate a bit more? One thing to keep in mind is that it would be good to keep the configuration files relatively simple. If I understood correctly, this approach would result in something like the following, which is probably more verbose than necessary?

  model: !DefaultTranslator
    decoder: !EnsembleDecoder
      decoders:
      - !LoadSerialized
        file: mod1.mod
        path: model.decoder
      - !LoadSerialized
        file: mod2.mod
        path: model.decoder
    attender: !EnsembleAttender
      attenders:
      - !LoadSerialized
        file: mod1.mod
        path: model.attender
      - !LoadSerialized
        file: mod2.mod
        path: model.attender

mbollmann commented on September 4, 2024

No, I was thinking that an ensemble for DefaultTranslator only could load the models, then access their attender, decoder, etc. attributes and wrap them in specialized ensemble classes to pass to a SearchStrategy for the actual output generation. We could still load full models, I would think, as long as we know they're DefaultTranslator objects.

To have an ensembling approach that is agnostic to the model architecture, we would need a function that only generates output for one timestep at a time, so that we could combine these per-step outputs (see the sketch below). SearchStrategy objects would then need to interface with this function, so we could pass them ensemble outputs as well. That would probably require major refactoring, as they currently work with the decoder, attender, etc. directly. At least this is how I see it; take it with a grain of salt, as I haven't worked with the code for that long!
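To illustrate, the interface could be as small as a single step function. All names here are assumed; each model would hide its decoder, attender, etc. behind step():

import numpy as np

def ensemble_step(models, states, prev_word):
  # Advance every model by one timestep and combine their predictions;
  # averaging log-probabilities is one common way to combine an ensemble.
  new_states, log_probs = [], []
  for model, state in zip(models, states):
    state, lp = model.step(state, prev_word)  # assumed per-model API
    new_states.append(state)
    log_probs.append(lp)
  return new_states, np.mean(log_probs, axis=0)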

msperber commented on September 4, 2024

Okay I got it, that sounds reasonable to me then!

mbollmann commented on September 4, 2024

I started an implementation of this idea in #295 but got stuck -- see there for details.

philip30 commented on September 4, 2024

Maybe this is done?

neubig commented on September 4, 2024

Implemented in #295. Thanks, @mbollmann!
