Comments (13)

mbollmann commented on September 4, 2024

Is anyone currently working on this?

I'd be very interested in it, and wondered how difficult it would be to implement. Since I'm not that deep into the code base yet, it would be nice to hear some thoughts or considerations from other contributors.

neubig commented on September 4, 2024

I think it's safe to say that nobody is working on this. In terms of difficulty, deciding on the design is probably the hardest part; once the design is fixed, the implementation shouldn't be too difficult. We'll need to think about what the YAML configuration file for an ensembled model would look like. I'll try to think about this a bit and post a draft, and @msperber might have some ideas as well.

msperber commented on September 4, 2024

I think that would be great, and now would also be a good time to do it, since the infrastructure has stabilized. We could aim for something like the following:

Training: train several models such as the one in 01_standard.yaml; we will end up with serialized models m1.mod / m1.mod.data, m2.mod / m2.mod.data, etc.

Test time config:

experiment: ..
  model: !EnsembleTranslator
    models:
    - !Load
      mod_file: m1.mod
      path: model # this should only load the 'model' part of the serialized experiment; also compatible with multi-tasking by specifying path: train.tasks.0.model instead
    - !Load
      mod_file: m2.mod
      path: model
  evaluate:
    .. # no changes needed here

We would need to address two limitations in the serialization process (I'll be happy to help with these):

  • Currently, the only way to load a previously trained model is the load/overwrite mechanism, which only allows loading a single complete experiment and then overwriting parts of it. Instead, we would like to load parts of several saved experiments (namely, their 'model' parts) and assemble them into one new experiment. The outlined !Load mechanism could do this, and would be a straightforward-to-implement extension of the current load/overwrite mechanism.
  • Currently, pretrained DyNet weights are loaded by first initializing the complete hierarchy, registering DyNet parameters where needed, and finally filling the allocated parameters with pretrained values if available. This requires that the order of initialization is the same when creating the hierarchy initially and when loading a saved model. Extending this requires a bit of thought: either we assume that a whole .data file corresponds to a whole !Load object, with everything below it still initialized in the same order, the drawback being that we can't use !Load to load partial models trained via multi-task training; or we could assign each component a named DyNet param subcollection and pass only that subcollection when initializing the component (see the sketch below).
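For the second option, named subcollections could look roughly like this. A minimal sketch, assuming DyNet 2.x; the component split and file name are purely illustrative:

import dynet as dy

# One root collection, with a named subcollection per component, so that
# loading no longer depends on a global initialization order.
root = dy.ParameterCollection()
encoder_pc = root.add_subcollection("encoder")
decoder_pc = root.add_subcollection("decoder")

# Each component registers parameters only in its own subcollection:
W = decoder_pc.add_parameters((512, 512))

# Pretrained values can then be saved and restored per component:
decoder_pc.save("m1.decoder.data")      # persist just the decoder weights
decoder_pc.populate("m1.decoder.data")  # later, fill them back in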

Addressing both of these at once may be too difficult, but as a starting point you could implement a dummy ensemble translator that looks the same at training and test time (so no new serialization features are needed) and simply replicates the same model several times, but then queries all of them at decoding time. Maybe something like this:

experiment:
  ..
  model: !DummyEnsembleTranslator
    models:
    - !DefaultTranslator
      _xnmt_id: main_model
      ..
    - !Ref {name: main_model}
  ..

The calc_loss method would simply delegate to the first model's calc_loss, so that we can "train" the ensemble, but the generate() method would be changed to perform ensembled decoding.
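In code, that split could look roughly like this. A minimal sketch, not xnmt's actual API: calc_log_probs_for_step and the search-strategy hook are assumed names for illustration only:

import numpy as np

class DummyEnsembleTranslator:
  def __init__(self, models):
    self.models = models  # here: the same model replicated via !Ref

  def calc_loss(self, src, trg):
    # "Training" the ensemble just trains the first (shared) model.
    return self.models[0].calc_loss(src, trg)

  def generate(self, src, search_strategy):
    # Decoding is where the ensembling happens: hand the search strategy
    # a scoring function that averages all models' log-probabilities.
    def score(states, prev_word):  # one decoder state per model
      log_probs = [m.calc_log_probs_for_step(s, prev_word)  # assumed API
                   for m, s in zip(self.models, states)]
      return np.mean(log_probs, axis=0)
    return search_strategy.generate_output(src, score)  # assumed API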

Later, another intermediate step might be checkpoint ensembling, since we already have the feature to save multiple checkpoints.

Finally, one more hint would be to check lamtram's implementation of ensembling since it also uses DyNet: https://github.com/neubig/lamtram/blob/master/src/lamtram/ensemble-decoder.cc

neubig commented on September 4, 2024

I looked over this and think it is a good design, and also a very good way to break the problem into pieces that can be tackled separately. @mbollmann, if you could do the first part of creating the !DummyEnsembleTranslator, then @msperber could probably help with the serialization stuff, as that part is slightly involved.

neubig commented on September 4, 2024

(P.S.: one thing though, I don't really like the naming of !Load, it should probably be a little more descriptive.)

mbollmann commented on September 4, 2024

Thanks for the outline! Sounds like a good plan to me as well, and I would really appreciate help with the serialization code then, as so far I only understood half of what @msperber said about its limitations :)

I will look into the dummy ensemble, should be a good start to figure out how to hook up the models.

mbollmann commented on September 4, 2024

So, some observations/questions:

  1. Should the ensemble translator ultimately work with both DefaultTranslator and TransformerTranslator models or even a mixture of the two? I don't really know anything about the latter, but I noticed they use quite different generate functions; also, most of the actual generation of the former takes place within some SearchStrategy object. So, I'm not sure if/how a one-size-fits-all approach could work without some major refactoring.

  2. Implementing generate for an ensemble of DefaultTranslators only, on the other hand, could be achieved by having an EnsembleDecoder, EnsembleAttender, etc. that wrap the individual models' components in a way that allows passing them to the SearchStrategy objects in the same way as before, which seems relatively simple to do (a rough sketch follows below).
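For instance, the wrapper idea might look like this. This is only a sketch; calc_log_probs is an assumed stand-in for whatever interface the SearchStrategy actually calls on a decoder:

import numpy as np

class EnsembleDecoder:
  # Wraps one decoder per model behind a single decoder-like interface.
  def __init__(self, decoders):
    self.decoders = decoders

  def calc_log_probs(self, states):
    # One state per wrapped decoder; averaging the log-probabilities lets
    # the SearchStrategy treat the ensemble like a single decoder.
    return np.mean([d.calc_log_probs(s)
                    for d, s in zip(self.decoders, states)], axis=0)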

What do you think? Am I way off here?

msperber commented on September 4, 2024

I would also say that the first goal should be ensembling only the DefaultTranslator. After implementing that, we'll probably have a better feel for whether the same approach would work for the TransformerTranslator.

Regarding (2), could you elaborate a bit more? One thing to keep in mind is that it would be good to keep the configuration files relatively simple. If I understood correctly, this approach would result in something like the following, which is probably more verbose than necessary?

  model: !DefaultTranslator
    decoder: !EnsembleDecoder
      decoders:
      - !LoadSerialized
        file: mod1.mod
        path: model.decoder
      - !LoadSerialized
        file: mod2.mod
        path: model.decoder
    attender: !EnsembleAttender
      attenders:
      - !LoadSerialized
        file: mod1.mod
        path: model.attender
      - !LoadSerialized
        file: mod2.mod
        path: model.attender

mbollmann commented on September 4, 2024

No, I was thinking that an ensemble for DefaultTranslator only could load the models, then access their attender, decoder, etc. attributes and wrap them in specialized ensemble classes to pass to a SearchStrategy for the actual output generation. We could still load full models, I would think, as long as we know they're DefaultTranslator objects.

To have an ensembling approach that is agnostic to the model architecture, we would need a function that only generates output for one timestep at a time, so that we could combine these per-step outputs (see the sketch below). SearchStrategy objects would then need to interface with this function, so we could pass them ensemble outputs as well. That would probably require major refactoring, as they currently work with the decoder, attender, etc. directly. At least this is how I see it; take it with a grain of salt, as I haven't worked with the code for that long!
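To illustrate, the interface could be as small as a single step function. All names here are assumed; each model would hide its decoder, attender, etc. behind step():

import numpy as np

def ensemble_step(models, states, prev_word):
  # Advance every model by one timestep and combine their predictions;
  # averaging log-probabilities is one common way to combine an ensemble.
  new_states, log_probs = [], []
  for model, state in zip(models, states):
    state, lp = model.step(state, prev_word)  # assumed per-model API
    new_states.append(state)
    log_probs.append(lp)
  return new_states, np.mean(log_probs, axis=0)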

msperber commented on September 4, 2024

Okay I got it, that sounds reasonable to me then!

mbollmann commented on September 4, 2024

I started an implementation of this idea in #295 but got stuck -- see there for details.

philip30 commented on September 4, 2024

Maybe this is done?

neubig commented on September 4, 2024

Implemented in #295. Thanks, @mbollmann!
