

ravi-mosaicml commented on August 20, 2024

Depends on #120

from composer.

hanlint commented on August 20, 2024

Adding @ajaysaini725 here to help with design ideas.


ajaysaini725 commented on August 20, 2024

I gave this some thought. To finalize a design, we need to decide exactly how to handle the placement of train_dataloader and Evaluators across Trainer.__init__, Trainer.fit(), and Trainer.eval().

This is closely connected to the discussion and design in #161.

The high-level decision looks like the following:

  • Do we plan on keeping train_dataloader and eval_dataloader as required arguments to the Trainer.__init__ or are we okay moving them to Trainer.fit()?
    • The reason is that it doesn't make sense to force the user to pass in a train_dataloader just to instantiate a Trainer and load a checkpoint when running in eval_only mode
    • If we decide to keep the Trainer.__init__ as-is, then it would make the most sense to have the eval_only mode be either:
      • A separate class that takes in the necessary input and provides a predict() function
      • A staticmethod of the Trainer class that takes in the necessary input
    • If we decide to move the dataloader arguments out of Trainer.__init__ and into Trainer.fit(), then we can have the Trainer.__init__ take an eval_only flag, have eval() take Evaluators as an argument, and have a predict() function within the Trainer class that takes a test_dataloader as an argument
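For comparison, the first alternative (keep Trainer.__init__ as-is and expose eval-only mode as a staticmethod) might look roughly like the hypothetical sketch below. The class, the `eval_only` staticmethod, and its parameters are illustrative assumptions rather than composer's API, and checkpoint loading is assumed to have already produced `model`.

```python
from typing import Any, Callable, Iterable, List

# Hypothetical sketch of option 1: Trainer.__init__ keeps its required dataloaders,
# and eval-only mode is a staticmethod. Names are illustrative, not composer's API.
Model = Callable[[Any], Any]


class Trainer:
    def __init__(self, model: Model, train_dataloader: Iterable, eval_dataloader: Iterable):
        # Constructing a Trainer still requires both dataloaders.
        self.model = model
        self.train_dataloader = train_dataloader
        self.eval_dataloader = eval_dataloader

    @staticmethod
    def eval_only(model: Model, eval_dataloader: Iterable) -> List[Any]:
        # Eval-only entry point: no Trainer instance and no train_dataloader needed.
        # Assumes `model` has already been restored from a checkpoint.
        return [model(batch) for batch in eval_dataloader]


print(Trainer.eval_only(lambda x: x * 2, [1, 2, 3]))  # [2, 4, 6]
```

The trade-off is visible in the signatures: eval-only users never touch `__init__`, but the staticmethod cannot reuse any setup that lives on a Trainer instance.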

To support interactive development as well as a clean eval_only interface, it seems to make the most sense to move train_dataloader and Evaluators out of __init__ (or at least make them optional) and make them parameters to fit() and eval(). Once we do this, we can easily add a predict() function that is consistent with the rest of the Trainer design.
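The proposal of passing dataloaders to fit() and Evaluators to eval(), plus a predict() method and an eval_only flag, could look roughly like this hypothetical sketch. `Evaluator`, the `train_step` callback, and the mean-of-metric scoring are assumptions made for illustration and are not composer's actual API.

```python
import math
from typing import Any, Callable, Dict, Iterable, List, Tuple

# Hypothetical sketch of the proposed design: dataloaders move out of __init__ and into
# fit(), eval(), and predict(). None of these names are taken from composer's actual API.
Model = Callable[[Any], Any]


class Evaluator:
    def __init__(self, label: str, dataloader: Iterable[Tuple[Any, Any]],
                 metric: Callable[[Any, Any], float]):
        self.label = label
        self.dataloader = dataloader  # yields (input, target) pairs
        self.metric = metric


class Trainer:
    def __init__(self, model: Model, eval_only: bool = False):
        # No dataloaders are needed just to construct the Trainer (e.g. to load a checkpoint).
        self.model = model
        self.eval_only = eval_only

    def fit(self, train_dataloader: Iterable[Any],
            train_step: Callable[[Model, Any], None]) -> None:
        if self.eval_only:
            raise RuntimeError("fit() is not available in eval_only mode")
        for batch in train_dataloader:
            train_step(self.model, batch)

    def eval(self, evaluators: List[Evaluator]) -> Dict[str, float]:
        # Mean metric value per evaluator; NaN if its dataloader is empty.
        results: Dict[str, float] = {}
        for ev in evaluators:
            scores = [ev.metric(self.model(x), y) for x, y in ev.dataloader]
            results[ev.label] = sum(scores) / len(scores) if scores else math.nan
        return results

    def predict(self, test_dataloader: Iterable[Any]) -> List[Any]:
        return [self.model(x) for x in test_dataloader]


trainer = Trainer(lambda x: x + 1, eval_only=True)
print(trainer.predict([1, 2]))  # [2, 3]
print(trainer.eval([Evaluator("acc", [(1, 2), (2, 4)], lambda p, y: float(p == y))]))  # {'acc': 0.5}
```

Because nothing data-related is required at construction time, the same Trainer object can be built once (for example, after restoring a checkpoint) and then driven interactively through whichever of fit(), eval(), or predict() the user needs.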

Thoughts?


ravi-mosaicml commented on August 20, 2024

One of the main issues with moving arguments out of __init__ is that create_from_hparams would need to store the arguments that would be passed into fit somewhere. We could also refactor the yaml and have something else call Trainer.__init__ and Trainer.fit() for jobs created from hparams. I think if we went this route, we'd need to resolve the issue of multiple calls to fit.

For eval only, what do you think about making a separate class for evaluation (say, by extending the evaluator that is part of #120)? I'm thinking that the eval method currently in the trainer would move there. The trainer would then call self.evaluator.eval() or similar to perform evaluation. A user who only wants to evaluate would just create an evaluator without instantiating a trainer. The evaluator and trainer would share the same model (passed by reference).
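A minimal sketch of the separate-evaluator idea, assuming a toy `ScaleModel` and a fake training step (both hypothetical, as are the class and method names): eval() lives on the Evaluator, the Trainer delegates to it, and because both hold a reference to the same model object, updates made during fit() are visible to the evaluator.

```python
from typing import Iterable, List

# Hypothetical sketch of the "separate evaluator" proposal: eval() lives on an
# Evaluator class, and the Trainer delegates to it. Not composer's actual API.


class ScaleModel:
    """Toy model: multiplies its input by a learnable scale."""

    def __init__(self, scale: float):
        self.scale = scale

    def __call__(self, x: float) -> float:
        return self.scale * x


class Evaluator:
    def __init__(self, model: ScaleModel, dataloader: Iterable[float]):
        self.model = model  # the same object the trainer holds, not a copy
        self.dataloader = dataloader

    def eval(self) -> List[float]:
        return [self.model(x) for x in self.dataloader]


class Trainer:
    def __init__(self, model: ScaleModel, evaluator: Evaluator):
        self.model = model
        self.evaluator = evaluator

    def fit(self, train_dataloader: Iterable[object]) -> None:
        for _batch in train_dataloader:
            self.model.scale += 1.0  # stand-in for a real optimizer step

    def eval(self) -> List[float]:
        # Evaluation is delegated rather than implemented on the Trainer.
        return self.evaluator.eval()


model = ScaleModel(1.0)
evaluator = Evaluator(model, [1.0, 2.0])
print(evaluator.eval())  # eval-only use, no Trainer: [1.0, 2.0]

trainer = Trainer(model, evaluator)
trainer.fit([None, None])  # two "steps": scale becomes 3.0
print(trainer.eval())  # the evaluator sees the updated model: [3.0, 6.0]
```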


ajaysaini725 commented on August 20, 2024


We could go with the separate Evaluator class approach you mentioned since it works for this use case but I have a few concerns.

  1. It deviates quite a bit from the pattern established by other trainers like PyTorch Lightning and Hugging Face, and might not be what people are used to. In general, maybe we shouldn't reinvent the wheel too much for things like a Trainer API that already have something of a standard.
  2. The eval() function still has some non-trivial setup logic that is tied to the Trainer itself (e.g., around handling data parallelism). Instantiating the evaluator separately may mean duplicating that logic.
  3. When running eval() on multiple evaluators, we want to loop over the dataset once and run all evaluators on each batch. This is hard to do if eval() lives in the Evaluator class.
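Point 3 amounts to a single-pass loop: one forward pass per batch, with every metric updated from that shared prediction. The sketch below assumes the evaluators share one dataloader, treats each `(x, y)` pair as a batch, and models each evaluator as a plain metric function; `eval_single_pass` and `counting_model` are hypothetical names.

```python
import math
from typing import Any, Callable, Dict, Iterable, Tuple

# Hypothetical sketch of point 3: run several evaluation metrics in a single pass over a
# shared dataloader, doing one forward pass per batch. Names are illustrative only.
Metric = Callable[[Any, Any], float]


def eval_single_pass(model: Callable[[Any], Any],
                     dataloader: Iterable[Tuple[Any, Any]],
                     metrics: Dict[str, Metric]) -> Dict[str, float]:
    totals = {name: 0.0 for name in metrics}
    num_batches = 0
    for x, y in dataloader:
        pred = model(x)  # computed once and reused by every metric
        for name, metric in metrics.items():
            totals[name] += metric(pred, y)
        num_batches += 1
    if num_batches == 0:
        return {name: math.nan for name in metrics}
    return {name: total / num_batches for name, total in totals.items()}


calls = []


def counting_model(x: int) -> int:
    calls.append(x)  # record each forward pass
    return x + 1


results = eval_single_pass(
    counting_model,
    [(1, 2), (2, 4)],
    {"exact_match": lambda p, y: float(p == y), "abs_error": lambda p, y: float(abs(p - y))},
)
print(results)  # {'exact_match': 0.5, 'abs_error': 0.5}
print(len(calls))  # 2: one forward pass per batch, not one per metric
```

If each Evaluator owned its own eval() loop instead, the same data would be iterated (and the model run) once per evaluator.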

Regarding the issue with create_from_hparams, one thing we can do is keep the dataloaders in trainer_hparams but have create_from_hparams ignore them. Then we can change our run_mosaic_trainer script to extract the dataloaders from trainer_hparams and pass them to fit(). This deviates from our standard use of hparams, since trainer_hparams would contain values that create_from_hparams doesn't use, but it enables an API change that makes the non-hparams codepath much better. As a note, the reason for leaving the dataloaders in trainer_hparams is so that yahp can parse the entire yaml at once into a single class (it would be too verbose to extract, in code, the specific parts of the yaml, like the dataloaders, that create_from_hparams doesn't need).
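The hparams workaround could be sketched as follows, under loud assumptions: `TrainerHparams` is a plain dataclass standing in for the yahp-parsed class, plain lists stand in for dataloader hparams (in practice these would be turned into real dataloaders), and `run_trainer` is a hypothetical stand-in for the run_mosaic_trainer script.

```python
from dataclasses import dataclass
from typing import Any, Iterable, List

# Hypothetical sketch of the proposal: trainer_hparams still carries the dataloaders so
# the whole yaml parses into one object, but create_from_hparams ignores them and the
# run script passes them to fit(). Field and function names are illustrative only.


@dataclass
class TrainerHparams:
    max_epochs: int
    train_dataloader: List[Any]  # stand-in for dataloader hparams parsed from the yaml
    eval_dataloader: List[Any]


class Trainer:
    def __init__(self, max_epochs: int):
        self.max_epochs = max_epochs
        self.steps = 0

    @classmethod
    def create_from_hparams(cls, hparams: TrainerHparams) -> "Trainer":
        # Deliberately ignores the dataloader fields.
        return cls(max_epochs=hparams.max_epochs)

    def fit(self, train_dataloader: Iterable[Any]) -> None:
        for _epoch in range(self.max_epochs):
            for _batch in train_dataloader:
                self.steps += 1  # stand-in for a training step


def run_trainer(hparams: TrainerHparams) -> Trainer:
    """Stand-in for the run_mosaic_trainer script."""
    trainer = Trainer.create_from_hparams(hparams)
    trainer.fit(hparams.train_dataloader)
    return trainer


trainer = run_trainer(TrainerHparams(max_epochs=2, train_dataloader=[1, 2, 3], eval_dataloader=[]))
print(trainer.steps)  # 6
```

The only place that knows the dataloaders live in the hparams is the run script; code that builds a Trainer directly never has to see them.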

Thoughts?

