

egpbos avatar egpbos commented on June 22, 2024

Ah yeah, I was worried about that parse there, but didn't yet see how it would go wrong :)

It turns out there are now module-level "properties" (not really properties, but still pretty awesome; I didn't know about this until just now!), so I would go with that. See PEP 562: https://www.python.org/dev/peps/pep-0562/

So for config.py, replace the args, ... line with something like:

def __getattr__(name):
    """Only trigger parsing when it's needed."""
    if name == 'args':
        args, unknown_args = parser.parse_known_args()
        return args
    raise AttributeError(f"module '{__name__}' has no attribute '{name}'")
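
To make the lazy-parsing behaviour concrete, here is a minimal, self-contained demo of PEP 562 module-level __getattr__ (the module name lazyconfig and the --data_root default are illustrative stand-ins for the real config.py; stdlib argparse stands in for ConfigArgParse):

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical stand-in for config.py, written to a temp dir so the demo
# is self-contained; the real module would define its parser the same way.
module_source = textwrap.dedent("""
    import argparse

    parser = argparse.ArgumentParser()
    parser.add_argument('--data_root', default='/data')

    def __getattr__(name):
        # Only trigger parsing when 'args' is actually accessed (PEP 562).
        if name == 'args':
            args, _unknown = parser.parse_known_args([])
            return args
        raise AttributeError(f"module '{__name__}' has no attribute '{name}'")
""")

tmpdir = tempfile.mkdtemp()
Path(tmpdir, "lazyconfig.py").write_text(module_source)
sys.path.insert(0, tmpdir)

lazyconfig = importlib.import_module("lazyconfig")
# Importing alone did not parse; parsing happens on first attribute access:
print(lazyconfig.args.data_root)  # -> /data
```

Note that the demo passes an explicit empty list to parse_known_args so it is deterministic; the real config.py would parse sys.argv.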

from platalea.

bhigy avatar bhigy commented on June 22, 2024

I tried that but __getattr__ is called before my main script adds its own arguments.


egpbos avatar egpbos commented on June 22, 2024

Ah yeah, that would happen when another module (that you import from your script probably) uses config.args or generally if config.args is accessed anywhere before all arguments are added to the parser... Can you find out where that happens and whether we can avoid that access?
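
The failure mode can be reproduced in a few lines (stdlib argparse as a stand-in for ConfigArgParse; the comment about dataset.py marks a hypothetical import-time access): any parse that happens before the run script registers its own arguments produces a namespace that lacks them.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--data_root', default='/data')

# What a module like dataset.py would effectively do at import time:
args, _ = parser.parse_known_args([])  # parsing happens here...

# ...so an argument the run script adds later is too late:
parser.add_argument('--epochs', type=int, default=32)
print(hasattr(args, 'epochs'))  # -> False: the parsed namespace lacks it
```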


bhigy avatar bhigy commented on June 22, 2024

It is in dataset.py. We might be able to avoid the issue here, but I feel like we might run into this issue again with the current setup. I was wondering whether an alternative could be to use two different parsers: one for general-level config, such as the dataset path, which wouldn't have help associated with it (we could document those parameters somewhere else), and another one for the script-specific parameters that would appear in the help message.


egpbos avatar egpbos commented on June 22, 2024

Hm, then it wouldn't give any help message if you run a script that doesn't add the second parser.

Another option would be to just put all parameters in one file, even though we only use them in specific runs. We can indicate these experiment-specific parameters in the help section using groups: https://docs.python.org/3/library/argparse.html#argument-groups

So for instance, --epochs, which I added to basic-stack/run.py, should then go in this experiment-parameter group in __init__.py (or maybe we can move it to config.py since we don't need the named singleton parser anymore, we can just make it a simple member of config):

parser = configargparse.get_argument_parser(name='platalea')

# [... current options there ...]

experiment_arguments = parser.add_argument_group('experiment arguments', "Some experiment run scripts use the following optional parameters.")

experiment_arguments.add_argument('--epochs', action='store', default=32, dest='epochs', type=int,
                                  help='number of epochs after which to stop training (default: 32)')
# [... etc ...]

The help message is then:

python experiments/basic-stack/run.py -h
usage: run.py [-h] [--data_root DATA_ROOT] [--meta META] [--audio_features_fn AUDIO_FEATURES_FN]
              [--audio_subdir AUDIO_SUBDIR] [--image_subdir IMAGE_SUBDIR] [--epochs EPOCHS]

Args that start with '--' (eg. --data_root) can also be set in a config file (config.ini or config.yml). Config file
syntax allows: key=value, flag=true, stuff=[a,b,c] (for details, see syntax at https://goo.gl/R74nmi). If an arg is
specified in more than one place, then commandline values override environment variables which override config file
values which override defaults.

optional arguments:
  -h, --help            show this help message and exit
  --data_root DATA_ROOT
                        location of the flickr8k (or similar) dataset [env var: PLATALEA_DATA_ROOT] (default:
                        /roaming/gchrupal/datasets/flickr8k/)
  --meta META           filename of the metadata file (dataset.json or similar) relative to the dataset location [env
                        var: PLATALEA_METADATA_JSON] (default: dataset_multilingual.json)
  --audio_features_fn AUDIO_FEATURES_FN
                        filename of the MFCC audio features file relative to the dataset location [env var:
                        PLATALEA_AUDIO_FEATURES_FN] (default: mfcc_delta_features.pt)
  --audio_subdir AUDIO_SUBDIR
                        directory containing the flickr8k wav files, relative to the dataset location [env var:
                        PLATALEA_AUDIO_SUBDIR] (default: flickr_audio/wavs/)
  --image_subdir IMAGE_SUBDIR
                        directory containing the flickr8k image files, relative to the dataset location [env var:
                        PLATALEA_IMAGE_SUBDIR] (default: Flickr8k_Dataset/Flicker8k_Dataset/)

experiment arguments:
  Some experiment run scripts use the following optional parameters.

  --epochs EPOCHS       number of epochs after which to stop training (default: 32) (default: 32)

A downside of this approach is that we may at some point end up with a huge bunch of experiment arguments, but we can always reevaluate once this happens. Also, we may need slightly different variants/interpretations for different experiments. However, if we document this properly, it shouldn't be a big issue.

What do you think?


egpbos avatar egpbos commented on June 22, 2024

Btw, this approach also means we can get rid of the perhaps slightly overly complicated __getattr__ trick again :P I love it, but sometimes you have to kill your darlings ;)


bhigy avatar bhigy commented on June 22, 2024

I am not really fond of putting all parameters in one file, for two main reasons:

  • it will quickly make the help unreadable
  • while some parameters (e.g. number of epochs) apply to most cases, some are specific to one script. I think things will get very confusing because of this.

I would rather go for one of the following options:

  1. have only the important parameters for each script in the help, and make the general configuration elements visible somewhere else (e.g. through python config.py -h)
  2. keep the __getattr__ trick and remove the accesses to the parameters in function definitions, so that the script can add its own parameters before the help is displayed
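
Option 1 could be sketched roughly like this (a hypothetical illustration, using stdlib argparse as a stand-in for ConfigArgParse): the general parser is created with add_help=False so only the script parser's options appear in -h, and parse_known_args lets each parser ignore the other's flags.

```python
import argparse

# "config.py": general, rarely-changed settings; no -h of its own here
# (running `python config.py -h` could expose them separately).
general_parser = argparse.ArgumentParser(add_help=False)
general_parser.add_argument('--data_root', default='/data')

# "run.py": script-specific parser, the one whose help users see.
script_parser = argparse.ArgumentParser()
script_parser.add_argument('--epochs', type=int, default=32)

# Each parser only consumes the flags it knows about:
script_args, rest = script_parser.parse_known_args(
    ['--epochs', '8', '--data_root', '/scratch'])
general_args, _ = general_parser.parse_known_args(rest)

print(script_args.epochs, general_args.data_root)  # -> 8 /scratch
```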


egpbos avatar egpbos commented on June 22, 2024

After discussion, I think we came up with the following idea, please correct me if I misunderstood:

We create a new class that internally uses the ConfigArgParse parser, but by default does not parse, it only parses when the user asks it to. Moreover, it exposes the default parameters so that we can still use those when parsing has not taken place. When parsing is activated, the class updates the parameters from their default values to the values from the parser. The parser has a set of default options included (those that are used inside the main platalea module, like data_root and filenames for feature files), but specific options can be added for each experiment.

The interface and use will be something like this:

class PlataleaConfig:
  def __init__(self):
    # setup ConfigArgParse parser and default parameters in self._stuff dicts

  def add_argument(self, *args, **kwargs):
    self._parser.add_argument(*args, **kwargs)

  def parse(self):
    args, _ = self._parser.parse_known_args()
    # update default parameters dict(s) using args results

  def __getattr__(self, arg_name):
    return self._stuff[arg_name]
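
Filling in the sketch, a runnable version could look like this (stdlib argparse stands in for ConfigArgParse, and names like _args are illustrative guesses, not the final implementation). The key point is that add_argument records each argument's default, so attributes are usable before parse() is ever called:

```python
import argparse

class PlataleaConfig:
    def __init__(self):
        self._parser = argparse.ArgumentParser()
        self._args = {}  # argument name -> current value (defaults until parse())

    def add_argument(self, *args, **kwargs):
        # argparse returns the Action, which carries dest and default
        action = self._parser.add_argument(*args, **kwargs)
        self._args[action.dest] = action.default

    def parse(self, arglist=None):
        # Only parses when explicitly asked; unknown args are tolerated
        parsed, _unknown = self._parser.parse_known_args(arglist)
        self._args.update(vars(parsed))

    def __getattr__(self, arg_name):
        try:
            return self._args[arg_name]
        except KeyError:
            raise AttributeError(arg_name) from None

config = PlataleaConfig()
config.add_argument('--epochs', type=int, default=32)

print(config.epochs)             # default is usable without parsing
config.parse(['--epochs', '8'])
print(config.epochs)             # updated after an explicit parse()
```

Raising AttributeError (rather than leaking KeyError) from __getattr__ keeps hasattr() and friends working as expected.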


bhigy avatar bhigy commented on June 22, 2024

Yes, that's it.

