Meta-Learning for Compositionality (MLC) for modeling human behavior

Meta-Learning for Compositionality (MLC) is an optimization procedure that encourages systematicity through a series of few-shot compositional tasks. This code shows how to train and evaluate a sequence-to-sequence (seq2seq) transformer in PyTorch to implement MLC for modeling human behavior.
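To make the episode structure concrete, here is a minimal sketch of how a few-shot compositional episode can be packed into a single source sequence. The tokens and delimiters below are purely illustrative; the repo's actual encoding lives in datasets.py:

    # Illustrative episode encoding (hypothetical tokens and delimiters; see
    # datasets.py for the repo's real format). The study examples and the
    # query share one source sequence, so the network must infer the
    # episode-specific grammar in-context.
    study_examples = [("dax", "RED"), ("lug", "BLUE"), ("dax fep", "RED RED RED")]
    query = "lug fep"

    source = " | ".join(f"{inp} -> {out}" for inp, out in study_examples)
    source = f"{source} | {query}"
    print(source)
    # dax -> RED | lug -> BLUE | dax fep -> RED RED RED | lug fep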

A separate repository here has code for applying MLC to machine learning benchmarks, including SCAN and COGS.

Note: Users can regard the acronym BIML as synonymous with MLC. The approach was renamed to MLC after the code was written.

This code accompanies the following submitted paper.

  • Lake, B. M. and Baroni, M. (submitted). Human-like systematic generalization through a meta-learning neural network.

You can email brenden AT nyu.edu if you would like a copy.

Credits

This repo borrows from the excellent PyTorch seq2seq tutorial.

Requirements

Python 3 with the following packages (installation takes only a few minutes): torch (PyTorch), sklearn (scikit-learn), numpy, matplotlib

The specific versions used for development: Python (3.7.9), PyTorch (1.10.1), sklearn (0.24.2), numpy (1.21.5), matplotlib (3.3.2)

Downloading data and pre-trained models

Meta-training data
To get the episodes used for meta-training, you should download the following zip file with the 100K meta-training episodes. Please extract data_algebraic.zip such that data_algebraic is a sub-directory of the main repo.

Pre-trained models
To get the top pre-trained models, you should download the following zip file. Please extract BIML_top_models.zip such that out_models is a sub-directory of the main repo and contains the model files net-BIML-*.pt.

Evaluating models

There are many different ways to evaluate a model after training, each of which should take less than a minute on a standard desktop. Here are a few examples.

Generating algebraic outputs on few-shot learning task

Here we find the best response from the pre-trained MLC model using greedy decoding:

python eval.py --max --episode_type few_shot_gold --fn_out_model net-BIML-top.pt --verbose
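The --max flag corresponds to greedy decoding. As a rough sketch of what that means for a seq2seq transformer (the encode/decode method names here are hypothetical, not eval.py's actual interface):

    import torch

    def greedy_decode(model, src, sos_id, eos_id, max_len=50):
        """Greedy decoding sketch: at each step, keep the single
        highest-probability next token and stop at EOS. The model
        interface (encode/decode) is hypothetical."""
        model.eval()
        ys = torch.tensor([[sos_id]])  # running output sequence, shape (1, T)
        with torch.no_grad():
            memory = model.encode(src)             # assumed encoder call
            for _ in range(max_len):
                logits = model.decode(ys, memory)  # assumed; shape (1, T, V)
                next_id = logits[0, -1].argmax().item()
                ys = torch.cat([ys, torch.tensor([[next_id]])], dim=1)
                if next_id == eos_id:
                    break
        return ys.squeeze(0).tolist()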

Evaluating human responses on few-shot learning task (using log-likelihood)

Here we evaluate the log-likelihood of the human data:

python eval.py --ll --ll_nrep 100 --episode_type few_shot_human --ll_p_lapse 0.03 --fn_out_model net-BIML-top.pt

To evaluate the log-likelihood of all models and to reproduce Table 1 in the manuscript, you can run this command for the various models. Please see the table below for how to set the arguments in each case. Note that due to system/version differences, the log-likelihood values may vary in minor ways from the paper.

  --fn_out_model              --ll_p_lapse   --episode_type
  net-basic-seq2seq-top.pt    0.9            human_vanilla
  net-BIML-copy-top.pt        0.5            few_shot_human
  net-BIML-algebraic-top.pt   0.1            few_shot_human
  net-BIML-joint-top.pt       0.03           few_shot_human
  net-BIML-top.pt             0.03           few_shot_human
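The --ll_p_lapse argument matches the standard lapse-rate formulation, in which each response token comes from the model with probability 1 - p_lapse and from a uniform distribution otherwise. A minimal sketch of that mixture (the repo's exact parameterization may differ):

    import numpy as np

    def lapse_log_likelihood(token_probs, p_lapse, vocab_size):
        """Log-likelihood under a per-token lapse mixture: with probability
        (1 - p_lapse) emit the model's token, otherwise a uniform draw over
        the vocabulary. Standard formulation; eval.py's details may differ."""
        p_mix = (1.0 - p_lapse) * np.asarray(token_probs) + p_lapse / vocab_size
        return float(np.log(p_mix).sum())

    # Example: model probabilities for four observed tokens, 3% lapse rate,
    # and a hypothetical 10-symbol vocabulary.
    print(lapse_log_likelihood([0.9, 0.8, 0.95, 0.7], p_lapse=0.03, vocab_size=10))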

Sampling model responses for the few-shot learning task

The models can be asked to mimic human responses on few-shot learning. To do so, the models sample from their distribution of possible outputs. A full set of samples from the models is available on this webpage. To reproduce the results for MLC (or other models), you can type the following to generate an HTML page.

python eval.py --episode_type few_shot_human_mult10 --sample_html --fn_out_model net-BIML-top.pt
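Sampling differs from the greedy decoding sketched earlier only at the choice of each next token: a draw from the softmax distribution replaces the argmax, so repeated runs vary the way human responses do. A self-contained toy illustration:

    import torch

    # Toy logits for a 4-token vocabulary (illustrative values only).
    logits = torch.tensor([2.0, 1.0, 0.5, -1.0])
    probs = torch.softmax(logits, dim=-1)

    greedy_id = probs.argmax().item()                # deterministic choice
    sampled_id = torch.multinomial(probs, 1).item()  # varies run to run
    print(greedy_id, sampled_id)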

Then, after ensuring the right file name is listed under __main__ in the script analysis_few_shot.py, you can compare the human and machine mistakes:

cd html_output/few_shot_human_mult10
python analysis_few_shot.py

This should reproduce the hosted HTML file and numbers reported in the paper. Small variations may arise through version differences. Here is a snippet of the HTML and the text output.
[HTML snippet: few-shot learning task]

Human responses (item accuracy):
   DAX after 1 : 86.364
  ...
  mean overall acc.: 80.739
  mean acc. on simple queries: 85.479
  mean acc. on complex queries: 76.0
  mean acc. on len=6 queries: 72.5
  perc. of errors that are one2one:  24.39 ; 10 of 41
  perc. of errors (involving "after") that are iconic :  23.333 ; 7 of 30
Model responses (item accuracy):
   DAX after 1 : 85.909
  ...
  mean overall acc.: 82.376
  mean acc. on simple queries: 86.252
  mean acc. on complex queries: 78.5
  mean acc. on len=6 queries: 77.75
  perc. of errors that are one2one:  56.267 ; 211 of 375
  perc. of errors (involving "after") that are iconic :  13.83 ; 39 of 282

Correlation for item accuracies: r= 0.788 ; p= 0.007
Generating HTML file: human_few_shot_behavior.html
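The reported r and p values are consistent with a Pearson correlation over the item-level accuracies. One way to compute such a correlation (the accuracy values below are made up for illustration; the real ones come from analysis_few_shot.py):

    from scipy.stats import pearsonr  # scipy ships as a scikit-learn dependency

    # Illustrative item-level accuracies for humans and the model.
    human_acc = [86.4, 80.7, 85.5, 76.0, 72.5, 90.1, 68.3, 74.2, 81.0, 79.5]
    model_acc = [85.9, 82.4, 86.3, 78.5, 77.8, 88.0, 70.1, 75.9, 83.2, 80.0]

    r, p = pearsonr(human_acc, model_acc)
    print(f"r = {r:.3f}, p = {p:.3f}")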

Sampling model responses for the open-ended task

The models can be asked to mimic human responses on the open-ended task. Again, a full set of samples from the models is available on this webpage. To reproduce the results for MLC, you can run the commands below to generate an HTML page.

python eval.py --episode_type open_end_freeform --sample_iterative --fn_out_model net-BIML-open-ended-top.pt

Then, after ensuring the right file name is listed under __main__ in the script analysis_freeform.py, you can compare the human and machine mistakes:

cd html_output/open_end_freeform
python analysis_freeform.py

This should reproduce the hosted HTML file and numbers reported in the paper. Small variations may arise through version differences. Here is a snippet of the HTML and the text output.
[HTML snippet: open-ended task]

Human:
   Processing 29 human participants.
   Percent with perfect maps (consistent with 3 inductive biases): 58.621 ; N= 17 of 29
   Percent with one2one maps: 62.069 ; N= 18 of 29
   Percent with iconic concatenation: 79.31 ; N= 23 of 29
   Percent with ME maps: 93.103 ; N= 27 of 29
Model:
   Processing 100 model samples.
   Percent with perfect maps (consistent with 3 inductive biases): 65.0 ; N= 65 of 100
   Percent with one2one maps: 66.0 ; N= 66 of 100
   Percent with iconic concatenation: 85.0 ; N= 85 of 100
   Percent with ME maps: 99.0 ; N= 99 of 100
Generating HTML file: human_open_end_freeform.html
Generating HTML file: open_end_freeform_net-BIML-open-ended-top.html
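For reference, here is a sketch of what the three inductive-bias checks might look like. This is one reading of the definitions (one-to-one word-to-symbol mappings, iconic left-to-right concatenation, and mutual exclusivity); analysis_freeform.py remains the authoritative implementation:

    def check_biases(word_to_resp, seq_to_resp):
        """Sketch of the three bias checks; analysis_freeform.py is the
        authoritative version.

        word_to_resp: dict mapping each instruction word to its response
                      (a list of output symbols).
        seq_to_resp:  dict mapping multi-word instructions (tuples of
                      words) to their responses.
        """
        # One-to-one: every word maps to exactly one output symbol.
        one2one = all(len(resp) == 1 for resp in word_to_resp.values())
        # Iconic concatenation: a sequence's response is the left-to-right
        # concatenation of its words' responses.
        iconic = all(
            resp == [sym for w in words for sym in word_to_resp[w]]
            for words, resp in seq_to_resp.items()
        )
        # Mutual exclusivity (ME): distinct words get distinct responses.
        resps = [tuple(r) for r in word_to_resp.values()]
        me = len(set(resps)) == len(resps)
        return one2one, iconic, me

    # Hypothetical participant: three words plus one composed instruction.
    words = {"dax": ["RED"], "wif": ["GREEN"], "lug": ["BLUE"]}
    seqs = {("dax", "wif"): ["RED", "GREEN"]}
    print(check_biases(words, seqs))  # (True, True, True)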

The full set of evaluation arguments can be viewed by typing python eval.py -h:

optional arguments:
  -h, --help            show this help message and exit
  --fn_out_model FN_OUT_MODEL
                        *REQUIRED*. Filename for loading the model
  --dir_model DIR_MODEL
                        Directory for loading the model file
  --max_length_eval MAX_LENGTH_EVAL
                        Maximum generated sequence length
  --batch_size BATCH_SIZE
                        Number of episodes in batch
  --episode_type EPISODE_TYPE
                        What type of episodes do we want? See datasets.py for
                        options
  --dashboard           Showing loss curves during training.
  --ll                  Evaluate log-likelihood of validation (val) set
  --max                 Find best outputs for val commands (greedy decoding)
  --sample              Sample outputs for val commands
  --sample_html         Sample outputs for val commands in html format (using
                        unmap to canonical text)
  --sample_iterative    Sample outputs for val commands iteratively
  --fit_lapse           Fit the lapse rate
  --ll_nrep LL_NREP     Evaluate each episode this many times when computing
                        log-likelihood (needed for stochastic remappings)
  --ll_p_lapse LL_P_LAPSE
                        Lapse rate when evaluating log-likelihoods
  --verbose             Inspect outputs in more detail

Episode types

Please see datasets.py for the full set of options. Here are a few key episode types that can be set via --episode_type:

  • algebraic+biases : For meta-training. Corresponds to "MLC" in Table 1 and main results
  • algebraic_noise : For meta-training. Corresponds to "MLC (algebraic only)" in Table 1 and main results
  • retrieve : For meta-training. Corresponds to "MLC (copy only)" in Table 1 and main results
  • few_shot_gold : For evaluating MLC on the prescribed algebraic responses for the few-shot learning task. (test only)
  • few_shot_human : For evaluating MLC on predicting human responses for the few-shot learning task. (test only)
  • few_shot_human_mult10 : For evaluating MLC on predicting human responses for the few-shot learning task (human data up-sampled/repeated 10x). (test only)
  • open_end_freeform : For generating MLC responses on the open-ended task. Here, the models iteratively fill out responses one-by-one; a sketch of this loop follows the list. (test only)
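The iterative filling that open_end_freeform performs can be pictured as the loop below, where each sampled (query, response) pair is committed to the context before the next query is answered. The helper names are hypothetical; eval.py's --sample_iterative is the real implementation:

    def sample_iteratively(model, queries, sample_fn):
        """Sketch of iterative response filling (hypothetical helper; see
        eval.py's --sample_iterative for the real one). Committing each
        sampled pair to the context lets later responses stay consistent
        with earlier ones."""
        context = []  # grows as (query, response) pairs are committed
        for q in queries:
            resp = sample_fn(model, context, q)  # sample one response in context
            context.append((q, resp))
        return context

    # Toy demo with a fake sampler that just upper-cases the query.
    demo = sample_iteratively(None, ["dax", "wif"], lambda m, ctx, q: q.upper())
    print(demo)  # [('dax', 'DAX'), ('wif', 'WIF')]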

Training models from scratch

To train MLC on few-shot learning (as in the MLC model in Fig. 2 and Table 1), you can run the train command with default arguments:

python train.py --episode_type algebraic+biases --fn_out_model net-BIML.pt

which will produce a file out_models/net-BIML.pt.

The full set of training arguments can be viewed with python train.py -h:

optional arguments:
  -h, --help            show this help message and exit
  --fn_out_model FN_OUT_MODEL
                        *REQUIRED* Filename for saving model checkpoints.
                        Typically ends in .pt
  --dir_model DIR_MODEL
                        Directory for saving model files
  --episode_type EPISODE_TYPE
                        What type of episodes do we want? See datasets.py for
                        options
  --batch_size BATCH_SIZE
                        number of episodes per batch
  --nepochs NEPOCHS     number of training epochs
  --lr LR               learning rate
  --lr_end_factor LR_END_FACTOR
                        factor X for decrease learning rate linearly from
                        1.0*lr to X*lr across training
  --no_lr_warmup        Turn off learning rate warm up (by default, we use 1
                        epoch of warm up)
  --nlayers_encoder NLAYERS_ENCODER
                        number of layers for encoder
  --nlayers_decoder NLAYERS_DECODER
                        number of layers for decoder
  --emb_size EMB_SIZE   size of embedding
  --ff_mult FF_MULT     multiplier for size of the fully-connected layer in
                        transformer
  --dropout DROPOUT     dropout applied to embeddings and transformer
  --act ACT             activation function in the fully-connected layer of
                        the transformer (relu or gelu)
  --save_best           Save the "best model" according to validation loss.
  --save_best_skip SAVE_BEST_SKIP
                        Do not bother saving the "best model" for this
                        fraction of early training
  --resume              Resume training from a previous checkpoint
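The --lr, --lr_end_factor, and --no_lr_warmup help text describes a schedule with one epoch of linear warm-up followed by linear decay from 1.0*lr down to lr_end_factor*lr. A minimal sketch of such a schedule in PyTorch 1.10+ (train.py's own implementation may differ in its exact step granularity):

    import torch
    from torch.optim.lr_scheduler import LinearLR, SequentialLR

    # Stand-in model and optimizer; hyperparameter values are illustrative.
    model = torch.nn.Linear(8, 8)
    opt = torch.optim.SGD(model.parameters(), lr=0.001)

    steps_per_epoch, nepochs = 100, 50
    warmup = LinearLR(opt, start_factor=1e-3, end_factor=1.0,
                      total_iters=steps_per_epoch)  # 1 epoch of warm-up
    decay = LinearLR(opt, start_factor=1.0, end_factor=0.05,  # lr_end_factor
                     total_iters=steps_per_epoch * (nepochs - 1))
    sched = SequentialLR(opt, [warmup, decay], milestones=[steps_per_epoch])

    for _ in range(steps_per_epoch * nepochs):
        opt.step()    # the actual training step would go here
        sched.step()  # advance the warm-up/decay schedule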
