Code Monkey home page Code Monkey logo

mend's Introduction

MEND: Model Editing Networks using Gradient Decomposition

If you run into any issues with the code, you can open an issue and/or email me at [email protected]

Setup

Environment

This codebase uses Python 3.7.9. Other versions may work as well.

Create a virtualenv (pyenv can help with this) and install the dependencies:

$ python -m venv env
$ source env/bin/activate
(env) $ pip install -r requirements.txt

Data

You can download the data needed for this project from this Google Drive link. Unzip each sub-directory into mend/data and you should be good to go.

Running the code

Run MEND training/evaluation for distilGPT-2 on the wikitext editing problem with:

(env) $ python -m run +alg=mend +experiment=gen +model=distilgpt2 data.wiki_webtext=False

Other valid algs include efk (KnowledgeEditor) and enn (Editable Neural Networks). Other valid experiments include fc (FEVER fact checking) and qa (zsRE question-answering). Splits, rephrases, and pre-trained BERT and BART models required for running fc and qa, respectively, come from De Cao et. al (see repo here). Check config/model for options for editable models (note that all models don't work for all experiments; GPT-style models only work with gen, seq2seq models only work with qa, and BERT only works with fc).

Also note that in the paper, we sample locality data from different datasets depending on the model. By default, training will use Natural Questions data (not zsRE data) for computing drawdown in the qa experiment and OpenWebText. For models such as the distilgpt2 model we use (which was fine-tuned on wikitext) or the BART-base model, this behavior should be disabled with data.wiki_webtext=False or data.zsre_nq=False, respectively.

Multi-edit experiments

For multi-edit experiments, it's important to configure batch sizing correctly. In order to run training & evaluation with 5 edits, for example, we pass the arguments data.n_edits=5 batch_size=6 val_batch_size=6.

This convention is interpreted as using batches of size 6 during training and validation, with 5 of those batch elements being used to apply edits to the model and the remaining (1) example used to compute drawdown.

Citing the paper

If this code or paper was useful, please consider using the following citation:

@inproceedings{mitchell2022fast,
    title={Fast Model Editing at Scale},
    author={Eric Mitchell and Charles Lin and Antoine Bosselut and Chelsea Finn and Christopher D Manning},
    booktitle={International Conference on Learning Representations},
    year={2022},
    url={https://openreview.net/pdf?id=0DcZxeWfOPt}
}

mend's People

Contributors

eric-mitchell avatar xxupiano avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.