snorkel-superglue

Applying Snorkel to SuperGLUE

This repository includes a demonstration of how to use the Snorkel library to achieve a state-of-the-art score on the SuperGLUE benchmark. The specific code used to create the submission on the leaderboard is hosted in the emmental-tutorials repository. This repository contains a refactored version of that code made compatible with the Snorkel API for general exploration.

Best Reference: Blog post

Installation

To use this repository:

  1. Install snorkel (see the snorkel repo for details). This repository will be compatible with snorkel v0.9, which is being released in July, and will then install it as a package via pip. In the meantime, run the following commands from within the snorkel repository to check out the appropriate version:

    git fetch --tags
    git checkout snorkel-superglue
    
  2. In the virtual environment you created for snorkel (or a copy of it if you want to keep them separate), move back to this directory and run:

    pip install -r requirements.txt
    
  3. Set the environment variable $SUPERGLUEDATA to point to the directory where the data will be stored. We recommend using a directory called data/ at the root of the repo, which you can set by running:

    export SUPERGLUEDATA=$(pwd)/data/
    
  4. Download the SuperGLUE data by running:

    bash download_superglue_data.sh $SUPERGLUEDATA
    

This will download the data for the primary SuperGLUE tasks as well as the SWAG dataset used for pretraining COPA. To obtain the MNLI dataset for pretraining RTE and CB, we recommend referring to the starter code for the GLUE benchmark.
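
Before training, it can help to confirm that steps 3 and 4 took effect. The following is a minimal sketch, not part of this repository: check_superglue_data is a hypothetical helper name, and it only checks that $SUPERGLUEDATA is set and points to a non-empty directory. It does not validate the contents of individual task folders.

```shell
# Hypothetical helper (not provided by this repository): sanity-check
# that SUPERGLUEDATA is set and points to a non-empty directory.
check_superglue_data() {
    # Fail if the variable is unset or empty.
    if [ -z "${SUPERGLUEDATA:-}" ]; then
        echo "SUPERGLUEDATA is not set" >&2
        return 1
    fi
    # Fail if the path is not an existing directory.
    if [ ! -d "$SUPERGLUEDATA" ]; then
        echo "SUPERGLUEDATA is not a directory: $SUPERGLUEDATA" >&2
        return 1
    fi
    # Fail if the directory has no entries (download probably not run yet).
    if [ -z "$(ls -A "$SUPERGLUEDATA")" ]; then
        echo "SUPERGLUEDATA is empty; has the download script been run?" >&2
        return 1
    fi
    echo "SUPERGLUEDATA looks populated: $SUPERGLUEDATA"
}
```

After defining the function in your shell, running check_superglue_data prints a confirmation or an error message and returns a non-zero status on failure, so it can be chained in front of a training command.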

Usage

  • Tutorials on using Slicing Functions (SFs), using Transformation Functions (TFs), and pretraining with an auxiliary task are included under tutorials/. Start with the WiC_augmentation_tutorial and WiC_slicing_tutorial for gentler introductions to those concepts.
  • To train a model for one of the SuperGLUE tasks, use run.py with settings you specify, or use run.sh with the general defaults we recommend (e.g., bash run.sh CB).
  • See run.py for an example of how to add slicing functions to a run.
  • Note that the first training run will automatically download the pretrained BERT module, and that training will be very slow in general without a GPU.
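
To train several tasks back to back with the recommended defaults, a small wrapper around run.sh may be convenient. This is a sketch under assumptions: run_superglue_tasks is a hypothetical function name, it must be called from the root of this repo, and only CB is confirmed above as an argument to run.sh. Other task names (such as RTE or COPA) are assumed to be accepted in the same form.

```shell
# Hypothetical wrapper (not provided by this repository): call run.sh
# once per task name and stop at the first failure.
run_superglue_tasks() {
    local task
    for task in "$@"; do
        echo "=== Training $task ==="
        # run.sh is expected in the current directory (the repo root).
        if ! bash run.sh "$task"; then
            echo "run.sh failed for task: $task" >&2
            return 1
        fi
    done
}
```

For example, run_superglue_tasks CB RTE would train CB first and then RTE, assuming run.sh accepts RTE as a task name.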
