eliciting-latent-sentiment

Disclaimers

This is research code and we sincerely apologise for not having time to neatly package it up. The most friendly and reusable code can be found in the utils directory. We will try to draw attention to other key scripts here which might be of use to others. The Dockerfile should be sufficient to run most of this code. We forked CircuitsVis as a sub-module in order to make a couple of small changes used to generate cleaner figures for this paper.

Acknowledgements

Stanford Sentiment Treebank was downloaded from https://nlp.stanford.edu/sentiment/index.html

We are very grateful to Neel Nanda for his mentorship and the transformer-lens library.

SERI MATS provided funding for this research.

Cached sentiment direction

The data/gpt2-small directory contains a precomputed residual stream direction for each layer, fitted using DAS on the "simple_train" dataset and stored as a numpy file.
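As a rough illustration of how a cached direction might be used, the sketch below loads a numpy file and rescales it to unit length. The helper name load_direction is hypothetical and not part of this repository, and the flattening and normalisation steps are assumptions: the stored array may already be a unit-norm vector of length d_model.

```python
import numpy as np


def load_direction(path):
    """Load a cached direction and return it as a unit-norm 1-D vector.

    Hypothetical helper: assumes the file holds a single direction that can
    be flattened to shape (d_model,).
    """
    direction = np.asarray(np.load(path), dtype=np.float64).reshape(-1)
    norm = np.linalg.norm(direction)
    if norm == 0:
        raise ValueError(f"direction stored in {path} has zero norm")
    return direction / norm
```

The path passed in would be one of the per-layer files in data/gpt2-small; the exact file name for a given layer should be checked against the directory contents.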

Training the sentiment direction

In fit_directions.py, you can specify

  • a list of models (e.g. gpt2-small)
  • a list of methods (e.g. das, kmeans, logistic_regression)
  • a list of training datasets (we generally use simple_train)
  • a list of test datasets to use for evaluation during training (can just use none to save time)
  • a scaffold (only necessary if using Stanford Sentiment Treebank): either continuation or classification.

Then the for-loop between # Training loop and # # END OF ACTUAL DIRECTION FITTING is the critical section.

This writes the directions to numpy files like data/gpt2-small/kmeans_simple_train_ADJ_layer1.npy. The file names are fairly self-explanatory, but for completeness: there is one directory per model, and the file name states the method, training data, token position and residual stream layer.

The only exceptions are the random directions, which are generated by random_directions.py.
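The naming scheme can be made concrete with a small parser. This is a sketch rather than code from the repository: parse_direction_filename, DirectionFile and KNOWN_METHODS are hypothetical names, the method list is taken from the examples in this README, and the parser assumes the token position label (e.g. ADJ) contains no underscores.

```python
import re
from typing import NamedTuple

# Methods named in this README; extend if other methods are used (assumption).
KNOWN_METHODS = ("das", "kmeans", "logistic_regression")


class DirectionFile(NamedTuple):
    method: str
    train_data: str
    position: str
    layer: int


def parse_direction_filename(name):
    """Split '<method>_<train_data>_<position>_layer<N>.npy' into its parts."""
    stem = name[:-4] if name.endswith(".npy") else name
    # Match the method first, since both methods and dataset names
    # may themselves contain underscores.
    for method in sorted(KNOWN_METHODS, key=len, reverse=True):
        prefix = method + "_"
        if stem.startswith(prefix):
            rest = stem[len(prefix):]
            break
    else:
        raise ValueError(f"unrecognised method in {name!r}")
    match = re.fullmatch(r"(?P<train>.+)_(?P<pos>[^_]+)_layer(?P<layer>\d+)", rest)
    if match is None:
        raise ValueError(f"unexpected file name layout: {name!r}")
    return DirectionFile(method, match["train"], match["pos"], int(match["layer"]))
```

For example, parse_direction_filename("kmeans_simple_train_ADJ_layer1.npy") yields the method kmeans, training data simple_train, position ADJ and layer 1.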

Patching the sentiment direction

In direction_patching_suite.py, you can select

  • a list of models
  • a list of filename patterns to load as directions
  • a list of evaluation datasets
  • a scaffold (if using Treebank)
  • a list of patching metrics (see utils/circuit_analysis.py::PatchingMetric).

The for-loop at the bottom of the file performs the directional activation patching experiments and writes the results to CSVs with names that begin direction_patching_.
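For readers unfamiliar with the idea, patching along a single direction can be pictured as swapping only the component of an activation that lies along that direction, leaving everything orthogonal to it untouched. The numpy sketch below illustrates this under stated assumptions; it is not the implementation in direction_patching_suite.py, and the function and argument names are hypothetical.

```python
import numpy as np


def patch_along_direction(base, source, direction):
    """Replace the component of `base` along `direction` with that of `source`.

    base, source: activations of shape (..., d_model)
    direction:    vector of shape (d_model,), normalised internally
    """
    d = direction / np.linalg.norm(direction)
    base_coeff = base @ d      # projection of each base activation onto d
    source_coeff = source @ d  # projection of each source activation onto d
    # Shift base along d so that its projection matches the source's.
    return base + (source_coeff - base_coeff)[..., None] * d
```

After patching, the projection of the result onto the direction equals that of the source activation, while its projection onto any orthogonal vector equals that of the base activation.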

Then direction_patching_results.py is a very quick and basic script to generate the plots shown in the paper using the cached CSV files from the first step.

Circuit Analyses

The code used to analyze circuits performing various functions can be found in the notebook files prefixed with circuit. mood_inference refers to circuits for the ToyMoodStories dataset or variants thereof. simple_sentiment refers to the ToyMovieReview dataset. In addition, we performed a number of analyses that we did not cover in the paper, e.g. sentiment continuation and classification in Pythia 1.4b.

Circuit analysis notebooks include a range of experiments that look at attention patterns and model components using patching experiments. Depending on the task, additional experiments may be included. Each notebook is specific to a dataset and model, and can be run top to bottom.

Note: Some of the dataset generation code may be outdated in a few of these notebooks. We have updated the most important of these, but if you find this is the case for a notebook you use, you can replace the dataset generation code to use the latest get_dataset function in utils.py.

Summarization Experiments

Treebank data

Before the Treebank dataset can be used, it is necessary to first run treebank_data_gen.py to write pickle files locally.

Contributors

ojh31, curt-tigges, elena-baixy
