
gpt3_gender

Gender and Representation Bias in GPT-3 Generated Stories

This README follows the order of the sections in the paper. Each section below lists the scripts and materials used to produce the corresponding part of the paper.

Abstract

Using topic modeling and lexicon-based word similarity, we find that stories generated by GPT-3 exhibit many known gender stereotypes. Generated stories depict different topics and descriptions depending on GPT-3's perceived gender of the character in a prompt, with feminine characters more likely to be associated with family and appearance, and described as less powerful than masculine characters, even when associated with high power verbs in a prompt. Our study raises questions on how one can avoid unintended social biases when using large language models for storytelling.

Requirements

See requirements.txt. I may have switched machines or environments at some point during this project, so the pinned packages may not match exactly; let me know if something does not work with them.

Data

  • query_openai.py: for gathering GPT-3 generated stories
  • data/booklist.csv: list of authors and book titles

We cannot release the original books because they are copyrighted. Generated stories may also contain copyrighted content.

Text Processing

  • book_nlp.sh: runs Book NLP over original books
  • check_book_bounds.py: checks that the human-annotated start and end markers of each book can actually be used to locate where the book's text begins and ends
  • data_organize.py: various formatting and sanity checking functions
  • get_characters.py: extracts sentences that mention main characters, to use as prompts
  • get_entity_info.py: groups character mentions and finds pronouns associated with them based on coreference chains
  • preprocessing.py: gives an overview of which books are in the collection
  • segment_original_books.py: extracts excerpts from the original books that are similar in length to the generated stories

The coreference model is the same as that used in Sims et al. 2020.
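
As a rough illustration of the grouping step that get_entity_info.py is described as performing, the sketch below collects pronouns and aliases per coreference chain. It is not the repository's code: the input layout (pairs of chain ID and mention text), the `PRONOUNS` set, and the function name `group_mentions` are all assumptions made for this example.

```python
from collections import Counter, defaultdict

# Hypothetical pronoun inventory; the repository's actual list may differ.
PRONOUNS = {"he", "him", "his", "she", "her", "hers", "they", "them", "their"}


def group_mentions(mentions):
    """Group (chain_id, mention_text) pairs by coreference chain.

    Returns, for each chain, the pronoun counts and the sorted set of
    non-pronoun mentions (treated here as aliases of the character).
    """
    pronouns = defaultdict(Counter)
    aliases = defaultdict(set)
    for chain_id, text in mentions:
        token = text.lower()
        if token in PRONOUNS:
            pronouns[chain_id][token] += 1
        else:
            aliases[chain_id].add(text)
    chain_ids = set(pronouns) | set(aliases)
    return {
        cid: {"pronouns": dict(pronouns[cid]), "aliases": sorted(aliases[cid])}
        for cid in chain_ids
    }


# Toy input: chain 0 and chain 1 are two characters in one story.
mentions = [
    (0, "Elizabeth"), (0, "she"), (0, "her"), (0, "Lizzy"),
    (1, "Darcy"), (1, "he"), (0, "she"),
]
result = group_mentions(mentions)
```

In this toy input, chain 0 ends up with the aliases "Elizabeth" and "Lizzy" and the pronouns "she" and "her"; the real pipeline reads these mentions from the coreference output rather than a hand-written list.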

Gender

We do not recommend the use of these methods for inferring the actual gender of real people. Please see the paper's section on gender and the other papers I cite in that section.

  • gender_inference.py
  • character_viz.ipynb: contains code for generating plots in the paper
  • logs/char_gender_0.9: contains gender labels, pronouns, and aliases for each generated story's characters (numbers are story IDs)
  • logs/orig_char_gender: contains gender labels, pronouns, and aliases for each excerpt's characters (numbers are story IDs)

Matching

  • prompt_design.py: prompt matching (note: there are some deprecated functions in here, see function comments)

Topics

  • mallet.sh: taken from this repo, runs topic modeling
  • get_topics.py: gets topics for documents/stories, modified from this repo
  • character_viz.ipynb: contains code for generating plots in the paper
  • logs/orig_gender_topics.json, logs/gender_topics_0.9.json, logs/gender_topics_0.9_matched.json: topics and gender
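
One simple way to relate topics to gender is to average each topic's probability over the characters with a given gender label. The sketch below shows that idea under stated assumptions: the record layout (a gender label paired with a list of topic probabilities) and the name `mean_topic_by_gender` are hypothetical, and the format of the JSON logs listed above is not documented here, so this may not match how those files are organized or how the paper computes its comparisons.

```python
def mean_topic_by_gender(records):
    """Average topic probabilities per gender label.

    records: iterable of (gender_label, topic_probs), where every
    topic_probs list has the same length (one entry per topic).
    """
    sums = {}
    counts = {}
    for gender, probs in records:
        if gender not in sums:
            sums[gender] = [0.0] * len(probs)
            counts[gender] = 0
        for k, p in enumerate(probs):
            sums[gender][k] += p
        counts[gender] += 1
    return {g: [s / counts[g] for s in sums[g]] for g in sums}


# Toy data with two topics; the values are made up for illustration.
records = [
    ("feminine", [0.75, 0.25]),
    ("feminine", [0.25, 0.75]),
    ("masculine", [0.5, 0.5]),
]
means = mean_topic_by_gender(records)
```

Comparing the resulting per-topic means across labels gives a first look at which topics are more prominent for which group.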

Lexicons

  • word_embeddings.py: functions for the word embedding part of this paper
  • prompt_design.py: getting prompts with specific verbs
  • character_viz.ipynb: contains code for generating plots in the paper
  • logs/matched_adj_verb, logs/orig_adj_verb, logs/generated_adj_verb: adjectives and verbs

Ethan Fast's stereotype lexicon is available upon request. Power verbs can be found here, Bloom's taxonomy verbs are in the verb folder of this repo, and Empath categories are here.
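
The abstract mentions lexicon-based word similarity. A minimal sketch of one common form of that idea follows: score a word by its average cosine similarity to the words of a lexicon. The function names, the toy two-dimensional vectors, and the averaging scheme are assumptions for illustration; word_embeddings.py may compute its scores differently.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def lexicon_similarity(word, lexicon, vectors):
    """Average cosine similarity between `word` and the lexicon words that have vectors."""
    sims = [cosine(vectors[word], vectors[w]) for w in lexicon if w in vectors]
    return sum(sims) / len(sims) if sims else 0.0


# Toy embeddings, made up for illustration only.
vectors = {
    "strong": [1.0, 0.0],
    "powerful": [1.0, 0.0],
    "weak": [-1.0, 0.0],
    "table": [0.0, 1.0],
}
power_lexicon = ["powerful"]  # hypothetical one-word lexicon

sim_strong = lexicon_similarity("strong", power_lexicon, vectors)
sim_weak = lexicon_similarity("weak", power_lexicon, vectors)
sim_table = lexicon_similarity("table", power_lexicon, vectors)
```

With real embeddings, the same function could be applied to the adjectives and verbs associated with each character, using whichever lexicon is of interest.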
