transformer-puzzles's Introduction

Transformer Puzzles

This notebook is a collection of short coding puzzles based on the internals of the Transformer. The puzzles are written in Python and can be done in this notebook. After completing them you will have a much better intuitive sense of how a Transformer can compute certain logical operations.

These puzzles are based on Thinking Like Transformers by Gail Weiss, Yoav Goldberg, Eran Yahav and derived from this blog post.

Goal

Can we produce a Transformer that does basic elementary school addition?

That is, given a string such as "19492+23919", can we produce the correct output?

Rules

Each exercise consists of a function with an argument seq and an output seq. Like a Transformer, we cannot change the length of the sequence. Operations need to act on the entire sequence in parallel. There is a global indices that tells us the position in the sequence. If we want to do something different at certain positions, we can use where, as in NumPy or PyTorch. To run the seq, we need to give it an initial input.

def even_vals(seq=tokens):
    "Keep even positions, set odd positions to -1"
    x = indices % 2
    # Note that all operations broadcast so you can use scalars.
    return where(x == 0, seq, -1)
seq = even_vals()

# Give the initial input tokens
seq.input([0,1,2,3,4])
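
For comparison, here is a minimal sketch of the same computation on an ordinary Python list, without the puzzle library. The name even_vals_plain and the local indices list are illustrative stand-ins, not part of the notebook's API.

```python
def even_vals_plain(seq):
    """Keep values at even positions; replace values at odd positions with -1."""
    # Stand-in for the notebook's global `indices`: one position per element.
    indices = list(range(len(seq)))
    # Element-wise choice, playing the role of `where(x == 0, seq, -1)`.
    return [value if i % 2 == 0 else -1 for i, value in zip(indices, seq)]


print(even_vals_plain([0, 1, 2, 3, 4]))  # [0, -1, 2, -1, 4]
```

The output has the same length as the input, which is the constraint the rules above impose on every exercise.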

The main operation you can use is "attention". You do this by defining a selector, which forms a boolean matrix by comparing every key with every query.

before = key(indices) < query(indices)
before

We can combine selectors with logical operations.

before_or_same = before | (key(indices) == query(indices))
before_or_same
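
To make the two selectors above concrete, the sketch below builds the same boolean matrices with nested lists. The helper select and the choice to store one row per query and one column per key are assumptions made for illustration; the notebook's own selector objects may be laid out or displayed differently.

```python
def select(keys, queries, predicate):
    """Build a boolean matrix: entry [q][k] is predicate(keys[k], queries[q])."""
    return [[predicate(k, q) for k in keys] for q in queries]


indices = [0, 1, 2]

# Mirrors `key(indices) < query(indices)`.
before = select(indices, indices, lambda k, q: k < q)

# Mirrors `key(indices) == query(indices)`.
same = select(indices, indices, lambda k, q: k == q)

# Element-wise OR of the two matrices, mirroring the `|` operator.
before_or_same = [
    [b or s for b, s in zip(before_row, same_row)]
    for before_row, same_row in zip(before, same)
]

for row in before_or_same:
    print(row)
```

Under this layout, before is strictly lower triangular and before_or_same also includes the diagonal, so each query position selects itself and everything to its left.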

Once you have a selector, you can apply "attention" to sum over the grey (selected) positions. For example, to compute a cumulative sum we run the following function.

def cumsum(seq=tokens):
    return before_or_same.value(seq)
seq = cumsum()
seq.input([0, 1, 2, 3, 4])
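
As a rough plain-Python picture of what this does, each output position sums the input values at the key positions its selector row marks as true. The helpers aggregate_sum and cumsum_plain are hypothetical names used only for this sketch, and the row-per-query layout is an assumption.

```python
def aggregate_sum(selector, values):
    """For each query row, sum the values at the selected key positions."""
    return [
        sum(v for chosen, v in zip(row, values) if chosen)
        for row in selector
    ]


def cumsum_plain(seq):
    """Cumulative sum expressed as a selector followed by a sum."""
    n = len(seq)
    # Equivalent of `before_or_same`: key position <= query position.
    before_or_same = [[k <= q for k in range(n)] for q in range(n)]
    return aggregate_sum(before_or_same, seq)


print(cumsum_plain([0, 1, 2, 3, 4]))  # [0, 1, 3, 6, 10]
```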

Good luck!

transformer-puzzles's People

Contributors

caisq, srush

transformer-puzzles's Issues

Spec for first index is wrong, and tests are incomplete

Old code

def first_spec(token, seq):
    first = None
    for i, s in enumerate(seq):
        if s == token:
            first = i
    return [first for _ in seq]

def first(token, seq=tokens):
    raise NotImplementedError

test_output(first, first_spec, [(3, SEQ), (-1, SEQ2)])

Suggested code with two updates:

  1. Added the clause "first is None" to the if statement, so that only the first matching index is kept.
  2. Added a proper test: 'l' in "hello". The current test sequences do not contain duplicates of the query token, so faulty implementations of first() can pass the old test.

def first_spec(token, seq):
    first = None
    for i, s in enumerate(seq):
        if s == token and first is None:
            first = i
    return [first for _ in seq]

def first(token, seq=tokens):
    raise NotImplementedError

test_output(first, first_spec, [(3, SEQ), (-1, SEQ2), ('l', list('hello'))])
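
The difference between the two specs can be checked directly on the proposed test case. The sketch below copies both versions under the hypothetical names first_spec_old and first_spec_new so they can be run side by side, outside the notebook.

```python
def first_spec_old(token, seq):
    """Old spec: every match overwrites `first`, so the LAST match wins."""
    first = None
    for i, s in enumerate(seq):
        if s == token:
            first = i
    return [first for _ in seq]


def first_spec_new(token, seq):
    """Suggested spec: only the first match is recorded."""
    first = None
    for i, s in enumerate(seq):
        if s == token and first is None:
            first = i
    return [first for _ in seq]


letters = list("hello")
print(first_spec_old("l", letters))  # [3, 3, 3, 3, 3]
print(first_spec_new("l", letters))  # [2, 2, 2, 2, 2]
```

On a sequence without repeated tokens the two functions agree, which is why the original tests could not tell them apart.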

I would have submitted a pull request, but the diff of the Jupyter notebook was unreadable.
