aft-pytorch

Unofficial PyTorch implementation of the layers from An Attention Free Transformer by Zhai et al. [abs, pdf] from Apple Inc.

I'd like to thank the primary author, Dr. Shuangfei Zhai, for his informal guidance and feedback as I built this package!

Installation

You can install aft-pytorch via pip:

pip install aft-pytorch

Usage

You can import the AFT-Full, AFT-Simple, or AFT-Local layers (as described in the paper) from the package like so:

AFTFull

import torch
from aft_pytorch import AFTFull

layer = AFTFull(
    max_seqlen=20,
    dim=512,
    hidden_dim=64
)

# a batch of 32 sequences, each with 10 timesteps of dimension 512
x = torch.rand(32, 10, 512)
y = layer(x) # [32, 10, 512]
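
For context, and paraphrasing the paper rather than this package's internals: AFT-Full replaces dot-product attention with a learned T x T matrix of pairwise position biases w (which is why max_seqlen must be fixed up front), computing

Y_t = \sigma(Q_t) \odot \frac{\sum_{t'=1}^{T} \exp(K_{t'} + w_{t,t'}) \odot V_{t'}}{\sum_{t'=1}^{T} \exp(K_{t'} + w_{t,t'})}

where \sigma is the sigmoid and \odot is element-wise multiplication, so no attention matrix is ever materialised.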

AFTSimple

import torch
from aft_pytorch import AFTSimple

layer = AFTSimple(
    max_seqlen=20,
    dim=512,
    hidden_dim=64
)

# a batch of 32 sequences, each with 10 timesteps of dimension 512
x = torch.rand(32, 10, 512)
y = layer(x) # [32, 10, 512]
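
AFT-Simple is the special case with the position biases removed (w_{t,t'} = 0); per the paper, it collapses to a sigmoid-gated, softmax-weighted pooling over the values:

Y_t = \sigma(Q_t) \odot \sum_{t'=1}^{T} \left( \mathrm{softmax}(K) \odot V \right)_{t'}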

AFTLocal

import torch
from aft_pytorch import AFTLocal

layer = AFTLocal(
    max_seqlen=20,
    dim=512,
    hidden_dim=64
)

# a batch of 32 sequences, each with 10 timesteps of dimension 512
x = torch.rand(32, 10, 512)
y = layer(x) # [32, 10, 512]
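
AFT-Local restricts AFT-Full's learned biases to a local window of size s around each position (s is the paper's notation, not a constructor argument shown above):

w_{t,t'} = \begin{cases} w_{t,t'}, & \text{if } |t - t'| < s \\ 0, & \text{otherwise} \end{cases}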

These layers are 'plug-and-play' with your existing networks/Transformers: you can swap out the self-attention layer for any of the layers in this package with minimal changes, as in the sketch below.
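
As an illustration, here is a minimal sketch of such a swap inside a standard pre-norm Transformer block. The block structure (LayerNorm placement, residuals, MLP) is an assumption on our part, not something shipped by this package:

import torch
from torch import nn
from aft_pytorch import AFTFull

class AFTBlock(nn.Module):
    """A hypothetical Transformer block with AFTFull in place of self-attention."""
    def __init__(self, dim=512, max_seqlen=20, hidden_dim=64, mlp_ratio=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        # drop-in replacement for the self-attention sublayer
        self.aft = AFTFull(max_seqlen=max_seqlen, dim=dim, hidden_dim=hidden_dim)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim * mlp_ratio),
            nn.GELU(),
            nn.Linear(dim * mlp_ratio, dim),
        )

    def forward(self, x):
        x = x + self.aft(self.norm1(x))  # attention-free token mixing
        x = x + self.mlp(self.norm2(x))  # position-wise feed-forward
        return x

block = AFTBlock()
x = torch.rand(32, 10, 512)
y = block(x)  # [32, 10, 512]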

TODO

  • Add full AFT architecture
  • Add variants like AFTConv
  • Benchmark using Karpathy's minGPT

Contributing

If you like this repo, please leave a star! If there are any amends or suggestions, feel free to raise a PR/issue.

Credits

@misc{attention-free-transformer,
  title  = {An Attention Free Transformer},
  author = {Shuangfei Zhai and Walter Talbott and Nitish Srivastava and Chen Huang and Hanlin Goh and Ruixiang Zhang and Josh Susskind},
  year   = {2021},
  url    = {https://arxiv.org/pdf/2105.14103.pdf}
}

License

MIT

Contributors

  • datta0
  • rish-16


aft-pytorch's Issues

I tested the model on an NLP task.

I use the AFT-Full model with 6 layers, constructed in __init__ with this code:

self.encoder_transformer = nn.ModuleList()
for _ in range(6):
    self.encoder_transformer.append(AFTFull(max_seqlen=500, dim=512, hidden_dim=256))

and in the forward function, I use this code:

for layer in self.encoder_transformer:
    x = layer(x) + x

Originally I used the traditional Transformer; now that I have replaced it with this, the training loss becomes NaN. Is something wrong? And how do you use the model with many layers? Please help me, thank you.

About multi-head

Thank you for your work!
I wanted to ask whether there are any other branches available for this project (regarding multi-head support).

Why sum(0)?

Thank you for this code!
I have a problem with line 33 and line 36. Why does sum(0) sum over the batch dimension? I noticed the sum should run over the timestep dimension from t=1 to T, so maybe it should be sum(2) here.

Can run on CPU but fails on GPU, why?

RuntimeError: Tensor for 'out' is on CPU, Tensor for argument #1 'self' is on CPU, but expected them to be on GPU (while checking arguments for baddbmm)

I set .cuda() but it doesn't work, please help!
