HyperMixing

HyperMixing is a token-mixing technique that serves as a linear-time alternative to attention, for example in Transformer-like architectures such as HyperMixer.

This repository provides a unified PyTorch implementation of both single-head and multi-head HyperMixing.
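To make the linear-time claim concrete, here is a minimal single-head sketch of the idea as described in the HyperMixer paper: small hypernetworks generate the weights of a token-mixing MLP from the tokens themselves, so no `num_queries x num_keys` matrix is ever formed. The class name `TinyHyperMixing`, the layer sizes, and the choice of GELU are assumptions made for illustration. Positional information and multi-head splitting are omitted, and this is not the code shipped in this repository.

```python
import torch
import torch.nn as nn


class TinyHyperMixing(nn.Module):
    """Illustrative single-head token mixer (hypothetical, not the package's class)."""

    def __init__(self, dim: int, hidden: int, tied: bool = False):
        super().__init__()

        def make_hypernet() -> nn.Sequential:
            # Maps each token (dim) to one row (hidden) of a mixing matrix.
            return nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, hidden))

        self.hyper_in = make_hypernet()
        # Assumption: "tied" means both mixing matrices come from the same hypernetwork.
        self.hyper_out = self.hyper_in if tied else make_hypernet()
        self.act = nn.GELU()
        self.norm = nn.LayerNorm(dim)

    def forward(self, queries, keys, values):
        w1 = self.hyper_in(keys)        # [bsize, num_keys, hidden]
        w2 = self.hyper_out(queries)    # [bsize, num_queries, hidden]
        # Compress all key/value tokens into a fixed number of hidden slots.
        mixed = self.act(w1.transpose(1, 2) @ values)  # [bsize, hidden, emb_dim]
        # Expand the slots back out, one row per query token.
        return self.norm(w2 @ mixed)    # [bsize, num_queries, emb_dim]


mixer = TinyHyperMixing(dim=16, hidden=32)
q = torch.randn(2, 5, 16)
k = torch.randn(2, 3, 16)
v = torch.randn(2, 3, 16)
out = mixer(q, k, v)  # same shape as q
```

Both matrix products cost on the order of `num_tokens * hidden * emb_dim`, so the cost grows linearly with sequence length for a fixed `hidden` size.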

Requirements

Code was tested with:

  • Python 3.10
  • PyTorch 2.0

You can create an environment with the required dependencies by running:

conda env create -f environment.yml

Installation

cd hypermixing
pip install .

Usage

import torch
from hypermixing import HyperMixing

input_dim = 128
hypernet_size = 512
tied = False
num_heads = 2
max_length = 3000
token_mixer = HyperMixing(input_output_dim=input_dim,
        hypernet_size=hypernet_size,
        tied=tied,
        num_heads=num_heads,
        max_length=max_length)

queries = torch.randn((64, 50, 128)) # [bsize, num_queries, emb_dim]
keys = torch.randn((64, 25, 128)) # [bsize, num_keys, emb_dim]
values = torch.randn((64, 25, 128)) # [bsize, num_keys, emb_dim]
out = token_mixer(queries, keys, values) # [bsize, num_queries, emb_dim]
assert out.size() == queries.size()
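As a hedged sketch of how such a token mixer might sit inside a Transformer-like layer, the block below wraps any module with a `(queries, keys, values)` call signature in a pre-norm residual block followed by a feed-forward network. `MixerBlock` and `MeanMixer` are hypothetical names introduced here. `MeanMixer` is only a stand-in so the example runs without the `hypermixing` package, and the pre-norm layout is an assumption rather than the layout used in the papers.

```python
import torch
import torch.nn as nn


class MixerBlock(nn.Module):
    """Pre-norm residual block around an arbitrary (queries, keys, values) token mixer."""

    def __init__(self, token_mixer: nn.Module, dim: int, ffn_dim: int):
        super().__init__()
        self.token_mixer = token_mixer
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, ffn_dim), nn.GELU(), nn.Linear(ffn_dim, dim))

    def forward(self, x):
        # Self-mixing: the same normalized sequence acts as queries, keys, and values.
        h = self.norm1(x)
        x = x + self.token_mixer(h, h, h)
        # Position-wise feed-forward network with its own residual connection.
        return x + self.ffn(self.norm2(x))


class MeanMixer(nn.Module):
    """Stand-in mixer: every query receives the mean of the value tokens."""

    def forward(self, queries, keys, values):
        return values.mean(dim=1, keepdim=True).expand(-1, queries.size(1), -1)


block = MixerBlock(MeanMixer(), dim=128, ffn_dim=512)
x = torch.randn(4, 50, 128)  # [bsize, num_tokens, emb_dim]
y = block(x)                 # same shape as x
```

With the package installed, the `token_mixer` object constructed in the usage example above can be passed in place of `MeanMixer()`, since it accepts the same three arguments and returns a tensor shaped like the queries.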

Citation

If you use or build on HyperMixer, please cite the following papers:

@inproceedings{mai2023hypermixer,
    author = {Mai, F. and Pannatier, A. and Fehr, F. and Chen, H. and Marelli, F. and Fleuret, F. and Henderson, J.},
    title = {HyperMixer: An MLP-based Low Cost Alternative to Transformers},
    booktitle = {Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
    year = {2023}
}

@inproceedings{mai2023multihead-hypermixer,
    author = {Mai, F. and Zuluaga-Gomez, J. and Parcollet, T. and Motlicek, P.},
    title = {HyperConformer: Multi-head HyperMixer for Efficient Speech Recognition},
    booktitle = {Proc. Interspeech 2023},
    year = {2023}
}

hypermixing's Issues

Questions about the experiment in the paper

Hello! I am impressed by your work on HyperMixer: An MLP-based Green AI Alternative to Transformers. It greatly reduces the computational complexity of transformers through a simple and elegant method, which greatly improves computational efficiency and is also very environmentally friendly! However, I have a question about the writing of the paper: did you not use a decoder when doing experiments with transformers? Or did you introduce decoders for both MLP mixer and HyperMixer? Thank you!
