Unofficial PyTorch implementation of the layers from An Attention Free Transformer by Zhai et al. [abs, pdf] from Apple Inc.
I'd like to thank the primary author, Dr. Shuangfei Zhai, for his informal guidance and feedback as I built this package!
You can install aft-pytorch via pip:

pip install aft-pytorch

You can import the AFT-Full, AFT-Simple, or AFT-Local layer (as described in the paper) from the package like so:
import torch
from aft_pytorch import AFTFull

layer = AFTFull(
    max_seqlen=20,
    dim=512,
    hidden_dim=64
)

# a batch of 32 sequences, each with 10 timesteps of dimension 512
x = torch.rand(32, 10, 512)
y = layer(x)  # [32, 10, 512]

import torch
from aft_pytorch import AFTSimple

layer = AFTSimple(
    max_seqlen=20,
    dim=512,
    hidden_dim=64
)

# a batch of 32 sequences, each with 10 timesteps of dimension 512
x = torch.rand(32, 10, 512)
y = layer(x)  # [32, 10, 512]

import torch
from aft_pytorch import AFTLocal

layer = AFTLocal(
    max_seqlen=20,
    dim=512,
    hidden_dim=64
)

# a batch of 32 sequences, each with 10 timesteps of dimension 512
x = torch.rand(32, 10, 512)
y = layer(x)  # [32, 10, 512]

These layers are plug-and-play with your existing networks and Transformers: you can swap out the self-attention layer for any of the layers in this package with minimal changes.
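
As a rough illustration of the swap, here is a minimal sketch of a pre-norm Transformer-style block whose token-mixing layer is passed in as an argument. The `MixerBlock` class, its `mlp_ratio` parameter, and the overall layout are hypothetical and not part of aft-pytorch; the only thing assumed about the package is what the examples above show, namely that `AFTFull` maps a `[batch, seq, dim]` tensor to a tensor of the same shape. If the package is not installed, the sketch falls back to `nn.Identity` so it still runs.

```python
import torch
from torch import nn


class MixerBlock(nn.Module):
    """Pre-norm Transformer-style block with a pluggable token-mixing layer.

    `mixer` is any module mapping [batch, seq, dim] -> [batch, seq, dim].
    This class is illustrative only; it is not part of aft-pytorch.
    """

    def __init__(self, mixer: nn.Module, dim: int, mlp_ratio: int = 4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.mixer = mixer  # where self-attention would normally sit
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, mlp_ratio * dim),
            nn.GELU(),
            nn.Linear(mlp_ratio * dim, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.mixer(self.norm1(x))  # residual around the token mixer
        x = x + self.mlp(self.norm2(x))    # residual around the feed-forward net
        return x


try:
    from aft_pytorch import AFTFull
    mixer = AFTFull(max_seqlen=20, dim=512, hidden_dim=64)
except ImportError:
    # Stand-in so the sketch runs without the package installed.
    mixer = nn.Identity()

block = MixerBlock(mixer, dim=512)
x = torch.rand(32, 10, 512)
y = block(x)
print(y.shape)
```

Because the block only relies on the input and output shapes matching, `AFTSimple` or `AFTLocal` could be passed in the same way. Note that `torch.nn.MultiheadAttention` returns an `(output, weights)` tuple, so it would need a thin wrapper before it could be used as the `mixer` here.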
- Add full AFT architecture
- Add variants such as AFTConv
- Benchmark using Karpathy's minGPT
If you like this repo, please leave a star! If you have any amendments or suggestions, feel free to open a PR or an issue.
@misc{attention-free-transformer,
title = {An Attention Free Transformer},
author = {Shuangfei Zhai and Walter Talbott and Nitish Srivastava and Chen Huang and Hanlin Goh and Ruixiang Zhang and Josh Susskind},
year = {2021},
URL = {https://arxiv.org/pdf/2105.14103.pdf}
}
