We based our implementation on two open-source repositories: https://github.com/fkodom/yet-another-retnet (for the RetNet architecture) and https://github.com/karpathy/nanoGPT/tree/master (for training the models from scratch).
We then adapted the data processing to the WikiText-2 and WikiText-103 datasets: https://huggingface.co/datasets/wikitext
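
As a rough illustration of what such a data-processing step can look like, here is a minimal sketch in the style of nanoGPT's `prepare.py` scripts, which store each split as a flat file of 16-bit token ids. It is not the code used in this repository. The function names (`encode_split`, `write_bin`, `prepare_wikitext`), the output directory, the choice of the `wikitext-2-raw-v1` configuration, and the use of the GPT-2 tokenizer from `tiktoken` are all assumptions made for the example.

```python
from array import array
from pathlib import Path
from typing import Callable, Iterable, List


def encode_split(lines: Iterable[str], encode: Callable[[str], List[int]]) -> array:
    """Concatenate the token ids of all non-blank lines into one uint16 array."""
    ids = array("H")  # unsigned 16-bit: large enough for the GPT-2 vocabulary
    for line in lines:
        if line.strip():  # WikiText contains many blank rows; skip them
            ids.extend(encode(line))
    return ids


def write_bin(ids: array, path: Path) -> int:
    """Write the ids as raw native-endian uint16 values and return the token count."""
    with open(path, "wb") as f:
        ids.tofile(f)
    return len(ids)


def prepare_wikitext(config: str = "wikitext-2-raw-v1", out_dir: str = "data") -> None:
    """Hypothetical driver: download one WikiText configuration and tokenize each split."""
    # Third-party imports stay local so the helpers above remain dependency-free.
    from datasets import load_dataset
    import tiktoken

    enc = tiktoken.get_encoding("gpt2")  # assumption: GPT-2 BPE, as in nanoGPT
    dataset = load_dataset("wikitext", config)
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for split in ("train", "validation", "test"):
        ids = encode_split(dataset[split]["text"], enc.encode_ordinary)
        count = write_bin(ids, Path(out_dir) / f"{split}.bin")
        print(f"{split}: {count} tokens")
```

Calling `prepare_wikitext("wikitext-103-raw-v1")` would process the larger dataset the same way. The resulting `.bin` files can be read back with `numpy.memmap(path, dtype=numpy.uint16, mode="r")`, which is how nanoGPT's training loop loads its data. Unlike nanoGPT's OpenWebText script, this sketch does not append an end-of-text token between documents.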