Analysis updated 2026-07-03
Reproduce the Transformer-XL benchmark results using the provided pre-trained model weights without training from scratch.
Train a long-context language model across multiple GPUs or Google TPUs using the included training scripts.
Study the segment-level memory mechanism as a reference implementation when building or comparing transformer architectures.
Fine-tune the Transformer-XL architecture on a custom long-document NLP task using the PyTorch implementation.
| | kimiyoung/transformer-xl | websocket-client/websocket-client | facebookresearch/reagent |
|---|---|---|---|
| Stars | 3,702 | 3,701 | 3,699 |
| Language | Python | Python | Python |
| Setup difficulty | hard | easy | moderate |
| Complexity | 5/5 | 2/5 | 4/5 |
| Audience | researcher | developer | researcher |
Figures from each repo's GitHub metadata at analysis time.
Requires GPU or TPU hardware. Multi-machine training needs Google TPU access, and training from scratch is highly compute-intensive.
Transformer-XL is research code released alongside an academic paper of the same name. The project proposes a change to how language models, the type of AI system that predicts and generates text, handle long documents. Standard transformer models process text in fixed-length chunks and lose context that appeared earlier in the document. Transformer-XL introduces a memory mechanism that lets the model carry information forward across chunks, so it can reference words and patterns from much earlier in a piece of text.

The repository provides implementations in both PyTorch and TensorFlow, two popular machine learning frameworks. The TensorFlow version supports training across multiple GPUs on a single machine and also across multiple machines using Google TPU hardware. The PyTorch version supports multi-GPU training on a single machine.

The paper reports that Transformer-XL set new top scores on several standard language modeling benchmarks at the time of publication, and was the first model to score below 1.0 on a character-level language modeling task (lower scores are better on the specific metric used). Pre-trained model weights are included so that researchers can reproduce the reported results without training from scratch.

This repository is primarily aimed at machine learning researchers and engineers who want to study or build on the work. It is not a general-purpose tool for end users. The README is brief and points to subfolder READMEs in the tf/ and pytorch/ directories for setup and training instructions.
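
The idea of carrying context across chunks can be illustrated with a minimal sketch. It is not code from the repository: the function names `split_into_segments` and `visible_context` and the parameters `seg_len` and `mem_len` are hypothetical, and it caches raw tokens for readability, whereas the actual model is understood to cache hidden states from earlier segments. It only shows which earlier material each chunk could still see.

```python
from typing import List, Sequence, Tuple


def split_into_segments(tokens: Sequence[str], seg_len: int) -> List[List[str]]:
    """Cut a token sequence into consecutive fixed-length segments."""
    return [list(tokens[i:i + seg_len]) for i in range(0, len(tokens), seg_len)]


def visible_context(
    tokens: Sequence[str], seg_len: int, mem_len: int
) -> List[Tuple[List[str], List[str]]]:
    """Return (memory, segment) pairs: what each segment can look back on.

    Illustrative assumption: the memory holds tokens. In a real model it
    would hold cached activations computed for the previous segments.
    """
    memory: List[str] = []
    steps = []
    for segment in split_into_segments(tokens, seg_len):
        steps.append((list(memory), segment))
        # Keep only the most recent mem_len items for the next segment.
        # mem_len == 0 reproduces the fixed-chunk behaviour with no carry-over.
        memory = (memory + segment)[-mem_len:] if mem_len > 0 else []
    return steps


if __name__ == "__main__":
    for mem, seg in visible_context("a b c d e f g".split(), seg_len=3, mem_len=2):
        print("memory:", mem, "segment:", seg)
```

With `mem_len=0` every segment starts with an empty memory, which corresponds to the fixed-length chunking described above. A positive `mem_len` lets each segment see the tail of what came before it.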
Research code for Transformer-XL, an AI language model that uses a memory mechanism to carry context across document chunks, letting it reference text from much earlier in a passage than standard transformers.
Mainly Python. The stack also includes PyTorch and TensorFlow.
Setup difficulty is rated hard, with roughly a day or more to a first successful run.
Mainly researchers.
Verify against the repo before relying on details.