Analysis updated 2026-06-24 · repo last pushed 2026-05-08
Train a small RWKV-7 model from scratch on a single GPU with 7GB of VRAM.
Fine-tune a pre-trained RWKV checkpoint on a custom dataset using DeepSpeed.
Compare RWKV inference speed against a same-size transformer for long-context workloads.
Convert RWKV weights to GGUF and run them in a local chat UI.
| | blinkdl/rwkv-lm | weifeng2333/videocaptioner | swivid/f5-tts |
|---|---|---|---|
| Stars | 14,524 | 14,530 | 14,508 |
| Language | Python | Python | Python |
| Last pushed | 2026-05-08 | — | — |
| Maintenance | Maintained | — | — |
| Setup difficulty | hard | easy | hard |
| Complexity | 5/5 | 2/5 | 4/5 |
| Audience | researcher | general | developer |
Figures from each repo's GitHub metadata at analysis time.
Needs CUDA, PyTorch Lightning pinned to version 1.9.5, DeepSpeed, and at least 7 GB of GPU VRAM for the smallest training script.
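A quick way to see whether an environment matches the requirements listed above is to query the installed package versions before launching anything. The following is a minimal sketch, not part of the repository: the `REQUIRED` table and the `check_requirements` helper are hypothetical names, only the Lightning pin (1.9.5) comes from the text above, and the check does not cover CUDA itself or available VRAM.

```python
from importlib import metadata

# Distribution names as published on PyPI. Only the Lightning pin is taken
# from the requirements above; the other packages are left unpinned here
# (an assumption of this sketch, not a statement about the repo).
REQUIRED = {
    "torch": None,
    "pytorch-lightning": "1.9.5",
    "deepspeed": None,
}


def check_requirements(required=REQUIRED):
    """Return {package: (installed_version_or_None, ok)} for each requirement."""
    report = {}
    for name, wanted in required.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # ok means: installed, and matching the pin when a pin is given.
        ok = installed is not None and (wanted is None or installed == wanted)
        report[name] = (installed, ok)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_requirements().items():
        status = "ok" if ok else "missing or wrong version"
        print(f"{name:20s} {installed or '-':14s} {status}")
```

Running the file prints one line per package, which makes a mismatched Lightning version visible before a long training job is started.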
RWKV (pronounced "RwaKuv") is a research project that designs a new kind of large language model. Most modern chat-style models, like GPT and similar systems, use an architecture called the transformer. RWKV takes a different route: it is built as a recurrent neural network (RNN), which means it reads text one token at a time and keeps a small running "state" instead of looking back at the whole conversation. The claim of this repository is that RWKV can match transformer-level quality while keeping the speed and memory advantages of an RNN.
The README is centered on RWKV-7, nicknamed "Goose", which the author calls the strongest linear-time, constant-space, attention-free, fully RNN architecture available at the time of writing. Linear-time means the work grows in proportion to the length of the input, and constant-space means it does not need a growing key-value cache the way a transformer does. The project is hosted under the Linux Foundation AI umbrella so the code and weights are free to use, and the README notes that RWKV is already shipped inside Windows and Office.
The repository is mostly training code and reference implementations. There are demo scripts for RWKV-7 in GPT-like mode, in pure RNN mode, and in a faster combined mode, with similar files for RWKV-6 and RWKV-5. A simplified training script in RWKV-v7/train_temp can be run on a single GPU with about 7 GB of VRAM, and a fuller script trains a model on the MiniPile dataset using PyTorch, PyTorch Lightning 1.9.5, DeepSpeed, and CUDA. The README is firm about a few training details, like using PreLN LayerNorm, applying weight decay only to large projection matrices, and following the supplied initialization.
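The weight-decay advice can be illustrated with a small grouping helper. This is a sketch under stated assumptions rather than the repository's own code: the function name `split_weight_decay_groups`, the rule "a `.weight` tensor with at least two non-singleton dimensions counts as a projection matrix", and the parameter names in the example are all illustrative.

```python
def split_weight_decay_groups(named_shapes, min_dim=2):
    """Split parameter names into (decay, no_decay) lists.

    named_shapes: iterable of (name, shape) pairs, for example built with
        [(n, tuple(p.shape)) for n, p in model.named_parameters()]
    Assumed rule (hypothetical, for illustration): decay only tensors whose
    name ends in ".weight" and that have at least `min_dim` dimensions
    larger than 1, i.e. matrices rather than gains, biases, or mixing vectors.
    """
    decay, no_decay = [], []
    for name, shape in named_shapes:
        real_dims = [d for d in shape if d != 1]  # ignore singleton axes
        if name.endswith(".weight") and len(real_dims) >= min_dim:
            decay.append(name)
        else:
            no_decay.append(name)
    return decay, no_decay


# Illustrative parameter names and shapes, not read from a real checkpoint.
example = [
    ("blocks.0.att.key.weight", (768, 768)),   # projection matrix
    ("blocks.0.ln1.weight", (768,)),           # LayerNorm gain
    ("blocks.0.ln1.bias", (768,)),             # LayerNorm bias
    ("blocks.0.att.mix", (1, 1, 768)),         # per-channel vector
]
decay, no_decay = split_weight_decay_groups(example)
print(decay)      # ['blocks.0.att.key.weight']
print(no_decay)
```

The two lists can then be mapped to two optimizer parameter groups, with `weight_decay` set to zero for the second group.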
The README also lists a wide surrounding ecosystem: pre-trained weights and GGUF conversions on Hugging Face, a pip package called rwkv, Gradio and WebGPU chat demos, a graphical runner, an inference server called Ai00, a PEFT and LoRA tuning project, an RLHF project, fast CUDA kernels, and a mobile inference library. A successor architecture, RWKV-8 "ROSA", is mentioned at the end. The full README is longer than what was shown.
Training code and reference implementation for RWKV, a recurrent neural network language model that aims to match transformer quality with constant memory and linear-time inference.
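The "constant memory" claim can be made concrete with back-of-the-envelope accounting. The sketch below is a toy comparison, not a measurement of RWKV: it assumes a transformer cache that stores one key and one value vector per token per layer, and a recurrent state made of one `head_size` x `head_size` matrix per head per layer. The layer count, model width, and head size are illustrative values, and both function names are hypothetical.

```python
def kv_cache_floats(n_tokens, n_layers, d_model):
    """Floats in a transformer KV cache: one key and one value vector
    per token per layer, so the total grows with context length."""
    return 2 * n_tokens * n_layers * d_model


def recurrent_state_floats(n_layers, d_model, head_size=64):
    """Floats in a toy recurrent state: one head_size x head_size matrix
    per head per layer, independent of how many tokens were read."""
    n_heads = d_model // head_size
    return n_layers * n_heads * head_size * head_size


if __name__ == "__main__":
    n_layers, d_model = 12, 768  # illustrative small-model dimensions
    state = recurrent_state_floats(n_layers, d_model)
    for n_tokens in (32, 1024, 32768):
        cache = kv_cache_floats(n_tokens, n_layers, d_model)
        print(f"{n_tokens:6d} tokens: KV cache {cache:>12,d} floats, "
              f"recurrent state {state:>9,d} floats")
```

Under these assumptions the two footprints are equal at 32 tokens; beyond that the cache keeps growing while the state stays fixed, which is the property that matters for long-context workloads.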
Mainly Python. The stack also includes PyTorch and PyTorch Lightning.
Maintained: a commit landed within the last 6 months (last push 2026-05-08).
The license is not stated in the available content, though the README notes that the project sits under the Linux Foundation AI umbrella and that the weights are free to use.
Setup difficulty is rated hard, with roughly a day or more to a first successful run.
Mainly researchers.
Verify against the repo before relying on details.