explaingit

fardinsabid/spectron

1 · Python · Audience · researcher · Complexity · 3/5 · Active · License · Setup · easy

TLDR

Research PyTorch implementation of an FFT-based replacement for self-attention that claims O(n log n) cost and a 15x speedup at 4096 tokens.

Mindmap

mindmap
  root((spectron))
    Inputs
      Token sequence
      Learned filter W
    Outputs
      Mixed token sequence
      Benchmark plots
      Paper PDF
    Use Cases
      Study FFT attention
      Compare with Transformer
      Long range mixing test
    Tech Stack
      Python
      PyTorch
      NumPy
      Matplotlib

Things people build with this

USE CASE 1

Reproduce the 4096 token benchmark and plot O(n log n) vs O(n squared) curves

USE CASE 2

Swap the FFT mixer into a small Transformer to compare loss and speed

USE CASE 3

Run the long range mixing test that measures cosine similarity across 256 token gaps

USE CASE 4

Read the bundled paper to learn how a learned frequency filter substitutes for QK attention
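
The scaling gap behind the first use case can be estimated with a few lines of plain Python. This is only back-of-the-envelope arithmetic, not code from the repository: the helper names are hypothetical, and the memory figure assumes a single float32 score matrix (one head, batch size 1). Under that assumption, a 131,072-token (128K) input gives about 68.7 GB, which is in line with the README figure quoted in the summary further down.

```python
import math


def attention_matrix_bytes(n_tokens: int, bytes_per_element: int = 4) -> int:
    """Bytes for one n x n attention score matrix (one head, batch size 1).

    bytes_per_element=4 assumes float32; this is an assumption, not a repo detail.
    """
    return n_tokens * n_tokens * bytes_per_element


def idealized_op_ratio(n_tokens: int) -> float:
    """n^2 / (n * log2 n): ratio of the two growth terms, constants ignored."""
    return n_tokens / math.log2(n_tokens)


for n in (512, 4096, 131072):
    gigabytes = attention_matrix_bytes(n) / 1e9
    ratio = idealized_op_ratio(n)
    print(f"n={n:>6}  score matrix = {gigabytes:8.3f} GB  n^2/(n log2 n) = {ratio:8.1f}")
```

The idealized ratio ignores constant factors, the projection layers, and hardware effects, so it should be read as an upper-level trend and not as a prediction of measured speedup. Real timings have to come from running the benchmark itself.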

Tech stack

Python · PyTorch · NumPy · Matplotlib

Getting it running

Difficulty · easy · Time to first run · 30 min

Plain pip install of torch, numpy, and matplotlib is enough; commercial use is blocked by the custom license.

The license is the custom Spectron Research and Ethical License v1.0, which allows research and personal use only; commercial use needs a separate license, and military or surveillance use is forbidden.

In plain English

Spectron is a research project by Fardin Sabid in Bangladesh that proposes a different way to do the attention step inside a language model. Modern language models like GPT are built on Transformers, and the part of a Transformer that lets each word look at every other word is called self-attention. The standard recipe multiplies a matrix called Q by the transpose of K, which is fine for short text, but its cost grows with the square of the sequence length. For a 128,000 token input the README points out that the attention step alone would need around 68 gigabytes of memory.

The Spectron idea is to skip that big matrix entirely. Instead of comparing every token to every other token directly, the input sequence is sent through a Fast Fourier Transform, which converts it into a frequency representation, then multiplied by a learned filter, then sent back through the inverse Fourier Transform. The README sums this up as three lines of PyTorch and writes the operation as IFFT(W ⊙ FFT(x)). The author explains that low frequencies capture long range structure while high frequencies capture local details, and the learned filter W decides which frequencies matter for each dimension.

The README claims this runs in O(n log n) time instead of O(n squared) and shows a small benchmark table to back it up. At 4096 tokens, Spectron is reported as 15.4 times faster than a Transformer baseline. There are also figures fitting the measured runtimes to the two complexity curves, and a long range mixing test where tokens 256 positions apart end up with cosine similarity 0.91. The model used for the benchmark has 1.9 million parameters.

To try it locally, clone the repo, pip install torch, numpy, and matplotlib, and run python test.py. The project ships a research paper PDF and a citation file.

The license is a custom Spectron Research and Ethical License v1.0 that allows free use for research and personal projects but requires a separate license for commercial use and forbids military or surveillance use.
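
The IFFT(W ⊙ FFT(x)) operation described above can be sketched as a small PyTorch module. This is an illustrative reconstruction, not the repository's code. The class name `SpectralMixer`, the tensor layout `(batch, seq_len, dim)`, the use of the real-input FFT, and the identity initialisation of the filter are all assumptions made for the example.

```python
import torch
import torch.nn as nn


class SpectralMixer(nn.Module):
    """Hypothetical token mixer: y = IFFT(W * FFT(x)) along the sequence axis."""

    def __init__(self, seq_len: int, dim: int):
        super().__init__()
        self.seq_len = seq_len
        n_freq = seq_len // 2 + 1  # rfft keeps only the non-negative frequencies
        # Complex filter stored as (real, imag) pairs. Initialised to 1 + 0j so
        # the untrained layer passes its input through unchanged (an assumption).
        init = torch.zeros(n_freq, dim, 2)
        init[..., 0] = 1.0
        self.filter = nn.Parameter(init)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: real-valued tensor of shape (batch, seq_len, dim)
        x_freq = torch.fft.rfft(x, dim=1)        # tokens -> frequencies
        w = torch.view_as_complex(self.filter)   # (n_freq, dim), complex
        x_freq = x_freq * w                      # scale each frequency per dimension
        return torch.fft.irfft(x_freq, n=self.seq_len, dim=1)  # back to tokens


mixer = SpectralMixer(seq_len=4096, dim=64)
x = torch.randn(2, 4096, 64)
y = mixer(x)
print(y.shape)  # torch.Size([2, 4096, 64])
```

Because `self.filter` is an `nn.Parameter`, it is trained by ordinary backpropagation together with the rest of the model. Note that multiplying in the frequency domain is equivalent to a circular convolution along the sequence, so in this sketch every output position depends on every input position, including later ones. Whether and how the actual project handles causality should be checked against the repo.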

Copy-paste prompts

Prompt 1
Clone fardinsabid/spectron, install torch and numpy, and run python test.py end to end on CPU
Prompt 2
Walk me through the three line IFFT(W times FFT(x)) operation in the Spectron code and how W is learned
Prompt 3
Modify the Spectron benchmark to compare 8192 token runs against a vanilla scaled dot product attention baseline
Prompt 4
Explain what the Spectron Research and Ethical License lets me do if I want to use this in a commercial product
Prompt 5
Rewrite the long range mixing test in Spectron so it logs cosine similarity at gaps of 64, 256, and 1024

Generated 2026-05-22 · Model: sonnet-4-6 · Verify against the repo before relying on details.