Train Kolmogorov-Arnold Network models without running out of GPU memory, as can happen with the original pykan library.
Swap a standard PyTorch linear layer for a KAN layer in an existing model to test whether learnable activations improve results.
Experiment with B-spline-based learnable activation functions as an alternative to standard neural network architectures.
Benchmark KAN vs MLP performance on a classification task using a drop-in PyTorch module.
Requires PyTorch. The sparsification regularization differs from the one in the original KAN paper, which may affect interpretability.
This repository is a reimplementation of Kolmogorov-Arnold Networks (KAN), a type of neural network architecture proposed as an alternative to the standard multilayer perceptron. In a standard neural network, activation functions are fixed and applied at each node. In a KAN, the activation functions are learnable and sit on the connections between nodes instead of on the nodes themselves. The learnable functions are built from B-splines, a class of smooth piecewise-polynomial curves.
The motivation for this project is performance. The original KAN implementation (from a separate project called pykan) works correctly but is slow and memory-hungry, because it expands the data into large intermediate tensors to evaluate all the different activation functions at once. This implementation reorganizes the same computation so that it becomes a standard matrix multiplication. Matrix multiplications are highly optimized in PyTorch and on GPUs, so the result is much faster and uses far less memory, while still computing the same thing in both the forward and backward passes.
One feature that could not be kept identical is the sparsification regularization that the original KAN paper describes as important for making the network interpretable. The original method operates on those large intermediate tensors, which conflicts with the memory-efficient reformulation. This implementation substitutes a standard L1 weight regularization, which is compatible with matrix multiplication. The author notes that this difference may affect results and that more experiments are needed to understand the trade-off.
The project is a single Python file using PyTorch and is aimed at researchers and developers experimenting with KAN architectures who need something faster than the reference implementation. A 2024 update changed the weight initialization, which significantly improved accuracy on a standard benchmark dataset.
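
The matrix-multiplication reformulation described above can be illustrated with a minimal sketch. This is not the repository's code: the class name `ToyKANLinear`, its methods `b_splines` and `l1_penalty`, and the parameters `grid_size` and `spline_order` are hypothetical names chosen for illustration. The sketch assumes inputs in [-1, 1), uses a fixed uniform knot grid, and omits everything except the spline term. It evaluates the B-spline basis once per input feature, then contracts it with the coefficients in a single `F.linear` call, and compares that with a naive version that materializes one activation value per (sample, output, input) edge.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyKANLinear(nn.Module):
    """Illustrative KAN-style layer (hypothetical, spline term only)."""

    def __init__(self, in_features, out_features, grid_size=5, spline_order=3):
        super().__init__()
        self.spline_order = spline_order
        # Uniform knots covering [-1, 1], padded by spline_order knots per side.
        h = 2.0 / grid_size
        knots = torch.arange(-spline_order, grid_size + spline_order + 1) * h - 1.0
        self.register_buffer("knots", knots)
        n_basis = grid_size + spline_order
        # One learnable coefficient per (output, input, basis function):
        # each (output, input) pair owns its own learnable 1-D function.
        self.spline_weight = nn.Parameter(
            0.1 * torch.randn(out_features, in_features, n_basis)
        )

    def b_splines(self, x):
        """Cox-de Boor recursion: (batch, in) -> (batch, in, n_basis)."""
        x = x.unsqueeze(-1)
        t = self.knots
        # Degree-0 basis: indicator of the knot interval containing x.
        bases = ((x >= t[:-1]) & (x < t[1:])).to(x.dtype)
        for k in range(1, self.spline_order + 1):
            left = (x - t[: -(k + 1)]) / (t[k:-1] - t[: -(k + 1)])
            right = (t[k + 1 :] - x) / (t[k + 1 :] - t[1:-k])
            bases = left * bases[..., :-1] + right * bases[..., 1:]
        return bases

    def forward(self, x):
        # Flatten (in, n_basis) on both sides so the whole layer is one matmul.
        return F.linear(self.b_splines(x).flatten(1), self.spline_weight.flatten(1))

    def l1_penalty(self):
        # Plain L1 on the coefficients; an assumption standing in for the
        # weight-based regularization, not the repository's exact formula.
        return self.spline_weight.abs().mean(-1).sum()


torch.manual_seed(0)
layer = ToyKANLinear(in_features=4, out_features=3)
x = torch.rand(8, 4) * 2 - 1  # samples in [-1, 1)

y = layer(x)  # matmul formulation, shape (batch, out)

# Naive formulation: build every per-edge activation explicitly.
# The broadcast product has shape (batch, out, in, n_basis), which is the
# kind of large intermediate tensor the matmul version avoids.
per_edge = (layer.b_splines(x).unsqueeze(1) * layer.spline_weight).sum(-1)
y_naive = per_edge.sum(-1)

# Gradients flow through the matmul path as usual.
loss = y.pow(2).mean() + 1e-3 * layer.l1_penalty()
loss.backward()
```

In this sketch both formulations produce the same output up to floating-point error; the difference is only in the size of the intermediates. The penalty here depends on the coefficients alone, which is why it needs no per-edge activations, whereas a penalty defined on the activations themselves would require the `per_edge` tensor.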
The README is short and technical, and assumes familiarity with neural network training concepts.
Repository author: blealtan.
Verify against the repo before relying on details.