Build a fraud detection model that updates in real time as each transaction arrives, without storing all past transactions.
Forecast time-series values like energy usage or web traffic using a model that adapts as new data streams in.
Detect concept drift in a production ML system to know when the data patterns have shifted and the model is going stale.
Run anomaly detection on a sensor or log stream where storing a full dataset for batch training is impractical.
Requires Python 3.11 or later. The docs note that most use cases are better served by standard batch learning, so verify the fit before committing.
River is a Python library for machine learning on data that arrives as a continuous stream, rather than as a fixed dataset loaded all at once. Most machine learning approaches collect a batch of data, train a model on that batch, and then stop. River works differently: it processes one data point at a time, updating the model with each new observation so the model continuously reflects the most recent information. This approach is called online machine learning, and River is the main Python library dedicated to it. It was created by merging two earlier projects, creme and scikit-multiflow, and is backed by academic researchers as well as practitioners.

River covers a wide range of algorithm types. On the supervised side it includes linear models with many optimizer options, decision trees, random forests, nearest-neighbor methods, and time-series forecasting. On the unsupervised side it includes clustering and anomaly detection. It also provides tools for detecting concept drift, which is what happens when the relationship between inputs and outputs changes over time in a live system, causing an older model to become less accurate.

Beyond the algorithms, River ships utilities for preprocessing data in a streaming context, computing running statistics and metrics, building model pipelines, and validating model performance progressively using the same stream used for training rather than a held-out test set.

River is worth considering when you need a model that does not have to store or revisit past data, when you expect the data distribution to shift over time, or when you want to mirror the event-based structure of a production system during development. The library's own documentation notes that most use cases are better served by standard batch learning, so it is worth being clear about your specific needs before reaching for it. River requires Python 3.11 or later and can be installed via pip with prebuilt wheels for Linux, macOS, and Windows.
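To make the one-observation-at-a-time idea and progressive validation concrete, here is a minimal sketch in plain Python. It does not use River itself: the `OnlineLogisticRegression` class, the `synthetic_stream` generator, and the `progressive_validation` helper are hypothetical names written for illustration. Only the method names `learn_one` and `predict_one` are chosen to mirror River's naming convention. Each observation is used for a prediction first and for a model update second, then discarded.

```python
import math
import random


class OnlineLogisticRegression:
    """Hypothetical minimal online classifier; not River's implementation."""

    def __init__(self, lr=0.1):
        self.lr = lr
        self.weights = {}  # feature name -> weight, grows as features appear
        self.bias = 0.0

    def predict_proba_one(self, x):
        z = self.bias + sum(self.weights.get(k, 0.0) * v for k, v in x.items())
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well behaved
        return 1.0 / (1.0 + math.exp(-z))

    def predict_one(self, x):
        return self.predict_proba_one(x) >= 0.5

    def learn_one(self, x, y):
        # One stochastic gradient step on the log loss for this observation.
        error = self.predict_proba_one(x) - float(y)
        for k, v in x.items():
            self.weights[k] = self.weights.get(k, 0.0) - self.lr * error * v
        self.bias -= self.lr * error


def synthetic_stream(n, seed=0):
    """Yield (features, label) pairs one at a time; nothing is stored."""
    rng = random.Random(seed)
    for _ in range(n):
        x = {"a": rng.uniform(-1, 1), "b": rng.uniform(-1, 1)}
        yield x, x["a"] + x["b"] > 0


def progressive_validation(model, stream):
    """Predict on each observation before learning from it."""
    correct = 0
    seen = 0
    for x, y in stream:
        correct += model.predict_one(x) == y  # test first ...
        seen += 1
        model.learn_one(x, y)                 # ... then train
    return correct / seen if seen else 0.0


model = OnlineLogisticRegression(lr=0.5)
accuracy = progressive_validation(model, synthetic_stream(2000))
print(f"progressive accuracy: {accuracy:.3f}")
```

Because every prediction is made before the model has seen that observation's label, the resulting accuracy is an out-of-sample estimate even though no separate test set exists. With River installed, the same loop shape would apply to its own estimators; check the library's documentation for the exact classes and metrics.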
Verify against the repo before relying on details.