online-ml/river

★ 5,814 | Python | Audience · data | Complexity · 3/5 | Setup · easy

Mindmap

mindmap
  root((River))
    What It Does
      Online ML training
      One sample at a time
      Continuous model updates
    Algorithm Types
      Linear models
      Decision trees
      Anomaly detection
      Time-series forecast
    Key Features
      Concept drift detection
      Progressive validation
      Streaming pipelines
    When To Use
      Streaming data
      Shifting distributions
      No batch storage


Things people build with this

USE CASE 1

Build a fraud detection model that updates in real time as each transaction arrives, without storing all past transactions.

USE CASE 2

Forecast time-series values like energy usage or web traffic using a model that adapts as new data streams in.

USE CASE 3

Detect concept drift in a production ML system to know when the data patterns have shifted and the model is going stale.

USE CASE 4

Run anomaly detection on a sensor or log stream where storing a full dataset for batch training is impractical.
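To make the drift-detection use case concrete, here is a minimal plain-Python sketch of the Page-Hinkley test, a classic change detector for streams. It is an illustration under stated assumptions, not River's implementation: the class name, parameter defaults, and the synthetic stream (whose mean jumps from 0 to 1 halfway through) are all hypothetical and chosen only for the example.

```python
import random


class PageHinkley:
    """Minimal Page-Hinkley test for an upward shift in a stream's mean.

    Illustrative sketch only; names and defaults are assumptions.
    """

    def __init__(self, delta=0.005, threshold=5.0, min_samples=30):
        self.delta = delta              # magnitude of change that is tolerated
        self.threshold = threshold      # alarm level for the test statistic
        self.min_samples = min_samples  # warm-up period before alarms are allowed
        self.n = 0
        self.mean = 0.0      # running mean of the stream
        self.cum = 0.0       # cumulative deviation from the running mean
        self.min_cum = 0.0   # smallest cumulative deviation seen so far

    def update(self, value):
        """Consume one value and return True if drift is signalled."""
        self.n += 1
        self.mean += (value - self.mean) / self.n
        self.cum += value - self.mean - self.delta
        self.min_cum = min(self.min_cum, self.cum)
        if self.n < self.min_samples:
            return False
        return self.cum - self.min_cum > self.threshold


def shifting_stream(seed=7):
    """Synthetic stream: the mean is 0 for 500 values, then 1 for 500 values."""
    rng = random.Random(seed)
    for i in range(1000):
        center = 0.0 if i < 500 else 1.0
        yield rng.gauss(center, 0.05)


detector = PageHinkley()
detected_at = None
for i, value in enumerate(shifting_stream()):
    if detector.update(value):
        detected_at = i
        break

print("Drift signalled at index:", detected_at)
```

In a production setting the monitored value would typically be a model's per-sample error rather than a raw feature, so that an alarm indicates the model is going stale. The threshold trades detection delay against false alarms and needs tuning per stream.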

Tech stack

Python

Getting it running

Difficulty · easy | Time to first run · 30 min

Requires Python 3.11 or later. The docs note that most use cases are better served by standard batch learning, so verify the fit before committing.

In plain English

River is a Python library for machine learning on data that arrives as a continuous stream, rather than as a fixed dataset loaded all at once. Most machine learning approaches collect a batch of data, train a model on that batch, and then stop. River works differently: it processes one data point at a time, updating the model with each new observation so the model continuously reflects the most recent information. This approach is called online machine learning, and River is the main Python library dedicated to it. It was created by merging two earlier projects, creme and scikit-multiflow, and is backed by academic researchers as well as practitioners.

River covers a wide range of algorithm types. On the supervised side it includes linear models with many optimizer options, decision trees, random forests, nearest-neighbor methods, and time-series forecasting. On the unsupervised side it includes clustering and anomaly detection. It also provides tools for detecting concept drift, which is what happens when the relationship between inputs and outputs changes over time in a live system, causing an older model to become less accurate.

Beyond the algorithms, River ships utilities for preprocessing data in a streaming context, computing running statistics and metrics, building model pipelines, and validating model performance progressively using the same stream used for training rather than a held-out test set.

River is worth considering when you need a model that does not have to store or revisit past data, when you expect the data distribution to shift over time, or when you want to mirror the event-based structure of a production system during development. The library's own documentation notes that most use cases are better served by standard batch learning, so it is worth being clear about your specific needs before reaching for it. River requires Python 3.11 or later and can be installed via pip with prebuilt wheels for Linux, macOS, and Windows.
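The one-sample-at-a-time loop and progressive validation described above can be sketched in plain Python. This is a conceptual illustration, not River's code: `RunningScaler`, `OnlineLogisticRegression`, and the synthetic stream are hypothetical names invented for the example, and the `learn_one` naming only loosely echoes River's convention. The sketch assumes every sample carries the same feature keys.

```python
import math
import random


class RunningScaler:
    """Standardizes features with a running mean and variance (Welford's method)."""

    def __init__(self):
        self.n = 0
        self.mean = {}
        self.m2 = {}  # running sum of squared deviations per feature

    def learn_one(self, x):
        self.n += 1
        for key, value in x.items():
            mean = self.mean.get(key, 0.0)
            delta = value - mean
            mean += delta / self.n
            self.mean[key] = mean
            self.m2[key] = self.m2.get(key, 0.0) + delta * (value - mean)

    def transform_one(self, x):
        scaled = {}
        for key, value in x.items():
            std = math.sqrt(self.m2.get(key, 0.0) / self.n) if self.n else 0.0
            scaled[key] = (value - self.mean.get(key, 0.0)) / std if std > 0 else 0.0
        return scaled


class OnlineLogisticRegression:
    """Logistic regression trained by one SGD step per sample."""

    def __init__(self, lr=0.1):
        self.lr = lr
        self.weights = {}
        self.bias = 0.0

    def predict_proba_one(self, x):
        z = self.bias + sum(self.weights.get(k, 0.0) * v for k, v in x.items())
        z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-z))

    def learn_one(self, x, y):
        error = self.predict_proba_one(x) - y  # gradient of the log loss
        for key, value in x.items():
            self.weights[key] = self.weights.get(key, 0.0) - self.lr * error * value
        self.bias -= self.lr * error


def synthetic_stream(n, seed=42):
    """Hypothetical labeled stream with a linear decision boundary."""
    rng = random.Random(seed)
    for _ in range(n):
        x = {"a": rng.gauss(0, 1), "b": rng.gauss(5, 2)}
        yield x, (1 if x["a"] + (x["b"] - 5) / 2 > 0 else 0)


def progressive_accuracy(stream):
    """Test-then-train: predict each sample first, then learn from it."""
    scaler, model = RunningScaler(), OnlineLogisticRegression()
    correct = total = 0
    for x, y in stream:
        scaler.learn_one(x)  # uses features only, never the label
        x_scaled = scaler.transform_one(x)
        y_pred = 1 if model.predict_proba_one(x_scaled) >= 0.5 else 0
        correct += int(y_pred == y)
        total += 1
        model.learn_one(x_scaled, y)  # the label is revealed after predicting
    return correct / total


accuracy = progressive_accuracy(synthetic_stream(2000))
print(f"Progressive accuracy: {accuracy:.3f}")
```

Because each prediction is made before the model sees that sample's label, the resulting accuracy is an honest estimate without a held-out test set, and no past samples are stored. Only the running statistics and the weights persist between observations.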

Copy-paste prompts

Prompt 1

Using River, build an online logistic regression classifier that trains on a stream of labeled records one at a time and tracks accuracy progressively.

Prompt 2

I have a live IoT sensor feed. Show me how to use River to detect anomalies in the stream and flag readings that deviate from the expected pattern.

Prompt 3

How do I use River's concept drift detector to alert me when the distribution of incoming data has shifted significantly from what my model was trained on?

Prompt 4

Show me how to build a River pipeline that standardizes numeric features, one-hot encodes categoricals, and feeds the result into an online random forest classifier.

Prompt 5

I want to forecast next-hour web traffic using River's time-series module. Show me a minimal working example.


Verify against the repo before relying on details.