Forecast weekly sales figures for thousands of products at once without waiting hours for results.
Automatically find the best ARIMA or ETS model for a time series without manually tuning parameters.
Detect anomalies in time series data such as unusual spikes in website traffic or energy consumption.
Scale a forecasting pipeline across a cluster with Ray or Spark when data volumes are too large for one machine.
Requires familiarity with pandas DataFrames and time series concepts. Distributed backends need Spark, Dask, or Ray configured separately.
StatsForecast is a Python library for predicting future values in time series data using established statistical methods. A time series is any sequence of measurements recorded over time: sales figures by week, electricity usage by hour, website traffic by day. Statistical forecasting models try to find patterns in that history and extend them forward. StatsForecast packages many of these well-known models together and focuses on making them run much faster than previous Python implementations.

The library includes automatic versions of several classic forecasting approaches. AutoARIMA, for example, searches for the best configuration of a model family called ARIMA (which stands for Autoregressive Integrated Moving Average) by testing different parameter combinations and picking the one that fits the data best. Similar auto-selection versions exist for ETS (a family of exponential smoothing models), Theta, and CES. For situations where you have a rough idea of what you want, manual versions of each model are also available. There is also support for time series with multiple seasonal patterns, anomaly detection, and incorporating external variables like weather or pricing.

Speed is a central claim of the project. The README states that the AutoARIMA implementation is roughly 20 times faster than a comparable Python library called pmdarima and about 500 times faster than Facebook's Prophet. For very large workloads, the library integrates with distributed computing frameworks including Spark, Dask, and Ray, which lets it split work across many machines. The README includes a benchmark showing one million time series processed in around 30 minutes using Ray.

The library uses the same interface style as scikit-learn, a well-known Python machine learning library, so anyone familiar with that pattern will recognize the fit and predict calls. It is available through PyPI and conda-forge, the two main Python package distribution channels.
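The idea behind the automatic models described above (try several parameter settings, keep the one that fits the history best) can be illustrated with a small standard-library sketch. This is not StatsForecast's code or API: it uses simple exponential smoothing, the most basic member of the exponential smoothing family, with a plain grid search over one parameter. The function names `ses_sse`, `ses_forecast`, and `auto_ses` are hypothetical and chosen only for this example. The real AutoARIMA and AutoETS search much richer model families.

```python
from typing import List, Optional, Sequence, Tuple


def ses_sse(y: Sequence[float], alpha: float) -> float:
    """Sum of squared one-step-ahead errors for simple exponential smoothing."""
    level = y[0]          # start the smoothed level at the first observation
    sse = 0.0
    for obs in y[1:]:
        err = obs - level         # the forecast for this step is the current level
        sse += err * err
        level += alpha * err      # move the level toward the new observation
    return sse


def ses_forecast(y: Sequence[float], alpha: float) -> float:
    """Flat forecast: the smoothed level after seeing the whole series."""
    level = y[0]
    for obs in y[1:]:
        level += alpha * (obs - level)
    return level


def auto_ses(
    y: Sequence[float], grid: Optional[List[float]] = None
) -> Tuple[float, float]:
    """Pick the alpha with the lowest in-sample error (illustrative only)."""
    if grid is None:
        grid = [i / 20 for i in range(1, 20)]   # candidates 0.05, 0.10, ..., 0.95
    best_alpha = min(grid, key=lambda a: ses_sse(y, a))
    return best_alpha, ses_forecast(y, best_alpha)


if __name__ == "__main__":
    # A series with a sudden level shift favours a fast-reacting (high) alpha.
    history = [10.0] * 5 + [20.0] * 5
    alpha, forecast = auto_ses(history)
    print(f"selected alpha={alpha}, next-step forecast={forecast:.3f}")
```

Scoring every candidate requires a full pass over the series, which is why implementation speed matters once the same search has to run over thousands or millions of series.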
Verify against the repo before relying on details.