explaingit

blue-yonder/tsfresh

9,210 | Jupyter Notebook | Audience · data | Complexity · 3/5 | Setup · easy

TLDR

A Python package that automatically extracts hundreds of statistical features from time series data and filters out irrelevant ones, cutting manual feature engineering work for machine learning.

Mindmap

mindmap
  root((repo))
    What it does
      Extracts time series features
      Automates feature engineering
      Filters irrelevant features
    Input Types
      Sensor readings
      Stock prices
      Patient vitals
    Capabilities
      Parallel processing
      Variable length series
      Statistical hypothesis filter
    Integration
      scikit-learn compatible
      pip install
      Jupyter notebook examples

Things people build with this

USE CASE 1

Automatically generate hundreds of candidate features from sensor readings or stock prices to feed into a scikit-learn classifier without hand-crafting each metric.

USE CASE 2

Reduce manual feature engineering by letting tsfresh extract and statistically filter time series features before training a predictive model.

USE CASE 3

Process large collections of time series in parallel across multiple CPU cores when the dataset is too big for single-threaded extraction.

Tech stack

Python · Jupyter Notebook · scikit-learn

Getting it running

Difficulty · easy | Time to first run · 30 min

Parallel extraction across many long series can require significant RAM; check available memory before running on large datasets.

In plain English

TSFRESH is a Python package that automatically extracts large numbers of descriptive characteristics from time series data. A time series is any sequence of measurements recorded over time, such as sensor readings, stock prices, or patient vital signs. Instead of a data scientist manually deciding which properties of those sequences to compute (averages, peaks, patterns of change), TSFRESH does that extraction automatically, producing hundreds of potential features from each input series.

The motivation is to reduce the manual work involved in preparing data for machine learning. Before training a model to classify or predict something from time series, someone typically needs to turn the raw sequence into a set of numbers that a model can work with. TSFRESH automates that step. It applies methods from statistics, signal processing, and time series analysis to each input and produces a table of measurements that can then be fed directly into standard machine learning libraries, including scikit-learn.

Because hundreds of automatically generated features are likely to include many that are irrelevant for any particular task, TSFRESH also includes a filtering step. This step tests each feature statistically to determine how much it actually explains the outcome you are trying to predict, and removes features that do not carry useful information. The filtering method is grounded in hypothesis testing theory and is described in academic papers cited in the README.

The package supports parallel processing so that extraction across large numbers of time series can be distributed across multiple CPU cores or machines. It works with time series of different lengths, which is useful when recordings in a dataset are not all the same duration.

Installation is through pip. Documentation is hosted on Read the Docs, and Jupyter notebook examples are available through the repository.
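
To make the filtering idea concrete, here is a standard-library sketch of one way to decide which features to keep once each has a p-value from its own significance test. It assumes the Benjamini-Yekutieli procedure for controlling the false discovery rate, which is, as far as I know, the correction tsfresh's documentation describes; this is not tsfresh's own code, and the feature names and p-values are made up for illustration.

```python
def benjamini_yekutieli(p_values, fdr_level=0.05):
    """Return the feature names kept at the given false discovery rate.

    p_values maps a feature name to the p-value of its relevance test.
    """
    m = len(p_values)
    # Harmonic-sum correction that makes the procedure valid
    # even when the individual tests are dependent.
    c_m = sum(1.0 / i for i in range(1, m + 1))
    ranked = sorted(p_values.items(), key=lambda item: item[1])

    # Step-up rule: find the largest rank whose p-value is under its threshold.
    cutoff_rank = 0
    for rank, (_, p) in enumerate(ranked, start=1):
        if p <= rank / (m * c_m) * fdr_level:
            cutoff_rank = rank

    return [name for name, _ in ranked[:cutoff_rank]]


# Hypothetical p-values, one per extracted feature.
p_values = {
    "mean": 0.0001,
    "maximum": 0.004,
    "variance": 0.03,
    "length": 0.6,
    "median": 0.0005,
}

kept = benjamini_yekutieli(p_values)
print(kept)  # ['mean', 'median', 'maximum']
```

Note that `variance` is dropped even though its p-value is below 0.05: with several features tested at once, the threshold for each rank is stricter than a single-test cutoff.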

Copy-paste prompts

Prompt 1
I have a pandas DataFrame with columns [id, time, value] representing sensor readings. Show me how to use tsfresh to extract features and filter them to predict equipment failure.
Prompt 2
How do I configure tsfresh to run feature extraction in parallel across all CPU cores? Show me the relevant parameter and how to pass it to extract_features().
Prompt 3
I want to combine tsfresh with scikit-learn in a Pipeline. Show me how to wrap tsfresh feature extraction with a RandomForestClassifier for time series classification.

Verify against the repo before relying on details.