feast-dev/feast

★ 7,026 · Python · Audience · data · Complexity · 4/5 · License · Apache 2.0 · Setup · moderate

Mindmap

mindmap
  root((feast))
    What it does
      Store ML features
      Prevent data leakage
      Sync train and serve
    Storage
      Offline store
      Online store
      Historical datasets
    Data Sources
      Snowflake
      BigQuery
      Parquet files
    Use Cases
      Training pipelines
      Real-time prediction
      Feature browsing
    Audience
      ML engineers
      Data scientists


Things people build with this

USE CASE 1

Build a single feature repository that both your training pipeline and production API read from, eliminating training-serving skew.

USE CASE 2

Generate point-in-time correct training datasets that prevent data leakage by looking up feature values as they existed at a specific historical moment.

USE CASE 3

Register and browse ML features through a built-in web UI so your team knows what features exist and where they come from.
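The point-in-time lookup behind use case 2 can be pictured with a small sketch. This is not Feast's implementation or API, only a plain-Python illustration of the idea; the `history` data and the `value_as_of` helper are hypothetical names made up for this example.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical history of one feature for one user, sorted by timestamp.
history = [
    (datetime(2024, 1, 1), 10.0),
    (datetime(2024, 1, 5), 12.5),
    (datetime(2024, 1, 9), 20.0),
]


def value_as_of(history, event_time):
    """Return the latest value recorded at or before event_time, or None."""
    timestamps = [ts for ts, _ in history]
    # bisect_right counts the entries with timestamp <= event_time.
    idx = bisect_right(timestamps, event_time)
    if idx == 0:
        return None  # no value existed yet at that moment
    return history[idx - 1][1]


# A training example dated Jan 6 must see the Jan 5 value, not the Jan 9 one.
label_time = datetime(2024, 1, 6)
training_value = value_as_of(history, label_time)
```

Using the Jan 9 value for a Jan 6 training example would be exactly the leakage the use case describes: the model would train on information that did not exist when the prediction would have been made.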

Tech stack

Python · Snowflake · BigQuery · Redshift · PostgreSQL · Parquet

Getting it running

Difficulty · moderate · Time to first run · 30 min

Requires an external data source such as Snowflake, BigQuery, or Parquet files plus a configured online store for production use.

Use freely for any purpose, including commercial use, as long as you keep the copyright notice.

In plain English

Feast is an open source feature store for machine learning, written in Python. In machine learning, a "feature" is a piece of data used to train a model or make predictions, such as a user's average purchase amount or how recently they logged in. Feast is the system that keeps those features organized, consistent, and available whether you are training a model on historical data or serving predictions in real time.

The main problem Feast solves is consistency between training and serving. Without a dedicated feature store, teams often compute the same numbers in two separate places: once for training, once for production. The values can drift apart, quietly degrading model quality. Feast fixes this by acting as a single source of truth. It maintains an offline store for processing large amounts of historical data (used during training) and a low-latency online store for fetching features quickly at prediction time.

Feast also protects against a subtle and costly error called data leakage, where information from the future accidentally gets included in training data. It does this by generating point-in-time correct datasets: when you ask for a feature value, Feast looks up the value that existed at that specific moment in history, not a later one.

Getting started involves installing the Python package, creating a feature repository with a single command, and defining your features as configuration files. From there you register them with feast apply, load historical data to build training sets, push current values to the online store, and read them back at low latency in production. A built-in web UI lets you browse and explore registered features.

Feast connects to many common data sources including Snowflake, BigQuery, Redshift, Postgres, and Parquet files. Community plugins extend support further. The project is Apache 2.0 licensed and actively maintained with a public roadmap that includes vector search support for AI workloads.
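The split between an offline store and an online store described above can be pictured with a toy sketch. It does not use Feast or mirror its API; `offline_rows`, `materialize_latest`, and `read_online` are hypothetical names, and the data is invented for illustration. The assumption is simply that the offline side keeps every historical row while the online side keeps only the newest row per entity.

```python
from datetime import datetime

# Toy "offline store": the full history of feature rows (hypothetical data).
offline_rows = [
    {"user_id": 1, "event_timestamp": datetime(2024, 1, 1), "avg_purchase": 10.0},
    {"user_id": 1, "event_timestamp": datetime(2024, 1, 5), "avg_purchase": 12.5},
    {"user_id": 2, "event_timestamp": datetime(2024, 1, 3), "avg_purchase": 7.0},
]


def materialize_latest(rows, entity_key="user_id", ts_key="event_timestamp"):
    """Build a toy "online store": the newest row for each entity."""
    online = {}
    for row in rows:
        key = row[entity_key]
        current = online.get(key)
        if current is None or row[ts_key] > current[ts_key]:
            online[key] = row
    return online


def read_online(store, user_id, feature_names):
    """Fetch current feature values for one entity; None if it is unknown."""
    row = store.get(user_id)
    return {name: (row[name] if row else None) for name in feature_names}


online_store = materialize_latest(offline_rows)
features = read_online(online_store, 1, ["avg_purchase"])
```

Because both the training history and the serving lookup come from the same rows, the value served for a user is always one that also appears in the training data's history, which is the consistency the section describes.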

Copy-paste prompts

Prompt 1

Show me how to define a Feast feature view using a Parquet file as the data source and retrieve features for a list of user IDs for model training.

Prompt 2

I have a scikit-learn model in production. Walk me through using Feast to push feature values to the online store and fetch them at prediction time with low latency.

Prompt 3

How do I use Feast to create a point-in-time correct training dataset that prevents future data from leaking into my historical training examples?

Prompt 4

Set up a Feast feature repository with Snowflake as the offline store and Redis as the online store, and show me the feast apply command to register features.


Verify against the repo before relying on details.