explaingit

kedro-org/kedro

10,863 · Python · Audience: data · Complexity: 3/5 · License: Apache 2.0 · Setup: moderate

TLDR

Python framework that turns messy data science notebooks into organized, reusable pipelines with a standard project layout, a config-driven data catalog, and automatic step ordering based on how your functions connect.

Mindmap

mindmap
  root((kedro))
    What it does
      Structured pipelines
      Data catalog
      Step ordering
    Key concepts
      Project template
      Node functions
      Dataset connectors
    Visualization
      Kedro-Viz diagram
      Pipeline explorer
    Deployment
      Local machine
      Kubeflow Argo
      AWS Batch Databricks
    Audience
      Data engineers
      Data scientists
      ML teams

Things people build with this

USE CASE 1

Refactor a one-off Jupyter notebook data pipeline into a testable, team-shareable Kedro project.

USE CASE 2

Connect a pipeline to multiple data sources (local CSVs, S3, databases) without hardcoding paths in your code.

USE CASE 3

Visualize how data flows between pipeline steps using Kedro-Viz to explain the process to non-technical teammates.

USE CASE 4

Deploy the same pipeline to AWS Batch or Databricks by swapping the runner without rewriting the pipeline logic.

Tech stack

Python · Jupyter · Argo · Prefect · Kubeflow · AWS Batch · Databricks

Getting it running

Difficulty · moderate · Time to first run · 30 min

Installing with pip install kedro is straightforward, but expect to spend 20-30 minutes learning the project template and catalog config before your first pipeline runs.

Released under the Apache 2.0 license: use freely for any purpose, including commercial projects.

In plain English

Kedro is an open-source Python framework for building data engineering and data science pipelines in a structured, reusable way. It was created to address the common problem that data science work often starts as messy Jupyter notebooks or one-off scripts that become hard to maintain, share, or move into production. Kedro brings software engineering practices to data work so that pipelines are easier to understand, test, and reuse across a team.

The main building blocks Kedro provides are a project template, a data catalog, and a pipeline abstraction. The project template gives you a standard folder structure so that new projects start consistently. The data catalog is a configuration-driven system for connecting to different data sources and destinations, including local files, cloud storage, databases, and other formats, without scattering connection details through your code. The pipeline abstraction lets you write your data processing steps as ordinary Python functions and then declare how they connect to each other. Kedro resolves the execution order automatically based on those connections.

Kedro also includes an optional visualization tool called Kedro-Viz that generates an interactive diagram of your pipeline, showing how data flows between steps. This can be useful for communicating what a pipeline does to teammates who did not write it.

On the deployment side, Kedro supports running pipelines on a single machine or distributed across clusters. It integrates with orchestration platforms including Argo, Prefect, Kubeflow, AWS Batch, and Databricks.

Kedro is hosted by the LF AI and Data Foundation, an organization that provides neutral governance for open-source AI and data projects. The code is released under the Apache 2.0 license. It supports Python 3.10 through 3.14 and can be installed via pip or conda in a few commands.

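
To make the idea of a configuration-driven data catalog concrete, here is a minimal sketch in plain Python. It is not Kedro's API: `MiniCatalog`, `LOADERS`, and the config keys `type` and `filepath` are hypothetical names chosen for illustration. The point it shows is that processing code asks for a dataset by name, while the path and format live in configuration.

```python
import csv
import json
import tempfile
from pathlib import Path


def load_csv(path):
    """Read a CSV file into a list of dicts, one per row."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def load_json(path):
    """Read a JSON file into Python objects."""
    with open(path) as f:
        return json.load(f)


# Hypothetical registry mapping a config "type" string to a loader function.
LOADERS = {"csv": load_csv, "json": load_json}


class MiniCatalog:
    """Toy stand-in for a data catalog (an assumption, not Kedro's class)."""

    def __init__(self, config):
        self._config = config

    def load(self, name):
        # Look up where and how to load the dataset from config alone.
        entry = self._config[name]
        loader = LOADERS[entry["type"]]
        return loader(entry["filepath"])


# Create a small CSV file so the example is self-contained.
tmp_dir = Path(tempfile.mkdtemp())
csv_path = tmp_dir / "customers.csv"
csv_path.write_text("id,name\n1,Ada\n2,Grace\n")

# In a real project this mapping would live in a config file, not in code.
catalog_config = {"customers": {"type": "csv", "filepath": str(csv_path)}}

catalog = MiniCatalog(catalog_config)
customers = catalog.load("customers")
print(customers)
```

Pointing the pipeline at a different file or format then means editing the config entry, not the code that calls `catalog.load("customers")`.
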
The README describes the project as coming out of real-world experience building machine-learning applications with large, messy datasets, and the problems that approach revealed when working in teams.
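
The automatic step ordering described above can be sketched with the standard library alone. This is an illustration of the general technique (a topological sort over declared inputs and outputs), not Kedro's actual implementation. The `steps` tuples and the `run` helper are hypothetical.

```python
from graphlib import TopologicalSorter


def clean(raw):
    """Drop missing values."""
    return [x for x in raw if x is not None]


def total(cleaned):
    """Sum the cleaned values."""
    return sum(cleaned)


def report(cleaned, total_value):
    """Build a short text summary."""
    return f"{len(cleaned)} rows, total {total_value}"


# Each step is (function, input names, output name), listed out of order
# on purpose: the ordering is derived from the names, not from this list.
steps = [
    (report, ["cleaned", "total_value"], "summary"),
    (total, ["cleaned"], "total_value"),
    (clean, ["raw"], "cleaned"),
]


def run(steps, data):
    """Run steps in dependency order; return the order used and all data."""
    data = dict(data)
    # Map each output name to the index of the step that produces it.
    producers = {output: i for i, (_, _, output) in enumerate(steps)}
    # For each step, collect the steps whose outputs it consumes.
    graph = {
        i: {producers[name] for name in inputs if name in producers}
        for i, (_, inputs, _) in enumerate(steps)
    }
    order = []
    for i in TopologicalSorter(graph).static_order():
        func, inputs, output = steps[i]
        data[output] = func(*(data[name] for name in inputs))
        order.append(func.__name__)
    return order, data


order, results = run(steps, {"raw": [3, None, 4]})
print(order)
print(results["summary"])
```

Because each step only names what it consumes and produces, adding a step does not require renumbering or reordering the existing ones.
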

Copy-paste prompts

Prompt 1
I have a messy Jupyter notebook that reads a CSV, cleans it, and trains a model. Rewrite it as a Kedro pipeline with a data catalog entry for the CSV.
Prompt 2
How do I add an S3 dataset to my Kedro data catalog so my pipeline can read from it without any AWS credentials in the code?
Prompt 3
Show me how to split a large Kedro pipeline into modular pipelines so different team members can work on separate parts.
Prompt 4
I want to deploy my Kedro pipeline to AWS Batch. What changes do I need to make to my project?

Verify against the repo before relying on details.