explaingit

andkret/cookbook

Analysis updated 2026-06-24

Stars · 15,082 Language · Python Audience · data Complexity · 2/5 Setup · easy

TLDR

The Data Engineering Cookbook is a markdown book by Andreas Kretz that teaches how to build data pipelines, covering Linux, Git, Docker, cloud, big data platforms, and interview prep.

Mindmap

mindmap
  root((Data Engineering Cookbook))
    Content
      Platform blueprint
      Roadmaps per role
      Skills matrix
    Basic skills
      Coding and Git
      Linux and shell
      Docker and Kubernetes
      Cloud IaaS PaaS SaaS
    Advanced skills
      Big data 4 Vs
      Beyond classical ETL
      Platform planning
    Extras
      130+ data sources
      1000+ interview questions
      Books and podcasts
    Format
      Markdown sections folder

What do people build with it?

USE CASE 1

Follow a role-specific learning roadmap to move from analyst or software engineer into data engineering.

USE CASE 2

Use the skills matrix as a self-assessment when preparing for a data engineering job change.

USE CASE 3

Practice with the 1,000+ interview questions before a technical screen.

USE CASE 4

Borrow the Connect/Buffer/Process/Store/Visualize blueprint when planning a real data platform.
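
As a rough illustration of how the five blueprint stages hand data to one another, here is a toy in-memory sketch. It is not taken from the cookbook: the function names, the sensor-style events, and the dictionary standing in for a store are all assumptions made for the example. A real platform would put a message queue, a processing framework, and a database in these positions.

```python
from collections import deque
from statistics import mean


def connect(raw_events):
    """Connect: accept events from a source (here, just an iterable)."""
    for event in raw_events:
        yield event


def buffer(events):
    """Buffer: queue events so the source and the processor are decoupled."""
    queue = deque(events)
    while queue:
        yield queue.popleft()


def process(events):
    """Processing: average the readings per sensor (hypothetical workload)."""
    grouped = {}
    for event in events:
        grouped.setdefault(event["sensor"], []).append(event["value"])
    return {sensor: mean(values) for sensor, values in grouped.items()}


def store(result, database):
    """Store: persist results (a dict stands in for a real database)."""
    database.update(result)
    return database


def visualize(database):
    """Visualize: render stored results as simple text lines."""
    return [f"{sensor}: {value:.1f}" for sensor, value in sorted(database.items())]


def run_pipeline(raw_events):
    """Chain the five stages in blueprint order."""
    database = {}
    store(process(buffer(connect(raw_events))), database)
    return visualize(database)
```

Calling `run_pipeline` with a list of dictionaries that each carry a `sensor` and a `value` key returns one formatted line per sensor. Keeping each stage as its own function mirrors the blueprint's idea that the stages can be planned, and later swapped, independently.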

What is it built with?

Markdown

How does it compare?

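| | andkret/cookbook | budtmo/docker-android | rossant/awesome-math |
| --- | --- | --- | --- |
| Stars | 15,082 | 15,091 | 15,097 |
| Language | Python | Python | Python |
| Last pushed | 2026-05-19 | | |
| Maintenance | Maintained | | |
| Setup difficulty | easy | hard | easy |
| Complexity | 2/5 | 3/5 | 1/5 |
| Audience | data | developer | researcher |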

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · easy Time to first run · 5min

Read the markdown files in the sections folder directly on GitHub, no install needed.
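
For readers who prefer a local copy, the following is a minimal sketch that lists the markdown files in the sections folder of a clone. It assumes the repository has already been cloned and that the chapters sit directly under a folder named `sections`, as described on this page. The function name `list_section_files` is made up for the example.

```python
from pathlib import Path


def list_section_files(repo_root):
    """Return the markdown file names under <repo_root>/sections, sorted by name."""
    sections_dir = Path(repo_root) / "sections"
    if not sections_dir.is_dir():
        # No sections folder at this path (assumed layout not present).
        return []
    return sorted(path.name for path in sections_dir.glob("*.md"))
```

Pass the path of the local clone as `repo_root`. An empty list means the path has no `sections` folder or the folder holds no markdown files.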

License is not stated in the explanation.

In plain English

The Data Engineering Cookbook is a long, structured set of notes for people who want to learn data engineering, written by Andreas Kretz and kept as a single repository on GitHub. Data engineering, as the cookbook frames it, is the discipline of building the plumbing that moves data into and around the systems where data scientists and analysts use it. The repository itself is the book, split across many markdown files under a sections folder. The contents are organised as a curriculum rather than as software.

The Introduction explains what a data engineer does, sketches a Data Science Platform Blueprint with stages for Connect, Buffer, Processing Framework, Store, and Visualize, and offers separate learning roadmaps for beginners, data analysts, data scientists, and software engineers. A Skills Matrix and a section on becoming a senior data engineer round out the opening chapter.

The Basic Engineering Skills section covers a wide range of background topics: learning to code, getting comfortable with Git, agile development (including Scrum and OKR), software engineering culture, how a computer works, networking, security topics like SSL keys, JSON Web Tokens, and GDPR, Linux basics with shell scripting and cron jobs, Docker (including Kubernetes orchestration), and cloud concepts such as IaaS versus PaaS versus SaaS, the major providers, on-premises trade-offs, and hybrid setups. The Advanced Engineering Skills section continues with the data science platform itself, the four Vs of big data, planning, and a discussion of the limitations of traditional ETL.

Beyond the skills chapters, the repository lists free hands-on courses and tutorials, real-world case studies, best practices for cloud platforms, a list of 130+ data sources for data science work, more than 1,000 interview questions, and a section of recommended books, courses, and podcasts. A separate Updates file acts as a change log.

The README also points to the author's wider work: a YouTube channel, a Twitter account, an Amazon shop with podcast gear, and a paid online academy at learndataengineering.com that includes courses, a certification, and a Discord community. Contributions to the cookbook itself are welcome, and the README links to a contributing section near the bottom.

Copy-paste prompts

Prompt 1
Read andkret/Cookbook and turn the beginner roadmap into a 90-day study plan with one concrete task per day.
Prompt 2
Summarise the difference between IaaS, PaaS, and SaaS as explained in andkret/Cookbook with one cloud-provider example for each.
Prompt 3
Pull 50 of the most-asked SQL interview questions from andkret/Cookbook and group them by topic (joins, windows, aggregation).
Prompt 4
Map the Connect/Buffer/Processing/Store/Visualize blueprint to a real stack with Kafka, Spark, S3, and Snowflake.
Prompt 5
Compare the Docker and Kubernetes chapter in andkret/Cookbook with the official Kubernetes docs and list what is missing.

Frequently asked questions

What is cookbook?

The Data Engineering Cookbook is a markdown book by Andreas Kretz that teaches how to build data pipelines, covering Linux, Git, Docker, cloud, big data platforms, and interview prep.

What language is cookbook written in?

GitHub's metadata lists Python as the main language, but the book itself is written in Markdown.

What license does cookbook use?

License is not stated in the explanation.

How hard is cookbook to set up?

Setup difficulty is rated easy, with roughly 5 minutes to get started: the markdown files can be read directly on GitHub, with no install needed.

Who is cookbook for?

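Mainly people who work with data: beginners, data analysts, data scientists, and software engineers who want to move into data engineering.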

Verify against the repo before relying on details.