Get a quick overview of the entire data engineering landscape before deciding which skills to learn first
Identify gaps in your current data engineering knowledge by scanning the roadmap categories
Use the diagram to explain the data engineering field to a manager or non-technical stakeholder
Find the names of tools in areas like orchestration, storage formats, or query engines to research further
data-engineer-roadmap is a visual reference guide showing the tools, technologies, and concepts a person would need to learn to work as a data engineer. Data engineering is the discipline of building and maintaining the pipelines that move, store, clean, and prepare data so that analysts, dashboards, and machine learning systems can use it. This roadmap attempts to map out that entire field in a single diagram, presented as a large image hosted in the repository.

The roadmap covers the modern data engineering landscape as of 2021, grouping topics across areas such as cloud platforms, data pipeline tools, storage formats, orchestration systems, query engines, and programming languages. A text version of the diagram is included in the repository for users who cannot view the image. There is also a separate extras diagram covering additional tools that are useful to know but not strictly required for most roles.

The README includes a note for beginners: a working data engineer would typically master only a subset of these tools over several years, shaped by the company they work for and the kinds of problems they encounter. The diagram is intended as a map of the overall landscape, not a checklist to complete before getting started. The README itself is sparse and the main content is the roadmap image, which the README links to but does not describe in text.

The project was created by datastack.tv, a learning platform that produces screencast tutorials for data engineers. Community suggestions and pull requests are welcome.
Verify against the repo before relying on details.