Build a visual ETL pipeline that pulls records from a MySQL database, cleans them, and loads them into a data warehouse without writing SQL by hand.
Schedule a nightly data migration job that moves records from a legacy system into a new database using the PDI command-line engine.
Connect multiple heterogeneous data sources (databases, CSV files, web services) and merge them into a unified dataset for reporting.
Extend Pentaho Kettle with a custom plugin step to handle a transformation that the built-in steps do not support.
Requires Java 11 and a full Maven build of multiple modules; building from source takes significant time.
Pentaho Data Integration, also known as Kettle or PDI, is a tool for moving and transforming data between systems. ETL stands for Extract, Transform, Load, which describes the basic idea: pull data out of one place, reshape or clean it, and put it somewhere else. This is a common task when combining data from multiple databases, migrating from one system to another, or preparing raw data for reporting and analysis.

The software has both a visual designer and a command-line engine. Users build data pipelines by dragging and dropping steps in a graphical interface and connecting them into a workflow that processes records row by row. The engine then runs those workflows, which can be scheduled or triggered programmatically. It can connect to databases, flat files, web services, and many other data sources.

This repository is the source code for the open-source community edition of the product. It is organized into several modules: a core library, the main execution engine, an engine extension layer, a database connection dialog, a user interface module, and a plugins folder that extends functionality. The codebase is built with Maven, a Java build tool, and requires Java 11. Developers who want to build it from source run a standard Maven build command. The project includes unit tests and integration tests, and contributors are expected to link each pull request to an issue in the project's Jira tracker. Code style is enforced with a checkstyle configuration included in the project. The community forum for questions and support is hosted on the community site of Hitachi Vantara, which now maintains the project.
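The row-by-row workflow described in this summary can be illustrated with a small, self-contained Java sketch. This is not the Kettle API: the class `RowPipelineSketch`, the `Step` interface, the two example steps, and the `id,name` input format are all hypothetical, chosen only to show how rows pass through a chain of connected steps from extract to load.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Toy illustration of extract -> transform -> load, one row at a time.
// All names here are hypothetical; none of them come from the PDI codebase.
public class RowPipelineSketch {

    /** A step receives one row and returns a row, or empty to drop it. */
    interface Step extends Function<Map<String, String>, Optional<Map<String, String>>> {}

    // Extract: parse one "id,name" line into a row (assumed input format).
    static Map<String, String> parse(String line) {
        String[] parts = line.split(",", -1);
        Map<String, String> row = new LinkedHashMap<>();
        row.put("id", parts[0].trim());
        row.put("name", parts.length > 1 ? parts[1].trim() : "");
        return row;
    }

    // Transform: drop rows that have no name.
    static final Step DROP_BLANK_NAMES =
            row -> row.get("name").isEmpty() ? Optional.empty() : Optional.of(row);

    // Transform: normalise the name to upper case on a copy of the row.
    static final Step UPPERCASE_NAME = row -> {
        Map<String, String> out = new LinkedHashMap<>(row);
        out.put("name", out.get("name").toUpperCase(Locale.ROOT));
        return Optional.of(out);
    };

    // Run every input line through the connected steps, then "load" the
    // surviving rows into an in-memory list standing in for a target table.
    public static List<Map<String, String>> run(List<String> lines) {
        List<Step> steps = List.of(DROP_BLANK_NAMES, UPPERCASE_NAME);
        List<Map<String, String>> target = new ArrayList<>();
        for (String line : lines) {
            Optional<Map<String, String>> row = Optional.of(parse(line));
            for (Step step : steps) {
                row = row.flatMap(step); // a dropped row skips the remaining steps
            }
            row.ifPresent(target::add);
        }
        return target;
    }

    public static void main(String[] args) {
        // prints [{id=1, name=ALICE}, {id=3, name=BOB}]
        System.out.println(run(List.of("1, alice ", "2,", "3,bob")));
    }
}
```

In the real product the steps are configured visually and executed by the engine; the sketch only mirrors the idea that each record flows through the connected steps in order and may be modified or filtered out along the way.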
Verify against the repo before relying on details.