Sync data between MySQL and Hive without writing JSON config files by using the web UI to map columns and generate the configuration automatically.
Schedule incremental data transfers so only new or changed records are copied, saving time on large tables.
Monitor running sync jobs in real time from the browser and stop a job immediately if something goes wrong.
Manage multiple database connections and create sync jobs for many tables at once instead of configuring them one at a time.
DataX-Web requires Java 8, MySQL 5.7, and a working DataX installation.
DataX-Web is a visual management interface built on top of DataX, an open-source data transfer tool originally developed by Alibaba. DataX moves data between different types of databases and storage systems. Configuring a DataX job normally means hand-writing a JSON file, which is tedious and error-prone. DataX-Web replaces that manual step with a web interface where you point and click to set up the same jobs.

You register your databases as named data sources in the interface, pick a source and a destination, and map the columns between them. The tool then generates the required configuration automatically. Supported data sources include common relational databases (MySQL, Oracle, PostgreSQL, SQL Server), Hive (used in big-data pipelines), HBase, MongoDB, and ClickHouse. For relational databases, you can create sync jobs for many tables in one go rather than configuring them one at a time.

Scheduling is built in, so you can set jobs to run on a timer without a separate scheduler. The system supports incremental sync: it tracks which records were added or changed since the last run and moves only those, rather than copying the entire table every time. While a job runs, you can watch its log output in real time from the browser and stop the job from the same page.

The project integrates with xxl-job, an open-source job scheduling system. Executor nodes (the workers that actually run the DataX processes) can be deployed across multiple machines for distributed operation. The interface shows CPU, memory, and load information for each executor so you can see which machines are busy. Beyond DataX tasks, the system can also schedule Shell, Python, and PowerShell scripts.

User accounts have two roles: admin and regular user. Data source credentials are stored in encrypted form. The project is written in Java and requires Java 8 and MySQL 5.7 to run.
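To make the "map columns, generate the configuration" step concrete, here is a minimal Python sketch of the kind of job document such a tool might produce for a MySQL-to-MySQL copy. It is not DataX-Web's actual generator. The helper names (`build_datax_job`, `incremental_where`), the dictionary keys on `source` and `target`, and the example table and column names are all hypothetical. The JSON layout follows the commonly documented shape of DataX's `mysqlreader` and `mysqlwriter` plugins, so check the field names against the DataX plugin documentation before using them.

```python
import json


def incremental_where(column, last_value):
    """Build a filter selecting rows newer than the last synced value.

    Assumption: the tracking column is a monotonically increasing integer key.
    """
    return f"{column} > {int(last_value)}"


def build_datax_job(source, target, column_map, where=None, channels=1):
    """Return a DataX-style job dict copying mapped columns from source to target.

    column_map maps source column names to target column names, in order.
    """
    reader_params = {
        "username": source["username"],
        "password": source["password"],
        "column": list(column_map.keys()),
        "connection": [
            {"table": [source["table"]], "jdbcUrl": [source["jdbc_url"]]}
        ],
    }
    if where:
        # Restricts the read to new rows, which is the core of incremental sync.
        reader_params["where"] = where

    writer_params = {
        "username": target["username"],
        "password": target["password"],
        "column": list(column_map.values()),
        "writeMode": "insert",
        "connection": [
            {"table": [target["table"]], "jdbcUrl": target["jdbc_url"]}
        ],
    }
    return {
        "job": {
            "setting": {"speed": {"channel": channels}},
            "content": [
                {
                    "reader": {"name": "mysqlreader", "parameter": reader_params},
                    "writer": {"name": "mysqlwriter", "parameter": writer_params},
                }
            ],
        }
    }


# Hypothetical connection details, for illustration only.
src = {"username": "reader", "password": "secret", "table": "orders",
       "jdbc_url": "jdbc:mysql://source-host:3306/shop"}
dst = {"username": "writer", "password": "secret", "table": "orders_copy",
       "jdbc_url": "jdbc:mysql://target-host:3306/warehouse"}
mapping = {"id": "order_id", "total": "order_total"}

job = build_datax_job(src, dst, mapping, where=incremental_where("id", 1000))
job_json = json.dumps(job, indent=2)
```

Calling `print(job_json)` shows the document that would otherwise be written by hand. Passing `where=None` gives a full-table copy, while the `where` filter limits the run to rows past the last synced key, which is the idea behind the incremental sync described above.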
Verify against the repo before relying on details.