explaingit

weiye-jing/datax-web

5,996 | Java | Audience · data | Complexity · 4/5 | Setup · hard

TLDR

A web interface for Alibaba's DataX data-transfer tool that replaces hand-written JSON config files with a point-and-click UI for setting up, scheduling, and monitoring database sync jobs.

Mindmap

mindmap
  root((datax-web))
    What it does
      visual job builder
      automated scheduling
      incremental sync
      real-time log viewer
    Tech Stack
      Java
      MySQL
      xxl-job
    Data Sources
      MySQL and Oracle
      Hive and HBase
      MongoDB and ClickHouse
    Use Cases
      database sync
      batch table migration
      script scheduling
    Audience
      data engineers
      ops teams


Things people build with this

USE CASE 1

Sync data between MySQL and Hive without writing JSON config files by using the web UI to map columns and generate the configuration automatically.

USE CASE 2

Schedule incremental data transfers so only new or changed records are copied, saving time on large tables.

USE CASE 3

Monitor running sync jobs in real time from the browser and stop a job immediately if something goes wrong.

USE CASE 4

Manage multiple database connections and create sync jobs for many tables at once instead of configuring them one at a time.
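As a rough illustration of what "generating the configuration" for many tables could look like, the sketch below builds one DataX-style job document per table from a column mapping. It is not DataX-Web's actual generator: `build_job`, `build_jobs`, and the `jdbc_url` key are hypothetical names, the connection values are placeholders, and the JSON layout follows the shape commonly documented for DataX's `mysqlreader` and `mysqlwriter` plugins, which should be checked against the DataX plugin docs before use.

```python
import json


def build_job(table, columns, src, dst, channels=1):
    """Return a DataX-style job config (as a dict) that copies one table.

    Assumption: field names mirror the mysqlreader/mysqlwriter plugin layout.
    """
    return {
        "job": {
            "setting": {"speed": {"channel": channels}},
            "content": [
                {
                    "reader": {
                        "name": "mysqlreader",
                        "parameter": {
                            "username": src["username"],
                            "password": src["password"],
                            "column": columns,
                            # The reader side takes a list of JDBC URLs.
                            "connection": [
                                {"table": [table], "jdbcUrl": [src["jdbc_url"]]}
                            ],
                        },
                    },
                    "writer": {
                        "name": "mysqlwriter",
                        "parameter": {
                            "username": dst["username"],
                            "password": dst["password"],
                            "column": columns,
                            # The writer side takes a single JDBC URL string.
                            "connection": [
                                {"table": [table], "jdbcUrl": dst["jdbc_url"]}
                            ],
                        },
                    },
                }
            ],
        }
    }


def build_jobs(tables, src, dst):
    """Build one job config per table; `tables` maps table name -> column list."""
    return {name: build_job(name, cols, src, dst) for name, cols in tables.items()}


# Placeholder connection details, for illustration only.
source = {"username": "reader", "password": "***", "jdbc_url": "jdbc:mysql://src-host:3306/shop"}
target = {"username": "writer", "password": "***", "jdbc_url": "jdbc:mysql://dst-host:3306/shop"}

jobs = build_jobs({"orders": ["id", "total"], "users": ["id", "email"]}, source, target)
print(json.dumps(jobs["orders"], indent=2))
```

Keeping the per-table builder separate from the batch helper mirrors the idea in the use cases above: the same mapping logic is applied once per table instead of being written out by hand each time.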

Tech stack

Java, MySQL, xxl-job

Getting it running

Difficulty · hard | Time to first run · 1h+

Requires Java 8, MySQL 5.7, and a working DataX installation before DataX-Web will run.

License terms are not described in the explanation.

In plain English

DataX-Web is a visual management interface built on top of DataX, an open-source data transfer tool originally developed by Alibaba. DataX moves data between different types of databases and storage systems, but configuring a DataX job normally requires hand-writing a JSON file, which is tedious and error-prone. DataX-Web replaces that manual step with a web interface where you point and click to set up the same jobs.

You connect your databases as named data sources in the interface, then pick a source and a destination and map the columns between them. The tool then generates the required configuration automatically. Supported data sources include common relational databases (MySQL, Oracle, PostgreSQL, SQL Server), Hive (used in big-data pipelines), HBase, MongoDB, and ClickHouse. For relational databases, you can create sync jobs for many tables in one go rather than configuring them one at a time.

Scheduling is built in, so you can set jobs to run on a timer without a separate scheduler. The system supports incremental sync: it tracks which records were added or changed since the last run and moves only those, rather than copying the entire table every time. While a job runs, you can watch the log output in real time from the browser, and you can stop a running job from the same page.

The project integrates with xxl-job, an open-source job scheduling system. Executor nodes (the workers that actually run the DataX processes) can be deployed across multiple machines for distributed operation. The interface shows CPU, memory, and load information for each executor so you can see which machines are busy. Beyond DataX tasks, the system also supports scheduling Shell, Python, and PowerShell scripts.

User accounts have two roles: admin and regular user. Data source credentials are stored in encrypted form. The project is written in Java and requires Java 8 and MySQL 5.7 to run.
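The incremental sync idea can be pictured as a sliding time window: each run selects only rows changed since the previous successful run. The sketch below is a generic illustration of that technique, not DataX-Web's implementation, which this explanation does not describe. The `updated_at` column, the `SyncState` class, and `incremental_where` are hypothetical names, and the filter is built with plain string formatting only because the column name is assumed to come from trusted job configuration.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def incremental_where(column, last_run, now):
    """Build a WHERE fragment selecting rows changed in [last_run, now)."""
    return (
        f"{column} >= '{last_run.strftime(TIME_FORMAT)}' "
        f"AND {column} < '{now.strftime(TIME_FORMAT)}'"
    )


class SyncState:
    """Remembers the upper bound of the last successful sync window."""

    def __init__(self, column, start):
        self.column = column      # hypothetical change-tracking column
        self.last_run = start

    def next_filter(self, now):
        """Return the filter for the next run without advancing the window."""
        return incremental_where(self.column, self.last_run, now)

    def commit(self, now):
        """Advance the window only after the run has succeeded."""
        self.last_run = now


state = SyncState("updated_at", datetime(2024, 1, 1))
run_time = datetime(2024, 1, 2)
print(state.next_filter(run_time))
# updated_at >= '2024-01-01 00:00:00' AND updated_at < '2024-01-02 00:00:00'
state.commit(run_time)
```

Using a half-open window and advancing it only on success means a failed run is retried over the same range, and a row stamped exactly at the boundary is picked up by exactly one run.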

Copy-paste prompts

Prompt 1
I'm using DataX-Web to sync a MySQL table to Hive on a schedule. Walk me through connecting the data sources, mapping the columns, and setting the sync interval.
Prompt 2
My DataX-Web incremental sync job isn't picking up new rows. How does the incremental tracking work and how do I configure the offset condition?
Prompt 3
I want to deploy DataX-Web executor nodes on two separate machines for distributed operation. How do I register additional executors and check their CPU and memory stats in the console?
Prompt 4
A DataX-Web scheduled job silently stopped running. How do I check the real-time log and find out why the task was rejected?
Prompt 5
I need to schedule a Python script to run every night alongside my DataX jobs in DataX-Web. How do I add a script task to the scheduler?

Verify against the repo before relying on details.