DataX is an open-source data synchronisation tool from Alibaba. It is the open-source version of Alibaba Cloud's DataWorks data integration service and is widely used inside Alibaba for moving data in bulk between different storage systems. The problem it solves is a common one: copying tables, files or events from one database or storage system to another, especially when the source and destination speak different protocols.

The design is plugin-based. DataX itself is a synchronisation framework, and each kind of data source is supported by a pair of plugins: a Reader that pulls data out of the source and a Writer that pushes data into the destination. Because every connector is just a plugin, adding support for a new data source automatically makes it interoperable with all the existing ones, with no need to write a special pipeline for each source and destination pair.

Out of the box, DataX ships with plugins for relational databases such as MySQL, Oracle, OceanBase, SQL Server, PostgreSQL, DRDS and Kingbase; for big-data and warehouse systems such as HDFS, Hive, HBase, MaxCompute (ODPS), Hologres and ADS; for Alibaba Cloud storage and middleware such as OSS, OCS, DataHub and SLS; for graph databases such as Alibaba's GDB and Neo4j; and for newer engines such as Databend.

In practice, DataX is a fit when you need to do a one-off migration of a whole database, set up a recurring batch upload of operational data into a data warehouse, or pull data from many different systems into a single analysis environment. DataX is written in Java. A commercial managed version with real-time capabilities is offered separately as part of Alibaba Cloud DataWorks.

This summary is based on a partial copy of the README; the full README is longer.
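The Reader/Writer split described above can be illustrated with a minimal sketch. This is not DataX's actual API (DataX is written in Java and has its own plugin interfaces); the names `Reader`, `Writer`, `ListReader`, `RangeReader`, `CsvLinesWriter`, `DictWriter` and `sync` are all hypothetical, chosen only to show why any reader can be paired with any writer once both agree on a common record shape.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, Iterator, List, Sequence, Tuple

# Hypothetical common record shape: one row as a tuple of column values.
Record = Tuple


class Reader(ABC):
    """Pulls records out of a source (illustrative, not the DataX interface)."""

    @abstractmethod
    def read(self) -> Iterator[Record]:
        ...


class Writer(ABC):
    """Pushes records into a destination (illustrative, not the DataX interface)."""

    @abstractmethod
    def write(self, records: Iterable[Record]) -> int:
        ...


class ListReader(Reader):
    """Reads rows from an in-memory list, standing in for a database table."""

    def __init__(self, rows: Sequence[Record]) -> None:
        self.rows = list(rows)

    def read(self) -> Iterator[Record]:
        yield from self.rows


class RangeReader(Reader):
    """Generates (n, n squared) rows, standing in for a second source type."""

    def __init__(self, count: int) -> None:
        self.count = count

    def read(self) -> Iterator[Record]:
        for n in range(self.count):
            yield (n, n * n)


class CsvLinesWriter(Writer):
    """Collects rows as comma-separated lines, standing in for a file sink."""

    def __init__(self) -> None:
        self.lines: List[str] = []

    def write(self, records: Iterable[Record]) -> int:
        written = 0
        for record in records:
            self.lines.append(",".join(str(value) for value in record))
            written += 1
        return written


class DictWriter(Writer):
    """Stores rows keyed by their first column, standing in for a key-value sink."""

    def __init__(self) -> None:
        self.store: Dict[object, Record] = {}

    def write(self, records: Iterable[Record]) -> int:
        written = 0
        for record in records:
            self.store[record[0]] = record
            written += 1
        return written


def sync(reader: Reader, writer: Writer) -> int:
    """The framework's only job here: stream records from reader to writer."""
    return writer.write(reader.read())


if __name__ == "__main__":
    csv_sink = CsvLinesWriter()
    copied = sync(ListReader([(1, "alice"), (2, "bob")]), csv_sink)
    print(copied, csv_sink.lines)
```

With two readers and two writers, all four combinations work without any pair-specific code, which is the property the plugin design is meant to provide: supporting N sources and M destinations takes N + M plugins rather than N × M pipelines.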
Verify against the repo before relying on details.