heibaiying/bigdata-notes

16,897 · Java

TLDR

BigData-Notes is a comprehensive beginner's guide to big data technologies, written in Chinese.

In plain English

BigData-Notes is a structured collection of Chinese-language tutorials and notes for beginners, covering twelve tools and frameworks that are widely used in the big data industry: Hadoop (a system for storing and processing very large datasets across many computers), Hive (a tool for querying big data with a SQL-like language), Spark (a fast engine for large-scale data processing), Storm and Flink (tools for processing continuous streams of live data in real time), HBase (a distributed, column-oriented database for storing massive amounts of data), Kafka (a distributed messaging system for passing high-throughput data between applications), ZooKeeper (a coordination service for distributed systems), Flume and Sqoop (tools for moving data between systems), Azkaban (a workflow scheduler), and Scala (the programming language used by several of these tools).

Each section includes an introduction to the technology's core concepts, installation guides, command references, and Java or Scala code examples for common operations. The material is organized as a learning path that takes a reader from no big data knowledge through hands-on setup and use.

This resource suits developers and students who want to get started with big data technologies, especially within the Java ecosystem. No prior big data experience is assumed.

Generated 2026-05-21 · Model: sonnet-4-6 · Verify against the repo before relying on details.