wangzhiwubigdata/god-of-bigdata

★ 10,462 Audience · data Complexity · 1/5 Setup · easy

Mindmap

mindmap
  root((Big Data Guide))
    Foundations
      Java basics
      JVM internals
      Distributed theory
      Zookeeper
    Frameworks
      Hadoop and HDFS
      Hive SQL queries
      Spark processing
      Flink streaming
    Databases
      HBase lookups
      Kafka streaming
      OLAP analytics
    Interview prep
      Question sets
      Algorithms
      Hands-on articles

Things people build with this

USE CASE 1

Study big data engineering concepts in Chinese while preparing for technical job interviews in that field.

USE CASE 2

Use the Flink and Spark sections as a structured reference while learning real-time and batch data processing.

USE CASE 3

Work through the interview question sets to practice answering big data engineering technical interview questions.

Tech stack

Java, Hadoop, Spark, Flink, Kafka, HBase, Hive, Zookeeper

Getting it running

Difficulty · easy Time to first run · 5 min

In plain English

This repository is a Chinese-language study guide for people who want to work professionally with big data technologies, particularly those preparing for technical job interviews in that space. The project description translates roughly to "focused on big data learning and interviews, the road to becoming a big data master." All the content, links, and navigation are written in Chinese.

The guide is organized into four broad sections. The first covers the programming and infrastructure foundations that a big data engineer needs before touching the specialized frameworks: Java fundamentals, concurrent programming, JVM internals, distributed systems theory, a coordination service called Zookeeper, remote procedure calls, the Netty network library, and Linux basics. Each topic links out to a series of articles, mostly hosted on CSDN (a major Chinese developer blogging platform) or in markdown files inside the repo itself.

The second section covers the big data frameworks directly: Hadoop (for storing and processing very large datasets across many machines), Hive (for querying that data using SQL-like syntax), Spark and Flink (two different engines for processing data quickly, including data that is arriving in real time), HBase (a database designed for very fast lookups across huge tables), and Kafka (a system for moving streams of data between services reliably). Each framework gets its own collection of articles covering how it works, how to configure it, and common problems.

The third section focuses on practical, hands-on articles the author published across Flink, Spark, Kafka, and OLAP (analytics database) topics. The fourth section is interview preparation: question sets and algorithm topics specifically aimed at big data engineering roles.

The repository also links to a WeChat public account and a Bilibili video channel where the author publishes additional material. It is primarily a reading and reference resource, not a runnable codebase.
The full README is longer than what was shown.

Copy-paste prompts

Prompt 1

I'm preparing for a big data engineering interview and need to understand the difference between Spark and Flink for real-time processing. Explain the key trade-offs.

Prompt 2

Help me understand how Kafka integrates with Flink for building a real-time data pipeline, based on common big data engineering patterns.

Prompt 3

I'm studying HBase for a technical interview. Explain the architecture and when you would choose HBase over a traditional relational database.

Prompt 4

Walk me through how HDFS and MapReduce work together in Hadoop, at the level I'd need to explain it clearly in an interview.

Verify against the repo before relying on details.