Analysis updated 2026-06-24
Stand up an HDFS cluster to store petabytes of log data on commodity servers
Run a MapReduce batch job across a multi-node cluster for nightly ETL
Serve as the storage layer underneath Hive, Spark, or HBase
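To make the batch-job use case concrete, here is a minimal sketch of the map, shuffle, and reduce flow that a MapReduce job follows, written in plain Java with no Hadoop dependency. It is an illustration of the model only, not Hadoop API code: the class name `LogLevelCount`, the helper methods, and the sample log lines are all hypothetical, and a real job would use the Mapper and Reducer classes from the Hadoop libraries and read its input from HDFS.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java illustration of the map -> shuffle -> reduce flow (hypothetical names, not Hadoop API). */
public class LogLevelCount {

    /** Map phase: emit a (level, 1) pair per non-empty log line; the level is the first token. */
    static List<Map.Entry<String, Integer>> map(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String level = trimmed.split("\\s+")[0];
            pairs.add(Map.entry(level, 1));
        }
        return pairs;
    }

    /** Shuffle phase: group all emitted values under their key. */
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    /** Reduce phase: sum the grouped values for each key. */
    static Map<String, Integer> reduce(Map<String, List<Integer>> grouped) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            int sum = 0;
            for (int value : entry.getValue()) {
                sum += value;
            }
            totals.put(entry.getKey(), sum);
        }
        return totals;
    }

    /** Runs the three phases in order on an in-memory list of log lines. */
    static Map<String, Integer> countLevels(List<String> lines) {
        return reduce(shuffle(map(lines)));
    }

    public static void main(String[] args) {
        List<String> logs = List.of(
            "ERROR disk full on node-3",
            "INFO job started",
            "ERROR timeout talking to node-7",
            "WARN slow heartbeat");
        System.out.println(countLevels(logs)); // prints {ERROR=2, INFO=1, WARN=1}
    }
}
```

On a cluster, the map and reduce phases run in parallel across nodes and the framework handles the shuffle; this single-process version only shows how the data moves between phases.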
| | apache/hadoop | android10/android-cleanarchitecture | konloch/bytecode-viewer |
|---|---|---|---|
| Stars | 15,545 | 15,548 | 15,511 |
| Language | Java | Java | Java |
| Setup difficulty | hard | moderate | easy |
| Complexity | 5/5 | 3/5 | 3/5 |
| Audience | data | developer | developer |
Figures from each repo's GitHub metadata at analysis time.
Real clusters need careful JVM, networking, and HDFS tuning; single-node mode is fine for learning but does not reflect a realistic deployment.
Based on the description and topics, this appears to be Apache Hadoop, a widely referenced open-source Java framework for distributed storage and processing of large datasets. The README does not provide further detail.
Apache Hadoop is an open-source Java framework for distributed storage and batch processing of very large datasets across clusters of commodity machines.
Mainly Java. The stack also includes HDFS and YARN.
Apache License 2.0, a permissive open-source license that allows broad commercial and modified use with attribution.
Setup difficulty is rated hard, with roughly a day or more to a first successful run.
Mainly data.
Verify against the repo before relying on details.