Discover which distributed database or stream-processing framework fits your data engineering project by browsing the categorized tool list.
Find academic papers, books, and video talks on distributed systems and big data to deepen your understanding of the underlying technology.
Quickly compare options for a specific need, such as time-series databases, graph databases, or ML platforms, all in one place.
Awesome Big Data is a curated reference list for anyone who wants to explore the ecosystem of tools and technologies used to handle large volumes of data. It is not software you install or run. It is a structured collection of links, organized into categories, that points you toward relevant projects, frameworks, databases, and learning resources. The goal is to give developers, data engineers, and researchers a starting point when they need to discover what exists in a given area.

The list is divided into many categories covering different aspects of working with large datasets. On the storage side, it covers traditional relational databases like PostgreSQL and MySQL, as well as document databases, key-value stores, graph databases, columnar databases, and time-series databases. Each of these represents a different way of organizing and querying data, suited to different kinds of problems. There are also sections on distributed file systems, which are systems that spread large files across many machines, and distributed programming frameworks, which are tools for running computations across clusters of computers in parallel.

Other categories address data ingestion (getting data from various sources into a system), stream processing (analyzing data in real time as it arrives rather than in batches), scheduling (coordinating when and how jobs run), machine learning platforms, search engines, security, and business intelligence tools used for reporting and dashboards. There is also a section on data visualization tools for turning data into charts and graphs.

Beyond software tools, the list includes curated readings: academic papers on distributed systems and big data research from 2001 onward, video talks, and books on topics including streaming systems, distributed databases, and graph-based approaches to data analysis.

The list is community-maintained and openly accepts contributions.
It follows the "awesome list" convention, a format popularized on GitHub in which curators gather high-quality links on a specific topic into a single structured document. The full README is longer than the portion summarized here.
Verify against the repo before relying on details.