explaingit

awesomedata/awesome-public-datasets

Analysis updated 2026-05-18

Stars · 75,149 | Audience · data | Complexity · 1/5 | Setup · easy

TLDR

A curated directory of thousands of free, publicly available datasets organized by topic, helping researchers and developers find real-world data for analysis and machine learning projects.

Mindmap

mindmap
  root((repo))
    What it does
      Curated dataset links
      Organized by topic
      Link status tracking
      Community maintained
    Categories
      Agriculture
      Biology
      Climate
      Economics
      Finance
      Government
      Health
      Machine Learning
    Use cases
      Start ML projects
      Academic research
      Data exploration
      Training models
    Key features
      Free access
      Working link validation
      Brief descriptions
      Topic organization


What do people build with it?

USE CASE 1

Find real-world datasets to train machine learning models for classification, regression, or clustering tasks.

USE CASE 2

Locate domain-specific data for academic research in fields like climate science, economics, or public health.

USE CASE 3

Discover publicly available government and institutional datasets for data journalism or policy analysis.

USE CASE 4

Browse curated data sources across dozens of topics to explore what kinds of public data exist.

How does it compare?

                 | awesomedata/awesome-public-datasets | nestjs/nest | typicode/json-server
Stars            | 75,149                              | 75,404      | 75,540
Language         | (none)                              | TypeScript  | JavaScript
Setup difficulty | easy                                | moderate    | easy
Complexity       | 1/5                                 | 3/5         | 1/5
Audience         | data                                | developer   | developer

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · easy | Time to first run · 5 min
License: use freely for any purpose, including commercial, as long as you keep the copyright notice.

In plain English

Awesome Public Datasets is a community-maintained directory of high-quality, freely available datasets organized by topic. It does not contain any runnable code or tools. It is a reference list that connects researchers, data scientists, developers, and students to real-world data they can download and use for analysis, machine learning projects, or research.

The problem it addresses is discoverability. The internet contains enormous amounts of publicly released data from governments, universities, research institutions, and companies, but finding a specific, high-quality dataset for a given topic typically requires hours of searching across scattered websites. This repository consolidates thousands of those links into one place, organized into categories such as Agriculture, Biology, Climate, Economics, Finance, Government, Health, Machine Learning, Social Networks, Transportation, and dozens more.

Each entry in the list is a link to the actual dataset source, typically accompanied by a brief description of what the data contains. Entries are also flagged with a status indicator showing whether the link is currently working or broken, which is useful because data sources sometimes disappear over time. The list is automatically generated from a separate codebase that tracks and validates each link.

You would use this repository when starting a data science or machine learning project that needs real data, when doing academic research that requires a specific kind of dataset, or simply when exploring what kinds of public data are available. The datasets span domains from crop yields and genomics to stock prices, public transit, and social media activity. Most entries are free, though the README notes that some require registration or payment. The repository has no specific programming language because it is structured documentation rather than software.
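Because the list is a set of links grouped under topic headings, it can be useful to pull out one category programmatically. The following is a minimal sketch, not the repository's own tooling. It assumes the list is Markdown in which category headings start with `#` and each entry is a bullet containing a `[name](url)` link; the real README layout may differ. The sample text, the example.org URLs, and the function name `group_links_by_category` are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical sample in an ASSUMED layout; not copied from the real README.
SAMPLE = """\
## Climate
- [Example Weather Archive](https://example.org/weather) - Daily station readings.
## Finance
- [Example Price History](https://example.org/prices) - End-of-day prices.
- [Example Indicators](https://example.org/indicators) - Quarterly indicators.
"""

# A Markdown heading: one or more '#', whitespace, then the title.
HEADING = re.compile(r"^#+\s+(.*\S)\s*$")
# An inline Markdown link: [name](http... or https...).
LINK = re.compile(r"\[([^\]]+)\]\((https?://[^)\s]+)\)")


def group_links_by_category(markdown_text):
    """Return {category: [(name, url), ...]} for bulleted links under headings."""
    groups = defaultdict(list)
    current = None
    for line in markdown_text.splitlines():
        heading = HEADING.match(line)
        if heading:
            current = heading.group(1)
            continue
        # Skip anything before the first heading and any non-bullet line.
        if current is None or not line.lstrip().startswith(("-", "*")):
            continue
        groups[current].extend(LINK.findall(line))
    return dict(groups)


if __name__ == "__main__":
    for category, links in group_links_by_category(SAMPLE).items():
        print(category, len(links))
```

To try this on the real list, you would read a local copy of the README into a string and pass it in place of `SAMPLE`, then adjust the two patterns if the actual formatting does not match these assumptions.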

Copy-paste prompts

Prompt 1
I need a dataset for a machine learning project on climate or weather prediction. What datasets does awesome-public-datasets recommend in the Climate category?
Prompt 2
Show me how to find and download a free public dataset from awesome-public-datasets for analyzing social networks or transportation patterns.
Prompt 3
I'm doing research on economics or finance. What are the best free datasets listed in awesome-public-datasets for historical stock prices or economic indicators?
Prompt 4
Help me navigate awesome-public-datasets to find a dataset in the Health or Biology category that I can use for a data analysis project.
Prompt 5
What datasets in awesome-public-datasets would be good for building a machine learning model, and how do I check if the links are still active?

Frequently asked questions

What is awesome-public-datasets?

A curated directory of thousands of free, publicly available datasets organized by topic, helping researchers and developers find real-world data for analysis and machine learning projects.

What license does awesome-public-datasets use?

Use freely for any purpose including commercial, as long as you keep the copyright notice.

How hard is awesome-public-datasets to set up?

Setup difficulty is rated easy, with roughly 5min to a first successful run.

Who is awesome-public-datasets for?

Mainly a data audience: researchers, data scientists, developers, and students looking for real-world data.


Verify against the repo before relying on details.