explaingit

awesomedata/awesome-public-datasets

🔥 Hot | 75,510 | Audience · data | Complexity · 1/5 | Active | License | Setup · easy

TLDR

A curated directory of thousands of publicly available, mostly free datasets organized by topic, helping researchers and developers find real-world data for analysis and machine learning projects.

Mindmap

mindmap
  root((repo))
    What it does
      Curated dataset links
      Organized by topic
      Link status tracking
      Community maintained
    Categories
      Agriculture
      Biology
      Climate
      Economics
      Finance
      Government
      Health
      Machine Learning
    Use cases
      Start ML projects
      Academic research
      Data exploration
      Training models
    Key features
      Free access
      Working link validation
      Brief descriptions
      Topic organization

Things people build with this

USE CASE 1

Find real-world datasets to train machine learning models for classification, regression, or clustering tasks.

USE CASE 2

Locate domain-specific data for academic research in fields like climate science, economics, or public health.

USE CASE 3

Discover publicly available government and institutional datasets for data journalism or policy analysis.

USE CASE 4

Browse curated data sources across dozens of topics to explore what kinds of public data exist.

Getting it running

Difficulty · easy
Time to first run · 5 min
Use freely for any purpose, including commercial use, as long as you keep the copyright notice.

In plain English

Awesome Public Datasets is a community-maintained directory of high-quality, freely available datasets organized by topic. It does not contain any runnable code or tools. It is a reference list that connects researchers, data scientists, developers, and students to real-world data they can download and use for analysis, machine learning projects, or research.

The problem it addresses is discoverability. Governments, universities, research institutions, and companies have released enormous amounts of public data, but finding a specific, high-quality dataset for a given topic typically requires hours of searching across scattered websites. This repository consolidates thousands of those links into one place, organized into categories such as Agriculture, Biology, Climate, Economics, Finance, Government, Health, Machine Learning, Social Networks, Transportation, and dozens more.

Each entry in the list is a link to the actual dataset source, typically accompanied by a brief description of what the data contains. Entries are also flagged with a status indicator showing whether the link is currently working or broken, which is useful because data sources sometimes disappear. The list is automatically generated from a separate codebase that tracks and validates each link.

You would use this repository when starting a data science or machine learning project that needs real data, when doing academic research that requires a specific kind of dataset, or simply when exploring what kinds of public data are available. The datasets span domains from crop yields and genomics to stock prices, public transit, and social media activity. Most entries are free, though the README notes that some require registration or payment. The repository is not written in any particular programming language; it is structured documentation.
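Because each entry is a link with a brief description and a working/broken flag, grouped under a topic, a local copy of the list can be filtered with a few lines of Python. The sketch below is illustrative only. The line format, the `[OK]` and `[BROKEN]` markers, the sample entries, and the names `parse_entries` and `working_by_category` are all assumptions made for this example, not the repository's actual markup, so check the real README and adjust the pattern before relying on it.

```python
import re
from dataclasses import dataclass

# Hypothetical sample in an ASSUMED format: "## Category" headings followed by
# "- [STATUS] [Name](url) - description" lines. The real README may differ.
SAMPLE = """\
## Climate
- [OK] [Example Weather Archive](https://example.org/weather) - Daily station readings
- [BROKEN] [Example Ice Core Data](https://example.org/ice) - Core samples
## Finance
- [OK] [Example Stock Prices](https://example.org/stocks) - End-of-day prices
"""

# Matches one entry line in the assumed format; the description is optional.
ENTRY_RE = re.compile(
    r"^- \[(?P<status>OK|BROKEN)\] "
    r"\[(?P<name>[^\]]+)\]\((?P<url>[^)]+)\)"
    r"(?: - (?P<desc>.*))?$"
)


@dataclass
class Entry:
    category: str
    name: str
    url: str
    description: str
    working: bool


def parse_entries(text):
    """Return every entry found in text, tagged with its enclosing category."""
    entries = []
    category = ""
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            # A heading switches the current category.
            category = line[3:].strip()
            continue
        match = ENTRY_RE.match(line)
        if match:
            entries.append(
                Entry(
                    category=category,
                    name=match["name"],
                    url=match["url"],
                    description=match["desc"] or "",
                    working=match["status"] == "OK",
                )
            )
    return entries


def working_by_category(entries):
    """Group only the entries flagged as working, keyed by category."""
    grouped = {}
    for entry in entries:
        if entry.working:
            grouped.setdefault(entry.category, []).append(entry)
    return grouped


if __name__ == "__main__":
    for category, items in working_by_category(parse_entries(SAMPLE)).items():
        print(category)
        for item in items:
            print(f"  {item.name}: {item.url}")
```

Dropping entries flagged as broken before browsing avoids dead links, though the flag only reflects the state of a link when the list was last generated.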

Copy-paste prompts

Prompt 1
I need a dataset for a machine learning project on climate or weather prediction. What datasets does awesome-public-datasets recommend in the Climate category?
Prompt 2
Show me how to find and download a free public dataset from awesome-public-datasets for analyzing social networks or transportation patterns.
Prompt 3
I'm doing research on economics or finance. What are the best free datasets listed in awesome-public-datasets for historical stock prices or economic indicators?
Prompt 4
Help me navigate awesome-public-datasets to find a dataset in the Health or Biology category that I can use for a data analysis project.
Prompt 5
What datasets in awesome-public-datasets would be good for building a machine learning model, and how do I check if the links are still active?

Generated 2026-05-18 · Model: sonnet-4-6 · Verify against the repo before relying on details.