Find real-world datasets to train machine learning models for classification, regression, or clustering tasks.
Locate domain-specific data for academic research in fields like climate science, economics, or public health.
Discover publicly available government and institutional datasets for data journalism or policy analysis.
Browse curated data sources across dozens of topics to explore what kinds of public data exist.
Awesome Public Datasets is a community-maintained directory of high-quality, freely available datasets organized by topic. It does not contain any runnable code or tools; it is a reference list that connects researchers, data scientists, developers, and students to real-world data they can download and use for analysis, machine learning projects, or research.

The problem it addresses is discoverability. Governments, universities, research institutions, and companies publish enormous amounts of data, but finding a specific, high-quality dataset on a given topic typically takes hours of searching across scattered websites. This repository consolidates thousands of those links in one place, organized into categories such as Agriculture, Biology, Climate, Economics, Finance, Government, Health, Machine Learning, Social Networks, Transportation, and dozens more.

Each entry is a link to the dataset's source, typically accompanied by a brief description of what the data contains. Entries are also flagged with a status indicator showing whether the link currently works or is broken, which is useful because data sources sometimes disappear over time. The list itself is automatically generated from a separate codebase that tracks and validates each link.

You would use this repository when starting a data science or machine learning project that needs real data, when doing academic research that requires a specific kind of dataset, or simply when exploring what kinds of public data are available. The datasets span domains from crop yields and genomics to stock prices, public transit, and social media activity. Most entries are free, though the README notes that some require registration or payment. The repository is not tied to any programming language; it consists entirely of structured documentation.
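Because each entry carries a working-or-broken status flag, a local copy of the list can be filtered programmatically. The following is a minimal sketch and is not part of the repository, which ships no code. The entry syntax, the `OK_ICON` and `FIXME_ICON` marker names, the sample entries, and the names `Entry` and `parse_entries` are all assumptions made for illustration; check them against the actual README before relying on this.

```python
import re
from typing import NamedTuple


class Entry(NamedTuple):
    """One dataset link (hypothetical structure, for illustration only)."""
    name: str
    url: str
    ok: bool  # True if the entry is flagged as working


# ASSUMED entry format; verify against the real README:
#   * |OK_ICON| `Dataset name <https://example.org/data>`_
#   * |FIXME_ICON| `Dataset name <https://example.org/gone>`_
ENTRY_RE = re.compile(
    r"\|(?P<status>OK_ICON|FIXME_ICON)\|\s*"
    r"`(?P<name>[^<`]+?)\s*<(?P<url>[^>]+)>`_"
)


def parse_entries(text: str) -> list[Entry]:
    """Return every line of `text` that matches the assumed entry format."""
    entries = []
    for line in text.splitlines():
        match = ENTRY_RE.search(line)
        if match:
            entries.append(
                Entry(match["name"], match["url"], match["status"] == "OK_ICON")
            )
    return entries


# Made-up sample text; these are not real entries from the list.
SAMPLE = """\
Agriculture
-----------
* |OK_ICON| `Example crop yields <https://example.org/crops>`_
* |FIXME_ICON| `Example soil survey <https://example.org/soil>`_
"""

# Keep only the entries flagged as working.
working = [entry for entry in parse_entries(SAMPLE) if entry.ok]
for entry in working:
    print(f"{entry.name}: {entry.url}")
```

Lines that do not match the assumed pattern, such as category headings, are skipped silently, so the same function could be run over a whole file and the result grouped or filtered afterwards.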
Generated 2026-05-18 · Model: sonnet-4-6 · Verify against the repo before relying on details.