explaingit

fivethirtyeight/data

Analysis updated 2026-06-24

Stars · 17,359   Language · Jupyter Notebook   Audience · data   Complexity · 1/5   License · CC BY 4.0 (data), MIT (code)   Setup · easy

TLDR

Public archive of datasets and code behind FiveThirtyEight articles, covering politics, sports, economics, and culture. Each folder maps to one published story.

Mindmap

mindmap
  root((data))
    Inputs
      Raw CSVs
      Notebooks
      Index file
    Outputs
      Datasets per story
      Reproducible analyses
    Use Cases
      Reproduce a FiveThirtyEight study
      Teach data journalism
      Practice data wrangling
    Tech Stack
      Jupyter Notebook
      Python
      R
    License
      CC BY 4.0 data
      MIT code

What do people build with it?

USE CASE 1

Reproduce a FiveThirtyEight politics or sports analysis from raw data

USE CASE 2

Use a CSV as a teaching dataset for an intro stats or pandas class

USE CASE 3

Bootstrap a side project with vetted real-world journalism data

What is it built with?

Jupyter, Python, R, CSV

How does it compare?

Repo | fivethirtyeight/data | stefan-jansen/machine-learning-for-trading | ufund-me/qbot
Stars | 17,359 | 17,322 | 17,322
Language | Jupyter Notebook | Jupyter Notebook | Jupyter Notebook
Setup difficulty | easy | hard | hard
Complexity | 1/5 | 4/5 | 4/5
Audience | data | data | data

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · easy Time to first run · 5min

Clone the repository and open the CSVs in pandas or a spreadsheet. No installation step is needed.
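As a rough illustration of opening one of the CSVs, here is a minimal sketch that uses only Python's standard library. The sample data, its column names (`team`, `season`, `rating`), and the helper functions `load_rows` and `column_mean` are all invented for this example and do not come from the repository.

```python
import csv
import io
from statistics import mean

# Hypothetical sample standing in for a CSV from one of the story folders.
# The column names and values are invented for illustration only.
SAMPLE_CSV = """team,season,rating
Alpha,2021,1510.5
Beta,2021,1489.0
Alpha,2022,1532.0
"""


def load_rows(handle):
    """Read a CSV file object into a list of dicts keyed by the header row."""
    return list(csv.DictReader(handle))


def column_mean(rows, column):
    """Average a numeric column, skipping blank cells."""
    values = [float(row[column]) for row in rows if row[column].strip()]
    return mean(values)


rows = load_rows(io.StringIO(SAMPLE_CSV))
print(len(rows), "rows; columns:", list(rows[0].keys()))
print("mean rating:", round(column_mean(rows, "rating"), 2))
```

To try this on a real file, pass `open(path, newline="")` instead of the in-memory sample, where `path` points at a CSV inside your clone. If pandas is installed, `pandas.read_csv(path)` covers the same ground with less code.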

Data is free to use and share with attribution under CC BY 4.0, and the code is free to use under MIT.

In plain English

This repository is the public archive of data and code that powered the articles and charts published by FiveThirtyEight, a data journalism outlet. Each folder in the repository corresponds to a story or analysis, containing the raw data files and any code used to process or visualize them. The datasets cover topics FiveThirtyEight wrote about, including sports, politics, economics, and culture. An index file lists all available datasets alongside links to the accompanying articles.

The data is released under the Creative Commons Attribution 4.0 license, meaning anyone can freely use and share it with attribution. The accompanying code is under the MIT License. Sports predictions and forecasts in the repository are no longer being updated as of June 2023. The rest of the data archive remains available as a historical record.

You would use this if you are a student, journalist, or data analyst who wants to explore real-world datasets from published journalism, reproduce a FiveThirtyEight analysis, or use their data as a starting point for your own work.
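The index file mentioned above pairs each dataset with its article, so a lookup along these lines may help when hunting for a topic. This is only a sketch: the column names `dataset` and `article_url`, the sample rows, and the helper `find_datasets` are assumptions made for illustration, and the real index file may be laid out differently.

```python
import csv
import io

# Hypothetical index layout: the real index file's column names may differ.
SAMPLE_INDEX = """dataset,article_url
example-folder-a,https://example.com/story-a
example-folder-b,https://example.com/story-b
"""


def find_datasets(handle, keyword, name_col="dataset", link_col="article_url"):
    """Return (dataset, article link) pairs whose dataset name contains keyword."""
    keyword = keyword.lower()
    return [
        (row[name_col], row[link_col])
        for row in csv.DictReader(handle)
        if keyword in row[name_col].lower()
    ]


matches = find_datasets(io.StringIO(SAMPLE_INDEX), "folder-b")
for name, link in matches:
    print(name, "->", link)
```

The column names are passed as parameters so the function can be pointed at whatever headers the actual index uses once you have checked them.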

Copy-paste prompts

Prompt 1
Pick a recent dataset from fivethirtyeight/data and walk me through loading it in pandas with a chart
Prompt 2
Help me reproduce the FiveThirtyEight NBA Elo ratings using the CSVs in this repo
Prompt 3
Find me 3 small datasets in fivethirtyeight/data good for a beginner pandas tutorial
Prompt 4
Compare the polling data folder in fivethirtyeight/data to what a 2024 election model would need

Frequently asked questions

What is fivethirtyeight/data?

Public archive of datasets and code behind FiveThirtyEight articles, covering politics, sports, economics, and culture. Each folder maps to one published story.

What language is fivethirtyeight/data written in?

GitHub classifies the repository mainly as Jupyter Notebook. The stack also includes Python and R.

What license does fivethirtyeight/data use?

Data is free to use and share with attribution under CC BY 4.0, and the code is free to use under MIT.

How hard is fivethirtyeight/data to set up?

Setup difficulty is rated easy, with roughly 5 minutes to a first successful run.

Who is fivethirtyeight/data for?

Mainly people who work with data: students, journalists, and data analysts.


Verify against the repo before relying on details.