explaingit

nytimes/covid-19-data

6,979 | Audience · researcher | Complexity · 1/5 | Setup · easy

TLDR

The New York Times' historical archive of US Covid-19 case and death counts from early 2020 through March 2023, published as plain CSV files at the national, state, and county levels. It is no longer updated.

Mindmap

mindmap
  root((repo))
    What it does
      US Covid data archive
      Case and death counts
      Frozen historical record
    Data levels
      National totals
      State by state
      County level
    Extra datasets
      Prison outbreaks
      College outbreaks
      Excess deaths estimate
      Mask use survey
    Format
      CSV files
      Rolling averages
      Methodology docs

Things people build with this

USE CASE 1

Analyze Covid-19 case and death trends by state or county for a research paper or data journalism piece.

USE CASE 2

Build an animated choropleth map of pandemic spread across US counties over time.

USE CASE 3

Compare excess death estimates between states or time periods, or use the pre-calculated rolling averages to compare smoothed case and death trends.

Getting it running

Difficulty · easy | Time to first run · 5 min

In plain English

This repository is the New York Times archive of Covid-19 case and death data for the United States, collected from the start of the pandemic in early 2020 through March 2023. It is no longer being updated. In March 2023 the Times switched to federal government data for its ongoing Covid tracking pages, and this repository was frozen as a historical record.

The core of the archive is a set of CSV files, which are plain-text spreadsheets, recording the cumulative count of confirmed and probable Covid-19 cases and deaths over time. These counts are organized at three geographic levels: the whole country in one file, a breakdown by state in another, and a breakdown by county in a third. Each row represents a single day and location, with the total number of cases and deaths reported up to that point.

Beyond the main case and death counts, the repository includes several additional data sets. One tracks outbreaks in prisons, another tracks outbreaks in colleges and universities, and a third estimates the elevated number of total deaths during the pandemic compared to historical norms. There is also a one-time survey from July 2020 on mask use by county, and a set of pre-calculated rolling averages intended to smooth out the day-to-day noise in the raw counts.

The data was compiled by Times journalists monitoring government announcements and health department releases across all states and territories. The README explains the methodology in detail, including how the team handled inconsistencies in the way different states classified and reported cases.

Anyone can download the files directly or copy the entire repository. The data has been used for research, journalism, and data visualization projects.
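
Because the core files store cumulative totals rather than daily counts, most analyses first turn each location's series into day-over-day increases and then smooth them. The sketch below illustrates that step using only the Python standard library. The column layout (`date,state,fips,cases,deaths`), the place name, and all numbers are assumptions made for illustration; check the repository's README for the actual headers. Clamping negative differences to zero is one simple way to handle downward revisions, not necessarily how the Times computed its own rolling averages.

```python
import csv
import io
from collections import defaultdict

# Made-up rows in an ASSUMED state-level layout; not real archive data.
SAMPLE = """date,state,fips,cases,deaths
2021-01-01,Examplia,99,100,1
2021-01-02,Examplia,99,130,1
2021-01-03,Examplia,99,125,2
2021-01-04,Examplia,99,160,2
"""


def daily_new(rows, column="cases"):
    """Convert cumulative totals into per-day increases for each location."""
    by_place = defaultdict(list)
    for row in rows:
        by_place[row["state"]].append((row["date"], int(row[column])))

    result = {}
    for place, series in by_place.items():
        series.sort()  # ISO dates (YYYY-MM-DD) sort correctly as strings
        increases = []
        previous = 0  # the first row's total counts as new on that day
        for date, total in series:
            # Totals can be revised downward; clamp at zero (illustrative choice).
            increases.append((date, max(total - previous, 0)))
            previous = total
        result[place] = increases
    return result


def rolling_mean(values, window=7):
    """Trailing mean over the most recent `window` values (fewer at the start)."""
    means = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        means.append(sum(chunk) / len(chunk))
    return means


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
new_cases = daily_new(rows)["Examplia"]
# A 3-day window keeps the tiny sample readable; 7 is the usual choice.
smoothed = rolling_mean([count for _, count in new_cases], window=3)
```

To run this against a downloaded file, replace the `io.StringIO(SAMPLE)` source with an opened CSV file from the archive.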

Copy-paste prompts

Prompt 1
Load the nytimes covid-19-data county CSV into a pandas DataFrame and plot the 7-day rolling average of new daily cases for California counties from January to June 2021.
Prompt 2
Using the New York Times state-level Covid CSV, calculate which five states had the highest death rate per 100k residents during the Delta wave peak in summer 2021.
Prompt 3
I am building a choropleth map of Covid spread across US counties. How do I join the county CSV with a GeoJSON file of county FIPS boundaries to render a map in Python using geopandas?
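
For the join described in Prompt 3, the step that most often goes wrong is the key: county FIPS codes are five characters with leading zeros, and those zeros are lost if the column is ever parsed as a number. The sketch below normalizes the key before matching, using only the standard library. The column names, the sample rows, and the `BOUNDARIES` lookup are hypothetical stand-ins, and the assumption that some rows carry no county code at all should be checked against the repository's README.

```python
import csv
import io

# Made-up rows in an ASSUMED county-level layout; not real archive data.
COUNTY_CSV = """date,county,state,fips,cases,deaths
2021-01-01,Alpha,Examplia,1001,10,0
2021-01-01,Beta,Examplia,06037,20,1
2021-01-01,Unknown,Examplia,,5,0
"""

# Hypothetical stand-in for boundary features keyed by 5-character county code.
BOUNDARIES = {"01001": "polygon-A", "06037": "polygon-B"}


def fips_key(raw):
    """Return a zero-padded 5-character county code, or None if unusable."""
    raw = (raw or "").strip()
    if not raw.isdigit():
        return None
    return raw.zfill(5)


matched, unmatched = [], []
for row in csv.DictReader(io.StringIO(COUNTY_CSV)):
    key = fips_key(row["fips"])
    if key in BOUNDARIES:
        matched.append((key, row["county"], int(row["cases"]), BOUNDARIES[key]))
    else:
        # Keep track of rows that cannot be placed on the map.
        unmatched.append(row["county"])
```

The same normalized string key can then serve as the join column when the boundaries are loaded with a geospatial library. Reviewing `unmatched` before plotting shows which rows would otherwise silently drop off the map.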

Verify against the repo before relying on details.