nytimes/covid-19-data

★ 6,979 Audience · researcher Complexity · 1/5 Setup · easy

Mindmap

mindmap
  root((repo))
    What it does
      US Covid data archive
      Case and death counts
      Frozen historical record
    Data levels
      National totals
      State by state
      County level
    Extra datasets
      Prison outbreaks
      College outbreaks
      Excess deaths estimate
      Mask use survey
    Format
      CSV files
      Rolling averages
      Methodology docs


Things people build with this

USE CASE 1

Analyze Covid-19 case and death trends by state or county for a research paper or data journalism piece.

USE CASE 2

Build an animated choropleth map of pandemic spread across US counties over time.

USE CASE 3

Compare excess death estimates between states or time periods, or compare smoothed case and death trends using the pre-calculated rolling averages.

Getting it running

Difficulty · easy Time to first run · 5 min

In plain English

This repository is the New York Times archive of Covid-19 case and death data for the United States, collected from the start of the pandemic in early 2020 through March 2023. It is no longer updated: in March 2023 the Times switched to federal government data for its ongoing Covid tracking pages, and this repository was frozen as a historical record.

The core of the archive is a set of CSV files (plain-text spreadsheets) recording the cumulative count of confirmed and probable Covid-19 cases and deaths over time. The counts are organized at three geographic levels: one file for the whole country, one broken down by state, and one broken down by county. Each row represents a single day and location, with the total number of cases and deaths reported up to that day.

Beyond the main case and death counts, the repository includes several additional datasets. One tracks outbreaks in prisons, another tracks outbreaks in colleges and universities, and a third estimates how many more deaths occurred during the pandemic than historical norms would predict. There is also a one-time survey on mask use by county from July 2020, and a set of pre-calculated rolling averages that smooth out the day-to-day noise in the raw counts.

The data was compiled by Times journalists monitoring government announcements and health department releases across all states and territories. The README explains the methodology in detail, including how the team handled inconsistencies in the way different states classified and reported cases. Anyone can download the files directly or copy the entire repository. The data has been used for research, journalism, and data visualization projects.
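Because the main files store running totals rather than per-day figures, a common first step is to take the difference between consecutive rows. The following is a minimal sketch of that step, not the Times' own method. It assumes the state-level file has columns named `date`, `state`, `fips`, `cases` and `deaths` (check the README before relying on this), and the sample rows and the state name are invented for illustration.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows in the ASSUMED layout of the state-level file
# (date, state, fips, cases, deaths). These are not real counts.
SAMPLE = """date,state,fips,cases,deaths
2021-01-01,Examplestate,99,100,1
2021-01-02,Examplestate,99,130,2
2021-01-03,Examplestate,99,130,2
2021-01-04,Examplestate,99,175,5
"""

def daily_new(csv_text, column="cases"):
    """Turn cumulative totals into per-day increases for each state."""
    by_state = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        by_state[row["state"]].append((row["date"], int(row[column])))

    result = {}
    for state, series in by_state.items():
        series.sort()  # ISO-formatted dates sort correctly as strings
        previous = 0   # the first row's total counts as that day's increase
        daily = []
        for date, total in series:
            daily.append((date, total - previous))
            previous = total
        result[state] = daily
    return result

def rolling_mean(values, window=7):
    """Trailing average over up to `window` of the most recent values."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

new_cases = daily_new(SAMPLE)["Examplestate"]
counts = [n for _, n in new_cases]
smoothed = rolling_mean(counts, window=2)  # short window to suit the tiny sample
```

If a cumulative total is ever revised downward, the difference for that day comes out negative; this sketch leaves such values untouched. The repository's pre-calculated rolling averages may have been computed differently, so treat this only as a way to understand the shape of the data.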

Copy-paste prompts

Prompt 1

Load the nytimes covid-19-data county CSV into a pandas DataFrame and plot the 7-day rolling average of new daily cases for California counties from January to June 2021.

Prompt 2

Using the New York Times state-level Covid CSV, calculate which five states had the highest death rate per 100k residents during the Delta wave peak in summer 2021.
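The arithmetic behind a per-100k ranking like this can be sketched in a few lines. This is an illustrative sketch only: the description above does not mention population figures as part of the repository, so the `populations` mapping is assumed to come from a separate source, and every state name and number below is hypothetical.

```python
def deaths_in_period(cumulative_at_start, cumulative_at_end):
    """Deaths within a period are the difference of two cumulative totals."""
    return cumulative_at_end - cumulative_at_start

def deaths_per_100k(deaths, population):
    """Scale a death count to a rate per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return deaths * 100_000 / population

def top_states(period_deaths, populations, n=5):
    """Rank states by death rate, skipping any without a population figure."""
    rates = {
        state: deaths_per_100k(deaths, populations[state])
        for state, deaths in period_deaths.items()
        if state in populations
    }
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)[:n]

# Hypothetical cumulative death totals at the start and end of a period.
start_totals = {"A": 1_000, "B": 4_000, "C": 10}
end_totals = {"A": 1_500, "B": 5_200, "C": 100}

# Hypothetical populations; these must be supplied from outside the archive.
populations = {"A": 1_000_000, "B": 6_000_000, "C": 100_000}

period_deaths = {
    state: deaths_in_period(start_totals[state], end_totals[state])
    for state in start_totals
}
ranking = top_states(period_deaths, populations, n=2)
```

Ranking by rate rather than by raw count is the point of the exercise: in this made-up example, "B" has the most deaths but the lowest rate, while the smallest state, "C", ranks first.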

Prompt 3

I am building a choropleth map of Covid spread across US counties. How do I join the county CSV with a GeoJSON file of county FIPS boundaries to render a map in Python using geopandas?
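The step that most often goes wrong in a join like this is matching the keys: county FIPS codes are five-character strings, and a leading zero is easily lost when a tool parses the column as a number. The sketch below shows only that key-matching step, using the standard library in place of geopandas. It assumes the county file has columns `date`, `county`, `state`, `fips`, `cases` and `deaths`; the `GEOID` property name is hypothetical and depends on the boundary file you use, and all rows are invented.

```python
import csv
import io

# Invented rows in the ASSUMED county-file layout. The first code has lost
# its leading zero, and the last row has no code at all (a hypothetical case).
COUNTY_CSV = """date,county,state,fips,cases,deaths
2021-01-01,Example County,Examplestate,1001,250,4
2021-01-01,Other County,Examplestate,56045,80,1
2021-01-01,Unknown,Examplestate,,12,0
"""

# Minimal GeoJSON structure. "GEOID" is a HYPOTHETICAL property name, and the
# geometries are left null because only the key matching is shown here.
GEOJSON = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"GEOID": "01001"}, "geometry": None},
        {"type": "Feature", "properties": {"GEOID": "56045"}, "geometry": None},
    ],
}

def normalize_fips(value):
    """Return a zero-padded 5-character code, or None if the cell is empty."""
    value = value.strip()
    if not value:
        return None
    # Drop a trailing ".0" left behind by float parsing, then restore zeros.
    return value.split(".")[0].zfill(5)

def cases_by_fips(csv_text):
    """Build a lookup from normalized FIPS code to case count."""
    lookup = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        fips = normalize_fips(row["fips"])
        if fips is not None:
            lookup[fips] = int(row["cases"])
    return lookup

def attach_cases(geojson, lookup, key="GEOID"):
    """Copy case counts onto matching features; return how many matched."""
    matched = 0
    for feature in geojson["features"]:
        fips = feature["properties"].get(key)
        if fips in lookup:
            feature["properties"]["cases"] = lookup[fips]
            matched += 1
    return matched

lookup = cases_by_fips(COUNTY_CSV)
matched = attach_cases(GEOJSON, lookup)
```

Comparing `matched` with the number of features is a quick check that the keys line up before any map is drawn. The same normalization applies when the join is done with a geopandas or pandas merge instead.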


Verify against the repo before relying on details.