explaingit

beamandrew/medical-data

5,995 · Audience · researcher · Complexity · 1/5 · Setup · easy

TLDR

A curated reference list of publicly available medical datasets for machine learning research, covering medical imaging, electronic health records, and physiological signals, with access instructions for each.

Mindmap

mindmap
  root((medical-data))
    Imaging Datasets
      Brain MRI
      Lung CT scans
      Skin lesion images
      Retinal photographs
    Clinical Data
      ICU patient records
      De-identified EHRs
      MIMIC-III database
    Physiological Signals
      ECG recordings
      Wearable sensor data
      Time-series measurements
    Access Notes
      Free public download
      Registration required
      Usage terms apply
    Audience
      ML researchers
      Medical AI teams

Things people build with this

USE CASE 1

Find publicly available brain MRI or CT scan datasets with links and access instructions for training a medical imaging model.

USE CASE 2

Locate the MIMIC-III ICU patient record dataset and understand the application process required to gain access.

USE CASE 3

Discover ECG and physiological signal datasets suitable for wearable health monitoring research.

USE CASE 4

Get citation information for well-known medical ML benchmarks like OASIS or LIDC alongside their download links.

Getting it running

Difficulty · easy · Time to first run · 5 min

Most datasets require registration and agreement to data use terms before access is granted, so downloads are not instant.

License terms vary by dataset; many require signing a data use agreement before access is granted.

In plain English

This repository is a curated list of medical datasets available for machine learning research. It does not contain data itself; instead, it collects links, descriptions, and access instructions for dozens of publicly available medical collections, with notes on whether each dataset requires registration before downloading.

The list is organized by data type. The medical imaging section covers cardiac MRI, brain MRI, CT scans, retinal photographs, skin lesion images, lung CT images, X-rays, and more. Several of these are well-known research benchmarks: OASIS provides brain MRI for Alzheimer's studies across hundreds of subjects, LIDC is a spiral CT lung image collection built to support cancer detection algorithms, and the ISIC archive contains over 23,000 classified skin lesion images, including both malignant and benign examples.

Other sections cover electronic health records and clinical data. The MIMIC-III database, which contains de-identified records from tens of thousands of intensive care unit (ICU) patients, is one of the most widely used datasets in medical machine learning research; the list includes notes on how to apply for access. There are also entries for physiological signal data, including electrocardiogram (ECG) recordings and other time-series measurements from wearable sensors and hospital monitors.

The repository is intended as a reference document. Each entry typically includes a brief description of the dataset's contents, a link to the dataset or its homepage, and in some cases a citation for the paper that introduced it. The list notes that many datasets, especially those containing patient data, require researchers to apply and agree to usage terms before access is granted, and the README explicitly asks readers to respect each dataset's usage restrictions. The full README is longer than this summary covers.
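To make the shape of a list entry concrete, here is a minimal Python sketch that models entries as records and groups them by data type, the way the list itself is organized. The repository is a plain README, so the `DatasetEntry` class, its field names, and both helper functions are hypothetical and exist only for illustration. Links and citations are left blank rather than copied from the README. The application flag is set only for MIMIC-III, the one dataset this summary says requires an application, and is left unknown (`None`) for the others.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetEntry:
    # Hypothetical record for one list entry; field names are illustrative only.
    name: str
    data_type: str                  # section of the list, e.g. "imaging"
    description: str
    link: Optional[str] = None      # dataset homepage (not filled in here)
    citation: Optional[str] = None  # only some entries cite the original paper
    requires_application: Optional[bool] = None  # None = not stated in this summary


def group_by_type(entries):
    """Group entry names by data type, mirroring the list's sections."""
    sections = defaultdict(list)
    for entry in entries:
        sections[entry.data_type].append(entry.name)
    return dict(sections)


def needs_application(entries):
    """Return names of entries known to require an application before access."""
    return [e.name for e in entries if e.requires_application is True]


entries = [
    DatasetEntry("OASIS", "imaging", "Brain MRI for Alzheimer's studies"),
    DatasetEntry("LIDC", "imaging", "Spiral CT lung images for cancer detection"),
    DatasetEntry("ISIC", "imaging", "Over 23,000 classified skin lesion images"),
    DatasetEntry("MIMIC-III", "clinical", "De-identified ICU patient records",
                 requires_application=True),
]

print(group_by_type(entries))      # {'imaging': ['OASIS', 'LIDC', 'ISIC'], 'clinical': ['MIMIC-III']}
print(needs_application(entries))  # ['MIMIC-III']
```

Keeping the application flag as a three-valued field (`True`, `False`, `None`) avoids implying that a dataset is freely downloadable just because this summary does not mention its access terms.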

Copy-paste prompts

Prompt 1
I want to train a skin lesion classifier. Which datasets in the medical-data list cover labeled skin images, and how do I apply for access to the ISIC archive?
Prompt 2
I'm researching ICU patient outcomes with machine learning. What does MIMIC-III contain and what steps do I need to follow to apply for access?
Prompt 3
I need a lung CT dataset for cancer detection research. What is the LIDC dataset and what usage restrictions do I need to agree to before downloading?

Verify against the repo before relying on details.