explaingit

hoveychen/reddit-gems

18 · HTML · Audience: developer · Complexity: 1/5 · Setup: easy

TLDR

A searchable archive of 12 years of posts from r/coolgithubprojects: 14,604 curated open-source project links that can be filtered by topic, language, and year, all browsable from a single HTML file with no setup needed.

Mindmap

mindmap
  root((reddit-gems))
    What it does
      Archive subreddit posts
      Browse 14604 projects
      Filter and search
    Features
      Language filter
      Topic categories
      Year filter
      Favorites list
    Content
      12 years of posts
      14 topic categories
      Inline media playback
    Tech
      Single HTML file
      Python scripts
      Arctic Shift data


Things people build with this

USE CASE 1

Browse 12 years of community-curated GitHub projects filtered by programming language, topic, and year to find tools or inspiration

USE CASE 2

Search across 14,604 open-source projects by topic category to quickly find tools for a specific use case

USE CASE 3

Use the Python rebuild scripts to regenerate the archive from fresh Reddit data or adapt them for a different subreddit
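As a rough illustration of the deduplication step a rebuild would need, here is a minimal sketch. It is not the repository's actual script: the `url` and `score` field names, the `normalize_repo_url` and `dedupe_posts` helpers, and the rule of keeping the highest-scored post per repository are all assumptions made for the example.

```python
from urllib.parse import urlparse


def normalize_repo_url(url: str) -> str:
    """Reduce a GitHub link to lowercase 'owner/repo' (hypothetical rule)."""
    cleaned = url.strip()
    parsed = urlparse(cleaned)
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    parts = [p for p in parsed.path.split("/") if p]
    # Non-GitHub or incomplete links are keyed by the full lowercase URL.
    if host != "github.com" or len(parts) < 2:
        return cleaned.lower()
    repo = parts[1]
    if repo.endswith(".git"):
        repo = repo[:-4]
    return f"{parts[0]}/{repo}".lower()


def dedupe_posts(posts):
    """Keep one post per repository, preferring the highest score (assumed policy)."""
    best = {}
    for post in posts:
        key = normalize_repo_url(post["url"])
        if key not in best or post["score"] > best[key]["score"]:
            best[key] = post
    return list(best.values())
```

Adapting this to another subreddit would mostly mean changing how links are normalized, since other communities may not link to GitHub at all.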

Tech stack

HTML · Python

Getting it running

Difficulty · easy · Time to first run · 5 min

In plain English

Reddit Gems is a complete archive and browser for every post ever made to r/coolgithubprojects, a subreddit where people share interesting open-source GitHub projects. The archive covers 12 years of posts, from 2014 through 2026. The raw archive contains 25,794 posts pulled from Arctic Shift, a public service that archives Reddit data. After duplicates and spam are removed, 14,604 posts remain, organized into 14 topic categories.

The browser page is a single HTML file that loads this processed dataset and lets you filter by theme, programming language, year, and score. It also supports search, a favorites feature, and infinite scroll, so you can browse a large number of entries without the page slowing down.

Media from the original posts is handled inline where possible. Images are shown directly in the browser. YouTube videos and Reddit-hosted videos play in place. Multi-image gallery posts display as carousels. The README notes that about 87% of videos and 70% of galleries play inline; older posts where the archive did not capture all metadata fall back to a thumbnail and a link to Reddit.

The project includes Python scripts to rebuild the archive from scratch if needed: one to scrape the source data, one to process and classify it into the browser format, and additional scripts for generating sorted markdown digests by language and year. The browser itself requires no build step and runs directly from the files. The site is also deployed live on GitHub Pages. The UI supports both English and Chinese, toggled from a button in the corner.

All content belongs to its original Reddit authors; this repository is described as a research and archival resource.
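The filtering described above (theme, language, year, score, plus search) can be sketched in Python. This is an illustration only: the browser itself is an HTML page, and the `Post` record, its field names, and the `filter_posts` function are hypothetical, not taken from the repository.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class Post:
    # Hypothetical record shape; the real dataset's field names may differ.
    title: str
    topic: str
    language: str
    year: int
    score: int


def filter_posts(
    posts: Iterable[Post],
    topic: Optional[str] = None,
    language: Optional[str] = None,
    years: Optional[Tuple[int, int]] = None,  # inclusive (start, end)
    min_score: int = 0,
    query: str = "",
) -> List[Post]:
    """Apply every active filter; a filter left at its default is ignored."""
    needle = query.strip().lower()
    matches = []
    for post in posts:
        if topic is not None and post.topic != topic:
            continue
        if language is not None and post.language.lower() != language.lower():
            continue
        if years is not None and not (years[0] <= post.year <= years[1]):
            continue
        if post.score < min_score:
            continue
        if needle and needle not in post.title.lower():
            continue
        matches.append(post)
    # Highest-scored entries first (an assumed default ordering).
    return sorted(matches, key=lambda p: p.score, reverse=True)
```

For example, `filter_posts(posts, language="Python", years=(2020, 2022))` would narrow a list to Python projects posted between 2020 and 2022, which mirrors the kind of query in Prompt 1.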

Copy-paste prompts

Prompt 1
I want to browse reddit-gems for interesting Python machine learning projects from 2020 to 2022. How do I use the filter controls in the HTML file to narrow down by language and year?
Prompt 2
I want to adapt the Python scraping and classification scripts in reddit-gems to archive a different subreddit. Walk me through which scripts to run and in what order.
Prompt 3
The reddit-gems browser is not playing an older Reddit-hosted video inline. How does the fallback behavior work and what controls whether a video plays in place vs shows a link?

Verify against the repo before relying on details.