
jane-xiaoer/wtf-was-that-site

15 · Python · Audience · developer · Complexity · 3/5 · Active · Setup · moderate

TLDR

Personal bookmark pipeline that watches your Chromium bookmarks, screenshots each new page, asks Gemini to classify it, and writes a structured entry into Notion.

Mindmap

mindmap
  root((wtf-was-that-site))
    Inputs
      Chromium bookmarks
      Chrome extension events
      URL screenshots
    Outputs
      Notion database rows
      Optional Obsidian notes
      Optional Vercel deploy
    Use Cases
      Auto capture interesting sites
      Build a personal tools wall
      Train a personal classifier
    Tech Stack
      Python
      Playwright
      Gemini
      Notion API

Things people build with this

USE CASE 1

Auto-capture every new browser bookmark into a structured Notion database

USE CASE 2

Build a personal tools wall that updates on Notion edits

USE CASE 3

Train a personal Gemini classifier that drifts toward your taxonomy

Tech stack

Python · Playwright · Gemini · Notion API · FSEvents · launchd

Getting it running

Difficulty · moderate · Time to first run · 30 min

Requires a Notion workspace with a hand-built database schema, a Gemini API key, and Playwright's Chromium download before the first capture.
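
The schema requirement is easier to picture with a concrete payload. The sketch below is not taken from the repository. Name, URL, Category, Tags, and Visit count are among the fields the project expects, but the property types chosen here (title, url, select, multi_select, number), the shape of the `entry` dict, and the helper `build_page_payload` are all assumptions for illustration.

```python
import json


def build_page_payload(database_id, entry):
    """Build the JSON body for a Notion "create page" request.

    The property types below are assumed, not confirmed by the repository.
    """
    properties = {
        "Name": {"title": [{"text": {"content": entry["name"]}}]},
        "URL": {"url": entry["url"]},
        "Category": {"select": {"name": entry["category"]}},
        "Tags": {"multi_select": [{"name": t} for t in entry.get("tags", [])]},
        "Visit count": {"number": entry.get("visit_count", 1)},
    }
    # "My Notes" is left out on purpose: this sketch assumes hand-written
    # notes are preserved by never sending that property.
    return {"parent": {"database_id": database_id}, "properties": properties}


if __name__ == "__main__":
    # Placeholder values only; a real database id comes from your workspace.
    example = {
        "name": "Example Tool",
        "url": "https://example.com",
        "category": "Developer tools",
        "tags": ["cli", "open-source"],
    }
    print(json.dumps(build_page_payload("YOUR_DB_ID", example), indent=2))
```

If the real database uses different property types, the request would be rejected by Notion, so the shapes above should be checked against the schema you actually build.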

In plain English

wtf-was-that-site is a personal bookmark-capture pipeline for people who keep finding interesting tools online and then cannot remember what they were called. The pitch: you press Cmd+D in any Chromium browser; a background process notices the new bookmark, opens the page in a headless browser, takes a screenshot, asks an AI model to summarize what the site is, then writes a structured entry into your Notion database with name, category, capabilities, and other fields filled in.

What this repository ships is the capture pipeline only, not a hosted service. You bring your own Notion workspace, your own Gemini API key, and optionally your own Vercel project. The Notion database has to be set up manually (or with Notion's official CLI) with a specific set of fields, including Name, URL, Headline, Category, Subcategory, Tags, Capabilities, Use cases, Tech highlights, Cover, Visit count, Status, Last visited, and a free-form My Notes column that is preserved when an entry is re-captured.

Two ways to trigger a capture are offered. An FSEvents watcher on macOS reads the Bookmarks file under each installed Chromium browser (Chrome, Edge, Brave, Arc, Vivaldi, Opera, and others), with a typical lag of 0 to 30 seconds. The other option is a small Chrome extension that listens to chrome.bookmarks.onCreated for instant capture. Both can run side by side, and a supplied launchd plist starts the watcher on boot.

The pipeline can optionally write a Markdown note to an Obsidian vault, push a card into a separate Next.js 'tools wall' repo and trigger a Vercel redeploy, or send to Feishu; any of those can be skipped by leaving the relevant variables blank.

The AI side uses Playwright to capture the page and Gemini 2.5 Flash to classify it against a 12-category taxonomy. When you manually re-categorize a tool in Notion, a feedback collector picks up the edit and feeds it back as a few-shot example, so the classifier drifts toward your taste over time.

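
To make the watcher idea concrete, here is a minimal sketch of how new bookmarks could be detected by comparing two reads of a Chromium Bookmarks file. It is not the repository's code. It relies on the Bookmarks file being JSON with a `roots` object whose folders hold `children`, and whose leaf nodes have `type`, `id`, and `url`; the function names are hypothetical, and the file path is left to the caller because it differs per browser and profile.

```python
import json
from pathlib import Path


def collect_urls(node):
    """Recursively yield (id, url) pairs from one Bookmarks tree node."""
    if node.get("type") == "url":
        yield node["id"], node["url"]
    for child in node.get("children", []):
        yield from collect_urls(child)


def snapshot(bookmarks):
    """Flatten a parsed Bookmarks document into {bookmark id: url}."""
    found = {}
    for root in bookmarks.get("roots", {}).values():
        if isinstance(root, dict):  # skip any non-folder metadata entries
            found.update(collect_urls(root))
    return found


def new_bookmarks(before, after):
    """Return URLs whose ids appear in `after` but not in `before`."""
    return [url for bid, url in after.items() if bid not in before]


def load_snapshot(path):
    """Read and flatten a Bookmarks file; the path is supplied by the caller."""
    return snapshot(json.loads(Path(path).read_text(encoding="utf-8")))
```

A watcher built on this would keep the previous snapshot in memory, call `load_snapshot` again whenever the file changes, and hand each URL returned by `new_bookmarks` to the capture step. Comparing by bookmark id rather than by URL means that bookmarking the same address twice still counts as a new event, which may or may not match what the project does.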
The repository describes setup on macOS as a five-minute job: clone the repo, create a virtualenv, install Playwright's Chromium, build the Notion database, fill in a .env with NOTION_TOKEN, NOTION_DB_ID, and GEMINI_API_KEY, then run capture.py against an example URL. The repository is written in Python and has 14 stars.
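
Before the first run it can be useful to confirm that the three required variables are actually set. The following is a small preflight sketch under stated assumptions: the variable names come from the setup description above, but the simple KEY=VALUE parsing and the helpers `parse_env` and `missing_required` are hypothetical and not part of the repository.

```python
REQUIRED = ("NOTION_TOKEN", "NOTION_DB_ID", "GEMINI_API_KEY")


def parse_env(text):
    """Parse simple KEY=VALUE lines, ignoring blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes so KEY="abc" and KEY=abc behave the same.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_required(values):
    """Return the required names that are absent or blank."""
    return [name for name in REQUIRED if not values.get(name)]


if __name__ == "__main__":
    from pathlib import Path

    env_file = Path(".env")
    settings = parse_env(env_file.read_text(encoding="utf-8")) if env_file.exists() else {}
    missing = missing_required(settings)
    print("Missing: " + ", ".join(missing) if missing else "All required variables are set.")
```

Treating a blank value as missing mirrors the way the optional integrations are described as being skipped when their variables are left blank, though here a blank required value is reported as a problem instead.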

Copy-paste prompts

Prompt 1
Walk me through setting up the Notion database schema that wtf-was-that-site expects
Prompt 2
Show how to swap the Gemini 2.5 Flash call for a Claude or local model in the capture pipeline
Prompt 3
Help me write a launchd plist that starts the FSEvents watcher on login
Prompt 4
Add a step to the pipeline that also posts a card to a Feishu channel

Generated 2026-05-22 · Model: sonnet-4-6 · Verify against the repo before relying on details.