explaingit

timescale/pgai

5,798 PLpgSQL Audience · developer Complexity · 3/5 Setup · moderate

TLDR

A Python library and PostgreSQL extension that auto-generates and syncs AI text embeddings in your database, so you can build semantic search without writing custom sync code.

Mindmap

mindmap
  root((pgai))
    What it does
      Embedding sync
      Semantic search
      NL queries
    Features
      Vectorizer
      Semantic Catalog
      Failure recovery
    Tech
      Python
      PostgreSQL
      OpenAI API
    Use cases
      AI search apps
      Knowledge bases
      NL database query

Things people build with this

USE CASE 1

Add semantic search to a PostgreSQL table by declaring a Vectorizer on a text column, no manual embedding code needed.

USE CASE 2

Let non-technical users query a database with plain-English questions instead of SQL via the Semantic Catalog.

USE CASE 3

Keep AI embeddings automatically up-to-date as records change, with built-in retry for service outages.

USE CASE 4

Use pgai with any OpenAI-compatible embedding endpoint, including locally-run models, to power AI search in a web app.
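The semantic search behind these use cases comes down to comparing embedding vectors and returning the closest rows. The sketch below shows that ranking step in plain Python. It is a toy illustration, not pgai's API: the function names are hypothetical, and the three-number vectors are hand-made stand-ins for the much longer vectors a real embedding model would return. In a real deployment this comparison would normally run inside the database rather than in application code.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: Sequence[float], rows: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Score every row against the query vector and sort best match first."""
    scored = [(text, cosine_similarity(query, vec)) for text, vec in rows.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hand-made toy vectors (assumption: NOT produced by a real embedding model).
product_rows = {
    "red running shoes": [0.9, 0.1, 0.0],
    "trail sneakers": [0.8, 0.3, 0.1],
    "cast iron skillet": [0.0, 0.2, 0.9],
}
query_vector = [1.0, 0.2, 0.0]  # stand-in for the embedding of a shoe-related query

ranking = rank_by_similarity(query_vector, product_rows)
for text, score in ranking:
    print(f"{score:.3f}  {text}")
```

The two footwear rows score close to 1 and the skillet scores close to 0, which is the behaviour a similarity search relies on: rows are ordered by meaning-level closeness rather than by keyword overlap.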

Tech stack

Python, PostgreSQL, PLpgSQL, OpenAI

Getting it running

Difficulty · moderate Time to first run · 30 min

Requires a running PostgreSQL database and an API key from an embedding provider such as OpenAI.

In plain English

pgai is a Python library built by Timescale that turns a PostgreSQL database into a search engine for AI-powered applications. The core problem it solves is keeping text embeddings in sync with your data automatically. Embeddings are numerical representations of text that AI systems use to find related content, and normally you have to write custom code to generate and update them whenever your data changes. pgai handles that for you.

The main feature is called the Vectorizer. You tell it which column of your database table contains the text you want to index, and it takes care of generating embeddings for every row, storing them, and updating them as records change. The README describes this as being similar to declaring a database index: you define what you want, and the system manages the underlying complexity. The Vectorizer runs as a separate worker process that works through a queue of pending items in batches, with built-in handling for service failures, rate limits, and slow responses from the embedding provider.

A second feature is the Semantic Catalog, which lets you query your database using natural language questions instead of SQL. You describe your tables and data in plain language, and the system translates your question into the right SQL query automatically. This is aimed at applications where non-technical users or AI agents need to retrieve data without knowing the schema.

pgai works with several embedding providers (OpenAI is used in the quick start example) and with any standard PostgreSQL database, including hosted options like Timescale Cloud, Amazon RDS, and Supabase. Installation is done through pip, and the database components are set up via a CLI command or from Python code during application setup.

The architecture is designed so that the main application and the embedding process are decoupled: if the embedding service goes down, your core data operations are not affected. Jobs queue up and are processed when the service recovers.
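The worker behaviour described above (a queue of pending items, processed in batches, surviving provider outages) can be illustrated with a small self-contained sketch. This is not pgai's code or API: `process_queue`, `ProviderUnavailable`, and `make_flaky_provider` are hypothetical names invented for illustration, the queue is an in-memory `deque` rather than a database table, and the "embedding" is a stand-in that just returns the text length. A real worker would also wait between retries; the sketch omits the delay so it runs instantly.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Sequence


class ProviderUnavailable(Exception):
    """Hypothetical error standing in for an embedding-service outage."""


def process_queue(
    pending: Deque[str],
    embed_batch: Callable[[Sequence[str]], List[List[float]]],
    batch_size: int = 2,
    max_attempts: int = 3,
) -> Dict[str, List[float]]:
    """Drain `pending` in batches; if a batch keeps failing, requeue it and stop."""
    stored: Dict[str, List[float]] = {}
    while pending:
        # Take up to `batch_size` items off the front of the queue.
        batch = [pending.popleft() for _ in range(min(batch_size, len(pending)))]
        for attempt in range(1, max_attempts + 1):
            try:
                vectors = embed_batch(batch)
            except ProviderUnavailable:
                if attempt == max_attempts:
                    # Give up for now: put the batch back at the front, in its
                    # original order, so a later run can pick it up again.
                    pending.extendleft(reversed(batch))
                    return stored
                continue  # retry the same batch
            stored.update(zip(batch, vectors))
            break
    return stored


def make_flaky_provider(failures: int) -> Callable[[Sequence[str]], List[List[float]]]:
    """Build a fake provider that raises `failures` times, then succeeds."""
    state = {"remaining": failures}

    def embed_batch(texts: Sequence[str]) -> List[List[float]]:
        if state["remaining"] > 0:
            state["remaining"] -= 1
            raise ProviderUnavailable("simulated outage")
        # Stand-in "embedding": a one-element vector holding the text length.
        return [[float(len(t))] for t in texts]

    return embed_batch


queue: Deque[str] = deque(["a", "bb", "ccc"])
provider = make_flaky_provider(failures=5)

first_run = process_queue(queue, provider)   # outage outlasts the retries
second_run = process_queue(queue, provider)  # service has recovered by now
print(first_run, list(queue), second_run)
```

In the first run the simulated outage outlasts the retry budget, so nothing is stored and every item stays queued. The second run picks up the same items once the provider recovers. The code that enqueued the items never had to know the provider was down, which is the decoupling the description refers to.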

Copy-paste prompts

Prompt 1
Using pgai with OpenAI, set up a Vectorizer on the description column of my PostgreSQL products table so I can run semantic similarity searches against it.
Prompt 2
I want users of my app to ask questions about my database in plain English. Show me how to configure the pgai Semantic Catalog on a Supabase database.
Prompt 3
My embedding provider went down and pgai queued pending jobs. How does recovery work, and what should I monitor to confirm the jobs completed?
Prompt 4
Compare pgai to writing my own embedding sync pipeline: what specific tasks does pgai handle that I would otherwise have to code and maintain?
Prompt 5
Install pgai in a Python app and configure it to use a locally-run embedding model via an OpenAI-compatible endpoint.

Verify against the repo before relying on details.