Add semantic search to a PostgreSQL table by declaring a Vectorizer on a text column, with no manual embedding code needed.
Let non-technical users query a database with plain-English questions instead of SQL via the Semantic Catalog.
Keep AI embeddings automatically up-to-date as records change, with built-in retry for service outages.
Use pgai with any OpenAI-compatible embedding endpoint, including locally run models, to power AI search in a web app.
Requires a running PostgreSQL database and an API key from an embedding provider such as OpenAI.
pgai is a Python library built by Timescale that turns a PostgreSQL database into a search engine for AI-powered applications. The core problem it solves is keeping text embeddings in sync with your data automatically. Embeddings are numerical representations of text that AI systems use to find related content, and normally you have to write custom code to generate and update them whenever your data changes. pgai handles that for you.

The main feature is called the Vectorizer. You tell it which column of your database table contains the text you want to index, and it takes care of generating embeddings for every row, storing them, and updating them as records change. The README describes this as being similar to declaring a database index: you define what you want, and the system manages the underlying complexity. The vectorizer runs as a separate worker process that processes a queue of pending items in batches, with built-in handling for service failures, rate limits, and slow responses from the embedding provider.

A second feature is the Semantic Catalog, which lets you query your database using natural language questions instead of SQL. You describe your tables and data in plain language, and the system translates your question into the right SQL query automatically. This is aimed at applications where non-technical users or AI agents need to retrieve data without knowing the schema.

pgai works with several embedding providers (OpenAI is used in the quick start example) and with any standard PostgreSQL database, including hosted options like Timescale Cloud, Amazon RDS, and Supabase. Installation is done through pip, and the database components are set up via a CLI command or from Python code during application setup.

The architecture is designed so that the main application and the embedding process are decoupled: if the embedding service goes down, it does not affect your core data operations. Jobs queue up and process when the service recovers.
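The queue-and-retry behaviour described above can be illustrated with a small, self-contained sketch. This is not pgai's code or API: `process_queue`, `make_flaky_embedder`, and `EmbeddingServiceDown` are hypothetical names invented for illustration, the queue is an in-memory `deque` rather than a database table, and the "embedding" is a toy one-number vector. It only shows the general pattern of draining pending rows in batches and leaving them queued when the provider is unavailable.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple

Row = Tuple[int, str]  # (row id, text to embed)


class EmbeddingServiceDown(Exception):
    """Hypothetical error raised when the embedding provider is unavailable."""


def process_queue(
    pending: Deque[Row],
    embed_batch: Callable[[List[str]], List[List[float]]],
    store: Dict[int, List[float]],
    batch_size: int = 2,
    max_attempts: int = 3,
) -> int:
    """Drain `pending` in batches and return how many rows were embedded.

    A batch that fails `max_attempts` times is put back at the front of the
    queue, so a later run can pick it up once the service recovers.
    """
    done = 0
    while pending:
        size = min(batch_size, len(pending))
        batch = [pending.popleft() for _ in range(size)]
        for attempt in range(1, max_attempts + 1):
            try:
                vectors = embed_batch([text for _, text in batch])
            except EmbeddingServiceDown:
                if attempt == max_attempts:
                    # Give up for now: requeue in the original order and stop.
                    pending.extendleft(reversed(batch))
                    return done
                continue  # a real worker would wait (back off) before retrying
            for (row_id, _), vector in zip(batch, vectors):
                store[row_id] = vector
            done += len(batch)
            break
    return done


def make_flaky_embedder(failures: int) -> Callable[[List[str]], List[List[float]]]:
    """Build a stand-in embedder that fails `failures` times, then works."""
    state = {"remaining": failures}

    def embed_batch(texts: List[str]) -> List[List[float]]:
        if state["remaining"] > 0:
            state["remaining"] -= 1
            raise EmbeddingServiceDown("provider unavailable")
        # Toy vector (text length only); a real provider returns dense vectors.
        return [[float(len(text))] for text in texts]

    return embed_batch
```

Calling `process_queue` while the stand-in embedder is still failing leaves every row on the queue and the store untouched. Calling it again after the failures are used up embeds the same rows, which mirrors the idea that writes to the main table never depend on the embedding service being reachable.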
Verify against the repo before relying on details.