explaingit

data-context-hq/datacontext

18 | Python | Audience · ops devops | Complexity · 3/5 | Active | Setup · moderate

TLDR

Python library that attaches request, user, and trace context to every database query so you can later trace a slow or weird query back to the code path that ran it.

Mindmap

mindmap
  root((datacontext))
    Inputs
      Service config
      Wrapped functions
      context use blocks
    Outputs
      JSONL events
      OTEL spans
      Callback events
    Use Cases
      Trace slow queries
      Audit AI agent SQL
      Tag tenant queries
    Tech Stack
      Python
      SQLAlchemy
      PostgreSQL
      OpenTelemetry
      BigQuery
      Snowflake

Things people build with this

USE CASE 1

Attribute slow Postgres queries to the request or background job that fired them

USE CASE 2

Tag every BigQuery call with tenant and region for audit logs

USE CASE 3

Capture AI-agent-generated SQL with actor and operation fields for review

USE CASE 4

Forward query context into OpenTelemetry traces for end-to-end debugging

Tech stack

Python, SQLAlchemy, PostgreSQL, OpenTelemetry, BigQuery, Snowflake

Getting it running

Difficulty · moderate | Time to first run · 30 min

You must call datacontext.configure once at startup and wrap the right data-access functions; otherwise, no events fire.

In plain English

DataContext is a Python library for figuring out which part of an application caused a specific database query. In a live service, by the time a slow or strange query reaches the logs or the database team, the link back to the request, background job, or AI agent that triggered it has usually been lost. DataContext attaches that runtime context to every query as it happens, so the people on call can answer the question of who actually sent it.

The library is installed from PyPI with pip install datacontext, and it has optional extras for OpenTelemetry, SQLAlchemy, PostgreSQL, BigQuery, Snowflake, Dagster, and dbt. To use it, the developer calls datacontext.configure once at startup, naming the service, the environment, and the data-access function that should be wrapped. From then on, every call to that function emits one finished event when it returns or raises an exception. The author stresses that wrappers keep return values and exceptions exactly as they were, and that if the library itself fails the application keeps running.

Each emitted event is a JSON object with a stable shape: the event name, start and end timestamps, the service and environment, the database system and client name, a SHA-256 fingerprint of the query, a sanitized version of the SQL, the duration in milliseconds, and the file, line, function, and short stack of the code that issued it. When the developer wraps a block of code in a context.use(...) block, fields like operation, actor, request_id, tenant, and region are added to every query captured inside it. If OpenTelemetry is active, the trace_id and span_id are attached as well.

DataContext is described as deliberately small and early. The supported instrumentation today is manual helpers, function wrapping, native SQLAlchemy, PostgreSQL, BigQuery, and Snowflake hooks, plus Dagster and dbt execution-context attribution. Output goes to JSONL files, a callback function, or an OpenTelemetry-oriented sink. The maintainers say further database clients and ORMs will be prioritized based on real requests in GitHub Discussions and issues.

The production-behavior section lists the safety rules: wrappers do not swallow exceptions, capture failures fall back to a minimal event, sink failures are logged and dropped, and raw SQL is opt-in while sanitized text is the default.
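To make the mechanism described above concrete, here is a minimal standard-library sketch of the general technique: a context block that tags queries, and a wrapper that emits one event per call without changing return values or exceptions. It is not the datacontext API. The names `use_context`, `traced`, `sanitize`, and `events`, the event field names, and the regex-based sanitizer are all hypothetical stand-ins chosen for illustration, and hashing the sanitized text is an assumption about how a fingerprint might be computed.

```python
import contextvars
import functools
import hashlib
import json
import re
import time
from contextlib import contextmanager

# Hypothetical stand-ins for illustration only; not the datacontext API.
_current_context = contextvars.ContextVar("current_context", default={})
events = []  # in-memory sink; each entry is one JSON line


@contextmanager
def use_context(**fields):
    """Attach fields to every event captured inside the block."""
    token = _current_context.set({**_current_context.get(), **fields})
    try:
        yield
    finally:
        _current_context.reset(token)


def sanitize(sql):
    """Crude sanitizer (assumption): replace string and integer literals with '?'."""
    return re.sub(r"'[^']*'|\b\d+\b", "?", sql)


def _capture(sql, start, status):
    """Build and emit one event; never let a failure reach the caller."""
    try:
        clean = sanitize(sql)
        event = {
            "fingerprint": hashlib.sha256(clean.encode("utf-8")).hexdigest(),
            "sql": clean,
            "duration_ms": (time.perf_counter() - start) * 1000.0,
            "status": status,
            **_current_context.get(),
        }
    except Exception:
        event = {"status": status}  # capture failed: fall back to a minimal event
    try:
        events.append(json.dumps(event))
    except Exception:
        pass  # sink failed: a real sink would log the error and drop the event


def traced(func):
    """Wrap a data-access function whose first argument is the SQL text."""
    @functools.wraps(func)
    def wrapper(sql, *args, **kwargs):
        start = time.perf_counter()
        status = "ok"
        try:
            return func(sql, *args, **kwargs)  # return value passes through unchanged
        except Exception:
            status = "error"
            raise  # the original exception is re-raised, never swallowed
        finally:
            _capture(sql, start, status)
    return wrapper


@traced
def run_query(sql):
    """Fake data-access function standing in for a real database call."""
    if "boom" in sql:
        raise ValueError("query failed")
    return [("row",)]


with use_context(operation="checkout", tenant="acme"):
    rows = run_query("SELECT * FROM orders WHERE id = 42")
```

A `ContextVar` is used here rather than a module-level dictionary so that concurrent requests in threads or asyncio tasks would each see their own fields. Whether the library itself works this way should be checked against the repo.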

Copy-paste prompts

Prompt 1
Install datacontext and configure it for my FastAPI service so SQLAlchemy queries get request and tenant tags
Prompt 2
Wrap a Dagster job in a context.use block so every query inside it carries the operation and actor
Prompt 3
Pipe datacontext events into a JSONL file and write a script that finds the slowest queries by fingerprint
Prompt 4
Send datacontext events into our OpenTelemetry collector and link them to existing HTTP spans
Prompt 5
Enable raw SQL capture in datacontext only for the staging environment with a config flag
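
As a rough idea of what Prompt 3 might produce, the sketch below ranks query fingerprints by their slowest observed duration from JSONL lines. The field names `fingerprint` and `duration_ms` are assumptions based on the event description above, not confirmed keys, and `slowest_by_fingerprint` is a hypothetical helper.

```python
import json
from collections import defaultdict


def slowest_by_fingerprint(lines, top=5):
    """Return (fingerprint, max_duration_ms, count) tuples, slowest first."""
    stats = defaultdict(lambda: [0.0, 0])  # fingerprint -> [max duration, count]
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
            fingerprint = event["fingerprint"]   # assumed field name
            duration = float(event["duration_ms"])  # assumed field name
        except (ValueError, KeyError, TypeError):
            continue  # skip malformed lines and events missing these fields
        entry = stats[fingerprint]
        entry[0] = max(entry[0], duration)
        entry[1] += 1
    ranked = sorted(
        ((fp, worst, count) for fp, (worst, count) in stats.items()),
        key=lambda row: row[1],
        reverse=True,
    )
    return ranked[:top]


sample = [
    '{"fingerprint": "aaa", "duration_ms": 12.5}',
    '{"fingerprint": "bbb", "duration_ms": 840.0}',
    '{"fingerprint": "aaa", "duration_ms": 95.0}',
    "not json",
]
result = slowest_by_fingerprint(sample)
```

Because the function accepts any iterable of lines, an open file handle can be passed in directly in place of the `sample` list.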

Generated 2026-05-22 · Model: sonnet-4-6 · Verify against the repo before relying on details.