Build a real-time data pipeline that continuously processes live feeds from Kafka or databases without rewriting code for batch mode.
Create an AI question-answering system that automatically updates answers as source documents change in Google Drive or SharePoint.
Set up a RAG pipeline that retrieves and embeds documents, then keeps results fresh as new documents arrive.
Process data from 300+ sources via Airbyte connectors using a single Python codebase that scales across multiple machines.
Building from source requires a Rust toolchain, and meaningful examples typically need Docker/Kubernetes orchestration and external services (Kafka, PostgreSQL) to run.
Pathway is a Python framework for building data pipelines that handle both real-time streaming data and traditional batch data with the same code. Most data engineering tools force a choice between two separate worlds: tools built for processing data in real time (streaming) and tools built for processing data in large periodic batches. Pathway lets you write the logic once and run it in either mode, which simplifies development and testing.

Under the hood, Pathway runs on a Rust engine based on Differential Dataflow, a technique that updates computation results incrementally as new data arrives instead of recomputing everything from scratch. This makes it efficient for continuously arriving data. Although the Rust engine does the heavy lifting, you write all pipeline code in Python using Pathway's API, and the framework handles multithreading, multiprocessing, and distributed execution automatically.

Pathway ships connectors for data sources such as Kafka (a distributed messaging system), Google Drive, PostgreSQL, and SharePoint, plus an Airbyte connector that provides access to over 300 additional sources. For AI use cases, it includes LLM helpers: tools for embedding text, splitting documents, querying language models, and building RAG (Retrieval-Augmented Generation) pipelines that stay up to date as source documents change.

Use Pathway when you need a pipeline that continuously processes live data feeds, or an AI question-answering system that updates automatically as your documents change. It requires Python 3.10 or later and can be deployed with Docker and Kubernetes.
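The incremental idea behind Differential Dataflow, and the claim that one piece of logic can serve both batch and streaming, can be illustrated with a minimal plain-Python sketch. This is not Pathway's API or engine: the `IncrementalSum` class, the `(key, value, diff)` update format, and the `run` helper are hypothetical names chosen for illustration, and real Differential Dataflow also tracks logical timestamps and supports joins and iteration. Each update carries a diff of +1 (insert) or -1 (retract), and the aggregator adjusts only the keys an update touches rather than recomputing every total.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical update format: (key, value, diff), diff = +1 insert, -1 retract.
Update = Tuple[str, int, int]


class IncrementalSum:
    """Maintains per-key sums, touching only the keys present in each batch."""

    def __init__(self) -> None:
        self.totals: Dict[str, int] = defaultdict(int)

    def apply(self, batch: Iterable[Update]) -> Dict[str, int]:
        changed: Dict[str, int] = {}
        for key, value, diff in batch:
            # Adjust the running total instead of re-summing all rows.
            self.totals[key] += value * diff
            changed[key] = self.totals[key]
        return changed  # only the outputs affected by this batch


def run(batches: Iterable[List[Update]]) -> Dict[str, int]:
    """Feeds update batches through one aggregator and returns the final totals."""
    agg = IncrementalSum()
    for batch in batches:
        agg.apply(batch)
    return dict(agg.totals)


updates = [("a", 5, +1), ("b", 3, +1), ("a", 2, +1), ("a", 5, -1)]

# "Batch mode": every update arrives in a single batch.
batch_result = run([updates])
# "Streaming mode": the same updates arrive one at a time.
stream_result = run([[u] for u in updates])
print(batch_result, stream_result)  # both are {'a': 2, 'b': 3}
```

Feeding all updates as one batch or one at a time yields the same totals, which is the property that lets a single pipeline definition serve both modes.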
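A RAG index that stays fresh as source documents change can likewise be sketched in plain Python. The sketch assumes each document has a stable id and that an edit arrives as a full replacement of that document. `LiveIndex`, `split_text`, and keyword-overlap scoring are hypothetical stand-ins, not Pathway's LLM helpers; the keyword scoring takes the place of the embedding model and vector search a real RAG pipeline would use.

```python
import re
from typing import Dict, List, Set, Tuple


def split_text(text: str, max_words: int = 8) -> List[str]:
    """Hypothetical splitter: cut text into fixed-size word windows (no overlap)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def tokens(text: str) -> Set[str]:
    """Lowercased alphanumeric tokens; a crude stand-in for an embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class LiveIndex:
    """Chunk index keyed by document id, so an edit re-indexes only that document."""

    def __init__(self) -> None:
        self.chunks: Dict[str, List[str]] = {}

    def upsert(self, doc_id: str, text: str) -> None:
        # Overwriting the entry drops the document's stale chunks.
        self.chunks[doc_id] = split_text(text)

    def delete(self, doc_id: str) -> None:
        self.chunks.pop(doc_id, None)

    def retrieve(self, query: str, k: int = 1) -> List[Tuple[str, str]]:
        # Score every chunk by how many query tokens it shares; keep the top k.
        q = tokens(query)
        scored = [
            (len(q & tokens(chunk)), doc_id, chunk)
            for doc_id, doc_chunks in self.chunks.items()
            for chunk in doc_chunks
        ]
        hits = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
        return [(doc_id, chunk) for _, doc_id, chunk in hits[:k]]


index = LiveIndex()
index.upsert("policy.txt", "Refunds are accepted within 14 days of purchase.")
print(index.retrieve("How many days for refunds?"))
# The source document is edited; only its chunks are replaced.
index.upsert("policy.txt", "Refunds are accepted within 30 days of purchase.")
print(index.retrieve("How many days for refunds?"))
```

Because chunks are keyed by document id, an edit replaces only that document's chunks, so the next query sees the new text without rebuilding the whole index.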
Generated 2026-05-18 · Model: sonnet-4-6 · Verify against the repo before relying on details.